Alpha-Beta woes, textbook-like resources, etc.

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Meni Rosenfeld
Posts: 10
Joined: Wed Mar 11, 2015 9:42 pm

Alpha-Beta woes, textbook-like resources, etc.

Post by Meni Rosenfeld »

Hi all,

I've been interested in chess programming for a long time, but only recently started being more active about it, both as a hobby and with the hope of one day making meaningful contributions to the field. My expertise is in machine learning and statistical algorithms, and I hope to apply that once I get a better grasp of the basics.

I did encounter some issues which I hope you can help with.

1. My program started behaving weirdly when I finished implementing both alpha-beta pruning and transposition tables. Looking into it I've realized that I have very naively simply stored in my table a numerical score without any indication of what kind of node this is. I understand now that the entry should state whether the score is >=x, =x or <=x. But is this optimal? For me it would be more natural to store a range for the score, [a, b]. The three node types are special cases, with [x, inf], [x, x], and [-inf, x]. But the range methodology is more general and can allow other possibilities.

So my question is, is there an actual advantage in storing arbitrary ranges? Or is the extra flexibility not ever going to be beneficial in practice, and I'm better off just storing a single number and a node type?

2. I'm also not sure if I should go for fail-soft or fail-hard alpha-beta. My understanding is this: Fail-hard is the "classical" version; if we search a node's score with a window of (alpha, beta), we expect to learn one of 3 outcomes: Score >= Beta, Score <= Alpha, or Score is some exact value between alpha and beta. The function calls itself recursively on next moves, and from the responses logically deduces one of these answers. With fail-soft, we divulge more information if we stumble upon it; if we find proof that Score >= x > Beta, we terminate (we don't waste time looking for a better x) but report Score >= x, which is more informative than simply Score >= Beta. But is this methodology consistent? Can it cause search instability or clash with other optimizations? Does it offer improvement in a context other than transposition table lookups?

3. I'm also wondering what are good ways to restrict search depth for nodes that don't show much promise. My original thought was that I would do iterative deepening, and when I search a node, I give it less depth the worse it scored in the previous shallow search. But with alpha-beta, I rarely know the score of a node, only bounds. So a move could be horrible, and not worthy of a serious search, but all I would know about it is that it's not as good as the best move. How can I make this work?

4. Should I store for each node the best move found so far, and in future searches examine it before the others? Or should I store a ranking for all moves, and later search them in order from best to worst? If the latter, how do I handle the problem that for most moves I don't know the score? Actually, in my current implementation I'm doing the latter, but it might be behaving inconsistently because of this issue.

5. What performance numbers should I expect, and how should I find bottlenecks in my implementation? Currently alpha-beta is more or less my only search optimization over pure brute force, and it takes about 1 second to do a 4-ply search. Is this normal? How do I know if I need to focus on reducing the time it takes to evaluate a node, or on better search techniques?

6. My troubles with alpha-beta suggest to me that I don't really "get it". There are bits and pieces of information online (CPW, this forum etc.), but I think what could help me is a more textbook-like resource; doesn't necessarily have to be a book, but it should take a more structured, comprehensive, methodical approach to explaining the concepts. Where can I find such a resource?

Thanks!
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Alpha-Beta woes, textbook-like resources, etc.

Post by AlvaroBegue »

I'll answer what I can.
Meni Rosenfeld wrote:1. My program started behaving weirdly when I finished implementing both alpha-beta pruning and transposition tables. Looking into it I've realized that I have very naively simply stored in my table a numerical score without any indication of what kind of node this is. I understand now that the entry should state whether the score is >=x, =x or <=x. But is this optimal? For me it would be more natural to store a range for the score, [a, b]. The three node types are special cases, with [x, inf], [x, x], and [-inf, x]. But the range methodology is more general and can allow other possibilities.

So my question is, is there an actual advantage in storing arbitrary ranges? Or is the extra flexibility not ever going to be beneficial in practice, and I'm better off just storing a single number and a node type?
There are transposition-table implementations that do store two bounds, and usually two depths as well. But you pay for this with larger hash entries, and the benefit doesn't seem to be much (it certainly wasn't worth it for my engine). There are minimax variants that would benefit a lot from having bounds on both sides, namely MTD(f).
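For the standard single-bound approach, an entry can be quite small. A minimal sketch (field names and widths are illustrative, not from any particular engine):

Code: Select all

#include <cstdint>

enum class Bound : std::uint8_t { Upper, Exact, Lower };

struct TTEntry {
    std::uint64_t key;    // Zobrist hash (or its upper bits) for verification
    std::int16_t  score;  // the value x
    std::uint8_t  depth;  // draft at which the score was obtained
    Bound         bound;  // whether the entry means score <= x, == x or >= x
    std::uint16_t move;   // best move found, reused for move ordering
};

// On probe, at sufficient depth: an Exact entry returns immediately; a Lower
// bound cuts off if score >= beta, an Upper bound if score <= alpha.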

3. I'm also wondering what are good ways to restrict search depth for nodes that don't show much promise. My original thought was that I would do iterative deepening, and when I search a node, I give it less depth the worse it scored in the previous shallow search. But with alpha-beta, I rarely know the score of a node, only bounds. So a move could be horrible, and not worthy of a serious search, but all I would know about it is that it's not as good as the best move. How can I make this work?
You can use late move reductions (LMR) and other techniques of that sort. You have to learn to live with the fact that you only learn bounds on scores, because trying to extract more information is too costly. I sometimes get the feeling that alpha-beta is so powerful that it precludes many promising ideas.
4. Should I store for each node the best move found so far, and in future searches examine it before the others? Or should I store a ranking for all moves, and later search them in order from best to worst? If the latter, how do I handle the problem that for most moves I don't know the score? Actually, in my current implementation I'm doing the latter, but it might be behaving inconsistently because of this issue.
Best move only is the standard. I keep the order of the moves at the root from iteration to iteration, simply rotating the list every time a new best move is found. Perhaps it would be a good idea to do the same for a few nodes near the root, but it's certainly not a standard technique, and you probably shouldn't waste time with this at this stage.
5. What performance numbers should I expect, and how should I find bottlenecks in my implementation? Currently alpha-beta is more or less my only search optimization over pure brute force, and it takes about 1 second to do a 4-ply search. Is this normal? How do I know if I need to focus on reducing the time it takes to evaluate a node, or on better search techniques?
You can measure how many positions you visit per second, which should be on the order of a million on a modern CPU, unless you are doing something very wrong (and you probably are). A profiler can help you find bottlenecks. But you can start by removing every use of dynamic memory allocation from your code, since that's the mistake I've seen most often from rookies. You just can't afford it.
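Concretely, the usual fix is to generate moves into a fixed-size array on the stack instead of a heap-allocated container. Something like this (a sketch; Move, Position and GenerateMoves stand in for your own types):

Code: Select all

#include <cstdint>

using Move = std::uint16_t;      // however you encode moves

struct MoveList {
    Move moves[256];             // 256 comfortably exceeds the legal-move maximum
    int  count = 0;
    void push(Move m) { moves[count++] = m; }
};

struct Position;
void GenerateMoves(const Position& pos, MoveList& out);  // fills 'out', no new/malloc

int Search(Position& pos, int depth, int alpha, int beta) {
    MoveList list;               // stack storage: allocation costs nothing,
    GenerateMoves(pos, list);    // and it is "freed" for free on return
    // ... alpha-beta over list.moves[0..count) ...
    return alpha;
}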

1 second for a 4-ply search is way too slow. You should probably be able to reach depth 8 in 1 second without much trouble.

Can you post some output from your engine? Show at least depth, time and number of nodes.
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: Alpha-Beta woes, textbook-like resources, etc.

Post by Evert »

Meni Rosenfeld wrote: 1. My program started behaving weirdly when I finished implementing both alpha-beta pruning and transposition tables. Looking into it I've realized that I have very naively simply stored in my table a numerical score without any indication of what kind of node this is. I understand now that the entry should state whether the score is >=x, =x or <=x. But is this optimal? For me it would be more natural to store a range for the score, [a, b]. The three node types are special cases, with [x, inf], [x, x], and [-inf, x]. But the range methodology is more general and can allow other possibilities.
The most common approach is to store a score and the score type (lower bound, exact, upper bound), but this isn't the only way to do it: you can also store both lower and upper bounds for the score (the case where lower == upper corresponds to an exact score). As far as I know it doesn't make much difference overall.
So my question is, is there an actual advantage in storing arbitrary ranges? Or is the extra flexibility not ever going to be beneficial in practice, and I'm better off just storing a single number and a node type?
There are positions and search techniques that benefit from having both bounds.
2. I'm also not sure if I should go for fail-soft or fail-hard alpha-beta. My understanding is this: Fail-hard is the "classical" version; if we search a node's score with a window of (alpha, beta), we expect to learn one of 3 outcomes: Score >= Beta, Score <= Alpha, or Score is some exact value between alpha and beta. The function calls itself recursively on next moves, and from the responses logically deduces one of these answers. With fail-soft, we divulge more information if we stumble upon it; if we find proof that Score >= x > Beta, we terminate (we don't waste time looking for a better x) but report Score >= x, which is more informative than simply Score >= Beta. But is this methodology consistent? Can it cause search instability or clash with other optimizations? Does it offer improvement in a context other than transposition table lookups?
Again, as far as I know this makes no difference in practice. Fail soft may give you more information, but it's not so obvious that it's actually useful.
3. I'm also wondering what are good ways to restrict search depth for nodes that don't show much promise. My original thought was that I would do iterative deepening, and when I search a node, I give it less depth the worse it scored in the previous shallow search. But with alpha-beta, I rarely know the score of a node, only bounds. So a move could be horrible, and not worthy of a serious search, but all I would know about it is that it's not as good as the best move. How can I make this work?
This is very murky. In general, you can apply a number of heuristics near the tips that prune nodes speculatively: if a null-move fails high, prune the node. If the static evaluation (+margin) is far below alpha, prune. If the static evaluation (-margin) is far above beta, prune. If we have searched many moves, the node is probably an ALL-node, so you might as well search the remaining moves at reduced depth.

All of these things can be inaccurate.
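In code, those checks might look roughly like this; every margin, depth limit and helper name here is an illustrative assumption, not a recipe:

Code: Select all

struct Position;
int  Search(Position& pos, int depth, int alpha, int beta);
int  StaticEval(const Position& pos);
bool InCheck(const Position& pos);
void MakeNullMove(Position& pos);
void UndoNullMove(Position& pos);

int SearchWithPruning(Position& pos, int depth, int alpha, int beta) {
    const int margin = 125 * depth;  // made-up margin
    int eval = StaticEval(pos);

    // Reverse futility: static eval minus a margin is still above beta.
    if (depth <= 2 && !InCheck(pos) && eval - margin >= beta)
        return beta;

    // Null move: give the opponent a free move; if we still fail high, prune.
    if (depth >= 3 && !InCheck(pos) && eval >= beta) {
        MakeNullMove(pos);
        int v = -Search(pos, depth - 3, -beta, -beta + 1);
        UndoNullMove(pos);
        if (v >= beta)
            return beta;
    }

    // Futility: static eval plus a margin still can't reach alpha,
    // so the move loop would skip quiet moves at this node.
    bool futile = (depth <= 2 && eval + margin <= alpha);

    // ... normal move loop goes here: skip quiet moves if 'futile', and
    //     reduce the remaining moves (LMR) once many have been searched ...
    (void)futile;
    return alpha;
}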
4. Should I store for each node the best move found so far, and in future searches examine it before the others? Or should I store a ranking for all moves, and later search them in order from best to worst? If the latter, how do I handle the problem that for most moves I don't know the score? Actually, in my current implementation I'm doing the latter, but it might be behaving inconsistently because of this issue.
Store the best move in the transposition table; in 90%+ of cases, this is enough to cause a cut-off. Store other good moves as "killers". If you have transposition table moves, killer moves and order good (winning) captures highly, you have ~95% chance to have found a good enough move to cause a cut-off already.
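That ordering can be expressed as a single scoring function used to sort the move list. A sketch (names and weights are illustrative):

Code: Select all

#include <cstdint>

using Move = std::uint16_t;

struct NodeInfo { Move hashMove; Move killer[2]; };

bool IsCapture(Move m);       // assumed helpers
int  MvvLva(Move m);          // most valuable victim, least valuable attacker
int  HistoryScore(Move m);

// Higher score = searched earlier.
int OrderingScore(Move m, const NodeInfo& n) {
    if (m == n.hashMove)  return 1000000;            // TT move first
    if (IsCapture(m))     return 100000 + MvvLva(m); // winning captures next
    if (m == n.killer[0]) return 90000;              // killers: quiet moves that
    if (m == n.killer[1]) return 89000;              //   cut off in sibling nodes
    return HistoryScore(m);                          // remaining quiet moves
}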
5. What performance numbers should I expect, and how should I find bottlenecks in my implementation? Currently alpha-beta is more or less my only search optimization over pure brute force, and it takes about 1 second to do a 4-ply search. Is this normal? How do I know if I need to focus on reducing the time it takes to evaluate a node, or on better search techniques?
This is hard to say in general. 1 second for a 4-ply search sounds long to me; I get depth 8-10 from the starting position, depending on which of my programs I use. Use a profiler to find bottlenecks ("perft" is a good way to measure the speed of movegen/makemove, as well as to verify that it's correct). Check what fraction of cut-offs is caused by the first move (or first three moves); if this is much below 90% you need to improve move ordering. Then look at things like null-move and futility pruning. It may be better to hold off on late move reductions for now.
6. My troubles with alpha-beta suggest to me that I don't really "get it". There are bits and pieces of information online (CPW, this forum etc.), but I think what could help me is a more textbook-like resource; doesn't necessarily have to be a book, but it should take a more structured, comprehensive, methodical approach to explaining the concepts. Where can I find such a resource?
Try this: http://web.archive.org/web/200802160311 ... topics.htm. It's not a textbook (obviously), but it's a well-written set of tutorials.
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: Alpha-Beta woes, textbook-like resources, etc.

Post by Evert »

AlvaroBegue wrote: There are transposition-tables implementations that do store two bounds, and usually two depths as well. But you pay for this in making your hash entries larger, and the benefit doesn't seem to be much (it certainly wasn't worth it for my engine).
Are they really larger? I just implemented a dual-bound table (for the heck of it rather than because I expect it to give me a large benefit) and an entry is

Code: Select all

32 bits - lock
16 bits - move
16 bits - upper bound
16 bits - lower bound
16 bits - upper bound depth
16 bits - lower bound depth
16 bits - padding
The relevant depths could be stored in 8 bits each, leaving 32 bits for extra information. In total, that's 128 bits, which isn't that unusual. Of course, 16 bits may not be enough to encode a move and you may want more than 32 bits for the lock, but there are bits to spare...
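As a packed struct, with the 8-bit depths suggested above, that could look like this (sketch):

Code: Select all

#include <cstdint>

struct DualBoundEntry {
    std::uint32_t lock;        // hash verification bits
    std::uint16_t move;
    std::int16_t  upper;       // upper bound on the score
    std::int16_t  lower;       // lower bound on the score
    std::uint8_t  upperDepth;  // 8 bits per depth, as suggested,
    std::uint8_t  lowerDepth;
    std::uint32_t extra;       // ...which leaves 32 spare bits
};
static_assert(sizeof(DualBoundEntry) == 16, "128 bits per entry");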
kbhearn
Posts: 411
Joined: Thu Dec 30, 2010 4:48 am

Re: Alpha-Beta woes, textbook-like resources, etc.

Post by kbhearn »

Meni Rosenfeld wrote:2. I'm also not sure if I should go for fail-soft or fail-hard alpha-beta. My understanding is this: Fail-hard is the "classical" version; if we search a node's score with a window of (alpha, beta), we expect to learn one of 3 outcomes: Score >= Beta, Score <= Alpha, or Score is some exact value between alpha and beta. The function calls itself recursively on next moves, and from the responses logically deduces one of these answers. With fail-soft, we divulge more information if we stumble upon it; if we find proof that Score >= x > Beta, we terminate (we don't waste time looking for a better x) but report Score >= x, which is more informative than simply Score >= Beta. But is this methodology consistent? Can it cause search instability or clash with other optimizations? Does it offer improvement in a context other than transposition table lookups?
Fail-soft is a no-cost improvement in the most primitive alpha-beta search, but in practice many pruning/reduction techniques base themselves on the bounds. Since this pruning is often very aggressive, the tree can look very different when searching around alpha=beta=100cp than around alpha=beta=0cp. So if you stored a fail-soft result saying node > 110 cp while searching at 0 cp, it is not all that reliable if you later have to re-search around 100 cp, because much of the tree that should be searched will have been pruned away when searching around 0. There might still be some gain to be had from that extra information, but you'd have to test to confirm any gains.
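The two flavours differ only in what gets returned. A sketch of the move loop (Position, MoveList, Search, MakeMove/UndoMove and INF are assumed engine names):

Code: Select all

// Fail-soft shown; the fail-hard variant is noted in the comments.
int SearchMoves(Position& pos, MoveList& list, int depth, int alpha, int beta) {
    int best = -INF;                       // INF: some large score constant
    for (int i = 0; i < list.count; ++i) {
        MakeMove(pos, list.moves[i]);
        int value = -Search(pos, depth - 1, -beta, -alpha);
        UndoMove(pos, list.moves[i]);
        if (value > best) best = value;
        if (best >= beta) return best;     // fail-hard would 'return beta;'
        if (best > alpha) alpha = best;
    }
    return best;                           // fail-hard would 'return alpha;'
}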
3. I'm also wondering what are good ways to restrict search depth for nodes that don't show much promise. My original thought was that I would do iterative deepening, and when I search a node, I give it less depth the worse it scored in the previous shallow search. But with alpha-beta, I rarely know the score of a node, only bounds. So a move could be horrible, and not worthy of a serious search, but all I would know about it is that it's not as good as the best move. How can I make this work?
Null-move pruning is the main technique for slashing away branches of the tree that are just hopeless. Late move reductions handle the general shaping of the tree, so that you don't oversearch moves that are no more promising than the siblings you searched before them.
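A sketch of how LMR fits into the move loop (a fragment; the trigger conditions and the amount of reduction are illustrative):

Code: Select all

// Late quiet moves get a reduced, null-window search first; only if they
// surprise us do we pay for a full-depth re-search.
for (int i = 0; i < list.count; ++i) {
    MakeMove(pos, list.moves[i]);
    int value;
    if (i >= 3 && depth >= 3 && IsQuiet(list.moves[i]) && !InCheck(pos)) {
        value = -Search(pos, depth - 2, -alpha - 1, -alpha);   // reduced
        if (value > alpha)                                     // surprise:
            value = -Search(pos, depth - 1, -beta, -alpha);    // full re-search
    } else {
        value = -Search(pos, depth - 1, -beta, -alpha);
    }
    UndoMove(pos, list.moves[i]);
    if (value >= beta) return beta;
    if (value > alpha) alpha = value;
}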
Meni Rosenfeld
Posts: 10
Joined: Wed Mar 11, 2015 9:42 pm

Re: Alpha-Beta woes, textbook-like resources, etc.

Post by Meni Rosenfeld »

Thanks for all the answers! Will take me a while to work through all of the information.
AlvaroBegue wrote:You can measure how many positions you visit per second, which should be on the order of a million on a modern CPU, unless you are doing something very wrong (and you probably are). A profiler can help you find bottlenecks. But you can start by removing every use of dynamic memory allocation from your code, since that's the mistake I've seen most often from rookies. You just can't afford it.

1 second for a 4-ply search is way too slow. You should probably be able to reach depth 8 in 1 second without much trouble.

Can you post some output from your engine? Show at least depth, time and number of nodes.
Here is some output. For simplicity the conditions here are:
1. Starting position.
2. Search is done with Alpha-Beta, but nothing else. No transposition table, no move ordering, etc.
3. Evaluation is a very minimalistic material counting function, with a limited form of QS: If the last move was a capture, the ensuing recapture sequence is resolved and the evaluation is based on stopping at the best point. (I've built a custom procedure specifically for this, which I think is fairly efficient).
Output is depth (plies), nodes (including root, leaves and internal), and time (seconds).

1, 21, 0
2, 421, 0.0040002
3, 9098, 0.0640037
4, 199225, 0.6320361
5, 4676116, 20.6511811

Now, I've also tried (it was actually an accident, but with interesting results) to precompute the list of legal moves for each node. So when I do the actual search, I need to do the static evaluation and the alpha-beta logic, but not to generate the legal moves for each position. Then, the search is much faster:

1, 21, 0
2, 421, 0.0010001
3, 9074, 0.0080004
4, 188980, 0.0930053
5, 4318367, 1.7801018

So, including the part of generating legal moves, I calculate something like 250K nodes/sec. Without needing to generate moves, it goes up to 2M nodes/sec. This suggests that move generation is the most expensive part for me.

So, I followed your advice regarding dynamic allocation, and it looks like you're onto something: removing just two easy cases of dynamic allocation from my move-generating function seems to almost triple performance!

1, 21, 0
2, 421, 0.001
3, 9098, 0.034002
4, 199159, 0.438025
5, 4527316, 6.7253847

I didn't realize the allocation would really be that bad, but it turns out it is. Now I'm at 650K nodes/sec. I have some more of these to weed out, but that will require some work.

Part of the problem, I think, is that I use the GenerateMoves function not only to generate moves, but also to verify that a move is legal and doesn't put the king in check. In the latter case some parts are omitted, but much of the logic is still shared, so this function is called many, many times.

By the way, I'm not sure why exactly the node count differs between runs. That could happen if moves are tried in a different order each time, leading to different cutoffs, but all the relevant parts of my implementation are deterministic AFAIR. I should look into that...
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Alpha-Beta woes, textbook-like resources, etc.

Post by AlvaroBegue »

So you have some bugs.

The first thing that seems wrong is your depth-2 node count. 421 means your alpha-beta procedure didn't produce a single cutoff, and that's just wrong. If you are only counting material, all nodes should have 0 evaluation and alpha-beta should visit something like 41 nodes.

Your speed in nodes per second is not stellar, but it would probably do for now. I would just concentrate on getting a bug-free search. The fact that you get different node counts on different executions also indicates a [probably different] bug.

Did you implement perft yet? It's basically a tree search to a fixed depth where instead of backing up a minimax score you simply count how many nodes you visited. It was mentioned by Evert earlier in the thread, and it's a great tool to make sure your move generation and your do/undo move routines are working correctly.
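A perft sketch; MoveList, GenerateLegalMoves, MakeMove and UndoMove stand in for your own move generation and do/undo routines:

Code: Select all

#include <cstdint>

struct Position;
struct MoveList { int count; /* plus the moves themselves */ };
void GenerateLegalMoves(Position& pos, MoveList& list);
void MakeMove(Position& pos, int i);   // make/unmake the i-th generated move
void UndoMove(Position& pos, int i);

std::uint64_t Perft(Position& pos, int depth) {
    if (depth == 0) return 1;
    MoveList list;
    GenerateLegalMoves(pos, list);
    std::uint64_t nodes = 0;
    for (int i = 0; i < list.count; ++i) {
        MakeMove(pos, i);
        nodes += Perft(pos, depth - 1);
        UndoMove(pos, i);
    }
    return nodes;
}

// From the starting position the known values are 20, 400, 8902, 197281,
// 4865609, ...; any mismatch means a move generation or do/undo bug.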
Meni Rosenfeld
Posts: 10
Joined: Wed Mar 11, 2015 9:42 pm

Re: Alpha-Beta woes, textbook-like resources, etc.

Post by Meni Rosenfeld »

AlvaroBegue wrote: The first thing that seems wrong is your depth-2 node count. 421 means your alpha-beta procedure didn't produce a single cutoff, and that's just wrong. If you are only counting material, all nodes should have 0 evaluation and alpha-beta should visit something like 41 nodes.
Hm, right. Well, this particular bug was obvious enough: my condition for breaking was value > beta instead of value >= beta. I think I switched between these versions while experimenting, and it stuck on the "wrong" one.
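For reference, the corrected part of the loop (a fragment):

Code: Select all

// Corrected fail-hard cutoff: a score equal to beta already refutes the node.
if (value >= beta)      // was 'value > beta', which never fires on equal scores
    return beta;
if (value > alpha)
    alpha = value;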

The effect is of course much more noticeable while I'm still using just a simple material sum. I had figured it wouldn't matter much once I implement a more nuanced evaluation, since exact equality should then be rare. But it's good to fix it anyway.

Now there are 60 nodes for depth 2, which makes sense: 1 (root) + 21 (1 white move, and 20 possible black responses) + 19*2 (remaining white moves, and one black refutation move for each). 40 of these are leaves.

Performance of course improves considerably:
1, 21, 0
2, 60, 0.004
3, 524, 0.012
4, 1417, 0.057
5, 12972, 0.142
6, 37216, 0.367021
7, 368584, 1.2110693
AlvaroBegue wrote: Did you implement perft yet? It's basically a tree search to a fixed depth where instead of backing up a minimax score you simply count how many nodes you visited. It was mentioned by Evert earlier in the thread, and it's a great tool to make sure your move generation and your do/undo move routines are working correctly.
Not yet. I'll work on it.
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Alpha-Beta woes, textbook-like resources, etc.

Post by hgm »

In micro-Max I use two bits to indicate the bound type: one to indicate it's a lower bound, the other an upper bound. For an exact score both are set. This is equivalent to the standard method of distinguishing 3 bound types, but it helped simplify the code.
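A fragment sketching that idea (not micro-Max's actual code; entry fields are illustrative):

Code: Select all

// Two bound bits; an exact score has both set.
enum : unsigned char { LOWER = 1, UPPER = 2, EXACT = LOWER | UPPER };

// On probe, each bit can be used independently:
if (entry.depth >= depth) {
    if ((entry.flags & LOWER) && entry.score >= beta)  return entry.score;
    if ((entry.flags & UPPER) && entry.score <= alpha) return entry.score;
    if (entry.flags == EXACT)                          return entry.score;
}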

As other people remarked, you can store both an upper and a lower bound, (i.e. an interval) but then in general the bounds have different depths, as they are from different searches. And you have to think carefully about what to do when you get conflicting bounds. E.g. an 8-ply search might give you score >= 25, while there was a stored upper bound from a 6-ply search of 12.

I experimented once with this on the very simple K vs R Suicide Chess end-game. It turned out that (with hash table) alpha-beta was more than an order of magnitude slower than full minimax. (Yes, slower. This is no typo!) The reason was that in unpruned minimax all scores are exact, and every hash probe had usable bound types. When I went to a dual-bound table alpha-beta was only 3 times slower than unpruned minimax. So there can be an advantage to using dual bounds.

To benefit from alpha-beta pruning, move ordering is crucial.

As you already noticed, it is difficult to make pruning decisions based on scores with pure alpha-beta: you have no information on how bad a score is. The logical consequence of pruning as much as you can is that the search gets very 'lazy', and usually delivers a score very close to the edge of the search window when it fails high or low. You could of course get that information by asking the search to provide it. This might be costly, but more accurate pruning might earn back the investment.

I think that many engines now do 'singular extension', which means searching deeper than originally requested when, in a reduced search, all moves other than the best score at least M centipawns below it. But to know that, you would have to shift alpha (which was just increased to the score of your best move) down by M in that node. Normally you would just use the score of the best move to prove that all other moves are worse (but not how much worse); shifting the window to alpha-M might need more search work. OTOH, one man's alpha is another man's beta, and what makes a bound harder to achieve for one side makes it easier to refute for the other, so this is not a certainty.

When you use a null window at alpha-M and a move fails high against it, you know the best move was not singular, and you can switch back to the original alpha for the remaining moves. But you would then have to re-search the move that failed high against alpha-M, because it could be above alpha as well, and this is obviously extra work. If all non-best moves fail low against alpha-M in the reduced search, you can re-search the best move at increased depth; if any of the other moves failed high in this 'singularity search', you are happy as it is.
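A rough sketch of that singularity test (a fragment after the move loop; MARGIN, the reduction amount and ReSearchMove are all illustrative names):

Code: Select all

// After the move loop found bestMove with score bestScore:
int sBeta = bestScore - MARGIN;            // the shifted bound ("alpha - M")
bool singular = true;
for (int i = 0; i < list.count; ++i) {
    if (list.moves[i] == bestMove) continue;
    MakeMove(pos, list.moves[i]);
    int v = -Search(pos, depth - 4, -sBeta, -sBeta + 1);  // reduced null window
    UndoMove(pos, list.moves[i]);
    if (v >= sBeta) {       // an alternative comes within M of the best move;
        singular = false;   // not singular, and this move must be re-searched
        break;              // against the original window, as it may beat alpha
    }
}
if (singular)               // only one good move: search it again, one ply deeper
    bestScore = ReSearchMove(pos, bestMove, depth + 1, alpha, beta);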

I think comparatively little has been done in terms of window manipulation that goes beyond the standard PVS algorithm.
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Alpha-Beta woes, textbook-like resources, etc.

Post by mar »

Meni Rosenfeld wrote:6. My troubles with alpha-beta suggest to me that I don't really "get it". There are bits and pieces of information online (CPW, this forum etc.), but I think what could help me is a more textbook-like resource; doesn't necessarily have to be a book, but it should take a more structured, comprehensive, methodical approach to explaining the concepts. Where can I find such a resource?
I'll try to explain plain (i.e. not negamax) alpha-beta to see if it helps.
it's 3 am here so I hope others will correct me if I mess up :)

white player tries to maximize alpha, black player tries to minimize beta
so white is max player and black is min player
leaves are evaluated from white's pov
think of alpha as "score of best white move found so far" and
beta as "score of best black move found so far" - this makes sense as you usually start with -inf, +inf (assume no aspiration windows)
"move" is probably not accurate, so let's say "variation" instead of "move",
but for alpha-beta it doesn't really matter, as it operates on bounds (collecting the PV is only informative, except for the first opponent reply, which is useful for pondering)

usually you'll see pictures with root being max player (white),
but root can naturally also be min player (black) if black is to move

when done with the node, the white player passes up alpha, the black player passes up beta (let's assume scores are clamped, i.e. fail-hard)

as you go deeper, bounds get potentially tighter
you can cut off as soon as the interval closes;

white can get a beta cutoff; this means you found a move that is as good as or better than the "best black move" from white's POV,
in other words black has already found an equally good or better move (or variation) from his POV and won't enter this node, so we can stop

similarly for black you can get alpha cutoff (exactly the same analogy but flipped)

now the negamax transform (-alphabeta(depth-1, -beta, -alpha)) is only used to simplify program logic, so you always maximize from the side to move's POV,
so alpha is my best score so far and beta is the opponent's best score so far; you always pass up alpha, and leaves are evaluated from the side to move's POV
all the sign flipping just adjusts scores to the side to move's POV,
swapping alpha and beta is also logical because my beta becomes the opponent's alpha and vice versa
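in code, the negamax framing looks something like this (a fail-hard sketch; Evaluate, GenerateLegalMoves, MakeMove and UndoMove are assumed engine helpers):

Code: Select all

int AlphaBeta(Position& pos, int depth, int alpha, int beta) {
    if (depth == 0)
        return Evaluate(pos);              // always from the side to move's POV
    MoveList list;
    GenerateLegalMoves(pos, list);         // (handle mate/stalemate if empty)
    for (int i = 0; i < list.count; ++i) {
        MakeMove(pos, list.moves[i]);
        // my beta is the opponent's alpha and vice versa, with signs flipped:
        int value = -AlphaBeta(pos, depth - 1, -beta, -alpha);
        UndoMove(pos, list.moves[i]);
        if (value >= beta) return beta;    // opponent won't enter this node
        if (value > alpha) alpha = value;  // new best for the side to move
    }
    return alpha;                          // fail-hard: pass up clamped alpha
}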

naturally, alpha-beta depends heavily on move ordering; in the worst case (always trying the worst move first) it degenerates to plain minimax