Aspiration Windows: Rubbish!

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Aspiration Windows: Rubbish!

Post by Don »

hgm wrote:You can futility-prune in a PV node after alpha is raised, right? The window is only {-INF, INF} on the search of the first move. After that alpha will be raised to some finite value, which very well might be above the current eval. E.g. suppose the first move captured a hanging Queen. After that, all moves that do not capture at least a Hawk are pretty futile.
You can prune in PV nodes with futility, or use null-move pruning as well - but whether you should is another question. We have never made either work for us.
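
For concreteness, a minimal sketch of the pruning hgm describes (generic code with invented names - static_eval, max_gain, MARGIN and friends are placeholders, not Komodo's actual routines):

Code: Select all

extern int static_eval(void);    /* static eval for the side to move */
extern int max_gain(int move);   /* most material the move could win */
extern int search_move(int move, int alpha, int beta, int depth);

#define INF    30000
#define MARGIN 125

int pv_node(int moves[], int n, int alpha, int beta, int depth) {
    for (int i = 0; i < n; i++) {
        /* After the first move has raised alpha to a finite score, a move
           whose most optimistic outcome still cannot reach alpha is
           futile, PV node or not.                                        */
        if (i > 0 && alpha > -INF && depth == 1 &&
            static_eval() + max_gain(moves[i]) + MARGIN <= alpha)
            continue;
        int score = search_move(moves[i], alpha, beta, depth);
        if (score > alpha)
            alpha = score;   /* later moves are pruned against a higher bar */
    }
    return alpha;
}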
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
User avatar
hgm
Posts: 27795
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Aspiration Windows: Rubbish!

Post by hgm »

It depends on what you mean by 'work'. Of course it would not get any better from futility pruning in PV nodes, because there aren't enough of those to have a measurable effect. But it should not wreck the program when you do it, either. So there would be no reason to put in extra code to disable it when the node is PV.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Aspiration Windows: Rubbish!

Post by Don »

hgm wrote:It depends on what you mean by 'work'. Of course it would not get any better from futility pruning in PV nodes, because there aren't enough of those to have a measurable effect. But it should not wreck the program when you do it, either. So there would be no reason to put in extra code to disable it when the node is PV.
Komodo is written in C and the code to do the PV search is separate - so it is not extra code. The PV code is just as sensitive to things like this as the non-PV code - and doing futility the same way as in the non-PV search clearly weakens the program. Yes, it's true the number of nodes in the PV search is just a fraction of the number of nodes in the non-PV part of the search, but apparently those nodes are more critical.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
User avatar
hgm
Posts: 27795
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Aspiration Windows: Rubbish!

Post by hgm »

I always wondered if this really pays off. Ippolit, for example, has an enormous number of search routines that are often only infinitesimally different. And their number is doubled by the fact that they all exist in a black and a white version. This seems a very expensive way to save an occasional if-statement on the color. Good caching usually has an order of magnitude more impact on computer code than the quality of the code itself, with these modern out-of-order CPUs.
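
As a toy illustration of the two styles (invented names, nothing from Ippolit itself):

Code: Select all

#include <stdint.h>

enum { WHITE, BLACK };
static uint64_t pieces[2];       /* one bitboard per side */

/* Ippolit-style: one hard-coded routine per color, no runtime test. */
static uint64_t own_pieces_white(void) { return pieces[WHITE]; }
static uint64_t own_pieces_black(void) { return pieces[BLACK]; }

/* The single-routine alternative: the "if on the color" usually
   collapses to an array index, which is cheap even when the side
   to move is only known at run time.                              */
static uint64_t own_pieces(int stm) { return pieces[stm]; }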
asanjuan
Posts: 214
Joined: Thu Sep 01, 2011 5:38 pm
Location: Seville, Spain

Re: Aspiration Windows: Rubbish!

Post by asanjuan »

hgm wrote:You can futility-prune in a PV node after alpha is raised, right? The window is only {-INF, INF} on the search of the first move. After that alpha will be raised to some finite value, which very well might be above the current eval. E.g. suppose the first move captured a hanging Queen. After that, all moves that do not capture at least a Hawk are pretty futile.
What I wanted to say is that if you do a search like this:

Code: Select all

depth=1, alpha = -INF, beta= +INF
The root node will be a PV node and a frontier node at the same time.

and imagine you have a condition before make a move like this:

Code: Select all

if (depth == 1 && board_eval + material_gain(move) + FUTILE_MARGIN < alpha)
  continue;
This condition won't affect anything, because you'll never enter the 'if'. Your alpha is -INF, so nothing is lower than that.

But if you do a search like this:

Code: Select all

depth=1, alpha = -10, beta= 10
Then you can always enter the 'if', and your aspiration search won't return very good results.
And if you also have another silly search bug, you can get the same problems that started this thread.

But it was only a suggestion. I didn't intend to say that futility was bad for PV nodes.

Regards.
Still learning how to play chess...
knights move in "L" shape ¿right?
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Aspiration Windows: Rubbish!

Post by wgarvin »

hgm wrote:I always wondered if this really pays off. Ippolit, for example, has an enormous number of search routines that are often only infinitesimally different. And their number is doubled by the fact that they all exist in a black and a white version. This seems a very expensive way to save an occasional if-statement on the color. Good caching usually has an order of magnitude more impact on computer code than the quality of the code itself, with these modern out-of-order CPUs.
My guess is that it helps speed, at least a little. Merging those routines into one would mean adding if-statements to check for the conditions. Even if they are predictable branches, it would put more pressure on the branch predictor.

Color especially works the same way: every time the program consults a bitboard, or computes an evaluation feature, it needs to take the color into account. Doing this at compile time is surely going to be faster than doing it with extra math instructions (which lengthen dependency chains) or branches (which pollute the predictor even if they are completely predictable). Plus it's easy to do at compile time using templates.

I know ipp has a lot of eval functions, but I don't think it uses so many of them at one time, or that they're large enough, to really blow the code cache. The only x86 programs I've ever seen where the code cache was a serious bottleneck were bytecode interpreters (e.g. a Flash/ActionScript VM, or a huge generated routine that had a switch statement with 256 cases, each with 100+ bytes of code).

Which is how I sort of formed my opinion that the code cache is only critical when you have a lot of entry points (like an interpreter's main switch statement), because then you pay the penalty over and over. With straight-line code, if it's not cached when you jump to it there's a small cost (I imagine no worse than an L1 data miss), but it "catches up" pretty quickly. And most code is bottlenecked by data cache misses and the dependency chains of actually executing it, anyway.

I have seen a Flash bytecode interpreter get 5-10% faster after I hoisted some partially-redundant computations out of the case statements and had one copy of them before the giant switch (which reduced the code size by a decent amount too), but I'm not even sure the improvement had to do with the code caching--I think the real win was shortening the dependency chains in the target bytecodes. Even though the hoisted calculations were wholly unused in about half of the switch cases, doing them up there was significantly faster than doing them in the proper cases after the dynamic branch, probably because the dep chain was shorter (or: the dynamic branch, code cache miss, etc. hid the latency of the hoisted calculations).
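
In C terms, the transformation looked roughly like this (a toy sketch - the opcode names and stack layout are invented):

Code: Select all

#include <stdint.h>

enum { OP_ADD, OP_SUB };

void step(int op, int32_t *stack, int *spp) {
    int sp = *spp;
    /* Hoisted: both loads start before the dispatch branch, so they
       are already in flight when it resolves. Cases that don't need
       them simply ignore a and b.                                    */
    int32_t a = stack[sp - 2];
    int32_t b = stack[sp - 1];
    switch (op) {
    case OP_ADD: stack[sp - 2] = a + b; sp--; break;
    case OP_SUB: stack[sp - 2] = a - b; sp--; break;
    }
    *spp = sp;
}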

Anyway: I think separate search routines will have performance benefits over the "uber-function" with runtime ifs (but to stay sane, write them as templates--don't keep copypasta everywhere, that's too painful!)
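
In C++ that is a one-line template parameter; in plain C the nearest equivalent is to stamp both color versions out of a single macro, sketched here with invented names, so there is no copy-paste to keep in sync:

Code: Select all

#include <stdint.h>

enum { WHITE, BLACK };
static uint64_t pawns[2];

/* One body, instantiated twice with the color as a compile-time
   constant; the compiler folds every color test away.            */
#define DEFINE_PAWN_PUSHES(name, COLOR)                \
    static uint64_t name(void) {                       \
        uint64_t p = pawns[COLOR];                     \
        return (COLOR == WHITE) ? (p << 8) : (p >> 8); \
    }

DEFINE_PAWN_PUSHES(pawn_pushes_white, WHITE)
DEFINE_PAWN_PUSHES(pawn_pushes_black, BLACK)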
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Aspiration Windows: Rubbish!

Post by Don »

hgm wrote:I always wondered if this really pays off. Ippolit, for example, has an enormous number of search routines that are often only infinitesimally different. And their number is doubled by the fact that they all exist in a black and a white version. This seems a very expensive way to save an occasional if-statement on the color. Good caching usually has an order of magnitude more impact on computer code than the quality of the code itself, with these modern out-of-order CPUs.
The payoff for me is that we do things substantially differently in PV and non-PV nodes, and a single routine would be peppered with conditional statements. On the other hand, it's usually good programming practice to use one routine instead of 2 (or more), so it's a tradeoff. I think it has protected me, because we rarely do things to both routines; we focus on one or the other anyway. Our extension rules are even different.

Nevertheless, after the release of Komodo 5 I am probably going to do a major code cleanup and I may yet combine these routines. You can usually hide much of the complexity with the creative use of macros and functions that abstract the concepts.
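
A hypothetical sketch of that kind of cleanup (invented names, not Komodo's actual code) - the PV/non-PV differences collapse into a flag that small helpers test, so one routine serves both:

Code: Select all

enum { NON_PV, PV };

/* Helpers like this keep the main routine free of scattered ifs. */
static int futility_allowed(int node_type, int depth) {
    return node_type == NON_PV && depth <= 2;
}

int search(int node_type, int alpha, int beta, int depth) {
    if (futility_allowed(node_type, depth)) {
        /* ... futility pruning ... */
    }
    /* ... the rest of the search is shared by both node types ... */
    return alpha;
}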
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Aspiration Windows: Rubbish!

Post by Don »

wgarvin wrote:
hgm wrote:I always wondered if this really pays off. Ippolit, for example, has an enormous number of search routines that are often only infinitesimally different. And their number is doubled by the fact that they all exist in a black and a white version. This seems a very expensive way to save an occasional if-statement on the color. Good caching usually has an order of magnitude more impact on computer code than the quality of the code itself, with these modern out-of-order CPUs.
My guess is that it helps speed, at least a little. Merging those routines into one would mean adding if-statements to check for the conditions. Even if they are predictable branches, it would put more pressure on the branch predictor.

Color especially works the same way: every time the program consults a bitboard, or computes an evaluation feature, it needs to take the color into account. Doing this at compile time is surely going to be faster than doing it with extra math instructions (which lengthen dependency chains) or branches (which pollute the predictor even if they are completely predictable). Plus it's easy to do at compile time using templates.

I know ipp has a lot of eval functions, but I don't think it uses so many of them at one time, or that they're large enough, to really blow the code cache. The only x86 programs I've ever seen where the code cache was a serious bottleneck were bytecode interpreters (e.g. a Flash/ActionScript VM, or a huge generated routine that had a switch statement with 256 cases, each with 100+ bytes of code).
A version of Socrates that I was working on at MIT had a separate move generator for every piece of every color for every square - and it was indeed a speedup if you were on the right platform. It would not even compile with certain compilers! A special program generated the source code for this. It worked great on the DEC Alpha but on my puny x86 laptop it was a real dog!




Which is how I sort of formed my opinion that the code cache is only critical when you have a lot of entry points (like an interpreter's main switch statement), because then you pay the penalty over and over. With straight-line code, if it's not cached when you jump to it there's a small cost (I imagine no worse than an L1 data miss), but it "catches up" pretty quickly. And most code is bottlenecked by data cache misses and the dependency chains of actually executing it, anyway.

I have seen a Flash bytecode interpreter get 5-10% faster after I hoisted some partially-redundant computations out of the case statements and had one copy of them before the giant switch (which reduced the code size by a decent amount too), but I'm not even sure the improvement had to do with the code caching--I think the real win was shortening the dependency chains in the target bytecodes. Even though the hoisted calculations were wholly unused in about half of the switch cases, doing them up there was significantly faster than doing them in the proper cases after the dynamic branch, probably because the dep chain was shorter (or: the dynamic branch, code cache miss, etc. hid the latency of the hoisted calculations).

Anyway: I think separate search routines will have performance benefits over the "uber-function" with runtime ifs (but to stay sane, write them as templates--don't keep copypasta everywhere, that's too painful!)
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
ZirconiumX
Posts: 1334
Joined: Sun Jul 17, 2011 11:14 am

Re: Aspiration Windows: Rubbish!

Post by ZirconiumX »

I have examined my implementation, and hope this pseudo code may clear things up:

Code: Select all

void search() {
    InitEverything();
    for (int depth = 1;; depth++) {
       if (depth <= 3) {
          alpha = -ValueInf;
          beta = +ValueInf;
       } else {
          alpha = last_search_value - 32;
          beta = last_search_value + 32;
       }
       alphabetaroot();
       if (search_is_fail_high) { // This limits it to depth 4
          depth -= 3; // This may well be the culprit
       }
       if (search_is_fail_low) {
          alpha *= 2; // /= 2???
          beta *= 2;
       }
    }
}
So, the depth 4 search fails high, and the depth is reduced by three, giving me...a depth 1 search; maybe Robert Houdart has a minimum depth condition of >= 10. Since the depth 4 search finishes quickly, I only *just* see the depth 1 search, which sticks in my mind.

Yes, I know, you cannot work with information you aren't given, but really Lucas and Vincent, what you said was unkind.

Matthew:out
Some believe in the almighty dollar.

I believe in the almighty printf statement.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Aspiration Windows: Rubbish!

Post by bob »

ZirconiumX wrote:I have examined my implementation, and hope this pseudo code may clear things up:

Code: Select all

void search() {
    InitEverything();
    for (int depth = 1;; depth++) {
       if (depth <= 3) {
          alpha = -ValueInf;
          beta = +ValueInf;
       } else {
          alpha = last_search_value - 32;
          beta = last_search_value + 32;
       }
       alphabetaroot();
       if (search_is_fail_high) { // This limits it to depth 4
          depth -= 3; // This may well be the culprit
       }
       if (search_is_fail_low) {
          alpha *= 2; // /= 2???
          beta *= 2;
       }
    }
}
So, the depth 4 search fails high, and the depth is reduced by three, giving me...a depth 1 search; maybe Robert Houdart has a minimum depth condition of >= 10. Since the depth 4 search finishes quickly, I only *just* see the depth 1 search, which sticks in my mind.

Yes, I know, you cannot work with information you aren't given, but really Lucas and Vincent, what you said was unkind.

Matthew:out
How can that work? You fail high at 4, so you have no exact scores in the hash table. You re-search starting at 1, and you end up with either (a) the same best move or (b) a move that failed high (due to the stored TT entry from depth=4) which you can't get a true score on since you can't see deep enough to do that...

Looks like a recipe for an infinite loop if you are not careful. The idea of re-starting can work, but there are risks in doing so, and you have to do a little clever programming in order to avoid the risk of never getting past some specific depth because you keep re-starting over and over...
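
For what it's worth, one common way to sidestep that trap (a generic sketch, not from any particular engine; alphabetaroot here is assumed to be fail-soft) is to keep the depth fixed and widen only the offending bound on each re-search, so an iteration can never restart itself:

Code: Select all

/* Returns <= alpha on a fail low, >= beta on a fail high (fail-soft). */
extern int alphabetaroot(int alpha, int beta, int depth);

int aspirate(int last_score, int depth) {
    int delta = 32;
    int alpha = last_score - delta;
    int beta  = last_score + delta;
    for (;;) {
        int score = alphabetaroot(alpha, beta, depth);
        if (score <= alpha)      alpha -= delta;   /* fail low:  lower alpha */
        else if (score >= beta)  beta  += delta;   /* fail high: raise beta  */
        else                     return score;     /* exact score: done      */
        delta *= 2;              /* widen faster after repeated failures     */
    }
}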