lazy eval discussion

Discussion of chess software programming and technical issues.

Moderator: Ras

Kempelen
Posts: 620
Joined: Fri Feb 08, 2008 10:44 am
Location: Madrid - Spain

Re: lazy eval discussion

Post by Kempelen »

bob wrote:
UncombedCoconut wrote:Would it be interesting to quantify the benefit from the speed-up and the cost of the error separately? (This would involve an asm hack to produce a Crafty that does the full eval's calculations every time, but returns the same result as default Crafty.) With margins on the scale you've mentioned, I'm guessing it wouldn't be, but I figured I would ask. :)
This technique adds error and saves time. Within reason, the time saved more than offsets the error. But you can tweak the lazy eval margin to be more aggressive and it certainly plays worse. We tuned that value to its optimal setting with our cluster testing...
Hi,

An idea that comes to my mind on the fly: what about using a dynamic margin based on the current situation, instead of a static one? E.g., in a position with passed pawns, or under a king attack, the margin is more likely to cost you errors, so changing it when such situations arise could improve the search accuracy. What is your opinion?
Fermin Serrano
Author of 'Rodin' engine
http://sites.google.com/site/clonfsp/
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: lazy eval discussion

Post by bob »

Kempelen wrote:
bob wrote:
UncombedCoconut wrote:Would it be interesting to quantify the benefit from the speed-up and the cost of the error separately? (This would involve an asm hack to produce a Crafty that does the full eval's calculations every time, but returns the same result as default Crafty.) With margins on the scale you've mentioned, I'm guessing it wouldn't be, but I figured I would ask. :)
This technique adds error and saves time. Within reason, the time saved more than offsets the error. But you can tweak the lazy eval margin to be more aggressive and it certainly plays worse. We tuned that value to its optimal setting with our cluster testing...
Hi,

An idea that comes to my mind on the fly: what about using a dynamic margin based on the current situation, instead of a static one? E.g., in a position with passed pawns, or under a king attack, the margin is more likely to cost you errors, so changing it when such situations arise could improve the search accuracy. What is your opinion?
We have a dynamic margin at the second lazy-eval exit point. But at the first one, we don't know about passed pawns and such...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: lazy eval discussion -final results

Post by bob »

Here are the 4 test versions:

Code: Select all

    Crafty-23.5R01-0     2658
    Crafty-23.5R01-2     2643
    Crafty-23.5R01-1     2642
    Crafty-23.5R01-3     2620   
What are those?

The -n suffix is a 2-bit value: if the rightmost bit is 1, the first lazy-eval exit is disabled; if the leftmost bit is 1, the second lazy-eval exit is disabled. If both bits are 1, both exits are disabled. Obviously 0 is the normal version.
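As a sketch, the two bits could be decoded like this (the function name and return convention are mine, not Crafty's actual code):

```python
# Hypothetical decoding of the -n version suffix into the two lazy-eval
# switches described above; not Crafty's real code.
def decode_version(n):
    use_first_exit = (n & 1) == 0   # rightmost bit set -> first exit disabled
    use_second_exit = (n & 2) == 0  # leftmost bit set -> second exit disabled
    return use_first_exit, use_second_exit

# -0 is the normal version; -3 disables both exits.
assert decode_version(0) == (True, True)
assert decode_version(1) == (False, True)
assert decode_version(2) == (True, False)
assert decode_version(3) == (False, False)
```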

So eliminating either early exit costs 15-16 Elo; eliminating both costs 38 Elo. All within the usual +/-4 error bar.

This is a classic trade-off. Lazy eval produces evaluation errors, but it speeds things up significantly.

For example, here are four runs on one test position to a fixed depth, showing nodes searched and NPS, in the same order as above: normal, one or the other exit disabled, both disabled:

Code: Select all

              time=27.82  mat=0  n=50036932  fh=94%  nps=1.8M
              time=29.76  mat=0  n=50036944  fh=94%  nps=1.7M
              time=30.21  mat=0  n=49723316  fh=94%  nps=1.6M
              time=36.67  mat=0  n=49723331  fh=94%  nps=1.4M
I have not tried to quantify that 0.4M NPS drop to see how it stacks up in Elo (typically doubling the speed gains +70 Elo in the previous tests I have run and reported here). But that looks to be fairly close in terms of expected Elo loss, which must mean that there is not much of an error increase, just a speed increase, based on the above...
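Using the +70-Elo-per-doubling rule of thumb from the post, the expected loss from the slowdown alone can be estimated. This is a back-of-the-envelope sketch, not a measurement:

```python
import math

# Rule of thumb from the post: doubling the speed is worth about +70 Elo,
# i.e. Elo change ~ 70 * log2(speed ratio).
def elo_from_speedup(nps_new, nps_old, elo_per_doubling=70):
    return elo_per_doubling * math.log2(nps_new / nps_old)

# Both lazy exits disabled: 1.4M NPS vs the normal 1.8M NPS.
loss = elo_from_speedup(1.4, 1.8)
print(round(loss, 1))  # about -25 Elo expected from the slowdown alone
```

The measured drop for disabling both exits was 38 Elo, so the comparison is only indicative, given the +/-4 error bars and the single-position timing sample.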
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: lazy eval discussion -final results

Post by mcostalba »

Thanks for testing and publishing the results.

We will try another attempt at this ....
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: lazy eval discussion -final results

Post by bob »

mcostalba wrote:Thanks for testing and publishing the results.

We will try another attempt at this ....
For a quick note.

Our early exit margin is a piece (300; actually 305, which probably factors in the WTM bonus).

Our later exit is more complicated, but it does factor in a bonus that can grow if initial scoring suggests that the king position (based only on pawn shelter) is unsafe. The margin is roughly a pawn. This is used after all pawn scoring has been done, so we are now testing the current score +/- this margin against the alpha/beta values...

I suspect the second test value (and perhaps even the first) is optimal for Crafty, but it is probably not optimal for everyone else and will need tuning. The code is very simple, though, so it should be easy to try the idea...
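A minimal sketch of that two-exit structure, assuming the margins quoted above (305 for the first exit; roughly a pawn plus a king-danger term for the second). All the scoring helpers here are stand-ins, not Crafty's actual code:

```python
# Stand-in scoring terms (centipawns) so the sketch runs; a real engine
# computes these from the position.
def material(pos):    return pos["material"]
def pawn_score(pos):  return pos["pawns"]    # pawns and passed pawns
def piece_score(pos): return pos["pieces"]   # expensive per-piece terms
def king_danger(pos): return pos["danger"]   # from pawn-shelter scoring

def evaluate(pos, alpha, beta):
    score = material(pos)
    # First lazy exit: before any positional terms, bail out if material
    # alone is a piece (~305) or more outside the alpha/beta window.
    if score + 305 <= alpha or score - 305 >= beta:
        return score
    score += pawn_score(pos)
    # Second lazy exit: narrower dynamic margin, roughly a pawn, grown by
    # a king-safety bonus when the shelter looks unsafe.
    margin = 100 + king_danger(pos)
    if score + margin <= alpha or score - margin >= beta:
        return score
    return score + piece_score(pos)

pos = {"material": 600, "pawns": 20, "pieces": 35, "danger": 0}
print(evaluate(pos, -50, 50))  # 600: first exit fires, piece terms skipped
```

With a wide-open window the same position falls through both exits and gets the full score; the margins (305, 100) are just the values quoted in the post and would need tuning elsewhere.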
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: lazy eval discussion

Post by bob »

FlavusSnow wrote:I reread your post. I had originally taken it to mean you had just added two break points to the eval function to exit early and you were testing them to find if there was any ELO. Now I see you had those two points in there all along and were trying to put an ELO value to them.
Correct. That was the question: what kind of Elo gain does each individually, or both together, produce? That's answered in the "results" sub-thread...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: lazy eval discussion

Post by bob »

Ferdy wrote:
bob wrote:
Ferdy wrote:
bob wrote:Last week we discussed lazy eval and I said I would try to test it to see what the gain is in Crafty. We've been going thru yet another cluster fiasco and over the week I have finally gotten a test run that worked.

First, we have two lazy eval cutoffs. One right at the top, which avoids doing everything if the score is a piece or so outside the AB window. One near the bottom after pawns and passed pawns are evaluated, to bypass the individual piece scoring if the score at that point is outside a somewhat narrower window.

I first disabled the first test only, leaving the second. The net loss was -13 Elo. I am trying to run with the second test disabled, and then with both disabled. I will report as the results come in...
At one point in my development, I tried a lazy eval the same as your first one, but with additional conditions: one side should not do lazy eval successively, and both sides should not do lazy eval successively - I call this limited lazy eval. The idea is to prevent total dependence on lazy eval, which at times will probably miss winning or drawing opportunities. I got around +3 Elo from this at a time control of 40 moves / 20 sec (repeating).

example:
white: lazy
black: no lazy (because white used lazy)

white: no lazy (because white used lazy previously)
black: lazy

white: no lazy (because black used lazy)
black: no lazy (because black used lazy previously)
When you say +3 elo, that implies around 50K games total to measure within +/-3???
It's only 16k games for both engines, actually, and I gave up further testing. I tested like 2k, then 2k, then 2k ... for both. The engine with lazy eval just doesn't go down, but it could not gain a sizeable advantage either.
That's a classic mistake. The Elo error bar is much wider than the gap between the two programs. I have seen lots of cases where two versions start off close, but end up 15 or whatever apart. Never forget the error bar, or you can be misled into keeping something bad or tossing something good.
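The alternation scheme Ferdy describes above can be sketched as a small gate that forbids a lazy exit when either side's previous eval was lazy (a hypothetical sketch, not his actual code):

```python
# Sketch of "limited lazy eval": a side may not use a lazy exit if its own
# previous eval was lazy, nor if the opponent's last eval was lazy.
class LazyGate:
    def __init__(self):
        self.prev = {"white": False, "black": False}  # was last eval lazy?

    def may_use_lazy(self, side):
        other = "black" if side == "white" else "white"
        return not self.prev[side] and not self.prev[other]

    def record(self, side, used_lazy):
        self.prev[side] = used_lazy

gate = LazyGate()
gate.record("white", True)          # white used a lazy exit
print(gate.may_use_lazy("black"))   # False: white's eval was lazy
```

Walking the gate through white-lazy / black-lazy turns reproduces the six-step example quoted above.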
Ferdy
Posts: 4846
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: lazy eval discussion

Post by Ferdy »

bob wrote:
Ferdy wrote:
bob wrote:
Ferdy wrote:
bob wrote:Last week we discussed lazy eval and I said I would try to test it to see what the gain is in Crafty. We've been going thru yet another cluster fiasco and over the week I have finally gotten a test run that worked.

First, we have two lazy eval cutoffs. One right at the top, which avoids doing everything if the score is a piece or so outside the AB window. One near the bottom after pawns and passed pawns are evaluated, to bypass the individual piece scoring if the score at that point is outside a somewhat narrower window.

I first disabled the first test only, leaving the second. The net loss was -13 Elo. I am trying to run with the second test disabled, and then with both disabled. I will report as the results come in...
At one point in my development, I tried a lazy eval the same as your first one, but with additional conditions: one side should not do lazy eval successively, and both sides should not do lazy eval successively - I call this limited lazy eval. The idea is to prevent total dependence on lazy eval, which at times will probably miss winning or drawing opportunities. I got around +3 Elo from this at a time control of 40 moves / 20 sec (repeating).

example:
white: lazy
black: no lazy (because white used lazy)

white: no lazy (because white used lazy previously)
black: lazy

white: no lazy (because black used lazy)
black: no lazy (because black used lazy previously)
When you say +3 elo, that implies around 50K games total to measure within +/-3???
It's only 16k games for both engines, actually, and I gave up further testing. I tested like 2k, then 2k, then 2k ... for both. The engine with lazy eval just doesn't go down, but it could not gain a sizeable advantage either.
That's a classic mistake. The Elo error bar is much wider than the gap between the two programs. I have seen lots of cases where two versions start off close, but end up 15 or whatever apart. Never forget the error bar, or you can be misled into keeping something bad or tossing something good.
I am aware of the error bar. From my experience, though, after 15k games for both engines the leading engine will often prevail when testing further. Of course I did not keep the limited lazy eval, because of the small Elo difference I got and the uncertainty of not taking some positional evaluations into account. It is a different story when, for example, I add a new king safety feature - then I will gamble and take the change even if I only get +3 Elo after 15k games (I don't care about the error bars).
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: lazy eval discussion

Post by Milos »

Kempelen wrote:An idea that comes to my mind on the fly: what about using a dynamic margin based on the current situation, instead of a static one? E.g., in a position with passed pawns, or under a king attack, the margin is more likely to cost you errors, so changing it when such situations arise could improve the search accuracy. What is your opinion?
That's already been done in Rybka 3, and the benefit compared to simple (static) lazy eval is almost not measurable (less than 5 Elo). Simply put, it's not worth the effort.
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: lazy eval discussion -final results

Post by marcelk »

bob wrote:Here are the 4 test versions:

Code: Select all

    Crafty-23.5R01-0     2658
    Crafty-23.5R01-2     2643
    Crafty-23.5R01-1     2642
    Crafty-23.5R01-3     2620   
Thanks for the measurement. A measurement is always worth more than an expert's opinion.
I'm pretty sure that the original discussion was about lazy eval in PV-nodes, but I can't find the thread anymore.

I remember I replied that I don't do that in PV-nodes of PVS, because it didn't make an overall difference there. Besides, in my specific case it is slightly simpler in my code to call eval_full when in the PV.

Do you use PVS in Crafty? If yes, do you have a measurement that corresponds to not doing lazy eval in PV-nodes only? What is the ratio of slow evals vs. fast evals in PV nodes?