lazy eval discussion

Discussion of chess software programming and technical issues.

Moderator: Ras

UncombedCoconut
Posts: 319
Joined: Fri Dec 18, 2009 11:40 am
Location: Naperville, IL

Re: lazy eval discussion -final results

Post by UncombedCoconut »

marcelk wrote:I'm pretty sure that the original discussion was regarding lazy eval in PV nodes, but I can't find the thread anymore.
I think this is what you were looking for. Crafty is a PVS program. In Crafty, both alpha and beta are parameters for Evaluate(), so in its specific code it's simpler to enable lazy eval at every node.
At least, that's what I see looking through the source. :)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: lazy eval discussion -final results

Post by bob »

marcelk wrote:
bob wrote:Here are the 4 test versions:

Code: Select all

    Crafty-23.5R01-0     2658
    Crafty-23.5R01-2     2643
    Crafty-23.5R01-1     2642
    Crafty-23.5R01-3     2620   
Thanks for the measurement. A measurement is always worth more than an expert's opinion.
I'm pretty sure that the original discussion was regarding lazy eval in PV nodes, but I can't find the thread anymore.

I remember I replied that I don't do that in the PV nodes of PVS, because it didn't make an overall difference there. Besides, in my specific case it is slightly simpler to call eval_full when in the PV.

Do you use PVS in Crafty? If yes, do you have a measurement that corresponds to not doing lazy eval in PV-nodes only? What is the ratio of slow evals vs. fast evals in PV nodes?
I have been using PVS since roughly 1978 when Murray Campbell came up with the idea and asked me to test it while we were in Washington at the ACM tournament.

I'd have to think about how to run that test (no LE in PV only). I suppose I could avoid it if alpha != beta - 1. I don't have a "slow or fast eval" so I assume you mean what is the ratio of the three evals I have (early exit, late exit, full eval)? I do not measure that, but could. I believe that restricting LE to non-PV nodes won't make any significant difference. It will likely hurt a very small amount, since PV nodes are a small part of the total nodes searched.

I'll try to run this when I can add the tests and counters.
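
For reference, one way to gather the ratio marcelk asks about is a small set of counters keyed by node type and exit point; in PVS the window itself tells you whether a node is a PV node. A minimal sketch, with names that are illustrative and not Crafty's actual code:

Code: Select all

    /* Hypothetical counters for the early-exit / late-exit / full-eval ratio,
       split into PV and non-PV nodes.  In PVS a non-PV node is searched with
       a null window, so alpha != beta - 1 identifies a PV node. */
    enum ExitType { EARLY_EXIT, LATE_EXIT, FULL_EVAL };
    static unsigned long long eval_counts[2][3];   /* [is_pv][exit type] */

    static void CountEval(int alpha, int beta, enum ExitType type) {
        int is_pv = (alpha != beta - 1);
        eval_counts[is_pv][type]++;
    }

Calling CountEval() just before each lazy return and once at the end of the full evaluation, then printing eval_counts after a search, gives the PV-node ratio directly.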
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: lazy eval discussion -final results

Post by bob »

UncombedCoconut wrote:
marcelk wrote:I'm pretty sure that the original discussion was regarding lazy eval in PV nodes, but I can't find the thread anymore.
I think this is what you were looking for. Crafty is a PVS program. In Crafty, both alpha and beta are parameters for Evaluate(), so in its specific code it's simpler to enable lazy eval at every node.
At least, that's what I see looking through the source. :)
You can still do the "if (alpha != beta - 1) then this is a PV node" type of test...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: lazy eval discussion

Post by bob »

Ferdy wrote:
bob wrote:
Ferdy wrote:
bob wrote:
Ferdy wrote:
bob wrote:Last week we discussed lazy eval and I said I would try to test it to see what the gain is in Crafty. We've been going thru yet another cluster fiasco and over the week I have finally gotten a test run that worked.

First, we have two lazy eval cutoffs. One right at the top, which avoids doing everything if the score is a piece or so outside the AB window. One near the bottom after pawns and passed pawns are evaluated, to bypass the individual piece scoring if the score at that point is outside a somewhat narrower window.

I first disabled the first test only, leaving the second. The net loss was -13 Elo. I am trying to run with the second test disabled, and then with both disabled. I will report as the results come in...
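
A rough sketch of the two cutoffs described above, assuming material is known at the top and the pawn/passed-pawn terms are added before the per-piece terms; the margins and helper functions are illustrative placeholders, not Crafty's actual code:

Code: Select all

    #define LAZY_MARGIN_WIDE    300   /* roughly a piece */
    #define LAZY_MARGIN_NARROW  150   /* somewhat narrower window */

    int Evaluate(int alpha, int beta) {
        /* First (wide) lazy exit: material alone is already a piece or so
           outside the alpha/beta window, so skip the whole evaluation. */
        int score = MaterialScore();
        if (score + LAZY_MARGIN_WIDE <= alpha || score - LAZY_MARGIN_WIDE >= beta)
            return score;

        /* Pawn structure and passed pawns. */
        score += PawnScore() + PassedPawnScore();

        /* Second (narrower) lazy exit before the expensive per-piece terms. */
        if (score + LAZY_MARGIN_NARROW <= alpha || score - LAZY_MARGIN_NARROW >= beta)
            return score;

        /* Full evaluation: individual piece scoring, king safety, etc. */
        score += PieceScore();
        return score;
    }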
At one point in my development, I tried a lazy eval like your first one, with additional conditions: one side should not do lazy eval successively, and both sides should not do lazy eval successively - I call this limited lazy eval. The idea is to avoid total dependency on lazy eval, which at times will probably miss winning or drawing opportunities. I got around +3 Elo from this at a time control of 40 moves / 20 sec (repeating).

example:
white: lazy
black: no lazy (because white used lazy)

white: no lazy (because white used lazy previously)
black: lazy

white: no lazy (because black used lazy)
black: no lazy (because black used lazy previously)
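
One way to read this scheme is sketched below: remember, per side, whether that side's most recent evaluation along the current path took a lazy exit, and only allow a new lazy exit when neither that side nor the opponent just used one. The names and the exact rule are a guess at the description above, not Ferdy's actual code; in a real search the flags would live on the per-ply stack and be restored when the move is unmade.

Code: Select all

    #define LAZY_MARGIN 300

    int lazy_used[2];   /* did side 0/1 take a lazy exit on its last eval? */

    int LazyAllowed(int side) {
        /* forbid a lazy exit if this side or the opponent just used one */
        return !lazy_used[side] && !lazy_used[side ^ 1];
    }

    int Evaluate(int side, int alpha, int beta) {
        int score = QuickScore(side);   /* cheap material-level estimate */
        if (LazyAllowed(side) &&
            (score + LAZY_MARGIN <= alpha || score - LAZY_MARGIN >= beta)) {
            lazy_used[side] = 1;
            return score;
        }
        lazy_used[side] = 0;
        return FullEvaluate(side, alpha, beta);
    }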
When you say +3 elo, that implies around 50K games total to measure within +/-3???
It's actually only 16k games for both engines, and I gave up further testing. I tested like 2k, then 2k, then 2k ... for both. The engine with lazy eval just doesn't go down, but it could not gain a sizeable advantage either.
That's a classic mistake. The Elo error bar is much wider than the gap between the two programs. I have seen lots of cases where two versions start off close, but end up 15 or whatever apart. Never forget the error bar, or you can be misled into keeping something bad or tossing something good.
I am aware of the error bar. From my experience though, after 15k games for both engines the leading engine will often prevail when testing further. Of course I did not keep the limited lazy eval, because of the small Elo difference I got and the uncertainty of not taking some positional evaluations into account. It is a different story when, for example, I add a new king safety feature - there I will gamble and take the change even if I only get +3 Elo after 15k games (I don't care about the error bars).
The problem is that if you have +3 after 10K games, with an error bar of +/- 10, you could actually be at -7 without knowing. You need to drive the error bar down below the delta gain you expect...
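
For reference, the size of such an error bar can be estimated with the usual normal approximation to the per-game score variance (an assumption; it ignores correlation between paired games played from the same openings). A self-contained sketch with made-up numbers:

Code: Select all

    #include <math.h>
    #include <stdio.h>

    /* Approximate Elo difference and 95% confidence interval from a match result. */
    int main(void) {
        double wins = 3100, draws = 4300, losses = 2600;   /* made-up 10,000-game sample */
        double n = wins + draws + losses;
        double s = (wins + 0.5 * draws) / n;               /* score fraction */
        double elo = -400.0 * log10(1.0 / s - 1.0);
        double var = (wins + 0.25 * draws) / n - s * s;    /* per-game score variance */
        double err = 1.96 * sqrt(var / n) * 400.0 / (log(10.0) * s * (1.0 - s));
        printf("Elo %+.1f +/- %.1f at 95%%\n", elo, err);
        return 0;
    }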
marcelk
Posts: 348
Joined: Sat Feb 27, 2010 12:21 am

Re: lazy eval discussion -final results

Post by marcelk »

bob wrote: I'd have to think about how to run that test (no LE in PV only). I suppose I could avoid it if alpha != beta - 1. I don't have a "slow or fast eval" so I assume you mean what is the ratio of the three evals I have (early exit, late exit, full eval)?
Indeed, that is what I mean. My prediction is #full eval > #late exit > #early exit.
bob wrote: I do not measure that, but could. I believe that restricting LE to non-PV nodes won't make any significant difference. It will likely hurt a very small amount, since PV nodes are a small part of the total nodes searched.
My expectation is less than 1 Elo difference, or a difference well within the error margin.
I'll try to run this when I can add the tests and counters.
Please, only if you have nothing else to run. It is kind of hypothetical to me, only because I found the original question so strange. But if the outcome surprises us, one more reason to celebrate the power of measuring changes.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: lazy eval discussion -final results

Post by bob »

marcelk wrote:
bob wrote: I'd have to think about how to run that test (no LE in PV only). I suppose I could avoid it if alpha != beta - 1. I don't have a "slow or fast eval" so I assume you mean what is the ratio of the three evals I have (early exit, late exit, full eval)?
Indeed, that is what I mean. My prediction is #full eval > #late exit > #early exit.
bob wrote: I do not measure that, but could. I believe that restricting LE to non-PV nodes won't make any significant difference. It will likely hurt a very small amount, since PV nodes are a small part of the total nodes searched.
My expectation is less than 1 Elo difference, or a difference well within the error margin.
I'll try to run this when I can add the tests and counters.
Please, only if you have nothing else to run. It is kind of hypothetical to me, only because I found the original question so strange. But if the outcome surprises us, one more reason to celebrate the power of measuring changes.
I have learned a lot thru testing. I didn't think much of the MVV/LVA ordering (using SEE to select, but MVV/LVA to actually order) until I ran the test and saw that it was clearly better. So surprises (for me) happen regularly. The current last-4-plies of forward pruning was discovered by accident. I had been doing 3 for years, thanks to Heinz, but when I rewrote and cleaned up code, I got the test wrong and the program was significantly stronger. I couldn't figure out why since the changes were cosmetic or minor efficiency issues. A tree dump showed the extra ply of pruning, and after tuning, it was a significant gain. I now have the ability to prune beyond the last 4 as well, although so far, testing has not shown a benefit. But testing needs to be done for longer time controls as it might be more significant there. Just much harder to test...
Kempelen
Posts: 620
Joined: Fri Feb 08, 2008 10:44 am
Location: Madrid - Spain

Re: lazy eval discussion

Post by Kempelen »

bob wrote:
Kempelen wrote:
bob wrote:
UncombedCoconut wrote:Would it be interesting to quantify the benefit from the speed-up and the cost of the error separately? (This would involve an asm hack to produce a Crafty that does the full eval's calculations every time, but returns the same result as default Crafty.) With margins on the scale you've mentioned, I'm guessing it wouldn't be, but I figured I would ask. :)
This technique adds error and saves time. Within reason, the time saved more than offsets the error. But you can tweak the lazy eval margin to be more aggressive and it certainly plays worse. We tuned that value to its optimal setting with our cluster testing...
Hi,

An idea that comes to my mind on the fly: what about using a dynamic margin based on the current situation, instead of a static one? E.g., in a position with passed pawns, or with a king attack going on, there is a higher probability that the margin costs you errors, so adjusting it when such situations arise could improve the search accuracy. What is your opinion?
We have a dynamic margin at the second lazy-eval exit point. But at the first one, we don't know about passed pawns and such...
A possible idea would be to call an initial eval function at the start of every node that uses any kind of margin (futility, lazy eval, razoring, delta pruning, etc.) and to base those margins on dynamic considerations. Since those decisions are usually not made above depth 3 or so, it would only apply at certain nodes. Maybe that would cost time, but maybe the trade-off in accuracy would be positive. At least an idea to test...
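
A tiny sketch of the kind of dynamic margin being suggested, with purely illustrative feature tests and numbers (not Rodin's or Crafty's code); the same idea would extend to futility, razoring, and delta-pruning margins:

Code: Select all

    #define BASE_LAZY_MARGIN       300   /* roughly a minor piece */
    #define MAX_PASSED_PAWN_BONUS  150
    #define MAX_KING_ATTACK_BONUS  250

    /* Widen the lazy-eval margin when the position has features the fast
       estimate cannot see, so a lazy exit is less likely to hide them. */
    int LazyMargin(const Position *pos) {
        int margin = BASE_LAZY_MARGIN;
        if (PassedPawnCount(pos) > 0)
            margin += MAX_PASSED_PAWN_BONUS;   /* passers can swing the score a lot */
        if (KingUnderAttack(pos))
            margin += MAX_KING_ATTACK_BONUS;   /* so can an ongoing king attack */
        return margin;
    }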
Fermin Serrano
Author of 'Rodin' engine
http://sites.google.com/site/clonfsp/