On LMR, and statically predicting moves

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: On LMR, and statically predicting moves

Post by matthewlai »

mvk wrote:
So the first line is the frequency with which the best move predicted by the NN happens to be the correct best move from search. Then the frequency with which the second best move predicted by the NN happens to be the correct best move, etc.
But then I find it strange that you only get to ~60% at move 20, with 0.3% there. Is the remaining 40% in the tail, or is it in other node types? If it is in the tail, your tail seems too long: 40/0.3 >> 100.
Good catch! I intended not to count best moves that are winning captures, but my tallying code accidentally used the total number of positions as the denominator.

Though this is also somewhat interesting - in about 40% of positions, the best move is a SEE-winning capture.

I re-ran it, and here are the correct results -

Code: Select all

0: 28.3733% (28.3733)
1: 15.9732% (44.3465)
2: 11.0295% (55.3761)
3: 7.99478% (63.3709)
4: 5.93898% (69.3098)
5: 4.01371% (73.3235)
6: 3.58949% (76.913)
7: 3.13265% (80.0457)
8: 2.85528% (82.901)
9: 2.44738% (85.3483)
10: 1.99054% (87.3389)
11: 1.86001% (89.1989)
12: 1.63159% (90.8305)
13: 1.14211% (91.9726)
14: 1.37053% (93.3431)
15: 1.09316% (94.4363)
16: 0.930005% (95.3663)
17: 0.734214% (96.1005)
18: 0.571056% (96.6716)
19: 0.489476% (97.161)
Average Confidence: 0.157208
Pretty amazing that 55% of best moves are in the first 3 predicted moves!
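
As a rough illustration of the bookkeeping behind the table above (a hypothetical sketch, not the engine's actual tallying code; the Sample struct and maxRank are assumptions): positions whose best move from search is a SEE-winning capture are dropped from the denominator, and the remaining positions are bucketed by the rank the predictor assigned to the search's best move.

Code: Select all

#include <cstdio>
#include <vector>

// One entry per test position (illustrative layout).
struct Sample {
    int predictedRank;        // 0 = predictor's top move matched the search best move, -1 = not in the list
    bool bestIsSeeWinCapture; // true if the best move from search is a SEE-winning capture
};

void printRankTable(const std::vector<Sample>& samples, int maxRank) {
    std::vector<long> hits(maxRank, 0);
    long denominator = 0;
    for (const Sample& s : samples) {
        if (s.bestIsSeeWinCapture) continue;  // excluded, as described above
        ++denominator;                        // denominator counts only the remaining positions
        if (s.predictedRank >= 0 && s.predictedRank < maxRank)
            ++hits[s.predictedRank];
    }
    double cumulative = 0.0;
    for (int r = 0; r < maxRank; ++r) {
        double pct = 100.0 * hits[r] / denominator;
        cumulative += pct;
        std::printf("%d: %g%% (%g)\n", r, pct, cumulative);  // same layout as the table above
    }
}

int main() {
    // Toy data; in practice each Sample would come from comparing the NN
    // ranking against a real search on one test position.
    std::vector<Sample> samples = {{0, false}, {2, false}, {0, true}, {5, false}};
    printRankTable(samples, 6);
}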

Would be really nice if someone could provide numbers for the history heuristic for comparison. I could do it myself, but that means I would have to implement it...
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: On LMR, and statically predicting moves

Post by bob »

matthewlai wrote:
bob wrote: Your numbers look a bit strange. Most of us keep a simple fail high counter for the first move, which means we count the number of fail highs that occur, and the number that fail high on the first move. This is typically > 90%, at least for my program. Not the 15% or so you mentioned. This with all the usual ordering such as hash, non-losing captures, killers, counter moves, then history heuristic and the rest of the moves...
It is entirely possible that path-dependent heuristics are the way to go. I haven't established that they aren't, and I am not really convinced that they aren't.

However, this is with no path-dependent stuff at all. No hash move, no killer, no history, no counter moves, etc. And no winning captures.

The moves are predicted simply by looking at the position, with no other information.

Do you happen to have data for history heuristics? I would be very interested in that. Of the positions where either there's no hash move, killers, etc, or where those moves failed to produce a cutoff, how many cutoffs happen on the first move sorted by history? And second move, third move, etc?

Also, the original spirit of this post would still hold with history heuristics.

For example, if hash, killer, good captures, etc. don't produce a cutoff, and the history counters are "100, 99, 98, 97", the reduction schedule should probably be different than if they were "100, 0, 0, 0".
I've not had good luck with history and did not use it for a LONG time. I added it back in (again) last year because it does offer some benefit with LMR, since moving "good moves" in front of "bad moves" causes the good moves to be reduced less, and vice versa. Years ago (Cray Blitz days) I had converted to using two killers per ply: at the current ply, I first try the two killers saved there, and then the two killers from two plies back (closer to the root). Once I started doing this, the history heuristic stopped giving me any performance gain and I stopped using it.
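
A minimal sketch of that killer scheme (illustrative names and structures only, not Crafty's or Cray Blitz's actual code):

Code: Select all

#include <cstdint>

using Move = std::uint32_t;                 // engine-specific move encoding assumed
constexpr int MAX_PLY = 128;

struct Killers {
    Move slot[MAX_PLY][2] = {};             // two killer slots per ply

    // Record a quiet move that caused a beta cutoff at this ply.
    void update(int ply, Move m) {
        if (slot[ply][0] != m) {            // keep the two most recent distinct killers
            slot[ply][1] = slot[ply][0];
            slot[ply][0] = m;
        }
    }

    // Fill 'out' with up to four candidates: the current ply first, then
    // ply-2 (same side to move, two plies closer to the root).
    int candidates(int ply, Move out[4]) const {
        int n = 0;
        out[n++] = slot[ply][0];
        out[n++] = slot[ply][1];
        if (ply >= 2) {
            out[n++] = slot[ply - 2][0];
            out[n++] = slot[ply - 2][1];
        }
        return n;
    }
};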

One other note: somewhere during the past few months, as I have been cleaning up my office in preparation for moving about 50 feet, I decided to skim through the ICCA/ICGA journals. I ran across a reference to "relative history counters" or something similar, and I thought about it for a bit. Schaeffer recommended using 1<<depth to update the history counter, but that obviously only worked back in the days of 5-ply or so searches. I started to use depth * depth as an alternative since the Cray was significantly faster, and most still use that today. The relative history idea was to get rid of the depth and just increment history counters by some sort of constant. Since I am not convinced that a good move at ply=3 is better than a good move at ply=15 when I am ordering moves at ply=31, I decided to give this a try, and I like the results better than depth*depth.

I also switched to a "saturating counter" so that the values don't get too large, nor wrap around when decrementing them. I have not really tested how much this improves ordering compared to killers and such, but it did seem to improve LMR performance. I am re-tuning LMR again as I write this, since a more aggressive reduction schedule appears to work better now that move ordering has improved. The old ICGA journals have lots of info that I had not thought about. Counter-moves actually offer some interesting things when combined with LMR, the idea being to get potential moves searched earlier rather than later, as opposed to just producing a quicker cutoff.
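
A rough sketch of what a constant-increment, saturating history counter might look like (the bound and increment below are made-up numbers for illustration, not the actual values used):

Code: Select all

#include <algorithm>

constexpr int HISTORY_MAX = 1 << 14;        // saturation limit (arbitrary choice)

struct History {
    int table[2][64][64] = {};              // [side][from][to]

    void good(int side, int from, int to) { // move caused a beta cutoff
        int& h = table[side][from][to];
        h = std::min(h + 32, HISTORY_MAX);  // constant bump, clamped at the top
    }
    void bad(int side, int from, int to) {  // move was searched but did not cut off
        int& h = table[side][from][to];
        h = std::max(h - 32, -HISTORY_MAX); // clamped at the bottom so it cannot wrap around
    }
    int score(int side, int from, int to) const { return table[side][from][to]; }
};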

LMR opens up a whole world of such ideas, and I have other to-do things to try, such as a very simple per-piece eval so that better moves get tried before worse moves (e.g. Nc3-e4 before Nc3-a2), for the same reason.
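
As an illustration of that last idea, a hypothetical sketch of ordering quiet moves by a simple per-piece positional gain, so that Nc3-e4 sorts ahead of Nc3-a2 (the centrality table is made up; a real engine would use its own piece-square values):

Code: Select all

#include <algorithm>
#include <vector>

// Tiny centralization score indexed by square (0..63, a1 = 0); purely illustrative.
static int centrality(int sq) {
    int file = sq & 7, rank = sq >> 3;
    int df = std::min(file, 7 - file), dr = std::min(rank, 7 - rank);
    return 4 * (df + dr);                   // higher for central squares
}

struct ScoredMove { int from, to, score; };

void orderQuietMoves(std::vector<ScoredMove>& moves) {
    for (ScoredMove& m : moves)
        m.score = centrality(m.to) - centrality(m.from);   // positional gain of the move
    std::stable_sort(moves.begin(), moves.end(),
                     [](const ScoredMove& a, const ScoredMove& b) { return a.score > b.score; });
}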
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: On LMR, and statically predicting moves

Post by mvk »

bob wrote:I started to use depth * depth as an alternative since the Cray was significantly faster, and most still use that today. The relative history idea was to get rid of the depth and just increment history counters by some sort of constant. Since I am not convinced that a good move at ply=3 is better than a good move at ply=15 when I am ordering moves at ply=31, I decided to give this a try, and I like the results better than depth*depth.
Are you sure about Cray Blitz? I looked at crayblitz.tar.gz, but it looks to me like it has no history table. But my Fortran is rusty. The reason I ask is that, if I recall correctly, the idea to use a quadratic increment instead of an exponential one first appeared in rgcc when Crafty already existed.
LMR opens up a whole world of such ideas, and I have other to-do things to try, such as a very simple per-piece eval so that better moves get tried before worse moves (e.g. Nc3-e4 before Nc3-a2), for the same reason.
You can look at the static eval, and reduce more if it goes down with the move. Very old idea.
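
A minimal sketch of that idea (the threshold and reduction amounts are placeholders, not any particular engine's formula; both evals are assumed to be from the point of view of the side that made the move):

Code: Select all

#include <algorithm>

int lmrReduction(int baseReduction, int evalBeforeMove, int evalAfterMove) {
    int delta = evalAfterMove - evalBeforeMove;
    if (delta < -50)                             // the move makes the static eval worse
        return baseReduction + 1;                // reduce it one ply more
    if (delta > 50)                              // the move improves the static eval
        return std::max(baseReduction - 1, 0);   // reduce it less
    return baseReduction;
}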
[Account deleted]
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: On LMR, and statically predicting moves

Post by bob »

mvk wrote:
bob wrote:I started to use depth * depth as an alternative since the Cray was significantly faster, and most still use that today. The relative history idea was to get rid of the depth and just increment history counters by some sort of constant. Since I am not convinced that a good move at ply=3 is better than a good move at ply=15 when I am ordering moves at ply=31, I decided to give this a try, and I like the results better than depth*depth.
Are you sure about Cray Blitz? I looked at crayblitz.tar.gz, but it looks to me like it has no history table. But my Fortran is rusty. The reason I ask is that, if I recall correctly, the idea to use a quadratic increment instead of an exponential one first appeared in rgcc when Crafty already existed.
LMR opens up a whole world of such ideas, and I have other to-do things to try, such as a very simple per-piece eval so that better moves get tried before worse moves (e.g. Nc3-e4 before Nc3-a2), for the same reason.
You can look at the static eval, and reduce more if it goes down with the move. Very old idea.
We only found one version, which was fairly late in the CB era. I just looked at the code, and that version is the one that tries killers from other plies as well as the current ply, which would make it one of the versions after history counters were removed, or before history counters were added. I think I started using history moves after Schaeffer's ICCA paper, which was published somewhere in the late 80's. I noticed that the comments in main.f do not mention history counters, which probably makes good sense since I had pegged this version at around the 1989 WCCC, so it might have been either before or right around the time of Schaeffer's paper. Let me look and I'll finish this reply. The reference I found was in IEEE PAMI 1989. The Wiki says it was invented in 1983. My recollection was late in the 80's.

In any case, this version does not do the history stuff. It could be post-history (if he reported on it in 1983, that would fit) or pre-history (assuming the IEEE citation is the correct date, that also makes sense), because this version does try a bunch of killer moves. I looked at "phase.f", which contains the various phases of move generation and selection, and phase3() appears to do even more killers than I remembered, trying ALL killers from any point in the list, skipping every other ply so that only killers for the right side to move are used. I didn't even remember using that back then, but obviously we did.

As far as depth*depth goes, it was such a simple thing I have no idea when that was done.

That is something that is probably lost in terms of when it was first used.
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: On LMR, and statically predicting moves

Post by mvk »

bob wrote:
mvk wrote:
bob wrote:I started to use depth * depth as an alternative since the Cray was significantly faster, and most still use that today. The relative history idea was to get rid of the depth and just increment history counters by some sort of constant. Since I am not convinced that a good move at ply=3 is better than a good move at ply=15 when I am ordering moves at ply=31, I decided to give this a try, and I like the results better than depth*depth.
Are you sure about Cray Blitz? I looked at crayblitz.tar.gz, but it looks to me like it has no history table. But my Fortran is rusty. The reason I ask is that, if I recall correctly, the idea to use a quadratic increment instead of an exponential one first appeared in rgcc when Crafty already existed.
As far as depth*depth goes, it was such a simple thing I have no idea when that was done.

That is something that is probably lost in terms of when it was first used.
Could be at this point:

Code: Select all

From: hy...(at)willis.cis.uab.edu (Robert Hyatt)
Subject: Re: Fractional depth increments
Date: 1996/01/29
Message-ID: <4einh8$irv@pelham.cis.uab.edu>#1/1

--- snip ---
I've been using "depth" for several months.  I tested 1<<depth vs depth, and
didn't see any difference to speak of.  Also, after a complete search is done,
and I display Crafty's chosen move, I don't clear the history values, rather 
I >>8 them.  This means that after 4 moves a history move will be zero,
unless it's used during some of the searches.  This prevents starting a search
from scratch with no history data.

Maybe, after reading the above, I might try depth*depth to see how that
impacts things... 
----
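
In compact form, the aging idea from the quoted post might look like this (table layout is illustrative, not Crafty's actual code):

Code: Select all

constexpr int SIDES = 2, SQUARES = 64;
int history[SIDES][SQUARES][SQUARES];       // [side][from][to]

// Called after the engine actually plays a move, instead of clearing the table.
void ageHistoryAfterMove() {
    for (int s = 0; s < SIDES; ++s)
        for (int f = 0; f < SQUARES; ++f)
            for (int t = 0; t < SQUARES; ++t)
                history[s][f][t] >>= 8;     // decays to zero after ~4 moves unless refreshed by new searches
}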
[Account deleted]
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: On LMR, and statically predicting moves

Post by bob »

mvk wrote:
bob wrote:
mvk wrote:
bob wrote:I started to use depth * depth as an alternative since the Cray was significantly faster, and most still use that today. The relative history idea was to get rid of the depth and just increment history counters by some sort of constant. Since I am not convinced that a good move at ply=3 is better than a good move at ply=15 when I am ordering moves at ply=31, I decided to give this a try, and I like the results better than depth*depth.
Are you sure about Cray Blitz? I looked at crayblitz.tar.gz, but it looks to me like it has no history table. But my Fortran is rusty. The reason I ask is that, if I recall correctly, the idea to use a quadratic increment instead of an exponential one first appeared in rgcc when Crafty already existed.
As far as depth*depth goes, it was such a simple thing I have no idea when that was done.

That is something that is probably lost in terms of when it was first used.
Could be at this point:

Code: Select all

From: hy...(at)willis.cis.uab.edu (Robert Hyatt)
Subject: Re: Fractional depth increments
Date: 1996/01/29
Message-ID: <4einh8$irv@pelham.cis.uab.edu>#1/1

--- snip ---
I've been using "depth" for several months.  I tested 1<<depth vs depth, and
didn't see any difference to speak of.  Also, after a complete search is done,
and I display Crafty's chosen move, I don't clear the history values, rather 
I >>8 them.  This means that after 4 moves a history move will be zero,
unless it's used during some of the searches.  This prevents starting a search
from scratch with no history data.

Maybe, after reading the above, I might try depth*depth to see how that
impacts things... 
----
Could well be. Nice find.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: On LMR, and statically predicting moves

Post by matthewlai »

bob wrote:
matthewlai wrote:
bob wrote: Your numbers look a bit strange. Most of us keep a simple fail high counter for the first move, which means we count the number of fail highs that occur, and the number that fail high on the first move. This is typically > 90%, at least for my program. Not the 15% or so you mentioned. This with all the usual ordering such as hash, non-losing captures, killers, counter moves, then history heuristic and the rest of the moves...
It is entirely possible that path-dependent heuristics are the way to go. I haven't established that they aren't, and I am not really convinced that they aren't.

However, this is with no path-dependent stuff at all. No hash move, no killer, no history, no counter moves, etc. And no winning captures.

The moves are predicted simply by looking at the position, with no other information.

Do you happen to have data for history heuristics? I would be very interested in that. Of the positions where either there's no hash move, killers, etc, or where those moves failed to produce a cutoff, how many cutoffs happen on the first move sorted by history? And second move, third move, etc?

Also, the original spirit of this post would still hold with history heuristics.

For example, if hash, killer, good captures, etc. don't produce a cutoff, and the history counters are "100, 99, 98, 97", the reduction schedule should probably be different than if they were "100, 0, 0, 0".
I've not had good luck with history and did not use it for a LONG time. I added it back in (again) last year because it does offer some benefit with LMR, since moving "good moves" in front of "bad moves" causes the good moves to be reduced less, and vice versa. Years ago (Cray Blitz days) I had converted to using two killers per ply: at the current ply, I first try the two killers saved there, and then the two killers from two plies back (closer to the root). Once I started doing this, the history heuristic stopped giving me any performance gain and I stopped using it.

One other note: somewhere during the past few months, as I have been cleaning up my office in preparation for moving about 50 feet, I decided to skim through the ICCA/ICGA journals. I ran across a reference to "relative history counters" or something similar, and I thought about it for a bit. Schaeffer recommended using 1<<depth to update the history counter, but that obviously only worked back in the days of 5-ply or so searches. I started to use depth * depth as an alternative since the Cray was significantly faster, and most still use that today. The relative history idea was to get rid of the depth and just increment history counters by some sort of constant. Since I am not convinced that a good move at ply=3 is better than a good move at ply=15 when I am ordering moves at ply=31, I decided to give this a try, and I like the results better than depth*depth.

I also switched to a "saturating counter" so that the values don't get too large, nor wrap around when decrementing them. I have not really tested how much this improves ordering compared to killers and such, but it did seem to improve LMR performance. I am re-tuning LMR again as I write this, since a more aggressive reduction schedule appears to work better now that move ordering has improved. The old ICGA journals have lots of info that I had not thought about. Counter-moves actually offer some interesting things when combined with LMR, the idea being to get potential moves searched earlier rather than later, as opposed to just producing a quicker cutoff.

LMR opens up a whole world of such ideas, and I have other to-do things to try, such as a very simple per-piece eval so that better moves get tried before worse moves (e.g. Nc3-e4 before Nc3-a2), for the same reason.
I just implemented and tested the history heuristic in my engine (log(nodeBudget)^2 increment, which is roughly equivalent to depth^2), and found that it also does pretty much nothing most of the time at higher depths. However, I am not doing LMR. Maybe there would be some improvement with LMR.

I am now using it only within a few plies of the leaves, and there's some but not a lot of improvement (12 Elo after 3000 games). It's nice because I use neural networks to get very good ordering and node allocation at higher depths, but they're too slow to run at shallower depths. Now I can have at least somewhat non-random quiet move ordering at shallow depths as well (beyond killers). I should probably look into LMR/LMP for shallow depths.
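
As a sketch, the increment described above might look something like this in a node-budget-based search (the table layout and scaling are assumptions for illustration, not the engine's actual code):

Code: Select all

#include <cmath>

int history[2][64][64];                                  // [side][from][to], illustrative layout

// Called when a quiet move causes a beta cutoff in a node-budget-based search.
void updateHistoryOnCutoff(int side, int from, int to, double nodeBudget) {
    double l = std::log(nodeBudget);                     // roughly proportional to remaining depth
    history[side][from][to] += static_cast<int>(l * l);  // log(nodeBudget)^2, analogous to depth^2
}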
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.