Testing Move Order Quality

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Cheney
Posts: 104
Joined: Thu Sep 27, 2012 2:24 am

Testing Move Order Quality

Post by Cheney »

Hi,

I was wondering how to test move order quality, not just how often a first move fails high, but doing this at various depths and also what type of move fails high (types of moves being hash/TT, capture, killer, etc.).

I did find a few references to this subject on talkchess and the computer chess club archives and was able to model some statistics, but have a question or two.

When reviewing the type of moves:

* Hash/TT move never fails high at any depth.
* Captures fail high at their peak at depth=1 and these fail highs decrease as the depth gets closer to the root.
* Killers and "all other moves" follow the same pattern as captures but they have less than the captures.

When reviewing what move number failed high - first move to tenth move was measured - this is what I observed when reaching a depth of 8 on a random position:

Depths 6, 7, and 8 - no fail highs.
Depth 5 - first move failed high 44 times
Depth 4 - first move - 299 cuts, second and fifth moves - 1 cut.
Depth 3 - first move - 1138 cuts, second and seventh moves - 1 cut.
Depth 2 - first move - 5613 cuts, second move - 15, fourth move - 1 cut.
Depth 1 - first move, 19933 cuts, 2nd move - 68 cuts, 3rd move, 11 cuts, 4th move - 2 cuts

Total fail highs (for all moves, not just the first 10 per ply): 27139
Total first fail highs: 27033
Overall Fail High for First Move: 99%

The observation for these two reviews is the same: the closer to the leaf nodes, the more the cuts; the closer to the root, the less the cuts.

The observation is consistent across other positions, with exception to the fact that other positions will have a lower cut percentage for the first move (I've seen as low as 75%).

I guess my first question on this observation is if it normal to not see cuts within a ply or two from the root? It is my understanding the higher we can cut then less nodes will be searched and more time will be available for searching. However, I would believe this to be a valid point - there are just less moves searched at depth=6 when compared to, let's say, depth=2. Plus, after all the iterations, reaching a depth of 8, I'd think maybe there are less chances of a "too good of a move" to exist at depth 7 and possibly depth 6.

A second question would be why my hash/TT does not cut? I am a little surprised that it does not cut anywhere in the entire search. I do write the best move for the node to the TT, whether it is an exact, lower, or upper. Maybe I'll track this same stat but for the best move beating alpha, maybe that's a valid test?

Last question would be if the process I am following is good then should I expect more cuts across the first move (or even first 5 moves) if I do improve move ordering?

Thank you in advance :)
Cheney
Posts: 104
Joined: Thu Sep 27, 2012 2:24 am

Re: Testing Move Order Quality

Post by Cheney »

Sorry, correction about the Hash/TT... I needed to quadruple check my code :oops:

The TT/Hash is indeed failing high at many nodes, it was incorrectly accounted for. Please ignore my question about the hash/TT. :)
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Testing Move Order Quality

Post by hgm »

Good to hear, because it would have been very fishy. People don't search the hash move first for no reason, and especially not because it is the worst possible thing to do! :D

My experience is also that it is very usefult to take the stats per depth, as otherwise everything will be dominated by the lowest depth (i.e. QS).
Cheney
Posts: 104
Joined: Thu Sep 27, 2012 2:24 am

Re: Testing Move Order Quality

Post by Cheney »

Ty :)

I am collecting this data across all depths reached and do see the lower depths do dominate the fail high counts (mostly silent moves, some killer) while the higher depths are cut by the hash and some captures.

From looking at this data for a few different positions, I do believe my current move ordering is good. I have seen "first move" fail highs hit anywhere from 80% to 99% of all cuts. But, I have seen 95% to 97% of all cuts happen within the first four moves.

I am going to collect this data from the next round of games the engine plays, just to eyeball its "overall" performance for fail highs.

Also, I have one other study I'd like to do. When my engine generates "silent" moves, I order those moves in this order:
1. Promotions
2. Pawn pushes to ranks 6 or 7, Passed Pawns moving
3. If the move reveals check on the opponent.
4. Castle moves.
5. History moves.

What I plan to do is track these same fail high counts and percentages for each one of these types of silent moves. My question in this is: if I see something like castle moves cuts more than pawn moves then is it safe to say to move castle moves higher in the order?
Cardoso
Posts: 362
Joined: Thu Mar 16, 2006 7:39 pm
Location: Portugal
Full name: Alvaro Cardoso

Re: Testing Move Order Quality

Post by Cardoso »

hgm wrote: Mon Jun 29, 2020 11:24 pm My experience is also that it is very useful to take the stats per depth, as otherwise everything will be dominated by the lowest depth (i.e. QS).
Same here, I never took the stats per depth but have noticed long ago that if I gather move ordering from all depths then my move ordering isn't great, but if I restrict the gathering to above 5*PLY or 9*PLY then things look fantastic for move ordering.
Cheney
Posts: 104
Joined: Thu Sep 27, 2012 2:24 am

Re: Testing Move Order Quality

Post by Cheney »

Cardoso wrote: Tue Jun 30, 2020 3:37 pm Same here, I never took the stats per depth but have noticed long ago that if I gather move ordering from all depths then my move ordering isn't great, but if I restrict the gathering to above 5*PLY or 9*PLY then things look fantastic for move ordering.
Interesting... basically, are you saying to not record stats until I obtained a higher ply? I'm going to try this out.
Cardoso
Posts: 362
Joined: Thu Mar 16, 2006 7:39 pm
Location: Portugal
Full name: Alvaro Cardoso

Re: Testing Move Order Quality

Post by Cardoso »

Cheney wrote: Tue Jun 30, 2020 5:18 pm
Cardoso wrote: Tue Jun 30, 2020 3:37 pm Same here, I never took the stats per depth but have noticed long ago that if I gather move ordering from all depths then my move ordering isn't great, but if I restrict the gathering to above 5*PLY or 9*PLY then things look fantastic for move ordering.
Interesting... basically, are you saying to not record stats until I obtained a higher ply? I'm going to try this out.
Just take the PV produced by your engine, as you approach the the end of it, the moves will be less precise.
Also LMR causes lines to have different depths, and so killers near the tips are affected.
So near the tips things are a bit chaotic and random, gathering move ordering stats there is a bit noisy.
First check your move ordering stats at every node.
Then guard you move ordering gathering stats by:

Code: Select all

if (depth > 7*PLY) {
   ... code for move ordering stats
}
You will see a difference.
http://talkchess.com/forum3/viewtopic.p ... ng#p787987
Cheney
Posts: 104
Joined: Thu Sep 27, 2012 2:24 am

Re: Testing Move Order Quality

Post by Cheney »

Ty :)

I have a few set positions I run tests against and I'd say the average depth I reach for these is 10 (I only have null move pruning, no other pruning or reduction method at this time).

When using the if(depth > 7 * PLY), there are a lot of zeros in the report. So, I tried if depth > 3 * PLY. What I was performing before was if(PLY > 5). Now that I have looked at three type of reports, I can see what you mean by the tips will be noisy.

Now I need to understand how to read this and just what this data is telling me. My original assumption was if I saw a move type with a high volume of cut offs then I would move that move type higher in the ordered list. However, when I compared each of the searches from the before and after, I may not have reached the same depth or took more time searching. Based on that, I felt resorting that movetype was not best.

Here is the report based on if(depth > 3*PLY):

Code: Select all

Search Nodes:            26819821
Quiesce Nodes:           16983979
Total Nodes:             43803800
(* Note * Nodes are total for all searches, recorded across all PLYs)

(* Note* The following data is generated based on depth > 3 * PLY)
Total cuts:         1360
First Move cuts:    1210
First Move %:       88%

TopN Measure to move number: 5
TopN Cuts:          1331
TopN %:             97%


Branching Factor Best:    1
Branching Factor Worst:   5
Branching Factor Average: 2

Depth | Hash Count | Capt. Count | CapPromo | Kilr Count | Silent Count | First Move
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 13   |          0 |           0 |        0 |          4 |            0 |         0
 12   |          6 |           4 |        0 |          5 |            0 |         9
 11   |         26 |          22 |        0 |          4 |            0 |        51
 10   |         14 |          25 |        0 |          4 |            0 |        34
  9   |         11 |          12 |        3 |          4 |            0 |        22
  8   |         68 |          46 |       10 |         20 |            0 |       129
  7   |        102 |         127 |       10 |         21 |            1 |       236
  6   |         12 |         124 |       10 |         25 |            0 |       161
  5   |        262 |           0 |        0 |          6 |            0 |       262
  4   |        306 |           5 |        0 |          4 |            3 |       306
  3   |          0 |           0 |        0 |          0 |            0 |         0
  2   |          0 |           0 |        0 |          0 |            0 |         0
  1   |          0 |           0 |        0 |          0 |            0 |         0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total |        807 |         365 |       33 |         97 |            4 |      1210
 %:   |        59% |         26% |       2% |         7% |           0% |       88%


Depth | Promo | GoodPawn | Castling | Checking | History | Counter Move
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 13   |     0 |        0 |        0 |        0 |       0 |       1
 12   |     0 |        0 |        0 |        0 |       1 |       0
 11   |     0 |        0 |        0 |        1 |       5 |       0
 10   |     0 |        1 |        0 |        0 |       4 |       0
  9   |     0 |        0 |        0 |        0 |       0 |       0
  8   |     0 |        0 |        0 |        6 |       7 |       3
  7   |     0 |        0 |        0 |        0 |       7 |       4
  6   |     0 |        0 |        0 |        2 |       0 |       2
  5   |     0 |        0 |        0 |        2 |       2 |       0
  4   |     0 |        0 |        0 |        0 |       6 |       0
  3   |     0 |        0 |        0 |        0 |       0 |       0
  2   |     0 |        0 |        0 |        0 |       0 |       0
  1   |     0 |        0 |        0 |        0 |       0 |       0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total |     0 |        1 |        0 |       11 |      32 |      10
 %:   |    0% |       0% |       0% |       0% |      2% |      0%
 
My move ordering is - the tables above do not reflect this order:

* Hash
* Capture with promotion
* Captures based on MVV/LVA
* Killer
* Good Pawn (passed pawn moves, pushes to rank 6 or 7)
* Checking moves
* Castle Moves
* Counter move (new, just added)
* History move

My thoughts:
  • I know when accounting for all depths, first move cuts is around 94% (pulled from my old style of reports).
  • Silent Count is moves with no sorting value.
  • Hash is 59% of the cuts, followed by captures, and promotions.
  • I "think" killers might be better than captures that promote. Is this an accurate conclusion? Maybe sort these better than capturing promotions?
  • As for the 2nd table, I can conclude history is better than the other sorted move types but with zeros for the other types, it make it seem like having them is invalid.
As for the last bullet, take the promotions and good pawns for example, sort the history above those and in end game positions, the search is slower and not as deep.

In the end, I was hoping these numbers would point out the value in creating a sort score for a specific move type. Is my train of thought on the right tracks or do I need to look at this differently?
Cardoso
Posts: 362
Joined: Thu Mar 16, 2006 7:39 pm
Location: Portugal
Full name: Alvaro Cardoso

Re: Testing Move Order Quality

Post by Cardoso »

Cheney wrote: Thu Jul 02, 2020 12:08 pm Ty :)

I have a few set positions I run tests against and I'd say the average depth I reach for these is 10 (I only have null move pruning, no other pruning or reduction method at this time).

When using the if(depth > 7 * PLY), there are a lot of zeros in the report. So, I tried if depth > 3 * PLY. What I was performing before was if(PLY > 5). Now that I have looked at three type of reports, I can see what you mean by the tips will be noisy.

Now I need to understand how to read this and just what this data is telling me. My original assumption was if I saw a move type with a high volume of cut offs then I would move that move type higher in the ordered list. However, when I compared each of the searches from the before and after, I may not have reached the same depth or took more time searching. Based on that, I felt resorting that movetype was not best.

Here is the report based on if(depth > 3*PLY):

Code: Select all

Search Nodes:            26819821
Quiesce Nodes:           16983979
Total Nodes:             43803800
(* Note * Nodes are total for all searches, recorded across all PLYs)

(* Note* The following data is generated based on depth > 3 * PLY)
Total cuts:         1360
First Move cuts:    1210
First Move %:       88%

TopN Measure to move number: 5
TopN Cuts:          1331
TopN %:             97%


Branching Factor Best:    1
Branching Factor Worst:   5
Branching Factor Average: 2

Depth | Hash Count | Capt. Count | CapPromo | Kilr Count | Silent Count | First Move
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 13   |          0 |           0 |        0 |          4 |            0 |         0
 12   |          6 |           4 |        0 |          5 |            0 |         9
 11   |         26 |          22 |        0 |          4 |            0 |        51
 10   |         14 |          25 |        0 |          4 |            0 |        34
  9   |         11 |          12 |        3 |          4 |            0 |        22
  8   |         68 |          46 |       10 |         20 |            0 |       129
  7   |        102 |         127 |       10 |         21 |            1 |       236
  6   |         12 |         124 |       10 |         25 |            0 |       161
  5   |        262 |           0 |        0 |          6 |            0 |       262
  4   |        306 |           5 |        0 |          4 |            3 |       306
  3   |          0 |           0 |        0 |          0 |            0 |         0
  2   |          0 |           0 |        0 |          0 |            0 |         0
  1   |          0 |           0 |        0 |          0 |            0 |         0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total |        807 |         365 |       33 |         97 |            4 |      1210
 %:   |        59% |         26% |       2% |         7% |           0% |       88%


Depth | Promo | GoodPawn | Castling | Checking | History | Counter Move
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 13   |     0 |        0 |        0 |        0 |       0 |       1
 12   |     0 |        0 |        0 |        0 |       1 |       0
 11   |     0 |        0 |        0 |        1 |       5 |       0
 10   |     0 |        1 |        0 |        0 |       4 |       0
  9   |     0 |        0 |        0 |        0 |       0 |       0
  8   |     0 |        0 |        0 |        6 |       7 |       3
  7   |     0 |        0 |        0 |        0 |       7 |       4
  6   |     0 |        0 |        0 |        2 |       0 |       2
  5   |     0 |        0 |        0 |        2 |       2 |       0
  4   |     0 |        0 |        0 |        0 |       6 |       0
  3   |     0 |        0 |        0 |        0 |       0 |       0
  2   |     0 |        0 |        0 |        0 |       0 |       0
  1   |     0 |        0 |        0 |        0 |       0 |       0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total |     0 |        1 |        0 |       11 |      32 |      10
 %:   |    0% |       0% |       0% |       0% |      2% |      0%
 
My move ordering is - the tables above do not reflect this order:

* Hash
* Capture with promotion
* Captures based on MVV/LVA
* Killer
* Good Pawn (passed pawn moves, pushes to rank 6 or 7)
* Checking moves
* Castle Moves
* Counter move (new, just added)
* History move

My thoughts:
  • I know when accounting for all depths, first move cuts is around 94% (pulled from my old style of reports).
  • Silent Count is moves with no sorting value.
  • Hash is 59% of the cuts, followed by captures, and promotions.
  • I "think" killers might be better than captures that promote. Is this an accurate conclusion? Maybe sort these better than capturing promotions?
  • As for the 2nd table, I can conclude history is better than the other sorted move types but with zeros for the other types, it make it seem like having them is invalid.
As for the last bullet, take the promotions and good pawns for example, sort the history above those and in end game positions, the search is slower and not as deep.

In the end, I was hoping these numbers would point out the value in creating a sort score for a specific move type. Is my train of thought on the right tracks or do I need to look at this differently?
I think depth 10 is too little to really observe this noisy effect of move ordering at the tips, as you develop your engine and use LMR, shallow pruning mechanisms, you will start to have much higher depths, and you will see more intensely this effect.
Also my engine is a checkers engine, not chess, sorry I really can't go into too much details about my move ordering (wich is quite good) because I'm competing against a very good spanish engine and an even stronger russian engine. It probably wouldn't help either since chess and checkers are too much different. Just a tip, don't be afraid of pruning like hell, as long as you have a good testing system to really be sure a change gives ELO increase, go ahead and prune like there is no tomorrow.
Cheney
Posts: 104
Joined: Thu Sep 27, 2012 2:24 am

Re: Testing Move Order Quality

Post by Cheney »

Ty Cardoso :)

Pruning is the next item on the chopping block. Before I went there, I wanted to make sure I had a sound process for move ordering. I also wanted to develop a way to flag moves as important (to not prune or reduce).

As for this sort ordering, I was able to use the data and shift move types around to look for a better move order. I was also able to use this data to measure if a not-so-frequent move type was better than using history moves. I was able to conclude that special pawn moves and moves that generate attacks were better off getting a higher sorting score than what that existed in the history move array. Attacking moves made more higher cuts than history moves.

I may want to dive deeper into this but feel that if I continue to add move types that are more specific than a history move, I will eventually hit a point where there is a speed loss. Besides, I hit the point where I am ready to study pruning.

I am going to post a new thread on Futility pruning because I, like many others, am having a hard time seeing improvements.