Hi,
I was wondering how to test move order quality, not just how often a first move fails high, but doing this at various depths and also what type of move fails high (types of moves being hash/TT, capture, killer, etc.).
I did find a few references to this subject on talkchess and the computer chess club archives and was able to model some statistics, but have a question or two.
When reviewing the type of moves:
* Hash/TT move never fails high at any depth.
* Captures fail high at their peak at depth=1 and these fail highs decrease as the depth gets closer to the root.
* Killers and "all other moves" follow the same pattern as captures but they have less than the captures.
When reviewing what move number failed high - first move to tenth move was measured - this is what I observed when reaching a depth of 8 on a random position:
Depths 6, 7, and 8 - no fail highs.
Depth 5 - first move failed high 44 times
Depth 4 - first move - 299 cuts, second and fifth moves - 1 cut.
Depth 3 - first move - 1138 cuts, second and seventh moves - 1 cut.
Depth 2 - first move - 5613 cuts, second move - 15, fourth move - 1 cut.
Depth 1 - first move, 19933 cuts, 2nd move - 68 cuts, 3rd move, 11 cuts, 4th move - 2 cuts
Total fail highs (for all moves, not just the first 10 per ply): 27139
Total first fail highs: 27033
Overall Fail High for First Move: 99%
The observation for these two reviews is the same: the closer to the leaf nodes, the more the cuts; the closer to the root, the less the cuts.
The observation is consistent across other positions, with exception to the fact that other positions will have a lower cut percentage for the first move (I've seen as low as 75%).
I guess my first question on this observation is if it normal to not see cuts within a ply or two from the root? It is my understanding the higher we can cut then less nodes will be searched and more time will be available for searching. However, I would believe this to be a valid point - there are just less moves searched at depth=6 when compared to, let's say, depth=2. Plus, after all the iterations, reaching a depth of 8, I'd think maybe there are less chances of a "too good of a move" to exist at depth 7 and possibly depth 6.
A second question would be why my hash/TT does not cut? I am a little surprised that it does not cut anywhere in the entire search. I do write the best move for the node to the TT, whether it is an exact, lower, or upper. Maybe I'll track this same stat but for the best move beating alpha, maybe that's a valid test?
Last question would be if the process I am following is good then should I expect more cuts across the first move (or even first 5 moves) if I do improve move ordering?
Thank you in advance
Testing Move Order Quality
Moderators: hgm, Rebel, chrisw
-
- Posts: 104
- Joined: Thu Sep 27, 2012 2:24 am
Re: Testing Move Order Quality
Sorry, correction about the Hash/TT... I needed to quadruple check my code
The TT/Hash is indeed failing high at many nodes, it was incorrectly accounted for. Please ignore my question about the hash/TT.
The TT/Hash is indeed failing high at many nodes, it was incorrectly accounted for. Please ignore my question about the hash/TT.
-
- Posts: 27817
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Testing Move Order Quality
Good to hear, because it would have been very fishy. People don't search the hash move first for no reason, and especially not because it is the worst possible thing to do!
My experience is also that it is very usefult to take the stats per depth, as otherwise everything will be dominated by the lowest depth (i.e. QS).
My experience is also that it is very usefult to take the stats per depth, as otherwise everything will be dominated by the lowest depth (i.e. QS).
-
- Posts: 104
- Joined: Thu Sep 27, 2012 2:24 am
Re: Testing Move Order Quality
Ty
I am collecting this data across all depths reached and do see the lower depths do dominate the fail high counts (mostly silent moves, some killer) while the higher depths are cut by the hash and some captures.
From looking at this data for a few different positions, I do believe my current move ordering is good. I have seen "first move" fail highs hit anywhere from 80% to 99% of all cuts. But, I have seen 95% to 97% of all cuts happen within the first four moves.
I am going to collect this data from the next round of games the engine plays, just to eyeball its "overall" performance for fail highs.
Also, I have one other study I'd like to do. When my engine generates "silent" moves, I order those moves in this order:
1. Promotions
2. Pawn pushes to ranks 6 or 7, Passed Pawns moving
3. If the move reveals check on the opponent.
4. Castle moves.
5. History moves.
What I plan to do is track these same fail high counts and percentages for each one of these types of silent moves. My question in this is: if I see something like castle moves cuts more than pawn moves then is it safe to say to move castle moves higher in the order?
I am collecting this data across all depths reached and do see the lower depths do dominate the fail high counts (mostly silent moves, some killer) while the higher depths are cut by the hash and some captures.
From looking at this data for a few different positions, I do believe my current move ordering is good. I have seen "first move" fail highs hit anywhere from 80% to 99% of all cuts. But, I have seen 95% to 97% of all cuts happen within the first four moves.
I am going to collect this data from the next round of games the engine plays, just to eyeball its "overall" performance for fail highs.
Also, I have one other study I'd like to do. When my engine generates "silent" moves, I order those moves in this order:
1. Promotions
2. Pawn pushes to ranks 6 or 7, Passed Pawns moving
3. If the move reveals check on the opponent.
4. Castle moves.
5. History moves.
What I plan to do is track these same fail high counts and percentages for each one of these types of silent moves. My question in this is: if I see something like castle moves cuts more than pawn moves then is it safe to say to move castle moves higher in the order?
-
- Posts: 362
- Joined: Thu Mar 16, 2006 7:39 pm
- Location: Portugal
- Full name: Alvaro Cardoso
Re: Testing Move Order Quality
Same here, I never took the stats per depth but have noticed long ago that if I gather move ordering from all depths then my move ordering isn't great, but if I restrict the gathering to above 5*PLY or 9*PLY then things look fantastic for move ordering.
-
- Posts: 104
- Joined: Thu Sep 27, 2012 2:24 am
Re: Testing Move Order Quality
Interesting... basically, are you saying to not record stats until I obtained a higher ply? I'm going to try this out.
-
- Posts: 362
- Joined: Thu Mar 16, 2006 7:39 pm
- Location: Portugal
- Full name: Alvaro Cardoso
Re: Testing Move Order Quality
Just take the PV produced by your engine, as you approach the the end of it, the moves will be less precise.
Also LMR causes lines to have different depths, and so killers near the tips are affected.
So near the tips things are a bit chaotic and random, gathering move ordering stats there is a bit noisy.
First check your move ordering stats at every node.
Then guard you move ordering gathering stats by:
Code: Select all
if (depth > 7*PLY) {
... code for move ordering stats
}
http://talkchess.com/forum3/viewtopic.p ... ng#p787987
-
- Posts: 104
- Joined: Thu Sep 27, 2012 2:24 am
Re: Testing Move Order Quality
Ty
I have a few set positions I run tests against and I'd say the average depth I reach for these is 10 (I only have null move pruning, no other pruning or reduction method at this time).
When using the if(depth > 7 * PLY), there are a lot of zeros in the report. So, I tried if depth > 3 * PLY. What I was performing before was if(PLY > 5). Now that I have looked at three type of reports, I can see what you mean by the tips will be noisy.
Now I need to understand how to read this and just what this data is telling me. My original assumption was if I saw a move type with a high volume of cut offs then I would move that move type higher in the ordered list. However, when I compared each of the searches from the before and after, I may not have reached the same depth or took more time searching. Based on that, I felt resorting that movetype was not best.
Here is the report based on if(depth > 3*PLY):
My move ordering is - the tables above do not reflect this order:
* Hash
* Capture with promotion
* Captures based on MVV/LVA
* Killer
* Good Pawn (passed pawn moves, pushes to rank 6 or 7)
* Checking moves
* Castle Moves
* Counter move (new, just added)
* History move
My thoughts:
In the end, I was hoping these numbers would point out the value in creating a sort score for a specific move type. Is my train of thought on the right tracks or do I need to look at this differently?
I have a few set positions I run tests against and I'd say the average depth I reach for these is 10 (I only have null move pruning, no other pruning or reduction method at this time).
When using the if(depth > 7 * PLY), there are a lot of zeros in the report. So, I tried if depth > 3 * PLY. What I was performing before was if(PLY > 5). Now that I have looked at three type of reports, I can see what you mean by the tips will be noisy.
Now I need to understand how to read this and just what this data is telling me. My original assumption was if I saw a move type with a high volume of cut offs then I would move that move type higher in the ordered list. However, when I compared each of the searches from the before and after, I may not have reached the same depth or took more time searching. Based on that, I felt resorting that movetype was not best.
Here is the report based on if(depth > 3*PLY):
Code: Select all
Search Nodes: 26819821
Quiesce Nodes: 16983979
Total Nodes: 43803800
(* Note * Nodes are total for all searches, recorded across all PLYs)
(* Note* The following data is generated based on depth > 3 * PLY)
Total cuts: 1360
First Move cuts: 1210
First Move %: 88%
TopN Measure to move number: 5
TopN Cuts: 1331
TopN %: 97%
Branching Factor Best: 1
Branching Factor Worst: 5
Branching Factor Average: 2
Depth | Hash Count | Capt. Count | CapPromo | Kilr Count | Silent Count | First Move
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
13 | 0 | 0 | 0 | 4 | 0 | 0
12 | 6 | 4 | 0 | 5 | 0 | 9
11 | 26 | 22 | 0 | 4 | 0 | 51
10 | 14 | 25 | 0 | 4 | 0 | 34
9 | 11 | 12 | 3 | 4 | 0 | 22
8 | 68 | 46 | 10 | 20 | 0 | 129
7 | 102 | 127 | 10 | 21 | 1 | 236
6 | 12 | 124 | 10 | 25 | 0 | 161
5 | 262 | 0 | 0 | 6 | 0 | 262
4 | 306 | 5 | 0 | 4 | 3 | 306
3 | 0 | 0 | 0 | 0 | 0 | 0
2 | 0 | 0 | 0 | 0 | 0 | 0
1 | 0 | 0 | 0 | 0 | 0 | 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total | 807 | 365 | 33 | 97 | 4 | 1210
%: | 59% | 26% | 2% | 7% | 0% | 88%
Depth | Promo | GoodPawn | Castling | Checking | History | Counter Move
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
13 | 0 | 0 | 0 | 0 | 0 | 1
12 | 0 | 0 | 0 | 0 | 1 | 0
11 | 0 | 0 | 0 | 1 | 5 | 0
10 | 0 | 1 | 0 | 0 | 4 | 0
9 | 0 | 0 | 0 | 0 | 0 | 0
8 | 0 | 0 | 0 | 6 | 7 | 3
7 | 0 | 0 | 0 | 0 | 7 | 4
6 | 0 | 0 | 0 | 2 | 0 | 2
5 | 0 | 0 | 0 | 2 | 2 | 0
4 | 0 | 0 | 0 | 0 | 6 | 0
3 | 0 | 0 | 0 | 0 | 0 | 0
2 | 0 | 0 | 0 | 0 | 0 | 0
1 | 0 | 0 | 0 | 0 | 0 | 0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Total | 0 | 1 | 0 | 11 | 32 | 10
%: | 0% | 0% | 0% | 0% | 2% | 0%
* Hash
* Capture with promotion
* Captures based on MVV/LVA
* Killer
* Good Pawn (passed pawn moves, pushes to rank 6 or 7)
* Checking moves
* Castle Moves
* Counter move (new, just added)
* History move
My thoughts:
- I know when accounting for all depths, first move cuts is around 94% (pulled from my old style of reports).
- Silent Count is moves with no sorting value.
- Hash is 59% of the cuts, followed by captures, and promotions.
- I "think" killers might be better than captures that promote. Is this an accurate conclusion? Maybe sort these better than capturing promotions?
- As for the 2nd table, I can conclude history is better than the other sorted move types but with zeros for the other types, it make it seem like having them is invalid.
In the end, I was hoping these numbers would point out the value in creating a sort score for a specific move type. Is my train of thought on the right tracks or do I need to look at this differently?
-
- Posts: 362
- Joined: Thu Mar 16, 2006 7:39 pm
- Location: Portugal
- Full name: Alvaro Cardoso
Re: Testing Move Order Quality
I think depth 10 is too little to really observe this noisy effect of move ordering at the tips, as you develop your engine and use LMR, shallow pruning mechanisms, you will start to have much higher depths, and you will see more intensely this effect.Cheney wrote: ↑Thu Jul 02, 2020 12:08 pm Ty
I have a few set positions I run tests against and I'd say the average depth I reach for these is 10 (I only have null move pruning, no other pruning or reduction method at this time).
When using the if(depth > 7 * PLY), there are a lot of zeros in the report. So, I tried if depth > 3 * PLY. What I was performing before was if(PLY > 5). Now that I have looked at three type of reports, I can see what you mean by the tips will be noisy.
Now I need to understand how to read this and just what this data is telling me. My original assumption was if I saw a move type with a high volume of cut offs then I would move that move type higher in the ordered list. However, when I compared each of the searches from the before and after, I may not have reached the same depth or took more time searching. Based on that, I felt resorting that movetype was not best.
Here is the report based on if(depth > 3*PLY):
My move ordering is - the tables above do not reflect this order:Code: Select all
Search Nodes: 26819821 Quiesce Nodes: 16983979 Total Nodes: 43803800 (* Note * Nodes are total for all searches, recorded across all PLYs) (* Note* The following data is generated based on depth > 3 * PLY) Total cuts: 1360 First Move cuts: 1210 First Move %: 88% TopN Measure to move number: 5 TopN Cuts: 1331 TopN %: 97% Branching Factor Best: 1 Branching Factor Worst: 5 Branching Factor Average: 2 Depth | Hash Count | Capt. Count | CapPromo | Kilr Count | Silent Count | First Move ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 13 | 0 | 0 | 0 | 4 | 0 | 0 12 | 6 | 4 | 0 | 5 | 0 | 9 11 | 26 | 22 | 0 | 4 | 0 | 51 10 | 14 | 25 | 0 | 4 | 0 | 34 9 | 11 | 12 | 3 | 4 | 0 | 22 8 | 68 | 46 | 10 | 20 | 0 | 129 7 | 102 | 127 | 10 | 21 | 1 | 236 6 | 12 | 124 | 10 | 25 | 0 | 161 5 | 262 | 0 | 0 | 6 | 0 | 262 4 | 306 | 5 | 0 | 4 | 3 | 306 3 | 0 | 0 | 0 | 0 | 0 | 0 2 | 0 | 0 | 0 | 0 | 0 | 0 1 | 0 | 0 | 0 | 0 | 0 | 0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total | 807 | 365 | 33 | 97 | 4 | 1210 %: | 59% | 26% | 2% | 7% | 0% | 88% Depth | Promo | GoodPawn | Castling | Checking | History | Counter Move ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 13 | 0 | 0 | 0 | 0 | 0 | 1 12 | 0 | 0 | 0 | 0 | 1 | 0 11 | 0 | 0 | 0 | 1 | 5 | 0 10 | 0 | 1 | 0 | 0 | 4 | 0 9 | 0 | 0 | 0 | 0 | 0 | 0 8 | 0 | 0 | 0 | 6 | 7 | 3 7 | 0 | 0 | 0 | 0 | 7 | 4 6 | 0 | 0 | 0 | 2 | 0 | 2 5 | 0 | 0 | 0 | 2 | 2 | 0 4 | 0 | 0 | 0 | 0 | 6 | 0 3 | 0 | 0 | 0 | 0 | 0 | 0 2 | 0 | 0 | 0 | 0 | 0 | 0 1 | 0 | 0 | 0 | 0 | 0 | 0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total | 0 | 1 | 0 | 11 | 32 | 10 %: | 0% | 0% | 0% | 0% | 2% | 0%
* Hash
* Capture with promotion
* Captures based on MVV/LVA
* Killer
* Good Pawn (passed pawn moves, pushes to rank 6 or 7)
* Checking moves
* Castle Moves
* Counter move (new, just added)
* History move
My thoughts:
As for the last bullet, take the promotions and good pawns for example, sort the history above those and in end game positions, the search is slower and not as deep.
- I know when accounting for all depths, first move cuts is around 94% (pulled from my old style of reports).
- Silent Count is moves with no sorting value.
- Hash is 59% of the cuts, followed by captures, and promotions.
- I "think" killers might be better than captures that promote. Is this an accurate conclusion? Maybe sort these better than capturing promotions?
- As for the 2nd table, I can conclude history is better than the other sorted move types but with zeros for the other types, it make it seem like having them is invalid.
In the end, I was hoping these numbers would point out the value in creating a sort score for a specific move type. Is my train of thought on the right tracks or do I need to look at this differently?
Also my engine is a checkers engine, not chess, sorry I really can't go into too much details about my move ordering (wich is quite good) because I'm competing against a very good spanish engine and an even stronger russian engine. It probably wouldn't help either since chess and checkers are too much different. Just a tip, don't be afraid of pruning like hell, as long as you have a good testing system to really be sure a change gives ELO increase, go ahead and prune like there is no tomorrow.
-
- Posts: 104
- Joined: Thu Sep 27, 2012 2:24 am
Re: Testing Move Order Quality
Ty Cardoso
Pruning is the next item on the chopping block. Before I went there, I wanted to make sure I had a sound process for move ordering. I also wanted to develop a way to flag moves as important (to not prune or reduce).
As for this sort ordering, I was able to use the data and shift move types around to look for a better move order. I was also able to use this data to measure if a not-so-frequent move type was better than using history moves. I was able to conclude that special pawn moves and moves that generate attacks were better off getting a higher sorting score than what that existed in the history move array. Attacking moves made more higher cuts than history moves.
I may want to dive deeper into this but feel that if I continue to add move types that are more specific than a history move, I will eventually hit a point where there is a speed loss. Besides, I hit the point where I am ready to study pruning.
I am going to post a new thread on Futility pruning because I, like many others, am having a hard time seeing improvements.
Pruning is the next item on the chopping block. Before I went there, I wanted to make sure I had a sound process for move ordering. I also wanted to develop a way to flag moves as important (to not prune or reduce).
As for this sort ordering, I was able to use the data and shift move types around to look for a better move order. I was also able to use this data to measure if a not-so-frequent move type was better than using history moves. I was able to conclude that special pawn moves and moves that generate attacks were better off getting a higher sorting score than what that existed in the history move array. Attacking moves made more higher cuts than history moves.
I may want to dive deeper into this but feel that if I continue to add move types that are more specific than a history move, I will eventually hit a point where there is a speed loss. Besides, I hit the point where I am ready to study pruning.
I am going to post a new thread on Futility pruning because I, like many others, am having a hard time seeing improvements.