Lazy Sort Issue

Chessnut1071 · Post by **Chessnut1071** » Wed Mar 09, 2022 6:10 am

after reading a recent topic on lazy sorts, I tried to substitute a lazy sort for my bitboard evaluation which uses a basic quicksort. I found absolutely no improvement in time or reductions in engine calls for my tests up to 10 ply. In fact, 40% of the games evaluated even slower times to solution. I'm using minimum moves to checkmate with known one-solution puzzles. Anybody else have any success with lazy sorts, because it doesn't seem to do much for me.

hgm · Post by **hgm** » Wed Mar 09, 2022 8:56 am

Whether lazy sort is helpful depends on the reliability of the criterion you sort by. Lazy sort is not a very efficient method of sorting in itself; it is in fact one of the slowest. So if you would assign random sort keys to the moves, which in no way correlate with the probability the move is any good, it would definitely lose compared to quick-sort.

So perhaps the explanation for your observation is that you sort the moves in a way that is not helpful. Your test involves a task that is different from playing games. And the conventional move ordering (i.e. captures first, ordered by decreasing victim value) was developed for good performance in game play, and thus might not be a good move order for mate puzzles at all. E.g. proximity to the King might be a better criterion to order the moves than the conventional piece values.

JohnWoe · Post by **JohnWoe** » Wed Mar 09, 2022 10:36 am

First of all LazySort and QuickSort/Whatever should return the same bench. Otherwise we are comparing apples to orangutans.
Because you are not searching the same tree.

In C++ std::sort is swapping same score nodes. Only std::stable_sort returns the same bench.

Lazy in LazySort means avoid sorting everything if some early termination is possible.
It should result a speedup. If not then compiler/interpreter are not very good.

LazySort() is good because it actually removes complexity of your program.
Here is my LazySort() just 3 lines. Pretty hard to optimize more

Code: Select all

// Only sort when necessary (See: lazy-sorting-algorithm paper)
// Sort only node-by-node (Avoid costly n! / tons of operations)
inline void LazySort(const int ply, const int nth, const int total_moves) {
  for (auto i = nth + 1; i < total_moves; ++i)
    if (g_boards[ply][i].score > g_boards[ply][nth].score)
      std::swap(g_boards[ply][nth], g_boards[ply][i]);
}

Chessnut1071 · Post by **Chessnut1071** » Wed Mar 09, 2022 3:07 pm

hgm wrote: ↑Wed Mar 09, 2022 8:56 am Whether lazy sort is helpful depends on the reliability of the criterion you sort by. Lazy sort is not a very efficient method of sorting in itself; it is in fact one of the slowest. So if you would assign random sort keys to the moves, which in no way correlate with the probability the move is any good, it would definitely lose compared to quick-sort.

So perhaps the explanation for your observation is that you sort the moves in a way that is not helpful. Your test involves a task that is different from playing games. And the conventional move ordering (i.e. captures first, ordered by decreasing victim value) was developed for good performance in game play, and thus might not be a good move order for mate puzzles at all. E.g. proximity to the King might be a better criterion to order the moves than the conventional piece values.

I think my evaluation sort is extremely effective. I'm optimized on 10-ply puzzles. Evaluation metrics below:

1) alpha/beta + Zobirst hash: engine calls: 232,850,313 time 29:59:9723262 {mm:ss:ddddd}
2) above + evaluation: engine calls: 2,346,945 time: 18:2369834 {ss:ddddd]

reductiom: engine calls 0.0100792; time 0.0101318 or about 99%

My optimized evaluation parameters differs significantly from ELO:
white/black
1) double check: 192; 256
2) single check: 162; 170
3) discovered check 80; 168
4) history + : 90; 80
5) history - : 22; 80
6) capture B,R,Q: 73; 160
7) capture P, N: 70; 80
8) mobility (B,R,Q): 27; 0
9) ent passant.: 80, 0
10) pawn promotion: 80,0

Based on the above metrics I think my sort is helpful; however, my lazy sort may not be efficient. My lazy sort must first search N moves to find the maximum evaluation or an average of N/2 on the 1st pass. Each pass reduces the average search set by (N-1)/2. Each maximum must be marked so it doesn't pop up again. After the additional overhead of marking elements used and if statements on the remaining move set, I think the time saved in the initial sort is lost in the if statements and remaining search. It looks to me that quicksort is better on almost 1/2 the games in my test bank. I'm wondering if optimized ELO is more compatible, although that's hard to believe.

hgm · Post by **hgm** » Wed Mar 09, 2022 4:35 pm

I think you are too easily impressed. You save a factor 100 in node count. But in 10 ply that is only a reduction of a factor 1.58 in effective branching ratio. (1.58 to the power 10 = 100). Perhaps the EBF could have been reduced by a factor 2 instead of 1.58. Then you would have had a speedup of 1024 instead of a mere 100. Chess positions typically have a lot of moves, so with poor move ordering the EBF can be really high. Which means you can reduce it by a lot through good move ordering. Perhaps you could even reduce it by a factor 4. That would give you a speedup of a million.

Moral lesson: everything looks large when you raise it to the power 10.

pedrojdm2021 · Post by **pedrojdm2021** » Wed Mar 09, 2022 4:52 pm

Lazy sort is helpful depending on how you score your moves.

In my engine i score from most positive, to most negative (using SEE scores in captures for example), and it works very good

dangi12012 · Post by **dangi12012** » Wed Mar 09, 2022 5:15 pm

Its best to think outside of common wisdom.
If you have a list to sort - think why there is a list in the first place.

Maybe the order of the move is determined by if its a taking move or not and then you dont need to sort at all since the movegen can return the taking moves first. Maybe your movegen does not need to return a list at all - but a pointer into all the possible moves that can be done from any position (Just around 2^16) which could presorted.

And finally a heap (not list + sort) splits the cost of maintaining order between the pusher and the picker.
Both operations are O(log n), instead of O(n log n)

What I am saying is:
Fastest code is no code - so try that first. (Either no list - or no sort needed)
If thats not possible - You can use your domain knowledge to replace std::sort with an appropriate sorter.

Finally you can overload std::swap so you dont copy members around that surely are the same among the moves of the same ply. (color, parent etc.)
https://stackoverflow.com/questions/115 ... ad-stdswap

If you really want to go down a rabbit hole and you are 100% sure you need the best sorting this was also solved with sorting networks. You even get appropriate template overloads that build a perfect template class with the Bose-Nelson algorithm:
https://stackoverflow.com/questions/197 ... r-networks

But this only works for fixed size problems. (8,16,32 list sizes)
I can verify this order of list performance:

Code: Select all

Direct call to qsort library function   : 342.26
Naive implementation (insertion sort)   : 136.76
Insertion Sort (Daniel Stutzbach)       : 101.37
Insertion Sort Unrolled                 : 110.27
Rank Order                              : 90.88
Rank Order with registers               : 90.29
Sorting Networks (Daniel Stutzbach)     : 93.66
Sorting Networks (Paul R)               : 31.54
Sorting Networks 12 with Fast Swap      : 32.06
Sorting Networks 12 reordered Swap      : 29.74
Reordered Sorting Network w/ fast swap  : 25.28
Templated Sorting Network (this class)  : 25.01

hgm · Post by **hgm** » Wed Mar 09, 2022 6:57 pm

Agreed. Usually the captures and non-captures are added to the move list by different pieces of code. Because the sort key has to be calculated in a different way. So you might as well store those in separate lists, so that the sort doesn't have to separate them. In 'The Mailbox Trials' I generated captures in MVV / LVA order. So no reason to sort those.

If you want to sort non-captures by history, you can divide the history range into narrow bins, and maintain a separate list for each bin. If the bins are narrow enough there will hadly ever be more than a single move per bin, and if there are their history score will be so similar that the order doesn't matter. So no reason to sort within the bins. Just run through the bins top-down to search the moves they contain, without any additional sort. E.g. for storing a move into bin n of 64 you can do:

Code: Select all

uint64_t used = 0; // at start of non-capture generation

// for each generated move
int u = used >> n & 1;   // test whether bin empty
int h = (ptr[n] + 1)*u; // initialize or increment depending on u
ptr[n] = h;
bin[n][h] = MOVE; // store the move
used |= 1 << n; // mark as non-empty

For searching just extract the next bit in 'used', and run through the moves in the corresponding bin.

Chessnut1071 · Post by **Chessnut1071** » Wed Mar 09, 2022 8:21 pm

dangi12012 wrote: ↑Wed Mar 09, 2022 5:15 pm Its best to think outside of common wisdom.
If you have a list to sort - think why there is a list in the first place.

Maybe the order of the move is determined by if its a taking move or not and then you dont need to sort at all since the movegen can return the taking moves first. Maybe your movegen does not need to return a list at all - but a pointer into all the possible moves that can be done from any position (Just around 2^16) which could presorted.

And finally a heap (not list + sort) splits the cost of maintaining order between the pusher and the picker.
Both operations are O(log n), instead of O(n log n)

What I am saying is:
Fastest code is no code - so try that first. (Either no list - or no sort needed)
If thats not possible - You can use your domain knowledge to replace std::sort with an appropriate sorter.

Finally you can overload std::swap so you dont copy members around that surely are the same among the moves of the same ply. (color, parent etc.)
https://stackoverflow.com/questions/115 ... ad-stdswap

If you really want to go down a rabbit hole and you are 100% sure you need the best sorting this was also solved with sorting networks. You even get appropriate template overloads that build a perfect template class with the Bose-Nelson algorithm:
https://stackoverflow.com/questions/197 ... r-networks

But this only works for fixed size problems. (8,16,32 list sizes)
I can verify this order of list performance:
Code: Select all
Direct call to qsort library function   : 342.26
Naive implementation (insertion sort)   : 136.76
Insertion Sort (Daniel Stutzbach)       : 101.37
Insertion Sort Unrolled                 : 110.27
Rank Order                              : 90.88
Rank Order with registers               : 90.29
Sorting Networks (Daniel Stutzbach)     : 93.66
Sorting Networks (Paul R)               : 31.54
Sorting Networks 12 with Fast Swap      : 32.06
Sorting Networks 12 reordered Swap      : 29.74
Reordered Sorting Network w/ fast swap  : 25.28
Templated Sorting Network (this class)  : 25.01 

I totally agree, I need to do this without direct sorting if it can be done. As far as Muller's point, I'm optimized at 1.58 EBF based on the 10 parameters listed above. In mating solutions, check moves & history have higher priority than capture moves. So far, I have not found a way to order the list without a direct sort that works all the time. Some problems are specifically set up to stifle computer algorithms and the top authors like to pick the most unsuspecting moves, which is directly opposite of what ELO optimization tries to do. Checking out your stackoverflow suggestion, that's probably a better solution than my lazy sort.

dangi12012 · Post by **dangi12012** » Wed Mar 09, 2022 8:33 pm

Chessnut1071 wrote: ↑Wed Mar 09, 2022 8:21 pm I totally agree, I need to do this without direct sorting if it can be done. As far as Muller's point, I'm optimized at 1.58 EBF based on the 10 parameters listed above. In mating solutions, check moves & history have higher priority than capture moves. So far, I have not found a way to order the list without a direct sort that works all the time. Some problems are specifically set up to stifle computer algorithms and the top authors like to pick the most unsuspecting moves, which is directly opposite of what ELO optimization tries to do. Checking out your stackoverflow suggestion, that's probably a better solution than my lazy sort.

Having the movelist be prepared for all possible moves of a square - is an idea that brings huge advantages. When your movegenerator just spits out a config similar to the hypercube sliding piece lookup (so literally a 16 bit number that points to a list of N moves) you can have a sorting network prepared that sorts all the moves of these squares faster than any algo used by anyone here on this board garuanteed.

People here like to create 16 bit structs and put them in a list one by one - and then sort that list. Very slow.

Lazy Sort Issue

Lazy Sort Issue

Re: Lazy Sort Issue

Re: Lazy Sort Issue

Re: Lazy Sort Issue

Re: Lazy Sort Issue

Re: Lazy Sort Issue

Re: Lazy Sort Issue

Re: Lazy Sort Issue

Re: Lazy Sort Issue

Re: Lazy Sort Issue