## Sorting algorithms

**Moderators:** hgm, Dann Corbit, Harvey Williamson

**Forum rules**

This textbox is used to restore diagrams posted with the [d] tag before the upgrade.

### Re: Measurement data

For the Shellsort version, the moves are sorted right after move generation, so that's before doing anything with the hash tables.

Selection sort is performed right before iterating over the node moves, so each move has a chance to get dealt with by the hash table lookup in the next depth level as that is the first thing that is tried.

The hybrid is a mixture - the first MVA/LVV move can be a hash hit, the other moves are sorted before entering the next depth level.

However, I don't think this makes a big difference because the main hash tables have only 2*4098 entries. They are more useful towards the endgame.

Selection sort is performed right before iterating over the node moves, so each move has a chance to get dealt with by the hash table lookup in the next depth level as that is the first thing that is tried.

The hybrid is a mixture - the first MVA/LVV move can be a hash hit, the other moves are sorted before entering the next depth level.

However, I don't think this makes a big difference because the main hash tables have only 2*4098 entries. They are more useful towards the endgame.

- hgm
**Posts:**25527**Joined:**Fri Mar 10, 2006 9:06 am**Location:**Amsterdam**Full name:**H G Muller-
**Contact:**

### Re: Measurement data

But that is extremely time-wasting no matter what method of sorting you use. So optimizing the sort method for this case is like finding the one-eyed king in the country of the blind.

To get anything remotely efficient, one searches the hash move first, then does move generation, and only then the problem which move to pick comes up. Most of the time the hash move cuts, so you don't even get to move generation, let alone sorting.

If you only sort after the hash move failed to cut, or you had no hash move, this will strongly affect the statistics of the first-move cut efficiency. Which could totally alter your conclusion. The first-move cut rate might now be too low to make it pay using a less efficient method for extracting the first move.

To get anything remotely efficient, one searches the hash move first, then does move generation, and only then the problem which move to pick comes up. Most of the time the hash move cuts, so you don't even get to move generation, let alone sorting.

If you only sort after the hash move failed to cut, or you had no hash move, this will strongly affect the statistics of the first-move cut efficiency. Which could totally alter your conclusion. The first-move cut rate might now be too low to make it pay using a less efficient method for extracting the first move.

### Re: Measurement data

Well there can be also other spots to optimise, but the point was to compare the sorting strategies, all else being equal (even if equally inefficient). Still, this gets interesting.hgm wrote:But that is extremely time-wasting no matter what method of sorting you use. So optimizing the sort method for this case is like finding the one-eyed king in the country of the blind.

OK, first (referring to the entry of the Negascout), there is the position lookup that returns value and possibly the hash move (if stored). In this case, no moves are generated.To get anything remotely efficient, one searches the hash move first, then does move generation

There can also be a hash match that isn't deep enough, but still with a stored hash move. In this case, the iteration doesn't return, but grabs the hash move.

When generating the moves, this move gets a maximum MVV/LVA to make sure it gets spilled to the top of the move list. IID is not used if there is a hash move.

If I understand you correctly, then it would be a better idea not to generate the moves if there is a hash move? Instead, trying this move, and only if that doesn't give a cut-off, then generating the full move list? The hash move could then be dropped from the list because it has already been tried in this case. Like, doing selection sort one time and starting with the second move of the (still unsorted) list.

That would mean not just deferring the Shellsort to the i==1 case if there is a hash move, but to defer the whole move generation to i==1?

Isn't it the other way round? The Selection sort might now be overly efficient because it is "misused" to cover hash cuts, which could be done without sorting?Which could totally alter your conclusion. The first-move cut rate might now be too low to make it pay using a less efficient method for extracting the first move.

On the other hand, I'm not sure how much I could gain. First, the hash tables are small. Second, the main point of the hash move still is there, to get a fast tree cutoff, even if the node is handled less efficiently than possible. What do you think?

### Re: Measurement data

Some more data here. I counted the nodes right before the loop that iterates over the moves (i.e. full hash table hits are not counted). The other conditions were the same as previously. The data confirm that a hash best move is very likely to yield a cut.

Small hash tables, 8k elements (12 bytes per element = 96 kB):

=> Probability that a hash best move cuts: 93%

=> Probability that there is a hash best move: 3.2%

=> inefficient nodes: 3%

Conclusion: at least in this phase of the game, deferring the move generation to after trying the hash move will only yield a minor speed-up. Mainly because the hash tables are small. So what if they were bigger?

Big hash tables, 537k elements (6.5 MB):

=> Probability that a hash best move cuts: 96%

=> Probability that there is a hash best move: 15%

=> inefficient nodes: 14%

Conclusion: although 6.5 MB is still very small for PC standards, the percentage of inefficient nodes rises. A factor of 64 in hash table size multiplies the inefficient nodes by 5 because there are more table hits. That will be some kind of S-shaped curve, so pumping up the hash tables to around 400 Mb might end up at 30% inefficient nodes.

So, yes, for PC engines, that should be a noticable speedup if the move generation is only done after trying the hash best move.

The other interesting thing is that with the big hash tables, the iterating nodes went down by 22%, thanks to full hash table hits.

Small hash tables, 8k elements (12 bytes per element = 96 kB):

Code: Select all

```
Iterating nodes: 28397484
Nodes with hashbest: 914099
Nodes not cut: 11394595
Nodes not cut with hashbest: 62054
```

=> Probability that there is a hash best move: 3.2%

=> inefficient nodes: 3%

Conclusion: at least in this phase of the game, deferring the move generation to after trying the hash move will only yield a minor speed-up. Mainly because the hash tables are small. So what if they were bigger?

Big hash tables, 537k elements (6.5 MB):

Code: Select all

```
Iterating nodes: 22050997
Nodes with hashbest: 3226857
Nodes not cut: 8459761
Nodes not cut with hashbest: 143279
```

=> Probability that there is a hash best move: 15%

=> inefficient nodes: 14%

Conclusion: although 6.5 MB is still very small for PC standards, the percentage of inefficient nodes rises. A factor of 64 in hash table size multiplies the inefficient nodes by 5 because there are more table hits. That will be some kind of S-shaped curve, so pumping up the hash tables to around 400 Mb might end up at 30% inefficient nodes.

So, yes, for PC engines, that should be a noticable speedup if the move generation is only done after trying the hash best move.

The other interesting thing is that with the big hash tables, the iterating nodes went down by 22%, thanks to full hash table hits.

### Re: Measurement data

Yes.Ras wrote: If I understand you correctly, then it would be a better idea not to generate the moves if there is a hash move? Instead, trying this move, and only if that doesn't give a cut-off, then generating the full move list? The hash move could then be dropped from the list because it has already been tried in this case. Like, doing selection sort one time and starting with the second move of the (still unsorted) list.

That would mean not just deferring the Shellsort to the i==1 case if there is a hash move, but to defer the whole move generation to i==1?

This is an obvious gain because you end up not doing a (costly) movegen. There is no real drawback, but you need to drop the hash move from the remainder of the move list (or not - searching it again returns an immediate hit anyway) and you should make sure the first move is searched the same way independently of whether it comes from the hash table or not.

If you use staged move generation this is easy.

- hgm
**Posts:**25527**Joined:**Fri Mar 10, 2006 9:06 am**Location:**Amsterdam**Full name:**H G Muller-
**Contact:**

### Re: Measurement data

Often I don't bother removing the move, because, as you say, the duplicate should give an immediate hash hit (with equal score, so failing low compared to the first copy). It looks a bit strange during multi-PV analysis, though, to see the duplicat (with a one-move PV). For Rasmus' application, where the hash table is so small that there is a fair chance the entry will be overwritten before the duplicat comes up, it would probably be better to remove the duplicat, though.

This can also be done 'on demand', though; you don't need to make a special pass through the move list for it. You can just compare the next move you want to search with the hash move, and skip the former if equal.

This can also be done 'on demand', though; you don't need to make a special pass through the move list for it. You can just compare the next move you want to search with the hash move, and skip the former if equal.

Code: Select all

```
first = last = 0;
generated = FALSE;
duplicat = INVALID;
if(hashMove != INVALID) moveList[last++] = hashMove;
for(i=first; ; i++) { // loop over moves
if(i >= last) {
if(generated) break; // all moves done
generated = TRUE;
GenerateMoves(&last);
duplicat = hashMove;
}
move = moveList[i];
if(move == duplicat) continue;
MakeMove(move);
...
}
```

### Re: Measurement data

These promotion moves are annoying. You add them high in the list and after captures you collect quiet moves but then you forget that there are non capturing promotions so you add them twice. But perft will tell.

### Re: Measurement data

There should be no chance to overwrite the hash move if the sorting algorithm would sort it to the top. In a way this seems odd, because you only get there if the move wasn't good enough.hgm wrote:For Rasmus' application, where the hash table is so small that there is a fair chance the entry will be overwritten before the duplicat comes up, it would probably be better to remove the duplicat, though.

In my search function I treat the first move a little differently:

Code: Select all

```
...
move = first_move();
play(move);
score = -search(-beta, -alpha, depth-1, ply+1);
back(move);
alpha = min(alpha, score);
while (alpha<beta && move=next_move()) {
r = get_reduction();
play(move);
score = -search(-(alpha+1), -alpha, depth-1-r, ply+1);
back(move);
alpha = min(alpha, score);
}
```

### Re: Measurement data

Which is a bit of a problem as of now because there's also IID and PV move preference in that code part. The PV move might be a hash move or not, and in this part, the moves might be generated or not, depending on whether the Negascout is called with a generated move list or not. Which is is at root or with evasions. There's special handling anyway all along.Evert wrote:If you use staged move generation this is easy.

I've been thinking a bit about the necessary changes, and it would clearly be a major headache to implement. All of that for 3% of the nodes affected so that I can't expect a major speedup. Of course, towards the endgame, and especially in king/pawn endgames, the hash tables become more important. Then again, the move generation is cheaper because there are fewer pieces.

There's even one nice safeguard feature in doing it this way: it prevents negative effects when there is a true hash collision. Since the hash move is matched against the generated move list, this ensures that a possible illegal hash move can't affect the system. That's pretty much the same design as for the opening book.

Bob Hyatt's well-known hash collision paper says that 48 bit are enough. Storing the upper 32 bits plus the implicit index bits plus fiddling in 6 additional bits into the hash flag byte (hash flag needs only two bits for alpha/beta/exact) gets me to 48 bits, but still.

On the other hand, my profiling shows that even without much hash table influence, I get 60% cuts on the first move thanks to MVV/LVA, so using Selection sort for the first move wins me 4.4% total performance without much headache.

Sorting used to be my 2nd most time consuming function after static eval, that has changed now. That would be an even greater win for NG-Play, which is using the slow C library quicksort. That costs about 70% more sorting time compared to my hacky Shellsort implementation.

### Re: Measurement data

Correction, JFTR: for the "big" hash table scenario, I got the order of magnitudes wrong. That would be 6.5 GB hash tables instead of the 6.5 MB I gave in the previous posting. I even wonder why the program ran given that I compiled with -m32.

So, changing the hash table size from 0xfff to 0xffffff (there are separated ones for Black and White), I get to 400MB total hash table size, not counting the pawn hash table. The conclusions remain pretty much the same, except that I don't think that increasing the size further will change much.

Big hash tables, 33.6M elements (400 MB):

=> Percentage node cuts with 1st move: 62%

=> Probability that a hash best move cuts: 96%

=> Probability that there is a hash best move: 15%

=> inefficient nodes: 14%

So for a PC engine, deferring the move generation to after trying a hash best move saves generating the moves for 14% of the nodes. No major gain, but still a little speedup.

So, changing the hash table size from 0xfff to 0xffffff (there are separated ones for Black and White), I get to 400MB total hash table size, not counting the pawn hash table. The conclusions remain pretty much the same, except that I don't think that increasing the size further will change much.

Big hash tables, 33.6M elements (400 MB):

Code: Select all

```
Iterating nodes: 22040689
Nodes with hashbest: 3224275
Nodes not cut: 8456858
Nodes not cut with hashbest: 143221
```

=> Probability that a hash best move cuts: 96%

=> Probability that there is a hash best move: 15%

=> inefficient nodes: 14%

So for a PC engine, deferring the move generation to after trying a hash best move saves generating the moves for 14% of the nodes. No major gain, but still a little speedup.