Hybrid replacement strategy worse than always-replace

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mathmoi
Posts: 289
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Hybrid replacement strategy worse than always-replace

Post by mathmoi »

Hi,

My chess engine m8 is still in early development and I'm trying to improve move ordering to reduce the branching factor. One thing I tried was to implement a new TT replacement strategy. My existing replacement strategy was depth-preferred. When the TT becomes full, deep entries "clog" the table and the shallow parts of the search can't store entries anymore (that's my theory).

I tried a couple of new strategies:
  • Always replace
  • Two entries per bucket, one depth-preferred, one always-replace
  • Four entries per bucket, always replacing the shallowest
  • A couple of other strategies.
I tested these strategies by having each play a 500-game match against 4 engines (2000 games per strategy) at a 60+1 time control.
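The bucket schemes above can be sketched like this (a minimal C++ illustration of the two-entry variant; the `Entry` layout and names are hypothetical, not m8's actual code):

```cpp
#include <cstdint>

// Two-entry bucket: slot 0 is depth-preferred, slot 1 is always-replace.
// This is a sketch of the scheme described above, not any engine's real code.
struct Entry {
    uint64_t key = 0;
    int depth = -1;   // -1 marks an empty slot
};

struct Bucket {
    Entry slots[2];

    void store(uint64_t key, int depth) {
        if (depth >= slots[0].depth) {
            // New result is at least as deep: take the depth-preferred slot.
            slots[0] = Entry{key, depth};
        } else {
            // Otherwise the result still gets stored, in the always-replace slot.
            slots[1] = Entry{key, depth};
        }
    }

    const Entry* probe(uint64_t key) const {
        for (const Entry& e : slots)
            if (e.depth >= 0 && e.key == key) return &e;
        return nullptr;
    }
};
```

Either way a new search result is always stored somewhere; only the victim differs.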

The always-replace strategy is the one that performed best, by a large margin. However, I find this result hard to explain. It seems to me that the hybrid or the always-replace-4-bucket scheme should give a better result, since they also always store the new entry, but choose which entry to replace based on depth instead of just overwriting a single fixed slot.

Does someone understand how this is possible? I'm thinking maybe there is a bug in my implementation, but I can't find one (which does not mean there isn't one).

Thanks.
User avatar
hgm
Posts: 27837
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Hybrid replacement strategy worse than always-replace

Post by hgm »

How many entries were there in your hash table, and what is the nps of your engine?

Unless you had a hash table very much smaller than 1/10 of the typical number of nodes in a search, your results would be purely noise, no matter how convincing or statistically significant it might seem.

Playing games is a very inefficient method to measure this. Much better is to just take a couple of hundred positions from games, and measure average time to depth. (Again, with a very small hash table.)

It is well known that it is very important to always put a new search result in the hash table. If you have no choice but to overwrite a deep entry (as might be the case in a simple always-replace scheme), you should still do it. Replacing the shallowest of two should perform better, though, because it at least gives you some leeway in avoiding overwriting the valuable deep entries.
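The "replace the shallowest" advice generalizes to wider buckets; here is a sketch for a four-entry bucket (names and layout are illustrative, not from any particular engine):

```cpp
#include <cstdint>
#include <algorithm>

// Sketch of "always store, evict the shallowest of N" for a 4-entry bucket.
// The new result is never discarded; only the choice of victim uses depth.
struct Entry {
    uint64_t key = 0;
    int depth = -1;   // -1 marks an empty slot
};

struct Bucket4 {
    Entry slots[4];

    void store(uint64_t key, int depth) {
        // If the position is already present, overwrite it in place.
        for (Entry& e : slots)
            if (e.depth >= 0 && e.key == key) { e = Entry{key, depth}; return; }
        // Otherwise evict the shallowest entry (empty slots have depth -1,
        // so they are filled first).
        Entry* victim = std::min_element(
            std::begin(slots), std::end(slots),
            [](const Entry& a, const Entry& b) { return a.depth < b.depth; });
        *victim = Entry{key, depth};
    }

    const Entry* probe(uint64_t key) const {
        for (const Entry& e : slots)
            if (e.depth >= 0 && e.key == key) return &e;
        return nullptr;
    }
};
```

Even a shallow new entry is stored; it just evicts the least valuable slot rather than a deep one.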
Ciekce
Posts: 127
Joined: Sun Oct 30, 2022 5:26 pm
Full name: Conor Anstey

Re: Hybrid replacement strategy worse than always-replace

Post by Ciekce »

hgm wrote: Wed Apr 24, 2024 8:13 pm Playing games is a very inefficient method to measure this. Much better is to just take a couple of hundred positions from games, and measure average time to depth. (Again, with a very small hash table.)
OP, ignore this entirely, it's just flat out wrong
Viz
Posts: 64
Joined: Tue Apr 09, 2024 6:24 am
Full name: Michael Chaly

Re: Hybrid replacement strategy worse than always-replace

Post by Viz »

Also, play them against each other, and not for 500 games; rather use 5000 games, but at shorter time controls.
Mike Sherwin
Posts: 869
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Hybrid replacement strategy worse than always-replace

Post by Mike Sherwin »

I think that the 50-move counter could help. If the 50-move counter is greater than some small number, then the hash table is likely filled with mostly useless junk and one should just always replace. If the 50-move counter is very low, then most of the entries are still valid and one should use depth-preferred replacement.
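The suggestion above could be sketched as follows (a hypothetical illustration; the threshold of 8 is arbitrary and `halfmove_clock` is the position's fifty-move counter):

```cpp
// Sketch of choosing the replacement policy from the fifty-move counter,
// per the suggestion above. The threshold value 8 is arbitrary.
enum class Policy { AlwaysReplace, DepthPreferred };

Policy choose_policy(int halfmove_clock) {
    // High counter: the table is assumed to hold mostly stale entries,
    // so always replace. Low counter: most entries are assumed still
    // valid, so prefer keeping the deep ones.
    return halfmove_clock > 8 ? Policy::AlwaysReplace
                              : Policy::DepthPreferred;
}
```

Whether this heuristic actually helps would of course need to be tested like any other replacement-scheme change.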
mathmoi
Posts: 289
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Re: Hybrid replacement strategy worse than always-replace

Post by mathmoi »

hgm wrote: Wed Apr 24, 2024 8:13 pm How many entries were there in your hash table, and what is the nps of your engine?

Unless you had a hash table very much smaller than 1/10 of the typical number of nodes in a search, your results would be purely noise, no matter how convincing or statistically significant it might seem.

Playing games is a very inefficient method to measure this. Much better is to just take a couple of hundred positions from games, and measure average time to depth. (Again, with a very small hash table.)

It is well known that it is very important to always put a new search result in the hash table. If you have no choice but to overwrite a deep entry (as might be the case in a simple always-replace scheme), you should still do it. Replacing the shallowest of two should perform better, though. Because it at least gives you some leeway in avoiding overwriting of the valuable deep entries.
Hummm, I see what you mean.

My nps (excluding qnodes, because I don't store TT entries in qnodes) is about 2.1 Mnps, and my hash table had 16 million entries. At the 60+1 time control my TT was probably too large for this test. That being said, the Elo difference was very large, with a LOS > 99%.

I also tested with my 'bench' function and the hybrid method was indeed faster.
mathmoi
Posts: 289
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Re: Hybrid replacement strategy worse than always-replace

Post by mathmoi »

Viz wrote: Wed Apr 24, 2024 8:25 pm Also - play them against each other, and not for 500 games, rather use 5000 games but at shorter time controls.
Each version played 500 games against four opponents, for a total of 2000 games. I'll try to put them against each other, but I thought it was better to play them against a couple of opponents.
pgg106
Posts: 25
Joined: Wed Mar 09, 2022 3:40 pm
Full name: . .

Re: Hybrid replacement strategy worse than always-replace

Post by pgg106 »

You want to look up what an "SPRT" is. It's the testing method SF and basically all of the top 100 engines use, and it's far better than picking an arbitrary number of games and letting the engines play. Do not listen to anyone who says otherwise.
To anyone reading this post in the future: don't ask for help on talkchess. It's a dead site where you'll only get led astray; the few people talking sense here come from the Stockfish Discord server. Just join it and actual devs will help you.
User avatar
hgm
Posts: 27837
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Hybrid replacement strategy worse than always-replace

Post by hgm »

mathmoi wrote: Thu Apr 25, 2024 2:44 pm My nps (excluding qnodes, because I don't store TT entries in qnodes) is about 2.1 Mnps, and my hash table had 16 million entries. At the 60+1 time control my TT was probably too large for this test. That being said, the Elo difference was very large, with a LOS > 99%.
That should give you on average some 4M nodes per search. But that count includes many duplicates. My experience is that the time-to-depth does not really increase until you make the TT smaller than 1/10 of the number of nodes. That suggests the number of different positions in the search tree is only 400k. That is such a small fraction of the 16M-entry TT that you would have virtually no replacement at all.

I have no explanation for the gauntlet result you see, but it seems very unlikely that a difference in replacement scheme can be the cause.

To get a better impression of what is going on you could include some diagnostics code in your engine, which counts the number of hash hits of exact depth, insufficient depth and extra depth, and the number of replacements (as opposed to storing in an unused entry) of equal, lower or higher-depth entries, and then search a typical middle-game position.
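The suggested diagnostics could look something like this (a sketch with hypothetical counter names; the engine would call `on_hit` on every TT probe that finds the position and `on_store` on every store):

```cpp
#include <cstdint>
#include <cstdio>

// Sketch of the diagnostics suggested above: count hash hits by stored
// depth vs. requested depth, and replacements by relative depth.
// All names are hypothetical.
struct TTStats {
    uint64_t hit_exact_depth = 0;  // hit with exactly the requested depth
    uint64_t hit_low_depth = 0;    // hit, but stored depth is insufficient
    uint64_t hit_high_depth = 0;   // hit with extra depth
    uint64_t store_empty = 0;      // stored into an unused slot
    uint64_t repl_lower = 0;       // replaced a shallower entry
    uint64_t repl_equal = 0;       // replaced an equal-depth entry
    uint64_t repl_higher = 0;      // replaced a deeper entry (the bad case)

    void on_hit(int stored_depth, int wanted_depth) {
        if (stored_depth == wanted_depth) ++hit_exact_depth;
        else if (stored_depth < wanted_depth) ++hit_low_depth;
        else ++hit_high_depth;
    }

    void on_store(bool slot_was_empty, int old_depth, int new_depth) {
        if (slot_was_empty) { ++store_empty; return; }
        if (old_depth < new_depth) ++repl_lower;
        else if (old_depth == new_depth) ++repl_equal;
        else ++repl_higher;
    }

    void print() const {
        std::printf("hits:  exact=%llu low=%llu high=%llu\n",
                    (unsigned long long)hit_exact_depth,
                    (unsigned long long)hit_low_depth,
                    (unsigned long long)hit_high_depth);
        std::printf("store: empty=%llu repl_lower=%llu repl_equal=%llu "
                    "repl_higher=%llu\n",
                    (unsigned long long)store_empty,
                    (unsigned long long)repl_lower,
                    (unsigned long long)repl_equal,
                    (unsigned long long)repl_higher);
    }
};
```

After searching a typical middle-game position, a high `repl_higher` count under one scheme but not another would point at where the valuable deep entries are being lost.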
User avatar
hgm
Posts: 27837
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Hybrid replacement strategy worse than always-replace

Post by hgm »

pgg106 wrote: Thu Apr 25, 2024 4:13 pm You want to look up what an "SPRT" is, it's the testing SF and basically all the top 100 engines use, far better than picking an arbitrary number of games and letting the engines play. Do not listen to anyone who says otherwise.
Of course it is only better in the sense that on average you would need fewer games; if a pre-determined number of games eventually produces a LOS > 99%, you would have reached the same conclusion with an SPRT set to a confidence level of, say, 95%. It would just have stopped somewhat earlier.

A disadvantage of SPRT is that it only tests a single feature at a time, and it is not easy to adapt to testing multiple changes simultaneously. That can make it unnecessarily slow.
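For reference, the SPRT statistic being discussed is commonly computed with a trinomial normal approximation of the log-likelihood ratio over wins/draws/losses (a sketch of the widely used fishtest-style formula, not any framework's exact code):

```cpp
#include <cmath>

// Sketch of the trinomial SPRT log-likelihood ratio approximation:
// H0 is "strength difference = elo0", H1 is "strength difference = elo1".
// The test stops when the LLR crosses bounds derived from the desired
// error rates, e.g. about +-2.94 for alpha = beta = 0.05.
double sprt_llr(double wins, double draws, double losses,
                double elo0, double elo1) {
    double n = wins + draws + losses;
    if (wins == 0 || draws == 0 || losses == 0) return 0.0;
    double w = wins / n, d = draws / n;
    double score = w + d / 2.0;       // mean score per game
    double m2 = w + d / 4.0;          // second moment of the per-game score
    double var = m2 - score * score;  // per-game variance
    double var_s = var / n;           // variance of the mean score
    auto elo_to_score = [](double elo) {
        return 1.0 / (1.0 + std::pow(10.0, -elo / 400.0));
    };
    double s0 = elo_to_score(elo0), s1 = elo_to_score(elo1);
    return (s1 - s0) * (2.0 * score - s0 - s1) / (2.0 * var_s);
}
```

A result clearly better than both hypotheses drives the LLR positive, a clearly worse one drives it negative, and the test simply runs until either bound is hit.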