Hybrid replacement strategy worse than always-replace

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mathmoi
Posts: 289
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Hybrid replacement strategy worse than always-replace

Post by mathmoi »

Hi,

My chess engine m8 is still in early development and I'm trying to improve move ordering to reduce the branching factor. One thing I tried was to implement a new TT replacement strategy. My existing replacement strategy was depth-preferred. When the TT becomes full, deep entries "clog" the table and the shallow parts of the search can't store entries anymore (that's my theory).

I tried a couple of new strategies:
  • Always replace
  • Two entries per bucket, one depth-preferred, one always-replace
  • Four entries per bucket, always replacing the shallowest
  • A couple of other strategies.
I tested these strategies by having each play a 500-game match against 4 engines (2000 games per strategy) at a 60+1 time control.
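The bucket schemes above can be sketched like this (a minimal C++ illustration of the two-entry variant; the `Entry` layout and names are hypothetical, not m8's actual code):

```cpp
#include <cstdint>

// Two-entry bucket: slot 0 is depth-preferred, slot 1 is always-replace.
// This is a sketch of the scheme described above, not any engine's real code.
struct Entry {
    uint64_t key = 0;
    int depth = -1;   // -1 marks an empty slot
};

struct Bucket {
    Entry slots[2];

    void store(uint64_t key, int depth) {
        if (depth >= slots[0].depth) {
            // New result is at least as deep: take the depth-preferred slot.
            slots[0] = Entry{key, depth};
        } else {
            // Otherwise the result still gets stored, in the always-replace slot.
            slots[1] = Entry{key, depth};
        }
    }

    const Entry* probe(uint64_t key) const {
        for (const Entry& e : slots)
            if (e.depth >= 0 && e.key == key) return &e;
        return nullptr;
    }
};
```

Either way a new search result is always stored somewhere; only the victim differs.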

The always-replace strategy is the one that performed best, by a large margin. However, I find this result hard to explain. It seems to me that the hybrid or the always-replace-4-bucket scheme should give a better result, since they also always store the new entry, but choose which entry to replace based on depth instead of just overwriting a single fixed slot.

Does someone understand how this is possible? I'm thinking maybe there is a bug in my implementation, but I can't find one (which does not mean there isn't one).

Thanks.
User avatar
hgm
Posts: 27837
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Hybrid replacement strategy worse than always-replace

Post by hgm »

How many entries were there in your hash table, and what is the nps of your engine?

Unless you had a hash table very much smaller than 1/10 of the typical number of nodes in a search, your results would be purely noise, no matter how convincing or statistically significant it might seem.

Playing games is a very inefficient method to measure this. Much better is to just take a couple of hundred positions from games, and measure average time to depth. (Again, with a very small hash table.)

It is well known that it is very important to always put a new search result in the hash table. If you have no choice but to overwrite a deep entry (as might be the case in a simple always-replace scheme), you should still do it. Replacing the shallowest of two should perform better, though, because it at least gives you some leeway in avoiding overwriting the valuable deep entries.
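The "replace the shallowest" advice generalizes to wider buckets; here is a sketch for a four-entry bucket (names and layout are illustrative, not from any particular engine):

```cpp
#include <cstdint>
#include <algorithm>

// Sketch of "always store, evict the shallowest of N" for a 4-entry bucket.
// The new result is never discarded; only the choice of victim uses depth.
struct Entry {
    uint64_t key = 0;
    int depth = -1;   // -1 marks an empty slot
};

struct Bucket4 {
    Entry slots[4];

    void store(uint64_t key, int depth) {
        // If the position is already present, overwrite it in place.
        for (Entry& e : slots)
            if (e.depth >= 0 && e.key == key) { e = Entry{key, depth}; return; }
        // Otherwise evict the shallowest entry (empty slots have depth -1,
        // so they are filled first).
        Entry* victim = std::min_element(
            std::begin(slots), std::end(slots),
            [](const Entry& a, const Entry& b) { return a.depth < b.depth; });
        *victim = Entry{key, depth};
    }

    const Entry* probe(uint64_t key) const {
        for (const Entry& e : slots)
            if (e.depth >= 0 && e.key == key) return &e;
        return nullptr;
    }
};
```

Even a shallow new entry is stored; it just evicts the least valuable slot rather than a deep one.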
Ciekce
Posts: 127
Joined: Sun Oct 30, 2022 5:26 pm
Full name: Conor Anstey

Re: Hybrid replacement strategy worse than always-replace

Post by Ciekce »

hgm wrote: Wed Apr 24, 2024 8:13 pm Playing games is a very inefficient method to measure this. Much better is to just take a couple of hundred positions from games, and measure average time to depth. (Again, with a very small hash table.)
OP, ignore this entirely, it's just flat out wrong
Viz
Posts: 64
Joined: Tue Apr 09, 2024 6:24 am
Full name: Michael Chaly

Re: Hybrid replacement strategy worse than always-replace

Post by Viz »

Also, play them against each other, and not for 500 games; rather use 5000 games, but at shorter time controls.
Mike Sherwin
Posts: 869
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Hybrid replacement strategy worse than always-replace

Post by Mike Sherwin »

I think that the 50-move counter could help. If the 50-move counter is greater than some small number, then the hash table is likely filled with mostly useless junk and one should just always replace. If the 50-move counter is very low, then most of the entries are still valid and one should use depth-preferred replacement.
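The suggestion above could be sketched as follows (a hypothetical illustration; the threshold of 8 is arbitrary and `halfmove_clock` is the position's fifty-move counter):

```cpp
// Sketch of choosing the replacement policy from the fifty-move counter,
// per the suggestion above. The threshold value 8 is arbitrary.
enum class Policy { AlwaysReplace, DepthPreferred };

Policy choose_policy(int halfmove_clock) {
    // High counter: the table is assumed to hold mostly stale entries,
    // so always replace. Low counter: most entries are assumed still
    // valid, so prefer keeping the deep ones.
    return halfmove_clock > 8 ? Policy::AlwaysReplace
                              : Policy::DepthPreferred;
}
```

Whether this heuristic actually helps would of course need to be tested like any other replacement-scheme change.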
mathmoi
Posts: 289
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Re: Hybrid replacement strategy worse than always-replace

Post by mathmoi »

hgm wrote: Wed Apr 24, 2024 8:13 pm How many entries were there in your hash table, and what is the nps of your engine?

Unless you had a hash table very much smaller than 1/10 of the typical number of nodes in a search, your results would be purely noise, no matter how convincing or statistically significant it might seem.

Playing games is a very inefficient method to measure this. Much better is to just take a couple of hundred positions from games, and measure average time to depth. (Again, with a very small hash table.)

It is well known that it is very important to always put a new search result in the hash table. If you have no choice but to overwrite a deep entry (as might be the case in a simple always-replace scheme), you should still do it. Replacing the shallowest of two should perform better, though. Because it at least gives you some leeway in avoiding overwriting of the valuable deep entries.
Hummm, I see what you mean.

My nps (excluding qnodes, because I don't store TT entries in qnodes) is about 2.1 Mnps, and my hash table had 16 million entries. At the 60+1 time control my TT was probably too large for this test. That being said, the Elo difference was very large, with a LOS > 99%.

I also tested with my 'bench' function and the hybrid method was indeed faster.
mathmoi
Posts: 289
Joined: Mon Mar 13, 2006 5:23 pm
Location: Québec

Re: Hybrid replacement strategy worse than always-replace

Post by mathmoi »

Viz wrote: Wed Apr 24, 2024 8:25 pm Also - play them against each other, and not for 500 games, rather use 5000 games but at shorter time controls.
Each version played 500 games against four opponents, for a total of 2000 games. I'll try to put them against each other, but I thought it was better to play them against a couple of opponents.
pgg106
Posts: 25
Joined: Wed Mar 09, 2022 3:40 pm
Full name: . .

Re: Hybrid replacement strategy worse than always-replace

Post by pgg106 »

You want to look up what an "SPRT" is. It's the testing method SF and basically all of the top 100 engines use, and it's far better than picking an arbitrary number of games and letting the engines play. Do not listen to anyone who says otherwise.
To anyone reading this post in the future: don't ask for help on talkchess. It's a dead site where you'll only get led astray; the few people talking sense here come from the Stockfish Discord server. Just join it and actual devs will help you.
User avatar
hgm
Posts: 27837
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Hybrid replacement strategy worse than always-replace

Post by hgm »

mathmoi wrote: Thu Apr 25, 2024 2:44 pm My nps (excluding qnodes, because I don't store TT entries in qnodes) is about 2.1 Mnps, and my hash table had 16 million entries. At the 60+1 time control my TT was probably too large for this test. That being said, the Elo difference was very large, with a LOS > 99%.
That should give you on average some 4M nodes per search. But that count includes many duplicates. My experience is that the time-to-depth does not really increase until you make the TT smaller than 1/10 of the number of nodes. That suggests the number of different positions in the search tree is only 400k. That is such a small fraction of the 16M-entry TT that you would have virtually no replacement at all.

I have no explanation for the gauntlet result you see, but it seems very unlikely that a difference in replacement scheme can be the cause.

To get a better impression of what is going on you could include some diagnostics code in your engine, which counts the number of hash hits of exact depth, insufficient depth and extra depth, and the number of replacements (as opposed to storing in an unused entry) of equal, lower or higher-depth entries, and then search a typical middle-game position.
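The suggested diagnostics could look something like this (a sketch with hypothetical counter names; the engine would call `on_hit` on every TT probe that finds the position and `on_store` on every store):

```cpp
#include <cstdint>
#include <cstdio>

// Sketch of the diagnostics suggested above: count hash hits by stored
// depth vs. requested depth, and replacements by relative depth.
// All names are hypothetical.
struct TTStats {
    uint64_t hit_exact_depth = 0;  // hit with exactly the requested depth
    uint64_t hit_low_depth = 0;    // hit, but stored depth is insufficient
    uint64_t hit_high_depth = 0;   // hit with extra depth
    uint64_t store_empty = 0;      // stored into an unused slot
    uint64_t repl_lower = 0;       // replaced a shallower entry
    uint64_t repl_equal = 0;       // replaced an equal-depth entry
    uint64_t repl_higher = 0;      // replaced a deeper entry (the bad case)

    void on_hit(int stored_depth, int wanted_depth) {
        if (stored_depth == wanted_depth) ++hit_exact_depth;
        else if (stored_depth < wanted_depth) ++hit_low_depth;
        else ++hit_high_depth;
    }

    void on_store(bool slot_was_empty, int old_depth, int new_depth) {
        if (slot_was_empty) { ++store_empty; return; }
        if (old_depth < new_depth) ++repl_lower;
        else if (old_depth == new_depth) ++repl_equal;
        else ++repl_higher;
    }

    void print() const {
        std::printf("hits:  exact=%llu low=%llu high=%llu\n",
                    (unsigned long long)hit_exact_depth,
                    (unsigned long long)hit_low_depth,
                    (unsigned long long)hit_high_depth);
        std::printf("store: empty=%llu repl_lower=%llu repl_equal=%llu "
                    "repl_higher=%llu\n",
                    (unsigned long long)store_empty,
                    (unsigned long long)repl_lower,
                    (unsigned long long)repl_equal,
                    (unsigned long long)repl_higher);
    }
};
```

After searching a typical middle-game position, a high `repl_higher` count under one scheme but not another would point at where the valuable deep entries are being lost.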
User avatar
hgm
Posts: 27837
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Hybrid replacement strategy worse than always-replace

Post by hgm »

pgg106 wrote: Thu Apr 25, 2024 4:13 pm You want to look up what an "SPRT" is, it's the testing SF and basically all the top 100 engines use, far better than picking an arbitrary number of games and letting the engines play. Do not listen to anyone who says otherwise.
Of course it is only better in the sense that on average you would need fewer games; if a pre-determined number of games eventually produces a LOS > 99%, you would have reached the same conclusion with an SPRT set to a confidence level of, say, 95%. It would just have stopped somewhat earlier.

A disadvantage of SPRT is that it only tests a single feature at a time, and it is not easy to adapt to testing multiple changes simultaneously. That can make it unnecessarily slow.
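For reference, the SPRT statistic being discussed is commonly computed with a trinomial normal approximation of the log-likelihood ratio over wins/draws/losses (a sketch of the widely used fishtest-style formula, not any framework's exact code):

```cpp
#include <cmath>

// Sketch of the trinomial SPRT log-likelihood ratio approximation:
// H0 is "strength difference = elo0", H1 is "strength difference = elo1".
// The test stops when the LLR crosses bounds derived from the desired
// error rates, e.g. about +-2.94 for alpha = beta = 0.05.
double sprt_llr(double wins, double draws, double losses,
                double elo0, double elo1) {
    double n = wins + draws + losses;
    if (wins == 0 || draws == 0 || losses == 0) return 0.0;
    double w = wins / n, d = draws / n;
    double score = w + d / 2.0;       // mean score per game
    double m2 = w + d / 4.0;          // second moment of the per-game score
    double var = m2 - score * score;  // per-game variance
    double var_s = var / n;           // variance of the mean score
    auto elo_to_score = [](double elo) {
        return 1.0 / (1.0 + std::pow(10.0, -elo / 400.0));
    };
    double s0 = elo_to_score(elo0), s1 = elo_to_score(elo1);
    return (s1 - s0) * (2.0 * score - s0 - s1) / (2.0 * var_s);
}
```

A result clearly better than both hypotheses drives the LLR positive, a clearly worse one drives it negative, and the test simply runs until either bound is hit.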