Hi all,
I've been trying to implement aging in my transposition table, but with no real gain. I used to clear my hash table before every new search from a different position (i.e. on a UCI go command), so I tried leaving the hash untouched between positions and adding a flag inside every TT entry indicating whether the entry comes from the current search or from a previous one. This actually occupies no extra space, since 6 bits of the bound field were unused.
The old replacement scheme was simple, preferring the lowest-depth entry within a bucket of size 4. The overwrite occurred only if the depth of the new result was greater than or equal to that of the entry to be replaced.
The new replacement scheme, now considering aging, always replaces the first entry in the bucket that comes from a previous search; if none is found, the entry with the lowest depth is replaced.
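The two schemes described above might be sketched like this (all names are hypothetical; this is not Napoleon's actual code). The generation counter lives in the 6 spare bits next to the bound type:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical TT entry; the 'bound' byte packs the bound type (low 2 bits)
// and the search generation, i.e. the age flag (high 6 bits).
struct TTEntry {
    uint64_t key   = 0;   // full Zobrist key
    int16_t  score = 0;
    uint8_t  depth = 0;
    uint8_t  bound = 0;
};

constexpr int BUCKET_SIZE = 4;

struct TranspositionTable {
    std::vector<TTEntry> entries;
    uint8_t generation = 0;

    explicit TranspositionTable(std::size_t buckets) : entries(buckets * BUCKET_SIZE) {}

    // Bump the generation on every new 'go' command (6 bits available).
    void newSearch() { generation = (generation + 1) & 0x3F; }

    uint8_t ageOf(const TTEntry& e) const { return e.bound >> 2; }

    // Aging replacement as described: the first entry from a previous search
    // is replaced; if every entry is current, fall back to lowest depth.
    void store(uint64_t key, int16_t score, uint8_t depth, uint8_t boundType) {
        std::size_t bucketIndex = key % (entries.size() / BUCKET_SIZE);
        TTEntry* bucket = &entries[bucketIndex * BUCKET_SIZE];
        TTEntry* victim = &bucket[0];
        for (int i = 0; i < BUCKET_SIZE; ++i) {
            if (ageOf(bucket[i]) != generation) { victim = &bucket[i]; break; }
            if (bucket[i].depth < victim->depth) victim = &bucket[i];
        }
        victim->key   = key;
        victim->score = score;
        victim->depth = depth;
        victim->bound = static_cast<uint8_t>((boundType & 0x3) | (generation << 2));
    }
};
```

Note the break in the loop: the first old-generation entry is taken unconditionally, exactly the rule described above.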
Now, the results are really disappointing: I got more than an 80 Elo difference between the two schemes, obviously not in favor of aging. Also, since I collect the PV from the main hash table, I noticed that when the next position diverges from the expected PV of the last search, the new PV often gets truncated.
Should I look for a bug or is this a behavior that could happen in some engines?
TT aging
Moderators: hgm, Rebel, chrisw
hgm (H G Muller) - Posts: 27790, Joined: Fri Mar 10, 2006 10:06 am, Location: Amsterdam
Re: TT aging
crybotmark wrote: "The old replacement scheme was simple, preferring the lowest-depth entry within a bucket of size 4. If the depth of the new search was >= that of the entry to be replaced, then the overwrite occurred."
This is very bad, as it would mean you don't always store the latest result. You should always store the latest result, even if it is a d=1 result overwriting a d=30 result. 'Recent' is far more important than 'deep'.
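hgm's 'recent beats deep' rule, as a minimal always-replace sketch (hypothetical names, single-entry slots rather than buckets):

```cpp
#include <cstdint>
#include <vector>

struct TTEntry { uint64_t key = 0; int16_t score = 0; uint8_t depth = 0; };

// Always-replace: the newest result unconditionally overwrites its slot,
// even when a d=1 result displaces a d=30 one.
void storeAlwaysReplace(std::vector<TTEntry>& table, uint64_t key,
                        int16_t score, uint8_t depth) {
    table[key % table.size()] = TTEntry{key, score, depth};
}
```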
crybotmark wrote: "Now, the results are really disappointing, since I got more than an 80 Elo difference between the two schemes, obviously not in favor of aging."
That is very suspect. It suggests that hash entries made in the previous search damage the current one, rather than help it. Which again suggests that they contain wrong information rather than correct information. This could happen, for instance, when there is an error in updating the hash key when you make a move at game level, so that the position you probe and use the score of is an entirely different one than what you should probe.
Perhaps you should advance in small steps, rather than trying too many changes at once. E.g. rather than erasing the entire TT before every move, just erase part of it, without altering the replacement scheme. Keeping 50% of the TT for the next move should never make the engine play worse than running it with half the hash-table size. If the typical search tree fits entirely in half the TT (which usually is the case with current memory sizes), it should not even be worse than running with the same, or with infinite, hash size.
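The partial-clear experiment hgm suggests could look like this sketch (names hypothetical; memset over the table is only safe because the entry type is trivially copyable):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

struct TTEntry { uint64_t key; int16_t score; uint8_t depth; uint8_t bound; };

// Before each root search, wipe only a fraction of the table instead of all
// of it. Zeroed entries have depth 0, so a shallowest-first replacement
// scheme recycles them quickly, while deep results from the previous search
// survive in the untouched part.
void clearFraction(std::vector<TTEntry>& table, double fraction) {
    std::size_t n = static_cast<std::size_t>(table.size() * fraction);
    std::memset(table.data(), 0, n * sizeof(TTEntry));
}
```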
crybotmark wrote: "Also, since I collect the PV from the main hash table, I noticed that when the next position diverges from the expected PV of the last search, the new PV often gets truncated. Should I look for a bug or is this a behavior that could happen in some engines?"
With a TT, PVs can be truncated, whether you clear it or not.
crybotmark - Posts: 37, Joined: Thu May 09, 2013 9:06 pm
Re: TT aging
hgm wrote: "This is very bad, as it would mean you don't always store the latest result. You should always store the latest result, even if it is a d=1 result overwriting a d=30 result. 'Recent' is far more important than 'deep'."
I'm sorry, I may have described the situation badly: before implementing aging, the TT was totally cleared between moves, meaning newer searches were always stored inside the hash table. What I was referring to is that new results of the same root position were replaced inside the TT only if their associated depth was large enough.
hgm wrote: "That is very suspect. It suggests that hash entries made in the previous search damage the current one, rather than help it. Which again suggests that they contain wrong information rather than correct information. This could happen, for instance, when there is an error in updating the hash key when you make a move at game level, so that the position you probe and use the score of is an entirely different one than what you should probe."
That is a good point, I need to ensure that this is the case. How would you proceed?
hgm (H G Muller) - Posts: 27790, Joined: Fri Mar 10, 2006 10:06 am, Location: Amsterdam
Re: TT aging
crybotmark wrote: "What I was referring to is that new results of the same root position were replaced inside the TT only if their associated depth was large enough."
This is what I understood, and it sounds very bad, because the 'if' suggests that you would not store them if their associated depth is too low. And you should always store the result of every search. Replacement is just about which entry you should overwrite, not about whether you should overwrite something.
crybotmark wrote: "That is a good point, I need to ensure that this is the case. How would you proceed?"
If you are sure that entries left over from a previous search are helpful rather than toxic, you can try to keep track of the search and, rather than clearing the entire table before every search, only clear entries that were not accessed in the preceding search. This should be better than clearing half of the table randomly.
I am not sure how you clear the hash table now, and how your replacement algorithm benefits from some entries being 'clear'. Resetting the depth to 0 should be sufficient to make sure the normal 'shallowest-of-four' replacement would prefer to overwrite the 'cleared' entries, and thus get rid of them quickly.
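A sketch of the 'clear only entries not accessed in the preceding search' idea (the accessed flag is hypothetical; in a real engine it would reuse a spare bit of the entry):

```cpp
#include <cstdint>
#include <vector>

struct TTEntry { uint64_t key = 0; int16_t score = 0; uint8_t depth = 0; bool accessed = false; };

// Call between searches: entries never probed in the preceding search get
// depth 0, so a 'shallowest-of-four' replacement scheme overwrites them
// first, while everything the last search actually used is kept.
void ageByAccess(std::vector<TTEntry>& table) {
    for (TTEntry& e : table) {
        if (!e.accessed) e.depth = 0;
        e.accessed = false;  // reset the flag for the coming search
    }
}
```

The probe function would set `e.accessed = true` on every hit.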
crybotmark - Posts: 37, Joined: Thu May 09, 2013 9:06 pm
Re: TT aging
Thank you hgm, I had to test Napoleon in order to give you more information to continue our discussion.
I tested different changes within my TT implementation (without aging):
1) Bucket size = 4, always replace lowest-depth entry, clear TT between moves - Elo = 2263
2) Bucket size = 4, old replacement strategy (i.e. replace lowest-depth entry only if the current depth is large enough), clear TT between moves - Elo = 2269
3) Bucket size = 1, always replace lowest-depth entry, clear TT between moves - Elo = 2278
4) Bucket size = 1, old replacement strategy (same as 2), clear TT between moves - Elo = 2265
5) Bucket size = 1, always replace lowest-depth entry, clear first half of TT between moves - Elo = 2276
6) Bucket size = 1, always replace lowest-depth entry, do not clear TT at all - Elo = ? (currently testing)
'Clearing the table' means just a memset() over the selected table memory area, which in turn sets each entry's depth to 0.
AlvaroBegue (Álvaro Begué, RuyDos) - Posts: 931, Joined: Tue Mar 09, 2010 3:46 pm, Location: New York
Re: TT aging
crybotmark wrote: "Thank you hgm, I had to test Napoleon in order to give you more information to continue our discussion. I tested different changes within my TT implementation (without aging):
1) Bucket size = 4, always replace lowest-depth entry, clear TT between moves - Elo = 2263
2) Bucket size = 4, old replacement strategy (i.e. replace lowest-depth entry only if the current depth is large enough), clear TT between moves - Elo = 2269
3) Bucket size = 1, always replace lowest-depth entry, clear TT between moves - Elo = 2278
4) Bucket size = 1, old replacement strategy (same as 2), clear TT between moves - Elo = 2265
5) Bucket size = 1, always replace lowest-depth entry, clear first half of TT between moves - Elo = 2276
6) Bucket size = 1, always replace lowest-depth entry, do not clear TT at all - Elo = ? (currently testing)"
How are you getting those Elo measurements? What kind of error bar do they have?
crybotmark wrote: "I forgot to mention that the hash key I store inside the TT is the full 64-bit Zobrist key. This is why (I think) 4 buckets were hardly needed, as collisions are rare."
No, that doesn't make a lot of sense.
crybotmark - Posts: 37, Joined: Thu May 09, 2013 9:06 pm
Re: TT aging
AlvaroBegue wrote: "How are you getting those Elo measurements? What kind of error bar do they have?"
I use cutechess-cli with a gauntlet of 6 other engines (3 stronger, 3 weaker). 2400 games at 15"+0.3" seem to converge to a stable result (even after 5000 games the Elo doesn't change by more than 1-2 Elo). But still, I'm only interested in stating that A is not weaker than B, and vice versa.
The Elo is computed with Ordo.
AlvaroBegue (Álvaro Begué, RuyDos) - Posts: 931, Joined: Tue Mar 09, 2010 3:46 pm, Location: New York
Re: TT aging
crybotmark wrote: "I use cutechess-cli with a gauntlet of 6 other engines (3 stronger, 3 weaker). 2400 games at 15"+0.3" seem to converge to a stable result (even after 5000 games the Elo doesn't change by more than 1-2 Elo)."
I would expect each Elo estimate to have an error bar of around 5, so the difference between two estimates would have an error bar of around 7 (= 5 * sqrt(2)). So keep in mind that you don't have a whole lot of statistical significance here.
crybotmark wrote: "But still, I'm only interested in stating that A is not weaker than B, and vice versa."
If you want to state that A is not weaker than B, play a match between A and B directly, ignore the draws, and then use the binomial distribution to get a p-value for the result under the hypothesis that A and B are equally strong.
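Álvaro's binomial test can be sketched as follows; `binomialPValue` is a hypothetical helper, not part of cutechess-cli or Ordo:

```cpp
#include <cmath>

// One-sided p-value: probability of scoring at least 'wins' wins in 'games'
// decisive games if both engines are equally strong (each decisive game is
// then a fair coin flip). Binomial coefficients are computed via lgamma in
// log space for numerical stability.
double binomialPValue(int games, int wins) {
    double p = 0.0;
    for (int k = wins; k <= games; ++k) {
        double logTerm = std::lgamma(games + 1.0) - std::lgamma(k + 1.0)
                       - std::lgamma(games - k + 1.0) - games * std::log(2.0);
        p += std::exp(logTerm);
    }
    return p;
}
```

For example, with 100 decisive games, 60 wins gives p of roughly 0.028, enough to reject equality at the usual 5% level, while 55 wins (p of roughly 0.18) would not be.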
crybotmark - Posts: 37, Joined: Thu May 09, 2013 9:06 pm
Re: TT aging
Wouldn't playing self-games produce bad results if I'm trying to generalize playing strength? A may be better than B, but worse than C; playing strength is not transitive.
But I get your point. I do not have strong statistical evidence, yet I can't run much longer tests. I only have 2 machines with 4 cores each, and 2400 games at this TC take about 4 hours.
AlvaroBegue (Álvaro Begué, RuyDos) - Posts: 931, Joined: Tue Mar 09, 2010 3:46 pm, Location: New York
Re: TT aging
crybotmark wrote: "Wouldn't playing self-games produce bad results if I'm trying to generalize playing strength? A may be better than B, but worse than C; playing strength is not transitive."
This is an old debate. The main distortion introduced by matching very similar versions of an engine is that the differences are magnified. So you might measure a change as a +10 Elo improvement, and when playing against a varied mix of opponents it might be only +5 Elo (completely made-up numbers, for illustration). But if you just want to know which version is better, this distortion actually helps.
So what I do is match only with the previous version to determine if I accept a change, and then every so often I play matches against a few reference opponents to measure the magnitude of the improvement over the months.