Idea #9228: Efficiently clearing stale transposition table entries
Premises:
1) Leaving transposition table entries in the table indefinitely is wasteful and inefficient.
2) It takes time and space to implement an age field in each entry.
3) Blasting the entire table before or after each search eats a lot of time.
The idea: Pick a small integral power of two equal to 16. Prior to the Nth search, run a loop pointed at the (N mod 16) * (table_length / 16) part of the table and then blast the entries starting at that point for 1/16 of the total table length.
Some good stuff might be lost each time, but not too much.
Some stale stuff might hang around for a while, but not for too long.
Idea #9228: Clearing stale transtable entries
Moderators: hgm, Rebel, chrisw
-
- Posts: 892
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
Re: Idea #9228: Clearing stale transtable entries
1) Just 2 bits is enough for the "age" field.
2) If you want to blast a random entry it is better to do that at some given probability with current cluster at the time you normally writing to it. This method scales perfectly to any transposition table size, load factor and time control.
2) If you want to blast a random entry it is better to do that at some given probability with current cluster at the time you normally writing to it. This method scales perfectly to any transposition table size, load factor and time control.
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Multi-threaded transposition table clearing
Multi-threaded transposition table clearing
Has anyone tried using multi-threaded transposition table clearing with a machine which has available idle cores? It seems to me that the memory write pathway would be nearly saturated with just a single core writing, but I haven't tried any experiments.
The above would be useful with perft() calculations where a program clears the whole table prior to running each position. With table sizes up to 24 GiB (on a 32 GiB RAM machine), clearing the table with only a single thread can eat a lot of time overall.
Has anyone tried using multi-threaded transposition table clearing with a machine which has available idle cores? It seems to me that the memory write pathway would be nearly saturated with just a single core writing, but I haven't tried any experiments.
The above would be useful with perft() calculations where a program clears the whole table prior to running each position. With table sizes up to 24 GiB (on a 32 GiB RAM machine), clearing the table with only a single thread can eat a lot of time overall.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Idea #9228: Clearing stale transtable entries
what is wrong with aging? You only need a couple of bits, and it is almost always available somewhere in the hash entry.sje wrote:Idea #9228: Efficiently clearing stale transposition table entries
Premises:
1) Leaving transposition table entries in the table indefinitely is wasteful and inefficient.
2) It takes time and space to implement an age field in each entry.
3) Blasting the entire table before or after each search eats a lot of time.
The idea: Pick a small integral power of two equal to 16. Prior to the Nth search, run a loop pointed at the (N mod 16) * (table_length / 16) part of the table and then blast the entries starting at that point for 1/16 of the total table length.
Some good stuff might be lost each time, but not too much.
Some stale stuff might hang around for a while, but not for too long.
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
A couple of bits
A couple of bits is not much, but those bits still eat space and require shifting/masking on some combination of stash/fetch/compare operations.bob wrote:what is wrong with aging? You only need a couple of bits, and it is almost always available somewhere in the hash entry.
More than a couple of bits will be required to identify more than four different search advent cohorts.
----
It's been a long time since I looked at the Crafty source code and I might be mistaken, but I seem to remember that there was a section where the transposition table would be reset to avoid problems related to an impending approach of a 50 move rule draw claim. If this recollection is somewhat correct, then wouldn't such a reset be avoided with my 1/16th region sweep?
-
- Posts: 892
- Joined: Sun Nov 19, 2006 9:16 pm
- Location: Russia
Re: A couple of bits
There is no principal difference between blasting the entire table or 1/16 of the entire table. It is still blind and very costly.
The solution is better replacement scheme either with or without age field.
The solution is better replacement scheme either with or without age field.
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Re: Multi-threaded transposition table clearing
I'm working on trying a multi-threaded approach to transposition table clearing, and have made a preliminary attempt to improve on the standard library function memset(). Note that memset() is the preferred replacement for the deprecated bzero() library routine which I had used before.
To my disappointment, the single thread performance of my auto-alinging, eight bytes at a time, super-duper memory blaster was no better than that of the memset() routine supplied by the Mac OS/X C run time library.
So I'll go back to memset() and continue with the pthread dance to see if I can get a speed gain when clearing a big table or parts thereof. Any gain realized will be good for both the segment sweep clearing and clearing a whole table. Therefore, any program might also benefit from multi-threaded clearing, although perhaps only at program start time and at game start time.
To my disappointment, the single thread performance of my auto-alinging, eight bytes at a time, super-duper memory blaster was no better than that of the memset() routine supplied by the Mac OS/X C run time library.
So I'll go back to memset() and continue with the pthread dance to see if I can get a speed gain when clearing a big table or parts thereof. Any gain realized will be good for both the segment sweep clearing and clearing a whole table. Therefore, any program might also benefit from multi-threaded clearing, although perhaps only at program start time and at game start time.
-
- Posts: 4675
- Joined: Mon Mar 13, 2006 7:43 pm
Memory blaster code fragment
Memory blaster code fragment
This is what I'd used at the heart of my table clearing routine. It is called only for regions aligned on an eight byte boundary. I do not understand why it's no faster than the simple memset() which has a loop iteration for each byte cleared. (All runs were on a 64 bit machine.)
This is what I'd used at the heart of my table clearing routine. It is called only for regions aligned on an eight byte boundary. I do not understand why it's no faster than the simple memset() which has a loop iteration for each byte cleared. (All runs were on a 64 bit machine.)
Code: Select all
typedef unsinged long long int ui64;
void MemClearAlignUi64(void * const baseptr, const size_t bytelength)
{
const ui64 ui64limit = (ui64) bytelength / sizeof(ui64);
ui64 *ui64ptr = (ui64 *) (void *) baseptr;
for (ui64 index = 0; index < ui64limit; index++)
*ui64ptr++ = 0;
}
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: A couple of bits
I don't actually clear the ttable, I just rip through it and say "scores are no good, best move should still be searched first". I'd be concerned that only clearing a part of it each search would let bad draw scores propagate across and into the just cleared info due to transpositions. You'd always have 15/16 old data to propagate into the newly cleared data.sje wrote:A couple of bits is not much, but those bits still eat space and require shifting/masking on some combination of stash/fetch/compare operations.bob wrote:what is wrong with aging? You only need a couple of bits, and it is almost always available somewhere in the hash entry.
More than a couple of bits will be required to identify more than four different search advent cohorts.
----
It's been a long time since I looked at the Crafty source code and I might be mistaken, but I seem to remember that there was a section where the transposition table would be reset to avoid problems related to an impending approach of a 50 move rule draw claim. If this recollection is somewhat correct, then wouldn't such a reset be avoided with my 1/16th region sweep?
-
- Posts: 27808
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Idea #9228: Clearing stale transtable entries
There is no need for a separate pass through the table for clearing entries. You can just overwrite stale entries during hash replacement, when their bucket is cached anyway. I usually consider any entry in the primary location with a draft already over-represented in the table (i.e. there are already more of that depth than of the lowest depth you store in the depth-preferred table) as empty, and just overwrite it. If not, the entry with the lowest draft in the bucket is replaced, when that draft is lower than the current one. This guarantees stale entries being pushed out of the table sooner or later, especially since the search depth tends to increase during the game.