PERFT transposition table funny?!

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
MartinBryant
Posts: 69
Joined: Thu Nov 21, 2013 12:37 am
Location: Manchester, UK
Full name: Martin Bryant

PERFT transposition table funny?!

Post by MartinBryant »

I've just implemented a fully legal move generator in Colossus and have been using PERFT to verify the correctness. All the numbers are good now but I noticed a weird effect with increasing transposition table size. As the size increased the time taken reduced as you would expect. However, when I got to 256MB table size the time taken started increasing!
(I was running a PERFT 7 search from the starting position. All searches returned the correct 3,195,901,860 count. The time taken to allocate and initialise the table memory is not included in the timing.)

MB Time(ms)
0 30890
1 11016
2 10328
4 9828
8 9046
16 8250
32 7734
64 7657
128 7579
256 7703 (!)
512 7985
1024 8484
2048 8594
4096 8734

Processor AMD Athlon(tm) X4 860K Quad Core Processor 3.70 GHz
Installed RAM 16.0 GB
System type 64-bit operating system, x64-based processor

Does anybody have any suggestions as to why this might be happening?
The only thing I could think of was an L1/L2 cache issue where it's filling them with rarely used entries from the larger tranposition table in preference to more commonly used stuff?!

(Windows task manager showed the total memory load never exceeded about 9GB, well below the 16GB installed.)
Sesse
Posts: 300
Joined: Mon Apr 30, 2018 11:51 pm

Re: PERFT transposition table funny?!

Post by Sesse »

Check performance counters, but it sounds like a case of TLB trashing (ie., your hash table is so large that the CPU can no longer keep efficiently track of where its memory pages are). Huge pages would help if that's the issue, but I have no idea whether such an old CPU supports it.
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: PERFT transposition table funny?!

Post by mvanthoor »

MartinBryant wrote: Sat Apr 10, 2021 2:25 pm I've just implemented a fully legal move generator in Colossus and have been using PERFT to verify the correctness. All the numbers are good now but I noticed a weird effect with increasing transposition table size. As the size increased the time taken reduced as you would expect. However, when I got to 256MB table size the time taken started increasing!
In my engine, I observed the same effect, but later, at 1024 MB:

Code: Select all

MB		ms
0		93557
1		67264
2		56872
4		44286
8		35723
16		29090
32		23101
64		19454
128		17848
256		16547
512		15872
1024		16270 (!)
2048		16530
4096		16990
8192		17388
16384		18070
At 128 MB, it's for the first time where the hash table is not full after the run. When it is made even bigger, it just gets less full; i.e., the same information is spread over a bigger amount of memory. As you can see, after 128 MB, the gains become very marginal, and at 1024 MB, the time starts to increase. I think that at that point, the information is spread across such a huge part of the memory, that it is taking longer and longer to retrieve it. Also, when the hash table is not full, it's not full... increasing it after that size gains very little. In short time controls such as 2m+1s, a single core often can't get a 256 MB hash table filled up before the game is over, so going bigger doesn't gain anything.

Maybe supporting huge pages solves this issue, but I haven't looked into that yet. As long as I don't do multi-threading, and the CPU the engine is running on is not one of the very latest models, it can't fill up the 256 MB hash table in my testing games. As I prune more moves, I expect even less moves will end up in the hash table, so it becomes less of an issue. I'll probably start looking into that in the (far) future when I'm going to test with 8+ cores and I need 256 MB of hash per core.

PS: Rustic has a pseudo-legal move generator (for now), and the CPU is an i7-6700K. The system has 32GB RAM.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
User avatar
MartinBryant
Posts: 69
Joined: Thu Nov 21, 2013 12:37 am
Location: Manchester, UK
Full name: Martin Bryant

Re: PERFT transposition table funny?!

Post by MartinBryant »

I think you could be right! I just had a quick look and with a 4GB table about 99% of the entries aren't used :lol:
Not very good for caching :(
Thank you for your help!
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: PERFT transposition table funny?!

Post by mvanthoor »

By the way: Rustic is using a hash table that has a bucket with four slots at each index. Using buckets in the hash table makes the thing a lot more efficient, because you can actually store multiple positions at the same index, which you can't do if you don't use buckets. That means that without buckets, you need to have a much bigger hash table to be able to store the same number of positions. (Excuse me if your engine already uses buckets; in that case, there isn't an improvement option there.)
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
User avatar
Ronald
Posts: 160
Joined: Tue Jan 23, 2018 10:18 am
Location: Rotterdam
Full name: Ronald Friederich

Re: PERFT transposition table funny?!

Post by Ronald »

Using a bucket in the hashtable doesn't give you more entries, if you don't use a bucket in your case it will just get you 4 times as much entries. The advantage of using a bucket system is that you have more influence on the replacement of already used entries (although replacements will occur 4 times as much)..