cutechess-cli: not restarting an engine because of tt

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Ras
Posts: 2488
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: cutechess-cli: not restarting an engine because of tt

Post by Ras »

hgm wrote:I certainly would not waste precious bits on extending the signature. Stockfish gets away with a 16-bit signature!
What do you mean by signature? Storing part of the hash key? Remember that Stockfish has much bigger hash tables than I have (on the target platform): I only have 4096 entries for each of the main hash tables, which gives only 11 implicit key bits in the index. If you have 8 bytes per entry and use 128 MB, that's 16.7 million entries with 24 implicit bits. But I have to store the 13 bits of difference explicitly.

I want at least 48 bits of total hash key because of collisions (referring to Bob's paper). The alpha/beta/exact flag needs two bits, so I fiddle in 6 additional key bits there, and I store the upper uint32_t of the position hash in a uint32_t. Plus the 11 implicit bits, that gives 49 bits, which is good because with small hash tables, there are far more index collisions than usual, which opens up the possibility of full collisions.
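As a minimal C sketch of how such a packing could look (the struct, field, and function names are my own and purely illustrative, not from any actual engine):

```c
#include <stdint.h>

/* Illustrative entry layout: a stored 32-bit signature plus one byte that
 * holds the 2-bit alpha/beta/exact flag and 6 further key bits. Together
 * with the 11 implicit index bits of a 4096-entry table, this covers
 * 32 + 6 + 11 = 49 bits of the 64-bit position hash. */
typedef struct {
    uint32_t sig_high;  /* upper 32 bits of the position hash */
    uint8_t  flag_key;  /* bits 0-1: bound flag, bits 2-7: extra key bits */
} tt_sig;

static void store_sig(tt_sig *e, uint64_t key, unsigned flag)
{
    e->sig_high = (uint32_t)(key >> 32);
    /* take 6 key bits from just above the 11 implicit index bits */
    e->flag_key = (uint8_t)((flag & 3u) | (((key >> 11) & 0x3Fu) << 2));
}

static int probe_sig(const tt_sig *e, uint64_t key, unsigned *flag)
{
    if (e->sig_high != (uint32_t)(key >> 32))
        return 0;
    if (((e->flag_key >> 2) & 0x3Fu) != ((key >> 11) & 0x3Fu))
        return 0;
    *flag = e->flag_key & 3u;
    return 1;
}
```

A probe only counts as a hit if both the stored 32 signature bits and the 6 extra key bits match; the 11 index bits match implicitly by construction.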

I think that even on a PC, a 16-bit signature plus 24 implicit bits is too few.

If a move is present when probing the hash tables, I'm also doing some basic sanity checking, e.g. bishops have to move diagonally. If that doesn't match, I treat the lookup as "no match", also for the position value.
Note that for encoding Chess moves in a not-very-cumbersome way 13 bits would be sufficient
Whether it's 16 bits or 13 bits doesn't matter because the saving wouldn't be useful anyway. I'll store from and to as 6 bits each and then 4 status bits. Your bit counter already fits into the depth field, which takes 6 bits.
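A sketch of such a 6+6+4 packing in C (function names are illustrative):

```c
#include <stdint.h>

/* 16-bit packed move: bits 0-5 from-square, bits 6-11 to-square,
 * bits 12-15 status (promotion piece, castling, e.p., ...). */
static uint16_t pack_move(unsigned from, unsigned to, unsigned status)
{
    return (uint16_t)((from & 0x3Fu) | ((to & 0x3Fu) << 6) | ((status & 0xFu) << 12));
}

static unsigned move_from(uint16_t m)   { return m & 0x3Fu; }
static unsigned move_to(uint16_t m)     { return (m >> 6) & 0x3Fu; }
static unsigned move_status(uint16_t m) { return (m >> 12) & 0xFu; }
```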

What I intend to do is sweep through the table before each move and delete every entry with the oldest modulo counter. So: take the current counter (i.e. the one that the upcoming search is about to use), add one (modulo 4), and delete every entry that has this counter. This spares the overhead when hash probing in the search. I will have to benchmark, however, whether that approach becomes a killer with 1 GB hash.
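In code, the sweep is a single pass over the table; a sketch with invented entry fields:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical TT entry with a 2-bit age (the search counter modulo 4). */
typedef struct {
    uint64_t key;
    uint8_t  age;
    uint8_t  used;   /* 1 if the slot holds a live entry */
} tt_entry;

/* Before each root move: the generation (current + 1) mod 4 is the oldest
 * one still in the table, and it is exactly the value the upcoming search
 * will reuse, so those entries are cleared now. Probing during the search
 * then never has to compare ages. */
static unsigned sweep_oldest(tt_entry *tt, unsigned n, unsigned current_age)
{
    unsigned oldest = (current_age + 1u) & 3u;
    unsigned cleared = 0;
    for (unsigned i = 0; i < n; i++) {
        if (tt[i].used && tt[i].age == oldest) {
            memset(&tt[i], 0, sizeof tt[i]);
            cleared++;
        }
    }
    return cleared;
}
```

Whether this full-table pass is affordable with 1 GB of hash is exactly the benchmarking question raised above.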

An interesting side fact, I did some benchmarking and profiling for the pawn hash table to get real numbers:

I have 2048 entries and get 90.8% hit rate over the game, pretty consistent even in the opening. Doubling it to 4096 gives 92.8% hit rate. So there isn't much to gain.

Profiling shows that with 2048 entries, the pawn eval function takes 1.7% of the computing time (with the mcount profiler overhead already subtracted, so this refers to real computing time). I could reduce that by 22% by doubling the size. That would increase the overall speed by 0.4%, which isn't really useful.

Part of that gain would be eaten up anyway because I would need to split the position hash uint32_t into two uint16_t values to avoid struct padding (packed structs are a hardware dependency I want to avoid), and I'd have to reassemble things. The move compression and decompression overhead would also still be there.

So the overall gain would be even smaller than 0.4%. I will have to think about what to do with 16k additional RAM, or if I just leave it as growth potential (in which case I can defer the implementation).

What it also shows is that for the PC version, I can limit the pawn hash table to maybe 10k entries and better spend the rest of the configured memory on the actual hash tables.
You could even consider using a 24-bit signature, so that you can pack 8 entries in a 64-byte cache line.
The cache on the target works very differently anyway: it covers the flash ROM to make up for the wait states. RAM operates at full core speed and thus doesn't need caching. The whole hash table design, even before the port, treats the memory as uniform.
hgm
Posts: 27807
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: cutechess-cli: not restarting an engine because of tt

Post by hgm »

Ras wrote:What do you mean by signature? Storing part of the hash key?
Indeed, the part of the key stored in the hash entry.
Remember that Stockfish has much bigger hash tables than I have (on the target platform): I only have 4096 entries for each of the main hash tables, which gives only 11 implicit key bits in the index. If you have 8 bytes per entry and use 128 MB, that's 16.7 million entries with 24 implicit bits. But I have to store the 13 bits of difference explicitly.
That should not be relevant. In all cases the table eventually fills up, and many positions will map to the same bucket. The probability that the position stored there now accidentally has the same signature as the one you are probing for will be 1 in 2^16, irrespective of the size of the table. It doesn't get worse if you overload the table 1000 times compared to overloading it 10 times, because all the overwritten entries that could have given you a collision are no longer there to collide.
I want at least 48 bits of total hash key because of collisions (referring to Bob's paper).
With a 32-bit signature you already have an incredibly low collision rate. Accidental signature matches will only occur once every 4 billion probes (once every billion probes if you probe 4 entries in a bucket). At 1 Mnps that would be one false probe result every 1000 sec. That is not dependent on the size of the table, as long as the table is entirely filled. Even a 256 MB table (16M entries) would be entirely filled after 16 sec at 1 Mnps.

The only way the bigger table helps is that it takes longer to get filled. When the table is only 10% full, the fact that you find something in the entry your probe maps to makes it 90% certain that this must be the sought position, as there is only a 10% probability that another position accidentally mapped there too. (And then the signature must do the rest.) But that only improves the collision rate by a factor of 10, i.e. ~3 bits, no matter how many bits there were in the index. The index bits only reduce collision rates by as much as their number suggests when there is only a single other entry in the table.
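These rates follow from simple arithmetic; a quick sketch using the constants from this discussion:

```c
#include <stdint.h>

/* With a full table, an s-bit signature, and N entries probed per bucket,
 * expect one accidental signature match per 2^s / N probes. */
static double probes_per_collision(unsigned sig_bits, unsigned bucket_size)
{
    return (double)((uint64_t)1 << sig_bits) / (double)bucket_size;
}

/* Convert to wall-clock time at a given search speed in nodes per second
 * (assuming roughly one probe per node). */
static double seconds_per_collision(unsigned sig_bits, unsigned bucket_size, double nps)
{
    return probes_per_collision(sig_bits, bucket_size) / nps;
}
```

At 1 Mnps this gives about 1074 s between collisions for a 32-bit signature with 4-entry buckets, and about 0.066 s (roughly 15 per second) for a bare 16-bit signature, matching the figures quoted in this thread.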

What I intend to do is sweep through the table before each move and delete every entry with the oldest modulo counter. So: take the current counter (i.e. the one that the upcoming search is about to use), add one (modulo 4), and delete every entry that has this counter. This spares the overhead when hash probing in the search. I will have to benchmark, however, whether that approach becomes a killer with 1 GB hash.
An interesting side fact, I did some benchmarking and profiling for the pawn hash table to get real numbers:
Sounds good, but the pawn hash is a completely different matter from the TT.
Ras
Posts: 2488
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: cutechess-cli: not restarting an engine because of tt

Post by Ras »

hgm wrote:because all the overwritten entries that could have given you a collision are no longer there to collide.
I'm not sure whether this logic really applies. Let's take the (theoretical) extreme case of making the table so big that the index is 64 bits wide - then no collisions at all can happen that are not already rooted in identical position hashes. Those, of course, also exist with smaller signatures, but that's a different issue, one of the hash function itself.

With such a big table, the signature would be enough with 0 bits because all the information is already in the index. Conversely, if the index has 0 bits, i.e. if the table has only one entry, then the same safety for probe collisions can be reached if the signature has 64 bits.

So it does seem to depend on the table size, and from the extreme cases I think it's plausible to assume that the effective length is index bits plus signature bits. That is, unless there are good arguments why the extreme cases don't apply because things in between follow a completely different logic.
With a 32-bit signature you already have an incredibly low collision rate.
That's not what I got from Bob's paper: "In summary, 32 bit signatures produced wrong moves and scores, while 64 bits did not. An “in-between” value of 48 bits also proved to be just as safe as 64"

Although his main point that Crafty would crash with an illegal move doesn't apply for me because the hash move isn't executed; it's only matched against the move list after the generation. Maybe you remember that discussion where you pointed out that this isn't the most efficient way. However, I just can't get myself to knowingly code something that might crash.
Accidental signature matches will only occur once every 4 billion probes. (Once every billion probes if you probe 4 entries in a bucket.) At 1Mnps that would be one false probe result every 1000 sec.
Then with Stockfish's 16-bit signature, this would happen once in 65536 probes, so at 1 Mnps that's every 65 ms, or 15 times per second?! I can't imagine that this could work, unless they are using some additional clever trick.
Sounds good, but the pawn hash is a completely different matter from the TT.
Yeah, of course; it only showed up because an additional 16k won't allow increasing the hash tables, just the pawn table - so I first benchmarked what effect I would get out of that.
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: cutechess-cli: not restarting an engine because of tt

Post by cdani »

Ras wrote: Then with Stockfish's 16 bit signature, this would happen once in 65536 probes, so at 1 Mnps that's every 65ms, or 15 times per second?! I can't imagine that this could work, except if they are using some additional clever trick.
Not really any trick, just validation of the move. The search seems to be robust enough to absorb such inaccuracies.
Similarly, in Andscacs I have a random prune that, by saving some work, wins more than it loses.
Ras
Posts: 2488
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: cutechess-cli: not restarting an engine because of tt

Post by Ras »

cdani wrote:Not really any trick, just validation of the move.
I've implemented a basic sanitiser, so if there's e.g. a hash move where a bishop doesn't move diagonally, I treat the whole lookup as failed. That's an additional source of redundancy.
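For the bishop example, such a plausibility check boils down to a file/rank comparison; a sketch assuming 0..63 square indexing (file = sq & 7, rank = sq >> 3), which may differ from any particular engine's board layout:

```c
#include <stdlib.h>

/* A hash move claiming to be a bishop move is plausible only if the
 * from- and to-squares lie on a common diagonal (and actually differ). */
static int bishop_move_plausible(int from, int to)
{
    int df = abs((to & 7) - (from & 7));
    int dr = abs((to >> 3) - (from >> 3));
    return df == dr && df != 0;
}
```

If the check fails, the whole lookup is treated as a miss, as described above.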

But there's still the old "quod licet Iovi, non licet bovi". If you are among the top 10, which Stockfish is as well as Andscacs, then the rules are somewhat different. You can make trade-offs that might even lead to occasional crashes if that buys you more wins than the crashes cost you. It does make a difference whether you are No. 7 or No. 3. Even a crash at that level can be a calculated sacrifice.

But for other engines, this should not be a consideration because it doesn't make a difference whether you are at No. 261 or 257. A crash on that level is just crap. :-)
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: cutechess-cli: not restarting an engine because of tt

Post by cdani »

Ras wrote:
cdani wrote:Not really any trick, just validation of the move.
I've implemented a basic sanitiser, so if there's e.g. a hash move where a bishop doesn't move diagonally, I treat the whole lookup as failed. That's an additional source of redundancy.

But there's still the old "quod licet Iovi, non licet bovi". If you are among the top 10, which Stockfish is as well as Andscacs, then the rules are somewhat different. You can make trade-offs that might even lead to occasional crashes if that buys you more wins than the crashes cost you. It does make a difference whether you are No. 7 or No. 3. Even a crash at that level can be a calculated sacrifice.

But for other engines, this should not be a consideration because it doesn't make a difference whether you are at No. 261 or 257. A crash on that level is just crap. :-)
:-) Excluding the crashes, which I don't like at all and avoid, I understand what you mean. Sure, I sacrifice some things to win a very little, as many times that is the only way I can find to advance.
hgm
Posts: 27807
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: cutechess-cli: not restarting an engine because of tt

Post by hgm »

Ras wrote:I'm not sure whether this logic really applies. Let's take the (theoretical) extreme case of making the table so big that the index is 64 bits wide - then no collisions at all can happen that are not already rooted in identical position hashes. Those, of course, also exist with smaller signatures, but that's a different issue, one of the hash function itself.
The point is that if a table that big is overloaded, you will have many full-key collisions. If such a table is 10% filled with other positions, every new position you add has a 10% chance of finding the entry it maps to already occupied by another position, with which it is then confused, because you have no independent key bits left for a signature that could be used to reject the hit. So 10% of your probes would be collisions.

It is not the total size of the table that helps in avoiding collisions. It is its filling fraction.
With such a big table, the signature would be enough with 0 bits because all the information is already in the index. Conversely, if the index has 0 bits, i.e. if the table has only one entry, then the same safety for probe collisions can be reached if the signature has 64 bits.
Well, a 1-entry table is always 100% filled.
That's not what I got from Bob's paper: "In summary, 32 bit signatures produced wrong moves and scores, while 64 bits did not. An “in-between” value of 48 bits also proved to be just as safe as 64"
Are you sure that this statement applied to the root? It could have meant "somewhere in the tree", to judge the damage crashing on a wrong hash move would do.

I don't see how, with a signature of 32 bits and a bucket of N entries, you could get more than one collision per 2^32/N probes. Assuming N=4 for Crafty, that means one in a billion probes. Bob must have tested for a very long time even to get one collision somewhere in the tree. And the erroneous score obtained from it would hardly ever propagate to the root. I think the paper also mentions that to affect the root move he had to intentionally spoil about 1 in 1000 hash probes (by returning a random score).
Although his main point that Crafty would crash with an illegal move doesn't apply for me because the hash move isn't executed; it's only matched against the move list after the generation. Maybe you remember that discussion where you pointed out that this isn't the most efficient way. However, I just can't get myself to knowingly code something that might crash.
Well, if even a single wrong hash move could crash the engine, then 1 collision per 1000 sec is unacceptably high. So what is an acceptable collision rate depends on how likely it is that a random move as hash move would crash your engine. Matching against the generated move list completely excludes that, so you would only start to see effects on the root move with signatures of 10 bits or less.

On the point of efficiency: complete vetting of the hash move is hardly ever necessary to exclude crashes. In most implementations it does not matter at all whether a Bishop moves like a Rook or a Knight, or jumps over pieces that should have blocked it. All of these could have been valid moves in some Chess variant with fairy pieces, which would employ exactly the same board representation and MakeMove(), just a different move generator. If your engine uses pseudo-legal move generation, it should also be resistant to an invalid hash move leaving you in check. Things that have caused trouble for me in the past are:
* Moving with an enemy piece (because it could put myself in check in a way the normal check test does not detect)
* Moving with an empty square
* Capturing a King (later check tests would access off-board squares)
* e.p. capture of a King
* Promoting non-Pawns
* Capturing pieces with a castling
* Castling when the Rook is not there
To combat this I test whether there is an own piece on the from-square of the hash move, my own King on the to-square (moves that capture an enemy King are not harmful in my pseudo-legal scheme), or a special move. Only in the latter case do I test for the piece being a Pawn, or, for castling, whether the corresponding castling right existed. And I make my MakeMove-UnMake handle capturing castlings correctly, by also saving and restoring the 'Rook victim'. Any King victim would automatically be handled correctly, as in normal moves. A similar precaution is taken with e.p. captures, where both the content of the e.p. square and the to-square are saved and restored (the to-square as always, and the e.p. square being tested for my own King). This 'partial vetting' of hash moves, in combination with a robust MakeMove-UnMake, is far cheaper than generating all moves and searching through them to find the hash move.
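Reduced to its bare bones, such partial vetting might look like this on a hypothetical mailbox board (positive values = own pieces, negative = enemy, 0 = empty; the special-move checks for Pawns and castling rights described above are omitted):

```c
enum { EMPTY = 0, OWN_PAWN = 1, OWN_KING = 6 };

/* Cheap pre-checks on a hash move before trusting it: the from-square must
 * hold an own piece, and the to-square must not hold my own King. Everything
 * else is left to a MakeMove/UnMake robust enough to absorb nonsense moves. */
static int hash_move_vetted(const int *board, int from, int to)
{
    if (board[from] <= 0)
        return 0;                 /* empty square or enemy piece */
    if (board[to] == OWN_KING)
        return 0;                 /* would capture my own King */
    return 1;
}
```

Only two array reads per hash move, versus a full move generation plus list search.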

Another problem I encountered was when I recorded the in-check status in the TT. This saves you the trouble of doing the in-check test before trying the hash move, in order to know whether you should apply a check extension. But on a collision you might falsely think you are in check, and in combination with a true check in the reply, I sometimes ran into infinitely extended branches.

Apart from partial vetting of the hash move, which can be very simple, there is also the need to determine whether killers are pseudo-legal. This requires more rigor, as it happens very frequently that a move from a sibling node is not legal in the current node (because it is now blocked, or the piece that moved was captured in the preceding ply, so an opponent's piece now stands in its place). I also like to do that before generating all moves.
Then with Stockfish's 16 bit signature, this would happen once in 65536 probes, so at 1 Mnps that's every 65ms, or 15 times per second?! I can't imagine that this could work, except if they are using some additional clever trick.
You know how it works with Stockfish: if it doesn't improve Elo, it doesn't go in. The extra entries that you can afford because of the smaller signature apparently help more than the 1-in-64K collision rate hurts.