transpositiontable, zobrist and > 15% collisions

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: transpositiontable, zobrist and > 15% collisions

Post by matthewlai »

xmas79 wrote: I know, and I'm sorry for not being clear in advance. I didn't say this is the problem, what I really meant is that I generally tend to avoid code patterns that could lead to silly mistakes, and this belongs to one of these IMHO.
What kind of silly mistakes?

I would say the way you have coded it is much more dangerous.

Imagine the user probing, getting a pointer, does another search (for example, a null move search) before using that entry. The search could have overwritten that entry, and now the entry is invalid for the original position.

This becomes an even bigger problem once you start doing multi-threaded lookups.

Making a copy of the entry is the safer option. The copy is never going to change from under your feet. What is dangerous about that?
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: transpositiontable, zobrist and > 15% collisions

Post by xmas79 »

matthewlai wrote:What kind of silly mistakes?
Something like assigning a pointer in the structure to a local variable (common when working with strings).
matthewlai wrote:I would say the way you have coded it is much more dangerous.

Imagine the user probing, getting a pointer, does another search (for example, a null move search) before using that entry. The search could have overwritten that entry, and now the entry is invalid for the original position.
Why would you do something else before trying to use the hash results? When I get a hash hit I check if I can immediately take a cut-off, if not I try the hash move, and from this moment onward I don't look at this hash entry anymore. So am I missing something?
matthewlai wrote:This becomes an even bigger problem once you start doing multi-threaded lookups.

Making a copy of the entry is the safer option. The copy is never going to change from under your feet. What is dangerous about that?
This is a good point, I'll remember (I hope...) when my engine will be at the multithread stage!
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: transpositiontable, zobrist and > 15% collisions

Post by matthewlai »

xmas79 wrote:Something like assigning a pointer in the structure to a local variable (common when working with strings).
C++ strings (as well as all standard containers) can handle that just fine. They have proper copy constructors.

Returning something that contains a pointer to a local variable is always a problem, and it has nothing to do with returning copies of structs.

You could have also assigned a pointer in the struct to a local variable before returning a pointer to that struct. The effect is exactly the same.
Why would you do something else before trying to use the hash results? When I get a hash hit I check if I can immediately take a cut-off, if not I try the hash move, and from this moment onward I don't look at this hash entry anymore. So am I missing something?
Yes, because you still have a copy of that pointer in scope, even though you are not allowed to touch it anymore. That is dangerous design. With very few exceptions (eg. explicit use of move semantics), in a well designed program, you should never have something in scope that you are not allowed to touch.

You may forget about this restriction in a few months, and start doing something with it.

For example, a common addition to the TT is to store whether it's worthwhile to do a null move search in this position. So you would write to the entry after doing a null move search.

You may remember to not do that now, but you probably won't in a few months time. You see a pointer, you see that it's not null, so you write to it.

And now you have an extremely subtle bug that you will probably never find. You have just set a flag in a hash table entry, that, with a 1/100000 chance, is actually for another position.

One way to get around that in this case is to never return the entry, and have the TTable code do the checking and return whether it's a hit or not. Returning a copy of the entry is safe, but potentially slower.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: transpositiontable, zobrist and > 15% collisions

Post by xmas79 »

matthewlai wrote:...
Yes, because you still have a copy of that pointer in scope, even though you are not allowed to touch it anymore. That is dangerous design. With very few exceptions (eg. explicit use of move semantics), in a well designed program, you should never have something in scope that you are not allowed to touch.
Having pointers here and there that you are not allowed to touch is done all the time. After the hash hit the pointer is useless (at least in my case) because it point to something that has been algorithmically "consumed", and there is absolutely no reason to reuse it later for something else like a null-move search.
You may forget about this restriction in a few months, and start doing something with it.

For example, a common addition to the TT is to store whether it's worthwhile to do a null move search in this position. So you would write to the entry after doing a null move search.

You may remember to not do that now, but you probably won't in a few months time. You see a pointer, you see that it's not null, so you write to it.

And now you have an extremely subtle bug that you will probably never find. You have just set a flag in a hash table entry, that, with a 1/100000 chance, is actually for another position.
Being the pointer tied to the hash key, updating the content of the TT entry by reusing the same pointer or through another method that will point to the same memory address (like the above TT implementation without multiple buckets per index) doesn't change the result. I will overwrite the same memory address. So I don't see the point. Agree that in multithread environments this is going to explode...
One way to get around that in this case is to never return the entry, and have the TTable code do the checking and return whether it's a hit or not. Returning a copy of the entry is safe, but potentially slower.
Agree.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: transpositiontable, zobrist and > 15% collisions

Post by matthewlai »

xmas79 wrote:Having pointers here and there that you are not allowed to touch is done all the time. After the hash hit the pointer is useless (at least in my case) because it point to something that has been algorithmically "consumed", and there is absolutely no reason to reuse it later for something else like a null-move search.
That's something I have not seen in high quality code. Even if it is done, it should be avoided because it's, in your words, a "code [pattern] that could lead to silly mistakes".

The pointer may be useless for now, but you may not remember the fact that it needs to be useless, and decide to use it again later, after you have completely forgotten the restriction.

There is a reason to reuse it. Conceptually, when you are reading the code, you are thinking of the pointer as a pointer to the entry corresponding to this position. If you want to write something about this position into the entry, it makes perfect sense to do it through the pointer. If you don't want to allow that, you can make the probing function return a const pointer. However, even in that case, if you decide to read something else from it after a search, it would still be buggy.

This is also very dangerous if you are not the only developer on the project. Other developers may not be aware of such a restriction, and they may decide to use it.
Being the pointer tied to the hash key, updating the content of the TT entry by reusing the same pointer or through another method that will point to the same memory address (like the above TT implementation without multiple buckets per index) doesn't change the result. I will overwrite the same memory address. So I don't see the point. Agree that in multithread environments this is going to explode...
And here you see just how easy it is to make a mistake in this case.

The pointer is not tied to the hash key (as you would conceptually expect from reading the code). It is tied to the hash slot.

By the time you access it again, it may have a different hash key already, from a different position that mapped to the same slot. The slot may now belong to another position. Anything you read from it may be wrong, and anything you write to it will overwrite something that you probably did not intend to overwrite.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: transpositiontable, zobrist and > 15% collisions

Post by Dann Corbit »

For the moment I'll rip out the incremental code and go ponder on a smarter implementation (while still being thread-safe and scalable et).
If you use Zobrist hashing, it is easy to do it incrementally.
There are lots of samples you can look at.
Almost everyone does it that way.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: transpositiontable, zobrist and > 15% collisions

Post by bob »

matthewlai wrote:
xmas79 wrote:Having pointers here and there that you are not allowed to touch is done all the time. After the hash hit the pointer is useless (at least in my case) because it point to something that has been algorithmically "consumed", and there is absolutely no reason to reuse it later for something else like a null-move search.
That's something I have not seen in high quality code. Even if it is done, it should be avoided because it's, in your words, a "code [pattern] that could lead to silly mistakes".

The pointer may be useless for now, but you may not remember the fact that it needs to be useless, and decide to use it again later, after you have completely forgotten the restriction.

There is a reason to reuse it. Conceptually, when you are reading the code, you are thinking of the pointer as a pointer to the entry corresponding to this position. If you want to write something about this position into the entry, it makes perfect sense to do it through the pointer. If you don't want to allow that, you can make the probing function return a const pointer. However, even in that case, if you decide to read something else from it after a search, it would still be buggy.

This is also very dangerous if you are not the only developer on the project. Other developers may not be aware of such a restriction, and they may decide to use it.
Being the pointer tied to the hash key, updating the content of the TT entry by reusing the same pointer or through another method that will point to the same memory address (like the above TT implementation without multiple buckets per index) doesn't change the result. I will overwrite the same memory address. So I don't see the point. Agree that in multithread environments this is going to explode...
And here you see just how easy it is to make a mistake in this case.

The pointer is not tied to the hash key (as you would conceptually expect from reading the code). It is tied to the hash slot.

By the time you access it again, it may have a different hash key already, from a different position that mapped to the same slot. The slot may now belong to another position. Anything you read from it may be wrong, and anything you write to it will overwrite something that you probably did not intend to overwrite.

You have to REALLY be careful here, because you are describing this as a solution when it is not. A hash entry is generally > 8 bytes, and any read > 8 bytes is NOT atomic on X86. So you can, even doing a read/copy, still get two parts of two different hash entries...

Making a copy only narrows the window of opportunity for problems, it comes nowhere near "closing it" completely...
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: transpositiontable, zobrist and > 15% collisions

Post by matthewlai »

bob wrote:
matthewlai wrote:
xmas79 wrote:Having pointers here and there that you are not allowed to touch is done all the time. After the hash hit the pointer is useless (at least in my case) because it point to something that has been algorithmically "consumed", and there is absolutely no reason to reuse it later for something else like a null-move search.
That's something I have not seen in high quality code. Even if it is done, it should be avoided because it's, in your words, a "code [pattern] that could lead to silly mistakes".

The pointer may be useless for now, but you may not remember the fact that it needs to be useless, and decide to use it again later, after you have completely forgotten the restriction.

There is a reason to reuse it. Conceptually, when you are reading the code, you are thinking of the pointer as a pointer to the entry corresponding to this position. If you want to write something about this position into the entry, it makes perfect sense to do it through the pointer. If you don't want to allow that, you can make the probing function return a const pointer. However, even in that case, if you decide to read something else from it after a search, it would still be buggy.

This is also very dangerous if you are not the only developer on the project. Other developers may not be aware of such a restriction, and they may decide to use it.
Being the pointer tied to the hash key, updating the content of the TT entry by reusing the same pointer or through another method that will point to the same memory address (like the above TT implementation without multiple buckets per index) doesn't change the result. I will overwrite the same memory address. So I don't see the point. Agree that in multithread environments this is going to explode...
And here you see just how easy it is to make a mistake in this case.

The pointer is not tied to the hash key (as you would conceptually expect from reading the code). It is tied to the hash slot.

By the time you access it again, it may have a different hash key already, from a different position that mapped to the same slot. The slot may now belong to another position. Anything you read from it may be wrong, and anything you write to it will overwrite something that you probably did not intend to overwrite.

You have to REALLY be careful here, because you are describing this as a solution when it is not. A hash entry is generally > 8 bytes, and any read > 8 bytes is NOT atomic on X86. So you can, even doing a read/copy, still get two parts of two different hash entries...

Making a copy only narrows the window of opportunity for problems, it comes nowhere near "closing it" completely...
Atomicity is a different problem.

The problem we are (primarily) talking about also exists in single threaded search.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

General comments

Post by sje »

General comments

1) It's a good practice to code with multithreaded safety for any resource which might be shared in present and future deployments.

2) The region locking technique for transposition tables with, say, 256 spinlocks per table, is fast and portable. If the number of regions is, say, 16 or more times the number of worker threads, then waiting for a region lock will be rare.

3) Very long time C programmers sometimes flinch at code returning a value which doesn't fit into a register. In the Old Days, returning a structure could fail at times if an interrupt occurred at the wrong time and messed with the stack. Not an easy bug to find! The only worry today is returning a pointer or reference to a temporary value; but a decent compiler will issue a warning. If you ever write a compiler, you'll see how it must be the responsibility of the caller to reserve space for a result and to never access an activation frame for a called function.

4) Operation of a transposition table should be well encapsulated. In Symbolic, all access by a search works through the call pair Probe() and Stash(). Pointer to entries are never exposed and neither is the structure of a table entry. Examples from the perft() table:

Code: Select all

  bool Probe(const Position& position, const ui draft, NodeCount& count);
  void Stash(const Position& position, const ui draft, const NodeCount count);
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: transpositiontable, zobrist and > 15% collisions

Post by bob »

matthewlai wrote:
bob wrote:
matthewlai wrote:
xmas79 wrote:Having pointers here and there that you are not allowed to touch is done all the time. After the hash hit the pointer is useless (at least in my case) because it point to something that has been algorithmically "consumed", and there is absolutely no reason to reuse it later for something else like a null-move search.
That's something I have not seen in high quality code. Even if it is done, it should be avoided because it's, in your words, a "code [pattern] that could lead to silly mistakes".

The pointer may be useless for now, but you may not remember the fact that it needs to be useless, and decide to use it again later, after you have completely forgotten the restriction.

There is a reason to reuse it. Conceptually, when you are reading the code, you are thinking of the pointer as a pointer to the entry corresponding to this position. If you want to write something about this position into the entry, it makes perfect sense to do it through the pointer. If you don't want to allow that, you can make the probing function return a const pointer. However, even in that case, if you decide to read something else from it after a search, it would still be buggy.

This is also very dangerous if you are not the only developer on the project. Other developers may not be aware of such a restriction, and they may decide to use it.
Being the pointer tied to the hash key, updating the content of the TT entry by reusing the same pointer or through another method that will point to the same memory address (like the above TT implementation without multiple buckets per index) doesn't change the result. I will overwrite the same memory address. So I don't see the point. Agree that in multithread environments this is going to explode...
And here you see just how easy it is to make a mistake in this case.

The pointer is not tied to the hash key (as you would conceptually expect from reading the code). It is tied to the hash slot.

By the time you access it again, it may have a different hash key already, from a different position that mapped to the same slot. The slot may now belong to another position. Anything you read from it may be wrong, and anything you write to it will overwrite something that you probably did not intend to overwrite.

You have to REALLY be careful here, because you are describing this as a solution when it is not. A hash entry is generally > 8 bytes, and any read > 8 bytes is NOT atomic on X86. So you can, even doing a read/copy, still get two parts of two different hash entries...

Making a copy only narrows the window of opportunity for problems, it comes nowhere near "closing it" completely...
Atomicity is a different problem.

The problem we are (primarily) talking about also exists in single threaded search.
Yes, but you presented your idea as a fix for BOTH, when it is not. That is the same sort of mistake you were accusing him of making if you think about it, the idea of having a pointer that points to something that is no longer relevant. Just copying it doesn't make it right for threads, which you suggested it would.

BTW in a single threaded search, why would you change the hash table between creating the address pointer and using the pointer? Without threads, I can't imagine how/why that would be done...