Problems with TT, sometimes makes blunder moves

hgm · Post by **hgm** » Sat Apr 15, 2017 12:18 am

Sven Schüle wrote:
hgm wrote:You mean you are unable to see why the code I shown above would work?

That indeed seems like a problem...
I am certainly able to understand C code, even if it looks like that:
Code: Select all
int slot=0;
HashEntry *entry = hashTable + &#40;key & hashMask&#41;;
if&#40;entry&#91;slot&#93;.lock == key || entry&#91;slot^=1&#93;.lock == key ||
   entry&#91;slot^=2&#93;.lock == key || entry&#91;slot^=1&#93;.lock == key&#41; &#123; // hit
  entry += slot;
  ...
&#125; else &#123; // miss
  slot = -1;
&#125;
I simply don't like it that way. A simple loop does the same since the compiler will unroll it due to the number of slots being a compile time constant. Updating the slot variable with some XORs may be a nice hacker trick but why invest energy if an obvious solution is available?
Code: Select all
inline HashEntry * ttProbe&#40;uint64_t key&#41; &#123;
    HashEntry * entry = hashTable + &#40;key & hashMask&#41;;
    for &#40;int slot = 0; slot < 4; slot++) &#123;
        if &#40;entry&#91;slot&#93;.lock == key&#41; &#123;
            return &&#40;entry&#91;slot&#93;);
        &#125;
    &#125;
    return 0;
&#125;

Well, for one, you would return no information in case of a miss, so that on storing you would have to recalculate the etry address.

Secondly, since you seem so bent on being able to easily modify the code in every possible way (a grave violation of the maximum-flexibility-minimum-usefullness principle, btw), what if the distance between the slots isn't always the same. E.g. if I would have 4 slots in a bucket, and want to store in slot, slot^1 (replacing the lowest depth if the new entry is deeper) or slot+2 (the always-replace backup)? Whether you explicitly unroll loops of 2 or 3 iterations is largely a matter of taste, and it is definitely more flexible to explicitly unroll them, as that allows probing of other patterns than just sequential.

That being said, it still doesn't have any bearing on whether introducing a intermediate level of organization in the hash table (multi-entry buckets) is a good idea. The unstructured set of entries, (with the mask seeing to it all primary hits go to a multiple of 4) could also be probed with a loop:

Code: Select all

HashEntry *entry = hashTable + &#40;key & mask&#41;;
for&#40;slot=3; slot>=0; slot--) if&#40;entry&#91;slot&#93;.lock == key&#41; break;
if&#40;slot >= 0&#41; &#123; // hit
  entry += slot;
  ...
&#125;

The ^= stepping through the entries is not even needed here; it is only needed if the primary hit can come on any entry, rather than just every 4th entry, and you still want to guarantee you will not cross a cache-line boundary during probing. This can have a real advantage, e.g. in combination which aging, where you store in the primary hit location when that contained an obsolete entry. That will drive up the number of 'first time hits' in the loop on probing. If you just calculate the bucket address, you don't automatically single out a 'primary entry', and you would need extra code to derive it from the hash key. Complicating the code, deminishing the readability, etc.

Your position seems perilously close to:
* if it is fast and efficient code, it is a 'hackers trick'
* if it is cumbersome and inefficiet code, it is 'software developement'

tttony · Post by **tttony** » Sat Apr 15, 2017 12:56 am

Here is the log with full pv , is not activated vs engines because of a bug that rewrites the first pv move and sometimes has a blunder move

Code: Select all

FEN&#58; 8/3kP3/8/3K4/8/8/8/8 w - - 0 11 

Skiull 0.2 x64 popcnt&#58;
   1	00&#58;00	          20	20	+0,60	e7-e8Q+
   2	00&#58;00	          40	40	+5,60	e7-e8R Kd7-c7
   3	00&#58;00	         137	137	+0,60	e7-e8R Kd7xe8 Kd5-d4
   4	00&#58;00	         240	240	+0,60	e7-e8R Kd7xe8 Kd5-e6 Ke8-d8
   5	00&#58;00	         756	756	+1,84	Kd5-e5 Kd7-e8 Ke5-d6 Ke8-f7 Kd6-d7
   6	00&#58;00	       1.441	1.441	+0,50	e7-e8Q+ Kd7xe8 Kd5-e6 Ke8-f8 Ke6-d6 Kf8-f7
   7	00&#58;00	       9.273	9.273	+9,95	Kd5-e5 Kd7-c7 Ke5-f6 Kc7-d7 Kf6-f7 Kd7-d6 e7-e8Q
   8	00&#58;00	      17.764	17.764	+0,50	e7-e8Q+ Kd7xe8 Kd5-e6 Ke8-d8 Ke6-d6 Kd8-c8 Kd6-c6 Kc8-d8
   9	00&#58;00	      28.625	28.625	+3,45	e7-e8N Kd7-e7 Ne8-c7 Ke7-f6 Kd5-e4 Kf6-g5 Nc7-d5 Kg5-g4 Ke4-d4
  10	00&#58;00	     923.024	9.924.989	+M4	e7-e8Q+ Kd7-c7 Qe8-e7+ Kc7-b8 Qe7-d7 Kb8-a8 Qd7-a7+
  11	00&#58;00	  11.270.723	11.477.315	+M4	e7-e8Q+ Kd7-c7 Qe8-e7+ Kc7-b8 Qe7-d7 Kb8-a8 Qd7-a7+
  12	00&#58;03	  32.303.197	10.406.957	+M4	e7-e8Q+ Kd7-c7 Qe8-e7+ Kc7-b8 Kd5-c6 Kb8-a8 Qe7-a7+
  13	00&#58;24	 289.900.041	11.702.258	+M4	e7-e8Q+ Kd7-c7 Qe8-b5 Kc7-c8 Kd5-d6 Kc8-d8 Qb5-d7+
  14	01&#58;20	 840.697.981	10.421.704	+M4	e7-e8Q+ Kd7-c7 Kd5-c5 Kc7-b7 Qe8-d8 Kb7-a6 Qd8-b6+

What if I test adding a char fen[256] to the hash_entry_s struct, when saving to the TT it saves the current fen and when probing check if the fen it's equal to the current position

tttony · Post by **tttony** » Sat Apr 15, 2017 1:11 am

Now it's getting weird, I use VS 2013 and the debug build does not show the mate

I added the fen field and in this position no problem

Code: Select all

Skiull 0.2 x64 popcnt
Compiled&#58; Apr 14 2017 19&#58;05&#58;36
uci
id name Skiull 0.2 x64 popcnt
id author Tony Soares
uciok
position fen 8/3kP3/8/3K4/8/8/8/8 w - - 0 11
go
info depth 1 score cp 60 nodes 20 qnodes 17 time 0 nps 20 pv e7e8q
info depth 2 score cp 60 nodes 40 qnodes 27 time 0 nps 40 pv e7e8q d7e8
info depth 3 score cp 60 nodes 105 qnodes 72 time 0 nps 105 pv e7e8q d7e8 d5d4
info depth 4 score cp 60 nodes 208 qnodes 110 time 16 nps 13000 pv e7e8q d7e8 d5
e6 e8d8
info depth 5 score cp 60 nodes 471 qnodes 276 time 16 nps 29438 pv e7e8q d7e8 d5
e6 e8d8 e6d5
info depth 6 score cp 50 nodes 864 qnodes 401 time 16 nps 54000 pv e7e8q d7e8 d5
e6 e8f8 e6d6 f8f7
info depth 7 score cp 60 nodes 1657 qnodes 818 time 16 nps 103563 pv e7e8q d7e8
d5e6 e8f8 e6f6 f8e8 f6e5
info depth 8 score cp 50 nodes 2747 qnodes 1088 time 16 nps 171688 pv e7e8q d7e8
 d5e6 e8d8 e6d6 d8c8 d6c6 c8d8
info depth 9 score cp 50 nodes 4751 qnodes 2062 time 31 nps 153258 pv e7e8q d7e8
 d5e6 e8d8 e6d6 d8e8 d6e6 e8d8
info depth 10 score cp 50 nodes 7212 qnodes 2505 time 31 nps 232645 pv e7e8q d7e
8 d5e6 e8d8 e6d6 d8e8 d6e6 e8d8
info depth 11 score cp 50 nodes 11844 qnodes 4528 time 47 nps 252000 pv e7e8q d7
e8 d5e6 e8d8 e6d6 d8e8 d6e6 e8d8
info depth 12 score cp 40 nodes 19791 qnodes 6322 time 47 nps 421085 pv e7e8q d7
e8 d5e6 e8f8 e6f6 f8g8 f6g6 g8h8 g6f6 h8h7 f6f5 h7h6
info depth 13 score cp 40 nodes 30800 qnodes 10301 time 62 nps 496774 pv e7e8q d
7e8 d5e6 e8d8 e6d6 d8e8 d6e6 e8f8
info depth 14 score cp 40 nodes 45885 qnodes 13390 time 94 nps 488138 pv e7e8q d
7e8 d5e6 e8d8 e6d6 d8c8 d6c6 c8b8 c6b6 b8a8 b6a6 a8b8 a6b6 b8c8
info depth 15 score cp 40 nodes 70455 qnodes 21688 time 109 nps 646376 pv e7e8q
d7e8 d5e6 e8d8 e6d6 d8c8 d6c6 c8d8 c6d6 d8e8 d6e6 e8d8
info depth 16 score cp 40 nodes 105885 qnodes 29985 time 156 nps 678750 pv e7e8q
 d7e8 d5e6 e8d8 e6d6 d8c8 d6c6 c8b8 c6b6 b8c8 b6c6 c8d8
info depth 17 score cp 40 nodes 164047 qnodes 48084 time 203 nps 808113 pv e7e8q
 d7e8 d5e6 e8d8 e6d6 d8c8 d6c6 c8b8 c6b6 b8c8 b6c6 c8d8 c6d6 d8e8 d6c5 e8d7 c5d4

info depth 18 score cp 40 nodes 250324 qnodes 67988 time 296 nps 845689 pv e7e8q
 d7e8 d5e6 e8d8 e6d6 d8c8 d6c6 c8b8 c6b6 b8a8 b6a6 a8b8 a6b6 b8a8 b6a6 a8b8 a6b6
 b8c8
info depth 19 score cp 40 nodes 395242 qnodes 111666 time 421 nps 938817 pv e7e8
q d7e8 d5e6 e8d8 e6d6 d8c8 d6c6 c8b8 c6b6 b8c8 b6c6 c8d8 c6d6 d8e8 d6e6 e8d8
info depth 20 score cp 40 nodes 647563 qnodes 174931 time 671 nps 965072 pv e7e8
q d7e8 d5e6 e8d8 e6d6 d8c8 d6c6 c8b8 c6b6 b8c8 b6c6 c8d8 c6d6 d8e8 d6e6 e8d8
info depth 21 score cp 40 nodes 1052391 qnodes 295909 time 1030 nps 1021739 pv e
7e8q d7e8 d5e6 e8d8 e6d6 d8c8 d6c6 c8b8 c6b6 b8a8 b6a6 a8b8 a6b6 b8c8 b6c6 c8d8
c6d6 d8e8 d6c5 e8d7 c5d5
info depth 22 score cp 40 nodes 1632093 qnodes 434728 time 1591 nps 1025828 pv e
7e8q d7e8 d5e6 e8d8 e6d6 d8c8 d6c6 c8b8 c6b6 b8a8 b6a6 a8b8 a6b6 b8a8 b6a6 a8b8
a6b6 b8a8 b6a6 a8b8 a6b6 b8c8
info depth 23 score cp 55 nodes 2280940 qnodes 615361 time 2200 nps 1036791 pv e
7e8q d7e8 d5e6 e8d8 e6d6 d8c8 d6c6 c8b8 c6b6 b8a8 b6a6 a8b8 a6b6 b8a8 b6a6 a8b8
a6b6 b8a8 b6a6 a8b8 a6b6 b8c8 b6c5
info depth 24 score cp 40 nodes 3631661 qnodes 915185 time 3510 nps 1034661 pv e
7e8q d7e8 d5e6 e8f8 e6f6 f8g8 f6g6 g8h8 g6h6 h8g8 h6g6 g8h8 g6h6 h8g8 h6g6 g8h8
g6h6 h8g8 h6g6 g8h8 g6h6 h8g8 h6g6 g8f8
info depth 25 score cp 40 nodes 5579385 qnodes 1429416 time 5366 nps 1039766 pv
e7e8q d7e8 d5e6 e8d8 e6d6 d8c8 d6c6 c8b8 c6b6 b8a8 b6a6 a8b8 a6b6 b8a8 b6a6 a8b8
 a6b6 b8c8 b6c6 c8d8 c6d6 d8e8 d6c5 e8d7 c5d5
stop
bestmove e7e8q

Sven · Post by **Sven** » Sat Apr 15, 2017 1:16 am

hgm wrote:Your position seems perilously close to:
* if it is fast and efficient code, it is a 'hackers trick'
* if it is cumbersome and inefficiet code, it is 'software developement'

Go on in your style. It is good for you but not my style. Just please don't teach me in programming.

tttony · Post by **tttony** » Sat Apr 15, 2017 4:09 am

hgm wrote:Well, first it would help if you would print a complete PV, rather than just the first move. Then you can see in which position it makes the wrong move that allows pomotion. Or is that because the PV is cut by a TT hit?

If so, you should print the key of the hit at the end of the PV. Then you can test in the program where you store with that key, and print the position there. It seems the engine is mixing up positions. Are you sure your hash keys are OK (and not zero for a large part)?

I have created this function:

Code: Select all

void write_tree&#40;char *msg, uint8_t root_depth, uint8_t depth, uint8_t ply, int32_t move, uint64_t key, int32_t score, int32_t alpha, int32_t beta, uint8_t side&#41;
&#123;
	FILE *f = fopen&#40;"skiull_tree.log", "a+");
	if &#40;f == NULL&#41; &#123;
		fprintf&#40;stderr, "Error opening the log file\n");
		return;
	&#125;

	char buf&#91;4096 * 2&#93;;
	sprintf&#40;buf, "%2d&#58; %2d/%2d, %5s &#40;0x%016" PRIX64 "), score&#58; %6d, alpha&#58; %6d, beta&#58; %6d, %6s, %s%s\n", 
		root_depth, depth, ply, move == 0 ? "0" &#58; MoveToStr&#40;move&#41;, key, score, alpha, beta, side == 1 ? "WHITE" &#58; "BLACK", score >= beta ? "BCO, " &#58; "   , ", msg&#41;;

	fprintf&#40;f, buf&#41;;

	fclose&#40;f&#41;;
&#125;

I will put this function before UnMakeMove() and if (tt_probe()) { write_tree() }, I have to test it in the release build because in the debug it does not shows the mate

Dann Corbit · Post by **Dann Corbit** » Sat Apr 15, 2017 4:45 am

file open and close are very slow.

I would put the file open on program startup and the file close on program exit.

hgm · Post by **hgm** » Sat Apr 15, 2017 10:16 am

Sven Schüle wrote:Go on in your style. It is good for you but not my style. Just please don't teach me in programming.

I am not trying to teach you anything. It is you who are trying to teach other people things, and I am just warning those other people that these things are counter-productive...

It is also not a matter of 'style', but a matter of performance. That the more performant code is also simpler and more easy to understand is just an additional plus.

hgm · Post by **hgm** » Sat Apr 15, 2017 10:24 am

tttony wrote:Here is the log with full pv , is not activated vs engines because of a bug that rewrites the first pv move and sometimes has a blunder move

Code: Select all

FEN&#58; 8/3kP3/8/3K4/8/8/8/8 w - - 0 11 

Skiull 0.2 x64 popcnt&#58;
   1	00&#58;00	          20	20	+0,60	e7-e8Q+
   2	00&#58;00	          40	40	+5,60	e7-e8R Kd7-c7
   3	00&#58;00	         137	137	+0,60	e7-e8R Kd7xe8 Kd5-d4
   4	00&#58;00	         240	240	+0,60	e7-e8R Kd7xe8 Kd5-e6 Ke8-d8
   5	00&#58;00	         756	756	+1,84	Kd5-e5 Kd7-e8 Ke5-d6 Ke8-f7 Kd6-d7
   6	00&#58;00	       1.441	1.441	+0,50	e7-e8Q+ Kd7xe8 Kd5-e6 Ke8-f8 Ke6-d6 Kf8-f7
   7	00&#58;00	       9.273	9.273	+9,95	Kd5-e5 Kd7-c7 Ke5-f6 Kc7-d7 Kf6-f7 Kd7-d6 e7-e8Q
   8	00&#58;00	      17.764	17.764	+0,50	e7-e8Q+ Kd7xe8 Kd5-e6 Ke8-d8 Ke6-d6 Kd8-c8 Kd6-c6 Kc8-d8
   9	00&#58;00	      28.625	28.625	+3,45	e7-e8N Kd7-e7 Ne8-c7 Ke7-f6 Kd5-e4 Kf6-g5 Nc7-d5 Kg5-g4 Ke4-d4
  10	00&#58;00	     923.024	9.924.989	+M4	e7-e8Q+ Kd7-c7 Qe8-e7+ Kc7-b8 Qe7-d7 Kb8-a8 Qd7-a7+
  11	00&#58;00	  11.270.723	11.477.315	+M4	e7-e8Q+ Kd7-c7 Qe8-e7+ Kc7-b8 Qe7-d7 Kb8-a8 Qd7-a7+
  12	00&#58;03	  32.303.197	10.406.957	+M4	e7-e8Q+ Kd7-c7 Qe8-e7+ Kc7-b8 Kd5-c6 Kb8-a8 Qe7-a7+
  13	00&#58;24	 289.900.041	11.702.258	+M4	e7-e8Q+ Kd7-c7 Qe8-b5 Kc7-c8 Kd5-d6 Kc8-d8 Qb5-d7+
  14	01&#58;20	 840.697.981	10.421.704	+M4	e7-e8Q+ Kd7-c7 Kd5-c5 Kc7-b7 Qe8-d8 Kb7-a6 Qd8-b6+

What if I test adding a char fen[256] to the hash_entry_s struct, when saving to the TT it saves the current fen and when probing check if the fen it's equal to the current position

This should be very easy to track down, as you already have a major blunder in the 2-ply search, where only 40 nodes are searched. So you can afford to dump the entire tree. The thing you are interested in is why it plays Kc7 in the 2nd ply (after e8=R), rather than Kxe8. So printing the score of the moves there, and tracing how it obtained those scores, should tell you what the problem is.

Sven · Post by **Sven** » Sat Apr 15, 2017 12:01 pm

hgm wrote:
Sven Schüle wrote:Go on in your style. It is good for you but not my style. Just please don't teach me in programming.
I am not trying to teach you anything. It is you who are trying to teach other people things, and I am just warning those other people that these things are counter-productive...

It is also not a matter of 'style', but a matter of performance. That the more performant code is also simpler and more easy to understand is just an additional plus.

No, it was you who wrote this:

hgm wrote:But why would you want to change it? Best is usual to design things as they must be, and then leave them alone forever. Designing code with the purpose to change it seems a very roundabout way of doing things.

That is simply an incredibly wrong attitude, period. I sincerely hope that there are less than 10% of all programmers/SW developers thinking that way.

The crappy and inflexible "entry[slot ^= 1]" code which was also your proposal is not faster than a simple loop-based code. It is inflexible because it does not allow to do other things between two subsequent entry accesses, e.g. to check for an empty entry (lock == 0) which is what you typically need.

And if you don't know what is productive and what is counter-productive then it is not my fault, also not my fault that you seem to be still living in the 70's programming-wise, also that you seem to prefer writing unmaintainable code (you wrote youself that you don't want to maintain code, you just want to write it once so that it works

).

Bye.

hgm · Post by **hgm** » Sat Apr 15, 2017 1:09 pm

Sven Schüle wrote:
hgm wrote:But why would you want to change it? Best is usual to design things as they must be, and then leave them alone forever. Designing code with the purpose to change it seems a very roundabout way of doing things.
That is simply an incredibly wrong attitude, period. I sincerely hope that there are less than 10% of all programmers/SW developers thinking that way.

That could be your opinion. But I am sure that the overwhelming majority of developers would see that it makes to sense to make code extra cumbersome and unreadable just to make it easier for others to wreck the performance of it later...

The crappy and inflexible "entry[slot ^= 1]" code which was also your proposal is not faster than a simple loop-based code. It is inflexible because it does not allow to do other things between two subsequent entry accesses, e.g. to check for an empty entry (lock == 0) which is what you typically need.

Ah, you never heard of the && operator? This allows me to do anything in between. Obviously unrolling a small loop is always more flexible, as it doesn't force you to do the same thing on every iteration. The code I presented actually did make use of that, as in one place it does slot^=2 instead of slot ^= 1. I could have done slot++, and it would have been equivalent to a simple incrementing loop. I can step through the entries in any conceivable way. That is what flexibility means.

Not sure why you think "lock == 0" would be relevant during probing anyway. Or in fact whether you would bother at all, as after the first few seconds hash tables are completely filled anyway. So anything that tests for lock == 0 (which is a perfectly valid hash key, btw) just seems completely useless overhead, even if you do it in the replacement code, rather than in the probing code.

Btw, even if you have a more rational method of determining whether an entry is 'stale', hunting for one in the current bucket is a mistake that only slows you down. It is faster to only test the location of the primary hit for being stale, and use depth testing when it isn't, to see if there is a better place for it in the bucket.

Your inflexible way of doing things with a fixed-stride loop, with a fixed start and a fixed end, convicts you to use a horribly inefficient algorithm in comparison...

And if you don't know what is productive and what is counter-productive then it is not my fault, also not my fault that you seem to be still living in the 70's programming-wise, also that you seem to prefer writing unmaintainable code (you wrote youself that you don't want to maintain code, you just want to write it once so that it works ).

Indeed. But it would be your fault if you don't know what is productive and counter productive. And as it doesn't seem you are open to any kind of reason, it is extra important that unsuspecting readers are warned about the controverial nature of your claims...

Problems with TT, sometimes makes blunder moves

Re: Problems with TT, sometimes makes blunder moves

Re: Problems with TT, sometimes makes blunder moves

Re: Problems with TT, sometimes makes blunder moves

Re: Problems with TT, sometimes makes blunder moves

Re: Problems with TT, sometimes makes blunder moves

Re: Problems with TT, sometimes makes blunder moves

Re: Problems with TT, sometimes makes blunder moves

Re: Problems with TT, sometimes makes blunder moves

Re: Problems with TT, sometimes makes blunder moves

Re: Problems with TT, sometimes makes blunder moves