Crafty 25.6 search stability

jhaglund2
Posts: 65
Joined: Mon Jan 16, 2017 6:28 pm

Crafty 25.6 search stability

Post by jhaglund2 »

Something is causing inconsistent search results. I think there needs to be something along the lines of:

    InitializeChessBoard(tree);
    InitializeHashTables(0);
after each "setboard", or perhaps something similar to what is done in your Bench() function. Something isn't right here at the moment. This may also be the key to fixing the "new" command, which was removed because of "unknown" influences on your test results.
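A minimal sketch of the kind of reset being proposed, assuming a simple transposition table of fixed-size entries. `HashEntry`, `HashTable`, `hash_clear` and `on_setboard` are hypothetical names, not Crafty's `InitializeHashTables()`. Wiping the table before each new position means its search cannot reuse scores, bounds or stored PVs from whatever was searched before; any other state carried between searches (history or killer tables, for example, if the engine keeps them) would need the same treatment, which is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transposition-table entry; not Crafty's real layout. */
typedef struct {
    uint64_t signature;  /* full 64-bit position key; 0 means "empty" in this sketch */
    int16_t  score;
    uint8_t  depth;
    uint8_t  bound;      /* EXACT, LOWER or UPPER */
} HashEntry;

typedef struct {
    HashEntry *entries;
    size_t     count;
} HashTable;

/* Zero every entry. The cost is proportional to the table size. */
static void hash_clear(HashTable *ht) {
    memset(ht->entries, 0, ht->count * sizeof *ht->entries);
}

/* Hypothetical "setboard" hook. FEN parsing is omitted; the point is that,
   when clear_hash is set, nothing learned from earlier positions survives
   into the search of the new one. */
static void on_setboard(HashTable *ht, const char *fen, int clear_hash) {
    assert(fen != NULL);
    (void)fen;  /* ... parse fen and set up the board here ... */
    if (clear_hash)
        hash_clear(ht);
}
```

Because `hash_clear` touches every entry, its cost grows with the hash size, which is why an engine might make the clear optional rather than unconditional.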

After analyzing some positions, I noticed something is off.
Using a fixed depth of 20 (sd 20), setboard, and pondering switched off after the first "go":

sd 20
go

 20->   8.25/42.00    0.27   1. e4 Nc6 2. d4 d5 3. e5 Bf5 4. Nf3 e6 5. Bb5 Qd7 6. Nc3 O-O-O 7. O-O Be7 8. Bg5 f6 9. Bf4 Nh6 10. exf6 Bxf6
ponder off
setboard 2kr3r/pppq2pp/2n1pb1n/1B1p1b2/3P1B2/2N2N2/PPP2PPP/R2Q1RK1 w - - 0 11
go

 20->   4.06/16.29    0.41   1. Na4 a6 2. Nc5 Qe8 3. Bxc6 Qxc6 4. Re1 Nf7 5. c3 Rhg8 6. Qd2 g5 7. Ne5 Nxe5 8. Bxe5 Be7 9. b4 g4 10. Qf4 h5
setboard 2kr2r1/1pp1b3/p1q1p3/2NpBb1p/1P1P1Qp1/2P5/P4PPP/R3R1K1 w - h6 0 21
go

 20->   2.51/26.75   -0.16   1. Qe3 h4 2. a4 Bxc5 3. bxc5 h3 4. g3 Be4 5. Re2 Kb8 6. Rb2 Ka8 7. Bf4 Rdf8 8. a5 Rf7 9. Qe2 Qd7 10. Re1 Qc6
setboard k5r1/1pp2r2/p1q1p3/P1Pp4/3PbBp1/2P3Pp/1R2QP1P/4R1K1 w - - 5 31
go

 20->   2.38/16.53    0.59   1. Rb4 Re7 2. Qe3 Bc2 3. Ra1 e5 4. Bxe5 Rge8 5. Qd2 Qg6 6. Re1 Be4 7. Re2 Qf5 8. Re3 Rf7 9. Bf4 Rfe7 10. Qb2 Qc8
[d]k1q1r3/1pp1r3/p7/P1Pp4/1R1PbBp1/2P1R1Pp/1Q3P1P/6K1 w - - 13 41

 20->   1.98/27.17    0.70   1. Qa2 Qf5 2. Be5 Bg2 3. Rb2 Qd7 4. Bf4 Rxe3 5. fxe3 Qc6 6. Qb3 Rf8 7. Qb4 Rf7 8. c4 Be4 9. Rb3 Bf3 10. cxd5 Qxd5
Why would Crafty choose 9. Rb3?? in the PV line?
8. c4? Be4 9. Rb3?? dxc4, and White cannot recapture with 10. Qxc4?? because of Bd5!
I think the problem is with 8. c4.
8. c4 dxc4 9. Qxc4
Rather:
8. c4 Be4 9. cxd5 Bd5
What 9. Rb3?? does is entice Black's d-pawn to capture on c4, which removes the central pawn-bishop chain.
8. c4? avoids a drawish-looking position: it attacks the pawn and the center, activates a doubled pawn, and tries to open a diagonal for the queen after the recapture.
Instead of Rb3:
9. Qb3 dxc4 10. Qxc4 Bd5 ...
... is closer to the line Crafty was probably looking for.
Also, why doesn't Black see the blunder in the continuation 9... Bf3? 10. cxd5 Qxd5? (10... Bd5! instead)
Why are there such clear blunders in the PV?
----------------------------------------------------------------------
Now, reproducing this is another matter.
I quit the program and opened a new Crafty process:
sd 20
ponder off
[d]k1q1r3/1pp1r3/p7/P1Pp4/1R1PbBp1/2P1R1Pp/1Q3P1P/6K1 w - - 13 41
This yielded a different PV line.

20->   1.66/26.70    0.64   1. Qb3 Bf3 2. Be5 Rf8 3. Qa3 Bg2 4. Qa4 Qe8 5. Qxe8+ Rfxe8 6. c4 dxc4 7. Rxc4 Bf3 8. Rb4 Re6 9. Bxc7 Rxe3 10. fxe3 Rxe3 11. Rb2 Bd5
Why the inconsistencies?
All of the positions I looked at earlier also produce different PVs at depth 20:

20->   4.19/25.88    0.23   1. e4 Nc6 2. Nf3 e5 3. Nc3 Nf6 4. Bc4 Be7 5. O-O O-O 6. d3 d6 7. Bb3 Kh8 8. Qe2 a6 9. Be3 Ng4 10. a3 Nxe3 11. Qxe3

20->  11.66/28.97    1.18   1. Na4 Qe8 2. Nc5 Nf7 3. c4 g5 4. Qa4 a5 5. cxd5 Rxd5 6. Be3 g4 7. Nd2 h5 8. Ndb3 Nd6 9. Bxc6 Qxc6 10. Qxc6 bxc6 11. Nxa5

20->   2.42/38.10   -0.22   1. Qe3 h4 2. a4 Bxc5 3. bxc5 h3 4. g3 Be4 5. Re2 Kb8 6. Rb2 Ka8 7. Rb4 Rdf8 8. Bf4 Rf7 9. a5 Rf5 10. Re1 Rgf8

 20->   0.44/18.00    0.00   1. Qd1 Rf6 2. Re3 Bf5 3. Qb3 Rf7 4. Qb4 Be4 5. Re1 Rf5 6. Ree2 Rf6 7. Re3 Rf7 <3-fold>
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty 25.6 search stability

Post by bob »

I will look. If anything, I would suspect the PV is wrong but the score is right. I'm not sure that I clear the "PV hash table" where I store PVs that go with EXACT hash table entries. I am looking and don't see where that is done. I will look further and comment in a bit...

OK, I looked. Here's what I think is happening. The hash_path code saves a PV for EXACT positions stored in the regular hash, so that a hash hit that would normally produce a "short PV" gives you a complete one instead. The path hash table has 64K entries, which requires a 16-bit address out of the 64 bits in the hash signature. That is not very many entries, and it is possible that you get a hit and graft the wrong PV onto the end of the truncated one. That should not happen very often. That being said, I normally don't pay a lot of attention to PVs unless I am watching a live game, where consecutive positions should share EXACT entries, so on a hit they should extend the PV. It doesn't work 100% of the time, but it works pretty well. I'd accept an occasional fishy PV over all the truncated PVs I see from other programs (and from versions of Crafty prior to this addition).
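A minimal sketch of a PV side table like the one described, assuming a direct-mapped table of 64K slots indexed by the low 16 bits of the 64-bit signature. `PathEntry`, `path_store`, `path_probe` and `MAX_PV` are hypothetical; this is not Crafty's hash_path code. If a probe trusted the 16 index bits alone, any other position landing in the same slot would hand back its own PV; storing and comparing the full signature, as below, turns that kind of collision into a lost PV rather than a wrong one.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PATH_BITS    16
#define PATH_ENTRIES (1u << PATH_BITS)   /* 65,536 slots */
#define PATH_MASK    (PATH_ENTRIES - 1u)
#define MAX_PV       32

/* Hypothetical PV side table; layout and names are illustrative only. */
typedef struct {
    uint64_t signature;       /* full key of the position this PV belongs to */
    int      length;          /* 0 = empty slot */
    uint16_t moves[MAX_PV];   /* encoded PV moves */
} PathEntry;

static PathEntry path_table[PATH_ENTRIES];

/* Direct-mapped, always-replace: only the low 16 bits choose the slot, so two
   positions whose keys agree in those bits overwrite each other. */
static void path_store(uint64_t sig, const uint16_t *pv, int len) {
    assert(len >= 0);
    if (len > MAX_PV)
        len = MAX_PV;
    PathEntry *e = &path_table[sig & PATH_MASK];
    e->signature = sig;
    e->length = len;
    memcpy(e->moves, pv, (size_t)len * sizeof *pv);
}

/* Copy the stored PV into out[] (room for MAX_PV moves) and return its length,
   or 0 on a miss. The full-signature check rejects a slot that now belongs to
   a different position instead of grafting that position's PV. */
static int path_probe(uint64_t sig, uint16_t *out) {
    const PathEntry *e = &path_table[sig & PATH_MASK];
    if (e->length == 0 || e->signature != sig)
        return 0;
    memcpy(out, e->moves, (size_t)e->length * sizeof *out);
    return e->length;
}
```

For example, two signatures that both end in 0x0001 share a slot: after the second one is stored, probing the first returns 0.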

I don't clear the hash table at all, since (a) 64 bits ought to be safe enough and (b) clearing hash entries can be quite a time-consuming operation on a large-memory/large-hash-table/NUMA box.
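A rough back-of-the-envelope check of point (a), assuming the full 64-bit signature is stored and compared, signatures behave like uniform random numbers, each probe compares against k = 4 entries, and the search makes about 10^8 probes per second. These are assumed figures for illustration, not measurements of Crafty:

```latex
P_{\text{false}} \approx \frac{k}{2^{64}} = \frac{4}{1.84 \times 10^{19}} \approx 2.2 \times 10^{-19} \quad \text{per probe}

R \approx 10^{8}\,\text{probes/s} \times 2.2 \times 10^{-19} \approx 2.2 \times 10^{-11}\,\text{s}^{-1}

1/R \approx 4.6 \times 10^{10}\,\text{s} \approx 1{,}500\ \text{years between false matches}
```

The estimate gets worse if only part of the signature is stored, because fewer bits are left to reject a mismatched entry.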

I have noticed many times that if I run a set of positions (say, Win At Chess) and find something odd in one of them, when I go back and run it by itself to debug, it doesn't do the odd thing. That suggests that clearing things between test positions might be useful, so long as nobody tries to use a fractional-second time limit with a large hash, since clearing the hash could take longer than the allotted search time.

Will give this some thought...

Interesting that you caught this. Most don't watch that closely. Including me. :)
jhaglund2
Posts: 65
Joined: Mon Jan 16, 2017 6:28 pm

Re: Crafty 25.6 search stability

Post by jhaglund2 »

This is probably related:

You have 4 programs. Crafty A, B, C, & D.

Open Crafty A, type "analyze". Let it get to 'x' depth and then "exit" to stop analysis.
Open Crafty B, type "analyze". Let it get to 'x' depth and then "exit" to stop analysis.

The resulting output will be the same for both programs.
Close A & B.

However...

Open A & B.
Crafty A: type "go" and let it complete the move, and continue to ponder output.
Crafty B: type "go" and let it complete the move, and continue to ponder output.

Crafty B will have different ponder PV lines and analysis.

Open a Crafty C. Repeat the process. Type "go", let it move and continue to ponder. The pondering lines will be different from A & B.
Continue, type "go" on A, B, & C. All analysis and pondering will be different.

Close A & B, but keep C open.

Open a Crafty D. Type "go". The PV lines and analysis will be different. It may also select a different first move.

This is suggesting that Crafty can have its search externally manipulated by running processes, to the point that it even picks a different move. Meanwhile, "analyze" produces the same consistent starting results, but only until actual moves are played. The search results will be different for the same position if there has been any "external" searching by another process.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: Crafty 25.6 search stability

Post by MikeB »

jhaglund2 wrote: Thu Apr 23, 2020 5:37 pm This is probably related:

...
This is suggesting that Crafty can have its search externally manipulated by running processes, to the point that it even picks a different move. Meanwhile, "analyze" produces the same consistent starting results, but only until actual moves are played. The search results will be different for the same position if there has been any "external" searching by another process.
So it is possible that the hash/memory is being shared between all running Craftys?
brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Re: Crafty 25.6 search stability

Post by brianr »

Crafty with just one thread should be deterministic.
More than one...
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Crafty 25.6 search stability

Post by Ras »

jhaglund2 wrote: Thu Apr 23, 2020 5:37 pm This is suggesting that Crafty can have its search externally manipulated from running processes, to the point it even picks a different move.
Of course it can. Multi-threading with several threads writing to the hash table isn't deterministic: the result depends on when each thread writes and reads relative to the other worker threads. Even totally unrelated applications have an influence, because they change how the OS scheduler schedules the threads.

However, the chosen PVs and root moves should be of equivalent quality.
Rasmus Althoff
https://www.ct800.net
jhaglund2
Posts: 65
Joined: Mon Jan 16, 2017 6:28 pm

Re: Crafty 25.6 search stability

Post by jhaglund2 »

So it is possible that the hash/memory is being shared between all running Craftys?
Something changes...

Another good example would be the historical Crafty benchmark.

Bob would have the official total node count. The only things that would change were the total elapsed time and the NPS; all of the rest of the data would be the same.

Today, you can get varied node totals each time you run it.
mt 1 =
mt 2 =
mt 4 =
mt 8 =

To me, they should all add up to the same total, like a calculator, but they don't.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty 25.6 search stability

Post by bob »

jhaglund2 wrote: Thu Apr 23, 2020 5:37 pm This is probably related:

...
This is suggesting that Crafty can have its search externally manipulated by running processes, to the point that it even picks a different move. Meanwhile, "analyze" produces the same consistent starting results, but only until actual moves are played. The search results will be different for the same position if there has been any "external" searching by another process.
That is a potentially broken test.

Take the first example. If you let them both compute for unlimited time, each should have exactly the same output at the same depths. Period. There is nothing that is non-deterministic here.

For the second example, there are timing issues. I.e., if you don't search to a fixed depth before pondering, there are slight timing differences, which will result in slightly different info in the hash tables and such, and now you can see significant differences. For example, run the C/D test again, but use something like SD=25 (or whatever works without taking too long). Compare the output and they will match exactly, and when they start pondering they will also match exactly, because you are not introducing slight timing variances that can have a much bigger effect than you might expect.

I ran your last test, but using SD for both, and everything matched. And YES, any chess program can have its search manipulated by external processes. All you need is a process that burns a few seconds of CPU time on one version but not the other. Now you have discovered one of those non-deterministic timing issues. If you want to see this, just play Crafty vs Crafty with 1 second per move, pondering on. Repeat and play a second game. They will not match move for move or score for score. The timing variations caused by interrupts, timer sampling accuracy, etc will change things. In fact, play 100 games. It is likely NONE of the games will be the same. All perfectly normal and expected.
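A toy model of the timing point, not Crafty's actual time control: iteration d is assumed to cost about 3^d time units, and a small `noise` factor stands in for interrupts, other processes and timer granularity. `iteration_cost` and `completed_depth` are hypothetical names. With a fixed depth, the deepest completed iteration never depends on the clock; with a time budget, 1% of noise can cost a whole iteration.

```c
#include <assert.h>

/* Toy cost model: iteration `depth` costs about 3^depth time units (an assumed
   effective branching factor), scaled by (1 + noise). `noise` stands in for
   interrupts, other processes and timer granularity. */
static double iteration_cost(int depth, double noise) {
    double cost = 1.0;
    for (int d = 0; d < depth; d++)
        cost *= 3.0;
    return cost * (1.0 + noise);
}

/* Iterative deepening up to max_depth. A budget <= 0 means "no time limit"
   (a fixed-depth search like "sd 20"); otherwise the search stops as soon as
   the clock passes the budget, and the unfinished iteration is discarded.
   Returns the deepest fully completed iteration. */
static int completed_depth(int max_depth, double budget, double noise) {
    assert(max_depth >= 1);
    double elapsed = 0.0;
    int done = 0;
    for (int depth = 1; depth <= max_depth; depth++) {
        elapsed += iteration_cost(depth, noise);
        if (budget > 0.0 && elapsed > budget)
            break;
        done = depth;
    }
    return done;
}
```

For example, `completed_depth(10, 1100.0, 0.0)` returns 6 while `completed_depth(10, 1100.0, 0.01)` returns 5, whereas with no budget both noise levels reach 10. Once two runs finish different iterations they leave different entries in the hash table, so every later search, pondering included, can diverge.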
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty 25.6 search stability

Post by bob »

MikeB wrote: Thu Apr 23, 2020 6:16 pm
So it is possible that the hash/memory is being shared between all running Craftys?
No. That is impossible unless it is deliberately done inside Crafty, and Crafty doesn't do this at all. I also don't think Joshua was talking about a multi-threaded version, as everyone knows how non-deterministic that would be. But any time you use time to terminate a search, if you repeat the test several times, you can STILL see non-deterministic behavior. This has been discussed dozens of times here...
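A small POSIX illustration of why separately running processes cannot see each other's hash tables unless the program deliberately sets up shared memory (for example with `shm_open` or `mmap` using `MAP_SHARED`). Here `fork()` stands in for starting a second engine process; `hash_table` and `parent_view_after_child_writes` are hypothetical names. The child's writes land in its own copy of the address space and never reach the parent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define TABLE_SIZE 1024

/* Stand-in for an engine's hash table: an ordinary (private) global array. */
static uint64_t hash_table[TABLE_SIZE];

/* Put parent_value in slot 0, start a child process that overwrites every
   slot with child_value, wait for it to exit, and return what the parent
   sees in slot 0 afterwards. Returns UINT64_MAX if fork or waitpid fails. */
static uint64_t parent_view_after_child_writes(uint64_t parent_value,
                                               uint64_t child_value) {
    hash_table[0] = parent_value;
    pid_t pid = fork();
    if (pid < 0)
        return UINT64_MAX;
    if (pid == 0) {
        /* Child: writes go to its own copy of the address space. */
        for (size_t i = 0; i < TABLE_SIZE; i++)
            hash_table[i] = child_value;
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0)
        return UINT64_MAX;
    return hash_table[0];   /* unchanged: nothing was shared */
}
```

Two independently launched processes are even more isolated than this, since they do not start from a copy of each other's memory at all; what they do share is the CPU, which is how timing (not data) leaks between them.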
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Crafty 25.6 search stability

Post by bob »

jhaglund2 wrote: Fri Apr 24, 2020 1:54 am
...
Today, you can get varied node totals each time you run it.
mt 1 =
mt 2 =
mt 4 =
mt 8 =

To me, they should all add up to the same total, like a calculator, but they don't.
I assume you are talking about mt = # of threads to start? Anything but 0 or 1 will produce different node counts every time you run it, in any version of Crafty. I only wish it were deterministic, as that would greatly simplify debugging the search, but it will never happen with any existing multi-threaded chess program.

With more than one thread, it won't choose the same move each time, probably won't have exactly the same score each time, and will never produce the same node count.

For example, I ran "bench", then re-started and ran "mt=2;bench".

Here is just the first position for each run:

        time=0.09(100%)  nodes=595345(595.3K)  fh1=94%  pred=0  nps=6.6M
        time=0.08(83%)  nodes=851522(851.5K)  fh1=92%  pred=0  nps=10.6M
Notice that the node counts are nowhere near the same. It is really too short a test, but it still makes the point...
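A toy illustration (not Crafty's parallel search) of why node counts vary with more than one thread. Two workers walk the same list of items in opposite orders and share a `done[]` array that plays the role of the hash table. An item is marked only after its "subtree" has been searched, so whether the second thread gets a one-node hash hit or repeats the whole subtree depends on how the OS happens to interleave the threads. `run_search`, `ITEMS` and `SUBTREE_COST` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define ITEMS        2000
#define SUBTREE_COST 50   /* nodes charged for searching an item not yet in the table */

/* Shared stand-in for the hash table: done[i] is set only after item i's
   "subtree" has been searched, the way a table store follows the search. */
static atomic_int done[ITEMS];

typedef struct {
    int  reverse;   /* 1: walk the items back to front */
    long nodes;     /* nodes counted by this thread only */
} Worker;

static void *worker_main(void *arg) {
    Worker *w = arg;
    for (int k = 0; k < ITEMS; k++) {
        int i = w->reverse ? ITEMS - 1 - k : k;
        if (atomic_load(&done[i])) {
            w->nodes += 1;                      /* "hash hit": one probe, then cutoff */
        } else {
            for (volatile int s = 0; s < 100; s++) {
                /* burn a little time, like a real subtree */
            }
            w->nodes += SUBTREE_COST;
            atomic_store(&done[i], 1);          /* store only after the search finishes */
        }
    }
    return NULL;
}

/* Run the toy search with 1 or 2 threads and return the total node count. */
static long run_search(int n_threads) {
    assert(n_threads == 1 || n_threads == 2);
    pthread_t tid[2];
    Worker w[2] = { { 0, 0 }, { 1, 0 } };
    for (int i = 0; i < ITEMS; i++)
        atomic_store(&done[i], 0);
    for (int t = 0; t < n_threads; t++) {
        int rc = pthread_create(&tid[t], NULL, worker_main, &w[t]);
        assert(rc == 0);                        /* sketch: no recovery on failure */
        (void)rc;
    }
    for (int t = 0; t < n_threads; t++)
        pthread_join(tid[t], NULL);
    long total = 0;
    for (int t = 0; t < n_threads; t++)
        total += w[t].nodes;
    return total;
}
```

With one thread, `run_search(1)` always returns 100000 (2000 items × 50 nodes). With two threads, every item is searched at least once and visited by both workers, so the total can land anywhere from 102000 to 200000 and may change from run to run.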