testing of a different sort

Discussion of chess software programming and technical issues.


jarkkop
Posts: 198
Joined: Thu Mar 09, 2006 2:44 am
Location: Helsinki, Finland

Re: Changing time value resolution

Post by jarkkop »

Please release the quiescence-check version, at least as a test version, because the last time it was in Crafty must have been release 9.x or something similar.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Changing time value resolution

Post by bob »

jarkkop wrote:Please release the quiescence-check version, at least as a test version, because the last time it was in Crafty must have been release 9.x or something similar.
I can certainly do that. I have now run the non-qsearch-checks version against my 4,000 starting positions and have the 40K-game BayesElo output. The current null=3~2 version with qsearch checks is running the same test. Once that is finished, I have a null=3 always test scheduled, and I will probably even try null=4~3 just for fun. Each test is taking about 24 hours, as the cluster has half its nodes powered down due to an A/C compressor failure. It was supposed to be fixed this week, but there is no news yet. I should have the results sometime this weekend. At present, though, there is essentially 0 difference between checks and no checks...

I had hoped for something significant. I suspect that many who say "checks in the qsearch are a good thing" did the wrong kind of testing. Yes, they pick things up 1-2 plies quicker in positions where that last check is important (mates and such). But when you consider time, the two versions end up being fairly close. I'm beginning to suspect nobody followed this up with significant game testing.

Here's an example. The position is from Cray Blitz - Belle, 1981 ACM Computer Chess Championship:

Code:

                9     0.35  -0.74   1. Qxb6 Rf1 2. Qd8+ Kh7 3. Qd3+ e4
                                    4. Qxf1 Qxf1 5. b4 e3 6. a4 e2
                9     0.52  -0.42   1. Bxh6 Qxa1 2. Qxb6 Qxa2 3. Bg5 Qb1
                                    4. b4 Qf5 5. Be3
                9     0.57  -0.39   1. Bf4 Qxa1 2. Bxe5 Kh7 3. Qxb6 Qxa2
                                    4. Kg3 g6
                9->   0.58  -0.39   1. Bf4 Qxa1 2. Bxe5 Kh7 3. Qxb6 Qxa2
                                    4. Kg3 g6
              time=0.58  mat=1  n=1499626  fh=93%  nps=2.6M
              ext-> check=143K 1rep=12K mate=2K pp=0 reduce=845K/80K

                9     0.51  -0.74   1. Qxb6 Rf1 2. Qd8+ Kh7 3. Qd3+ e4
                                    4. Qxf1 Qxf1 5. Kg3 e3 <HT>
                9     0.79   0.01   1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
                                    4. Qg6+ Kh8 5. Qh6+
                9->   0.83   0.01   1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
                                    4. Qg6+ Kh8 5. Qh6+
              time=0.83  mat=1  n=1866905  fh=93%  nps=2.2M
              ext-> check=148K qcheck=134K 1rep=13K mate=5K reduce=833K/76K
But if I give both an equal time limit:

Code:

                9->   0.58  -0.39   1. Bf4 Qxa1 2. Bxe5 Kh7 3. Qxb6 Qxa2
                                    4. Kg3 g6
               10     0.60  -0.34   1. Bf4 Qxa1 2. Bxe5 Kh7 3. Qxb6 Qxa2
                                    4. Kg3 Qa1 5. Bd4 Qe1+ 6. Kg4
               10     0.74     +1   1. Bxh6!!
               10     0.74   0.01   1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
                                    4. Qg6+ Kh8 5. Qh6+
               10->   0.88   0.01   1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
                                    4. Qg6+ Kh8 5. Qh6+
               11     0.97   0.01   1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
                                    4. Qg6+ Kh8 5. Qh6+

                9->   0.82   0.01   1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
                                    4. Qg6+ Kh8 5. Qh6+
               10     0.90   0.01   1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
                                    4. Qg6+ Kh8 5. Qh6+
              time=1.00  mat=1  n=2327325  fh=93%  nps=2.3M
              ext-> check=194K qcheck=153K 1rep=16K mate=7K reduce=1.1M/100K
So the old version finds that Bxh6 is a forced draw after .74 seconds; the new version finds it after .79 seconds. That is a ply sooner, but at about the same time, because the qsearch checks add time to each iteration.

Is one better than the other in that light? It does not look like it, and the cluster game testing is confirming this. Saving a ply or two sounds good, but in reality it is just a number.
Carey
Posts: 313
Joined: Wed Mar 08, 2006 8:18 pm

Cluster lover....

Post by Carey »

bob wrote:I'm beginning to suspect nobody followed this up with significant game testing..
Bob, how can you expect anybody to have done any significant testing in the past??

As your own testing has been showing, a few dozen or even a few hundred test games aren't enough to tell reliably whether a change is an improvement or not.

There weren't many people back then who had clusters to test with. Come to think of it, there still aren't many people who have clusters to test with...


Personally, I'm looking forward to when you finish these tests and show some data about how many positions are truly needed for various confidence levels of testing. Just for those of us who are trying to be energy efficient and turn off our cluster in the summer months.... :)


(While you are at it, it might be interesting to compare the results of those quick game tests you've been doing with slower time controls, just to make sure the results are comparable. It would be a shame to tune for ultra-fast games only to discover that real games are different enough that the changes actually hurt.)
Uri Blass
Posts: 10460
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Changing time value resolution

Post by Uri Blass »

bob wrote:
I had hoped for something significant. I suspect that many who say "checks in the qsearch are a good thing" did the wrong kind of testing. Yes, they pick things up 1-2 plies quicker in positions where that last check is important (mates and such). But when you consider time, the two versions end up being fairly close. I'm beginning to suspect nobody followed this up with significant game testing.

I remember using test suites and finding that checks in the qsearch sometimes gained more than 2 plies, and generally helped in test suites scored by time to solution (I believe they also helped in games, but I did not play enough games to be sure). I used not only easy test suites like WAC but also other test suites like the ECM-GCP test suite.

I suggest trying the following simple position, which I composed to show a case where there is a big difference.

Check extensions in the qsearch saved Fruit 4 plies in seeing the mate.

I give the analysis first without and then with check extensions in the qsearch.



New game - Fruit 2.1
[D]5rk1/5p1p/5Bp1/8/8/1n6/n6Q/6K1 w - - 0 1

Analysis by Fruit 2.1(no check extensions):

1.Qh2-h6
=+ (-0.34) Depth: 1/1 00:00:00
1.Qh2xa2
+- (2.38) Depth: 1/1 00:00:00
1.Qh2xa2 Rf8-b8
+- (2.05) Depth: 2/3 00:00:00
1.Qh2xa2 Rf8-b8 2.Qa2-a7
+- (2.42) Depth: 3/5 00:00:00
1.Qh2xa2 Rf8-b8 2.Qa2-a7 Rb8-e8
+- (2.30) Depth: 4/6 00:00:00
1.Qh2xa2 Rf8-b8 2.Qa2-a7 Rb8-e8 3.Qa7-d7
+- (2.46) Depth: 5/7 00:00:00
1.Qh2xa2 Rf8-b8 2.Qa2-a7 Rb8-e8 3.Kg1-g2 Re8-e4
+- (2.38) Depth: 6/7 00:00:00 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 6/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 7/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 8/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 9/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 10/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 11/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 12/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 13/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 14/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 15/7 00:00:01 2kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 16/7 00:00:01 2kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 17/7 00:00:01 2kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 18/7 00:00:01 2kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 19/7 00:00:01 3kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 20/7 00:00:01 4kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 21/7 00:00:01 5kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 22/7 00:00:01 6kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 23/7 00:00:01 8kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 24/7 00:00:01 10kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 25/7 00:00:01 14kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 26/7 00:00:01 18kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 27/7 00:00:02 26kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 28/7 00:00:02 35kN

(so k, 29.08.2008)

New game - Fruit 2.1
5rk1/5p1p/5Bp1/8/8/1n6/n6Q/6K1 w - - 0 1

Analysis by Fruit 2.1:

1.Qh2xa2
+- (2.38) Depth: 1/1 00:00:00
1.Qh2xa2 Rf8-b8
+- (2.05) Depth: 2/5 00:00:00
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 2/5 00:00:00
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 3/5 00:00:00
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 4/5 00:00:00
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 5/5 00:00:00
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 6/5 00:00:00
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 7/5 00:00:00
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 8/5 00:00:00 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 9/5 00:00:00 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 10/5 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 11/5 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 12/5 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 13/5 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 14/5 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 15/5 00:00:01 1kN

(so k, 29.08.2008)
Uri Blass
Posts: 10460
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Changing time value resolution

Post by Uri Blass »

Some data about Fruit 2.1's analysis of the same position from Cray Blitz - Belle, 1981 ACM:

With checks in the qsearch, Fruit needs depth 5; without checks, depth 8 (it needed depth 8 to see that b4 is losing, which was no problem for the version with qsearch checks).

It may be interesting to compare Crafty with and without checks in the qsearch in the following position, which could have arisen in Cray Blitz - Belle (Cray Blitz chose a different losing move):

[D]5r1k/6p1/1n2Q2p/4p3/1P6/7P/P5PK/R1B1q3 b - - 0 28

Default Fruit needs depth 3 to find Rf1.
Fruit without check extensions in the qsearch needs depth 7 to find Rf1.

Uri
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Changing time value resolution - data

Post by bob »

I had to abort the test a little early. The first set of data (crafty-22.2R5) is normal Crafty with no checks in the q-search. The second set (crafty-22.2R6) is the identical version except that q-search checks are added.

Code:

crafty-22.2R5
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.1       172    7    7  7534   77%   -44   17%
   2 Fruit 2.1           56    7    6  7534   64%   -44   23%
   3 opponent-21.7       12    6    6  7518   58%   -44   34%
   4 Glaurung 1.1 SMP    -1    7    7  7534   56%   -44   20%
   5 Crafty-22.2        -44    4    3 37646   43%     9   23%
   6 Arasan 10.0       -194    7    7  7526   30%   -44   19%

crafty-22.2R6
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.1       173    7    7  7782   78%   -47   17%
   2 Fruit 2.1           66    6    7  7782   65%   -47   22%
   3 opponent-21.7       13    6    6  7782   59%   -47   33%
   4 Glaurung 1.1 SMP    10    6    7  7782   58%   -47   20%
   5 Crafty-22.2        -47    4    4 38910   42%     9   22%
   6 Arasan 10.0       -214    7    7  7782   28%   -47   18%
Almost the same number of games for each, and a lot of games at that, with final Elos very close (within the error bars). This run has the new Glaurung 2.1 added and the old 2.0e5 (or whatever it was) removed. It still uses the same hacked-up evaluation (even more hacked up now), but the only difference between the two runs is the q-search checks.

I will run null-move R=3/3 and R=4/3 when we get the A/C back up.
Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 10:01 pm
Location: Irvine, CA, USA

Re: Changing time value resolution - data

Post by Dirt »

bob wrote:I had to abort the test a little early. The first set of data is normal Crafty with no checks in the q-search. The second set is the identical version except that checks are added.

Again I will suggest getting the LOS numbers from BayesElo. Your intuitive understanding of the numbers may be very good, but the LOS numbers could refine that. At least try it for a few runs to see what you think.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Changing time value resolution

Post by bob »

I tried it and got the same results. The old version took 6 plies to get around the null-move reduction that hides Qg7#; the new version takes 2 plies. I had already tested a similar position, as this is the classic null-move-killer position, where the forced mate gets hidden by the null move eating up all the non-capture plies.
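
The depth arithmetic behind this is simple. Here is a minimal sketch, assuming depth is counted in whole plies and that a null move costs one ply plus the reduction R. With R=3 and 4 plies left, the null-move search gets 0 plies and goes straight into the quiescence search, where a quiet mate such as Qg7# is never tried unless checks are generated there. The adaptive "3~2" helper and its `threshold` parameter are hypothetical: one plausible reading of the notation, not Crafty's actual rule.

```c
/* Remaining depth (in plies) for the search below a null move:
 * one ply for the null move itself plus the reduction R. */
int depth_after_null(int depth, int r) {
    int d = depth - 1 - r;
    return d > 0 ? d : 0;   /* 0 means the next call is the quiescence search */
}

/* Hypothetical adaptive reduction in the spirit of "null=3~2": use r_high
 * while more than `threshold` plies remain, r_low nearer the tips.
 * The threshold is an assumption for illustration. */
int null_reduction(int depth, int r_high, int r_low, int threshold) {
    return depth > threshold ? r_high : r_low;
}
```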

I am still trying a couple of tweaks. The first is that I do a normal capture search, although any capture that gives check requires a full-width escape search at the next ply. After the capture search is completed, it drops into the non-capture checking-move search. Most of the time a capture ends things quickly with a cutoff, so I avoid generating the non-capturing checks at all. This is a bit of a speedup. I will test again to see if I am gaining any ground...
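
The two-pass ordering described above can be illustrated on a toy game tree. This is a minimal C sketch under stated assumptions, not Crafty's code: the hand-built `Node`/`Move` tree and the `demo` helper are hypothetical stand-ins for a real move generator, scores are from the side to move's point of view, and non-capturing checks are generated only at the first q-search ply (an assumed limit). Captures are searched first; non-capturing checks are generated only if no capture produced a cutoff, and a side in check gets a full-width escape search.

```c
#include <stdbool.h>

#define MATE 32000
#define MAX_MOVES 4

/* Hypothetical toy tree standing in for a real move generator. */
typedef struct { int to; bool capture; bool check; } Move;
typedef struct {
    int eval;        /* static eval from the side to move's point of view */
    bool in_check;   /* side to move is in check */
    int nmoves;
    Move moves[MAX_MOVES];
} Node;

static const Node *tree;

static int qsearch(int n, int alpha, int beta, int qply, bool qchecks) {
    const Node *node = &tree[n];
    if (node->in_check) {
        /* In check: no stand pat, search every move (full-width escape). */
        if (node->nmoves == 0)
            return -MATE + qply;                       /* checkmated */
        for (int i = 0; i < node->nmoves; i++) {
            int s = -qsearch(node->moves[i].to, -beta, -alpha, qply + 1, qchecks);
            if (s >= beta) return s;
            if (s > alpha) alpha = s;
        }
        return alpha;
    }
    if (node->eval >= beta) return node->eval;         /* stand pat cutoff */
    if (node->eval > alpha) alpha = node->eval;
    /* Pass 1: captures (a capture that also checks is searched here). */
    for (int i = 0; i < node->nmoves; i++) {
        if (!node->moves[i].capture) continue;
        int s = -qsearch(node->moves[i].to, -beta, -alpha, qply + 1, qchecks);
        if (s >= beta) return s;                       /* cutoff: skip the checks */
        if (s > alpha) alpha = s;
    }
    /* Pass 2: non-capturing checks, first q-ply only, reached only if no cutoff. */
    if (qchecks && qply == 0) {
        for (int i = 0; i < node->nmoves; i++) {
            if (node->moves[i].capture || !node->moves[i].check) continue;
            int s = -qsearch(node->moves[i].to, -beta, -alpha, qply + 1, qchecks);
            if (s >= beta) return s;
            if (s > alpha) alpha = s;
        }
    }
    return alpha;
}

/* Toy position: node 0 has a quiet mating check (to node 1) and a capture
 * that wins 300 (to node 2). */
static const Node demo_tree[] = {
    { 0, false, 2, { {1, false, true}, {2, true, false} } },
    { 0, true, 0, {{0}} },       /* side to move is checkmated */
    { -300, false, 0, {{0}} },   /* after the capture, down 300 */
};

int demo(bool qchecks) {
    tree = demo_tree;
    return qsearch(0, -MATE, MATE, 0, qchecks);
}
```

In the toy position the quiet check is mate: demo(false) only sees the capture and returns 300, while demo(true) finds the mate and returns MATE - 1.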
Harald
Posts: 318
Joined: Thu Mar 09, 2006 1:07 am

Re: Changing time value resolution

Post by Harald »

bob wrote:I tried it and got the same results. The old version took 6 plies to get around the null-move reduction that hides Qg7#; the new version takes 2 plies. I had already tested a similar position, as this is the classic null-move-killer position, where the forced mate gets hidden by the null move eating up all the non-capture plies.

I am still trying a couple of tweaks. The first is that I do a normal capture search, although any capture that gives check requires a full-width escape search at the next ply. After the capture search is completed, it drops into the non-capture checking-move search. Most of the time a capture ends things quickly with a cutoff, so I avoid generating the non-capturing checks at all. This is a bit of a speedup. I will test again to see if I am gaining any ground...
What if you do checks in the qsearch only if there were null moves in the path?
The checks could compensate for the ply loss or zugzwang, and not checking
elsewhere could gain performance.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Changing time value resolution

Post by bob »

Harald wrote:
What if you do checks in the qsearch only if there were null moves in the path?
The checks could compensate for the ply loss or zugzwang, and not checking
elsewhere could gain performance.
That's the primary gain I was looking for. The dangerous part of null-move is that when you play one, you lop a couple of extra plies off the search and quite often drop directly into the quiescence search. In the position Uri posted, Qg7# is a threat that cannot be met, except by simply not letting White play Qg7. The way to do that is to exhaust all normal plies so that only the q-search is left, which would normally not include any checks. This new code addresses that. It could also be addressed by only generating checks if the last move was a null, and I will probably experiment with limiting things somewhat.
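
The two gating rules discussed here, Harald's (any null move in the path) and the narrower one (the last move was a null), can be sketched as predicates over the move path. This is a minimal C sketch; the `NULL_MOVE` encoding and the `path` array are hypothetical stand-ins for however an engine records the moves from the root.

```c
#include <stdbool.h>

#define NULL_MOVE 0   /* hypothetical encoding of a null move in the path */

/* path[0..ply-1] holds the moves played from the root to the current node. */

/* Harald's idea: allow q-search checks if any null move is in the path. */
bool qchecks_if_null_in_path(const int *path, int ply) {
    for (int i = 0; i < ply; i++)
        if (path[i] == NULL_MOVE) return true;
    return false;
}

/* Narrower variant: allow them only if the last move played was a null. */
bool qchecks_if_last_was_null(const int *path, int ply) {
    return ply > 0 && path[ply - 1] == NULL_MOVE;
}
```

The first rule turns checks on for every q-search below any null move; the second limits them to the node right after the null, which is cheaper but can miss cases where the null move happened a ply or two earlier.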