testing of a different sort
-
- Posts: 198
- Joined: Thu Mar 09, 2006 2:44 am
- Location: Helsinki, Finland
Re: Changing time value resolution
Please release the quiescent check version, at least as a test version, because the last time it was in Crafty must have been release 9.x or something similar.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Changing time value resolution
jarkkop wrote: Please release the quiescent check version at least as a test version, because last time it was in Crafty must be release 9.x or something similar.
I can certainly do that. I have currently run the non-qsearch-checks version against my 4,000 starting positions and have the 40K game BayesElo output. The current null=3~2 with qsearch checks is running the same test. And once that is finished I have a null=3 always test scheduled. And I will probably even try a null=4~3 just for fun. Each test is taking about 24 hours as the cluster has 1/2 the nodes powered down due to an A/C compressor failure. Was supposed to be fixed this week but no news yet. I should have the results sometime this weekend. But at present, there is essentially 0 difference between checks and no checks...
I had hoped for something significant. I suspect that many who say "checks in the qsearch are a good thing" did the wrong kind of testing. Yes, they pick things up 1-2 plies quicker in positions where that last check is important (mates and such). But when you consider time, the two versions end up being fairly close. I'm beginning to suspect nobody followed this up with significant game testing.
Here's an example. The position is from Cray Blitz - Belle, 1981 ACM computer chess championship:
Code:
9 0.35 -0.74 1. Qxb6 Rf1 2. Qd8+ Kh7 3. Qd3+ e4
4. Qxf1 Qxf1 5. b4 e3 6. a4 e2
9 0.52 -0.42 1. Bxh6 Qxa1 2. Qxb6 Qxa2 3. Bg5 Qb1
4. b4 Qf5 5. Be3
9 0.57 -0.39 1. Bf4 Qxa1 2. Bxe5 Kh7 3. Qxb6 Qxa2
4. Kg3 g6
9-> 0.58 -0.39 1. Bf4 Qxa1 2. Bxe5 Kh7 3. Qxb6 Qxa2
4. Kg3 g6
time=0.58 mat=1 n=1499626 fh=93% nps=2.6M
ext-> check=143K 1rep=12K mate=2K pp=0 reduce=845K/80K
9 0.51 -0.74 1. Qxb6 Rf1 2. Qd8+ Kh7 3. Qd3+ e4
4. Qxf1 Qxf1 5. Kg3 e3 <HT>
9 0.79 0.01 1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
4. Qg6+ Kh8 5. Qh6+
9-> 0.83 0.01 1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
4. Qg6+ Kh8 5. Qh6+
time=0.83 mat=1 n=1866905 fh=93% nps=2.2M
ext-> check=148K qcheck=134K 1rep=13K mate=5K reduce=833K/76K
But if I give both an equal time limit:
Code:
9-> 0.58 -0.39 1. Bf4 Qxa1 2. Bxe5 Kh7 3. Qxb6 Qxa2
4. Kg3 g6
10 0.60 -0.34 1. Bf4 Qxa1 2. Bxe5 Kh7 3. Qxb6 Qxa2
4. Kg3 Qa1 5. Bd4 Qe1+ 6. Kg4
10 0.74 +1 1. Bxh6!!
10 0.74 0.01 1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
4. Qg6+ Kh8 5. Qh6+
10-> 0.88 0.01 1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
4. Qg6+ Kh8 5. Qh6+
11 0.97 0.01 1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
4. Qg6+ Kh8 5. Qh6+
9-> 0.82 0.01 1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
4. Qg6+ Kh8 5. Qh6+
10 0.90 0.01 1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
4. Qg6+ Kh8 5. Qh6+
time=1.00 mat=1 n=2327325 fh=93% nps=2.3M
ext-> check=194K qcheck=153K 1rep=16K mate=7K reduce=1.1M/100K
So the old version finds that Bxh6 is a forced draw after .74 seconds, and the new version finds it after .78 seconds. A ply sooner, but the same time, as the qsearch checks add time to each iteration.
Is one better than the other in that light? Does not look like it, and the cluster game testing is verifying this. It sounds good to save a ply or two. But in reality, it is just a number.
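The comparison above comes down to the clock time at which each version first reports Bxh6 with the drawn score, not the depth. Here is a minimal Python sketch of that measurement, using a few lines copied from the listings above. The regex layout and the helper name `first_time_for` are illustrative assumptions inferred from the posted output, not anything taken from Crafty itself.

```python
import re

# One principal-variation line: depth, seconds, score, first move.
# The layout is inferred from the listings in this post, not from Crafty's source.
PV_LINE = re.compile(r"^\s*(\d+)(?:->)?\s+(\d+\.\d+)\s+(-?\d+\.\d+)\s+1\.\s+(\S+)")

def first_time_for(log, move, score):
    """Return (depth, seconds) of the first line whose first move and score match."""
    for line in log.splitlines():
        m = PV_LINE.match(line)
        if m and m.group(4) == move and float(m.group(3)) == score:
            return int(m.group(1)), float(m.group(2))
    return None

# Lines from the equal-time listing (version without q-search checks).
NO_CHECKS = """\
9-> 0.58 -0.39 1. Bf4 Qxa1 2. Bxe5 Kh7 3. Qxb6 Qxa2
10 0.60 -0.34 1. Bf4 Qxa1 2. Bxe5 Kh7 3. Qxb6 Qxa2
10 0.74 +1 1. Bxh6!!
10 0.74 0.01 1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
"""

# Lines from the 9-ply listing (version with q-search checks).
WITH_CHECKS = """\
9 0.51 -0.74 1. Qxb6 Rf1 2. Qd8+ Kh7 3. Qd3+ e4
9 0.79 0.01 1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
9-> 0.83 0.01 1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
"""

old = first_time_for(NO_CHECKS, "Bxh6", 0.01)
new = first_time_for(WITH_CHECKS, "Bxh6", 0.01)
print("no qsearch checks:", old)
print("qsearch checks:   ", new)
```

On the lines as posted, the version with checks reports the draw one iteration earlier but at roughly the same clock time. The fail-high line (`+1 1. Bxh6!!`) is skipped on purpose, since it carries no real score yet.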
-
- Posts: 313
- Joined: Wed Mar 08, 2006 8:18 pm
Cluster lover....
bob wrote: I'm beginning to suspect nobody followed this up with significant game testing..
Bob, how can you expect anybody to have done any significant testing in the past??
As your own testing has been showing, a few dozen or even hundreds of test games aren't good enough to get reliable results about whether it's an improvement or not.
Not many people back then had clusters to test with. Come to think of it, there still aren't many people who have clusters to test with...
Personally, I'm looking forward to when you finish these tests and show some data about how many positions are truly needed for various confidence levels of testing. Just for those of us who are trying to be energy efficient and turn off our cluster in the summer months....
(While you are at it, it might be interesting to compare the results of those quick game tests you've been doing with slower time controls, just to make sure the results are comparable. Be a shame to tune for ultra fast games only to discover that real games are different enough the changes actually hurt.)
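To put rough numbers on "a few dozen or even hundreds of games aren't good enough", here is a small Python sketch of the approximate 95% Elo margin as a function of the number of games. It is only a back-of-the-envelope estimate: it assumes independent games, a score near 50%, and a fixed draw rate of 25%, and the function name `elo_margin` is made up for this illustration. BayesElo computes its error bars differently.

```python
import math

def elo_margin(games, score=0.5, draw_rate=0.25, z=1.96):
    """Approximate +/- Elo margin (95% by default) for a match of `games` games.

    Assumes independent games and a fixed draw rate; both are simplifications.
    """
    win_rate = score - draw_rate / 2.0
    # Variance of a single game result scored 1 / 0.5 / 0.
    variance = win_rate + 0.25 * draw_rate - score ** 2
    std_error = math.sqrt(variance / games)
    # Slope of the logistic Elo curve at the observed score.
    elo_per_score = 400.0 / (math.log(10) * score * (1.0 - score))
    return z * std_error * elo_per_score

for n in (100, 1000, 10000, 40000):
    print(n, round(elo_margin(n), 1))
```

Under these assumptions 100 games leave a margin of roughly 59 Elo either way, and it takes tens of thousands of games to get the margin down to a few Elo.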
-
- Posts: 10460
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Changing time value resolution
bob wrote:
jarkkop wrote: Please release the quiescent check version at least as a test version, because last time it was in Crafty must be release 9.x or something similar.
I can certainly do that. I have currently run the non-qsearch-checks version against my 4,000 starting positions and have the 40K game BayesElo output. The current null=3~2 with qsearch checks is running the same test. And once that is finished I have a null=3 always test scheduled. And I will probably even try a null=4~3 just for fun. Each test is taking about 24 hours as the cluster has 1/2 the nodes powered down due to an A/C compressor failure. Was supposed to be fixed this week but no news yet. I should have the results sometime this weekend. But at present, there is essentially 0 difference between checks and no checks...
I had hoped for something significant. I suspect that many that say "checks in the qsearch are a good thing" did the wrong kind of testing. yes, they pick things up 1-2 plies quicker in positions where that last check is important (mates and such). But when you consider time, the two versions end up being fairly close.. I'm beginning to suspect nobody followed this up with significant game testing..
Here's an example. Position is from Cray Blitz - Belle, 1981 ACM computer chess championship;
Code:
9 0.35 -0.74 1. Qxb6 Rf1 2. Qd8+ Kh7 3. Qd3+ e4
4. Qxf1 Qxf1 5. b4 e3 6. a4 e2
9 0.52 -0.42 1. Bxh6 Qxa1 2. Qxb6 Qxa2 3. Bg5 Qb1
4. b4 Qf5 5. Be3
9 0.57 -0.39 1. Bf4 Qxa1 2. Bxe5 Kh7 3. Qxb6 Qxa2
4. Kg3 g6
9-> 0.58 -0.39 1. Bf4 Qxa1 2. Bxe5 Kh7 3. Qxb6 Qxa2
4. Kg3 g6
time=0.58 mat=1 n=1499626 fh=93% nps=2.6M
ext-> check=143K 1rep=12K mate=2K pp=0 reduce=845K/80K
9 0.51 -0.74 1. Qxb6 Rf1 2. Qd8+ Kh7 3. Qd3+ e4
4. Qxf1 Qxf1 5. Kg3 e3 <HT>
9 0.79 0.01 1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
4. Qg6+ Kh8 5. Qh6+
9-> 0.83 0.01 1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
4. Qg6+ Kh8 5. Qh6+
time=0.83 mat=1 n=1866905 fh=93% nps=2.2M
ext-> check=148K qcheck=134K 1rep=13K mate=5K reduce=833K/76K
But if I give both an equal time limit:
Code:
9-> 0.58 -0.39 1. Bf4 Qxa1 2. Bxe5 Kh7 3. Qxb6 Qxa2
4. Kg3 g6
10 0.60 -0.34 1. Bf4 Qxa1 2. Bxe5 Kh7 3. Qxb6 Qxa2
4. Kg3 Qa1 5. Bd4 Qe1+ 6. Kg4
10 0.74 +1 1. Bxh6!!
10 0.74 0.01 1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
4. Qg6+ Kh8 5. Qh6+
10-> 0.88 0.01 1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
4. Qg6+ Kh8 5. Qh6+
11 0.97 0.01 1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
4. Qg6+ Kh8 5. Qh6+
9-> 0.82 0.01 1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
4. Qg6+ Kh8 5. Qh6+
10 0.90 0.01 1. Bxh6 Qxa1 2. Qg6 gxh6 3. Qxh6+ Kg8
4. Qg6+ Kh8 5. Qh6+
time=1.00 mat=1 n=2327325 fh=93% nps=2.3M
ext-> check=194K qcheck=153K 1rep=16K mate=7K reduce=1.1M/100K
So old version finds Bxh6 is a forced draw after .74 seconds, new version finds it after .78 seconds. A ply sooner, but same time as the qsearch checks add time to each iteration.
Is one better than the other in that light? Does not look like it, and the cluster game testing is verifying this. It sounds good to save a ply or two. But in reality, it is just a number.
I remember that I used test suites and found that checks in the qsearch sometimes gave an advantage of more than 2 plies, and generally helped in test suites based on time per solution. (I believe that they also helped in games, but I did not play enough games to be sure.)
I used not only easy test suites like WAC but also other test suites like the ECM-GCP test suite.
I suggest trying the following simple position, which I composed to show a case where there is a big difference.
Check extensions in the qsearch saved Fruit 4 plies in seeing the mate.
I give analysis without check extensions in the qsearch and with check extensions in the qsearch.
New game - Fruit 2.1
[D]5rk1/5p1p/5Bp1/8/8/1n6/n6Q/6K1 w - - 0 1
Analysis by Fruit 2.1(no check extensions):
1.Qh2-h6
=+ (-0.34) Depth: 1/1 00:00:00
1.Qh2xa2
+- (2.38) Depth: 1/1 00:00:00
1.Qh2xa2 Rf8-b8
+- (2.05) Depth: 2/3 00:00:00
1.Qh2xa2 Rf8-b8 2.Qa2-a7
+- (2.42) Depth: 3/5 00:00:00
1.Qh2xa2 Rf8-b8 2.Qa2-a7 Rb8-e8
+- (2.30) Depth: 4/6 00:00:00
1.Qh2xa2 Rf8-b8 2.Qa2-a7 Rb8-e8 3.Qa7-d7
+- (2.46) Depth: 5/7 00:00:00
1.Qh2xa2 Rf8-b8 2.Qa2-a7 Rb8-e8 3.Kg1-g2 Re8-e4
+- (2.38) Depth: 6/7 00:00:00 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 6/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 7/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 8/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 9/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 10/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 11/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 12/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 13/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 14/7 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 15/7 00:00:01 2kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 16/7 00:00:01 2kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 17/7 00:00:01 2kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 18/7 00:00:01 2kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 19/7 00:00:01 3kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 20/7 00:00:01 4kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 21/7 00:00:01 5kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 22/7 00:00:01 6kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 23/7 00:00:01 8kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 24/7 00:00:01 10kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 25/7 00:00:01 14kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 26/7 00:00:01 18kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 27/7 00:00:02 26kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 28/7 00:00:02 35kN
(so k, 29.08.2008)
New game - Fruit 2.1
5rk1/5p1p/5Bp1/8/8/1n6/n6Q/6K1 w - - 0 1
Analysis by Fruit 2.1:
1.Qh2xa2
+- (2.38) Depth: 1/1 00:00:00
1.Qh2xa2 Rf8-b8
+- (2.05) Depth: 2/5 00:00:00
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 2/5 00:00:00
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 3/5 00:00:00
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 4/5 00:00:00
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 5/5 00:00:00
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 6/5 00:00:00
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 7/5 00:00:00
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 8/5 00:00:00 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 9/5 00:00:00 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 10/5 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 11/5 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 12/5 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 13/5 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 14/5 00:00:01 1kN
1.Qh2-h6 Rf8-e8 2.Qh6-g7#
+- (#2) Depth: 15/5 00:00:01 1kN
(so k, 29.08.2008)
-
- Posts: 10460
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Changing time value resolution
Some data about Fruit 2.1's analysis of the same position from Cray Blitz - Belle, 1981 ACM:
With checks in the qsearch: depth 5. Without checks: depth 8 (it needed depth 8 to see that b4 is losing, when it was no problem for the version without qsearch).
It may be interesting to compare Crafty with checks in the qsearch and without checks in the qsearch in the following position, which could have happened in Cray Blitz - Belle (Cray Blitz chose a different losing move):
[D]5r1k/6p1/1n2Q2p/4p3/1P6/7P/P5PK/R1B1q3 b - - 0 28
Default Fruit needs depth 3 to find Rf1.
Fruit without check extensions in the qsearch needs depth 7 to find Rf1.
Uri
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Changing time value resolution - data
Had to abort the test a little early. The first set of data is normal crafty with no checks in q-search. Second set is identical version except for checks added.
Almost the same number of games for each, and a bunch of games at that, with final Elos very close (within error bar limits). This has the new glaurung 2.1 added, old 2.0e5 or whatever removed. Still the same hacked up evaluation (even more hacked up now) but the only difference between the two runs is the q-search checks.
Code:
crafty-22.2R5
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.1       172    7    7  7534   77%   -44   17%
   2 Fruit 2.1           56    7    6  7534   64%   -44   23%
   3 opponent-21.7       12    6    6  7518   58%   -44   34%
   4 Glaurung 1.1 SMP    -1    7    7  7534   56%   -44   20%
   5 Crafty-22.2        -44    4    3 37646   43%     9   23%
   6 Arasan 10.0       -194    7    7  7526   30%   -44   19%
crafty-22.2R6
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.1       173    7    7  7782   78%   -47   17%
   2 Fruit 2.1           66    6    7  7782   65%   -47   22%
   3 opponent-21.7       13    6    6  7782   59%   -47   33%
   4 Glaurung 1.1 SMP    10    6    7  7782   58%   -47   20%
   5 Crafty-22.2        -47    4    4 38910   42%     9   22%
   6 Arasan 10.0       -214    7    7  7782   28%   -47   18%
Will run null R=3/3 and R=4/3 when we get the A/C up.
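For anyone who wants to check "within error bar limits" by hand, here is a rough Python sketch using the Crafty-22.2 rows from the two tables above. It rests on assumptions that are mine, not BayesElo's: the +/- columns are treated as symmetric 95% intervals, the two runs as independent, and the two rating scales as directly comparable. The function name `elo_difference_z` is hypothetical.

```python
import math

def elo_difference_z(elo_a, bar_a, elo_b, bar_b, z95=1.96):
    """z-score of the difference between two ratings given 95% error bars."""
    se_a = bar_a / z95
    se_b = bar_b / z95
    return (elo_a - elo_b) / math.sqrt(se_a ** 2 + se_b ** 2)

# Crafty-22.2 rows from the two tables; the bar is the larger of the +/- columns.
no_checks = (-44, 4)    # crafty-22.2R5
with_checks = (-47, 4)  # crafty-22.2R6

z = elo_difference_z(*no_checks, *with_checks)
verdict = "significant at 95%" if abs(z) > 1.96 else "within error bars"
print(round(z, 2), verdict)
```

With these inputs the 3 Elo gap is about one standard error of the difference, so this crude check agrees that the two runs cannot be told apart.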
-
- Posts: 2851
- Joined: Wed Mar 08, 2006 10:01 pm
- Location: Irvine, CA, USA
Re: Changing time value resolution - data
bob wrote:
Had to abort the test a little early. The first set of data is normal crafty with no checks in q-search. Second set is identical version except for checks added.
Almost the same number of games for each, and a bunch of games at that, with final Elos very close (within error bar limits). This has the new glaurung 2.1 added, old 2.0e5 or whatever removed. Still the same hacked up evaluation (even more hacked up now) but the only difference between the two runs is the q-search checks.
Code:
crafty-22.2R5
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.1       172    7    7  7534   77%   -44   17%
   2 Fruit 2.1           56    7    6  7534   64%   -44   23%
   3 opponent-21.7       12    6    6  7518   58%   -44   34%
   4 Glaurung 1.1 SMP    -1    7    7  7534   56%   -44   20%
   5 Crafty-22.2        -44    4    3 37646   43%     9   23%
   6 Arasan 10.0       -194    7    7  7526   30%   -44   19%
crafty-22.2R6
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.1       173    7    7  7782   78%   -47   17%
   2 Fruit 2.1           66    6    7  7782   65%   -47   22%
   3 opponent-21.7       13    6    6  7782   59%   -47   33%
   4 Glaurung 1.1 SMP    10    6    7  7782   58%   -47   20%
   5 Crafty-22.2        -47    4    4 38910   42%     9   22%
   6 Arasan 10.0       -214    7    7  7782   28%   -47   18%
Will run null R=3/3 and R=4/3 when we get the A/C up.
Again I will suggest getting the LOS numbers from BayesElo. Your intuitive understanding of the numbers may be very good, but the LOS numbers could refine that. At least try it for a few runs to see what you think.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Changing time value resolution
I tried it and get the same results. The old version took 6 plies to get around the null-move reduction that hides Qg7#. The new version takes 2 plies. I had already tested a similar position, as this is the classic null-move-killer position where the forced mate gets hidden by the null-move eating up all the non-capture plies.
I am still trying a couple of tweaks. The first is that I do a normal capture search, although if any capture is a check, it requires a full-width escape search. After the capture search is completed, it then drops into the non-capture checking move search. Most of the time a capture ends things quickly and I avoid generating the non-capturing checks. This is a bit of a speedup. I will test again to see if I am gaining any ground...
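The ordering described above (captures first, a full-width escape search whenever the side to move is in check, and non-capturing checks only after the captures fail to cut off) can be sketched on a toy tree. This is not Crafty's code. The `Node` class, the `checks_left` limit, and the scores are all hypothetical, chosen only to make the control flow runnable in Python.

```python
from dataclasses import dataclass, field
from typing import List

MATE = 10000

@dataclass
class Node:
    """Toy position; every score is from the side to move's point of view."""
    static_eval: int = 0
    in_check: bool = False
    captures: List["Node"] = field(default_factory=list)
    quiet_checks: List["Node"] = field(default_factory=list)
    evasions: List["Node"] = field(default_factory=list)  # all legal replies to a check

def qsearch(node, alpha, beta, use_checks, checks_left=1):
    if node.in_check:
        # Full-width escape search: no stand-pat, try every legal reply.
        best = -MATE  # stays -MATE if there is no legal reply (checkmate)
        for child in node.evasions:
            score = -qsearch(child, -beta, -alpha, use_checks, checks_left)
            best = max(best, score)
            alpha = max(alpha, score)
            if alpha >= beta:
                break
        return best

    best = node.static_eval  # stand pat
    if best >= beta:
        return best
    alpha = max(alpha, best)

    # Step 1: ordinary capture search.
    for child in node.captures:
        score = -qsearch(child, -beta, -alpha, use_checks, checks_left)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            return best  # cutoff: the quiet checks are never generated

    # Step 2: only now try the non-capturing checks.
    if use_checks and checks_left > 0:
        for child in node.quiet_checks:
            score = -qsearch(child, -beta, -alpha, use_checks, checks_left - 1)
            best = max(best, score)
            alpha = max(alpha, score)
            if alpha >= beta:
                break
    return best

# Toy version of the Qg7# threat: the side to move has one quiet check,
# and the opponent has no legal reply to it.
mated = Node(in_check=True)
root = Node(static_eval=-200, quiet_checks=[mated])

print(qsearch(root, -MATE, MATE, use_checks=False))  # stand-pat score only
print(qsearch(root, -MATE, MATE, use_checks=True))   # finds the mate
```

In this toy case the search without checks returns the stand-pat score, and the search with checks returns the mate score. A real engine would generate the moves lazily, which is where skipping the quiet-check generation after a capture cutoff saves time.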
-
- Posts: 318
- Joined: Thu Mar 09, 2006 1:07 am
Re: Changing time value resolution
bob wrote:
I tried it and get the same results. old version took 6 plies to get around the null-move reduction that hides Qg7#. New version takes 2 plies. I had already tested a similar position as this is the classic null-move-killer position where the forced mate gets hidden by the null-move eating up all the non-capture plies.
I am still trying a couple of tweaks. The first is that I do a normal capture search, although if any capture is a check, it requires a full-width escape search. After the capture search is completed, it then drops into the non-capture checking move search. Most of the time a capture ends things quickly and I avoid generating the non-capturing checks. This is a bit of a speedup. I will test again to see if I am gaining any ground...
What if you do checks in qsearch only if there were null moves in the path? The checks could compensate for the ply loss or zugzwang, and not checking otherwise can gain performance.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Changing time value resolution
Harald wrote:
bob wrote:
I tried it and get the same results. old version took 6 plies to get around the null-move reduction that hides Qg7#. New version takes 2 plies. I had already tested a similar position as this is the classic null-move-killer position where the forced mate gets hidden by the null-move eating up all the non-capture plies.
I am still trying a couple of tweaks. The first is that I do a normal capture search, although if any capture is a check, it requires a full-width escape search. After the capture search is completed, it then drops into the non-capture checking move search. Most of the time a capture ends things quickly and I avoid generating the non-capturing checks. This is a bit of a speedup. I will test again to see if I am gaining any ground...
What if you do checks in qsearch only if there were null moves in the path? The checks could compensate for the ply loss or zugzwang and not to check can gain performance.
That's the primary gain I was looking for. The dangerous part of null-move is that when you play one, you lop a couple of extra plies off the search and quite often drop the search directly into a quiescence search. If you look at the position Uri posted, Qg7# is the threat that cannot be met, except to simply not let white play Qg7. And the way to do that is to exhaust all normal plies so that only the q-search is left, which would normally not include any checks. This new code addresses that. This could be addressed by only doing a check if the last move was a null. And I will probably experiment with limiting things somewhat.
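The depth arithmetic behind "drop the search directly into a quiescence search" can be written out in a few lines of Python. This is a simplified sketch that assumes the common depth - 1 - R formulation of null move and ignores extensions and every other reduction. The function names are hypothetical, and nothing here is taken from Crafty or Fruit.

```python
def depth_after_null(depth, r):
    """Plies left for the reply search after a null move.

    Uses the common depth - 1 - R formulation; 0 or less means the reply
    is handled by the quiescence search only.
    """
    return depth - 1 - r

def first_root_depth_seeing_threat(r, max_depth=20):
    """Smallest root depth at which a one-move threat survives the null move.

    Line: our move (1 ply), the opponent's null move, then our threat needs
    at least 1 ply of normal search. Extensions and other reductions are ignored.
    """
    for root_depth in range(1, max_depth + 1):
        after_our_move = root_depth - 1
        if depth_after_null(after_our_move, r) >= 1:
            return root_depth
    return None

for r in (2, 3, 4):
    print(f"R={r}: threat first visible at root depth {first_root_depth_seeing_threat(r)}")
```

Under this simplified arithmetic a threat like Qg7# stays hidden until root depth R + 3, which for R=3 is the same 6 plies reported above for the versions without checks. With checks in the q-search the mating check can be found even when the depth after the null move is 0, and restricting those checks to positions reached right after a null move would target exactly this case.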