testing of a different sort


bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

testing of a different sort

Post by bob »

I have recently added checks to the q-search, since several have reported better results. And I have run some tests (not on the cluster) using three versions:

1. old non-check version
2. new qsearch check version, everything else the same
3. new qsearch check version with null-move R=3 rather than the adaptive 2-3 I have been using for many years.
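
For readers unfamiliar with the null-move settings in the list above, here is a minimal sketch of how an adaptive reduction differs from a fixed R=3. It is not Crafty's code: the function names and the depth threshold of 6 plies are assumptions made purely for illustration.

```python
def null_move_reduction(depth: int, adaptive: bool = True,
                        threshold: int = 6) -> int:
    """Return the null-move depth reduction R with `depth` plies remaining.

    adaptive=True  -> R=3 when more than `threshold` plies remain, else R=2
    adaptive=False -> R=3 everywhere (version 3 in the list above)
    The threshold of 6 is an illustrative assumption, not Crafty's setting.
    """
    if not adaptive:
        return 3
    return 3 if depth > threshold else 2


def null_search_depth(depth: int, adaptive: bool = True) -> int:
    """Depth handed to the reduced search after the null move is made.

    A result of 0 or less means the null-move search drops straight
    into the quiescence search.
    """
    return depth - 1 - null_move_reduction(depth, adaptive)


# Compare the two schemes at a few remaining depths.
for depth in (3, 4, 6, 7, 10):
    print(depth,
          null_search_depth(depth, adaptive=True),
          null_search_depth(depth, adaptive=False))
```

Under these assumed numbers, a fixed R=3 reaches the quiescence search one ply sooner at shallow depths, which is where checks in the q-search would have to compensate.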

I am looking for a reasonable number of test positions to see how this behaves tactically. I have tried WAC but had no luck: Crafty gets all but a couple in 1 second per move, and at 0.5 seconds per move it is hardly worse. At that time resolution, the "jitter" becomes an issue. I'd like a set where the normal Crafty might get 50 out of 100 at something reasonable like 10-15 seconds per move, so that I can determine whether either of the new versions solves more in the same time limit, or solves the same 50 in less time.

Anybody have any favorites that do not have all these "easy for today's programs" positions that need a fraction of a second to find? And no, no Nolot positions as I'd like to experiment and get answers back more quickly than a day per position. :)
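
The comparison described above (more positions solved in the same time limit, or the same positions solved in less time) could be tallied along these lines. This is only a sketch: the `Results` layout, the function names, and the sample numbers are all hypothetical, and the timings are made up rather than measured.

```python
from typing import Dict, Optional

# Seconds to a correct answer per position id; None means unsolved.
Results = Dict[str, Optional[float]]


def solved_within(results: Results, limit: float) -> set:
    """Ids of positions solved at or under `limit` seconds."""
    return {pid for pid, t in results.items() if t is not None and t <= limit}


def compare(old: Results, new: Results, limit: float) -> dict:
    """Summarise two runs: solve counts, plus total time on shared solves."""
    old_ok = solved_within(old, limit)
    new_ok = solved_within(new, limit)
    common = old_ok & new_ok
    return {
        "old_solved": len(old_ok),
        "new_solved": len(new_ok),
        "common": len(common),
        "old_time_common": sum(old[p] for p in common),
        "new_time_common": sum(new[p] for p in common),
    }


# Made-up numbers for illustration only, not measurements.
old_run = {"p1": 4.0, "p2": None, "p3": 12.5, "p4": 9.0}
new_run = {"p1": 2.5, "p2": 14.0, "p3": 12.0, "p4": None}
print(compare(old_run, new_run, limit=15.0))
```

Restricting the time totals to positions both versions solve keeps the "same 50 but in less time" question separate from the "solves more" question.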
Dann Corbit
Posts: 12662
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: testing of a different sort

Post by Dann Corbit »

It seems likely that mate positions will benefit from checks in qsearch.
Here are some mate position sets:
http://cap.connx.com/EPD/Les_Fernandez_ ... th.epd.bz2
http://cap.connx.com/EPD/M20.EPD.bz2
http://cap.connx.com/EPD/MATEIN2.EPD.bz2
http://cap.connx.com/EPD/MATESRCH.EPD.bz2
http://cap.connx.com/EPD/dm001.epd.bz2
http://cap.connx.com/EPD/dm002.epd.bz2
http://cap.connx.com/EPD/dm003.epd.bz2
http://cap.connx.com/EPD/dm004.epd.bz2
http://cap.connx.com/EPD/dm005.epd.bz2
http://cap.connx.com/EPD/dm006.epd.bz2
http://cap.connx.com/EPD/dm007.epd.bz2
http://cap.connx.com/EPD/dm008.epd.bz2
http://cap.connx.com/EPD/dm009.epd.bz2
http://cap.connx.com/EPD/dm010.epd.bz2
http://cap.connx.com/EPD/dm011.epd.bz2
http://cap.connx.com/EPD/dm012.epd.bz2
http://cap.connx.com/EPD/dm013.epd.bz2
http://cap.connx.com/EPD/dm014.epd.bz2
http://cap.connx.com/EPD/dm015.epd.bz2
http://cap.connx.com/EPD/dm016.epd.bz2
http://cap.connx.com/EPD/dm017.epd.bz2
http://cap.connx.com/EPD/dm018.epd.bz2
http://cap.connx.com/EPD/dm019.epd.bz2
http://cap.connx.com/EPD/dm020.epd.bz2
http://cap.connx.com/EPD/dm021.epd.bz2
http://cap.connx.com/EPD/dm022.epd.bz2
http://cap.connx.com/EPD/dm023.epd.bz2
http://cap.connx.com/EPD/dm024.epd.bz2
http://cap.connx.com/EPD/dm025.epd.bz2
http://cap.connx.com/EPD/dm026.epd.bz2
http://cap.connx.com/EPD/dm027.epd.bz2
http://cap.connx.com/EPD/dm028.epd.bz2
http://cap.connx.com/EPD/dm029.epd.bz2
http://cap.connx.com/EPD/dm030.epd.bz2
http://cap.connx.com/EPD/dm031.epd.bz2
http://cap.connx.com/EPD/dm032.epd.bz2
http://cap.connx.com/EPD/dm033.epd.bz2
http://cap.connx.com/EPD/dm034.epd.bz2
http://cap.connx.com/EPD/dm035.epd.bz2
http://cap.connx.com/EPD/dm036.epd.bz2
http://cap.connx.com/EPD/dm037.epd.bz2
http://cap.connx.com/EPD/dm038.epd.bz2
http://cap.connx.com/EPD/dm039.epd.bz2
http://cap.connx.com/EPD/dm040.epd.bz2
http://cap.connx.com/EPD/dm041.epd.bz2
http://cap.connx.com/EPD/dm042.epd.bz2
http://cap.connx.com/EPD/dm043.epd.bz2
http://cap.connx.com/EPD/dm044.epd.bz2
http://cap.connx.com/EPD/dm045.epd.bz2
http://cap.connx.com/EPD/dm046.epd.bz2
http://cap.connx.com/EPD/dm047.epd.bz2
http://cap.connx.com/EPD/dm048.epd.bz2
http://cap.connx.com/EPD/dm050.epd.bz2
http://cap.connx.com/EPD/dm051.epd.bz2
http://cap.connx.com/EPD/dm052.epd.bz2
http://cap.connx.com/EPD/dm053.epd.bz2
http://cap.connx.com/EPD/dm054.epd.bz2
http://cap.connx.com/EPD/dm055.epd.bz2
http://cap.connx.com/EPD/dm056.epd.bz2
http://cap.connx.com/EPD/dm057.epd.bz2
http://cap.connx.com/EPD/dm058.epd.bz2
http://cap.connx.com/EPD/dm060.epd.bz2
http://cap.connx.com/EPD/dm061.epd.bz2
http://cap.connx.com/EPD/dm062.epd.bz2
http://cap.connx.com/EPD/dm063.epd.bz2
http://cap.connx.com/EPD/dm064.epd.bz2
http://cap.connx.com/EPD/dm065.epd.bz2
http://cap.connx.com/EPD/dm066.epd.bz2
http://cap.connx.com/EPD/dm067.epd.bz2
http://cap.connx.com/EPD/dm069.epd.bz2
http://cap.connx.com/EPD/dm070.epd.bz2
http://cap.connx.com/EPD/dm071.epd.bz2
http://cap.connx.com/EPD/dm072.epd.bz2
http://cap.connx.com/EPD/dm074.epd.bz2
http://cap.connx.com/EPD/dm075.epd.bz2
http://cap.connx.com/EPD/dm077.epd.bz2
http://cap.connx.com/EPD/dm082.epd.bz2
http://cap.connx.com/EPD/dm087.epd.bz2
http://cap.connx.com/EPD/dm089.epd.bz2
http://cap.connx.com/EPD/dm092.epd.bz2
http://cap.connx.com/EPD/dm093.epd.bz2
http://cap.connx.com/EPD/dm096.epd.bz2
http://cap.connx.com/EPD/dm1.epd.bz2
http://cap.connx.com/EPD/dm100.epd.bz2
http://cap.connx.com/EPD/dm101.epd.bz2
http://cap.connx.com/EPD/dm102.epd.bz2
http://cap.connx.com/EPD/dm103.epd.bz2
http://cap.connx.com/EPD/dm104.epd.bz2
http://cap.connx.com/EPD/dm105.epd.bz2
http://cap.connx.com/EPD/dm110.epd.bz2
http://cap.connx.com/EPD/dm119.epd.bz2
http://cap.connx.com/EPD/dm120.epd.bz2
http://cap.connx.com/EPD/dm121.epd.bz2
http://cap.connx.com/EPD/dm125.epd.bz2
http://cap.connx.com/EPD/dm126.epd.bz2
http://cap.connx.com/EPD/dm130.epd.bz2
http://cap.connx.com/EPD/dm135.epd.bz2
http://cap.connx.com/EPD/dm14.epd.bz2
http://cap.connx.com/EPD/dm2.epd.bz2
http://cap.connx.com/EPD/dm255.epd.bz2
http://cap.connx.com/EPD/dm3.epd.bz2
http://cap.connx.com/EPD/dm4.epd.bz2
http://cap.connx.com/EPD/dm5.epd.bz2
http://cap.connx.com/EPD/dm9.epd.bz2
http://cap.connx.com/EPD/dmt.epd.bz2
http://cap.connx.com/EPD/m1.epd.bz2
http://cap.connx.com/EPD/m10.epd.bz2
http://cap.connx.com/EPD/m10a.epd.bz2
http://cap.connx.com/EPD/m11.epd.bz2
http://cap.connx.com/EPD/m12.epd.bz2
http://cap.connx.com/EPD/m15.epd.bz2
http://cap.connx.com/EPD/m16.epd.bz2
http://cap.connx.com/EPD/m2.epd.bz2
http://cap.connx.com/EPD/m2t.epd.bz2
http://cap.connx.com/EPD/m3.epd.bz2
http://cap.connx.com/EPD/m30.epd.bz2
http://cap.connx.com/EPD/m3a.epd.bz2
http://cap.connx.com/EPD/m3t.epd.bz2
http://cap.connx.com/EPD/m7.epd.bz2
http://cap.connx.com/EPD/m8.epd.bz2
http://cap.connx.com/EPD/ma.epd.bz2
http://cap.connx.com/EPD/many.epd.bz2
http://cap.connx.com/EPD/mate.epd.bz2
http://cap.connx.com/EPD/mates.epd.bz2
http://cap.connx.com/EPD/matesm.epd.bz2
http://cap.connx.com/EPD/matetest2.epd.bz2
http://cap.connx.com/EPD/tmat.epd.bz2
http://cap.connx.com/EPD/tmate2.epd.bz2
http://cap.connx.com/EPD/tmates.epd.bz2
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Changing time value resolution

Post by sje »

Perhaps you might consider changing the time value resolution in Crafty to allow for very short searches with consistent duration. Symbolic uses microsecond resolution as that's common enough on decent platforms. The CIL Toolkit also uses microsecond resolution when that's supported by the underlying Lisp processor.

With microsecond resolution, elapsed time values counted from the beginning of the Thompson Epoch (1970.01.01) fit into 64-bit integers with room to spare.

Probably, millisecond resolution should be sufficient.
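
As a quick check of the 64-bit claim, the sketch below takes a microsecond timestamp and estimates how many years a signed 64-bit microsecond counter can span. It uses Python's standard `time.time_ns()` and assumes the usual 1970-01-01 epoch; the helper names are mine and not taken from Symbolic or the CIL Toolkit.

```python
import time

INT64_MAX = 2**63 - 1
MICROS_PER_YEAR = 365.25 * 24 * 3600 * 1_000_000


def now_micros() -> int:
    """Wall-clock time in whole microseconds since the epoch."""
    return time.time_ns() // 1_000


def years_of_range() -> float:
    """Years a signed 64-bit microsecond counter can cover."""
    return INT64_MAX / MICROS_PER_YEAR


stamp = now_micros()
# Print the timestamp, the bits it needs, and the representable span.
print(stamp, stamp.bit_length(), round(years_of_range()))
```

The span works out to roughly 292,000 years, so "room to spare" is an understatement.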
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Changing time value resolution

Post by Zach Wegner »

Oh no, not this again...

;)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: testing of a different sort

Post by bob »

Dann Corbit wrote:
It seems likely that mate positions will benefit from checks in qsearch.
The ones that are particularly interesting are the positions where late null moves break things. For example, positions where we end up with a queen at f6 and a pawn at h6, with the unstoppable mate threat of Qg7#, but if I don't move (null-move) I collapse the search into the quiescence phase and evaluate a lost position as equal. The report has been that checks in the q-search reduce those kinds of errors, allowing more aggressive null-move settings. But the mates are also interesting, as long as they are not too easy... Measuring fractions of a second is problematic.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Changing time value resolution

Post by bob »

sje wrote:Perhaps you might consider changing the time value resolution in Crafty to allow for very short searches with consistent duration. Symbolic uses microsecond resolution as that's common enough on decent platforms. The CIL Toolkit also uses microsecond resolution when that's supported by the underlying Lisp processor.

With microsecond resolution, elapsed time values from the beginning of the Thompson Epoch (1970.01.01) fit into 64 bit integers with room to spare.

Probably, millisecond resolution should be sufficient.
If you use CPU time, you might pull that off. But for elapsed time, which is all I measure, that won't work, since elapsed time is not that accurate. If I am going to compare two things, I need very accurate measurements, and operating system interference can affect very short time measurements. Hence my wanting positions where 15-30 seconds is enough, so that the searches run long enough to wash out the "jitter effect".
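
To make the elapsed-versus-CPU distinction concrete, here is a small sketch that times the same fixed workload several times with both clocks and reports the run-to-run spread. The workload is a stand-in for a short search, not an engine, and the function names are hypothetical; `time.perf_counter()` and `time.process_time()` are the standard Python clocks for elapsed and CPU time.

```python
import time


def workload(n: int = 200_000) -> int:
    """Stand-in for a short search: a fixed amount of CPU work."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure(runs: int = 10):
    """Time the same workload repeatedly with both clocks.

    Returns a list of (elapsed_seconds, cpu_seconds) pairs.
    """
    samples = []
    for _ in range(runs):
        wall0, cpu0 = time.perf_counter(), time.process_time()
        workload()
        wall1, cpu1 = time.perf_counter(), time.process_time()
        samples.append((wall1 - wall0, cpu1 - cpu0))
    return samples


def relative_spread(values):
    """(max - min) / mean: a crude indicator of run-to-run jitter."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean if mean > 0 else 0.0


samples = measure()
print("elapsed spread:", relative_spread([s[0] for s in samples]))
print("cpu spread:    ", relative_spread([s[1] for s in samples]))
```

The actual spreads depend on the machine and whatever else it is running. The point is only that a fixed disturbance is a larger fraction of a short measurement than of a long one.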
Dann Corbit
Posts: 12662
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: testing of a different sort

Post by Dann Corbit »

There are also plenty of other tests there besides:
http://cap.connx.com/EPD/

Pull down any that you like and give them a go. Perhaps some of them will have the characteristics you are after.

If zugzwang positions are what you are after, here are some specifics:
zug.epd.bz2
zugged.epd.bz2
zughard.epd.bz2
zugzwang.epd.bz2

It seems to me that the property most desired in this case is to perform general tests without any harm, and to solve problems where a check sequence would reveal trouble faster. I do not think I have ever built an EPD test specifically for that purpose but it does sound useful.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Changing time value resolution

Post by bob »

Zach Wegner wrote:Oh no, not this again...

;)
Nope. Said all I intend to say about time jitter here. :)
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Changing time value resolution

Post by sje »

bob wrote:If you use CPU time, you might pull that off. But for elapsed time, which is all I measure, that won't work, since elapsed time is not that accurate. If I am going to compare two things, I need very accurate measurements, and operating system interference can affect very short time measurements. Hence my wanting positions that 15-30 seconds of time is enough so that I can use enough time to wash out the "jitter effect".
How about using a fixed node count limit instead of a wall clock limit?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Changing time value resolution

Post by bob »

sje wrote:
bob wrote:If you use CPU time, you might pull that off. But for elapsed time, which is all I measure, that won't work, since elapsed time is not that accurate. If I am going to compare two things, I need very accurate measurements, and operating system interference can affect very short time measurements. Hence my wanting positions that 15-30 seconds of time is enough so that I can use enough time to wash out the "jitter effect".
How about using a fixed node count limit instead of a wall clock limit?
I don't know how to make it fair. With two different versions, NPS varies differently because of the q-search checks and check evasions... I tried it, but when I compared against times, basically each position needs a different number of nodes, which was a pain to deal with.
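
A small arithmetic sketch of the fairness problem: if two versions search at different speeds, the same node budget corresponds to different amounts of wall-clock time. The NPS figures and the budget below are invented for illustration, as is the assumption that the version with q-search checks is the slower one.

```python
def wall_time_for_nodes(node_budget: int, nps: float) -> float:
    """Seconds a version needs to search `node_budget` nodes at `nps`."""
    return node_budget / nps


def node_budget_for_time(seconds: float, nps: float) -> int:
    """Nodes a version searches in `seconds` at `nps`."""
    return int(seconds * nps)


# Hypothetical speeds, not measurements of any Crafty version.
old_nps = 2_000_000.0   # version without q-search checks
new_nps = 1_600_000.0   # version with q-search checks (assumed slower)

budget = 20_000_000
print(wall_time_for_nodes(budget, old_nps))   # 10.0 seconds
print(wall_time_for_nodes(budget, new_nps))   # 12.5 seconds
print(node_budget_for_time(10.0, new_nps))    # 16000000 nodes
```

With these made-up speeds, an equal node budget quietly gives the slower version 25% more time, while an equal time limit gives it 20% fewer nodes.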

I think I have nearly reached a conclusion. The new version finds some things 1 or 2 plies earlier. That's good. But when it does, it still takes as much time as the old version did, except the old version got 1-2 plies deeper. A cluster test is, so far, showing no rating change. If that holds up for 40K games, this is getting the axe in favor of simplicity again, although I may try a little more tweaking here and there to see if there is anything more to gain. I certainly need to try null R=3 everywhere. That used to be worse; I want to see whether the checks make it safe enough to use.