Self testing and high draw ratio

xr_a_y
Posts: 1871
Joined: Sat Nov 25, 2017 2:28 pm
Location: France

Post by xr_a_y »

As Minic becomes smarter, the draw ratio between the last release and the current dev version keeps growing.
I often see 65% draws using a 2-move opening book.
Fortunately, the draw ratio versus other engines is still reasonable (~20%), so a little tourney gives reliable and fairly quick results.
But before going to a tourney I always like to run a little head-to-head self test, because a little tourney takes at least 2 days (3 or 4 versions of Minic plus 6 to 10 other engines, running 7 concurrent games at TC 40/20sec).

SPRT does not seem to help much here, but I may not be using it correctly ... :oops:

Let's look at today's example (still running...):

Code:

Score of minic_0.86 vs minic_dev: 75 - 96 - 316 [0.478]
Elo difference: -15.0 +/- 18.3, LOS: 5.4 %, DrawRatio: 64.9 %
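
As an aside, the figures in that output can be approximately reproduced from the raw win/loss/draw counts. The sketch below assumes the usual logistic Elo model, a normal approximation for the 95% error margin, and the common erf-based LOS formula; it is not taken from the cutechess-cli source, and the function names are made up for illustration.

```python
import math

def elo_from_score(score):
    """Convert a score fraction in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_stats(wins, losses, draws):
    """Elo, 95% margin, LOS and draw ratio from the first engine's point of view."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0 points) around the mean score.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    stderr = math.sqrt(variance / games)
    # 95% interval on the score, mapped to Elo; report half of its width.
    z = 1.959964
    margin = (elo_from_score(score + z * stderr)
              - elo_from_score(score - z * stderr)) / 2.0
    # Likelihood of superiority: draws carry no information in this formula.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return {"elo": elo_from_score(score), "margin": margin,
            "los": los, "draw_ratio": draws / games}

# Counts from the post: minic_0.86 vs minic_dev, 75 - 96 - 316.
stats = match_stats(75, 96, 316)
print("Elo difference: %.1f +/- %.1f, LOS: %.1f %%, DrawRatio: %.1f %%"
      % (stats["elo"], stats["margin"], 100 * stats["los"], 100 * stats["draw_ratio"]))
```

Note how the high draw ratio shrinks the per-game variance: with many draws, fewer games are needed for the same error margin than the win/loss counts alone would suggest.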
Can I use SPRT to conclude quickly here whether the patch is good or not?

A connected question might be: what are the odds of seeing a +25 +/-20 (looks good to me) become a +15 +/-18 (looks less good to me ...)? That's what I saw today ...
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Post by jdart »

You should be using SPRT if you are testing a new version against an old version.

I have a little Python 3 script here that will monitor cutechess-cli logs, report SPRT results, and terminate tests once there is a significant result (may need some slight mods for your environment):

https://github.com/jdart1/arasan-chess/ ... monitor.py

Post by xr_a_y »

jdart wrote: Mon Aug 19, 2019 10:08 pm
You should be using SPRT if testing new version against old version.

I have a little python3 script here that will monitor cutechess-cli logs, report SPRT results, and terminate tests once there is a significant result (may need some slight mods for your environment):

https://github.com/jdart1/arasan-chess/ ... monitor.py

Yes, I had already found your script thanks to a discussion you had with Kai some time ago, but I'm still confused about which bounds I should use.
As you said in your earlier post, the H0 and H1 Elo values should be chosen according to the expected gain.

For Minic I'm looking for +10 or +20 Elo patches, not less. What elo0 and elo1 are you recommending?
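
To make the role of elo0 and elo1 concrete, here is a minimal sketch of an SPRT check on a running result. It uses a normal approximation to the log-likelihood ratio (LLR) with the logistic Elo model, which is an assumption of this example and not necessarily what the script linked above or cutechess-cli computes. The bounds elo0=0 and elo1=10, the error rates of 0.05, and the function names are illustrative choices only, not a recommendation.

```python
import math

def expected_score(elo):
    """Expected score for a given Elo advantage (logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))

def sprt_llr(wins, losses, draws, elo0, elo1):
    """Approximate LLR of H1 (Elo = elo1) versus H0 (Elo = elo0)."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result; many draws make it small.
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    if variance == 0.0:
        return 0.0  # all games had the same result: no usable estimate yet
    s0, s1 = expected_score(elo0), expected_score(elo1)
    return games * (s1 - s0) * (2.0 * score - s0 - s1) / (2.0 * variance)

def sprt_decision(llr, alpha=0.05, beta=0.05):
    """Compare the LLR with Wald's stopping bounds."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    if llr >= upper:
        return "accept H1"
    if llr <= lower:
        return "accept H0"
    return "continue"

# Running result from the first post, seen from minic_dev's side: 96 - 75 - 316.
llr = sprt_llr(96, 75, 316, elo0=0.0, elo1=10.0)
print("LLR = %.2f -> %s" % (llr, sprt_decision(llr)))
```

The test stops as soon as the LLR leaves the interval between the two bounds, so wider elo0/elo1 gaps tend to finish sooner but can only distinguish larger gains.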

Post by jdart »

xr_a_y wrote: Mon Aug 19, 2019 10:22 pm
What elo0 and elo1 are you recommending ?

The Elo range is passed as parameters to the SPRT function. I don't adjust the values that are in the script. For me, very few patches are worth +10 or +20 Elo.

--Jon