Self testing and high draw ratio

xr_a_y · Post by **xr_a_y** » Mon Aug 19, 2019 8:05 pm

As Minic becomes stronger, the draw ratio between the last release and the current dev version keeps growing.
I often see 65% draws using a 2-move opening book.
Fortunately, the draw ratio versus other engines is still reasonable (~20%), so a small tourney gives reliable and fairly quick results.
But before going to a tourney I always like to run a small head-to-head self test,
because a small tourney takes at least 2 days (it includes 3 or 4 versions of Minic and 6 to 10 other engines, running 7 concurrent games at TC40/20sec).

SPRT does not seem to help much here, but I may not be using it right ...

Let's look at today's example (still running...):

Score of minic_0.86 vs minic_dev: 75 - 96 - 316 [0.478]
Elo difference: -15.0 +/- 18.3, LOS: 5.4 %, DrawRatio: 64.9 %
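
For readers who want to check where these figures come from, here is a minimal sketch that recomputes them from the win/loss/draw counts. It assumes the usual logistic Elo model, a normal approximation for the 95% error margin, and an LOS based on decisive games only. Whether the tool that printed the output above uses exactly these formulas is an assumption, and the names `elo_from_score` and `match_stats` are made up for this illustration.

```python
import math


def elo_from_score(score):
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def match_stats(wins, losses, draws):
    """Summarise a match from the first engine's point of view."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    # Half-width of a 95% interval on the mean score (normal approximation).
    half_width = 1.96 * math.sqrt(var / n)
    elo = elo_from_score(score)
    margin = (elo_from_score(score + half_width)
              - elo_from_score(score - half_width)) / 2.0
    # Likelihood of superiority: draws carry no information here.
    los = 0.5 * (1.0 + math.erf((wins - losses)
                                / math.sqrt(2.0 * (wins + losses))))
    return {"score": score, "elo": elo, "margin": margin,
            "los": los, "draw_ratio": draws / n}


# minic_0.86 vs minic_dev: 75 wins, 96 losses, 316 draws
stats = match_stats(75, 96, 316)
print("Elo difference: {elo:.1f} +/- {margin:.1f}, "
      "LOS: {los:.1%}, DrawRatio: {draw_ratio:.1%}".format(**stats))
```

Under these assumptions the result is very close to the reported -15.0 +/- 18.3, 5.4 % and 64.9 %. The high draw ratio lowers the per-game variance, which is why the margin is already below 20 Elo after fewer than 500 games.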

Can I use SPRT to conclude quickly here whether the patch is good or not ?

A related question: what are the odds of seeing a +25 +/-20 (looks good to me) become a 15 +/-18 (looks less good to me ...)? That's what I saw today ...

jdart · Post by **jdart** » Mon Aug 19, 2019 10:08 pm

You should be using SPRT if testing a new version against an old version.

I have a little python3 script here that will monitor cutechess-cli logs, report SPRT results, and terminate tests once there is a significant result (may need some slight mods for your environment):

https://github.com/jdart1/arasan-chess/ ... monitor.py

xr_a_y · Post by **xr_a_y** » Mon Aug 19, 2019 10:22 pm

jdart wrote: ↑Mon Aug 19, 2019 10:08 pm You should be using SPRT if testing a new version against an old version.

I have a little python3 script here that will monitor cutechess-cli logs, report SPRT results, and terminate tests once there is a significant result (may need some slight mods for your environment):

https://github.com/jdart1/arasan-chess/ ... monitor.py

Yes, I already found your script thanks to a discussion you had with Kai some time ago, but I'm still confused about which bounds I should use.
As said in your previous post, the H0 and H1 Elo should be chosen according to the expected gain.

For Minic I'm looking for +10 or +20 Elo patches, not less. What elo0 and elo1 are you recommending ?

jdart · Post by **jdart** » Tue Aug 20, 2019 2:55 am

xr_a_y wrote: ↑Mon Aug 19, 2019 10:22 pm What elo0 and elo1 are you recommending ?

The ELO range is passed as parameters to the SPRT function. I don't adjust the values that are in the script. For me very few patches are +10 or +20 ELO.

--Jon
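
To make the role of the elo0/elo1 parameters concrete, here is a minimal sketch of an SPRT decision computed from win/loss/draw counts. It is not the code from the script linked above, and it is not necessarily the model cutechess-cli uses. It assumes the normal-approximation form of the log-likelihood ratio (often called GSPRT) with logistic Elo. The function names and the values elo0=0, elo1=10, alpha=beta=0.05 are illustrative assumptions, not recommendations from this thread.

```python
import math


def expected_score(elo):
    """Expected score for a given logistic Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt(wins, losses, draws, elo0, elo1, alpha=0.05, beta=0.05):
    """Return (llr, lower, upper, decision) for H0: elo0 vs H1: elo1.

    Counts are from the point of view of the version under test.
    """
    lower = math.log(beta / (1.0 - alpha))   # accept H0 at or below this
    upper = math.log((1.0 - beta) / alpha)   # accept H1 at or above this
    n = wins + losses + draws
    if n == 0:
        return 0.0, lower, upper, "continue"
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / n
    if var == 0.0:
        return 0.0, lower, upper, "continue"
    s0, s1 = expected_score(elo0), expected_score(elo1)
    # Normal approximation of the log-likelihood ratio.
    llr = n * (s1 - s0) * (2.0 * score - s0 - s1) / (2.0 * var)
    if llr >= upper:
        decision = "accept H1"
    elif llr <= lower:
        decision = "accept H0"
    else:
        decision = "continue"
    return llr, lower, upper, decision


# The running test from the first post, seen from minic_dev's side:
# 96 wins, 75 losses, 316 draws.
llr, lower, upper, decision = sprt(96, 75, 316, elo0=0.0, elo1=10.0)
print("LLR = {:.2f}, bounds = [{:.2f}, {:.2f}] -> {}".format(
    llr, lower, upper, decision))
```

With these illustrative settings the LLR for the running test stays between the two bounds, so the test would simply continue. A wider gap between elo0 and elo1 makes the test stop sooner but can only confirm larger gains. A low per-game variance, as with many draws, makes the LLR move faster for the same score.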
