sprt tourney manager

Discussion of chess software programming and technical issues.


Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: sprt tourney manager

Post by Michel »

There seems to be a puzzling deviation in the reported 3-nomial LLRs. E.g. for the last experiment I get

wdl: w: 231, d: 352, l: 187

LLR=1.02

whereas the reported LLR is 1.1.

The 5-nomial case is ok.
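To check numbers like these, a trinomial LLR can be approximated by treating the average score per game as normally distributed. The sketch below implements only that approximation, assuming the logistic Elo model. It is not the method of any tool in this thread, and its result can differ slightly from an exact GSPRT computation. The function names are hypothetical, and elo0 = 0, elo1 = 5 are placeholder bounds because the bounds of this experiment are not stated.

```python
def elo_to_score(elo: float) -> float:
    """Expected score under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def trinomial_llr(w: int, d: int, l: int, elo0: float, elo1: float) -> float:
    """Normal-approximation LLR of H1 (elo1) against H0 (elo0) from single-game W/D/L counts."""
    n = w + d + l
    mean = (w + 0.5 * d) / n  # average score per game
    # Per-game variance of the score, treating a game as 1, 1/2 or 0.
    var = (w * (1.0 - mean) ** 2 + d * (0.5 - mean) ** 2 + l * mean ** 2) / n
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    # log N(mean; s1, var/n) - log N(mean; s0, var/n), simplified.
    return (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var / n)


# Placeholder bounds (assumption): the experiment's actual elo0/elo1 are not given.
print(trinomial_llr(231, 352, 187, elo0=0.0, elo1=5.0))
```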
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: sprt tourney manager

Post by Laskos »

Michel wrote: There seems to be a puzzling deviation in the reported 3-nomial LLRs. E.g. for the last experiment I get

wdl: w: 231, d: 352, l: 187

LLR=1.02

whereas the reported LLR is 1.1.

The 5-nomial case is ok.
It is probably my fault. This stop (where the test is not yet accepted) was an ad-hoc stop of mine: the test was still running (it stops when both the 3- and 5-nomial LLRs reach the desired value), but I wanted to see what the 3-nomial LLR is when the 5-nomial test stops. The 2 other tests were stopped regularly (they were accepted), so they should be OK down to fine details.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: sprt tourney manager

Post by Laskos »

Laskos wrote:
Michel wrote: There seems to be a puzzling deviation in the reported 3-nomial LLRs. E.g. for the last experiment I get

wdl: w: 231, d: 352, l: 187

LLR=1.02

whereas the reported LLR is 1.1.

The 5-nomial case is ok.
It is probably my fault. This stop (where the test is not yet accepted) was an ad-hoc stop of mine: the test was still running (it stops when both the 3- and 5-nomial LLRs reach the desired value), but I wanted to see what the 3-nomial LLR is when the 5-nomial test stops. The 2 other tests were stopped regularly (they were accepted), so they should be OK down to fine details.
By the way, I observed that the test sometimes doesn't stop at exactly the desired ln(19), but at a higher value, dozens of games later. I don't know if this is intended.
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: sprt tourney manager

Post by Michel »

Laskos wrote:
Michel wrote: There seems to be a puzzling deviation in the reported 3-nomial LLRs. E.g. for the last experiment I get

wdl: w: 231, d: 352, l: 187

LLR=1.02

whereas the reported LLR is 1.1.

The 5-nomial case is ok.
It is probably my fault. This stop (where the test is not yet accepted) was an ad-hoc stop of mine: the test was still running (it stops when both the 3- and 5-nomial LLRs reach the desired value), but I wanted to see what the 3-nomial LLR is when the 5-nomial test stops. The 2 other tests were stopped regularly (they were accepted), so they should be OK down to fine details.
Thanks for the explanation. I have put a link to the script computing the LLRs here:

http://hardy.uhasselt.be/Toga/computeLLR.py

Now that there is an implementation in the Amoeba tourney manager, perhaps cutechess-cli can follow? The code is really trivial, much simpler than the current code in cutechess-cli.
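The script itself is not reproduced here. As an independent sketch (an assumption about the method, not a copy of computeLLR.py), a generalized SPRT LLR for any multinomial, whether 3-nomial single games or 5-nomial game pairs, can be computed from the constrained maximum-likelihood distributions under H0 and H1, assuming the logistic Elo model. All names below are hypothetical.

```python
import math
from typing import Sequence


def elo_to_score(elo: float) -> float:
    """Expected score under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def constrained_kl(outcomes: Sequence[float], freqs: Sequence[float], s: float) -> float:
    """KL divergence from the observed distribution to the closest one with mean score s.

    The constrained MLE has the form q_i = p_i / (1 + lam * (a_i - s)); lam is found by bisection.
    """
    terms = [(a - s, p) for a, p in zip(outcomes, freqs) if p > 0]
    c_max = max(c for c, _ in terms)
    c_min = min(c for c, _ in terms)
    if c_max <= 0 or c_min >= 0:
        raise ValueError("s must lie strictly inside the range of observed outcomes")
    lo, hi = -1.0 / c_max, -1.0 / c_min  # every q_i stays positive for lam in (lo, hi)

    def mean_gap(lam: float) -> float:
        # Decreasing in lam; zero exactly when sum(q_i * a_i) == s.
        return sum(p * c / (1.0 + lam * c) for c, p in terms)

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    # sum p_i * log(p_i / q_i) simplifies to sum p_i * log(1 + lam * c_i).
    return sum(p * math.log(1.0 + lam * c) for c, p in terms)


def gsprt_llr(counts: Sequence[int], outcomes: Sequence[float], elo0: float, elo1: float) -> float:
    """GSPRT log-likelihood ratio of H1 (elo1) against H0 (elo0)."""
    n = sum(counts)
    freqs = [k / n for k in counts]
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    return n * (constrained_kl(outcomes, freqs, s0) - constrained_kl(outcomes, freqs, s1))


# Trinomial: counts of losses, draws, wins with scores 0, 1/2, 1.
print(gsprt_llr([187, 352, 231], [0.0, 0.5, 1.0], elo0=0.0, elo1=5.0))
# Pentanomial: game pairs scoring 0, 1/2, 1, 3/2, 2, divided by 2 to get a per-game score.
print(gsprt_llr([0, 15, 163, 35, 1], [0.0, 0.25, 0.5, 0.75, 1.0], elo0=0.0, elo1=5.0))
```

With the placeholder bounds elo0 = 0 and elo1 = 5 (an assumption; the thread does not say which bounds that experiment used), the trinomial call gives roughly 1.02 for the counts in Michel's first post. Bisection is a safe choice here because the mean gap is monotone in lam, so it always converges inside the interval where every q_i stays positive.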
abulmo2
Posts: 433
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: sprt tourney manager

Post by abulmo2 »

Laskos wrote: By the way, I observed that the test sometimes doesn't stop at exactly the desired ln(19), but at a higher value, dozens of games later. I don't know if this is intended.
Several possibilities:

- I require at least 100 game pairs to be played before stopping.

- By default the tournament stops when the LLRs from both the 3-nomial and the 5-nomial distributions are > ln(19). Thus, you may see the 5-nomial test succeed before the 3-nomial test. It is possible to use only the 3-nomial or only the 5-nomial test with the argument '-V 3nomial' or '-V 5nomial'.

- When concurrency is used, all games already started are finished before stopping. So a few more games than necessary may be played, and the stopping rule may even no longer hold after the last game.
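The stopping rule described in the list above can be sketched as follows. This is hypothetical code, not the tourney manager's source. It assumes Wald bounds with alpha = beta = 0.05, which give the ±ln(19) ≈ ±2.944 shown in the program's output, and a symmetric rule for accepting H0, which the post does not describe.

```python
import math
from typing import Sequence


def sprt_bounds(alpha: float = 0.05, beta: float = 0.05) -> tuple[float, float]:
    """Wald's SPRT bounds (lower, upper) on the LLR."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


def sprt_decision(llrs: Sequence[float], pairs_played: int,
                  min_pairs: int = 100, alpha: float = 0.05, beta: float = 0.05) -> str:
    """Hypothetical rule: every selected test (3-nomial and/or 5-nomial) must cross the same bound."""
    lower, upper = sprt_bounds(alpha, beta)
    if pairs_played < min_pairs:
        return "continue"  # at least 100 game pairs before any decision
    if all(llr > upper for llr in llrs):
        return "accept H1"  # e.g. both LLRs > ln(19)
    if all(llr < lower for llr in llrs):
        return "accept H0"  # assumption: symmetric rule for H0
    return "continue"


# LLRs from the Stockfish log at the end of the thread: 214 pairs played.
print(sprt_bounds())
print(sprt_decision([6.756, 4.933], pairs_played=214))
print(sprt_decision([3.1, 2.5], pairs_played=214))  # only one test has crossed
```

Passing a single LLR in `llrs` corresponds to running only the 3-nomial or only the 5-nomial test. Concurrency is not modelled: games already in progress when the rule fires are still played out, so the final LLR can end up past the bound.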
Richard Delorme
abulmo2
Posts: 433
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: sprt tourney manager

Post by abulmo2 »

Michel wrote: Now that there is an implementation in ... /tourney.d
The SPRT part is between lines 175 and 300.

It is written in D, a language that is not very popular yet. Nevertheless, for people used to C-like languages, I hope my code is quite readable. (I like and abuse UTF-8 characters, though, so you need a UTF-8 compatible editor.)
abulmo2
Posts: 433
Joined: Fri Dec 16, 2016 11:04 am
Location: France
Full name: Richard Delorme

Re: sprt tourney manager

Post by abulmo2 »

Laskos wrote: Excellent, with all the stats in the output, I love it and I saw no flaws. Setting H0 and H1 doesn't work for me, so I just use the defaults. Maybe I am doing something wrong; it gives:


std.conv.ConvException@C:\ldc\bin\..\import\std\conv.d(1876): Can't parse string: bool should be case-insensitive 'true' or 'false'
Thank you for reporting this. I probably misunderstood how the command-line argument parser works, and I forgot to test this part thoroughly. I think the long option names --elo0 and --elo1 should work nonetheless. I will try to correct this in a future release.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: sprt tourney manager

Post by Laskos »

abulmo2 wrote:
Laskos wrote: Excellent, with all the stats in the output, I love it and I saw no flaws. Setting H0 and H1 doesn't work for me, so I just use the defaults. Maybe I am doing something wrong; it gives:


std.conv.ConvException@C:\ldc\bin\..\import\std\conv.d(1876): Can't parse string: bool should be case-insensitive 'true' or 'false'
Thank you for reporting this. I probably misunderstood how the command-line argument parser works, and I forgot to test this part thoroughly. I think the long option names --elo0 and --elo1 should work nonetheless. I will try to correct this in a future release.
Great, --elo0 and --elo1 work!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: sprt tourney manager

Post by Laskos »

Michel wrote:
Laskos wrote:
Michel wrote: There seems to be a puzzling deviation in the reported 3-nomial LLRs. E.g. for the last experiment I get

wdl: w: 231, d: 352, l: 187

LLR=1.02

whereas the reported LLR is 1.1.

The 5-nomial case is ok.
It is probably my fault. This stop (where the test is not yet accepted) was an ad-hoc stop of mine: the test was still running (it stops when both the 3- and 5-nomial LLRs reach the desired value), but I wanted to see what the 3-nomial LLR is when the 5-nomial test stops. The 2 other tests were stopped regularly (they were accepted), so they should be OK down to fine details.
Thanks for the explanation. I have put a link to the script computing the LLRs here:

http://hardy.uhasselt.be/Toga/computeLLR.py

Now that there is an implementation in the Amoeba tourney manager, perhaps cutechess-cli can follow? The code is really trivial, much simpler than the current code in cutechess-cli.
Thanks for the script! Yes, Richard's implementation is the most straightforward use of SPRT for 3- and 5-nomials; maybe cutechess-cli will follow. I tried to compute the 3-nomial LLR in closed form without substitutions, hoping it would simplify a bit, but the best I get is still very long, and the 5-nomial one would be even longer. Just for fun, here is the LLR for the 3-nomial:
[Image: closed-form expression for the 3-nomial LLR]
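For comparison, a generic form of the multinomial GSPRT log-likelihood ratio (not necessarily equivalent to the expression in the image) uses the constrained maximum-likelihood distributions under H0 and H1. Here N is the number of games (or game pairs), \hat p_i are the observed outcome frequencies, a_i are the outcome scores (0, 1/2, 1 in the 3-nomial case), and s_j = 1/(1+10^{-\mathrm{elo}_j/400}) assumes the logistic Elo model:

$$
\mathrm{LLR} = N \sum_i \hat p_i \ln\frac{p_i^{(1)}}{p_i^{(0)}},
\qquad
p_i^{(j)} = \frac{\hat p_i}{1 + \lambda_j (a_i - s_j)},
\qquad
\sum_i \frac{\hat p_i\,(a_i - s_j)}{1 + \lambda_j (a_i - s_j)} = 0 .
$$

The last condition fixes \lambda_j so that p^{(j)} has mean score s_j; it also makes the p_i^{(j)} sum to 1 automatically. Only \lambda_0 and \lambda_1 have to be found numerically, which is presumably why a fully closed form gets so long.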
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: sprt tourney manager

Post by Laskos »

abulmo2 wrote:
Laskos wrote: By the way, I observed that the test sometimes doesn't stop at exactly the desired ln(19), but at a higher value, dozens of games later. I don't know if this is intended.
Several possibilities:

- I require at least 100 game pairs to be played before stopping.

- By default the tournament stops when the LLRs from both the 3-nomial and the 5-nomial distributions are > ln(19). Thus, you may see the 5-nomial test succeed before the 3-nomial test. It is possible to use only the 3-nomial or only the 5-nomial test with the argument '-V 3nomial' or '-V 5nomial'.

- When concurrency is used, all games already started are finished before stopping. So a few more games than necessary may be played, and the stopping rule may even no longer hold after the last game.
It seems that I have cases where none of that happens, yet the stop is late:


Stockfish 210117 64 BMI2 vs Stockfish 140916 64 BMI2
results: 428 games
wdl:    w: 47, d: 356, l: 25
pair:   0: 0, 0.5: 15, 1: 163, 1.5: 35, 2: 1
Using variance of the pentanomial distribution of game pairs:
Elo: 17.9 [13.0, 22.8]
LOS: 99.89 %
LLR: 6.756 [-2.944, 2.944]
test accepted

Using variance of the trinomial distribution of single games:
Elo: 17.9 [12.2, 23.6]
LOS: 99.55 %
LLR: 4.933 [-2.944, 2.944]
test accepted
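In this log both analyses report the same Elo because they share the same mean score (225/428 per game). Only the variance estimate differs, and the smaller pentanomial variance gives the narrower interval and the larger LLR. A minimal sketch using the normal approximation to the LLR (an assumption; the thread does not show the tool's exact formula, and elo0 = 0, elo1 = 5 below are placeholders) reproduces the ratio of the two reported LLRs: about 1.37, in line with 6.756 / 4.933. In this approximation the ratio depends only on the two variances, not on the Elo bounds. The function names are hypothetical.

```python
def elo_to_score(elo: float) -> float:
    """Expected score under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def mean_and_var_of_mean(counts, outcomes):
    """Sample mean score per game and the estimated variance of that mean."""
    n = sum(counts)
    mean = sum(k * a for k, a in zip(counts, outcomes)) / n
    var = sum(k * (a - mean) ** 2 for k, a in zip(counts, outcomes)) / n
    return mean, var / n


def normal_llr(mean, var_of_mean, elo0, elo1):
    """Normal-approximation LLR of H1 (elo1) against H0 (elo0)."""
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    return (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var_of_mean)


# Single games from the log: losses, draws, wins (scores 0, 1/2, 1).
tri = mean_and_var_of_mean([25, 356, 47], [0.0, 0.5, 1.0])
# Game pairs scoring 0, 1/2, 1, 3/2, 2, divided by 2 to get a per-game score.
penta = mean_and_var_of_mean([0, 15, 163, 35, 1], [0.0, 0.25, 0.5, 0.75, 1.0])

ELO0, ELO1 = 0.0, 5.0  # placeholder bounds (assumption)
llr_tri = normal_llr(*tri, ELO0, ELO1)
llr_penta = normal_llr(*penta, ELO0, ELO1)
print(f"mean {tri[0]:.4f} vs {penta[0]:.4f}, LLR ratio {llr_penta / llr_tri:.3f}")
```

The pentanomial estimate accounts for the correlation between the two games of a pair, which is why it gives a smaller variance than the one computed from single games.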