sprt tourney manager

abulmo2 · Post by **abulmo2** » Wed Jan 25, 2017 12:23 am

Hello,

With the latest release of amoeba (version 2.1), I published the code of my own tourney manager, which I use to validate (or reject) changes to amoeba.
It uses the SPRT approach, taking into account the remarks discussed here:
http://talkchess.com/forum/viewtopic.php?t=57465
and here:
http://www.talkchess.com/forum/viewtopic.php?t=61105

It is still very basic, since game duration can only be set as a fixed time per move, but it has some nice features:
- written in D language (simple & efficient)
- can use various openings (from pgn files)
- can play several games in parallel
- can set the H0 & H1 hypotheses for SPRT
- can limit the test to a fixed number of games (SPRT does not always converge)
- can save the played games to a pgn file
- works with unbalanced openings
- no bayeselo assumptions

Laskos · Post by **Laskos** » Wed Jan 25, 2017 10:35 am

Great! Does it use the 5-nomial distribution for paired, color-reversed games, potentially with unbalanced openings? 5-nomial is what was missing from Cutechess-Cli. Could you compile it for Win64 so I can try it? Could you support EPD openings too? Almost all of my opening files are EPD. Thanks!
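
For readers new to the term, here is a minimal sketch of what "5-nomial" bookkeeping means: each opening is played twice with colors reversed, and the two results are summed into one of five pair scores (0, 0.5, 1, 1.5, 2). The function name `pentanomial_counts` and the input layout are illustrative assumptions, not taken from the tourney manager's source.

```python
def pentanomial_counts(pair_results):
    """Bin game pairs by total score for the first engine.

    pair_results: iterable of (score_game1, score_game2) tuples, each score
    being 0, 0.5 or 1 from the first engine's point of view (hypothetical
    input layout, chosen for this sketch).
    Returns the counts for pair totals 0, 0.5, 1, 1.5, 2, in that order.
    """
    counts = [0, 0, 0, 0, 0]
    for first, second in pair_results:
        # Pair totals are multiples of 0.5, so doubling gives the bin index.
        counts[int(round(2 * (first + second)))] += 1
    return counts


# Three pairs: a win and a loss, two draws, a win and a draw.
print(pentanomial_counts([(1, 0), (0.5, 0.5), (1, 0.5)]))
```

A win plus a loss and a pair of draws land in the same middle bin. With an unbalanced opening, where the favored side tends to win both games, most pairs still total 1, so the pair score is much less noisy than the single-game results.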

abulmo2 · Post by **abulmo2** » Sat Feb 04, 2017 9:14 pm

Laskos wrote: Great! Does it use 5-nomial for paired color reversed games, potentially using unbalanced openings?

Yes.

Laskos wrote: 5-nomial is what was missing from Cutechess-Cli. Can you compile this thing for Win64, I would try it a bit? Can you implement EPD openings too? My files of all sorts of openings are almost all EPD. Thanks!

Done here: https://github.com/abulmo/amoeba/releas ... ba-win.zip

EPD files are now supported for openings. PGN and EPD files are distinguished by file extension: ".pgn" for PGN files, ".epd" or ".fen" for EPD files. Only the FEN string is read from EPD files. Several opening files can be used at once: -b file1.pgn -b file2.epd -b file3.pgn, etc.
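
The extension rule above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the D code from the release: the function names are hypothetical, lowercasing the extension is an assumption, and so is the choice to keep exactly the four position fields of an EPD record (piece placement, side to move, castling rights, en passant square).

```python
import os


def opening_file_kind(path):
    """Classify an opening file by its extension (case-insensitive by assumption)."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".pgn":
        return "pgn"
    if ext in (".epd", ".fen"):
        return "epd"
    raise ValueError("unsupported opening file: " + path)


def fen_from_epd_line(line):
    """Keep the four position fields of an EPD record and drop its operations."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("not an EPD/FEN record: " + line)
    return " ".join(fields[:4])


print(opening_file_kind("file2.epd"))
print(fen_from_epd_line(
    'rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - bm e4; id "start";'))
```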

Laskos · Post by **Laskos** » Sun Feb 05, 2017 7:37 am

Thanks, I will play with it!

Laskos · Post by **Laskos** » Sun Feb 05, 2017 5:43 pm

Richard, I can't find a help or examples file describing the usage of tourney.exe. Could you give an example command line that runs 2 engines with the SPRT stopping condition? Thanks!

Adam Hair · Post by **Adam Hair** » Sun Feb 05, 2017 6:11 pm

Until Richard replies, here is what I have found:

Code: Select all

C:\Users\Adam\Downloads\amoeba-win>tourney-win64.exe -h

Run a tournament between two UCI engines using Sequential Probability Ratio Test as stopping condition.

tourney --engine|-e <cmd> --engine|-e <cmd>  [optional settings]
    --engine|-e <cmd>        launch an engine with <cmd>. 2 engines should be loaded
    --time|-t <movetime>     time (in seconds) to play a move (default 0.1s)
    --book|-b <pgn|epd file> opening book
    --output|-o <pgn file>   save the played games
    --games|-g <games>       max number of game pairs to play (default 30000)
    --cpu|-n <cpu>           number of games to play in parallel (default 1)
    --elo0|-H0 <elo>         H0 hypothesis (default = 0)
    --elo1|-H1 <elo>         H1 hypothesis (default = 5)
    --alpha|-╬▒ <alpha>       type I error (default = 0.05)
    --beta|-╬▓ <beta>         type II error (default = 0.05)
    --variance|-V <type>     3nomial|5nomial|all (default=all)
    --help|-h                display this help
    --version|-v             show version number

For example:
$ tourney -e amoeba-2.1 -e amoeba-2.0 -g 30000 -b opening.pgn -t 0.1 -n 3 -o game.pgn
[...]
Amoeba 2.1-l64p vs Amoeba 2.0.l64p
results: 3524 games
wdl:    w: 1058, d: 1545, l: 921
pair:   0: 112, 0.5: 395, 1: 651, 1.5: 452, 2: 152
Elo: 13.5 [9.9, 17.1]
LOS: 99.92 %
LLR: 2.992 [-2.944, 2.944]
test accepted
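
As a rough cross-check of the Elo and LOS lines in the sample output, the sketch below recomputes both from the "pair" counts. It assumes the usual logistic Elo model and a normal approximation for LOS; whether tourney computes them in exactly this way is an assumption, and the function name is made up for the example. The confidence interval is not reproduced here.

```python
import math


def elo_and_los(pair_counts):
    """Estimate Elo and LOS from counts of pair totals 0, 0.5, 1, 1.5, 2."""
    n = sum(pair_counts)
    k = len(pair_counts) - 1
    # Mean score per game pair, rescaled to the 0..1 range.
    score = sum(c * i / k for i, c in enumerate(pair_counts)) / n
    # Sample variance of the rescaled pair score, then variance of the mean.
    var = sum(c * (i / k - score) ** 2 for i, c in enumerate(pair_counts)) / (n - 1)
    sd = math.sqrt(var / n)
    # Logistic Elo model: score = 1 / (1 + 10^(-elo/400)).
    elo = 400 * math.log10(score / (1 - score))
    # Likelihood of superiority: P(true score > 0.5) under a normal approximation.
    los = 0.5 * (1 + math.erf((score - 0.5) / (sd * math.sqrt(2))))
    return elo, los


elo, los = elo_and_los([112, 395, 651, 452, 152])
print("Elo: %.1f  LOS: %.2f %%" % (elo, 100 * los))
```

With the counts from the example run, this should land very close to the reported Elo of 13.5 and LOS of 99.92 %.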

Laskos · Post by **Laskos** » Sun Feb 05, 2017 6:38 pm

Thanks, that helps very much.

abulmo2 · Post by **abulmo2** » Mon Feb 06, 2017 7:25 am

Thank you for answering before I could.
Note that you can get rid of this garbage output:

Code: Select all

    --alpha|-╬▒ <alpha>       type I error (default = 0.05)
    --beta|-╬▓ <beta>         type II error (default = 0.05)

by typing chcp 65001 before launching the program. You then get:

Code: Select all

    --alpha|-α <alpha>       type I error (default = 0.05)
    --beta|-β <beta>         type II error (default = 0.05)
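
For reference, the LLR bounds printed by the program follow from α and β through Wald's standard SPRT thresholds. The short sketch below shows the arithmetic; the function name is made up for the example.

```python
import math


def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald's SPRT thresholds on the log-likelihood ratio.

    The test accepts H0 once LLR <= lower and accepts H1 once LLR >= upper.
    """
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    return lower, upper


lower, upper = sprt_bounds()
print("[%.3f, %.3f]" % (lower, upper))  # [-2.944, 2.944]
```

With the default α = β = 0.05 this gives the [-2.944, 2.944] interval shown next to the LLR in the output.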

Michel · Post by **Michel** » Mon Feb 06, 2017 12:11 pm

Very nice!

I checked that the 5-nomial LLR is correct. The trinomial and 5-nomial LLR are respectively

Code: Select all

2.87
2.99

So the 5-nomial one is a bit bigger, allowing the test to stop earlier.

Here is some quick and dirty code for computing these LLRs:

Code: Select all

from __future__ import division  # true division on Python 2, no-op on Python 3
import math

bb = math.log(10) / 400

def L(x):
    """Logistic function: expected score for an Elo difference x."""
    return 1 / (1 + math.exp(-bb * x))

def LLR(elo0, elo1, results):
    """
    Compute the generalized log-likelihood ratio for "results", which
    should be a list of length 3 (losses, draws, wins) or of length 5
    (counts of game pairs scoring 0, 0.5, 1, 1.5, 2).
    """
    score0 = L(elo0)
    score1 = L(elo1)
    N = sum(results)
    if N <= 1:
        return 0
    l = len(results)
    l1 = l - 1
    # Mean score and sample variance, with outcomes rescaled to 0..1.
    score = sum(results[i] * (i / l1) for i in range(l)) / N
    var = sum(results[i] * ((i / l1) - score) ** 2 for i in range(l)) / (N - 1)
    var_score = var / N

    # With at most one non-empty category the variance estimate is unusable.
    if sum(1 for x in results if x > 0) <= 1:
        return 0
    return (score1 - score0) * (2 * score - score0 - score1) / var_score / 2

if __name__ == '__main__':
    print(LLR(0, 5, [921, 1545, 1058]))          # trinomial, about 2.87
    print(LLR(0, 5, [112, 395, 651, 452, 152]))  # 5-nomial, about 2.99

Laskos · Post by **Laskos** » Tue Feb 07, 2017 7:20 am

Excellent! I love having all the stats in the output, and I saw no flaws. Setting H0 and H1 doesn't work for me, though, so I just use the defaults. Maybe I am doing something wrong, but it gives:

Code: Select all

std.conv.ConvException@C:\ldc\bin\..\import\std\conv.d(1876): Can't parse string: bool should be case-insensitive 'true' or 'false'

Some experiments:

First, to get a feel for the scaling:
N = number of games.

LLR ~ N
standard deviation ~ confidence interval ~ 1/sqrt(N)
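
A quick numerical illustration of these two scaling rules, using the normal-approximation LLR formula from Michel's post above, rewritten here as a self-contained sketch. The function names are made up, and the win/draw/loss counts are the ones from the first run reported below; multiplying all counts by 4 keeps the score fixed and only changes N.

```python
def expected_score(elo):
    """Logistic expected score for an Elo difference."""
    return 1 / (1 + 10 ** (-elo / 400))


def llr_and_sd(elo0, elo1, counts):
    """Return (LLR, standard deviation of the mean score) for outcome counts.

    counts is ordered from worst to best outcome, e.g. [losses, draws, wins].
    """
    n = sum(counts)
    k = len(counts) - 1
    score = sum(c * i / k for i, c in enumerate(counts)) / n
    var = sum(c * (i / k - score) ** 2 for i, c in enumerate(counts)) / (n - 1)
    var_score = var / n
    s0, s1 = expected_score(elo0), expected_score(elo1)
    llr = (s1 - s0) * (2 * score - s0 - s1) / (2 * var_score)
    return llr, var_score ** 0.5


base = [868, 1309, 1015]             # losses, draws, wins
scaled = [4 * c for c in base]       # same proportions, 4 times the games

llr1, sd1 = llr_and_sd(0, 5, base)
llr4, sd4 = llr_and_sd(0, 5, scaled)
print("LLR ratio: %.3f" % (llr4 / llr1))   # close to 4
print("sd ratio:  %.3f" % (sd1 / sd4))     # close to 2
```

Quadrupling the number of games roughly quadruples the LLR and halves the standard deviation, so at a fixed true Elo the number of games needed to reach an LLR bound scales inversely with the LLR gained per game.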

1/ Low draw rate (around 40%):
===============================================
2moves_v1 (balanced):

Stockfish 210117 64 BMI2 vs Stockfish 140916 64 BMI2
results: 3192 games
wdl: w: 1015, d: 1309, l: 868
pair: 0: 104, 0.5: 336, 1: 607, 1.5: 407, 2: 142
Using variance of the pentanomial distribution of game pairs:
Elo: 16.0 [12.2, 19.8]
LOS: 99.98 %
LLR: 3.325 [-2.944, 2.944]
test accepted

Using variance of the trinomial distribution of single games:
Elo: 16.0 [12.1, 20.0]
LOS: 99.97 %
LLR: 3.037 [-2.944, 2.944]
test accepted

5-nomial still helps a bit even for balanced openings, giving an LLR some 10% higher.

Open_07_09 (imbalance of 70-90 cp)

Stockfish 210117 64 BMI2 vs Stockfish 140916 64 BMI2
results: 2730 games
wdl: w: 944, d: 957, l: 829
pair: 0: 70, 0.5: 266, 1: 608, 1.5: 321, 2: 100
Using variance of the pentanomial distribution of game pairs:
Elo: 14.6 [10.9, 18.4]
LOS: 99.94 %
LLR: 2.979 [-2.944, 2.944]
test accepted

Using variance of the trinomial distribution of single games:
Elo: 14.6 [9.3, 20.0]
LOS: 99.68 %
LLR: 2.020 [-2.944, 2.944]

Unbalanced openings help only moderately or not at all. 5-nomial helps a lot in the unbalanced case.
===============================================

2/ High draw rate (around 90%)
===============================================
Endgame (balanced)

Stockfish 210117 64 BMI2 vs Stockfish 090416 64 BMI2
results: 2516 games
wdl: w: 126, d: 2300, l: 90
pair: 0: 0, 0.5: 66, 1: 1095, 1.5: 92, 2: 5
Using variance of the pentanomial distribution of game pairs:
Elo: 5.0 [3.4, 6.5]
LOS: 99.66 %
LLR: 3.662 [-2.944, 2.944]
test accepted

Using variance of the trinomial distribution of single games:
Elo: 5.0 [3.3, 6.7]
LOS: 99.29 %
LLR: 3.007 [-2.944, 2.944]
test accepted

5-nomial still helps even for balanced positions, giving an LLR about 20% higher.

Endgame_10_14 (imbalance of 100-140 cp)

Stockfish 210117 64 BMI2 vs Stockfish 090416 64 BMI2
results: 770 games
wdl: w: 231, d: 352, l: 187
pair: 0: 1, 0.5: 46, 1: 249, 1.5: 86, 2: 3
Using variance of the pentanomial distribution of game pairs:
Elo: 19.9 [15.4, 24.4]
LOS: 99.99 %
LLR: 2.981 [-2.944, 2.944]
test accepted

Using variance of the trinomial distribution of single games:
Elo: 19.9 [10.7, 29.1]
LOS: 98.43 %
LLR: 1.103 [-2.944, 2.944]

Unbalanced openings & 5-nomial help very much, shortening the matches to SPRT stop by factors of 3-4, as predicted by the models.
===============================================
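
To put rough numbers on these comparisons, the sketch below takes the game counts and LLR values reported in the four runs above and computes two things: the ratio of 5-nomial to trinomial LLR within each run, and the LLR gained per 1000 games. Since LLR grows roughly linearly with N, the second figure is a crude proxy for how fast the SPRT would stop. Note that the runs use different openings and measure different Elo gaps, so this is back-of-the-envelope arithmetic rather than a controlled comparison.

```python
# (label, games, 5-nomial LLR, trinomial LLR), copied from the runs above.
runs = [
    ("2moves_v1 (balanced)", 3192, 3.325, 3.037),
    ("Open_07_09 (unbalanced)", 2730, 2.979, 2.020),
    ("Endgame (balanced)", 2516, 3.662, 3.007),
    ("Endgame_10_14 (unbalanced)", 770, 2.981, 1.103),
]

for label, games, llr5, llr3 in runs:
    print("%-28s 5-nomial/trinomial: %.2f   5-nomial LLR per 1000 games: %.2f"
          % (label, llr5 / llr3, 1000 * llr5 / games))

# Unbalanced endgames with 5-nomial versus balanced endgames with trinomial.
_, games_bal, _, llr3_bal = runs[2]
_, games_unb, llr5_unb, _ = runs[3]
speedup = (llr5_unb / games_unb) / (llr3_bal / games_bal)
print("LLR-per-game speedup: %.1f" % speedup)
```

The last figure comes out a little above 3, in line with the factor of 3-4 quoted above.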
