Hello,
With the latest release of amoeba (version 2.1), I published the code of my own tourney manager, which I used to validate (or reject) changes to amoeba.
It uses the SPRT approach, following the remarks discussed here:
http://talkchess.com/forum/viewtopic.php?t=57465
and here:
http://www.talkchess.com/forum/viewtopic.php?t=61105
It is still very basic, though, as the game duration can only be set as a fixed time per move, but it has some nice features:
- written in the D language (simple & efficient)
- can use various openings (from PGN files)
- can play several games in parallel
- can set the H0 & H1 hypotheses for SPRT
- can limit the test to a fixed number of games (SPRT does not always converge)
- can save the played games to a PGN file
- works with unbalanced openings
- no BayesElo assumptions
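For readers new to SPRT, here is a minimal Python sketch of the stopping rule. It is not amoeba's D code.

The log-likelihood ratio (LLR) is compared against Wald's bounds, which are derived from the type I and type II error rates α and β. A cap on the number of game pairs ends tests that never reach either bound. The helper names `sprt_bounds` and `sprt_decision` are hypothetical. The cap of 30000 pairs is taken from the default shown in the tool's help output further down the thread.

```python
import math

def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald's SPRT bounds (lower, upper) on the log-likelihood ratio."""
    lower = math.log(beta / (1 - alpha))   # LLR at or below this: accept H0
    upper = math.log((1 - beta) / alpha)   # LLR at or above this: accept H1
    return lower, upper

def sprt_decision(llr, pairs_played, alpha=0.05, beta=0.05, max_pairs=30000):
    """Return 'H1', 'H0', 'stop' (pair limit reached) or 'continue'.

    Hypothetical helper: the pair cap mirrors the feature list above,
    not the tool's actual implementation.
    """
    lower, upper = sprt_bounds(alpha, beta)
    if llr >= upper:
        return "H1"
    if llr <= lower:
        return "H0"
    if pairs_played >= max_pairs:
        return "stop"
    return "continue"

lower, upper = sprt_bounds()
print(round(lower, 3), round(upper, 3))    # -2.944 2.944
print(sprt_decision(2.992, 1762))          # H1
```

With α = β = 0.05 these bounds give the [-2.944, 2.944] interval that the tool prints next to the LLR.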
sprt tourney manager
- Posts: 433
- Joined: Fri Dec 16, 2016 11:04 am
- Location: France
- Full name: Richard Delorme
Richard Delorme
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: sprt tourney manager
Great! Does it use the 5-nomial for paired, color-reversed games, potentially with unbalanced openings? The 5-nomial is what was missing from Cutechess-Cli. Can you compile this thing for Win64? I would like to try it a bit. Can you implement EPD openings too? My files of all sorts of openings are almost all EPD. Thanks!
- Posts: 433
- Joined: Fri Dec 16, 2016 11:04 am
- Location: France
- Full name: Richard Delorme
Re: sprt tourney manager
Yes, it uses the 5-nomial for paired, color-reversed games.
Win64 build: done here: https://github.com/abulmo/amoeba/releas ... ba-win.zip
EPD files are now supported for openings. PGN and EPD files are distinguished by their file extension: ".pgn" for PGN files, and ".epd" or ".fen" for EPD files. Only the FEN string is read from EPD files. Several opening files can be used at once: -b file1.pgn -b file2.epd -b file3.pgn, etc.
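Here is a minimal Python sketch of the opening-file handling just described. It is not the tool's D code, and `opening_format` and `fen_from_epd_line` are hypothetical names.

It assumes the usual EPD layout: four position fields (piece placement, side to move, castling rights, en passant square), optionally followed by operations such as `id` or `bm`. Those operations are dropped. If a line already ends with FEN move counters they are kept; otherwise "0 1" is appended.

```python
import os

def opening_format(path):
    """Choose a reader from the file extension, as described above."""
    ext = os.path.splitext(path)[1]
    if ext == ".pgn":
        return "pgn"
    if ext in (".epd", ".fen"):
        return "epd"
    raise ValueError(f"unsupported opening file: {path}")

def fen_from_epd_line(line):
    """Keep only the FEN part of an EPD (or FEN) line."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"not an EPD/FEN line: {line!r}")
    tail = fields[4:6]
    # Full FEN lines carry halfmove/fullmove counters; EPD lines usually do not
    if len(tail) == 2 and all(f.isdigit() for f in tail):
        counters = tail
    else:
        counters = ["0", "1"]
    return " ".join(fields[:4] + counters)

print(opening_format("file2.epd"))  # epd
print(fen_from_epd_line('rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 id "1.e4";'))
```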
Richard Delorme
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: sprt tourney manager
Thanks, I will play with it!
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: sprt tourney manager
Richard, I can't find a help or examples file describing the usage of tourney.exe. Could you give an example command line for running 2 engines with an SPRT stop? Thanks!
- Posts: 3226
- Joined: Wed May 06, 2009 10:31 pm
- Location: Fuquay-Varina, North Carolina
Re: sprt tourney manager
Until Richard replies, here is what I have found:
```
C:\Users\Adam\Downloads\amoeba-win>tourney-win64.exe -h
Run a tournament between two UCI engines using Sequential Probability Ratio Test as stopping condition.
tourney --engine|-e <cmd> --engine|-e <cmd> [optional settings]
--engine|-e <cmd> launch an engine with <cmd>. 2 engines should be loaded
--time|-t <movetime> time (in seconds) to play a move (default 0.1s)
--book|-b <pgn|epd file> opening book
--output|-o <pgn file> save the played games
--games|-g <games> max number of game pairs to play (default 30000)
--cpu|-n <cpu> number of games to play in parallel (default 1)
--elo0|-H0 <elo> H0 hypothesis (default = 0)
--elo1|-H1 <elo> H1 hypothesis (default = 5)
--alpha|-╬▒ <alpha> type I error (default = 0.05)
--beta|-╬▓ <beta> type II error (default = 0.05)
--variance|-V <type> 3nomial|5nomial|all (default=all)
--help|-h display this help
--version|-v show version number
For example:
$ tourney -e amoeba-2.1 -e amoeba-2.0 -g 30000 -b opening.pgn -t 0.1 -n 3 -o game.pgn
[...]
Amoeba 2.1-l64p vs Amoeba 2.0.l64p
results: 3524 games
wdl: w: 1058, d: 1545, l: 921
pair: 0: 112, 0.5: 395, 1: 651, 1.5: 452, 2: 152
Elo: 13.5 [9.9, 17.1]
LOS: 99.92 %
LLR: 2.992 [-2.944, 2.944]
test accepted
```
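The "pair" line above is the 5-nomial tally. Each opening is played twice with colors reversed, and the first engine's total over the two games (0, 0.5, 1, 1.5 or 2) is counted.

Below is a minimal Python sketch of that bookkeeping, plus an Elo estimate and a normal-approximation LOS computed from the pair counts. It is not the tool's D code; `pentanomial` and `elo_and_los` are hypothetical names. Using the pair variance for the LOS is an assumption. With the counts above it gives Elo 13.5 and LOS 99.92 %, matching the printed values.

```python
import math
from collections import Counter

def pentanomial(pair_results):
    """Tally color-reversed game pairs by the first engine's total score.

    pair_results: list of (score_game1, score_game2) tuples, each score
    being 0, 0.5 or 1 (hypothetical input format).
    Returns the counts for pair scores 0, 0.5, 1, 1.5, 2 in that order.
    """
    tally = Counter(a + b for a, b in pair_results)
    return [tally[s] for s in (0, 0.5, 1, 1.5, 2)]

def elo_and_los(penta):
    """Elo estimate and normal-approximation LOS from pentanomial counts."""
    n_pairs = sum(penta)
    xs = [0, 0.25, 0.5, 0.75, 1]          # mean per-game score of each pair outcome
    score = sum(c * x for c, x in zip(penta, xs)) / n_pairs
    var = sum(c * (x - score) ** 2 for c, x in zip(penta, xs)) / n_pairs
    stderr = math.sqrt(var / n_pairs)     # standard error of the mean score
    elo = -400 * math.log10(1 / score - 1)
    los = 0.5 * (1 + math.erf((score - 0.5) / (stderr * math.sqrt(2))))
    return elo, los

# Pair counts from the example output above
elo, los = elo_and_los([112, 395, 651, 452, 152])
print(f"Elo: {elo:.1f}  LOS: {100 * los:.2f} %")  # Elo: 13.5  LOS: 99.92 %
```

As a sanity check, the pair counts hold the same total score as the w/d/l line: 0.5·395 + 651 + 1.5·452 + 2·152 = 1058 + 1545/2 = 1830.5 points in 3524 games.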
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: sprt tourney manager
Thanks, that helps very much.
- Posts: 433
- Joined: Fri Dec 16, 2016 11:04 am
- Location: France
- Full name: Richard Delorme
Re: sprt tourney manager
Thank you for answering before I could.
Note that you can remove this garbage output:
```
--alpha|-╬▒ <alpha> type I error (default = 0.05)
--beta|-╬▓ <beta> type II error (default = 0.05)
```
by typing chcp 65001 before launching the program, and get:
```
--alpha|-α <alpha> type I error (default = 0.05)
--beta|-β <beta> type II error (default = 0.05)
```
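Why chcp 65001 helps: the program writes α and β as UTF-8, but the console was decoding the bytes with a legacy OEM code page. Code page 437 is assumed here because it matches the symbols shown; code page 850 maps these bytes the same way. Switching the console to code page 65001 (UTF-8) makes it decode the bytes correctly.

This short Python check reproduces the garbage:

```python
# UTF-8 bytes of the Greek letters, decoded as if they were code page 437
for ch in "αβ":
    raw = ch.encode("utf-8")
    print(ch, raw.hex(), "->", raw.decode("cp437"))
# α ceb1 -> ╬▒
# β ceb2 -> ╬▓
```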
Richard Delorme
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: sprt tourney manager
Very nice!
I checked that the 5-nomial LLR is correct. The trinomial and 5-nomial LLRs are, respectively:
```
2.87
2.99
```
So the 5-nomial one is a bit bigger, allowing the test to stop earlier.
Here is some quick and dirty code for computing these LLRs:
```python
from __future__ import division  # true division under Python 2; no-op in Python 3
import math

bb = math.log(10) / 400

def L(x):
    """Expected score for an Elo difference x (logistic model)."""
    return 1 / (1 + math.exp(-bb * x))

def LLR(elo0, elo1, results):
    """
    Compute the generalized log-likelihood ratio for "results" which
    should be a list of either length 3 or 5.
    """
    score0 = L(elo0)
    score1 = L(elo1)
    N = sum(results)
    if N <= 1:
        return 0
    l = len(results)
    l1 = l - 1
    # mean score, category i being worth i/l1 (from 0 to 1)
    score = sum([results[i] * (i / l1) for i in range(0, l)]) / N
    # sample variance of a single observation (game or game pair)
    var = sum([results[i] * ((i / l1) - score) * ((i / l1) - score) for i in range(0, l)]) / (N - 1)
    var_score = var / N
    # with at most one non-empty category the variance is zero
    if len([x for x in results if x > 0]) <= 1:
        return 0
    else:
        return (score1 - score0) * (2 * score - score0 - score1) / var_score / 2

if __name__ == '__main__':
    print(LLR(0, 5, [921, 1545, 1058]))          # trinomial: losses, draws, wins
    print(LLR(0, 5, [112, 395, 651, 452, 152]))  # 5-nomial: pair scores 0, 0.5, 1, 1.5, 2
```
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: sprt tourney manager
Excellent! With all the stats in the output, I love it, and I saw no flaws. Setting H0 and H1 doesn't work for me, so I just use the defaults; maybe I am doing something wrong. The error message it gives is shown after the experiments.
Some experiments:
First, to get a feeling for the scaling:
N = number of games.
LLR ~ N
standard deviation ~ confidence interval ~ 1/sqrt(N)
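These two scaling rules can be checked numerically. The sketch below adapts the normal-approximation LLR from the Python snippet posted earlier in the thread; the names `expected_score` and `llr_and_stderr` are hypothetical.

It applies that LLR to the 2moves_v1 pair counts reported below, with every count multiplied by k. The mean score stays the same, the LLR grows roughly like k, and the standard error shrinks like 1/sqrt(k).

```python
import math

BB = math.log(10) / 400

def expected_score(elo):
    """Logistic expected score for an Elo difference."""
    return 1 / (1 + math.exp(-BB * elo))

def llr_and_stderr(counts, elo0=0, elo1=5):
    """Normal-approximation LLR and standard error of the mean score.

    counts: 3 (loss/draw/win) or 5 (pair score 0..2) category counts.
    """
    n = sum(counts)
    xs = [i / (len(counts) - 1) for i in range(len(counts))]
    score = sum(c * x for c, x in zip(counts, xs)) / n
    var = sum(c * (x - score) ** 2 for c, x in zip(counts, xs)) / (n - 1)
    s0, s1 = expected_score(elo0), expected_score(elo1)
    llr = (s1 - s0) * (2 * score - s0 - s1) / (var / n) / 2
    return llr, math.sqrt(var / n)

penta = [104, 336, 607, 407, 142]  # 2moves_v1 pair counts
for k in (1, 2, 4):
    llr, se = llr_and_stderr([k * c for c in penta])
    print(f"x{k}: LLR {llr:.2f}, stderr {se:.4f}")
```

Because LLR grows linearly with N, an LLR that is x% higher on the same games means the bound is reached with roughly x% fewer games.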
1/ Low draw rate (around 40%):
===============================================
2moves_v1 (balanced):
Stockfish 210117 64 BMI2 vs Stockfish 140916 64 BMI2
results: 3192 games
wdl: w: 1015, d: 1309, l: 868
pair: 0: 104, 0.5: 336, 1: 607, 1.5: 407, 2: 142
Using variance of the pentanomial distribution of game pairs:
Elo: 16.0 [12.2, 19.8]
LOS: 99.98 %
LLR: 3.325 [-2.944, 2.944]
test accepted
Using variance of the trinomial distribution of single games:
Elo: 16.0 [12.1, 20.0]
LOS: 99.97 %
LLR: 3.037 [-2.944, 2.944]
test accepted
The 5-nomial still helps a bit, giving an LLR about 10% higher, even with balanced openings.
Open_07_09 (imbalance 70-90cp)
Stockfish 210117 64 BMI2 vs Stockfish 140916 64 BMI2
results: 2730 games
wdl: w: 944, d: 957, l: 829
pair: 0: 70, 0.5: 266, 1: 608, 1.5: 321, 2: 100
Using variance of the pentanomial distribution of game pairs:
Elo: 14.6 [10.9, 18.4]
LOS: 99.94 %
LLR: 2.979 [-2.944, 2.944]
test accepted
Using variance of the trinomial distribution of single games:
Elo: 14.6 [9.3, 20.0]
LOS: 99.68 %
LLR: 2.020 [-2.944, 2.944]
Unbalanced openings help only moderately or not at all. The 5-nomial helps a lot in the unbalanced case.
===============================================
2/ High draw rate (around 90%)
===============================================
Endgame (balanced)
Stockfish 210117 64 BMI2 vs Stockfish 090416 64 BMI2
results: 2516 games
wdl: w: 126, d: 2300, l: 90
pair: 0: 0, 0.5: 66, 1: 1095, 1.5: 92, 2: 5
Using variance of the pentanomial distribution of game pairs:
Elo: 5.0 [3.4, 6.5]
LOS: 99.66 %
LLR: 3.662 [-2.944, 2.944]
test accepted
Using variance of the trinomial distribution of single games:
Elo: 5.0 [3.3, 6.7]
LOS: 99.29 %
LLR: 3.007 [-2.944, 2.944]
test accepted
The 5-nomial still helps, even for balanced positions, giving an LLR about 20% higher.
Endgame_10_14 (imbalance 100-140cp)
Stockfish 210117 64 BMI2 vs Stockfish 090416 64 BMI2
results: 770 games
wdl: w: 231, d: 352, l: 187
pair: 0: 1, 0.5: 46, 1: 249, 1.5: 86, 2: 3
Using variance of the pentanomial distribution of game pairs:
Elo: 19.9 [15.4, 24.4]
LOS: 99.99 %
LLR: 2.981 [-2.944, 2.944]
test accepted
Using variance of the trinomial distribution of single games:
Elo: 19.9 [10.7, 29.1]
LOS: 98.43 %
LLR: 1.103 [-2.944, 2.944]
Unbalanced openings & the 5-nomial help a lot, shortening the matches needed to reach an SPRT stop by a factor of 3-4, as predicted by the models.
===============================================
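The percentages above follow from LLR ~ N. In each experiment both LLRs are computed on the same games, so their ratio estimates how many more games the trinomial variance would need to reach the same bound.

A quick Python check with the reported LLR values gives roughly 1.09, 1.47, 1.22 and 2.70. The 3-4 factor quoted for Endgame_10_14 refers to unbalanced openings and the 5-nomial combined, so it is not directly comparable with the 2.70 here.

```python
# (pentanomial LLR, trinomial LLR) as reported above, computed on the same games
reported = {
    "2moves_v1": (3.325, 3.037),
    "Open_07_09": (2.979, 2.020),
    "Endgame": (3.662, 3.007),
    "Endgame_10_14": (2.981, 1.103),
}
for name, (penta, tri) in reported.items():
    # LLR grows linearly with the number of games, so this ratio approximates
    # how many more games the trinomial variance needs to reach the same bound
    print(f"{name:14s} {penta / tri:.2f}")
```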
The error I get when setting H0 and H1:
```
std.conv.ConvException@C:\ldc\bin\..\import\std\conv.d(1876): Can't parse string: bool should be case-insensitive 'true' or 'false'
```