Hi everyone,
I am developing a chess engine, Shuffle (https://github.com/ArjunBasandrai/shuffle-chess-engine/),
and I am using the Cutechess GUI to test newer versions against previous versions.
In the Results tab of a tournament there is a metric called "SPRT". I looked it up on CPW (the Chess Programming Wiki) but could not fully understand what it represents. Can anyone explain it to me?
Also, why is it always zero (SPRT: llr 0 (0.0%), lbound -inf, ubound inf)?
Cutechess GUI SPRT doubts
Moderator: Ras
-
- Posts: 10
- Joined: Thu Oct 26, 2023 8:41 pm
- Full name: Arjun Basandrai
Shuffle Chess Engine (made in C) - https://github.com/ArjunBasandrai/shuffle-chess-engine/
-
- Posts: 93
- Joined: Sun Aug 08, 2021 9:14 pm
- Full name: Kurt Peters
Re: Cutechess GUI SPRT doubts
SPRT (sequential probability ratio test) is a type of test that can be used to determine whether one engine is stronger than another by a desired margin, at a desired confidence level. As games between the two engines are played, the test updates a log-likelihood ratio (LLR) from the results and stops as soon as the LLR crosses an upper or a lower bound, i.e. once it is confident enough that one engine is (or is not) stronger than the other by the desired margin.
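A minimal Python sketch of an SPRT on win/draw/loss counts, to make the calculation more concrete. This is a simplification under stated assumptions, not cutechess's own code: it converts Elo hypotheses to expected scores with the logistic Elo model, approximates the LLR with a normal model of the per-game score, and stops at Wald's classic bounds ln(beta/(1-alpha)) and ln((1-beta)/alpha). The parameter names elo0, elo1, alpha and beta mirror the ones passed to cutechess-cli's -sprt option; the function names are made up for illustration.

```python
from __future__ import annotations

import math


def expected_score(elo: float) -> float:
    """Expected score for a player `elo` points stronger (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_bounds(alpha: float, beta: float) -> tuple[float, float]:
    """Wald's lower and upper stopping bounds for the LLR."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


def sprt_llr(wins: int, draws: int, losses: int, elo0: float, elo1: float) -> float:
    """Normal-approximation LLR of H1 (elo1) versus H0 (elo0) from W/D/L counts."""
    n = wins + draws + losses
    if n == 0:
        return 0.0
    mean = (wins + 0.5 * draws) / n  # average score per game
    var = (wins * (1.0 - mean) ** 2 + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / n  # per-game score variance
    if var == 0.0:
        return 0.0  # every result identical: no usable spread yet
    s0, s1 = expected_score(elo0), expected_score(elo1)
    # n / (2 var) * ((mean - s0)^2 - (mean - s1)^2), factored
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)


def sprt_decision(wins: int, draws: int, losses: int, elo0: float = 0.0,
                  elo1: float = 5.0, alpha: float = 0.05, beta: float = 0.05) -> str:
    llr = sprt_llr(wins, draws, losses, elo0, elo1)
    lower, upper = sprt_bounds(alpha, beta)
    if llr >= upper:
        return "H1"  # accept: the change is worth about elo1
    if llr <= lower:
        return "H0"  # accept: the change is worth about elo0
    return "continue"  # keep playing games


if __name__ == "__main__":
    print(sprt_decision(400, 200, 300))   # continue (LLR about 1.76)
    print(sprt_decision(1200, 600, 900))  # H1 (LLR about 5.27)
```

With alpha = beta = 0.05 both bounds are about ±2.94. The lbound/ubound values shown in the GUI's SPRT line play the same role as these bounds; when no SPRT is configured they are simply infinite, so the test never stops.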
It's always zero in the Results tab because the GUI doesn't support SPRT tests, yet it still shows that result line. If you want to use SPRT, you need to use the CLI (cutechess-cli).
-
- Posts: 192
- Joined: Sun Oct 30, 2022 5:26 pm
- Full name: Conor Anstey
Re: Cutechess GUI SPRT doubts
Note that if you are not SPRTing, you are not testing your engine properly. There are several people on this forum (e.g. amchess) who do not believe in proper testing, but they are objectively wrong.
With cutechess, this means using the CLI and passing, for example, "-sprt elo0=0 elo1=5 alpha=0.05 beta=0.05".
That test decides whether it is more likely that your change gained 5 Elo (H1) or 0 Elo (H0), with a 5% error rate in each direction: alpha is the chance of accepting H1 when H0 is true, and beta is the chance of accepting H0 when H1 is true. You can adjust elo0 and elo1 depending on the hypotheses you wish to test.
-
- Posts: 10
- Joined: Thu Oct 26, 2023 8:41 pm
- Full name: Arjun Basandrai
Re: Cutechess GUI SPRT doubts
Thank you so much for clarifying!
I have a couple more questions regarding cutechess-cli.
In the GUI there is an option to load openings from a Polyglot book up to a certain depth. How can I achieve this using the CLI?
I ran a test using this command:
cutechess-cli -engine conf="Shuffle 4.2.0" -engine conf="Shuffle 4.1.0" -each proto=uci tc=0/20+0.2s book="C:\programming\Machine Learning\Chess\shuffle-chess-engine\src\polyglot\polyglot_opening_books\shuffle.bin" bookdepth=12 -concurrency 20 -rounds 10000 -pgnout 420.pgn -bookmode disk -resultformat wide
Is the book= option doing what I described above?
Also, I will add the -sprt option and run the test again.
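A minimal sketch of the same test with the -sprt option added, written as a Python helper that only assembles the cutechess-cli argument list. Every option and value is copied from the command above and from the -sprt example suggested earlier in the thread; the function name build_cutechess_args is hypothetical, and actually launching the process is left to the commented subprocess call.

```python
from __future__ import annotations

import shlex

# Book path copied verbatim from the command above.
BOOK = r"C:\programming\Machine Learning\Chess\shuffle-chess-engine\src\polyglot\polyglot_opening_books\shuffle.bin"


def build_cutechess_args(new_engine: str, old_engine: str, book: str,
                         elo0: float = 0, elo1: float = 5,
                         alpha: float = 0.05, beta: float = 0.05) -> list[str]:
    """Hypothetical helper: the original test command plus an SPRT stop rule."""
    return [
        "cutechess-cli",
        "-engine", f"conf={new_engine}",
        "-engine", f"conf={old_engine}",
        # Options applied to both engines, as in the original command.
        "-each", "proto=uci", "tc=0/20+0.2s", f"book={book}", "bookdepth=12",
        "-concurrency", "20",
        "-rounds", "10000",
        "-pgnout", "420.pgn",
        "-bookmode", "disk",
        "-resultformat", "wide",
        # SPRT: stop as soon as H0 (elo0) or H1 (elo1) is accepted.
        "-sprt", f"elo0={elo0}", f"elo1={elo1}", f"alpha={alpha}", f"beta={beta}",
    ]


if __name__ == "__main__":
    args = build_cutechess_args("Shuffle 4.2.0", "Shuffle 4.1.0", BOOK)
    print(shlex.join(args))  # POSIX-style quoting, for inspection only
    # To run it: import subprocess; subprocess.run(args, check=True)
```

Passing a list to subprocess avoids hand-quoting the engine names and the book path that contain spaces; on Windows, subprocess quotes each list element itself.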
One other question: how do you ultimately determine whether one version is better than another?
Which time controls should I test at, for how many rounds, and what metrics should I use to determine this?
-
- Posts: 87
- Joined: Thu Oct 07, 2021 12:48 am
- Location: Warsaw, Poland
- Full name: Michal Witanowski
Re: Cutechess GUI SPRT doubts
arjunbasandrai wrote: ↑Mon Dec 11, 2023 10:02 am
Like in the gui there is an option to load openings from a polyglot book upto a certain depth. How can I achieve this using the cli?
I would suggest using one of the books from here: https://github.com/official-stockfish/books
8moves_v3 is a good "balanced" book, and UHO_4060 is a good "unbalanced" book.
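As far as I know, the books in that repository are EPD or PGN files (some of them zipped), not Polyglot .bin files, so cutechess-cli would read them through its -openings option rather than the per-engine book= option. A minimal Python sketch that builds those arguments, assuming the file has already been unzipped; the helper name openings_args and the file name UHO_4060.epd are hypothetical, so check the repository for the real file names.

```python
from __future__ import annotations


def openings_args(path: str, fmt: str, order: str = "random") -> list[str]:
    """Hypothetical helper: cutechess-cli arguments for an EPD/PGN opening file."""
    if fmt not in ("epd", "pgn"):
        raise ValueError("expected an 'epd' or 'pgn' opening file")
    return [
        "-openings", f"file={path}", f"format={fmt}", f"order={order}",
        # Reuse each opening so both engines play it with each colour; this
        # cancels most of the bias an unbalanced (e.g. UHO) position gives.
        "-repeat",
    ]


if __name__ == "__main__":
    # Hypothetical file name: use whatever the unzipped book is actually called.
    print(openings_args("UHO_4060.epd", "epd"))
```

These arguments would replace the book= and bookdepth= options in the -each part of a cutechess-cli command.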
arjunbasandrai wrote: ↑Mon Dec 11, 2023 10:02 am
One other question I had is that, how do you determine as an ultimatum that whether a version is better than others or not.
Which all time controls should I test on and for how many rounds and what metrics should I use to determine this?
If an SPRT test passes, you can be roughly 95% confident that the new version is better (with alpha=0.05, a change that is not actually an improvement passes at most about 5% of the time). Most people test at STC (short time control), for example 8+0.08 or 10+0.1. If you have the resources, you should also test at longer time controls.
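To get a feel for what these time controls cost, here is a rough back-of-the-envelope estimate in Python. It assumes (hypothetically) that a game lasts about 60 moves per side and that each engine uses its whole base time plus every increment, so it gives an upper-bound style estimate, not a measurement; the function names are made up for illustration.

```python
def tc_seconds_per_game(tc: str, moves_per_side: int) -> float:
    """Upper-bound clock time of one game under a 'base+increment' TC like '8+0.08'."""
    base, inc = (float(x) for x in tc.split("+"))
    # Both sides can spend their base time plus one increment per move.
    return 2 * (base + moves_per_side * inc)


def estimate_hours(games: int, tc: str, moves_per_side: int = 60,
                   concurrency: int = 1) -> float:
    """Wall-clock hours for `games` games played `concurrency` at a time."""
    return games * tc_seconds_per_game(tc, moves_per_side) / concurrency / 3600


if __name__ == "__main__":
    for tc in ("8+0.08", "10+0.1"):
        print(tc, round(estimate_hours(10000, tc, concurrency=20), 2), "hours")
```

Because an SPRT run stops as soon as one of its bounds is crossed, the game count passed to -rounds acts more like a cap than a target; weak or clearly bad changes often resolve much earlier.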
Author of Caissa Chess Engine: https://github.com/Witek902/Caissa
-
- Posts: 10
- Joined: Thu Oct 26, 2023 8:41 pm
- Full name: Arjun Basandrai
Re: Cutechess GUI SPRT doubts
Witek wrote: ↑Thu Dec 14, 2023 2:47 am
I would suggest using one of the books from here: https://github.com/official-stockfish/books
8moves_v3 is a good "balanced" book, and UHO_4060 is good "unbalanced" book.
If SPRT test passes then you know with >95% certainty that a new version is better. Most people test at STC (short time control), for example 8+0.08 or 10+0.1. If you have resources you should test at longer time controls.
Thank you so much for the information!