Cutechess GUI SPRT doubts

arjunbasandrai
Posts: 10
Joined: Thu Oct 26, 2023 8:41 pm
Full name: Arjun Basandrai

Post by arjunbasandrai »

Hi everyone,

I am developing a chess engine, Shuffle (https://github.com/ArjunBasandrai/shuffle-chess-engine/), and I am using the Cutechess GUI to test newer versions against previous versions.

In the results tab for a tournament there is a metric called "SPRT". I looked it up on the Chess Programming Wiki (CPW) but could not fully understand what this metric represents. Can anyone explain it to me?

Also, why is it always zero (SPRT: llr 0 (0.0%), lbound -inf, ubound inf)?
KhepriChess
Posts: 93
Joined: Sun Aug 08, 2021 9:14 pm
Full name: Kurt Peters

Post by KhepriChess »

SPRT (sequential probability ratio test) is a statistical test that can be used to determine whether one engine is stronger than another by a desired Elo margin and at a desired confidence level. As games between the two engines are played, the test takes the results so far and, through a bunch of math, calculates how likely it is that one engine is (or is not) stronger than the other by that margin.

It's always zero in the results tab because the GUI doesn't support SPRT testing but still shows that line. If you want to use SPRT, you need to use the CLI.
Ciekce
Posts: 192
Joined: Sun Oct 30, 2022 5:26 pm
Full name: Conor Anstey

Post by Ciekce »

Note that if you are not SPRTing, you are not testing your engine properly. There are several people on this forum (e.g. amchess) who do not believe in proper testing, but they are objectively wrong.

If you're using cutechess, this means using the CLI and passing, for example, "-sprt elo0=0 elo1=5 alpha=0.05 beta=0.05".

With alpha and beta both set to 0.05, that test tells you, to 95% confidence, whether it is more likely that your change gained 5 Elo or 0. You can adjust elo0 and elo1 depending on the hypotheses you wish to test.
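
For illustration, here is a rough Python sketch of what a test with those four parameters computes. It is not cutechess's own code: it assumes the logistic Elo model and a common normal approximation to the log-likelihood ratio (LLR), and the function names are made up for this example. The "llr", "lbound" and "ubound" values in the output line quoted in the first post correspond to the LLR and the two stopping bounds below.

```python
import math


def expected_score(elo):
    """Expected score for a given Elo advantage (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_bounds(alpha, beta):
    """Wald's stopping bounds for the LLR, from the two error rates."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper


def llr(wins, draws, losses, elo0, elo1):
    """Approximate LLR of H1 (gain is elo1) versus H0 (gain is elo0).

    Assumption: normal approximation based on the observed mean score
    and per-game variance; other formulations exist.
    """
    n = wins + draws + losses
    if n == 0:
        return 0.0
    score = (wins + 0.5 * draws) / n
    variance = (wins + 0.25 * draws) / n - score ** 2
    if variance <= 0.0:
        # All results identical so far: no usable variance estimate yet.
        return 0.0
    s0 = expected_score(elo0)
    s1 = expected_score(elo1)
    return n * (s1 - s0) * (2.0 * score - s0 - s1) / (2.0 * variance)


def sprt_status(wins, draws, losses, elo0=0.0, elo1=5.0, alpha=0.05, beta=0.05):
    """Return 'accept H1', 'accept H0', or 'continue' for the results so far."""
    lower, upper = sprt_bounds(alpha, beta)
    value = llr(wins, draws, losses, elo0, elo1)
    if value >= upper:
        return "accept H1"
    if value <= lower:
        return "accept H0"
    return "continue"


if __name__ == "__main__":
    print(sprt_bounds(0.05, 0.05))
    print(sprt_status(wins=10, draws=10, losses=10))
```

With alpha=beta=0.05 the bounds come out at roughly -2.94 and +2.94. The test keeps running while the LLR stays between them, and stops as soon as it crosses either one.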
arjunbasandrai
Posts: 10
Joined: Thu Oct 26, 2023 8:41 pm
Full name: Arjun Basandrai

Post by arjunbasandrai »

Thank you so much for clarifying!

I have a couple more doubts regarding cutechess-cli.

In the GUI there is an option to load openings from a polyglot book up to a certain depth. How can I achieve this using the CLI?

I ran a test using this command:

cutechess-cli -engine conf="Shuffle 4.2.0" -engine conf="Shuffle 4.1.0" -each proto=uci tc=0/20+0.2s book="C:\programming\Machine Learning\Chess\shuffle-chess-engine\src\polyglot\polyglot_opening_books\shuffle.bin" bookdepth=12 -concurrency 20 -rounds 10000 -pgnout 420.pgn -bookmode disk -resultformat wide

Is the book= option doing what I mentioned above?
Also, I will add the -sprt option and run the test again.

One other question I had: how do you determine conclusively whether one version is better than another?
Which time controls should I test at, for how many rounds, and what metrics should I use to determine this?
Witek
Posts: 87
Joined: Thu Oct 07, 2021 12:48 am
Location: Warsaw, Poland
Full name: Michal Witanowski

Post by Witek »

arjunbasandrai wrote: Mon Dec 11, 2023 10:02 am Like in the gui there is an option to load openings from a polyglot book upto a certain depth. How can I achieve this using the cli?
I would suggest using one of the books from here: https://github.com/official-stockfish/books
8moves_v3 is a good "balanced" book, and UHO_4060 is a good "unbalanced" book.
arjunbasandrai wrote: Mon Dec 11, 2023 10:02 am One other question I had is that, how do you determine as an ultimatum that whether a version is better than others or not.
Which all time controls should I test on and for how many rounds and what metrics should I use to determine this?
If the SPRT passes, then you know with >95% confidence that the new version is better. Most people test at STC (short time control), for example 8+0.08 or 10+0.1. If you have the resources, you should also test at longer time controls.
Author of Caissa Chess Engine: https://github.com/Witek902/Caissa
arjunbasandrai
Posts: 10
Joined: Thu Oct 26, 2023 8:41 pm
Full name: Arjun Basandrai

Post by arjunbasandrai »

Witek wrote: Thu Dec 14, 2023 2:47 am
arjunbasandrai wrote: Mon Dec 11, 2023 10:02 am Like in the gui there is an option to load openings from a polyglot book upto a certain depth. How can I achieve this using the cli?
I would suggest using one of the books from here: https://github.com/official-stockfish/books
8moves_v3 is a good "balanced" book, and UHO_4060 is good "unbalanced" book.
arjunbasandrai wrote: Mon Dec 11, 2023 10:02 am One other question I had is that, how do you determine as an ultimatum that whether a version is better than others or not.
Which all time controls should I test on and for how many rounds and what metrics should I use to determine this?
If SPRT test passes then you know with >95% certainty that a new version is better. Most people test at STC (short time control) - for example 8+0.08 or 10+0.1. If you have resources you should test at longer time controls.
Thank you so much for the information!