Cutechess GUI SPRT doubts

arjunbasandrai · Post by **arjunbasandrai** » Mon Dec 11, 2023 2:26 am

Hi everyone,

I am developing a chess engine, Shuffle (https://github.com/ArjunBasandrai/shuffle-chess-engine/), and I am using the Cutechess GUI to test newer versions against previous versions.

In the results tab for a tournament there is a metric called "SPRT". I looked it up on CPW but could not fully understand what this metric represents. Can anyone explain it to me?

Also, why is it always zero (SPRT: llr 0 (0.0%), lbound -inf, ubound inf)?

KhepriChess · Post by **KhepriChess** » Mon Dec 11, 2023 3:12 am

SPRT is a type of test that can be used to determine whether one engine is stronger than another, by a desired value and confidence level. As games between two engines are played, the test compares the results of the games and through a bunch of math calculates the likelihood that one engine is (or is not) stronger than the other by the desired value and confidence.

It's always zero in the results tab because the GUI doesn't support SPRT testing, but it still shows that result line. If you want to use SPRT, you need to use the CLI.
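
To make the "bunch of math" a little more concrete, here is a minimal Python sketch of one common approximation of the log-likelihood ratio (LLR) that an SPRT tracks. It assumes the logistic Elo model and a normal approximation of the per-game score, and it is not necessarily the exact formula cutechess uses internally. The function names `elo_to_score` and `approx_llr` are made up for illustration.

```python
def elo_to_score(elo: float) -> float:
    """Expected score (between 0 and 1) for an Elo advantage, logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def approx_llr(wins: int, draws: int, losses: int,
               elo0: float, elo1: float) -> float:
    """Approximate LLR of H1 (gain = elo1) versus H0 (gain = elo0).

    Uses the observed mean and variance of the per-game score
    (win = 1, draw = 0.5, loss = 0). This is an illustrative
    approximation, not cutechess's own code.
    """
    n = wins + draws + losses
    if n == 0:
        return 0.0  # no games played yet, no evidence either way

    mean = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    if var == 0.0:
        return 0.0  # all results identical; the approximation breaks down

    s0 = elo_to_score(elo0)
    s1 = elo_to_score(elo1)
    return n * (s1 - s0) * (2.0 * mean - s0 - s1) / (2.0 * var)
```

A positive value means the results so far favour the "stronger by elo1" hypothesis, a negative value favours "stronger by only elo0". The test keeps playing games until the LLR crosses one of two thresholds.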

Ciekce · Post by **Ciekce** » Mon Dec 11, 2023 4:50 am

Note that if you are not SPRTing, you are not testing your engine properly. There are several people on this forum (e.g. amchess) who do not believe in proper testing, but they are objectively wrong.

This involves, if you're using cutechess, using the CLI and passing for example "-sprt elo0=0 elo1=5 alpha=0.05 beta=0.05".

That test tells you, to 95% confidence, whether it is more likely that your change gained 5 elo or 0. You can adjust elo0 and elo1 depending on the hypotheses you wish to test.
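
As a rough illustration of how `alpha` and `beta` turn into stopping thresholds, the sketch below uses the standard Wald bounds for a sequential test. The helper names `sprt_bounds` and `sprt_decision` are hypothetical and are not part of cutechess.

```python
import math


def sprt_bounds(alpha: float, beta: float) -> tuple:
    """Wald's stopping bounds for the log-likelihood ratio (LLR).

    alpha: accepted probability of wrongly accepting H1 (false positive).
    beta:  accepted probability of wrongly accepting H0 (false negative).
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper


def sprt_decision(llr: float, alpha: float, beta: float) -> str:
    """Return the test state for the current LLR value."""
    lower, upper = sprt_bounds(alpha, beta)
    if llr >= upper:
        return "accept H1"  # evidence favours the elo1 hypothesis
    if llr <= lower:
        return "accept H0"  # evidence favours the elo0 hypothesis
    return "continue"       # not enough evidence yet, keep playing
```

With alpha = beta = 0.05 the bounds come out at about -2.94 and +2.94, so the test stops as soon as the LLR leaves that interval.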

arjunbasandrai · Post by **arjunbasandrai** » Mon Dec 11, 2023 10:02 am

Thank you so much for clarifying!

I have a couple more doubts regarding cutechess-cli.

In the GUI there is an option to load openings from a Polyglot book up to a certain depth. How can I achieve this using the CLI?

I ran a test using this command:

cutechess-cli -engine conf="Shuffle 4.2.0" -engine conf="Shuffle 4.1.0" -each proto=uci tc=0/20+0.2s book="C:\programming\Machine Learning\Chess\shuffle-chess-engine\src\polyglot\polyglot_opening_books\shuffle.bin" bookdepth=12 -concurrency 20 -rounds 10000 -pgnout 420.pgn -bookmode disk -resultformat wide

Is the book= option doing what I mentioned above?
Also, I will add the -sprt option and run the test again.

One other question I had: how do you determine conclusively whether one version is better than another?
Which time controls should I test at, for how many rounds, and what metrics should I use to determine this?

Witek · Post by **Witek** » Thu Dec 14, 2023 2:47 am

arjunbasandrai wrote: ↑Mon Dec 11, 2023 10:02 am In the GUI there is an option to load openings from a Polyglot book up to a certain depth. How can I achieve this using the CLI?

I would suggest using one of the books from here: https://github.com/official-stockfish/books
8moves_v3 is a good "balanced" book, and UHO_4060 is a good "unbalanced" book.

arjunbasandrai wrote: ↑Mon Dec 11, 2023 10:02 am One other question I had: how do you determine conclusively whether one version is better than another?
Which time controls should I test at, for how many rounds, and what metrics should I use to determine this?

If the SPRT test passes, then you know with >95% certainty that the new version is better. Most people test at STC (short time control), for example 8+0.08 or 10+0.1. If you have the resources, you should also test at longer time controls.
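
Putting the suggestions in this thread together, a cutechess-cli run could be assembled roughly as in the Python sketch below. Treat it as an assumption-laden example rather than a recommended configuration: the option names follow cutechess-cli's documented options as I understand them, so check them against the help output of your own build. The helper name `build_sprt_command`, the openings path, the PGN output name, and the concurrency value are all placeholders.

```python
def build_sprt_command(new_conf, old_conf, openings_file,
                       tc="8+0.08", concurrency=4):
    """Build an argument list for an SPRT match between two engine configs.

    new_conf / old_conf: engine names as configured for cutechess (conf=...).
    openings_file: path to a PGN opening file (placeholder, e.g. an
                   unpacked 8moves_v3 book saved locally).
    """
    return [
        "cutechess-cli",
        "-engine", f"conf={new_conf}",
        "-engine", f"conf={old_conf}",
        # Options shared by both engines: protocol and time control.
        "-each", "proto=uci", f"tc={tc}",
        # Openings come from a PGN file instead of a Polyglot book.
        "-openings", f"file={openings_file}", "format=pgn", "order=random",
        # Play each opening twice with colours swapped.
        "-games", "2", "-repeat",
        "-rounds", "10000",
        # Stop early once the SPRT reaches a decision.
        "-sprt", "elo0=0", "elo1=5", "alpha=0.05", "beta=0.05",
        "-concurrency", str(concurrency),
        "-pgnout", "sprt.pgn",
    ]
```

The list can be passed directly to `subprocess.run(...)`, which avoids shell quoting problems with engine names that contain spaces, such as "Shuffle 4.2.0".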

arjunbasandrai · Post by **arjunbasandrai** » Mon Dec 25, 2023 9:26 am

Witek wrote: ↑Thu Dec 14, 2023 2:47 am
arjunbasandrai wrote: ↑Mon Dec 11, 2023 10:02 am In the GUI there is an option to load openings from a Polyglot book up to a certain depth. How can I achieve this using the CLI?
I would suggest using one of the books from here: https://github.com/official-stockfish/books
8moves_v3 is a good "balanced" book, and UHO_4060 is a good "unbalanced" book.

arjunbasandrai wrote: ↑Mon Dec 11, 2023 10:02 am One other question I had: how do you determine conclusively whether one version is better than another?
Which time controls should I test at, for how many rounds, and what metrics should I use to determine this?
If the SPRT test passes, then you know with >95% certainty that the new version is better. Most people test at STC (short time control), for example 8+0.08 or 10+0.1. If you have the resources, you should also test at longer time controls.

Thank you so much for the information!
