Can someone explain what all the values below mean (the values in parentheses and brackets)?
Do I understand this correctly?
How can this test run show a positive improvement of 1.36 Elo even
though there are more losses (11896) than wins (11850)?
LLR: 1.36 (-2.94,2.94) [-3.00,1.00]
Total: 77733 W: 11850 L: 11896 D: 53987
Regards
Jarkko
SF testing framework
- Posts: 198
- Joined: Thu Mar 09, 2006 2:44 am
- Location: Helsinki, Finland
- Posts: 5228
- Joined: Thu Mar 09, 2006 9:40 am
- Full name: Vincent Lejeune
Re: SF testing framework
The range [-3.00,1.00] starts at -3, which means the test accepts slightly negative values. If you count, a deficit of 46 games in 77733 games is next to nothing (a score of roughly 49.97% against 50.03%).
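The arithmetic behind "next to nothing" can be checked directly from the W/L/D line quoted in this thread. The following is a minimal sketch, assuming the usual logistic Elo formula; the function name `score_and_elo` is made up for illustration and is not part of the testing framework.

```python
import math

def score_and_elo(wins, losses, draws):
    """Return the score fraction and the logistic Elo difference it implies."""
    games = wins + losses + draws
    # A draw counts as half a point.
    score = (wins + 0.5 * draws) / games
    # Logistic Elo model: score = 1 / (1 + 10 ** (-elo / 400)).
    elo = 400.0 * math.log10(score / (1.0 - score))
    return score, elo

# Numbers taken from the "Total:" line in the question.
score, elo = score_and_elo(11850, 11896, 53987)
print(f"score = {score:.2%}, Elo difference = {elo:+.2f}")
```

With these numbers the score comes out just under 50% and the implied difference is a small fraction of one Elo point, which is why the 46-game deficit hardly matters.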
- Posts: 10309
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: SF testing framework
A positive LLR does not mean an improvement in strength.
Basically, the test is testing H0 against H1, where
H0 is a loss of 3/1.6 Elo and
H1 is a gain of 1/1.6 Elo.
A 3/1.6 Elo regression has a 5% probability of passing.
A 1/1.6 Elo improvement has a 5% probability of failing.
The test is run at both short and long time controls, so the probability of a 1 Elo regression passing twice is very small (less than 5%, if I remember correctly). The idea is that you are ready to accept a small regression to make the code simpler.
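To make the H0/H1 description concrete, here is a rough sketch of how a sequential probability ratio test could produce the numbers in the question. It rests on assumptions the thread does not spell out: that the bracketed bounds [-3.00,1.00] are in BayesElo units, that wins, draws and losses follow the BayesElo model with a draw parameter estimated from the results, and that both error rates are 5%. All function names are illustrative, not the framework's own API.

```python
import math

def bayeselo_to_probs(elo, drawelo):
    """Win/loss/draw probabilities under the BayesElo model (assumed here)."""
    win = 1.0 / (1.0 + 10.0 ** ((-elo + drawelo) / 400.0))
    loss = 1.0 / (1.0 + 10.0 ** ((elo + drawelo) / 400.0))
    return win, loss, 1.0 - win - loss

def estimate_drawelo(wins, losses, draws):
    """Estimate the draw parameter from the observed result frequencies."""
    games = wins + losses + draws
    w, l = wins / games, losses / games
    return 200.0 * math.log10((1.0 - l) / l * (1.0 - w) / w)

def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald's stopping bounds for the log-likelihood ratio."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)

def llr(wins, losses, draws, elo0, elo1):
    """Log-likelihood ratio of H1 (elo1) against H0 (elo0)."""
    drawelo = estimate_drawelo(wins, losses, draws)
    w0, l0, d0 = bayeselo_to_probs(elo0, drawelo)
    w1, l1, d1 = bayeselo_to_probs(elo1, drawelo)
    return (wins * math.log(w1 / w0)
            + losses * math.log(l1 / l0)
            + draws * math.log(d1 / d0))

lower, upper = sprt_bounds()
value = llr(11850, 11896, 53987, elo0=-3.0, elo1=1.0)
print(f"LLR: {value:.2f} ({lower:.2f},{upper:.2f})")
```

Under these assumptions the bounds come out at about -2.94 and 2.94, matching the values in parentheses, and the LLR lands close to the reported 1.36. The LLR is positive because the results fit H1 better than H0, not because the patch scored above 50%; the test stops only when the LLR crosses one of the two bounds.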
- Posts: 2272
- Joined: Mon Sep 29, 2008 1:50 am
Re: SF testing framework
The point is that simplifications may also _gain_ a bit of Elo, and the gamble is that those gains outweigh the occasional very small regression.
Which kind of no-regression mode you choose basically depends on how pessimistic you are about the distribution of positive versus negative simplifications.
On this particular point Marco is extremely pessimistic and that's why we have the SPRT(-3,1). This test has the sad disadvantage that you cannot draw any conclusions from it if it fails: it has a 30% probability of giving the wrong answer on a neutral simplification. So it is not suitable if you really want a yes/no answer.
This being said, I must confess that the SPRT(-3,1) test has worked rather well so far and has allowed a number of non-trivial simplifications to pass.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
- Posts: 1971
- Joined: Wed Jul 13, 2011 9:04 pm
- Location: Madrid, Spain.
Re: SF testing framework.
Hello Jarkko:
There was a very similar thread in September of 2013:
Stats and bench on Stockfish development site
I expect you will find the explanation I gave there useful:
Re: Stats and bench on Stockfish development site.
Please feel free to ask if you have any further questions.
Regards from Spain.
Ajedrecista.