SF testing framework

Post by **jarkkop** » Fri Apr 25, 2014 12:01 pm

Can someone explain what all the values below mean (the ones in parentheses and brackets)?
Do I understand this correctly?
How can this test run show a positive improvement of 1.36 Elo even
though there are more losses (11896) than wins (11850)?

LLR: 1.36 (-2.94,2.94) [-3.00,1.00]
Total: 77733 W: 11850 L: 11896 D: 53987

Regards

Jarkko

Post by **Vinvin** » Fri Apr 25, 2014 12:37 pm

jarkkop wrote: Can someone explain what all the values below mean (the ones in parentheses and brackets)?
Do I understand this correctly?
How can this test run show a positive improvement of 1.36 Elo even
though there are more losses (11896) than wins (11850)?

LLR: 1.36 (-2.94,2.94) [-3.00,1.00]
Total: 77733 W: 11850 L: 11896 D: 53987

Regards

Jarkko

The range [-3.00,1.00] starts at -3, which means the test accepts slightly negative values. If you count, 46 points in 77733 games is next to nothing (50.03% vs 49.97%).
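
The arithmetic is easy to check with a few lines of Python. This is only a sketch: the variable names are mine, and the Elo figure uses the ordinary logistic conversion from score to Elo, which is my assumption and not something the framework prints.

```python
import math

wins, losses, draws = 11850, 11896, 53987
games = wins + losses + draws  # 77733

# Points scored by each side: a win counts 1, a draw counts 1/2.
new_points = wins + draws / 2
old_points = losses + draws / 2

new_score = new_points / games
old_score = old_points / games

# Logistic Elo difference implied by the score (assumed conversion).
elo_diff = 400 * math.log10(new_score / (1 - new_score))

print(f"point gap: {old_points - new_points:.0f}")
print(f"scores: {new_score:.2%} vs {old_score:.2%}")
print(f"implied Elo difference: {elo_diff:+.2f}")
```

The gap is 46 points, the scores are 49.97% and 50.03%, and the implied difference is roughly -0.2 Elo, which is well inside the noise for this many games.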

Post by **Uri Blass** » Fri Apr 25, 2014 1:57 pm

jarkkop wrote: Can someone explain what all the values below mean (the ones in parentheses and brackets)?
Do I understand this correctly?
How can this test run show a positive improvement of 1.36 Elo even
though there are more losses (11896) than wins (11850)?

LLR: 1.36 (-2.94,2.94) [-3.00,1.00]
Total: 77733 W: 11850 L: 11896 D: 53987

Regards

Jarkko

A positive LLR does not mean an improvement in strength. The 1.36 is a log-likelihood ratio, not an Elo figure.

Basically the test is testing H0 against H1, where
H0 is losing 3/1.6 Elo and
H1 is winning 1/1.6 Elo.
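
To make this concrete, here is a rough sketch of how such an LLR could be computed from the W/L/D counts. I am assuming the BayesElo draw model, with the bounds -3 and +1 expressed in BayesElo units (which is what the division by 1.6 suggests), and I am estimating the draw parameter from the sample itself. The function names are mine, and the real framework code may differ in detail.

```python
import math

def bayeselo_to_probs(elo, drawelo):
    """Return (win, draw, loss) probabilities under the BayesElo model."""
    p_win = 1 / (1 + 10 ** ((-elo + drawelo) / 400))
    p_loss = 1 / (1 + 10 ** ((elo + drawelo) / 400))
    return p_win, 1 - p_win - p_loss, p_loss

def estimate_drawelo(wins, losses, draws):
    """Estimate the draw parameter from the observed frequencies."""
    n = wins + losses + draws
    w, l = wins / n, losses / n
    return 200 * math.log10((1 - l) / l * (1 - w) / w)

def llr(wins, losses, draws, elo0, elo1):
    """Log-likelihood ratio of H1 (elo1) against H0 (elo0)."""
    drawelo = estimate_drawelo(wins, losses, draws)
    w0, d0, l0 = bayeselo_to_probs(elo0, drawelo)
    w1, d1, l1 = bayeselo_to_probs(elo1, drawelo)
    return (wins * math.log(w1 / w0)
            + draws * math.log(d1 / d0)
            + losses * math.log(l1 / l0))

value = llr(11850, 11896, 53987, elo0=-3.0, elo1=1.0)
print(f"LLR = {value:.2f}")
```

With the numbers from the question this should come out close to the reported 1.36. It is positive because the data fit H1 (+1) a little better than H0 (-3), even though the measured result itself is a hair below zero.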

A 3/1.6 Elo reduction has a 5% probability of passing.
A 1/1.6 Elo improvement has a 5% probability of failing.
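
Those two 5% error rates are also where the (-2.94,2.94) in the output comes from, assuming the standard Wald stopping bounds for an SPRT. A small sketch (alpha and beta are my names for the two error rates):

```python
import math

alpha = 0.05  # chance that a patch sitting exactly at H0 passes
beta = 0.05   # chance that a patch sitting exactly at H1 fails

# Wald's approximate stopping bounds for the log-likelihood ratio.
lower = math.log(beta / (1 - alpha))
upper = math.log((1 - beta) / alpha)

print(f"({lower:.2f}, {upper:.2f})")  # (-2.94, 2.94)
```

The test passes once the LLR reaches the upper bound and fails once it reaches the lower one. An LLR of 1.36 lies between the two, so neither hypothesis has been accepted yet.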

The test is done at both short and long time control, so the probability of a 1 Elo regression passing twice is practically very small (less than 5% if I remember correctly), and the idea is that you are ready to accept a small regression to make the code simpler.

Post by **Michel** » Fri Apr 25, 2014 2:32 pm

Uri Blass wrote: The test is done at both short and long time control, so the probability of a 1 Elo regression passing twice is practically very small (less than 5% if I remember correctly), and the idea is that you are ready to accept a small regression to make the code simpler.

The point is that simplifications may also _gain_ a bit of elo and the gamble is that those gains outweigh the occasional very small regression.

What kind of no regression mode you choose basically depends on how pessimistic you are about the distribution of positive versus negative simplifications.

On this particular point Marco is extremely pessimistic and that's why we have the SPRT(-3,1). This test has the sad disadvantage that you cannot draw any conclusions from it if it fails: it has a 30% probability of giving the wrong answer on a neutral simplification. So it is not suitable if you really want a yes/no answer.

This being said, I must confess that the SPRT(-3,1) test has worked rather well so far and has allowed a number of non-trivial simplifications to pass.

Post by **Ajedrecista** » Sat Apr 26, 2014 11:54 am

Hello Jarkko:

jarkkop wrote: Can someone explain what all the values below mean (the ones in parentheses and brackets)?
Do I understand this correctly?
How can this test run show a positive improvement of 1.36 Elo even
though there are more losses (11896) than wins (11850)?

LLR: 1.36 (-2.94,2.94) [-3.00,1.00]
Total: 77733 W: 11850 L: 11896 D: 53987

Regards

Jarkko

There was a very similar thread in September of 2013:

Stats and bench on Stockfish development site

I expect you will find my explanation there useful:

Re: Stats and bench on Stockfish development site.

Please feel free to ask if you have more questions.

Regards from Spain.

Ajedrecista.
