SF testing framework

jarkkop
Posts: 198
Joined: Thu Mar 09, 2006 2:44 am
Location: Helsinki, Finland

SF testing framework

Post by jarkkop »

Can someone explain what all the values below mean (the values in parentheses and brackets)?
Do I understand this correctly?
How can this test run show a positive improvement of 1.36 Elo even
though there are more losses (11896) than wins (11850)?

LLR: 1.36 (-2.94,2.94) [-3.00,1.00]
Total: 77733 W: 11850 L: 11896 D: 53987


Regards

Jarkko
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: SF testing framework

Post by Vinvin »

The range [-3.00,1.00] starts at -3, which means the test accepts slightly negative values. If you count, a difference of 46 points in 77733 games is next to nothing (50.03% versus 49.97%).
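
For anyone who wants to check the arithmetic, here is a minimal Python sketch that turns the W/L/D counts from the question into a score percentage and an approximate Elo difference. The function name score_and_elo is made up for illustration, and the Elo figure uses the ordinary logistic formula, which is an assumption and not necessarily what the testing framework reports.

```python
import math


def score_and_elo(wins, losses, draws):
    """Return (score fraction, logistic Elo difference) for the tested engine.

    Hypothetical helper for illustration only. A draw counts as half a point.
    Not defined when the score is exactly 0 or 1.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Standard logistic Elo formula, inverted to get Elo from the score.
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return score, elo


# Counts taken from the test run quoted in the question.
score, elo = score_and_elo(11850, 11896, 53987)
print(f"score = {100 * score:.2f}%, Elo difference = {elo:+.2f}")
```

The point is only that the measured difference is a small fraction of one Elo, far too small to say anything about from the raw counts alone.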
Uri Blass
Posts: 10269
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: SF testing framework

Post by Uri Blass »

A positive LLR does not mean an improvement in strength.

Basically the test is testing H0 against H1, where
H0 is losing 3/1.6 Elo and
H1 is winning 1/1.6 Elo.
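
As far as I understand it, the division by about 1.6 reflects bounds given in BayesElo units instead of ordinary (logistic) Elo. Below is a minimal sketch of that conversion, assuming the BayesElo win/draw/loss model and comparing the slope of the expected score near equal strength. The function name bayeselo_scale and the drawelo value of 240 are hypothetical choices for illustration; the factor depends on the draw rate, so it is not a fixed constant.

```python
def bayeselo_scale(drawelo):
    """Approximate logistic Elo per unit of BayesElo, near equal strength.

    Assumes the BayesElo model, where
      P(win)  = 1 / (1 + 10 ** ((drawelo - elo) / 400))
      P(loss) = 1 / (1 + 10 ** ((drawelo + elo) / 400))
    and compares the slope of the expected score at elo = 0 with the
    slope of the plain logistic curve.
    """
    x = 10.0 ** (-drawelo / 400.0)
    return 4.0 * x / (1.0 + x) ** 2


# Hypothetical drawelo, chosen only for illustration.
drawelo = 240.0
scale = bayeselo_scale(drawelo)
for bound in (-3.0, 1.0):
    print(f"{bound:+.2f} BayesElo is roughly {bound * scale:+.2f} Elo")
```

With no draws at all (drawelo = 0) the factor is exactly 1, and it shrinks as the draw rate goes up.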

A 3/1.6 Elo regression has a 5% probability of passing.
A 1/1.6 Elo improvement has a 5% probability of failing.

The test is run at both a short and a long time control, so the probability of a 1 Elo regression passing twice is very small (less than 5%, if I remember correctly). The idea is that you are willing to accept a small regression to make the code simpler.
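
To make the H0 versus H1 idea concrete, here is a minimal sketch of how an LLR and its stopping bounds can be computed from W/L/D counts. It assumes the BayesElo win/draw/loss model with drawelo estimated from the sample, and 5% error rates on both sides; that this is how the framework computes its number is my assumption, and all function names are hypothetical. Under these assumptions the LLR for the run in the question should come out close to the reported 1.36, and the bounds close to (-2.94, 2.94).

```python
import math


def bayeselo_to_proba(elo, drawelo):
    """Win/loss/draw probabilities under the BayesElo model (assumed)."""
    win = 1.0 / (1.0 + 10.0 ** ((-elo + drawelo) / 400.0))
    loss = 1.0 / (1.0 + 10.0 ** ((elo + drawelo) / 400.0))
    return win, loss, 1.0 - win - loss


def estimate_drawelo(wins, losses, draws):
    """Estimate drawelo from the observed win and loss frequencies."""
    games = wins + losses + draws
    pw, pl = wins / games, losses / games
    return 200.0 * math.log10((1.0 - pl) / pl * (1.0 - pw) / pw)


def sprt_llr(wins, losses, draws, elo0, elo1):
    """Log-likelihood ratio of H1 (elo1) against H0 (elo0), in BayesElo."""
    drawelo = estimate_drawelo(wins, losses, draws)
    w0, l0, d0 = bayeselo_to_proba(elo0, drawelo)
    w1, l1, d1 = bayeselo_to_proba(elo1, drawelo)
    return (wins * math.log(w1 / w0)
            + losses * math.log(l1 / l0)
            + draws * math.log(d1 / d0))


def sprt_bounds(alpha=0.05, beta=0.05):
    """Lower and upper LLR stopping bounds for the given error rates."""
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)


lower, upper = sprt_bounds()
llr = sprt_llr(11850, 11896, 53987, elo0=-3.0, elo1=1.0)
print(f"LLR: {llr:.2f} ({lower:.2f},{upper:.2f})")
```

The LLR is positive because the measured result sits closer to H1 than to H0, not because the patch gained Elo. The test keeps running until the LLR leaves the interval between the two bounds.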
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: SF testing framework

Post by Michel »

Uri Blass wrote:The test is run at both a short and a long time control, so the probability of a 1 Elo regression passing twice is very small (less than 5%, if I remember correctly). The idea is that you are willing to accept a small regression to make the code simpler.
The point is that simplifications may also _gain_ a bit of Elo, and the gamble is that those gains outweigh the occasional very small regression.

What kind of no-regression mode you choose basically depends on how pessimistic you are about the distribution of positive versus negative simplifications.

On this particular point Marco is extremely pessimistic and that's why we have the SPRT(-3,1). This test has the sad disadvantage that you cannot draw any conclusions from it if it fails: it has a 30% probability of giving the wrong answer on a neutral simplification. So it is not suitable if you really want a yes/no answer.
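
For readers who want to experiment with such pass/fail probabilities, here is a rough sketch based on the usual continuous (Brownian motion) approximation of the LLR random walk. It assumes Elo differences small enough for the expected score to be linear in Elo, ignores overshoot at the bounds, and uses 5% error rates; the function name pass_probability is hypothetical. Being an idealized single-run model, it may give figures that differ from the percentage quoted above.

```python
import math


def pass_probability(elo, elo0, elo1, alpha=0.05, beta=0.05):
    """Approximate probability that SPRT(elo0, elo1) accepts H1.

    Continuous approximation, assumed for illustration: the LLR is
    treated as Brownian motion between the two stopping bounds, and
    'elo' is the true strength difference in the same units as the bounds.
    """
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    # Twice the LLR drift divided by its variance; -1 at elo0, +1 at elo1.
    theta = (2.0 * elo - elo0 - elo1) / (elo1 - elo0)
    if abs(theta) < 1e-12:
        # No drift: plain gambler's-ruin ratio of the distances to the bounds.
        return -lower / (upper - lower)
    return ((1.0 - math.exp(-theta * lower))
            / (math.exp(-theta * upper) - math.exp(-theta * lower)))


for true_elo in (-3.0, -1.0, 0.0, 1.0):
    p = pass_probability(true_elo, elo0=-3.0, elo1=1.0)
    print(f"true elo {true_elo:+.1f}: pass probability about {100 * p:.1f}%")
```

By construction the sketch returns 5% at the H0 bound and 95% at the H1 bound; the interesting part is how quickly the curve moves between them.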

This being said, I must confess that the SPRT(-3,1) test has worked rather well so far and has allowed a number of non-trivial simplifications to pass.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Ajedrecista
Posts: 1968
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: SF testing framework.

Post by Ajedrecista »

Hello Jarkko:
There was a very similar thread in September 2013:

Stats and bench on Stockfish development site

I expect you will find the explanation I gave there useful:

Re: Stats and bench on Stockfish development site.

Please feel free to ask if you have any further questions.

Regards from Spain.

Ajedrecista.