Stockfish no progress in 2month and half , why ?

JJJ
Posts: 1286
Joined: Sat Apr 19, 2014 11:47 am

Stockfish no progress in 2month and half , why ?

Post by JJJ » Mon Aug 28, 2017 12:15 pm

I'll ask here: I'd like to know why, with so many green patches, the regression tests indicate no progress. Last time it was about +27 Elo and now it seems to be +29 at best. Only 2 Elo from how many green patches? Do you think the way Stockfish is tested no longer works?

Just asking, I'm not here to offend the Stockfish team.

MikeB
Posts: 3461
Joined: Thu Mar 09, 2006 5:34 am
Location: Pen Argyl, Pennsylvania

Re: Stockfish no progress in 2month and half , why ?

Post by MikeB » Wed Aug 30, 2017 3:21 am

JJJ wrote: I'll ask here: I'd like to know why, with so many green patches, the regression tests indicate no progress. Last time it was about +27 Elo and now it seems to be +29 at best. Only 2 Elo from how many green patches? Do you think the way Stockfish is tested no longer works?

Just asking, I'm not here to offend the Stockfish team.
I'm not speaking for the SF team, but for chess programs in general progress is never a straight line up. Self-play testing tends to exaggerate the benefit of a patch, and not all patches are tested: some are deemed non-functional, although I have seen the benchmark node count change on supposedly non-functional patches (simple logic says that if the benchmark changes, the patch is not non-functional). Some simplification patches also cost Elo.

That said, this is how the process works for any engine. You need simplification patches, and sometimes a total rewrite, because the path you're on is only going to give you minimal Elo gains from here on out. Every engine goes through this, and it is most likely the primary reason why solo authors eventually burn out rather quickly (10 to 15 years is considered long). Authors who keep at it non-stop for more than 20 to 25 consecutive years are rare birds indeed (Dart/Hyatt and a few others).

Dirt
Posts: 2851
Joined: Wed Mar 08, 2006 9:01 pm
Location: Irvine, CA, USA

Re: Stockfish no progress in 2month and half , why ?

Post by Dirt » Wed Aug 30, 2017 4:06 am

JJJ wrote: I'll ask here: I'd like to know why, with so many green patches, the regression tests indicate no progress. Last time it was about +27 Elo and now it seems to be +29 at best. Only 2 Elo from how many green patches? Do you think the way Stockfish is tested no longer works?

Just asking, I'm not here to offend the Stockfish team.
Well, Marco is busy making tablebases not work. That can't help.
Deasil is the right way to go.

shrapnel
Posts: 1198
Joined: Fri Nov 02, 2012 8:43 am
Location: New Delhi, India

Re: Stockfish no progress in 2month and half , why ?

Post by shrapnel » Wed Aug 30, 2017 4:24 am

Stockfish has solved Chess.
Maybe we should all take up Chinese Checkers or something....
i7 5960X @ 4.1 Ghz, 64 GB G.Skill RipJaws RAM, Twin Asus ROG Strix OC 11 GB Geforce 2080 Tis

Uri Blass
Posts: 8586
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Stockfish no progress in 2month and half , why ?

Post by Uri Blass » Wed Aug 30, 2017 4:26 am

Most improvements fail at long time control, and some of the improvements that pass may be lucky runs (there is a 5% probability that a 0 Elo change will pass).

I think that the way the Stockfish team works is not scientific enough to find out the reasons.

The correct way would be to test every patch they accept, including non-functional patches, with 40,000 games at LTC against the previous version. (Note that for functional patches I am not suggesting dropping SPRT, only adding a test with a fixed number of games for them as well.)

This way we could get a better estimate of the value of every patch in Elo terms, and we could see whether self-play really exaggerates the benefit of patches.

Note that I do not believe the problem is that simplifications lose Elo, considering that most simplifications tested at LTC also pass with a score above 50%.

My guess is that simplifications give a positive Elo improvement, and that the problem is that some patches that people consider non-functional, and therefore do not even test, lose Elo.

Note that having the same bench is not proof that a patch is non-functional, not only because bench is based on a small number of positions, but also because bench is not the same as playing a game, where the hash from the previous move is used when analysing the next move.

whereagles
Posts: 561
Joined: Thu Nov 13, 2014 11:03 am

Re: Stockfish no progress in 2month and half , why ?

Post by whereagles » Wed Aug 30, 2017 8:26 am

no progress because it gets harder and harder as you get better :)

Michel
Posts: 2046
Joined: Sun Sep 28, 2008 11:50 pm

Re: Stockfish no progress in 2month and half , why ?

Post by Michel » Wed Aug 30, 2017 8:46 am

Uri Blass wrote: Most improvements fail at long time control, and some of the improvements that pass may be lucky runs (there is a 5% probability that a 0 Elo change will pass).
It is much smaller: namely 0.05^2=0.25%.

Probably the majority of the patches that pass STC are lucky runs these days (this will happen for 1 neutral patch in 20). However most of those lucky runs will be caught by the LTC test. This creates somehow the perception that the STC test is not a good predictor for the LTC test, leading people to make misguided calls for increasing the STC TC.
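
Whether most STC passes are lucky runs depends on how many submitted patches are genuinely good, which the thread does not state. A minimal Python sketch of that base-rate arithmetic follows. The 3% share of good patches and their 80% pass probability are hypothetical numbers chosen only for illustration, and every other patch is treated as exactly neutral with the 5% pass probability quoted above.

```python
def lucky_share_of_stc_passes(good_fraction: float,
                              good_pass_prob: float,
                              neutral_pass_prob: float = 0.05) -> float:
    """Fraction of STC passes that come from neutral patches (lucky runs).

    Simplification: every patch is either good or exactly neutral.
    """
    lucky = (1.0 - good_fraction) * neutral_pass_prob
    earned = good_fraction * good_pass_prob
    return lucky / (lucky + earned)


# Hypothetical inputs, not measured Stockfish data.
share = lucky_share_of_stc_passes(good_fraction=0.03, good_pass_prob=0.80)
print(f"Lucky runs among STC passes: {share:.1%}")
```

With these made-up inputs lucky runs are the majority of STC passes; raising the share of good patches to 10% would put them in the minority.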
Uri Blass wrote: I think that the way the Stockfish team works is not scientific enough to find out the reasons.

The correct way would be to test every patch they accept, including non-functional patches, with 40,000 games at LTC against the previous version. (Note that for functional patches I am not suggesting dropping SPRT, only adding a test with a fixed number of games for them as well.)

This way we could get a better estimate of the value of every patch in Elo terms, and we could see whether self-play really exaggerates the benefit of patches.
I agree that this would be an interesting experiment. But recall that these are 1-2 Elo patches. If you really want to evaluate the value of such patches in a statistically sound way, 40,000 games is far from enough. There is a huge difference between having a procedure that on average evaluates patches correctly and one that evaluates every individual patch correctly.
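
To put a rough number on this, here is a sketch in Python of the 95% error margin of an Elo estimate from a fixed number of games between two nearly equal engines. The 60% draw ratio is an assumption (it is not given in the thread), and the formula is a normal approximation around a 50% score, so treat the result as an order of magnitude only.

```python
import math

# Slope of the logistic Elo curve at a 50% score: Elo points per unit of score.
ELO_PER_SCORE = 400.0 / (math.log(10) * 0.25)


def per_game_sd(draw_ratio: float) -> float:
    """Standard deviation of one game's score (1, 0.5, 0) between equal engines."""
    return math.sqrt((1.0 - draw_ratio) / 4.0)


def elo_margin_95(games: int, draw_ratio: float) -> float:
    """Approximate 95% error margin, in Elo, after a fixed number of games."""
    score_se = per_game_sd(draw_ratio) / math.sqrt(games)
    return 1.96 * ELO_PER_SCORE * score_se


def games_for_margin(margin_elo: float, draw_ratio: float) -> int:
    """Approximate number of games needed for a given 95% error margin."""
    return math.ceil((1.96 * ELO_PER_SCORE * per_game_sd(draw_ratio) / margin_elo) ** 2)


DRAW_RATIO = 0.60  # assumed, not taken from the thread
print(f"40,000 games: +/- {elo_margin_95(40_000, DRAW_RATIO):.2f} Elo")
print(f"Games for +/- 0.5 Elo: {games_for_margin(0.5, DRAW_RATIO):,}")
```

Under these assumptions the margin after 40,000 games is about as large as the 1-2 Elo effect being measured, and narrowing it to half an Elo takes far more games.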


Uri Blass wrote: Note that I do not believe the problem is that simplifications lose Elo, considering that most simplifications tested at LTC also pass with a score above 50%.

My guess is that simplifications give a positive Elo improvement, and that the problem is that some patches that people consider non-functional, and therefore do not even test, lose Elo.

Note that having the same bench is not proof that a patch is non-functional, not only because bench is based on a small number of positions, but also because bench is not the same as playing a game, where the hash from the previous move is used when analysing the next move.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.

mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 7:17 pm

Re: Stockfish no progress in 2month and half , why ?

Post by mcostalba » Wed Aug 30, 2017 9:53 am

Michel wrote: Probably the majority of the patches that pass STC are lucky runs these days (this will happen for 1 neutral patch in 20). However most of those lucky runs will be caught by the LTC test. This creates somehow the perception that the STC test is not a good predictor for the LTC test, leading people to make misguided calls for increasing the STC TC.
This is a comment that makes sense (a novelty in this thread).

In these 2 months there has been a huge number of tests and attempts by many people, no fewer than in the past, and for me this is the most important point. It means developers' interest in SF is still high.

Also, finding good patches is a statistical process: sometimes you find 3 in a row, sometimes you fish for months for nothing....

It is still too early to tell whether we have reached a plateau with the current development model or whether it is just a temporary glitch.

Uri Blass
Posts: 8586
Joined: Wed Mar 08, 2006 11:37 pm
Location: Tel-Aviv Israel

Re: Stockfish no progress in 2month and half , why ?

Post by Uri Blass » Wed Aug 30, 2017 11:03 am

Michel wrote:
It is much smaller: namely 0.05^2=0.25%.

Probably the majority of the patches that pass STC are lucky runs these days (this will happen for 1 neutral patch in 20). However most of those lucky runs will be caught by the LTC test. This creates somehow the perception that the STC test is not a good predictor for the LTC test, leading people to make misguided calls for increasing the STC TC.
We do not know whether the problem is that the majority of patches that pass STC are lucky runs, or that the majority are patches that are good only at STC.

0.05^2 applies only to patches that are 0 Elo at both short and long time control; the probability is different for patches that are good at short time control but not at long time control.
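
The difference can be made concrete with a small Python sketch. It assumes the STC and LTC tests are independent given the patch's true strength at each time control, and the 80% STC pass probability for a patch that is good only at STC is a hypothetical figure.

```python
ALPHA = 0.05  # pass probability of a 0 Elo patch in one SPRT stage, as quoted in the thread


def pass_both_stages(stc_pass_prob: float, ltc_pass_prob: float) -> float:
    """Probability of passing STC and then LTC, treating the two tests as independent."""
    return stc_pass_prob * ltc_pass_prob


# 0 Elo at both time controls: the 0.05^2 case.
neutral_everywhere = pass_both_stages(ALPHA, ALPHA)

# Hypothetical: good at STC (passes 80% of the time) but 0 Elo at LTC.
good_only_at_stc = pass_both_stages(0.80, ALPHA)

print(f"Neutral at both time controls: {neutral_everywhere:.2%}")
print(f"Good only at STC:              {good_only_at_stc:.2%}")
```

Under these assumptions a patch that helps only at STC gets through both stages 16 times as often as a patch that is neutral everywhere.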

Rodolfo Leoni
Posts: 544
Joined: Tue Jun 06, 2017 2:49 pm
Location: Italy

Re: Stockfish no progress in 2month and half , why ?

Post by Rodolfo Leoni » Wed Aug 30, 2017 11:21 am

mcostalba wrote:
Michel wrote: Probably the majority of the patches that pass STC are lucky runs these days (this will happen for 1 neutral patch in 20). However most of those lucky runs will be caught by the LTC test. This creates somehow the perception that the STC test is not a good predictor for the LTC test, leading people to make misguided calls for increasing the STC TC.
This is a comment that makes sense (a novelty in this thread).

In these 2 months there has been a huge number of tests and attempts by many people, no fewer than in the past, and for me this is the most important point. It means developers' interest in SF is still high.

Also, finding good patches is a statistical process: sometimes you find 3 in a row, sometimes you fish for months for nothing....

It is still too early to tell whether we have reached a plateau with the current development model or whether it is just a temporary glitch.
I apologize for being so ignorant, but... is it possible that the test conditions should be revisited? What suite is used for the 10,000 games per patch? And, to conclude, is it possible that SF7 is playing such a high percentage of perfect games under those test conditions that improvements can't be detected anymore?

My two cents. :)
F.S.I. Chess Teacher
