I'm asking here. I'd like to know why, with so many green patches, the regression test indicates no progress. Last time it was about +27 Elo and now it seems to be +29 at best. 2 Elo from how many green patches? Do you think the way Stockfish is tested no longer works?
Just asking, I'm not here to offend the Stockfish team.
Stockfish no progress in 2month and half , why ?
Moderator: Ras
-
- Posts: 1346
- Joined: Sat Apr 19, 2014 1:47 pm
-
- Posts: 4889
- Joined: Thu Mar 09, 2006 6:34 am
- Location: Pen Argyl, Pennsylvania
Re: Stockfish no progress in 2month and half , why ?
JJJ wrote: I'm asking here. I'd like to know why, with so many green patches, the regression test indicates no progress. Last time it was about +27 Elo and now it seems to be +29 at best. 2 Elo from how many green patches? Do you think the way Stockfish is tested no longer works?
Just asking, I'm not here to offend the Stockfish team.
Not speaking for the SF team, but for chess programs in general progress is never a straight line up. Self-play testing tends to exaggerate the benefit of a patch, not all patches are tested (some are deemed non-functional, although I have seen the benchmark node count change on supposedly non-functional patches; simple logic tells anyone that if the benchmark changes, the patch is not non-functional), and some simplification patches cost Elo. With that said, this is how the process works for any engine. You need simplification patches and sometimes a total rewrite, because the path you're on is only going to give you minimal Elo gains from here on out. Every engine goes through this, and that is most likely the primary reason why solo authors eventually burn out rather quickly (10 to 15 years is considered long). Authors who keep at it for more than 20 to 25 years consecutively and non-stop are rare birds indeed (Dart/Hyatt and a few others).
-
- Posts: 2851
- Joined: Wed Mar 08, 2006 10:01 pm
- Location: Irvine, CA, USA
Re: Stockfish no progress in 2month and half , why ?
JJJ wrote: I'm asking here. I'd like to know why, with so many green patches, the regression test indicates no progress. Last time it was about +27 Elo and now it seems to be +29 at best. 2 Elo from how many green patches? Do you think the way Stockfish is tested no longer works?
Just asking, I'm not here to offend the Stockfish team.
Well, Marco is busy making tablebases not work. That can't help.
Deasil is the right way to go.
-
- Posts: 1339
- Joined: Fri Nov 02, 2012 9:43 am
- Location: New Delhi, India
Re: Stockfish no progress in 2month and half , why ?
Stockfish has solved Chess.
Maybe we should all take up Chinese Checkers or something....
i7 5960X @ 4.1 Ghz, 64 GB G.Skill RipJaws RAM, Twin Asus ROG Strix OC 11 GB Geforce 2080 Tis
-
- Posts: 10770
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Stockfish no progress in 2month and half , why ?
Most improvements fail at long time control, and some of the improvements that pass may be lucky runs (there is a 5% probability that a 0 Elo change will pass).
I think that the way the Stockfish team works is not scientific enough to find out the reasons.
The correct way would be to test every patch they accept, including non-functional patches, with 40,000 games at LTC against the previous version. (Note that for functional patches I do not suggest dropping SPRT, only adding a test with a fixed number of games for them as well.)
In this way we can get a better estimate of the value of every patch in Elo terms, and we can really see whether self-play exaggerates the benefit of patches.
Note that I do not believe the problem is that simplifications lose Elo, considering that most simplifications tested at LTC also pass with more than 50%.
My guess is that simplifications give a positive Elo improvement, and that the problem is that some patches that people consider non-functional, and do not even test, lose Elo.
Note that having the same bench is not proof that a patch is non-functional, not only because bench is based on a small number of positions but also because bench is not the same as playing a game, where you use the hash in the next move's analysis.
-
- Posts: 565
- Joined: Thu Nov 13, 2014 12:03 pm
Re: Stockfish no progress in 2month and half , why ?
no progress because it gets harder and harder as you get better 

-
- Posts: 2292
- Joined: Mon Sep 29, 2008 1:50 am
Re: Stockfish no progress in 2month and half , why ?
Uri Blass wrote: Most improvements fail at long time control, and some of the improvements that pass may be lucky runs (there is a 5% probability that a 0 Elo change will pass).
It is much smaller: namely 0.05^2 = 0.25%, since a patch has to pass both the STC and the LTC test.
Probably the majority of the patches that pass STC are lucky runs these days (this will happen for 1 neutral patch in 20). However, most of those lucky runs will be caught by the LTC test. This somehow creates the perception that the STC test is not a good predictor for the LTC test, leading people to make misguided calls for increasing the STC TC.
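The arithmetic behind the "1 in 20" and "0.25%" figures can be sketched in a few lines of Python. This is only an illustration: the 5% per-stage rate is the figure quoted in the thread, the two stages are assumed to be independent, and the function name and the batch of 1000 patches are made up for the example.

```python
ALPHA = 0.05  # per-stage chance that a truly neutral (0 Elo) patch passes, as quoted in the thread


def neutral_patch_outcomes(n_patches, alpha=ALPHA):
    """Expected counts for neutral patches, assuming STC and LTC results are independent."""
    pass_stc = n_patches * alpha   # lucky runs at short time control
    pass_both = pass_stc * alpha   # lucky at STC and then lucky again at LTC
    return pass_stc, pass_both


if __name__ == "__main__":
    stc, both = neutral_patch_outcomes(1000)
    print(f"neutral patches passing STC: {stc:.1f}")
    print(f"neutral patches passing STC and LTC: {both:.2f}")
    print(f"chance that one neutral patch passes both: {ALPHA ** 2:.2%}")
```

Under these assumptions, 95% of the lucky STC passes are stopped at LTC, which is the effect described above.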
Uri Blass wrote: I think that the way the Stockfish team works is not scientific enough to find out the reasons.
The correct way would be to test every patch they accept, including non-functional patches, with 40,000 games at LTC against the previous version. (Note that for functional patches I do not suggest dropping SPRT, only adding a test with a fixed number of games for them as well.)
In this way we can get a better estimate of the value of every patch in Elo terms, and we can really see whether self-play exaggerates the benefit of patches.
Note that I do not believe the problem is that simplifications lose Elo, considering that most simplifications tested at LTC also pass with more than 50%.
My guess is that simplifications give a positive Elo improvement, and that the problem is that some patches that people consider non-functional, and do not even test, lose Elo.
Note that having the same bench is not proof that a patch is non-functional, not only because bench is based on a small number of positions but also because bench is not the same as playing a game, where you use the hash in the next move's analysis.
I agree that this would be an interesting experiment. But recall that these are 1-2 Elo patches. If you really want to evaluate the value of such patches in a statistically sound way, 40,000 games is far from enough. There is a huge difference between a procedure that evaluates patches correctly on average and a procedure that evaluates every individual patch correctly.
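To put rough numbers on the remark that 40,000 games is far from enough for 1-2 Elo patches, here is a minimal Python sketch of the error bar of a fixed-length match. It assumes the logistic Elo model, a true score close to 50%, and a draw ratio of 0.6; the draw ratio and both function names are assumptions made for the example, not figures from the thread.

```python
import math


def elo_error_bar(games, draw_ratio, z=1.96):
    """Approximate 95% half-width, in Elo, of a fixed-length match near a 50% score."""
    # Per-game score variance for outcomes 1, 0.5, 0 when wins and losses are equally likely.
    var_per_game = (1.0 - draw_ratio) / 4.0
    se_score = math.sqrt(var_per_game / games)
    # Slope of the logistic Elo curve at a 50% score: d(Elo)/d(score) = 1600 / ln(10).
    elo_per_score = 1600.0 / math.log(10)
    return z * se_score * elo_per_score


def games_needed(target_elo, draw_ratio, z=1.96):
    """Games needed before the error bar shrinks to target_elo."""
    per_game = elo_error_bar(1, draw_ratio, z)
    return math.ceil((per_game / target_elo) ** 2)


if __name__ == "__main__":
    print(f"40,000 games: +/- {elo_error_bar(40_000, 0.6):.2f} Elo")
    print(f"games for +/- 0.5 Elo: {games_needed(0.5, 0.6)}")
```

With these assumed inputs the error bar for 40,000 games comes out at a little over 2 Elo, which is as large as the patches being measured, and resolving a single patch to within half an Elo would take several hundred thousand games.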
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
-
- Posts: 2684
- Joined: Sat Jun 14, 2008 9:17 pm
Re: Stockfish no progress in 2month and half , why ?
Michel wrote: Probably the majority of the patches that pass STC are lucky runs these days (this will happen for 1 neutral patch in 20). However, most of those lucky runs will be caught by the LTC test. This somehow creates the perception that the STC test is not a good predictor for the LTC test, leading people to make misguided calls for increasing the STC TC.
This is a comment that makes sense (a novelty in this thread).
In these 2 months a huge number of tests and attempts have been tried by many people, no fewer than in the past, and for me this is the most important point. It means developers' interest in SF is still high.
Also, finding good patches is a statistical process: sometimes you find 3 in a row, sometimes you fish for months for nothing....
It is still too early to tell whether we have reached a plateau with the current development model or it is just a temporary glitch.
-
- Posts: 10770
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Stockfish no progress in 2month and half , why ?
Michel wrote:
It is much smaller: namely 0.05^2 = 0.25%.
Probably the majority of the patches that pass STC are lucky runs these days (this will happen for 1 neutral patch in 20). However, most of those lucky runs will be caught by the LTC test. This somehow creates the perception that the STC test is not a good predictor for the LTC test, leading people to make misguided calls for increasing the STC TC.
We do not know whether the problem is that the majority of patches that pass STC are lucky runs or that the majority are patches that are good only at STC.
0.05^2 applies only to patches that are 0 Elo at both short and long time control; the probability is different for patches that are good at short time control but not at long time control.
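How much the probability changes can be estimated with Wald's approximation for the SPRT pass probability. The Python sketch below is only an illustration under stated assumptions: SPRT bounds of [0, 5] Elo with alpha = beta = 0.05 at both time controls, the logistic Elo model, a normal approximation to game scores, and no correction for overshoot at the stopping bounds. The bounds, the function names, and the example patch that is worth +3 Elo at STC and 0 Elo at LTC are hypothetical.

```python
import math


def score_from_elo(elo):
    """Expected score under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt_pass_probability(true_elo, elo0=0.0, elo1=5.0, alpha=0.05, beta=0.05):
    """Wald's approximation of the chance that an SPRT accepts H1 (the patch passes)."""
    s, s0, s1 = (score_from_elo(e) for e in (true_elo, elo0, elo1))
    # Exponent in Wald's operating characteristic: +1 at elo0, -1 at elo1, 0 halfway.
    h = (s0 + s1 - 2.0 * s) / (s1 - s0)
    upper = (1.0 - beta) / alpha   # likelihood-ratio bound for accepting H1
    lower = beta / (1.0 - alpha)   # likelihood-ratio bound for accepting H0
    if abs(h) < 1e-12:
        # Limit of the formula below as h goes to 0.
        accept_h0 = math.log(upper) / (math.log(upper) - math.log(lower))
    else:
        accept_h0 = (upper ** h - 1.0) / (upper ** h - lower ** h)
    return 1.0 - accept_h0


if __name__ == "__main__":
    neutral = sprt_pass_probability(0.0) ** 2
    stc_only = sprt_pass_probability(3.0) * sprt_pass_probability(0.0)
    print(f"0 Elo at STC and LTC: {neutral:.2%}")
    print(f"+3 Elo at STC, 0 Elo at LTC: {stc_only:.2%}")
```

Under these assumptions the neutral patch reproduces the 0.25% figure, while the patch that only helps at STC gets through both stages around 3% of the time, so the two explanations lead to quite different false-positive rates.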
-
- Posts: 545
- Joined: Tue Jun 06, 2017 4:49 pm
- Location: Italy
Re: Stockfish no progress in 2month and half , why ?
mcostalba wrote:
Michel wrote: Probably the majority of the patches that pass STC are lucky runs these days (this will happen for 1 neutral patch in 20). However, most of those lucky runs will be caught by the LTC test. This somehow creates the perception that the STC test is not a good predictor for the LTC test, leading people to make misguided calls for increasing the STC TC.
This is a comment that makes sense (a novelty in this thread).
In these 2 months a huge number of tests and attempts have been tried by many people, no fewer than in the past, and for me this is the most important point. It means developers' interest in SF is still high.
Also, finding good patches is a statistical process: sometimes you find 3 in a row, sometimes you fish for months for nothing....
It is still too early to tell whether we have reached a plateau with the current development model or it is just a temporary glitch.
I apologize for being so ignorant, but... Is it possible that the test conditions should be revisited? What is the suite used for the 10000 games per patch? And, to conclude, is it possible that SF7 is playing its percentage of perfect games under those test conditions, so that improvements can't be detected anymore?
My two cents.

F.S.I. Chess Teacher