Re: Stockfish no progress in 2month and half , why ?
Posted: Wed Aug 30, 2017 2:07 pm
Current test conditions are almost always able to detect a good patch worth at least 1-2 ELO. There is no need to change that. In particular, lowering the threshold would (1) commit many neutral patches, which in the long term is very bad, and (2) sometimes even commit regression patches.
Rodolfo Leoni wrote: I apologize for being so ignorant, but... Is it possible that test conditions should be revisited? What's the suite of the 10000 games per patch? And, to conclude, is it possible that SF7 is playing its % of perfect games under those test conditions so that improvements can't be detected anymore?
mcostalba wrote: This is a comment that makes sense (a novelty in this thread).
Michel wrote: Probably the majority of the patches that pass STC are lucky runs these days (this will happen for 1 neutral patch in 20). However, most of those lucky runs will be caught by the LTC test. This somehow creates the perception that the STC test is not a good predictor for the LTC test, leading people to make misguided calls for increasing the STC TC.
In these 2 months there has been a huge number of tests and attempts by many people, no fewer than in the past, and for me this is the most important point. It means developers' interest in SF is still high.
Also, finding good patches is a statistical process: sometimes you find 3 in a row, sometimes you fish for months for nothing....
It is still too early to tell whether we have reached a plateau with the current development model or whether it is just a temporary glitch.
My two cents.
The test conditions did not change from last year or the year before, and the math behind them did not change either. Unfortunately, the only way to improve SF is to find good patches; there are no shortcuts or workarounds to this very simple reality.
I'd also like to highlight that a 1-2 ELO detection threshold for submitted patches is already really low. Very few people here realize how low this threshold is: before fishtest, it was not even thinkable to detect such small improvements in a reliable and statistically sound way.
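To give a feeling for how small 1-2 ELO is, here is a rough back-of-the-envelope sketch. It is not fishtest's actual math: it only uses the standard logistic ELO formula and a plain fixed-length estimate. The function names and the 60% draw ratio are assumptions picked for illustration.

```python
import math


def expected_score(elo_diff: float) -> float:
    """Expected score per game for a given ELO advantage (standard logistic model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


def games_for_two_sigma(elo_diff: float, draw_ratio: float = 0.6) -> int:
    """Rough number of games needed before the expected edge equals two
    standard errors of the mean score.

    draw_ratio is an assumed value, not a measured one. With wins and losses
    roughly balanced, the per-game score variance is about (1 - draw_ratio) / 4.
    """
    edge = expected_score(elo_diff) - 0.5
    per_game_sd = math.sqrt((1.0 - draw_ratio) / 4.0)
    return math.ceil((2.0 * per_game_sd / edge) ** 2)


if __name__ == "__main__":
    for elo in (1.0, 2.0, 5.0):
        print(elo, round(expected_score(elo), 5), games_for_two_sigma(elo))
```

Under these assumptions a 2 ELO patch moves the expected score by less than 0.3 percentage points, so tens of thousands of games are needed before the signal stands out from the noise.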