Zenmastur wrote: ↑Wed Dec 11, 2019 11:58 pm
Uri Blass wrote: ↑Wed Dec 11, 2019 11:43 pm
Zenmastur wrote: ↑Wed Dec 11, 2019 7:12 pm
Uri Blass wrote: ↑Wed Dec 11, 2019 6:16 pm
I think one of the problems with testing in the Stockfish framework is that games are adjudicated as draws when the evaluation stays at 0.00 for some moves and there is no progress, instead of being played on until a draw by the 50-move rule.
This means that even if a patch is an improvement, it will probably not pass the tests in the framework, because the games are adjudicated as draws too early.
I'm not sure that's the problem.
The bounds used for simplification tests are [-3.00,1.00]. This allows too many regressions. It would be much more balanced if they changed the bounds to [-2.00,2.00]. Even [-2.50,1.50] would help.
I do not believe that this is the problem.
Simplifications usually pass with a score above 50%, and there is no proof that Stockfish without the simplifications is better.
Well if you look at the simplifications and their regressions found here:
https://nextchessmove.com/dev-builds
You can see that many of them are regressions and that over time they seem to lose almost as much Elo as the rest of the patches gain. This has recently led to several months of basically no change in SF's Elo.
There are not enough games to know whether any single simplification is a regression or an improvement, but you can get an unbiased estimate of the average value of the simplifications starting from Stockfish 10.
These are only the first numbers; for that purpose you need to take more numbers from the link and calculate the average.
At least when I look at the first numbers, it seems to me that the average is positive.
209.73->207.78(-1.95 elo) 1.12.2018 simplification
208.88->206.03(-2.85 elo) 6.12.2018 simplification
208.75->211.58(2.83 elo) 16.12.2018 simplification
214.03->216.00(1.97 elo) 16.12.2018 simplification
214.25->213.08(-1.17 elo) 24.12.2018 simplification
212.66->213.98(1.32 elo) 27.12.2018 simplification
209.88->210.54(0.66 elo) 4.1.2019 simplification
211.45->215.12(3.67 elo) 10.1.2019 simplification
215.12->212.84(-2.28 elo) 14.1.2019 simplification
212.84->212.17(-0.67 elo) 14.1.2019 simplification
212.17->216.75(4.58 elo) 17.1.2019 simplification
215.25->217.07(1.82 elo) 22.1.2019 simplification
216.10->215.10(-1 elo) 29.1.2019 simplification
215.10->221.39(6.29 elo) 31.1.2019 simplification
217.75->219.64(1.89 elo) 8.2.2019 simplification
219.64->220.48(0.84 elo) 21.2.2019 simplification
220.48->220.45(-0.03 elo) 21.2.2019 simplification
218.64->219.64(1 elo) 27.2.2019 simplification
220.93->218.38->220.45(-0.48 elo) 5.3 simplifications
219.49->220.93(1.44 elo) 10.3 simplification
219.87->218.20(-1.67 elo) 20.3 simplification
221.09->218.53(-2.56 elo) 24.3 simplification
217.85->218.81(0.96 elo) 4.4 simplification
223.36->220.86->221.82(-1.54 elo) 13.4 simplifications
219.64->219.14->218.61->219.15(-0.49 elo) 16.4 simplifications
219.15->220.30(1.15 elo) 17.4 simplification
218.51->218.61(0.1 elo) 19.4 simplification
221.37->220.81->225.70(4.33 elo) 9.5 simplifications
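For anyone who wants to check the averaging, here is a minimal Python sketch that adds up the Elo changes listed above. The numbers are copied from the list; the names ENTRIES and summarize are only illustrative, and the standard error is a rough figure that treats the entries as independent, which may not hold for consecutive builds.

```python
from math import sqrt
from statistics import stdev

# (net Elo change, number of simplifications in that entry),
# copied by hand from the list above. Names are illustrative only.
ENTRIES = [
    (-1.95, 1), (-2.85, 1), (2.83, 1), (1.97, 1), (-1.17, 1),
    (1.32, 1), (0.66, 1), (3.67, 1), (-2.28, 1), (-0.67, 1),
    (4.58, 1), (1.82, 1), (-1.00, 1), (6.29, 1), (1.89, 1),
    (0.84, 1), (-0.03, 1), (1.00, 1), (-0.48, 2), (1.44, 1),
    (-1.67, 1), (-2.56, 1), (0.96, 1), (-1.54, 2), (-0.49, 3),
    (1.15, 1), (0.10, 1), (4.33, 2),
]


def summarize(entries):
    """Return the total and average Elo change for a list of (delta, count) pairs."""
    deltas = [delta for delta, _ in entries]
    total = sum(deltas)
    n_entries = len(deltas)
    n_simplifications = sum(count for _, count in entries)
    return {
        "total": total,
        "mean_per_entry": total / n_entries,
        "mean_per_simplification": total / n_simplifications,
        # Rough spread-based standard error of the per-entry mean;
        # assumes the entries are independent, which is only an approximation.
        "stderr_per_entry": stdev(deltas) / sqrt(n_entries),
    }


if __name__ == "__main__":
    for name, value in summarize(ENTRIES).items():
        print(f"{name}: {value:.2f}")
```

The mean is only meaningful next to the standard error: if the error is about as large as the mean, these first numbers alone do not settle the question either way, and more entries from the link are needed.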