http://tests.stockfishchess.org/tests/v ... 25cf0fe4c4
http://tests.stockfishchess.org/tests/v ... 25cf0f9041
It seems that they get more than 10 Elo of improvement at 180+1.8, while there is probably no improvement at the 10+0.1 time control.
Maybe many other patches that give a significant improvement at 180+1.8 and none at 10+0.1 are possible, but because of the way Stockfish is tested we do not even know: they do not test with a fixed number of games, and they did not retest the many patches that failed at 10+0.1.
Maybe Stockfish could be 50 Elo stronger today at 180+1.8 if they had tested systematically at a longer time control over the last year.
What is your opinion?
About stockfish's optimizing for bullet
-
- Posts: 10282
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
-
- Posts: 6442
- Joined: Tue Jan 09, 2007 12:31 am
- Location: PA USA
- Full name: Louis Zulli
Re: About stockfish's optimizing for bullet
> Uri Blass wrote (Sun Aug 04, 2019 10:18 am):
> http://tests.stockfishchess.org/tests/v ... 25cf0fe4c4
> http://tests.stockfishchess.org/tests/v ... 25cf0f9041
> It seems that they get more than 10 Elo of improvement at 180+1.8, while there is probably no improvement at the 10+0.1 time control.
> Maybe many other patches that give a significant improvement at 180+1.8 and none at 10+0.1 are possible, but because of the way Stockfish is tested we do not even know: they do not test with a fixed number of games, and they did not retest the many patches that failed at 10+0.1.
> Maybe Stockfish could be 50 Elo stronger today at 180+1.8 if they had tested systematically at a longer time control over the last year.
> What is your opinion?
It seems that the current testing method is satisfactory for most of the people who use Stockfish. Those who want an engine that might perform better for them than the current one are free to modify the code as they wish, and to test those modifications as they see fit.
-
- Posts: 3546
- Joined: Thu Jun 07, 2012 11:02 pm
Re: About stockfish's optimizing for bullet
I think they should move to standardise at a longer time control. At least give it a try.
-
- Posts: 5563
- Joined: Tue Feb 28, 2012 11:56 pm
Re: About stockfish's optimizing for bullet
> Modern Times wrote (Sun Aug 04, 2019 4:02 pm):
> I think they should move to standardise at a longer time control. At least give it a try.
How can that be given a try? It would mean that vastly fewer tests can be run.
-
- Posts: 3546
- Joined: Thu Jun 07, 2012 11:02 pm
Re: About stockfish's optimizing for bullet
> syzygy wrote (Sun Aug 04, 2019 4:13 pm):
>> Modern Times wrote (Sun Aug 04, 2019 4:02 pm):
>> I think they should move to standardise at a longer time control. At least give it a try.
> How can that be given a try? It would mean that vastly fewer tests can be run.
Tests would take longer to run, yes, and it would mean some sort of priority system in the queue. But I think it is a natural step to take in Stockfish's development. Very controversial subject though.
-
- Posts: 10282
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: About stockfish's optimizing for bullet
> syzygy wrote (Sun Aug 04, 2019 4:13 pm):
>> Modern Times wrote (Sun Aug 04, 2019 4:02 pm):
>> I think they should move to standardise at a longer time control. At least give it a try.
> How can that be given a try? It would mean that vastly fewer tests can be run.
Fewer tests does not mean less Elo.
I believe it is better to have fewer tests at a longer time control in order to improve long time control results.
Of course people are free to disagree with me.
-
- Posts: 4565
- Joined: Sun Mar 12, 2006 2:40 am
- Full name:
Re: About stockfish's optimizing for bullet
IMO most eval tests are better off with the statistical certainty that more tests provide than with longer time controls, at least for most of SF's eval. I can't remember eval terms that were especially bad at long time controls. I have never really looked for such eval at LTC myself, but I also don't remember anyone posting examples of that here or in Fishcooking.
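To put a rough number on "statistical certainty", here is a minimal Python sketch. It assumes independent games and a simple win/draw/loss model; the 60% draw ratio and the function names are illustrative assumptions, not values or code taken from Fishtest.

```python
import math

def elo_from_score(score):
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_margin_95(games, draw_ratio=0.6, score=0.5):
    """Approximate 95% half-width, in Elo, of a match result after `games` games.

    draw_ratio is an assumed placeholder, not a measured Stockfish figure.
    """
    win = score - draw_ratio / 2.0
    loss = 1.0 - draw_ratio - win
    # Per-game variance of the score, with win = 1, draw = 0.5, loss = 0.
    variance = (win * (1.0 - score) ** 2
                + draw_ratio * (0.5 - score) ** 2
                + loss * (0.0 - score) ** 2)
    stderr = math.sqrt(variance / games)
    upper = elo_from_score(score + 1.96 * stderr)
    lower = elo_from_score(score - 1.96 * stderr)
    return (upper - lower) / 2.0

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} games: +/- {elo_margin_95(n):.1f} Elo")
```

The margin shrinks only with the square root of the number of games, so under these assumptions a test that gets 4 times fewer games has an error bar about twice as wide.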
I exclude examples one might mention, like king safety and passed pawn eval, because they are about the interaction of eval and search: the terms are often still somewhat 'larger than life', speculative, so that search will go and look for quiescence. (With more tuning and more hardware thrown at Stockfish, the speculative terms are probably less large than they used to be, although I don't know whether that has been tested in some way.)
This tuning consists mainly of search changes. If you read the discussion, the tuning may have failed STC (STC just as a check) because it was done only at LTC (which I imagine is pretty costly), and it could still be optimized further for STC (considerably less costly if done by tuning, or by some hand-tuning/tinkering, perhaps followed by an LTC test for non-regression). The goal is of course to do that without sacrificing the gain at LTC. But the tuning algorithm was not asked to do that at all, so it is no wonder if the changes do not work so well at STC. There is of course no need to do STC repairs; the goal is always performance at LTC.
Debugging is twice as hard as writing the code in the first
place. Therefore, if you write the code as cleverly as possible, you
are, by definition, not smart enough to debug it.
-- Brian W. Kernighan
-
- Posts: 5563
- Joined: Tue Feb 28, 2012 11:56 pm
Re: About stockfish's optimizing for bullet
> Uri Blass wrote (Sun Aug 04, 2019 6:56 pm):
>> syzygy wrote (Sun Aug 04, 2019 4:13 pm):
>>> Modern Times wrote (Sun Aug 04, 2019 4:02 pm):
>>> I think they should move to standardise at a longer time control. At least give it a try.
>> How can that be given a try? It would mean that vastly fewer tests can be run.
> Fewer tests does not mean less Elo.
> I believe it is better to have fewer tests at a longer time control in order to improve long time control results.
> Of course people are free to disagree with me.
A mere belief seems to be a rather thin basis for discarding the development model of SF, which so far has proved to be very successful.
-
- Posts: 536
- Joined: Thu Mar 09, 2006 12:53 am
Re: About stockfish's optimizing for bullet
> syzygy wrote (Tue Aug 06, 2019 12:31 am):
>> Uri Blass wrote (Sun Aug 04, 2019 6:56 pm):
>>> syzygy wrote (Sun Aug 04, 2019 4:13 pm):
>>>> Modern Times wrote (Sun Aug 04, 2019 4:02 pm):
>>>> I think they should move to standardise at a longer time control. At least give it a try.
>>> How can that be given a try? It would mean that vastly fewer tests can be run.
>> Fewer tests does not mean less Elo.
>> I believe it is better to have fewer tests at a longer time control in order to improve long time control results.
>> Of course people are free to disagree with me.
> A mere belief seems to be a rather thin basis for discarding the development model of SF, which so far has proved to be very successful.
I may be remembering incorrectly, but I think that Vas Rajlich was an early believer that LOTS of very fast engine vs engine games were just as good as, if not better than, a significantly smaller number of slower games, and he rode that assumption to the top of the rating lists with Rybka a fairly long time ago. If I'm right about that, then this idea has helped propel two very prominent engines in computer chess history to the top of the rating lists, and has kept them there in their respective eras for quite a long while.
-
- Posts: 10282
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: About stockfish's optimizing for bullet
> royb wrote (Tue Aug 06, 2019 3:58 am):
>> syzygy wrote (Tue Aug 06, 2019 12:31 am):
>>> Uri Blass wrote (Sun Aug 04, 2019 6:56 pm):
>>>> syzygy wrote (Sun Aug 04, 2019 4:13 pm):
>>>>> Modern Times wrote (Sun Aug 04, 2019 4:02 pm):
>>>>> I think they should move to standardise at a longer time control. At least give it a try.
>>>> How can that be given a try? It would mean that vastly fewer tests can be run.
>>> Fewer tests does not mean less Elo.
>>> I believe it is better to have fewer tests at a longer time control in order to improve long time control results.
>>> Of course people are free to disagree with me.
>> A mere belief seems to be a rather thin basis for discarding the development model of SF, which so far has proved to be very successful.
> I may be remembering incorrectly, but I think that Vas Rajlich was an early believer that LOTS of very fast engine vs engine games were just as good as, if not better than, a significantly smaller number of slower games, and he rode that assumption to the top of the rating lists with Rybka a fairly long time ago. If I'm right about that, then this idea has helped propel two very prominent engines in computer chess history to the top of the rating lists, and has kept them there in their respective eras for quite a long while.
I agree that this was correct for Vas, at a time when not much hardware was used for testing and chess engines were relatively weak.
It does not mean it is also correct today, when the Stockfish team uses more than 1000 cores for testing. That is probably more than 10 times the hardware Vas used, so it can justify a time control more than 10 times slower than the one he used.
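The hardware-versus-time-control trade-off can be made concrete with a back-of-the-envelope sketch in Python. The assumptions are mine and deliberately crude: one single-threaded game per core, 60 moves per side, and both engines using all of their time (an upper bound on game length). The function names are hypothetical.

```python
def game_seconds(base, increment, moves_per_side=60):
    """Upper bound on the duration of one game if both sides use all their time."""
    return 2 * (base + increment * moves_per_side)

def games_per_day(cores, base, increment, moves_per_side=60):
    """Games finished per day, assuming one single-threaded game per core."""
    return cores * 86400 / game_seconds(base, increment, moves_per_side)

stc = games_per_day(1000, 10, 0.1)    # 10+0.1
ltc = games_per_day(1000, 180, 1.8)   # 180+1.8
print(f"10+0.1:  {stc:,.0f} games/day")
print(f"180+1.8: {ltc:,.0f} games/day")
print(f"ratio:   {stc / ltc:.1f}x")
```

Under these assumptions the same hardware plays about 18 times fewer games at 180+1.8 than at 10+0.1, which is the cost that has to be weighed against any gain in relevance at long time control.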
The improvement of Stockfish does not prove that a bigger improvement was impossible with testing at a longer time control.
There are some easy problems that Stockfish does not solve in a reasonable time, while even my weak old engine Movei solves them very fast:
FEN: rk6/p1r3p1/P3B1Kp/1p2B3/8/8/8/8 w - - 0 1
I wonder if disabling null move under some conditions, such as a small number of legal moves combined with a remaining depth that is not too small (so that counting the legal moves is not too expensive), can help Stockfish at long time control.
Note that I tested this in the past and found no significant change at short time control.
I believe it cannot be optimal that the number 1 program is blind in some simple problems, and by blind I do not mean 5 or 10 times slower than other engines but more than 1000 times slower.
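The null-move idea above can be sketched as a small gate function. This is a hypothetical illustration in Python, not Stockfish or Movei code: the function name and the thresholds `min_depth` and `min_legal_moves` are made-up placeholders that would need tuning and testing at the time control of interest.

```python
def allow_null_move(depth, legal_move_count, in_check,
                    min_depth=8, min_legal_moves=4):
    """Decide whether null-move pruning may be tried at a node.

    Hypothetical gate: skip the null move when few legal moves are
    available (a hint of possible zugzwang) and the remaining depth is
    large enough that counting the legal moves is cheap relative to
    the search below this node.
    """
    if in_check:
        # Passing while in check is never allowed.
        return False
    if depth >= min_depth and legal_move_count < min_legal_moves:
        # Few moves and a deep remaining search: do a normal search instead.
        return False
    return True

# Example: a deep node with only two legal moves skips the null move.
print(allow_null_move(depth=12, legal_move_count=2, in_check=False))
# A shallow node keeps the usual behaviour, so short searches are unaffected.
print(allow_null_move(depth=3, legal_move_count=2, in_check=False))
```

Because the extra condition only triggers at larger remaining depths, its effect would be expected to show up mainly at longer time controls, which fits the observation that a short time control test showed no significant change.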