About stockfish's optimizing for bullet

Uri Blass · Post by **Uri Blass** » Sun Aug 04, 2019 10:18 am

http://tests.stockfishchess.org/tests/v ... 25cf0fe4c4

http://tests.stockfishchess.org/tests/v ... 25cf0f9041

It seems that they get more than 10 Elo of improvement at 180+1.8, while there is probably no improvement at the 10+0.1 time control.
Maybe many other patches with a significant improvement at 180+1.8 and no improvement at 10+0.1 are possible, but thanks to the way Stockfish is tested we do not even know, because they do not test with a fixed number of games and they did not retest many patches that failed at 10+0.1.

Maybe Stockfish could be 50 Elo stronger today at 180+1.8 if they had tested systematically at a longer time control over the last year.

What is your opinion?

zullil · Post by **zullil** » Sun Aug 04, 2019 1:12 pm

Uri Blass wrote: ↑Sun Aug 04, 2019 10:18 am http://tests.stockfishchess.org/tests/v ... 25cf0fe4c4

http://tests.stockfishchess.org/tests/v ... 25cf0f9041

It seems that they get more than 10 Elo of improvement at 180+1.8, while there is probably no improvement at the 10+0.1 time control.
Maybe many other patches with a significant improvement at 180+1.8 and no improvement at 10+0.1 are possible, but thanks to the way Stockfish is tested we do not even know, because they do not test with a fixed number of games and they did not retest many patches that failed at 10+0.1.

Maybe Stockfish could be 50 Elo stronger today at 180+1.8 if they had tested systematically at a longer time control over the last year.

What is your opinion?

It seems that the current testing method is satisfactory for most of the people who use Stockfish. Those who want an engine that might perform better for them than the current one are free to modify the code as they wish, and to test those modifications as they see fit.

Modern Times · Post by **Modern Times** » Sun Aug 04, 2019 4:02 pm

I think they should move to standardise at a longer time control. At least give it a try.

syzygy · Post by **syzygy** » Sun Aug 04, 2019 4:13 pm

Modern Times wrote: ↑Sun Aug 04, 2019 4:02 pm I think they should move to standardise at a longer time control. At least give it a try.

How can that be given a try? It would mean that vastly fewer tests can be run.

Modern Times · Post by **Modern Times** » Sun Aug 04, 2019 4:33 pm

syzygy wrote: ↑Sun Aug 04, 2019 4:13 pm
Modern Times wrote: ↑Sun Aug 04, 2019 4:02 pm I think they should move to standardise at a longer time control. At least give it a try.
How can that be given a try? It would mean that vastly fewer tests can be run.

Tests would take longer to run, yes, and it would mean some sort of priority system in the queue. But I think it is a natural step to take in Stockfish's development. Very controversial subject, though.

Uri Blass · Post by **Uri Blass** » Sun Aug 04, 2019 6:56 pm

syzygy wrote: ↑Sun Aug 04, 2019 4:13 pm
Modern Times wrote: ↑Sun Aug 04, 2019 4:02 pm I think they should move to standardise at a longer time control. At least give it a try.
How can that be given a try? It would mean that vastly fewer tests can be run.

Fewer tests does not mean less Elo.
I believe it is better to run fewer tests at a longer time control in order to improve results at long time controls.

Of course people are free to disagree with me.

Eelco de Groot · Post by **Eelco de Groot** » Sun Aug 04, 2019 7:29 pm

Most eval tests are better off with the better statistical certainty of more tests than with longer time controls, IMO, at least for most of SF's eval. I can't remember things that were especially bad at long time controls. I have never really looked for such eval at LTC myself, but I also don't remember anyone posting examples of that here or in Fishcooking.
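As a rough illustration of how statistical certainty grows with the number of games, the sketch below estimates the 95% error margin in Elo for a match between two roughly equal engines. The win/draw/loss model, the 60% draw ratio, and the function name `elo_margin_95` are assumptions chosen for illustration, not figures taken from the Stockfish testing framework.

```python
import math

def elo_margin_95(n_games, draw_ratio=0.6):
    """Approximate 95% Elo error margin for two equal engines (assumed model)."""
    # Per-game score variance: wins and losses each have probability
    # (1 - draw_ratio) / 2 and lie 0.5 away from the mean score of 0.5.
    variance = (1.0 - draw_ratio) / 4.0
    std_error = math.sqrt(variance / n_games)
    # Slope of Elo(s) = -400 * log10(1/s - 1) at s = 0.5 is 1600 / ln(10).
    elo_per_score = 1600.0 / math.log(10)
    return 1.96 * elo_per_score * std_error

for n in (1000, 4000, 16000, 64000):
    print(f"{n:6d} games: +/- {elo_margin_95(n):.1f} Elo")
```

Under this model the margin shrinks with the square root of the number of games, so quadrupling the games only halves it.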

I exclude examples one might mention, like king safety and passed pawn eval, because they are about the interaction of eval and search: the terms are often still somewhat 'larger than life', speculative, so search will go look for quiescence. (With more tuning and more hardware thrown at Stockfish, the speculative terms are probably less large than they have been, although I don't know if that was tested in some way.)

This tuning is mainly search changes, and if you read the discussion, it is possible that the tuning did not pass STC (STC just as a check) because it was done only at LTC (which I imagine is pretty costly), and that it could still be further optimized for STC (considerably less costly if done by tuning, or by some hand-tuning/tinkering, perhaps followed by LTC testing for non-regression). The goal is of course to do that without sacrificing the gain at LTC. But the tuning algorithm was not asked to do that at all, so it is no wonder if the changes do not work so well at STC. There is of course no need to do STC repairs; the goal is always performance at LTC.

syzygy · Post by **syzygy** » Tue Aug 06, 2019 12:31 am

Uri Blass wrote: ↑Sun Aug 04, 2019 6:56 pm
syzygy wrote: ↑Sun Aug 04, 2019 4:13 pm
Modern Times wrote: ↑Sun Aug 04, 2019 4:02 pm I think they should move to standardise at a longer time control. At least give it a try.
How can that be given a try? It would mean that vastly fewer tests can be run.
Fewer tests does not mean less Elo.
I believe it is better to run fewer tests at a longer time control in order to improve results at long time controls.

Of course people are free to disagree with me.

A mere belief seems to be a rather thin basis for discarding the development model of SF, which so far has proved to be very successful.

royb · Post by **royb** » Tue Aug 06, 2019 3:58 am

syzygy wrote: ↑Tue Aug 06, 2019 12:31 am
Uri Blass wrote: ↑Sun Aug 04, 2019 6:56 pm
syzygy wrote: ↑Sun Aug 04, 2019 4:13 pm
Modern Times wrote: ↑Sun Aug 04, 2019 4:02 pm I think they should move to standardise at a longer time control. At least give it a try.
How can that be given a try? It would mean that vastly fewer tests can be run.
Fewer tests does not mean less Elo.
I believe it is better to run fewer tests at a longer time control in order to improve results at long time controls.

Of course people are free to disagree with me.
A mere belief seems to be a rather thin basis for discarding the development model of SF, which so far has proved to be very successful.

I may be remembering incorrectly, but I think that Vas Rajlich was an early believer that LOTS of very fast engine-vs-engine games were just as good as, if not better than, a significantly smaller number of slower games, and he rode that assumption to the top of the rating lists with Rybka a fairly long time ago. If I'm right about that, then the idea that a very large number of very fast games is as good as or better than a significantly smaller number of slower games has helped propel two very prominent engines in computer chess history to the top of the rating lists and keep them there in their respective eras for quite a long while.

Uri Blass · Post by **Uri Blass** » Tue Aug 06, 2019 5:20 am

royb wrote: ↑Tue Aug 06, 2019 3:58 am
syzygy wrote: ↑Tue Aug 06, 2019 12:31 am
Uri Blass wrote: ↑Sun Aug 04, 2019 6:56 pm
syzygy wrote: ↑Sun Aug 04, 2019 4:13 pm
Modern Times wrote: ↑Sun Aug 04, 2019 4:02 pm I think they should move to standardise at a longer time control. At least give it a try.
How can that be given a try? It would mean that vastly fewer tests can be run.
Fewer tests does not mean less Elo.
I believe it is better to run fewer tests at a longer time control in order to improve results at long time controls.

Of course people are free to disagree with me.
A mere belief seems to be a rather thin basis for discarding the development model of SF, which so far has proved to be very successful.
I may be remembering incorrectly, but I think that Vas Rajlich was an early believer that LOTS of very fast engine-vs-engine games were just as good as, if not better than, a significantly smaller number of slower games, and he rode that assumption to the top of the rating lists with Rybka a fairly long time ago. If I'm right about that, then the idea that a very large number of very fast games is as good as or better than a significantly smaller number of slower games has helped propel two very prominent engines in computer chess history to the top of the rating lists and keep them there in their respective eras for quite a long while.

I agree that this was correct for Vas, when not much hardware was used for testing and when chess engines were relatively weak.

It does not mean it is also correct today, when the Stockfish team uses more than 1000 cores for testing. That is probably more than 10 times the hardware that Vas used, so it can justify a time control more than 10 times slower than the one he used.
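The arithmetic behind this trade-off can be sketched as follows. The model is an assumption for illustration: one game per core at a time, both sides using their full clock, and 60 moves per side. The 100-core figure and the 100+1.0 time control are hypothetical stand-ins for "10 times less hardware" and "10 times slower", not known values.

```python
def max_game_seconds(base, increment, moves_per_side=60):
    """Upper bound on one game's duration: both sides use their whole clock."""
    return 2 * (base + increment * moves_per_side)

def games_per_second(cores, base, increment, moves_per_side=60):
    """Throughput of a test farm, assuming one game per core at a time."""
    return cores / max_game_seconds(base, increment, moves_per_side)

# Same hardware, 10+0.1 versus 180+1.8: how many times fewer games?
stc = games_per_second(1000, 10, 0.1)
ltc = games_per_second(1000, 180, 1.8)
print(f"180+1.8 yields {stc / ltc:.0f}x fewer games than 10+0.1")

# Ten times the cores at a ten times slower time control: same throughput.
small_farm = games_per_second(100, 10, 0.1)    # hypothetical older setup
large_farm = games_per_second(1000, 100, 1.0)  # hypothetical 10x slower TC
print(f"throughput ratio: {large_farm / small_farm:.2f}")
```

Because base time and increment scale together here, the ratio between the two time controls does not depend on the assumed number of moves per game.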

The improvement of Stockfish does not prove that it was impossible to get a bigger improvement by testing at a longer time control.
There are some easy problems that Stockfish does not solve in a reasonable time, while even my weak old engine Movei solves them very fast.

[d]rk6/p1r3p1/P3B1Kp/1p2B3/8/8/8/8 w - - 0 1

I wonder if disabling null move in some conditions, such as a small number of legal moves and a remaining depth that is not too small (so that counting the legal moves is not too expensive), can help Stockfish at long time controls.
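A minimal sketch of such a condition is shown below. It is not Stockfish code: the function name, the callable `count_legal_moves`, and the thresholds `min_depth=8` and `min_legal_moves=4` are all hypothetical, and good values would have to be found by testing.

```python
def allow_null_move(depth, count_legal_moves, min_depth=8, min_legal_moves=4):
    """Decide whether null-move pruning may be tried at this node.

    depth             -- remaining search depth at the node
    count_legal_moves -- callable returning the number of legal moves;
                         only called when depth is large enough, so the
                         cost of move counting is avoided at shallow nodes
    """
    if depth < min_depth:
        # Shallow node: keep null move and skip the expensive move count.
        return True
    # Deep node with very few legal moves: disable null move.
    return count_legal_moves() >= min_legal_moves

print(allow_null_move(3, lambda: 2))    # shallow node, null move stays on
print(allow_null_move(12, lambda: 2))   # deep node, few moves: disabled
print(allow_null_move(12, lambda: 30))  # deep node, many moves: allowed
```

Passing the move counter as a callable keeps the counting lazy, which matches the idea that it should only be paid for when the remaining depth is not too small.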

Note that I tested it in the past with no significant change at short time control.

I believe that it cannot be optimal for the number 1 program to be blind in some simple problems, and by blind I do not mean 5 or 10 times slower than other engines but more than 1000 times slower.
