Re: SF was more seriously handicapped than I thought
Posted: Wed Jan 03, 2018 3:45 am
Forget about Elo calculations for a moment: did you see the QUALITY of the games? I didn't expect such a post from you. Anyway, you've made Milos and Tsvetkov very happy.

Laskos wrote:
Sorry for bringing up the A0 topic again, but this remarkable achievement of using NN + MCTS to categorically beat the top conventional engine is worth another short look.
First, I observed that hash size matters more than I thought. At 3s/move (1 thread) on my PC, hash is filled to some 50MB, and the optimal hash size would be 128MB (40% hashfull). It was derived that the optimal hash needed against A0 was 128GB, but only 1GB was used. So, on my PC, I measured the effect of 128MB hash (optimal) against 1MB hash with SF dev at 3s/move. The result over 1000 games was pretty surprising to me:
+189 -66 =745
or +43 Elo points
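For anyone who wants to check the +43 figure: it follows from the usual logistic Elo model applied to the match score. A minimal sketch, assuming that model; the function name `elo_diff` is made up here for illustration and is not taken from any testing tool used in the post.

```python
import math

def elo_diff(wins, losses, draws):
    """Elo difference implied by a match score under the logistic model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # fraction of points scored
    # Invert the expected-score formula: score = 1 / (1 + 10^(-diff/400)).
    return 400.0 * math.log10(score / (1.0 - score))

# 128MB hash vs 1MB hash, SF dev at 3s/move: +189 -66 =745
print(round(elo_diff(189, 66, 745)))  # -> 43
```

Note that the formula is undefined for a 100% or 0% score, which does not arise for any of the results quoted here.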
In A0 versus SF8, that effect would be smaller (diminishing gains at LTC and on the hardware used), but not negligible.
Then, I decided to test SF dev at 12s/move (as an emulation of A0) versus SF dev at 3s/move, using 2moves_v1.epd for diversity of openings. The result was:
+19 -1 =20
So this actually overshoots what A0 did to SF8, but that doesn't bother me since, again, diminishing gains are at work in the real A0 match.
I left A0 as it was (SF dev at 12s/move), giving it the general and solid 3moves_GM.epd openings for variety, but now pitted it against a full-panoply SF dev. This full-panoply SF dev is BrainFish + Cerebellum + a time control of 105'' + 1'' (equivalent in total time used to 3s/move) + Syzygy-6 from SSD. The result is:
+3 -2 =35 for A0 (or SF dev at 12s/move).
The change from the previous result is pretty drastic. It is probably exaggerated by the fact that the Cerebellum book is an anti-Stockfish book, but nevertheless the draw rate increases dramatically, and the strength difference is now small.
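To put rough numbers on the two 40-game matches, here is a small sketch that computes the draw rate, the implied Elo difference, and an approximate 95% interval from the per-game score variance. It assumes the logistic Elo model and a normal approximation, which is crude for only 40 games; `match_summary` is a hypothetical helper name, not part of any tool mentioned above.

```python
import math

def to_elo(score):
    """Convert a score fraction in (0, 1) to an Elo difference."""
    return 400.0 * math.log10(score / (1.0 - score))

def match_summary(wins, losses, draws):
    """Draw rate, Elo difference and a rough 95% Elo interval for a match."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * score ** 2) / games
    stderr = math.sqrt(var / games)
    lo, hi = score - 1.96 * stderr, score + 1.96 * stderr
    return {
        "draw_rate": draws / games,
        "elo": to_elo(score),
        "elo_lo": to_elo(lo),
        "elo_hi": to_elo(hi),
    }

plain = match_summary(19, 1, 20)    # 12s/move vs plain SF dev at 3s/move
panoply = match_summary(3, 2, 35)   # 12s/move vs full-panoply SF dev
for name, m in (("plain", plain), ("panoply", panoply)):
    print(f"{name}: draws {m['draw_rate']:.1%}, "
          f"Elo {m['elo']:+.0f} [{m['elo_lo']:+.0f}, {m['elo_hi']:+.0f}]")
```

Under these assumptions the draw rate goes from 50% to 87.5%, and the interval for the second match straddles zero, which fits the description of a small strength difference.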
That was just nitpicking: I am sure that if DeepMind seriously tries to improve upon A0, it will dramatically surpass any conventional engine anyway. Their achievement is remarkable; this is just a reminder to myself that this "panoply" of engines is not that unimportant.
Spot on!

A stronger Stockfish wouldn't have changed anything, because its problems are at the core of its approach. The whole point was to show how it taught itself to be so strong, and it's not like Stockfish played on a Pentium 4 with 1MB RAM.