SF was more seriously handicapped than I thought

shrapnel · Post by **shrapnel** » Wed Jan 03, 2018 3:45 am

Laskos wrote:Sorry for bringing up the A0 topic again, but this remarkable achievement of using NN + MCTS and categorically beating the top conventional engine is worth another short look.

First, I observed that hash size matters more than I thought. At 3s/move (1 thread) on my PC, hash is filled to some 50MB, and optimal hash size would be 128MB (40% hashfull). It was derived that the optimal hash needed against A0 was 128GB, but only 1GB was used. So, on my PC, I measured the effect of 128MB hash (optimal) against 1MB hash with SF dev at 3s/move. The result in 1000 games was pretty surprising to me:

+189 -66 =745
or +43 Elo points
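
For reference, the +43 figure is what the standard logistic Elo formula gives for this match score. A minimal sketch (the function name `elo_diff` is hypothetical, chosen for illustration, and is not taken from any tool mentioned in the thread):

```python
import math

def elo_diff(wins, losses, draws):
    """Elo difference implied by a match result under the logistic Elo model.

    Assumes the score is strictly between 0% and 100%.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # fraction of points scored
    # Invert the expected-score formula E = 1 / (1 + 10 ** (-d / 400)) for d
    return -400.0 * math.log10(1.0 / score - 1.0)

print(round(elo_diff(189, 66, 745)))  # 43
```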

In A0 versus SF8, that effect would be smaller (diminishing gains at LTC and hardware used), but not negligible.

Then, I decided to test SF dev at 12s/move (as an emulation of A0) versus SF dev at 3s/move, using 2moves_v1.epd for diversity of openings. The result was:

+19 -1 =20

So, an actual overshoot of what A0 did to SF8, but that doesn't bother me, as again, diminishing gains are at work in that real A0 match.

I left A0 as it was (SF dev at 12s/move), giving it the general and solid 3moves_GM.epd openings for variety, but pitted it against full panoply SF dev now. This full panoply SF dev is BrainFish + Cerebellum + Time Control 105''+ 1'' (equivalent in total time used to 3s/move) + Syzygy-6 from SSD. And the result is:

+3 -2 =35 for A0 (or SF dev at 12s/move).

The change from the previous result is pretty drastic. It is probably exaggerated by the fact that Cerebellum book is an anti-Stockfish book, but nevertheless, the draw rate increases dramatically, and the strength difference now is small.
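
With only 40 games per match, the error margins are wide. The sketch below is one rough way to estimate them, assuming independent games and a normal approximation on the match score; the helper names `score_to_elo` and `elo_interval` are hypothetical, and this is not necessarily how the figures in the post were evaluated.

```python
import math

def score_to_elo(score):
    # Logistic Elo model; clamp so that 0% or 100% does not divide by zero
    score = min(max(score, 1e-6), 1.0 - 1e-6)
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(wins, losses, draws, z=1.96):
    """Return (low, point estimate, high) in Elo for an approx. 95% interval."""
    n = wins + losses + draws
    s = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score
    var = (wins * (1.0 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return score_to_elo(s - z * se), score_to_elo(s), score_to_elo(s + z * se)

for label, result in [("12s vs 3s", (19, 1, 20)),
                      ("12s vs full panoply", (3, 2, 35))]:
    lo, mid, hi = elo_interval(*result)
    print(f"{label}: {mid:+.0f} Elo, approx. 95% interval [{lo:+.0f}, {hi:+.0f}]")
```

Under this approximation the interval for +3 -2 =35 includes zero, while the one for +19 -1 =20 stays clearly positive, which fits the description of the second result as a small strength difference.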

That was just nitpicking, as I am sure that if DeepMind seriously tries to improve upon A0, it will dramatically surpass any conventional engine anyway. Their achievement is remarkable; this is just a reminder to myself that this "panoply" of engines is not that unimportant.

Forget about ELO calculations for a moment, did you see the QUALITY of the games ? I didn't expect such a Post from you. Anyway, you've made Milos and Tsvetkov very happy.

A stronger Stockfish wouldn't have changed anything because its problems are at the core of its approach, the whole point was to show how it taught itself to be so strong, and it's not like Stockfish played on a Pentium 4 with 1MB RAM.

Spot on !

Laskos · Post by **Laskos** » Wed Jan 03, 2018 10:31 am

shrapnel wrote: Forget about ELO calculations for a moment, did you see the QUALITY of the games ? I didn't expect such a Post from you. Anyway, you've made Milos and Tsvetkov very happy.

A stronger Stockfish wouldn't have changed anything because its problems are at the core of its approach, the whole point was to show how it taught itself to be so strong, and it's not like Stockfish played on a Pentium 4 with 1MB RAM.
Spot on !

My quibble with Elo is just curiosity about what appeared to me to be Tord's nitpicking at the handicaps SF suffered. Well, it is not that groundless. First, out of curiosity, I measured the effect of hash size at 50x overload (compared to 40% optimal load), and it came out about twice as large as I expected. Also, Vincent seems to be right: with more threads and Lazy SMP, it is even more severe. For seven threads I got a 56 Elo point handicap, but in only 200 games.

Also, with a book like Cerebellum maybe SF would have rarely entered losing positions in the middlegame, although Cerebellum is not an anti-A0 book. Moreover, time management IS important, but DeepMind could easily have implemented it even better for A0.

Positionally, A0 is no doubt incomparably better than SF or anything else. Even Giraffe's eval was derived to be comparable to SF's eval, only an order of magnitude or two slower. But there are things to ponder. AFAIK MCTS is prone to fall for deep tactical traps (at least in Go that was my experience, not as a player but as an observer). Are you sure that A0 is better than Houdini Tactical on the Arasan 19 Tactical Testsuite (say on the same hardware used in the matches)? That those positions almost never occur in games is less relevant for the question.

Another quibble of mine is that DeepMind does not care about me as a potential user of their software adapted for a regular i7 with a strong GPU. And it would be nice if they cared, because building this software requires enormous hardware resources only they have access to. Although in Go they have achieved an incredible level of play (1000+ Elo points beyond anything human), the best we have as of now is Zen 7, based on the AlphaGo project but developed with very limited resources. From what I analyzed, it is not yet at Lee Sedol or Ke Jie level on my PC, probably only strong pro level. In Chess, we will probably have to wait several years until the A0 approach, adopted by developers with limited resources, reaches top engine strength for ordinary users.

A0 is surely revolutionary, for example it reinvented the opening theory from scratch, and I am itching to have access to A0 to check some databases of human opening theory. But they will publish a paper in "Nature" and move on.

Laskos · Post by **Laskos** » Wed Jan 03, 2018 10:42 am

Jouni wrote:But where is the promised More details in full peer-reviewed paper coming soon (8.12.2017)? It didn't pass?

"Nature" reviewing and publishing might take time; they can ask the authors for various explanations and changes (especially if some read this forum). I am sure A0 IS of the calibre to be published in "Nature", and we will probably see the paper in some time.

pilgrimdan · Post by **pilgrimdan** » Wed Jan 03, 2018 8:06 pm

Laskos wrote:
shrapnel wrote: Forget about ELO calculations for a moment, did you see the QUALITY of the games ? I didn't expect such a Post from you. Anyway, you've made Milos and Tsvetkov very happy.

A stronger Stockfish wouldn't have changed anything because its problems are at the core of its approach, the whole point was to show how it taught itself to be so strong, and it's not like Stockfish played on a Pentium 4 with 1MB RAM.
Spot on !
My quibble with Elo is just curiosity about what appeared to me to be Tord's nitpicking at the handicaps SF suffered. Well, it is not that groundless. First, out of curiosity, I measured the effect of hash size at 50x overload (compared to 40% optimal load), and it came out about twice as large as I expected. Also, Vincent seems to be right: with more threads and Lazy SMP, it is even more severe. For seven threads I got a 56 Elo point handicap, but in only 200 games.

Also, with a book like Cerebellum maybe SF would have rarely entered losing positions in the middlegame, although Cerebellum is not an anti-A0 book. Moreover, time management IS important, but DeepMind could easily have implemented it even better for A0.

Positionally, A0 is no doubt incomparably better than SF or anything else. Even Giraffe's eval was derived to be comparable to SF's eval, only an order of magnitude or two slower. But there are things to ponder. AFAIK MCTS is prone to fall for deep tactical traps (at least in Go that was my experience, not as a player but as an observer). Are you sure that A0 is better than Houdini Tactical on the Arasan 19 Tactical Testsuite (say on the same hardware used in the matches)? That those positions almost never occur in games is less relevant for the question.

Another quibble of mine is that DeepMind does not care about me as a potential user of their software adapted for a regular i7 with a strong GPU. And it would be nice if they cared, because building this software requires enormous hardware resources only they have access to. Although in Go they have achieved an incredible level of play (1000+ Elo points beyond anything human), the best we have as of now is Zen 7, based on the AlphaGo project but developed with very limited resources. From what I analyzed, it is not yet at Lee Sedol or Ke Jie level on my PC, probably only strong pro level. In Chess, we will probably have to wait several years until the A0 approach, adopted by developers with limited resources, reaches top engine strength for ordinary users.

A0 is surely revolutionary, for example it reinvented the opening theory from scratch, and I am itching to have access to A0 to check some databases of human opening theory. But they will publish a paper in "Nature" and move on.

with the hardware Deepmind used ... Deepmind has 'allowed' us to peek 10 years in the future ... it is odd ... I always thought that if I could somehow peek 10 years in the future ... that would be kinda neat ... but all it has done to me ... is to make me really pissed off ... so ... today ... I will live for today ... and not care anymore ... about looking into the future ...

Ras · Post by **Ras** » Wed Jan 03, 2018 8:20 pm

Laskos wrote:out of curiosity, I measured the effect

Yeah, of Stockfish against Stockfish, and that doesn't say that Stockfish boosted with better hardware would also have taken the same amount of Elo en plus against A0. This simply doesn't wash.

Of course you'll get effects because if the same engine can calculate a bit deeper, it will win more often than not. But Stockfish's problem in the matches was NOT that Stockfish didn't go deep enough. Even after long analysis and when making A0's moves, Stockfish still doesn't understand what was going on. More time and more speed would just have made Stockfish calculate the losing moves a bit deeper.

Leo · Post by **Leo** » Wed Jan 03, 2018 10:54 pm

Something tells me that AZ can't be improved beyond its 4 hour training. Like it saturated itself or something. I am all for progress. I hope they improve it so it's invincible, but I don't think they can or will.

Ovyron · Post by **Ovyron** » Wed Jan 03, 2018 11:36 pm

Leo wrote:Something tells me that AZ can't be improved beyond its 4 hour training.

Against who? The result would have been a lot better if it trained against Stockfish, as it would have been an "AntiStockfish" engine.

If it plays at this level in 4 hours, you can train it for another 4 hours against Stockfish.

Then another 4 hours against Houdini.

Then another 4 against Komodo...

They improved a lot from AlphaGo, to AlphaGo Master, to AlphaZero, and in chess it's just in its infancy. It did something wrong in the 72 games it couldn't win against Stockfish, so there's a lot of room for improvement.

But, yeah, 10 years ago I thought Rybka 3 was so strong that it was nearing perfection, and I expected about a 100 Elo improvement in the following years. Now it is sitting below the top 22 engines, 300 Elo weaker than Asmfish, so never underestimate how things can improve!

Rodolfo Leoni · Post by **Rodolfo Leoni** » Thu Jan 04, 2018 12:33 am

Ras wrote:
Laskos wrote:out of curiosity, I measured the effect
Yeah, of Stockfish against Stockfish, and that doesn't say that Stockfish boosted with better hardware would also have taken the same amount of Elo en plus against A0. This simply doesn't wash.

Of course you'll get effects because if the same engine can calculate a bit deeper, it will win more often than not. But Stockfish's problem in the matches was NOT that Stockfish didn't go deep enough. Even after long analysis and when making A0's moves, Stockfish still doesn't understand what was going on. More time and more speed would just have made Stockfish calculate the losing moves a bit deeper.

As the real "match" conditions cannot be reproduced, I think Kai Laskos performed the best scientific approach possible to the question of how much SF got handicapped. And I wish to thank him for his effort.

Until DeepMind decides to run a serious test against the best SF, the best book, Syzygy TBs, and possibly an SF operator, there will always be doubts.

Ras · Post by **Ras** » Thu Jan 04, 2018 1:00 am

Rodolfo Leoni wrote:As the real "match" conditions cannot be reproduced, I think Kai Laskos performed the best scientific approach possible to the question of how much SF got handicapped.

Right, but the best approach possible is so far off the match that conclusions aren't valid anymore.

Rodolfo Leoni · Post by **Rodolfo Leoni** » Thu Jan 04, 2018 2:19 pm

Ras wrote:
Rodolfo Leoni wrote:As the real "match" conditions cannot be reproduced, I think Kai Laskos performed the best scientific approach possible to the question of how much SF got handicapped.
Right, but the best approach possible is so far off the match that conclusions aren't valid anymore.

As I said, it cannot be reproduced. So, we have no answers. Only questions. And the main question is:

Why did they need to give SF such handicaps?

My personal opinion is that they didn't really want a hard fight.
