Alphazero news

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

shrapnel
Posts: 1339
Joined: Fri Nov 02, 2012 9:43 am
Location: New Delhi, India

Re: Alphazero news

Post by shrapnel »

matthewlai wrote: Tue Dec 11, 2018 4:20 am Cannot talk about anything unannounced unfortunately!
Your simple statement is open to different interpretations.
But fair enough.
i7 5960X @ 4.1 GHz, 64 GB G.Skill RipJaws RAM, Twin Asus ROG Strix OC 11 GB GeForce 2080 Tis
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Alphazero news

Post by Laskos »

clumma wrote: Tue Dec 11, 2018 6:39 pm
Laskos wrote: Tue Dec 11, 2018 1:42 pm Yes, but didn't they take SF8 + the book itself, and not the full BrainFish with its UCI option? That would be a good match: A0 versus the full BrainFish of early 2018 with the varied-openings UCI option, but they seem not to have done that.
This is an important question. Matthew, can you confirm whether Brainfish was used, or merely Stockfish with the Cerebellum polyglot book? It's my understanding that the Brainfish binary will use the stored evaluations in search, whereas SF + polyglot will not.

-Carl
Yes, it seems that's what they did: SF8 + Cerebellum best moves. SF8 + Cerebellum varied moves (a UCI option in BrainFish) would have been better, and the full BrainFish with varied openings better still. In fact, this match, A0 versus the full BrainFish (varied openings) of early 2018, would have been very interesting, possibly with quite a different result from the usual results of the paper at full time control. Allowing A0 alone to provide variety, against only the best moves of Cerebellum, means A0 will play close to its most-trained lines. And, aside from storing evaluations in its search, in early 2018 the engine of BrainFish was close to SF9, not SF8.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Alphazero news

Post by Laskos »

Laskos wrote: Tue Dec 11, 2018 7:15 pm
clumma wrote: Tue Dec 11, 2018 6:39 pm
Laskos wrote: Tue Dec 11, 2018 1:42 pm Yes, but didn't they take SF8 + the book itself, and not the full BrainFish with its UCI option? That would be a good match: A0 versus the full BrainFish of early 2018 with the varied-openings UCI option, but they seem not to have done that.
This is an important question. Matthew, can you confirm whether Brainfish was used, or merely Stockfish with the Cerebellum polyglot book? It's my understanding that the Brainfish binary will use the stored evaluations in search, whereas SF + polyglot will not.

-Carl
Yes, it seems that's what they did: SF8 + Cerebellum best moves. SF8 + Cerebellum varied moves (a UCI option in BrainFish) would have been better, and the full BrainFish with varied openings better still. In fact, this match, A0 versus the full BrainFish (varied openings) of early 2018, would have been very interesting, possibly with quite a different result from the usual results of the paper at full time control. Allowing A0 alone to provide variety, against only the best moves of Cerebellum, means A0 will play close to its most-trained lines. And, aside from storing evaluations in its search, in early 2018 the engine of BrainFish was close to SF9, not SF8.
In fact, the result I mainly take as reliable for A0 from varied openings is the TCEC-openings match against SF8:

A0 vs SF8
+17 =75 -8
+31 Elo points

Now, at first glance one could almost surely say that SF10 would have performed better, like:

SF10 vs SF8
+22 =73 -5
+60 Elo points

But it still doesn't mean I have very high confidence that SF10 would beat A0 in 100 games from TCEC openings under their conditions. A0 (and Lc0) is not that "sensitive" to the regular opponent, be it SF8 or SF10, when in superiority. A0 versus an inferior regular engine shows a compressed Elo difference (I showed a model plot in another thread). But in that model, the Elo compression is hardly above a factor of 2 or so. So I would be fairly confident that A0 and SF10 are quite closely matched playing from TCEC openings, maybe with a slight advantage for SF10.
But as Matthew said, this was the first version of A0; I don't know what they have in hand by now.
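P.S. For anyone who wants to check the arithmetic, here is a minimal sketch of the Elo calculation and of the factor-of-2 "decompression". The factor of 2 is just the illustrative bound from my model plot, not a measured constant:

Code: Select all

import math

def elo_from_wdl(wins, draws, losses):
    """Elo difference implied by a match score under the logistic model."""
    score = (wins + 0.5 * draws) / (wins + draws + losses)
    return -400.0 * math.log10(1.0 / score - 1.0)

# The two matches quoted above:
print(elo_from_wdl(17, 75, 8))   # A0 vs SF8   -> ~ +31 Elo
print(elo_from_wdl(22, 73, 5))   # SF10 vs SF8 -> ~ +60 Elo

# Toy decompression: if A0's measured edge over a regular engine is
# compressed by up to a factor of ~2, the raw +31 against SF8 could
# correspond to up to ~ +63 "uncompressed" Elo -- the same range as
# SF10's +60, hence "quite closely matched".
print(2.0 * elo_from_wdl(17, 75, 8))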
George Tsavdaris
Posts: 1627
Joined: Thu Mar 09, 2006 12:35 pm

Re: Alphazero news

Post by George Tsavdaris »

noobpwnftw wrote: Sun Dec 09, 2018 6:31 pm For a generic algorithm, yes, a lazy man's solution to everything, but in the chess domain I do not see how it differs from a giant SPSA run in principle, and how one would call it "the future" when comparable performance can be achieved in a way that is not a black box.
This new approach to chess is some MONTHS old. 8? 9?
The other, comparable in strength, is 50 years old.

Unless you believe this new approach has reached saturation after 8-9 months.....
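For anyone who hasn't met SPSA (the comparison made in the quote above): it tunes parameters using only noisy objective evaluations, no gradients. A toy sketch of the idea, with an invented objective and constants, nothing to do with DeepMind's actual setup:

Code: Select all

import random

def spsa_step(theta, loss, a=0.1, c=0.05):
    # One SPSA iteration: probe the objective at theta +/- c*delta for a
    # random sign vector delta, estimate the gradient, step downhill.
    delta = [random.choice((-1.0, 1.0)) for _ in theta]
    plus  = loss([t + c * d for t, d in zip(theta, delta)])
    minus = loss([t - c * d for t, d in zip(theta, delta)])
    ghat  = [(plus - minus) / (2.0 * c * d) for d in delta]
    return [t - a * g for t, g in zip(theta, ghat)]

# Toy use: tune two "eval weights" against a noisy quadratic objective.
target = [1.5, -0.7]
def noisy_loss(th):
    return sum((t - x) ** 2 for t, x in zip(th, target)) + random.gauss(0, 0.01)

theta = [0.0, 0.0]
for _ in range(500):
    theta = spsa_step(theta, noisy_loss)
print(theta)  # drifts toward [1.5, -0.7]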
After his son's birth they asked him:
"Is it a boy or a girl?"
"YES!" he replied.....
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: Alphazero news

Post by Michel »

Milos wrote: Tue Dec 11, 2018 2:39 pm BS. You obviously never worked in a multi-billion-dollar company's research department. You never get an outright rejection, never, not for Science, not for Nature, not for any other journal.
You have no idea what you are talking about.
Ideas=science. Simplification=engineering.
Without ideas there is nothing to simplify.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Alphazero news

Post by Milos »

Michel wrote: Tue Dec 11, 2018 11:27 pm
Milos wrote: Tue Dec 11, 2018 2:39 pm BS. You obviously never worked in a multi-billion-dollar company's research department. You never get an outright rejection, never, not for Science, not for Nature, not for any other journal.
You have no idea what you are talking about.
As someone who has spent his whole life exclusively in academia and has never published in Science or Nature, you certainly must be an authority on who has an idea about what. Especially with marvellous one-liners as argumentation.
jp
Posts: 1470
Joined: Mon Apr 23, 2018 7:54 am

Re: Alphazero news

Post by jp »

hgm wrote: Tue Dec 11, 2018 5:52 pm
jp wrote: Tue Dec 11, 2018 5:29 pm Yes. The reviewers can be in disagreement. It happens. It depends how much the unhappy reviewer wants to fight.
That did not answer my question. How many Science papers did you publish that way?

Your question was replying to
jp wrote: Tue Dec 11, 2018 2:44 pm
hgm wrote: Tue Dec 11, 2018 2:17 pm You have to 'pass' 3 anonymous peer reviewers, and if only a single one of those is slightly critical about the quality or the importance of the result, your paper will be rejected outright.
That is not true. It's definitely possible to get the paper accepted without reviewers all being happy.

so wasn't the real question you wanted to ask: do you know for a fact that you can get a Science paper accepted with a slightly critical reviewer?

The answer is: yes, I know 100% that it can be accepted with a reviewer much more than "slightly critical".

Or did you really want to ask a different question?
Thomas A. Anderson
Posts: 27
Joined: Tue Feb 23, 2016 6:57 pm

Re: Alphazero news

Post by Thomas A. Anderson »

Laskos wrote: Tue Dec 11, 2018 8:09 pm
In fact, the result I mainly take as reliable for A0 from varied openings is the TCEC-openings match against SF8:

A0 vs SF8
+17 =75 -8
+31 Elo points

Now, at first glance one could almost surely say that SF10 would have performed better, like:

SF10 vs SF8
+22 =73 -5
+60 Elo points

But it still doesn't mean I have very high confidence that SF10 would beat A0 in 100 games from TCEC openings under their conditions. A0 (and Lc0) is not that "sensitive" to the regular opponent, be it SF8 or SF10, when in superiority. A0 versus an inferior regular engine shows a compressed Elo difference (I showed a model plot in another thread). But in that model, the Elo compression is hardly above a factor of 2 or so. So I would be fairly confident that A0 and SF10 are quite closely matched playing from TCEC openings, maybe with a slight advantage for SF10.
But as Matthew said, this was the first version of A0; I don't know what they have in hand by now.
Laskos, your post made me smile a little, because it looks like a good example of how hard it is to adopt the "new kind of math" that AZ brought us. I know you as a good analytical thinker in this forum, and in this thread you came up with this compressed-Elo model that tries to make the AZ vs. SF match results fit the standard Elo model. At the beginning of your post you stated that you don't have high confidence SF10 would beat AZ, because AZ isn't "sensitive" to regular opponents. Two sentences later you conclude that, because your model wouldn't allow Elo anomalies of more than a factor of 2, you are fairly confident AZ must be at SF10's strength level, with a possible slight edge for SF10 :)

Now that Matthew has told us the number of games in the matches against ~SF9/Brainfish/etc. was "more than high enough that the result is statistically significant", how can your compressed-Elo model explain that SF9 and SF8 lost against AZ by the same margin (while SF9 is ~30 Elo ahead of SF8)? If the compression model didn't work for the SF8-SF9-AZ trio, why believe it will fit any kind of Elo math in the SF9-SF10-AZ relation? Wouldn't it be more likely that the model doesn't fit the observations/results and therefore has to be dropped or revised?
I think that, to be meaningful, the Elo system needs a certain kind of "transitivity" regarding the strength of the contenders (it's been a long time since my math classes and I might be using the wrong term here). In the case of AZ, this prerequisite is lacking. When I need to explain the results, I think of the contenders in a Formula One race: Ferrari is constantly working on improving its cars, season by season, as McLaren etc. do. The 2018 Ferrari is better than the 2017 model, which was better than the 2016 type, and so on. Thinking of the Ferraris as SF and the McLarens as Komodo and so on, everything in the Elo world is fine: transitivity, the rule of three, and comparisons between cars across season boundaries fit more or less well.
Now a new "kid on the block" enters, a car constructed by Gyro Gearloose or Christopher Lloyd. When that car finishes a race, the traditional contenders have no chance by any means. As expected, knowing the constructors, the new car with its "jet propulsion" didn't manage to finish more than 60% of the races. Now, unlike in Formula One, think of one-on-one matches between the cars, and try to establish a performance rating that gives any meaningful number for the rocket car. What would you think of a calculation like: the rocket car is a 60-40 favorite against the 2018 Ferrari, and I build a 2019 Ferrari that beats the old model by a higher margin, so I have high confidence that the 2019 Ferrari will beat the rocket car? I believe I'm preaching to the choir here, as I remember you stating similar things yourself, and it comes down to the fact that AZ has certain weaknesses that can be exposed by traditional AB engines. But the rate of exposures (resulting in AZ losses) isn't very proportional to the Elo strength of the AB engines. I wouldn't be surprised if engines (Elo-)rated much lower than Stockfish got better results against AZ, because they expose its weaknesses better.
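To make the non-transitivity concrete, a toy sketch. The pairwise expected scores are invented for illustration, only shaped like the reported results:

Code: Select all

import math

def implied_elo(score):
    # Rating gap the logistic Elo model assigns to an expected score.
    return -400.0 * math.log10(1.0 / score - 1.0)

# Invented pairwise expected scores, shaped like the reported results:
sf9_vs_sf8 = 0.54    # SF9 ~ +28 Elo over SF8
az_vs_sf8  = 0.545   # AZ beats SF8 by ~ +31 Elo...
az_vs_sf9  = 0.545   # ...and SF9 by the *same* margin

# A transitive Elo world would demand:
#   Elo(AZ-SF9) = Elo(AZ-SF8) - Elo(SF9-SF8)
predicted = implied_elo(az_vs_sf8) - implied_elo(sf9_vs_sf8)
actual    = implied_elo(az_vs_sf9)
print(predicted, actual)   # ~ +4 vs ~ +31: the triangle doesn't close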
cu
Thomas A. Anderson
Posts: 27
Joined: Tue Feb 23, 2016 6:57 pm

Re: Alphazero news

Post by Thomas A. Anderson »

matthewlai wrote: Tue Dec 11, 2018 1:13 pm
Thomas A. Anderson wrote: Tue Dec 11, 2018 11:24 am Matthew, thank you very much for participating in this thread! Everyone here should be grateful that we are now getting the answers that were missing so much after the release of the preliminary paper last December. Let's keep this discussion a technical one. With regard to this, I have a question that came up while I was greedily digging through the papers. One number seems very odd to me: in the match against the SF with an opening book (Brainfish), when AZ plays black, it gets ~2/4% wins against SF8/SF9, but ~18% wins when playing against the supposedly stronger Brainfish?! Besides a documentation bug, I can imagine two possible reasons: either the number of games in this match was very low, or AZ discovered some poor opening-book lines. The latter would be very interesting, especially how many and which lines are affected. While getting the game notation from these matches would be perfect, knowing how many games were played in these tests would also be helpful.
Thanks again for your contribution, Matthew!
It's hard to say and I don't want to speculate much beyond what we have data to support, but my guess (and I can very well be wrong) is that there's much less diversity when SF uses the BF opening book. We already didn't have a lot of diversity from the start position, but the start position at least has several variations that are roughly equal, and both AZ and SF have enough non-determinism (through multi-threaded scheduling, mostly) that we still got reasonably diverse games. With the BF games we took out most of SF's non-determinism, and it's possible that SF just ends up playing a line that's not very good often, or something like that. In fact, we found that, as we explained in the paper, if we force AZ to play some sub-optimal moves (in its opinion) to enforce diversity, we win even more games! I realise there's a lot of hand-waving here, but there are just too many possibilities.

I don't remember the number of games played, but it was more than high enough that the result is statistically significant.
Thank you for confirming that we are facing statistically significant numbers here. This leads to very interesting consequences.
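Just to see what "high enough" might mean here: a quick two-proportion sketch with hypothetical match sizes (the real N wasn't given), showing that even modest samples separate a ~4% from an ~18% win rate:

Code: Select all

import math

def z_two_proportions(p1, n1, p2, n2):
    # z statistic for H0: the two win rates are equal (pooled estimate).
    p = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(p * (1.0 - p) * (1.0 / n1 + 1.0 / n2))
    return (p2 - p1) / se

# Hypothetical equal match sizes; z > 1.96 means significant at 5%.
for n in (50, 100, 200):
    print(n, round(z_two_proportions(0.04, n, 0.18, n), 2))
# -> 50: ~2.24, 100: ~3.16, 200: ~4.47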
matthewlai wrote: "With the BF games we took out most of SF's non-determinism, and it's possible that SF just ends up playing a line that's not very good often"
Sounds like a very reasonable explanation. But "being good" seems to be an attribute of a position that is much more subjective than I thought. BF plays the book-resulting positions successfully against a non-book SF; that is its purpose, the reason why it exists. This superiority is, as far as I know, confirmed by any match against the "usual suspects", meaning the crowd of AB engines. Now it seems this is reversed when the book is used against AZ.

Of course, you can build books specifically against certain opponents: SF handles KID positions better than engine A but plays them worse than engine B. Therefore a book that forces SF into the KID might work well against engine A, but fail against engine B. But we are talking about the starting position, and a complete book that certainly wasn't tuned only against some narrow opening lines, because the AZ it faced was playing with diversity activated. How big was the diversity of those games? I would assume that AZ was playing different moves from move 1 on, because there should be some moves within the 1% range already. We would need the games to answer the question conclusively, but my gut feeling is that something is hidden here that we can learn a lot from.

The most "zero-ish" opening book we have shifts the match score of SF playing the white pieces against AZ playing black from roughly 1-95-4% towards a 9-73-18% shape (both are rough values derived from the published graphs; format: SF wins-draws-AZ wins). Another interesting fact: the BF book works well for SF if it is playing the black pieces and fails only as white. This evens out and leads to the statement in the paper that the usage of the opening book didn't have a significant impact on the total match score.
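As a quick sanity check on those rough percentages (the numbers are read off the graphs, so this is a sketch only):

Code: Select all

import math

def az_black_elo(sf_wins, draws, az_wins):
    # Elo edge for AZ implied by a (SF wins, draws, AZ wins) split in %.
    az_score = (az_wins + 0.5 * draws) / (sf_wins + draws + az_wins)
    return -400.0 * math.log10(1.0 / az_score - 1.0)

print(az_black_elo(1, 95, 4))    # no book:   ~ +10 Elo for AZ as black
print(az_black_elo(9, 73, 18))   # with book: ~ +31 Elo for AZ as black
# The book, meant to strengthen SF, shifts ~20 Elo towards AZ here.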
matthewlai wrote: In fact, we found that as we explained in the paper, if we force AZ to play some sub-optimal moves (in its opinion) to enforce diversity, we win even more games!
That's another very strange result I have to think about next. :shock:
cu
Damir
Posts: 2801
Joined: Mon Feb 11, 2008 3:53 pm
Location: Denmark
Full name: Damir Desevac

Re: Alphazero news

Post by Damir »

I see the Lc0 networks have been restarted from scratch...

I wonder what the reason could be... Before the Google people published their papers and the Lc0 team put the same changes into Leela, as suggested in the Google paper, Leela was doing very well; its Elo kept rising. Now it is going downwards, which shows that what Google published cannot be trusted... :( :(