Your simple statement is open to different interpretations.
But fair enough.
Yes, it seems that's what they did: SF8 + Cerebellum best moves. It would have been better to use SF8 + Cerebellum varied moves (a UCI option in BrainFish), and better still full BrainFish with varied openings. In fact, an A0 versus full BrainFish (varied openings) match in early 2018 would have been very interesting, with possibly quite a different result from the usual results of the paper at full time control. With SF locked to only the best moves of Cerebellum, variety can come only from A0, so A0 will play close to its most-trained lines. And, aside from storing evaluations in its search, in early 2018 the engine of BrainFish was close to SF9, not SF8.
clumma wrote: ↑Tue Dec 11, 2018 6:39 pm
This is an important question. Matthew, can you confirm whether Brainfish was used, or merely Stockfish with the Cerebellum polyglot book? It's my understanding that the Brainfish binary will use the stored evaluations in search, whereas SF + polyglot will not.
-Carl
This new approach to chess is only some MONTHS old. 8? 9?
noobpwnftw wrote: ↑Sun Dec 09, 2018 6:31 pm For a generic algorithm, yes, a lazy man's solution to everything, but in the chess domain I do not see how it differs from a giant SPSA run in principle, or how one would call it "the future" when comparable performance can be achieved in a way that is not a black box.
You have no idea what you are talking about.
As someone who has spent an entire life exclusively in academia and has never published in Science or Nature, you certainly must be an authority on who knows what they are talking about. Especially with marvellous one-liners as argumentation.
Laskos, your post made me smile a little, because it looks like a good example of how hard it is to adopt the "new kind of math" that AZ brought us. I know you as a good analytical thinker in this forum, and in this thread too you came up with this compressed-Elo model that tries to make the AZ vs. SF match results fit the standard Elo model. At the beginning of your post you state that you don't have high confidence SF10 would beat AZ, because AZ isn't "sensitive" to regular opponents. Two sentences later, you conclude that, because your model wouldn't allow Elo anomalies of more than a factor of 2, you are fairly confident AZ must be at SF10's strength level, with a possible slight edge for SF10.
Now that Matthew has told us the number of games in the matches against ~SF9/Brainfish/etc. was "more than high enough that the result is statistically significant", how can your compressed-Elo model explain that SF9 and SF8 lost against AZ by the same margin (while SF9 is ~30 Elo ahead of SF8)? If the compression model didn't work for the SF8-SF9-AZ trio, why believe it will fit any kind of Elo math in the SF9-SF10-AZ relation? Wouldn't it be more likely that the model doesn't fit the observations/results and therefore has to be dropped/revised?
Laskos wrote: ↑Tue Dec 11, 2018 8:09 pm
In fact, I mainly take as reliable result for A0 from varied openings the TCEC openings match against SF8:
A0 vs SF8
+17 =75 -8
+31 Elo points
Now, at first glance one can almost surely say that SF10 would have performed better, like:
SF10 vs SF8
+22 =73 -5
+60 Elo points
But it still doesn't mean I have very high confidence that SF10 would beat A0 in 100 games from TCEC openings under their conditions. A0 (and Lc0) is not that "sensitive" to the regular opponent, be it SF8 or SF10, when it is superior. A0 versus an inferior regular engine shows a compressed Elo difference (I showed a model plot in another thread). But in that model the Elo compression is hardly above a factor of 2 or so. So I would be fairly confident that A0 and SF10 are quite closely matched playing from TCEC openings, maybe with a slight advantage to SF10.
But as Matthew said, this was the first version of A0, I don't know what they have in hand by now.
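The Elo figures in the match results above follow directly from the scores via the standard logistic Elo model. A minimal sketch of that conversion (the W/D/L numbers are the ones quoted above):

```python
import math

def elo_diff(wins, draws, losses):
    """Elo difference implied by a match score, using the standard
    logistic model: expected score = 1 / (1 + 10^(-diff/400))."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    return -400 * math.log10(1 / score - 1)

print(round(elo_diff(17, 75, 8)))  # A0 vs SF8 from TCEC openings -> 31
print(round(elo_diff(22, 73, 5)))  # SF10 vs SF8 -> 60
```

This is the same formula rating tools such as BayesElo approximate; it reproduces the +31 and +60 Elo figures quoted in the post.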
Thank you for confirming that we are facing statistically significant numbers here. This leads to very interesting consequences.
matthewlai wrote: ↑Tue Dec 11, 2018 1:13 pm
It's hard to say and I don't want to speculate much beyond what we have data to support, but my guess (and I can very well be wrong) is that there's much less diversity when SF uses the BF opening book. We already didn't have a lot of diversity from the start position, but the start position at least has several variations that are roughly equal, and both AZ and SF have enough non-determinism (mostly through multi-threaded scheduling) that we still got reasonably diverse games. With the BF games we took out most of SF's non-determinism, and it's possible that SF just often ends up playing a line that's not very good, or something like that. In fact, we found that, as we explained in the paper, if we force AZ to play some sub-optimal moves (in its opinion) to enforce diversity, we win even more games! I realise there's a lot of hand-waving here, but there are just too many possibilities.
Thomas A. Anderson wrote: ↑Tue Dec 11, 2018 11:24 am
Matthew, thank you very much for participating in this thread! Everyone here should be grateful that we are now getting the answers that were missing so badly after the release of the preliminary paper last December. Let's keep this discussion a technical one. With regard to this, I have a question that came up while I was greedily digging through the papers. One number seems very odd to me: in the match against SF with an opening book (Brainfish), when AZ plays black it gets ~2/4% wins against SF8/SF9, but ~18% wins when playing against the supposedly stronger Brainfish?! Besides a documentation bug, I can imagine two possible reasons: either the number of games in this match was very low, or AZ discovered some poor opening-book lines. The latter would be very interesting, especially how many and which lines are affected.
While getting the game notation from these matches would be perfect, knowing how many games were played in these tests would also be helpful.
Thanks again for your contribution, Matthew!
matthewlai wrote: I don't remember the number of games played, but it was more than high enough that the result is statistically significant.
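As a side note on what "statistically significant" requires: for a match of known length, a rough check is a normal-approximation z-test on the per-game score. A sketch applied to the 100-game TCEC-openings result quoted earlier in the thread (the game counts of the book matches themselves were not published, so they are not used here):

```python
import math

def z_score(wins, draws, losses):
    """Normal-approximation z-score for the hypothesis that the
    two engines are equally strong (expected score 0.5)."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # per-game variance of the score around the observed mean
    var = (wins * (1 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0 - mean) ** 2) / n
    return (mean - 0.5) / math.sqrt(var / n)

print(round(z_score(17, 75, 8), 2))  # ~1.83 for the +17 =75 -8 match
```

By this rough measure, even a 100-game result at that margin falls just short of the conventional two-sided 5% threshold (z = 1.96), which is why large game counts matter for claims like the one above.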
Sounds like a very reasonable explanation. But "being good" seems to be an attribute of a position that is much more subjective than I thought. BF plays the book-resulting positions successfully against a non-book SF; that is its purpose, the reason it exists. This superiority is, as far as I know, confirmed by any match against the "usual suspects", meaning the crowd of AB engines. Now it seems this is reversed when the book is used against AZ. Of course, you can build books specifically against certain opponents: SF handles KID positions better than engine A but plays them worse than engine B, so a book that forces SF into the KID might work well against engine A but fail against engine B. But we are talking about the starting position and a complete book, which certainly wasn't proofed only against some narrow opening lines, because the AZ used was playing with diversity activated. How big was the diversity of those games? I would assume that AZ was playing different moves starting from move 1, because some of them should already be within the 1% range. We would need the games to answer the question conclusively, but my gut feeling is that something is hidden here we could learn a lot from. The most "zero-ish" opening book we have shifts the match score of SF playing white against AZ playing black from roughly 1-95-4% towards a 9-73-18% shape (both are rough values derived from the published graphs; format: SF wins-draws-AZ wins). Another interesting fact: the BF book works well for SF when it plays the black pieces and fails only as white. This evens out and leads to the statement in the paper that the usage of the opening book didn't have a significant impact on the total match score.
matthewlai wrote: "With the BF games we took out most of SF's non-determinism, and it's possible that SF just ends up playing a line that's not very good often"
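The size of that shift can be put in Elo terms with the usual logistic model. A small sketch using the rough percentage shapes read off the published graphs (approximate values from the post above, not exact published scores):

```python
import math

def elo_from_score(score):
    """Elo advantage implied by an expected score (logistic model)."""
    return -400 * math.log10(1 / score - 1)

# Rough shapes from the published graphs, format SF-draws-AZ (percent),
# AZ playing the black pieces in both cases.
az_score_no_book = (4 + 0.5 * 95) / 100   # 1-95-4  -> 0.515
az_score_vs_book = (18 + 0.5 * 73) / 100  # 9-73-18 -> 0.545

print(round(elo_from_score(az_score_no_book)))  # ~10 Elo for AZ as black
print(round(elo_from_score(az_score_vs_book)))  # ~31 Elo for AZ as black
```

So, taken at face value, the book costs white's side of the match roughly 20 Elo in these rough numbers, which is why the reversal looks so odd.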
That's another very strange result I have to think about next.
matthewlai wrote: "In fact, we found that as we explained in the paper, if we force AZ to play some sub-optimal moves (in its opinion) to enforce diversity, we win even more games!"