TCEC S15, END of an ERA event is much more Brutal than I thought!

jorose
Posts: 358
Joined: Thu Jan 22, 2015 3:21 pm
Location: Zurich, Switzerland
Full name: Jonathan Rosenthal

Re: TCEC S15, END of an ERA event is much more Brutal than I thought!

Post by jorose »

Michel wrote: Tue May 28, 2019 9:00 pm
jorose wrote: Tue May 28, 2019 8:32 pm I haven't been following this closely and I don't really want to get involved, but I can't help but point out that throwing out 8 decisive games because the same side happened to win is a fairly extreme thing to do. I don't know how much confidence can be had in the results as soon as you start being that hand wavy.
It has nothing to do with being hand wavy. On the contrary. The trinomial model is simply wrong in the case of unbalanced positions and to get accurate results one should use the pentanomial model instead.

However if there are no double wins (were there?) then the pentanomial model degenerates again to a trinomial model where reciprocal wins should be discarded (if one wants to reject the null hypothesis of equal strength).
It seems very naive to reject the results in an opening with double wins based on a sample size of a single 2-game minimatch. That, to me, is hand-wavy.

I will however concede that your point may be even stronger.
-Jonathan
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Post by Laskos »

jorose wrote: Tue May 28, 2019 8:32 pm I haven't been following this closely and I don't really want to get involved, but I can't help but point out that throwing out 8 decisive games because the same side happened to win is a fairly extreme thing to do. I don't know how much confidence can be had in the results as soon as you start being that hand wavy.
No, Michel already wrote, the trinomial model for paired games is simply wrong when computing the variance, and the first approximation to it is just the trinomial model with paired wins (same color) taken as draws. The correct thing is to use the pentanomial variance. I can give the (correct) pentanomial error margins and LOS for the TCEC case, but I don't want to use language here such as "pentanomial" or "the stopping rule should be based on SPRT, which controls both Type I and Type II errors, with LLR computed using the pentanomial model for pairs of games". Nobody wants to read that. I cannot convince even some scientists to use SPRT (adapted to the problem at hand) when they stop at will once the p-value looks good, so I just suggest that, if they stop at will, they use a p-value threshold not of 0.05 or 0.01 (as is often used in the life sciences, for example), but of 0.001. In fact, this is one of the main recommendations of many statisticians for curbing the rising number of papers with faulty use of statistics. I am not a statistician, but I think I understood their reasoning.
I doubt the engines in a few years will lose the weaker side of those positions against current SF and Leela.

Perhaps if we ran this same set several times, the other minimatches that ended in one engine's favor would end up as two wins for the same color, and the drawless 1-1 matches would end as decisive minimatches or double draws.

I also don't understand why we are even discussing this. TCEC is not intending to do the impossible thing of determining the "best" engine under all circumstances. There are many factors and either side can argue TCEC is totally unfair for their side.

What we know is LC0 won the S15 TCEC SuFi with a score of 53.5-46.5 and it was great entertainment. I also felt it was a much better battle of ice and fire than the last GoT season.
jorose
Posts: 358
Joined: Thu Jan 22, 2015 3:21 pm
Location: Zurich, Switzerland
Full name: Jonathan Rosenthal

Post by jorose »

Laskos wrote: Tue May 28, 2019 10:49 pm No, Michel already wrote, the trinomial model for paired games is simply wrong when computing the variance, and the first approximation to it is just the trinomial model with paired wins (same color) taken as draws.
Just trying to make sure I understood you correctly; I assume you mean the first order approximation is to use the pentanomial (not the trinomial) model with paired wins taken as draws?
-Jonathan
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Post by Laskos »

jorose wrote: Tue May 28, 2019 11:52 pm
Laskos wrote: Tue May 28, 2019 10:49 pm No, Michel already wrote, the trinomial model for paired games is simply wrong when computing the variance, and the first approximation to it is just the trinomial model with paired wins (same color) taken as draws.
Just trying to make sure I understood you correctly; I assume you mean the first order approximation is to use the pentanomial (not the trinomial) model with paired wins taken as draws?
No, the pentanomial model is theoretically not an approximation and it does not change any outcome (wins are wins, paired or not). The trinomial model with paired wins taken as draws (as in that +10 -3 result instead of +14 -7) is usually an approximation. But in the TCEC case, maybe (I have to check) there isn't any 2:0 result, and +10 -3 with those 8 games counted as draws instead of +14 -7 maybe gives the exact variance and LOS. If not, it's a good approximation anyway, and it doesn't require technicalities.
Last edited by Laskos on Wed May 29, 2019 12:11 am, edited 1 time in total.
jorose
Posts: 358
Joined: Thu Jan 22, 2015 3:21 pm
Location: Zurich, Switzerland
Full name: Jonathan Rosenthal

Post by jorose »

Laskos wrote: Wed May 29, 2019 12:08 am
jorose wrote: Tue May 28, 2019 11:52 pm
Laskos wrote: Tue May 28, 2019 10:49 pm No, Michel already wrote, the trinomial model for paired games is simply wrong when computing the variance, and the first approximation to it is just the trinomial model with paired wins (same color) taken as draws.
Just trying to make sure I understood you correctly; I assume you mean the first order approximation is to use the pentanomial (not the trinomial) model with paired wins taken as draws?
No, the pentanomial model is theoretically not an approximation and it does not change any outcome (draws are draws). The trinomial model with paired games taken as draws (as in that +10 -3 result instead of +14 -7) is usually an approximation. But in the TCEC case, maybe (I have to check) there isn't any 2:0 result, and +10 -3 and 8 draws instead of +14 -7 maybe gives the exact variance and LOS.
Ah, that clears up the confusion a bit. There was a 2-0 opening (the Trompowsky) in favor of Leela, so +9 -3 would be the "corrected" result if I am understanding you correctly.
-Jonathan
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Post by Laskos »

jorose wrote: Wed May 29, 2019 12:11 am
Laskos wrote: Wed May 29, 2019 12:08 am
jorose wrote: Tue May 28, 2019 11:52 pm
Laskos wrote: Tue May 28, 2019 10:49 pm No, Michel already wrote, the trinomial model for paired games is simply wrong when computing the variance, and the first approximation to it is just the trinomial model with paired wins (same color) taken as draws.
Just trying to make sure I understood you correctly; I assume you mean the first order approximation is to use the pentanomial (not the trinomial) model with paired wins taken as draws?
No, the pentanomial model is theoretically not an approximation and it does not change any outcome (draws are draws). The trinomial model with paired games taken as draws (as in that +10 -3 result instead of +14 -7) is usually an approximation. But in the TCEC case, maybe (I have to check) there isn't any 2:0 result, and +10 -3 and 8 draws instead of +14 -7 maybe gives the exact variance and LOS.
Ah, that clears up the confusion a bit. There was a 2-0 opening (the Trompowsky) in favor of Leela, so +9 -3 would be the "corrected" result if I am understanding you correctly.
No, I think +9 -3 is wrong. Use either the trinomial model with paired wins removed (taken as draws) as an approximation, or the exact pentanomial model.
jorose
Posts: 358
Joined: Thu Jan 22, 2015 3:21 pm
Location: Zurich, Switzerland
Full name: Jonathan Rosenthal

Post by jorose »

So where I am struggling with this intuitively is that it seems like you should be looking at either game pairs or individual games. Only turning 1-1 minimatches into draws while counting 2-0 results as individual games seems like it is doing both. On the other hand, I suppose my intuition might be right and that's why it's an approximation as opposed to exact.
-Jonathan
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Post by Laskos »

jorose wrote: Wed May 29, 2019 12:36 am So where I am struggling with this intuitively is that it seems like you should be looking at either game pairs or individual games. Only turning 1-1 minimatches into draws while counting 2-0 results as individual games seems like it is doing both. On the other hand, I suppose my intuition might be right and that's why it's an approximation as opposed to exact.
Sure, counting 1-1 minimatches as draws inside the trinomial model is just a simple approximation for paired games. You might come up with a better approximation using the trinomial model, but I use either this simple approximation or the exact pentanomial results (such as variance and LOS) for paired games.
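
For readers who want to follow the arithmetic, here is a minimal Python sketch comparing the three approaches discussed above: the plain trinomial model on single games, the trinomial approximation with reciprocal 1-1 pairs counted as draws, and the pentanomial model on game pairs. The per-pair breakdown is an assumption, reconstructed only to be consistent with the figures quoted in this thread (+14 -7 over 100 games, 8 reciprocal wins, one 2-0 opening); it is not taken from the official TCEC results. LOS is computed with a normal approximation, which is also an assumption of this sketch.

```python
import math

def mean_and_se(counts):
    """Return (mean score, standard error of the mean) for {score: count}."""
    n = sum(counts.values())
    mean = sum(score * c for score, c in counts.items()) / n
    var = sum(c * (score - mean) ** 2 for score, c in counts.items()) / n
    return mean, math.sqrt(var / n)

def los(mean, se):
    """Likelihood of superiority under a normal approximation."""
    z = (mean - 0.5) / se
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Trinomial model: every one of the 100 games is an independent sample.
trinomial = {1.0: 14, 0.5: 79, 0.0: 7}

# Approximation from the thread: the 8 reciprocal wins are counted as draws.
trinomial_approx = {1.0: 10, 0.5: 87, 0.0: 3}

# Pentanomial model: each of the 50 pairs is one sample, score = pair score / 2.
# ASSUMED breakdown: 1 pair 2-0, 8 pairs win+draw, 34 double draws plus
# 4 reciprocal 1-1 pairs (38 pairs scoring 1-1), 3 pairs loss+draw, no 0-2.
pentanomial = {1.0: 1, 0.75: 8, 0.5: 38, 0.25: 3, 0.0: 0}

mean_tri, se_tri = mean_and_se(trinomial)
mean_approx, se_approx = mean_and_se(trinomial_approx)
mean_penta, se_penta = mean_and_se(pentanomial)

los_tri = los(mean_tri, se_tri)
los_approx = los(mean_approx, se_approx)
los_penta = los(mean_penta, se_penta)

for name, m, se, l in [
    ("trinomial", mean_tri, se_tri, los_tri),
    ("trinomial, 1-1 pairs as draws", mean_approx, se_approx, los_approx),
    ("pentanomial", mean_penta, se_penta, los_penta),
]:
    print(f"{name}: score={m:.3f} se={se:.4f} LOS={l:.3f}")
```

All three models give the same mean score; only the standard error, and therefore the LOS, changes. Under the assumed breakdown, the plain trinomial model gives the largest error margin, and the pentanomial value lies between it and the simple approximation.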
Kanizsa
Posts: 51
Joined: Mon Feb 20, 2017 8:29 am
Location: Rialto, Venice

Post by Kanizsa »

Raphexon wrote: Tue May 28, 2019 4:14 pm
Kanizsa wrote: Tue May 28, 2019 1:05 pm A new possible challenge for LC0.

In his latest book, Garry Kasparov states that “Centaur mode” is still the best expression of strength in chess games.
I am doubtful after witnessing how LC0 defeated Stockfish.

In order to test Kasparov's hypothesis I would suggest the following challenge:
GM + the best collection of alpha-beta programs currently available vs. the best neural-network program (LC0 or AlphaZero)


Principal additional rules:
- GM Centaur mode has access to the screen of analysis of AB programs, but without expanding analysis tree handmade
- on the other hand, GM centaur mode has the possibility to withdraw the move after LC0 reply and is allowed to play another single substitute move, with a time penalty.

According to you, who would win in a match of 16-24 games under these conditions?
GM centaur mode or LC0?
Why wouldn't the GM be allowed to use the best NN-engine?
Because Kasparov's hypothesis regarding the superiority of Centaur mode over computer mode was expressed before December 2017, when the famous paper of Demis Hassabis came out.

His book "Deep Thinking: Where Machine Intelligence Ends and Human Creativity Begins" has no reference to neural networks.
Kanizsa
Posts: 51
Joined: Mon Feb 20, 2017 8:29 am
Location: Rialto, Venice

Post by Kanizsa »

Ozymandias wrote: Tue May 28, 2019 5:41 pm
Raphexon wrote: Tue May 28, 2019 4:14 pm
Kanizsa wrote: Tue May 28, 2019 1:05 pm A new possible challenge for LC0.

In his latest book, Garry Kasparov states that “Centaur mode” is still the best expression of strength in chess games.
I am doubtful after witnessing how LC0 defeated Stockfish.

In order to test Kasparov's hypothesis I would suggest the following challenge:
GM + the best collection of alpha-beta programs currently available vs. the best neural-network program (LC0 or AlphaZero)


Principal additional rules:
- GM Centaur mode has access to the screen of analysis of AB programs, but without expanding analysis tree handmade
- on the other hand, GM centaur mode has the possibility to withdraw the move after LC0 reply and is allowed to play another single substitute move, with a time penalty.

According to you, who would win in a match of 16-24 games under these conditions?
GM centaur mode or LC0?
Why wouldn't the GM be allowed to use the best NN-engine?
That's a good question, but not the only one at all. Why a GM? They have a very poor record as centaur players. Why can't they expand "analysis tree handmade"? Whatever that means. Why the "possibility to withdraw the move after LC0 reply"? No need for that under normal tournament conditions.

Not to mention that the main question can't even begin to be answered without clarifying book conditions for both players.
1 OK to the alternative suggestion of a very good centaur player instead of a GM centaur.
2 I suggest that the centaur player may read the first lines of multiple engines but not explore them back and forth.
3 The possibility to withdraw the move after LC0's reply is another advantage for the human player.

Regardless of the rules, the aim of such experiments is to test the following:

Despite having a huge advantage thanks to the best AB engines, human intuition would not be able to beat the positional insight of LC0, not even when allowed to withdraw the moves played. If this is true, Kasparov's hypothesis should be rejected.