Houston: We have lift off ...

jp · Post by **jp** » Sun Nov 18, 2018 6:58 am

chrisw wrote: ↑Sat Nov 17, 2018 10:05 pm “They” see “themselves” doing a “community” effort, but “you” see “them”, an “in-group” in which “you” are not “in”, but “you” hope “they” will do whatever and so on.

Herein lies the problem. Groups have positive but also negative aspects. Something has gone wrong (in the sense that the inexorable expected rise in strength is kind of questionable). I’d be looking at groupthink and social dynamics, rather than technical causes. There’ll be something dumb that nobody thought of going on, somewhere, someway, somehow. That is why I decided a few months ago not to “follow” this project; I think I prefer to watch the other ones, working by themselves or in small groups. All credit, however, to the central core at lc0 for setting a path.

We don't know what percentage of the community is happy for it to be non-zero. It could be a majority or even a large majority.

We don't know if "something has gone wrong". How do you know it's not peak performance?

Werewolf · Post by **Werewolf** » Sun Nov 18, 2018 9:36 am

That self play graph

Laskos · Post by **Laskos** » Sun Nov 18, 2018 9:45 am

Werewolf wrote: ↑Sun Nov 18, 2018 9:36 am That self play graph

Almost total fiction. They even have some inversions on long spans of self-Elo versus real-Elo. Nonetheless, the overall progress is real on very long spans. In the 1.5 days I have been following it, the self-Elo improved by 650 (!!!) Elo points, but real-Elo against SF8 by no more than 100 Elo points. 100 is still a very large improvement, but their self-Elo graph is pretty much fiction. Hopefully it won't start to exhibit inversions over very long spans; in that case things would become simply arbitrary. The best 30xxx nets are by now only 50 or so Elo points weaker at blitz than the best versions overall of the 10xxx run. 30xxx is the most inflated run of all I have seen with Leela Chess.

Laskos · Post by **Laskos** » Sun Nov 18, 2018 1:47 pm

Laskos wrote: ↑Sun Nov 18, 2018 9:45 am
Werewolf wrote: ↑Sun Nov 18, 2018 9:36 am That self play graph
Almost total fiction. They even have some inversions on long spans of self-Elo versus real-Elo. Nonetheless, the progress overall is real, on very long spans. Since I am following it, 1.5 days, the self-Elo improved by 650 (!!!) Elo points, but against SF8 (real-Elo) by no more than 100 Elo points. 100 is still very large improvement, but their self-Elo graph is pretty much fiction. Hopefully, it won't start to exhibit inversions over very long spans, the things in this case will become simply arbitrary. The best 30xxx nets by now are only 50 or so Elo points weaker at blitz than the best versions overall of 10xxx run. 30xxx is the most inflated run of all I have seen with Leela Chess.

This is already ridiculous. They have a 750 self-Elo improvement since I started checking (almost 2 days ago), but a real-Elo gain against SF8 of only about 50, at most 100, often with inversions in self-Elo vs real-Elo. I hope they know what they are measuring and promoting there; it seems out of control.

chrisw · Post by **chrisw** » Sun Nov 18, 2018 5:16 pm

Laskos wrote: ↑Sun Nov 18, 2018 1:47 pm
Laskos wrote: ↑Sun Nov 18, 2018 9:45 am
Werewolf wrote: ↑Sun Nov 18, 2018 9:36 am That self play graph
Almost total fiction. They even have some inversions on long spans of self-Elo versus real-Elo. Nonetheless, the progress overall is real, on very long spans. Since I am following it, 1.5 days, the self-Elo improved by 650 (!!!) Elo points, but against SF8 (real-Elo) by no more than 100 Elo points. 100 is still very large improvement, but their self-Elo graph is pretty much fiction. Hopefully, it won't start to exhibit inversions over very long spans, the things in this case will become simply arbitrary. The best 30xxx nets by now are only 50 or so Elo points weaker at blitz than the best versions overall of 10xxx run. 30xxx is the most inflated run of all I have seen with Leela Chess.
This is already ridiculous. They have 750 self-Elo improvement since I check it (almost 2 days), but real-Elo against SF8 of only about 50, at most 100. Often with inversions in self-Elo vs real-Elo. Hope they know what they are measuring and promoting there, seems out of control.

Well, if you train against yourself, then test your new self against your old self, then your new self will know not to repeat mistakes that your old self will still make. By definition, your new self will be winning more games than your old self, and your self-play elo will rise (possibly in proportion to the training interval time).
The obvious question is: does this self-play elo rise map to an actual elo rise? Or, put another way, has your new self acquired any generalised knowledge, or is the “knowledge” specific to playing against your old self only? The only way to discover that is by playing games against a general pool of opponents. Aren’t I good at stating the bleeding obvious?!
By repeatedly playing your self against your prior old self, over and over, you can obtain meteoric elo rises into the stratosphere and beyond. Obviously (again) this is all quite meaningless when mapped back to real life.

gladius · Post by **gladius** » Sun Nov 18, 2018 7:49 pm

Laskos wrote: ↑Sun Nov 18, 2018 1:47 pm
Laskos wrote: ↑Sun Nov 18, 2018 9:45 am
Werewolf wrote: ↑Sun Nov 18, 2018 9:36 am That self play graph
Almost total fiction. They even have some inversions on long spans of self-Elo versus real-Elo. Nonetheless, the progress overall is real, on very long spans. Since I am following it, 1.5 days, the self-Elo improved by 650 (!!!) Elo points, but against SF8 (real-Elo) by no more than 100 Elo points. 100 is still very large improvement, but their self-Elo graph is pretty much fiction. Hopefully, it won't start to exhibit inversions over very long spans, the things in this case will become simply arbitrary. The best 30xxx nets by now are only 50 or so Elo points weaker at blitz than the best versions overall of 10xxx run. 30xxx is the most inflated run of all I have seen with Leela Chess.
This is already ridiculous. They have 750 self-Elo improvement since I check it (almost 2 days), but real-Elo against SF8 of only about 50, at most 100. Often with inversions in self-Elo vs real-Elo. Hope they know what they are measuring and promoting there, seems out of control.

Unless things have changed since I put it together (which it doesn't look like), the elo graph is measuring exactly self-play vs the previous version. Then it adds the numbers together. Not accurate in the least, but at least it gives a hint if the training has gone off the rails. The matches are not fair either: no opening book means that if one version improves just slightly in a favorite opening then it can look like a massive improvement (or a regression if it improves overall, but gets slightly worse in that opening).

It is not meant to indicate strength against SF or other strong engines. That's what rating groups and folks here do an amazing job of.
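The summing described above can be sketched roughly as follows. This is only an illustration, not the lc0 training server's code: the function names and the match scores are hypothetical, and the standard logistic Elo formula is assumed for turning a match score into a rating difference.

```python
import math


def elo_diff(score: float) -> float:
    """Elo difference implied by a match score in (0, 1), logistic model (assumed)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def cumulative_self_elo(scores, start=0.0):
    """Chain the per-match differences: each net is rated only against its predecessor."""
    ratings = [start]
    for s in scores:
        ratings.append(ratings[-1] + elo_diff(s))
    return ratings


# Hypothetical scores of each new net against the previous one.
scores = [0.55, 0.55, 0.55, 0.55]
ratings = cumulative_self_elo(scores)
print([round(r, 1) for r in ratings])
```

Because every step is measured only against the immediately preceding net, any bias in a single match (for example, one favorite opening) is carried forward into every later point on the graph, and nothing in the sum ties the total to strength against an outside opponent.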

Laskos · Post by **Laskos** » Sun Nov 18, 2018 8:14 pm

gladius wrote: ↑Sun Nov 18, 2018 7:49 pm
Laskos wrote: ↑Sun Nov 18, 2018 1:47 pm
Laskos wrote: ↑Sun Nov 18, 2018 9:45 am
Werewolf wrote: ↑Sun Nov 18, 2018 9:36 am That self play graph
Almost total fiction. They even have some inversions on long spans of self-Elo versus real-Elo. Nonetheless, the progress overall is real, on very long spans. Since I am following it, 1.5 days, the self-Elo improved by 650 (!!!) Elo points, but against SF8 (real-Elo) by no more than 100 Elo points. 100 is still very large improvement, but their self-Elo graph is pretty much fiction. Hopefully, it won't start to exhibit inversions over very long spans, the things in this case will become simply arbitrary. The best 30xxx nets by now are only 50 or so Elo points weaker at blitz than the best versions overall of 10xxx run. 30xxx is the most inflated run of all I have seen with Leela Chess.
This is already ridiculous. They have 750 self-Elo improvement since I check it (almost 2 days), but real-Elo against SF8 of only about 50, at most 100. Often with inversions in self-Elo vs real-Elo. Hope they know what they are measuring and promoting there, seems out of control.
Unless things have changed since I put it together (which it doesn't look like), the elo graph is measuring exactly self-play vs the previous version. Then it adds the numbers together. Not accurate in the least, but at least it gives a hint if the training has gone off the rails. The matches are not fair either: no opening book means that if one version improves just slightly in a favorite opening then it can look like a massive improvement (or a regression if it improves overall, but gets slightly worse in that opening).

It is not meant to indicate strength against SF or other strong engines. That's what rating groups and folks here do an amazing job of.

Yes, but it was always like that with Leela, and there was never so much distortion in self-play. Do they use less noise and overfit as a result? This afternoon I took ID31255 ("Elo" 5196) and compared it more thoroughly to ID31311 ("Elo" 5784), both against SF8, and the much earlier ID31255 came out 60 +/- 50 real Elo points better, despite being almost 600 self-"Elo" points worse. This is very serious; they have to change something. Overfitting on openings, temperature, noise, I don't know, but something is wrong there.

chrisw · Post by **chrisw** » Sun Nov 18, 2018 11:04 pm

jp wrote: ↑Sun Nov 18, 2018 6:58 am
chrisw wrote: ↑Sat Nov 17, 2018 10:05 pm “They” see “themselves” doing a “community” effort, but “you” see “them”, an “in-group” in which “you” are not “in”, but “you” hope “they” will do whatever and so on.

Herein lies the problem. Groups have positive but also negative aspects. Something has gone wrong (in the sense that the inexorable expected rise in strength is kind of questionable). I’d be looking at groupthink and social dynamics, rather than technical causes. There’ll be something dumb that nobody thought of going on, somewhere, someway, somehow. That is why I decided a few months ago not to “follow” this project; I think I prefer to watch the other ones, working by themselves or in small groups. All credit, however, to the central core at lc0 for setting a path.
We don't know what percentage of the community is happy for it to be non-zero. It could be a majority or even a large majority.

We don't know if "something has gone wrong". How do you know it's not peak performance?

Well, it’s obviously not at peak possible performance, because that would be an infernal machine that approximated a 32-man perfect-play database, and there’s no reason why layers of neurons with the correct connections could not achieve that approximation.
It may well be that what “has gone wrong” is this: under the current dynamic, which includes the social group called “the community” that influences what is and isn’t done, the system reached its own “peak performance” some time ago and has been thrashing around that level ever since.

Leto · Post by **Leto** » Mon Nov 19, 2018 1:18 am

Laskos wrote: ↑Fri Nov 16, 2018 3:56 pm I don't know why you are so enthusiastic. Runs 20xxx and 30xxx are pretty pathetic, especially considering how much resources they have eaten up. Some folks there have overdone something. Just a quick check with the latest engine (rc4) and one of the latest nets:

TC: 60'' + 1''
Rank Name                          Elo     +/-   Games   Score   Draws
     SF8                           120      68      60   66.7%   43.3%
   
   1 lc0_v19_11261                   0     111      20   50.0%   50.0%
   2 lc0_v19_31214                -147     128      20   30.0%   40.0%
   3 lc0_v19_9155                 -241     127      20   20.0%   40.0%
Finished match
So, run 30xxx is still ~150 Elo points below run 10xxx, and barely ~100 Elo points above 6x64 net 9155 (run 9xxx). Taking into account that the games with 6x64 net were 10-12 times faster and taking into account the hardware resources allocated, the whole run 9xxx could have been completed in less than a day. Lame runs, these newest ones. But I still hope that they will improve some 200 real Elo points over current level, although this is not granted at all.

I don't think Test30 is this close to Test10 in strength, I still think it's several hundred elo weaker. What's 60" + 1", is that game in 1 minute with an extra second per move?
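As a rough cross-check, the Elo column in the quoted table appears to follow from the score percentages alone. The sketch below assumes the standard logistic Elo formula (the rating tool's actual method is not stated in the thread) and recovers the point totals from the listed percentages and game counts; the function name is hypothetical.

```python
import math


def elo_from_score(points: float, games: int) -> float:
    """Elo difference implied by points/games under the logistic model (assumed)."""
    score = points / games
    return -400.0 * math.log10(1.0 / score - 1.0)


# (name, points, games) derived from the table: e.g. 66.7% of 60 games is 40 points.
results = [
    ("SF8", 40, 60),
    ("lc0_v19_11261", 10, 20),
    ("lc0_v19_31214", 6, 20),
    ("lc0_v19_9155", 4, 20),
]

for name, points, games in results:
    print(f"{name:<16} {round(elo_from_score(points, games)):>5}")
```

Under that assumption the rounded values agree with the table's 120, 0, -147 and -241. With only 20 games per net the error bars are above 100 Elo, so the gap between the runs is poorly constrained by this match.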

glennsamuel32 · Post by **glennsamuel32** » Mon Nov 19, 2018 3:52 am

I ran a 1000 game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy

Rank Name    Elo    +    -  games  score  oppo.  draws
   1 31311     9   10   10   1000    52%     -9    28%
   2 31255    -9   10   10   1000    48%      9    28%
