Why do the latest Sergio Vieri 384x30b networks scale so badly?

Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Why do the latest Sergio Vieri 384x30b networks scale so badly?

Post by Laskos »

Here is the link to Sergio Vieri's series of very strong large networks (net 3010 won TCEC 17), the best Lc0 nets on RTX GPUs at LTC (maybe the latest LS nets can match them).
https://www.comp.nus.edu.sg/~sergio-v/t60/384x30/

His latest releases, such as
384x30-t60-4155.pb.gz 2020-06-18 09:22 131M
seem very strong in fast testing, significantly stronger than net 3010.

RTX 2070 GPU

TC: 6s + 0.1s

Score of SV_384x30_4155 vs SV_384x30_3010: 259 - 178 - 363  [0.551] 800
...      SV_384x30_4155 playing White: 217 - 19 - 164  [0.748] 400
...      SV_384x30_4155 playing Black: 42 - 159 - 199  [0.354] 400
...      White vs Black: 376 - 61 - 363  [0.697] 800
Elo difference: 35.3 +/- 17.8, LOS: 100.0 %, DrawRatio: 45.4 %
Finished match
The pentanomial error margins are 30% smaller than shown above.

But at a longer time control, they are invariably weaker:

TC: 60s + 1s

Score of SV_384x30_4155 vs SV_384x30_3010: 30 - 39 - 81  [0.470] 150
...      SV_384x30_4155 playing White: 30 - 1 - 44  [0.693] 75
...      SV_384x30_4155 playing Black: 0 - 38 - 37  [0.247] 75
...      White vs Black: 68 - 1 - 81  [0.723] 150
Elo difference: -20.9 +/- 37.8, LOS: 13.9 %, DrawRatio: 54.0 %
Finished match
The pentanomial error margins are 35% smaller than shown above.

The difference is outside the error margins. I am not sure what he has been doing since net 3290, when he reset the LR, but the nets don't seem to have recovered from the bad scaling that started then.
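
For anyone who wants to check the figures above, here is a minimal Python sketch of the usual formulas: logistic Elo from the score, a 95% margin from the trinomial (per-game) variance, and LOS from the decisive games only. The function names are made up for illustration, and treating the two matches as independent when combining their margins is an assumption. The pentanomial correction mentioned in the output cannot be recomputed this way, because it needs the per-pair results, which are not shown.

```python
import math

def score_to_elo(score):
    """Logistic Elo difference implied by a score fraction in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def match_stats(wins, losses, draws, z=1.959964):
    """Return (score, elo, elo_margin, los) for a W - L - D result.

    The margin uses the trinomial per-game variance; LOS uses the
    normal approximation on decisive games only.
    """
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Variance of a single game result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    half_width = z * math.sqrt(var / n)
    # Map the score interval to Elo and report half of its width.
    elo_margin = (score_to_elo(score + half_width)
                  - score_to_elo(score - half_width)) / 2.0
    los = 0.5 * (1.0 + math.erf((wins - losses)
                                / math.sqrt(2.0 * (wins + losses))))
    return score, score_to_elo(score), elo_margin, los

# W - L - D totals copied from the two matches above.
fast = match_stats(259, 178, 363)   # 6s + 0.1s
slow = match_stats(30, 39, 81)      # 60s + 1s

for label, (score, elo, margin, los) in (("6s+0.1s", fast), ("60s+1s", slow)):
    print(f"{label}: score {score:.3f}, Elo {elo:+.1f} +/- {margin:.1f}, "
          f"LOS {100 * los:.1f} %")

# Change in Elo between the two time controls, with the two trinomial
# margins combined in quadrature (assumes independent matches).
elo_drop = fast[1] - slow[1]
drop_margin = math.hypot(fast[2], slow[2])
print(f"Elo drop from fast to slow TC: {elo_drop:.1f} +/- {drop_margin:.1f}")
```

With these formulas the drop between the two time controls comes out larger than the combined trinomial margin, and the pentanomial margins quoted in the output are smaller still.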
dkappe
Posts: 1631
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: Why do the latest Sergio Vieri 384x30b networks scale so badly?

Post by dkappe »

It seems the MLH is still an experiment. Most testing in the leela discord is short tc, so unlikely to uncover this.

We’ll see if the bad start in tcec SuFi is just a statistical wobble or the start of an epic collapse.
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
jjoshua2
Posts: 99
Joined: Sat Mar 10, 2018 6:16 am

Re: Why do the latest Sergio Vieri 384x30b networks scale so badly?

Post by jjoshua2 »

I will still be surprised if lc0 does not win a couple of games in a row sometime in the remainder. Some engine has to win first, and there is close to a 50% chance it will win the second one too, so getting 2-0 means nothing really. EDIT: well, 3 losses in a row now :)
dkappe
Posts: 1631
Joined: Tue Aug 21, 2018 7:52 pm
Full name: Dietrich Kappe

Re: Why do the latest Sergio Vieri 384x30b networks scale so badly?

Post by dkappe »

Something looks suspicious in the tcec gpu utilization. Maybe.

https://tcec-chess.com/gpu_temperature.txt
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: Why do the latest Sergio Vieri 384x30b networks scale so badly?

Post by Milos »

dkappe wrote: Sat Jun 20, 2020 11:13 pm Something looks suspicious in the tcec gpu utilization. Maybe.

https://tcec-chess.com/gpu_temperature.txt
It's sampled every 5 min, so when the reading is low, you are looking at a sample taken during an SF move, when the GPU is idle. Lc0 fanboys are really starting to get paranoid.
So hard to accept the fact that SF might have actually improved significantly since S17 and Lc0 didn't improve much if at all.
The current 3972 net is very, very close to 3010. And while MLH might produce somewhat faster wins for Lc0, there is no clear indication that it gains any Elo at all (nor that it loses Elo).
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Why do the latest Sergio Vieri 384x30b networks scale so badly?

Post by Ovyron »

Milos wrote: Sun Jun 21, 2020 12:30 am So hard to accept the fact that SF might have actually improved significantly since S17
Did you mean to write "S1.7"? I don't think comparing the development of Stockfish since such an early version as 1.7 (1.7.1?) with that of Leela is relevant.
Alayan
Posts: 550
Joined: Tue Nov 19, 2019 8:48 pm
Full name: Alayan Feh

Re: Why do the latest Sergio Vieri 384x30b networks scale so badly?

Post by Alayan »

S17 obviously means TCEC season 17.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Why do the latest Sergio Vieri 384x30b networks scale so badly?

Post by Ovyron »

Oh, that makes sense. I'd never gotten why SF is shortened like that when its name is not Stock Fish, but it's probably so that people don't get confused when they read "S17" for "Season 17" in a sentence.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Why do the latest Sergio Vieri 384x30b networks scale so badly?

Post by Laskos »

dkappe wrote: Sat Jun 20, 2020 7:54 pm It seems the MLH is still an experiment. Most testing in the leela discord is short tc, so unlikely to uncover this.

We’ll see if the bad start in tcec SuFi is just a statistical wobble or the start of an epic collapse.

As you seem to be in contact with the Lc0 community, advise them on testing methodology. My 150-game 60s+1s match is probably equivalent to some 300 games in "other" people's tests. People for some reason choose balanced, very regular openings in their tests, and often not very short ones. With such a "regular" opening suite, net 3972 versus net 3010 at 60s+1s on an RTX GPU will give an 80-90% draw rate. It was shown that this is a very inefficient way of testing, often needing 2-6 times more games for the same statistical significance than testing from an unbalanced opening suite.

The optimal unbalance was shown to be at the border between White win and draw (50% White wins, 50% draws), or an SF eval of 0.8-1.0. I take my suites from human games: short 3-movers, but in that 0.8-1.0 range of SF evals. One can build hundreds of such openings from the not very balanced openings of human games with FIDE Elo above 2200, and play thousands of side-reversed games in a match.

Read "Match Statistics" in the Chess Wiki on how to use pentanomial variance with unbalanced openings, which often is almost 2 times smaller than the usual trinomial variance with ultra-balanced openings, thus needing 4 times fewer games for the same statistical significance. People don't read the Chess Wiki and its links? Michel posted very useful material and links there. One cannot test two similar Lc0 nets at LTC and expect high statistical significance without reading that page. A 90% draw rate is no joke when trying to separate engines by strength, while trinomial errors are no smaller than the pentanomial ones from unbalanced openings. Lc0 nets are so similar style-wise that high draw rates between them from regular balanced openings are unavoidable.
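
To make the trinomial versus pentanomial point concrete, here is a rough Python sketch. The function names and the pair counts in the example are invented purely for illustration (they are not from any real match), and it assumes every opening is played twice with sides reversed. The trinomial estimate treats all games as independent; the pentanomial one takes the variance over the five possible pair scores, so openings where White wins both games (pair score 1.0) no longer inflate the error.

```python
import math

# Points one engine can score over a side-reversed two-game pair.
PAIR_SCORES = (0.0, 0.5, 1.0, 1.5, 2.0)

def trinomial_stderr(wins, losses, draws):
    """Standard error of the mean score, treating games as independent."""
    n = wins + losses + draws
    mean = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * mean ** 2) / n
    return math.sqrt(var / n)

def pentanomial_stderr(pair_counts):
    """Standard error of the mean score from side-reversed game pairs.

    pair_counts[i] is the number of pairs with total score PAIR_SCORES[i].
    """
    pairs = sum(pair_counts)
    # The per-game score of a pair is its total divided by two.
    mean = sum(c * s / 2.0 for c, s in zip(pair_counts, PAIR_SCORES)) / pairs
    var = sum(c * (s / 2.0 - mean) ** 2
              for c, s in zip(pair_counts, PAIR_SCORES)) / pairs
    return math.sqrt(var / pairs)

# Hypothetical 100 pairs from unbalanced openings (made-up numbers):
# 10 pairs draw+loss, 70 pairs scoring 1.0 (40 win+loss, 30 draw+draw),
# 20 pairs win+draw. As single games that is 60 wins, 50 losses, 90 draws.
tri = trinomial_stderr(60, 50, 90)
penta = pentanomial_stderr((0, 10, 70, 20, 0))
print(f"trinomial stderr:   {tri:.4f}")
print(f"pentanomial stderr: {penta:.4f}")
# Games needed for a given significance scale with the squared error.
print(f"games needed ratio: {(penta / tri) ** 2:.2f}")
```

If the two games of each pair were truly independent, both functions would give the same answer on average; the gap only opens up when results within a pair are anti-correlated, which is what unbalanced openings played with reversed sides produce.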
mbabigian
Posts: 204
Joined: Tue Oct 15, 2013 2:34 am
Location: US
Full name: Mike Babigian

Re: Why do the latest Sergio Vieri 384x30b networks scale so badly?

Post by mbabigian »

Read "Match Statistics" in the Chess Wiki on how to use pentanomial variance with unbalanced openings, which often is almost 2 times smaller than the usual trinomial variance with ultra-balanced openings, thus needing 4 times less games for the same statistical significance.
The direct link: https://www.chessprogramming.org/Match_Statistics

Unfortunately, the Leela project is still in its infancy, and its maturation is hamstrung by the normal social nonsense that plagues all open source efforts. If the project remains active long enough, "to be or not to be" will eventually be typed. It is just a matter of patience, unfortunately. It drives me nuts every time they rely on voting for decisions that can be empirically derived, but social norms and an obsession with trying to "appear" democratic keep progress at a snail's pace.

No worries, however: they are slowly rediscovering knowledge documented decades ago. Someone looking to earn their PhD in the social sciences could find a gaggle of thesis ideas by studying human behavior in open source projects. The results could even suggest ways to speed up the maturing process! :D
“Censorship is telling a man he can't have a steak just because a baby can't chew it.” ― Mark Twain