Laskos wrote: ↑Fri Nov 16, 2018 3:56 pm
I don't know why you are so enthusiastic. Runs 20xxx and 30xxx are pretty pathetic, especially considering how much resources they have eaten up. Some folks there have overdone something. Just a quick check with the latest engine (rc4) and one of the latest nets:
TC: 60'' + 1''
Rank Name            Elo   +/-  Games  Score  Draws
     SF8             120    68     60  66.7%  43.3%
   1 lc0_v19_11261     0   111     20  50.0%  50.0%
   2 lc0_v19_31214  -147   128     20  30.0%  40.0%
   3 lc0_v19_9155   -241   127     20  20.0%  40.0%
Finished match
So, run 30xxx is still ~150 Elo points below run 10xxx, and barely ~100 Elo points above 6x64 net 9155 (run 9xxx). Taking into account that the games with 6x64 net were 10-12 times faster and taking into account the hardware resources allocated, the whole run 9xxx could have been completed in less than a day. Lame runs, these newest ones. But I still hope that they will improve some 200 real Elo points over current level, although this is not granted at all.

Leto wrote: ↑Mon Nov 19, 2018 1:18 am
I don't think Test30 is this close to Test10 in strength, I still think it's several hundred elo weaker. What's 60" + 1", is that game in 1 minute with an extra second per move?

Yes, 1m + 1s. It is close, Test30 is about 100 Elo points weaker than Test10.
Houston: We have lift off ...
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
Re: Houston: We have lift off ...
glennsamuel32 wrote: ↑Mon Nov 19, 2018 3:52 am
I ran a 1000 game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy
Rank Name   Elo   +   - games score oppo. draws
   1 31311    9  10  10  1000   52%    -9   28%
   2 31255   -9  10  10  1000   48%     9   28%

Still self-play, and not sure what to make out of 1 node result.
-
- Posts: 136
- Joined: Sat Dec 04, 2010 5:31 pm
- Location: 223
Re: Houston: We have lift off ...
glennsamuel32 wrote: ↑Mon Nov 19, 2018 3:52 am
I ran a 1000 game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy
Rank Name   Elo   +   - games score oppo. draws
   1 31311    9  10  10  1000   52%    -9   28%
   2 31255   -9  10  10  1000   48%     9   28%

Laskos wrote: ↑Mon Nov 19, 2018 6:41 am
Still self-play, and not sure what to make out of 1 node result.

Yes, I realized my mistake. Please ignore the last test.
Too many obvious wins became draws due to horrible endgame play.
I even saw a KQQQ vs K play out to a draw
Therefore, I used SF8 as a sparring partner.
I think 1-node games with a large sample test not only the policy of the network, but also its all-round strength.
Are the results below comparable to yours?
Score of 31311 vs stockfish_8_x64_bmi2: 58 - 806 - 136 [0.126]
Elo difference: -336.46 +/- 27.08
1000 of 1000 games finished.
Score of 31330 vs stockfish_8_x64_bmi2: 74 - 780 - 146 [0.147]
Elo difference: -305.45 +/- 25.68
1000 of 1000 games finished.
Bayeselo Ratings
================
Network  Selfplay ELO  Real ELO vs SF8
=======  ============  ===============
31311         5783.59             -161
31330         6043.22             -148
Judge without bias, or don't judge at all...
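For anyone who wants to check the "Elo difference" lines in the post above, here is a minimal Python sketch that converts a match record into an Elo difference with the standard logistic formula. The function name `elo_diff` is only an illustration, not part of any tool, and the sketch assumes the quoted records are in wins - losses - draws order, which is consistent with the bracketed scores (0.126 and 0.147). It does not try to reproduce the +/- error margins.

```python
import math


def elo_diff(wins, losses, draws):
    """Elo difference implied by a match record (logistic model).

    Assumes the record is wins - losses - draws from the first
    engine's point of view; a draw counts as half a point.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    # The formula is undefined for a 0% or 100% score.
    if score <= 0.0 or score >= 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


# The two records quoted in the post above.
print(round(elo_diff(58, 806, 136), 2))
print(round(elo_diff(74, 780, 146), 2))
```

The two printed values should agree with the Elo differences reported in the post to within rounding.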
-
- Posts: 1470
- Joined: Mon Apr 23, 2018 7:54 am
Re: Houston: We have lift off ...
glennsamuel32 wrote: ↑Mon Nov 19, 2018 7:39 am
I ran a 1000 game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy
---
Yes, I realized my mistake. Please ignore the last test.
Too many obvious wins became draws due to horrible endgame play.
I even saw a KQQQ vs K play out to a draw

I guess this was because it was 1 node. Did it just repeat moves?
-
- Posts: 136
- Joined: Sat Dec 04, 2010 5:31 pm
- Location: 223
Re: Houston: We have lift off ...
glennsamuel32 wrote: ↑Mon Nov 19, 2018 7:39 am
I ran a 1000 game tournament at 1 node, using SALC v5 500 positions, reversed colors, 6-man egtb, no adj...
Flags used were --weights and --syzygy
---
Yes, I realized my mistake. Please ignore the last test.
Too many obvious wins became draws due to horrible endgame play.
I even saw a KQQQ vs K play out to a draw

jp wrote: ↑Mon Nov 19, 2018 8:13 am
I guess this was because it was 1 node. Did it just repeat moves?

I didn't bother to check.
But basically ridiculous endgame play with 2 networks in a tournament.
Judge without bias, or don't judge at all...
-
- Posts: 1796
- Joined: Thu Sep 18, 2008 10:24 pm
Re: Houston: We have lift off ...
Over 1000 points now!
That graph
-
- Posts: 10948
- Joined: Wed Jul 26, 2006 10:21 pm
- Full name: Kai Laskos
-
- Posts: 565
- Joined: Thu Nov 13, 2014 12:03 pm
Re: Houston: We have lift off ...
How can it rise to >6000 and still suck compared to 11248? I don't get it.
-
- Posts: 4889
- Joined: Thu Mar 09, 2006 6:34 am
- Location: Pen Argyl, Pennsylvania
Re: Houston: We have lift off ...
You're right, of course. It's starting to remind me of the Seinfeld episode where Elaine says "...fake, fake, fake, fake..." and then smiles:
https://www.youtube.com/watch?v=ywi9-MGUCy8
-
- Posts: 4313
- Joined: Tue Apr 03, 2012 4:28 pm
Re: Houston: We have lift off ...
whereagles wrote: ↑Mon Nov 19, 2018 11:35 am
how can it rise to >6000 and still suck compared to 11248? I don't get it

Presumably because the self-play learning generalises well at first, but eventually ends up fitting only to itself and no longer generalises.
Actually, someone could possibly test this. As far as I know, the self-play Elo chart is generated by testing iteration x+1 against iteration x, and so on.
Did anybody try testing, say, iteration 50 against iteration 50 + 100 or whatever? In other words, does the additive Elo gain shown in the chart still exist when measured over a range of the chart?
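
The check proposed above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the chart is assumed to be a running sum of per-step gains (iteration x+1 vs iteration x), and every name here (`elo_from_score`, `chained_elo`, `additivity_gap`) is hypothetical. The numbers in the usage lines are made-up placeholders, not measurements from any run.

```python
import math


def elo_from_score(score):
    """Elo difference implied by a match score in (0, 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def chained_elo(step_gains, start, end):
    """Gain the chart implies between two iterations.

    step_gains[i] is the Elo gain of iteration i+1 over iteration i,
    so the chart value is simply the sum over the range.
    """
    return sum(step_gains[start:end])


def additivity_gap(step_gains, start, end, direct_score):
    """Chart-implied gain minus the gain measured in a direct match.

    direct_score is the score of iteration `end` against iteration
    `start` in a head-to-head match. A large positive gap would mean
    the summed gains overstate the real difference.
    """
    return chained_elo(step_gains, start, end) - elo_from_score(direct_score)


# Placeholder data only: 150 steps of +10 Elo each, and a hypothetical
# direct match where the later net scores 75% against the earlier one.
placeholder_gains = [10.0] * 150
print(chained_elo(placeholder_gains, 50, 150))
print(additivity_gap(placeholder_gains, 50, 150, 0.75))
```

If the gap stays near zero across several ranges, the chart's gains are roughly additive; if it grows with the length of the range, that would fit the "fitting only to itself" explanation.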