JJJ wrote: ↑Mon Jun 25, 2018 11:46 am
You might try ID 440 now, Kai. It seems good in self-play, maybe the best now?
Yes, now running, intermediate results show that ID440 is close to ID395 in the gauntlet against AB engines. ID395 scored 103.0/200, let's see what ID440 will score (in about one-two hours). The small net from testserver ID9065 scored 93.5/200, which is already close to the best bignets from the mainserver.
ID 9071 is now at 3606 Elo in self-play.
Presumably this has the potential to be a lot stronger than all the others once the net gets bigger.
Well, what I don't understand is that the original branch had, e.g., ID 107 with 6x64 at 4780, which is much higher/stronger, right?
ID440 from mainserver came at 99.5/200, a bit weaker than ID395 at 103.0/200, but within error margins. I will check one of the later smallnets from testserver.
Laskos wrote: ↑Mon Jun 25, 2018 1:18 pm
Yes, now running, intermediate results show that ID440 is close to ID395 in the gauntlet against AB engines. ID395 scored 103.0/200, let's see what ID440 will score (in about one-two hours). The small net from testserver ID9065 scored 93.5/200, which is already close to the best bignets from the mainserver.
ID 9071 now at 3606 elo self play.
Presumably this has the potential to be a lot stronger than all the others once the net gets bigger.
Lion wrote: ↑Mon Jun 25, 2018 2:10 pm
Well, what I don't understand is that the original branch had, e.g., ID 107 with 6x64 at 4780, which is much higher/stronger, right?
rgds
I doubt it. The self-play Elo is very distorted, so it's hard to know what it means without independent testing like Kai does.
I also do not understand how they can show a pass with a negative result and a fail with a positive result: http://testserver.lczero.org/matches
I read the following:
9074 9073 true +101 -177 =172
I think that the result is significantly worse so I do not understand how they get true for pass.
Note that the test conditions are also unclear: a match may start from the standard opening position or, better, from a set of different opening positions.
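To put a number on "significantly worse", here is a minimal sketch that converts a win/loss/draw record into a score fraction and an Elo difference using the standard logistic Elo formula. It assumes the row is written from the point of view of the first net listed (9074); the function name `match_summary` is made up for this illustration and is not part of any LCZero tool.

```python
import math

def match_summary(wins, losses, draws):
    """Return (score fraction, Elo difference) for a W/L/D record.

    Uses the standard logistic Elo model. Assumption: the record is
    given from the point of view of the net being tested.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is undefined for a 0% or 100% score")
    elo = -400.0 * math.log10(1.0 / score - 1.0)
    return score, elo

# The row quoted above: +101 -177 =172
score, elo = match_summary(101, 177, 172)
print(f"score = {score:.1%}, Elo difference = {elo:+.0f}")
```

For this record the score comes out at 187/450, a bit under 42%, which corresponds to roughly 59 Elo below the opponent under this model.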
Uri, it is only mediocre formatting / data representation. LCZero always promotes, so "pass = true" or "pass = false" doesn't really matter; the flags were inherited from Leela Zero (which, unlike LCZero, has gating, with promotion only above 55%).
The table indicates as "pass = false" (fail) what should in reality be classified as "test" (grey crosses in Leela Zero). The test matches are usually regression tests against a previous "strong" network. They are needed to check whether the "self-play Elo" (which tends to compound error bars) is still reasonably accurate.
Laskos wrote: ↑Mon Jun 25, 2018 3:19 pm
ID440 from mainserver came at 99.5/200, a bit weaker than ID395 at 103.0/200, but within error margins. I will check one of the later smallnets from testserver.
My STS test confirms your result.
ID395 scores 76.5%
ID440 scores 73.2%
1 sec / position on a 780Ti
It seems that the latest smallnets from testserver are already almost at the level of ID395 from mainserver, at least at my short time control and on my GTX 1060 GPU. I switched from the LittleBlitzer GUI to the Cutechess-Cli UI and got similar results as before. LittleBlitzer has some problems in games without adjudication (I use no adjudication, as Leela can blunder even in late endgames). When an engine searches in stalemate/mate positions, the GUI sometimes reports 'best move none' and counts it as an "illegal move" (in the "i" row). Also, the 50-move rule is not always enforced correctly in LittleBlitzer. Maybe someone can study these problems in more detail and send a report to the developer of LittleBlitzer.
So, in Cutechess-Cli, in the same gauntlet against AB engines at same short time control:
ID395 (15x192 from mainserver): 101.5/200
ID9149 (6x64 from testserver): 98.0/200
They are basically within error margins of each other under these conditions. Maybe they scale differently, though; I don't know. Both scale better than AB engines of similar strength.
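As a rough check on "within error margins", here is a small sketch that estimates a 95% margin for a 200-game score. The draw counts are not given in the thread, so it uses the no-draws binomial variance p(1-p), which is an upper bound (draws only shrink the variance). The function name `score_margin` is hypothetical, and the calculation is a normal approximation, not the exact method used in the gauntlet.

```python
import math

def score_margin(points, games, z=1.96):
    """Conservative margin (in points) around a match score.

    Assumption: per-game variance is bounded by p * (1 - p), i.e. the
    no-draw case. Real margins are somewhat tighter when there are draws.
    """
    p = points / games
    standard_error = math.sqrt(p * (1.0 - p) / games)
    return z * standard_error * games

# The two Cutechess-Cli gauntlet results quoted above.
for name, points in [("ID395", 101.5), ("ID9149", 98.0)]:
    margin = score_margin(points, 200)
    print(f"{name}: {points}/200, margin about +/- {margin:.1f} points")
```

With 200 games the margin is roughly 14 points either way, so a gap of 3.5 points between the two nets says very little about which one is stronger.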
I would guess that the larger the net, the more Elo it gains with more time. But there is this from the lc0 forum:
I did a quick run of 9142, and in a small sample of just 70 games versus the same lineup of engines I posted earlier, it scored just over 35% for an Elo performance of about 3220. I have 390 and 395 at 3324, so the test net for this run is also doing incredibly well, given that it had only played 7.2 million games to get to 9142.