SPCC: Testrun of Fat Fritz 1.0 finished

Posted: Mon Nov 18, 2019 5:16 am
by pohl4711
Testrun finished: Fat Fritz 1.0

https://www.sp-cc.de

(You may have to clear your browser cache or reload the website.)

Re: SPCC: Testrun of Fat Fritz 1.0 finished

Posted: Mon Nov 18, 2019 12:11 pm
by pohl4711
pohl4711 wrote:
Mon Nov 18, 2019 5:16 am
Testrun finished: Fat Fritz 1.0
I decided to replay the head-to-head match between Fat Fritz and Stockfish 190622 (500 games, HERT_250 openings, 50''+500ms), which was part of the 3000-game testrun of Fat Fritz, with 4x longer thinking time (200''+2''). This raises the average game duration from 3 minutes to 12-13 minutes. I am doing this because I want to end the discussion about whether Fat Fritz (or lc0) benefits so much more from longer thinking time than Stockfish does.

Result of the 50''+500ms testrun was:
Fat Fritz 1.0 vs. Stockfish 190622 bmi2 : 500 (+ 74,=318,-108), 46.6 % (Draws: 63.6%)
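
For readers who want to check the percentages in the result line above, here is a minimal sketch of the usual scoring convention (a win counts 1 point, a draw 0.5). The function name `summarize` is made up for illustration and is not part of any testing tool used in this thread.

```python
def summarize(wins, draws, losses):
    """Return (score %, draw %) for a match, seen from the first engine's side."""
    games = wins + draws + losses
    # A win is worth 1 point, a draw half a point.
    score = (wins + 0.5 * draws) / games
    draw_rate = draws / games
    return round(100 * score, 1), round(100 * draw_rate, 1)


# Fat Fritz 1.0 vs. Stockfish 190622 at 50''+500ms: +74 =318 -108
print(summarize(74, 318, 108))
```

With the numbers from the post this gives 46.6 % score and 63.6 % draws, matching the quoted result.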

What should we expect from the rematch with 4x more time (and 4x bigger Hash/NNCacheSize)? The draw rate will increase, which could push the result towards 50%-50%. That creates the illusion that Fat Fritz is getting stronger, even though only the higher number of draws is responsible (this effect has been well known in computer chess for more than 30 years!). If Fat Fritz really gets stronger with more thinking time, it should climb above the 50% level, i.e. it should beat Stockfish 190622. I doubt that, but in 4-5 days we will have the result.

Re: SPCC: Testrun of Fat Fritz 1.0 finished

Posted: Mon Nov 18, 2019 4:18 pm
by schack
(Ignore. Misread post.)

Re: SPCC: Testrun of Fat Fritz 1.0 finished

Posted: Mon Nov 18, 2019 5:43 pm
by sovaz1997
schack wrote:
Mon Nov 18, 2019 4:18 pm
(Ignore. Misread post.)
Wdum?

Re: SPCC: Testrun of Fat Fritz 1.0 finished

Posted: Mon Nov 18, 2019 6:05 pm
by schack
I asked a question, but it was based on a misreading of the above. So I edited to remove the question. If I could delete it, I would.

Re: SPCC: Testrun of Fat Fritz 1.0 finished

Posted: Tue Nov 19, 2019 5:34 am
by JJJ
So it is weaker than Stockfish and lczero!

Re: SPCC: Testrun of Fat Fritz 1.0 finished

Posted: Thu Nov 21, 2019 12:24 pm
by pohl4711
Because I need my machines for other things, I aborted the testrun with 4x more time after 250 of 500 games. The result so far is exactly what I expected:
200''+2000ms:
Fat Fritz 1.0 vs. Stockfish 190622 bmi2 : 250 (+ 35,=177,- 38), 49.4 % (Draws: 70.8%)
A measurably higher draw rate, which pushes the result closer to the 50%-50% score (a +7% higher draw rate should push the result around 3-3.5% closer to 50%-50%. That is exactly what we see.)

QED
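
The size of the pure draw-rate effect can be estimated under one simple assumption: the split of decisive games (wins vs. losses) stays the same as in the short match and only the draw rate changes. This is just one possible model and not necessarily the one the poster had in mind; `diluted_score` is a hypothetical helper name.

```python
def diluted_score(wins, losses, new_draw_rate):
    """Score if the win share among decisive games stays fixed
    while the draw rate changes to new_draw_rate (assumed model)."""
    decisive_share = wins / (wins + losses)
    # Draws contribute half a point each; decisive games contribute
    # their win share.
    return 0.5 * new_draw_rate + (1.0 - new_draw_rate) * decisive_share


# Short time control: +74 -108 with 318 draws in 500 games.
short = diluted_score(74, 108, 318 / 500)
# Same decisive split, but with the long-TC draw rate of 177/250.
longer = diluted_score(74, 108, 177 / 250)
print(f"{100 * short:.1f}% -> {100 * longer:.1f}%")
```

Under this assumption the score moves from 46.6 % to roughly 47.3 %, compared with the 49.4 % actually observed in the 250 games at 200''+2000ms.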

Re: SPCC: Testrun of Fat Fritz 1.0 finished

Posted: Thu Nov 21, 2019 5:09 pm
by lkaufman
I would draw the opposite conclusion from your results. If we discard the draws, the score went from 74-108 to 35-38, a very marked improvement (though the error margins are large). Probably one more doubling in TC would put Fat Fritz ahead of SF under your test conditions. I don't have a dog in that dogfight, but I will say that I do rather like Fat Fritz and I have the subjective impression that it needs more time than Lc0 to shine.
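
The "discard the draws" comparison and its error margins can be sketched as follows. This uses a plain normal approximation for a binomial proportion and treats games as independent, which is an assumption (games played from paired openings are not strictly independent); `decisive_share` is an illustrative name.

```python
import math


def decisive_share(wins, losses, z=1.96):
    """Win share among decisive games plus an approximate 95% half-width
    (normal approximation, games assumed independent)."""
    n = wins + losses
    p = wins / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p, half_width


for label, w, l in [("50''+500ms", 74, 108), ("200''+2000ms", 35, 38)]:
    p, hw = decisive_share(w, l)
    print(f"{label}: {100 * p:.1f}% +/- {100 * hw:.1f}")
```

With these inputs Fat Fritz's share of decisive games comes out at about 40.7 % (74 of 182) and 47.9 % (35 of 73), with half-widths of roughly 7 and 11 percentage points, so the two intervals overlap.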

Re: SPCC: Testrun of Fat Fritz 1.0 finished

Posted: Thu Nov 21, 2019 7:19 pm
by Raphexon
I think NNs scale very asymptotically.
At low nodes per move they can suffer badly from a tactical horizon effect.
But once they get past that, they don't really change their mind anymore.

Re: SPCC: Testrun of Fat Fritz 1.0 finished

Posted: Thu Nov 21, 2019 8:18 pm
by Dann Corbit
I get a big difference between 12 minutes (720 seconds) per position and 3000 seconds per position when analyzing opening positions.
So I think they are like other engines and do better with more time.