CCRL 40/40, 40/4 and FRC lists updated (23rd November 2019)

Modern Times · Post by **Modern Times** » Tue Nov 26, 2019 10:38 pm

Eduard wrote: ↑Tue Nov 26, 2019 8:22 pm Anyone who conscientiously tests Lc0 against AB Engines knows, there is a Lc0 Ratio for reliable results.

The Leela ratio is very useful for comparing two machines - Machine A with GPU A and CPU A vs machine B with GPU B and CPU B. If the Leela ratio on machine A is 1.0 and on machine B it is 2.5, then you know that Lc0 is going to be a lot better on machine B. But as an absolute measure for determining equivalence of GPU and CPU on one machine - that is very debatable. There have been extensive discussions about this on this forum. Some people say you should use cost to measure equivalence, some say use power consumption, some swear by the Leela ratio. Personally, I think it is a general guide, and just one of many factors to consider. And not all NN engines behave the same either.

pohl4711 · Post by **pohl4711** » Wed Nov 27, 2019 10:34 am

Eduard wrote: ↑Tue Nov 26, 2019 8:22 pm Anyone who conscientiously tests Lc0 against AB Engines knows, there is a Lc0 Ratio for reliable results. And CCRL knows that too. Nevertheless, CCRL does not release any information about it.

Agreed. Any testing of NN-engines vs AB-engines without information about the Leela-Ratio makes the results totally useless. And I believe the CCRL Leela-Ratio of the Fat Fritz testrun could be ridiculously high (2 - 2.5 or even worse) - that is the only explanation for Fat Fritz being at the same Elo-level as a new Stockfish-Dev.
Somewhere around 1.0 (between 0.7 and 1.3) is an acceptable Leela-Ratio for testing.

AndrewGrant · Post by **AndrewGrant** » Wed Nov 27, 2019 12:19 pm

pohl4711 wrote: ↑Wed Nov 27, 2019 10:34 am
Eduard wrote: ↑Tue Nov 26, 2019 8:22 pm Anyone who conscientiously tests Lc0 against AB Engines knows, there is a Lc0 Ratio for reliable results. And CCRL knows that too. Nevertheless, CCRL does not release any information about it.
Agreed. Any testing of NN-engine vs AB-engines without information about the Leela-Ratio makes the results total useless. And I believe, the CCRL Leela-Ratio of the Fat Fritz testrun could be ridiculous high (2 - 2.5 or even worse) - only explanation for Fat Fritz being at the same Elo-level as a new Stockfish-Dev.
Somewhat around 1.0 (between 0.7 and 1.3) is an acceptable Leela-Ratio for testing.

Well, it's RTX2080 vs Athlon 64 X2 4600+ (2.4 GHz) on 4 cores. It does not require much analysis to see that the results are not as meaningful as others present them.

Graham Banks · Post by **Graham Banks** » Wed Nov 27, 2019 12:44 pm

AndrewGrant wrote: ↑Wed Nov 27, 2019 12:19 pm
pohl4711 wrote: ↑Wed Nov 27, 2019 10:34 am
Eduard wrote: ↑Tue Nov 26, 2019 8:22 pm Anyone who conscientiously tests Lc0 against AB Engines knows, there is a Lc0 Ratio for reliable results. And CCRL knows that too. Nevertheless, CCRL does not release any information about it.
Agreed. Any testing of NN-engine vs AB-engines without information about the Leela-Ratio makes the results total useless. And I believe, the CCRL Leela-Ratio of the Fat Fritz testrun could be ridiculous high (2 - 2.5 or even worse) - only explanation for Fat Fritz being at the same Elo-level as a new Stockfish-Dev.
Somewhat around 1.0 (between 0.7 and 1.3) is an acceptable Leela-Ratio for testing.
Well its RTX2080 vs Athlon 64 X2 4600+ (2.4 GHz) on 4 cores. Does not require much analysis to see that the results are not as meaningful as others cast present them.

We mainly use modern Intel and AMD computers for testing. Nobody uses an Athlon 64 X2 4600+ (2.4 GHz). All of our computers are just benchmarked to that CPU.

AndrewGrant · Post by **AndrewGrant** » Wed Nov 27, 2019 1:05 pm

Graham Banks wrote: ↑Wed Nov 27, 2019 12:44 pm
AndrewGrant wrote: ↑Wed Nov 27, 2019 12:19 pm
pohl4711 wrote: ↑Wed Nov 27, 2019 10:34 am
Eduard wrote: ↑Tue Nov 26, 2019 8:22 pm Anyone who conscientiously tests Lc0 against AB Engines knows, there is a Lc0 Ratio for reliable results. And CCRL knows that too. Nevertheless, CCRL does not release any information about it.
Agreed. Any testing of NN-engine vs AB-engines without information about the Leela-Ratio makes the results total useless. And I believe, the CCRL Leela-Ratio of the Fat Fritz testrun could be ridiculous high (2 - 2.5 or even worse) - only explanation for Fat Fritz being at the same Elo-level as a new Stockfish-Dev.
Somewhat around 1.0 (between 0.7 and 1.3) is an acceptable Leela-Ratio for testing.
Well its RTX2080 vs Athlon 64 X2 4600+ (2.4 GHz) on 4 cores. Does not require much analysis to see that the results are not as meaningful as others cast present them.
We use mainly modern Intel and AMD computers for testing. Nobody uses a Athlon 64 X2 4600+ (2.4 GHz). All of our computers are just benchmarked to that CPU.

Well yeah, I never thought otherwise. But if it's scaled to match, it's "akin" to playing on an Athlon. Which is perfectly fine. CCRL is explicit about how they scale things. However, readers of the list don't always appreciate that.

pohl4711 · Post by **pohl4711** » Wed Nov 27, 2019 2:09 pm

Graham Banks wrote: ↑Wed Nov 27, 2019 12:44 pm
AndrewGrant wrote: ↑Wed Nov 27, 2019 12:19 pm
pohl4711 wrote: ↑Wed Nov 27, 2019 10:34 am
Eduard wrote: ↑Tue Nov 26, 2019 8:22 pm Anyone who conscientiously tests Lc0 against AB Engines knows, there is a Lc0 Ratio for reliable results. And CCRL knows that too. Nevertheless, CCRL does not release any information about it.
Agreed. Any testing of NN-engine vs AB-engines without information about the Leela-Ratio makes the results total useless. And I believe, the CCRL Leela-Ratio of the Fat Fritz testrun could be ridiculous high (2 - 2.5 or even worse) - only explanation for Fat Fritz being at the same Elo-level as a new Stockfish-Dev.
Somewhat around 1.0 (between 0.7 and 1.3) is an acceptable Leela-Ratio for testing.
Well its RTX2080 vs Athlon 64 X2 4600+ (2.4 GHz) on 4 cores. Does not require much analysis to see that the results are not as meaningful as others cast present them.
We use mainly modern Intel and AMD computers for testing. Nobody uses a Athlon 64 X2 4600+ (2.4 GHz). All of our computers are just benchmarked to that CPU.

But what is the benchmark (Leela-Ratio) of your NN-testings??????

Graham Banks · Post by **Graham Banks** » Wed Nov 27, 2019 3:53 pm

pohl4711 wrote: ↑Wed Nov 27, 2019 2:09 pm.........But what is the benchmark (Leela-Ratio) of your NN-testings??????

Not sure offhand, as I didn't do the testing. Will try to find out.
However, the following was posted in Lc0 discord:

To be fair, Ray is correct, the ratio is irrelevant for a ratings list.
If Leela crushes Fire on one 2080 vs 1-cpu it is a result. And if it loses to SF14 on 32 cores, it is another. The results are all correct, whether some wins or losses end up lopsided.
On the other hand it must be said that if the top CPU testing in that list is 4 cores, the NNs will easily dominate all the top spots.
The ratio concept is really only about 2 things:
1 - Having a common baseline with AlphaZero to be able to compare to its published results. This is the primary reason for using that specific ratio, no other.
2 - Find a common baseline for everyone to use so that results can be compared across a variety of computer setups. Otherwise comparing the results of one person with another would be utter chaos. It still is, but at least this helps.
AlphaZero's ratio is merely chosen for no.2 to align both

crem · Post by **crem** » Wed Nov 27, 2019 4:10 pm

I couldn't find many details in the description on the CCRL site, so I'm just guessing and hoping that someone either confirms or rejects these claims.

So, here are my guesses about the CCRL rating:

1. It's driven by multiple volunteers who donate their computer's time to run games.

2. As CPUs differ between volunteers, a Crafty-based benchmark is run to determine the factor which is used to scale the time control to match the contributor's CPU to an Athlon 64 X2 4600+ (2.4 GHz).

3. If a volunteer also happens to have an RTX 2080 GPU, games vs NN-based engines are run on this machine, and they are just given the same time as CPU engines. (So how much GPU time an engine gets depends only on the speed of the CPU, not the GPU.)

4. If there is more than one GPU machine, it may happen that the same NN-based engine running on two different machines with the same GPU will get different amounts of GPU time.
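
To make guesses 2 to 4 concrete, here is a minimal Python sketch of that kind of scaling. It assumes the benchmark result is a run time in seconds and that the time control is scaled linearly by the ratio of run times; the function name and all numbers are hypothetical, not CCRL's actual procedure.

```python
def scaled_time_seconds(reference_seconds: float,
                        reference_bench_seconds: float,
                        machine_bench_seconds: float) -> float:
    """Scale a time control defined on the reference CPU to a tester's machine.

    Assumption: a machine that finishes the benchmark in less time gets
    proportionally less thinking time per game.
    """
    factor = machine_bench_seconds / reference_bench_seconds
    return reference_seconds * factor


# Hypothetical numbers: the reference CPU needs 100 s for the benchmark,
# machine A needs 37.5 s and machine B needs 50 s.
time_a = scaled_time_seconds(240, 100.0, 37.5)  # 40/4 -> 240 s reference
time_b = scaled_time_seconds(240, 100.0, 50.0)

# An NN engine on the same RTX 2080 would get time_a on machine A and
# time_b on machine B: its GPU time follows the CPU factor only.
print(time_a, time_b)
```

With these made-up numbers, the same GPU gets 90 s of thinking time on machine A and 120 s on machine B, which is the effect described in guess 4.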

pohl4711 · Post by **pohl4711** » Wed Nov 27, 2019 4:14 pm

Graham Banks wrote: ↑Wed Nov 27, 2019 3:53 pm To be fair, Ray is correct, the ratio is irrelevant for a ratings list.

That is complete nonsense. It is the benchmark.
Otherwise you can test lc0 on 1 CPU-core vs Stockfish and other opponents on a Threadripper and say: Oh, look. Stockfish crushes lc0 and lc0 is so bad, only 2700 Elo.
Or Stockfish and other opponents on a single core vs lc0 on 2 RTX 2080. And say: Oh, look, lc0 crushes Stockfish and lc0 is the clear world's number one and has 3500 Elo.
Ridiculous. Or not? Without Leela-Ratio, both "tests" would be valid...And that is why the Leela-Ratio must be in a 0.7-1.3 range.

For CCRL this would mean: Check the number of n/s that lc0 calculates on your system (RTX 2080 (should be around 36000 n/s in the starting position)) using a T10 or T30 network (size must be 256x20!!!). Then multiply the result by 875. And then you have your bench, as if Stockfish were running on the CPU. And then you can calibrate lc0 like an AB-engine for your CCRL testing ("Equivalent to 40 moves in 4 minutes on Athlon 64 X2 4600+ (2.4 GHz), about 1.5 minutes on a modern CPU.")... otherwise your testing results are just nonsense, because at the moment you calibrate all AB-engines, but not the NN-engines. Like now (Fat Fritz number 1 on CCRL). Fat Fritz is not better than Stockfish-Dev. And definitely not number 1. Not on this planet.
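
The arithmetic in this post can be sketched in a few lines of Python. The factor 875 and the 0.7-1.3 range come from the post; expressing the Leela-Ratio as the Stockfish-equivalent n/s divided by the AB-engine's measured n/s is an assumption about how the ratio is formed, and the AB-engine speeds below are hypothetical.

```python
LEELA_FACTOR = 875  # multiplier from the post: lc0 n/s -> Stockfish-equivalent n/s


def stockfish_equivalent_nps(lc0_nps: float) -> float:
    """Convert lc0 nodes per second into a Stockfish-equivalent figure."""
    return lc0_nps * LEELA_FACTOR


def leela_ratio(lc0_nps: float, ab_nps: float) -> float:
    """Assumed definition: (lc0 n/s * 875) / (AB-engine n/s)."""
    return stockfish_equivalent_nps(lc0_nps) / ab_nps


def ratio_acceptable(ratio: float, low: float = 0.7, high: float = 1.3) -> bool:
    """Check the 0.7-1.3 range suggested in the post."""
    return low <= ratio <= high


lc0_nps = 36_000  # RTX 2080 figure quoted in the post
print(stockfish_equivalent_nps(lc0_nps))  # 31500000

# Hypothetical AB-engine speeds on two different CPUs.
for ab_nps in (31_500_000, 12_600_000):
    r = leela_ratio(lc0_nps, ab_nps)
    print(ab_nps, r, ratio_acceptable(r))
```

In this sketch the second, slower CPU gives a ratio of 2.5, which falls outside the suggested range.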

crem · Post by **crem** » Wed Nov 27, 2019 4:35 pm

pohl4711 wrote: ↑Wed Nov 27, 2019 4:14 pm
Fo CCRL this would mean: Checkout the number n/s, lc0 calculates on your system (RTX 2080 (should be around 36000 n/s in starting position)) using a T10 or T30 network (size must be 256x20!!!). Then multiply the result with 875. And then you have your bench, as if Stockfish would run on CPU. And then you can calibrate lc0 like an AB-Engine for your CCRL-testing ("Equivalent to 40 moves in 4 minutes on Athlon 64 X2 4600+ (2.4 GHz), about 1.5 minutes on a modern CPU.")... otherwise your testing-results are just nonsense, because at the moment you calibrate all AB-engines, but the NN-engines not. Like now (Fat Fritz number 1 on CCRL). Fat Fritz is not better than Stockfish-Dev. And definitly not number 1. Not on this planet.

They should also fix the Lc0 version, weights file, etc. for those tests, similar to what they do with Crafty.

Otherwise there is e.g. no point in trying to speed Lc0 up, as any nps improvement will be offset by giving CPU-based opponents more time.
And the opposite: if an NN-based engine starts to use a large and slow neural network, it doesn't mean CPU engines should be slowed down too.
