CUDA benchmarks

corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

CUDA benchmarks

Post by corres »

There is a list of CUDA benchmarks for GPUs: https://browser.geekbench.com/cuda-benchmarks
According to this list, the GTX 1080 Ti (used by TCEC 13) has a ~1.7 times higher CUDA benchmark score than the GTX 1060 (the "standard" GPU for LC0 testers).
How large is the Elo difference between a GTX 1080 Ti and a GTX 1060, supposing the two GPUs run on similar PCs?
I think LC0 running on the TCEC hardware uses only two threads, as it does on our PCs. Am I right?
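
One rough way to frame the Elo question is the usual rule of thumb that an engine gains a roughly fixed number of Elo per doubling of speed. The sketch below applies that to the ~1.7x benchmark ratio. The Elo-per-doubling values are purely hypothetical placeholders, not measurements for LC0, and a CUDA benchmark ratio does not have to match the real nps ratio.

```python
import math

def elo_gain_from_speedup(speed_ratio, elo_per_doubling):
    """Estimate Elo gain assuming a constant gain per doubling of speed."""
    if speed_ratio <= 0:
        raise ValueError("speed_ratio must be positive")
    return elo_per_doubling * math.log2(speed_ratio)

ratio = 1.7  # GTX 1080 Ti vs GTX 1060, from the Geekbench CUDA list
for k in (50, 75, 100):  # hypothetical Elo-per-doubling values
    print(f"{k} Elo/doubling -> {elo_gain_from_speedup(ratio, k):.0f} Elo")
```

Since log2(1.7) is about 0.77, a 1.7x speedup is worth roughly three quarters of whatever one doubling is worth under this assumption.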
CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: CUDA benchmarks

Post by CMCanavessi »

In TCEC, lc0 is using 4 threads, 2 for each GPU.
Follow my tournament and some Leela gauntlets live at http://twitch.tv/ccls
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: CUDA benchmarks

Post by corres »

CMCanavessi wrote: Sun Aug 12, 2018 6:36 pm In TCEC, lc0 is using 4 threads, 2 for each GPU.
OK.
And they use the two GPUs independently.
They need the two GPUs only to let the two NN-based engines (LC0 and DeusX) play against each other.
Is that right?
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: CUDA benchmarks

Post by corres »

corres wrote: Sun Aug 12, 2018 7:21 pm
OK.
And they use the two GPUs independently.
They need the two GPUs only to let the two NN-based engines (LC0 and DeusX) play against each other.
Is that right? (Right!)
The TCEC 13 GPU server pairs the two GTX 1080 Ti cards with a Xeon E5-2609v4 running at 1.7 GHz.
Considering the above, Leela Chess Zero and DeusX on TCEC effectively have only the same computing power as they would have on a laptop with a 1.7 GHz CPU and a GTX 1080 Ti.
Somebody said the TCEC GPU server has ~35% of the power that AlphaZero's 4 TPUs had.
I do not believe that 4 TPUs (even first-generation ones) are only three times more powerful than a weak laptop with a GTX 1080 Ti.
Nay Lin Tun
Posts: 708
Joined: Mon Jan 16, 2012 6:34 am

Re: CUDA benchmarks

Post by Nay Lin Tun »

corres wrote: Mon Aug 13, 2018 8:54 am
The TCEC 13 GPU server pairs the two GTX 1080 Ti cards with a Xeon E5-2609v4 running at 1.7 GHz.
Considering the above, Leela Chess Zero and DeusX on TCEC effectively have only the same computing power as they would have on a laptop with a 1.7 GHz CPU and a GTX 1080 Ti.
Somebody said the TCEC GPU server has ~35% of the power that AlphaZero's 4 TPUs had.
I do not believe that 4 TPUs (even first-generation ones) are only three times more powerful than a weak laptop with a GTX 1080 Ti.
Google's 4 TPUs would probably be 20x more powerful than 2x 1080 Ti. However, the Leela ratio is based on the SF vs A0 search speed ratio in their match. With the Lc0 crem compile, it is about 8x faster than the previous Leela Zero. So in practical speed terms, the Leela ratio becomes 0.35 (it would be 0.043 with Leela Zero).
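
The relation between those two ratios can be checked in a few lines. This is only a sketch using the numbers quoted in the post (0.35 and the ~8x speedup); `ratio_without_speedup` is a made-up helper name, and it assumes the ratio scales linearly with speed.

```python
def ratio_without_speedup(current_ratio, speedup):
    """Scale a speed ratio back down by the given speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return current_ratio / speedup

lc0_ratio = 0.35  # Leela ratio with Lc0, as quoted in the post
speedup = 8.0     # Lc0 vs the previous Leela Zero, as quoted in the post

old_ratio = ratio_without_speedup(lc0_ratio, speedup)
print(f"Leela Zero ratio: {old_ratio:.5f}")  # 0.35 / 8
```

0.35 / 8 is 0.04375, which is close to the 0.043 figure given for Leela Zero.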
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: CUDA benchmarks

Post by corres »

Nay Lin Tun wrote: Mon Aug 13, 2018 4:05 pm
Google's 4 TPUs would probably be 20x more powerful than 2x 1080 Ti. However, the Leela ratio is based on the SF vs A0 search speed ratio in their match. With the Lc0 crem compile, it is about 8x faster than the previous Leela Zero. So in practical speed terms, the Leela ratio becomes 0.35 (it would be 0.043 with Leela Zero).
As far as I know, LC0 can use only one GPU, so it is unnecessary to compare the 4 TPUs to 2x GTX 1080 Ti.
The difference in Elo depends on a lot of other factors, not only on speed.
In the case of AB engines this has been obvious for a long time.
The results of LC0 in Div. 3 prove this holds for an NN-based engine too.
So a comparison between the 4 TPUs and the GTX 1080 Ti based only on speed is very misleading.
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: CUDA benchmarks

Post by Milos »

Nay Lin Tun wrote: Mon Aug 13, 2018 4:05 pm Google's 4 TPUs would probably be 20x more powerful than 2x 1080 Ti. However, the Leela ratio is based on the SF vs A0 search speed ratio in their match. With the Lc0 crem compile, it is about 8x faster than the previous Leela Zero. So in practical speed terms, the Leela ratio becomes 0.35 (it would be 0.043 with Leela Zero).
We don't know how the TF backend for TPUs works. We don't even know what size of net A0 was using. We don't know much. We only know that 80 knps number and that's it.
OTOH, we know that once you start CUDA backend multiplexing you start losing performance, and the more GPUs you multiplex, the more performance you lose.
So all those speculations about ratios are nothing but very wild guesses.
CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: CUDA benchmarks

Post by CMCanavessi »

corres wrote: Mon Aug 13, 2018 5:34 pm
As far as I know, LC0 can use only one GPU, so it is unnecessary to compare the 4 TPUs to 2x GTX 1080 Ti.
The difference in Elo depends on a lot of other factors, not only on speed.
In the case of AB engines this has been obvious for a long time.
The results of LC0 in Div. 3 prove this holds for an NN-based engine too.
So a comparison between the 4 TPUs and the GTX 1080 Ti based only on speed is very misleading.
You're wrong, lc0 can use whatever number of GPUs you can throw at it. It's using 2 in TCEC, and it was using 8x V100 in the WCCC for a couple of rounds.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: CUDA benchmarks

Post by corres »

CMCanavessi wrote: Mon Aug 13, 2018 7:21 pm
corres wrote: Mon Aug 13, 2018 5:34 pm
As far as I know, LC0 can use only one GPU, so it is unnecessary to compare the 4 TPUs to 2x GTX 1080 Ti.
The difference in Elo depends on a lot of other factors, not only on speed.
In the case of AB engines this has been obvious for a long time.
The results of LC0 in Div. 3 prove this holds for an NN-based engine too.
So a comparison between the 4 TPUs and the GTX 1080 Ti based only on speed is very misleading.
You're wrong, lc0 can use whatever number of GPUs you can throw at it. It's using 2 in TCEC, and it was using 8x V100 in the WCCC for a couple of rounds.
Sorry, but the available information about LC0 is rather incomplete.
I do not know of any LC0 benchmark that uses more than one GPU.
The version of LC0 that participated in the WCCC is unknown.
The version of LC0 used in TCEC 13 is 16(?).
From Github.com we can download LC0 v0.16.0.
What is the connection between these LC0 versions?
I would like your opinion about the CPU frequency of the remote server.
How much effect does the CPU frequency have on the playing strength of LC0?
Why not use more than two threads for LC0?
CMCanavessi
Posts: 1142
Joined: Thu Dec 28, 2017 4:06 pm
Location: Argentina

Re: CUDA benchmarks

Post by CMCanavessi »

corres wrote: Mon Aug 13, 2018 10:38 pm I would like your opinion about the CPU frequency of the remote server.
How much effect does the CPU frequency have on the playing strength of LC0?
Why not use more than two threads for LC0?
1) Almost none if you're using a GPU.
2) Because 2 are enough to drive the GPU to 100%. If you use 2 GPUs like in TCEC, then 4 threads are ideal.

0.16.0 is the latest official release. The one currently playing in TCEC is an experimental version with some improvements that will probably make it into the next official release.
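
The thread counts in point 2 come down to "2 threads per GPU". Here is a small sketch of that rule, plus a helper that builds a per-GPU option string for the multiplexing backend. Both function names are made up for illustration, and the `(backend=...,gpu=N)` form is my assumption about lc0's option syntax, so check `lc0 --help` for your version before relying on it.

```python
def recommended_threads(num_gpus, threads_per_gpu=2):
    """Threads to use, assuming 2 threads saturate one GPU (per the post)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return num_gpus * threads_per_gpu

def multiplexing_backend_opts(num_gpus, backend="cudnn"):
    """Build an option string listing one sub-backend per GPU.

    The '(backend=...,gpu=N)' form is an assumed syntax, not verified
    against any particular lc0 release.
    """
    return ",".join(f"(backend={backend},gpu={i})" for i in range(num_gpus))

for gpus in (1, 2):
    print(gpus, "GPU(s):", recommended_threads(gpus), "threads,",
          multiplexing_backend_opts(gpus))
```

For the TCEC setup with 2 GPUs this gives 4 threads, matching the numbers above.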