CCC has serious hardware update!

Jouni
Posts: 3281
Joined: Wed Mar 08, 2006 8:15 pm

CCC has serious hardware update!

Post by Jouni »

CPUs: 2 x AMD EPYC 7H12
GPU: 2x A100 (40 GB GPU memory)
Cores: 256 threads (128 physical cores)
RAM: 512GB DIMM DDR4 2933 MHz (0.3 ns)
SSD: 2x Micron 5210 MTFD (2TB) in RAID1
OS: CentOS 8

What's TCEC's answer :) ?
Jouni
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: CCC has serious hardware update!

Post by mwyoung »

Jouni wrote: Tue Dec 29, 2020 9:30 am
CPUs: 2 x AMD EPYC 7H12
GPU: 2x A100 (40 GB GPU memory)
Cores: 256 threads (128 physical cores)
RAM: 512GB DIMM DDR4 2933 MHz (0.3 ns)
SSD: 2x Micron 5210 MTFD (2TB) in RAID1
OS: CentOS 8

What's TCEC's answer :) ?
Why do they need to answer? Both have massive hardware. Will it change any results?

If you look, even on our much smaller hardware the ranking of the engines does not really change.

And this is also what I am seeing in my testing.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: CCC has serious hardware update!

Post by Ras »

Jouni wrote: Tue Dec 29, 2020 9:30 am
OS: CentOS 8
That's probably not the best choice, given that the EOL date for CentOS 8 has just been brought forward from 2029 to 2021, after which CentOS will become a rolling-release testbed. CentOS 7 is still supported until 2024, provided that IBM/Red Hat doesn't axe that, too.
Rasmus Althoff
https://www.ct800.net
brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Re: CCC has serious hardware update!

Post by brianr »

As with the TCEC rig, the GPUs are very nice, but unfortunately the slow CPUs are a fairly severe handicap for GPU engines like Lc0, which do best with a small number of very fast CPUs. Of course, one could say that it is up to Lc0 to improve its CPU code to better utilize more CPUs, but this is proving to be extremely difficult. Apparently, it is a complex topic in that more CPUs can increase the nps, yet the playing strength goes down after N+1 CPUs where N is the number of GPUs. It has to do with assembling full batches of work for the GPUs. The experts on the Leela Discord can explain it better. Thus a few 5GHz CPUs for Lc0 are far better than 64+ slower ones. In any case, even if there were no significant hardware handicap, Leela is still currently far behind SF-NNUE.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: CCC has serious hardware update!

Post by Dann Corbit »

brianr wrote: Tue Dec 29, 2020 12:14 pm
As with the TCEC rig, the GPUs are very nice, but unfortunately the slow CPUs are a fairly severe handicap for GPU engines like Lc0, which do best with a small number of very fast CPUs. Of course, one could say that it is up to Lc0 to improve its CPU code to better utilize more CPUs, but this is proving to be extremely difficult. Apparently, it is a complex topic in that more CPUs can increase the nps, yet the playing strength goes down after N+1 CPUs where N is the number of GPUs. It has to do with assembling full batches of work for the GPUs. The experts on the Leela Discord can explain it better. Thus a few 5GHz CPUs for Lc0 are far better than 64+ slower ones. In any case, even if there were no significant hardware handicap, Leela is still currently far behind SF-NNUE.
TCEC uses four of these CPUs:
https://ark.intel.com/content/www/us/en ... 0-ghz.html
which run at 2.2 GHz.

Whereas the 7H12:
https://www.amd.com/en/products/cpu/amd-epyc-7h12
runs at 2.6 GHz.

Hence the CCC cores are 2.6/2.2 * 100 = 118% of the speed, or 18% faster.

On the other hand, TCEC uses 4x V100 GPUs whereas CCC is using 2x A100 GPUs.
I do not know enough about the difference between the A100 and V100 to know which system has the advantage there.
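
For anyone who wants to redo the clock-ratio arithmetic above, here is a minimal Python sketch. The helper name relative_speed_percent is made up for illustration, and the calculation assumes speed scales linearly with base clock, which ignores IPC, boost clocks, and core counts.

```python
def relative_speed_percent(clock_ghz: float, baseline_ghz: float) -> float:
    """Return clock_ghz as a percentage of baseline_ghz (hypothetical helper).

    Assumption: per-core speed is proportional to base clock only.
    """
    if baseline_ghz <= 0:
        raise ValueError("baseline clock must be positive")
    return clock_ghz / baseline_ghz * 100.0


# Base clocks quoted in the post: EPYC 7H12 at 2.6 GHz, TCEC Xeon at 2.2 GHz.
ccc_vs_tcec = relative_speed_percent(2.6, 2.2)
print(f"CCC cores: {ccc_vs_tcec:.0f}% of TCEC clock "
      f"({ccc_vs_tcec - 100:.0f}% faster)")
```

Swapping in other base clocks gives the same kind of back-of-the-envelope figure; it is only a first approximation of real engine speed.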
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Raphexon
Posts: 476
Joined: Sun Mar 17, 2019 12:00 pm
Full name: Henk Drost

Re: CCC has serious hardware update!

Post by Raphexon »

Dann Corbit wrote: Tue Dec 29, 2020 12:47 pm
brianr wrote: Tue Dec 29, 2020 12:14 pm
As with the TCEC rig, the GPUs are very nice, but unfortunately the slow CPUs are a fairly severe handicap for GPU engines like Lc0, which do best with a small number of very fast CPUs. Of course, one could say that it is up to Lc0 to improve its CPU code to better utilize more CPUs, but this is proving to be extremely difficult. Apparently, it is a complex topic in that more CPUs can increase the nps, yet the playing strength goes down after N+1 CPUs where N is the number of GPUs. It has to do with assembling full batches of work for the GPUs. The experts on the Leela Discord can explain it better. Thus a few 5GHz CPUs for Lc0 are far better than 64+ slower ones. In any case, even if there were no significant hardware handicap, Leela is still currently far behind SF-NNUE.
TCEC uses four of these CPUs:
https://ark.intel.com/content/www/us/en ... 0-ghz.html
which run at 2.2 GHz.

Whereas the 7H12:
https://www.amd.com/en/products/cpu/amd-epyc-7h12
runs at 2.6 GHz.

Hence the CCC cores are 2.6/2.2 * 100 = 118% of the speed, or 18% faster.

On the other hand, TCEC uses 4x V100 GPUs whereas CCC is using 2x A100 GPUs.
I do not know enough about the difference between the A100 and V100 to know which system has the advantage there.
The 7H12 has far higher IPC than the TCEC Xeon.
So it's not just 18% faster, it's faster than that. (With the exception of PEXT, because that is microcoded, lol.)


A single A100 is roughly twice as fast as a V100.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: CCC has serious hardware update!

Post by Dann Corbit »

Actually, the CPU difference is smaller, since the GPU server uses its own CPU cores, which are:
CPU: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz, 32 vcores
So they are:
2.5/2.6 * 100 = 96% of the speed, or about 4% slower than the AMD cores.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: CCC has serious hardware update!

Post by Dann Corbit »

My conclusion, then, is that the GPU systems are very nearly the same speed, within a few percent perhaps.
On the other hand, I think that the CPU version on CCC is probably stronger.
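
A rough sketch of the GPU side of that estimate, assuming (as stated earlier in the thread) that one A100 is about twice as fast as one V100, and that throughput simply adds up across cards. Both are simplifications, and the names below are hypothetical.

```python
# Relative per-card speed in "V100 equivalents". The 2.0 for the A100 is the
# rough figure quoted in this thread, not a measured benchmark.
V100_EQUIVALENTS = {"V100": 1.0, "A100": 2.0}


def total_gpu_speed(card: str, count: int) -> float:
    """Aggregate speed of `count` identical cards, assuming linear scaling."""
    return V100_EQUIVALENTS[card] * count


tcec = total_gpu_speed("V100", 4)  # TCEC: 4x V100
ccc = total_gpu_speed("A100", 2)   # CCC: 2x A100
print(f"TCEC: {tcec:.1f}, CCC: {ccc:.1f}, ratio CCC/TCEC: {ccc / tcec:.2f}")
```

Under those assumptions the two GPU setups come out even; how well an engine actually keeps the cards fed is a separate question.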
brianr
Posts: 536
Joined: Thu Mar 09, 2006 3:01 pm

Re: CCC has serious hardware update!

Post by brianr »

The point is that on both CCC and TCEC the fast GPUs are being vastly underutilized because the CPUs are clocked so low.
Jouni
Posts: 3281
Joined: Wed Mar 08, 2006 8:15 pm

Re: CCC has serious hardware update!

Post by Jouni »

Ipman bench:

278.098.432 2x AMD EPYC 7742 256 threads
190.384.961 4x Intel Xeon E5-4669 v4 @2.20GHz

+46%, my calculation says!
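
The percentage can be checked with a few lines of Python. The two numbers are the bench scores listed above (the dots are thousands separators); the helper name percent_gain is just for illustration.

```python
def percent_gain(new_score: int, old_score: int) -> float:
    """Percentage by which new_score exceeds old_score (hypothetical helper)."""
    return (new_score / old_score - 1.0) * 100.0


epyc_score = 278_098_432  # 2x AMD EPYC 7742, 256 threads
xeon_score = 190_384_961  # 4x Intel Xeon E5-4669 v4 @ 2.20GHz
print(f"+{percent_gain(epyc_score, xeon_score):.0f}%")
```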
Jouni