NUMA & TT

diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: NUMA & TT

Post by diep »

Joost Buijs wrote: Sun Oct 24, 2021 4:58 pm I'm on Windows and cannot compare it one to one.

With default BIOS settings it appears to be 1 NUMA node with all memory available to it, basically the same as UMA.
My Threadripper has 4 CCX, I can adjust the number of NUMA nodes in the BIOS from 0 to 1, 2 and 4. For a single socket board 0 and 1 seem to be the same.

BTW: This forum drives me crazy. Each time I try to post a message I appear to be logged off; when I log on again my message is gone, and I can only get it back by hitting the back button in my browser a few times. I wonder when this will be fixed; as it is right now, this forum is pretty unusable. I also get CloudFlare 520 error messages, whatever that means.
Joost, it would be quite interesting to know which latencies to RAM you see for each individual core when all cores are busy. Do you use hyperthreading (or rather the AMD equivalent of it, probably not called SMT the way IBM calls it for POWER, but you know what I mean)? With or without?

What do you clock the chip to, and what sort of RAM do you have, at which frequency and which latency?

It would be interesting to compare against the 44-core Intel box here at 2.0 GHz. I built it for a bit over 1000 euro; that was just a month before import duties.
The 2-socket Intel box here obviously has a bunch of DIMMs and big bandwidth to each CPU. I wonder how AMD solves that, given that they have 64 and 96 cores nowadays.
Joost Buijs
Posts: 1665
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: NUMA & TT

Post by Joost Buijs »

Vincent,

My 32-core AMD Threadripper 3970X runs at default settings, 3700 MHz, with Precision Boost disabled and SMT enabled, although I never use SMT for my engine. Memory is 4-way 128 GB DDR4 3200 MT/s, but it runs at the default DDR4 speed of 2133 MT/s with timings CL-15, tRCD-15, tRP-15, tRAS-36, tRC-51 and CR-2T.

These settings are very conservative; the system can run a lot faster when I enable PB, PBO and an XMP profile for the memory. In the past I was always very fond of getting the most out of my systems by overclocking them to the max; nowadays I like to keep them running cool, quiet and stable.

I never measured true latencies, but if you know a good program to measure cache and memory latency I can take a look at it.
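For reference, a common way to measure this kind of latency is a pointer chase: walk a randomly linked chain through a buffer so that every load depends on the previous one and the prefetcher cannot help. What follows is only a minimal single-threaded sketch, not a calibrated tool. The names (Node, make_random_cycle, chase, latency_ns), the 64-byte cache-line size and the fixed seed are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// One node per (assumed) 64-byte cache line, so every step touches a new line.
struct Node {
    std::uint64_t next;    // index of the next node to visit
    std::uint64_t pad[7];  // padding up to 64 bytes
};
static_assert(sizeof(Node) == 64, "Node is expected to fill one cache line");

// Link n nodes into a single random cycle (Sattolo's algorithm), so a walk
// starting at node 0 visits every node once before returning to node 0.
std::vector<Node> make_random_cycle(std::size_t n, std::uint64_t seed) {
    std::vector<Node> nodes(n);
    for (std::size_t i = 0; i < n; ++i) nodes[i].next = i;
    if (n < 2) return nodes;
    std::mt19937_64 rng(seed);
    for (std::size_t i = n - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(nodes[i].next, nodes[pick(rng)].next);
    }
    return nodes;
}

// Perform `steps` dependent loads; the final index is returned so the
// compiler cannot discard the loop.
std::size_t chase(const std::vector<Node>& nodes, std::size_t steps) {
    std::size_t i = 0;
    for (std::size_t s = 0; s < steps; ++s) i = nodes[i].next;
    return i;
}

// Average time per dependent load in nanoseconds for a buffer of roughly
// `bytes` bytes. `steps` must be greater than zero.
double latency_ns(std::size_t bytes, std::size_t steps) {
    const std::size_t n = std::max<std::size_t>(1, bytes / sizeof(Node));
    const std::vector<Node> nodes = make_random_cycle(n, 12345);
    const auto t0 = std::chrono::steady_clock::now();
    volatile std::size_t sink = chase(nodes, steps);
    const auto t1 = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::nano>(t1 - t0).count()
           / static_cast<double>(steps);
}
```

Calling latency_ns with a buffer much larger than the last-level cache (for example 1 GB and a few million steps) should approximate RAM latency, while small buffers show the cache levels. To get the "all cores busy" picture asked about above, one would run one such chase per core at the same time, each pinned to its own core. Pinning and NUMA-local allocation are platform specific and are left out here.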

I also have an Intel i9-10980XE system with 18 cores running at 3800 MHz. This system has 4 way 64 GB DDR4 3000 MT/s.

BTW, for your Intel box you can use Visual Studio with the Clang compiler. I use it here too and it is as good as the Intel compiler. The latest version of the Intel compiler is also based on Clang; times are changing.
diep

Re: NUMA & TT

Post by diep »

Most interesting Joost!

Note that I do not have Visual Studio as my main development platform (for the robot/3D printer type of software it's all cross-platform, obviously, and if you google you'll find there are not many mature choices there).

I remember testing Clang years ago (10 years ago) and it was duck slow for Diep back then. It didn't have good PGO at the time, if I remember correctly. It really was 30-50% slower on a single core. It couldn't even beat GCC.

Good that you remind me I can retest it! As you said, software sometimes improves!
Joost Buijs

Re: NUMA & TT

Post by Joost Buijs »

Recently I switched from Visual Studio 2019 Professional to Visual Studio 2022-RC3 Community. The latter is fully 64-bit and gives you the option to install Clang 12.0. Clang is particularly good at vectorizing loops. Of course you'll need hardware that supports SIMD instructions like AVX2 to make use of this.
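To illustrate the kind of loop meant here, below is a small sketch of two loops that compilers can typically auto-vectorize: simple counted loops over contiguous arrays with no cross-iteration dependency other than a reduction. The function names are made up for this example, and `__restrict` is a compiler extension (accepted by Clang, GCC and MSVC), not standard C++. Whether AVX2 code is actually emitted depends on the compiler version and flags, for example `-O2 -mavx2` with clang or `/O2 /arch:AVX2` with clang-cl.

```cpp
#include <cstddef>
#include <cstdint>

// acc[i] += delta[i] for n 16-bit values. __restrict promises the compiler
// that the two arrays do not overlap, which removes the need for runtime
// alias checks before vectorizing.
void add_in_place(std::int16_t* __restrict acc,
                  const std::int16_t* __restrict delta,
                  std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        acc[i] = static_cast<std::int16_t>(acc[i] + delta[i]);
}

// Dot product of two 16-bit arrays accumulated in 32 bits. This is a
// reduction; integer reductions can be vectorized without changing the
// result because integer addition is associative.
std::int32_t dot(const std::int16_t* a, const std::int16_t* b, std::size_t n) {
    std::int32_t sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
    return sum;
}
```

With clang, adding `-Rpass=loop-vectorize` makes the compiler report which loops it vectorized, which is a quick way to check whether a given loop benefits.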

Visual Studio 2022 will be officially released on the 8th of November.