AVX2 optimized SF+NNUE and processor temperature

corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

AVX2 optimized SF+NNUE and processor temperature

Post by corres »

I had to lower the settings of my Ryzen 9 3950X because, after some time, it froze during a test run (SF+NNUE against SF+NNUE) due to CPU overheating (the CPU temperature was more than 90 degrees Celsius with a Noctua NH-D15 SE-AM4 air cooler).
Now the CPU clock is 16 x 4.0 GHz (it was 16 x 4.40 GHz with CPU Core Voltage = 1.450 V) and the Core Voltage is nominally 1.400 V.
CPU power consumption was ~200 W; now it is about 160 W, and the CPU temperature is about 75 degrees Celsius.
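
A rough sanity check of these numbers, assuming the common first-order approximation that dynamic CPU power scales with frequency times voltage squared (P ∝ f·V²). This ignores static leakage and temperature effects, so it is only a ballpark estimate, and the function name is made up for illustration.

```python
def scaled_power(p_old_w, f_old_ghz, v_old, f_new_ghz, v_new):
    """Estimate new power from old power, assuming P is proportional to f * V^2."""
    return p_old_w * (f_new_ghz / f_old_ghz) * (v_new / v_old) ** 2


# Values reported in the post: ~200 W at 4.40 GHz / 1.450 V, now 4.0 GHz / 1.400 V.
estimate = scaled_power(200.0, 4.40, 1.450, 4.0, 1.400)
print(f"Estimated power after the change: {estimate:.0f} W")
```

Under that assumption the estimate comes out near 170 W, which is in the same range as the roughly 160 W reported.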
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: AVX2 optimized SF+NNUE and processor temperature

Post by MikeB »

corres wrote: Sat Sep 05, 2020 3:02 pm ...
With the 3970X chip, I have found that anything over 85 °C eventually becomes problematic (crashes), so I ultimately decided to go with a temperature throttle of 78 °C. I found 78 °C a good setting for reducing heat generation and keeping the noise level down, which is important since my setup is in the loft overlooking the family room. It runs for days without issue; I generally reboot once a week just to keep things fresh and run updates.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: AVX2 optimized SF+NNUE and processor temperature

Post by corres »

MikeB wrote: Sat Sep 05, 2020 3:14 pm ...
I switched off the Power Boost in the BIOS, so my processor did not show power throttling, but at ~95 degrees Celsius the CPU's thermal protection kicked in and stopped the temperature from running away.
And what is the clock speed of your 3970X now?
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: AVX2 optimized SF+NNUE and processor temperature

Post by MikeB »

corres wrote: Sat Sep 05, 2020 3:25 pm ...
It usually shows between 3700 and 3800 MHz at full tilt. I have seen it go lower when both GPUs are also running full tilt. That's probably my fault for adding a second GPU later; my GPUs have no blower fans and they generate serious heat. I believe my setup is probably slower than others with a 3970X, and I'm OK with that, since it has plenty of CPU power for me. My max speed so far today was 4391 MHz, but that is with just one core in use.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: AVX2 optimized SF+NNUE and processor temperature

Post by mwyoung »

corres wrote: Sat Sep 05, 2020 3:02 pm ...
This is common when running AVX, and it is why good stress-test software has an AVX code option: to stress test the system under those conditions. It is also why it is better to use liquid cooling to avoid thermal runaway. Testing chess engines is brutal on a computer system.

This is an easy fix, even with just an AIO cooler and a couple of fans.
"The worst thing that can happen to a forum is a running wild attacking moderator(HGM) who is not corrected by the community." - Ed Schröder
But my words like silent raindrops fell. And echoed in the wells of silence.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: AVX2 optimized SF+NNUE and processor temperature

Post by corres »

mwyoung wrote: Sat Sep 05, 2020 4:18 pm ...
I do not like noisy water coolers.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: AVX2 optimized SF+NNUE and processor temperature

Post by corres »

MikeB wrote: Sat Sep 05, 2020 3:39 pm ...
I thought the 3970X could be run with an air cooler only at its base (3700 MHz) clock speed.
And what kind of cooler do you use? There are some for the 3970X.
It would be interesting to know the chess bench at that fixed 3700 MHz frequency, without power boost and SMT/HT.
mwyoung
Posts: 2727
Joined: Wed May 12, 2010 10:00 pm

Re: AVX2 optimized SF+NNUE and processor temperature

Post by mwyoung »

corres wrote: Sat Sep 05, 2020 4:39 pm ...
My computer sits 1 foot from me on a glass desk, and it is whisper quiet. I test 24/7 with no issues.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: AVX2 optimized SF+NNUE and processor temperature

Post by MikeB »

corres wrote: Sat Sep 05, 2020 4:49 pm ...
AIO cooler

As a matter of practice, I always record the benches in the Makefile for the Honey engines after they are built.

These bench results:

### Based on commits through 09/04/2020:
### ======================================================
### Black-Diamond-12.bench:Nodes searched : 9689830
### Black-Diamond-12.bench:Nodes/second : 1735906
### Black-Diamond-12-60_Cores.bench:Nodes searched : 1115013195
### Black-Diamond-12-60_Cores.bench:Nodes/second : 82446k
### Bluefish-12.bench:Nodes searched : 4099498
### Bluefish-12.bench:Nodes/second : 1655025
### Bluefish-12-60_Cores.bench:Nodes searched : 643029748
### Bluefish-12-60_Cores.bench:Nodes/second : 83380k
### Honey-12.bench:Nodes searched : 3717770
### Honey-12.bench:Nodes/second : 1693744
### Honey-12-60_Cores.bench:Nodes searched : 634091772
### Honey-12-60_Cores.bench:Nodes/second : 83774k
### Oki-Maguro-12.bench:Nodes searched : 4040237
### Oki-Maguro-12.bench:Nodes/second : 1815021
### Oki-Maguro-12-60_Cores.bench:Nodes searched : 511839920
### Oki-Maguro-12-60_Cores.bench:Nodes/second : 81180k

were from these exe's
"https://www.dropbox.com/l/AAAjG0-2ybhbS ... -_yc_nCeN4

with these commands:
EXE bench 16 1 13 true >/dev/null
EXE bench 2048 60 18 true >/dev/null

"true" means the NNUE bench, and that is the only bench I'm concerned about. My PGO builds are based on NNUE evaluation only; that is simply my choice, but it is NOT how the SF developers do it.

I use 60 as the thread count since that is how I use the engines in normal practice, which allows me to do other things while analysis is running.
When I run matches, I use 50 as the concurrency, which allows me to do some analysis with 12 cores as they run.
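
A small sketch for reading the figures above: it parses the `Nodes/second` lines (including the `k` suffix used for the 60-thread runs) and divides the 60-thread speed by the single-thread speed. The function names are made up for illustration. The ratio is only a rough indicator, since the two benches use different settings (assuming the arguments follow Stockfish's usual bench order of hash in MB, threads, depth: 16 / 1 / 13 versus 2048 / 60 / 18).

```python
import re

# Nodes/second lines copied from the bench listing above.
BENCH_LINES = """\
### Black-Diamond-12.bench:Nodes/second : 1735906
### Black-Diamond-12-60_Cores.bench:Nodes/second : 82446k
### Bluefish-12.bench:Nodes/second : 1655025
### Bluefish-12-60_Cores.bench:Nodes/second : 83380k
### Honey-12.bench:Nodes/second : 1693744
### Honey-12-60_Cores.bench:Nodes/second : 83774k
### Oki-Maguro-12.bench:Nodes/second : 1815021
### Oki-Maguro-12-60_Cores.bench:Nodes/second : 81180k
"""

LINE_RE = re.compile(
    r"^### (?P<name>.+?)(?P<multi>-60_Cores)?\.bench:"
    r"Nodes/second : (?P<nps>\d+)(?P<k>k?)$"
)


def parse_nps(text):
    """Return {engine: {"single": nps, "multi": nps}} from Nodes/second lines."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a Nodes/second line
        nps = int(m["nps"]) * (1000 if m["k"] else 1)  # "82446k" -> 82446000
        key = "multi" if m["multi"] else "single"
        results.setdefault(m["name"], {})[key] = nps
    return results


def speedups(results):
    """Ratio of 60-thread to single-thread nodes/second per engine."""
    return {
        name: r["multi"] / r["single"]
        for name, r in results.items()
        if "single" in r and "multi" in r
    }


for name, s in speedups(parse_nps(BENCH_LINES)).items():
    print(f"{name}: {s:.1f}x on 60 threads ({s / 60:.0%} of ideal)")
```

With these figures the ratio lands at roughly 45x to 50x on 60 threads. The 3970X has 32 physical cores, so part of that comes from SMT.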
Gregory Owett
Posts: 249
Joined: Fri Mar 10, 2006 10:26 am
Location: France

Re: AVX2 optimized SF+NNUE and processor temperature

Post by Gregory Owett »

Hi,
Do you keep your machines running all the time? I run mine for approx. 14-16 h max per day. I have a Ryzen 3900X, and I use 16 threads (temp. approx. 77-81 °C, 4000 MHz according to Ryzen Master).