Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost

Moderator: Ras

mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost

Post by mvanthoor »

Hi :)

What is the consensus on these topics in 2023? This is what I know about them, along with my opinions:

- Hyper-Threading: when this was introduced around 2002, it was found to be bad for chess engine testing. An engine using more threads than there were cores would not become significantly stronger. Running a match with more engines than cores could mean that an engine got assigned to a hyper-thread, and would thus be significantly weaker than when assigned to a normal core. With my tests of Rustic on the 6700K, I've always stuck to 4 concurrent games as it was a quad-core CPU. Now that I have a 7950X, I have tried a match between two copies of the same version of Rustic, running 1000 games at 16, 24, and 30 threads. There's no difference in the outcome. I assume that this is because each of the engines has a 50% chance of getting assigned to a hyper-thread, so the disadvantage is divided equally. I have not yet tested this with a gauntlet.

Do you run matches, gauntlets or tournaments with more concurrent games than you have cores, and what is your rationale that this doesn't affect the outcome?

- Intel E-Cores: these are different cores compared to the normal cores. An engine running on such a core would be much slower than an engine on a normal performance core. If I had a CPU with E-Cores, I would probably disable them.

- Turbo Boost: My Intel 6700K boosts a single thread to 4.4 GHz, and any load higher than a single thread gets boosted to 4.2 GHz. So when running a match, the entire CPU runs at 4.2 GHz. For that reason I have never disabled Turbo Boost.

- Precision Boost / Precision Boost Overdrive (AMD Ryzen): this doesn't try to hit a specific frequency, but a specific power draw or CPU temperature. If you manage to lower the CPU temperature with a bigger cooler, or lower the power draw with an undervolt, the CPU just boosts higher. (The one option in the BIOS to prevent this either does not work at all, or does not work on Linux.) When running a gauntlet with 16, 24 or 30 threads, the entire CPU boosts to 5.3, 5.2 and 5.0 GHz respectively. It stays pegged at 85, 84 and 78 degrees respectively. In the summer, the CPU will probably hit the 95-degree thermal target, so it will run slower than in winter. However, because all cores run at the same speed during a match or gauntlet, I see no reason to disable boosting altogether. I could, but then the CPU would be capped at 4.5 GHz, which would give up hundreds of MHz per core, times 16 cores. That isn't an option for me.
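
To put the "hundreds of MHz" remark into numbers, here is a small sketch that uses only the clock speeds quoted above (5.3, 5.2 and 5.0 GHz boosted, 4.5 GHz with boosting disabled). It assumes engine speed scales linearly with clock frequency, which is a simplification.

```python
# Clock speeds quoted in the post (GHz), keyed by number of busy threads.
BOOSTED_GHZ = {16: 5.3, 24: 5.2, 30: 5.0}
CAPPED_GHZ = 4.5  # all-core clock with boosting disabled, per the post


def loss_without_boost(boosted_ghz: float, capped_ghz: float = CAPPED_GHZ) -> float:
    """Fraction of clock speed given up by disabling boost."""
    return (boosted_ghz - capped_ghz) / boosted_ghz


for threads, ghz in BOOSTED_GHZ.items():
    print(f"{threads} threads: {ghz - CAPPED_GHZ:.1f} GHz lost per core "
          f"({loss_without_boost(ghz):.1%} of the boosted clock)")
```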

So what's your take on this? Testing chess engines doesn't seem to become easier...
Author of Rustic, an engine written in Rust.
j.t.
Posts: 263
Joined: Wed Jun 16, 2021 2:08 am
Location: Berlin
Full name: Jost Triller

Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost

Post by j.t. »

mvanthoor wrote: Mon Apr 10, 2023 5:23 pm - Intel E-Cores: these are different cores compared to the normal cores. An engine running on such a core would be much slower than an engine on a normal performance core. If I had a CPU with E-Cores, I would probably disable them.
I use an i9-13900 for testing, and it seems to work fine (I also test on a remote, older Intel CPU that has only one type of core, and the results don't differ much). As I am using hyperthreading on the P-cores, the performance difference between a hyperthread and an E-core isn't that big. With enough games, every engine will get an almost equal amount of time on a P-core or E-core, meaning that the effect is probably comparable to using an opening book with unbalanced positions.
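
The averaging argument can be illustrated with a small simulation. This is a sketch under assumptions that are not in the post: for each game, the scheduler places each of the two engines on a fast or a slow logical CPU with equal probability, independently, and the placement stays fixed for that game.

```python
import random


def advantage_counts(games: int, seed: int = 1) -> tuple[int, int, int]:
    """Count games where engine A / engine B / neither gets the faster CPU.

    Assumption (hypothetical model): each engine lands on a fast or slow
    logical CPU with probability 0.5, independently, once per game.
    """
    rng = random.Random(seed)
    a_faster = b_faster = equal = 0
    for _ in range(games):
        a_fast = rng.random() < 0.5
        b_fast = rng.random() < 0.5
        if a_fast and not b_fast:
            a_faster += 1
        elif b_fast and not a_fast:
            b_faster += 1
        else:
            equal += 1
    return a_faster, b_faster, equal


a, b, same = advantage_counts(10_000)
print(f"A faster: {a}, B faster: {b}, equal: {same}")
```

In this model the two "advantage" counts come out close to each other, so the core type adds noise rather than a systematic bias. A scheduler that does not place processes randomly would break the assumption.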
Modern Times
Posts: 3703
Joined: Thu Jun 07, 2012 11:02 pm

Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost

Post by Modern Times »

mvanthoor wrote: Mon Apr 10, 2023 5:23 pm Hi :)
Now that I have a 7950X, I have tried a match between the same version of Rustic, running 1000 games at 16, 24, and 30 threads. There's no difference in the outcome. I assume that this is because each of the engines has a 50% chance of getting assigned to a hyper-thread and this will be equally divided. I have not yet tested this with a gauntlet.

Do you run matches, gauntlets or tournaments with more concurrent games than you have cores, and what is your rationale that this doesn't affect the outcome?
You've opened a can of worms as the saying goes...

Stephan Pohl does exactly as you describe for his ratings list - he runs 20 threads on his 12-core Ryzen 3900X.

I'm very conservative - I don't run more threads than I have physical cores for ratings list matches. I did it once as an experiment on my 5900X, and as you say the result seemed OK, but I wasn't confident that it was OK to do this.
KhepriChess
Posts: 93
Joined: Sun Aug 08, 2021 9:14 pm
Full name: Kurt Peters

Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost

Post by KhepriChess »

mvanthoor wrote: Mon Apr 10, 2023 5:23 pm - Intel E-Cores: these are different cores compared to the normal cores. An engine running on such a core would be much slower than an engine on a normal performance core. If I had a CPU with E-Cores, I would probably disable them.
I have an i7-12700F. I didn't know it before buying it, but apparently Windows 10 handles the mix of core types pretty poorly. Any time the console window (running any tests) lost focus, the work would switch to using only the E-cores. Just terribly stupid. I had to disable them.
Puffin: Github
KhepriChess: Github
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost

Post by dangi12012 »

Generally it's good that processors try to run as fast as possible under certain constraints (energy, VRM current, core voltage), ideally settable by the user.

Tests on Hyper-Threading show that it sometimes hurts performance, and sometimes you get an extra 5-15% (but for twice the number of threads). So scaling can be linear until you hit N threads, with N the number of physical cores (half the number of hardware threads); beyond N threads scaling is inconsistent and slow.
E and P cores complicate the matter, but it's a good development and AMD already has plans to go in that direction as well.
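
As a worked example of the 5-15% figure: if two threads sharing one core together deliver 1.05x to 1.15x the throughput of a single thread running alone, then each of them runs at only a little over half its solo speed. The sketch below assumes the gain is split evenly between the two threads, which real workloads need not do.

```python
def per_thread_speed(total_gain: float) -> float:
    """Speed of each of two sibling threads, relative to one thread alone.

    total_gain is the combined throughput gain from using both hardware
    threads of a core (0.15 means +15%). Assumes an even split.
    """
    return (1.0 + total_gain) / 2.0


for gain in (0.05, 0.15):
    print(f"+{gain:.0%} total -> each thread at "
          f"{per_thread_speed(gain):.1%} of solo speed")
```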

There is a saying in Germany: "Totgesagte leben länger" ("those declared dead live longer"), which makes me think x86-64 is not dead and will not be replaced by ARM tomorrow, but will remain the dominant PC platform this decade.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost

Post by hgm »

mvanthoor wrote: Mon Apr 10, 2023 5:23 pm Running a match with more engines than cores could mean that an engine got assigned to a hyper-thread, and thus be significantly weaker compared to it being assigned to a normal core.
This is not accurately phrased, and suggests you misunderstand what hyperthreads are. There is no such thing as a 'normal core', and an engine would always run on a hyperthread. Even if you run one single-threaded engine on a quad-core machine.

The point is that each core contains two hyperthreads, and that these compete for resources within that core. So if one hyperthread is idle, the other one on the same core runs faster than when both hyperthreads of that core are active. But together they usually process more instructions per second than a single hyperthread would, because the increased demand makes better use of the execution units.

What you want to avoid if 6 engines run on a hyperthreading quad core is that the OS schedules the engines over the physical cores as 2:2:1:1, (which would be the natural thing to do for best total performance), and that the engines that do not share a core are playing against engines that do. That would be unfair. Windows allows the user to control this through an affinity mask, which for each process can specify which 'virtual cores' (= hyperthreads) it is allowed to use.

This mask is inherited from the parent process, and can be set through the task manager. So when I want to run 6 matches in parallel (with ponder off) on my quad (or a single tournament with concurrency 6), I would force each of the GUIs to share a core by assigning it to one of the virtual cores 0-5, before I set them playing. The engine processes they spawn would then also run on that virtual core, forcing all engines to always share a core. The virtual cores 6 and 7 would remain available for running system tasks, and because the OS would find them idle it would schedule system tasks on those, so they would not interfere with the engines.
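
The affinity mask is a bit field with one bit per virtual core. Here is a minimal sketch of the layout described above, assuming virtual cores 2k and 2k+1 are the two hyperthreads of physical core k (consistent with the examples in this post, but worth verifying on your own machine). The function names are made up for illustration.

```python
def mask_for(virtual_cores: list[int]) -> int:
    """Affinity bit mask allowing exactly the given virtual cores."""
    mask = 0
    for cpu in virtual_cores:
        mask |= 1 << cpu
    return mask


def physical_core(virtual_core: int) -> int:
    """Physical core of a virtual core, ASSUMING siblings are numbered 2k, 2k+1."""
    return virtual_core // 2


# Six GUIs pinned to virtual cores 0-5 on a hyperthreading quad core.
for gui, cpu in enumerate(range(6)):
    print(f"GUI {gui}: virtual core {cpu}, physical core {physical_core(cpu)}, "
          f"mask {mask_for([cpu]):#04x}")

# Virtual cores 6 and 7 stay free for system tasks.
print(f"left for the OS: {mask_for([6, 7]):#04x}")
```

With this numbering, physical cores 0, 1 and 2 each host exactly two GUIs, so every engine shares its core with another engine.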

I also use this to control background tasks I started myself: in the past it happened that I had several very long analyses running in parallel, and then wanted to participate in the monthly on-line blitz tournament without aborting those. So I just assigned all the processes of the engine doing the analysis to two virtual cores on the same physical core (e.g. 6 and 7). This freed the other 3 cores for playing in the tourney with 3 engines. (In fact I used one core to run both Fairy-Max and KingSlayer, as these cannot ponder, and thus would not slow down each other that much.) After the tourney I reassigned the analysis processes each to a different core again.
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost

Post by mvanthoor »

hgm wrote: Tue Apr 11, 2023 10:45 am This is not accurately phrased, and suggests you misunderstand what hyperthreads are.
Indeed. I do understand what hyperthreads are and how they work, and...
What you want to avoid if 6 engines run on a hyperthreading quad core is that the OS schedules the engines over the physical cores as 2:2:1:1, (which would be the natural thing to do for best total performance), and that the engines that do not share a core are playing against engines that do. That would be unfair.
...this is exactly what I meant.

With regard to setting affinity: that's too much work, especially with a CPU that now has 32 threads. I run my matches using Cute Chess and it runs 16 games in parallel; so it has 32 engines loaded at any given point.
smatovic
Posts: 3226
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost

Post by smatovic »

I test currently with fixed frequency (turbo boost off), HyperThreading (SMT-2) off, and do not intend to buy hardware with efficiency cores.

For generating games, tuning/RL, massive self-play SPRT testing, the above might not matter on a larger scale.

--
Srdja
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost

Post by hgm »

mvanthoor wrote: Tue Apr 11, 2023 10:53 am With regard to setting affinity: that's too much work, especially with a CPU that now has 32 threads. I run my matches using Cute Chess and it runs 16 games in parallel; so it has 32 engines loaded at any given point.
Well, work and investment are exchangeable assets. Doing things carefully and efficiently on cheap hardware can have the same performance as doing things clumsily and wastefully on very expensive hardware. Whether any extra work it might require is worth the savings on hardware price is a dilemma that every person has to decide for himself.

You can also run 16 instances of CuteChess in parallel, and assigning an affinity to each of those is just a matter of 3 mouse clicks per instance...
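
The clicks could also be scripted. Here is a sketch that only builds the command lines, assuming a Windows `cmd` shell where `start /affinity <hex mask>` launches a program with the given affinity mask. `run_match.bat` is a hypothetical placeholder for whatever starts one tournament instance.

```python
def pinned_commands(instances: int, launcher: str = "run_match.bat") -> list[str]:
    """Build one 'start /affinity' command per instance, one virtual core each.

    'launcher' is a hypothetical placeholder, not a real CuteChess file.
    """
    commands = []
    for cpu in range(instances):
        mask = 1 << cpu  # bit n set = virtual core n allowed
        commands.append(f"start /affinity {mask:X} {launcher}")
    return commands


for line in pinned_commands(16):
    print(line)
```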
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Testing: Hyper-Threading, E-cores, Turbo Boost / Precision Boost

Post by mvanthoor »

hgm wrote: Tue Apr 11, 2023 11:36 am Well, work and investment are exchangeable assets. Doing things carefully and efficiently on cheap hardware can have the same performance as doing things clumsily and wastefully on very expensive hardware. Whether any extra work it might require is worth the savings on hardware price is a dilemma that every person has to decide for himself.

You can also run 16 instances of CuteChess in parallel, and assigning an affinity to each of those is just a matter of 3 mouse clicks per instance...
That is 48 mouse clicks for every tournament I want to run. Why would I want to run 16 instances of CuteChess? It can run games concurrently all by itself, and when running 16 threads, I can just put CuteChess in the background and keep using the computer.