mvanthoor wrote: Mon Apr 10, 2023 5:23 pm
- Hyper-Threading: when this was introduced somewhere in 2002, it was tested to be bad for chess engine testing.
An engine using more threads than there were cores would not become significantly stronger. Running a match with more engines than cores could mean that an engine got assigned to a hyper-thread, and thus be significantly weaker compared to it being assigned to a normal core.
Knowingly or unknowingly you mix up two things here.
1. Does a multi-threaded engine running on a CPU with hyperthreading gain from using more search threads than there are physical cores?
In the past (pre-"lazy smp") the answer was generally no, nowadays the answer may be yes. But it'll have to be tested since engines differ (and CPUs too).
2. When doing ultrabullet testing on such a CPU, should you run more parallel matches than there are physical cores (each match being between two single-threaded engines without pondering)?
This is probably not worth it. You get more games but of lower quality (since nps per engine will be much lower). If you increase the time control for each game to correct for the nps loss, you still get somewhat more games than without hyperthreading, but there will be a lot more noise, which decreases the statistical relevance of the results. Noise means you need many more games to get the same statistical significance. (And if your statistical model does not reflect this, you will have results that are less reliable than you think.)
By setting affinities you may be able to reduce the noise a bit, but it will still be there. If you run two matches on two hyperthreads of the same core, nps will vary quite a bit, which introduces noise. Maybe in this situation the nps variations will sufficiently average out that there is still a total net gain, but this is not so clear.
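To make the affinity idea concrete, here is a minimal sketch, assuming Linux, where each logical CPU typically exposes its hyperthread siblings as a string such as "0,16" or "0-1" in /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. The helper names (parse_cpu_list, one_cpu_per_core, pin) are made up for illustration and are not part of cutechess or any engine.

```python
import os

def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0,16' or '0-1' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return sorted(cpus)

def one_cpu_per_core(sibling_lists):
    """Given the sibling list of every logical CPU, keep the
    lowest-numbered logical CPU of each physical core."""
    cores = {tuple(parse_cpu_list(s)) for s in sibling_lists}
    return sorted(core[0] for core in cores)

def pin(pid, cpu):
    """Pin a process to a single logical CPU (Linux only)."""
    os.sched_setaffinity(pid, {cpu})

# Hypothetical quad core with hyperthreading: CPUs 0-3 pair with 4-7.
siblings = ["0,4", "1,5", "2,6", "3,7", "0,4", "1,5", "2,6", "3,7"]
print(one_cpu_per_core(siblings))
```

Pinning one match per entry of that list keeps two matches from sharing a physical core. Pinning two matches to both siblings of a core is exactly the case described above, where nps will vary.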
With my tests of Rustic on the 6700K, I've always stuck to 4 concurrent games as it was a quad core CPU. Now that I have a 7950X, I have tried a match between the same version of Rustic, running 1000 games at 16, 24, and 30 threads. There's no difference in the outcome. I assume that this is because each of the engines has a 50% chance of getting assigned to a hyper-thread and this will be equally divided. I have not yet tested this with a gauntlet.
What do you mean by "outcome"?
If you use comparable time controls, you will probably have fewer draws on your 7950X with hyperthreading.
Fewer draws means more statistical noise and is therefore bad.
Suppose you run 100 games between A and B.
On your 6700K, A wins 10 games, draws 90 games. "Outcome" is 55-45.
On your 7950X, A wins 55 games, loses 45 games. "Outcome" is 55-45.
These are wildly different outcomes. The match on your 6700K is far stronger evidence that A is better than B than the match on your 7950X.
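To put numbers on this, here is a small sketch that treats each game as an independent result of 1, 0.5 or 0 and computes the standard error of the match score. The independence assumption and the function name match_stats are mine; this is the usual trinomial estimate, not any particular tool's model.

```python
import math

def match_stats(wins, draws, losses):
    """Return (score fraction, standard error of the score) for a match."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    return score, math.sqrt(var / n)

for label, wdl in [("6700K", (10, 90, 0)), ("7950X", (55, 0, 45))]:
    score, se = match_stats(*wdl)
    z = (score - 0.5) / se  # distance from an even score, in standard errors
    print(f"{label}: score={score:.2f}  se={se:.4f}  z={z:.2f}")
```

Both matches score 0.55, but the standard error is 0.015 for the draw-heavy match against about 0.050 for the one without draws. The per-game variance is 11 times larger in the second case (0.2475 against 0.0225), so under this model you would need roughly 11 times as many games for the same confidence.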
I think HGM once argued in a similar context (perhaps it was about imbalanced openings) that the extra noise/imbalance could be what you need to bring out a small difference in Elo more clearly. There is probably something to that argument. Say you need +1 to win a game but the strength difference only allows +0.75, so all games are draws. If you now randomly add +/- 0.25 in noise, you will go from all draws to some wins for the stronger engine. So there are two sides to this.
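The toy model in that argument can be written out directly. This is only a sketch of the example numbers above, assuming that reaching exactly +1 counts as a win, that -1 or below counts as a loss, and that the noise takes the values +0.25 and -0.25 equally often. The function names are made up for illustration.

```python
def game_result(strength_diff, noise, threshold=1.0):
    """Score for the stronger engine: 1.0 win, 0.5 draw, 0.0 loss."""
    total = strength_diff + noise
    if total >= threshold:
        return 1.0
    if total <= -threshold:
        return 0.0
    return 0.5

def tally(strength_diff, noise_values):
    """Count (wins, draws, losses) over the given equally likely noise values."""
    results = [game_result(strength_diff, n) for n in noise_values]
    return results.count(1.0), results.count(0.5), results.count(0.0)

print(tally(0.75, [0.0]))          # no noise: every game is a draw
print(tally(0.75, [0.25, -0.25]))  # with noise: half wins, half draws, no losses
```

In this model the noise turns a 50% score into a 75% score without ever producing a loss for the stronger engine, which is the favourable side of the trade-off. Larger noise would start producing losses as well.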
- Intel E-Cores: these are different cores compared to the normal cores. An engine running on such a core would be much slower than an engine on a normal performance core. If I had a CPU with E-Cores, I would probably disable them.
Using E-cores should be fine if you make sure that both engines in a match run on the same E-core. Ideally you give them a slower time control than engines running on faster P-cores.
I don't know if cutechess supports this. If not, then support should be added.
- Turbo Boost: My Intel 6700K just boosts a single thread to 4.4 GHz, and any load higher than a single thread gets boosted to 4.2 GHz. Thus when running a match the entire CPU runs at 4.2 GHz. Thus I have never disabled Turbo Boost.
That makes sense.
- Precision Boost Optimizer (AMD Ryzen): this doesn't try to hit a specific frequency, but a specific power draw or CPU temperature. If you manage to lower the CPU temperature with a bigger cooler, or lower the power draw with an undervolt, the CPU just boosts higher. (The one option in the BIOS to prevent this either does not work at all, or does not work on Linux.) When running a gauntlet with 16, 24 or 30 threads, the entire CPU boosts to 5.3, 5.2 and 5.0 GHz respectively. It stays pegged at 85, 84 and 78 degrees respectively. In the summer, the CPU will probably hit the 95-degree thermal target, so it will run slower than in winter. However, because all cores run at the same speed during a match or gauntlet, I see no reason to disable boosting altogether. I could, but then the CPU would be capped at 4.5 GHz, which would lose out on hundreds of MHz of speed, x16. That wouldn't be an option.
Ideally time controls (or measurement of time) would be adjusted with clock frequency.
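As a rough sketch of what such an adjustment could look like: scale the base time by the ratio of a reference clock to the actual clock. This assumes engine speed is proportional to clock frequency, which is only approximately true, and the 10-second base time and the function name scaled_time are hypothetical. The frequencies are the all-core boost figures quoted above.

```python
def scaled_time(base_seconds, reference_ghz, actual_ghz):
    """Time that gives roughly the same number of clock cycles at
    actual_ghz as base_seconds gives at reference_ghz."""
    return base_seconds * reference_ghz / actual_ghz

base = 10.0      # hypothetical base time per game, in seconds
reference = 5.3  # GHz, the all-core boost reported for 16 threads

for threads, ghz in [(16, 5.3), (24, 5.2), (30, 5.0)]:
    print(f"{threads} threads at {ghz} GHz: {scaled_time(base, reference, ghz):.2f} s")
```

The same ratio could be applied per core type, for example to give engines on slower E-cores proportionally more time, if the testing tool allowed a time control per match.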
Fixed nps matches would overcome all these problems, but they introduce their own...
So what's your take on this? Testing chess engines doesn't seem to become easier...
But your hardware is now also far more powerful and allows you to do much more testing in the same time.