lithander wrote: ↑Wed Apr 05, 2023 9:59 am
Marcel posted in this very thread that he only uses concurrency up to the amount of physical cores on his system because hyper-threading has been problematic in the past
What I've noticed in the past is that multithreading beyond the number of physical cores you have can skew the results. An engine that plays its game from a logical thread is much slower than one playing from a real core. If you run a huge match, this should even out between the engines, but in practice it does not always do so. Then you get the result that an engine can perform much better or worse than expected in one match, and in another match, it's a different engine with weird results.
The current trend of putting E-cores in CPUs (the big.LITTLE-style hybrid architecture that Intel now uses) will complicate this further, because we get another type of thread in the computer. If these E-cores gain hyper-threading / SMT at some point, we get a fourth. Then an engine could be running on a real performance core, on a logical thread of a performance core, on an E-core, or on a logical thread of an E-core.
That would make match results very unpredictable IMHO, unless you run HUGE matches to try and equalize everything out (as in: each engine runs on each type of thread the same amount of time).
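For anyone who wants to limit a match to one thread per physical core, here is a rough sketch of how the sibling information could be used. It assumes Linux, where each logical CPU exposes a thread_siblings_list file under /sys/devices/system/cpu. The function names are made up for illustration, and the sketch does not tell P-cores from E-cores.

```python
import glob


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0,8' or '2-3' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists):
    """Keep the lowest-numbered logical CPU of every sibling group."""
    return sorted(
        {parse_cpu_list(text)[0] for text in sibling_lists if text.strip()}
    )


def read_sibling_lists():
    """Read the sibling lists from sysfs (Linux only; empty list elsewhere)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as handle:
            lists.append(handle.read())
    return lists


if __name__ == "__main__":
    cpus = one_cpu_per_core(read_sibling_lists())
    print("one logical CPU per physical core:", cpus)
```

The resulting CPU numbers could then be used to pin engine processes, for example with taskset or os.sched_setaffinity on Linux. Whether the match runner respects that pinning is something you would have to check for your own setup.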
Even Turbo Boost is problematic, because, at least for AMD, it is controlled by temperature. The CPU will just boost until it hits 95 °C, so in the winter your computer will run faster than in the summer. Depending on the cooling in the system, you could even have Turbo Boost go up and down and use different frequencies during a match, which will also skew the results. Engines running at the start of the tournament may have a faster CPU than the ones running later.
Therefore I'm going to put my new CPU into 105W Eco mode (it'll cost something like 3% performance), undervolt it by 0.1V, and probably even limit the boost speed to a point where I can be sure that the CPU can hold that speed indefinitely; even in mid-summer.
But Magic Bitboards haven't gone poof before some patch replacing them is accepted to Stockfish
We'll see. In the end, the move generator itself just accounts for something like 10% of the computations required, so you hit the point of diminishing returns quite quickly. I liken it to an old CD-burner. The first 1x burner took 74 minutes to write a CD.
1x = 74 minutes
2x = 37 minutes
4x = 18.5 minutes
8x = 9 minutes, 15 seconds
16x = 4 minutes, 37.5 seconds
And beyond that, nobody cared. Even if a burner was "24x", "32x" or even "40x" or faster, none I ever had dropped below 4 minutes. That is because the CD's lead-in was still written at 1x, then the disc had to slowly ramp up to its maximum speed and then ramp down again to write the lead-out at 1x.
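To put numbers on the burner analogy, here is a small sketch that computes the nominal burn time for a given speed and optionally adds a fixed overhead for the lead-in, lead-out and ramping. The overhead parameter and the 2-minute value in the example are assumptions chosen purely for illustration, not measurements.

```python
def burn_minutes(speed, full_disc_minutes=74.0, fixed_overhead_minutes=0.0):
    """Nominal burn time at the given speed, plus a fixed overhead.

    fixed_overhead_minutes is a hypothetical stand-in for the parts that
    do not get faster (lead-in, lead-out, spinning up and down).
    """
    return full_disc_minutes / speed + fixed_overhead_minutes


def as_min_sec(minutes):
    """Format a duration in minutes as 'M min S.S s'."""
    whole = int(minutes)
    seconds = (minutes - whole) * 60
    return f"{whole} min {seconds:.1f} s"


if __name__ == "__main__":
    for speed in (1, 2, 4, 8, 16, 24, 32, 40):
        ideal = burn_minutes(speed)
        # 2.0 minutes is an illustrative guess, not a measured value.
        padded = burn_minutes(speed, fixed_overhead_minutes=2.0)
        print(f"{speed:>2}x  ideal {as_min_sec(ideal):>16}  "
              f"with overhead {as_min_sec(padded):>16}")
```

With any fixed overhead in the model, the total time flattens out as the speed goes up, which is the whole point of the analogy.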
I feel it is the same with move generators: Fancy / Black Magic Bitboards + PEXT are so fast, and current evaluations and NNs take up so much computational power, that any improvement that can be measured outside the margin of error will probably be less than 5 Elo.
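The same reasoning can be written down with Amdahl's law. The sketch below takes the rough 10% share mentioned above as its input and shows how little the whole engine speeds up even if the move generator becomes much faster. The function name is made up, and the 10% is only the estimate from this post, not a profiled figure.

```python
def overall_speedup(fraction, component_speedup):
    """Amdahl's law: speedup of the whole program when only `fraction`
    of the runtime is accelerated by `component_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / component_speedup)


if __name__ == "__main__":
    movegen_share = 0.10  # rough estimate from the post, not a measurement
    for factor in (1.5, 2.0, 10.0, float("inf")):
        gain = overall_speedup(movegen_share, factor)
        print(f"move generator {factor}x faster -> engine {gain:.3f}x faster")
```

Even an infinitely fast move generator caps the overall gain at 1 / 0.9, so roughly 11% more speed in total under that assumption.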