pohl4711 wrote: ↑Sat Mar 27, 2021 7:21 am
MMarco wrote: ↑Fri Mar 26, 2021 9:12 pm
Seriously?
You're the only one here with a ratio far above 100% among the few users with an AMD processor. MikeB got 92%, I get 91%, Elpapa 90%, Modern Times 94%. You get 131% and you think that perhaps others' results are wrong?
What I measured (and what has to be measured) was the speed gain from the popcount binary to the bmi2/avx2 binary. I measured the right thing (to show why the Ipman speed measurements with a non-NNUE engine (asmFish) are outdated), but I wrote nonsense:
Popcnt compiles weren't even mentioned then. Without doubt, you measured NNUE=ON vs NNUE=OFF with the latest Stockfish from Abrok, at least for bmi2:
pohl4711 wrote:
All with latest Stockfish from abrok (March, 24), go depth 28
Intel Haswell i7-6700HQ (bmi2 binary):
SF nnue off: 1071209 nps
SF nnue on: 932672 nps
= using nnue is 87% speed compared to nnue off
I specifically told you I got the same ratio with my intel processor:
MMarco wrote: With a 10700, NNUE gets 88% of classical's speed here:
All the numbers we shared were measured to check what you claimed:
Pohl wrote:The nnue-engines run much faster on AMD Ryzen/Threadripper with an avx2-compile (+25%-30% more nodes!) compared to all Intel CPUs. So, for nnue-engines the Intel CPUs are much worse than they appear in that list.
So to check whether your assertion that "avx2 compiles for AMD generate 25-30% more nodes for NNUE engines compared to Intel CPUs" was correct, we compared the NNUE=ON/NNUE=OFF ratio for both AMD and Intel. Which makes a lot of sense.
As we saw, for Intel CPUs, you and I got the same 87-88% ratio. For AMD, this ratio ranges from 90-94% according to the numbers shown here. It seems that the avx2 implementation on AMD processors is slightly superior to Intel's (avx2 instructions are included in bmi2 compiles). But that is nowhere near your stated 25-30%. That claim is plain wrong.
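For anyone who wants to redo the arithmetic, here is a quick sketch using the NPS numbers quoted above (the helper name is mine, just for illustration):

```python
def nnue_ratio(nps_on, nps_off):
    """Speed of NNUE=ON as a percentage of NNUE=OFF."""
    return 100.0 * nps_on / nps_off

# Numbers quoted above for the bmi2 binary on the i7-6700HQ:
intel_ratio = nnue_ratio(932672, 1071209)
print(f"Intel NNUE/classical ratio: {intel_ratio:.0f}%")  # ~87%

# AMD ratios reported in this thread fall around 90-94%,
# so the avx2 edge is roughly 3-7 points, nowhere near 25-30%.
```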
What currently makes AMD processors so much better than Intel ones at chess is simply that they are faster processors, with or without avx2 instructions (I'm not a hardware specialist, but I believe it is related to the number of instructions per cycle).
Compare a 9900k (which should be close to a 10700k) to a 3800xt and a 5800x here, with NNUE=ON and official Stockfish 12:
https://openbenchmarking.org/test/pts/s ... f5#metrics
Code: Select all
9900k = 17.73 knps (16 threads)
3800xt = 21.49 knps (16 threads), +21.2% vs 9900k
5800x = 24.31 knps (16 threads), +37.1% vs 9900k
The better avx2 implementation (which gives a relative 5% edge) counts for at most 25% of the difference for Ryzen 3000 and less than 15% of the difference for Ryzen 5000.
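The percentages above can be checked directly from the openbenchmarking figures (variable names are mine):

```python
# Stockfish 12, NNUE=ON, 16 threads (knps from openbenchmarking.org)
knps_9900k = 17.73
knps_3800xt = 21.49
knps_5800x = 24.31

def gain_pct(fast, slow):
    """Percentage speed advantage of `fast` over `slow`."""
    return 100.0 * (fast / slow - 1.0)

print(f"3800xt vs 9900k: +{gain_pct(knps_3800xt, knps_9900k):.1f}%")  # +21.2%
print(f"5800x  vs 9900k: +{gain_pct(knps_5800x, knps_9900k):.1f}%")   # +37.1%

# Share of each gap that a ~5-point avx2 ratio edge could explain:
print(f"Ryzen 3000: {100 * 5 / gain_pct(knps_3800xt, knps_9900k):.0f}% of the gap")  # ~24%
print(f"Ryzen 5000: {100 * 5 / gain_pct(knps_5800x, knps_9900k):.0f}% of the gap")   # ~13%
```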
This was done comparing avx2/bmi2 with NNUE=OFF against avx2/bmi2 with NNUE=ON. I don't get why you want to compare the increase from popcnt to bmi2 with NNUE=ON. Intel's bmi2 instructions have been available since 2013 (
https://en.wikipedia.org/wiki/Haswell_( ... hitecture)), and NNUE came to computer chess less than a year ago. There have never been popcnt compiles for NNUE that made sense to use on an Intel CPU. That is a counterfactual scenario and a useless comparison, and it doesn't yield your claimed +25-30% either.
Even Ipman's list, which you claim is "outdated", doesn't use popcnt for Intel! But you suggest worse: going back to popcnt for Intel and being even more outdated. That is laughable.
Ipman uses bmi2 because those instructions were available when he tested the CPUs, and that is where we should start from.
IPMAN's list remains very useful for comparing processors within the same family and for comparing processors under non-avx2 loads, and if you add an extra percentage for NNUE performance, it gives you a list of relative performance for NNUE engines. It also has a wide variety of processors listed, including high core-count ones. Not bad, and certainly more reliable than your bogus numbers.