The most muscular compiler switch I ever saw

Dann Corbit
Posts: 12828
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

The most muscular compiler switch I ever saw

Post by Dann Corbit »

The bleeding-edge Stockfish code, fetched from GitHub today, gave the following results:

Arch=avx2:
Total time (ms) : 258602
Nodes searched : 3311863930
Nodes/second : 12806799

Arch=native:
Total time (ms) : 109160
Nodes searched : 1347508396
Nodes/second : 12344342

The time is less than half.
The nodes are less than half.
The NPS is very close to equal.
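As a quick check of those three statements, here is a minimal Python sketch that uses only the figures quoted above. The dictionary keys are illustrative names, not anything Stockfish itself emits.

```python
# Figures copied from the two 16-thread bench runs quoted above.
avx2 = {"time_ms": 258602, "nodes": 3311863930, "nps": 12806799}
native = {"time_ms": 109160, "nodes": 1347508396, "nps": 12344342}

# Ratio of the native build to the avx2 build for each figure.
ratios = {key: native[key] / avx2[key] for key in avx2}

for key, value in ratios.items():
    print(f"native/avx2 {key}: {value:.3f}")
```

The time and node ratios both come out a little above 0.4, and the NPS ratio is about 0.96, which is consistent with the native build visiting a smaller tree at nearly the same speed in this particular pair of runs.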

The only thing that I can figure is that the profile guided optimization caused much better move ordering when the architecture was set to native. Of course, with an AMD Ryzen Threadripper 3970X, one would think that avx2 and native would be nearly identical.

Thoughts?
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Dann Corbit
Posts: 12828
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: The most muscular compiler switch I ever saw

Post by Dann Corbit »

I should mention: 16GB RAM for hash, and 16 threads, depth = 20 for the benchmark.
Dann Corbit
Posts: 12828
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: The most muscular compiler switch I ever saw

Post by Dann Corbit »

One more thing (probably inconsequential) is that I profile longer than the standard makefile with this bench command:

PGOBENCH = $(WINE_PATH) ./$(EXE) bench 16384 1 18

I have found that increasing the thread count destroys the profile. Seems odd. Other profilers I have used work fine with SMP.
Dann Corbit
Posts: 12828
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: The most muscular compiler switch I ever saw

Post by Dann Corbit »

For those with similar hardware that want to replicate the result, here is the compiler version that I am using:

$ g++ --version
g++.exe (Rev11, Built by MSYS2 project) 15.2.0
Copyright (C) 2025 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
AndrewGrant
Posts: 1971
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: The most muscular compiler switch I ever saw

Post by AndrewGrant »

Dann Corbit wrote: Thu Feb 05, 2026 7:44 am The only thing that I can figure is that the profile guided optimization caused much better move ordering when the architecture was set to native. Of course, with an AMD Ryzen Threadripper 3970X, one would think that avx2 and native would be nearly identical.
PGO does not produce functional differences. The move ordering is the same.
jdart
Posts: 4423
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: The most muscular compiler switch I ever saw

Post by jdart »

-march=native will enable optimizations that are specific to the processor you are building on. Since Intel/AMD add new instructions regularly, processors can differ in terms of the exact instruction set they support. "avx2" will enable a common subset of instructions, but full use of the supported instruction set may result in better code.
syzygy
Posts: 5896
Joined: Tue Feb 28, 2012 11:56 pm

Re: The most muscular compiler switch I ever saw

Post by syzygy »

Dann Corbit wrote: Thu Feb 05, 2026 7:45 am I should mention: 16GB RAM for hash, and 16 threads, depth = 20 for the benchmark.
So it's just SMP randomness.
Either use 1 thread or do 20 runs with both and take the average.

The only reason to use multiple threads to benchmark two functionally identical versions is to see if avx512 instructions (if -march=native uses those) cause a slowdown with multiple threads that you might not see with a single thread.

You have to understand that a compiler switch cannot be the reason for a different number of nodes unless there is a bug. The difference is entirely caused by indeterminacy caused by multithreading.
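The "do 20 runs with both and take the average" suggestion could be sketched roughly as follows in Python. The function names and the threshold `k` are hypothetical choices made for illustration, and the comparison is a crude rule of thumb rather than a proper statistical test. The node counts would have to be collected from repeated multi-threaded bench runs of each build.

```python
from statistics import mean, stdev


def summarize(node_counts):
    """Return (mean, sample standard deviation) of node counts from repeated runs."""
    if len(node_counts) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return mean(node_counts), stdev(node_counts)


def difference_is_noise(runs_a, runs_b, k=2.0):
    """Crude check: True if the gap between the two means is no larger than
    k times the combined standard deviation of the two sets of runs.

    k = 2.0 is an arbitrary assumption, not a calibrated significance level.
    """
    mean_a, sd_a = summarize(runs_a)
    mean_b, sd_b = summarize(runs_b)
    combined_spread = (sd_a ** 2 + sd_b ** 2) ** 0.5
    return abs(mean_a - mean_b) <= k * combined_spread
```

Passing the per-run "Nodes searched" values for each build to `difference_is_noise` gives a first impression of whether the gap between the builds is larger than the run-to-run scatter.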
Dann Corbit
Posts: 12828
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: The most muscular compiler switch I ever saw

Post by Dann Corbit »

The Native version is also a much better problem solver.
Since time to ply is more than cut in half, it seems to be far more capable.
Now, the other version (avx2) searches wider. So you would think that it would solve some problems better. But so far I have not seen that.
It's not just SMP variation, or I would see a fluctuation like that for native on multiple runs, and a fluctuation of that scale for avx2 on multiple runs. But I don't. Now, I don't have any logical explanation of why the native tree seems to be much less bushy. But it is very interesting to me.
I have seen this effect for other engines as well, but those were also related to Stockfish, so it could still be something peculiar to the SF codebase.
syzygy
Posts: 5896
Joined: Tue Feb 28, 2012 11:56 pm

Re: The most muscular compiler switch I ever saw

Post by syzygy »

Dann Corbit wrote: Fri Feb 06, 2026 12:52 am The Native version is also a much better problem solver.
Nonsense. Unless you found a bug in either SF or your compiler, the two builds are functionally identical.

Do the two versions give identical benches when using 1 thread?
Dann Corbit
Posts: 12828
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: The most muscular compiler switch I ever saw

Post by Dann Corbit »

Bench with no parameters:
avx2
Total time (ms) : 7936
Nodes searched : 2668754
Nodes/second : 336284

native
Total time (ms) : 7354
Nodes searched : 2668754
Nodes/second : 362898
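A small Python sketch of the comparison these two outputs allow: parse the summary lines, check that the node counts match, and compute the speed ratio. The `parse_bench` helper is a hypothetical name, and it assumes the three-line summary format shown in this thread.

```python
import re


def parse_bench(text):
    """Extract the three summary figures from bench output in the format shown above."""
    fields = {}
    for label, key in (("Total time (ms)", "time_ms"),
                       ("Nodes searched", "nodes"),
                       ("Nodes/second", "nps")):
        match = re.search(re.escape(label) + r"\s*:\s*(\d+)", text)
        if match is None:
            raise ValueError(f"missing field: {label}")
        fields[key] = int(match.group(1))
    return fields


avx2 = parse_bench("""Total time (ms) : 7936
Nodes searched : 2668754
Nodes/second : 336284""")

native = parse_bench("""Total time (ms) : 7354
Nodes searched : 2668754
Nodes/second : 362898""")

same_tree = avx2["nodes"] == native["nodes"]      # identical search tree?
speedup = avx2["time_ms"] / native["time_ms"]     # > 1 means native was faster

print(f"same node count: {same_tree}, native speedup: {speedup:.3f}x")
```

With a single thread the node counts are equal, so the two builds search the same tree here, and the native build is roughly 8% faster on this one run.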