The most muscular compiler switch I ever saw
Moderator: Ras
-
Dann Corbit
- Posts: 12828
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
The most muscular compiler switch I ever saw
The bleeding-edge Stockfish code, fetched from GitHub today, gave the following results:
Arch=avx2:
Total time (ms) : 258602
Nodes searched : 3311863930
Nodes/second : 12806799
Arch=native:
Total time (ms) : 109160
Nodes searched : 1347508396
Nodes/second : 12344342
The time is less than half.
The nodes are less than half.
The NPS is very close to equal.
The only thing that I can figure is that the profile-guided optimization caused much better move ordering when the architecture was set to native. Of course, with an AMD Ryzen Threadripper 3970X, one would think that avx2 and native would be nearly identical.
Thoughts?
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
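For anyone who wants to reproduce the comparison, the two builds can presumably be produced along these lines with a recent Stockfish Makefile (a sketch, not the exact commands used above; the ARCH names should match those listed by "make help" in recent versions, and the executable may be named stockfish.exe on Windows builds):
$ cd Stockfish/src
$ make -j profile-build ARCH=x86-64-avx2
$ cp stockfish stockfish-avx2
$ make clean
$ make -j profile-build ARCH=native
$ cp stockfish stockfish-native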
-
Dann Corbit
- Posts: 12828
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: The most muscular compiler switch I ever saw
I should mention: 16GB RAM for hash, and 16 threads, depth = 20 for the benchmark.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
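Assuming those settings were passed straight to the bench command, the run would look something like this (Stockfish's bench arguments are hash size in MB, thread count, then depth):
$ ./stockfish bench 16384 16 20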
-
Dann Corbit
- Posts: 12828
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: The most muscular compiler switch I ever saw
One more thing (probably inconsequential): I profile longer than the standard Makefile does, using this bench command:
PGOBENCH = $(WINE_PATH) ./$(EXE) bench 16384 1 18
I have found that increasing the thread count destroys the profile. Seems odd. Other profilers I have used work fine with SMP.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
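For reference, those three bench arguments are hash size in MB, thread count, and depth, so the modified line profiles with a 16 GB hash, one thread, and depth 18 rather than the default bench. A sketch of one way to apply the same change without editing the Makefile, assuming GNU make's command-line variable override and an executable named stockfish:
$ make -j profile-build ARCH=native PGOBENCH="./stockfish bench 16384 1 18"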
-
Dann Corbit
- Posts: 12828
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: The most muscular compiler switch I ever saw
For those with similar hardware who want to replicate the result, here is the compiler version that I am using:
$ g++ --version
g++.exe (Rev11, Built by MSYS2 project) 15.2.0
Copyright (C) 2025 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
AndrewGrant
- Posts: 1971
- Joined: Tue Apr 19, 2016 6:08 am
- Location: U.S.A
- Full name: Andrew Grant
Re: The most muscular compiler switch I ever saw
Dann Corbit wrote: ↑Thu Feb 05, 2026 7:44 am
The only thing that I can figure is that the profile-guided optimization caused much better move ordering when the architecture was set to native. Of course, with an AMD Ryzen Threadripper 3970X, one would think that avx2 and native would be nearly identical.
PGO does not produce functional differences. The move ordering is the same.
-
jdart
- Posts: 4423
- Joined: Fri Mar 10, 2006 5:23 am
- Location: http://www.arasanchess.org
Re: The most muscular compiler switch I ever saw
-march=native will enable optimizations that are specific to the processor you are building on. Since Intel/AMD add new instructions regularly, processors can differ in terms of the exact instruction set they support. "avx2" will enable a common subset of instructions, but full use of the supported instruction set may result in better code.
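One way to see which extra instruction-set extensions -march=native turns on relative to a plain AVX2 build is to diff GCC's resolved target flags (a rough sketch; Stockfish's avx2 target actually passes a few more flags than -mavx2 alone):
$ g++ -mavx2 -Q --help=target | grep enabled | sort > avx2.txt
$ g++ -march=native -Q --help=target | grep enabled | sort > native.txt
$ diff avx2.txt native.txt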
-
syzygy
- Posts: 5896
- Joined: Tue Feb 28, 2012 11:56 pm
Re: The most muscular compiler switch I ever saw
Dann Corbit wrote: ↑Thu Feb 05, 2026 7:45 am
I should mention: 16GB RAM for hash, and 16 threads, depth = 20 for the benchmark.
So it's just SMP randomness.
Either use 1 thread or do 20 runs with both and take the average.
The only reason to use multiple threads to benchmark two functionally identical versions is to see if avx512 instructions (if -march=native uses those) cause a slowdown with multiple threads that you might not see with a single thread.
You have to understand that a compiler switch cannot be the reason for a different number of nodes unless there is a bug. The difference is entirely caused by the indeterminacy introduced by multithreading.
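A sketch of the averaging approach as a shell loop, assuming two binaries named stockfish-avx2 and stockfish-native (hypothetical names), with stderr redirected because the bench summary is printed there:
$ for i in $(seq 1 20); do ./stockfish-avx2 bench 16384 16 20; done 2>&1 | grep 'Nodes/second' | awk '{sum += $NF} END {print "avx2 mean NPS:", sum/NR}'
$ for i in $(seq 1 20); do ./stockfish-native bench 16384 16 20; done 2>&1 | grep 'Nodes/second' | awk '{sum += $NF} END {print "native mean NPS:", sum/NR}'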
-
Dann Corbit
- Posts: 12828
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: The most muscular compiler switch I ever saw
The Native version is also a much better problem solver.
Since time to ply is better than cut in half, it seems to be far more capable.
Now, the other version (avx2) searches wider. So you would think that it would solve some problems better. But so far I have not seen that.
It's not just SMP variation, or I would see a fluctuation like that for native on multiple runs, and a fluctuation of that scale for avx2 on multiple runs. But I don't. Now, I don't have any logical explanation of why the native tree seems to be much less bushy. But it is very interesting to me.
I have seen this effect for other engines as well, but those were also related to Stockfish, so it could still be something peculiar to the SF codebase.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
-
syzygy
- Posts: 5896
- Joined: Tue Feb 28, 2012 11:56 pm
Re: The most muscular compiler switch I ever saw
Dann Corbit wrote: ↑Fri Feb 06, 2026 12:52 am
The Native version is also a much better problem solver.
Nonsense. Unless you found a bug in either SF or your compiler, the two builds are functionally identical.
Do the two versions give identical benches when using 1 thread?
-
Dann Corbit
- Posts: 12828
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: The most muscular compiler switch I ever saw
Bench with no parameters:
avx2
Total time (ms) : 7936
Nodes searched : 2668754
Nodes/second : 336284
native
Total time (ms) : 7354
Nodes searched : 2668754
Nodes/second : 362898
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.