Ryzen seems to have very bad performance on BSR/BSF (6 ops?) which is very weird because the essentially identical LZCNT/TZCNT seem OK (1 and 2 uops respectively).
I think compilers will generate the lzcnt/tzcnt instructions nowadays. But asmFish seems to use bsf instead of tzcnt.
That whole thread is interesting. It seems Ryzen is as good as or better than Intel in executing instructions, but that its memory subsystem is behind Intel's.
I have made some speed test on my Ryzen machine with Stockfish Speed Test made by Brice Allenbrand. For the good reproducibility I use one core tests only.
My PC is a Ryzen 7 1800x running on 8x4000 MHz and SMT (~HT) is OFF.
OS is Windows 7 64 bits.
Stockfish dev. used by me is 17062123.
Results:
Compiled by me on this PC with MinGW w64 gcc-5.3.0:
x_64_modern 2244 Mnps
x_64 (old) 2181 Mnps
x_64_bmi2 1694 Mnps
Deviations from average are < 5 Mnps.
Bmi2 versions are slower near 25% and it is obvious that Abrok compiles far from optimal.
corres wrote:I have made some speed test on my Ryzen machine with Stockfish Speed Test made by Brice Allenbrand. For the good reproducibility I use one core tests only.
My PC is a Ryzen 7 1800x running on 8x4000 MHz and SMT (~HT) is OFF.
OS is Windows 7 64 bits.
Stockfish dev. used by me is 17062123.
Results:
Compiled by me on this PC with MinGW w64 gcc-5.3.0:
x_64_modern 2244 Mnps
x_64 (old) 2181 Mnps
x_64_bmi2 1694 Mnps
Deviations from average are < 5 Mnps.
Bmi2 versions are slower near 25% and it is obvious that Abrok compiles far from optimal.
I assume that Mnps should actually be Knps and the standard deviation is 5Knps. Is that correct?
Regards,
Forrest
Only 2 defining forces have ever offered to die for you.....Jesus Christ and the American Soldier. One died for your soul, the other for your freedom.
Sorry, but I have not detailed information about Brice's speed tester and moreover the critical parts of it are in binaries. But I am afraid you are right and it is a bug in the results displayed by the tester. In any case the results - disregarding their dimension - are good numerically.
corres wrote:I have made some speed test on my Ryzen machine with Stockfish Speed Test made by Brice Allenbrand. For the good reproducibility I use one core tests only.
My PC is a Ryzen 7 1800x running on 8x4000 MHz and SMT (~HT) is OFF.
OS is Windows 7 64 bits.
Stockfish dev. used by me is 17062123.
Results:
Compiled by me on this PC with MinGW w64 gcc-5.3.0:
x_64_modern 2244 Mnps
x_64 (old) 2181 Mnps
x_64_bmi2 1694 Mnps
Deviations from average are < 5 Mnps.
Bmi2 versions are slower near 25% and it is obvious that Abrok compiles far from optimal.
Anybody tried Stockfish compiled and benched using Clang compiler?
What is interesting is that some engines are more affected than others. For example stockfish bmi2 is clearly slower, but I tested Ginkgo and the bmi compile was 2-3% faster.
Another case is Orion, where bmi binary is faster.
Maybe they are not using the instruction to their fullest, and that's why the penalty is non-existant.