Ryzen 1800x and best stockfish version

syzygy · Post by **syzygy** » Sun Jun 25, 2017 12:01 am

syzygy wrote:
Cardoso wrote: On PDEP and PEXT instructions Skylake has one cycle latency.
Ryzen has 18 cycles !!!
Even then it is surprising (to me) that the bmi2 version is SO MUCH slower. (Does a bmi2 instruction flush the pipeline or something?)

Here I found the following measurements:

So pdep/pext not only have huge latency, they are even worse for throughput. So they basically screw up efficiency.
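For anyone unfamiliar with these two instructions, here is a minimal Python sketch of what PEXT and PDEP compute. It is a bit-by-bit software emulation for illustration only, not how any engine implements them; the function names `pext` and `pdep` are simply chosen to mirror the instruction names.

```python
def pext(value: int, mask: int) -> int:
    """Gather the bits of `value` selected by `mask` into the low bits of the result."""
    result = 0
    out_bit = 0
    while mask:
        lowest = mask & -mask      # isolate the lowest set bit of the mask
        if value & lowest:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1           # clear that mask bit and continue
    return result


def pdep(value: int, mask: int) -> int:
    """Scatter the low bits of `value` to the positions of the set bits in `mask`."""
    result = 0
    in_bit = 0
    while mask:
        lowest = mask & -mask
        if value & (1 << in_bit):
            result |= lowest
        in_bit += 1
        mask &= mask - 1
    return result


print(bin(pext(0b10110010, 0b11110000)))
print(bin(pdep(0b1011, 0b11110000)))
```

As far as I know, a bmi2 build of Stockfish uses one PEXT per sliding-piece attack lookup to turn the board occupancy into a table index, so a slow PEXT sits directly on a hot path.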

Something else I read here:

Ryzen seems to have very bad performance on BSR/BSF (6 uops?) which is very weird because the essentially identical LZCNT/TZCNT seem OK (1 and 2 uops respectively).

I think compilers will generate the lzcnt/tzcnt instructions nowadays. But asmFish seems to use bsf instead of tzcnt.
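A small sketch of why the two are "essentially identical": for a nonzero input both BSF and TZCNT give the index of the lowest set bit. They differ only for a zero input, where TZCNT returns the operand width and BSF leaves the destination undefined. The Python below emulates the 64-bit TZCNT result; `tzcnt64` is a made-up helper name.

```python
def tzcnt64(x: int) -> int:
    """Emulate 64-bit TZCNT: count of trailing zero bits, 64 for a zero input."""
    x &= (1 << 64) - 1             # keep only the low 64 bits
    if x == 0:
        return 64                  # TZCNT is defined here; BSF would not be
    return (x & -x).bit_length() - 1


print(tzcnt64(0b1000))
```

Since a bitboard loop only scans nonzero values, swapping bsf for tzcnt should not change results, only the instruction cost.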

That whole thread is interesting. It seems Ryzen is as good as or better than Intel in executing instructions, but that its memory subsystem is behind Intel's.

corres · Post by **corres** » Sun Jun 25, 2017 3:23 pm

I have run some speed tests on my Ryzen machine with the Stockfish Speed Test made by Brice Allenbrand. For good reproducibility I used single-core tests only.
My PC is a Ryzen 7 1800x running at 8x4000 MHz with SMT (~HT) OFF.
OS is Windows 7 64-bit.
The Stockfish dev. version I used is 17062123.
Results:

Abrok-compiles:
x_64_modern 2156 Mnps
x_64 (old) 2155 Mnps
x_64_bmi2 1656 Mnps

Compiled by me on this PC with MinGW w64 gcc-5.3.0:
x_64_modern 2244 Mnps
x_64 (old) 2181 Mnps
x_64_bmi2 1694 Mnps
Deviations from average are < 5 Mnps.

The bmi2 versions are nearly 25% slower, and it is obvious that the Abrok compiles are far from optimal.
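The percentages can be checked from the figures above with a few lines of Python. This is only a sketch of the arithmetic; `slowdown_percent` is a made-up helper name, and the numbers are used as displayed by the tester, whatever their unit.

```python
def slowdown_percent(baseline: float, candidate: float) -> float:
    """How much slower `candidate` is than `baseline`, in percent."""
    return (1.0 - candidate / baseline) * 100.0


# bmi2 versus modern, for each set of compiles
abrok_bmi2 = slowdown_percent(2156, 1656)
own_bmi2 = slowdown_percent(2244, 1694)

# own modern compile versus the Abrok modern compile
own_gain = (2244 / 2156 - 1.0) * 100.0

print(f"Abrok bmi2 slowdown: {abrok_bmi2:.1f}%")
print(f"Own bmi2 slowdown:   {own_bmi2:.1f}%")
print(f"Own modern gain:     {own_gain:.1f}%")
```

That gives a bmi2 penalty of roughly 23 to 25%, and the self-compiled modern build comes out about 4% ahead of the Abrok one.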

Zenmastur · Post by **Zenmastur** » Mon Jun 26, 2017 5:52 am

corres wrote: I have run some speed tests on my Ryzen machine with the Stockfish Speed Test made by Brice Allenbrand. For good reproducibility I used single-core tests only.
My PC is a Ryzen 7 1800x running at 8x4000 MHz with SMT (~HT) OFF.
OS is Windows 7 64-bit.
The Stockfish dev. version I used is 17062123.
Results:

Abrok-compiles:
x_64_modern 2156 Mnps
x_64 (old) 2155 Mnps
x_64_bmi2 1656 Mnps

Compiled by me on this PC with MinGW w64 gcc-5.3.0:
x_64_modern 2244 Mnps
x_64 (old) 2181 Mnps
x_64_bmi2 1694 Mnps
Deviations from average are < 5 Mnps.

The bmi2 versions are nearly 25% slower, and it is obvious that the Abrok compiles are far from optimal.

I assume that Mnps should actually be Knps and the standard deviation is 5 Knps. Is that correct?
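A rough sanity check on the unit, assuming only the 4000 MHz clock quoted above and one core: divide the clock rate by the node rate to get clock cycles spent per searched node. The constant and function names are made up for this sketch.

```python
CLOCK_HZ = 4000e6  # 4000 MHz, as quoted for the 1800x above


def cycles_per_node(nodes_per_second: float) -> float:
    """Clock cycles available per searched node on one core."""
    return CLOCK_HZ / nodes_per_second


if_mnps = cycles_per_node(2156e6)  # reading the figure as 2156 Mnps
if_knps = cycles_per_node(2156e3)  # reading the figure as 2156 Knps

print(f"as Mnps: {if_mnps:.2f} cycles per node")
print(f"as Knps: {if_knps:.0f} cycles per node")
```

Fewer than two cycles per node is not plausible for a chess engine, while a couple of thousand cycles per node is, so Knps is the sensible reading.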

Regards,

Forrest

corres · Post by **corres** » Mon Jun 26, 2017 10:50 am

Sorry, but I have no detailed information about Brice's speed tester, and moreover its critical parts are binaries. But I am afraid you are right and it is a bug in how the tester displays the results. In any case the results, disregarding their unit, are numerically good.

Look · Post by **Look** » Tue Jun 27, 2017 4:24 pm

corres wrote: I have run some speed tests on my Ryzen machine with the Stockfish Speed Test made by Brice Allenbrand. For good reproducibility I used single-core tests only.
My PC is a Ryzen 7 1800x running at 8x4000 MHz with SMT (~HT) OFF.
OS is Windows 7 64-bit.
The Stockfish dev. version I used is 17062123.
Results:

Abrok-compiles:
x_64_modern 2156 Mnps
x_64 (old) 2155 Mnps
x_64_bmi2 1656 Mnps

Compiled by me on this PC with MinGW w64 gcc-5.3.0:
x_64_modern 2244 Mnps
x_64 (old) 2181 Mnps
x_64_bmi2 1694 Mnps
Deviations from average are < 5 Mnps.

The bmi2 versions are nearly 25% slower, and it is obvious that the Abrok compiles are far from optimal.

Has anybody tried compiling Stockfish with Clang and benchmarking it?

david · Post by **david** » Thu Feb 01, 2018 5:47 pm

Now with Stockfish 9 from the Stockfish downloads site:
Author: mstembera
Date: Wed Jan 31 11:41:09 2018 +0100
Timestamp: 1517395269

The three bench results are:
X64
Total time (ms) : 2705
Nodes searched : 5023629
Nodes/second : 1857164

Bmi2
Total time (ms) : 3510
Nodes searched : 5023629
Nodes/second : 1431233

Modern
Total time (ms) : 2693
Nodes searched : 5023629
Nodes/second : 1865439

So again, with a Ryzen 1800x, bmi2 is considerably slower (about 23%). Modern is just a tad faster than x64.
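The three results can be cross-checked in Python. This sketch assumes that the reported Nodes/second is simply nodes divided by elapsed time, truncated to an integer, which matches the figures above; the variable and function names are made up.

```python
# (total time in ms, nodes searched) as reported by each bench run above
bench = {
    "x64": (2705, 5023629),
    "bmi2": (3510, 5023629),
    "modern": (2693, 5023629),
}


def nodes_per_second(time_ms: int, nodes: int) -> int:
    """Integer nodes per second from a bench time in milliseconds."""
    return nodes * 1000 // time_ms


rates = {name: nodes_per_second(t, n) for name, (t, n) in bench.items()}

bmi2_slowdown = (1.0 - rates["bmi2"] / rates["x64"]) * 100.0
modern_gain = (rates["modern"] / rates["x64"] - 1.0) * 100.0

print(rates)
print(f"bmi2 is {bmi2_slowdown:.1f}% slower than x64")
print(f"modern is {modern_gain:.2f}% faster than x64")
```

Because all three builds search exactly the same number of nodes, the speed difference comes entirely from the elapsed time.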

david

CMCanavessi · Post by **CMCanavessi** » Thu Feb 01, 2018 6:04 pm

What is interesting is that some engines are more affected than others. For example, Stockfish bmi2 is clearly slower, but I tested Ginkgo and the bmi compile was 2-3% faster.
Another case is Orion, where the bmi binary is faster.

Maybe they are not using the instruction to its fullest, and that's why the penalty is nonexistent.

Edit:

Ginkgo test

BMI binary: info nodes 436765732 nps 1169736 hashfull 684 time 373388
Non-BMI:    info nodes 436765732 nps 1147429 hashfull 684 time 380647
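The two info lines can be compared with a short Python sketch. It assumes the usual UCI convention that each keyword is followed by its value and that `time` is in milliseconds; `parse_info` is a made-up helper that reads only the three fields needed here.

```python
def parse_info(line: str) -> dict:
    """Pull the nodes, nps and time fields out of a UCI-style info line."""
    tokens = line.split()
    fields = {}
    for key in ("nodes", "nps", "time"):
        fields[key] = int(tokens[tokens.index(key) + 1])
    return fields


bmi = parse_info("info nodes 436765732 nps 1169736 hashfull 684 time 373388")
non_bmi = parse_info("info nodes 436765732 nps 1147429 hashfull 684 time 380647")

speedup_percent = (bmi["nps"] / non_bmi["nps"] - 1.0) * 100.0
print(f"BMI binary is {speedup_percent:.1f}% faster")
```

Both runs searched the same number of nodes, so the nps ratio is a fair comparison; it works out to a little under 2%.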

Paloma · Post by **Paloma** » Thu Feb 01, 2018 7:05 pm

Note, the bench results fluctuate with every run.

Run the benchmark 10 times and you get 10 different results, unfortunately.
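One way to deal with the fluctuation is to repeat the bench and compare means together with the spread. Below is a minimal Python sketch using the standard `statistics` module; `summarize` is a made-up helper and the sample values are placeholders, not measurements.

```python
import statistics


def summarize(nps_runs):
    """Return (mean, sample standard deviation) for a list of nps readings."""
    mean = statistics.mean(nps_runs)
    spread = statistics.stdev(nps_runs) if len(nps_runs) > 1 else 0.0
    return mean, spread


# Placeholder values for illustration only, not real bench results.
example_runs = [100, 110, 90]
mean_nps, spread_nps = summarize(example_runs)
print(f"mean {mean_nps}, standard deviation {spread_nps:.1f}")
```

If the difference between two builds is smaller than the spread of repeated runs, the comparison does not say much.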
