237 Mn/s for Stockfish on an 2xEPYC 7742


bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by bob »

This goes 'round and 'round. My take has always been that NPS counts within the SAME program. IE if I can make my program 10% faster, purely by optimizations, then it will be better. If I make it 10% faster by stripping something out, it is not so clear. If A searches at X nodes per second, and B searches at Y nodes per second, that gives me nothing at all to compare, unless maybe if X is an order of magnitude larger than Y or something similar.

Comparing ASMfish and Stockfish in terms of NPS doesn't make much sense since they are not the same program as of right now. If ASMfish gets updated to current Stockfish algorithms, just refactored into assembly language, then the NPS suddenly becomes important since everything is the same except for the speed.
Raphexon
Posts: 476
Joined: Sun Mar 17, 2019 12:00 pm
Full name: Henk Drost

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by Raphexon »

bob wrote: Sat Apr 11, 2020 7:01 am This goes 'round and 'round. My take has always been that NPS counts within the SAME program. IE if I can make my program 10% faster, purely by optimizations, then it will be better. If I make it 10% faster by stripping something out, it is not so clear. If A searches at X nodes per second, and B searches at Y nodes per second, that gives me nothing at all to compare, unless maybe if X is an order of magnitude larger than Y or something similar.

Comparing ASMfish and Stockfish in terms of NPS doesn't make much sense since they are not the same program as of right now. If ASMfish gets updated to current Stockfish algorithms, just refactored into assembly language, then the NPS suddenly becomes important since everything is the same except for the speed.
Just add an n/7 booster
yorkman
Posts: 105
Joined: Thu Jul 27, 2017 10:59 pm

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by yorkman »

Yes of course, that's why I'm pushing for SF to at least investigate why it's so much slower on this hardware. I mean we're talking 100,000 kN/s difference here. NPS is important but I wouldn't use asmFish over SF in a correspondence game today unless it was only a little behind SF updates.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by bob »

so 100 million nodes per second slower? Is that a typo where you meant 100K nodes per second? IE 100,000 KNPS adds 3 more zeros to the end of that 100,000...
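
For anyone checking the units: a minimal sketch of the conversion, assuming "kN/s" means thousands of nodes per second, as GUIs such as Aquarium appear to report it. The helper names are made up for illustration.

```python
# Convert engine-speed readings between kN/s, N/s and MN/s.
# Assumption: 1 kN/s = 1,000 nodes per second.

def kns_to_nps(kns: float) -> float:
    """Kilonodes per second -> nodes per second."""
    return kns * 1_000


def nps_to_mnps(nps: float) -> float:
    """Nodes per second -> meganodes per second."""
    return nps / 1_000_000


gap_kns = 100_000  # the gap quoted in the thread, in kN/s
gap_nps = kns_to_nps(gap_kns)
print(f"{gap_kns:,} kN/s = {gap_nps:,.0f} N/s = {nps_to_mnps(gap_nps):.0f} MN/s")
# prints: 100,000 kN/s = 100,000,000 N/s = 100 MN/s
```

So a figure written as 100,000 kN/s is the same quantity as 100 million nodes per second.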
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by MikeB »

bob wrote: Sun Apr 12, 2020 5:35 am so 100 million nodes per second slower? Is that a typo where you meant 100K nodes per second? IE 100,000 KNPS adds 3 more zeros to the end of that 100,000...
It was a typo; he's missing 100M nps, and based on iPman's benchmarks he's right. With that kind of powerhouse, you almost have to go Linux, really.
yorkman
Posts: 105
Joined: Thu Jul 27, 2017 10:59 pm

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by yorkman »

bob wrote: Sat Apr 11, 2020 7:01 am This goes 'round and 'round. My take has always been that NPS counts within the SAME program. IE if I can make my program 10% faster, purely by optimizations, then it will be better. If I make it 10% faster by stripping something out, it is not so clear. If A searches at X nodes per second, and B searches at Y nodes per second, that gives me nothing at all to compare, unless maybe if X is an order of magnitude larger than Y or something similar.

Comparing ASMfish and Stockfish in terms of NPS doesn't make much sense since they are not the same program as of right now. If ASMfish gets updated to current Stockfish algorithms, just refactored into assembly language, then the NPS suddenly becomes important since everything is the same except for the speed.
I disagree. What you're saying is that asmFish is so much faster because it's not updated. Which means SF would also have had to get slower during the same time, and we know it didn't.
yorkman
Posts: 105
Joined: Thu Jul 27, 2017 10:59 pm

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by yorkman »

MikeB wrote: Sun Apr 12, 2020 6:10 am
bob wrote: Sun Apr 12, 2020 5:35 am so 100 million nodes per second slower? Is that a typo where you meant 100K nodes per second? IE 100,000 KNPS adds 3 more zeros to the end of that 100,000...
It was a typo; he's missing 100M nps, and based on iPman's benchmarks he's right. With that kind of powerhouse, you almost have to go Linux, really.
No I don't think it was a typo. In Aquarium I'll get 180,000 kN/s with SF but with asmFish I'll get 280,000 kN/s which is a difference of 100,000 kN/s. This is how Aquarium reports the speed.

Also, I tried Linux and didn't see any difference there either, although I only ran SF benches, and at that time I hadn't yet figured out that bench 1024 256 26 was the culprit, because in Windows it was the same. Only after I increased the hash and ran bench 32768 256 26 did I get much faster speeds.

If I were convinced that SF on Linux would run at 230,000+ kN/s, I might make the switch, but the last time I tried Aquarium in Wine I didn't like it, and I only like to use Aquarium.

I'm still convinced that SF's NUMA/processor group code needs to at least be reviewed, because I remember that when asmFish was up to date with SF it was a LOT faster. Unfortunately nobody on the SF team wants to look at it, probably because there aren't many people with such hardware and so not enough complaints. But that could also be the reason why there aren't more people with such hardware: they know they couldn't take advantage of that available speed using SF.
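
To make the two bench invocations above easier to compare, here is a rough sketch. It assumes the three numbers after `bench` are hash size in MB, thread count, and search depth, which is how the Stockfish bench command is usually invoked. `BenchArgs` and `parse_bench` are hypothetical helpers, not part of any engine or GUI. The hash table is shared between threads, so the "MB per thread" figure is only a crude indicator of table pressure, not a real per-thread allocation.

```python
from typing import NamedTuple


class BenchArgs(NamedTuple):
    hash_mb: int   # assumed: transposition table size in MB
    threads: int   # assumed: number of search threads
    depth: int     # assumed: search depth limit


def parse_bench(command: str) -> BenchArgs:
    """Split a 'bench <hash> <threads> <depth>' string into named fields."""
    parts = command.split()
    if len(parts) != 4 or parts[0] != "bench":
        raise ValueError(f"expected 'bench <hash> <threads> <depth>', got {command!r}")
    hash_mb, threads, depth = (int(p) for p in parts[1:])
    return BenchArgs(hash_mb, threads, depth)


for cmd in ("bench 1024 256 26", "bench 32768 256 26"):
    args = parse_bench(cmd)
    per_thread = args.hash_mb / args.threads
    print(f"{cmd}: {args.hash_mb} MB hash, {args.threads} threads, "
          f"depth {args.depth}, about {per_thread:.0f} MB of hash per thread")
```

Under those assumptions the first run has roughly 4 MB of hash per thread and the second roughly 128 MB, which fits the observation that the small-hash run was the slow one.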
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by MikeB »

yorkman wrote: Sun Apr 12, 2020 6:47 pm
MikeB wrote: Sun Apr 12, 2020 6:10 am
bob wrote: Sun Apr 12, 2020 5:35 am so 100 million nodes per second slower? Is that a typo where you meant 100K nodes per second? IE 100,000 KNPS adds 3 more zeros to the end of that 100,000...
It was a typo; he's missing 100M nps, and based on iPman's benchmarks he's right. With that kind of powerhouse, you almost have to go Linux, really.
No I don't think it was a typo. In Aquarium I'll get 180,000 kN/s with SF but with asmFish I'll get 280,000 kN/s which is a difference of 100,000 kN/s. This is how Aquarium reports the speed.

Also, I tried Linux and didn't see any difference there either, although I only ran SF benches, and at that time I hadn't yet figured out that bench 1024 256 26 was the culprit, because in Windows it was the same. Only after I increased the hash and ran bench 32768 256 26 did I get much faster speeds.

If I were convinced that SF on Linux would run at 230,000+ kN/s, I might make the switch, but the last time I tried Aquarium in Wine I didn't like it, and I only like to use Aquarium.

I'm still convinced that SF's NUMA/processor group code needs to at least be reviewed, because I remember that when asmFish was up to date with SF it was a LOT faster. Unfortunately nobody on the SF team wants to look at it, probably because there aren't many people with such hardware and so not enough complaints. But that could also be the reason why there aren't more people with such hardware: they know they couldn't take advantage of that available speed using SF.
No typo, I misread, sorry. On my 3970X system, asmFish with large pages is about 20% faster than Stockfish. When I added large pages to Stockfish, that closed the difference to 10%, which is about where it should be.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by bob »

yorkman wrote: Sun Apr 12, 2020 6:37 pm
bob wrote: Sat Apr 11, 2020 7:01 am This goes 'round and 'round. My take has always been that NPS counts within the SAME program. IE if I can make my program 10% faster, purely by optimizations, then it will be better. If I make it 10% faster by stripping something out, it is not so clear. If A searches at X nodes per second, and B searches at Y nodes per second, that gives me nothing at all to compare, unless maybe if X is an order of magnitude larger than Y or something similar.

Comparing ASMfish and Stockfish in terms of NPS doesn't make much sense since they are not the same program as of right now. If ASMfish gets updated to current Stockfish algorithms, just refactored into assembly language, then the NPS suddenly becomes important since everything is the same except for the speed.
I disagree. What you're saying is that asmFish is so much faster because it's not updated. Which means SF would also have had to get slower during the same time, and we know it didn't.
Where did I say that? I simply said that comparing two DIFFERENT programs (or versions of the same program) by NPS is not going to provide very interesting information. Only time NPS is really relevant is when you are doing optimizations and want to compare the changes with the original. This assumes NO other changes. Comparing ASMFish to current stockfish is again, apples to oranges. The two versions are not the same. One may have code the other does not.
yorkman
Posts: 105
Joined: Thu Jul 27, 2017 10:59 pm

Re: 237 Mn/s for Stockfish on an 2xEPYC 7742

Post by yorkman »

MikeB wrote: Sun Apr 12, 2020 6:10 am
No typo, I misread, sorry. On my 3970X system, asmFish with large pages is about 20% faster than Stockfish. When I added large pages to Stockfish, that closed the difference to 10%, which is about where it should be.
I was also comparing SF with LP to asmFish. And on my machine it's a world of difference...about 55% which is huge. Having a machine like this but not being able to use it to its full extent would be such a waste. Clearly there must be something that can be done to better optimize SF on such hardware.
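
As a sanity check on that figure, a small sketch using the Aquarium readings quoted earlier in the thread (180,000 kN/s for SF, 280,000 kN/s for asmFish). It assumes those are the two numbers behind the "about 55%" claim, which the post does not state explicitly. The function name is made up for illustration.

```python
def relative_speedup(baseline: float, faster: float) -> float:
    """Return how much faster `faster` is than `baseline`, in percent."""
    return (faster - baseline) / baseline * 100.0


sf_kns = 180_000    # Stockfish reading quoted in the thread, kN/s
asm_kns = 280_000   # asmFish reading quoted in the thread, kN/s

print(f"asmFish is {relative_speedup(sf_kns, asm_kns):.1f}% faster than SF")
# prints: asmFish is 55.6% faster than SF
```

The percentage is taken relative to the slower engine; measured the other way round, SF reaches about 64% of asmFish's speed on these readings.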