M1 Apple Silicon for Chess?

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Raphexon
Posts: 476
Joined: Sun Mar 17, 2019 12:00 pm
Full name: Henk Drost

Re: M1 Apple Silicon for Chess?

Post by Raphexon »

Milos wrote: Wed Nov 25, 2020 6:27 pm
George Sobala wrote: Wed Nov 25, 2020 10:02 am Comparison of Stockfish and Cfish running on 8 cores, classical v NNUE, on M1 and iMac Pro Xeon W

run using

stockfish|cfish bench 64 8 20 default depth classical|NNUE

bench nps          Stockfish   Cfish
M1      classical  16319718    17118388
M1      NNUE       10100445    12759621
Xeon W  classical  16900088    17929299
Xeon W  NNUE       13631405    14191552
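For anyone who wants to reproduce these runs, here is a minimal Python sketch. It builds the bench command quoted above and pulls the nodes-per-second figure out of the bench summary. It assumes the Stockfish 12-era positional bench arguments (hash MB, threads, limit, position set, limit type, eval type). It also assumes the summary contains a line of the form "Nodes/second : N", possibly printed on stderr. The helper names and the sample summary text are hypothetical, not output taken from this thread.

```python
import re
import subprocess


def bench_command(engine, threads, eval_type):
    """Build a bench command line like the one quoted above.

    Assumed positional order (Stockfish 12-era): hash size in MB,
    thread count, limit value, position set, limit type, eval type.
    """
    return [engine, "bench", "64", str(threads), "20", "default", "depth", eval_type]


def parse_bench_nps(output):
    """Return the integer after 'Nodes/second' in a bench summary."""
    match = re.search(r"Nodes/second\s*:\s*(\d+)", output)
    if match is None:
        raise ValueError("no 'Nodes/second' line in bench output")
    return int(match.group(1))


def run_bench(engine, threads, eval_type):
    """Run bench and parse its summary (assumed to appear on stdout or stderr)."""
    result = subprocess.run(bench_command(engine, threads, eval_type),
                            capture_output=True, text=True, check=True)
    return parse_bench_nps(result.stdout + result.stderr)


# Hypothetical summary text in the shape of an engine's bench summary.
sample = """===========================
Total time (ms) : 20000
Nodes searched  : 202008900
Nodes/second    : 10100445"""

print(bench_command("./stockfish", 8, "NNUE"))
print(parse_bench_nps(sample))
```

The regex tolerates the extra padding before the colon, so the same parser works for "classical" and "NNUE" runs.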
Wow, that's even worse than what I predicted. That is around 6.5Mnps from a starting position for current SF-NNUE-dev and 10.5Mnps for SF classical.
Btw., I get 13013186 and 16423498 for NNUE and classical bench respectively on an E5-2689 8-core Xeon CPU (with HT on) that is almost 9 years old.
So basically one can expect almost 2x better SF-NNUE numbers on an AMD 4900U than on the M1, even with HT off. With HT on, the 4900U should be around 3x better.
So much for how great the M1 chip is :lol:.
As I said, ARM chips always overperform on Geekbench.
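One way to see how much each machine loses when switching from classical to NNUE evaluation is to divide the NNUE bench nps by the classical bench nps. The small Python sketch below does this using only the numbers in George's table. The dictionary layout and function name are illustrative choices, not anything from the engines themselves.

```python
# Bench nps from the table: (machine, eval) -> (Stockfish, Cfish)
RESULTS = {
    ("M1", "classical"): (16319718, 17118388),
    ("M1", "NNUE"): (10100445, 12759621),
    ("Xeon W", "classical"): (16900088, 17929299),
    ("Xeon W", "NNUE"): (13631405, 14191552),
}
ENGINES = ("Stockfish", "Cfish")


def nnue_retention(machine, engine):
    """Fraction of the classical bench nps that remains with NNUE."""
    i = ENGINES.index(engine)
    return RESULTS[(machine, "NNUE")][i] / RESULTS[(machine, "classical")][i]


for machine in ("M1", "Xeon W"):
    for engine in ENGINES:
        print(f"{machine:7}{engine:10}{nnue_retention(machine, engine):.0%}")
```

On these figures Stockfish keeps about 62% of its classical speed with NNUE on the M1 versus about 81% on the Xeon W. Cfish keeps about 75% and 79% respectively.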
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: M1 Apple Silicon for Chess?

Post by Ras »

Milos wrote: Wed Nov 25, 2020 8:39 pm Yeah right, ppl who actually need the laptop for work should not buy a tablet with a keyboard.
Not if the work needs a lot of computing power on the same machine. Local document work doesn't. Neither does connecting to a fast remote host running on wall power. The actual question is rather whether this laptop even makes much sense at its price point. The small screen is annoying without an external monitor, but with an external monitor there's also wall power, so battery runtime doesn't matter.

Btw., interesting to see that people were right to point me to the over-performance on Geekbench!
Rasmus Althoff
https://www.ct800.net
phhnguyen
Posts: 1434
Joined: Wed Apr 21, 2010 4:58 am
Location: Australia
Full name: Nguyen Hong Pham

Re: M1 Apple Silicon for Chess?

Post by phhnguyen »

George Sobala wrote: Wed Nov 25, 2020 8:00 am Benchmarks on a MacBook Air 16GB 8-core GPU

Stockfish compiled for M1 Apple Silicon, commit f9595828eb7e5e970b0be3ee5f84ddd726845523 Wed 11 Nov

bench	2408119
bench 64 1 20	2270034
bench 64 2 20	4503092
bench 64 4 20	9069926
bench 64 6 20	10820178
bench 64 8 20	12438598
...
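The thread-scaling pattern in these bench numbers is easier to read as speedup and per-thread efficiency relative to the 1-thread run. Here is a minimal Python sketch using only the figures quoted above. Dividing nps by the 1-thread nps is a simple choice made for illustration, not how the engine reports scaling.

```python
# Stockfish bench nps on the M1 by thread count, from the quoted post.
BENCH_NPS = {1: 2270034, 2: 4503092, 4: 9069926, 6: 10820178, 8: 12438598}


def scaling(bench):
    """Return {threads: (speedup over 1 thread, per-thread efficiency)}."""
    base = bench[1]
    return {t: (nps / base, nps / base / t) for t, nps in bench.items()}


for threads, (speedup, eff) in scaling(BENCH_NPS).items():
    print(f"{threads} threads: {speedup:.2f}x, {eff:.0%} efficiency")
```

Scaling is close to linear up to 4 threads and drops to roughly 68% efficiency at 8. That is consistent with the M1 having four performance cores and four slower efficiency cores.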
Not M1, but an A-series chip for comparison: bench with 6 threads on an iPhone X:

[Image: iPhone X bench screenshot]

M1 is about 3.5 times as fast as an iPhone X!

Can someone post a benchmark of the latest iPhone 12 to compare?
https://banksiagui.com
The most feature-rich chess GUI, based on the open-source Banksia chess tournament manager
Milos
Posts: 4190
Joined: Wed Nov 25, 2009 1:47 am

Re: M1 Apple Silicon for Chess?

Post by Milos »

phhnguyen wrote: Thu Nov 26, 2020 12:28 am
M1 is about 3.5 times as fast as an iPhone X!
That's not a surprise at all, since the M1 running 6 threads uses 4 cores at 3.204GHz and 2 at 2.064GHz, while the A11 runs 2 cores at 2.39GHz and 4 cores at 1.42GHz. Just by averaging frequency you get that the M1 is about 1.6x faster than the A11. However, the slow cores' IPC is much lower, so the ratio is closer to 4*3.2/(2*2.4) = 2.67. Add to that the A11's atrocious SIMD performance and probably 20% higher IPC for the M1 than the A11 at the same frequency, and you get your 3.5 ratio.
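The frequency arithmetic in this post can be checked with a few lines of Python. The sketch uses only the clock figures given in the post. The 1.2 factor is the post's assumed ~20% IPC advantage, not a measured value. The remaining gap up to 3.5 is what the post attributes to the A11's weaker SIMD throughput.

```python
def total_ghz(cores):
    """Sum of clock speeds over (core count, GHz) groups."""
    return sum(count * ghz for count, ghz in cores)


m1_6t = [(4, 3.204), (2, 2.064)]   # 4 performance + 2 efficiency cores busy
a11_6t = [(2, 2.39), (4, 1.42)]    # 2 performance + 4 efficiency cores busy

# Same core count on both sides, so summing equals averaging for the ratio.
naive = total_ghz(m1_6t) / total_ghz(a11_6t)
# Count only the fast cores, using the rounded clocks from the post.
fast_only = total_ghz([(4, 3.2)]) / total_ghz([(2, 2.4)])
# Assumed ~20% IPC advantage for the M1 at equal frequency.
with_ipc = fast_only * 1.2

print(f"{naive:.2f} {fast_only:.2f} {with_ipc:.2f}")
```

The plain frequency ratio comes out at about 1.62, the fast-cores-only ratio at 2.67, and about 3.2 once the assumed IPC factor is applied.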
Jhoravi
Posts: 291
Joined: Wed May 08, 2013 6:49 am

Re: M1 Apple Silicon for Chess?

Post by Jhoravi »

George Sobala wrote: Wed Nov 25, 2020 9:31 am It's what you get by running bench from the command line, which is classical and (hybrid) NNUE on alternate positions.

stockfish bench 64 8 20 default depth NNUE gives 9818973 (8 cores using just hybrid NNUE)

stockfish bench 64 8 20 default depth classical gives 16097762 which is as fast as my 8-core Xeon iMac Pro
Thanks. Do you have an NPS result from the initial board position? That's what we are used to in ARM benchmark comparisons.
George Sobala
Posts: 44
Joined: Sat Feb 03, 2018 2:42 pm
Location: Yorkshire, England

Re: M1 Apple Silicon for Chess?

Post by George Sobala »

Stockfish, 8 threads, Hash 512, from start position.

Bear in mind this is an Air, so there may be some thermal throttling by this stage that an MBP or Mini would not show.

NNUE: 7019918

info depth 39 seldepth 48 multipv 1 score cp 27 lowerbound nodes 861308936 nps 7019918 hashfull 1000 tbhits 0 time 122695 pv c2c4
Classical: 12992274

info depth 41 seldepth 50 multipv 1 score cp 43 nodes 1613328624 nps 12992274 hashfull 1000 tbhits 0 time 124176 pv d2d4 e7e6 e2e4 d7d5 e4e5 c7c5 c2c3 g8e7 g1f3 c8d7 b1a3 c5d4 c3d4 e7f5 a3c2 b8c6 f1d3 c6b4 c2b4 f8b4 c1d2 d8b6 d2b4 b6b4 d1d2 b4b6 d3f5 e6f5 e1g1 e8g8 f1c1 f8c8 h2h4 d7e6 f3g5 h7h6 g5e6 b6e6 c1c8 a8c8 a1c1 a7a5 h4h5 a5a4 c1c8 e6c8 g1h1 c8c7
And for those interested, 1 core NNUE speed at 2 minutes is 1419481.

For specific concerns about NNUE performance on arm64, bear in mind that:
(a) Stockfish's NNUE code is suboptimal. Cfish gets 2.05Mnps single core at 2 minutes from startpos.
(b) The above binaries were compiled with Apple clang. No doubt gcc will provide small speed improvements when available.
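To sanity-check the two start-position logs, here is a small Python sketch. It parses the integer fields of a UCI "info" line and recomputes nodes per second as nodes * 1000 / time, with time in milliseconds. That formula is an assumption about how the engine reports nps, but both logged lines agree with it exactly under integer division. The pv in the classical example is shortened, and the helper names are illustrative.

```python
INT_FIELDS = ("depth", "seldepth", "nodes", "nps", "time", "hashfull")


def parse_info(line):
    """Pull integer fields out of a UCI 'info' line (pv moves are ignored)."""
    tokens = line.split()
    fields = {}
    for key in INT_FIELDS:
        if key in tokens:
            fields[key] = int(tokens[tokens.index(key) + 1])
    return fields


def recomputed_nps(fields):
    """Nodes per second from the node count and elapsed milliseconds."""
    return fields["nodes"] * 1000 // fields["time"]


nnue = parse_info("info depth 39 seldepth 48 multipv 1 score cp 27 lowerbound "
                  "nodes 861308936 nps 7019918 hashfull 1000 tbhits 0 time 122695 pv c2c4")
classical = parse_info("info depth 41 seldepth 50 multipv 1 score cp 43 "
                       "nodes 1613328624 nps 12992274 hashfull 1000 tbhits 0 time 124176 pv d2d4 e7e6")

for name, f in (("NNUE", nnue), ("classical", classical)):
    print(name, f["nps"], recomputed_nps(f))
```

Matching on exact tokens keeps "depth" from colliding with "seldepth" and "pv" from colliding with "multipv".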
daylen
Posts: 40
Joined: Fri Dec 30, 2011 5:33 am
Location: Berkeley, CA

Re: M1 Apple Silicon for Chess?

Post by daylen »

Jhoravi wrote: Tue Nov 24, 2020 7:30 pm But Apple M1 has neural engine processors. Maybe NNUE can be optimized for it?
The neural engine officially can only be used through Apple's Core ML API. This makes it tricky for Stockfish to adopt, but perhaps lc0 could adopt this.
arcobaer
Posts: 2
Joined: Tue Nov 05, 2019 5:38 pm
Full name: Marcus Kästner

Re: M1 Apple Silicon for Chess?

Post by arcobaer »

Hello,

Where can I get compiled Stockfish (dev) and lc0 binaries for macOS (M1)?

Best regards
Marcus
George Sobala wrote: Wed Nov 25, 2020 8:00 am Benchmarks on a MacBook Air 16GB 8-core GPU

Stockfish compiled for M1 Apple Silicon, commit f9595828eb7e5e970b0be3ee5f84ddd726845523 Wed 11 Nov

bench	2408119
bench 64 1 20	2270034
bench 64 2 20	4503092
bench 64 4 20	9069926
bench 64 6 20	10820178
bench 64 8 20	12438598
Cfish compiled for M1 Apple Silicon, commit 6193ed1c3809cb4b71ad8f630fa4f52160390fb6 Sun 15 Nov

bench	2844055
bench 64 1 20	2614400
bench 64 2 20	5232224
bench 64 4 20	10152288
bench 64 6 20	12388337
bench 64 8 20	14935531
dragon-ox running under Rosetta 2

command   nodes     sec    nps
bench 1   1915866   4.80   399139
bench 2   3653649   4.70   777372
bench 4   5690302   3.36   1693542
lc0, compiled for M1 running on CPU using Apple's Accelerate framework, output of benchmark first position only:

--backend=blas --weights=J92-330          97
--backend=blas --weights=T70net-703810    1703
lc0, compiled for M1 running on GPU using OpenCL, output of benchmark first position only:

--backend=opencl --weights=J92-330          104
--backend=opencl --weights=T70net-703810    3431
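As a quick consistency check on the dragon-ox and lc0 figures, here is a minimal Python sketch. It confirms that each reported dragon-ox nps equals nodes divided by seconds, up to rounding. It also compares the OpenCL and BLAS lc0 results for each network. The sketch assumes the lc0 "first position" numbers are nodes per second; the post does not state the unit.

```python
# dragon-ox bench under Rosetta 2: (threads, nodes, seconds, reported nps)
DRAGON = [(1, 1915866, 4.80, 399139), (2, 3653649, 4.70, 777372), (4, 5690302, 3.36, 1693542)]
# lc0 first-position figures (assumed nps): weights -> (BLAS on CPU, OpenCL on GPU)
LC0 = {"J92-330": (97, 104), "T70net-703810": (1703, 3431)}

for threads, nodes, sec, nps in DRAGON:
    # The reported nps should match nodes / seconds up to rounding.
    assert round(nodes / sec) == nps
    print(f"dragon {threads} threads: {nps / DRAGON[0][3]:.2f}x the 1-thread nps")

for weights, (cpu, gpu) in LC0.items():
    print(f"lc0 {weights}: OpenCL/BLAS = {gpu / cpu:.2f}")
```

On these numbers OpenCL roughly doubles the BLAS result for the T70 net but changes it by only about 7% for J92-330.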
George Sobala
Posts: 44
Joined: Sat Feb 03, 2018 2:42 pm
Location: Yorkshire, England

Re: M1 Apple Silicon for Chess?

Post by George Sobala »

Your best bet is to learn how to compile them yourself. It is really easy and then you can always have the latest dev version.
MikeB
Posts: 4889
Joined: Thu Mar 09, 2006 6:34 am
Location: Pen Argyl, Pennsylvania

Re: M1 Apple Silicon for Chess?

Post by MikeB »

George Sobala wrote: Wed Nov 25, 2020 8:00 am Benchmarks on a MacBook Air 16GB 8-core GPU
...
This is not impressive at all... I will keep my money in my pocket. Geekbench scores appear to be meaningless for any attempt at projection: the Geekbench scores were better than my 2010 Mac Pro all the way around. Huge disappointment for me...