Lc0 OpenCL benchmark with 128x10 network



Re: Lc0 OpenCL benchmark with 128x10 network

Post by Max

MikeB wrote: Fri Jan 10, 2020 3:26 am I know this was for opencl benchmarks, but just for kicks I ran it with a 2060 RTX Super (cudnn-fp16)


<snip>
..
Benchmark final time 10.0043s calculating 110656 nodes per second.
Thank you for ruining my day, Mike. ;)
How high is your OpenCL score?

My new Android smartphone, an LG G8s with a Qualcomm SD855 and Adreno 640 GPU, gets ~73 nps here (220 nps on the CPU with OpenBLAS).

I'm curious how Samsung/Exynos and Huawei/Kirin devices do with OpenCL.
$ ./lc0-opencl benchmark -w net/56215.pb.gz
       _
|   _ | |
|_ |_ |_| v0.23.0+git.2498564 built Dec 1 2019
Loading weights file from: net/56215.pb.gz
Creating backend [opencl]...
OpenCL, maximum batch size set to 16.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 2.0 QUALCOMM build: commit #026fa27 changeid #I3763001aef Date: 03/13/19 Wed Local Branch: Remote Branch: quic/gfx-adreno.lnx.1.0.r50
Platform profile: FULL_PROFILE
Platform name: QUALCOMM Snapdragon(TM)
Platform vendor: QUALCOMM
Device ID: 0
Device name: QUALCOMM Adreno(TM)
Device type: GPU
Device vendor: QUALCOMM
Device driver: OpenCL 2.0 QUALCOMM build: commit #026fa27 changeid #I3763001aef Date: 03/13/19 Wed Local Branch: Remote Branch: quic/gfx-adreno.lnx.1.0.r50 Compiler E031.36.05.00
Device speed: 1 MHZ
Device cores: 2 CU
Device score: 120
Selected platform: QUALCOMM Snapdragon(TM)
Selected device: QUALCOMM Adreno(TM)
with OpenCL 2.0 capability.
Loaded existing SGEMM tuning for batch size 16.
Wavefront/Warp size: 64

Max workgroup size: 1024
Max workgroup dimensions: 1024 1024 1024
Benchmark time 411ms, 2 nodes, 6 nps, move e2e4
Benchmark time 632ms, 5 nodes, 9 nps, move e2e4
Benchmark time 970ms, 9 nodes, 10 nps, move e2e4
Benchmark time 1307ms, 18 nodes, 14 nps, move e2e4
Benchmark time 1403ms, 21 nodes, 15 nps, move e2e4
Benchmark time 1787ms, 29 nodes, 17 nps, move e2e4
Benchmark time 1896ms, 33 nodes, 18 nps, move e2e4
Benchmark time 2248ms, 45 nodes, 20 nps, move e2e4
Benchmark time 2793ms, 61 nodes, 22 nps, move e2e4
Benchmark time 3417ms, 80 nodes, 24 nps, move e2e4
Benchmark time 4040ms, 109 nodes, 27 nps, move e2e4
Benchmark time 4110ms, 114 nodes, 28 nps, move e2e4
Benchmark time 4187ms, 123 nodes, 29 nps, move e2e4
Benchmark time 4669ms, 145 nodes, 31 nps, move e2e4
Benchmark time 5247ms, 176 nodes, 34 nps, move e2e4
Benchmark time 5838ms, 224 nodes, 38 nps, move e2e4
Benchmark time 6421ms, 279 nodes, 44 nps, move e2e4
Benchmark time 7202ms, 360 nodes, 50 nps, move e2e4
Benchmark time 7789ms, 388 nodes, 50 nps, move e2e4
Benchmark time 8296ms, 460 nodes, 56 nps, move e2e4
Benchmark time 8866ms, 544 nodes, 61 nps, move e2e4
Benchmark time 9987ms, 689 nodes, 69 nps, move e2e4
Benchmark time 10001ms, 708 nodes, 71 nps, move e2e4
bestmove e2e4
Benchmark final time 10.7432s calculating 72.9766 nodes per second.
$
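
For anyone who wants to compare runs without eyeballing the output, here is a minimal Python sketch that pulls the numbers out of a log like the one above. The two regular expressions are assumptions based only on the line formats visible in this post, so other lc0 versions may print something different, and `parse_benchmark_log` is just an illustrative name, not part of lc0.

```python
import re

# Assumed format of the progress lines, taken from the log in this post, e.g.
# "Benchmark time 411ms, 2 nodes, 6 nps, move e2e4"
PROGRESS_RE = re.compile(
    r"Benchmark time (\d+)ms, (\d+) nodes, (\d+) nps, move (\S+)"
)

# Assumed format of the summary line, e.g.
# "Benchmark final time 10.7432s calculating 72.9766 nodes per second."
FINAL_RE = re.compile(
    r"Benchmark final time ([\d.]+)s calculating ([\d.]+) nodes per second"
)


def parse_benchmark_log(text):
    """Return (progress, final) parsed from benchmark output text.

    progress is a list of dicts, one per progress line.
    final is a dict for the summary line, or None if it is missing.
    """
    progress = []
    final = None
    for line in text.splitlines():
        m = PROGRESS_RE.search(line)
        if m:
            progress.append({
                "time_ms": int(m.group(1)),
                "nodes": int(m.group(2)),
                "nps": int(m.group(3)),
                "move": m.group(4),
            })
            continue
        m = FINAL_RE.search(line)
        if m:
            final = {"seconds": float(m.group(1)), "nps": float(m.group(2))}
    return progress, final


if __name__ == "__main__":
    # Last lines of the Adreno 640 run posted above.
    sample = (
        "Benchmark time 9987ms, 689 nodes, 69 nps, move e2e4\n"
        "Benchmark time 10001ms, 708 nodes, 71 nps, move e2e4\n"
        "bestmove e2e4\n"
        "Benchmark final time 10.7432s calculating 72.9766 nodes per second.\n"
    )
    progress, final = parse_benchmark_log(sample)
    print(len(progress), progress[-1]["nps"], final["nps"])
```

The final nps figure is the one that goes into the table; the per-line nps values are only useful for watching the speed ramp up during the 10 seconds.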


NPS	GPU (OpenCL)		System					OS
===================================================================================
10703	Nvidia GTX 1080		Desktop					Win10
 9150	Nvidia Tesla T4		Google Colab (*)			Linux
 8754	Nvidia GTX 1070		Desktop					Win10
 4829	Nvidia Tesla K80	Google Colab (*)			Linux <-new
 3986	Nvidia GTX 1050 Ti	Laptop					Win10
 3579	Nvidia GTX 1050		Desktop, AMD FX-8350			Win10
 2493	Nvidia GTX 750 Ti	Desktop, AMD FX-8300			Win10
  705	AMD Firepro M4000	HP EliteBook 8570w, i7-3740QM		Win10
  595	Intel 6100		MacBook Air 13" 2015, i5-5250U 		macOS 13.6
  573	Intel HD 630		Laptop					Win10
  545	Nvidia GTX 460M		Asus ROG G73S, i7-2630QM		Win10
  487	Intel HD 620		HP EliteBook 850 G4			Win10
  437	Intel HD 520		Dell Latitude E5570, i5-6200U		Win10
  412	Intel HD 4400		Sony Vaio Ultrabook 13", i5-4200U	Win10
  353	Nvidia GT 650M		MacBook Pro 15" 2012, i7-3615QM		macOS 12.6
  260	Intel HD 4000		Lenovo Thinkpad T430, i7-3520M		Win10
  155	Intel HD 505		Acer Spin 1, Pentium N4200		Win10
   74	ATI Radeon HD 5430M	Arctic MediaCenter MC001, Atom D525	Win10
   73	Adreno 640      	Smartphone LG G8s, SD855        	Android 9
   11	Intel HD		Medion E1232T, Celeron N2807		Win10
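
To put the phone result in perspective, a quick Python sketch of the ratios, using only numbers quoted in this post (73 nps for the Adreno 640 with OpenCL, 220 nps for the same phone on CPU with OpenBLAS, 10703 nps for the GTX 1080 at the top of the table). The helper name `relative_speed` is made up for illustration.

```python
def relative_speed(nps, baseline_nps):
    """How many times faster `nps` is than `baseline_nps`."""
    if baseline_nps <= 0:
        raise ValueError("baseline_nps must be positive")
    return nps / baseline_nps


if __name__ == "__main__":
    adreno_opencl = 73      # Adreno 640, OpenCL backend (this post)
    sd855_openblas = 220    # same phone, CPU with OpenBLAS (this post)
    gtx_1080_opencl = 10703  # top entry in the table above

    print(f"CPU vs GPU on the phone: {relative_speed(sd855_openblas, adreno_opencl):.1f}x")
    print(f"GTX 1080 vs Adreno 640:  {relative_speed(gtx_1080_opencl, adreno_opencl):.1f}x")
```

So on this phone the CPU backend is roughly three times faster than OpenCL on the GPU, at least with this 128x10 network and the current driver.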