MikeB wrote: Fri Jan 10, 2020 3:26 am
I know this was for opencl benchmarks, but just for kicks I ran it with a 2060 RTX Super (cudnn-fp16)
<snip> .. Benchmark final time 10.0043s calculating 110656 nodes per second.

Thank you for ruining my day, Mike.
How high is your OpenCL score?
My new Android smartphone, an LG G8s with a Qualcomm SD855 and an Adreno 640 GPU, gets ~73 nps here (220 nps on the CPU with OpenBLAS).
I'm curious how Samsung/Exynos and Huawei/Kirin are doing at OpenCL.
$ ./lc0-opencl benchmark -w net/56215.pb.gz
       _
|   _ | |
|_ |_ |_| v0.23.0+git.2498564 built Dec 1 2019
Loading weights file from: net/56215.pb.gz
Creating backend [opencl]...
OpenCL, maximum batch size set to 16.
Initializing OpenCL.
Detected 1 OpenCL platforms.
Platform version: OpenCL 2.0 QUALCOMM build: commit #026fa27 changeid #I3763001aef Date: 03/13/19 Wed Local Branch: Remote Branch: quic/gfx-adreno.lnx.1.0.r50
Platform profile: FULL_PROFILE
Platform name: QUALCOMM Snapdragon(TM)
Platform vendor: QUALCOMM
Device ID: 0
Device name: QUALCOMM Adreno(TM)
Device type: GPU
Device vendor: QUALCOMM
Device driver: OpenCL 2.0 QUALCOMM build: commit #026fa27 changeid #I3763001aef Date: 03/13/19 Wed Local Branch: Remote Branch: quic/gfx-adreno.lnx.1.0.r50 Compiler E031.36.05.00
Device speed: 1 MHZ
Device cores: 2 CU
Device score: 120
Selected platform: QUALCOMM Snapdragon(TM)
Selected device: QUALCOMM Adreno(TM)
with OpenCL 2.0 capability.
Loaded existing SGEMM tuning for batch size 16.
Wavefront/Warp size: 64
Max workgroup size: 1024
Max workgroup dimensions: 1024 1024 1024
Benchmark time 411ms, 2 nodes, 6 nps, move e2e4
Benchmark time 632ms, 5 nodes, 9 nps, move e2e4
Benchmark time 970ms, 9 nodes, 10 nps, move e2e4
Benchmark time 1307ms, 18 nodes, 14 nps, move e2e4
Benchmark time 1403ms, 21 nodes, 15 nps, move e2e4
Benchmark time 1787ms, 29 nodes, 17 nps, move e2e4
Benchmark time 1896ms, 33 nodes, 18 nps, move e2e4
Benchmark time 2248ms, 45 nodes, 20 nps, move e2e4
Benchmark time 2793ms, 61 nodes, 22 nps, move e2e4
Benchmark time 3417ms, 80 nodes, 24 nps, move e2e4
Benchmark time 4040ms, 109 nodes, 27 nps, move e2e4
Benchmark time 4110ms, 114 nodes, 28 nps, move e2e4
Benchmark time 4187ms, 123 nodes, 29 nps, move e2e4
Benchmark time 4669ms, 145 nodes, 31 nps, move e2e4
Benchmark time 5247ms, 176 nodes, 34 nps, move e2e4
Benchmark time 5838ms, 224 nodes, 38 nps, move e2e4
Benchmark time 6421ms, 279 nodes, 44 nps, move e2e4
Benchmark time 7202ms, 360 nodes, 50 nps, move e2e4
Benchmark time 7789ms, 388 nodes, 50 nps, move e2e4
Benchmark time 8296ms, 460 nodes, 56 nps, move e2e4
Benchmark time 8866ms, 544 nodes, 61 nps, move e2e4
Benchmark time 9987ms, 689 nodes, 69 nps, move e2e4
Benchmark time 10001ms, 708 nodes, 71 nps, move e2e4
bestmove e2e4
Benchmark final time 10.7432s calculating 72.9766 nodes per second.
$
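
For anyone collecting these numbers from several devices, here is a minimal Python sketch that pulls the progress lines and the final score out of a benchmark log like the one above. The function name `parse_benchmark` and the regular expressions are my own assumptions based only on the output format shown in this post; other lc0 versions may print something different.

```python
import re

# Progress lines printed during the run, e.g.
# "Benchmark time 411ms, 2 nodes, 6 nps, move e2e4"
PROGRESS = re.compile(r"Benchmark time (\d+)ms, (\d+) nodes, (\d+) nps")
# Summary line, e.g.
# "Benchmark final time 10.7432s calculating 72.9766 nodes per second."
FINAL = re.compile(
    r"Benchmark final time ([\d.]+)s calculating ([\d.]+) nodes per second"
)


def parse_benchmark(log_text):
    """Return (progress, final_nps).

    progress is a list of (milliseconds, nodes, nps) tuples;
    final_nps is None when the log has no summary line.
    """
    progress = [
        tuple(int(g) for g in m.groups()) for m in PROGRESS.finditer(log_text)
    ]
    m = FINAL.search(log_text)
    final_nps = float(m.group(2)) if m else None
    return progress, final_nps


# Last lines of the log above, used as sample input.
sample = """\
Benchmark time 9987ms, 689 nodes, 69 nps, move e2e4
Benchmark time 10001ms, 708 nodes, 71 nps, move e2e4
bestmove e2e4
Benchmark final time 10.7432s calculating 72.9766 nodes per second.
"""
progress, final_nps = parse_benchmark(sample)
print(progress[-1], final_nps)
```

The final nps value is the one that goes into the results table, rounded to a whole number.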
  NPS  GPU (OpenCL)         System                               OS
=============================================================================
10703  Nvidia GTX 1080      Desktop                              Win10
 9150  Nvidia Tesla T4      Google Colab (*)                     Linux
 8754  Nvidia GTX 1070      Desktop                              Win10
 4829  Nvidia Tesla K80     Google Colab (*)                     Linux  <-new
 3986  Nvidia GTX 1050 Ti   Laptop                               Win10
 3579  Nvidia GTX 1050      Desktop, AMD FX-8350                 Win10
 2493  Nvidia GTX 750 Ti    Desktop, AMD FX-8300                 Win10
  705  AMD Firepro M4000    HP EliteBook 8570w, i7-3740QM        Win10
  595  Intel 6100           MacBook Air 13" 2015, i5-5250U       macOS 13.6
  573  Intel HD 630         Laptop                               Win10
  545  Nvidia GTX 460M      Asus ROG G73S, i7-2630QM             Win10
  487  Intel HD 620         HP EliteBook 850 G4                  Win10
  437  Intel HD 520         Dell Latitude E5570, i5-6200U        Win10
  412  Intel HD 4400        Sony Vaio Ultrabook 13", i5-4200U    Win10
  353  Nvidia GT 650M       MacBook Pro 15" 2012, i7-3615QM      macOS 12.6
  260  Intel HD 4000        Lenovo Thinkpad T430, i7-3520M       Win10
  155  Intel HD 505         Acer Spin 1, Pentium N4200           Win10
   74  ATI Radeon HD 5430M  Arctic MediaCenter MC001, Atom D525  Win10
   73  Adreno 640           Smartphone LG G8s, SD855             Android 9
   11  Intel HD             Medion E1232T, Celeron N2807         Win10
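
To put the phone result in perspective, a small Python sketch that expresses a few entries of the table above relative to a chosen baseline. The rows are copied from the table; the helper name `relative_speed` is just an illustrative choice, and this is a rough comparison, not a rigorous one.

```python
# A few (nps, gpu) rows copied from the table above.
rows = [
    (10703, "Nvidia GTX 1080"),
    (3986, "Nvidia GTX 1050 Ti"),
    (573, "Intel HD 630"),
    (73, "Adreno 640"),
    (11, "Intel HD"),
]


def relative_speed(rows, baseline_gpu):
    """Map each GPU name to its nps divided by the baseline GPU's nps."""
    baseline = next(nps for nps, gpu in rows if gpu == baseline_gpu)
    return {gpu: nps / baseline for nps, gpu in rows}


ratios = relative_speed(rows, "Adreno 640")
for gpu, factor in ratios.items():
    print(f"{gpu:<20} {factor:7.1f}x")
```

With the Adreno 640 as the baseline, the GTX 1080 comes out at roughly 147 times the phone's OpenCL score.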