Pigeon is now running on the GPU

StuartRiffle
Posts: 25
Joined: Tue Apr 05, 2016 7:34 pm
Location: Canada

Post by StuartRiffle » Wed Nov 02, 2016 5:38 am

Parallel search is working under CUDA in the dev branch! I am still fixing bugs on the CPU side. I just wanted to share because it's a big milestone. :)

Benchmarks later this week!

Code:

     /O_    Pigeon 1.6.0 (UCI)
     ||     SSE4/POPCNT/CUDA
    / \\
  =/__//    pigeonengine.com
     ^^

uci
id name Pigeon 1.6.0
id author Stuart Riffle
option name Clear Hash type button
option name Hash type spin min 4 max 8192 default 512
option name OwnBook type check default true
option name Threads type spin default 1 min 1 max 24
option name Early Move type check default true
option name SIMD type check default true
option name POPCNT type check default true
option name CUDA type check default true
option name GPU Hash type spin min 4 max 8192 default 512
option name GPU Batch Size type spin min 32 max 8192 default 1024
option name GPU Batch Count type spin min 4 max 1024 default 32
option name GPU Plies type spin min 0 max 8 default 2
uciok
isready
info string CUDA 0: GeForce GTX 660 (CC 3.0, 960 cores, 1084 mHz, 2048 MB)
readyok
-Stuart
(Pigeon)

smatovic
Posts: 479
Joined: Wed Mar 10, 2010 9:18 pm
Location: Germany

Re: Pigeon is now running on the GPU

Post by smatovic » Wed Nov 02, 2016 10:27 am

Kudos.

May I ask why you chose CUDA over OpenCL?

...if you are in need of a sparring partner:

http://zeta-chess.app26.de/page-1.html#Zeta-098e

--
Srdja

cdani
Posts: 2047
Joined: Sat Jan 18, 2014 9:24 am
Location: Andorra

Re: Pigeon is now running on the GPU

Post by cdani » Wed Nov 02, 2016 2:16 pm

Nice!!! We look forward to the details of this achievement :-)

StuartRiffle
Posts: 25
Joined: Tue Apr 05, 2016 7:34 pm
Location: Canada

Re: Pigeon is now running on the GPU

Post by StuartRiffle » Wed Nov 02, 2016 2:54 pm

Thanks Srdja,

I'm using CUDA because Pigeon uses heavily templated C++, and the same code is compiled for the scalar, SIMD, and GPU paths. As far as I can tell, using templates would require vendor-specific extensions until OpenCL 2.1 is ready.
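
A minimal sketch of what "the same templated code for several paths" can look like. This is not Pigeon's actual code: `Lanes4`, `whitePawnAttacks`, and the a1 = bit 0 square mapping are all assumptions made for illustration. Under CUDA, a routine like this could additionally be marked `__host__ __device__` so that nvcc compiles it for the GPU as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-lane type standing in for a SIMD register of bitboards.
struct Lanes4 {
    std::uint64_t v[4];
    Lanes4() : v{0, 0, 0, 0} {}
    explicit Lanes4(std::uint64_t x) : v{x, x, x, x} {}  // broadcast one value to all lanes
};

inline Lanes4 operator&(const Lanes4& a, const Lanes4& b) {
    Lanes4 r;
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] & b.v[i];
    return r;
}

inline Lanes4 operator|(const Lanes4& a, const Lanes4& b) {
    Lanes4 r;
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] | b.v[i];
    return r;
}

inline Lanes4 operator<<(const Lanes4& a, int n) {
    assert(n >= 0 && n < 64);
    Lanes4 r;
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] << n;
    return r;
}

// One routine, written once, instantiated for a scalar bitboard
// (std::uint64_t) or for several positions at a time (Lanes4).
// Assumes square a1 is bit 0 and h8 is bit 63.
template <typename T>
T whitePawnAttacks(const T& pawns) {
    const T notFileA(0xFEFEFEFEFEFEFEFEULL);
    const T notFileH(0x7F7F7F7F7F7F7F7FULL);
    return ((pawns & notFileA) << 7) | ((pawns & notFileH) << 9);
}
```

Calling `whitePawnAttacks<std::uint64_t>(board)` gives the scalar path, and passing a `Lanes4` handles four positions per call with the same source.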

I expect to support OpenCL eventually, but for now CUDA just works. :)

I will definitely use Zeta for testing!
-Stuart
(Pigeon)

matthewlai
Posts: 736
Joined: Sun Aug 03, 2014 2:48 am
Location: London, UK

Re: Pigeon is now running on the GPU

Post by matthewlai » Wed Nov 02, 2016 5:18 pm

StuartRiffle wrote: Parallel search is working under CUDA in the dev branch! I am still fixing bugs on the CPU side. I just wanted to share because it's a big milestone. :)

Benchmarks later this week!

Do you actually get a speedup from CUDA?

I would imagine all the branching in minimax will make a naive implementation very slow on CUDA.
Author of Giraffe, an engine based on deep reinforcement learning. https://bitbucket.org/waterreaction/giraffe/overview

brianr
Posts: 248
Joined: Thu Mar 09, 2006 2:01 pm

Re: Pigeon is now running on the GPU

Post by brianr » Thu Nov 03, 2016 6:41 pm

Thank you for sharing.
I got it to compile on my development system and was wondering what values you would suggest for the GPU options with the following graphics card:

Code:

info string CUDA 0: GeForce GTX 770 (CC 3.0, 1536 cores, 1163 mHz, 2048 MB)

StuartRiffle
Posts: 25
Joined: Tue Apr 05, 2016 7:34 pm
Location: Canada

Re: Pigeon is now running on the GPU

Post by StuartRiffle » Thu Nov 03, 2016 8:20 pm

Egads!

It's cool that you got a build, but the dev branch will probably give you crazy output because I'm still working on the CPU-side code to gather the CUDA results as they return and fold them into the search.

(But if you do a pull you will get some more bugfixes, FWIW. The CUDA part does appear to be doing batches of searches correctly).

I expect to have the CPU part working this afternoon. I'll let you know when those changes are checked in.
-Stuart
(Pigeon)

StuartRiffle
Posts: 25
Joined: Tue Apr 05, 2016 7:34 pm
Location: Canada

Re: Pigeon is now running on the GPU

Post by StuartRiffle » Fri Nov 04, 2016 4:58 pm

matthewlai wrote: Do you actually get a speedup from CUDA?

I would imagine all the branching in minimax will make a naive implementation very slow on CUDA.

Oh, I'm nowhere near getting a speedup yet. I am doing horrible things to this poor GPU. At this stage I'm still pleasantly surprised it ticks over at all.

Current problems include:
- Warp occupancy under 5% (!!)
- Spilling registers left and right
- Heavy memory traffic in general (though L1 hit rate ~70%)
- 64-bit integer operations on current hardware are emulated with 32-bit registers (!!). This also blows up the code size, which is unfortunate, because the code is pretty big to start with.
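
To make the last point concrete, here is a rough sketch of what a 64-bit add and a 64-bit shift turn into when only 32-bit registers are available. The names `U64Pair`, `add64`, and `shl64` are made up for this example, and the real instruction sequence the compiler emits will differ; the point is only that each bitboard operation becomes several 32-bit ones.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair of 32-bit "registers" holding one 64-bit value.
struct U64Pair {
    std::uint32_t lo;
    std::uint32_t hi;
};

inline U64Pair split(std::uint64_t x) {
    return { static_cast<std::uint32_t>(x), static_cast<std::uint32_t>(x >> 32) };
}

inline std::uint64_t join(U64Pair p) {
    return (static_cast<std::uint64_t>(p.hi) << 32) | p.lo;
}

// 64-bit add: two 32-bit adds plus carry propagation.
inline U64Pair add64(U64Pair a, U64Pair b) {
    U64Pair r;
    r.lo = a.lo + b.lo;
    const std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;  // unsigned wrap means a carry out
    r.hi = a.hi + b.hi + carry;
    return r;
}

// 64-bit left shift: bits that leave the low half must be moved into the high half.
inline U64Pair shl64(U64Pair a, unsigned n) {
    assert(n < 64);
    if (n == 0) return a;
    U64Pair r;
    if (n >= 32) {
        r.hi = a.lo << (n - 32);
        r.lo = 0;
    } else {
        r.hi = (a.hi << n) | (a.lo >> (32 - n));
        r.lo = a.lo << n;
    }
    return r;
}
```

A single `x << n` on a bitboard becomes a branch or select plus up to three shifts and an OR, which is one way the code size grows.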

The branch divergence could honestly be worse. I converted the negamax to an iterative implementation, and set things up so that after a thread finishes a search, it can start up another one and fall back into line with the rest of the warp.
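
For anyone unfamiliar with the recursive-to-iterative conversion, here is a minimal sketch on a toy tree. It is not the Pigeon code: `Node`, `Frame`, and `negamaxIterative` are illustrative names, there is no pruning, and a GPU version would presumably use a fixed-size array rather than `std::vector` for the stack.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical toy game tree. A node without children is a leaf, and its
// score is from the point of view of the side to move at that node.
struct Node {
    int score;
    std::vector<int> children;
};

constexpr int kInf = 1000000;

// Reference version: plain recursive negamax.
int negamaxRecursive(const std::vector<Node>& tree, int n) {
    const Node& node = tree[n];
    if (node.children.empty()) return node.score;
    int best = -kInf;
    for (int c : node.children) best = std::max(best, -negamaxRecursive(tree, c));
    return best;
}

// One explicit stack frame replaces one level of recursion.
struct Frame {
    int node;          // index into the tree
    std::size_t next;  // next child to visit
    int best;          // best score found so far at this node
};

int negamaxIterative(const std::vector<Node>& tree, int root) {
    std::vector<Frame> stack;
    stack.push_back({root, 0, -kInf});
    int result = 0;
    while (!stack.empty()) {
        Frame& f = stack.back();
        const Node& node = tree[f.node];
        int value;
        if (node.children.empty()) {
            value = node.score;                  // leaf: evaluate
        } else if (f.next < node.children.size()) {
            const int child = node.children[f.next++];
            stack.push_back({child, 0, -kInf});  // "call": descend into the child
            continue;                            // f may dangle now, so restart the loop
        } else {
            value = f.best;                      // all children searched
        }
        stack.pop_back();                        // "return": hand the value to the parent
        if (stack.empty()) {
            result = value;
        } else {
            Frame& parent = stack.back();
            parent.best = std::max(parent.best, -value);
        }
    }
    return result;
}
```

Because the whole search is one flat loop, the point where the stack runs empty is a natural place for a thread to pick up its next root position instead of exiting.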

There is still a lot of room for improvement though. :/
-Stuart
(Pigeon)

StuartRiffle
Posts: 25
Joined: Tue Apr 05, 2016 7:34 pm
Location: Canada

Re: Pigeon is now running on the GPU

Post by StuartRiffle » Sun Nov 06, 2016 2:51 am

Just a quick update.

The highest throughput I've been able to achieve with the current code on a GTX 660 is about 1 million nodes per second. That is slower than a CPU, so... not compelling yet.

On the bright side, the code is utilizing the GPU very poorly, so there's a lot of room for improvement. :) I'm working on it.
-Stuart
(Pigeon)

ankan
Posts: 52
Joined: Sun Apr 21, 2013 1:29 pm
Location: Pune, India

Re: Pigeon is now running on the GPU

Post by ankan » Sun Nov 06, 2016 5:10 am

What search algorithm are you using?
You mentioned negamax. Is that with or without alpha-beta pruning?
