Using the GPU

zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Using the GPU

Post by zullil »

I hope this question is not a silly one.

Does anyone see potential in using the GPU in addition to the CPU cores to strengthen existing chess engines?

http://www.nvidia.com/object/cuda_opencl_new.html

Apparently OpenCL is already available in Apple's current OS.

http://www.apple.com/macosx/technology/
John Major
Posts: 27
Joined: Fri Dec 11, 2009 10:23 pm

Re: Using the GPU

Post by John Major »

zullil wrote:I hope this question is not a silly one.

Does anyone see potential in using the GPU in addition to the CPU cores to strengthen existing chess engines?

http://www.nvidia.com/object/cuda_opencl_new.html

Apparently OpenCL is already available in Apple's current OS.

http://www.apple.com/macosx/technology/
The problem is that each thread must execute exactly the same instructions, or you get a stall-like situation.

Not to mention memory access problems.

By nature it's suited to matrix-like calculations, not so much branch-intensive code like chess.
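
A rough sketch of the lockstep cost described above, written as a plain Python simulation rather than GPU code. It assumes a group of 32 threads (the warp size on NVIDIA hardware) and made-up cycle counts for the two sides of a branch; `warp_cycles` is a hypothetical helper, not part of any GPU API. When the threads disagree on a branch, the group runs both paths one after the other with the inactive threads masked off.

```python
def warp_cycles(takes_branch, cost_if_taken, cost_if_not_taken):
    """Cycles spent by one group of lockstep threads on a single if/else.

    takes_branch: one boolean per thread in the group.
    The cycle costs are illustrative assumptions, not measurements.
    """
    cycles = 0
    if any(takes_branch):        # at least one thread needs the 'if' path
        cycles += cost_if_taken
    if not all(takes_branch):    # at least one thread needs the 'else' path
        cycles += cost_if_not_taken
    return cycles


uniform = [True] * 32                          # every thread agrees
divergent = [i % 2 == 0 for i in range(32)]    # threads split half and half

print(warp_cycles(uniform, 100, 80))    # 100: only one path is executed
print(warp_cycles(divergent, 100, 80))  # 180: both paths are executed serially
```

A chess search branches differently in almost every position, so under this model most groups would pay for both sides of most branches.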
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Using the GPU

Post by zullil »

John Major wrote:
Problem is that each thread core must execute exactly the same commands or you get a stall like situation.

Not to mention memory access problems.

By nature it's suited to matrix like calculations, not so much branch intensive code like chess.
Thanks. So the vector processing power of the GPU simply isn't of much value for things like move generation, search and evaluation.
humble_programmer

Re: Using the GPU

Post by humble_programmer »

You are not the first person to wonder about this: http://blog.cudachess.org/.
John Major
Posts: 27
Joined: Fri Dec 11, 2009 10:23 pm

Re: Using the GPU

Post by John Major »

zullil wrote:
Thanks. So the vector processing power of the GPU simply isn't of much value for things like move generation, search and evaluation.
No, I don't think so. I have an ION board which is CUDA capable so I also thought about writing a chess engine for it. But after delving into it I concluded it wasn't worth it.
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: Using the GPU

Post by jwes »

zullil wrote:
Thanks. So the vector processing power of the GPU simply isn't of much value for things like move generation, search and evaluation.
The problem is that the architecture is not suitable for search, and the latency between the CPU and GPU adds too much overhead to use it per node for evaluation and move generation. I hope that new generations of processors with the GPU and CPU on the same chip will solve the latency problem.
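
To put the per-node overhead argument in numbers, here is a back-of-the-envelope model. All timings are hypothetical placeholders (1 µs to evaluate a position on the CPU, 0.1 µs amortized on the GPU, 20 µs for a CPU-to-GPU round trip); real figures depend on the hardware and driver. The function names are made up for this sketch.

```python
import math


def offload_speedup(cpu_eval_us, gpu_eval_us, round_trip_us):
    """Ratio of CPU time to GPU time for ONE node; below 1.0 means a slowdown."""
    return cpu_eval_us / (gpu_eval_us + round_trip_us)


def break_even_batch(cpu_eval_us, gpu_eval_us, round_trip_us):
    """Smallest number of positions per round trip at which offloading wins.

    Assumes one round trip per batch and a fixed per-position GPU cost.
    Returns None if the GPU is not faster per position at all.
    """
    saving_per_position = cpu_eval_us - gpu_eval_us
    if saving_per_position <= 0:
        return None
    return math.floor(round_trip_us / saving_per_position) + 1


# Hypothetical numbers, for illustration only.
print(offload_speedup(1.0, 0.1, 20.0) < 1.0)   # True: per-node offload is slower
print(break_even_batch(1.0, 0.1, 20.0))        # 23 positions per transfer
```

Under these assumed numbers, a single-node call to the GPU is roughly twenty times slower than staying on the CPU, and the transfer only pays off once a couple of dozen positions are sent together, which an ordinary alpha-beta search does not naturally produce.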
vladstamate
Posts: 161
Joined: Thu Jan 08, 2009 9:06 pm
Location: San Francisco, USA

Re: Using the GPU

Post by vladstamate »

jwes wrote: latency between the cpu and gpu add too much overhead to use it per node for evaluation and move generation.
That is true. "Branchy" code is definitely a no-no on GPGPU architectures. The hardware is really designed to process pixels, many of them in parallel, and it runs your threads the same way: all the threads have to execute in lockstep, so you lose a lot of cycles when one thread branches one way and another thread branches the other way.

I ran into the same kind of problem when I tried to port some of Plisk to the SPEs inside the Cell processor. The SPEs are extremely fast (with very fast memory too), but the limited memory size (256K), which has to be shared by both data and code, meant I could not fit my attack arrays, hash tables, and so on to perform a full search. On top of that, you could not really use them just for evaluation or move generation, since the overhead of calling them is far larger than the execution time of the code.
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Using the GPU

Post by hgm »

256K? That is huge! My first chess program ran on a machine with only 2KB of memory (program + data!). With 256K you can even afford a useful hash table. (E.g. 24K entries. That would be enough to store something like a 4-ply perft tree, so with alpha-beta and a good move ordering it could hold a tree of about 7 ply.)
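
The 4-ply versus 7-ply estimate can be sanity-checked with the standard minimal-tree formula for alpha-beta with perfect move ordering: a uniform tree with branching factor b and depth d has b^ceil(d/2) + b^floor(d/2) - 1 leaves. The sketch below assumes a uniform branching factor chosen so that a full 4-ply tree has about 24K leaves; that is a simplification for illustration, not a model of real chess positions.

```python
import math


def minimal_alpha_beta_leaves(b, depth):
    """Leaves of the minimal alpha-beta tree (perfect ordering, uniform b)."""
    return b ** math.ceil(depth / 2) + b ** math.floor(depth / 2) - 1


ENTRIES = 24 * 1024          # table size from the post
b = ENTRIES ** 0.25          # branching factor where b**4 equals ENTRIES

print(round(b, 1))           # 12.5
for depth in (6, 7, 8):
    # Leaf count of the minimal tree relative to the table size.
    ratio = minimal_alpha_beta_leaves(b, depth) / ENTRIES
    print(depth, round(ratio, 2))
```

At depth 7 the minimal tree has b^4 + b^3 - 1 leaves, only slightly more than the b^4 of the full 4-ply tree, while depth 8 needs about twice as many. That is consistent with "about 7 ply" under these assumptions.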
vb4
Posts: 165
Joined: Sat Mar 11, 2006 5:45 am
Location: NY

Re: Using the GPU

Post by vb4 »

This was posted not too long ago. I actually have a company that would love to get involved in generating the entire 7-piece set of EGTBs on some big-time GPU hardware, but the few people who commented didn't indicate this was possible without some serious thought. Do any of you have any thoughts about this?

Les

Les Fernandez has compute resources
by dann corbit » Fri Nov 06, 2009 1:57 pm

I don't know if it is of any value, but Les Fernandez has a vendor who will lend time on a big array of GPU cards for computation of EGTBs.

I have no idea how difficult it would be to port EGTB systems to GPU cards (for instance, they can't do recursion if I recall correctly) but it might be worth taking a look at for you EGTB programming experts.

See the CCC programming forum post by Les Fernandez:
http://www.talkchess.com/forum/viewtopi ... 00&t=30404

dann corbit
Posts: 10
Joined: Fri Jan 25, 2008 6:10 pm
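
On the recursion concern: tablebase generation is usually described as a table-filling process that works outward from known terminal positions, which can be written with plain loops. The toy below is not chess and not a real EGTB generator; it solves a small subtraction game (take 1, 2 or 3 tokens, whoever takes the last token wins) to show the loop-only shape of such a computation. The function name and the game itself are assumptions chosen for illustration.

```python
def solve_subtraction_game(max_pile, moves=(1, 2, 3)):
    """Return a list where result[n] is True if the side to move wins with n tokens.

    The table is filled from the terminal position upward using loops only;
    no recursion is needed.
    """
    result = [False] * (max_pile + 1)   # result[0]: no tokens left, side to move has lost
    for n in range(1, max_pile + 1):
        # A position is won if some move leads to a position lost for the opponent.
        result[n] = any(not result[n - m] for m in moves if m <= n)
    return result


table = solve_subtraction_game(12)
print([n for n, won in enumerate(table) if not won])  # [0, 4, 8, 12]
```

Whether the memory access pattern of a real 7-piece generator maps well onto GPU hardware is a separate question that this toy does not address.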
John Major
Posts: 27
Joined: Fri Dec 11, 2009 10:23 pm

Re: Using the GPU

Post by John Major »

jwes wrote: The problem is that the architecture is not suitable for search and latency between the cpu and gpu add too much overhead to use it per node for evaluation and move generation. I hope that new generations of processors with the gpu and cpu on the same chip will solve the latency problem.
Also, the intra-GPU latency per thread is very high. Hundreds of threads must be kept busy to hide it.
So maybe a new type of chess engine will be feasible, with an extremely large evaluation.
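
The "hundreds of threads" figure follows from a simple occupancy model, sketched here with assumed numbers. If each thread does W cycles of work and then waits L cycles on memory, the other threads must supply at least L cycles of work in the meantime, so N - 1 >= L / W. The 400-cycle latency and 4 cycles of work per access are placeholders, not measured values, and `threads_to_hide_latency` is a hypothetical helper.

```python
import math


def threads_to_hide_latency(latency_cycles, work_cycles_per_access):
    """Minimum threads per core so that some thread always has work to do.

    Model: each thread alternates work_cycles_per_access cycles of compute
    with a memory wait of latency_cycles; threads are switched at no cost.
    """
    return math.ceil(1 + latency_cycles / work_cycles_per_access)


print(threads_to_hide_latency(400, 4))    # 101 threads when there is little work per access
print(threads_to_hide_latency(400, 400))  # 2 threads when each access is followed by lots of work
```

In this model a heavier evaluation per memory access lowers the number of threads needed, which is in line with the idea of an engine built around an extremely large evaluation.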