To anyone who is interested in working with this, I recommend making a GPU application which can calculate the perft of a chess position. No eval or alpha-beta needed.
If you can't do that with good performance, it's probably time to forget making a chess engine for a GPU.
GPUs better for chess than CPUs?
Moderators: hgm, Rebel, chrisw
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: GPUs better for chess than CPUs?
I will definately try and take my dose of failure . But not with alpha-beta though. I think that it is difficult to finish the job on gpu with one kernel call and return a move. Hash tables, pawn tt, EGTBs etc can't be used unless you consult CPU... Also all non depth-first searches consume a lot of memory to be feasible on GPUs. Only feasible option I see is plain MC at the root only. Or to improve upon that I could build a shallow depth tree (say d=4) at startup and start simulations from the leaves... This is more or less what is done in APHID algorithm used for cluster computing, and it can give it a good tactical awareness...
I will try it on a simple game like Hex or Havannah (or some other game that I create for ease of implementation). Move generation would be easier for 'place a stone' games such as Go. But Go has captures so Hex is better. Also i am not sure if a good PRNG is available for GPUs yet. But anyway this is too much talk before typing one line of code.
I will try it on a simple game like Hex or Havannah (or some other game that I create for ease of implementation). Move generation would be easier for 'place a stone' games such as Go. But Go has captures so Hex is better. Also i am not sure if a good PRNG is available for GPUs yet. But anyway this is too much talk before typing one line of code.
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: GPUs better for chess than CPUs?
Hello Srdja
Sad to hear you stopped working on it. What was your major difficulty ? Can you elaborate on your experience a bit ? I read that you have used it for move generation but I do not quite get what you have tried regarding search..
I don't think recursion is a problem because alpha-beta can be done iteratively. I do not think YBW or similar schemes are feasible since you have to sync() somewhere, even if we assume there was a "sub threading" mechanism in gpus.
regards,
Sad to hear you stopped working on it. What was your major difficulty ? Can you elaborate on your experience a bit ? I read that you have used it for move generation but I do not quite get what you have tried regarding search..
I don't think recursion is a problem because alpha-beta can be done iteratively. I do not think YBW or similar schemes are feasible since you have to sync() somewhere, even if we assume there was a "sub threading" mechanism in gpus.
regards,
-
- Posts: 12541
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: GPUs better for chess than CPUs?
Since the GPUs have memory, perhaps each GPU can have its own small local hash table. That should be fine if they only work on their own subset of the problem.Daniel Shawul wrote:I will definately try and take my dose of failure . But not with alpha-beta though. I think that it is difficult to finish the job on gpu with one kernel call and return a move. Hash tables, pawn tt, EGTBs etc can't be used unless you consult CPU... Also all non depth-first searches consume a lot of memory to be feasible on GPUs. Only feasible option I see is plain MC at the root only. Or to improve upon that I could build a shallow depth tree (say d=4) at startup and start simulations from the leaves... This is more or less what is done in APHID algorithm used for cluster computing, and it can give it a good tactical awareness...
I will try it on a simple game like Hex or Havannah (or some other game that I create for ease of implementation). Move generation would be easier for 'place a stone' games such as Go. But Go has captures so Hex is better. Also i am not sure if a good PRNG is available for GPUs yet. But anyway this is too much talk before typing one line of code.
-
- Posts: 2658
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: GPUs better for chess than CPUs?
Hello Srdja
Sad to hear you stopped working on it. What was your major difficulty ? Can you elaborate on your experience a bit ? I read that you have used it for move generation but I do not quite get what you have tried regarding search..
I don't think recursion is a problem because alpha-beta can be done iteratively. I do not think YBW or similar schemes are feasible since you have to sync() somewhere, even if we assume there was a "sub threading" mechanism in gpus.
1) Memory
GPUs have a lot of RAM but only a few registers for use. The more threads you run the less registers you have, but you have to run a lot of threads to hide GPU internal latencies.
I calculate with 32 to 512 bytes by thread. The moves and the state of the non-recursive alpha-beta has to be stored in slow global memory(RAM).
2) 32 bit / 64 bit
My last idea was to use a Quad-Bitboard presentation (32 Bytes) and a modified Kogge-Stone Move Generator for all Pieces.
Means less memory and a SIMD friendly Movegenerator, but i will loose some computation power of the GPU which is 32 bit native.
3) Search algorithm
Thought on YBW, but have no idea how to implement the communication between Threads.
Maybe somekind of a clever stack-meachnism could do the job instead of a complex-work-delivery process.
--
Srdja
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: GPUs better for chess than CPUs?
Yes possible even for medium size egbbs I think. But I was under the impression that global memory latency is very high on the gpu especially with strided access and no caches. This approach also exculdes the CPU from getting involved in the search. I think that even the regular move list and score needs to be stored on RAM so a contigious block of moves should be fetched at once for execution by a warp. My plan is for a game with simplistic board representation that can avoid this by storing the board in shared mem, and moves can be generated as needed on the fly(no storage). Then each thread executes a random move on its board (say 32 times) and evaluate the result (which can introduce some branching).
-
- Posts: 4185
- Joined: Tue Mar 14, 2006 11:34 am
- Location: Ethiopia
Re: GPUs better for chess than CPUs?
I don't think that is possible. You can only synchronize threads in the same block. HereThought on YBW, but have no idea how to implement the communication between Threads.
Maybe somekind of a clever stack-meachnism could do the job instead of a complex-work-delivery process.
Classical parallel programming techniques can't be used here. Threads can't spawn more threads; threads on one multiprocessor can't send results to threads on another multiprocessor; there's no facility for a critical section among all the threads across the whole system. Trying to use a Pthreads or OpenMP programming model will lead to pain, frustration, and failure.
-
- Posts: 2658
- Joined: Wed Mar 10, 2010 10:18 pm
- Location: Hamburg, Germany
- Full name: Srdja Matovic
Re: GPUs better for chess than CPUs?
to pain, frustration, and failure.
As an alternative to YBW i thought on using a depth*128x128 board array.
If you run 128 threads and assume 128 max possible moves you get an array of
boards[max_searchdepth][128 Threads][128 possible boards]
With such an array you could move up and down in blocks of 128 threads through the search tree, it should behave like a non-recursive alphabeta, but with 128 boards in parallel.
When you got 128 Threads running you could extent the search with x * 128 Threads.
--
Srdja
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: GPUs better for chess than CPUs?
Alpha/beta is, _by definition_, serial. You have to establish the bounds by searching the first move at a node _before_ you can search the others using that bound to reduce required effort...parrish wrote:A search tree is, by it's nature, parallel, and not serial.hgm wrote:Because Chess algorithms are by nature serial rather than parallel.
-
- Posts: 44
- Joined: Wed Apr 13, 2011 12:43 pm
Re: GPUs better for chess than CPUs?
Well I think it's a mix, it's not 100% serial and it's not 100% parallel.bob wrote:Alpha/beta is, _by definition_, serial. You have to establish the bounds by searching the first move at a node _before_ you can search the others using that bound to reduce required effort...parrish wrote:A search tree is, by it's nature, parallel, and not serial.hgm wrote:Because Chess algorithms are by nature serial rather than parallel.
On a high level it is serial, but significant parts of the tree can be searched in parallel (depending on the node type), without doing more work than a serial search.