Vincent Diepeveen
Joined: 09 Mar 2006 Posts: 1738 Location: The Netherlands
Post subject: Re: uct for chess - MCS, YBW and 32 bit move gen
Posted: Wed Mar 21, 2012 4:39 pm
Srdja,
Diep has implemented YBW in a fully SMP way since 2002: there is no master and there are no slaves. It can split anywhere, and in fact it can split at several spots in the tree at the very same moment.
You can of course prove that with master-slave relationships inside the search it wouldn't run very well on a 512-processor supercomputer.
In a well-implemented YBW search, the search runs on just 1 core for only a very short time before you divide it over the rest; after that the same thing happens in parallel everywhere. If you do *that* as master-slave too, as Crafty/SF and most YBW implementations do, it won't work well on a supercomputer of course.
Crafty avoids some problems on more than 4 cores by splitting very little. That is bad for speedup, however. In Diep I split quickly and happily keep doing that.
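To make the "search on 1 core first, then divide" idea concrete, here is a minimal C++ sketch of the Young Brothers Wait concept on a toy tree. It is not Diep's code: `Node`, `leaf`, `inner`, `ybw` and `splitDepth` are names made up for illustration, and `std::async` stands in for a real thread pool. Every split point splits again recursively, so several splits can be active in the tree at once, but the parent simply waits for its children (plain fork-join) and never aborts younger brothers after a cutoff, which a real implementation would have to do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <limits>
#include <vector>

// Hypothetical toy game tree: a leaf holds a score from the point of view of
// the side to move at that leaf; an inner node only holds children.
struct Node {
    int score;
    std::vector<Node> children;
};

inline Node leaf(int s) { return Node{s, {}}; }
inline Node inner(std::vector<Node> kids) { return Node{0, std::move(kids)}; }

constexpr int INF = std::numeric_limits<int>::max() / 2;

// Plain sequential fail-soft negamax alpha-beta; used below the split depth.
int alphabeta(const Node& n, int alpha, int beta) {
    if (n.children.empty()) return n.score;
    int best = -INF;
    for (const Node& c : n.children) {
        best = std::max(best, -alphabeta(c, -beta, -alpha));
        alpha = std::max(alpha, best);
        if (alpha >= beta) break;  // beta cutoff
    }
    return best;
}

// YBW concept: the eldest brother is searched alone to establish a bound,
// only then are the younger brothers searched in parallel with that bound.
int ybw(const Node& n, int alpha, int beta, int splitDepth) {
    assert(alpha < beta);
    if (n.children.empty()) return n.score;
    if (splitDepth == 0) return alphabeta(n, alpha, beta);

    // 1. Eldest brother first, sequentially from this node's point of view.
    int best = -ybw(n.children[0], -beta, -alpha, splitDepth - 1);
    alpha = std::max(alpha, best);
    if (alpha >= beta) return best;  // cutoff: the younger brothers never start

    // 2. Younger brothers in parallel, all using the eldest brother's bound.
    std::vector<std::future<int>> jobs;
    for (std::size_t i = 1; i < n.children.size(); ++i) {
        const Node* c = &n.children[i];
        jobs.push_back(std::async(std::launch::async,
            [c, alpha, beta, splitDepth] {
                return -ybw(*c, -beta, -alpha, splitDepth - 1);
            }));
    }
    for (auto& j : jobs) best = std::max(best, j.get());
    return best;
}
```

Calling `ybw(root, -INF, INF, 2)` on a small tree should return the same value as `alphabeta(root, -INF, INF)`; the younger brothers do not share improved bounds with each other here, so the sketch searches more nodes than a tuned implementation would.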
You can also mathematically prove that the concept of YBW, not necessarily the full implementation but at least the starting concept, is a way to preserve the branching factor *a little* (not fully). So far that is not easy to prove for any other algorithm, and no other algorithm exists so far that does exactly that.
What we do know, however, taking my cluster as an example, is that aborting CPUs there takes microseconds as a minimum. Switching inside a GPU goes way faster than that. In theory you can abort it in nanoseconds.
In practice of course you need to do that very little on CPUs, so effectively it's milliseconds on each CPU, yet it can be microseconds on GPUs.
On a GPU you just accept the overhead of cores that searched without having a job, and every X nodes you redivide the search tree within 1 SIMD.
Another major advantage of GPUs is that they nonstop alternate between different threads every few cycles, so you can do a few things on GPUs that you can't do on CPUs, which avoids some of the damage done by running that many threads.
CPUs really suffer everywhere from the runqueue latency, which fires every 10 milliseconds. In reality it's slower than that of course.
GPUs do not have this problem at all. Within 1 SIMD you in fact have a completely deterministic search. That's a HUGE advantage when debugging for performance and bugs.
On a GPU you shouldn't ask yourself: "Can Monte Carlo with UCT play chess a tad?", as it will be 800 elo worse of course (I picked an arbitrary elo difference; it could be 1000, it could be 500, in short it's too much). The question is simply: "How do I get YBW to work fast without too much overhead?"
Same thing for Go of course: UCT is a joke there as well. The UCT-type engines suddenly won just because people didn't know how to search there and some of the 'top engines' at the time were forward pruning even in the root.
UCT is overrated in that sense. UCT is the most trivial form of selective search that isn't brute force, yet it requires huge overhead.
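For readers who have not seen it, the selection step of UCT is usually the UCB1 rule: pick the child with the highest mean reward plus an exploration bonus. The C++ sketch below shows that rule only. It is not taken from any engine in this thread; `ChildStats` and `selectUcb1` are illustrative names, and taking the parent's visit count as the sum of the children's visits is a simplifying assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical per-child statistics kept at every node of the UCT tree.
struct ChildStats {
    double totalReward;  // sum of playout results seen through this child
    int visits;          // number of playouts that went through this child
};

// UCB1 selection: mean reward plus c * sqrt(ln(N) / n), where N is the
// parent's visit count (assumed here to be the sum over all children) and
// n is the child's own visit count. Unvisited children are tried first.
std::size_t selectUcb1(const std::vector<ChildStats>& kids,
                       double c = std::sqrt(2.0)) {
    assert(!kids.empty());
    long parentVisits = 0;
    for (const ChildStats& k : kids) parentVisits += k.visits;

    std::size_t best = 0;
    double bestValue = -std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < kids.size(); ++i) {
        if (kids[i].visits == 0) return i;  // always expand unvisited children
        double mean = kids[i].totalReward / kids[i].visits;
        double bonus = c * std::sqrt(std::log(static_cast<double>(parentVisits))
                                     / kids[i].visits);
        if (mean + bonus > bestValue) {
            bestValue = mean + bonus;
            best = i;
        }
    }
    return best;
}
```

Every playout has to walk the tree with this rule and then update `totalReward` and `visits` on the way back, which is one place where the bookkeeping overhead mentioned above comes from.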
On GPUs, going in advance for something totally inferior is not a good idea.
My advice for GPUs would be: whatever the hell you do on them, even if it is a tiny thing, try to get good performance out of that tiny thing.
The fact that you can very quickly run different kernels there is a huge advantage, which is totally impossible on CPUs. Use that.
By accepting overhead you reduce the inefficiency.
But now I have really posted enough on this subject.
GPU programming is not for beginners, that's the whole problem.
smatovic wrote:
@Vincent:
Daniel is right with his MCS approach, it performs best on GPUs. Whether an MCS solution can play good chess is another question.
YBW
You really don't want to implement master/slave relations on a GPU; syncing threads is a performance killer.
32 Bit Move Generator
Con: the board representation needs more memory; keep in mind how many registers per thread you have
Con: the solutions I know need nested loops, which are not good for GPUs
Pro: GPUs are 32 bit devices
Looking forward to seeing your 32 bit GPU move generator solution.
--
Srdja
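The 32 bit trade-off in the quote can be made concrete with a small C++ sketch of a 64-square bitboard stored as two 32-bit words. It is neither poster's move generator: the layout (a1..h4 in `lo`, a5..h8 in `hi`) and the names `Bitboard32`, `shiftNorth`, `popcount32` and `pawnPushes` are assumptions for illustration. It shows the cost (two registers per board, a carry between the halves on every shift) and one set-wise operation that needs no loop at all.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: bit 0 of lo is a1, bit 31 of lo is h4,
// bit 0 of hi is a5, bit 31 of hi is h8.
struct Bitboard32 {
    std::uint32_t lo;
    std::uint32_t hi;
};

// Move every set square one rank up: a 64-bit "<< 8" done in two halves.
// The top rank of lo (rank 4) carries over into the bottom rank of hi.
inline Bitboard32 shiftNorth(Bitboard32 b) {
    return { b.lo << 8, (b.hi << 8) | (b.lo >> 24) };
}

// Portable 32-bit population count (SWAR); a GPU would normally use its
// native popcount instruction instead.
inline int popcount32(std::uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return static_cast<int>((x * 0x01010101u) >> 24);
}

inline int popcount(Bitboard32 b) {
    return popcount32(b.lo) + popcount32(b.hi);
}

// Set-wise single pushes for white pawns: shift all pawns north at once and
// keep only empty target squares. No per-piece loop is needed for this step.
inline Bitboard32 pawnPushes(Bitboard32 pawns, Bitboard32 occupied) {
    Bitboard32 up = shiftNorth(pawns);
    return { up.lo & ~occupied.lo, up.hi & ~occupied.hi };
}
```

For example, eight white pawns on the second rank (`lo = 0x0000FF00`) with nothing in front of them give eight push targets on the third rank. Sliding pieces are where the nested loops mentioned in the quote tend to come back, and this sketch does not address them.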