ChessUSA.com TalkChess.com
Hosted by Your Move Chess & Games
 
uct on gpu
TalkChess.com Forum Index -> Computer Chess Club: Programming and Technical Discussions
Vincent Diepeveen



Joined: 09 Mar 2006
Posts: 1738
Location: The Netherlands

Post subject: Re: uct for chess - move gen speedup by vector datatypes    Posted: Mon Mar 19, 2012 9:01 pm

Much depends on which card you use, of course.
Take the Teslas I have: suppose I'd use 4 of them for this type of
SMP search and get chess to work.

The cards are 215 watts each. Add some PSU overhead, though you can
get efficient PSUs.

In terms of instructions you can execute, and now I'm not assuming vectors at all, just a generic program where some units use hashtables (the full 6GB on each card) and others, for the last plies, use just the local shared RAM (which is 64KB).

How fast is it possible to calculate then in theory?

Let's first estimate how many instructions per second we can push through.
Again this is not a gflop vector calculation, just for chess.

448 cores * 4 GPUs * 1.15GHz = 2060 G instructions per second (GIPS)

Most chess programs have an IPC on Nehalem/i7 of around 1.5.
Diep is at around 1.72 there currently and might go up a tad.
That's in 32 bits, of course; in 64 bits it's just over 1.5 as well.

Now getting a high IPC on today's GPUs is quite possible, but you'll have quite some overhead, as in each node you will need to do a full evaluation, of course.

It's true that you can avoid that using APHID, but the algorithmic efficiency of APHID is so bad that it's not even worth trying. With today's efficient deep searchers you'd get to efficiency percentages like 1%, or probably less, with APHID.

It's possible to show the mathematical problem with that, but it would go too far for a few messages here.

For now I assume the evaluation is tiny and simple, although of course my plan was to write a HUGE evaluation function, which is quite possible on GPUs (sure, there is then an L1i problem at each SIMD, but that's another topic).

Let's look at potential first ok?

You have extra overhead that you don't have on CPUs. For a tiny program you might be able to limit that overhead; I did calculations on that and got to a factor 5.

IPC is then around 80% with some effort, which makes it 0.8;
of course that's still nearly 2x worse than CPUs, which makes sense.

So what is effectively available for search speed is then:

2060 GIPS * 0.8 / 5 = 329 GIPS
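
A quick way to check the arithmetic above, as a minimal Python sketch. The function name and parameter names are made up for illustration; the 0.8 IPC and the factor 5 overhead are the estimates from this post, and one instruction per core per cycle is assumed for the peak figure.

```python
def effective_gips(cores, gpus, clock_ghz, ipc, overhead):
    """Giga-instructions per second left for the search after IPC and overhead."""
    peak = cores * gpus * clock_ghz  # assumes one instruction per core per cycle
    return peak * ipc / overhead


peak_gips = 448 * 4 * 1.15
gpu_gips = effective_gips(cores=448, gpus=4, clock_ghz=1.15, ipc=0.8, overhead=5)
print(f"peak: {peak_gips:.1f} GIPS, effective: {gpu_gips:.1f} GIPS")
```

Without rounding the peak down to 2060 first, the effective figure comes out a fraction higher than 329, at just under 330 GIPS.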

Again, this is a very CONSERVATIVE estimate. It's really a lower bound on what you can effectively throw into battle.

Now let's look at a six-core i7.

3.46GHz * 6 cores * IPC=1.5 = 31.14 GIPS

So with the GPUs we're a factor 10.6 faster.
Again, a very conservative estimate. A good programmer
will probably not lose a factor 5, which is based upon having a huge eval and L1i problems on the GPU.
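
The CPU side of the comparison can be sketched the same way. This is only the arithmetic from this post put into Python; `cpu_gips` is a hypothetical helper name, and the GPU figure reuses the 0.8 IPC and factor 5 overhead assumed earlier.

```python
def cpu_gips(clock_ghz, cores, ipc):
    """Giga-instructions per second for a multi-core CPU at a given IPC."""
    return clock_ghz * cores * ipc


cpu = cpu_gips(clock_ghz=3.46, cores=6, ipc=1.5)  # six-core i7 from the post
gpu = 448 * 4 * 1.15 * 0.8 / 5                    # four Teslas, IPC 0.8, overhead 5
ratio = gpu / cpu
print(f"CPU: {cpu:.2f} GIPS, GPU: {gpu:.1f} GIPS, ratio: {ratio:.1f}")
```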

Sure, there are 4-socket machines, even 8-socket machines for Intel, but they're $200k, so not very realistic to use.

Now making that 3-layer SMP is complicated, of course, but
there is good news for the GPUs.

The last layer of SMP has something that no CPU has: lightning-fast communication with a shared local RAM.

You can't do this that fast on a CPU.

The cache runs at full speed, you know. Sure, it's tiny, but you don't need much anyway. So on Nvidia those 32 cores can search very effectively there.

AMD's cache is a tad slower and half the size, and it serves 64 PEs, but the same principle applies there as well. Note I'm not sure how to get the same GIPS number from AMD, as each SIMD is basically 16 compute cores and you need to use vectorized datatypes for the compute cores, which in computer chess is a bad idea. It works more easily on Nvidia for chess.

So what I do in Diep, using a hashtable within qsearch, is peanuts to do on Nvidia in each SIMD/compute unit.
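
To give an idea of how small such a table can be, here is a CPU-side Python sketch of a fixed-size, always-replace hashtable that fits within the 64KB of shared RAM mentioned above. This is not Diep's code and not GPU code: the entry layout (32-bit check key plus score), the 4096-entry size, and the class and method names are all assumptions chosen for illustration.

```python
from array import array

SHARED_RAM_BYTES = 64 * 1024  # shared RAM per compute unit, figure from the post
NUM_ENTRIES = 4096            # assumed size: a power of two that leaves room for other data


class TinyHashTable:
    """Always-replace table: low key bits pick the slot, high 32 bits verify the hit."""

    def __init__(self, num_entries=NUM_ENTRIES):
        if num_entries <= 0 or num_entries & (num_entries - 1):
            raise ValueError("num_entries must be a power of two")
        self.mask = num_entries - 1
        self.check = array("I", [0]) * num_entries  # upper 32 bits of the 64-bit key
        self.score = array("i", [0]) * num_entries  # stored score
        self.used = bytearray(num_entries)          # 1 once a slot holds an entry

    def store(self, key, score):
        slot = key & self.mask
        self.check[slot] = key >> 32  # overwrite whatever was in the slot
        self.score[slot] = score
        self.used[slot] = 1

    def probe(self, key):
        slot = key & self.mask
        if self.used[slot] and self.check[slot] == key >> 32:
            return self.score[slot]
        return None  # empty slot or a different position in this slot

    def size_bytes(self):
        return (len(self.check) * self.check.itemsize
                + len(self.score) * self.score.itemsize
                + len(self.used))


table = TinyHashTable()
table.store(0x123456789ABCDEF0, -35)
print(table.probe(0x123456789ABCDEF0), table.size_bytes() <= SHARED_RAM_BYTES)
```

Always-replace is picked here only because it needs no aging or depth bookkeeping; a real qsearch table might well use a different replacement scheme.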

We already applied a factor 5 reduction, which is a huge reduction, so this is not a 'good weather' calculation. A factor 5 penalty is *so huge*.

That means the cost per node can easily be compared with CPUs. So we get to 1000 instructions per node then.

329 GIPS / 1000 = 329 million nps
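
The step from instruction budget to node rate, as a small Python sketch. The 1000 instructions per node is the figure implied by the division above; the function name is made up for illustration.

```python
def nodes_per_second(effective_gips, instructions_per_node):
    """Convert an instruction budget (in GIPS) into a node rate."""
    return effective_gips * 1e9 / instructions_per_node


nps = nodes_per_second(effective_gips=329, instructions_per_node=1000)
print(f"{nps / 1e6:.0f} million nps")
```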

That's not so difficult to get. Getting the SMP efficient is the only problem. A great task for me, but unpaid it's not so interesting for now.

Vincent

p.s. I got sponsored by Nvidia, of course, but even so it's easy to show that only Nvidia cards can get a chess program going: simply put, you can run DIFFERENT instruction streams on each SIMD. On AMD that is not possible. There all cards (as far as you can steer that on AMD) and all SIMDs basically execute the same instruction at the same time, despite 100 promises to change that. I don't see it happening any time soon at AMD; so far they don't manage even simple requests there.

This 329 million nps is not a virtual number. Realize I reduced by a factor 5, which is a HUGE overhead; if you get it more efficient you're over a billion nodes per second.

Also there's no need to razor the last few plies then, as you have the shared hashtable there, which would not be possible on a CPU. You can pick up nearly all tactics there. It will be the ultimate tactical monster, not only outsearching everyone but also picking up more tactics, whereas today's Houdinis are tactically not so strong compared to Hiarcs and Diep.
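
How much more efficient is "more efficient"? A rough Python sketch, assuming the same figures as in the estimate (peak of 448 * 4 * 1.15 GIPS, IPC 0.8) and 1000 instructions per node; the function names are hypothetical.

```python
PEAK_GIPS = 448 * 4 * 1.15    # figures from the post
IPC = 0.8
INSTRUCTIONS_PER_NODE = 1000  # assumed, as in the 329 million nps estimate


def nps_at(overhead):
    """Nodes per second for a given overhead factor."""
    return PEAK_GIPS * IPC * 1e9 / (overhead * INSTRUCTIONS_PER_NODE)


def max_overhead_for(target_nps):
    """Largest overhead factor that still reaches target_nps."""
    return PEAK_GIPS * IPC * 1e9 / (target_nps * INSTRUCTIONS_PER_NODE)


for overhead in (5, 3, 2, 1.5):
    print(f"overhead {overhead}: {nps_at(overhead) / 1e6:.0f} million nps")
print(f"overhead needed for a billion nps: {max_overhead_for(1e9):.2f}")
```

Under these assumptions the billion is crossed once the overhead factor drops below roughly 1.65.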