bob wrote: A future version of Crafty (hopefully this year) will also do so.

Zach Wegner wrote: How's this going, by the way? Got anything running yet?

bob wrote: No. But I've got lots of the design and some of the coding done... Every time I get started on details, I discover a better approach and take about ten steps back for every step forward.

M ANSARI wrote: If Crafty does come out with a cluster version, will it be specific to Crafty only, or will other engines be able to piggyback on the platform?
Forget about Crafty, Rybka or Sjeng getting any significant speedup during gameplay from a 100 Mbit or 1 Gbit standard non-DMA Ethernet card on old machines.
Getting a speedup of 2.0 out of n nodes is already really tough on such a cluster, and you'll soon realize that some old junk hardware is more than a factor of 2 slower than the latest Intel, AMD or even Sun CPU.
If you buy a bunch of cheap Q6600s now and expect such a cluster built from standard components to be faster than an 8-core Sun CPU in 2010, that's wishful thinking.
Some people have been bragging too much about clusters.
What I've got at home is a QM400 switch (16-way), 17 network cards and 17 cables. It requires special PCI-X-capable mainboards.
If you bought a new network built from newer PCIe network cards, you'd get cards that are probably 8 times faster in bandwidth than this QM400.
Yet that has a node price of around $1000 per node, which is still cheap; for large clusters the node price usually goes up to $3300 per node.
That's why I bought this second hand. Getting it to work with modern CPUs is not so easy.
I could recently have bought second hand (but of course didn't) some old junk dual Xeon 3.06 GHz machines, all equipped with PCI-X.
Had I bought 16 of them at $150 each, that would have been 16 x $150 = $2400 in total.
Right now I don't have a penny of income, or I might have been tempted to do it.
Of course it is a big waste of power, but it is a relatively cheap way to experiment with a lot of nodes and a lot of cores on a cluster. On top of that, the hardware is guaranteed to work with this network.
A single such box achieves about 220k nps for Diep.
Set power usage aside for now; I'll come back to it below, because the real problem of clusters is power usage.
16 x 220k nps = 3520k nps, i.e. about 3.5 million nps.
A single i7-965 gets exactly 1.0 million nps, which is also what the Phenom II 955 gets when overclocked a tad.
You'll quickly figure out that I could easily build 4 AMD 955 nodes for $2400, which in practice would get more nodes per second.
Note that if I used Q6600s I couldn't overclock them, as the PCI bus can't be overclocked.
Newer hardware always slams old hardware to death so bigtime that it's nearly impossible for the old stuff to compete.
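The nps comparison above can be checked with a quick back-of-the-envelope sketch; all figures are taken from this post:

```python
# Back-of-the-envelope nps check (all figures taken from the post).

old_nodes = 16                  # dual Xeon 3.06 GHz boxes
old_nps_per_node = 220_000      # Diep on one such box
old_cluster_nps = old_nodes * old_nps_per_node   # 3,520,000 nps

new_nodes = 4                   # Phenom II 955 boxes
new_nps_per_node = 1_000_000    # overclocked a tad
new_cluster_nps = new_nodes * new_nps_per_node   # 4,000,000 nps

print(old_cluster_nps, new_cluster_nps)  # 3520000 4000000
```

So on raw nps alone the 4 new boxes already edge out the 16-node junk cluster, before even counting power.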
Now let's compare the power usage.
That oldie P4 Xeon 3.06 GHz ate 105 watts per CPU; 2 CPUs is 210 watts.
Add the rest of the box and you end up with each machine eating about 270 watts.
So 16 x 270 W = 4320 W or so.
Now the i7-965 is too expensive, so let's go with the Phenom II 955, which is 200 euro or so. It eats nearly as much as the i7, which also eats too much power; both are around 220 watts, I'm guessing, under full Diep load.
4 x 220 W = 880 W.
Add to those power costs 800 watts for the switch and another 25 watts or so for each QM400 card. Those high-end network cards are never idle, of course.
At 16 nodes you feel that: 16 x 25 W = 400 W.
So the old junk that gets 3.5 million nps uses:
4320 W + 1200 W (switch plus cards) = about 5.5 kilowatts in total for the cluster solution of just 16 old, outdated and cheapo nodes.
For the 4M nps Phenom II cluster of 4 nodes it is:
880 W + 700 W for a switch + 100 W for four cards = 1680 W.
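Putting the two power budgets side by side, and dividing nps by watts, shows how lopsided the efficiency is. All numbers are the post's own figures (including the 700 W switch figure for the small cluster):

```python
# Power budget of both clusters (figures from the post; the 700 W
# switch figure for the new cluster is the post's own number).

old_watts = 16 * 270 + 800 + 16 * 25   # boxes + switch + QM400 cards
new_watts = 4 * 220 + 700 + 4 * 25     # boxes + switch + cards

old_eff = 3_520_000 / old_watts        # nps per watt, old cluster
new_eff = 4_000_000 / new_watts        # nps per watt, new cluster

print(old_watts, new_watts)            # 5520 1680
```

That's roughly 640 nps per watt for the old cluster versus roughly 2400 nps per watt for the new one: a factor of almost 4 in efficiency.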
Now the actual performance. This might surprise you: with the way I do parallel search, the 'oldie' cluster at 3.5 million nps might in fact do not much worse than the 4M nps Phenom II cluster. It might even do better.
The reason is simple. One Phenom II node is about 5 times faster than one of those Xeon nodes, which means it searches a ply or 2 deeper in the part of the tree that can't use the global hashtable, whereas the Xeon 3.06 GHz doesn't lose that.
Missing the hashtable in the last few plies is really pricey.
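The "ply or 2" figure can be sanity-checked with a rough depth model. If each extra ply multiplies search time by an effective branching factor b, a node that is s times faster reaches about log(s)/log(b) extra plies in the same time. The branching factors tried below are my assumption, not from the post:

```python
import math

# Rough estimate of the extra depth a 5x faster node reaches in the
# same time. If time per extra ply grows by an effective branching
# factor b, a speed factor s buys about log(s)/log(b) extra plies.
# The branching factors below are my assumption, not from the post.
s = 5.0                       # Phenom II node vs old Xeon node
for b in (2.0, 3.0, 4.0):
    extra_plies = math.log(s) / math.log(b)
    print(b, round(extra_plies, 2))   # ~2.32, ~1.46, ~1.16
```

For plausible branching factors that lands at roughly 1 to 2 extra plies, consistent with the estimate above.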
Relative to the 220k nps of one node, the communication latency of this network is pretty OK: about 3 us for a one-way ping-pong. At Ethernet latencies, however, even 220k nps gets hurt bigtime.
These are QM400 cards from Quadrics; google for them on eBay.
Note they do not work as Ethernet cards.
The latency of your 100 Mbit or even 1 Gbit Ethernet is more like 0.3 ms, so at least a factor of 100 slower.
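One way to see why that factor of 100 matters is to express a message's latency in nodes of search time lost, using the figures above:

```python
# Cost of one message expressed in "nodes of search time" lost
# (latency figures from the post; nps is for one old Xeon box).
nps = 220_000

qm400_latency = 3e-6        # ~3 us one-way ping-pong on the QM400
ethernet_latency = 3e-4     # ~0.3 ms on 100 Mbit / 1 Gbit Ethernet

nodes_lost_qm400 = nps * qm400_latency        # ~0.66 nodes/message
nodes_lost_ethernet = nps * ethernet_latency  # ~66 nodes/message
```

Less than one node of work per message on the Quadrics network, versus dozens of nodes per message on Ethernet, and a faster engine multiplies that Ethernet cost further.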
Diep was perhaps the first program to deal reasonably with these latencies relative to its nps, something the 'old' guys didn't manage.
Note they were not very helpful in providing information either, despite my having emailed them. So it was uncertain whether Diep would cope with the latencies on the SGI, which has the same latency as the cards I own now. Note that those SGI latencies are for a HUGE partition of 128 nodes and 512 processors at 500 MHz.
On that machine Diep reached above 5 million nps, with a peak of exactly 9.99 million nps. Just not 10 million. Believe me, I browsed those logfiles to see whether it ever got over 10 million, but it never managed.
Now, today's networks still have those ugly latencies.
The fastest and most expensive network out there right now, which IBM sometimes uses with its POWER6, is about 1.0 us.
So that's less than a factor of 3 faster than "the old junk" I've got over here, and probably a factor of 20 more bandwidth thanks to 2 rails (note that bandwidth is not that relevant in this case).
That factor of 3 sounds like a lot, but it isn't, because that 1.0 us is of course the usual HPC marketing number. As soon as you get a BIG partition you're not going to see one-way ping-pong times of 1 us; you only get that with a single switch.
With a big network with level-2 and level-3 routers, say 512 nodes or so, that latency will again be similar to what I've got at home here.
Add to that, Diep's nps is quite low compared to fast beancounters like Rybka (2.4 million nps or so on a single core, according to the latest rumours I heard).
So forget about getting a cluster version that works for you from those commercial guys.
Buying a new 2-socket machine is always faster than that.
As for Diep, it might be the only program that will one day work well on a cluster, if I can find the money to buy some quad-core nodes. Right now I own an old 2.4 GHz quad-core AMD, which replaced the 6 machines stolen from here. I already need it for source code editing, so testing on it is tough: it means I can't work in the source code at the same time.
But I won't give you the illusion that you can buy a cluster version of Diep that easily works for you on 100 Mbit or 1 Gbit networks.
I won't even say it "maybe works" by the end of 2009. It won't, unless you buy a low-latency network and manage to configure it (which you'll never manage yourself; you need an HPC sysadmin, as someone without HPC experience won't manage it). In that case you CAN buy a cluster copy, and I'll customize it especially for you to work with the network you've got (provided it has MPI or something similar).
Low-latency networks are not easy to configure. Note that it IS possible to buy low-latency TCP/IP cards. They're quite expensive though: nearly $1000 a piece, and they do DMA. On top of that you need a low-latency switch, which also costs $5000 new. Maybe eBay has all that a lot cheaper.
So cluster versions MIGHT work quite well on that, and some of you might be able to obtain it cheap.
Other than for Diep, forget about cluster versions at your desk that get a speedup of over 2.0 out of the network. For all that money and effort,
it's easier to buy a 4-socket machine, of course.
In June AMD is supposed to release a 24-core box. Now THAT kicks butt.
More than a 16-node P4 Xeon 3.06 GHz cluster with the world's best parallel algorithm, that's for sure.
Vincent