Cluster Rybka

Uri Blass · Post by **Uri Blass** » Tue Nov 11, 2008 12:32 pm

bob wrote:
Uri Blass wrote:
bob wrote:Those numbers sound pretty reasonable. I'm not so happy with "those" that report numbers that are simply fictional, and which anybody that has done any reading or research into parallel search could immediately recognize as bogus.

I still hope that one day someone will post some real numbers on Rybka's parallel speedup on an 8-way box, by running some "normal" positions to a fixed depth using 1, 2, 4 and 8 processors. He claims to scale better than any other program. Somehow I doubt it. Maybe "as good as" if he is lucky. But so far we just have urban legend to go on. Speedups for my program are quite easy to produce and anybody can do it.
I think that fixed depth may be misleading because rybka may play better at fixed depth with more cores thanks to doing less pruning at the same depth.

It is possible to test it simply by playing fixed depth match between
rybka single core and rybka 4 cores.

Uri
I was more interested in the speedup scaling, since Vas has claimed publicly that his speedup is better than anybody else's... I just don't have any good boxes with windows or I would try that myself...

The problem is that fixed-depth search times may be misleading, because the same depth with 4 CPUs does not mean the same thing as the same depth with 1 CPU.
The only good test of effective speedup is a match between
Rybka on 4 CPUs and Rybka on 1 CPU with unequal time controls.

For example, if Rybka on 4 CPUs wins by a score like 5300:4700 after 10,000 ponder-off games while giving a 3:1 time handicap, then you can say that the effective speedup is more than 3:1, and it may be a good idea to try a 3.5:1 time handicap next.

I have no time for this type of test; hopefully others can do it.

Uri
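
To put the 5300:4700 example in numbers, here is a minimal sketch that converts a match score into an Elo difference using the standard logistic Elo formula. The function names are made up for illustration, and the error estimate treats every game as a win or a loss, so it overstates the noise when there are draws.

```python
import math


def elo_diff_from_score(score_fraction: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)


def score_std_error(score_fraction: float, games: int) -> float:
    """Rough standard error of the score fraction (win/loss only, no draws)."""
    return math.sqrt(score_fraction * (1.0 - score_fraction) / games)


if __name__ == "__main__":
    games = 10_000
    score = 5300 / games  # the hypothetical result from the post above
    print(f"score fraction : {score:.3f}")
    print(f"Elo difference : {elo_diff_from_score(score):+.1f}")
    print(f"std error      : {score_std_error(score, games):.4f}")
```

If the handicapped 4-CPU side still comes out ahead by several standard errors, the handicap was smaller than the effective speedup, which is the reasoning behind trying 3.5:1 next.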

krazyken · Post by **krazyken** » Tue Nov 11, 2008 4:13 pm

bob wrote:
krazyken wrote:One place I'd expect a cluster to be more useful is in pondering. You should be able to get a 100% ponder hit with enough nodes.
doesn't work. If you spread nodes across all moves you ponder, you search less deeply which hurts rather than helps, since each different move will only be pondered with one node, not the entire group.

Well yeah, it may not be the most efficient use of nodes, but it should be a simple way to produce better results than a single node alone.

bob · Post by **bob** » Tue Nov 11, 2008 4:55 pm

krazyken wrote:
bob wrote:
krazyken wrote:One place I'd expect a cluster to be more useful is in pondering. You should be able to get a 100% ponder hit with enough nodes.
doesn't work. If you spread nodes across all moves you ponder, you search less deeply which hurts rather than helps, since each different move will only be pondered with one node, not the entire group.
Well yeah, it may not be the most efficient use of nodes, but it should be a simple way to produce better results than a single node alone.

The whole idea, however, is to produce a stronger engine. This is going to be a 2 Elo improvement. We predict correctly > 50% of the time. If we don't predict correctly, we still do a normal full search for the right move. Your scheme is going to produce a one-node move no matter what, which is not exactly an impressive performance increase...

bob · Post by **bob** » Tue Nov 11, 2008 4:58 pm

Uri Blass wrote:
bob wrote:
Uri Blass wrote:
bob wrote:Those numbers sound pretty reasonable. I'm not so happy with "those" that report numbers that are simply fictional, and which anybody that has done any reading or research into parallel search could immediately recognize as bogus.

I still hope that one day someone will post some real numbers on Rybka's parallel speedup on an 8-way box, by running some "normal" positions to a fixed depth using 1, 2, 4 and 8 processors. He claims to scale better than any other program. Somehow I doubt it. Maybe "as good as" if he is lucky. But so far we just have urban legend to go on. Speedups for my program are quite easy to produce and anybody can do it.
I think that fixed depth may be misleading because rybka may play better at fixed depth with more cores thanks to doing less pruning at the same depth.

It is possible to test it simply by playing fixed depth match between
rybka single core and rybka 4 cores.

Uri
I was more interested in the speedup scaling, since Vas has claimed publicly that his speedup is better than anybody else's... I just don't have any good boxes with windows or I would try that myself...
The problem is that fixed depth search times may be misleading because the same depth with 4 cpu does not mean the same as the same depth with 1 cpu
and the only good test to find effective speed up is by games between
rybka 4 cpu and rybka 1 cpu with unequal time control.

For example
If rybka 4 cpu can win by result like 5300:4700 after 10,000 ponder off games with 3:1 time handicap then you can say that the effective speed up is more than 3:1 and it may be a good idea to try 3.5:1 time handicap.

I have no time for this type of test and hopefully other can do it.

Uri

The problem is that, from a parallel processing research point of view, "speedup" is _the_ number we want to see. That is a linear function, whereas Elo is not necessarily linear. Everyone seems to believe in diminishing returns, which means Elo doesn't increase linearly with speedup/search depth. Comparing SMP searches can only be done with time-to-ply measurements... We don't really care what the strength improvement is, just what the parallel speedup is...
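
The time-to-ply measurement described here can be sketched as follows. This is only an illustration: the timings are invented placeholders, not measurements of Rybka or any other engine, and `speedup_table` is a hypothetical helper name. Speedup is the 1-CPU time divided by the N-CPU time to reach the same depth, summed over a set of positions.

```python
def speedup_table(times_by_cpus: dict[int, list[float]]) -> dict[int, tuple[float, float]]:
    """Return {cpus: (speedup, efficiency)} from per-position time-to-depth data.

    times_by_cpus maps a CPU count to the seconds needed to reach the same
    fixed depth on each test position. The 1-CPU entry is the baseline.
    """
    baseline = sum(times_by_cpus[1])
    table = {}
    for cpus, times in sorted(times_by_cpus.items()):
        speedup = baseline / sum(times)
        table[cpus] = (speedup, speedup / cpus)
    return table


if __name__ == "__main__":
    # Placeholder timings in seconds for three positions (NOT real data).
    measured = {
        1: [120.0, 95.0, 210.0],
        2: [70.0, 52.0, 118.0],
        4: [38.0, 30.0, 66.0],
        8: [22.0, 19.0, 40.0],
    }
    for cpus, (speedup, efficiency) in speedup_table(measured).items():
        print(f"{cpus} CPU(s): speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```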

ernest · Post by **ernest** » Tue Nov 11, 2008 5:26 pm

Dann Corbit wrote:I have no idea how well Rybka scales on a cluster, but here are the SMP scaling numbers:
So we have:
1 CPU -> 2 CPUs (+31 Elo, +52 Elo) average 41.5 Elo gain
2 CPUs-> 4 CPUs (+42 Elo, +41 Elo) average 41.5 Elo gain

Small correction, Dann:

1 CPU -> 2 CPUs (+21 Elo, +52 Elo) average 36.5 Elo gain

Dann Corbit wrote:Seems like the regular rule of thumb holds pretty well in both cases.
(around 40-60 Elo per doubling of speed)

You mean 40-60 Elo per doubling of cores! (doubling of speed is more like 70 Elo)
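
A quick arithmetic check of the averages, using only the Elo gains quoted in this post. The dictionary layout is just for illustration.

```python
# Elo gains per doubling of cores, as quoted in the thread.
gains = {
    "1 CPU -> 2 CPUs (as first posted)": (31, 52),
    "1 CPU -> 2 CPUs (corrected)": (21, 52),
    "2 CPUs -> 4 CPUs": (42, 41),
}

averages = {label: sum(pair) / len(pair) for label, pair in gains.items()}

for label, average in averages.items():
    print(f"{label}: average {average} Elo")
```

The 41.5 figure only follows from +31, so with the corrected +21 the first doubling averages 36.5 Elo.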

bob · Post by **bob** » Tue Nov 11, 2008 8:19 pm

ernest wrote:
Dann Corbit wrote:I have no idea how well Rybka scales on a cluster, but here are the SMP scaling numbers:
So we have:
1 CPU -> 2 CPUs (+31 Elo, +52 Elo) average 41.5 Elo gain
2 CPUs-> 4 CPUs (+42 Elo, +41 Elo) average 41.5 Elo gain
Small correction , Dann
1 CPU -> 2 CPUs (+21 Elo, +52 Elo) average 36.5 Elo gain

Dann Corbit wrote:Seems like the regular rule of thumb holds pretty well in both cases.
(around 40-60 Elo per doubling of speed)
You mean 40-60 Elo per doubling of cores! (doubling of speed is more like 70 Elo)

And the real issue is that for parallel search research, nobody cares about Elo per core gain. We care about raw speedup numbers. Measuring Elo, which is an approximation, and then extrapolating speedup based on that is an approximation of an approximation. I just want to know "with N processors, how much faster is a program than when it uses just 1 processor?"
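
For illustration, this is what the "approximation of an approximation" looks like when written out. The sketch assumes a fixed Elo gain per doubling of speed (70 Elo, the rule of thumb quoted above) and assumes that Elo grows with the logarithm of speed. Neither assumption is established in this thread, and `implied_speedup` is a hypothetical helper.

```python
def implied_speedup(elo_gain: float, elo_per_speed_doubling: float = 70.0) -> float:
    """Speedup implied by an Elo gain, ASSUMING a constant Elo per speed doubling."""
    if elo_per_speed_doubling <= 0:
        raise ValueError("elo_per_speed_doubling must be positive")
    return 2.0 ** (elo_gain / elo_per_speed_doubling)


if __name__ == "__main__":
    # Average Elo gains per doubling of cores, as quoted in the thread.
    for label, gain in [("1 -> 2 CPUs", 36.5), ("2 -> 4 CPUs", 41.5)]:
        print(f"{label}: {gain} Elo -> implied speedup {implied_speedup(gain):.2f}x")
```

The result shifts with the assumed Elo per doubling, which is why a direct time-to-depth measurement is the more precise number.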

Ovyron · Post by **Ovyron** » Tue Nov 11, 2008 10:02 pm

Games of 40core Rybka against 8core Rybka or equivalent have been published:

http://rybkaforum.net/cgi-bin/rybkaforu ... l?tid=8332

Uri Blass · Post by **Uri Blass** » Tue Nov 11, 2008 10:48 pm

bob wrote:
Uri Blass wrote:
bob wrote:
Uri Blass wrote:
bob wrote:Those numbers sound pretty reasonable. I'm not so happy with "those" that report numbers that are simply fictional, and which anybody that has done any reading or research into parallel search could immediately recognize as bogus.

I still hope that one day someone will post some real numbers on Rybka's parallel speedup on an 8-way box, by running some "normal" positions to a fixed depth using 1, 2, 4 and 8 processors. He claims to scale better than any other program. Somehow I doubt it. Maybe "as good as" if he is lucky. But so far we just have urban legend to go on. Speedups for my program are quite easy to produce and anybody can do it.
I think that fixed depth may be misleading because rybka may play better at fixed depth with more cores thanks to doing less pruning at the same depth.

It is possible to test it simply by playing fixed depth match between
rybka single core and rybka 4 cores.

Uri
I was more interested in the speedup scaling, since Vas has claimed publicly that his speedup is better than anybody else's... I just don't have any good boxes with windows or I would try that myself...
The problem is that fixed depth search times may be misleading because the same depth with 4 cpu does not mean the same as the same depth with 1 cpu
and the only good test to find effective speed up is by games between
rybka 4 cpu and rybka 1 cpu with unequal time control.

For example
If rybka 4 cpu can win by result like 5300:4700 after 10,000 ponder off games with 3:1 time handicap then you can say that the effective speed up is more than 3:1 and it may be a good idea to try 3.5:1 time handicap.

I have no time for this type of test and hopefully other can do it.

Uri
the problem is that from a parallel processing research point of view, "speedup" is _the_ number we want to see. That is a linear function, whereas Elo is not necessarily linear. Everyone seems to believe in diminishing returns, which means Elo doesn't linearly increase with speedup/search depth. Comparing SMP searches can only be done with time-to-ply measurements... We don't really care what the strength improvement is, just what is the parallel speedup...

Time-to-ply measurement means nothing if the program does not play the same move at the same depth.

It is possible that SMP Rybka plays better moves at depth 10 than single-processor Rybka because SMP Rybka does less pruning.

Uri

bob · Post by **bob** » Wed Nov 12, 2008 1:24 am

Uri Blass wrote:
bob wrote:
Uri Blass wrote:
bob wrote:
Uri Blass wrote:
bob wrote:Those numbers sound pretty reasonable. I'm not so happy with "those" that report numbers that are simply fictional, and which anybody that has done any reading or research into parallel search could immediately recognize as bogus.

I still hope that one day someone will post some real numbers on Rybka's parallel speedup on an 8-way box, by running some "normal" positions to a fixed depth using 1, 2, 4 and 8 processors. He claims to scale better than any other program. Somehow I doubt it. Maybe "as good as" if he is lucky. But so far we just have urban legend to go on. Speedups for my program are quite easy to produce and anybody can do it.
I think that fixed depth may be misleading because rybka may play better at fixed depth with more cores thanks to doing less pruning at the same depth.

It is possible to test it simply by playing fixed depth match between
rybka single core and rybka 4 cores.

Uri
I was more interested in the speedup scaling, since Vas has claimed publicly that his speedup is better than anybody else's... I just don't have any good boxes with windows or I would try that myself...
The problem is that fixed depth search times may be misleading because the same depth with 4 cpu does not mean the same as the same depth with 1 cpu
and the only good test to find effective speed up is by games between
rybka 4 cpu and rybka 1 cpu with unequal time control.

For example
If rybka 4 cpu can win by result like 5300:4700 after 10,000 ponder off games with 3:1 time handicap then you can say that the effective speed up is more than 3:1 and it may be a good idea to try 3.5:1 time handicap.

I have no time for this type of test and hopefully other can do it.

Uri
the problem is that from a parallel processing research point of view, "speedup" is _the_ number we want to see. That is a linear function, whereas Elo is not necessarily linear. Everyone seems to believe in diminishing returns, which means Elo doesn't linearly increase with speedup/search depth. Comparing SMP searches can only be done with time-to-ply measurements... We don't really care what the strength improvement is, just what is the parallel speedup...
time to ply measurement means nothing if the program does not play the same move at the same depth.

It is possible that smp rybka play better moves at depth 10 relative to single processor rybka because smp rybka does less pruning.

Uri

This is rare, but it does happen. But it does not "mean nothing". There is no other viable way to measure parallel speedup. We don't need guesses, approximations, and such, when precise numbers are easy to obtain...

bob · Post by **bob** » Wed Nov 12, 2008 1:26 am

Uri Blass wrote:
bob wrote:
Uri Blass wrote:
bob wrote:
Uri Blass wrote:
bob wrote:Those numbers sound pretty reasonable. I'm not so happy with "those" that report numbers that are simply fictional, and which anybody that has done any reading or research into parallel search could immediately recognize as bogus.

I still hope that one day someone will post some real numbers on Rybka's parallel speedup on an 8-way box, by running some "normal" positions to a fixed depth using 1, 2, 4 and 8 processors. He claims to scale better than any other program. Somehow I doubt it. Maybe "as good as" if he is lucky. But so far we just have urban legend to go on. Speedups for my program are quite easy to produce and anybody can do it.
I think that fixed depth may be misleading because rybka may play better at fixed depth with more cores thanks to doing less pruning at the same depth.

It is possible to test it simply by playing fixed depth match between
rybka single core and rybka 4 cores.

Uri
I was more interested in the speedup scaling, since Vas has claimed publicly that his speedup is better than anybody else's... I just don't have any good boxes with windows or I would try that myself...
The problem is that fixed depth search times may be misleading because the same depth with 4 cpu does not mean the same as the same depth with 1 cpu
and the only good test to find effective speed up is by games between
rybka 4 cpu and rybka 1 cpu with unequal time control.

For example
If rybka 4 cpu can win by result like 5300:4700 after 10,000 ponder off games with 3:1 time handicap then you can say that the effective speed up is more than 3:1 and it may be a good idea to try 3.5:1 time handicap.

I have no time for this type of test and hopefully other can do it.

Uri
the problem is that from a parallel processing research point of view, "speedup" is _the_ number we want to see. That is a linear function, whereas Elo is not necessarily linear. Everyone seems to believe in diminishing returns, which means Elo doesn't linearly increase with speedup/search depth. Comparing SMP searches can only be done with time-to-ply measurements... We don't really care what the strength improvement is, just what is the parallel speedup...
time to ply measurement means nothing if the program does not play the same move at the same depth.

It is possible that smp rybka play better moves at depth 10 relative to single processor rybka because smp rybka does less pruning.

Uri

this is rare, but does happen. But it does not "mean nothing". There is no other viable way to measure parallel speedup. we don't need guesses, approximations, and such, when precise numbers are easy to obtain...

If it plays better moves by pruning less, then the sequential algorithm ought to prune less and play better as well. This argument is circular and leads nowhere...
