M ANSARI wrote:
bob wrote:
George Tsavdaris wrote:
bob wrote:
First, the 100 Elo claim is nonsense.
How do you know for sure?
Because I understand parallel search as well as anyone around. We've already been through this discussion once.
IMHO, the ones wanting this restriction are basically saying "I am not intelligent enough to develop a parallel/distributed search that works, and since I can't do it, I don't want anyone who can develop that fancy stuff to be able to use it to compete with me..."
This, or they just can't afford that kind of hardware.
Several programs are university projects. They have plenty of good hardware available. Others have gotten local companies or whoever to provide loaner hardware. I never bought a Cray in my life, for example...
Bob ... with all due respect ... the Rybka Cluster has nothing to do with parallel search as you define it, and has obviously taken a completely different route from that type of setup. You might be right that the 100 Elo figure sounds high ... but that figure came from testing at blitz, and on that platform 100 Elo sounds more than plausible. At long time controls (LTC) it could be a little less ... but not by much.
Let me explain this one more time...
(1) Based on the _output_ from Rybka, specifically during the game between Rybka and Crafty in the last ACCA event, Rybka is using a "split only at the root" algorithm. How was this deduced? By capturing Rybka's output and working out what was going on.
If you can find the game, at some point Crafty played QxQ. I had not paid any attention to Rybka's prior kibitzes, but someone asked "Why is Rybka losing a queen here?" I looked to see what had caused that question, and what I found was five nodes, each doing an unsynchronized search on a subset of the root moves. Unsynchronized means that each node searches its group of root moves, and when it finishes it goes immediately to the next depth without waiting for the others to finish the same iteration.
What we were seeing was multiple PVs being kibitzed for each depth. That is not so unusual in and of itself, but in this position there was only one way to recapture the queen and maintain material equality. So several moves/scores were being kibitzed, and the other nodes were searching nonsensical moves that would never be played, but they were kibitzing their scores/PVs anyway. And since those nodes had a simpler tree to search (they were down a queen), they were going 3-4 plies deeper than the _real_ search for the queen recapture. We were seeing PVs with depth=19, depth=22, depth=18, depth=21, depth=19, bouncing all over the place. Once we figured out what was going on, if you picked a single move and followed the PVs for that move, you would find orderly depth increases. For any move you tried.
So that was almost certainly what the search was doing.
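To make the depth-bouncing effect in point (1) concrete, here is a toy Python simulation. It is a hypothetical sketch, not Rybka's code: each worker owns one root move and iterates depth 1, 2, 3, ... without synchronizing, and the cost of an iteration is modeled (an assumption) as an effective branching factor raised to the depth. The move names, branching factors, and time budget are made up for illustration.

```python
from collections import defaultdict

def simulate(branching, time_budget):
    """Return (time, move, depth) kibitz events merged across workers.

    Hypothetical model: one worker per root move; iteration d on a move
    with effective branching factor b costs b ** d time units, and each
    worker starts its next iteration immediately (no synchronization).
    """
    events = []
    for move, b in branching.items():
        clock, depth = 0.0, 0
        while clock + b ** (depth + 1) <= time_budget:
            depth += 1
            clock += b ** depth
            events.append((clock, move, depth))  # a "kibitz" at this time
    events.sort()  # interleave the independent streams in wall-clock order
    return events

def depths_by_move(events):
    """Regroup the merged kibitz stream by root move."""
    grouped = defaultdict(list)
    for _, move, depth in events:
        grouped[move].append(depth)
    return dict(grouped)

# QxQ stands in for the real recapture; "a3" stands in for a move that leaves
# the side a queen down, modeled (assumption) with a cheaper branching factor.
events = simulate({"QxQ": 3.0, "a3": 2.0}, time_budget=1e6)
print([d for _, _, d in events][:10])  # [1, 1, 2, 2, 3, 4, 3, 5, 4, 6]
print(depths_by_move(events))          # per move: 1, 2, 3, ... in order
```

The merged stream jumps back and forth in depth, and the cheap "queen down" move ends several plies deeper than the real recapture, yet each move's own list increases one ply at a time, which is the pattern described above.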
(2) As far as the +100 Elo goes, that's patently impossible with that parallel search approach. Why? Several of us experimented with this 20+ years ago. My first parallel search on the Cray used this approach. We discovered that we could not produce a speedup of more than 1.5x this way, regardless of the number of processors we threw at it. Monty Newborn used the same approach for a year or two in his parallel version of Ostrich. Same findings.
(3) So, based on the output, we can deduce the algorithm. Knowing the algorithm, we can accurately state the speedup. And 1.5x faster (an upper bound) will _not_ produce a +100 Elo improvement.
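A back-of-the-envelope model of why splitting only at the root caps the speedup, as a hedged Python sketch (a simplification for illustration, not a published analysis): each root move's subtree is indivisible, so parallel time can be no shorter than the largest single subtree, nor shorter than total work divided by the number of processors. With good move ordering the first root move typically takes the lion's share of the effort; the two-thirds figure below is an assumption chosen for illustration.

```python
def root_split_speedup_bound(subtree_costs, processors):
    """Optimistic upper bound on speedup when only root moves are split.

    subtree_costs: serial search cost of each root move's subtree
    (hypothetical units). A subtree cannot be shared between processors,
    so parallel time >= largest subtree and >= total / processors.
    """
    total = sum(subtree_costs)
    parallel_time = max(max(subtree_costs), total / processors)
    return total / parallel_time

# Assumed costs: the first (best) move takes two-thirds of the serial work,
# and the remaining 30 root moves share the rest.
costs = [60] + [1] * 30
for p in (2, 5, 64):
    print(p, root_split_speedup_bound(costs, p))  # 1.5 in every case
```

Adding processors does nothing once the first move's subtree dominates. The bound is optimistic, because in a real root split the other moves are generally searched without the bound the first move would have supplied, so they cost more than they do serially.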
Is it possible that the output was once again obfuscated? Given the past history of Rybka, anything is possible. However, a +100 Elo improvement would require a roughly 4x speed improvement. Getting 4x from 5 nodes has not been done yet, and may well never be done, because of the concessions you have to make when doing message-passing: no shared hash table, killer move list, etc., unless you share them with messages, which kills the search due to network latency, even if you use something decent like InfiniBand, which we have here.
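The arithmetic behind "+100 Elo needs roughly 4x" can be written down directly. This sketch assumes Elo gain grows with the logarithm of speed at about 50 Elo per doubling, which is the rate that +100 Elo for 4x implies; real engines vary with time control, so treat the constant as an assumption.

```python
import math

# Assumption: ~50 Elo per doubling of speed, the rate implied by "+100 Elo ~ 4x".
ELO_PER_DOUBLING = 50.0

def elo_from_speedup(speedup, per_doubling=ELO_PER_DOUBLING):
    """Estimated Elo gain from searching `speedup` times faster."""
    return per_doubling * math.log2(speedup)

def speedup_for_elo(elo, per_doubling=ELO_PER_DOUBLING):
    """Speedup needed for a given Elo gain under the same model."""
    return 2.0 ** (elo / per_doubling)

needed = speedup_for_elo(100)            # 4.0
efficiency = needed / 5                  # parallel efficiency 5 nodes would need
best_root_split = elo_from_speedup(1.5)  # what a 1.5x ceiling buys, about 29 Elo
print(needed, efficiency, round(best_root_split, 1))
```

Under this model, 5 nodes would have to run at 80% parallel efficiency to reach +100, while a 1.5x ceiling is worth only about 30 Elo.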
If you want to believe +100, that's your choice to make. Personally, I consider it baloney (to be polite). Vincent believes they have a 40-core shared-memory machine. I've not seen such a configuration anywhere, but that doesn't mean there isn't one. I have in fact run on up to 64 cores, but those machines are very pricey, and multiplying the number of nodes by 5 will _never_ give you a 4.0x speedup (at least in Crafty, and I strongly doubt in Rybka either), so no matter what the platform, +100 is far more fiction than fact.
That's as clearly as I can explain it. If I had a copy of that version of Rybka, I could easily test it because I have a cluster with 70 nodes, each node with 8 cores. It would be easy enough to test a 5-node version against a 1 node version to measure times and see what kind of speedup it produces. Since no such version is available, we just get to listen to hyperbole and wonder.
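If a multi-node build were available, the measurement described above would be simple to script. A minimal sketch, assuming you have already recorded wall-clock time to reach the same fixed depth on each test position with 1 node and with 5 nodes; the timings below are invented placeholders, not measurements. Per-position ratios are combined with a geometric mean, one common choice for averaging ratios.

```python
from statistics import geometric_mean

def speedup_report(times_1node, times_n, nodes):
    """Per-position speedups, their geometric mean, and parallel efficiency.

    times_1node / times_n: wall-clock seconds to reach the same fixed depth
    on each test position with 1 node and with `nodes` nodes.
    """
    if len(times_1node) != len(times_n):
        raise ValueError("need one timing pair per position")
    ratios = [t1 / tn for t1, tn in zip(times_1node, times_n)]
    mean = geometric_mean(ratios)
    return ratios, mean, mean / nodes

# Placeholder timings for three hypothetical positions.
ratios, mean, eff = speedup_report([30.0, 45.0, 60.0], [20.0, 30.0, 40.0], nodes=5)
print(ratios, round(mean, 3), round(eff, 3))
```

Time to a fixed depth is used because it compares the same amount of search on both configurations; a real test would use many positions, since parallel search times vary a lot from run to run.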
My first parallel search was done in 1978 on a dual-CPU Univac 1100. That was 30 years ago. In the intervening 30 years, if something sounded too good to be true, it was too good to be true. I do not believe it is any different here...