Leveling The Playing Field

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Leveling The Playing Field

Post by bob »

Eelco de Groot wrote:
M ANSARI wrote:
bob wrote:
George Tsavdaris wrote:
bob wrote: First, the 100 Elo claim is nonsense.
How do you know for sure?
Because I understand parallel search as well as anyone around. We've already been through this discussion once.
IMHO, the ones wanting this restriction are basically saying "I am not intelligent enough to develop a parallel/distributed search that works, and since I can't do it, I don't want anyone else to be able to use their fancy stuff that I don't know how to develop to be able to compete with them..."
That, or they just can't afford that much money for such hardware.
Several programs are university projects. They have plenty of good hardware available. Others have gotten local companies or whatever to provide loaner hardware. I never bought a Cray in my life, for example...
Bob ... with all due respect ... the Rybka cluster has nothing to do with parallel search as you define it, and has obviously taken a completely different route from that type of setup. You might be right that the 100 Elo figure sounds high ... but that was in testing in blitz games, and on that platform 100 Elo sounds more than plausible. At LTC it could be a little less ... but not by much.
For Robert Hyatt and Vincent maybe some interesting information was posted about Rybka's cluster set-up that they have not yet read.

I don't really know what Vincent's big Beijing cover-up story is all about; maybe somebody knows the facts about that :?: :!: Hey Vincent, is Toga now supposed to be part of Chessbase or something :shock:

But at least for Rybka I'm pretty sure this is, or was, not an SMP box or a supercomputer, and not a setup that simply splits at the root either. Of course that is not all there is to it (it can't be), and I don't for one second believe that Bob himself thinks Rybka's kibitz output would be proof of "splitting at the root" only, or of not "sharing state information" between the computers, as Alan put it on the Rybka forum.
I would agree that Vas has obfuscated so much in the past that it is possible he obfuscated here as well. But it is _highly_ unlikely, because I did look at the output carefully for one specific move, and the only way that output could be produced was to search different moves with an open window.

Log on to ICC and type "search crafty rybka". Examine the first game in the list, game 0 (the most recent). Step down to black's move 30, where Crafty played Qxf2. If you notice, there is only one way to re-capture the queen; any other move simply loses.

We were seeing actual scores and PVs that were reasonable for moves other than Rxf2, the only move that equalizes material. The scores were all -9.x, since white would be down a queen, but the moves being kibitzed (the PVs), the depths, and the times were all consistent.

Now feel free to explain to me, and please disregard all the hyperbole from the Rybka camp, how one can produce a real PV that makes sense, with a real backed-up score, for _any_ move other than Rxf2 in that position? No other program will do so unless you use multi-PV mode, which no sane person would do in a tournament game. Then explain to me how you would see depth 18 for Rxf2 with a near-equal score, then depth 22 for Rfc1 with score = -9, then depth 20 for Rbc1, then depth 19 for Rxf2 again with a near-zero score, then back to deeper depths with -9 scores for the other nonsensical moves?

There really is only _one_ explanation. I had _exactly_ this issue (although I did not do unsynchronized search) in 1983 when we played in the WCCC and won with exactly that algorithm. At one time Ken jumped up and said "YES!!" when he saw us kibitz a PV that gave back a pawn we had won, but then the real best move was displayed by the other processor and the score was back to +1.x again. So this is not new. It is very old in fact. It is a reasonable attempt at a quick-and-dirty cluster implementation. And it is not going to produce enough of a speedup to get anywhere near +100 Elo. In our tests back in 1983, the best we saw was a speedup of 1.5X averaged over 5 moves. Most of the time it was worse, although never worse than 1.0, so we used it anyway.

The questions/answers you posted are basically useless. The answers are evasive and non-technical. And some are simply fictitious with regard to Elo.

But it doesn't really matter. "it is what it is." I'm certain I know what it "is".

The reason is this: the first day, Rybka was not kibitzing PVs. Several complained, because the rules explicitly required this. Someone put in a quick hack to make it kibitz the best PV from each node, which probably revealed more than they thought. Yours truly just happened to be the one they were playing the first time this change was tested, and I happened to notice the fluctuating depths and scores. I quickly (offline) created a file with the output, manually put it back into the proper order where it made sense, and voila, what was going on was crystal clear.
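The offline reordering described here is simple to reproduce. Below is a hypothetical sketch; the kibitz line format ("depth score pv...") and the positions of the fields are assumptions for illustration, not Rybka's actual output.

```python
from collections import defaultdict

def reorder_kibitzes(lines):
    """Group kibitz records by root move, then sort each group by depth.

    Interleaved output from several unsynchronized nodes looks chaotic,
    but grouped per root move the depths climb steadily again.
    """
    by_move = defaultdict(list)
    for line in lines:
        depth_s, score_s, pv = line.split(maxsplit=2)
        root_move = pv.split()[0]
        by_move[root_move].append((int(depth_s), float(score_s), pv))
    return {move: sorted(recs) for move, recs in by_move.items()}

# Invented sample lines mimicking the Crafty-Rybka situation: only Rxf2
# recaptures the queen, so the other nodes search losing moves faster
# and reach deeper depths with -9.x scores.
kibitz = [
    "18 -0.2 Rxf2 Qd4",
    "22 -9.1 Rfc1 Qd4",
    "20 -9.3 Rbc1 Qd4",
    "19 -0.1 Rxf2 Qe5",
]
ordered = reorder_kibitzes(kibitz)
```

Grouped this way, each move's records show orderly depth increases, which is exactly the test described in the post.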




Well, anyway I think it was very interesting to read some more from Lukas Cimiotti and his work on the cluster.

Eelco
By Kullberg, Date 2008-12-06 19:48:
My cluster has only 5 computers = nodes. Each computer has 8 cores.
Hardware specs are:
Skulltrail 4 GHz
Skulltrail 3.8 GHz
Asus Z7S WS 2x X5460 @ 3.8 GHz
Asus Z7S WS 2x X5450 @ 3.6 GHz
Asus DSEB-DG 1x E5430, 1x E5420 @3 GHz (subject to change in the near future).
All computers have 8 GB of RAM each.
I built them all myself.

Regards,
Lukas (on playchess I am Rechenschieber, Victor_Kullberg and Abdul H)

By Roland Rösler, Date 2008-12-07 01:10:
1. Did you ever solve test suites or single test positions with your cluster?
1.1 If yes, what are the results in comparison to the fastest system in your Cluster?
1.1.1 Did you ever try this test position? It needs wideness in the beginning, depth in the middle and wideness at the end; only eval >2 is solved!
1.2 If no, why not?

2. Is the Cluster a permanent configuration or is it only for big tournaments we have seen?
2.1 If yes, how many games did you play with the Cluster and what are the results (Elo ?)?
2.2 Do you believe, Cluster Rybka is better >100 Elo than your fastest system in the Cluster?
2.3 How many updates did you get from Vas after WCCC in Beijing for Cluster Rybka?

3. Are five systems the upper bound for the Cluster now?
3.1 If no, what would be the benefit of a sixth equal system (imagine the first five systems are rather equal)?
3.2 If yes, what would be the benefit, if you changed your slowest system by a system which is equal to your fastest?

4. Is there any gain from the systems of the Cluster not being identical?
4.1 If yes, does the software know (automatically) which system is the fastest and which is the slowest, or doesn't this matter?
4.2 If no, is the unpredictability of mp search enough for system (resource) allocation?
4.3 What would be the result if your Cluster had to play against a Cluster with five identical 4 GHz Core i7 systems (price ~ Euro 5,000; Phil told me)?

Many questions. Some answers would be nice!
I'm only interested in your estimation; no proofs are required.
By Kullberg, Date 2008-12-07 13:11:
>1. Did you ever solve test suites or single test positions with your cluster?


no - the cluster is for playing games, not for test positions


>2. Is the Cluster a permanent configuration or is it only for big tournaments we have seen?


I also use it on playchess - there I played ~170 games. Results were good, but I didn't put real work into my book - so they could be better.


>2.2 Do you believe, Cluster Rybka is better >100 Elo than your fastest system in the Cluster?


no - you get ~+100 Elo going from one to 5 computers if all computers are equally fast. I guess I get something like +80-90 Elo


>3. Are five systems the upper bound for the Cluster now?


no - atm. 25 computers is the maximum - but using 5 computers is a very good setup. And I've only got 5 monitors.


>3.1 If no, what would be the benefit of a sixth equal system (imagine the first five systems are rather equal)?


I don't know


>3.2 If yes, what would be the benefit, if you changed your slowest system by a system which is equal to your fastest?


maybe +5 Elo I guess


>4. Is there any gain from the systems of the Cluster not being identical?


yes - it's fun to build different computers - equal computers would be very boring


>4.1 If yes, does the software know (automatically) which system is the fastest and which is the slowest, or doesn't this matter?


it matters and I tell the software


>4.3 What would be the result if your Cluster had to play against a Cluster with five identical 4 GHz Core i7 systems (price ~ Euro 5,000; Phil told me)?


5 of these computers would be great for an affordable cluster. I guess my cluster would be ~10 Elo stronger only.

Regards,
Lukas (on playchess I am Rechenschieber, Victor_Kullberg and Abdul H)
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Leveling The Playing Field

Post by bob »

Erik Roggenburg wrote:Just about every single form of racing has some sort of restrictions - NASCAR, Top-fuel dragsters, Indy cars, F1, etc. Why not chess? Is the WCCC supposed to reward the guy with the biggest hardware, or the guy with the best combo of book, engine, and tweaked out hardware?

So what if they limit to 8 cores? It isn't as though everyone will show up with identical hardware. Some will be OC'd out the yin-yang, so I think this will lead to true teams: Programmer, Book Cooker, Hardware Guru, etc.
Top-fuel dragsters don't have any limitations I know of, and I am at the drag strip at least once per month with my son running his Mustang. NASCAR has a "sort of restriction" in terms of CID and allowable engine modifications. And they are moving toward a "single chassis" design, which may well kill NASCAR completely. Most want to go see a manufacturer of their choice win, whether it be a Chevy, a Ford, a Chrysler, or whatever. Removing that removes interest. Just watch.

Do you think you can go to a NASCAR race with your 30,000-dollar right-off-the-showroom car and compete? Prepare to spend _millions_ first. So I guess I completely miss your point. The big races are for the big dogs. "If you can't run with the big dogs, stay under the front porch..."
User avatar
Zach Wegner
Posts: 1922
Joined: Thu Mar 09, 2006 12:51 am
Location: Earth

Re: Leveling The Playing Field

Post by Zach Wegner »

Rémi Coulom wrote:Hi,

David Levy asked me to circulate this message. Please, send an e-mail to him if you have an opinion. He has to make a final decision quickly.

I suggested that it would be better to have an open ICGA forum for discussing this kind of issue, rather than using private mail. He supports the idea, so it is likely I will create such an official ICGA forum.

He invites reactions by past participants in his message, but he also invited me to send his message to programmers who have not yet participated, and may intend to participate. So, even if you have not participated yet, he would welcome your opinion.

Rémi
Thanks for your help Remi.

Zach
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Leveling The Playing Field

Post by bob »

diep wrote:
bob wrote:
M ANSARI wrote:
bob wrote:
George Tsavdaris wrote:
bob wrote: First, the 100 Elo claim is nonsense.
How do you know for sure?
Because I understand parallel search as well as anyone around. We've already been through this discussion once.
IMHO, the ones wanting this restriction are basically saying "I am not intelligent enough to develop a parallel/distributed search that works, and since I can't do it, I don't want anyone else to be able to use their fancy stuff that I don't know how to develop to be able to compete with them..."
That, or they just can't afford that much money for such hardware.
Several programs are university projects. They have plenty of good hardware available. Others have gotten local companies or whatever to provide loaner hardware. I never bought a Cray in my life, for example...
Bob ... with all due respect ... the Rybka cluster has nothing to do with parallel search as you define it, and has obviously taken a completely different route from that type of setup. You might be right that the 100 Elo figure sounds high ... but that was in testing in blitz games, and on that platform 100 Elo sounds more than plausible. At LTC it could be a little less ... but not by much.
Let me explain this one more time...

(1) Based on the _output_ from Rybka, specifically during the game between Rybka and Crafty in the last ACCA event, Rybka is using a "split only at the root" algorithm. How was this deduced? By capturing Rybka's output and trying to figure out what was going on.

If you can find the game, at some point Crafty played QxQ in that game. And while I had not paid any attention to Rybka's prior kibitzes, someone asked "Why is Rybka losing a queen here?" I looked to see what had caused that question, and what I found was that there were five nodes, each doing an unsynchronized search on a subset of the root moves. Unsynchronized means that each node searches its group of root moves, and when it finishes it goes immediately to the next depth without waiting for the others to finish the same iteration. What we were seeing was that for each different depth, multiple PVs were being kibitzed. That is not so unusual in and of itself, but in this position there was only one way to re-capture the queen and remain material ahead. So several moves/scores were being kibitzed, and since there was only one way to recapture the queen and maintain equality, the other nodes were searching nonsensical moves that would never be played, but they were kibitzing the scores/PVs anyway. And since those nodes had a simpler tree to search (they were down a queen), they were going 3-4 plies deeper than the _real_ search for the queen recapture. We were seeing PVs with depth=19, depth=22, depth=18, depth=21, depth=19, bouncing all over the place. Once we figured out what was going on, if you took the same move and found the PVs for that move, you would find orderly depth increases. For any move you tried.

So that was almost certainly what the search was doing.
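A minimal, hypothetical sketch of the scheduling idea just described: each node owns a disjoint subset of the root moves and runs its own iterative-deepening loop without waiting for the others. The search itself is stubbed out with a caller-supplied function; none of the names here are Rybka's actual code.

```python
def unsynchronized_root_split(root_moves, n_nodes, search, max_depth):
    """Simulate an unsynchronized split-at-the-root cluster search.

    search(move, depth) stands in for a full alpha-beta search of one
    root move; here we only model which node reports what, and when.
    """
    # Deal the root moves round-robin across the nodes.
    subsets = [root_moves[i::n_nodes] for i in range(n_nodes)]
    kibitzes = []
    for node, moves in enumerate(subsets):
        # Each node deepens independently, so at any instant different
        # nodes are at different depths -- hence the bouncing kibitz output.
        for depth in range(1, max_depth + 1):
            best = max(moves, key=lambda m: search(m, depth))
            kibitzes.append((node, depth, best, search(best, depth)))
    return kibitzes

# Toy position: only Rxf2 holds the material balance, the rest lose a queen,
# mirroring the game discussed above (scores invented for illustration).
demo = unsynchronized_root_split(
    ["Rxf2", "Rfc1", "Rbc1"], 3,
    lambda move, depth: 0.0 if move == "Rxf2" else -9.0, 3)
```

Each node here reports a plausible-looking PV and score for its own subset, even when that subset contains only losing moves, which is the behavior observed in the kibitz log.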

(2) As far as the +100 Elo goes, that's patently impossible using that parallel search approach. Why? Several experimented with this 20+ years ago. My first parallel search on the Cray used this approach. We discovered that we could not produce a speedup of over 1.5X using this, regardless of the number of processors we threw at it. Monty Newborn used this same approach for a year or two in his parallel version of Ostrich. Same findings.

(3) so based on the output, we can deduce the algorithm. Knowing the algorithm, we can accurately state the speedup. And 1.5x faster (upper bound) will _not_ produce a +100 Elo improvement.

Is it possible that the output was once again obfuscated? Given the past history of Rybka, anything is possible. However, a +100 Elo improvement would require roughly a 4x speed improvement. And getting 4x from 5 nodes has not been done yet, and may well never be done, because of the concessions you have to make when doing message-passing (no shared hash table, killer move list, etc., unless you share them with messages, which kills the search due to network latency, even if you use something decent like infiniband, which we have here).
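As a back-of-envelope check on that arithmetic: assuming roughly 50 Elo per doubling of search speed, a common rule of thumb rather than a figure from this post (though it is consistent with the "+100 Elo needs about 4x" estimate above), a 1.5x speedup is worth only a few dozen Elo.

```python
import math

# Rough rule of thumb, assumed for illustration; not a measured constant.
ELO_PER_DOUBLING = 50

def elo_from_speedup(speedup):
    """Estimate the Elo gain of a parallel speedup under the rule of thumb."""
    return ELO_PER_DOUBLING * math.log2(speedup)

# elo_from_speedup(1.5) is about +29 Elo; elo_from_speedup(4.0) is +100.
```

Under this assumption, the 1.5x upper bound measured for split-at-the-root search in 1983 corresponds to roughly +29 Elo, nowhere near the claimed +100.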

If you want to believe +100, that's your choice to make. Personally, I consider it baloney (to be polite). Vincent believes they have a 40 core shared-memory machine. I've not seen such a configuration anywhere but that doesn't mean there isn't one. I have run on up to 64 cores in fact, but the machines are very pricey and multiplying nodes by a factor of 5 will _never_ give you a factor of 4.0 speedup (at least in Crafty, and I strongly doubt in Rybka either) so no matter what the platform, +100 is far more fiction than fact.

That's as clearly as I can explain it. If I had a copy of that version of Rybka, I could easily test it because I have a cluster with 70 nodes, each node with 8 cores. It would be easy enough to test a 5-node version against a 1 node version to measure times and see what kind of speedup it produces. Since no such version is available, we just get to listen to hyperbole and wonder.

My first parallel search was done in 1978 on a dual-CPU Univac 1100 box. That was 30 years ago. In the intervening 30 years, if something sounded too good to be true, it was too good to be true. I do not believe it is any different here...
I agree fully with you Bob. That type of algorithm goes ugly bad during a game. A shared hashtable is just too important. Of course it was a 3-hour hack to cover things up.

With Diep I of course had the privilege of doing a number of experiments on a 1024-processor supercomputer. If you want numbers on how many plies you lose without a shared hashtable, I can give them to you; it is a LOT.

A shared hashtable is really a necessity; otherwise you get big problems.

The reason is that in contrast to the 80s, when most searched a maximum of 8 plies or so (Deep Thought got 8 plies at 3 minutes a search at 500k nps or so), today we get depths of 20 or more plies.

The average is far above 20 plies. So the branching factor is absolutely crucial. You cannot afford to make it a lot worse by doing embarrassingly parallel things.
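Vincent's branching-factor point can be illustrated with a toy calculation. In a fixed node budget, the depth you can reach is roughly log(nodes)/log(EBF), so even a modest rise in effective branching factor from a degraded (unshared) hash table costs several plies at modern depths. All numbers below are invented for illustration.

```python
import math

def reachable_depth(node_budget, ebf):
    """Depth reachable in a fixed node budget at a given effective
    branching factor: solve ebf ** depth == node_budget for depth."""
    return math.log(node_budget) / math.log(ebf)

# Assumed budget: ~3 minutes at ~20M nps (both figures illustrative).
nodes = 20_000_000 * 180

d_shared = reachable_depth(nodes, 2.0)  # EBF ~2.0 with a shared table
d_split  = reachable_depth(nodes, 2.5)  # EBF ~2.5 without one (assumed)
# d_shared is about 31.7 plies, d_split about 24: a loss of ~8 plies.
```

The absolute depths are not meant to match any particular engine; the point is the gap, which grows with the total budget, exactly as Vincent argues for modern 20+ ply searches.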

The Rybka programmer fully understands this.
The Toga team, I doubt they even know what a cluster is.

Toga, AFAIK, is an open-source program. Its evaluation function at the end of 2008 is still nearly 100% the same (1 or 2 really tiny modifications in king safety) as that of Fruit 2.1, which is from the start of 2005.

So it's just fruit with a better (and parallel) search.

Whereas Rybka is closed source, Toga is open source.
So where is that cluster version of it that supposedly ran on 3 nodes of 8 cores?

These guys in those universities had like 25 years to parallelize software, and when I compared my own scaling of Diep to theirs, it was funny that a program with mightily more evaluation inside it was getting 5+ times more nps on the same Origin 3800 machine, whereas on paper my program is like 10 to 20 times slower in nps compared to the engines with tiny evaluations.

In that sense Donninger did of course do a good job with Hydra. Its nps is quite OK (he's been doing a printf("220 million nps\n"); the past few years), but of course not using hashtables in the last 6+ plies, and having in total a 400MB shared-memory hashtable for a cluster of 64 FPGA cards, is really hurting bigtime in overhead.

That's why at 220 million nps it gets 18-20 plies, with forward pruning in the last 2 to 3 plies done in hardware, in fact in a very crappy manner. This for an evaluation function that on a today's Core 2 core would already get 3 to 4 million nps single-core.

Whereas that 18-20 plies was very good in 2005, in 2009 of course that's not competitive in any manner.

Only in a world champs can the PC programs usually get good hardware, so I would never want to deny the sheikh from joining if he wants to. Of course they wouldn't; they would get hammered bigtime now.

Just using the CPUs instead of putting things in hardware would've been a better plan, but then of course the marketing is tougher.

Point is, the losses due to hardware and parallel search are HUGE if you just look at the search depth reached. Clusters are very difficult things to program for.

Vincent
I'm aware of the cluster issues. I am actively looking at the issue since we have this cluster of 70 nodes, 8 cores per node, 12 GB of RAM per node, with infiniband interconnect. I have used one of these nodes in the past couple of internet chess tournaments, and am averaging about 20M nodes per second using eight 2.33 GHz cores. There is a _lot_ of computing power in that cluster, and I am working to find a viable way to use it. It will be nowhere near as good as a real SMP box, but it will also be far better than just using one node.

More as this progresses...
User avatar
AdminX
Posts: 6340
Joined: Mon Mar 13, 2006 2:34 pm
Location: Acworth, GA

Re: Leveling The Playing Field

Post by AdminX »

bob wrote: "If you can't run with the big dogs, stay under the front porch..."

:lol: :lol: :lol: I guess that's why we have the Amateurs, and on the other side we have the Pros. :wink:
Last edited by AdminX on Wed Dec 17, 2008 6:37 pm, edited 1 time in total.
"Good decisions come from experience, and experience comes from bad decisions."
__________________________________________________________________
Ted Summers
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Leveling The Playing Field

Post by bob »

AdminX wrote:
bob wrote: "If you can't run with the big dogs, stay under the front porch..."

:lol: :lol: :lol:
BTW enjoyed meeting you at the ACCA...
User avatar
AdminX
Posts: 6340
Joined: Mon Mar 13, 2006 2:34 pm
Location: Acworth, GA

Re: Leveling The Playing Field

Post by AdminX »

bob wrote:
AdminX wrote:
bob wrote: "If you can't run with the big dogs, stay under the front porch..."

:lol: :lol: :lol:
BTW enjoyed meeting you at the ACCA...
Same here Bob, I enjoyed meeting everyone there. :D
"Good decisions come from experience, and experience comes from bad decisions."
__________________________________________________________________
Ted Summers
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: A Compromise?

Post by bob »

lexdom wrote:
Nid Hogge wrote:It doesn't matter; the whole purpose is for them to handicap Rybka (or any other program out there that is going to beat them silly and make the WCCC completely irrelevant) in any possible way, so when they do win the tourney, they'll have something big and shiny to stick on their product boxes and websites. Just like the blatantly false message on the hiarcs.com website: "HIARCS wins only major tournament of 2008 with ALL top chess software competing.." Yes.. Right!
I looked at the hiarcs site and maybe a compromise can be made. An alternative is a single tournament, with two titles. One for "Open Champion" and the other for "Single-Computer Champion".

http://www.hiarcs.com/

HIARCS wins only major tournament of 2008 with ALL top chess software competing.

Recent Tournaments:
HIARCS top single-computer in 28th Dutch Open Computer Chess Championship, Leiden, The Netherlands, November 2008
HIARCS top single-computer in 16th World Computer Chess Championship, Beijing, China, October 2008
HIARCS wins 17th International Thüringer Computer Chess Championship, Germany, May 2008
HIARCS wins 17th International Paderborn Computer Chess Championship, Paderborn, Germany, December 2007
Would you want to see an auto race where they had the category "3-wheel cars"? The "world champion" ought to be the best there is, not one with restrictions or an asterisk by the name.

What many commercial programmers would like is a specific title they are guaranteed to win. "world champion MP" or "world champion cluster" or "world champion single-cpu" or "world champion X86" or "world champion Apple" or "world champion itanium" etc. Make enough titles so that each commercial program can win one of them, and then put "2009 world champion " on the box and everyone will be happy. And nobody will know which program is best overall.
User avatar
Watchman
Posts: 94
Joined: Mon Aug 25, 2008 3:09 pm
Location: Indianapolis, IN USA

Re: Leveling The Playing Field

Post by Watchman »

diep wrote:Hiarcs (using rybka's old box of 8x4Ghz,
It is not "rybka's old box"... although I wouldn't mind having a few of Lukas's hand-me-downs... :P

It was a "new box" that I built in April of this year. Intel BOXD5400XS | 2xQX9775 | 8GB Mushkin DDR2-800 | 1xWD1600YS (boot) and 3ware 9650SE-4LPML w/4xWD1600YS in Raid0 |
Rob O. / Watchman
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Leveling The Playing Field

Post by bob »

By the way, I did get a kick out of the evasive answers. Here is one example, with a "translation":
1. Did you ever solve test suites or single test positions with your cluster?


no - the cluster is for playing games, not for test positions
translation:

No, we don't use test positions to measure cluster performance, because then we would be able to compute the actual speed-up, and revealing that would probably be embarrassing... Instead, we played a hundred games or two (with an error bar measured in hundreds of Elo +/-) to measure cluster performance... Our rating climbed by almost 100 Elo, so that is what the cluster gives us over the non-cluster Rybka.
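For what it's worth, the measurement the "translation" says was being avoided is trivial to script: run the same positions to a fixed depth on one node and on the cluster, and report the median time-to-depth ratio. The timings below are invented for illustration.

```python
from statistics import median

def cluster_speedup(single_times, cluster_times):
    """Median time-to-depth ratio across a suite of test positions.

    The median is used rather than the mean so one lucky or unlucky
    position does not dominate the estimate.
    """
    return median(s / c for s, c in zip(single_times, cluster_times))

# Hypothetical seconds-to-fixed-depth per position; not real Rybka data.
single  = [120.0, 90.0, 200.0, 60.0]   # one node
cluster = [ 85.0, 70.0, 130.0, 45.0]   # full cluster, same positions

speedup = cluster_speedup(single, cluster)
# A median ratio around 1.4x would translate to a few dozen Elo, not +100.
```

A handful of positions like this would settle the speed-up question in an afternoon, which is the point of the sarcasm above.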

I suppose in light of that, one might expect to see a "cluster Rybka" for sale before long, at an inflated price. It sounds like pure marketing-speak to me...