2009 ACCA World Computer Rapid Chess Championships

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

CRoberson
Posts: 2056
Joined: Mon Mar 13, 2006 2:31 am
Location: North Carolina, USA

Re: Log files

Post by CRoberson »

Zach Wegner wrote:
Also, what's your distributed search like? What algorithm are you using? Last I knew you didn't have a multiprocessor search, which is significantly easier.
I have had an SMP search for some time, but I am not happy with it. I've coded it and recoded it based on various threading and
multiprocessing extensions (research). The distributed search is an algorithm I made up that is loosely based on YBWC.
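For readers unfamiliar with it, the core YBWC rule can be sketched on a toy game tree (a hypothetical illustration of the published Young Brothers Wait Concept, not the algorithm described above; the tree encoding and function names are made up for the example):

```python
# Minimal sketch of the Young Brothers Wait Concept (YBWC) on a toy
# game tree. A node is either a leaf score (int) or a list of child
# nodes. The rule: the eldest brother at each node is searched serially
# to completion; only after it returns, so that alpha is a real bound,
# may the younger siblings be handed to idle processors. The hand-off
# is only marked by a comment here; this version runs sequentially.

def search(node, alpha, beta):
    if isinstance(node, int):           # leaf: static score
        return node

    # Eldest brother: always searched here, serially, before any split.
    best = -search(node[0], -beta, -alpha)
    alpha = max(alpha, best)
    if alpha >= beta:
        return best

    # Young brothers: a YBW engine may now SPLIT and give node[1:]
    # to idle threads or cluster nodes.
    for child in node[1:]:
        v = -search(child, -beta, -alpha)
        best = max(best, v)
        alpha = max(alpha, best)
        if alpha >= beta:
            break                       # cutoff: cancel running siblings
    return best

INF = 10**9
tree = [[3, 5], [2, 7]]                 # two moves, two replies each
print(search(tree, -INF, INF))          # prints 3
```

Searching the eldest brother first is what keeps the parallel overhead down: the younger siblings are searched with a genuine bound instead of an open window, so most of them fail quickly.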
CRoberson
Posts: 2056
Joined: Mon Mar 13, 2006 2:31 am
Location: North Carolina, USA

Re: 2009 ACCA World Computer Rapid Chess Championships

Post by CRoberson »

Only 2 weeks left for the tournament, and we have some new entries.

o Scorpio
o Learning Lemming
o Tornado
o NoonianChess

Ready to accept more entries. Remember: the ones that finish last are the ones that don't enter.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Log files

Post by bob »

sje wrote:The main machine keeps a search task (thread) status board. When a search thread wants to split, it consults this board for a free thread. If/when one is found, the idle thread is used as a target for a simple function call with the position, the window, the starting ply, the fractional depth, and a returned PV move list. It doesn't matter if the thread is local or remote, but the choice of thread may depend on the speed of the call, the amount of work to be done, and the target machine. As there is some overhead here, there is a lower bound on the allowable fractional depth.

The details of managing the central status board and when to split are kept private at the moment as they are very likely to change with testing. A big concern here is how to efficiently handle the possibility of a network failure or remote failure without the need to restart the search or the program.
You need to do a lot more in assigning positions to threads or this is going to fail. For example, once you send a position to a particular node for a search, store something you can use so that if you split at this point again, the search goes to the same node since its hash is already loaded and ready. Ditto for positions below this point in the search. Send 'em to the node that can best handle them based on past searches done.

A more complex issue is handling multi-core clusters. There are good ways and bad ways to split the work up, with the bad ways often being worse than not searching in parallel at all.

Very complex topic to get any sort of useful speedup.
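The sticky-assignment idea above can be sketched as a small affinity table keyed by position hash (a hypothetical sketch with made-up names, not Crafty's code): the first split at a position picks a node, and every later split of the same position is routed back to that node, whose transposition table is already primed.

```python
# Sketch of split-point affinity for a fixed set of cluster nodes.
# The first time a position (identified by its Zobrist-style hash key)
# is split, a node is chosen round-robin; every later split of the same
# position is routed to the same node, so the hash entries it filled in
# during the earlier search are still available.

class AffinityRouter:
    def __init__(self, nodes):
        self.nodes = list(nodes)    # e.g. hostnames of cluster nodes
        self.affinity = {}          # position hash -> chosen node
        self.next = 0               # round-robin cursor for new work

    def node_for(self, pos_hash):
        if pos_hash not in self.affinity:
            self.affinity[pos_hash] = self.nodes[self.next % len(self.nodes)]
            self.next += 1
        return self.affinity[pos_hash]

router = AffinityRouter(["node0", "node1", "node2"])
first = router.node_for(0xABCD)
print(router.node_for(0xABCD) == first)   # same position -> same node
```

A real router would also fold in positions below the split point and weight the choice by past search effort, as suggested above; this only shows the routing invariant.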
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Log files

Post by bob »

CRoberson wrote:
Zach Wegner wrote:
sje wrote:
Zach Wegner wrote:
sje wrote:2) It is possible that the program might be running on a small (ten core) cluster by the time of the tournament instead of a single machine.
Good luck!
The main difficulty with a heterogeneous distributed search is not coding the communication, but rather managing load balancing. If a distributed search spends too much time waiting for slower machines to respond, then any benefits of having extra processors are lost.
If you think you can go from a single processor search to a multiprocessor distributed search in a matter of a few weeks, I must repeat myself: good luck. If it works at all, don't expect it to work well. Just saying you would have machines "waiting for others to respond" shows you're going to have some major inefficiencies to deal with.

What sort of algorithm are you planning, for both multiprocessor and distributed search?
You are not factoring in experience. My first parallel coding effort was in 1984. I have a distributed version of Telepath running that took me
two weeks. The last week was mostly tweaking and tuning. I am still tuning but it is running.

I expect Steve's experience is sufficient to produce a distributed version of Symbolic in 2 weeks. Tuning takes longer.

Telepath may be running on a small cluster in WCRCC. Likely an ad hoc one.
I disagree. My first parallel program ran in 1978 on a dual-processor Univac 1100 box. And I have not completed a distributed search, even though I have been working on it off and on for _way_ more than 2 weeks. This is much less about experience programming chess than about understanding the many parallel issues that have to be dealt with in what is, from the ground up, a sequential algorithm (alpha/beta), and trying to make it work using a programming model that does not fit it very well at all.
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Log files

Post by diep »

bob wrote:
CRoberson wrote:
Zach Wegner wrote:
sje wrote:
Zach Wegner wrote:
sje wrote:2) It is possible that the program might be running on a small (ten core) cluster by the time of the tournament instead of a single machine.
Good luck!
The main difficulty with a heterogeneous distributed search is not coding the communication, but rather managing load balancing. If a distributed search spends too much time waiting for slower machines to respond, then any benefits of having extra processors are lost.
If you think you can go from a single processor search to a multiprocessor distributed search in a matter of a few weeks, I must repeat myself: good luck. If it works at all, don't expect it to work well. Just saying you would have machines "waiting for others to respond" shows you're going to have some major inefficiencies to deal with.

What sort of algorithm are you planning, for both multiprocessor and distributed search?
You are not factoring in experience. My first parallel coding effort was in 1984. I have a distributed version of Telepath running that took me
two weeks. The last week was mostly tweaking and tuning. I am still tuning but it is running.

I expect Steve's experience is sufficient to produce a distributed version of Symbolic in 2 weeks. Tuning takes longer.

Telepath may be running on a small cluster in WCRCC. Likely an ad hoc one.
I disagree. My first parallel program ran in 1978 on a dual-processor Univac 1100 box. And I have not completed a distributed search, even though I have been working on it off and on for _way_ more than 2 weeks. This is much less about experience programming chess than about understanding the many parallel issues that have to be dealt with in what is, from the ground up, a sequential algorithm (alpha/beta), and trying to make it work using a programming model that does not fit it very well at all.
I agree with Bob here.

A distributed search that works on a machine whose network latencies are measured in microseconds is very hard to build for a modern program.

There is no way that Edwards will manage a distributed search for a modern chess program, one with a modern branching factor that gets millions of NPS per machine, with any decent speedup over a network whose latencies are measured in microseconds.

I'm expecting him to fail completely there even if we give him 2 years.

There are actually two chess programs right now that can handle bad latencies in the microseconds: Diep and Zappa, the latter building on Diep.

If you ran any of the old engines like Zugzwang on a modern supercomputer, it would still get maybe 1 million NPS across all 1024 cores.

We already get that today on a single node.

To design such a search you need to prove your entire concept on paper first. That takes months. Then implementing it takes months and months again.

I don't see Edwards EVER managing it himself. It is a different world from the one he lives in.

Note that I do expect there are many ways to get things done, but the way Diep does it is rather interesting from a mathematical viewpoint. I calculated that some 'average cases' work out OK; otherwise it would fail totally there as well.

I do expect Edwards is clever enough to get a working concept set up on paper. But I also expect that same setup will not be very efficient and will have far too much overhead. He'll have to prove that overhead correct on paper first as well, and just that proof will take very long.

Realize that on an Itanium2 supercomputer Diep gets nearly the same NPS as Zappa, whereas on a 4-core Opteron Zappa got 3.3 million NPS versus Diep's 500k NPS on the same box.

It is very difficult to make something that scales well using YBW.

So I'm still speaking of a search on a cluster here, not a distributed search over the internet with dying connections and nodes, and latencies measured in milliseconds rather than microseconds. That's another order of magnitude difference.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Log files

Post by bob »

diep wrote:
bob wrote:
CRoberson wrote:
Zach Wegner wrote:
sje wrote:
Zach Wegner wrote:
sje wrote:2) It is possible that the program might be running on a small (ten core) cluster by the time of the tournament instead of a single machine.
Good luck!
The main difficulty with a heterogeneous distributed search is not coding the communication, but rather managing load balancing. If a distributed search spends too much time waiting for slower machines to respond, then any benefits of having extra processors are lost.
If you think you can go from a single processor search to a multiprocessor distributed search in a matter of a few weeks, I must repeat myself: good luck. If it works at all, don't expect it to work well. Just saying you would have machines "waiting for others to respond" shows you're going to have some major inefficiencies to deal with.

What sort of algorithm are you planning, for both multiprocessor and distributed search?
I'm not sure that at very low (Itanium-like) latencies it is that hard. Eugene ran Crafty on his 64-CPU Itanium several years ago and we scaled NPS at about 75% of theoretical. I'd have to find his original data to remember the speedup. It was not as good as the current results on current hardware, but with 64 nodes we saw a definite improvement over 32, etc...

I did not think much of the machine, however, and still don't.

You are not factoring in experience. My first parallel coding effort was in 1984. I have a distributed version of Telepath running that took me
two weeks. The last week was mostly tweaking and tuning. I am still tuning but it is running.

I expect Steve's experience is sufficient to produce a distributed version of Symbolic in 2 weeks. Tuning takes longer.

Telepath may be running on a small cluster in WCRCC. Likely an ad hoc one.
I disagree. My first parallel program ran in 1978 on a dual-processor Univac 1100 box. And I have not completed a distributed search, even though I have been working on it off and on for _way_ more than 2 weeks. This is much less about experience programming chess than about understanding the many parallel issues that have to be dealt with in what is, from the ground up, a sequential algorithm (alpha/beta), and trying to make it work using a programming model that does not fit it very well at all.
I agree with Bob here.

A distributed search that works on a machine whose network latencies are measured in microseconds is very hard to build for a modern program.

There is no way that Edwards will manage a distributed search for a modern chess program, one with a modern branching factor that gets millions of NPS per machine, with any decent speedup over a network whose latencies are measured in microseconds.

I'm expecting him to fail completely there even if we give him 2 years.

There are actually two chess programs right now that can handle bad latencies in the microseconds: Diep and Zappa, the latter building on Diep.

If you ran any of the old engines like Zugzwang on a modern supercomputer, it would still get maybe 1 million NPS across all 1024 cores.

We already get that today on a single node.

To design such a search you need to prove your entire concept on paper first. That takes months. Then implementing it takes months and months again.

I don't see Edwards EVER managing it himself. It is a different world from the one he lives in.

Note that I do expect there are many ways to get things done, but the way Diep does it is rather interesting from a mathematical viewpoint. I calculated that some 'average cases' work out OK; otherwise it would fail totally there as well.

I do expect Edwards is clever enough to get a working concept set up on paper. But I also expect that same setup will not be very efficient and will have far too much overhead. He'll have to prove that overhead correct on paper first as well, and just that proof will take very long.

Realize that on an Itanium2 supercomputer Diep gets nearly the same NPS as Zappa, whereas on a 4-core Opteron Zappa got 3.3 million NPS versus Diep's 500k NPS on the same box.

It is very difficult to make something that scales well using YBW.

So I'm still speaking of a search on a cluster here, not a distributed search over the internet with dying connections and nodes, and latencies measured in milliseconds rather than microseconds. That's another order of magnitude difference.
CRoberson
Posts: 2056
Joined: Mon Mar 13, 2006 2:31 am
Location: North Carolina, USA

Re: 2009 ACCA World Computer Rapid Chess Championships

Post by CRoberson »

We have some new entrants:

Pandix by Gyula Hovrath
HFC by Csaba Jergler
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

MT/Distributed search

Post by sje »

I might not have the distributed search running in time for the WCRCC; I'm still shaking out some deficiencies in the multithreaded search.

It was only this morning that I sent out the order for a combined 16 GB of RAM chips for three of the machines in the cluster. In the next month or two, I'll get more memory until all of the 64-bit Mac machines are "full up". I might even add some of the slower 32-bit Linux boxes into the mix.

By the way, it is not necessary for a distributed search to operate with equal task peers. Jonathan Schaeffer used this idea long ago with his tactical-only searches, and I'm probably going to have something similar, perhaps with mate/loss-only searches and searches used only for move ordering.

For those who were wondering, Symbolic's MT search uses a variant of PVSplit. It has what I call the "leftmost retry" enhancement: the first branch tried is the leftmost branch and gets lots of special treatment with move ordering, IID, etc., all in the hope that the extra effort will pay off by reducing the overall search. But if a leftmost branch fails low, then the next branch becomes the new leftmost and also gets special treatment.
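The control flow described above can be sketched roughly like this (a toy sequential skeleton, not Symbolic's code; the `full_search`/`cheap_search` callbacks standing in for the extra-effort and ordinary searches are assumptions of the example):

```python
# Rough sketch of PVSplit with a "leftmost retry" rule. The leftmost
# move gets the full-effort search (careful ordering, IID, etc.,
# abstracted as full_search); the remaining moves could then be split
# among helper threads (cheap_search). Whenever the leftmost branch
# fails low (score <= alpha), the next branch is promoted to leftmost
# and also receives the full treatment.

def pv_split(moves, alpha, full_search, cheap_search):
    best = None
    i = 0
    # Keep promoting branches to "leftmost" while they fail low.
    while i < len(moves):
        score = full_search(moves[i])   # extra-effort leftmost search
        i += 1
        if score > alpha:               # leftmost did not fail low
            best = (moves[i - 1], score)
            alpha = score
            break
    # Remaining moves: a real PVSplit searches these in parallel.
    for m in moves[i:]:
        score = cheap_search(m, alpha)
        if score > alpha:
            alpha = score
            best = (m, score)
    return best

scores = {"a": 1, "b": 5, "c": 3}
best = pv_split(["a", "b", "c"], 2,
                lambda m: scores[m],
                lambda m, a: scores[m])
print(best)   # prints ('b', 5): "a" fails low, "b" becomes leftmost
```

The point of the retry is that the special treatment always lands on a branch that still has a chance of becoming the PV, rather than being spent once and wasted when the first move fails low.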
mclane
Posts: 18764
Joined: Thu Mar 09, 2006 6:40 pm
Location: US of Europe, germany
Full name: Thorsten Czub

Re: 2009 ACCA World Computer Rapid Chess Championships

Post by mclane »

CRoberson wrote:We have some new entrants

Pandix by Gyula Hovrath

Very good that Pandix participates!!!!

:!:
What seems like a fairy tale today may be reality tomorrow.
Here we have a fairy tale of the day after tomorrow....
CRoberson
Posts: 2056
Joined: Mon Mar 13, 2006 2:31 am
Location: North Carolina, USA

Re: 2009 ACCA World Computer Rapid Chess Championships

Post by CRoberson »

We have 2 new entrants:
Bright by Allard Siemelink
Spark by Allard Siemelink