SMP rating influence

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
Greg Strong
Posts: 388
Joined: Sun Dec 21, 2008 6:57 pm
Location: Washington, DC

Re: SMP rating influence

Post by Greg Strong »

bob wrote:The only fly in the ointment is the parallel search efficiency issue. Not all parallel searches are created equal, and if the 2-cpu speedup is 1.3x, then this is not going to hold. Or if the 2cpu speedup is 2.0x, then it will be even better than what I reported...
Any idea how effecient Young Brothers Wait is, compared to the search tree splitting that Crafty uses?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: SMP rating influence

Post by bob »

Greg Strong wrote:
bob wrote:The only fly in the ointment is the parallel search efficiency issue. Not all parallel searches are created equal, and if the 2-cpu speedup is 1.3x, then this is not going to hold. Or if the 2cpu speedup is 2.0x, then it will be even better than what I reported...
Any idea how effecient Young Brothers Wait is, compared to the search tree splitting that Crafty uses?
Crafty is essentially a YBW algorithm. The only difference is that is it an unsynchronized search so that as a processor becomes idle because there are no more moves to search at the current ply, that processor can go help a busy processor at another ply, although it still must satisfy the YBW criterion.

Crafty's speed comes from the cleverness of never allowing a processor to sit and wait on work, at least nore for more than a time measured in microseconds...
User avatar
Greg Strong
Posts: 388
Joined: Sun Dec 21, 2008 6:57 pm
Location: Washington, DC

Re: SMP rating influence

Post by Greg Strong »

bob wrote:Crafty is essentially a YBW algorithm. The only difference is that is it an unsynchronized search so that as a processor becomes idle because there are no more moves to search at the current ply, that processor can go help a busy processor at another ply, although it still must satisfy the YBW criterion.

Crafty's speed comes from the cleverness of never allowing a processor to sit and wait on work, at least nore for more than a time measured in microseconds...
Cool. Does anyone happen to know if Glaurung is similar in this regard?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: SMP rating influence

Post by bob »

Greg Strong wrote:
bob wrote:Crafty is essentially a YBW algorithm. The only difference is that is it an unsynchronized search so that as a processor becomes idle because there are no more moves to search at the current ply, that processor can go help a busy processor at another ply, although it still must satisfy the YBW criterion.

Crafty's speed comes from the cleverness of never allowing a processor to sit and wait on work, at least nore for more than a time measured in microseconds...
Cool. Does anyone happen to know if Glaurung is similar in this regard?
No idea. The best-known alternative is DTS as used in Cray Blitz and now in a few other modern programs. DTS is _still_ a YBW-type algorithm for the most part, but it is applied globally when a processor becomes idle, rather than at a local specific ply in a processor that is still busy. So DTS has the opportunity to find a better split point than the Crafty approach. But you give up the recursive search to use DTS and I chose to not go that direction a second time as the code is very messy...
User avatar
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: SMP rating influence

Post by Laskos »

bob wrote:This week (and weekend) I have been chasing a very old parallel search issue. Effect was minor but I had decided it was time for it to go away. And I finally found it. It didn't happen very often so I decided to do a couple of cluster runs, with the only difference being that Crafty would use 2 CPUs while the opponents would use one. I started this run after the bug was fixed, to make sure that after 64K games or so it did not happen again. It is getting close to done, but when looking at the results, I realized that I had some interesting data. I had two 32K runs with one cpu for crafty vs one cpu for opponents. I've published these numbers several times. But now I had the _same_ results for Crafty with 2 cpus against 1 cpu opponents.

What I was able to discover is what kind of a rating difference I found, changing absolutely nothing but giving Crafty an extra CPU against the same opponents and positions, same time control, etc.

Bottom line was +75 Elo. In previous runs, Crafty was about 60 Elo below the latest glaurung2 and Toga2, about 50 Elo better than Fruit, and about 110 Elo above Glaurung 1. With 2 cpus, Crafty finished +10 over glaurung 2, +25 over Toga2, +125 over fruit 2 and +190 over glaurung 2.

When I have time I will try to repeat this using 1, 2, 4 and 8 cpus on the other cluster. I will post the Bayeselo output once all the games have finished.. Things only run half as fast since I can only run one game per node, not two.
As far as I understand these are very very short games, when the plies are low and similarly important. My guess is that HG Muller formula is better than a linear one for normal or long time controls. Probably it is linear in milliseconds games (a logarithmic curve is close to linear in a very narrow range, besides that ln(1+epsilon)=epsilon+O(epsilon^2)) and logarithmic on longer (broader) time controls (ln((A>>1)+epsilon)=ln(A)+O(epsilon)/A). At least other engines exhibit this behaviour. But maybe your parallel implementation is better, and you should put Crafty on a massively parallel supercomputer to see who is the best :)

Kai
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: SMP rating influence

Post by bob »

Laskos wrote:
bob wrote:This week (and weekend) I have been chasing a very old parallel search issue. Effect was minor but I had decided it was time for it to go away. And I finally found it. It didn't happen very often so I decided to do a couple of cluster runs, with the only difference being that Crafty would use 2 CPUs while the opponents would use one. I started this run after the bug was fixed, to make sure that after 64K games or so it did not happen again. It is getting close to done, but when looking at the results, I realized that I had some interesting data. I had two 32K runs with one cpu for crafty vs one cpu for opponents. I've published these numbers several times. But now I had the _same_ results for Crafty with 2 cpus against 1 cpu opponents.

What I was able to discover is what kind of a rating difference I found, changing absolutely nothing but giving Crafty an extra CPU against the same opponents and positions, same time control, etc.

Bottom line was +75 Elo. In previous runs, Crafty was about 60 Elo below the latest glaurung2 and Toga2, about 50 Elo better than Fruit, and about 110 Elo above Glaurung 1. With 2 cpus, Crafty finished +10 over glaurung 2, +25 over Toga2, +125 over fruit 2 and +190 over glaurung 2.

When I have time I will try to repeat this using 1, 2, 4 and 8 cpus on the other cluster. I will post the Bayeselo output once all the games have finished.. Things only run half as fast since I can only run one game per node, not two.
As far as I understand these are very very short games, when the plies are low and similarly important. My guess is that HG Muller formula is better than a linear one for normal or long time controls. Probably it is linear in milliseconds games (a logarithmic curve is close to linear in a very narrow range, besides that ln(1+epsilon)=epsilon+O(epsilon^2)) and logarithmic on longer (broader) time controls (ln((A>>1)+epsilon)=ln(A)+O(epsilon)/A). At least other engines exhibit this behaviour. But maybe your parallel implementation is better, and you should put Crafty on a massively parallel supercomputer to see who is the best :)

Kai
I tried the same test at longer time controls, but not nearly as many games. Same result. I tried the shorter time control to see if the longer time control data would hold up with a very large number of games or not.
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: SMP rating influence

Post by Tord Romstad »

Greg Strong wrote:Cool. Does anyone happen to know if Glaurung is similar in this regard?
In principle, yes -- but I have never tried running Glaurung on a computer with more than 2 CPU cores, and it is therefore probably badly tuned for computers with 4 or more CPUs.

Tord
hcyrano

Re: SMP rating influence

Post by hcyrano »

Tord Romstad wrote:t is therefore probably badly tuned for computers with 4 or more CPUs.
what are the possible settings?
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: SMP rating influence

Post by bob »

hcyrano wrote:
Tord Romstad wrote:t is therefore probably badly tuned for computers with 4 or more CPUs.
what are the possible settings?
It is likely more complicated than that. it would probably require programming modifications rather than simple tuning, because you first have to determine what the issues are before you can fix them. Not knowing them in advance makes it unlikely that code was included to address every case if properly modified by tuning parameters...
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: SMP rating influence

Post by Tord Romstad »

bob wrote:
hcyrano wrote:
Tord Romstad wrote:t is therefore probably badly tuned for computers with 4 or more CPUs.
what are the possible settings?
It is likely more complicated than that. it would probably require programming modifications rather than simple tuning, because you first have to determine what the issues are before you can fix them. Not knowing them in advance makes it unlikely that code was included to address every case if properly modified by tuning parameters...
Yes. I meant "tuning" in a very broad sense of the word. My parallel search works, sort of -- it doesn't crash, and the strength clearly improves with an increasing number of CPUs, at least up to 4 (I have almost no data for more than 4 CPUs). But because I have never seen my program run on a quad, there are probably many inefficiencies which I am not aware of, and which I will not be able to address until I am able to test my program on a quad.

I don't worry much about this. By the time quads become common (and I don't think this will be very soon, as the current trend seems to be that tiny notebooks, "netbooks" and phones are becoming the most popular computing devices), I will probably have a quad myself. :)

Tord