Real Speedup due to core doubling etc


Werewolf
Posts: 1796
Joined: Thu Sep 18, 2008 10:24 pm

Re: Real Speedup due to core doubling etc

Post by Werewolf »

CRoberson wrote: IIRC, the Rybka team knew of the equation NPS speedup = 1 + (N-1)*0.7, but they saw many customers getting confused when the TTP (Time To Ply) speedup didn't equal the NPS speedup because of the workload gain. So, they adjusted the equation to be a TTP equation.

It was a long time ago, but I'm not sure about this, because they have repeatedly said that Rybka "thickens" (their term) its plies when more cores are used.

I.e., a quad core searching 20 plies deep will be slightly stronger than a dual core searching 20 plies deep.

I can't vouch for the accuracy of their belief though.
Werewolf
Posts: 1796
Joined: Thu Sep 18, 2008 10:24 pm

Re: Real Speedup due to core doubling etc

Post by Werewolf »

Thanks Bob.

Just out of curiosity, your 1 + (N-1)*0.7 (again, how did you arrive at this?)
is far more generous than N^0.76 (where did the 0.76 come from?) when the core count is high.
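
To make the comparison concrete, here is a minimal sketch that tabulates both formulas from this thread side by side. The function names are made up for illustration; only the two expressions, 1 + (N-1)*0.7 and N^0.76, come from the posts. At 2 cores the two are nearly identical (1.70 versus about 1.69), while at 16 cores the straight line gives 11.5 against roughly 8.2 for the power law.

```python
def linear_speedup(n, slope=0.7):
    """Straight-line estimate from the thread: 1 + (N-1)*0.7."""
    return 1.0 + (n - 1) * slope


def power_speedup(n, exponent=0.76):
    """Power-law estimate from the thread: N^0.76."""
    return n ** exponent


# Print both estimates for a range of core counts.
for n in (1, 2, 4, 8, 16, 32, 64):
    print(f"{n:>3} cores: linear {linear_speedup(n):6.2f}   power {power_speedup(n):6.2f}")
```

Both are rules of thumb; neither is claimed here to match any particular engine or machine.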
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Real Speedup due to core doubling etc

Post by bob »

Werewolf wrote:Thanks Bob.

Just out of curiosity, your 1 + (N-1)*0.7 (again, how did you arrive at this?)
is far more generous than N^0.76 (where did the 0.76 come from?) when the core count is high.
My formula was produced by running a ton of test positions with 1 CPU, 2 CPUs, 4 CPUs and 8 CPUs. A few years later it was tested with 16 CPUs. Note that it is a straight line, which happens to be a bit pessimistic for 2 processors and perhaps a tiny bit optimistic for 16. I'd suspect that it gets farther off if you go to 32 and 64. I've run on a machine with 64 processors, but it was an Itanium-based box and I didn't have enough time to run enough tests to get any decent speedup data.

Where did the 0.76 come from? I think Vas came up with that. I'd assume it fit his program reasonably well at that time. Not all programs are created equal in terms of speedup, however, and sometimes the same program behaves differently after what appears to be a minor change somewhere, or when run on different hardware with a different cache and memory organization...

Fitting a straight line to a set of N data points is a simple least-squares problem that can be done with several different freeware applications.
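
The least-squares fit mentioned above can be sketched in a few lines of plain Python. This is an illustration only: it assumes the line is forced through the point (1 core, speedup 1), which is what the form 1 + (N-1)*k implies, and it uses synthetic points generated from the formula itself rather than any real measurements. The function name `fit_slope_through_one` is hypothetical.

```python
def fit_slope_through_one(cores, speedups):
    """Least-squares slope k for the model speedup = 1 + (n - 1) * k.

    The line is constrained to pass through (1, 1), so only k is fitted:
    k = sum((n-1)*(s-1)) / sum((n-1)^2).
    """
    num = sum((n - 1) * (s - 1) for n, s in zip(cores, speedups))
    den = sum((n - 1) ** 2 for n in cores)
    if den == 0:
        raise ValueError("need at least one measurement with more than one core")
    return num / den


# Synthetic input, NOT measured data: points taken from the line itself,
# so the fit should simply recover the slope 0.7.
cores = [1, 2, 4, 8, 16]
synthetic = [1 + (n - 1) * 0.7 for n in cores]

k = fit_slope_through_one(cores, synthetic)
print(f"fitted slope: {k:.3f}")
```

With real time-to-depth measurements in place of `synthetic`, the same function would return whatever slope best fits that program and hardware.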
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Real Speedup due to core doubling etc

Post by bob »

CRoberson wrote: IIRC, the Rybka team knew of the equation NPS speedup = 1 + (N-1)*0.7, but they saw many customers getting confused when the TTP (Time To Ply) speedup didn't equal the NPS speedup because of the workload gain. So, they adjusted the equation to be a TTP equation.
That does not compute. :)

1 + (n-1)*.7 is time-to-depth speedup. Nothing to do with NPS. I have ALWAYS given speedup numbers as time-to-depth. NPS is irrelevant in that context.
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Real Speedup due to core doubling etc

Post by Vinvin »

bob wrote:
That does not compute. :)

1 + (n-1)*.7 is time-to-depth speedup. Nothing to do with NPS. I have ALWAYS given speedup numbers as time-to-depth. NPS is irrelevant in that context.
Yes, but Vasik wanted a formula where "more NPS" always means "stronger" (taking the number of CPUs into account), so the NPS figure is converted with the help of a formula close to "1 + (N-1)*0.7".
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Real Speedup due to core doubling etc

Post by bob »

Vinvin wrote:
Yes, but Vasik wanted a formula where "more NPS" always means "stronger" (taking the number of CPUs into account), so the NPS figure is converted with the help of a formula close to "1 + (N-1)*0.7".
Doesn't make any sense at all. More NPS is generally stronger. His formula doesn't seem to apply to any numbers I produce in Crafty. This sounds like more of his node nonsense to me... "In Rybka I count nodes differently..."

What he REALLY meant was "In Rybka, I obfuscate the node count to make it harder to figure out what I am doing."
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Real Speedup due to core doubling etc

Post by Vinvin »

bob wrote:
Doesn't make any sense at all. More NPS is generally stronger. His formula doesn't seem to apply to any numbers I produce in Crafty. This sounds like more of his node nonsense to me... "In Rybka I count nodes differently..."

What he REALLY meant was "In Rybka, I obfuscate the node count to make it harder to figure out what I am doing."
4 Mn/s on 4 CPUs is probably weaker than 3.5 Mn/s on 1 CPU.
That's why Rybka displays a converted number. 4 CPUs -> 1 + (0.7*3) = 3.1, so a ratio of "/4*3.1" is applied to the displayed speed.

(Obfuscated numbers on 1 CPU are another story ;) )
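
The arithmetic in this post can be written out as a small sketch. It only illustrates the conversion as described here (raw NPS scaled by (1 + (N-1)*0.7) / N); it is an assumption for illustration, not a confirmed description of what Rybka actually does, and the name `displayed_nps` is made up.

```python
def displayed_nps(raw_nps, cores, slope=0.7):
    """Scale a raw node rate by the '/cores * effective' ratio described above.

    effective = 1 + (cores - 1) * slope is the estimated speedup, so the
    result is the single-CPU node rate that would be equally useful.
    Hypothetical helper; not taken from any engine's source.
    """
    effective = 1.0 + (cores - 1) * slope
    return raw_nps * effective / cores


raw = 4.0e6                      # 4 Mn/s measured across 4 CPUs
shown = displayed_nps(raw, 4)    # 4 Mn/s * 3.1 / 4
print(f"{shown / 1e6:.2f} Mn/s displayed")
```

Under this conversion, 4 Mn/s on 4 CPUs is shown as about 3.1 Mn/s, which is below the 3.5 Mn/s on 1 CPU used in the comparison above.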
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Real Speedup due to core doubling etc

Post by bob »

Vinvin wrote:
4 Mn/s on 4 CPUs is probably weaker than 3.5 Mn/s on 1 CPU.
That's why Rybka displays a converted number. 4 CPUs -> 1 + (0.7*3) = 3.1, so a ratio of "/4*3.1" is applied to the displayed speed.

(Obfuscated numbers on 1 CPU are another story ;) )
Rybka doesn't use 1 + .7*3. That is MY formula. And I agree, 4M with 1 CPU is stronger than 4M on 4 CPUs, because of search overhead. Rybka used the n^.76 (or whatever the fraction was).

But in any case, when talking about SMP performance, NPS is not the right number to compare. Time-to-depth is the reasonable measurement.
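
A minimal sketch of why the two measurements differ: time-to-depth speedup is the ratio of wall-clock times to reach the same depth, NPS speedup is the ratio of node rates, and the gap between them is the extra tree searched in parallel (search overhead). All numbers below are hypothetical placeholders chosen so the time-to-depth speedup comes out at 3.1; they are not measurements from Crafty or any other engine, and the function names are made up.

```python
def time_to_depth_speedup(t_one_cpu, t_n_cpus):
    """Ratio of wall-clock times needed to finish the same search depth."""
    return t_one_cpu / t_n_cpus


def nps_speedup(nps_one_cpu, nps_n_cpus):
    """Ratio of raw node rates; ignores how many extra nodes were searched."""
    return nps_n_cpus / nps_one_cpu


def search_overhead(ttd, nps):
    """Fraction of extra nodes searched in parallel.

    nodes = nps * time, so nodes_N / nodes_1 = nps_speedup / ttd_speedup.
    """
    return nps / ttd - 1.0


# Hypothetical example values, for illustration only.
ttd = time_to_depth_speedup(124.0, 40.0)   # seconds to reach the same depth
nps = nps_speedup(1.0e6, 3.8e6)            # nodes per second
print(f"time-to-depth speedup: {ttd:.2f}")
print(f"NPS speedup:           {nps:.2f}")
print(f"search overhead:       {search_overhead(ttd, nps):.1%}")
```

In this made-up case the node rate scales better than the time-to-depth, which is the gap that makes raw NPS a misleading number for SMP comparisons.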
Vinvin
Posts: 5228
Joined: Thu Mar 09, 2006 9:40 am
Full name: Vincent Lejeune

Re: Real Speedup due to core doubling etc

Post by Vinvin »

bob wrote:
Rybka doesn't use 1 + .7*3. That is MY formula. And I agree, 4M with 1 CPU is stronger than 4M on 4 CPUs, because of search overhead. Rybka used the n^.76 (or whatever the fraction was).

But in any case, when talking about SMP performance, NPS is not the right number to compare. Time-to-depth is the reasonable measurement.
I can't find the formula used in Rybka, only this post by Vasik from 2008:
Vasik wrote:When Rybka displays a 2x higher kn/s, she is effectively 2x faster and correspondingly stronger. It's no different than if you give her 2x more time.
Other engines don't make this adjustment, so it may look like they scale better. I don't really care about this - we're just going to do it the way I think is right.
http://rybkaforum.net/cgi-bin/rybkaforu ... 0#pid86950
Werewolf
Posts: 1796
Joined: Thu Sep 18, 2008 10:24 pm

Re: Real Speedup due to core doubling etc

Post by Werewolf »

Thanks for your help.

I've produced a series of videos for beginners using Aquarium / IDeA. In Appendix 2 you get a brief mention 9-10 minutes in :)


https://www.youtube.com/channel/UCTLMpf ... g80SzD4IEg