Real Speedup due to core doubling etc

Werewolf · Post by **Werewolf** » Wed Jul 16, 2014 9:08 am

CRoberson wrote: IIRC, the Rybka team knew of the equation NPS speedup = 1 + (N-1)*0.7, but they saw many customers getting confused when the TTP (Time To Ply) speedup didn't equal the NPS speedup due to the workload gain. So, they adjusted the equation to be a TTP equation.

It was a long time ago but I'm not sure about this because they have repeatedly said that Rybka "thickens" (their term) its plies with more cores being used.

i.e. A quad core searching 20 plies deep will be slightly stronger than a dual core searching 20 plies deep.

I can't vouch for the accuracy of their belief though.

Werewolf · Post by **Werewolf** » Wed Jul 16, 2014 9:14 am

Thanks Bob.

Just out of curiosity: your 1 + (N-1)*0.7 (again, how did you arrive at this?)
is far more generous than N^0.76 (where did the 0.76 come from?) when the core count is high.
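
To make the comparison concrete, here is a minimal sketch that tabulates the two formulas exactly as quoted in this thread. The function names and the list of core counts are illustrative assumptions, not anything taken from Crafty or Rybka.

```python
def linear_speedup(n, k=0.7):
    """Straight-line estimate quoted in the thread: 1 + (N - 1) * k."""
    return 1 + (n - 1) * k


def power_speedup(n, exponent=0.76):
    """Power-law estimate quoted in the thread: N ** exponent."""
    return n ** exponent


def compare(core_counts=(1, 2, 4, 8, 16, 32, 64)):
    """Return rows of (cores, linear estimate, power-law estimate)."""
    return [(n, linear_speedup(n), power_speedup(n)) for n in core_counts]


if __name__ == "__main__":
    for n, lin, pw in compare():
        print(f"{n:3d} cores: linear {lin:6.2f}   power {pw:6.2f}")
```

At 4 cores the two estimates are about 3.1 and 2.87; at 64 cores they are 45.1 and roughly 23.6, so the gap widens as the core count grows.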

bob · Post by **bob** » Wed Jul 16, 2014 5:26 pm

Werewolf wrote:Thanks Bob.

Just out of curiosity: your 1 + (N-1)*0.7 (again, how did you arrive at this?)
is far more generous than N^0.76 (where did the 0.76 come from?) when the core count is high.

My formula was produced by running a ton of test positions with 1, 2, 4 and 8 CPUs. A few years later it was tested with 16 CPUs. Note that it is a straight line, which happens to be a bit pessimistic for 2 processors, and perhaps a tiny bit optimistic for 16. I'd suspect that it gets farther off if you go to 32 and 64. I've run on a machine with 64 processors, but it was an Itanium-based box and I didn't have enough time to run enough tests to get any decent speedup data.

Where did the 0.76 come from? I think Vas came up with that. I'd assume it fit his program reasonably well at that time. All programs are not created equally in terms of speedup, however, and sometimes the same program behaves differently after what appears to be a minor change somewhere, or when run on different hardware that has different cache and memory organizations...

Fitting a straight line to a set of N data points is a simple least-squares problem that can be done with several different freeware applications.
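
As a rough illustration of that kind of fit, here is a minimal ordinary least-squares sketch in plain Python. The helper name `fit_line` is hypothetical, and the demo points are placeholders generated from the 1 + (N-1)*0.7 line itself, not measured speedups; real use would substitute time-to-depth speedups measured at each CPU count.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    cpus = [1, 2, 4, 8, 16]
    # Placeholder data only: points lying exactly on 1 + (N - 1) * 0.7.
    speedups = [1 + (n - 1) * 0.7 for n in cpus]
    slope, intercept = fit_line(cpus, speedups)
    print(f"speedup ~= {intercept:.2f} + {slope:.2f} * N")
```

Because 1 + (N-1)*0.7 is the same line as 0.3 + 0.7*N, the fit on these placeholder points recovers a slope of 0.7 and an intercept of 0.3.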

bob · Post by **bob** » Wed Jul 16, 2014 5:28 pm

CRoberson wrote: IIRC, the Rybka team knew of the equation NPS speedup = 1 + (N-1)*0.7, but they saw many customers getting confused when the TTP (Time To Ply) speedup didn't equal the NPS speedup due to the workload gain. So, they adjusted the equation to be a TTP equation.

That does not compute.

1 + (n-1)*.7 is time-to-depth speedup. Nothing to do with NPS. I have ALWAYS given speedup numbers as time-to-depth. NPS is irrelevant in that context.

Vinvin · Post by **Vinvin** » Wed Jul 16, 2014 7:03 pm

bob wrote:
CRoberson wrote: IIRC, the Rybka team knew of the equation NPS speedup = 1 + (N-1)*0.7, but they saw many customers getting confused when the TTP (Time To Ply) speedup didn't equal the NPS speedup due to the workload gain. So, they adjusted the equation to be a TTP equation.
That does not compute.

1 + (n-1)*.7 is time-to-depth speedup. Nothing to do with NPS. I have ALWAYS given speedup numbers as time-to-depth. NPS is irrelevant in that context.

Yes, but Vasik wanted a formula where "more NPS" always means "stronger" (taking the number of CPUs into account), so the NPS is converted with the help of a formula close to "1 + (N-1)*0.7".

bob · Post by **bob** » Wed Jul 16, 2014 10:58 pm

Vinvin wrote:
Yes, but Vasik wanted a formula where "more NPS" always means "stronger" (taking the number of CPUs into account), so the NPS is converted with the help of a formula close to "1 + (N-1)*0.7".

Doesn't make any sense at all. More NPS is generally stronger. His formula doesn't seem to apply to any numbers I produce in Crafty. This sounds like more of his node nonsense to me... "In rybka I count nodes differently..."

What he REALLY meant was "In rybka, I obfuscate the node count to make it harder to figure out what I am doing."

Vinvin · Post by **Vinvin** » Wed Jul 16, 2014 11:14 pm

bob wrote:
Doesn't make any sense at all. More NPS is generally stronger. His formula doesn't seem to apply to any numbers I produce in Crafty. This sounds like more of his node nonsense to me... "In rybka I count nodes differently..."

What he REALLY meant was "In rybka, I obfuscate the node count to make it harder to figure out what I am doing."

4 Mn/s on 4 CPUs is probably weaker than 3.5 Mn/s on 1 CPU.
That's why Rybka displays a converted number. 4 CPUs -> 1 + (0.7*3) = 3.1, so a ratio of 3.1/4 is applied to the displayed speed.

(obfuscated numbers on 1 CPU is another story)
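
The conversion described in this post can be sketched as below. This only encodes the arithmetic as stated here (raw NPS scaled by (1 + 0.7*(N-1)) / N); the function name and parameters are hypothetical, and nothing in this thread confirms that Rybka actually applies this exact formula.

```python
def displayed_nps(raw_nps, cpus, k=0.7):
    """Scale raw NPS by the ratio (1 + k * (cpus - 1)) / cpus.

    Assumption: this mirrors the conversion as described in the post,
    not a confirmed implementation from any engine.
    """
    effective_cpus = 1 + k * (cpus - 1)
    return raw_nps * effective_cpus / cpus


if __name__ == "__main__":
    # 4 Mn/s raw on 4 CPUs is shown as 4 * 3.1 / 4 = 3.1 Mn/s.
    print(displayed_nps(4_000_000, 4))
```

With one CPU the ratio is 1, so the displayed and raw numbers coincide under this conversion.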

bob · Post by **bob** » Thu Jul 17, 2014 6:02 am

Vinvin wrote:
4 Mn/s on 4 CPUs is probably weaker than 3.5 Mn/s on 1 CPU.
That's why Rybka displays a converted number. 4 CPUs -> 1 + (0.7*3) = 3.1, so a ratio of 3.1/4 is applied to the displayed speed.

(obfuscated numbers on 1 CPU is another story)

Rybka doesn't use 1 + .7*3. That is MY formula. And I agree, 4M with 1 cpu is stronger than 4m on 4 cpus, because of search overhead. Rybka used the n^.76 (or whatever the fraction was).

But in any case, when talking about SMP performance, NPS is not the right number to compare. time to depth is the reasonable measurement.
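
A small sketch of why the two measures differ: time to depth is nodes searched divided by NPS, so the time-to-depth speedup equals the NPS ratio divided by the growth in tree size (search overhead). The function names and the numbers in the demo are hypothetical, chosen only to show the arithmetic.

```python
def time_to_depth(nodes, nps):
    """Seconds needed to search `nodes` nodes at `nps` nodes per second."""
    return nodes / nps


def ttd_speedup(nodes_1, nps_1, nodes_n, nps_n):
    """Time-to-depth speedup of an N-CPU search over a 1-CPU search.

    Algebraically this is (nps_n / nps_1) / (nodes_n / nodes_1).
    """
    return time_to_depth(nodes_1, nps_1) / time_to_depth(nodes_n, nps_n)


if __name__ == "__main__":
    # Hypothetical numbers: 4x the NPS, but a 29% larger tree to reach
    # the same depth because of parallel search overhead.
    speedup = ttd_speedup(nodes_1=100e6, nps_1=1e6, nodes_n=129e6, nps_n=4e6)
    print(f"NPS ratio 4.00, time-to-depth speedup {speedup:.2f}")
```

In this made-up case the NPS ratio is 4 while the time-to-depth speedup is only about 3.1, which is the gap that comparing raw NPS hides.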

Vinvin · Post by **Vinvin** » Thu Jul 17, 2014 10:24 am

bob wrote:
Rybka doesn't use 1 + .7*3. That is MY formula. And I agree, 4M with 1 cpu is stronger than 4m on 4 cpus, because of search overhead. Rybka used the n^.76 (or whatever the fraction was).

But in any case, when talking about SMP performance, NPS is not the right number to compare. time to depth is the reasonable measurement.

I can't find the formula used in Rybka, only this post by Vasik from 2008:

Vasik wrote:When Rybka displays a 2x higher kn/s, she is effectively 2x faster and correspondingly stronger. It's no different than if you give her 2x more time.
Other engines don't make this adjustment, so it may look like they scale better. I don't really care about this - we're just going to do it the way I think is right.

http://rybkaforum.net/cgi-bin/rybkaforu ... 0#pid86950

Werewolf · Post by **Werewolf** » Thu Jul 17, 2014 12:48 pm

Thanks for your help.

I've produced a series of videos for beginners using Aquarium / IDeA. In Appendix 2 you get a brief mention, 9-10 minutes in:

https://www.youtube.com/channel/UCTLMpf ... g80SzD4IEg
