Increase in Elo ..Question For The Experts

Don · Post by **Don** » Tue Dec 06, 2011 4:55 pm

Laskos wrote:
Don wrote:
Laskos wrote:
Don wrote:
Laskos wrote:
Don wrote: I'm almost positive that modern programs get a lot more ELO increase from doubling than the old programs got.
That would mean that the difference in strength between modern programs and old ones would increase with the time control. I have to check the rating lists (no time now), but I doubt this statement.

Kai
Just so you understand what I'm saying, I assume that both programs are starting from the same point on the ELO scale. Then you double the time for both and see which benefits the most. Is that what you understood?
Almost so, by comparing difference X between two engines at, say, 40/4 with difference Y between the same engines at, say, 40/40. It's easy to do this given rating lists and a bit of time for a representative sample (the errors are not so small in all these lists).

Kai
My comment applies to older programs that are several hundred ELO weaker and have large branching factors compared to modern programs. I don't believe there is a substantial difference when comparing programs that are just 3 or 4 years older or that are hundreds of ELO weaker. In such a case I would agree with you if you are not starting with equalized programs.

A way to test such a thing is to start with an ancient version of Crafty, for instance, and test it against Critter or Stockfish or Komodo. Add some time to Crafty and subtract some time from the modern program until they get a roughly even score against each other. Then increase the time for both programs by a constant factor; my hypothesis is that the modern program will be noticeably stronger.

If you compare what the old program does when doubled with what a modern program does when doubled, it won't come out the same, because the modern program may be starting out at 3000 and the old program may be starting out at 1800. Of course, it's well known that doubling the time has a much larger impact on a program that is significantly weaker.

Don
I got what you meant, but I am not sure that would be a fair comparison. Suppose one pits a strong program at 1 second against a weak program at 2 minutes (to equalize the strength), then repeats the match at 10 seconds and 20 minutes (10x the time control) to compare the difference. If we see that the stronger program improves more, that could mean many things: hash filling, optimization for a certain size of the tree, etc.

I could be wrong, but intuitively, doubling the time close to the optimum of the engine gives the same Elo gain for weak and strong engines. An extra ply was worth more Elo for weaker, older engines with a high branching factor. I would rather say that modern engines gain less from each new ply compared to old (weak) engines. Besides that, the Elo gain per ply diminishes with depth in the case of the modern engines.

One has to test to see what really happens.

Kai

A very simple test I have done before is to just take 2 programs, test them at 3, 4, 5, 6, 7, 8, 9 and 10 ply (or some similar range) and then plot the ratings based on TIME, not depth. I have done this before and I can already tell you what you will see: 2 lines, where the line of the modern program climbs much more rapidly. That's really what I am talking about.

It's not fair testing a modern program going from 18 to 19 ply against an old program going from 3 or 4 ply because ANY program will gain a huge amount going from 3 to 4 compared to going from 18 to 19.

If you don't think the test is fair, we can run the test in the sweet spot range of the OLD program instead of the modern program, and I think you will still see the phenomenon I am describing. You won't be able to make the point that the modern program was optimized for really high depths and the older one wasn't.

Don
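The "plot the ratings based on TIME" comparison above can be reduced to one number per engine: the slope of rating against log2(time), i.e. Elo gained per doubling. The following is only a minimal sketch of that idea, not anything posted in the thread. The function name `elo_per_doubling` and the sample points are hypothetical placeholders, not measured results.

```python
import math

def elo_per_doubling(points):
    """Least-squares slope of rating vs. log2(time).

    `points` is a list of (time_in_seconds, rating) pairs, e.g. one pair per
    fixed-depth run, with the average time that depth took.
    The result is the fitted Elo gain for each doubling of time.
    """
    xs = [math.log2(t) for t, _ in points]
    ys = [r for _, r in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical, made-up points purely to show the call; not real measurements.
synthetic = [(1, 0), (2, 50), (4, 100), (8, 150)]
print(elo_per_doubling(synthetic))
```

Running this on the measured (time, rating) points of each engine and comparing the two slopes would be one way to check the claim that the modern program's line climbs more rapidly.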

bob · Post by **bob** » Tue Dec 06, 2011 9:47 pm

Don wrote: I cannot seem to find older versions of Crafty. Does anyone know where I can get a version that is at least 10 years old?

Don

Uri Blass wrote:
Laskos wrote:
Don wrote: I'm almost positive that modern programs get a lot more ELO increase from doubling than the old programs got.
That would mean that the difference in strength between modern programs and old ones would increase with the time control. I have to check the rating lists (no time now), but I doubt this statement.

Kai
That is not what Don means.
Don means that if you start from the same Elo, modern programs earn more from doubling (I think that you can replace "modern" with "significantly stronger").

I think that if you take 2 programs whose rating difference is more than 300 Elo and give the weaker program significantly superior hardware to get a result close to 50% at 5 minutes per game, then the stronger program is going to perform better at a longer time control.

It is not about modern programs but about stronger programs, and
I expect it to happen even if you test Crafty 23.4 against Houdini 1.5 (a difference of more than 300 Elo rating points based on the CCRL list).

The first step is to find a hardware difference at which Crafty gets 50% against Houdini at 5 minutes per game, and the second step is to test them at 1 hour per game with the same hardware difference.
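To read the outcome of the two-step test described above, the match score at each time control can be converted to an Elo difference with the standard logistic Elo formula, D = -400 * log10(1/S - 1). This is a small sketch under that assumption; the helper names `elo_diff` and `drift` are hypothetical and not from any engine or testing tool.

```python
import math

def elo_diff(score):
    """Elo difference implied by a match score fraction (0 < score < 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

def drift(score_short, score_long):
    """Change in Elo difference between the short and the long time control.

    Both scores are from the stronger engine's point of view. A positive
    result means the stronger engine gained more from the extra time.
    """
    return elo_diff(score_long) - elo_diff(score_short)

# Step 1 aims for about 50% at the short time control, which is 0 Elo.
print(elo_diff(0.5))
```

With the hardware handicap held fixed, `drift(score_at_5_minutes, score_at_1_hour)` would give the size of the effect in Elo, subject to the usual error margins of the match length.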

The exact age is difficult. You can find the Jakarta version of Crafty (10.18) on my ftp site for sure. That was 1996 or 1997, which makes it 14-15 years old.

Version 15.0 was the first parallel search version. That came in 1997 for certain, close to the beginning of 98... The first box I used was a dual Pentium 300 MHz that was on loan from Intel... I think a Pentium II, but I am not certain. Version 19.x was the version copied to make Rybka 1.6.1, so those two bracket the time-frame (age) you want. Somewhere in the 15.x-17.x range would probably be about 10 years old...

wgarvin · Post by **wgarvin** » Tue Dec 06, 2011 10:25 pm

kasinp wrote:Steve,

Please have a look at this resource:

http://tldp.org/HOWTO/BogoMips/bogo-list.html

For the 486 66MHz I think the speed index is around 33.
For P3 866MHz the speed index is ca. 1730.

These numbers give a much bigger difference than the raw clock speed comparison. Bob is absolutely right - one cycle of the 486 is NOT the same as one cycle of the P3.

I used these index values to calibrate DOS Box version of the Genius 3 program to my dedicated Mephisto unit (there are Motorola 68030 speeds here as well). The results were really close to the index predictions.

Regards,
PK

The other thing to keep in mind is that while CPU clock speeds (and instructions per clock) have improved by orders of magnitude, memory speeds have not exactly kept up (main memory latency in cycles is perhaps 1/4th of what it was 15 years ago?). We rely on bigger and smarter caches to paper over this difference, and cache misses (e.g. transposition table access) will really slow things down. Different programs use different instructions and different memory access patterns, so they won't all see the exact same gain when changing to a faster machine. Also the modern programs can take advantage of 64-bit instructions and things like the popcnt instruction that didn't exist back then.

It seems to me that the best way to measure "speedup" for a particular engine when it's run on a much newer machine is to use a known starting position, do a long-enough fixed-depth (or fixed-nodes) search, and measure the wall clock time. Measure that time on both machines, and whatever the ratio is, that's your "speedup factor" for the combination of that engine and those two machines.
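The measurement described above could be scripted roughly as follows. This is a sketch only: `time_command` simply times an external process, and the command that makes a given engine run a fixed-depth or fixed-nodes search from a known position is engine-specific and not shown here; any command passed in is an assumption of whoever runs it.

```python
import subprocess
import time

def time_command(cmd):
    """Wall-clock seconds taken by an external command (a list of arguments).

    The caller supplies a command that runs one fixed-depth or fixed-nodes
    search from a known position; that command is engine-specific.
    """
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start

def speedup_factor(old_seconds, new_seconds):
    """Ratio of wall-clock times for the same search on two machines."""
    if old_seconds <= 0 or new_seconds <= 0:
        raise ValueError("times must be positive")
    return old_seconds / new_seconds
```

Run `time_command` with the same engine, position, and depth on each machine, then feed the two times to `speedup_factor`. The ratio applies only to that engine and that pair of machines, which is the point made in the post.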

CRoberson · Post by **CRoberson** » Thu Dec 08, 2011 1:38 am

IIRC, the Pentium 90 ran some things (neural networks with much floating point) 5 times faster than the 486 66. Don't recall the chess program gains. The Pentium III (the name really pissed off the engineering team that designed it) was the first of Intel's superscalar architecture chips. The 800 MHz version was 3 or 4 steppings after the original P III release. It got better. Thus, there should be a nonlinear gain (relative to MHz) between the two in chess performance.

To help this discussion out, I actually have 2 operational machines that can help: a Pentium 90 (still with the original FP bug) and a Pentium III 800 MHz machine.

If you have any specific experiments for me to run, let me know.

bob · Post by **bob** » Thu Dec 08, 2011 5:39 pm

CRoberson wrote:IIRC, the Pentium 90 ran some things (neural networks with much floating point) 5 times faster than the 486 66. Don't recall the chess program gains. The Pentium III (the name really pissed off the engineering team that designed it) was the first of Intel's superscalar architecture chips. The 800 MHz version was 3 or 4 steppings after the original P III release. It got better. Thus, there should be a nonlinear gain (relative to MHz) between the two in chess performance.

To help this discussion out, I actually have 2 operational machines that can help: a Pentium 90 (still with the original FP bug) and a Pentium III 800 MHz machine.

If you have any specific experiments for me to run, let me know.

The original Pentium was superscalar. It could issue two instructions per clock. The Pentium Pro was the first Intel processor with out-of-order execution (OOE), and that basic design approach is still used today (reorder buffer, register renaming, multiple issue, L1/L2 cache, etc...)

Werewolf · Post by **Werewolf** » Thu Dec 08, 2011 6:51 pm

Bob,
Is there any chance you could provide a quick definition of 'superscalar' and 'out-of-order' and explain why the out-of-order approach is so much better?

Much appreciated.

bob · Post by **bob** » Thu Dec 08, 2011 8:26 pm

Werewolf wrote:Bob,
Is there any chance you could provide a quick definition of 'superscalar' and 'out-of-order' and explain why the out-of-order approach is so much better?

Much appreciated.

Superscalar architectures simply issue two (or more) instructions per clock cycle. The original Pentium issued two at a time when possible. More recent processors can issue up to 4, perhaps more on very recent versions.

OOE is a huge gain. First, x86 has very few accessible registers. OOE came from Tomasulo and the IBM System/360 Model 91, and brought along with it the idea of register renaming, which is a way of obtaining the data flow through a large set of instructions. Once you know that, you can easily determine which group (if any) of instructions can be executed in parallel, even if they don't appear adjacent to each other in the binary. It eliminates "name dependencies" when applied to register names.

For example:

mov eax, x
add eax, y
mov z, eax

mov eax, a
add eax, b
mov c, eax

give us a problem. The first three can't be executed while any of the second group are executing, because they all use eax. If the programmer had used ebx in the second group, the two groups could be executed in parallel. OOE and register renaming solve that inside the hardware: the two groups end up using completely different physical registers, exposing instruction-level parallelism that was not visible in the code as written.

We no longer need a clever instruction scheduling phase in the compiler optimizer; the hardware can do an even better job of it.
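The renaming idea in the eax example can be mimicked in a few lines. This is a toy model, not how any real processor is implemented: each instruction is reduced to a (destination, sources) pair, every write to a register gets a fresh physical name (`p0`, `p1`, ...), and memory operands x, y, z, a, b, c are assumed to be distinct locations. All names in the sketch are hypothetical.

```python
from itertools import count

REGISTERS = {"eax", "ebx", "ecx", "edx"}

def rename(program):
    """Give every register write a fresh physical register name."""
    mapping = {}      # architectural register -> current physical register
    fresh = count()
    renamed = []
    for dest, sources in program:
        # Reads use whichever physical register currently holds the value.
        new_sources = [mapping.get(s, s) for s in sources]
        if dest in REGISTERS:
            mapping[dest] = f"p{next(fresh)}"
            new_dest = mapping[dest]
        else:
            new_dest = dest   # memory operand, left untouched
        renamed.append((new_dest, new_sources))
    return renamed

def depends_on(later, earlier):
    """True if `later` conflicts with `earlier` through a shared name."""
    l_dest, l_src = later
    e_dest, e_src = earlier
    # read-after-write, write-after-read, write-after-write
    return e_dest in l_src or l_dest in e_src or l_dest == e_dest

def cross_group_conflicts(prog):
    """Conflicts between the first three and the last three instructions."""
    return [(i, j) for i in range(3) for j in range(3, 6)
            if depends_on(prog[j], prog[i])]

program = [
    ("eax", ["x"]),          # mov eax, x
    ("eax", ["eax", "y"]),   # add eax, y
    ("z",   ["eax"]),        # mov z, eax
    ("eax", ["a"]),          # mov eax, a
    ("eax", ["eax", "b"]),   # add eax, b
    ("c",   ["eax"]),        # mov c, eax
]

print(len(cross_group_conflicts(program)) > 0)        # conflicts via eax
print(cross_group_conflicts(rename(program)) == [])   # none after renaming
```

The true data dependencies inside each group survive renaming (the add still reads the value the first mov produced), while the name-only conflicts between the two groups disappear.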

bob · Post by **bob** » Thu Dec 08, 2011 9:21 pm

wgarvin wrote:
kasinp wrote:Steve,

Please have a look at this resource:

http://tldp.org/HOWTO/BogoMips/bogo-list.html

For the 486 66MHz I think the speed index is around 33.
For P3 866MHz the speed index is ca. 1730.

These numbers give a much bigger difference than the raw clock speed comparison. Bob is absolutely right - one cycle of the 486 is NOT the same as one cycle of the P3.

I used these index values to calibrate DOS Box version of the Genius 3 program to my dedicated Mephisto unit (there are Motorola 68030 speeds here as well). The results were really close to the index predictions.

Regards,
PK
The other thing to keep in mind is that while CPU clock speeds (and instructions per clock) have improved by orders of magnitude, memory speeds have not exactly kept up (main memory latency in cycles is perhaps 1/4th of what it was 15 years ago?). We rely on bigger and smarter caches to paper over this difference, and cache misses (e.g. transposition table access) will really slow things down. Different programs use different instructions and different memory access patterns, so they won't all see the exact same gain when changing to a faster machine. Also the modern programs can take advantage of 64-bit instructions and things like the popcnt instruction that didn't exist back then.

It seems to me that the best way to measure "speedup" for a particular engine when it's run on a much newer machine is to use a known starting position, do a long-enough fixed-depth (or fixed-nodes) search, and measure the wall clock time. Measure that time on both machines, and whatever the ratio is, that's your "speedup factor" for the combination of that engine and those two machines.

I totally agree. The ONLY way to measure performance is to use the program you actually want to run. For a very specific application (such as chess) the performance will vary widely from program to program due to cache issues, TLB issues, and particularly SMP issues (such as cache coherency traffic).

Werewolf · Post by **Werewolf** » Thu Dec 08, 2011 11:43 pm

Bob, thank you.

Can I ask one more question: how relevant is the size of cache in a processor? I notice Intel have gone from 2 MB per core to 2.5 MB per core in their Xeons, but some people say cache isn't relevant for chess.

Can you say whether it is or isn't, with a brief (and simple) explanation of why this is so, please?

MikeGL · Post by **MikeGL** » Fri Dec 09, 2011 4:33 am

In line with your subject, I also have a similar question that
I'm just curious about.

If I have an average engine, say rated 2500 and running on a P-III, but it has at its
disposal tablebases (TB) of up to 4, 5, 6, 7 and 8 men, then, even just in theory, will it be able to
defeat stronger engines (Elo 3200) running on powerful hardware?
Assume the stronger engine only has up to 5-piece TB.

I have seen 2-terabyte external drives even in small electronic shops, so I guess
6- and 7-men TB are already possible (although maybe hidden for private use).
