Hardware vs Software

bob · Post by **bob** » Wed Dec 03, 2008 6:42 pm

CRoberson wrote:Here is what I meant by PV verification:

    for all moves 
    &#123;
        if first move
             v = - S&#40;-beta,-alpha&#41;
        else
        &#123;
            v = -S&#40;-alpha-1,-alpha&#41;
            if &#40;v>alpha&#41; && &#40;v<beta&#41;
               v = -S&#40;-beta,-alpha&#41;
        &#125;
    &#125;

Change that to:

Code: Select all

      for all moves
      &#123;
           v = -S&#40;-beta,-alpha&#41;
      &#125;

I could do that but why? PVS existed in 1978 and was used in (at the time, Blitz, suggested by Murray Campbell). Ken Thompson used PVS in 1980 belle chess hardware. So that is 30 years old... I could tell you about the first program to ever use that in an ACM event, quite by accident, if you are interested...

bob · Post by **bob** » Wed Dec 03, 2008 6:44 pm

Bo Persson wrote:
bob wrote: I have the test running. 8 x 32,000 games will take around 8 hours so I should have the results around 11:00pm CST.

Isn't this in itself enough evidence that the hardware evolution is a major factor?

Running 256,000 tests wasn't even considered a few years ago.

I believe that for me, this will be the single-most important evolutionary improvement I have seen over the past 40 years of computer chess activity I have been involved in. Even though it is not directly making the search faster since I am not using 256 processors or 540 processors to go faster, I am using them to make testing go incredibly fast to accurately measure the effect of even small algorithmic changes...

mhull · Post by **mhull** » Wed Dec 03, 2008 7:09 pm

bob wrote:
Bo Persson wrote:
bob wrote: I have the test running. 8 x 32,000 games will take around 8 hours so I should have the results around 11:00pm CST.

Isn't this in itself enough evidence that the hardware evolution is a major factor?

Running 256,000 tests wasn't even considered a few years ago.
I believe that for me, this will be the single-most important evolutionary improvement I have seen over the past 40 years of computer chess activity I have been involved in. Even though it is not directly making the search faster since I am not using 256 processors or 540 processors to go faster, I am using them to make testing go incredibly fast to accurately measure the effect of even small algorithmic changes...

But we still look forward to the first freely available implementation of a chess program that more or less optimally utilizes a cluster. It would be nice if this happened before a commercial project succeeds in using one effectively in a big tournament.

hgm · Post by **hgm** » Wed Dec 03, 2008 7:26 pm

bob wrote:normal is roughly 2637 in this test. Removing null-move or LMR drops this by approximately 40 Elo. Removing both drops the rating by around 120 Elo.

An interesting thing to note is that the results are non-additive. Taking the program without both as a rference, adding null-move pruning adds 80 Elo, adding LMR adds 80 Elo, but adding them both only adds 120 Elo, not 80 + 80 = 160 Elo. This suggests both search tricks partially do the same thing. (It could also mean that by doing both you are overdoing it, i.e. overshoot the optimum, but in the case of Crafty that seems unlikely, as I am sure you tested very well that the version that has everything is optimal w.r.t. small variations in all its parameters.)

This means that determining how much a search trick is worth by disabling the trick in a program that has everything, is likely to produce a far smaller difference than adding the same trick to a program that has nothing. This idea is often expressed as: "for a weak program, everything works". The "everything else" usually does a good job in plugging the leak you create by disabling a single trick (by obstructing the rate at which search effort flows to waste through the leak).

In micro-Max, adding null-move pruning caused a big jump in strength when I added it, although I have forgotten exactly how much. (I think it was something like 200-300 Elo). I added it before adding any other search tricks.

krazyken · Post by **krazyken** » Wed Dec 03, 2008 7:38 pm

Mark Uniacke in this interview had this to say

Do you think that computing power (hardware strength) will play the decisive role in the future of chess engines?

Hardware is important but so is software and both play a critical role of course. At present the software is advancing as fast or faster than the hardware.

Harald · Post by **Harald** » Wed Dec 03, 2008 8:27 pm

CRoberson wrote:While at the 2008 ACCA Pan American Computer Chess Championships,
Bob claimed he didn't believe software played a serious role in all the
rating improvements we've seen. He thought hardware deserved the
credit (assuming I understood the statement correctly. We were jumping
across several subjects and back that night.).

I beleive software has had much to do with it for several reasons.
I will start with one. The EBF with only MiniMax is 40. With Alpha-Beta
pruning, it drops to 6. In the early 1990's, the EBF was 4. Now, it is 2.

Dropping the EBF from 2 to 4 is huge. Lets look at a 20 ply search.
The speedup of EBF=2 vs EBF=4 is:
4^20/2^20 = 2^20 = 1,048,576

So, that is over a 1 million x speed up. Has hardware produced that much
since 1992?

Also, I believe eval improvements have caused an improvement in
rating scores.

An example of nonhardware improvements is on the SSDF rating list.
Rybka 1.0 beta score 2775 on a 450 MHz AMD.

Another not very serious thought:
A new piece of good chess software (a new commercial engine)
costs about 100 USD or EUR.
A new PC costs about 1000$.
How much ELO/$ do I get? How should I invest money
when I have a ten year old PC and engine?
Would Bob sell me his famous cluster for 100$?

Harald

bob · Post by **bob** » Wed Dec 03, 2008 8:53 pm

mhull wrote:
bob wrote:
Bo Persson wrote:
bob wrote: I have the test running. 8 x 32,000 games will take around 8 hours so I should have the results around 11:00pm CST.

Isn't this in itself enough evidence that the hardware evolution is a major factor?

Running 256,000 tests wasn't even considered a few years ago.
I believe that for me, this will be the single-most important evolutionary improvement I have seen over the past 40 years of computer chess activity I have been involved in. Even though it is not directly making the search faster since I am not using 256 processors or 540 processors to go faster, I am using them to make testing go incredibly fast to accurately measure the effect of even small algorithmic changes...
But we still look forward to the first freely available implementation of a chess program that more or less optimally utilizes a cluster. It would be nice if this happened before a commercial project succeeds in using one effectively in a big tournament.

It will happen. I am giving it a lot of thought, although there are a few ideas left on my to-do list for current Crafty...

bob · Post by **bob** » Wed Dec 03, 2008 9:03 pm

hgm wrote:
bob wrote:normal is roughly 2637 in this test. Removing null-move or LMR drops this by approximately 40 Elo. Removing both drops the rating by around 120 Elo.
An interesting thing to note is that the results are non-additive. Taking the program without both as a rference, adding null-move pruning adds 80 Elo, adding LMR adds 80 Elo, but adding them both only adds 120 Elo, not 80 + 80 = 160 Elo. This suggests both search tricks partially do the same thing. (It could also mean that by doing both you are overdoing it, i.e. overshoot the optimum, but in the case of Crafty that seems unlikely, as I am sure you tested very well that the version that has everything is optimal w.r.t. small variations in all its parameters.)

I had noticed that, but since both are doing the same thing in different ways, this tells me that there is some overlap. It might be possible to tune this further although I have not yet taken the time to consider just how that might happen. And it might simply be that null-move causes so many quick cutoffs, that LMR never gets a chance, but with null-move removed it does. Probably the same thing can happen in reverse as well, LMRs at places where null-failed causes null-move to become less effective below that point in the tree since it is already a reduced-effort search.

Whether they can be tuned somehow to be more "additive-like" is not so clear, but I have only thought about the issue for a short time since I saw the results for the first time late last night.

This means that determining how much a search trick is worth by disabling the trick in a program that has everything, is likely to produce a far smaller difference than adding the same trick to a program that has nothing. This idea is often expressed as: "for a weak program, everything works". The "everything else" usually does a good job in plugging the leak you create by disabling a single trick (by obstructing the rate at which search effort flows to waste through the leak).

In micro-Max, adding null-move pruning caused a big jump in strength when I added it, although I have forgotten exactly how much. (I think it was something like 200-300 Elo). I added it before adding any other search tricks.

Crafty's basic search is pretty simple. Just PVS + null-move + LMR + check extension (all others have been removed after testing showed they hurt performance rather than helped), + simple q-search checks (I am playing with this as I write this however and it might change) + simple q-search. I've used killers and hashing since middle 70's so those are not new. Bitboards were around in the middle 70's so that's not new. I don't think there is anything remarkable in my evaluation, certainly nothing we were not doing 15 years ago or longer, other than tuning adjustments.

As far as tuning goes on LMR and null-move, I have not gone very far afield there. I use R=3 everywhere. LMR is a 1-ply reduction although I have plans to try a variable reduction so that for moves that look really ugly I reduce them even further (white playing Na1 for example, with king on the other side of the board, no passed pawns, etc...)

I have even run without the check extension, the only one that is left. Many think the purpose of this extension is to find deep mates. That's wrong. The purpose is to try to expose horizon-effect moves and avoid them when possible. I also want to test a restricted check extension, where it is only applied in the last N plies where the horizon effect is most notable, where right now I apply them everywhere with no limit, always +1 ply added to depth.

bob · Post by **bob** » Wed Dec 03, 2008 9:07 pm

krazyken wrote:Mark Uniacke in this interview had this to say
Do you think that computing power (hardware strength) will play the decisive role in the future of chess engines?

Hardware is important but so is software and both play a critical role of course. At present the software is advancing as fast or faster than the hardware.

I think that is more fiction than fact, personally. I'm going to try to quantify this a bit when I can. Programmers have a tendency to exaggerate their contributions and minimize the contributions of others, in this case the hardware designers. It _might_ be as close as a 50-50 deal (50% software, 50% hardware) but personally, I strongly doubt it. more on this later.

michiguel · Post by **michiguel** » Thu Dec 04, 2008 12:04 am

bob wrote:
michiguel wrote:
bob wrote:Here are the results. Crafty-22.9X1 is normal crafty. 22.9X2 is normal except null-move completely commented out. 22.9X3 is normal except that LMR has been completely disabled. 22.9X4 is normal but with both LMR and null-move removed. the -1 or -2 just means run #1 or run#2 to give a fell for what kind of variation there is between runs.
Code: Select all
Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.1      2695    4    4 62256   65%  2585   20% 
   2 Toga2             2695    4    3 62256   64%  2585   20% 
   3 Crafty-22.9X1-1   2638    4    4 31128   51%  2629   21% 
   4 Crafty-22.9X1-2   2636    5    5 31128   51%  2629   21% 
   5 Fruit 2.1         2597    3    3 62256   52%  2585   22% 
   6 Crafty-22.9X2-2   2596    4    4 31128   45%  2629   21% 
   7 Crafty-22.9X3-2   2596    4    4 31128   46%  2629   20% 
   8 Crafty-22.9X2-1   2594    4    5 31128   45%  2629   21% 
   9 Crafty-22.9X3-1   2591    4    5 31128   45%  2629   20% 
  10 Glaurung 1.1 SMP  2530    3    4 62256   43%  2585   19% 
  11 Crafty-22.9X4-1   2517    5    5 31128   35%  2629   19% 
  12 Crafty-22.9X4-2   2514    5    5 31128   35%  2629   18% 
normal is roughly 2637 in this test. Removing null-move or LMR drops this by approximately 40 Elo. Removing both drops the rating by around 120 Elo.

Null-move and LMR are the two biggest search enhancements of the past 15 years. And they added +120 Elo. I could always try normal crafty, but take the NPS from about 1M on this hardware (I only test with 1 cpu here) and back it down to about 75K which is what I was getting in 1996 on a pentium pro 200, roughly a factor of 15x and see how that impacts performance. Although I should probably factor in the single-core pentium pro vs a quad-core xeon for today, which runs around 10M nps, as that is a more representative example of what hardware speeds have done since 1996. So 75K to 10M is a factor of 24 or so. But something tells me that factor of 24-25x is _way_ more than 120 Elo...
You may be right but this is not a valid comparison! you should compare the improvement between Crafty model 1996 vs. Crafty model 2008, not the contribution of two single techniques as implemented in 2008.
No idea what you are talking about. This was an absolutely perfect attempt to quantify the effect of null-move and LMR. Null-move existed in 1995. Null-move existed in 1990. So that is _barely_ an enhancement from the last 20 years. Beal's original paper was somewhere in the 1988 range.

The discussion was not about 1988 vs 2008 for a 20 year span, it was about specific techniques developed during that time frame and how much they actually improve a program, vs the speed differential between machines over the last 20 years. I shortened it to 1995 because Crafty was available then, and I have a detailed record of what has been done over the last 13 years....

What is the Elo difference between Crafty 1996 vs Crafty 2008 running in equal hardware? This is not the optimum either but it is closer to something more meaningful.

I can probably produce that result. Let me see what the oldest version I have is, and see if I can figure out when it was current. I certainly know when the Jakarta version was done, since we know when the WCCC in Jakarta was played. But that would be an invalid test as well, because software "improvements" in the present context means "new ideas developed since XXX". Many of Crafty's improvements have been the result of tuning, but based on old ideas. I do not call those "software improvements" because they are not "general enhancements for all programs (such as null-move or LMR is) but are specific to a specific program's previously existing code...

The same can be said about hardware, they just put more Ghz on them.

Tuning individual components of software is software, as the title of the thread says. Null move, LMR etc. do not work in a vacuum.
I see your point but the comparison is not fair. You are not including bitbboards, all the nice eval things that you can do with them, etc. etc.
Yes, bitboards are old, but efficient implementations are not. We are not even taking into account the synergy between software and hardware.

Crafty98+Hardware98 ---> Elo A
Crafty08+Hardware08 ---> Elo B

Hardware + Software improvement = X = Elo B - Elo A

Crafty98+Hardware08 ---> Elo C

Hardware improvement = Y = Elo C - Elo A

Sofwtare improvement = X - Y

Is a better approximation to all the components combined, at least from your perspective That is still crude, because it does not take into account synergy, but at least we can start talking about it.

As a general statement, a more fair comparison will be "TopProgram98" and "TopProgram08", maybe SSDF has this already in some form.

Miguel

Miguel

Other suggestions???

BTW how many are surprised that removing both is only a 120 Elo loss???

Hardware vs Software

Re: Hardware vs Software

Re: Hardware vs Software

Re: Hardware vs Software

Re: Hardware vs Software - test results

Re: Hardware vs Software

Re: Hardware vs Software

Re: Hardware vs Software

Re: Hardware vs Software - test results

Re: Hardware vs Software

Re: Hardware vs Software - test results