Hardware vs Software

Discussion of chess software programming and technical issues.

Moderator: Ras

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hardware vs Software - test results

Post by bob »

michiguel wrote:
bob wrote:
michiguel wrote:
bob wrote:Here are the results. Crafty-22.9X1 is normal Crafty. 22.9X2 is normal except null-move completely commented out. 22.9X3 is normal except that LMR has been completely disabled. 22.9X4 is normal but with both LMR and null-move removed. The -1 or -2 suffix just means run #1 or run #2, to give a feel for what kind of variation there is between runs.

Code: Select all

Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.1      2695    4    4 62256   65%  2585   20% 
   2 Toga2             2695    4    3 62256   64%  2585   20% 
   3 Crafty-22.9X1-1   2638    4    4 31128   51%  2629   21% 
   4 Crafty-22.9X1-2   2636    5    5 31128   51%  2629   21% 
   5 Fruit 2.1         2597    3    3 62256   52%  2585   22% 
   6 Crafty-22.9X2-2   2596    4    4 31128   45%  2629   21% 
   7 Crafty-22.9X3-2   2596    4    4 31128   46%  2629   20% 
   8 Crafty-22.9X2-1   2594    4    5 31128   45%  2629   21% 
   9 Crafty-22.9X3-1   2591    4    5 31128   45%  2629   20% 
  10 Glaurung 1.1 SMP  2530    3    4 62256   43%  2585   19% 
  11 Crafty-22.9X4-1   2517    5    5 31128   35%  2629   19% 
  12 Crafty-22.9X4-2   2514    5    5 31128   35%  2629   18% 
Normal Crafty is roughly 2637 in this test. Removing null-move or LMR drops this by approximately 40 Elo. Removing both drops the rating by around 120 Elo.

Null-move and LMR are the two biggest search enhancements of the past 15 years. And they added +120 Elo. I could always try normal crafty, but take the NPS from about 1M on this hardware (I only test with 1 cpu here) and back it down to about 75K which is what I was getting in 1996 on a pentium pro 200, roughly a factor of 15x and see how that impacts performance. Although I should probably factor in the single-core pentium pro vs a quad-core xeon for today, which runs around 10M nps, as that is a more representative example of what hardware speeds have done since 1996. So 75K to 10M is a factor of 24 or so. But something tells me that factor of 24-25x is _way_ more than 120 Elo...
You may be right, but this is not a valid comparison! You should compare the improvement from Crafty model 1996 to Crafty model 2008, not the contribution of two single techniques as implemented in 2008.
No idea what you are talking about. This was an absolutely perfect attempt to quantify the effect of null-move and LMR. Null-move existed in 1995. Null-move existed in 1990. So that is _barely_ an enhancement from the last 20 years. Beal's original paper was somewhere in the 1988 range.

The discussion was not about 1988 vs 2008 for a 20 year span, it was about specific techniques developed during that time frame and how much they actually improve a program, vs the speed differential between machines over the last 20 years. I shortened it to 1995 because Crafty was available then, and I have a detailed record of what has been done over the last 13 years....


What is the Elo difference between Crafty 1996 vs Crafty 2008 running in equal hardware? This is not the optimum either but it is closer to something more meaningful.
I can probably produce that result. Let me see what the oldest version I have is, and see if I can figure out when it was current. I certainly know when the Jakarta version was done, since we know when the WCCC in Jakarta was played. But that would be an invalid test as well, because software "improvements" in the present context mean "new ideas developed since XXX". Many of Crafty's improvements have been the result of tuning, but based on old ideas. I do not call those "software improvements" because they are not general enhancements for all programs (as null-move or LMR are) but are specific to one program's previously existing code...
The same can be said about hardware; they just put more GHz on it.

Tuning individual components of software is software, as the title of the thread says. Null move, LMR etc. do not work in a vacuum.
I see your point, but the comparison is not fair. You are not including bitboards, all the nice eval things that you can do with them, etc. etc.
Yes, bitboards are old, but efficient implementations are not. We are not even taking into account the synergy between software and hardware.

Crafty98+Hardware98 ---> Elo A
Crafty08+Hardware08 ---> Elo B

Hardware + Software improvement = X = Elo B - Elo A

Crafty98+Hardware08 ---> Elo C

Hardware improvement = Y = Elo C - Elo A

Software improvement = X - Y

This is a better approximation to all the components combined, at least from your perspective. It is still crude, because it does not take synergy into account, but at least we can start talking about it.
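With invented ratings, the bookkeeping above works out as follows (every Elo figure below is hypothetical; only the arithmetic mirrors the scheme):

```python
elo_a = 2450  # Crafty98 + Hardware98 (hypothetical)
elo_b = 2900  # Crafty08 + Hardware08 (hypothetical)
elo_c = 2750  # Crafty98 + Hardware08 (hypothetical)

x = elo_b - elo_a   # X: combined hardware + software improvement
y = elo_c - elo_a   # Y: hardware improvement alone
software = x - y    # X - Y: the remainder attributed to software
```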

As a general statement, a fairer comparison would be "TopProgram98" vs. "TopProgram08"; maybe the SSDF already has this in some form.

Miguel



Other suggestions???

BTW how many are surprised that removing both is only a 120 Elo loss???
I was simply responding to Charles' request to test with/without null-move and LMR, since they are two well-known software ideas, although null-move is probably older than he suspects. I am trying to find a 1996 version of Crafty to run on the cluster at today's speeds to see how it does... I can then run it at 1996's speeds to get the Elo from purely hardware gains...
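As an aside for readers following the thread: the two techniques switched off in the X2/X3/X4 runs can be sketched as below. This is a minimal illustrative negamax over a toy game tree, with invented reduction amounts and move-count thresholds; it is not Crafty's code.

```python
INF = 10**9

class Node:
    """Toy game-tree node; score is the static eval from the side to move."""
    def __init__(self, score, children=()):
        self.score = score
        self.children = list(children)

def search(node, depth, alpha, beta, use_null=True, use_lmr=True, can_null=True):
    if depth <= 0 or not node.children:
        return node.score
    # Null-move pruning: give the opponent a free move and search to reduced
    # depth; if even that fails high, assume the full search would too.
    # (On this abstract tree, "passing" just searches the same node again.)
    R = 2  # null-move depth reduction (illustrative)
    if use_null and can_null and depth > R:
        v = -search(node, depth - 1 - R, -beta, -beta + 1,
                    use_null, use_lmr, can_null=False)
        if v >= beta:
            return beta
    for i, child in enumerate(node.children):
        if use_lmr and i >= 2 and depth >= 3:
            # LMR: late moves get a reduced-depth, zero-window look first;
            # only a fail-high earns the full-depth re-search.
            v = -search(child, depth - 2, -alpha - 1, -alpha, use_null, use_lmr)
            if v > alpha:
                v = -search(child, depth - 1, -beta, -alpha, use_null, use_lmr)
        else:
            v = -search(child, depth - 1, -beta, -alpha, use_null, use_lmr)
        if v > alpha:
            alpha = v
        if alpha >= beta:
            break
    return alpha
```

In these terms the X2 run corresponds to use_null=False, X3 to use_lmr=False, and X4 to disabling both.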
CRoberson
Posts: 2091
Joined: Mon Mar 13, 2006 2:31 am
Location: North Carolina, USA

Re: Hardware vs Software

Post by CRoberson »

bob wrote:
CRoberson wrote:Here is what I meant by PV verification:

Code: Select all

    for all moves
    {
        if first move
            v = -S(-beta, -alpha)
        else
        {
            v = -S(-alpha-1, -alpha)
            if ((v > alpha) && (v < beta))
                v = -S(-beta, -alpha)
        }
    }

 
Change that to:

Code: Select all

      for all moves
      {
           v = -S(-beta,-alpha)
      }
I could do that, but why? PVS existed in 1978 and was used at the time in Blitz, as suggested by Murray Campbell. Ken Thompson used PVS in the 1980 Belle chess hardware. So it is 30 years old... I could tell you about the first program to ever use it in an ACM event, quite by accident, if you are interested...
I agree that it doesn't fit the timeline we've been discussing, but
I'd like to know the answer to that one. Also, I agree that a test of a
1992-96 version of Crafty could be very enlightening. IIRC, Crafty came
out in 1995?

Something I did last summer: I stripped out all the eval except
raw material value. Based on my tests (many fewer than yours),
the new version of Telepath dropped 500 rating points.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hardware vs Software

Post by bob »

CRoberson wrote:
bob wrote:
CRoberson wrote:Here is what I meant by PV verification:

Code: Select all

    for all moves
    {
        if first move
            v = -S(-beta, -alpha)
        else
        {
            v = -S(-alpha-1, -alpha)
            if ((v > alpha) && (v < beta))
                v = -S(-beta, -alpha)
        }
    }

 
Change that to:

Code: Select all

      for all moves
      {
           v = -S(-beta,-alpha)
      }
I could do that, but why? PVS existed in 1978 and was used at the time in Blitz, as suggested by Murray Campbell. Ken Thompson used PVS in the 1980 Belle chess hardware. So it is 30 years old... I could tell you about the first program to ever use it in an ACM event, quite by accident, if you are interested...
I agree that it doesn't fit the timeline we've been discussing, but
I'd like to know the answer to that one. Also, I agree that a test of a
1992-96 version of Crafty could be very enlightening. IIRC, Crafty came
out in 1995?

Something I did last summer: I stripped out all the eval except
raw material value. Based on my tests (many fewer than yours),
the new version of Telepath dropped 500 rating points.
That test is not so easy to run. There are several places where Search() is recursively called, which means a significant number of changes are needed to get rid of the PVS code. However, I can run the evaluation test, and I will queue that up right now behind two other tests I have running. More on this one tomorrow morning...
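For reference, the PVS pattern in CRoberson's pseudocode can be written out as a small self-contained sketch over a toy game tree (illustrative code, not Crafty's Search()). On a full window it returns the same value as the plain full-window alpha-beta it is being compared against:

```python
INF = 10**9

class Node:
    """Toy game-tree node; score is the static eval from the side to move."""
    def __init__(self, score, children=()):
        self.score = score
        self.children = list(children)

def alphabeta(node, depth, alpha, beta):
    # Plain full-window negamax alpha-beta, i.e. the "change that to" version.
    if depth == 0 or not node.children:
        return node.score
    for child in node.children:
        v = -alphabeta(child, depth - 1, -beta, -alpha)
        if v > alpha:
            alpha = v
        if alpha >= beta:
            break
    return alpha

def pvs(node, depth, alpha, beta):
    # PVS: full window on the first move, zero-window probes on the rest,
    # with a full re-search when a probe fails high inside the window.
    if depth == 0 or not node.children:
        return node.score
    for i, child in enumerate(node.children):
        if i == 0:
            v = -pvs(child, depth - 1, -beta, -alpha)
        else:
            v = -pvs(child, depth - 1, -alpha - 1, -alpha)
            if alpha < v < beta:
                v = -pvs(child, depth - 1, -beta, -alpha)
        if v > alpha:
            alpha = v
        if alpha >= beta:
            break
    return alpha
```

The zero-window probe only proves whether a move can beat alpha; the re-search is what makes the returned value exact, which is why stripping PVS out of a real program means touching every place the search is called.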
Pablo Vazquez
Posts: 154
Joined: Thu May 31, 2007 9:05 pm
Location: Madrid, Spain

Re: Hardware vs Software - test results

Post by Pablo Vazquez »

If you find any old versions, could you also upload them to your ftp box if you don't mind?

Thanks
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hardware vs Software

Post by bob »

I ran this overnight. I simply made Evaluate() return the material score only. It was almost exactly a 400 point drop in Elo from the version with the most recent evaluation.

Code: Select all

Crafty-22.9R01     2650    5    5 31128   51%  2644   21% 
Crafty-22.9R02     2261    5    6 31128    9%  2644    7% 
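The change bob describes amounts to reducing Evaluate() to a bare material count. A sketch of the idea, with conventional centipawn values and a hypothetical piece-list interface (not Crafty's actual code):

```python
# Conventional centipawn values; Crafty's actual numbers may differ.
VALUE = {"P": 100, "N": 300, "B": 300, "R": 500, "Q": 900, "K": 0}

def evaluate_material(white_pieces, black_pieces):
    """Material balance from White's point of view, in centipawns."""
    return (sum(VALUE[p] for p in white_pieces)
            - sum(VALUE[p] for p in black_pieces))
```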
Uri Blass
Posts: 10800
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Hardware vs Software

Post by Uri Blass »

bob wrote:I ran this overnight. I simply made Evaluate() return the material score only. It was almost exactly a 400 point drop in Elo from the version with the most recent evaluation.

Code: Select all

Crafty-22.9R01     2650    5    5 31128   51%  2644   21% 
Crafty-22.9R02     2261    5    6 31128    9%  2644    7% 
I am surprised, because I expected a bigger difference.

I could expect something like this from a piece-square-table evaluation (no pawn structure, mobility, or king safety), but I believe the difference between a material-only evaluation and the normal evaluation is something like 1000 Elo, not on the order of 400 Elo.

It may be interesting to see some games that Crafty (material only) could win.

Uri
hgm
Posts: 28354
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Hardware vs Software

Post by hgm »

CRoberson wrote:Something I did last summer: I stripped out all the eval except raw material value. Based on my tests (many fewer than yours), the new version of Telepath dropped 500 rating points.
It is amazing how much of that you can recoup with not much more than a little bit of piece-square evaluation, as I learned when building micro-Max. A weak attraction to the center for the light pieces and King, discouragement of King moves in the middle game (which you could implement as a PST initialized in the root, where the King gets a bonus for being where it is in the root), and a hefty bonus for Pawns on the 6th or 7th rank already lead to quite reasonable play. Reasonable Pawn structure is maintained by penalizing Pawn moves when a Pawn two files to the left or right is missing. (This is basically the only non-PST-like term.)

Tuning the piece values is quite critical in such a material-dominated eval: it is very important that N > 3P, B > N, and B+N > R+P. Having any of these reversed, or even equal, loses a lot of games.
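Two of the terms hgm lists (a weak center attraction for the light pieces, a hefty bonus for far-advanced Pawns) can be sketched as a tiny piece-square function; the weights below are invented for illustration and are not micro-Max's actual values:

```python
def center_bonus(file, rank):
    """0 at the corners up to 6 in the four center squares (0-based coords)."""
    return 7 - (abs(2 * file - 7) + abs(2 * rank - 7)) // 2

def pst_score(piece, file, rank):
    """Centipawn PST term for White; file/rank are 0..7 from White's side."""
    if piece in ("N", "B"):            # light pieces: weak pull to the center
        return 2 * center_bonus(file, rank)
    if piece == "P":                   # hefty bonus for far-advanced Pawns
        return 50 if rank >= 5 else 0  # ranks 6 and 7 in chess terms
    return 0
```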
Carey
Posts: 313
Joined: Wed Mar 08, 2006 8:18 pm

Re: Hardware vs Software

Post by Carey »

bob wrote:I ran this overnight. I simply made Evaluate() return the material score only. It was almost exactly a 400 point drop in Elo from the version with the most recent evaluation.

Code: Select all

Crafty-22.9R01     2650    5    5 31128   51%  2644   21% 
Crafty-22.9R02     2261    5    6 31128    9%  2644    7% 
Interesting tests... I hope you continue to run tests to determine the approximate value of different chess ideas. Something like this is really worth doing and publishing.


Could you please try Don Beal's random-mobility estimator? Ever since you mentioned it, I've been really curious as to how well that would really work compared to just material and/or some other simple and quick mobility scoring method.


Anyway, what were the time conditions for this and the hardware used?

And do you think these results would carry over to slower games?

Carey
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Hardware vs Software

Post by Laskos »

CRoberson wrote:While at the 2008 ACCA Pan American Computer Chess Championships,
Bob claimed he didn't believe software played a serious role in all the
rating improvements we've seen. He thought hardware deserved the
credit (assuming I understood the statement correctly. We were jumping
across several subjects and back that night.).

I believe software has had much to do with it for several reasons.
I will start with one. The EBF with only MiniMax is 40. With Alpha-Beta
pruning, it drops to 6. In the early 1990's, the EBF was 4. Now, it is 2.

Dropping the EBF from 4 to 2 is huge. Let's look at a 20-ply search.
The speedup of EBF=2 vs. EBF=4 is:
4^20 / 2^20 = 2^20 = 1,048,576

So, that is over a 1 million x speed up. Has hardware produced that much
since 1992?

Also, I believe eval improvements have caused an improvement in
rating scores.

An example of non-hardware improvement is on the SSDF rating list:
Rybka 1.0 beta scored 2775 on a 450 MHz AMD.

During the last 12 years or so, hardware improvement was about 400 Elo points and software improvement 600-700 points (Rybka). I bet R. Hyatt is right in his statement if we are talking about Crafty.

Kai
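CRoberson's branching-factor arithmetic checks out: search effort grows roughly as EBF^depth, so halving the EBF from 4 to 2 over a 20-ply search saves a factor of (4/2)^20:

```python
# Search effort ~ EBF**depth, so the saving from cutting the EBF from 4 to 2
# at a fixed depth is (4/2)**depth.
depth = 20
speedup = 4**depth // 2**depth   # (4/2)**20 = 2**20
```

That is the 1,048,576x figure quoted in the post; whether hardware alone delivered that much since 1992 is the question at issue.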
mhull
Posts: 13447
Joined: Wed Mar 08, 2006 9:02 pm
Location: Dallas, Texas
Full name: Matthew Hull

Re: Hardware vs Software

Post by mhull »

Laskos wrote:
CRoberson wrote:While at the 2008 ACCA Pan American Computer Chess Championships,
Bob claimed he didn't believe software played a serious role in all the
rating improvements we've seen. He thought hardware deserved the
credit (assuming I understood the statement correctly. We were jumping
across several subjects and back that night.).

I believe software has had much to do with it for several reasons.
I will start with one. The EBF with only MiniMax is 40. With Alpha-Beta
pruning, it drops to 6. In the early 1990's, the EBF was 4. Now, it is 2.

Dropping the EBF from 4 to 2 is huge. Let's look at a 20-ply search.
The speedup of EBF=2 vs. EBF=4 is:
4^20 / 2^20 = 2^20 = 1,048,576

So, that is over a 1 million x speed up. Has hardware produced that much
since 1992?

Also, I believe eval improvements have caused an improvement in
rating scores.

An example of non-hardware improvement is on the SSDF rating list:
Rybka 1.0 beta scored 2775 on a 450 MHz AMD.

During the last 12 years or so, hardware improvement was about 400 Elo points and software improvement 600-700 points (Rybka). I bet R. Hyatt is right in his statement if we are talking about Crafty.

Kai
A more harmonious (and hard to find) tuning combination of commonly known elements is no more a software improvement than a more pleasing (and hard to find) combination of dials on a Moog is a synthesizer improvement.
Matthew Hull