Hardware vs Software

CRoberson · Post by **CRoberson** » Tue Dec 02, 2008 7:33 pm

While at the 2008 ACCA Pan American Computer Chess Championships,
Bob claimed he didn't believe software played a serious role in all the
rating improvements we've seen. He thought hardware deserved the
credit (assuming I understood the statement correctly. We were jumping
across several subjects and back that night.).

I believe software has had much to do with it for several reasons.
I will start with one. The EBF with only MiniMax is 40. With Alpha-Beta
pruning, it drops to 6. In the early 1990s, the EBF was 4. Now, it is 2.

Dropping the EBF from 4 to 2 is huge. Let's look at a 20 ply search.
The speedup of EBF=2 vs EBF=4 is:
4^20/2^20 = 2^20 = 1,048,576

So, that is over a 1 million x speed up. Has hardware produced that much
since 1992?
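
The arithmetic above can be checked with a short sketch. It assumes the idealized model in which a search to depth d with effective branching factor b visits about b^d nodes; the function names `tree_size` and `speedup` are made up for this illustration and do not come from any engine.

```python
def tree_size(ebf, depth):
    """Approximate node count of a search tree under the b**d model."""
    return ebf ** depth


def speedup(old_ebf, new_ebf, depth):
    """How many times fewer nodes the lower EBF needs at the same depth."""
    return tree_size(old_ebf, depth) / tree_size(new_ebf, depth)


if __name__ == "__main__":
    # EBF 4 (early 1990s) versus EBF 2 (2008) at 20 plies, as in the post.
    print(f"EBF 4 -> 2 at 20 plies: {speedup(4, 2, 20):,.0f}x")
    # The same model applied to the other figures quoted in the post.
    print(f"EBF 40 -> 6 at 20 plies: {speedup(40, 6, 20):.3e}x")
```

The model only counts nodes. It says nothing about whether a pruned 20 ply search is as reliable as an unpruned one.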

Also, I believe eval improvements have caused an improvement in
rating scores.

An example of nonhardware improvements is on the SSDF rating list.
Rybka 1.0 beta scored 2775 on a 450 MHz AMD.

bob · Post by **bob** » Tue Dec 02, 2008 8:16 pm

CRoberson wrote:While at the 2008 ACCA Pan American Computer Chess Championships,
Bob claimed he didn't believe software played a serious role in all the
rating improvements we've seen. He thought hardware deserved the
credit (assuming I understood the statement correctly. We were jumping
across several subjects and back that night.).

I believe software has had much to do with it for several reasons.
I will start with one. The EBF with only MiniMax is 40. With Alpha-Beta
pruning, it drops to 6. In the early 1990s, the EBF was 4. Now, it is 2.

Dropping the EBF from 4 to 2 is huge. Let's look at a 20 ply search.
The speedup of EBF=2 vs EBF=4 is:
4^20/2^20 = 2^20 = 1,048,576

So, that is over a 1 million x speed up. Has hardware produced that much
since 1992?

Also, I believe eval improvements have caused an improvement in
rating scores.

An example of nonhardware improvements is on the SSDF rating list.
Rybka 1.0 beta scored 2775 on a 450 MHz AMD.

I believe my statement was more along the lines of "Hardware has had a _larger_ influence over program performance increases over the past 20 years than the software has." This is based on my cluster testing, where I now know just what some of the "great enhancements" have brought. If you'd like to pick one "revolutionary idea" (null-move? LMR? check extension? some evaluation concept? etc...) I can give it a test using Crafty with and without, assuming Crafty uses the idea you want to compare. I have not found a single +100 Elo idea in Crafty. LMR is a modest improvement. I don't recall the exact amount at present but could compute it. Remember, just because we are physically searching deeper, today's "ply" is not comparable to the "ply" of 20 years ago. Today's "plies" have significantly more errors in them due to various types of pruning and reductions going on...

I'll test whatever you think is the "biggest" to see what happens...

CRoberson · Post by **CRoberson** » Tue Dec 02, 2008 8:52 pm

Yes, we did fall into the discussion of what is each part worth.

So, here is what I'd like to see tested.

1) Crafty without LMR and without Null Move.
2) Crafty without LMR and with Null Move.
3) Crafty with LMR and without Null Move
4) Crafty with both.

I think we could leave all other Crafty parameters as is which means
the 4th experiment is your current base code.

Once these 4 experiments are done, I'd like to see the PV verification
pruning tested - using a 0 window on the sibling nodes.

bob · Post by **bob** » Tue Dec 02, 2008 10:07 pm

CRoberson wrote:Yes, we did fall into the discussion of what is each part worth.

So, here is what I'd like to see tested.

1) Crafty without LMR and without Null Move.
2) Crafty without LMR and with Null Move.
3) Crafty with LMR and without Null Move
4) Crafty with both.

I think we could leave all other Crafty parameters as is which means
the 4th experiment is your current base code.

Once these 4 experiments are done, I'd like to see the PV verification
pruning tested - using a 0 window on the sibling nodes.

I can run those. I think I might start this in a few minutes. I will run two 32,000 game tests for each of the above 4 scenarios. Note that these will be very fast games, which may well exaggerate the effectiveness of any pruning idea, but at least these tests will establish an estimated upper bound on the improvement each provides. As far as the PV verification, this is not something I do in crafty and am not sure exactly what this is unless it is known by another name...

I have the test running. 8 x 32,000 games will take around 8 hours so I should have the results around 11:00pm CST. I will post them when they are done, unless the cluster slows down a bit due to other users and it takes a little longer than expected...

Uri Blass · Post by **Uri Blass** » Tue Dec 02, 2008 11:16 pm

bob wrote:
CRoberson wrote:Yes, we did fall into the discussion of what is each part worth.

So, here is what I'd like to see tested.

1) Crafty without LMR and without Null Move.
2) Crafty without LMR and with Null Move.
3) Crafty with LMR and without Null Move
4) Crafty with both.

I think we could leave all other Crafty parameters as is which means
the 4th experiment is your current base code.

Once these 4 experiments are done, I'd like to see the PV verification
pruning tested - using a 0 window on the sibling nodes.
I can run those. I think I might start this in a few minutes. I will run two 32,000 game tests for each of the above 4 scenarios. Note that these will be very fast games, which may well exaggerate the effectiveness of any pruning idea, but at least these tests will establish an estimated upper bound on the improvement each provides. As far as the PV verification, this is not something I do in crafty and am not sure exactly what this is unless it is known by another name...

I have the test running. 8 x 32,000 games will take around 8 hours so I should have the results around 11:00pm CST. I will post them when they are done, unless the cluster slows down a bit due to other users and it takes a little longer than expected...

Fast games can also reduce the effect of pruning, and without testing we do
not know whether they will reduce or increase it.

If you take an extreme case (which of course does not happen), it is obvious that null move is not used at depth=1, so it changes nothing if Crafty cannot get beyond depth 1.

Uri

Uri Blass · Post by **Uri Blass** » Tue Dec 02, 2008 11:46 pm

CRoberson wrote:While at the 2008 ACCA Pan American Computer Chess Championships,
Bob claimed he didn't believe software played a serious role in all the
rating improvements we've seen. He thought hardware deserved the
credit (assuming I understood the statement correctly. We were jumping
across several subjects and back that night.).

I believe software has had much to do with it for several reasons.
I will start with one. The EBF with only MiniMax is 40. With Alpha-Beta
pruning, it drops to 6. In the early 1990s, the EBF was 4. Now, it is 2.

Dropping the EBF from 4 to 2 is huge. Let's look at a 20 ply search.
The speedup of EBF=2 vs EBF=4 is:
4^20/2^20 = 2^20 = 1,048,576

So, that is over a 1 million x speed up. Has hardware produced that much
since 1992?

Also, I believe eval improvements have caused an improvement in
rating scores.

An example of nonhardware improvements is on the SSDF rating list.
Rybka 1.0 beta scored 2775 on a 450 MHz AMD.

Branching factor proves nothing, because programs that do more pruning play weaker at a fixed depth. But I can say, without relying on branching factor, that the software improvement of recent years is very big, and bigger than the hardware improvement. (I am not sure about the improvement since 1992, because it is not clear how we define it, but I am sure about the improvement from 2005 to 2008.)

Note that Bob's tests can show only that hardware helped more than software for Crafty.

Tests of the SSDF showed the following results.

Rybka 3 A1200 - Deep Shredder 11 Q6600 20-19
Rybka 3 A1200 - Zappa Mexico II Q6600 20-20

A1200 = 1 x 1.2 GHz
Q6600 = 4 x 2.4 GHz

Note that both Zappa and Shredder are clearly stronger than Fruit, which was the leading program in 2005 on single-processor machines.

I think we can safely say that the software improvement in the last 3 years was more than 10:1, and I do not see a hardware improvement of 10:1 in the last 3 years.

Uri

Uri Blass · Post by **Uri Blass** » Tue Dec 02, 2008 11:54 pm

bob wrote:
CRoberson wrote:While at the 2008 ACCA Pan American Computer Chess Championships,
Bob claimed he didn't believe software played a serious role in all the
rating improvements we've seen. He thought hardware deserved the
credit (assuming I understood the statement correctly. We were jumping
across several subjects and back that night.).

I believe software has had much to do with it for several reasons.
I will start with one. The EBF with only MiniMax is 40. With Alpha-Beta
pruning, it drops to 6. In the early 1990s, the EBF was 4. Now, it is 2.

Dropping the EBF from 4 to 2 is huge. Let's look at a 20 ply search.
The speedup of EBF=2 vs EBF=4 is:
4^20/2^20 = 2^20 = 1,048,576

So, that is over a 1 million x speed up. Has hardware produced that much
since 1992?

Also, I believe eval improvements have caused an improvement in
rating scores.

An example of nonhardware improvements is on the SSDF rating list.
Rybka 1.0 beta scored 2775 on a 450 MHz AMD.
I believe my statement was more along the lines of "Hardware has had a _larger_ influence over program performance increases over the past 20 years than the software has." This is based on my cluster testing, where I now know just what some of the "great enhancements" have brought. If you'd like to pick one "revolutionary idea" (null-move? LMR? check extension? some evaluation concept? etc...) I can give it a test using Crafty with and without, assuming Crafty uses the idea you want to compare. I have not found a single +100 Elo idea in Crafty. LMR is a modest improvement. I don't recall the exact amount at present but could compute it. Remember, just because we are physically searching deeper, today's "ply" is not comparable to the "ply" of 20 years ago. Today's "plies" have significantly more errors in them due to various types of pruning and reductions going on...

I'll test whatever you think is the "biggest" to see what happens...

Your cluster tests can prove that hardware helped Crafty more than software.
They cannot prove that it is the case in general.

Uri

bob · Post by **bob** » Wed Dec 03, 2008 7:22 am

Uri Blass wrote:
bob wrote:
CRoberson wrote:Yes, we did fall into the discussion of what is each part worth.

So, here is what I'd like to see tested.

1) Crafty without LMR and without Null Move.
2) Crafty without LMR and with Null Move.
3) Crafty with LMR and without Null Move
4) Crafty with both.

I think we could leave all other Crafty parameters as is which means
the 4th experiment is your current base code.

Once these 4 experiments are done, I'd like to see the PV verification
pruning tested - using a 0 window on the sibling nodes.
I can run those. I think I might start this in a few minutes. I will run two 32,000 game tests for each of the above 4 scenarios. Note that these will be very fast games, which may well exaggerate the effectiveness of any pruning idea, but at least these tests will establish an estimated upper bound on the improvement each provides. As far as the PV verification, this is not something I do in crafty and am not sure exactly what this is unless it is known by another name...

I have the test running. 8 x 32,000 games will take around 8 hours so I should have the results around 11:00pm CST. I will post them when they are done, unless the cluster slows down a bit due to other users and it takes a little longer than expected...
Fast games can also reduce the effect of pruning, and without testing we do
not know whether they will reduce or increase it.

If you take an extreme case (which of course does not happen), it is obvious that null move is not used at depth=1, so it changes nothing if Crafty cannot get beyond depth 1.

Uri

The only problem with that is that I have run the necessary tests for null-move and found that as the depth increases, the Elo benefit reduces. Hence my "upper bound" comment.

bob · Post by **bob** » Wed Dec 03, 2008 7:41 am

Here are the results. Crafty-22.9X1 is normal Crafty. 22.9X2 is normal except that null-move is completely commented out. 22.9X3 is normal except that LMR has been completely disabled. 22.9X4 is normal but with both LMR and null-move removed. The -1 or -2 suffix just means run #1 or run #2, to give a feel for what kind of variation there is between runs.


Rank Name               Elo    +    - games score oppo. draws
   1 Glaurung 2.1      2695    4    4 62256   65%  2585   20% 
   2 Toga2             2695    4    3 62256   64%  2585   20% 
   3 Crafty-22.9X1-1   2638    4    4 31128   51%  2629   21% 
   4 Crafty-22.9X1-2   2636    5    5 31128   51%  2629   21% 
   5 Fruit 2.1         2597    3    3 62256   52%  2585   22% 
   6 Crafty-22.9X2-2   2596    4    4 31128   45%  2629   21% 
   7 Crafty-22.9X3-2   2596    4    4 31128   46%  2629   20% 
   8 Crafty-22.9X2-1   2594    4    5 31128   45%  2629   21% 
   9 Crafty-22.9X3-1   2591    4    5 31128   45%  2629   20% 
  10 Glaurung 1.1 SMP  2530    3    4 62256   43%  2585   19% 
  11 Crafty-22.9X4-1   2517    5    5 31128   35%  2629   19% 
  12 Crafty-22.9X4-2   2514    5    5 31128   35%  2629   18%

Normal Crafty is roughly 2637 in this test. Removing either null-move or LMR drops this by approximately 40 Elo. Removing both drops the rating by around 120 Elo.
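
The averages and drops quoted above can be recomputed from the table with a small sketch. The ratings are copied from the table; the dictionary keys and function names are hypothetical labels chosen for this example, and `expected_score` uses the standard logistic Elo formula, which may not match the exact model of the rating tool that produced the table.

```python
from statistics import mean

# Elo ratings copied from the results table above (run 1, run 2).
RUNS = {
    "both": (2638, 2636),          # Crafty-22.9X1: normal Crafty
    "no_null_move": (2596, 2594),  # Crafty-22.9X2
    "no_lmr": (2596, 2591),        # Crafty-22.9X3
    "neither": (2517, 2514),       # Crafty-22.9X4
}


def average_elo(runs):
    """Average the two runs of each configuration."""
    return {name: mean(values) for name, values in runs.items()}


def drops_from_baseline(averages, baseline="both"):
    """Elo lost relative to normal Crafty for each reduced configuration."""
    return {
        name: averages[baseline] - elo
        for name, elo in averages.items()
        if name != baseline
    }


def expected_score(elo_diff):
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


if __name__ == "__main__":
    averages = average_elo(RUNS)
    for name, drop in drops_from_baseline(averages).items():
        print(f"{name}: -{drop:.1f} Elo, "
              f"baseline expected score {expected_score(drop):.3f}")
```

On these numbers, removing both techniques costs more than the sum of the two single removals, which suggests each one is worth more when the other is absent.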

Null-move and LMR are the two biggest search enhancements of the past 15 years, and together they added about +120 Elo. I could always try normal Crafty, but take the NPS from about 1M on this hardware (I only test with 1 CPU here) and back it down to about 75K, which is what I was getting in 1996 on a Pentium Pro 200 (roughly a factor of 13x), and see how that impacts performance. Although I should probably compare the single-core Pentium Pro against a quad-core Xeon for today, which runs around 10M NPS, as that is a more representative example of what hardware speeds have done since 1996. So 75K to 10M is a factor of 133 or so. But something tells me that a factor of 130x or so is _way_ more than 120 Elo...
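
The hardware ratios implied by the NPS figures quoted in this post can be recomputed directly. This is only a sketch: the constant names are invented for the example, and the NPS values are the approximate ones given in the post, not measurements.

```python
import math

# Approximate nodes-per-second figures quoted in the post.
NPS_1996_PENTIUM_PRO = 75_000      # Pentium Pro 200, 1996
NPS_2008_SINGLE_CPU = 1_000_000    # one CPU of the test cluster
NPS_2008_QUAD_XEON = 10_000_000    # quad-core Xeon


def speed_ratio(new_nps, old_nps):
    """Raw speed factor between two NPS figures."""
    return new_nps / old_nps


def doublings(ratio):
    """Number of speed doublings that a raw speed factor corresponds to."""
    return math.log2(ratio)


if __name__ == "__main__":
    for label, nps in (("single CPU", NPS_2008_SINGLE_CPU),
                       ("quad Xeon", NPS_2008_QUAD_XEON)):
        ratio = speed_ratio(nps, NPS_1996_PENTIUM_PRO)
        print(f"{label}: {ratio:.1f}x faster, "
              f"{doublings(ratio):.2f} doublings")
```

Converting doublings into Elo would need a measured Elo-per-doubling figure, which this thread does not provide at this point.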

Other suggestions???

BTW how many are surprised that removing both is only a 120 Elo loss???

bob · Post by **bob** » Wed Dec 03, 2008 7:43 am

Uri Blass wrote:
bob wrote:
CRoberson wrote:While at the 2008 ACCA Pan American Computer Chess Championships,
Bob claimed he didn't believe software played a serious role in all the
rating improvements we've seen. He thought hardware deserved the
credit (assuming I understood the statement correctly. We were jumping
across several subjects and back that night.).

I believe software has had much to do with it for several reasons.
I will start with one. The EBF with only MiniMax is 40. With Alpha-Beta
pruning, it drops to 6. In the early 1990s, the EBF was 4. Now, it is 2.

Dropping the EBF from 4 to 2 is huge. Let's look at a 20 ply search.
The speedup of EBF=2 vs EBF=4 is:
4^20/2^20 = 2^20 = 1,048,576

So, that is over a 1 million x speed up. Has hardware produced that much
since 1992?

Also, I believe eval improvements have caused an improvement in
rating scores.

An example of nonhardware improvements is on the SSDF rating list.
Rybka 1.0 beta scored 2775 on a 450 MHz AMD.
I believe my statement was more along the lines of "Hardware has had a _larger_ influence over program performance increases over the past 20 years than the software has." This is based on my cluster testing, where I now know just what some of the "great enhancements" have brought. If you'd like to pick one "revolutionary idea" (null-move? LMR? check extension? some evaluation concept? etc...) I can give it a test using Crafty with and without, assuming Crafty uses the idea you want to compare. I have not found a single +100 Elo idea in Crafty. LMR is a modest improvement. I don't recall the exact amount at present but could compute it. Remember, just because we are physically searching deeper, today's "ply" is not comparable to the "ply" of 20 years ago. Today's "plies" have significantly more errors in them due to various types of pruning and reductions going on...

I'll test whatever you think is the "biggest" to see what happens...
Your cluster tests can prove that hardware helped Crafty more than software.
They cannot prove that it is the case in general.

Uri

I'd claim Crafty is pretty representative of _most_ programs of today: null-move R=3, LMR, check extensions, iterative deepening, PVS, simple q-search + checks, etc...

Now if you want to claim most programs are far different from Crafty with respect to search space/techniques, feel free to do so. But I doubt that will convince anyone...
