While at the 2008 ACCA Pan American Computer Chess Championships,
Bob claimed he didn't believe software played a serious role in all the
rating improvements we've seen. He thought hardware deserved the
credit (assuming I understood the statement correctly; we were jumping
across several subjects and back that night).
I believe software has had much to do with it, for several reasons.
I will start with one. The effective branching factor (EBF) with only
MiniMax is 40. With Alpha-Beta pruning, it drops to 6. In the early
1990s, the EBF was 4. Now, it is 2.
Dropping the EBF from 4 to 2 is huge. Let's look at a 20-ply search.
The speedup of EBF=2 vs EBF=4 is:
4^20/2^20 = 2^20 = 1,048,576
So, that is a speedup of over a million times. Has hardware produced that
much since 1992?
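To make the arithmetic above concrete, here is a minimal Python sketch. It assumes the usual rough model that a search to depth d visits about EBF^d nodes; the function name nodes_searched is only for illustration, and constant factors are ignored.

```python
import math


def nodes_searched(ebf: int, depth: int) -> int:
    """Rough tree size under the simple model nodes ~ ebf ** depth."""
    return ebf ** depth


DEPTH = 20  # the 20-ply search from the post

# Ratio of tree sizes at the same depth: EBF=4 versus EBF=2.
speedup = nodes_searched(4, DEPTH) // nodes_searched(2, DEPTH)

# The same ratio expressed as a number of hardware speed doublings.
doublings = math.log2(speedup)

print(f"speedup at {DEPTH} plies: {speedup:,}")
print(f"equivalent hardware doublings: {doublings:.0f}")
```

Under this model, the EBF change at 20 plies is worth as much as 20 doublings of raw speed. The model only counts nodes; it does not say how much playing strength each ply is worth.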
Also, I believe eval improvements have caused an improvement in
rating scores.
An example of non-hardware improvement is on the SSDF rating list:
Rybka 1.0 beta scored 2775 on a 450 MHz AMD.
Hardware vs Software
Moderator: Ras
-
- Posts: 2080
- Joined: Mon Mar 13, 2006 2:31 am
- Location: North Carolina, USA
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Hardware vs Software
I believe my statement was more along the lines of "Hardware has had a _larger_ influence over program performance increases over the past 20 years than the software has." This is based on my cluster testing, where I now know just what some of the "great enhancements" have brought. If you'd like to pick one "revolutionary idea" (null-move? LMR? check extension? some evaluation concept? etc.), I can give it a test using Crafty with and without, assuming Crafty uses the idea you want to compare. I have not found a single +100 Elo idea in Crafty. LMR is a modest improvement; I don't recall the exact amount at present but could compute it. Remember, just because we are physically searching deeper, today's "ply" is not comparable to the "ply" of 20 years ago. Today's "plies" have significantly more errors in them due to the various types of pruning and reductions going on...
I'll test whatever you think is the "biggest" to see what happens...
-
- Posts: 2080
- Joined: Mon Mar 13, 2006 2:31 am
- Location: North Carolina, USA
Re: Hardware vs Software
Yes, we did fall into the discussion of what each part is worth.
So, here is what I'd like to see tested.
1) Crafty without LMR and without Null Move.
2) Crafty without LMR and with Null Move.
3) Crafty with LMR and without Null Move.
4) Crafty with both.
I think we could leave all other Crafty parameters as is, which means
the 4th experiment is your current base code.
Once these 4 experiments are done, I'd like to see the PV verification
pruning tested, using a 0 window on the sibling nodes.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Hardware vs Software
I can run those. I think I might start this in a few minutes. I will run two 32,000-game tests for each of the above 4 scenarios. Note that these will be very fast games, which may well exaggerate the effectiveness of any pruning idea, but at least these tests will establish an estimated upper bound on the improvement each provides. As for the PV verification, this is not something I do in Crafty, and I am not sure exactly what it is unless it is known by another name...
I have the test running. 8 x 32,000 games will take around 8 hours, so I should have the results around 11:00pm CST. I will post them when they are done, unless the cluster slows down a bit due to other users and it takes a little longer than expected...
-
- Posts: 10682
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Hardware vs Software
Fast games can also reduce the effect of pruning, and without testing we do not know whether they are going to reduce or increase the effect.
If you take an extreme case (which of course does not happen), it is obvious that null move is not used at depth=1, so it changes nothing if Crafty cannot get beyond depth 1.
Uri
-
- Posts: 10682
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Hardware vs Software
Branching factor proves nothing, because programs that do more pruning play weaker at fixed depth. But I can say, not based on branching factor, that the improvement in software in the last few years is very big, and bigger than the improvement in hardware (I am not sure about the improvement since 1992, because it is not clear how we define it, but I am sure about the improvement from 2005 to 2008).
Note that Bob's tests can show only that hardware helped more than software for Crafty.
Tests by the SSDF showed the following results:
Rybka 3 A1200 - Deep Shredder 11 Q6600 20-19
Rybka 3 A1200 - Zappa Mexico II Q6600 20-20
A1200 = 1 x 1.2 GHz
Q6600 = 4 x 2.4 GHz
Note that both Zappa and Shredder are clearly stronger than Fruit, which was the leading program in 2005 for single-processor machines.
I think that we can safely say that the software improvement in the last 3 years was more than 10:1, and I do not see a hardware improvement of 10:1 in the last 3 years.
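As a rough check on the SSDF figures above, here is a small Python sketch. It reads "20-19" and "20-20" as points scored by each side (an assumption), converts the score to an Elo difference with the standard logistic formula, and computes a crude cores-times-clock ratio between the two machines. That ratio ignores per-clock differences between the CPUs and imperfect multi-core scaling, so treat it only as an order-of-magnitude figure.

```python
import math


def elo_diff(points_for: float, points_against: float) -> float:
    """Elo difference implied by a match score (logistic model)."""
    score = points_for / (points_for + points_against)
    return 400 * math.log10(score / (1 - score))


# Rybka 3 on the A1200 versus each opponent on the Q6600.
vs_shredder = elo_diff(20, 19)
vs_zappa = elo_diff(20, 20)

# Crude hardware ratio: cores * GHz (Q6600 over A1200).
raw_hw_ratio = (4 * 2.4) / (1 * 1.2)

print(f"vs Deep Shredder 11: {vs_shredder:+.1f} Elo")
print(f"vs Zappa Mexico II:  {vs_zappa:+.1f} Elo")
print(f"raw cores x clock ratio: {raw_hw_ratio:.1f}x")
```

With only about 40 games per match, the error bars on these Elo differences are wide; the point is just that the results are close to even despite the hardware gap.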
Uri
-
- Posts: 10682
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: Hardware vs Software
Your cluster tests can prove that hardware helped Crafty more than software.
They cannot prove that it is the case in general.
Uri
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Hardware vs Software
Uri Blass wrote: fast games can also reduce the effect of pruning and we do not know if it is going to reduce or increase the effect without testing.
The only problem with that is that I have run the necessary tests for null-move and found that as the depth increases, the Elo benefit reduces. Hence my "upper bound" comment.
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Hardware vs Software - test results
Here are the results. Crafty-22.9X1 is normal Crafty. 22.9X2 is normal except that null-move is completely commented out. 22.9X3 is normal except that LMR has been completely disabled. 22.9X4 is normal but with both LMR and null-move removed. The -1 or -2 suffix just means run #1 or run #2, to give a feel for what kind of variation there is between runs.
Normal is roughly 2637 in this test. Removing null-move or LMR drops this by approximately 40 Elo. Removing both drops the rating by around 120 Elo.
Rank Name               Elo    +    -  games score oppo. draws
   1 Glaurung 2.1      2695    4    4  62256   65%  2585   20%
   2 Toga2             2695    4    3  62256   64%  2585   20%
   3 Crafty-22.9X1-1   2638    4    4  31128   51%  2629   21%
   4 Crafty-22.9X1-2   2636    5    5  31128   51%  2629   21%
   5 Fruit 2.1         2597    3    3  62256   52%  2585   22%
   6 Crafty-22.9X2-2   2596    4    4  31128   45%  2629   21%
   7 Crafty-22.9X3-2   2596    4    4  31128   46%  2629   20%
   8 Crafty-22.9X2-1   2594    4    5  31128   45%  2629   21%
   9 Crafty-22.9X3-1   2591    4    5  31128   45%  2629   20%
  10 Glaurung 1.1 SMP  2530    3    4  62256   43%  2585   19%
  11 Crafty-22.9X4-1   2517    5    5  31128   35%  2629   19%
  12 Crafty-22.9X4-2   2514    5    5  31128   35%  2629   18%
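One quick way to read the Crafty rows of the table is to average the two runs of each version and subtract from the normal build. The Python sketch below does that; the Elo values are copied from the table, while the short labels and variable names are only for illustration.

```python
# Elo ratings from the table, two runs per Crafty version.
elo_runs = {
    "X1 normal": (2638, 2636),
    "X2 no null-move": (2596, 2594),
    "X3 no LMR": (2596, 2591),
    "X4 neither": (2517, 2514),
}

# Average the two runs of each version.
mean_elo = {name: sum(runs) / len(runs) for name, runs in elo_runs.items()}

# Elo lost relative to the normal build.
baseline = mean_elo["X1 normal"]
drop = {name: baseline - elo for name, elo in mean_elo.items()}

# How much more removing both costs than the two single removals added up.
extra_when_both_removed = drop["X4 neither"] - (
    drop["X2 no null-move"] + drop["X3 no LMR"]
)

for name, lost in drop.items():
    print(f"{name:<16} mean {mean_elo[name]:7.1f}  drop {lost:6.1f}")
print(f"extra loss when both are removed: {extra_when_both_removed:.1f}")
```

On these numbers the single removals cost 42 and 43.5 Elo, and removing both costs 121.5, which is 36 Elo more than the two single drops added together. The table's own error bars are about 4 to 5 Elo per entry, so the single-removal figures are not distinguishable from each other.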
Null-move and LMR are the two biggest search enhancements of the past 15 years, and they added +120 Elo. I could always try normal Crafty, but take the NPS from about 1M on this hardware (I only test with 1 CPU here) and back it down to about 75K, which is what I was getting in 1996 on a Pentium Pro 200 (roughly a factor of 13x), and see how that impacts performance. Although I should probably compare the single-core Pentium Pro with a quad-core Xeon for today, which runs around 10M NPS, as that is a more representative example of what hardware speeds have done since 1996. So 75K to 10M is a factor of 133 or so. But something tells me that a factor of 133x is _way_ more than 120 Elo...
Other suggestions???
BTW, how many are surprised that removing both is only a 120 Elo loss???
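The NPS figures quoted in the post can be divided directly to see how large the hardware factor is. The Python sketch below does that and then asks how many Elo per speed doubling hardware would need to be worth to match the 120 Elo measured for null-move plus LMR. It assumes Elo grows linearly with the logarithm of speed, which is a modeling assumption and not something measured in this thread; the constant and function names are illustrative.

```python
import math

NPS_1996 = 75_000         # Pentium Pro 200, per the post
NPS_ONE_CPU = 1_000_000   # one CPU of the test hardware, per the post
NPS_QUAD = 10_000_000     # quad-core Xeon, per the post
SEARCH_GAIN_ELO = 120     # null-move + LMR combined, from the test


def break_even_elo_per_doubling(speed_ratio: float, target_elo: float) -> float:
    """Elo per doubling at which a speed ratio is worth target_elo.

    Assumes Elo is linear in log2(speed), which is only a rough model.
    """
    return target_elo / math.log2(speed_ratio)


ratio_one_cpu = NPS_ONE_CPU / NPS_1996
ratio_quad = NPS_QUAD / NPS_1996

for label, ratio in (("1 CPU", ratio_one_cpu), ("quad", ratio_quad)):
    per_doubling = break_even_elo_per_doubling(ratio, SEARCH_GAIN_ELO)
    print(
        f"{label}: {ratio:.1f}x faster, {math.log2(ratio):.2f} doublings, "
        f"break-even at {per_doubling:.1f} Elo per doubling"
    )
```

Under that assumption, the quad-core comparison works out to roughly 133x, or about 7 doublings, so hardware would need to be worth only about 17 Elo per doubling to equal the 120 Elo from the two search techniques.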
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: Hardware vs Software
Uri Blass wrote: Your cluster tests can prove that hardware helped Crafty more than software. They cannot prove that it is the case in general.
I'd claim Crafty is pretty representative of _most_ programs of today: null-move R=3, LMR, check extensions, iterative deepening, PVS, simple q-search + checks, etc.
Now if you want to claim most programs are far different from Crafty with respect to search space/techniques, feel free to do so. But I doubt that will convince anyone...