old crafty vs new crafty on new hardware.

bob · Post by **bob** » Sat Sep 11, 2010 3:38 am

Finally got the thing to run. Not exactly a fair comparison yet, as the new version is compiled using PGO while the old version is not. I will work on that next. But for the results... this is using just one cpu on the E5345 box I mentioned. I picked the first position from my test file and ran both. Had to (obviously) use different depths since the old program doesn't do LMR or anywhere near the forward-pruning 23.4 is doing. But check out the nps numbers:

log.001: time=9.63 mat=0 n=38507117 fh=91% nps=4.0M
log.002: time: 6.39 cpu:100% mat:0 n:25801056 nps:4017283

For this position, the NPS is almost exactly the same, which is pretty damned good. Likely would mean that the old program can reach 4.5M nps with PGO (assuming 10%, it could be a bit more). I sort of expected the old version to be faster since the new version has a bit more in the eval, but they end up close.

And when you add in SMP, where both versions go to over 30M nodes per second, it seems that at least for Crafty, that 1000x number is correct. Got some cleanup to do (old version only uses protocol version 1 stuff, but my referee expects "move xxx", so I have to get that to work next).

I will add that for the above position, new version (log.001) searched to depth=19, old version (log.002) searched to depth=12. Tried to find something close to comparable. Quite a difference in depth, but the plies are nowhere near equivalent. Be interesting to see how this old version performs on the cluster test...
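The NPS figures in the two log lines can be cross-checked from the node counts and times. The sketch below is only an illustration: the parsing helper and its names are hypothetical (not part of Crafty), and the 10% PGO gain is the rough assumption stated in the post, not a measurement.

```python
import re

# The two log lines quoted in the post; the old and new versions
# use different separators ("=" versus ":").
LOG_LINES = {
    "log.001 (23.4)": "time=9.63 mat=0 n=38507117 fh=91% nps=4.0M",
    "log.002 (10.18)": "time: 6.39 cpu:100% mat:0 n:25801056 nps:4017283",
}

PGO_GAIN = 0.10  # assumption taken from the post ("assuming 10%")


def parse_fields(line):
    """Return {key: value} for 'key=value' or 'key: value' tokens."""
    return dict(re.findall(r"(\w+)[=:]\s*(\S+)", line))


def nodes_per_second(line):
    """Recompute NPS as nodes searched divided by elapsed time."""
    fields = parse_fields(line)
    return int(fields["n"]) / float(fields["time"])


if __name__ == "__main__":
    for name, line in LOG_LINES.items():
        nps = nodes_per_second(line)
        print(f"{name}: {nps / 1e6:.2f}M nps, "
              f"{nps * (1 + PGO_GAIN) / 1e6:.2f}M with an assumed PGO gain")
```

The recomputed value for log.002 differs slightly from the nps the program printed, presumably because the displayed time is rounded or measured differently; either way both versions land near 4.0M.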

bob · Post by **bob** » Sat Sep 11, 2010 3:43 am

Ack. More than I thought. Need some protover 2 stuff (myname) which my referee needs for the pgn. Old crafty only used "setboard" while new will accept a straight FEN string by itself. Referee does not send setboard, looks easier to fix referee than old version. Most likely there are other things as well. Looks like something to play around with for the weekend.. Really want to get an Elo for this 1995 version, if I can...
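For illustration, the referee-side fix could look something like the sketch below. This is not the actual referee code; `position_command` and its flag are hypothetical names. It only relies on the xboard `setboard FEN` command, which is the form the old version expects.

```python
def position_command(fen, engine_needs_setboard):
    """Build the line a referee would send to load a start position.

    engine_needs_setboard is a hypothetical per-engine flag: True for an
    engine (like the old version here) that only accepts "setboard FEN",
    False for one that accepts a bare FEN string.
    """
    fen = " ".join(fen.split())  # normalise whitespace between FEN fields
    return f"setboard {fen}" if engine_needs_setboard else fen


if __name__ == "__main__":
    start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    print(position_command(start, engine_needs_setboard=True))
    print(position_command(start, engine_needs_setboard=False))
```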

mhull · Post by **mhull** » Sat Sep 11, 2010 6:37 am

What version is old crafty, 9.x or thereabouts or earlier than that?

bob · Post by **bob** » Sat Sep 11, 2010 4:25 pm

This is 10.18, which is all I have. This played in the 1996 WMCCC event during the Summer (Jakarta event). Back in 1995 I had a complete disk failure that lost all old versions, and we discovered that our tape backup system was merrily writing backup tapes that could not be read. Versions thru 9 were done very early in 1995. If you look at the comments, most new versions (major versions) were done quickly as major features were added...

Version 10.0 was a new book format (learning etc) and was started in August/September 1995. In looking at the comments, most changes were related to that. We didn't release 10.18 until Jakarta was done, and at that point the new versions started to slow down. Thru the middle of the 10.x series, I was releasing a new version almost daily, fixing bugs or adding features that were requested, many of which did not improve the chess playing (annotate code, analyze mode for analysis, etc.)

bob · Post by **bob** » Sat Sep 11, 2010 7:24 pm

This is quite early, but is perhaps a bit surprising. Looks like I have this 10.x version working on the cluster (will have to wait for a complete run and check the PGN for oddities to make sure it is not losing on time excessively or anything). Here are the results compared to 23.4, and some of the lower-rated programs in my test group:

   Name                Elo    +    - games score oppo. draws
   Crafty-23.4        2703    4    4 30000   66%  2579   22%
   Crafty-23.3        2693    4    4 30000   65%  2579   22%
   Crafty-23.1        2622    4    4 30000   55%  2579   23%
   Glaurung 2.2       2606    3    3 60277   46%  2636   22%
   Toga2              2599    3    3 60275   45%  2636   23%
   Fruit 2.1          2501    3    3 60248   32%  2636   21%
   Glaurung 1.1 SMP   2444    3    3 60267   26%  2636   17%
   Crafty-10.18       2326   19   19  1327   20%  2580   14%

I was thinking this would be much worse. To clarify what the above is...

Everything is running on our cluster. This is the cluster I have used to post _all_ results here in recent years, it is hardware about 4 years old as previously mentioned. Crafty-10.18 is about 10% slower than it should be as I have yet to tackle the PGO stuff. Took a lot of work to make the old version work with more modern xboard protocol. Had a lot of fun with force and such.

I'll report the final results for this run, although this will not be the overall "final results." Got to make sure nothing odd is happening in the PGN, and then get the PGO working.

More later, but at least it seems to be playing... All this really measures is "how far behind is 1995 Crafty, giving everyone equal (and modern) hardware?" I'd suspect it would not be as far behind if everyone was on a P5/90, will work on that angle later.

bob · Post by **bob** » Sat Sep 11, 2010 8:32 pm

Did find one small bug and have re-started. En Passant target changed. In 1995 my FEN parser assumed that the target was the square the pawn stopped on, not the square the pawn passed over. This caused an occasional time loss as the few positions with EP captures would cause Crafty to lose on time. There were not many, statistically, but I have fixed it and have re-started the test. Will let it run for 15 minutes or so to see if I see any other time losses that should not happen in an increment game...
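To make the two conventions concrete: in standard FEN the en passant field names the square the pawn passed over, while the 1995 parser read it as the square the pawn stopped on. The sketch below is a hypothetical helper, not Crafty's parser; it returns both squares from a standard FEN string so the one-rank difference is visible.

```python
def ep_squares(fen):
    """Return (target, pawn_square) from a standard FEN, or None if no ep.

    target      is the FEN en passant field: the square the pawn passed over.
    pawn_square is where the double-pushed pawn actually stands, which is
                what the older convention described in the post expected.
    """
    fields = fen.split()
    side_to_move, target = fields[1], fields[3]
    if target == "-":
        return None
    file, rank = target[0], int(target[1])
    # White to move: a black pawn passed over rank 6 and stands on rank 5.
    # Black to move: a white pawn passed over rank 3 and stands on rank 4.
    pawn_rank = rank - 1 if side_to_move == "w" else rank + 1
    return target, f"{file}{pawn_rank}"


if __name__ == "__main__":
    after_e4 = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"
    print(ep_squares(after_e4))
```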

bob · Post by **bob** » Sat Sep 11, 2010 9:10 pm

Looks pretty good so far. 6000+ games, 6 lost on time, all by old crafty. In two of those it was winning, 1 was lost, and 3 were just games. Not going to try to fix this as this is what was in 1995...

Don · Post by **Don** » Sun Sep 12, 2010 1:17 am

Ok, are these running head to head with no time handicap?

Don · Post by **Don** » Sun Sep 12, 2010 2:20 am

Hi Bob,

I think these numbers are proving that software is a bigger contributor to chess improvement than hardware. I am surprised that YOUR test (which I consider biased) is proving MY point in this case.

Don

bob · Post by **bob** » Sun Sep 12, 2010 5:11 am

Don wrote:Ok, Are these running head to head with no time handicap?

I've explained that several times now. This is version 10.x, compiled on my E5345 box, running perfectly normally. Same time control that I used to produce the 23.4 results. No time handicap. Same (actually almost the same) hash sizes. 23.4 uses a power of 2 since bucketsize=4, 10.x uses the older Belle approach which means 3/4 of a power of 2. So 10.x is using 3/4 the hash size of 23.4... can't fix that without changing the hash, and then it would not be quite 10.x any more.

Here are the final numbers:

    Name                  Elo    +    - games score oppo. draws
    Crafty-23.4-2        2749    3    3 30000   66%  2626   22%
    Crafty-23.4-1        2746    3    3 30000   66%  2626   22%
    Crafty-10.18-1       2388    4    4 30000   22%  2626   14%
    Crafty-10.18-2       2387    4    4 30000   22%  2626   14%

I ran the test twice to check consistency. Elo difference is about 360. So for _my_ program, hardware has provided much more than software. 1500x is over 10 doublings. If I only got 36 per doubling, it would be a break-even, but it is higher than that by measurement over the years.
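The break-even arithmetic can be spelled out as a short sketch. The ratings are the four final numbers from the table; the 1500x and 1000x hardware speedups are the estimates quoted in the thread, not something measured here, and the function names are just illustrative.

```python
import math

# Final ratings from the two runs reported in the post.
NEW_RATINGS = [2749, 2746]   # Crafty-23.4
OLD_RATINGS = [2388, 2387]   # Crafty-10.18


def mean(values):
    return sum(values) / len(values)


def doublings(speedup):
    """Number of speed doublings contained in a hardware speedup factor."""
    return math.log2(speedup)


def break_even_elo_per_doubling(software_gap, speedup):
    """Elo per doubling at which hardware gain equals the software gain."""
    return software_gap / doublings(speedup)


if __name__ == "__main__":
    gap = mean(NEW_RATINGS) - mean(OLD_RATINGS)
    print(f"software gap: {gap:.0f} Elo")
    for speedup in (1000, 1500):  # speedup estimates quoted in the thread
        print(f"{speedup}x = {doublings(speedup):.2f} doublings, "
              f"break-even at {break_even_elo_per_doubling(gap, speedup):.1f} "
              f"Elo per doubling")
```

With exactly 10 doublings the break-even is the 36 Elo per doubling mentioned above; using the full log2 of 1500 lowers it slightly, which only strengthens the same conclusion if the real gain per doubling is higher than that.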
