Hardware vs Software

Tord Romstad · Post by **Tord Romstad** » Thu Dec 04, 2008 5:55 pm

bob wrote:
Uri Blass wrote: It may be interesting to do the same comparison for Glaurung to see how much rating Glaurung earns from LMR and null move.
If you have the time, go for it. I don't have the time to study the source to see what needs to be commented out.

I don't have time to run any tests, but if either of you (or somebody else) wants, I can make a special version where LMR, null move and the tapered super-qsearch (which I suspect is also worth a significant number of Elo points) can easily be switched on and off. Alternatively, if it makes testing easier, I can add some compile-time switches which make it easy to disable the features you want.
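As a rough illustration of what such on/off switches could look like, here is a minimal sketch. The names `SearchOptions`, `null_move_allowed` and `child_depth`, and the thresholds used, are hypothetical and are not taken from Glaurung's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchOptions:
    """Hypothetical feature switches a test build might expose."""
    use_null_move: bool = True
    use_lmr: bool = True


def null_move_allowed(opts: SearchOptions, in_check: bool) -> bool:
    # Null move is only tried when the feature is enabled and the side
    # to move is not in check.
    return opts.use_null_move and not in_check


def child_depth(depth: int, move_index: int, opts: SearchOptions) -> int:
    """Remaining depth for the move at move_index (0-based, after ordering)."""
    # Illustrative thresholds: only moves ordered late, at sufficient depth.
    late = move_index >= 3 and depth >= 3
    if opts.use_lmr and late:
        return depth - 2  # late move: one extra ply of reduction
    return depth - 1      # normal one-ply step
```

With `SearchOptions(use_lmr=False)` every move is searched at the normal depth, so the same binary can be tested with and without the reduction.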

By the way, it's amusing to see that LMR is now generally accepted as effective. Back when I started advocating it, the technique had been largely abandoned for many years, and those few programmers I managed to convince to give it a try mostly reported that it didn't work for them.

Tord

CRoberson · Post by **CRoberson** » Thu Dec 04, 2008 6:15 pm

bob wrote: I ran this overnight. I simply made Evaluate() return the material score only. It was almost exactly a 400 point drop in Elo from the version with the most recent evaluation.
Code:
Crafty-22.9R01     2650    5    5 31128   51%  2644   21% 
Crafty-22.9R02     2261    5    6 31128    9%  2644    7% 

This is good info. However, I expected more: Telepath lost more than 500, but Crafty only 400. Maybe Crafty's search is much better. I think increasing the time controls for deeper searches may increase the gap. With deeper searches, the horizon effect may be reduced but not eliminated.

So far, we have eval = 400 (minimum) plus NM+LMR = 120, which totals at least 520 Elo. Currently, a pure brute-force Crafty without NM+LMR would be 2140 = below Master!

This leads to something else I've been saying for a while: the college AI textbooks are an insufficient resource to produce a Master-level chess program on today's single-processor hardware!

More than a few professors have looked at me strangely after I told them that. They thought that hardware speed had everything to do with it.

BubbaTough · Post by **BubbaTough** » Thu Dec 04, 2008 6:33 pm

CRoberson wrote: This leads to something else I've been saying for a while: the college AI textbooks are an insufficient resource to produce a Master-level chess program on today's single-processor hardware!

More than a few professors have looked at me strangely after I told them that. They thought that hardware speed had everything to do with it.

This is so obviously true it's not funny. I think you can waive the single-processor limitation, and perhaps lower the target strength to expert level, and still be right. Even a chess player / programmer / genius would take years to come up with a decent program without any hints about how to do fast move generation and have an effective quiescence search, let alone anything fancy. Pure alpha-beta is just not good enough to compete with tournament-level humans, even with today's hardware.

-Sam

krazyken · Post by **krazyken** » Thu Dec 04, 2008 6:52 pm

mhull wrote:
A more harmonious (and hard to find) tuning combination of commonly known elements is no more a software improvement than a more pleasing (and hard to find) combination of dials on a Moog is a synthesizer improvement.

Well, I'm fairly certain that the tuning is not a hardware improvement. So you are saying there are parts of a chess program that are neither hardware nor software? Figuring out how to maximize the combination of technique A and technique B sounds to me like a definite software improvement.

mhull · Post by **mhull** » Thu Dec 04, 2008 7:00 pm

krazyken wrote:
mhull wrote:
A more harmonious (and hard to find) tuning combination of commonly known elements is no more a software improvement than a more pleasing (and hard to find) combination of dials on a Moog is a synthesizer improvement.
Well, I'm fairly certain that the tuning is not a hardware improvement. So you are saying there are parts of a chess program that are neither hardware nor software? Figuring out how to maximize the combination of technique A and technique B sounds to me like a definite software improvement.

Variables/parameters/inputs aren't software. So there are three elements: Hardware, software and tuning.

I've made sophisticated report programs that are data (input) driven. The report outputs automatically adjust/transform based on data configuration changes. So a change to data doesn't change the software, but it does change the output results.

If the current top chess program is using the same techniques as all the others, but the balancing of them is more optimal, then how could you say there was a software advance? Tuning a car engine doesn't create a more advanced engine.

bob · Post by **bob** » Thu Dec 04, 2008 9:42 pm

Uri Blass wrote:
bob wrote: I ran this overnight. I simply made Evaluate() return the material score only. It was almost exactly a 400 point drop in Elo from the version with the most recent evaluation.
Code:
Crafty-22.9R01     2650    5    5 31128   51%  2644   21% 
Crafty-22.9R02     2261    5    6 31128    9%  2644    7% 
I am surprised because I expected a bigger difference.

I could expect something like this from a piece-square-table evaluation (no pawn structure, mobility, or king safety), but I believe that the difference between material-only evaluation and normal evaluation is something like 1000 Elo and not on the order of 400 Elo.

It may be interesting to see some games that Crafty (material only) could win.

Uri

About all I can offer is a huge wad of PGN games, 32,000 to be exact...

bob · Post by **bob** » Thu Dec 04, 2008 9:47 pm

Carey wrote:
bob wrote: I ran this overnight. I simply made Evaluate() return the material score only. It was almost exactly a 400 point drop in Elo from the version with the most recent evaluation.
Code:
Crafty-22.9R01     2650    5    5 31128   51%  2644   21% 
Crafty-22.9R02     2261    5    6 31128    9%  2644    7% 
Interesting tests... I hope you continue to run tests to determine the approximate value of different chess ideas. Something like this is really worth doing and publishing.

Could you please try Don Beal's random mobility estimator thing? Ever since you mentioned it I've been really curious as to how well that would really work, compared to just material and/or some other simple and quick mobility scoring method.

Anyway, what were the time conditions for this and the hardware used?

And do you think these results would carry over to slower games?

Carey

The games are all 10 sec + 0.1 sec: 10 seconds on the initial clock, 1/10th of a second added after each move. The hardware is a 64-bit Intel box with E5345 2.3 GHz processors and 12 GB of RAM. Crafty's typical NPS (all tests use one processor, no pondering) is about 3M or so.

I think I might try that. Later this week I think I will run standard Crafty 3x on this test, then Crafty with only material scoring, then Crafty with material + a random value added at each endpoint. 9 runs take about 9 hours or so, and it would be interesting to see where random evaluation fits in between material only and full evaluation.
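The two stripped-down evaluators in that plan can be sketched roughly as follows. This is only an illustration, not Crafty's code: the piece values, the `spread` parameter, the use of a FEN piece-placement string as input, and the function names are all assumptions.

```python
import random

# Illustrative centipawn values; Crafty's actual values may differ.
PIECE_VALUES = {"p": 100, "n": 300, "b": 300, "r": 500, "q": 900, "k": 0}


def material_score(fen_placement: str) -> int:
    """Material balance from White's point of view.

    fen_placement is the piece-placement field of a FEN string;
    uppercase letters are White pieces, lowercase are Black.
    """
    score = 0
    for ch in fen_placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits and '/' separators carry no material
        score += value if ch.isupper() else -value
    return score


def material_plus_random(fen_placement: str, rng: random.Random,
                         spread: int = 50) -> int:
    """Material plus a uniform random term in [-spread, spread]."""
    return material_score(fen_placement) + rng.randint(-spread, spread)
```

The random term is meant to be drawn afresh at each endpoint of the search. The idea behind the random-evaluation experiment is that a side with more available moves gets more draws and therefore, on average, a better best one, so the noise behaves like a crude mobility term.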

bob · Post by **bob** » Thu Dec 04, 2008 9:50 pm

CRoberson wrote:
bob wrote: I ran this overnight. I simply made Evaluate() return the material score only. It was almost exactly a 400 point drop in Elo from the version with the most recent evaluation.
Code:
Crafty-22.9R01     2650    5    5 31128   51%  2644   21% 
Crafty-22.9R02     2261    5    6 31128    9%  2644    7% 
This is good info. However, I expected more: Telepath lost more than 500, but Crafty only 400. Maybe Crafty's search is much better. I think increasing the time controls for deeper searches may increase the gap. With deeper searches, the horizon effect may be reduced but not eliminated.

So far, we have eval = 400 (minimum) plus NM+LMR = 120, which totals at least 520 Elo. Currently, a pure brute-force Crafty without NM+LMR would be 2140 = below Master!

Not so sure. You are making the same mistake everyone else makes, namely treating the Elo numbers I publish as absolute numbers. They are anything but. All you can conclude is that the _difference_ between the two programs is about 400. The good version should win 95% of the games or so. But both programs could be above GM, or both could be below 2000 for all we know.
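The point that only the difference carries information can be checked against the usual logistic Elo model. This is a minimal sketch assuming that model; the rating tool that produced the table may use a somewhat different one, and `expected_score` is a name chosen here for illustration.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score (wins plus half of draws) for the stronger side,
    under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# Only the difference enters the formula, so shifting both ratings by
# the same amount leaves the prediction unchanged.
same_gap_high = expected_score(2650 - 2261)
same_gap_low = expected_score(1650 - 1261)
```

Under this model a gap of exactly 400 points corresponds to an expected score of 10/11, roughly 0.91 when draws count as half a point, and `same_gap_high` equals `same_gap_low` no matter where the two programs sit on the absolute scale.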

This leads to something else I've been saying for a while: the college AI textbooks are an insufficient resource to produce a Master-level chess program on today's single-processor hardware!

More than a few professors have looked at me strangely after I told them that. They thought that hardware speed had everything to do with it.

Think about the results. Even if the numbers I published were absolute ratings, normal tree search is 2200 or so by itself, which is hardware-derived performance. That is the biggest part of the rating by far.

bob · Post by **bob** » Thu Dec 04, 2008 9:54 pm

I found what is apparently an old Jakarta version, which I think was 10.18 or something similar. Someone sent it to me several years ago when I asked for old versions after I lost all of mine to a disk crash and a total backup system failure. The problem is that these old versions are not easy to make work today. They had some embedded 32-bit code in them, which causes problems. I am going to try to get the Jakarta version to compile cleanly using everything possible from the original, except for modifying the assembly language to work with 64-bit registers...

Once I get it to work, I will put it in an "ancient" directory of some sort on the FTP box.

bob · Post by **bob** » Thu Dec 04, 2008 9:56 pm

Tord Romstad wrote:
bob wrote:
Uri Blass wrote: It may be interesting to do the same comparison for Glaurung to see how much rating Glaurung earns from LMR and null move.
If you have the time, go for it. I don't have the time to study the source to see what needs to be commented out.
I don't have time to run any tests, but if either of you (or somebody else) wants, I can make a special version where LMR, null move and the tapered super-qsearch (which I suspect is also worth a significant number of Elo points) can easily be switched on and off. Alternatively, if it makes testing easier, I can add some compile-time switches which make it easy to disable the features you want.

By the way, it's amusing to see that LMR is now generally accepted as effective. Back when I started advocating it, the technique had been largely abandoned for many years, and those few programmers I managed to convince to give it a try mostly reported that it didn't work for them.

Tord

I'm one of "those".

Bruce and I experimented with this idea back in 1996, right around the Jakarta WCCC tournament. We liked the increased speed with which the plies flew by, but we were seeing mistakes. We did not try to restrict the reductions in any very rational way, though. I am not sure I am restricting them very well today, but they obviously make a big difference, as testing has shown.
