That's inaccurate. Null-move was first discussed by Don Beal in "Selective search without tears", published in 1986. The 1989 WCCC version of Cray Blitz used null-move, non-recursively, with R=1. The Singular Extension paper by Hsu and Campbell also mentions null-move search. Fritz was far from the first, and the Donninger paper was not the one that got most of the early users going. Fritz pushed R=2, however. But even recursive null-move was in the version of Cray Blitz that has been made available through Carey and his historic computer chess program web site. We just used R=1 back then. Fritz got everyone using R=2.

diep wrote:
Whatever you try or do, measuring the improvement of software versus hardware is going to be very difficult, because the improvement is not a constant, but dependent upon which search depth you use and which testing method.

bhlangonijr wrote:
Very interesting thread.
Let us start from the beginning. I'm wondering if there is a scientific method to verify this contest.
How can we quantify the "gain function" from benchmarking the chess-playing agent (hardware x and software y) in order to determine which factor had the greater influence on playing strength? Just by computing Elo gains in some kind of tournament?
Is the approach suggested by H.G. Muller sufficient to verify this?
Could we emulate the old hardware playing the engines with time/resource handicap?
Perhaps this question is more complicated than the simplicity of its statement suggests. As Bob mentioned, old software is tuned to old hardware, so it is not accurate to benchmark the old software on today's hardware and so on. In other words, maybe we should consider testing the underlying ideas of a given program from a given year rather than its actual implementation.
Whatever it takes, my bet is on the hardware.
Note that nullmove only got widely adopted after 1995, after Fritz became world champion with recursive nullmove. Donninger's generic 1991 publication didn't yet trigger anyone to use it. It was Frans Morsch's world title that woke everyone up.
It is not a quantum leap. I played some 32,000 game matches and posted the results here a while back. Removing Null-move _and_ LMR costs about 100 points Elo or so (I don't remember the exact numbers, but they can be found in a thread from late last year here)
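For readers who haven't seen the technique: here is a minimal sketch of recursive null-move pruning with a fixed reduction R=2, the Fritz-style setting discussed above. It is illustrative C, not code from Crafty, Fritz, Diep or any other engine mentioned here; the Position type and helper functions are hypothetical stubs.

```c
/* Minimal sketch of recursive null-move pruning with fixed R = 2.
   Illustrative only: Position and the helpers below are hypothetical
   stubs, not the API of any engine discussed in this thread. */
#define R 2

typedef struct Position Position;          /* opaque board state (stub) */

int  evaluate(Position *pos);              /* static eval (stub)        */
int  in_check(Position *pos);              /* side to move in check?    */
int  has_non_pawn_material(Position *pos); /* crude zugzwang guard      */
void make_null_move(Position *pos);        /* hand the move over        */
void undo_null_move(Position *pos);

int search(Position *pos, int depth, int alpha, int beta) {
    if (depth <= 0)
        return evaluate(pos);              /* or drop into quiescence   */

    /* Null move: let the opponent move twice in a row. If a search
       reduced by R extra plies still fails high, assume the node
       would fail high anyway and cut off. It is recursive because
       the reduced search may itself try another null move. */
    if (!in_check(pos) && has_non_pawn_material(pos)) {
        make_null_move(pos);
        int score = -search(pos, depth - 1 - R, -beta, -beta + 1);
        undo_null_move(pos);
        if (score >= beta)
            return beta;                   /* fail-hard cutoff          */
    }

    /* ... normal move loop: generate, make, recurse, unmake ... */
    return alpha;
}
```

A non-recursive R=1 version, as in the old Cray Blitz described above, would simply not try any further null moves inside the reduced null-move search, and reduce by one ply instead of two.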
In 2000 I tried a test: Diep from the end of 2000, after some major evaluation improvements (long after the world championship), against software from 1997-1998.
This ran on a number of machines from Jan Louwman; equal hardware, in short.
Similar books were used, so the 1998 software didn't have that disadvantage. In fact the book I used for Diep in the 2000 test was actually worse than what these engines had in 1997.
The 1997-1998 software got so totally annihilated that even I was in shock.
1999 was a year of big progress for most engines. The same progress most top engines made in 1999, I made in 2000 with Diep, especially improvements in the endgame. The search-depth win I had already made in 1998, as Diep was probably the world's first engine to use R=3 in nullmove; in fact I combined R=3 with R=2 and R=1. I posted that on RGCC at the time. That resulted in a publication by Ernst A. Heinz called "Adaptive Null-Move Pruning".
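To make the idea concrete, here is a small sketch of an adaptive reduction in that spirit: pick R from {1, 2, 3} based on remaining depth (Heinz's paper also looks at material). The thresholds below are invented for illustration; they are not Diep's or Heinz's actual values. In the search() sketch above, the fixed R would simply be replaced by a call to this function.

```c
/* Adaptive null-move reduction: choose R from {1,2,3}. The cutoffs
   here are made up for illustration, not Diep's or Heinz's numbers. */
static int null_move_reduction(int depth, int pieces_on_board) {
    if (depth >= 9 || (depth >= 7 && pieces_on_board > 20))
        return 3;   /* deep searches, plenty of material: prune hard */
    if (depth >= 4)
        return 2;   /* the Fritz-style default                       */
    return 1;       /* shallow searches: be conservative             */
}
```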
Testing against software that is too old doesn't even make sense. A big problem is always that software is tuned to the hardware the programmer can get his hands on at home (regardless of which hardware you show up with at the world championship).
Testing against old engines or weak engines DOES make sense in itself of course, but for the purpose of determining the Elo win from software versus hardware it is not so clear.
Nullmove was at the time a quantum leap in elo.
The big problem with software is: "what do you want to measure with it?"
A second big problem is that software from the start of the 80s got 1 or 2 ply of total search depth, and if you give today's engines just 1 or 2 ply, the Elo difference will still be at least 1500 points in favour of today's engines. Is that a fair comparison, though, if you realize that back then those engines were written in assembler and some dedicated computers ran 1 MHz processors with something like 2 KB or 4 KB of RAM plus some ROM?
Today's tables alone are, all together, a lot of data.
Also, it is questionable whether the CPUs from back then could even have executed Diep's huge evaluation function of that era a single time.
Search inefficiency was huge. Deep Thought could get something like half a million nps and searched 8 ply with it, winning the world title in 1988, largely just for not having big bugs in its search, unlike other supercomputers. If you look at those games: today's software reaches the same search depth as the supercomputers of back then in a lot fewer nodes, and secondly, those programs blundered entire pieces and allowed mates.
Cray Blitz blundered a piece against Deep Blue, allowing a simple tactic; even worse is Zugzwang's bug of grabbing a rook, allowing itself to get mated in 7.
My poor Diep from 1994 on a 486DX2 already had no problem avoiding that mate.
You should factor in the search bugs from those days. They simply gave away pieces, and even on today's hardware the software from those days would most likely crash when getting that much nps.
Even on today's hardware, software from back then would also do worse against human players today. That is because a 2300 FIDE-rated player today is a lot better than he was in 1990.
Another big bug in software from the 90s, Crafty excepted, is the branching factor. Not so long ago I ran Schach 3.0 on a modern processor, a 1 GHz P3 laptop. After 24 hours it still hadn't reached 12 ply.
That is with nullmove, singular extensions, recapture extensions and so on.
Branching factors of over 10.0, and then exploding above 10-12 ply because extensions get triggered on top of extensions.
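To put a number on that: the effective branching factor is roughly the factor by which each extra ply multiplies the search time, so an EBF around 10 is catastrophic. A tiny illustration with invented timings (not measurements of Schach 3.0):

```c
/* Effective branching factor estimated from iteration times. The
   timings are invented for illustration, not measured from Schach 3.0. */
#include <math.h>
#include <stdio.h>

int main(void) {
    double t11 = 2.5 * 3600.0;    /* hypothetical: depth 11 done in 2.5 h */
    double t12 = 24.0 * 3600.0;   /* depth 12 still unfinished at 24 h    */
    printf("EBF >= %.1f per ply\n", t12 / t11);     /* >= 9.6             */

    /* With a modern EBF of about 2, the same factor 10 in time buys
       log(10)/log(2) ~ 3.3 additional plies instead of barely one.   */
    printf("extra plies at EBF 2: %.1f\n", log(10.0) / log(2.0));
    return 0;
}
```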
Most software from back then just uses a hash table of 32 MB or so, as that was the DOS limit. It is 16-bit code, so even if today's hardware would run DOS, it is going to be ugly slow for that software, relatively speaking.
Most engines used 16-bit counters to count the nps. Even that was already a luxury, in fact.
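As an aside, what a 16-bit node counter does at modern speeds is easy to show; the numbers below are hypothetical, not taken from any particular engine:

```c
/* A 16-bit node counter at modern speeds: it wraps around long before
   one second of searching is over. Purely illustrative. */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint16_t nodes = 0;                     /* max value 65535           */
    for (long i = 0; i < 5000000; i++)      /* ~5 Mnps for one second    */
        nodes++;                            /* wraps around 76 times     */
    printf("reported nodes: %u\n", nodes);  /* prints 19264, not 5000000 */
    return 0;
}
```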
So software that "by accident" has a good branching factor from back, it rocks.
I remember how Diep was one of the few programs to speed up big time when moving from a P5 at 100 MHz to a Pentium Pro at 200 MHz.
The P5-133 was exactly a factor of 3 slower than the P6-200 at the time, because Diep used a combination of 32-bit code with 8-bit code.
All the assembler engines were 8-bit code mixed with 16-bit code.
It took years for some to convert to 32-bit.
The next problem you'll have is the limit of 750 Elo points.
Rounding off Elo with the default K factor means that at a 750-point difference you already have a 100% score.
So you simply can't measure more than 750 Elo points. That is a major problem.
If you'd give 80s software such as Psion 1.0 and Cyrus today's hardware, you will simply win with 100% scores, which shows up as a 750 Elo point difference, but the real difference will be a lot more.
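The underlying arithmetic, in the usual logistic Elo model (a sketch with assumed score fractions; the exact clipping rule differs per rating tool):

```c
/* Rating difference implied by a score fraction s under the logistic
   Elo model. Illustrative sketch; rating tools clip this near s = 1,
   which is the measurement ceiling described above. */
#include <math.h>
#include <stdio.h>

static double elo_diff(double s) {
    return 400.0 * log10(s / (1.0 - s));
}

int main(void) {
    printf("75%% score -> %+.0f Elo\n", elo_diff(0.75));  /* about +191 */
    printf("99%% score -> %+.0f Elo\n", elo_diff(0.99));  /* about +798 */
    /* At 100% the formula is unbounded, so a clean sweep only gives a
       lower bound on the true difference, never the difference itself. */
    return 0;
}
```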
It is not fair to do this comparison, IMHO.
Newer hardware not only allows new algorithms and new chess knowledge, it also means your tuning is different.
If we started a competition now at 4 KB RAM and 64 KB ROM, who would win? My bet would be on the Dutch team in that case, by the way.
Vincent