Hardware vs Software

Discussion of chess software programming and technical issues.

Moderator: Ras

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hardware vs Software

Post by bob »

Uri Blass wrote:
hgm wrote:
bob wrote:If you presume that you reduce the tree without introducing any side-effects that weaken the search. Reducing the size of the tree by 50% is not exactly the same thing as searching 2x faster. And 2x faster is 50-70 Elo. LMR reduces the tree by much more than that and yet it is not worth 140+ Elo or anything remotely close to that... Or even 1/2 of that...
So you think that changing the ordering of captures can have side effects that weaken the search?
I think that improving the order of moves can make LMR more productive.

The idea of LMR is to reduce the depth for moves that have a small chance to fail high (probably bad moves).

The main disadvantage of LMR is that sometimes you reduce good moves that could fail high, and after the reduction there is not enough depth left to see that they fail high.

If you have a better order of moves you reduce fewer good moves that would fail high (without LMR), so the disadvantage is less significant.

Uri
Personally I believe that the "L" in LMR is going to disappear before long. At least it will in Crafty. I'm not willing to randomly reduce some moves and not others, for no reason other than that they are later in the move list. We already extend some, reduce some, and leave some completely alone. I believe that this needs to be done in a better way, but nothing as ugly or useless as using the old history information ideas...
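
A minimal C sketch of the reduction decision being debated here, for readers who want it concrete. The move flags and the cutoffs (at least 3 ply remaining, reduce from the fourth move on, by one ply) are illustrative assumptions, not Crafty's actual rules.

#include <stdbool.h>

/* Illustrative move descriptor; a real engine has its own representation. */
typedef struct {
    bool is_hash_move;
    bool is_killer;
    bool is_capture;
    bool gives_check;
    bool advances_passed_pawn;
} MoveInfo;

/* How many plies to reduce this move by, 0 meaning "search at full depth".
   The thresholds are illustrative defaults only. */
int lmr_reduction(const MoveInfo *m, int move_number, int depth)
{
    if (depth < 3)
        return 0;                        /* too shallow to reduce safely */
    if (m->is_hash_move || m->is_killer)
        return 0;                        /* trusted moves: never reduced */
    if (m->is_capture || m->gives_check || m->advances_passed_pawn)
        return 0;                        /* tactical moves: never reduced */
    if (move_number < 4)
        return 0;                        /* early moves get full depth */
    return 1;                            /* late, quiet move: reduce one ply */
}
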
hgm
Posts: 28337
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Hardware vs Software

Post by hgm »

Laskos wrote:
hgm wrote:The rule of thumb that works reasonably well for most engines is

Elo = 100 ln(T) + constant.

As ln 2 = 0.693 this predicts 69.3 Elo gain for a doubling of the search time. For factors very close to 1 the logarithm can be expanded as ln(T) = ln(1+(T-1)) ~ T-1.

In this limit you gain 1 Elo per percent of speedup.
I agree with you in general terms, but I think it should be refined to Elo = A ln(1+B*T) + constant. Bob, however, says that in his case it is pretty linear: B*T + constant. My guess is that Bob's logarithm looks linear because in his case Elo = A ln(1+epsilon*T) + constant ~= A*epsilon*T + constant, thus linear. For large B or T it becomes logarithmic. I just explained that earlier.

Kai
This is a fishy formula. You expect the strength of an engine to approach a finite value if you have it run at zero time?

If there is any correction term inside the logarithm, it must be a negative one:

Elo = A*ln(B*T-epsilon) + constant

Because when T is so small that the engine cannot print the required number of moves, it will forfeit 100% of the games on time, and so will have minus-infinite Elo.

But by today's standards epsilon is totally negligible (if we talk about one-second or even millisecond games).
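
For concreteness, the rule of thumb from this exchange as a few lines of C; only the 100-per-ln coefficient comes from the posts above, the rest is plain arithmetic.

#include <stdio.h>
#include <math.h>

/* hgm's rule of thumb: Elo = 100 * ln(T) + constant, so the gain from
   changing the time (or speed) by a factor f is 100 * ln(f),
   independent of the constant. */
static double elo_gain(double factor)
{
    return 100.0 * log(factor);
}

int main(void)
{
    printf("doubling (f = 2.00):   %+6.1f Elo\n", elo_gain(2.0));   /* ~ +69.3 */
    printf("1%% speedup (f = 1.01): %+6.2f Elo\n", elo_gain(1.01)); /* ~  +1.0 */
    printf("factor 10 (f = 10.0):  %+6.1f Elo\n", elo_gain(10.0)); /* ~ +230  */
    return 0;
}
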
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Hardware vs Software

Post by Laskos »

hgm wrote:
Laskos wrote:
hgm wrote:The rule of thumb that works reasonably well for most engines is

Elo = 100 ln(T) + constant.

As ln 2 = 0.693 this predicts 69.3 Elo gain for a doubling of the search time. For factors very close to 1 the logarithm can be expanded as ln(T) = ln(1+(T-1)) ~ T-1.

In this limit you gain 1 Elo per percent of speedup.
I agree with you in general terms, but I think it should be refined to Elo = A ln(1+B*T) + constant. Bob, however, says that in his case it is pretty linear: B*T + constant. My guess is that Bob's logarithm looks linear because in his case Elo = A ln(1+epsilon*T) + constant ~= A*epsilon*T + constant, thus linear. For large B or T it becomes logarithmic. I just explained that earlier.

Kai
This is a fishy formula. You expect the strength of an engine to approach a finite value if you have it run at zero time?

If there is any correction term inside the logarithm, it must be a negative one:

Elo = A*ln(B*T-epsilon) + constant

Because when T is so small that the engine cannot print the required number of moves, it will forfeit 100% of the games on time, and so will have minus-infinite Elo.

But by today's standards epsilon is totally negligible (if we talk about one-second or even millisecond games).
You are right, I didn't take into account the extremely small time scale. I would keep my form but add your term, which should become important at very short times:

A ln(1+B*T) + C ln(T) + constant.

At small T it is linear, at T->0 it goes logarithmically to negative infinity, and at T->infinity it grows logarithmically. At least this is my observation: I saw that most engines scale logarithmically at longer TC and on faster hardware, but not exactly as A ln(T) + constant. This would accommodate both Bob's claims of linearity and other results showing a damped scaling. A negative epsilon I would not like, as it would give a complex number at sufficiently short TC, and then we would be talking about Riemann sheets :)

But generally I agree that there is some sort of logarithm there (or even ln(ln(...))), and linearity is an accident.

Regards,
Kai
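
A small C sketch of Kai's combined form, just to make the regimes visible; the coefficients A, B and C are arbitrary placeholders, not fitted values.

#include <stdio.h>
#include <math.h>

/* Kai's proposed scaling model: Elo(T) = A*ln(1 + B*T) + C*ln(T) + const.
   A, B and C below are arbitrary illustrative values, not fitted ones. */
static double model(double T)
{
    const double A = 100.0, B = 1.0, C = 5.0;
    return A * log(1.0 + B * T) + C * log(T);
}

int main(void)
{
    /* As T -> 0 the C*ln(T) term drags the rating to minus infinity,
       around moderate T the A*ln(1+B*T) term is roughly linear (~A*B*T),
       and at large T both terms together grow like (A+C)*ln(T). */
    const double times[] = { 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0 };
    for (int i = 0; i < 7; i++)
        printf("T = %8.3f  ->  %8.1f (relative Elo)\n", times[i], model(times[i]));
    return 0;
}
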
Uri Blass
Posts: 10784
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Hardware vs Software

Post by Uri Blass »

bob wrote:
Uri Blass wrote:
hgm wrote:
bob wrote:If you presume that you reduce the tree without introducing any side-effects that weaken the search. Reducing the size of the tree by 50% is not exactly the same thing as searching 2x faster. And 2x faster is 50-70 Elo. LMR reduces the tree by much more than that and yet it is not worth 140+ Elo or anything remotely close to that... Or even 1/2 of that...
So you think that changing the ordering of captures can have side effects that weaken the search?
I think that improving the order of moves can make LMR more productive.

The idea of LMR is to reduce the depth for moves that have a small chance to fail high (probably bad moves).

The main disadvantage of LMR is that sometimes you reduce good moves that could fail high, and after the reduction there is not enough depth left to see that they fail high.

If you have a better order of moves you reduce fewer good moves that would fail high (without LMR), so the disadvantage is less significant.

Uri
Personally I believe that the "L" in LMR is going to disappear before long. At least it will in Crafty. I'm not willing to randomly reduce some moves and not others, for no reason other than that they are later in the move list. We already extend some, reduce some, and leave some completely alone. I believe that this needs to be done in a better way, but nothing as ugly or useless as using the old history information ideas...
Not randomly, but I guess that you do not reduce good captures or killer moves, so practically the L is there because often you do not reduce the first moves (I guess that you can reduce the first move only when there is no hash move and no good captures or killers).

I believe that if you have a better order of moves after the killers you can reduce more of the later moves.

Uri
hgm
Posts: 28337
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Hardware vs Software

Post by hgm »

Laskos wrote:You are right, I didn't take into account the extremely small time scale. I would keep my form but add your term, which should become important at very short times:

A ln(1+B*T) + C ln(T) + constant.

At small T it is linear, at T->0 it goes logarithmically to negative infinity, and at T->infinity it grows logarithmically. At least this is my observation: I saw that most engines scale logarithmically at longer TC and on faster hardware, but not exactly as A ln(T) + constant. This would accommodate both Bob's claims of linearity and other results showing a damped scaling. A negative epsilon I would not like, as it would give a complex number at sufficiently short TC, and then we would be talking about Riemann sheets :)

But generally I agree that there is some sort of logarithm there (or even ln(ln(...))), and linearity is an accident.
I have strong doubts about the existence of a significant linear range. In such a linear range, the hit an engine takes from reducing its search time from 100 to 55 would be the same as from reducing it from 55 to 10. In almost any engine an extra factor of 5.5 would give more extra plies of search depth than a factor of 1.8, even if you assume that the effective branching ratio increases at very low depth, where prunings like LMR are not yet activated. And this effect is usually more than counteracted by the fact that the average "loss per move" compared to perfect play is much larger at low search depth, so that potentially decisive errors are made at a much earlier stage, where they still tend to grow during the game.

Most engines reach their asymptotic branching ratio already at 4 or 5 ply, and the time it takes to reach that depth on modern hardware is almost nothing. Most tests we are talking about here are done at a 1-min TC, which reaches depths of 10 to 15 ply.
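
The depth argument can be put in numbers: if search time grows roughly like b^depth for an effective branching ratio b, a time factor f buys about ln(f)/ln(b) extra plies. The b = 2 below is only an assumed ballpark figure.

#include <stdio.h>
#include <math.h>

/* Extra search depth bought by a time factor f, assuming time ~ b^depth
   for an effective branching ratio b. */
static double extra_plies(double f, double b)
{
    return log(f) / log(b);
}

int main(void)
{
    const double b = 2.0;  /* assumed ballpark effective branching ratio */
    printf("100 -> 55 (factor %.2f): %.2f plies\n", 100.0 / 55.0, extra_plies(100.0 / 55.0, b));
    printf(" 55 -> 10 (factor %.2f): %.2f plies\n", 55.0 / 10.0, extra_plies(55.0 / 10.0, b));
    return 0;
}
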
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hardware vs Software

Post by bob »

Uri Blass wrote:
bob wrote:
Uri Blass wrote:
BubbaTough wrote:
bob wrote:
BubbaTough wrote:Given that one area of steady progress in software improvements has been in decreasing the branching factor, it would not be surprising if the longer the game, the more valuable the software changes prove. Perhaps, no matter what hardware difference is claimed, at some length of time, perhaps days or weeks, software changes prove more important than hardware. Conversely, at some short length of time, the speed of hardware is much more important than software changes.

-Sam
...
That is based on conjecture rather than fact. For example, I have run extensive tests on null-move, and could find no difference in Elo gain/loss across several time controls. And turning it off is not a huge deal. Same thing for LMR. Your assumption is that today's plies are the same as plies 10 years ago, and that we just get more of them with more and more time because of the reduced branching factor. That is not a given, in my opinion. Which means that the "additional plies" with longer time controls are not necessarily _that_ much better overall.
...
It is not just based on conjecture, it IS conjecture. Notice key words/phrases such as "Not surprising if" and "Perhaps". I have not made any assumptions about equal plies... unequal plies were definitely on my mind as I typed (otherwise I would not need my qualifiers). My personal experience has been that older programs make much worse use of extra time on today's hardware than today's programs. MUCH worse. Thus my speculation...

Though it hardly needs to be said: as always, my speculative hypothesis based on flimsy data might be wrong. I don't have a particularly strong belief that it's true myself, it just sounds plausible... perhaps 63% likely.

-Sam
My personal experience is also that older programs gain less from extra time.
It is based on matches, at unequal time controls, of Movei against other programs.

Movei performed better against weaker programs when the time control was longer, and when I chose a 10:1 time handicap under Arena, Rybka 2.3.2a performed better against Movei when the time control was longer.

Note also that better books are part of software improvement (unless the book is too big to be stored by old hardware), but in spite of that I agree to using the same book, because I am interested in engine improvement and not in total software improvement.

Uri
Why use a book? That's a silly way of testing when you imply "I am not interested in book improvement..." I don't test with books of any kind, to get rid of that aspect of bias.
I am also not interested in book improvement, but the subject is about hardware vs software.

If we talk about software improvement then the book is included in the software, so I think the subject should be hardware vs software (not including books).

Uri
I don't agree. The "book" has absolutely _nothing_ to do with software improvements; it is simply data used to play the beginning of the game. I am interested in internal engine improvements, which is the "software" we have been debating...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hardware vs Software

Post by bob »

Uri Blass wrote:
bob wrote:
Uri Blass wrote:
hgm wrote:
bob wrote:If you presume that you reduce the tree without introducing any side-effects that weaken the search. Reducing the size of the tree by 50% is not exactly the same thing as searching 2x faster. And 2x faster is 50-70 Elo. LMR reduces the tree by much more than that and yet it is not worth 140+ Elo or anything remotely close to that... Or even 1/2 of that...
So you think that changing the ordering of captures can have side effects that weaken the search?
I think that improving the order of moves can make LMR more productive.

The idea of LMR is to reduce the depth for moves that have a small chance to fail high (probably bad moves).

The main disadvantage of LMR is that sometimes you reduce good moves that could fail high, and after the reduction there is not enough depth left to see that they fail high.

If you have a better order of moves you reduce fewer good moves that would fail high (without LMR), so the disadvantage is less significant.

Uri
Personally I believe that the "L" in LMR is going to disappear before long. At least it will in Crafty. I'm not willing to randomly reduce some moves and not others, for no reason other than that they are later in the move list. We already extend some, reduce some, and leave some completely alone. I believe that this needs to be done in a better way, but nothing as ugly or useless as using the old history information ideas...
Not randomly, but I guess that you do not reduce good captures or killer moves, so practically the L is there because often you do not reduce the first moves (I guess that you can reduce the first move only when there is no hash move and no good captures or killers).

I believe that if you have a better order of moves after the killers you can reduce more of the later moves.

Uri
The "L" is wrong, because not _all_ of the "L" moves are reduced. Once I pass the killers, etc, I am in the "L" part of that node's moves, but I still don't reduce any of them that exhibit any sort of tactically interesting features such as advancing a passed pawn, checks, captures and eventually a few other ideas in the to-do list...

I prefer to think of the moves as fitting into ENR. Some are "E"xtended (checks only for me now, after testing the other extensions and finding that they hurt slightly). Some are "R"educed, as in lousy-looking moves. And some are "N"either: moves that are neither extended nor reduced...

The "L" in LMR really means "moves that occur Late in the move ordering, and are not tactical in nature"
hgm
Posts: 28337
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Hardware vs Software

Post by hgm »

Just call it "Lame Move Reductions".... :lol:
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hardware vs Software

Post by bob »

hgm wrote:Just call it "Lame Move Reductions".... :lol:
Not bad. Or "Lousy"...
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Hardware vs Software

Post by diep »

bhlangonijr wrote:Very interesting thread.

Let us start from the beginning. I'm wondering if there is a scientific method to verify this contest.

How can we quantify the "gain function" from benchmarking the chess playing agent (x hardware and y software) in order to determine which factor had the major influence over the playing strength? Just computing ELO gains in a kind of tournament?

Is the approach suggested by H.G. Muller sufficient to verify this?

Could we emulate the old hardware by playing the engines with a time/resource handicap?

Perhaps this question is more complicated than the simplicity of its statement suggests. As Bob mentioned, old software is tuned to old hardware, so it is not accurate to benchmark the old software on today's hardware and so on. In other words, maybe we should take into account testing the underlying idea of a given piece of software from a given year rather than really testing its implemented version.

Whatever it takes, my bet is on the hardware.
Whatever you try or do, measuring the improvement of software versus hardware is going to be very difficult, because the improvement is not a constant, but dependent upon which search depth you use and which testing method.

Note that nullmove only got widely applied after 1995, after Fritz became world champion with recursive nullmove. Donninger's generic publication from 1991 didn't yet trigger anyone to use it. It was Frans Morsch's world title that woke everyone up.

In 2000 I tried a test: Diep at the end of 2000, after some major evaluation improvements (long after the world champs), against software from 1997-1998.

This was on a number of machines from Jan Louwman; equal hardware, in short.
Similar books were used, so the 1998 software didn't have that disadvantage. In fact the book I used for Diep to test with in 2000 was actually worse than what these engines had in 1997.

The 1997-1998 software got so totally annihilated that even I was in shock.

1999 was a year of big progress for most engines. The same progress most top engines made in 1999 I made in 2000 with Diep; it was especially improvements in the endgame. The search depth win I had already made in 1998, as Diep was probably the world's first engine to use R=3 in nullmove; in fact I combined R=3 with R=2 and R=1. I posted that on RGCC at the time. That resulted in a publication by Ernst A. Heinz called "adaptive nullmove".
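
A sketch of what a depth-dependent ("adaptive") null-move reduction along these lines might look like; the depth thresholds are illustrative guesses, not Diep's or Heinz's actual values. The null-move search would then be done to depth - 1 - R.

/* Depth-dependent null-move reduction: bigger R when plenty of depth
   remains, smaller near the leaves. Thresholds are illustrative only. */
int null_move_reduction(int depth)
{
    if (depth >= 7)
        return 3;
    if (depth >= 4)
        return 2;
    return 1;
}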

Testing against software that is too old doesn't even make sense. A big problem is always that software is tuned to the hardware the programmer can get his hands on at home (regardless of which hardware you show up with at the world champs).

Testing against old or weak engines DOES make sense in itself of course, but not for the purpose of determining the Elo win from software versus hardware, as that is not so clear.

Nullmove was at the time a quantum leap in elo.

The big problem with software is: "what do you want to measure with it?"

A second big problem is that if you take software from the start of the 80s, which got 1 or 2 ply of total search depth, versus today's engines given the same 1 or 2 ply, the Elo difference will be at least 1500 Elo points in favor of today's engines. Is it a fair comparison, however, if you realize that back then those engines were written in assembler, and some dedicated computers ran at 1 MHz with something like 2 KB or 4 KB of RAM plus some ROM?

Today's tables are altogether a lot of data.

Also it is questionable whether CPUs from back then could have executed Diep's huge evaluation function of that time even a single time.

Search inefficiency was huge. Deep Thought could get something like half a million nps and searched 8 ply with it, winning the world title in 1988, simply for not having big bugs in its search, unlike the other supercomputers. If you look at those games and reach the same search depth with today's software as the supercomputers of back then, first of all you can do it in a lot fewer nodes, and secondly, those machines blundered entire pieces and allowed mates.

Cray Blitz blundered a piece against Deep Blue, allowing a simple tactic; even worse was Zugzwang's bug of grabbing a rook, allowing itself to get mated in 7.

My poor Diep from 1994 on a 486DX2 already had no problem avoiding that mate.

You should factor in the search bugs from those days. Those programs simply gave away pieces, and even on today's hardware the software from those days would most likely crash when getting that many nps.

Even on today's hardware, software from back then would also do worse against human players today. That is because a 2300 FIDE rated player today is a lot better than one was in 1990.

Another big problem in software from the 90s, Crafty excepted, is the branching factor. Not so long ago I ran Schach 3.0 on a modern processor, a 1 GHz P3 laptop. After 24 hours it still hadn't reached 12 ply.

That is with nullmove, with singular extensions, with recapture extensions, and so on.

Branching factors of over 10.0, and then exploding above 10-12 ply because extensions get triggered on top of extensions.
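
To put the branching-factor point in numbers: with a crude nodes ~ b^d cost model, the gap between an effective branching factor of 2 and one of 10 is astronomical. The figures below come purely from that model, not from measurements of Schach 3.0.

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Crude cost model: nodes to complete depth d ~ b^d, for an effective
       branching factor of 2 (modern pruning) versus 10 (none). */
    for (int d = 4; d <= 12; d += 4)
        printf("depth %2d:  b=2 -> %15.0f nodes    b=10 -> %15.0f nodes\n",
               d, pow(2.0, d), pow(10.0, d));
    return 0;
}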

Most software from back then just uses a hash table of 32 MB or so, as that was the DOS limit. It is 16-bit code, so even if today's hardware would run DOS, it is going to be ugly slow for that software, relatively speaking.

Most engines used 16-bit counters to count the nps. That was in fact already a luxury.

So software from back then that "by accident" has a good branching factor rocks.

I remember how Diep was one of the few programs to speed up big time when moving from a 100 MHz P5 to a 200 MHz Pentium Pro.

The P5-133 was exactly a factor of 3 slower than the P6-200 at the time, because Diep used a combination of 32-bit code with 8-bit code.

All the assembler engines were 8-bit code with 16-bit code.
It took years for some to convert to 32 bits.

The next problem you'll have is the limit of 750 Elo points.

Rounding off Elo with the default K factor means that at a 750 Elo point difference you get a 100% score.

So you simply can't measure more than 750 Elo points. That is a major problem.

If you gave 80s software, like Psion 1.0 or Cyrus, today's hardware, the modern engines would simply win with 100% scores, which gives a 750 Elo point difference, but the real difference would be a lot more.
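
The saturation described here follows from the standard logistic Elo model: the expected score for a rating difference D is 1/(1 + 10^(-D/400)), so once one side scores 100% the measurable difference is bounded by the number of games, not by the real strength gap. A small C check (only the formula is standard; the 750-point figure is Vincent's).

#include <stdio.h>
#include <math.h>

/* Standard Elo expected score for a rating difference D. */
static double expected_score(double D)
{
    return 1.0 / (1.0 + pow(10.0, -D / 400.0));
}

int main(void)
{
    printf("D = 400: expected score %.4f\n", expected_score(400.0)); /* ~0.909 */
    printf("D = 750: expected score %.4f\n", expected_score(750.0)); /* ~0.987 */
    /* A 100 percent score over any finite number of games is consistent with
       a gap of 750 points or with one of several thousand. */
    return 0;
}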

It is not a fair comparison to make, IMHO.

Newer hardware not only allows new algorithms and new chess knowledge, it also means your tuning is different.

If we started a competition now with 4 KB RAM and 64 KB ROM, who would win? I'd bet on the Dutch team in that case, by the way :)

Vincent