Hardware vs Software

Discussion of chess software programming and technical issues.

Moderator: Ras

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hardware vs Software

Post by bob »

Uri Blass wrote:
bob wrote:
hgm wrote:
bob wrote:
Uri Blass wrote:
1) The move ordering of 10 years ago is not optimal, so common sense tells me that it is an area where improvement is possible.

There are clearly ideas for better move ordering beyond the ones that you use.

For example, suppose Black plays ...a6, attacking a bishop on b5, and suppose your program has no move in the hash table.

Does your program search escapes with the bishop first?
If you search only the hash move, captures, killers and history moves, there is no way for you to know to search escapes of the bishop on b5 first.
So? We program for the _general_ case. The time spent detecting that the bishop is under attack, so that you can try to move it (and presumably to a safe square as well), does not sound like time well spent.

But in any case, does anybody actually do that today?
I thought many people include the SEE value of a threat against a piece in the SEE score (used for capture sorting) of moves that trivially solve that threat (by withdrawing the victim or capturing the attacker). Joker does that (even if the move is a non-capture), and it seemed to help. As logic dictates it should.

The point is that the information you need to do it is nearly free. You only have to generate and sort captures after a null-move fail low, so you know from the null-move refutation which piece is threatened. So there really is no overhead in the cases where sorting the move to the front would not almost certainly give a large time savings.

Crafty's move ordering is really quite primitive and sub-optimal. Only a few months ago you yourself established that sorting the good captures by SEE was 7 Elo points inferior to a sorting scheme (also theoretically suboptimal, by the way) used by others. And 7 Elo points is a lot; it corresponds to a 7% reduction in tree size even if the change was free in terms of nps. After missing such a large effect, I would not be surprised at all if there were many more subtle move-sorting tricks that you missed as well.
May well be lots of things I have not tried. But the "null-move refutation" is not one of them. I tried this a few months back and found no improvement at all in real games. I don't doubt that it might help in tactical positions, but chess is not only about tactics. When I tested this, the Elo dropped very slightly. I decided to do this when Tord mentioned the idea in a thread here some time back.
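
For concreteness, here is a minimal sketch of the kind of ordering trick being discussed: after a null-move search fails low, the refutation identifies which piece is threatened, and moves that address that threat get an ordering bonus even if they are not captures. The names (Move, order_score, threat_to, threat_from) are hypothetical, invented for this illustration; this is not code from Crafty, Joker, or any other engine.

    typedef struct { int from; int to; } Move;

    /* threat_to   : square of the piece the null-move refutation attacks
     * threat_from : square of the attacking piece, taken from the refutation
     * threat_val  : approximate value of the threatened piece (e.g. 300)
     * see_score   : the normal SEE-based ordering score for this move        */
    static int order_score(Move m, int threat_to, int threat_from,
                           int threat_val, int see_score)
    {
        int score = see_score;

        if (threat_to >= 0) {                /* a null-move threat is known    */
            if (m.from == threat_to ||       /* withdraw the threatened piece, */
                m.to == threat_from)         /* or capture the attacker        */
                score += threat_val;         /* credit what the move "saves"   */
        }
        return score;
    }

Whether such a bonus gains anything in practice is exactly what is being argued here: the information is essentially free after a null-move fail low, but, as noted above, it did not measurably help in real games when it was tried.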

BTW I don't agree with your "7%" numbers. 70 Elo requires reducing the tree by about 50% total. 7 Elo doesn't sound like "7% tree reduction" to me, although I have not given it a lot of thought.
If 70 Elo requires reducing the tree to 50% of the original size, then 7 Elo requires reducing the tree to 0.5^(1/10) of the original size.

0.5^(1/10) ~= 0.933, so that is close to a 6.7% tree reduction.

Uri
If you presume that you reduce the tree without introducing any side-effects that weaken the search. Reducing the size of the tree by 50% is not exactly the same thing as searching 2x faster. And 2x faster is 50-70 Elo. LMR reduces the tree by much more than that and yet it is not worth 140+ Elo or anything remotely close to that... Or even 1/2 of that...
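
For readers checking the arithmetic in the quoted post: under the rule of thumb used later in this thread (roughly 70 Elo per doubling of effective speed), and under the contested assumption that shrinking the tree by some factor is equivalent to searching that much faster, the 7 Elo figure works out as

\[
  70 \log_2 f = 7
  \;\Longrightarrow\;
  f = 2^{0.1} \approx 1.072,
  \qquad
  \frac{1}{f} = 0.5^{1/10} \approx 0.933 .
\]

That is, the tree would have to shrink to about 93.3% of its size (a 6.7% reduction) to be worth 7 Elo. The disagreement is precisely over the assumption: a smaller tree is not automatically the same thing as a proportionally faster search.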
Uri Blass
Posts: 10784
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Hardware vs Software

Post by Uri Blass »

bob wrote:
Uri Blass wrote:
bob wrote:
Uri Blass wrote:
Dirt wrote:
bob wrote:Couple of points. First, no way you could get 75K on a P200/MMX. The Pentium Pro 200 would do 75K, but the basic Pentium, all the way through the P200/MMX, was a different processor architecture, with no out-of-order execution or any of that. I had a P200/MMX laptop that would not break 50K. The Pentium Pro 200 was getting around 75K. I ran on it in Jakarta, and in my office, until I got the quad Pentium Pro 200 box.

If you take the 400MHz processor, which I am not sure was around in 1998, at least the Xeon with 2MB of L2 that I had, then 150K nodes per second would be the answer. Late in 2008 an i7 was producing 20M nodes per second. If you divide 20M by 150K you get a raw factor of 133x. If you factor in the 3.3/4.0 speedup for a parallel search, you get a factor of 110x.

We could settle on 100:1 for simplicity, although that is _still_ a daunting advantage. I find it amusing that so many are convinced that software is most of the improvement, yet we are quibbling over a 10%-20% speed difference in hardware. It seems, to me, that perhaps most really do believe that hardware is the more important contributor to today's ratings in spite of rhetoric to the contrary...

Once we get the improvement in hardware settled, there are going to be enough arguments to make the test hopeless anyway. 1 sec to 100 secs is a _long_ game. At 60 moves as a reasonable average game length, that turns into almost 2 hours per game. Even at 256 games at a time on my cluster, it takes a long time to play enough games to get good answers. And some are not going to accept 1 second vs 100 seconds, and will want 1 minute to 100 minutes, which is not going to happen with me...
One hundred to one does sound like a nice number. You seem convinced that is too little, others are sure it's too much.

Are you considering 1 vs 100 seconds as an increment? That would be unfair to Fritz. The fairest time would have the time they were tuned for as the geometric mean. While everyone I'm sure tries to tune for longer time controls, in practice I think it must be much shorter, perhaps someplace around 30 seconds per game. I think either 2"/40 vs 200"/40 or 1" + 0.03 vs 100" + 3 would be in the right ballpark, assuming Rybka's time management doesn't completely fail at that level.
I do not care about the time the programs were tuned for.

Progress of software is based on results, not on intentions.
If program X was better at 120/40 ponder-on SSDF games but weaker at blitz in 1998, then I consider program X to be the better one for this discussion.

120/40 ponder-on games mean something equivalent to 4.5 minutes ponder-off on a P200 MMX, so that is clearly more than 1 second on today's hardware.

Uri
You totally miss his point. Programs of a specific period are tuned for the usual time controls of that period, on the hardware of that period. Whether this is an issue with a very long increment or not I don't know. Crafty can work with any increment at all....
I am talking about the SSDF time control (120/40, ponder on) or something equivalent to it.

Programmers clearly cared about winning on the SSDF list, and the world championship was equivalent to an even slower time control if you consider the fact that programs used better hardware at the WCCC.

They did not have enough computer time to test at an equivalent time control, and I agree that part of the software improvement is thanks to better hardware that allows more testing, but the reason for the software improvement does not change the fact that it is a software improvement.

Uri
You are _still_ missing the point. Programmers of _today_ are tuning their programs based on the hardware speeds we have today. At longer time controls. Programmers of 10 years ago were tuning their programs based on the hardware speeds of 10 years ago. Part of what we do in software has been enabled by having faster hardware. No way we can compare normal time controls of today against 1998 hardware. 100:1 makes the games intractable, even for my cluster...
OK.
Thinking about it again, I see what you mean, but I think it is not relevant to the question of hardware vs software; it can only be one of the reasons for software improvement.

You say that the software of 1998 is not efficient at taking advantage of today's hardware because it was not programmed for that purpose, and that this is a disadvantage of the 1998 software.

The reasons for the disadvantage are not important for the analysis and I do not blame the programmers of 1998.

The question is simply: in which case would we get the better computer chess players at a 120/40 time control?

1) Hardware from January 1999 remains constant, but we can use the software of today.
2) Software from January 1999 remains constant, but we can use the hardware of today.

Uri
hgm
Posts: 28337
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Hardware vs Software

Post by hgm »

bob wrote:If you presume that you reduce the tree without introducing any side-effects that weaken the search. Reducing the size of the tree by 50% is not exactly the same thing as searching 2x faster. And 2x faster is 50-70 Elo. LMR reduces the tree by much more than that and yet it is not worth 140+ Elo or anything remotely close to that... Or even 1/2 of that...
So you think that changing the ordering of captures can have side effects that weaken the search?
bhlangonijr
Posts: 482
Joined: Thu Oct 16, 2008 4:23 am
Location: Milky Way

Re: Hardware vs Software

Post by bhlangonijr »

Very interesting thread.

Let us start from the beginning. I'm wondering whether there is a scientific method to settle this contest.

How can we quantify the "gain function" when benchmarking a chess-playing agent (hardware x plus software y) in order to determine which factor had the greater influence on playing strength? Just by computing Elo gains in some kind of tournament?

Is the approach suggested by H.G. Muller sufficient to verify this?

Could we emulate the old hardware by playing the engines with a time/resource handicap?

Perhaps this question is more complicated than the simplicity of its statement suggests. As Bob mentioned, old software is tuned to old hardware, so it is not accurate to simply benchmark the old software on today's hardware, and so on. In other words, maybe we should consider testing the underlying ideas of the software of a given year rather than its actual implementation.

Whatever it takes, my bet is on the hardware.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Hardware vs Software

Post by Laskos »

hgm wrote:The rule of thumb that works reasonably well for most engines is

Elo = 100 ln(T) + constant.

As ln 2 = 0.693 this predicts 69.3 Elo gain for a doubling of the search time. For factors very close to 1 the logarithm can be expanded as ln(T) = ln(1+(T-1)) ~ T-1.

In this limit you gain 1 Elo per percent of speedup.
I agree with you in general terms, but I think it should be refined to Elo = A ln(1 + B*T) + constant. Bob says that in his case it is pretty linear, B*T + constant. My guess is that Bob's logarithm is getting linear because in his case Elo = A ln(1 + epsilon*T) + constant ~= B*T + constant, thus linear. For large B or T it becomes logarithmic. I just explained that earlier.

Kai
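
To spell out the linearization being discussed (a standard first-order expansion, not anything engine-specific): writing the time factor as T = 1 + x with x small,

\[
  \Delta\mathrm{Elo} = 100 \ln T = 100 \ln(1+x) \approx 100\,x ,
  \qquad\text{e.g.}\quad
  100 \ln(1.01) \approx 0.995 \approx 1 .
\]

So near T = 1 the curve is indistinguishable from a straight line of about 1 Elo per percent of speedup, consistent with the remark that a fit over a small range can look linear even if the underlying relation is logarithmic.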
Uri Blass
Posts: 10784
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Hardware vs Software

Post by Uri Blass »

hgm wrote:
bob wrote:If you presume that you reduce the tree without introducing any side-effects that weaken the search. Reducing the size of the tree by 50% is not exactly the same thing as searching 2x faster. And 2x faster is 50-70 Elo. LMR reduces the tree by much more than that and yet it is not worth 140+ Elo or anything remotely close to that... Or even 1/2 of that...
So you think that changing the ordering of captures can have side effects that weaken the search?
I think that improving the move ordering can make LMR more productive.

The idea of LMR is to reduce the search depth for moves that have a small chance of failing high (probably bad moves).

The main disadvantage of LMR is that sometimes you reduce good moves that would fail high, and because of the reduction there is not enough depth left to see that they fail high.

If you have better move ordering, you reduce fewer good moves that would fail high (without LMR), so the disadvantage is less significant.

Uri
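
A bare-bones sketch of the interaction being described, in the style of a generic alpha-beta move loop (hypothetical code, not from any particular engine): late quiet moves are searched first at reduced depth, and only get a full-depth re-search if the reduced search unexpectedly beats alpha. The better the move ordering, the less often a genuinely good move lands in the "late" bucket to begin with.

    /* Hypothetical helpers assumed to exist elsewhere in the engine. */
    typedef struct Position Position;
    extern int  search(Position *pos, int alpha, int beta, int depth);
    extern void make_move(Position *pos, int move);
    extern void unmake_move(Position *pos, int move);
    extern int  is_capture(const Position *pos, int move);
    extern int  gives_check(const Position *pos, int move);

    #define LMR_MOVE_THRESHOLD 4    /* reduce only after the first few moves */

    /* Search one move from the move loop, applying LMR to late quiet moves. */
    static int search_one_move(Position *pos, int move, int moves_searched,
                               int alpha, int beta, int depth)
    {
        /* Reduce late, quiet, non-checking moves by one ply. */
        int reduce = (moves_searched >= LMR_MOVE_THRESHOLD &&
                      !is_capture(pos, move) &&
                      !gives_check(pos, move)) ? 1 : 0;
        int score;

        make_move(pos, move);
        score = -search(pos, -beta, -alpha, depth - 1 - reduce);

        /* A reduced move that still beats alpha must be re-searched at full
         * depth; this re-search is the price of having reduced a move that
         * was actually good, which better move ordering makes rarer.        */
        if (reduce && score > alpha)
            score = -search(pos, -beta, -alpha, depth - 1);

        unmake_move(pos, move);
        return score;
    }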
Uri Blass
Posts: 10784
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Hardware vs Software

Post by Uri Blass »

bhlangonijr wrote:Very interesting thread.

Let us start from the beginning. I'm wondering whether there is a scientific method to settle this contest.

How can we quantify the "gain function" when benchmarking a chess-playing agent (hardware x plus software y) in order to determine which factor had the greater influence on playing strength? Just by computing Elo gains in some kind of tournament?

Is the approach suggested by H.G. Muller sufficient to verify this?

Could we emulate the old hardware by playing the engines with a time/resource handicap?

Perhaps this question is more complicated than the simplicity of its statement suggests. As Bob mentioned, old software is tuned to old hardware, so it is not accurate to simply benchmark the old software on today's hardware, and so on. In other words, maybe we should consider testing the underlying ideas of the software of a given year rather than its actual implementation.

Whatever it takes, my bet is on the hardware.
I see nothing unfair.

The fact that old software is tuned to old hardware is a weakness of the software (not a fault of the programmers, but a weakness of the software).

Tuning the software to new hardware is clearly a software improvement.


Let's say that we have the following comp-comp ratings at 120/40:

1998 hardware + 1998 software = 2500 Elo
1998 hardware + 2008 software = 3000 Elo
2008 hardware + 1998 software = 3000 Elo
2008 hardware + 2008 software = 3400 Elo

In this case half of the improvement is thanks to software and half of the improvement is thanks to hardware.
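
One way to read these hypothetical numbers as a standard two-factor decomposition (this is just arithmetic on the four ratings posited above, not measured data): averaging the two ways of switching each factor gives

\[
  \Delta_{\mathrm{software}} = \tfrac{1}{2}\big[(3000-2500)+(3400-3000)\big] = 450,
  \qquad
  \Delta_{\mathrm{hardware}} = \tfrac{1}{2}\big[(3000-2500)+(3400-3000)\big] = 450,
\]

and the two main effects sum exactly to the total gain of 900. Measured from the 1998 baseline alone, each factor is worth 500, and 500 + 500 overshoots the actual 900 by 100; that leftover is the interaction between the two factors, which is the very quantity the tests proposed later in the thread are meant to expose.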
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hardware vs Software

Post by bob »

hgm wrote:
bob wrote:If you presume that you reduce the tree without introducing any side-effects that weaken the search. Reducing the size of the tree by 50% is not exactly the same thing as searching 2x faster. And 2x faster is 50-70 Elo. LMR reduces the tree by much more than that and yet it is not worth 140+ Elo or anything remotely close to that... Or even 1/2 of that...
So you think that changing the ordering of captures can have side effects that weaken the search?
Unlikely. But an N% reduction in the size of the tree doesn't equate to some specific Elo improvement either.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hardware vs Software

Post by bob »

bhlangonijr wrote:Very interesting thread.

Let us start from the beginning. I'm wondering whether there is a scientific method to settle this contest.

How can we quantify the "gain function" when benchmarking a chess-playing agent (hardware x plus software y) in order to determine which factor had the greater influence on playing strength? Just by computing Elo gains in some kind of tournament?

Is the approach suggested by H.G. Muller sufficient to verify this?

Could we emulate the old hardware by playing the engines with a time/resource handicap?

Perhaps this question is more complicated than the simplicity of its statement suggests. As Bob mentioned, old software is tuned to old hardware, so it is not accurate to simply benchmark the old software on today's hardware, and so on. In other words, maybe we should consider testing the underlying ideas of the software of a given year rather than its actual implementation.

Whatever it takes, my bet is on the hardware.
It is a fairly straightforward process, with one complicating factor: the 100:1 speed gain from 1998 to 2008. You can play very fast games with today's software against a 100x longer time control for the 1998 software, which would effectively measure the effect of 100x faster hardware. But the 1998 software is then going to play very slowly, and that makes the test more time-consuming to run.

I think the interesting tests would be as follows:

(1) old and new software on 1998 hardware. This, along with (2) below, would give us the ability to measure the synergy produced when faster hardware allows us to do more things than we could risk in 1998.

(2) old and new software on 2008 hardware. If there is a significant difference between the results of (1) and (2), this would show the synergy clearly, since if the new software gains more on the new hardware, that has to be the result of the new program taking better advantage of the extra speed than the old program does.

(3) Finally, the great equalizer would be old vs new software, with the old software on new hardware against the new software on old hardware, to see what the 100x hardware speedup offers the old software. We already know what the software provided by looking at (2) above.

You could omit (1) but I think the issue is interesting, because I _know_ that I now do things that I considered too expensive in 1998 (mobility for one, but there are others).
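
The design can be written compactly. Let R(h, s) denote the rating of software s on hardware h, with h in {1998, 2008} and s in {old, new} (notation introduced here only for convenience; it is not from the post):

\[
  (1)\;\; R(1998,\mathrm{new}) - R(1998,\mathrm{old}),
  \qquad
  (2)\;\; R(2008,\mathrm{new}) - R(2008,\mathrm{old}),
\]
\[
  \mathrm{synergy} = (2) - (1),
  \qquad
  (3)\;\; R(2008,\mathrm{old}) \;\text{vs}\; R(1998,\mathrm{new}).
\]

If (2) exceeds (1), the new software gains more from the hardware jump than the old software does, which is the synergy described above; (3) is the head-to-head question of whether ten years of hardware or ten years of software buys more.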
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Hardware vs Software

Post by bob »

Uri Blass wrote:
bob wrote:
Uri Blass wrote:
bob wrote:
Uri Blass wrote:
Dirt wrote:
bob wrote:Couple of points. First, no way you could get 75K on a P200/MMX. The Pentium Pro 200 would do 75K, but the basic Pentium, all the way through the P200/MMX, was a different processor architecture, with no out-of-order execution or any of that. I had a P200/MMX laptop that would not break 50K. The Pentium Pro 200 was getting around 75K. I ran on it in Jakarta, and in my office, until I got the quad Pentium Pro 200 box.

If you take the 400MHz processor, which I am not sure was around in 1998, at least the Xeon with 2MB of L2 that I had, then 150K nodes per second would be the answer. Late in 2008 an i7 was producing 20M nodes per second. If you divide 20M by 150K you get a raw factor of 133x. If you factor in the 3.3/4.0 speedup for a parallel search, you get a factor of 110x.

We could settle on 100:1 for simplicity, although that is _still_ a daunting advantage. I find it amusing that so many are convinced that software is most of the improvement, yet we are quibbling over a 10%-20% speed difference in hardware. It seems, to me, that perhaps most really do believe that hardware is the more important contributor to today's ratings in spite of rhetoric to the contrary...

Once we get the improvement in hardware settled, there are going to be enough arguments to make the test hopeless anyway. 1 sec to 100 secs is a _long_ game. At 60 moves as a reasonable average game length, that turns into almost 2 hours per game. Even at 256 games at a time on my cluster, it takes a long time to play enough games to get good answers. And some are not going to accept 1 second vs 100 seconds, and will want 1 minute to 100 minutes, which is not going to happen with me...
One hundred to one does sound like a nice number. You seem convinced that is too little, others are sure it's too much.

Are you considering 1 vs 100 seconds as an increment? That would be unfair to Fritz. The fairest time would have the time they were tuned for as the geometric mean. While everyone I'm sure tries to tune for longer time controls, in practice I think it must be much shorter, perhaps someplace around 30 seconds per game. I think either 2"/40 vs 200"/40 or 1" + 0.03 vs 100" + 3 would be in the right ballpark, assuming Rybka's time management doesn't completely fail at that level.
I do not care about the time the programs were tuned for.

Progress of software is based on results, not on intentions.
If program X was better at 120/40 ponder-on SSDF games but weaker at blitz in 1998, then I consider program X to be the better one for this discussion.

120/40 ponder-on games mean something equivalent to 4.5 minutes ponder-off on a P200 MMX, so that is clearly more than 1 second on today's hardware.

Uri
You totally miss his point. Programs of a specific period are tuned for the usual time controls of that period, on the hardware of that period. Whether this is an issue with a very long increment or not I don't know. Crafty can work with any increment at all....
I am talking about the SSDF time control (120/40, ponder on) or something equivalent to it.

Programmers clearly cared about winning on the SSDF list, and the world championship was equivalent to an even slower time control if you consider the fact that programs used better hardware at the WCCC.

They did not have enough computer time to test at an equivalent time control, and I agree that part of the software improvement is thanks to better hardware that allows more testing, but the reason for the software improvement does not change the fact that it is a software improvement.

Uri
You are _still_ missing the point. Programmers of _today_ are tuning their programs based on the hardware speeds we have today. At longer time controls. Programmers of 10 years ago were tuning their programs based on the hardware speeds of 10 years ago. Part of what we do in software has been enabled by having faster hardware. No way we can compare normal time controls of today against 1998 hardware. 100:1 makes the games intractable, even for my cluster...
OK.
Thinking about it again, I see what you mean, but I think it is not relevant to the question of hardware vs software; it can only be one of the reasons for software improvement.

You say that the software of 1998 is not efficient at taking advantage of today's hardware because it was not programmed for that purpose, and that this is a disadvantage of the 1998 software.

The reasons for the disadvantage are not important for the analysis and I do not blame the programmers of 1998.

The question is simply: in which case would we get the better computer chess players at a 120/40 time control?

1) Hardware from January 1999 remains constant, but we can use the software of today.
2) Software from January 1999 remains constant, but we can use the hardware of today.

Uri
To me the main issue becomes "what software"??? Non-linux software makes this difficult to do because my cluster won't run windoze stuff...