move count based pruning

Discussion of chess software programming and technical issues.

Moderator: Ras

Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: move count based pruning

Post by Don »

bob wrote:
Don wrote:
bob wrote:
Don wrote:I don't know where you are getting the 1000x factor from - I'm looking at 1994, and the Pentium with 200 MHz processors was available then. It's a lot slower than a single core of an i7, but it's nothing like 1000x slower. You have to go back a lot farther than that to see 1000x.

But I'm hoping someone here actually still has an old 1994-vintage computer (the best of the day, if we compare it to the i7) and we can run Chess Genius or Crafty or something on it and test it.

Meanwhile, I'm going to run 1 or 2 games per day with Chess Genius vs. Robbolitto.

I'm going to adjust the time control for each match in an attempt to find the rough break-even point. I should be able to get within 100 ELO this way without having to play millions of games.



Don
I am talking pure NPS. I will see if I can find any printed information. I do remember that when the P6/200 came out, we were able to hit around 70K nodes per second. It was available in early 1996 and we used it in the 1996 WMCCC event. The P5/200 was a pure dog compared to the OOE P6 chip. The first P5 we bought was in early 1994, and that is the chip I was discussing as representative of that time frame. Looking back, by the middle of 1994 we had the P5/133 boxes here, having started with a 60 MHz and then a 90 MHz version.

I believe Crafty peaked around 30K nodes per second on the fastest P5 I ever ran on. Compared to 100M two years ago. So 1000x is not completely unrealistic, IMHO.
100 million nodes per second 2 years ago? Stockfish does something like only 2 million, so Crafty is about 50x faster in nodes per second?
Crafty is doing 20M-30M nodes per second on my 3-year old dual quad box I have been using. The 100M number was on a machine with 32 cores, about 2 years ago or so. It was an 8-cpu quad-core server box prototype. Nothing particularly fancy or pricey... I have seen 30M-40M on a dual-quad Nehalem box we had here for a while for evaluation, this about 6-8 months ago.

For this "comparison" do we ignore the most significant hardware advance of the past 5-6 years, which has mainly been multi-core???
I did not have multi-core in mind when I first brought this issue up, so for the comparison we should not consider it. Multi-processing makes this even more difficult because we had multi-processors many decades ago too, and it's an extra variable. So, for the sake of getting an estimate, let's leave it out.

It occurred to me that there are so many variables that this cannot be a truly fair test. The best we can hope for is a rough estimate. For example, the old programs probably had inferior compilers - unless of course we use Crafty and you compile it with the same compiler. But programs also tend to be optimized for the platform the author used when developing them. Still, most of these issues are low-order bits in the bigger picture.

But what may be more important is the time control the programs were designed for. In a sense, most programs are designed for human-like time controls on the platform of the day. I have a feeling that if we ran today's programs on yesterday's hardware at human-like time controls, we would get a different answer than if we ran the same two programs on today's hardware.

Then there is the 64-bit issue. That is a hardware advance, but we did have 64-bit programs years ago - for example Cilkchess, and presumably Blitz; Crafty, of course, was always 64-bit.
michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: move count based pruning

Post by michiguel »

Don wrote:
[...]
Then there is the 64-bit issue. That is a hardware advance, but we did have 64-bit programs years ago - for example Cilkchess, and presumably Blitz; Crafty, of course, was always 64-bit.
You guys are making this so complicated it is not fun anymore :-)

By ~'93, the best machine you could buy in a store was a Pentium 90; today it is an Intel i7 quad. Whatever is available to a normal chess player should be the point, I think. At least, that would be a simple criterion.

Miguel
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: move count based pruning

Post by bob »

michiguel wrote:
[...]
You guys are making this so complicated it is not fun anymore :-)

By ~'93, the best machine you could buy in a store was a Pentium 90; today it is an Intel i7 quad. Whatever is available to a normal chess player should be the point, I think. At least, that would be a simple criterion.

Miguel
It is worse than that. I am having a hell of a time making the 1995 version of Crafty run on 64 bit hardware. The current version will run on anything, but the old version had some assembly language code that uses a slightly different syntax, doesn't understand %rax and such, and today's code can't be "borrowed" because I renumbered the damned bits a couple of years back. After a few hours of fiddle-farting around, all I can do is produce core files. :)

I can't even borrow the current move generator stuff, because the bits are wrong. Currently puzzling over what to try next...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: move count based pruning

Post by bob »

bob wrote:
silentshark wrote:
bob wrote:
silentshark wrote:
bob wrote:
silentshark wrote:This is all great debate..

:)

But do any of you have any hard data on whether LMP (not LMR) is worthwhile?

My data is this - after 1300 games, I am seeing a +20 ELO improvement using a simple LMP scheme. This is in addition to other pruning like LMR and null move. Of course, there's still a reasonable margin of error after that number of games. Note this is a "first cut", a very suboptimal implementation of LMP, I'm sure.

I'd be interested to see what Crafty could gain from a simple LMP scheme.

I'd also be interested to see what Komodo would lose by removing its LMP.

And what would SF lose by removing its LMP?
Give me some details about what you are doing and I can test. Our futility code has a bit of the LMP concept embedded since early moves (good moves, hopefully) don't get discarded, although this is only done in the last 4 plies at present.
okay.. my LMP implementation is pretty dumb. Summary:

- do not apply for hash moves, killers, captures or pawn promotions
- do not apply if stm is in check
- otherwise:
- at depth 1, prune once you have searched 7 moves
- at depth 2, prune once you have searched 11 moves
- at depth 3, prune once you have searched 23 moves
- at depth 4, prune once you have searched 35 moves

I'd be interested to see if this improves Crafty or not.

The Stockfish LMP implementation is more sophisticated, much more aggressive (and I'm pretty certain, much better!) :D
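For concreteness, the scheme listed above might be sketched like this in C. All identifiers here (`lmp_prune` and its flag parameters) are hypothetical illustrations, not code from any actual engine; only the thresholds come from the post above.

```c
#include <stdbool.h>

/* Late-move pruning as described above: at remaining depth d (1..4),
   once the listed number of moves has been searched, prune the rest.
   Thresholds are indexed by remaining depth; names are illustrative. */
static const int lmp_threshold[5] = {0, 7, 11, 23, 35};

bool lmp_prune(int depth, int moves_searched,
               bool is_special,   /* hash move, killer, capture, promotion */
               bool in_check)     /* side to move is in check */
{
    if (depth < 1 || depth > 4)
        return false;             /* rule applies near the leaves only */
    if (is_special || in_check)
        return false;             /* never prune tactical/forced moves */
    return moves_searched >= lmp_threshold[depth];
}
```

Note that the rule never fires before move 7 even at depth 1, so the hash move, killers, and good captures get searched regardless of where they fall in the move list.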
Should be easy enough, since I am already doing futility pruning in last 4 plies. I assume you ignore alpha bound, material score, and just prune away after N moves as given above?
yep, exactly.

So crude it shouldn't work, right?

I'll bet there are better values for N, too..
I'll get this going today. For the first cut, I am trying to decide whether to simply add it on top of futility pruning, or just blindly prune on move number and leave the futility part completely out. (I do mean use normal futility pruning, and when it decides not to prune because of alpha bound + material, then use your rule instead, as my futility stuff will prune some stuff yours would skip (early moves) while yours will prune some stuff mine will skip.)
OK, I am starting this test now. Will report results tomorrow morning.

First run is to replace my forward-pruning with your LMP. I might then try a "merger" to see if the two ideas can be combined so that I test material vs alpha _and_ number of moves searched...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: move count based pruning

Post by bob »

Ugh. This is not going so well. I used exactly your numbers, doing "LMP" in the last 4 plies after searching the number of moves you gave. I removed my forward pruning code (which is based on alpha, material score, and an error margin that gets larger as depth increases, but still limited to the last 4 plies only).

Early results so far:

Code: Select all

    Crafty-23.4-2        2676    4    4 30000   66%  2552   22% 
    Crafty-23.4-1        2673    4    4 30000   66%  2552   22% 
    Crafty-23.4R03       2600    7    7  5891   56%  2551   25% 
You might want to give my forward pruning a try. I simply take material score + error margin and compare it to alpha. If that value is <= alpha, I prune - only in the last 4 plies. The 4 margins are 125, 125, 300, 300. I have margins for deeper depths but cut it off at remaining depth = 4. I've not had any better results pruning at 5 or 6 yet.
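As a sketch, assuming centipawn scores, the test described above might look like this in C. The function and parameter names are hypothetical; the 125/125/300/300 margins are the ones quoted in the post.

```c
#include <stdbool.h>

/* Futility test sketched from the description above: prune a move at
   remaining depth d (1..4) when material plus a depth-indexed error
   margin still cannot exceed alpha.  Names are illustrative only. */
static const int futility_margin[5] = {0, 125, 125, 300, 300};

bool futility_prune(int depth, int material, int alpha)
{
    if (depth < 1 || depth > 4)
        return false;                             /* last 4 plies only */
    return material + futility_margin[depth] <= alpha;
}
```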
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: move count based pruning

Post by Don »

bob wrote: For this "comparison" do we ignore the most significant hardware advance of the past 5-6 years, which has mainly been multi-core???
We should do this test with multi-core machines.
silentshark
Posts: 327
Joined: Sat Mar 27, 2010 7:15 pm

Re: move count based pruning

Post by silentshark »

bob wrote:Ugh. This is not going so well.
[...]

Code: Select all

    Crafty-23.4-2        2676    4    4 30000   66%  2552   22% 
    Crafty-23.4-1        2673    4    4 30000   66%  2552   22% 
    Crafty-23.4R03       2600    7    7  5891   56%  2551   25% 
Hmm. That doesn't look too promising. Can you give it a whirl as an addition to all your normal forward pruning stuff?

If this implementation doesn't work, I wouldn't write LMP off. A few things:

- this seems to work for Stockfish, Toga (and others?)
- successful implementation is probably very program dependent
- move ordering of (non-hash, non-killer, non-capture) moves may be important if you are doing LMP. IIRC, Crafty applies no move ordering to these moves.

I'm still interested in what the result of taking LMP out of Komodo or Stockfish looks like.
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: move count based pruning

Post by rbarreira »

Don wrote: But what may be more important is the time control the programs were designed for. In a sense, most programs are designed for human-like time controls on the platform of the day. I have a feeling that if we ran today's programs on yesterday's hardware at human-like time controls, we would get a different answer than if we ran the same two programs on today's hardware.
I think you are definitely on to something here... Whether we run a new program on old hardware or an old program on new hardware, we're taking a fish out of water. Today's programs probably do things that would be detrimental on old hardware (trusting the availability of large amounts of RAM, for one thing...), and old programs aren't tuned to take advantage of very fast hardware...

It's probably possible to get some reasonably good data out of these experiments anyway, but it might always have caveats.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: move count based pruning

Post by Don »

rbarreira wrote:
[...]
I think you are definitely on to something here... Whether we run a new program on old hardware or an old program on new hardware, we're taking a fish out of water. Today's programs probably do things that would be detrimental on old hardware (trusting the availability of large amounts of RAM, for one thing...), and old programs aren't tuned to take advantage of very fast hardware...

It's probably possible to get some reasonably good data out of these experiments anyway, but it might always have caveats.
Yes, it has to be taken with a grain of salt. Software and hardware tend to evolve together.

For example, the MP issue. MP is not just hardware but represents a software advance too. So when we do these experiments with MP turned on in order to measure the "benefit" of hardware advances, we will be pretending that software has nothing to do with it, even though it does.

There are a bunch of caveats that make this not an apples-to-apples comparison. The design of a program is based on the current hardware; we would write a "modern" program differently if it were to run on older hardware. We depend on bigger cache sizes, we use compilers designed for new processors (although some have switches for older architectures), and hash table sizes were often limited in the older programs, which couldn't take advantage of today's memory sizes anyway. (But we should consider memory size on the hardware side of the issue.) Even though it may not be considered relevant, 64-bit programs are suddenly very popular at a time when 64-bit OS's are also very popular. Is this a hardware or software advance? Crafty has always been 64-bit, but most of the commodity OS's were 32-bit and probably could not take full advantage of that. I believe that on 64-bit hardware 64-bit programs have an advantage, but on 32-bit hardware it's probably a slight disadvantage.

I think a big issue may also be scalability. The results may depend very much on the time control - running really fast would be more favorable for the older programs, and running really slow for the newer ones. So if you run a modern program on ancient hardware where you're only doing 3- or 4-ply searches, it's like putting a fast runner who is a slow starter in a really short race - he does not have a chance to really stretch his legs.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: move count based pruning

Post by bob »

silentshark wrote:
[...]
Hmm. That doesn't look too promising. Can you give it a whirl as an addition to all your normal forward pruning stuff?

If this implementation doesn't work, I wouldn't write LMP off. A few things:

- this seems to work for Stockfish, Toga (and others?)
- successful implementation is probably very program dependent
- move ordering of (non-hash, non-killer, non-capture) moves may be important if you are doing LMP. IIRC, Crafty applies no move ordering to these moves.

I'm still interested in what the result of taking LMP out of Komodo or Stockfish looks like.
here is the final data:

Code: Select all

   Crafty-23.4-2        2676    3    3 30000   66%  2553   22%
   Crafty-23.4-1        2673    3    3 30000   66%  2553   22%
   Crafty-23.4R03-1     2596    3    3 30000   56%  2553   24%
   Crafty-23.4R03-2     2596    3    3 30000   56%  2553   24%
Now for some possible reasons. First, if I search the same position to the same depth using the two above versions, R03 (LMP) searches a tree at least 2x bigger, so my futility pruning code is significantly more aggressive. It is also more selective in that it doesn't prune moves that get us close to alpha, while dumping those that do not.

I prune in the last 4 plies. At any ply, before I prune, I must satisfy all of the following:

(1) This move is not a hash move, nor a capture with SEE > 0, nor a killer move.

(2) I am not in check (I have plans to relax this a bit by asking "am I in check, and did I extend the move that checked me?", since I don't extend losing checks according to SEE).

(3) I did not extend the current move that I am considering pruning (an extended move here would be a safe check).

(4) I have searched at least one move of some sort (the first move is never pruned).

(5) material score + error margin < alpha.

If I get through those tests, then the move gets pruned - this only happens in the last 4 plies, using the 4 different error margins (indexed by remaining depth) that I gave last night...
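The five conditions above can be collected into a single predicate. This is a minimal sketch, not Crafty's actual code: every identifier is invented for illustration, and the margins are the 125/125/300/300 values quoted earlier in the thread.

```c
#include <stdbool.h>

/* The five-way test described above, as one predicate.  A move may be
   futility-pruned only if every condition holds.  All names invented. */
static const int err_margin[5] = {0, 125, 125, 300, 300};

bool may_prune(int depth,          /* remaining depth; prune only at 1..4 */
               int moves_searched, /* moves already searched at this node */
               bool special_move,  /* (1) hash move, SEE>0 capture, killer */
               bool in_check,      /* (2) side to move is in check */
               bool extended,      /* (3) this move was extended */
               int material, int alpha)
{
    if (depth < 1 || depth > 4) return false;    /* last 4 plies only */
    if (special_move)           return false;    /* (1) */
    if (in_check)               return false;    /* (2) */
    if (extended)               return false;    /* (3) */
    if (moves_searched < 1)     return false;    /* (4) never the first move */
    return material + err_margin[depth] < alpha; /* (5) */
}
```

The contrast with plain move-count pruning is visible here: the move's position in the list only matters through condition (4), while condition (5) makes the decision depend on how far the material score sits below alpha.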

I believe this is yet another case of the reason why I dislike the concept of "move number pruning/reduction decisions."

Another issue here is that I already know that, for typical middle-game positions, if Crafty fails high at a node, about 92% of the time it fails high on the first move. By the time I get through the hash move, good captures, and killers, this number is way on up there (I do not measure this now, but am going to run some tests shortly and will report the results), possibly as high as 99%. If that is so, then reducing everything beyond those moves makes a lot of sense, unless something marks a move as "remarkable" (such as winning material, a safe check, etc.).

Notice that in the above test, I removed futility pruning since the two techniques overlap. But I did keep the reduction code, so moves that were not pruned away were still subject to my normal reductions.

You mentioned it is working for StockFish. I have not looked at their code (yet). Do they outright prune based only on the move's position in the list, or do they have the material+margin < alpha requirement as well, which really changes the way this works???