move count based pruning

Discussion of chess software programming and technical issues.

Moderator: Ras

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: move count based pruning

Post by bob »

Don wrote: As you say, there is working "well" and working "optimally" but the only point I'm making is that you can say anything you want to about how well something is working if you pick and choose your level of expectation.

But perhaps the most important point I want to make is that we "give up" as soon as we have something that satisfies us. In other words, if it beats human players we don't try for anything better and we imagine that we have found the best thing.
I don't agree there. I started doing this in 1968, when everyone's goal was simply to beat a master. I continued doing this thru the 80's, where the goal was to beat a GM here and there; thru the 90's, when we began to beat super-GM players; and thru today, when we can beat all players, although not every game. Perhaps the next goal is to never lose a game against a human, then to never draw a game. So expectations keep rising. Probabilities can only take you so far, because they are purely estimates. To get to the "end" you either need an infinite number of samples so that the estimation error goes to zero, or you need a more accurate search that takes fewer probabilistic leaps of faith and uses explicit information instead.
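To see why that estimation error only vanishes in the limit, here is a minimal Python sketch (the 0.55 "true" win probability is just an assumed number for illustration): the standard error of a Monte Carlo win-rate estimate shrinks as 1/sqrt(N), so each extra digit of accuracy costs roughly 100x more samples.

```python
import math
import random

def winrate_estimate(true_p, n_games, rng):
    """Estimate a position's win probability from n_games simulated games."""
    wins = sum(rng.random() < true_p for _ in range(n_games))
    return wins / n_games

rng = random.Random(42)
true_p = 0.55  # assumed "true" win probability of the position
for n in (100, 10_000, 1_000_000):
    est = winrate_estimate(true_p, n, rng)
    stderr = math.sqrt(true_p * (1 - true_p) / n)  # shrinks as 1/sqrt(n)
    print(f"n={n:>9}: estimate={est:.4f}, std error~{stderr:.4f}")
```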



It's funny that in GO, classic alpha/beta pruning and search was not perceived to be working very well, so they didn't stop looking! I personally thought classic A/B WAS working pretty well, based on the observation that it behaved like chess: a 2 ply search dominated a 1 ply search. Nobody was looking at it from that point of view; they were saying it was not working well based on some arbitrary standard.

It seemed to me that all they had to do was change their point of reference. Any of the good programs could crush human players just learning the game. So how can you say it wasn't working?

I guess what I'm saying is that I'm annoyed at how ill-defined these kinds of statements are. A simple definition of what works is anything that is scalable - so alpha/beta pruning works, and works extremely well, even in GO. This is in no way a statement about whether something else might be far superior, nor does it ignore the fact that with modern hardware A/B programs play very badly IN COMPARISON to even relatively weak human players.

In this same context, modern chess search "works" but that doesn't mean it is anywhere near optimal.
The games are way different, however. Go has a very large branching factor, which makes the trees very difficult to manage, even for humans. Monte-Carlo seems reasonable when something is so large it is difficult to grasp. Chess is not that large. The best GM players search incredibly deeply. So do the best programs. MC does not. It substitutes quantity (thousands of fast games) for quality (an accurate / intelligent / deep search). I can grasp why this works. If you can create tens of thousands of positions almost instantly, a human is going to have trouble dealing with them, and, by necessity, does some pretty heavy-handed pruning. An MC search uses a computer's strength (lots of calculations done very quickly) to play thousands (or perhaps millions on fast hardware at some point) of games to assess each of these possible positions, which plays into a human's weakness.
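For readers unfamiliar with the scheme bob is sketching, here is a minimal "flat" Monte Carlo evaluator in Python: score each candidate move by the win rate of many fast random playouts, then pick the best. The legal_moves / play / result methods are hypothetical placeholders for a real game implementation, not any particular program's API.

```python
import random

def playout(state, rng, max_plies=200):
    """Play one fast game with uniformly random moves; 1 for a win, 0 otherwise."""
    for _ in range(max_plies):
        moves = state.legal_moves()              # hypothetical game API
        if not moves:
            break
        state = state.play(rng.choice(moves))
    return 1 if state.result() == "win" else 0

def flat_mc_score(state, move, n_playouts, rng):
    """Evaluate one candidate move as its win rate over n_playouts fast games."""
    wins = sum(playout(state.play(move), rng) for _ in range(n_playouts))
    return wins / n_playouts

def best_move(state, n_playouts=1000, rng=random.Random(0)):
    """Trade quality for quantity: pick the move with the best playout win rate."""
    return max(state.legal_moves(),
               key=lambda m: flat_mc_score(state, m, n_playouts, rng))
```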

I do not believe that thousands of very fast games is going to be competitive with strong humans. Those thousands of games will be based on limited depth, and the human is going to out-calculate them pretty badly. Larry's game against the supercon is an example. You could let the supercon analyze for months, playing very short games to evaluate each possible move, and it would still be clueless to the long-range things that the human sees, but which its sorely restricted search does not.

I would not be surprised to see, in a few years, that MC worked well at first but begins to be abandoned because of the extremely short-sighted analysis it depends on. Or it is perhaps possible that go is just "large enough" in terms of search space that the short-sighted probabilistic analysis is good enough to beat humans anyway.

I've played the game a very little bit, and this was 25+ years ago, so I am hardly a decent player, and, therefore, have not given any great thought to how I might try to attack the game electronically. But my analysis, above, is probably the explanation for the MC successes.

In fact, there is strong evidence that it is not. A few months or maybe even years ago there was discussion here about whether we progressed more with hardware or software. I argued that it was hardware but I think the consensus in general (which I now agree with) is that it is a close call. The best programs of today are literally hundreds of ELO stronger (not considering hardware) and the hardware has also given us literally hundreds of ELO. So to me it's not a stretch of the imagination to believe that in 10 or 20 years the software will be far superior yet again.
I don't think there's any doubt it is hardware, as I was on that side at the time and remain there. Mainly because I have run cluster tests each time someone said something like "but null-move was a gigantic improvement." If you count +80 Elo as gigantic, correct. Else, not. Chess, it turns out, is just simple enough that it can be tackled effectively with brute-force methods. As the branching factor goes up, however, alternatives are clearly needed. Exponential is still exponential.
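Since null-move pruning is the canonical example of a software gain here, a minimal sketch of the trick may help: before searching real moves, hand the opponent a free move, and if a reduced-depth search still fails high, cut off. This is a generic illustration with R=2 and a hypothetical position API (evaluate, in_check, make_null_move, legal_moves, play), not any particular engine's code; real engines also guard against zugzwang.

```python
INF = 10**9
R = 2  # depth reduction for the null-move search

def search(pos, depth, alpha, beta):
    """Negamax alpha/beta with a null-move pruning step at the top."""
    if depth <= 0:
        return pos.evaluate()                 # hypothetical static evaluation

    # Null move: let the opponent move twice. If a shallower search still
    # fails high, the full-depth search almost certainly would too.
    if depth > R + 1 and not pos.in_check():
        score = -search(pos.make_null_move(), depth - 1 - R, -beta, -beta + 1)
        if score >= beta:
            return beta                       # prune this whole subtree

    best = -INF
    for move in pos.legal_moves():
        score = -search(pos.play(move), depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break                             # ordinary alpha/beta cutoff
    return best
```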

This is one of those deals where it's almost impossible to believe that this could be the case, but I think in 1990 if you had told the very best chess program authors about Rybka (for example) and that it was a few hundred ELO stronger than anything out there (even without the hardware) it would have appeared to be pretty incredible.
I'm not sure why. Chess 4.0 was "way out there" with respect to all the other programs in the 70's. In the 80's it was hardware all the way, from Belle, to Cray Blitz, to Deep Thought and then Deep Blue. Deep Thought / Deep Blue was certainly 300-400 Elo better than anything around in the late 80's and early 90's. And then along came Genius for the microcomputers, and we were "there again" on low-end hardware, with one program head and shoulders above the rest.

Of course there is surely a point of diminishing returns, but something I have discovered about diminishing returns is that people always think they are in the thick of it, even when there is a lot left - because it's just human nature to believe that what you have and see right now is the most representative of reality.
bob wrote: Which humans can compete with these things on good hardware today? I'd say it is working pretty well. Particularly when you compare that to the go results, which are _far_ worse, monte-carlo or not.
don wrote: Are you kidding? This is not working very well at all. I don't know about you but I'm totally dissatisfied with the state of the art.

If you play even the very best computers against each other, you will notice that they are still losing lots of games with either color. This is proof that they play the game far from optimally and something is wrong. What we are doing is just not working.
bob wrote: There is "working well" and "working optimally". I do not believe there is an optimal solution until you find one that finds win/lose/draw from the initial position. So, for the next few thousand years, "optimal" isn't going to happen. "well" is a different matter. "well" can always be improved on, until well == optimal, but what we are doing today certainly meets the "working well" definition as applied to AI.
don wrote:
We waited decades for computer hardware to get to the point where it is right now, to the point where computers could actually beat the best human players most of the time. In fact, that is the ONLY reason computers have a chance against these weak human players (who also play far from optimally.) The algorithms have improved significantly but we still must have this ridiculously powerful hardware to beat the humans.

I estimate that humans play about 1500 ELO below optimal (a totally arbitrary guess) so beating them is not a very impressive accomplishment with the enormously powerful hardware we have today. To me it's lame to judge how well something is working by comparing to such a low bar.

If you were a high jumper would you set the bar to 6 inches and then make a big fuss over how awesome you are because you can clear it and then claim that your training program is working pretty well?
bob wrote: If no 'state-of-the-art jumper' could clear that 6" bar, say we are talking about jumping on the surface of Jupiter for example, then I would say that training program is working pretty well.

Which do you think would be stronger, a monte-carlo go program of today, or slip into a time-warp, jump a couple of hundred years into the future, and use today's alpha/beta search on tomorrow's hardware to search as deeply in go as we search in chess today? The MC go programs would be totally crushed.
I cannot answer that with any degree of certainty either way. But you should realize that MC programs have reasonable PV's that are very long on today's hardware, and this would only improve. So both styles of programs would of course improve massively - and if I had to make a bet I would go with the MCTS programs, since every bit of evidence we currently have says this is the way to go.

One thing is sure, the algorithms would evolve over time so by the time we got to that point we would know a lot more about how things work than we know today.
Just remember that those "reasonable PV's" are highly probabilistic and selective, based on very fast game results. I don't see much sense in a PV that is 15 moves deep where each ply's best move was chosen by playing thousands of very fast, very shallow games. One could always try this in chess to see what would happen. I suspect we can predict that, however.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: move count based pruning

Post by Don »

I don't think you actually understand MCTS as you keep talking like all it does is "play 1000 games real fast."

Also, you claim it has a serious weakness in that it doesn't properly see long range threats. But what are you comparing that to? Alpha/beta pruning? What you are saying is way off - this beat classic alpha beta hands down.

I won't deny that long range threats are a problem - but that applies to all known techniques - this is so far the best solution to that.

But MCTS stands for Monte Carlo Tree Search. It's not just a normal alpha beta search where at end nodes you "play 1000 games real fast."

And it's recursive. As the search progresses, more information is combined to steer the tree in the proper direction. It's a very sophisticated thing, and just because the phrase "Monte Carlo" is in there doesn't mean it's a "random" search.
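To make that concrete, here is a minimal sketch of one UCT-style MCTS iteration in Python: select down the tree with an upper-confidence rule, expand one new node, run one fast playout, and back the result up so it steers later selections. The game-state methods (legal_moves, play, playout_result) are hypothetical placeholders, and c = sqrt(2) is just the textbook exploration constant.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = []                      # expanded child nodes
        self.untried = state.legal_moves()      # hypothetical game API
        self.wins, self.visits = 0.0, 0

    def ucb1(self, c=math.sqrt(2)):
        # Exploitation (win rate) plus an exploration bonus for rarely
        # visited children; this is what "steers the tree".
        return (self.wins / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts_iteration(root, rng=random):
    # 1. Selection: walk down, taking the child with the best UCB1 score.
    node = root
    while not node.untried and node.children:
        node = max(node.children, key=Node.ucb1)
    # 2. Expansion: turn one untried move into a new child node.
    if node.untried:
        child = Node(node.state.play(node.untried.pop()), parent=node)
        node.children.append(child)
        node = child
    # 3. Simulation: one fast playout; assume 1.0 = win for the side to move.
    result = node.state.playout_result(rng)
    # 4. Backpropagation: feed the result back up, flipping sides each ply.
    while node is not None:
        node.visits += 1
        node.wins += result
        result = 1.0 - result
        node = node.parent
```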

Now as far as progress in computer chess goes - after the big discussion of many months ago I re-thought my position.

We should do this experiment: What is the oldest world-class chess program that will run in our testers? We could probably get an ancient version of Fritz or Shredder or something and see if Rybka is any stronger. I'll bet you that the difference is not going to be explainable by 50 ELO for null-move pruning.
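As a reference for what such a match would show, the standard logistic Elo model converts a match score directly into a rating gap; a small Python sketch:

```python
import math

def elo_diff(score):
    """Elo difference implied by a match score in (0, 1), per the logistic model."""
    return -400 * math.log10(1 / score - 1)

# If the newer program scores 75% against the older one:
print(round(elo_diff(0.75)))  # ~191 Elo
```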

Don

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: move count based pruning

Post by bob »

Don wrote: I don't think you actually understand MCTS as you keep talking like all it does is "play 1000 games real fast."
I did not say "all it does is..." I said that decisions are based on playing very short games, with the results down each path combined to guide the search. The first version of this I recall seeing explained, perhaps in the JICCA or somewhere similar, was UCT, in fact. It is, quite simply, a form of best-first search, where "best" is based on the simulations. And the path / PV you return is a very selective one, produced as a result of playing all of these matches and using those results to guide the search. The key is "using those results", where "those results" are very fast games... Just how hard do you think this idea is to understand???


Also, you claim it has a serious weakness in that it doesn't properly see long range threats. But what are you comparing that to? Alpha/beta pruning? What you are saying is way off - this beat classic alpha beta hands down.
Alpha/beta "pruning" is backward pruning. Has _zero_ effect on the final best score and best move. So I am not sure why that is even in the conversation.


I won't deny that long range threats are a problem - but that applies to all known techniques - this is so far the best solution to that.
For go. I am not convinced it is even usable for chess.

But MCTS stands for Monte Carlo Tree Search. It's not just a normal alpha beta search where at end nodes you "play 1000 games real fast."
Never said it was...

And it's recursive. As the search progresses, more information is combined to steer the tree in the proper direction. It's a very sophisticated thing, and just because the phrase "Monte Carlo" is in there doesn't mean it's a "random" search.

Now as far as progress in computer chess goes - after the big discussion of many months ago I re-thought my position.

We should do this experiment: What is the oldest world-class chess program that will run in our testers? We could probably get an ancient version of Fritz or Shredder or something and see if Rybka is any stronger. I'll bet you that the difference is not going to be explainable by 50 ELO for null-move pruning.
This is an idea, but a flawed one. I have always tuned/balanced things in my chess engines based on the speed (and capabilities, such as vector stuff on Cray) available. If you can search 5K nodes per second, you probably will be quite critical of where you spend valuable computing cycles, because you need both depth and evaluation, but at slow speeds, depth is more important. As you go from 5K to 500K, suddenly different doors open, and you can afford to start to burn some of those cycles doing other things, whether it be better eval or selective search is just a choice.

michiguel
Posts: 6401
Joined: Thu Mar 09, 2006 8:30 pm
Location: Chicago, Illinois, USA

Re: move count based pruning

Post by michiguel »

bob wrote:
Don wrote: I don't think you actually understand MCTS as you keep talking like all it does is "play 1000 games real fast."
I did not say "all it does is..." I said that decisions are based on playing very short games, with the results down each path combined to guide the search. The first version of this I recall seeing explained, perhaps in the JICCA or somewhere similar, was UCT, in fact. It is, quite simply, a form of best-first search, where "best" is based on the simulations. And the path / PV you return is a very selective one, produced as a result of playing all of these matches and using those results to guide the search. The key is "using those results", where "those results" are very fast games... Just how hard do you think this idea is to understand???


Also, you claim it has a serious weakness in that it doesn't properly see long range threats. But what are you comparing that to? Alpha/beta pruning? What you are saying is way off - this beat classic alpha beta hands down.
Alpha/beta "pruning" is backward pruning. Has _zero_ effect on the final best score and best move. So I am not sure why that is even in the conversation.


I won't deny that long range threats are a problem - but that applies to all known techniques - this is so far the best solution to that.
For go. I am not convinced it is even usable for chess.

But MCTS stands for Monte Carlo Tree Search. It's not just a normal alpha beta search where at end nodes you "play 1000 games real fast."
Never said it was...

And it's recursive. As the search progresses, more information is combined to steer the tree in the proper direction. It's a very sophisticated thing, and just because the phrase "Monte Carlo" is in there doesn't mean it's a "random" search.

Now as far as progress in computer chess goes - after the big discussion of many months ago I re-thought my position.

We should do this experiment: What is the oldest world-class chess program that will run in our testers? We could probably get an ancient version of Fritz or Shredder or something and see if Rybka is any stronger. I'll bet you that the difference is not going to be explainable by 50 ELO for null-move pruning.
This is an idea, but a flawed one. I have always tuned/balanced things in my chess engines based on the speed (and capabilities, such as vector stuff on Cray) available. If you can search 5K nodes per second, you probably will be quite critical of where you spend valuable computing cycles, because you need both depth and evaluation, but at slow speeds, depth is more important. As you go from 5K to 500K, suddenly different doors open, and you can afford to start to burn some of those cycles doing other things, whether it be better eval or selective search is just a choice.
Stockfish (even weaker than Rybka 4) vs Fritz 5 on a Pentium == Massacre.

Miguel

bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: move count based pruning

Post by bob »

michiguel wrote:
bob wrote:
Don wrote: I don't think you actually understand MCTS as you keep talking like all it does is "play 1000 games real fast."
I did not say "all it does is..." I said that decisions are based on playing very short games, with the results down each path combined to guide the search. The first version of this I recall seeing explained, perhaps in the JICCA or somewhere similar, was UCT, in fact. It is, quite simply, a form of best-first search, where "best" is based on the simulations. And the path / PV you return is a very selective one, produced as a result of playing all of these matches and using those results to guide the search. The key is "using those results", where "those results" are very fast games... Just how hard do you think this idea is to understand???


Also, you claim it has a serious weakness in that it doesn't properly see long range threats. But what are you comparing that to? Alpha/beta pruning? What you are saying is way off - this beat classic alpha beta hands down.
Alpha/beta "pruning" is backward pruning. Has _zero_ effect on the final best score and best move. So I am not sure why that is even in the conversation.


I won't deny that long range threats are a problem - but that applies to all known techniques - this is so far the best solution to that.
For go. I am not convinced it is even usable for chess.

But MCTS stands for Monte Carlo Tree Search. It's not just a normal alpha beta search where at end nodes you "play 1000 games real fast."
Never said it was...

And it's recursive. As the search progresses, more information is combined to steer the tree in the proper direction. It's a very sophisticated thing, and just because the phrase "Monte Carlo" is in there doesn't mean it's a "random" search.

Now as far as progress in computer chess goes - after the big discussion of many months ago I re-thought my position.

We should do this experiment: What is the oldest world-class chess program that will run in our testers? We could probably get an ancient version of Fritz or Shredder or something and see if Rybka is any stronger. I'll bet you that the difference is not going to be explainable by 50 ELO for null-move pruning.
This is an idea, but a flawed one. I have always tuned/balanced things in my chess engines based on the speed (and capabilities, such as vector stuff on Cray) available. If you can search 5K nodes per second, you probably will be quite critical of where you spend valuable computing cycles, because you need both depth and evaluation, but at slow speeds, depth is more important. As you go from 5K to 500K, suddenly different doors open, and you can afford to start to burn some of those cycles doing other things, whether it be better eval or selective search is just a choice.
Stockfish (even weaker than Rybka 4) vs Fritz 5 on a Pentium == Massacre.

Miguel

Don

bob wrote:
Don wrote:As you say, there is working "well" and working "optimally" but the only point I'm making is that you can say anything you want to about how well something is working if you pick and choose your level of expectation.

But perhaps the most important point I want to make is that we "give up" as soon as we have something that satisfies us. In other words, if it beats human players we don't try for anything better and we imagine that we have found the best thing.
I don't agree there. I started doing this in 1968, when everyone's goal was to simply beat a master. I continued doing this thru the 80's where the goal was to beat a GM here and there. Thru the 90's when we began to beat super-GM players. Thru today where we can beat all players although not every game. Perhaps the next goal is to never lose a game against a human, then to never draw a game. So expectations keep rising. Probabilities can take you so far, because that is purely an estimate. To get to the "end" you either need an infinite number of samples so that the estimation error goes to zero, or you need a more accurate search that takes fewer probabilistic leaps of faith, and uses explicit information instead.



It's funny that in GO, classic alpha/beta pruning and search was not perceived to be working very well so they didn't stop looking! I personally thought classic A/B WAS was working pretty well based on the observation that it behaved like chess, a 2 ply search dominated a 1 ply search. Nobody was looking at it from that point of view, they were saying it was not working well based on some arbitrary standard.

It seemed to me that all they had to do was change their point of reference. Any of the good programs could crush human players just learning the game. So how can you say it wasn't working?

I guess what I'm saying is that I'm annoyed at how ill-defined these kind of statements are. A simple definitions of what works is anything that is scalable - so alpha/beta pruning works and works extremely well even in GO. This in no way is a statement about whether something else might be far superior nor does it ignore the fact that with modern hardware A/B programs play very badly IN COMPARISON to even relatively weak human players.

In this same context, modern chess search "works" but that doesn't mean it is anywhere near optimal.
The games are way different, however. Go has a very large branching factor, which makes the trees very difficult to manage, even for humans. Monte-Carlo seems reasonable when something is so large it is difficult to grasp. Chess is not that large. The best GM players search incredibly deeply. So do the best programs. MC does not. It substitutes quantity (thousands of fast games) for quality (an accurate / intelligent / deep search). I can grasp why this works. If you can create tens of thousands of positions almost instantly, a human is going to have trouble dealing with them. And, by necessity, does some pretty heavy-handed pruning. A MC search uses a computer's strength (lots of calculations done very quickly) to play thousands (or perhaps millions on fast hardware at some point) of games to assess each of these possible positions. Which is playing into a human's weakness.

I do not believe that thousands of very fast games is going to be competitive with strong humans. Those thousands of games will be based on limited depth, and the human is going to out-calculate them pretty badly. Larry's game against the supercon is an example. You could let the supercon analyze for months, playing very short games to evaluate each possible move, and it would still be clueless to the long-range things that the human sees, but which its sorely restricted search does not.

I would not be surprised to see in a few years that MC worked well at first, but begins to get abandoned because of the extreme short-sighted analysis it is depending on. Or it is perhaps possible that go is just "large enough" in terms of search space that the short-sighted probabilistic analysis is good enough to beat humans anyway.

I've played the game a very little bit, and this was 25+ years ago, so I am hardly a decent player, and, therefore, have not given any great thought to how I might try to attack the game electronically. But my analysis, above, is probably the explanation for the MC successes.

In fact, there is strong evidence that it is not. A few months or maybe even years ago there was discussion here about whether we progressed more with hardware or software. I argued that it was hardware but I think the consensus in general (which I now agree with) is that it is a close call. The best programs of today are literally hundreds of ELO stronger (not considering hardware) and the hardware has also given us literally hundreds of ELO. So to me it's not a stretch of the imagination to believe that in 10 or 20 years the software will be far superior yet again.
I don't think there's any doubt it is hardware, as I was on that side at the time and remain there. Mainly because I have run cluster tests each time someone said something like "but null-move was a gigantic improvement. If you count +80 Elo, correct. Else, not. Chess, it turns out, is just simple enough that it can be tackled with brute-force methods, effectively. As the branching factor goes up, however, clearly alternatives are needed. Exponential is still exponential.

This is one of those deals where it's almost impossible to believe that this could be the case, but I think in 1990 if you had told the very best chess program authors about Rybka (for example) and that it was a few hundred ELO stronger than anything out there (even without the hardware) it would have appeared to be pretty incredible.
I'm not sure why. Chess 4.0 was "way out there" with respect to all the other programs, in the 70's. In the 80's it was hardware all the way, from Belle, to Cray Blitz, to Deep Thought and then Deep Blue. Deep Thought /Deep blue was certainly 300-400 better than anything around in the late 80's and early 90's. And then along came Genius for the microcomputers, and we were "there again" on low-end hardware with one program head and shoulders above the rest.

Of course there is surely a point of diminishing returns, but something I have discovered about diminishing returns is that people always think they are in the thick of it, even when there is a lot left - because it's just human nature to believe that what you have and see right now is the most representative of reality.
bob wrote: Which humans can compete with these things on good hardware today? I'd say it is working pretty well. Particularly when you compare that to the go results, which are _far_ worse, monte-carlo or not.
don wrote: Are you kidding? This is not working very well at all. I don't know about you but I'm totally dissatisfied with the state of the art.

If you play even the very best computers against each other, you will notice that they are still losing lots of games with either color. This is proof that they play the game far from optimally and something is wrong. What we are doing is just not working.
bob wrote: There is "working well" and "working optimally". I do not believe there is an optimal solution until you find one that proves win/lose/draw from the initial position. So, for the next few thousand years, "optimal" isn't going to happen. "Well" is a different matter. "Well" can always be improved on, until well == optimal, but what we are doing today certainly meets the "working well" definition as applied to AI.
Don wrote:
We waited decades for computer hardware to get to the point where it is right now, where computers could actually beat the best human players most of the time. In fact, that is the ONLY reason computers have a chance against these weak human players (who also play far from optimally). The algorithms have improved significantly, but we still must have this ridiculously powerful hardware to beat the humans.

I estimate that humans play about 1500 ELO below optimal (a totally arbitrary guess) so beating them is not a very impressive accomplishment with the enormously powerful hardware we have today. To me it's lame to judge how well something is working by comparing to such a low bar.

If you were a high jumper would you set the bar to 6 inches and then make a big fuss over how awesome you are because you can clear it and then claim that your training program is working pretty well?
bob wrote: If no 'state-of-the-art jumper' could clear that 6" bar, say we are talking about jumping on the surface of Jupiter for example, then I would say that training program is working pretty well.

Which do you think would be stronger: a Monte-Carlo go program of today, or today's alpha/beta search slipped through a time-warp onto hardware a couple of hundred years in the future, searching as deeply in go as we search in chess today? The MC go programs would be totally crushed.
I cannot answer that with any degree of certainty either way. But you should realize that MC programs have reasonable PV's that are very long on today's hardware, and this would only improve. So both styles of programs would of course improve massively - and if I had to make a bet I would go with the MCTS programs, since every bit of evidence we currently have says this is the way to go.

One thing is sure: the algorithms would evolve over time, so by the time we got to that point we would know a lot more about how things work than we know today.
Just remember that those "reasonable PV's" are highly probabilistic and selective, based on very fast game results. I don't see much sense in a PV that is 15 moves deep where each ply's best move was chosen by playing thousands of very fast, very shallow games. One could always try this in chess to see what would happen. I suspect we can predict the outcome, however.
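As an aside, the long "PV" an MCTS program reports is typically nothing more than the chain of most-visited children in its search tree, along these lines (hypothetical types and fields, a sketch only):

Code: Select all

/* Extract a "PV" from an MCTS tree by repeatedly following the
   most-visited child.  Every ply on the line is backed only by
   playout statistics, not by a verified exhaustive search. */

typedef struct Node {
  int           move;       /* move that led to this node    */
  int           visits;     /* number of playouts through it */
  int           nchildren;
  struct Node **children;
} Node;

int extract_pv(const Node *root, int *pv, int maxlen) {
  int len = 0;
  const Node *n = root;
  while (len < maxlen && n->nchildren > 0) {
    const Node *best = n->children[0];
    for (int i = 1; i < n->nchildren; i++)
      if (n->children[i]->visits > best->visits)
        best = n->children[i];
    pv[len++] = best->move;
    n = best;
  }
  return len;
}

So a 15-move PV of this kind is 15 plies of playout statistics, not 15 plies of proven best play, which is the distinction being argued.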
So? Nobody says there have been _no_ software improvements. But what happens with the same two programs on current hardware? Would not be surprised for them to be about the same distance apart, yet worlds stronger...

In that case, which provides more, hardware or software? I suspect this is a lot clearer than most want to admit.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: move count based pruning

Post by Don »

bob wrote: So? Nobody says there have been _no_ software improvements. But what happens with the same two programs on current hardware? Would not be surprised for them to be about the same distance apart, yet worlds stronger...

In that case, which provides more, hardware or software? I suspect this is a lot clearer than most want to admit.
The issue is how different they are in software; then we compare the hardware too. We don't need to debate it, we just need to run the experiment if we can.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: move count based pruning

Post by bob »

Don wrote:
bob wrote: So? Nobody says there have been _no_ software improvements. But what happens with the same two programs on current hardware? Would not be surprised for them to be about the same distance apart, yet worlds stronger...

In that case, which provides more, hardware or software? I suspect this is a lot clearer than most want to admit.
The issue is how different they are in software; then we compare the hardware too. We don't need to debate it, we just need to run the experiment if we can.
About the only data I could provide would be to run a match between Crafty and Cray Blitz on current hardware, then figure out how to slow the hardware down by something like 1000X (in 1995 Crafty did 5K nodes per second, today 20M, so maybe more like 4000X for the hardware speed change in the PC from the original Pentium-whatever in late 1994 / early 1995 compared to today's speeds; the best I have hit is 100M on an 8x4 prototype server-type box about 2 years ago, so maybe my 4,000X is still too low).

In 1996 Crafty finished in 3rd and 4th places (Crafty and the gunda-1 clone) at the WMCCC event, so it was pretty representative of computer chess engines of the time. The only problem is that the Cray Blitz version I have is not going to work with multiple CPUs; it was built around some specific stuff Cray used (Task common, for one). So maybe we can just figure out single-CPU comparisons. I know we did 5K on an early Pentium. I am not sure if it was a -66, -90, or -133, however. I know that I can hit 5M on a single-CPU I7 today. So that is back to the 1,000X hardware.
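As a rough cross-check on what such a factor buys (a standard back-of-the-envelope, not a measurement): a speedup of S is worth roughly log(S)/log(EBF) extra plies, where EBF is the effective branching factor. At 1,000X that is log(1000)/log(5), about 4.3 extra plies, for a mid-90s EBF near 5, but log(1000)/log(2), about 10 extra plies, for a modern heavily-pruning EBF near 2. The same hardware factor is worth more depth to the program with the smaller branching factor.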

So for fun, let's take a fast I7 today, single-cpu, against a single-cpu from the past. How far back to go? I can't go back beyond 1994-1995 for Crafty since it played on ICS (now ICC) in December of 1994 and didn't exist prior to that. Therefore no guess about what 486 and earlier CPUs would have done.

Given a hardware factor, I should at least be able to compare CB vs Crafty on old, and CB vs Crafty on new, to see if hardware speed helps one more than the other. My suspicion would be that CB would gain more, since its branching factor was much higher, as was its accuracy. But I'm not certain. I'd be concerned about Crafty on old and CB on new, or vice-versa, as that is a _huge_ time handicap, perhaps enough to produce 30000-0 results, or something real close to it, which wouldn't provide any information...

I am not even sure you can take an old commercial program like Fritz and compare its old rating to the same program on new hardware, since the comparison would involve some inflation in ratings produced over the years...

I think measuring Elo of an old engine on old hardware is going to be more than just problematic.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: move count based pruning

Post by Don »

bob wrote:
Don wrote:
bob wrote: So? Nobody says there have been _no_ software improvements. But what happens with the same two programs on current hardware? Would not be surprised for them to be about the same distance apart, yet worlds stronger...

In that case, which provides more, hardware or software? I suspect this is a lot clearer than most want to admit.
The issue is how different they are in software; then we compare the hardware too. We don't need to debate it, we just need to run the experiment if we can.
About the only data I could provide would be to run a match between Crafty and Cray Blitz on current hardware, then figure out how to slow the hardware down by something like 1000X (in 1995 Crafty did 5K nodes per second, today 20M, so maybe more like 4000X for the hardware speed change in the PC from the original Pentium-whatever in late 1994 / early 1995 compared to today's speeds; the best I have hit is 100M on an 8x4 prototype server-type box about 2 years ago, so maybe my 4,000X is still too low).

In 1996 Crafty finished in 3rd and 4th places (Crafty and the gunda-1 clone) at the WMCCC event, so it was pretty representative of computer chess engines of the time. The only problem is that the Cray Blitz version I have is not going to work with multiple CPUs; it was built around some specific stuff Cray used (Task common, for one). So maybe we can just figure out single-CPU comparisons. I know we did 5K on an early Pentium. I am not sure if it was a -66, -90, or -133, however. I know that I can hit 5M on a single-CPU I7 today. So that is back to the 1,000X hardware.

So for fun, let's take a fast I7 today, single-cpu, against a single-cpu from the past. How far back to go? I can't go back beyond 1994-1995 for Crafty since it played on ICS (now ICC) in December of 1994 and didn't exist prior to that. Therefore no guess about what 486 and earlier CPUs would have done.

Given a hardware factor, I should at least be able to compare CB vs Crafty on old, and CB vs Crafty on new, to see if hardware speed helps one more than the other. My suspicion would be that CB would gain more, since its branching factor was much higher, as was its accuracy. But I'm not certain. I'd be concerned about Crafty on old and CB on new, or vice-versa, as that is a _huge_ time handicap, perhaps enough to produce 30000-0 results, or something real close to it, which wouldn't provide any information...

I am not even sure you can take an old commercial program like Fritz and compare its old rating to the same program on new hardware, since the comparison would involve some inflation in ratings produced over the years...

I think measuring Elo of an old engine on old hardware is going to be more than just problematic.
This would work if the Crafty versions of the eras you are testing are representative. In other words, I think the ELO gap between Crafty and the BEST program of the day needs to be the same. Otherwise you are not comparing state of the art. For example, if Crafty is 300 ELO weaker than Rybka (I'm just guessing, I have no idea) then you need an OLD version of Crafty that is about 300 ELO weaker than the best program of its day.

We might have to dig out some old ratings lists. Even if there is some difference we can make an adjustment.

But I would like to see this test if you are willing to run it.

Meanwhile, I found a version of Chess Genius that I believe represents something close to the state of the art from the early 90s. I ran 4 games with Robbolitto and it lost all 4 of course. I had to run this manually.

Then I tried a 24-to-1 time handicap - Robbo is playing almost instantly and still crushing Genius, which seemed impossible to beat back in the 90s.

So however this comes out, I think you have to agree that software progress has been enormous. I think it's on the same order as hardware improvement but it would not hurt my feelings to be proved wrong. And since I have been on both sides of this debate I would like to see a serious test to settle this issue.
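For anyone wanting to turn such mini-matches into numbers: under the standard logistic Elo model, a match score s implies a rating difference of about D = -400 * log10(1/s - 1). So 3 points out of 4 works out to roughly +190, and 15.5 out of 16 to roughly +600; though with only a handful of games the error bars are enormous, which is why a serious test needs thousands of games.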
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: move count based pruning

Post by Don »

Does anyone have a 200 MHz Pentium machine from 1994? I would like to do some timing tests on chess computers. One that runs DOS would be perfect!

Don

bob wrote:
Don wrote:
bob wrote: So? Nobody says there have been _no_ software improvements. But what happens with the same two programs on current hardware? Would not be surprised for them to be about the same distance apart, yet worlds stronger...

In that case, which provides more, hardware or software? I suspect this is a lot clearer than most want to admit.
The issue is how different they are in software; then we compare the hardware too. We don't need to debate it, we just need to run the experiment if we can.
About the only data I could provide would be to run a match between Crafty and Cray Blitz on current hardware, then figure out how to slow the hardware down by something like 1000X (in 1995 Crafty did 5K nodes per second, today 20M, so maybe more like 4000X for the hardware speed change in the PC from the original Pentium-whatever in late 1994 / early 1995 compared to today's speeds; the best I have hit is 100M on an 8x4 prototype server-type box about 2 years ago, so maybe my 4,000X is still too low).

In 1996 Crafty finished in 3rd and 4th places (Crafty and the gunda-1 clone) at the WMCCC event, so it was pretty representative of computer chess engines of the time. The only problem is that the Cray Blitz version I have is not going to work with multiple CPUs; it was built around some specific stuff Cray used (Task common, for one). So maybe we can just figure out single-CPU comparisons. I know we did 5K on an early Pentium. I am not sure if it was a -66, -90, or -133, however. I know that I can hit 5M on a single-CPU I7 today. So that is back to the 1,000X hardware.

So for fun, let's take a fast I7 today, single-cpu, against a single-cpu from the past. How far back to go? I can't go back beyond 1994-1995 for Crafty since it played on ICS (now ICC) in December of 1994 and didn't exist prior to that. Therefore no guess about what 486 and earlier CPUs would have done.

Given a hardware factor, I should at least be able to compare CB vs Crafty on old, and CB vs Crafty on new, to see if hardware speed helps one more than the other. My suspicion would be that CB would gain more, since its branching factor was much higher, as was its accuracy. But I'm not certain. I'd be concerned about Crafty on old and CB on new, or vice-versa, as that is a _huge_ time handicap, perhaps enough to produce 30000-0 results, or something real close to it, which wouldn't provide any information...

I am not even sure you can take an old commercial program like Fritz and compare its old rating to the same program on new hardware, since the comparison would involve some inflation in ratings produced over the years...

I think measuring Elo of an old engine on old hardware is going to be more than just problematic.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: move count based pruning

Post by bob »

Don wrote:
bob wrote:
Don wrote:
bob wrote: So? Nobody says there have been _no_ software improvements. But what happens with the same two programs on current hardware? Would not be surprised for them to be about the same distance apart, yet worlds stronger...

In that case, which provides more, hardware or software? I suspect this is a lot clearer than most want to admit.
The issue is how different they are in software; then we compare the hardware too. We don't need to debate it, we just need to run the experiment if we can.
About the only data I could provide would be to run a match between Crafty and Cray Blitz on current hardware, then figure out how to slow the hardware down by something like 1000X (in 1995 Crafty did 5K nodes per second, today 20M, so maybe more like 4000X for the hardware speed change in the PC from the original Pentium-whatever in late 1994 / early 1995 compared to today's speeds; the best I have hit is 100M on an 8x4 prototype server-type box about 2 years ago, so maybe my 4,000X is still too low).

In 1996 Crafty finished in 3rd and 4th places (Crafty and the gunda-1 clone) at the WMCCC event, so it was pretty representative of computer chess engines of the time. The only problem is that the Cray Blitz version I have is not going to work with multiple CPUs; it was built around some specific stuff Cray used (Task common, for one). So maybe we can just figure out single-CPU comparisons. I know we did 5K on an early Pentium. I am not sure if it was a -66, -90, or -133, however. I know that I can hit 5M on a single-CPU I7 today. So that is back to the 1,000X hardware.

So for fun, let's take a fast I7 today, single-cpu, against a single-cpu from the past. How far back to go? I can't go back beyond 1994-1995 for Crafty since it played on ICS (now ICC) in December of 1994 and didn't exist prior to that. Therefore no guess about what 486 and earlier CPUs would have done.

Given a hardware factor, I should at least be able to compare CB vs Crafty on old, and CB vs Crafty on new, to see if hardware speed helps one more than the other. My suspicion would be that CB would gain more, since its branching factor was much higher, as was its accuracy. But I'm not certain. I'd be concerned about Crafty on old and CB on new, or vice-versa, as that is a _huge_ time handicap, perhaps enough to produce 30000-0 results, or something real close to it, which wouldn't provide any information...

I am not even sure you can take an old commercial program like Fritz and compare its old rating to the same program on new hardware, since the comparison would involve some inflation in ratings produced over the years...

I think measuring Elo of an old engine on old hardware is going to be more than just problematic.
This would work if the Crafty versions of the eras you are testing are representative. In other words, I think the ELO gap between Crafty and the BEST program of the day needs to be the same. Otherwise you are not comparing state of the art. For example, if Crafty is 300 ELO weaker than Rybka (I'm just guessing, I have no idea) then you need an OLD version of Crafty that is about 300 ELO weaker than the best program of its day.
I actually believe I have a Crafty version from that time period. I have the 1996 version that competed in the WMCCC event and was competitive with the best of the time (again, it finished in 3rd and 4th places and could play with most any program).

I could probably test that version on my cluster in a mix with my current version, to see how far Crafty has come in 14 years or so, although it probably won't be a "drop in and play" deal since the xboard protocol changed a lot between then and now and my referee depends on "now". :)

Once I can compare 1996C to 2010C, there are several places to extrapolate further to see how far ahead Rybka is. That will give us a software improvement from 1996 to present pretty accurately. The hardware part is harder. I can probably figure out a way to slow a cluster node down by a factor of 1,000 to see what the hardware gain produced by today's hardware is, but extrapolating for Rybka would be much harder since I can't run just any old thing on the cluster and need to compile it...
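One cheap way to approximate that slowdown without touching the hardware (a hypothetical sketch, not anyone's actual referee code) is to scale the time control by the nodes-per-second ratio, so the engine searches roughly the same number of nodes per move it would have searched back then:

Code: Select all

/* Emulate old hardware by shrinking the time control in proportion
   to the nodes-per-second ratio, so the engine visits roughly the
   same number of nodes per move it would have visited back then. */

double handicapped_time(double base_seconds,
                        double old_nps, double new_nps) {
  return base_seconds * (old_nps / new_nps);
}

/* Example: handicapped_time(180.0, 5e3, 20e6) = 0.045 seconds.
   A 3-minute search at 5K nps and a 45-millisecond search at
   20M nps both visit about 900K nodes. */

This ignores everything that does not scale with clock speed (memory latency, hash size, parallel overhead), so it is only a first approximation.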

We might have to dig out some old ratings lists. Even if there is some difference we can make an adjustment.

But I would like to see this test if you are willing to run it.

Meanwhile, I found a version of Chess Genius that I believe represents something close to the state of the art from the early 90s. I ran 4 games with Robbolitto and it lost all 4 of course. I had to run this manually.

Then I tried a 24-to-1 time handicap - Robbo is playing almost instantly and still crushing Genius, which seemed impossible to beat back in the 90s.

So however this comes out, I think you have to agree that software progress has been enormous. I think it's on the same order as hardware improvement but it would not hurt my feelings to be proved wrong. And since I have been on both sides of this debate I would like to see a serious test to settle this issue.
Let me look at the old version of Crafty, and see if it will play on my cluster. The next issue is going to be how weak it is compared to my test programs. If everybody is +400 over it, then I can't get an accurate rating and will have to go back to square one and find a few weak opponents to factor in...

As far as "enormous software improvement" goes, I suppose one first has to define "enormous". I think that a 1000X hardware speedup could be considered enormous. It will be interesting to see just how that speed affects the old program. Something I am going to try to measure today or tonight, if I can get it to run correctly...

I suspect that in 1996 my branching factor was in the 5.0 range, maybe a bit lower, so an additional 6 plies of search will be an improvement. It will be interesting to just look at the old source to see what I was doing back then with null-move and such. I believe I was either using R=2 or R=2~3. No reductions at the time, but it did have futility pruning.
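For readers who were not writing engines then, the two techniques named above look roughly like this (helper names are hypothetical placeholders; any real 1996 code differs in detail):

Code: Select all

/* Sketch of mid-90s pruning: null-move with an adaptive reduction
   ("R=2~3", reducing more when deep) plus futility pruning of quiet
   moves at frontier nodes. */

#define FUTILITY_MARGIN 100   /* about one pawn; a typical choice */

typedef struct Position Position;
extern int  quiesce(Position *p, int alpha, int beta);
extern int  null_move_ok(const Position *p);   /* not in check, etc. */
extern void make_null_move(Position *p);
extern void unmake_null_move(Position *p);
extern int  generate_moves(Position *p, int *moves);
extern int  is_quiet(const Position *p, int move);
extern int  static_eval(const Position *p);
extern void make(Position *p, int move);
extern void unmake(Position *p, int move);

int search(Position *p, int alpha, int beta, int depth) {
  if (depth <= 0) return quiesce(p, alpha, beta);

  if (null_move_ok(p)) {
    int R = (depth >= 7) ? 3 : 2;              /* adaptive R=2~3      */
    make_null_move(p);
    int v = -search(p, -beta, -beta + 1, depth - 1 - R);
    unmake_null_move(p);
    if (v >= beta) return beta;                /* fail high, no move  */
  }

  int moves[256];
  int n = generate_moves(p, moves);
  for (int i = 0; i < n; i++) {
    if (depth == 1 && is_quiet(p, moves[i]) &&
        static_eval(p) + FUTILITY_MARGIN <= alpha)
      continue;                                /* futile quiet move   */
    make(p, moves[i]);
    int v = -search(p, -beta, -alpha, depth - 1);
    unmake(p, moves[i]);
    if (v >= beta) return beta;
    if (v > alpha) alpha = v;
  }
  return alpha;
}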