How does Monte Carlo analysis work?

Uri · Post by **Uri** » Sun Jul 20, 2008 12:28 am

How does Monte Carlo analysis work? Does it use the evaluation function or does it simply play random moves in a given position without evaluating them?

Harald Johnsen · Post by **Harald Johnsen** » Sun Jul 20, 2008 9:07 am

Monte Carlo can surely be used in a lot of ways.

If you play random games then the Monte Carlo search (ie the outcome of games) is your evaluation function. There are some good papers on how to backup the result of the mc search to combine it with alpha beta.

I suppose the purpose of mc is to play random games, if you use an evaluation function there won't be any randomness anymore. Anyway an eval function is useless if you play the full game because you will know the true score (win/draw/loss) of your mc game.

HJ.

Ovyron · Post by **Ovyron** » Sun Jul 20, 2008 9:17 am

Harald Johnsen wrote:I suppose the purpose of mc is to play random games, if you use an evaluation function there won't be any randomness anymore.

The way Rybka does it is changing the evaluation of the moves to get sure of never repeating the same game, that way you get a montecarlo in where the games played have the quality of chess masters' games.

bob · Post by **bob** » Sun Jul 20, 2008 4:50 pm

Uri wrote:How does Monte Carlo analysis work? Does it use the evaluation function or does it simply play random moves in a given position without evaluating them?

The basic idea for Monte Carlo simulations is the following:

Suppose you have an odd area painted on a wall and you want to know how many square inches the area covers. But it is not possible to measure it and compute because of the odd shape. So you bound the area by drawing a square or rectangle around it. Now you can measure this rectangle and compute its area, and you know the area you want to measure is less than this but how much?

Now you start the simulation and randomly shoot arrows into that rectangle. Each X,Y coordinate where an arrow will "hit" is comprised of two random numbers. Suppose you decide to shoot 1,000,000 arrows for starters. Then you count the number that hit _inside_ the shape you want to measure. And you get exactly 500,000. So the area of the shape is roughly 1/2 the known area of the rectangle. The more accuracy you want, the more arrows you shoot.

Vas probably does something similar to assess the value of a position, although it is _not_ going to be very accurate. All you can do to "shoot the arrows" is to play a very fast game. And if you play 1,000,000 games from a particular position, and you win 700,000 of them, then you could say that position has a .70 probability of a win.

Personally I do not think this is worth much. Why? Because to play enough games, you have to play very _fast_ games. Which means low-quality but numerous games. Computers miss plenty of tactics at such speeds so that winning probability is going to be based on weak data. I suppose the idea is to play enough games to average out such mistakes, but whether it will work or not I am not sure. I've run tons of matches on our cluster and discovered early on that matches with very fast time controls often end up significantly different when played at longer time controls, so I am not sure what this kind of analysis is going to prove...

Mark · Post by **Mark** » Mon Jul 21, 2008 3:40 am

bob wrote:
Uri wrote:How does Monte Carlo analysis work? Does it use the evaluation function or does it simply play random moves in a given position without evaluating them?
The basic idea for Monte Carlo simulations is the following:

Suppose you have an odd area painted on a wall and you want to know how many square inches the area covers. But it is not possible to measure it and compute because of the odd shape. So you bound the area by drawing a square or rectangle around it. Now you can measure this rectangle and compute its area, and you know the area you want to measure is less than this but how much?

Now you start the simulation and randomly shoot arrows into that rectangle. Each X,Y coordinate where an arrow will "hit" is comprised of two random numbers. Suppose you decide to shoot 1,000,000 arrows for starters. Then you count the number that hit _inside_ the shape you want to measure. And you get exactly 500,000. So the area of the shape is roughly 1/2 the known area of the rectangle. The more accuracy you want, the more arrows you shoot.

Vas probably does something similar to assess the value of a position, although it is _not_ going to be very accurate. All you can do to "shoot the arrows" is to play a very fast game. And if you play 1,000,000 games from a particular position, and you win 700,000 of them, then you could say that position has a .70 probability of a win.

Personally I do not think this is worth much. Why? Because to play enough games, you have to play very _fast_ games. Which means low-quality but numerous games. Computers miss plenty of tactics at such speeds so that winning probability is going to be based on weak data. I suppose the idea is to play enough games to average out such mistakes, but whether it will work or not I am not sure. I've run tons of matches on our cluster and discovered early on that matches with very fast time controls often end up significantly different when played at longer time controls, so I am not sure what this kind of analysis is going to prove...

Speaking of fast games, I understand that the Rybka team tests each evaluation change with something like 80,000 games at game in 1 second to measure increases as small as 1 elo. I was surprised that differences of only 1 elo could even be measured, but it looks like it works out well based on the success of Rybka.

bob · Post by **bob** » Mon Jul 21, 2008 7:27 am

Mark wrote:
bob wrote:
Uri wrote:How does Monte Carlo analysis work? Does it use the evaluation function or does it simply play random moves in a given position without evaluating them?
The basic idea for Monte Carlo simulations is the following:

Suppose you have an odd area painted on a wall and you want to know how many square inches the area covers. But it is not possible to measure it and compute because of the odd shape. So you bound the area by drawing a square or rectangle around it. Now you can measure this rectangle and compute its area, and you know the area you want to measure is less than this but how much?

Now you start the simulation and randomly shoot arrows into that rectangle. Each X,Y coordinate where an arrow will "hit" is comprised of two random numbers. Suppose you decide to shoot 1,000,000 arrows for starters. Then you count the number that hit _inside_ the shape you want to measure. And you get exactly 500,000. So the area of the shape is roughly 1/2 the known area of the rectangle. The more accuracy you want, the more arrows you shoot.

Vas probably does something similar to assess the value of a position, although it is _not_ going to be very accurate. All you can do to "shoot the arrows" is to play a very fast game. And if you play 1,000,000 games from a particular position, and you win 700,000 of them, then you could say that position has a .70 probability of a win.

Personally I do not think this is worth much. Why? Because to play enough games, you have to play very _fast_ games. Which means low-quality but numerous games. Computers miss plenty of tactics at such speeds so that winning probability is going to be based on weak data. I suppose the idea is to play enough games to average out such mistakes, but whether it will work or not I am not sure. I've run tons of matches on our cluster and discovered early on that matches with very fast time controls often end up significantly different when played at longer time controls, so I am not sure what this kind of analysis is going to prove...
Speaking of fast games, I understand that the Rybka team tests each evaluation change with something like 80,000 games at game in 1 second to measure increases as small as 1 elo. I was surprised that differences of only 1 elo could even be measured, but it looks like it works out well based on the success of Rybka.

They might think they are measuring changes as small as one Elo with 80K games, but they aren't... But 80K games can give you a very accurate idea of whether A' is better than A, even if A' is only different from A in a minor way (that is, the difference between A and A' is very small).

Some of what they do is probably less about proving as opposed to disproving. For example, both are strong chess players and probably have good ideas about what should or should not be in an eval. So when they make a change, they are already pretty sure it is a good one, and then 80K games can be viewed more of "did this make it play weaker?" rather than "did this make it play stronger?" Then, so long as it wasn't weaker, the change can be kept as testing verified what human judgement said was reasonable. That's a bit different from trying to test a range of guesses about something, because then you are trying to prove which is best, not is one worse...

hgm · Post by **hgm** » Mon Jul 21, 2008 10:00 am

Well, truly strategic features of the evaluation should, by definition, be quite independent of search depth. What you have to be careful with is changes in your search.

I use Monte-Carlo sampling of win probabilities quite often, to determine piece values, and I have observed that the results are usually quite independent of search depth. They can be dependent on evaluation, though. If I play a Bishop pair against two Knights and a Pawn (all other pieces being present as well) from shuffled opening setups, the score (between exactly equal engines) is the same within statistical erros from 40/10 to 40/0:10. But other engines might get a different score. This difference largely disappears when you normalize the excess score on the pawn-odds advantage the various engine are scoring.

This is almost independent of how strong the engine is, in particular what it thinks the piece values should be. Even if you make a Rook worth less than a Bishop, a Rook will still systematically beat the Bishop, as the side with the Bishop will also value the Bishop higher, and avoid trading it for the Rook until it is too late.

This all within reason: if you would play engines that are so damaged that they, say, cannot generate Knight moves, obviously a Knight would be of zero value to them. So the engines must play something that at least remotely resembles proper Chess, for the piece values to translate to scores.

Uri Blass · Post by **Uri Blass** » Mon Jul 21, 2008 10:47 am

hgm wrote:Well, truly strategic features of the evaluation should, by definition, be quite independent of search depth. What you have to be careful with is changes in your search.

I use Monte-Carlo sampling of win probabilities quite often, to determine piece values, and I have observed that the results are usually quite independent of search depth. They can be dependent on evaluation, though. If I play a Bishop pair against two Knights and a Pawn (all other pieces being present as well) from shuffled opening setups, the score (between exactly equal engines) is the same within statistical erros from 40/10 to 40/0:10. But other engines might get a different score. This difference largely disappears when you normalize the excess score on the pawn-odds advantage the various engine are scoring.

This is almost independent of how strong the engine is, in particular what it thinks the piece values should be. Even if you make a Rook worth less than a Bishop, a Rook will still systematically beat the Bishop, as the side with the Bishop will also value the Bishop higher, and avoid trading it for the Rook until it is too late.

This all within reason: if you would play engines that are so damaged that they, say, cannot generate Knight moves, obviously a Knight would be of zero value to them. So the engines must play something that at least remotely resembles proper Chess, for the piece values to translate to scores.

I believe that very fast time control with relatively weak engines can lead to more draws.

The engine may be unable to win KQ vs K and if the engine is very weak it may make stalemates even in better positions because it gets only depth 1 and has nothing in the evaluation to prevent stalemate.

Another problem may be repetitions that lead to draw.

All of this is not relevant to the rybka team because rybka at 1 second per game has no problem to get to bigger depths.

Uri

hgm · Post by **hgm** » Mon Jul 21, 2008 11:14 am

Well, I am not talking here about only doing 1-ply search or buggy engines. Even at 40/0:10, a typical engine will search over 100,000 nodes, good for 4-ply even with a branching factor of 10.

And even if they cannot win KQK, and are much more draw prone: the Pawn odds score suffers almost equally for this than any other material advantage. So if a strong engine would reach 68% with Pawn odds, and 59% with NNP vs BB imbalances, a weaker engine might only convert those advantages to 62% and 56%, respectively. But (56-50)/(62-50) would still be equal to (59-50)/(68-50), namely 0.5. This is what I see in practice.

ernest · Post by **ernest** » Mon Jul 21, 2008 4:48 pm

bob wrote:[They might think they are measuring changes as small as one Elo with 80K games, but they aren't...

Theoretically, 80K games, with 1/3 draws, gives a standard deviation (SD) of 0.144%, that is (x 7) 1 Elo.
And 2 SD (95% probability) is 2 Elo

How does Monte Carlo analysis work?

How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?