One billion random games

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

One billion random games

Post by sje »

One billion random games generated with a single thread on a 3.2 GHz Intel Core i3:

Code: Select all

[] rgstat 1000000000
Summary:
  Checkmate 153,023,351 (0.153023)
  FiftyMoves 199,868,185 (0.199868)
  Insufficient 560,510,846 (0.560511)
  Repetition 25,427,172 (0.0254272)
  Stalemate 61,170,446 (0.0611704)
Count: 1,000,000,000   Elapsed: 159960  (6251.57 Hz / 0.00015996 s)
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: One billion random games

Post by mar »

I got similar numbers after 1M:

Code: Select all

took 530719 ms
4 min plies
886 max plies
340838221 total plies
340.838221 avg plies
1000000 total games
153797 checkmates (0.153797)
76895 whitewins (0.076895)
76902 blackwins (0.076902)
215345 fiftyrule (0.215345)
544013 insuffmaterial (0.544013)
25976 repetition (0.025976)
60869 stalemate (0.060869)
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: One billion random games

Post by sje »

At present, my code does not have instrumentation to record the mean game length in plies. But a version from long ago did, and I seem to recall that the number was about 338.

The mean branching factor at each ply could also be monitored, and this could be used for perft() estimations. However, an adjustment would be required to skip repetition draws, as threefold occurrences are not recognized by the usual perft() rules. Actually, I suppose that insufficient-material and fifty-move draws should also be skipped for perft() estimations. Perhaps we should change the definition of perft() to include all draws.
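To illustrate the idea, here is a toy sketch in Python (a synthetic tree stands in for the chess game tree; all names and fan-out numbers are mine, purely illustrative). The product of the branching factors along a uniformly random path is an unbiased estimator of the leaf count, so averaging it over many random games estimates perft():

```python
import random

def make_tree(depth, rng, lo=2, hi=5):
    """Synthetic game tree: a node is a list of child nodes, leaves are []."""
    if depth == 0:
        return []
    return [make_tree(depth - 1, rng, lo, hi)
            for _ in range(rng.randint(lo, hi))]

def perft(node, depth):
    """Exact leaf count at the given depth."""
    if depth == 0:
        return 1
    return sum(perft(child, depth - 1) for child in node)

def mc_perft(node, depth, samples, rng):
    """Monte Carlo perft estimate: average, over random paths, the
    product of the branching factors seen along each path."""
    total = 0
    for _ in range(samples):
        n, prod = node, 1
        for _ in range(depth):
            if not n:        # dead end: the line ended early
                prod = 0
                break
            prod *= len(n)
            n = rng.choice(n)
        total += prod
    return total / samples

tree = make_tree(6, random.Random(42))
exact = perft(tree, 6)
estimate = mc_perft(tree, 6, 20000, random.Random(1))
```

In real chess the early dead ends are exactly the draw adjudications discussed above, which is why a random-game estimator must either skip those draws or redefine perft() to include them.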

I'm considering a ten billion game run that would require about a month on a spare machine. But prudence demands first getting a UPS.
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: One billion random games

Post by hgm »

I am really surprised by the large fraction of insufficient-material draws. I had expected Chess to be an unstable game even in the random limit, because the side that happens to blunder away important material first would have less opportunity to capture material, so that the imbalance would only grow. But this suggests it is actually a stable game in this limit, where imbalances tend to correct themselves.

Of course having less material reduces the number of non-captures as well as the number of captures, so that it does not reduce the probability to play a capture. I guess this is what I overlooked.

Interesting fundamental questions would be:
*) What happens when you bias the move choice towards captures by a fixed factor? Would the game become unstable when the bias exceeds some threshold (manifesting itself in a huge drop in insufficient-material and 50-move draws, and a large increase in checkmates and stalemates)?
*) How are the wins distributed over white wins and black wins? Does white have an advantage even in the random limit, and if so, how big is it?
*) Is it a better strategy to bias towards captures (i.e. if you only bias one side to prefer captures, would it beat a random mover)? Is there an optimum for the bias factor beyond which increasing the tendency to capture becomes counter-productive (if the opponent keeps using the optimum factor)?
*) Would biasing in proportion to the victim value (i.e. play each move with probability proportional to (1+f*victimValue)) be an improvement over fixed-factor biasing?
*) Would it be an improvement to bias against moving (and capturing) with more valuable pieces?

And finally:

*) Can this method be used to calculate piece values? How is the average result affected when you give Pawn odds / piece odds / Rook odds? Do you already see an effect of this in the random limit, or would it only show up after you succeeded making the game unstable by biasing the move choice in some way?

(Hint: this all seems much more interesting than playing nine billion additional random games to gain half a digit extra precision... :wink: )
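A minimal sketch of the fixed-factor capture bias from the first question (Python; the move-list format, move names, and bias value are hypothetical, and a real implementation would sit inside a move generator):

```python
import random

def pick_move(moves, capture_bias=3.0, rng=random):
    """Pick a move at random, weighting captures by a fixed factor.
    `moves` is a list of (move, is_capture) pairs (illustrative format)."""
    weights = [capture_bias if is_capture else 1.0
               for _, is_capture in moves]
    return rng.choices([m for m, _ in moves], weights=weights, k=1)[0]

# With one capture among two moves and bias 3, the capture is chosen
# with probability 3 / (3 + 1) = 0.75.
moves = [("Nf3", False), ("Qxd5", True)]
```

Sweeping `capture_bias` upward and re-running the draw/mate statistics would be one way to look for the instability threshold asked about above.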
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: One billion random games

Post by Sven »

sje wrote:The mean branching factor at each ply could also be monitored and this could be used for perft() estimations. However, an adjustment would be required to skip repetition draws as threefold occurrences are not recognized by the usual perft() rules. Actually, I suppose that insufficient material and fifty move draws should also be skipped for perft() estimations. Perhaps we should change the definition of perft() to include all draws.
I think this is a good idea.

But one could also view it the other way round: with a second definition, "perft_v2()", that includes "all draws" (i.e., all the draw-by-rule cases we specify; FIDE rules have more cases than many chess programs detect today), we could estimate the mean branching factor for arbitrary ply levels. E.g. you could play a number of random games of fixed length N (now obeying draw-by-rule exactly as in the new perft_v2() definition), run a shallow but complete perft_v2() at each leaf node, say 3 plies deep, and monitor the mean branching factor at plies N, N+1, N+2. Do this for different N, and combine the results.

The focus in this case would be on getting information about the branching factor for other purposes, not on using it for perft (perft_v2) estimation. Also, this approach would be a bit more general than always starting the perft calculation from the initial position, maybe increasing the chances of getting realistic BF estimates for later game phases as well. So it is something very different from your proposal, not meant as any kind of improvement of it.

What do you think?

Sven
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: One billion random games

Post by mar »

hgm wrote:*) How are the wins distributed over white wins and black wins? Does white have an advantage even in the random limit, and if so, how big is it?
white/black wins are balanced to within statistical noise (76,895 vs 76,902)
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: One billion random games

Post by Daniel Shawul »

sje wrote:The mean branching factor at each ply could also be monitored and this could be used for perft() estimations. However, an adjustment would be required to skip repetition draws as threefold occurrences are not recognized by the usual perft() rules.
I already tried that method and the result was not very good. IIRC for perft(13) I got about 1.4e18 with regular arithmetic averaging of the BF at each ply. I got a better result by assuming a log-normal distribution, but it still had too much error to compete with the Monte Carlo methods.
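One plausible source of that error (an illustration of the statistics, not Daniel's actual data): branching factors are correlated along a game, so the product of per-ply arithmetic means is not the same as the mean of the per-path products. A toy model in Python, with two consecutive plies whose branching factors are positively correlated through how "open" the position is (all numbers invented):

```python
import random

rng = random.Random(3)

# Toy model: branching factors at two consecutive plies, positively
# correlated through the "openness" of the position (illustrative only).
paths = []
for _ in range(100_000):
    openness = rng.uniform(10, 40)
    b1 = openness * rng.uniform(0.9, 1.1)   # BF at ply 1
    b2 = openness * rng.uniform(0.9, 1.1)   # BF at ply 2
    paths.append((b1, b2))

n = len(paths)
product_of_means = (sum(b1 for b1, _ in paths) / n) * \
                   (sum(b2 for _, b2 in paths) / n)     # naive BF averaging
mean_of_products = sum(b1 * b2 for b1, b2 in paths) / n  # true nodes per path
# With positive correlation, mean_of_products exceeds product_of_means,
# so naive per-ply averaging underestimates the tree size.
```

In this toy model the gap is over 10%, and it compounds over 13 plies, which is consistent with plain arithmetic averaging landing noticeably off.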
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: One billion random games

Post by Daniel Shawul »

hgm wrote:I am really surprised by the large fraction of insufficient-material draws. I had expected Chess to be an unstable game even in the random limit, because the side that happens to blunder away important material first would have less opportunity to capture material, so that the imbalance would only grow. But this suggests it is actually a stable game in this limit, where imbalances tend to correct themselves.
I do not trust random games for chess because of this "capture problem". For checkers, captures are forced and plain Monte Carlo works very well, but in chess a player has the option not to capture. I have tried biased move selection for captures (both higher and lower frequency), still with very bad results. Maybe SEE would help, but I never tried it as that is too slow for MC. A capture may be good or bad, and that takes some plies to discover, so plain biasing without a complicated heuristic like SEE is doomed to fail. Chess is full of tactics (captures at each ply), and an MC searcher, even though it goes deep, returns crap results, missing 2-ply tactics. First, it moved its queen where an opponent pawn could capture it on the next ply. To solve that I tried giving larger probabilities to captures, but then it started capturing defended pawns with its queen. So clearly I needed a good tactical searcher for the Monte Carlo part, and that is very _expensive_. Without it, the statistics derived from it about who is winning are not dependable. To prove this point we can try a position where one side is up a rook; I will not be surprised if the result is still 50-50.
hgm wrote:Interesting fundamental questions would be:
*) What happens when you bias the move choice towards captures by a fixed factor? Would the game become unstable when the bias exceeds some threshold (manifesting itself in a huge drop in insufficient-material and 50-move draws, and a large increase in checkmates and stalemates)?
*) How are the wins distributed over white wins and black wins? Does white have an advantage even in the random limit, and if so, how big is it?
*) Is it a better strategy to bias towards captures (i.e. if you only bias one side to prefer captures, would it beat a random mover)? Is there an optimum for the bias factor beyond which increasing the tendency to capture becomes counter-productive (if the opponent keeps using the optimum factor)?
*) Would biasing in proportion to the victim value (i.e. play each move with probability proportional to (1+f*victimValue)) be an improvement over fixed-factor biasing?
*) Would it be an improvement to bias against moving (and capturing) with more valuable pieces?

And finally:

*) Can this method be used to calculate piece values? How is the average result affected when you give Pawn odds / piece odds / Rook odds? Do you already see an effect of this in the random limit, or would it only show up after you succeeded making the game unstable by biasing the move choice in some way?
My answers to those questions would be a pessimistic no. I don't know if you remember, but I started experimenting with MC to determine piece values. But once I saw the games... yuck! It is absolutely not dependable unless we have a tactical searcher inside the MC part. For Go, some people use two searchers (a tactical alpha-beta plus a Monte Carlo searcher). Sometimes breadth-first searches are used to detect recurrent tactical situations, but all of that is done in the tree part.
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: One billion random games

Post by hgm »

Interesting. What I had in mind was to make the probabilities proportional to (1+f*victimValue)/pieceValue. This does seem to avoid the problem you mention; QxP captures could be discouraged, while PxQ captures would be strongly enhanced.

Mind you, I am not claiming that you could ever play good Chess with such an algorithm. I am just curious if one could create such a simple strategy that was at least able to exploit a material advantage (making the game unstable, diverging away from equality rather than converging towards it).

Intuitively I would think that paranoia is good, because pieces could be defended and empty squares could be attacked. OTOH, QxQ should probably be strongly encouraged, because the Queen could have been unprotected. So f=2 seems a reasonable value, which would prefer QxQ by a factor 19/9=2.11 over a Pawn push, while PxQ would have relative probability 19, and QxP 3/9=0.33.
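A direct transcription of this weighting in Python (function name and value scale are mine; pawn=1, queen=9 as in the post, victim value 0 for a non-capture), reproducing the numbers above:

```python
def move_weight(victim_value, mover_value, f=2.0):
    """Relative selection weight (1 + f*victimValue) / pieceValue."""
    return (1 + f * victim_value) / mover_value

pawn_push = move_weight(0, 1)   # 1.0
qxq = move_weight(9, 9)         # 19/9 ~ 2.11, preferred over a pawn push
pxq = move_weight(9, 1)         # 19.0, strongly enhanced
qxp = move_weight(1, 9)         # 3/9 ~ 0.33, discouraged
```

Fed into a weighted sampler (e.g. `random.choices` with these weights), this would give the paranoid-but-greedy random mover described here.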
PK
Posts: 893
Joined: Mon Jan 15, 2007 11:23 am
Location: Warsza

Re: One billion random games

Post by PK »

Some time ago I tried to reformulate Monte Carlo for chess as "take an average of multiple searches with a large random factor in the eval function", which is probably about the slowest and most knowledge-heavy simulation strategy one can design. The result was a very funny failure: the engine valued mobility over material, needed a special incentive at the root to castle at all, and reached at most 1200 Elo.

I also have a very basic and bad engine playing the game of Go with a shallow, selective global alpha-beta. 10-12 moves are selected by Monte Carlo simulations and leaves are evaluated in the same manner. From time to time I add some rule preventing a certain kind of bad move, but even on a 9x9 board the chances of picking something horrible are unfathomable. Alpha-beta helps pure Monte Carlo / AMAF a lot, but the resulting program is still a sore loser. I came to admire Go programmers a lot after trying that.

Still, it would be a great and instructive achievement to build an 1800-ish Monte Carlo chess program, but I have no clue how it could be done.