multi-player games thoughts

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: multi-player games thoughts

Post by hgm »

I am not sure what you mean by 'shallow' pruning. I don't see anything shallow in the algorithm; the deep cutoffs work as always. Unless you consider all alpha-beta pruning shallow...

I agree that in the typical multi-player case it will be less effective than in the two-player case. This is because the cutoff condition is score >= 100%-max(oppoScore), rather than score >= 100%-SUM(oppoScore). When translated to excess win probabilities this becomes

xScore + 100%/N >= 100% - max(xOppoScore + 100%/N)

or

xScore >= 100% * (1 - 2/N) - max(xOppoScore)

So rather than sign flipping alpha to get beta, there is a raise of (1-2/N)*100%. For 3 players this would already be 33%, and it gets worse with more.

Isn't it true that you can save at most BF^(d/3), because only one of the 3 players can have cut nodes?
User avatar
OliverUwira
Posts: 170
Joined: Mon Sep 13, 2010 9:57 am
Location: Frankfurt am Main

Re: multi-player games thoughts

Post by OliverUwira »

hgm wrote:I am not sure what you mean by 'shallow' pruning. I don't see anything shallow in the algorithm; the deep cutoffs work as always. Unless you consider all alpha-beta pruning shallow...
I was referring to backgammon engines. They usually search only 2-3 ply because the 36 possible dice rolls blow up the tree. At the leaves they then apply their neural network eval. The 2-3 ply were what I labeled shallow, and GNU Backgammon at 3 ply has already a noticeable delay before it moves.
hgm wrote: I agree that in the typical multi-player case it will be less effective than in the two-player case. This is because the cutoff condition is score >= 100%-max(oppoScore), rather than score >= 100%-SUM(oppoScore). When translated to excess win probabilities this becomes

xScore + 100%/N >= 100% - max(xOppoScore + 100%/N)

or

xScore >= 100% * (1 - 2/N) - max(xOppoScore)

So rather than sign flipping alpha to get beta, there is a raise of (1-2/N)*100%. For 3 players this would already be 33%, and it gets worse with more.

Isn't it true that you can save at most BF^(d/3), because only one of the 3 players can have cut nodes?
That's another issue that I haven't grasped completely. Technically, the minimax principle should work as all players try to maximize their score and thus choose the respective moves.

But, for pruning, wouldn't you have to carry around lower and upper bounds for every player and let a player cut off when it's clear that their opponents would all never allow that position to be reached?

I need to read up on this again, it's already been awhile...
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: multi-player games thoughts

Post by hgm »

My posting was a reply to Daniel's, that crossed yours because I was delayed in submitting it.

What you say about the cutoffs is basically correct, except that you would only need to keep track of a lower-bound for each player. The upper-bounds are not independent, but follow from the lower bounds of the other players. With two players you need alpha and beta, because beta is the disguised alpha of the opponent. The expressions you quoted show how to derive 'beta' from the opponent 'alphas' in the multi-player case (which indeed reduces to beta = -alpha for N=2).
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: multi-player games thoughts

Post by Daniel Shawul »

Deep pruning happens when the alpha beta bounds are transmitted recursively and pruning happens at deeper depths. That kind of pruning is impossible unless you make some speculative assumptions. Note that all of this is based on what I read, not experience. Here is a paper which discusses why alpha-beta is such a specialized algorithm..
alfa-beta multi player
Abstract
We consider two generalizations of the standard
two-player game model: different evaluation
functions for the players, and more than two
players. Relaxing the assumption that players
share the same evaluation function produces a
hierarchy of levels of knowledge as deep as the
search tree. Alpha-beta pruning is only possible
when the different evaluation functions
behave identically. In extending the standard
model to more than two players, the minimax
algorithm is generalized to the maxn algorithm
applied to vectors of N-tuples representing the
evaluations for each of the players. If we assume
an upper bound on the sum of the components
for each player, and a lower bound on
each individual component, then shallow alphabeta
pruning is possible, but not deep pruning.
In the best case, the asymptotic branching factor
is reduced to (1 + \/4*b — 3)/2. In the average
case, however, pruning does not reduce
the asymptotic branching factor. Thus, alphabeta
pruning is found to be effective only in
the special case of two players with a common
evaluation function.
There is a detailed analysis of pruning algorithms for multi-player games here in Chapter 5. They proposed last-branch pruning and speculative pruning for getting deep cutoffs. But in general the deep cutoffs of alpha-beta are not possible. I will go through your method in detail now.

More later
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: multi-player games thoughts

Post by Daniel Shawul »

I believe what you call alpha-beta pruning in this context would generally be the elimination of dominated strategies. A strategy is dominated if there is another strategy that is at most equal for all possible outcomes of the game.

If you can table the game your considering, this is easy to do, but it obviously harder for more complex games. In case every player has a dominating strategy, you'll find a nash equilibrium, where nobody can possibly deviate without being worse off, given the other players keep their strategy.

For Poker, the first thing one would do is to design a player model, which is updated after every observation. Those models are meant to approximate a players strategy and taken as the base for determining the expected value of a move to be made.

You would now build a search tree and weigh the branches according to the approximated likelihood of their occurence, as backgammon engines do during their shallow 2-3 ply search. Another possibility is to monte carlo on grounds of the player model. Note that you never determine a single move but rather a mixed strategy which gives the frequency of the various moves available.
Yes I agree. For simple cases like prisoner's dilemma the dominated strategy can be easily found by tabulating all possible scenarios. For actual game playing, a monte-carlo method can be used to consider risk. I think a version of UCT for multi-player Go has been successfully used. Also for imperfect information games like Dark chess, monte-carlo is essential.
An unexploitable strategy would be what you called a paranoid strategy. The more exploitative a strategy becomes, the more the strategy closes in towards what you called maxn. In poker the decision about how exploitatively a strategy could be is very tough, because the risk depends on the ability of the opponents to adapt their strategies.

Consider playing Rock, Paper, Scissors: The unexploitable strategy is playing every move randomly, resulting in a frequency of 1/3 for every move. If you notice the opponent deviates, you could exploit him by increasing the frequency of the countermeasure accordingly (i.e. if he plays more rocks, you play more papers), but this would make you vulnerable, should the opponent adapt to play more scissors.
From what I understood so far, paranoid is worst possible case and maxn disregards other opponents entirely. So all strategies are mixes of these two. If the eval tuple is say (a,b,c)
Paranoid player a's matrix
+1 0 0
-1 0 0
-1 0 0
So player a's score = a - (b + c)
Maxn player's matrix
1 0 0
0 1 0
0 0 1
Player a's score = a.

Using mixed strategy allows only those prunings possible in the maxn strategy.
Thanks for your input.
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: multi-player games thoughts

Post by hgm »

Daniel Shawul wrote:I will go through your method in detail now.
Yes, that would be a good idea. Because I think I convincingly prove there that alpha-beta pruning works just the same, except that the upper window limit cannot be set as tightly when you have more players.

I don't really get the formula in your quote above. Is is (1+sqrt(4*b-3))/2 ? What is b, the branching factor? Why is there no dependence on the number of players?
Harald
Posts: 318
Joined: Thu Mar 09, 2006 1:07 am

Re: multi-player games thoughts

Post by Harald »

Would it help to play against the maximum of the opponents evals?
Evaluate each player (A, B, C) individually (ea, eb, ec).
Then the real score seen from player A is ea - max(eb, ec).
Perhaps this idea is somehow useful.

Harald
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: multi-player games thoughts

Post by Daniel Shawul »

I don't really get the formula in your quote above. Is is (1+sqrt(4*b-3))/2 ? What is b, the branching factor? Why is there no dependence on the number of players?
Why ? The BF is the same because only one player moves at each ply. For chess that will be ~36 with min max only. The formula is the effective branching factor for alfa-beta if we disregard deep cutoffs. Its detailed derivation can be found in Knuth and Moore's analysis of alfa-beta http://www.scribd.com/doc/28194932/An-A ... ta-Pruning.
The deep cutoffs have a secondary effect for the best case, but as noted in this paper the average case is a no improvement and that is what matters usually.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: multi-player games thoughts

Post by Daniel Shawul »

Actually the paranoid algorithms does something similar to coalesce the scores. ea - (eb + ec). I can understand why you chose max(eb,ec) (it gives score of 0 at start position) , which is more comfortable for chess programmers like us :)
User avatar
hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: multi-player games thoughts

Post by hgm »

Well, it seems that his algorithm is quite different from mine. I would not count it as a generalization of alpha-beta, though, because it seems to re-open the search window in every node. But it could of course be that his algorithm is more efficient than mine. OTOH, the fact that he says it can only achieve this EBF in favorable cases, and that in the typical case there is no gain at all, is not encouraging. The more modest EBF reduction of my algorithm might be more robust.