multiplayer games thoughts

 Posts: 3758
 Joined: Tue Mar 14, 2006 10:34 am
 Location: Ethiopia
 Contact:
multiplayer games thoughts
So I have been pondering multiplayer games for the past few days, and I decided to post the following rant
for the sake of those souls who need something fresh to think about. There is a lot of opportunity for research
when you can't even use alpha-beta (or even negamax). Something as simple as adding one more player brings
a whole lot of social studies into the mix. It has major applications to economics and social science in general.
My first attempt was to implement the paranoid algorithm. In this algorithm, the root player assumes the rest
of the players "gang up" on him to make his life miserable. This effectively reduces the n-player game into a
two-player game where the n-1 players form a coalition against the lone root player (pretty much the West vs. Gaddafi, if you know what I mean).
Anyway, this allowed me to use the existing algorithms of Nebiyu with minor changes. The algorithm is not the best,
because paranoid assumptions are usually not realistic, but it can be used, for instance, to get a lower-bound score:
the root player can't do worse than the paranoid value!
> Coalitions are static, maybe not for the whole game but at least until the current root search finishes and a player makes a move.
This implies that the hash table has to be cleared at each move, because the scores are always relative to the root player,
so the hash-table scores become useless when the root player changes.
> With larger search depth, paranoid gives bad results. The coalition is too much for the root player to handle.
For example, the score starts at about -6000 for three-player chess (i.e. if I don't scale it down). So searching deeper
is not in the root player's interest: it can only discover that it will get annihilated.
The second method is the "maxn" algorithm, which is the complete opposite of the paranoid player.
This is basically minimax for three or more players: each player tries to maximize his own score regardless of the opponents' scores.
Unfortunately, alpha-beta cannot be used. Shallow alpha-beta pruning is possible under certain circumstances, but
deep pruning is proven not to be applicable! You don't even need to order moves. Most of the pruning methods I read about
give minimal improvements in general. Null move and futility pruning cannot be used in general, since they require alpha and beta.
Surprisingly, I couldn't find any use of LMR in the literature. I tried it, and it seems to give some increase in depth.
That is the only pruning scheme I have right now for this method.
In actual games, coalitions form and break depending on who is ahead. If one player is ahead, the other two will gang up on him
to bring his score down, so it is in his best interest not to "show off". Using a purely paranoid algorithm is therefore not realistic in most
cases, and that rules out alpha-beta and all of its derivatives. There are algorithms that mix paranoid and maxn, and they do not use
alpha-beta. One such algorithm is socially oriented search (SOS), where a player's animosity towards the others is represented by a matrix.
The evaluations for the players form a tuple, so multiplying the evaluation tuple by the matrix gives a modified evaluation according to social orientation.
This can be adjusted dynamically according to the current board situation. All fascinating stuff if you want to read up on it.
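The SOS idea can be illustrated with made-up numbers (the matrix rows and evaluation values below are purely hypothetical): each row of the social matrix says how one player weighs everyone's raw evaluation, so the diagonal is self-interest and negative off-diagonal entries are animosity.

```python
# Socially oriented evaluation sketch (invented numbers). Row i of the
# matrix says how player i weighs each player's raw evaluation:
# +1 on the diagonal = self-interest; negative off-diagonal = animosity.
raw_eval = [3.0, -1.0, 2.0]             # per-player static eval tuple

social = [
    [1.0, -0.5, -0.5],                  # player 0: paranoid toward both
    [0.0,  1.0,  0.0],                  # player 1: pure max-n (selfish)
    [-1.0, -1.0, 1.0],                  # player 2: fully paranoid
]

# Matrix-vector product: each player's adjusted evaluation.
adjusted = [sum(social[i][j] * raw_eval[j] for j in range(3))
            for i in range(3)]
print(adjusted)   # → [2.5, -1.0, 0.0]
```

Updating the matrix entries during the game (e.g. raising animosity toward whoever is currently leading) gives the dynamic adjustment mentioned above.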
Btw I am experimenting with triple chess http://www.chessvariants.com/historic.d ... nelli.html
Rant is over. Thoughts are welcome.
cheers,
Daniel
 Kirill Kryukov
 Posts: 492
 Joined: Sun Mar 19, 2006 3:12 am
Re: multiplayer games thoughts
Hi Daniel,
Interesting stuff. Why don't you try to participate in this year's Google AI Challenge? This time the game is multiplayer, unlike last year. The contest should officially start sometime this week (hopefully). Current betatesting link: http://aichallengebeta.hypertriangle.com/index.php
Best,
Kirill
 hgm
 Posts: 23723
 Joined: Fri Mar 10, 2006 9:06 am
 Location: Amsterdam
 Full name: H G Muller
 Contact:
Re: multiplayer games thoughts
Just an observation: in multiplayer games you have to deal with some concepts that also occur in non-zero-sum two-player games. (Because in multiplayer games your alliance of opponents is not really a single opponent, and thus has points on its agenda that do not concern you.) Perhaps it would be useful to first consider this simpler case, and then see if some of the algorithms that work there can be generalized.
You will be faced with interesting dilemmas. If you have the choice between move A (which with the reply that is best for the opponent would lead to a score of +5, but when he makes a move that is worse for himself lands you at -100) and move B (which puts you at -1 on his best reply, and +1 when he blunders), what should you do? How certain can you be that the opponent will be able to find his best move, so that you are willing to go for the +5 and play A?
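One way to quantify this dilemma (using the numbers from the example above) is to assume some blunder probability p and compare expected scores; the crossover here lies near p ≈ 6/107 ≈ 5.6%.

```python
def expected(best_reply, blunder, p):
    """Expected score if the opponent blunders with probability p."""
    return (1 - p) * best_reply + p * blunder

# Move A: +5 against the best reply, -100 if the opponent errs.
# Move B: -1 against the best reply,   +1 if he blunders.
for p in (0.0, 0.05, 0.10):
    a = expected(5, -100, p)
    b = expected(-1, 1, p)
    print(f"p={p:.2f}  A={a:+.2f}  B={b:+.2f}  ->", "A" if a > b else "B")
```

So A is correct only against a nearly infallible opponent; against a weak one, the "safe" move B dominates.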
Re: multiplayer games thoughts
The problem with multiplayer and non-zero-sum games is the fact that the players can negotiate with each other. This negotiation process is more important than the underlying board game.
E.g. player 1 and 2 might decide to first kill player 3 and then battle it out with each other.
The only defense of player 3 is to try to make a coalition with either 1 or 2.
 hgm
 Posts: 23723
 Joined: Fri Mar 10, 2006 9:06 am
 Location: Amsterdam
 Full name: H G Muller
 Contact:
Re: multiplayer games thoughts
I don't agree that something similar to alpha-beta pruning would not be possible with maxn. It might be less useful, but the principle of branches that cannot affect the root score stands, as long as it is a zero-sum game:
Suppose we have a 3-player game (players A, B and C), and are working on a maxn search. The currently found PV will lead to a position with evaluation (a, b, c) for the three players. Suppose we evaluate in terms of winning probability, and winning is all that matters (no silver medal...), so that a+b+c = 100% (and each >= 0%). Suppose we are currently searching a move Ma that deviates from the current PV at ply n, played by A. Now suppose that in this subtree we have already searched a move Mb at ply n+1 (B's turn). If it were true that we can never use something like alpha-beta, we would have an exact score for that sub-subtree too, say (a', b', c').
Now suppose we are searching at a deeper level, in the sub-subtree after moves Ma and Mb' (an alternative at ply n+1), in a node where C has the move. Suppose we find one move for C with a score triplet where c" > min(100% - a, 100% - b'). This guarantees that either A or B would avoid the current node. We know they have the possibility to do so: A could stick to the PV at ply n, securing score a, while the best he could get when C scores more than 100% - a is lower, and can only get lower still by searching more C moves, as C will only take those if they are better for him. Similarly, given that A played Ma, B can secure score b' by playing Mb, and the current branch denies him that when c" is already above 100% - b', leaving less than b' to divide between A and B. So searching for better moves in this node has become pointless: we have a beta cutoff (well, actually a gamma cutoff, because it is C's turn).
Player C's so-far-best score c" for the node thus becomes a lower bound, and A's and B's scores get upper bounds a" = 100% - c", b" = 100% - c" (as we don't know how A and B will divide the remaining score between them in the pruned moves). The threshold gamma, for which we cut when C finds a move that scores >= gamma, will be 100% minus the maximum of all scores (or lower bounds) of the best moves found for A and B in all nodes leading up to the current one.
So the algorithm is basically the same as for two-player alpha-beta: when you find a better score (or lower bound) for yourself, and it is above the window limits of your opponents, you increase those window limits (similar to increasing alpha). When your own score gets above 100% minus the best of these opponent limits, you take a cutoff. The 100% - opponentWindowBound is the equivalent of setting beta = -alpha in the recursive call; the 100% enters here because we are evaluating in terms of win probabilities, which sum to 100%, instead of scores that sum to 0. (The analogy could be made even better by scoring the "excess win probability" = winProbability - 100%/N_players.)
 hgm
 Posts: 23723
 Joined: Fri Mar 10, 2006 9:06 am
 Location: Amsterdam
 Full name: H G Muller
 Contact:
Re: multiplayer games thoughts
Michel wrote:
> The problem with multiplayer and non-zero-sum games is the fact that the players can negotiate with each other. This negotiation process is more important than the underlying board game.
> E.g. player 1 and 2 might decide to first kill player 3 and then battle it out with each other.
> The only defense of player 3 is to try to make a coalition with either 1 or 2.

True. But you could imagine games where communication between the players is impossible other than through the game moves. So the optimal strategy might involve something like signalling in Bridge, where you are also forbidden to confer with your partner. Several systems of signalling might be equivalent in terms of performance, provided you use the same one as your competitors. This would lead to a degenerate optimal strategy, and spontaneous symmetry breaking.

 Posts: 3758
 Joined: Tue Mar 14, 2006 10:34 am
 Location: Ethiopia
 Contact:
Re: multiplayer games thoughts
hgm wrote:
> True. But you could imagine games where communication between the players is impossible other than through the game moves. So the optimal strategy might involve something like signalling in Bridge, where you are also forbidden to confer with your partner.

I think that it can only be a problem when a computer plays against humans! If Rybka is playing against two average humans, the humans' best chance is to collude from start to finish, attacking and defending together. With computer players it is difficult if all communication passes through the arbiter. Even then, one cannot be sure of collusion, since there is a chance of betrayal at any time. When attacking the lone opponent it may be safe to assume collusion, but when defending yourself it is risky to assume your partner will help you defend. I think that engines using Monte Carlo search can make good multiplayer players.

 Posts: 3758
 Joined: Tue Mar 14, 2006 10:34 am
 Location: Ethiopia
 Contact:
Re: multiplayer games thoughts
hgm wrote:
> I don't agree that something similar to alpha-beta pruning would not be possible with maxn. It might be less useful, but the principle of branches that cannot affect the root score stands as long as it is a zero-sum game.

Oh yes, it is possible. The best-case scenario for shallow maxn pruning puts the branching factor at something close to b^(d/2) (i.e. that of regular alpha-beta with optimal move ordering). But the average case is much worse, and it is almost a non-improvement in practice. Also, this is only possible if the game has a lower bound on each player's score and an upper bound on the sum. Your example is also a constant-sum game. It is advised to convert the game to a constant-sum game to gain maximal profit from shallow pruning; this can be achieved by taking fractions of the scores, i.e. a / (a + b + c) for player A. I haven't gone through your description in detail (I will later), but we agree there are possibilities for shallow pruning. The results reported in the literature with different shallow-pruning schemes are not that fascinating.
Also, there are multiple PVs possible in maxn that can give completely different outcomes, so the tie-breaker is very important as well. In the two-player case maxn guarantees a best strategy against an optimal opponent, but only a much weaker guarantee is possible in multiplayer games (Nash's theorem). I think switching to paranoid is beneficial towards endgames, where we want to guarantee a win regardless of the opponents' moves.
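The fraction trick mentioned above is straightforward; a sketch (assuming non-negative raw scores, with an arbitrary equal split for the degenerate all-zero case):

```python
def to_constant_sum(scores, eps=1e-9):
    """Map a raw payoff tuple (a, b, c) to fractions a/(a+b+c), ...,
    so every position sums to exactly 1 and the shallow max-n pruning
    bounds apply. Assumes non-negative raw scores."""
    total = sum(scores)
    if total < eps:                       # degenerate: everyone equal
        return tuple(1.0 / len(scores) for _ in scores)
    return tuple(s / total for s in scores)

print(to_constant_sum((6000, 2000, 2000)))   # → (0.6, 0.2, 0.2)
```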

 Posts: 3758
 Joined: Tue Mar 14, 2006 10:34 am
 Location: Ethiopia
 Contact:
Re: multiplayer games thoughts
Hi Kirill,
Thanks for the link. My long-term goal is to write a general game-playing engine and compete in one of these tournaments: http://games.stanford.edu/
But right now I am far from achieving that...
 OliverUwira
 Posts: 170
 Joined: Mon Sep 13, 2010 7:57 am
 Location: Frankfurt am Main
 Contact:
Re: multiplayer games thoughts
I believe what you call alpha-beta pruning in this context would generally be the elimination of dominated strategies. A strategy is dominated if there is another strategy that is at least as good for all possible outcomes of the game.
If you can tabulate the game you're considering, this is easy to do, but it is obviously harder for more complex games. In case every player has a dominating strategy, you'll find a Nash equilibrium, where nobody can deviate without being worse off, given that the other players keep their strategies.
For poker, the first thing one would do is to design a player model, which is updated after every observation. Such models are meant to approximate a player's strategy and are taken as the basis for determining the expected value of a move to be made.
You would now build a search tree and weigh the branches according to the approximated likelihood of their occurrence, as backgammon engines do during their shallow 2-3 ply search. Another possibility is to run Monte Carlo simulations on the basis of the player model. Note that you never determine a single move but rather a mixed strategy, which gives the frequency of the various moves available.
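As a minimal illustration of weighting replies by a player model (all numbers invented): instead of taking the opponent's best (minimizing) reply, take the model-weighted average over his replies.

```python
def model_value(replies):
    """replies: (probability, payoff-for-us) pairs supplied by the
    opponent model; the move's value is the weighted average rather
    than the value of the opponent's best reply."""
    return sum(p * payoff for p, payoff in replies)

# Model says: 70% he finds the sound reply (-1 for us), 30% he picks
# a weaker one (+4 for us), so the move is worth playing even though
# its "minimax" value is -1.
value = model_value([(0.7, -1.0), (0.3, 4.0)])
```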
Also note that in any game where information is concealed, you would have to distinguish between unexploitable and exploitative strategies, the latter offering more reward at the risk of becoming exploitable oneself.
An unexploitable strategy would be what you called a paranoid strategy. The more exploitative a strategy becomes, the closer it moves towards what you called maxn. In poker, the decision about how exploitative a strategy should be is very tough, because the risk depends on the ability of the opponents to adapt their strategies.
Consider playing Rock, Paper, Scissors: the unexploitable strategy is to play every move randomly, resulting in a frequency of 1/3 for each move. If you notice the opponent deviates, you could exploit him by increasing the frequency of the counter-move accordingly (i.e. if he plays more rocks, you play more papers), but this would make you vulnerable, should the opponent adapt to play more scissors.
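A toy version of that rock/paper/scissors adjustment (the 0.2 shift below is an arbitrary aggressiveness knob, not a recommended value): start from the unexploitable uniform mix and move probability mass onto the counter of the opponent's most frequent observed move.

```python
BEATS = {'rock': 'paper', 'paper': 'scissors', 'scissors': 'rock'}

def exploit_mix(history, shift=0.2):
    """Mixed strategy: uniform 1/3 each, plus `shift` extra probability
    on the counter to the opponent's modal move (taken evenly from all
    three). An adaptive opponent can of course counter-exploit this."""
    probs = {m: 1 / 3 for m in BEATS}
    if history:
        modal = max(BEATS, key=history.count)   # most frequent observed move
        for m in probs:
            probs[m] -= shift / 3
        probs[BEATS[modal]] += shift
    return probs

mix = exploit_mix(['rock', 'scissors', 'rock'])
# opponent leans towards rock, so 'paper' now carries the largest mass
```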