multi-player games thoughts

OliverUwira · Post by **OliverUwira** » Wed Oct 05, 2011 8:55 pm

Daniel Shawul wrote: From what I understood so far, paranoid is worst possible case and maxn disregards other opponents entirely. So all strategies are mixes of these two. If the eval tuple is say (a,b,c)
Paranoid player a's matrix
+1 0 0
-1 0 0
-1 0 0
So player a's score = a - (b + c)
Maxn player's matrix
1 0 0
0 1 0
0 0 1
Player a's score = a.

Using mixed strategy allows only those prunings possible in the maxn strategy.
Thanks for your input.

First I'd like to apologize as I had been under the impression that H.G. opened this thread. This was why I quoted him.

After having taken a closer look on your defintion of the algorithms, I believe we might not have been far off of each other in case of the paranoid algorithm. Given a zero sum game, the unexploitable strategy should equal the paranoid algorithm's choice. It minimizes a - (b+c) which would be zero and you can't possibly lose more.

If a game isn't zero sum, I would say one could generalize "unexploitable" to "they can't possibly win more off of me whatever they do" and we'd again be d'accord.

I misunderstood your definition of maxn, though. It simply assumes independent play and hasn't much to do with exploitation. However, if coalitions are possible at all, an engine cannot play as if they weren't.

The challenge is to find some strategy that improves on the minimum loss. Now there is risk and we search a strategy that is optimal in its relation of gain and risk. This would be the job of the evaluation, with minimax on top. Likelihood of coalitions could still enter the equation.

I have no idea about the pruning techniques, though - I will read the sources you posted.

Daniel Shawul · Post by **Daniel Shawul** » Wed Oct 05, 2011 9:11 pm

I am not sure what you mean by 'shallow' pruning. I don't see anything shallow in the algorithm; the deep cutoffs work as always. Unless you consider all alpha-beta pruning shallow...

I agree that in the typical multi-player case it will be less effective than in the two-player case. This is because the cutoff condition is score >= 100%-max(oppoScore), rather than score >= 100%-SUM(oppoScore). When translated to excess win probabilities this becomes

xScore + 100%/N >= 100% - max(xOppoScore + 100%/N)

or

xScore >= 100% * (1 - 2/N) - max(xOppoScore)

So rather than sign flipping alpha to get beta, there is a raise of (1-2/N)*100%. For 3 players this would already be 33%, and it gets worse with more.

Isn't it true that you can save at most BF^(d/3), because only one of the 3 players can have cut nodes?

I think there is some mis-communication between us. First maxn player picks his best value dis-regarding opponent scores. If you use something like my_score - sum {max}(opponent_score) ,it becomes paranoid. Also paranoid chooses one player until the search is done, and all scores are relative to that root player. For example if you have eval of a - (b + c) when a is about to move,then you still have to use the same eval (and not b - (a + c) when b is to move). As the paper I gave the abstract to outlines, same evaluation and 2-player games are major requirements of deep cutoff alpha-beta. With this in mind pruning's can not be deep unless you collude some players until the search is finished. Please demonstrate your algorithm with a simple tree because I am having a hard time how it does deep pruning.

I am not sure about the EBF = b^1/3 either. The only difference will be you would have max-min-min-max instead of max-min-max-min for the paranoid case for example.. For the paranoid case it is still EBF = b ^ 1/2.
The one using shallow cuts will have something a bit larger to that for large b, however it does not give improvement on average. Alpha-beta has b^3/4 on average.

Well, it seems that his algorithm is quite different from mine. I would not count it as a generalization of alpha-beta, though, because it seems to re-open the search window in every node. But it could of course be that his algorithm is more efficient than mine. OTOH, the fact that he says it can only achieve this EBF in favorable cases, and that in the typical case there is no gain at all, is not encouraging. The more modest EBF reduction of my algorithm might be more robust.

The paranoid algorithm colludes n-1 players and becomes a 2-player game with all the benefits of alpha-beta and its derivative. Korf et al did not propose new algorithms but just prove that alfa-beta with deep pruning can not be applied for maxn. I think what you perceived as "maxn" algorithm is different from what is being discussed there.

Daniel Shawul · Post by **Daniel Shawul** » Wed Oct 05, 2011 9:26 pm

First I'd like to apologize as I had been under the impression that H.G. opened this thread. This was why I quoted him.

Don't worry. We will be mis-understanding each other until the mud clears

I misunderstood your definition of maxn, though. It simply assumes independent play and hasn't much to do with exploitation. However, if coalitions are possible at all, an engine cannot play as if they weren't.

The challenge is to find some strategy that improves on the minimum loss. Now there is risk and we search a strategy that is optimal in its relation of gain and risk. This would be the job of the evaluation, with minimax on top. Likelihood of coalitions could still enter the equation.

I have no idea about the pruning techniques, though - I will read the sources you posted.

Yes the maxn's scores fore each player should include scores of other players. For chinese checkers maxn may be the best algorithm, but may be not for chess. Also we would have a lot of tie-breaking to do with maxn that will force us to include oppoent scores in the individual score of each player. I think that using (a - (b + c) , b - (a + c), c - (a + b)) i.e
1 -1 -1
-1 1 -1
-1 -1 1
at the leaves for evaluation, and then backing up by maxn strategy is still maxn i.e for backing up using
1 0 0
0 1 0
0 0 1
I think it doesn't matter how the evaluation tuples are formed at the leaves, only how they are backed up.

hgm · Post by **hgm** » Wed Oct 05, 2011 9:52 pm

Daniel Shawul wrote:I think there is some mis-communication between us. First maxn player picks his best value dis-regarding opponent scores. If you use something like my_score - sum {max}(opponent_score) ,it becomes paranoid.

Using sum would be paranoid, but not using max. But there is some miscommunication: the given formula was not intended to be used for scoring, but for calculating the cutoff threshold. The move choice is purely basedon its own score, ignoring opponent scores. Perhaps it was misleading to write xOppoScore. What I really should have written was

beta = 100%*(1-2/N) - max(alpha[opponents])

Let me try to formulate it in pseudocode:

Code: Select all

Search&#40;player1, player2, player3, alpha1, alpha2, alpha3, cutPlayer, depth&#41;
&#123;
  int beta = &#40;1-2/3&#41; - max&#40;alpha2, alpha3&#41;;
  int bestScore&#91;&#93; = &#123;-1/3, -1/3, -1/3&#125;; // lowest possible excess win probability
  if&#40;depth == 0&#41; return Eval&#40;); // No QS...
  for&#40;ALL_MOVES&#40;player1&#41;) &#123;
    score = Search&#40;player2, player3, player1, alpha2, alpha3, alpha1, cutPlayer, depth-1&#41;;
    if&#40;cutPlayer < 0&#41; cutPlayer = player2; // the next to play should refute us if we branch from PV
    if&#40;score&#91;player1&#93; > alpha1&#41; &#123;
      bestScore = score;
      alpha1 = score&#91;player1&#93;;
      if&#40;player1 == cutPlayer && alpha1 >= beta&#41; return score; // cutoff
    &#125;
  &#125;
  return bestScore;
&#125;

Search&#40;1, 2, 3, -1/3, -1/3, -1/3, -1, N&#41;;

[Edit:] I guess this pseudo-code is no good, as I don't properly track what are lower andwhatare upper-bounds. Upper-bounds should not be used to increase alpha.

Daniel Shawul · Post by **Daniel Shawul** » Wed Oct 05, 2011 10:24 pm

Daniel Shawul wrote:
First I'd like to apologize as I had been under the impression that H.G. opened this thread. This was why I quoted him.
Don't worry. We will be mis-understanding each other until the mud clears
I misunderstood your definition of maxn, though. It simply assumes independent play and hasn't much to do with exploitation. However, if coalitions are possible at all, an engine cannot play as if they weren't.

The challenge is to find some strategy that improves on the minimum loss. Now there is risk and we search a strategy that is optimal in its relation of gain and risk. This would be the job of the evaluation, with minimax on top. Likelihood of coalitions could still enter the equation.

I have no idea about the pruning techniques, though - I will read the sources you posted.
Yes the maxn's scores fore each player should include scores of other players. For chinese checkers maxn may be the best algorithm, but may be not for chess. Also we would have a lot of tie-breaking to do with maxn that will force us to include oppoent scores in the individual score of each player. I think that using (a - (b + c) , b - (a + c), c - (a + b)) i.e
1 -1 -1
-1 1 -1
-1 -1 1
at the leaves for evaluation, and then backing up by maxn strategy is still maxn i.e for backing up using
1 0 0
0 1 0
0 0 1
I think it doesn't matter how the evaluation tuples are formed at the leaves, only how they are backed up.

This is more confusing than I thought. If we have eval tuple (a,b,c), then
------------------------
paranoid: (a,-a,-a) if a is the root player. B and C do not care about their scores but to minimize a's.
Regular two player alpha beta has also that property with (a, -a).
-------------------------
maxn: (a,b,c) all players care about their scores and only that
-------------------------
The one I proposed: (a - (b + c), b - (a + c), c - (a + b)). This is something else not maxn and not paranoid. Probably a case where all players are paranoid. But this can not take advantage of alpha-beta since that is possible only when one of the players is paranoid.

Once the evaluation is modified using this method no modificuation is done to the scores again. Backup goes by picking up the tuple with highest score for the current player.. Much like negamax modification of min-max. Instead of maximizing/minizing we do maximizing only... Sorry for adding more confusion.

hgm · Post by **hgm** » Wed Oct 05, 2011 10:27 pm

OK, I see the problem. To prove a cutoff the players in the all nodes have no score. So you cannot exclude any combination of their moves, as there is no way to guess what they will choose to play. Which effectively mean you will also have to be able to refute that combination of moves where they team up. And in most games it is not likely you will be able to do that, if the score is not extreme to start with.

It depends on the game, though. There could be games where teaming up gives little to no advantage.

Daniel Shawul · Post by **Daniel Shawul** » Wed Oct 05, 2011 10:42 pm

I may have underestimated the effect of shallow pruning though. Obviously for the best case much of the advantage comes from shallow prunings. Knuth discards them as secondary in that case. But this requires a good move ordering close to the root. With sub-optimal move ordering, the deep cutoffs contribute a lot. Since we have good heuristics for move ordering in chess (unless it is drastically different for multi-players), then it can give good improvements for triple chess atleast. Other strategies , except paranoid, are also bound to the same limitation.

Daniel Shawul · Post by **Daniel Shawul** » Thu Oct 06, 2011 5:26 am

Ok let me clarify more on the point I was trying to make. Now that I implmented all of them , the mud is starting to clear.
Lets say we evaluate each side's material as (a,b,c). That is a's score contains the sum of the piece values of a's pieces only. That sounds really bad and it probably is. That is what we thought maxn does but that is not the case. Actually maxn can be used for evaluation formed in any way i.e (a - (b + c), b - (a + c) , c - (a + b)) is still maxn with a different social oriented evaluation (SOE). Even paranoid is maxn but in that case we effectively have only one evaluation for all players, unlike the general case of maxn which has 3 distinct evaluations for 3-player games f.i. Paranoid has (x,-x,-x) as tuple for three player chess, which effectively reduces it to one evaluation for all and is the major reason why alpha-beta is possible. I summarized in the following table the results with absolute and relative material scores.

Code: Select all

	      Material values 	Relative material score	General
Maxn			
1	           a	                a - b - c	         x
2	           b	                b - a - c	         y
3	           c	                c - a - b	         z
			
sum	      a + b + c	           a + b + c	     x + y + z
			
paranoid_sum			
1	      +&#40;a-b-c&#41;	        +&#40;3a-b-c&#41;	              +x
2	      -&#40;a-b-c&#41;	        -&#40;3a-b-c&#41;	              -x
3	      -&#40;a-b-c&#41;	        -&#40;3a-b-c&#41;	              -x
			
sum	    -&#40;a-b-c&#41;	        -&#40;3a-b-c&#41;	              -&#40;n-1&#41;x

I also tried paranoid_max where the score is obtained by a - max(b,c) instead of a - (b+c) as paranoid_sum, and it also seems to work fine.

hgm · Post by **hgm** » Thu Oct 06, 2011 10:57 am

a - max(b,c) is significantly less paranoid than a - (b+c). It assumes the strongest opponent will turn against you, while the weakest will remain neutral. Nevertheless, I think scores like a/(a+b+c) would be more natural.

We have to distinguish two issues here: search and eval strategies. Search just figures out how you should force your way to the best achievable position in a turn-based game. But the task to tell you what is best rests purely on eval.

In multi-player games the eval is highly effected by the higher-order strategy of alliance formation. In a multi-player Chess game each player has two assets: its material, and the right to move. Even if player A has more material than B and C together, he would still lose if B and C teamed up against him, because B and C can make two moves per turn-cycle, which is a devastating advantage. So alliances are a domininant factor.

Now there are all kinds of 'meta-strategies' w.r.t.alliance formation that would affect the conversion of raw material (a,b,c) to winning probabilities. E.g. when all players would always team up against the strongest, the game would be very stable, and material advantages would mean nothing unless they are extremely large. Team up against the weakest as long as the others are balanced might be a better strategy (a cheap way to up your chances from 33% to 50%). And because engagement in Chess is likely to weaken both sides, even unbalanced strongest might cooperate such that the losses of their skirmish with the weakest benefits the middle one, because the strongest takes the unavoidable losses. The weakest player can deter a concerted attack on it by a 'black-mail strategy', preferably damage the middle player, so that is not in the interest of the latter to cooperate with the strongest. Or even use a 'help-me-or-else' strategy, taking him down with you when he refuses to team up.

Daniel Shawul · Post by **Daniel Shawul** » Thu Oct 06, 2011 2:22 pm

We have to distinguish two issues here: search and eval strategies. Search just figures out how you should force your way to the best achievable position in a turn-based game. But the task to tell you what is best rests purely on eval.

Indeed. That is what has been confusing me a bit. I think we can divide maxn/parnoid as follows.The search strategy is always maxn (or min-max) in which each player maximizes its own score and only that.What differentiates the strategies is how that eval score is formed at the leaves. There are many maxn like mixed strategies with different social oriented evaluation SOE. A clear distinction between maxn and paranoid can be made by noting paranoid as a two player game where the tuple is (x,-x,-x,...). With individual material score's of (a,b,c) paranoid can be for example (x=a,-x,-x) OR (x=a-(b+c),-x,-x) OR (x=3a-(b+c),-x,-x) OR (x=a-max(b,c),-x,-x) etc...Maxn oth contains distinct eval tuples.

a - max(b,c) is significantly less paranoid than a - (b+c). It assumes the strongest opponent will turn against you, while the weakest will remain neutral. Nevertheless, I think scores like a/(a+b+c) would be more natural.

Maybe, but I am not sure as to the real difference. if we take out the same amount from evals the "relaive degree of parnoia" remains the same i.e (x,-x,-x) is still same as ((x+a),-(x+a),-(x+a)). Those are functionally equivalent since the relative order of nodes remain the same according to Korf paper. Paranoid_sum=a-(b+c) is one magnitude less than paranoid_max=a-max(b,c) if we assume a,b,c are roughly of the same magnitude. So to have a comparison it may be appropriate to define paranoid_sum = a-(b+c)/2 instead. I think that a - max(b,c) is more suited to games where there is no prize for second place, and a - (b+c) when 2nd place matters. Say a is far ahead and guranteed first place, the paranoic scores when b is the root player will be paranoid_sum = b - (MAX_SCORE + c) and paranoid_max= b - max(MAX_SCORE + c) = b - MAX_SCORE, so clearly paranoid_max to not compare its score against c sorta of not caring for second place ??

In multi-player games the eval is highly effected by the higher-order strategy of alliance formation. In a multi-player Chess game each player has two assets: its material, and the right to move. Even if player A has more material than B and C together, he would still lose if B and C teamed up against him, because B and C can make two moves per turn-cycle, which is a devastating advantage. So alliances are a domininant factor.

Yes, I think that if two week humans make an alliance against a strong computer always, then there is pretty much nothing we can do about it! That takes away some of the fun from multi-player games I think. But if the humans are "rational" and make and break alliances with whomever is weaker/stronger then we have a chance to be smart by changing the social orientation (SO) as the game goes on. For linear SOs, a dynamic matrix representing the SO of each player has been successfully used. Other methods may be used for paranoid_max and others that can not be represented by matrix.

Team up against the weakest as long as the others are balanced might be a better strategy (a cheap way to up your chances from 33% to 50%). And because engagement in Chess is likely to weaken both sides, even unbalanced strongest might cooperate such that the losses of their skirmish with the weakest benefits the middle one, because the strongest takes the unavoidable losses.

It makes sense. I haven't yet implemented a way to handle when one player gets taken but that is definitely the incentive i.e upping your chance to 50%.

multi-player games thoughts

Re: multi-player games thoughts

Re: multi-player games thoughts

Re: multi-player games thoughts

Re: multi-player games thoughts

Re: multi-player games thoughts

Re: multi-player games thoughts

Re: multi-player games thoughts

Re: multi-player games thoughts

Re: multi-player games thoughts

Re: multi-player games thoughts