Interesting (Chinese) Chess variant

hgm · Post by **hgm** » Thu Jan 23, 2014 8:34 pm

It seems there is a new variant of Chinese Chess that is rapidly gaining popularity around Hong Kong. It is called 'reveal Chess'. In principle any form of Chess could be played as a 'reveal' variant. The idea is that the pieces are shuffled in the initial position (including Pawns, but except King), but that the result of this shuffling is hidden to both players. The traditional equipment used for Chinese Chess is ideally suited for that: it is played with checkers that have the piece name written on their top face, and you only have to flip them over and shuffle before setting up the board.

Unreveiled pieces move as the piece that normally starts on the square they are on, but at the end they are revealed for what they really are (by flipping them right-side-up), and for the rest of the game they move like the normal piece they are. Unrevealed pieces can also be captured, and it is revealed what they were. The rules (unique to Chinese Chess) that some pieces cannot cross certain zone boundaries is dropped in this game, except for the Kings.

In the end, when all peieces are revealed, this game turns into regular Xiangqi. But in the opening phase it is completely different. Then it has an aspect that is alien to most Chess variants, namely that the outcome of a move is not unique, but can result in a number of positions (with known probabilities).

As the players cannot exert any influence over these probabilities, this doesn't create a situation as complex as in "Chess 2". I think a normal alpha-beta search can be used for this game, with the modification that when a move is played with uncertain outcome, one would simply have to average over all possible outcomes (where the score of each of these outcomes must be determined by search). For the averaging to be allowed, the scores must be expressed as winning probabilities.

E.g. if one can capture an unrevealed piece with an Elephant, and the opponent still has 2 unreveiled Rooks, and single unrevealed Cannon, Adviser, and Pawn there are 4 possible outcomes, and the score of that capture would be 40%*score(R) + 20%*score(C) + 20%*score(A) + 20%*score(P). Obviously it would be much better to find you captures a Rook than a Pawn, and when you search the outcomes in MVV order it could very well be that 40%*score(R) + 20%score(C) would already be enough to score above beta even when score(A) and score(P) would be 0 (win probability, i.e. a certain loss), in which case you could do a beta cutoff even before caluclating the complete average.

It seems no computer programs exist yet that could play this game. I am thinking about converting my Xiangqi engine MaxQi to play it, using WinBoard as an interface. This touches upon the protocol, however: ideally (to prevent cheating) the GUI would be the only party deciding on the reveal outcomes. But that means that the engine might not know the outcome of the move it eventually plays (in the source it has to average over all possible outcomes anyway, so there is no problem there), but would have to be informed by the GUI in response to its move. One way to do it would be to send it a drop move, like R@e6, when it moved an unrevealed piece to e6, and the GUI wants to tell it that piece now is a Rook. These would be out-of-turn moves, which should not alter whose turn it is. But they are easy to distingusish from real moves by the engine because they are notated as drop moves, in a game that does not have any drops. So the engine can then interpret all drop moves as board updates in the variant.

Daniel Shawul · Post by **Daniel Shawul** » Thu Jan 23, 2014 9:36 pm

In the end, when all peieces are revealed, this game turns into regular Xiangqi. But in the opening phase it is completely different. Then it has an aspect that is alien to most Chess variants, namely that the outcome of a move is not unique, but can result in a number of positions (with known probabilities).

As the players cannot exert any influence over these probabilities, this doesn't create a situation as complex as in "Chess 2". I think a normal alpha-beta search can be used for this game, with the modification that when a move is played with uncertain outcome, one would simply have to average over all possible outcomes (where the score of each of these outcomes must be determined by search). For the averaging to be allowed, the scores must be expressed as winning probabilities.

My first thought is a Markov Decision Process, in which players have partial controls over their move choices (which move to make) but not about the final state (only probabilities of the type of piece). Well if it is MDP, I am inclined to scream Monte Carlo

but I will resist myself from that and consider alpha-beta and other options. It seems that we only need to make a decision at the leaves. Before that unrevealed piece is just a normal piece that moves like the piece on whose square it was placed. Its value then should be the same as that piece, and revealing it can be considered as a promotion/de-promoiton depending on value gained/lost.

E.g. if one can capture an unrevealed piece with an Elephant, and the opponent still has 2 unreveiled Rooks, and single unrevealed Cannon, Adviser, and Pawn there are 4 possible outcomes, and the score of that capture would be 40%*score(R) + 20%*score(C) + 20%*score(A) + 20%*score(P). Obviously it would be much better to find you captures a Rook than a Pawn, and when you search the outcomes in MVV order it could very well be that 40%*score(R) + 20%score(C) would already be enough to score above beta even when score(A) and score(P) would be 0 (win probability, i.e. a certain loss), in which case you could do a beta cutoff even before caluclating the complete average.

If we do evaluation at terminal nodes, we don't have to worry about the value of an intermediate capture except for move ordering. For leaf evaluation, we just have to keep the pieces of the opponent that are unrevealed yet. Infact their total values is known but their placements are not! Maybe I misunderstood the rules. Pieces are revealed when we capture opponent pieces and presumably voluntary revealing in the hope that we get a better piece (promotion).

hgm · Post by **hgm** » Thu Jan 23, 2014 10:59 pm

Indeed, it seems that revealing your own pieces never involves any large evaluation change. It is a cigar from your own supply, as it were. There could of course be a mobility change when you move a hidden piece from a Pawn position, and it reveals as a Rook, or vice versa. This would be an argument to quickly reveal your pieces that mascarade as weak pieces, because chances are good that they turn into better pieces, so that you have a temporary advantage. OTOH, hanging on to pseudo-Rooks and pseudo-Horses, because they would likely reveal as a weaker piece when you move them, makes little sense, as in their original location, strong as they are, they are undeveloped, and thus next to useless.

As a first approximation I would not give any eval points for revealing own pieces. But of course capturing a revealed piece would earn you the value of the piece it was hiding, not that of the piece as which it would appear to be before its first move.

I think it is quite easy to adapt a normal Chess engine to playing this. Between MakeMove and UnMake you would need a doubly-nested loop that runs over all piece types, one for the mover, the other for the victim, and then probe if the 'unrevealed count' for that piece type is nonzero, and if so, decrement it, change the moved piece to the corresponding type (or add the value of the victim type to the material eval) and recurse. And then add the returned scores (and count) to calculate the average after the loop. Like

Code: Select all

// assumes all piece types have encoding with and without UNREVEALED bit set
MakeMove&#40;)
pieceType = ColoredType&#40;mover&#41;;
if&#40;mover & UNREVEALED&#41; pieceType = maxType&#91;stm&#93;;
nr = score = 0;
do &#123;
  if&#40;unreveiled&#91;pieceType&#93;--) &#123;
    board&#91;to&#93; = pieceType & ~UNREVEALED; // reveal
    victimType = ColoredType&#40;victim&#41;;
    if&#40;victim & UNREVEALED&#41; victimType = maxType&#91;xstm&#93;;
    do &#123;
      if&#40;unrevieled&#91;victimType&#93;--) &#123;
        int w = &#40;unreveiled&#91;victimType&#93;+1&#41;*&#40;unrevealed&#91;pieceType&#93;+1&#41;;
        score -= w*Search&#40;-beta, -alpha, -&#40;material + Value&#91;victimType&#93;, d-1&#41;;
        nr += w; // weighted average
      &#125;
      unrevealed&#91;victimType&#93;++;
    &#125; while&#40;--victimType & UNREVEALED&#41;;
  &#125;
  unreveiled&#91;pieceType&#93;++;
&#125; while&#40;--pieceType & UNREVEALED&#41;;
score = score/nr;
UnMake&#40;);

The kludge here is that the list with unrevealed counts for each piece types contains separate sections of entries for revealed and unrevealed piece types, so that initializing pieceType or victimType as a revealed type would execute the while loops just once for that type, (i.e. the piece stays itself), as this type would not have the UNREVEALED bit set in its encoding, and neither would its predecessor in the list. While for an unrevealed type it would be initialized to scan through the entire section of unrevealed types, and do something for every piece still in store. The reveiled types should be assigned high-enough unreveiled counts so that they would always be considered present, even if the count would be decreased for every ply in the search. (But 255 still seems enough.)

I am a bit worried about the need for a division to calculate the average. This could mean a significant slowdown.

Another concern is that averaging of scores is only meaningful when they represent win probabilities. So either the scores returned by Eval() should already be subjected to some saturating correction, or the correction should be made just before the averaging (and its inverse correction after it). Correcting the normal additive score by taking some arctan-like function of it would probably be good enough.

Daniel Shawul · Post by **Daniel Shawul** » Fri Jan 24, 2014 4:36 am

I am probably missing something here, but can't we just do normal alpha-beta to the leaves and then calculate a heuristic eval() that takes into consideration which pieces are not revealed yet and thus guess where they are. In the path from the root to the tips, it seems to me everything is the same. A capture reveals a hidden piece of the opponent, which we keep count of, and also the opponent could have deliberately revealed pieces, which again we keep count of. Then at the leaves the material score is the same whether ours or opponet pieces are hidden. The positional score however requires some guesses since we don't know which piece is where. That part of the eval requires we investigate every combination of hidden piece placments and we average. This is where all the stuff goes for me not after make/unmake since I am not conerned with incremental update of scores. Then we do backward induction to get the root score similar to standard alpha-beta. I have probably missed something though so I will read again..

hgm · Post by **hgm** » Fri Jan 24, 2014 9:47 am

The problem is that you can no longer pick the move with the best outcome, as the outcome cannot be controlled when the move involves a reveal. Otherwise you could just say: "I capture this unrevealed piece, and want it to be a Rook". But in reality, it might be revealed as a Pawn with a much larger probability, so that it is not attractive to capture it with a Horse if it was protected. HxR would then gain you a Rook for a Horse, but if the opponent's only unrevealed pieces were 1 Rook and 4 Pawns, 4 out of 5 times you would have played HxP, and lose a Horse for a Pawn. That would be a bad gamble, so you would not want to play the Hx? capture. But normal minimaxing would pick the move because the HxR outcome would top any other move score in the node.

So what you need in stead of

PositionScore = MAX{all moves} MoveScore(move)

(i.e. minimax), you now need

PositionScore = MAX{all moves} E(MoveScore(move, outcome))

where E() is the expectation value of the (stochastic) outcome of the move:

SUM{all outcomes} P(move, outcome)*MoveScore(move, outcome)

with P() the probability that the given 'move' has the given 'outcome'.

hgm · Post by **hgm** » Fri Jan 24, 2014 10:46 am

The pseudo-code I gave above is no good, though. The problem is again the search window. You can only average the score for the possible outcomes of a move when they are all the same bound type. The calculated expectation value will then also have that bound type.

In a PV node you would thus need an exact score for all the outcomes. But when capturing an unrevealed piece, the outcomes are likely to vary in a way very similar to the piece you capture. (There is no guarantee for this, however. Normally, finding that you captured a Cannon would be better than when the unrevealed captured piece turned out to be a Elephant. But suppose that these where the only two unrevealed pieces. Then when what I captured was a Cannon, the other must be an Elephant, and it could be that this Elephant is just in a position to fork my two Rooks, while a Cannon in that position would have been harmless. So when I am 'lucky' and hit the Cannon, I lose Rook vs Elephant on the next move, which is a bad deal, as R >> C+E. While if I was 'unlucky' and hit an Elephant, I would have won a clean Elephant.)

If R=10000 and P=100, with R+4P yet unrevealed, capturing a Rook normally would give you a score that is 900 better than capturing a Pawn. SO it would make sense to set the search window 900 higher when searching the outcome HxR than you would for searching HxP. Otherwise the in-windows score (which you need in a PV node) for the search of one outcome would almost certainly lead to an enormous fail-high or low for the other outcome. So for 'lucky' outcomes you want to shift the window up, while for 'unlucky' ones you should shift it down.

You will have to deal regularly with cases where one of the outcomes has an 'anomalous' score compared to the others, and will therefore get a wrong bound type compared to the guessed window.This will necessitate a re-search. I guess it must be learned by iterative deepening how to best set the windows.

Daniel Shawul · Post by **Daniel Shawul** » Fri Jan 24, 2014 1:06 pm

Oops I forgot that the reveal has a chance element to it and thus is an MDP. I guess both the eval() and internal nodes where reveals are made require to loop over hidden pieces. I think we still do min-maxing for the most part but not for reveal nodes at which we just average over the possible outcomes with some weighing. So the backward induction will be, if node is not reveal do min-max, otherwise take weighted average of results of the sub-trees.

Daniel Shawul · Post by **Daniel Shawul** » Fri Jan 24, 2014 1:26 pm

As a start I would search all the possible outcomes of a reveal with an open window. For capture reveals, it may be possible to do adjustments of search window based on the piece type that is captured as you suggested. But for voluntary reveals of our hidden pieces, windows should be the same. The gain from revealing will only be positional thus we cant confidently change the window size by a sufficient margin. If I reveal my piece to be a rook instead of a pawn, I will have some advantage now, but eventually I will reveal a pawn instead of a rook anyway.

hgm · Post by **hgm** » Fri Jan 24, 2014 2:48 pm

Indeed, that is correct. But opening the window for all outcomes of a reveal could be pretty expensive. I guess some 'aspiration' window could save a ton of work, here. The scores of the various outcomes in the previous iteration could be a guidance, (and if you don't have that, IID would provide it). E.g. if in the previous iteration this move was a cut-move, you could determine how much all the lower bounds from that iteration average above the current beta. And then subrtact that difference from all the bounds, to equally distribute the 'safety margin', and then hope for the best. (I.e. again all fail highs.) If one of the outcomes fails low, however, you have to widen the window (perhaps immediately to -INF), and re-search. If you then get an exact result only marginally below the previous score, it is conceivable that you could still get a cutoff with the average if you can get a sharper lower bound for one of the other outcomes, so you would have to re-search those with increased beta.

Daniel Shawul · Post by **Daniel Shawul** » Fri Jan 24, 2014 6:17 pm

I have question about the weights. Why does the mover's unrevealed count go to the weight formula? If I understand correctly , that would mean that we prefer to capture with a our revealed pieces that have counts of 255. If we take a simple example where the opponent have a hidden Rook and Pawn, and then we made a capture of one of them the probabilities are 0.5 for each, and scores +5 and +1, for an average of 3. With 2 hidden rooks and 1 pawn, the probabilties will be 2/3 and 1/3, which again does not depend on the mover's type.

I thought unreveiled[] was a type for unrevealed[] but I see it is not.

Interesting (Chinese) Chess variant

Interesting (Chinese) Chess variant

Re: Interesting (Chinese) Chess variant

Re: Interesting (Chinese) Chess variant

Re: Interesting (Chinese) Chess variant

Re: Interesting (Chinese) Chess variant

Re: Interesting (Chinese) Chess variant

Re: Interesting (Chinese) Chess variant

Re: Interesting (Chinese) Chess variant

Re: Interesting (Chinese) Chess variant

Re: Interesting (Chinese) Chess variant