Rollout engine

hgm · Post by **hgm** » Thu Jul 14, 2022 9:38 pm

I am thinking about how to make a fast 'engine' for generating rollouts in MCTS, or generating the moves in the tree phase with a policy weight that can be used to bias the move choice towards 'less crazy' moves. A natural idea is to use an algorithm for this similar to N.E.G., which bases its move choice on the SEE approximation of every move (picking the one with the highest SEE), but also taking account of solving a SEE threat by moving the piece away. Because N.E.G. only has to do this in a single node, speed is o f no concern there, and the algorithm is a bit cumbersome. For MCTS you would of course want something as fast as possible.

Since it would probably spend most time on calculating SEE for the target square of all moves and all on its own pieces, this is where the most can be gained. I was therefore thinking to do it something like this:

Generate moves for both sides, and for every board square add the number of attackers it has with weights K=1, Q=2, R=4, minor=12, P=48. That way it gives a unique signature for the composition of the set of attackers, which can run from 0 to 143 (for each color). The next step would be to use this as index in a lookup table that containes a code uniquely representing the three lowest-valued attackers of the square. The idea is that exchanges deeper than 6 ply are too rare to matter, and even when they would occur the SEE would be meaningless, as they would have too many side effects (like soft-pins, overloaded pieces).

If I counted right there are 5 possibilities for 1 attacker (note we never distinguish minors), 13 with 2 attackers and 23 with 3 attackers. And of course 1 possibility for no attackers at all. Together that makes 42. But as far as the protectors are conserned, the identity of the 3rd one does not really matter when we don't consider a 4th attacker. So the 23 possibilities for 3 protectors can be collapsed into 9 possibilities that do not distinguish the third attacker (but just record whether there is one). That makes 28 possible combinations for the defenders.

The codes for these combinations can be used in a 42x28 table (size 1176 bytes), which contains how much the attacker loses when he tries to access the square with the lowest attacker. This would have to be subtracted from the piece that was on the square to get the SEE for capturing that piece. This would be the method for detecting threats against the pieces of the side to move, and moves that solve such threats should be awarded. Moving the threatened piece away would be one way to solve it.

Capturing or blocking an attacker might be another way. This is more tricky, though, because it is not always the case. When our (multiply protected) Bishop is attacked by Rook + Pawn, taking the Rook out of the picture will not save the Bishop. If there is only a single attacker neutralizing that would certainly help, though. And this might be the most common case. We could handle that by giving a bonus equal to the value we would lose by the threat to the move that captures the attacker, or to any square between attacker and threatened piece.

For multiple attackers we would have to calculate how the SEE changes by neutralizing each of the attackers. To this end we could have 3 more 42x28 tables, each listing the SEE with one of the attacking pieces missing. And test which of those would reduce the threat. Capture or blocking of such attackers could then be similarly awarded.

We tacitly assumed here that it would be possible to know which piece the attacker was. This is easily achieved, though: in addition to the 'attacker summary' that runs from 0-143, we could create an attack map in which each bit corresponds to an attacking piece (rather than piece type!). That takes 16 bits, and thus can be fit together with the summary in a single 32-bit int. By assigning the bits in order of increasing piece value, we can use bit-extraction techniques to get the piece numbers of the lowest few attackers. And then through a piece list their location on the board.

For the offensive SEE we have another problem, namely that we also want to know what would happen if we move another piece than the lowest-value attacker to the target square in the first move. Again this could be pre-calculated and listed in a few extra 42x28 tables. When examining squares for scoring the moves to those we could extract the attackers from the attack map in order of increasing value, and look up the SEE for when we move that piece first. At that point we could also add the bonus for occupying its target square, for the threats that solves.

dangi12012 · Post by **dangi12012** » Thu Jul 14, 2022 10:17 pm

Keep in mind that MCTS requires "noisy" rollouts. So you have to introduce some randomness around the evalution (noise around the eval value so the second/third best moves are played too sometimes)
It is a very good idea to have an insanely fast engine at the rollout phase of MCTS (IE the leaves when they are first visited).

Something that keeps a +3 winning position winning (most of the time) even against the strongest of engines with the least amount of cpu cycles even with noisy moves.

hgm · Post by **hgm** » Thu Jul 14, 2022 10:26 pm

Another thought;

Since MCTS does not really use takebacks, incremental update of the attack map is twice as attractive as in a normal search. Instead of generating all moves of all pieces to add their attacker code to the attack map, you would only generate the moves of the moved piece in its new location to add, and the move in the old location to subtract them. So only for 2 pieces, rather than 16. In addition you might have to block and unblock some sliders that were hitting the from- and to-square. But since the attack map elements for those squares already tell you which sliders hit them, you can easily add or subtract their attacker codes at the downstream squares. I once measured that the number of sliders that you unblock is on average less than one, (and on captures you won't block anything that wasn't already blocked), and you then only have to retrace part of one of its moves.

You could even keep track of a dummy Rook and Bishop at the King location (giving each a separate bit in the attack map) so that it also keeps track of the rays that connect unobstructed to the King. This makes it easy to include checking into the scoring of the move. (N.E.G. is slightly biased towards checking, but then biased at least as much against moving the same piece twice in a row, in order to prevent that all won end-games end in perpetual checks. When you can alternately check with two pieces, you usually quickly reach checkmate.)

For each player you could use a bitboard to keep track of the squares that are attacked by them. That would make it easy to loop through such squares, for generating and scoring the moves to those (reverse move generation). If the rollouts would select the move with the best score you would not even have to store the moves in a list; you would just keep track of what the highest score so far was, and the move that had it. Even if you want some randomization there you could add a properly distributed random number to all scores before determining the maximum.

smatovic · Post by **smatovic** » Thu Jul 14, 2022 10:36 pm

SEE as a predicator in MCTS makes sense to me. Seems several people tried to come up with a SEE/move ordering neural network, dunno how this compares in a knowledge vs. speed metric.

--
Srdja

hgm · Post by **hgm** » Thu Jul 14, 2022 11:02 pm

I believe that neural networks are in general very slow solutions compared to a dedicated calculation of the same objective. Note that what I described here as SEE of course could be made into an arbitrarily tunable function, by making the values in the 42x28 tables tunable quantities rather than pre-calculated SEE values. It is after all not obvious that you should always go for the move with the highest SEE. You might want to go for the non-negative-SEE move with the most-valuable victim, assuming that there is no hurry for reaping the higher award from a SEE on a lesser piece, as you will get to that later.

jhaglund2 · Post by **jhaglund2** » Fri Jul 15, 2022 11:38 am

I think there should be a debug/data mode, it would be slower, but it puts every game in table arrays. As it pumps out the games, so they can be writable as a PGN file after so many have completed.

hgm · Post by **hgm** » Fri Jul 15, 2022 3:13 pm

A shortcoming of the proposed SEE is that it doesn't take account of X-ray attacks. This is probably far more important than very deep SEE. This can be remedied by also recording X-ray attacks in the attack map. There are only 5 sliders to make such attacks, and we could dedicate 5 bits in the attack-map elements for recording the X-rays. Slider move generation would then not automatically stop at the first obstacle, but continue in the same direction if the blocking piece is a friend moving in the same direction. For Pawns only one step along a diagonal, for sliders until the next obstacle. Using the spare flag for the blocked piece.

Generation of moves to the square would then only use the 16 flags for 'direct' attackers. But SEE would be aware that some of the attackers are backed up by more material.

Whether a piece makes a direct or an X-ray attack only affects SEE when a lower-valued piece X-rays a higher-valued one. This can happpen only when B or R X-ray the Queen. The attack summary should distinguish these cases from a direct attack. This can be done by not just having two 'Queen states' (no Queens or one), but four. When K = 1, we can encode those as Q = 2, B-behind-Q = 4 and R-behind-Q = 6. And then R = 8, B = N = 24, P = 96.

Note that there is no separate code for a Queen backed by two Rooks. For encoding this situation we use the R-behind-Q code in combination with 2 Rooks. This combination cannot occur in its original meaning, as there are only 2 Rooks. So we take it to mean 2 Rooks behind a Queen. The contribution of Such a 'frustrated battery' to the summary is than 8 + 8 + 6 = 22. This encoding would make the summary potentially run up to 287.

It seems logical to count the batteries as the number of pieces in them. So when only considering the first 3 attackers the B-behind-Q and R-behind-Q would only appear on their own or in combination with a single other piece (which could not be Queen). And the 2R-behind-Q would already have 3 pieces all by itself. That makes 6 new combinations of 3 attackers, raising the number from 42 to 48. If the frustrated batteries would occur in combination with lower-valued direct attacker, so that the piece backing the Queen would not belong to the first 3, they could simply be treated like a normal Queen.

For the protectors the type of the 3rd piece in the exchange does not matter, and if it would be behind a Queen, this is equivalent to having an extra direct King protector (which would also be used only after the Queen). This only doesn't help if there already was a direct King attacker. So K + B-behind-Q, K + R-behind-Q are essentially new protector combinations, the latter being equivalent to 2R-behind-Q. The frustrated batteries just by themselves are also new. That drives the number of distinguishable 0-3 protector combinations up from 28 to 32. We would thus need 48 x 32 tables (1536 bytes) for looking up the SEE value. Still only a very modest table. We could even consider correct handling of a super-numerary Queen.

The lowest 9 bits in the attack-map element for a given square could contain the summary; 21 more bits are needed for indicating the attacks for the individual pieces, 16 for direct attacks and 5 for X-ray slider attacks. The remaining 2 bits in a 32-bit int could be used for indicating diagonal and orthogonal rays connecting to the King.

Rollout engine

Rollout engine

Re: Rollout engine

Re: Rollout engine

Re: Rollout engine

Re: Rollout engine

Re: Rollout engine

Re: Rollout engine