mcostalba wrote:One of the most obscure parts is the encoding, where you can find something like this:
Code: Select all
i = pos[1] > pos[0];
int j = (pos[2] > pos[0]) + (pos[2] > pos[1]);
if (OffdiagA1H8[pos[0]])
    idx = Triangle[pos[0]] * 63*62 + (pos[1] - i) * 62 + (pos[2] - j);
else if (OffdiagA1H8[pos[1]])
    idx = 6*63*62 + Diag[pos[0]] * 28*62 + Lower[pos[1]] * 62 + pos[2] - j;
else if (OffdiagA1H8[pos[2]])
    idx = 6*63*62 + 4*28*62 + (Diag[pos[0]]) * 7*28 + (Diag[pos[1]] - i) * 28 + Lower[pos[2]];
else
    idx = 6*63*62 + 4*28*62 + 4*7*28 + (Diag[pos[0]] * 7*6) + (Diag[pos[1]] - i) * 6 + (Diag[pos[2]] - j);
Is there some documentation available that explains what the encoding does? It would be much easier to follow the code if you have an idea of what you are looking at...
The encoding function maps a position to its index into the table.
Suppose we have KRvK. Let's say the pieces are on square numbers wK, wR and bK (each 0...63).
The simplest way to map this position to an index is like this:
Code: Select all
index = wK * 64*64 + wR * 64 + bK;
But this way the TB is going to be an array of 64*64*64 = 262144 positions, with lots of positions being equivalent (because they are mirrors of each other) and lots of positions being invalid (two pieces on one square, adjacent kings, etc.).
Usually the first step is to take the wK and bK together. There are just 462 ways to place the wK and bK on the board if we discard mirror positions and illegal positions (adjacent kings). These are positions with wK in the a1-d1-d4 triangle and bK on a non-adjacent square. If wK is on the a1-d4 half-diagonal, then bK can be forced to be on or below the a1-h8 diagonal. See the KK_idx[10][64] array.
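The count of 462 can be checked by brute force. This is only a rough sketch, not the generator's code: it assumes squares are numbered a1 = 0 ... h8 = 63, takes "below the diagonal" to mean the h1 side (rank < file), and count_kk() is a made-up helper name.

```c
#include <assert.h>
#include <stdlib.h>

static int file_of(int sq) { return sq & 7; }   /* 0 = a-file */
static int rank_of(int sq) { return sq >> 3; }  /* 0 = first rank */

/* Kings on the same square or on touching squares. */
static int same_or_adjacent(int a, int b)
{
    return abs(file_of(a) - file_of(b)) <= 1
        && abs(rank_of(a) - rank_of(b)) <= 1;
}

/* Hypothetical helper: counts the wK/bK placements described above.
 * *both_on_diag receives the number of pairs with both kings on a1-h8. */
static int count_kk(int *both_on_diag)
{
    int total = 0;
    *both_on_diag = 0;
    for (int wK = 0; wK < 64; wK++) {
        int f = file_of(wK), r = rank_of(wK);
        if (f > 3 || r > f)
            continue;                     /* wK outside the a1-d1-d4 triangle */
        for (int bK = 0; bK < 64; bK++) {
            if (same_or_adjacent(wK, bK))
                continue;                 /* illegal king placement */
            if (r == f && rank_of(bK) > file_of(bK))
                continue;                 /* wK on diagonal: bK above it is a mirror */
            total++;
            if (r == f && rank_of(bK) == file_of(bK))
                (*both_on_diag)++;
        }
    }
    return total;
}
```

Calling count_kk() should give 462 in total, of which 21 have both kings on the a1-h8 diagonal.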
Once we have placed the wK and bK, there are 62 squares left for the wR. That gives 462 * 62 = 28644 in total. Mapping the value of wR from 0...63 to 0...61 can be done like this:
Code: Select all
wR -= (wR > wK) + (wR > bK);
In words: if wR "comes later" than wK, we deduct 1, and the same if wR "comes later" than bK.
Example: wK = 11 (d2), bK = 32 (a5), then wR = 63 is mapped to 61 (we skip 11 and 32), wR = 30 is mapped to 29 (we skip 11), and wR = 8 is mapped to 8 (nothing to skip).
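The same skipping trick works for any number of already placed pieces. A small sketch (skip_occupied() is a name made up for this post, not something from the probing code):

```c
#include <assert.h>

/* Map a square from 0...63 down to 0...(63 - n) by skipping the n squares
 * that are already occupied by earlier pieces. Hypothetical helper. */
static int skip_occupied(int sq, const int *occupied, int n)
{
    int skipped = 0;
    for (int i = 0; i < n; i++) {
        assert(sq != occupied[i]);      /* two pieces cannot share a square */
        skipped += (sq > occupied[i]);  /* one less for every earlier square passed */
    }
    return sq - skipped;
}
```

With occupied = {11, 32} this reproduces the example: 63 goes to 61, 30 goes to 29 and 8 stays 8.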
We can still improve on 28644. If wK and bK are on the a1h8-diagonal, we can force the wR to be on or below the a1h8-diagonal to get a total of 28056 positions (I think).
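A quick way to check that number. The sketch assumes a1 = 0 ... h8 = 63, that "on or below" means rank <= file, and that the 462 king pairs split into 441 with at least one king off the diagonal and 21 with both kings on it (which is what the KK_idx >= 441 test in the code suggests). The function names are made up.

```c
#include <assert.h>

/* Squares available to the rook when it is restricted to the a1-h8
 * diagonal and the half of the board below it. Hypothetical helper. */
static int rook_squares_on_or_below_diag(int wK, int bK)
{
    int n = 0;
    for (int sq = 0; sq < 64; sq++) {
        int file = sq & 7, rank = sq >> 3;
        if (sq != wK && sq != bK && rank <= file)
            n++;
    }
    return n;
}

/* Index space for KRvK: 62 rook squares for the 441 ordinary king pairs,
 * the restricted count for the 21 diagonal pairs (assumed split). */
static int krvk_index_size(void)
{
    /* a1 = 0 and c3 = 18 are both on the diagonal; any diagonal pair gives the same count */
    return 441 * 62 + 21 * rook_squares_on_or_below_diag(0, 18);
}
```

With both kings on the diagonal there are 36 - 2 = 34 squares left for the rook, so the total is 441 * 62 + 21 * 34 = 28056.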
So this is for 3 pieces. The extension to 4, 5 and 6 pieces is not all too difficult, but we get new complications once we have like pieces of the same colour. For example KRRvK. We don't want to have one index value for R1 on a1, R2 on b1 and another index value for R1 on b1, R2 on a1. What we do is place the two Rs "together". If we have 62 squares left, we can place two Rs "together" in 62*61/2 ways. If we have 3 Rs, it is 62*61*60/6 ways. If we have 4 Rs, it is 62*61*60*59/24 ways.
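To see what placing like pieces "together" buys us, we can simply enumerate the placements in which the pieces sit on distinct squares in increasing order, so that swapping two rooks never produces a second entry. This is a throwaway counting sketch (count_placements() is a made-up name), not how the index is actually computed:

```c
#include <assert.h>

/* Count the ways to put `pieces` indistinguishable pieces on distinct
 * squares first...(squares - 1), always choosing squares in increasing
 * order. Hypothetical helper, brute force on purpose. */
static long count_placements(int squares, int pieces, int first)
{
    assert(squares >= 0 && pieces >= 0);
    if (pieces == 0)
        return 1;
    long total = 0;
    for (int sq = first; sq < squares; sq++)
        total += count_placements(squares, pieces - 1, sq + 1);
    return total;
}
```

For 62 free squares this gives 1891 placements for two rooks, 37820 for three and 557845 for four, i.e. the binomials C(62,2), C(62,3) and C(62,4).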
So encode_piece() works like this:
First, the leading piece is mirrored to the a1-d1-d4 triangle (and if the first k pieces are on the a1h8-diagonal, piece k+1 is mapped to below that diagonal)
Second, the positions of the 2 or 3 leading pieces are converted into a single number. There are three cases.
In the "K2" case, an index for the two kings is calculated. Index calculation then continues from the 3rd piece ("i = 2").
In the "K3" case, an index for the two kings and one further piece (e.g. R) is calculated. If the two kings are on the diagonal (KK_idx >= 441), we take special care of the R (as discussed above). Index calculation continues from the 4th piece ("i = 3").
In the "111" case, an index is calculated for three "different" pieces that only occur once (i.e. different type and/or color) which may include kings.
In init_tb() the relevant case is determined for each TB (this mirrors how the generator picked the encoding type):
Code: Select all
for (i = 0, j = 0; i < 16; i++)
    if (pcs[i] == 1) j++;
if (j >= 3) ptr->enc_type = 0;
else if (j == 2) ptr->enc_type = 2;
else { /* only for suicide */
    j = 16;
    for (i = 0; i < 16; i++) {
        if (pcs[i] < j && pcs[i] > 1) j = pcs[i];
        ptr->enc_type = ubyte(1 + j);
    }
}
- the usual case is "111" (enc_type == 0). If we have at least three different piece types (j >= 3), each occurring only once, we are in this case. We always have at least two of these: the white king and the black king. So we are in this case unless we have something like KRRvK, KRRvKBB, KRRRvK, KRRBBvK, KNNNNvK.
- otherwise we are in case "K2" (enc_type == 2).
The "only for suicide" case cannot happen: white king and black king ensure j >= 2. So the code can be cleaned up a bit.
This also means that the "K3" case seems never to occur and could probably be removed. A bit surprising that it is still there.
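For regular chess the selection could shrink to something like the following. This is just a sketch of the cleanup: choose_enc_type() is a made-up name, and it assumes pcs[i] holds the number of pieces with piece code i, as in the snippet above.

```c
#include <assert.h>

/* Hypothetical cleaned-up selection for regular chess.
 * pcs[i] = number of pieces with piece code i (assumed layout).
 * Returns 0 for the "111" encoding and 2 for the "K2" encoding. */
static int choose_enc_type(const unsigned char pcs[16])
{
    int singles = 0;                 /* piece types that occur exactly once */
    for (int i = 0; i < 16; i++)
        if (pcs[i] == 1)
            singles++;
    assert(singles >= 2);            /* both kings are always singletons */
    return singles >= 3 ? 0 : 2;
}
```

With illustrative piece codes 4 = white rook, 6 = white king and 14 = black king, KRvK gives 0 ("111") and KRRvK gives 2 ("K2").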
(In principle, "111" does not reduce the index space as much as "K3". However, "111" gives the generator more freedom to reorder pieces and that freedom pays off. The order in which pieces are indexed has a big impact on compression efficiency, so the more possibilities for reordering, the better the compression. Also, the extra index space that "111" uses compared to "K3" consists of broken positions with adjacent kings, and such positions compress very well anyway (as don't cares).)
I guess I initially thought the "K3" case was useful to have, but then found out that compression is always better if we use "111".
After the switch(), we place groups of like pieces (RR, RRR, RRRR) together. The norm[] array tells us how many like pieces we have. Their positions are sorted and then mapped to an index using a formula involving binomials. The "j += (p > pos[l]);" is there to skip positions that are occupied by pieces we have treated earlier.
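In isolation, that step looks roughly like this. It is a simplified sketch and not a copy of the real loop: it computes the binomials on the fly instead of using a table, it leaves out the multiplication by factor[], and the names encode_group(), earlier and group are made up.

```c
#include <assert.h>

/* C(n, k), 0 when k > n. Every intermediate value is itself a binomial,
 * so the integer division is exact. */
static long binomial(int n, int k)
{
    if (k < 0 || k > n)
        return 0;
    long r = 1;
    for (int i = 1; i <= k; i++)
        r = r * (n - k + i) / i;
    return r;
}

/* Index of a group of n like pieces, given the squares of the pieces
 * that were treated earlier. Sorts group[] in place. Hypothetical helper. */
static long encode_group(const int *earlier, int n_earlier, int *group, int n)
{
    /* sort the squares of the group in ascending order (insertion sort) */
    for (int a = 1; a < n; a++) {
        int v = group[a], b = a;
        while (b > 0 && group[b - 1] > v) {
            group[b] = group[b - 1];
            b--;
        }
        group[b] = v;
    }
    long s = 0;
    for (int m = 0; m < n; m++) {
        int p = group[m], j = 0;
        for (int l = 0; l < n_earlier; l++) {
            assert(p != earlier[l]);   /* squares must be distinct */
            j += (p > earlier[l]);     /* skip squares taken by earlier pieces */
        }
        s += binomial(p - j, m + 1);   /* m-th smallest square contributes C(p - j, m + 1) */
    }
    return s;
}
```

With the kings on 11 and 32, two rooks on 63 and 30 (in either order) are first mapped to 29 and 61 and then to 29 + C(61,2) = 1859. The smallest possible pair gives 0 and the largest gives 1890, so the two rooks use exactly C(62,2) = 1891 index values.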
The factor[] values have to do with the "best" ordering found by the generator.
Similar story for encode_pawn().
You should see the special case routine for indexing KKKvNN for suicide chess for the previous iteration of my suicide TB generator! The two current generic encoding functions save thousands of lines over what I previously had. (Plus they lead to better compression, because of the flexibility in ordering pieces.)