NNUE Question - King Placements

Discussion of chess software programming and technical issues.

Moderators: hgm, Dann Corbit, Harvey Williamson

syzygy
Posts: 5048
Joined: Tue Feb 28, 2012 10:56 pm

Re: NNUE Question - King Placements

Post by syzygy » Fri Oct 23, 2020 9:51 pm

The accumulator has a "white king" half and a "black king" half, where each half is a 256-element vector of 16-bit ints, which is equal to the sum of the weights of the "active" (pt, sq, ksq) features.

The "transform" step of the NNUE evaluation forms a 512-element vector of 8-bit ints where the first half is formed from the 256-element vector of the side to move and the second half is formed from the 256-element vector of the other side. In this step the 16-bit elements are clipped/clamped to a value from 0 to 127. This is the output of the input layer.
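In code, this clipping is just a saturating clamp of each 16-bit accumulator element into [0, 127]; a minimal sketch (names are mine, not Stockfish's):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Clamp one 16-bit accumulator element to the 8-bit range [0, 127]
// ("clipped ReLU"): negative values become 0, values above 127 saturate.
inline std::int8_t clip(std::int16_t v) {
    return static_cast<std::int8_t>(std::clamp<std::int16_t>(v, 0, 127));
}

// Transform: concatenate the side-to-move half first, then the other half,
// clipping each element. halves[0] is the side-to-move accumulator half.
inline void transform(const std::int16_t halves[2][256], std::int8_t out[512]) {
    for (int p = 0; p < 2; ++p)
        for (int i = 0; i < 256; ++i)
            out[p * 256 + i] = clip(halves[p][i]);
}
```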

This 512-element vector of 8-bit ints is then multiplied by a 32x512 matrix of 8-bit weights to get a 32-element vector of 32-bit ints, to which a vector of 32-bit biases is added. The sum vector is divided by 64 and clipped/clamped to a 32-element vector of 8-bit ints from 0 to 127. This is the output of the first hidden layer.

The resulting 32-element vector of 8-bit ints is multiplied by a 32x32 matrix of 8-bit weights to get a 32-element vector of 32-bit ints, to which another vector of 32-bit biases is added. These ints are again divided by 64 and clipped/clamped to 32 8-bit ints from 0 to 127. This is the output of the second hidden layer.
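Both hidden layers are the same affine-then-clip operation, so one generic routine covers them; a sketch with illustrative names, the dimensions passed in explicitly:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One hidden layer: out = clip((W * in + b) / 64), all integer arithmetic.
// W is rows x cols of 8-bit weights (row-major), b is 32-bit biases.
std::vector<std::int8_t> affine_clip(const std::vector<std::int8_t>& in,
                                     const std::vector<std::int8_t>& W,
                                     const std::vector<std::int32_t>& b,
                                     int rows, int cols) {
    std::vector<std::int8_t> out(rows);
    for (int r = 0; r < rows; ++r) {
        std::int32_t sum = b[r];
        for (int c = 0; c < cols; ++c)
            sum += std::int32_t(W[r * cols + c]) * in[c];
        // Divide by 64 (the quantization scale), then clamp to [0, 127].
        out[r] = static_cast<std::int8_t>(std::clamp(sum / 64, 0, 127));
    }
    return out;
}
```

The first hidden layer is this routine with rows = 32, cols = 512; the second with rows = 32, cols = 32.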

This 32-element vector of 8-bit ints is then multiplied by a 1x32 matrix of 8-bit weights (i.e. the inner product of two vectors is taken). This produces a 32-bit value to which a 32-bit bias is added. This gives the output of the output layer.

The output of the output layer is divided by FV_SCALE = 16 to produce the NNUE evaluation. SF's evaluation then takes some further steps such as adding a Tempo bonus (even though the NNUE evaluation inherently already takes into account the side to move in the "transform" step) and scaling the evaluation towards zero as rule50_count() approaches 50 moves.

syzygy
Posts: 5048
Joined: Tue Feb 28, 2012 10:56 pm

Re: NNUE Question - King Placements

Post by syzygy » Fri Oct 23, 2020 10:02 pm

syzygy wrote:
Fri Oct 23, 2020 8:58 pm
The weights are shared in that the "white-king weight" for (pt, sq, wksq) equals the "black-king weight" for (pt ^ 8, sq ^ 63, bksq) where pt ^ 8 flips the piece's color and sq ^ 63 rotates the board by 180 degrees.
So this should ("obviously" ;-)) be (pt, sq, wksq) <-> (pt ^ 8, sq ^ 63, bksq ^ 63).

So the current network basically assumes chess to be horizontally symmetrical (sq -> sq ^ 0x07). This means that in principle the king could always be mapped to the a-d files, and only half the weights are needed. I have no idea to what extent the actual weights comply with this assumed symmetry (I don't think the trainer takes the symmetry into account, but I have not really had a look at the trainer so far, and I could easily be wrong).
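To make the mappings concrete, the three square/piece transforms involved can be written as one-liners (helper names are mine, not Stockfish's):

```cpp
#include <cassert>

// Relations between HalfKP feature coordinates, as described above.
// pt ^ 8 flips the piece's color, sq ^ 63 rotates the board 180 degrees,
// and sq ^ 7 mirrors it horizontally (a-file <-> h-file).
inline int flip_color(int pt)   { return pt ^ 8;  }
inline int rotate180(int sq)    { return sq ^ 63; }
inline int mirror_horiz(int sq) { return sq ^ 7;  }
```

Each map is its own inverse, which is why applying the weight-sharing relation twice gets back to the original feature.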

Gerd Isenberg
Posts: 2246
Joined: Wed Mar 08, 2006 7:47 pm
Location: Hattingen, Germany

Re: NNUE Question - King Placements

Post by Gerd Isenberg » Fri Oct 23, 2020 11:19 pm

Becomes clearer now, did not study the transformer yet. Found FeatureTransformer::Transform and UpdateAccumulator.
In particular

Code: Select all

void Transform(const Position& pos, OutputType* output) const {
  UpdateAccumulator(pos, WHITE);
  UpdateAccumulator(pos, BLACK);
  ...
Found orient(perspective, s) and kpp_board_index aka xor 8 by table lookup in MakeIndex of half_kp.cpp.

syzygy
Posts: 5048
Joined: Tue Feb 28, 2012 10:56 pm

Re: NNUE Question - King Placements

Post by syzygy » Sat Oct 24, 2020 12:35 am

Gerd Isenberg wrote:
Fri Oct 23, 2020 11:19 pm
Becomes clearer now, did not study the transformer yet. Found FeatureTransformer::Transform and UpdateAccumulator.
In particular

Code: Select all

void Transform(const Position& pos, OutputType* output) const {
  UpdateAccumulator(pos, WHITE);
  UpdateAccumulator(pos, BLACK);
  ...
Found orient(perspective, s) and kpp_board_index aka xor 8 by table lookup in MakeIndex of half_kp.cpp.
The updating code is a bit complicated due to the attempts to do it incrementally. To understand its effect, just look at the "Refresh the accumulator" part:

Code: Select all

        // Refresh the accumulator
        auto& accumulator = pos.state()->accumulator;
        accumulator.state[c] = COMPUTED;
        Features::IndexList active;
        Features::HalfKP<Features::Side::kFriend>::AppendActiveIndices(pos, c, &active);
        std::memcpy(accumulator.accumulation[c][0], biases_,
            kHalfDimensions * sizeof(BiasType));

        for (const auto index : active)
        {
          const IndexType offset = kHalfDimensions * index;

          for (IndexType j = 0; j < kHalfDimensions; ++j)
            accumulator.accumulation[c][0][j] += weights_[offset + j];
        }
AppendActiveIndices loops through the bitboard of all pieces (including pawns) minus the kings and adds an index for each piece to the "active" list.
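Schematically, AppendActiveIndices does something like the following (the bitboard walk is the standard trick; the exact index layout here is illustrative, not the real Stockfish code):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Schematic version of AppendActiveIndices: walk the bitboard of all
// non-king pieces and emit one (king-square, piece, square) feature index
// per piece. The stride/layout below is illustrative only.
std::vector<int> active_indices(std::uint64_t pieces_no_kings,
                                const int piece_on[64], int ksq) {
    std::vector<int> active;
    while (pieces_no_kings) {
        int sq = __builtin_ctzll(pieces_no_kings);  // lowest set bit
        pieces_no_kings &= pieces_no_kings - 1;     // clear it
        // One feature per (king square, piece type, piece square) triple.
        active.push_back(ksq * 641 + piece_on[sq] * 64 + sq);
    }
    return active;
}
```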

syzygy
Posts: 5048
Joined: Tue Feb 28, 2012 10:56 pm

Re: NNUE Question - King Placements

Post by syzygy » Sat Oct 24, 2020 12:46 am

syzygy wrote:
Fri Oct 23, 2020 9:51 pm
The accumulator has a "white king" half and a "black king" half, where each half is a 256-element vector of 16-bit ints, which is equal to the sum of the weights of the "active" (pt, sq, ksq) features.
Equal to that sum plus a 256-element vector of 16-bit biases.

hgm
Posts: 26530
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: NNUE Question - King Placements

Post by hgm » Sat Oct 24, 2020 12:56 pm

So the images are still wrong: they state there are 41K inputs, while in fact there are 2 x 41K inputs.

And I am still confused about what the two halves are for. We are talking about white and black, but I thought originally it was stm and opponent. The latter would require quite a bit more complex handling, as basically you would have to flip the board each ply, making the inputs completely different. So that incremental updating only makes sense if you have two 'leap-frog' accumulators that remember the values from two ply ago, when the inputs were still somewhat similar. And there is no symmetry to be expected in that case, as I pointed out before.

It also seems a big mistake to have only 10 piece types; there is not a single table now that has both Kings in it. No wonder it is so poor at end-games, as some people report. The relative King location is often crucial there. It should be able to do so much better if there also were 64x64 Kk inputs. That seems much more important than having 256 rather than, say, 240 cells in the second layer. That is just more of the same.

D Sceviour
Posts: 570
Joined: Mon Jul 20, 2015 3:06 pm

Re: NNUE Question - King Placements

Post by D Sceviour » Sat Oct 24, 2020 3:20 pm

hgm wrote:
Sat Oct 24, 2020 12:56 pm
And I am still confused about what the two halves are for. We are talking about white and black, but I thought originally it was stm and opponent. The latter would require quite a bit more complex handling, as basically you would have to flip the board each ply, making the inputs completely different.
That is what I found. The flipped position produces a different score for NNUE evaluations. I think the probing code actually rotates the board 180 degrees (sq ^ 63) which would produce even more unusual results. It is a miracle NNUE can play at all. It looks like a lot of code re-write is still needed since the original port to SF.

syzygy
Posts: 5048
Joined: Tue Feb 28, 2012 10:56 pm

Re: NNUE Question - King Placements

Post by syzygy » Sat Oct 24, 2020 3:20 pm

hgm wrote:
Sat Oct 24, 2020 12:56 pm
And I am still confused about what the two halves are for. We are talking about white and black, but I thought originally it was stm and opponent. The latter would require quite a bit more complex handling, as basically you would have to flip the board each ply, making the inputs completely different. So that incremental updating only makes sense if you have two 'leap-frog' accumulators that remember the valuse from two ply ago, when the inputs were still somewhat similar. And there is no symmetry to be expected in that case, as I pointed out before.
The incremental part works on white and black halves. No need for leap-frogging.

The flip takes place when converting the incrementally updated accumulator into the input values for the first hidden layer. If white is to move, the white half of the accumulator supplies inputs 0-255; if black is to move, the black half does.
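So the only "flip" at evaluation time is the order in which the two halves are read out; schematically (names are mine):

```cpp
#include <cassert>
#include <cstdint>

// The accumulator always keeps a white half and a black half; the transform
// just chooses which half fills output slots 0-255 based on the side to move.
// stm_white == true means white to move. Halves are 256 elements each.
inline void order_halves(bool stm_white,
                         const std::int16_t white[256],
                         const std::int16_t black[256],
                         const std::int16_t* out[2]) {
    out[0] = stm_white ? white : black;  // first half: side to move
    out[1] = stm_white ? black : white;  // second half: the other side
}
```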
It also seems a big mistake to have only 10 piece types; there is not a single table now that has both Kings in them.
How many "big mistakes" have you made that work so incredibly well as NNUE in its current implementation?
No wonder it is so poor at end-games, as some people report.
I've seen just one post that claims that, and that seemed to have been based (as usual) on a single position and probably on a single pair of play-outs from that position.

I've seen another post reporting that NNUE improves endplay considerably more than TBs do. (And then NNUE+TBs turned out to do even better still, which is not surprising as they are orthogonal concepts.)

I've also seen SF-NNUE play games and from what I have seen I can only conclude that it knows very well which endgames are won and which are drawn or lost.

It is clear that the input layer fails to capture a lot of obviously important chess knowledge, but apparently the two hidden layers make up for that very well.
The relative King location is often crucial there. It should be able to do so much better if there also were 64x64 Kk inputs. That seems much more important than having 256 rather than, say 240 cells in the second layer. That is just more of the same.
It would be surprising if the current implementation could not be improved.

It seems adding Kk inputs basically means making the 256-element biases vector dependent on the positions of the two kings. (But from a learning perspective this might not be a useful way to look at things.)

syzygy
Posts: 5048
Joined: Tue Feb 28, 2012 10:56 pm

Re: NNUE Question - King Placements

Post by syzygy » Sat Oct 24, 2020 3:24 pm

D Sceviour wrote:
Sat Oct 24, 2020 3:20 pm
hgm wrote:
Sat Oct 24, 2020 12:56 pm
And I am still confused about what the two halves are for. We are talking about white and black, but I thought originally it was stm and opponent. The latter would require quite a bit more complex handling, as basically you would have to flip the board each ply, making the inputs completely different.
That is what I found. The flipped position produces a different score for NNUE evaluations. I think the probing code actually rotates the board 180 degrees (sq ^ 63) which would produce even more unusual results. It is a miracle NNUE can play at all. It looks like a lot of code re-write is still needed since the original port to SF.
Just like everybody is an immunologist nowadays, everybody is a machine-learning expert, too. :roll:

The sq ^ 63 has long been known (and was mentioned above). I suppose it will be removed once a net has been trained on sq ^ 56 that outperforms the current sq ^ 63 nets.

D Sceviour
Posts: 570
Joined: Mon Jul 20, 2015 3:06 pm

Re: NNUE Question - King Placements

Post by D Sceviour » Sat Oct 24, 2020 3:37 pm

syzygy wrote:
Sat Oct 24, 2020 3:24 pm
D Sceviour wrote:
Sat Oct 24, 2020 3:20 pm
hgm wrote:
Sat Oct 24, 2020 12:56 pm
And I am still confused about what the two halves are for. We are talking about white and black, but I thought originally it was stm and opponent. The latter would require quite a bit more complex handling, as basically you would have to flip the board each ply, making the inputs completely different.
That is what I found. The flipped position produces a different score for NNUE evaluations. I think the probing code actually rotates the board 180 degrees (sq ^ 63) which would produce even more unusual results. It is a miracle NNUE can play at all. It looks like a lot of code re-write is still needed since the original port to SF.
Just like everybody is an immunologist nowadays, everybody is a machine-learning expert, too. :roll:

The sq ^ 63 has long been known (and was mentioned above). I suppose it will be removed once a net has been trained on sq ^ 56 that outperforms the current sq ^ 63 nets.
What is the purpose of flipping (or rotating or transposing) the board for every evaluation? It would be a lot simpler to vertically flip (sq ^ 56) each piece's square during the makemove() updates, as per a different post and code suggestion made by HGM.
