hgm wrote: ↑Sat Oct 24, 2020 2:56 pm
And I am still confused about what the two halves are for. We are talking about white and black, but I thought originally it was stm and opponent. The latter would require quite a bit more complex handling, as basically you would have to flip the board each ply, making the inputs completely different. So that incremental updating only makes sense if you have two 'leap-frog' accumulators that remember the valuse from two ply ago, when the inputs were still somewhat similar. And there is no symmetry to be expected in that case, as I pointed out before.
The incremental part works on white and black halves. No need for leap-frogging.
The flip takes place when converting the incrementally updated accumulator to the input values for the first hidden layer. If white is to move, the white half of the accumulator gives coefficients 0-255. If black is to move, the black half of the accumulator gives coefficients 0-255.
It also seems a big mistake to have only 10 piece types; there is not a single table now that has both Kings in them.
How many "big mistakes" have you made that work so incredibly well as NNUE in its current implementation?
No wonder it is so poor at end-games, as some people report.
I've seen just one post that claims that, and that seemed to have been based (as usual) on a single position and probably on a single pair of play-outs from that position.
I've seen another post reporting that NNUE improves endplay considerably more than TBs do. (And then NNUE+TBs turned out to do even better still, which is not surprising as they are orthogonal concepts.)
I've also seen SF-NNUE play games and from what I have seen I can only conclude that it knows very well which endgames are won and which are drawn or lost.
It is clear that the input layer fails to capture a lot of obviously important chess knowledge, but apparently the two hidden layers make up for that very well.
The relative King location is often crucial there. It should be able to do so much better if there also were 64x64 Kk inputs. That seems much more important than having 256 rather than, say 240 cells in the second layer. That is just more of the same.
It would be surprising if the current implementation could not be improved.
It seems adding Kk inputs basically means making the 256-element biases vector dependent on the positions of the two kings. (But from a learning perspective this might not be a useful way to look at things.)