NNUE Question - King Placements

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: NNUE Question - King Placements

Post by Daniel Shawul »

Sopel wrote: Thu Jul 01, 2021 10:55 pm
syzygy wrote: Sun Nov 01, 2020 3:00 pm
If you would force the king to be in a-d, the difference between flip and rotate disappears.
This sentence is a hidden gem. It sparked me to try forcing each perspective in HalfKP to put the king on the e..h files, mirroring the board for that perspective if the king is on the a..d files instead. Not only does this reduce the size of the network by a factor of 2, it is also, so far, the best way I have found to cut the size of the net - there is no visible impact on strength, and it even learns faster near the start. One could think this might have a disastrous effect, because sometimes white's perspective is flipped while black's isn't, but miraculously it is not an issue at all!

blue = standard, orange = this stuff above
I have been using this since like forever :) Horizontal mirroring halves the number of king squares to 32, and then I go one step further via bucketing to bring the number of king squares down to 16. This results in 4x fewer weights to train than Stockfish's, with barely any strength loss ...
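To make the transform concrete, here is a minimal sketch of a HalfKP-style feature index with the forced e..h king files and 32 king squares; the names (featureIndex, kingBucket, NUM_PIECE_TYPES) and the exact index layout are illustrative, not the actual Scorpio or Stockfish code:

Code: Select all

// Horizontal mirror: a-file <-> h-file (squares are 0..63, file = sq & 7).
constexpr int mirrorSq(int sq) { return sq ^ 7; }

// A perspective is mirrored whenever its king stands on files a..d,
// so after the transform the king is always on files e..h.
constexpr bool needsMirror(int kingSq) { return (kingSq & 7) <= 3; }

// With the king confined to e..h only 32 king squares remain; further
// bucketing (as described above) could merge these down to 16.
constexpr int kingBucket(int kingSq) {
    return (kingSq >> 3) * 4 + ((kingSq & 7) - 4);   // 0..31
}

constexpr int NUM_PIECE_TYPES = 10;  // illustrative: 5 piece types x 2 colors

int featureIndex(int kingSq, int pieceSq, int pieceType) {
    if (needsMirror(kingSq)) {       // flip the whole perspective
        kingSq  = mirrorSq(kingSq);
        pieceSq = mirrorSq(pieceSq);
    }
    return (kingBucket(kingSq) * NUM_PIECE_TYPES + pieceType) * 64 + pieceSq;
}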
AndrewGrant
Posts: 1766
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: NNUE Question - King Placements

Post by AndrewGrant »

Sopel wrote: Thu Jul 01, 2021 10:55 pm
syzygy wrote: Sun Nov 01, 2020 3:00 pm
If you would force the king to be in a-d, the difference between flip and rotate disappears.
This sentence is a hidden gem. It sparked me to try forcing each perspective in HalfKP to put the king on the e..h files, mirroring the board for that perspective if the king is on the a..d files instead. Not only does this reduce the size of the network by a factor of 2, it is also, so far, the best way I have found to cut the size of the net - there is no visible impact on strength, and it even learns faster near the start. One could think this might have a disastrous effect, because sometimes white's perspective is flipped while black's isn't, but miraculously it is not an issue at all!

blue = standard, orange = this stuff above
Potentially brilliant. A question arises, though I suppose it's a non-issue: implementation-wise, this would make incremental updates tricky, but since king moves already reset the entire set of inputs ..., there's no problem there. I may try to work this into my trainer in some way. Cutting the network down is free speed, and perhaps better training due to a pseudo-factorization (?).
Sopel
Posts: 389
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: NNUE Question - King Placements

Post by Sopel »

AndrewGrant wrote: Thu Jul 01, 2021 11:41 pm
Potentially brilliant. A question arises, though I suppose it's a non-issue: implementation-wise, this would make incremental updates tricky, but since king moves already reset the entire set of inputs ..., there's no problem there. I may try to work this into my trainer in some way. Cutting the network down is free speed, and perhaps better training due to a pseudo-factorization (?).
The only way in which the perspective can become mirrored is when the king moves, and we refresh on each king move, so there is indeed no issue.
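A minimal sketch of that update path, with assumed names (Accumulator, refreshAccumulator) rather than the real Stockfish code:

Code: Select all

struct Position;                      // engine position, details omitted
struct Accumulator {
    bool mirrored;                    // is this perspective flipped a<->h?
    // ... per-perspective first-layer sums would live here
};

// Full rebuild of all active features for one perspective (stub).
void refreshAccumulator(Accumulator& acc, const Position& pos);

// A king move is the only event that can change the mirror state, and it
// already forces a full refresh, so applying the flip costs nothing extra.
void onKingMove(Accumulator& acc, const Position& pos, int newKingSq) {
    acc.mirrored = (newKingSq & 7) <= 3;   // king on files a..d -> mirror
    refreshAccumulator(acc, pos);
}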
One reason why it might show faster learning near the start is that each weight is hit twice as often, potentially increasing the effective learning rate. BTW, this is how I implemented it for the test:
player: https://github.com/Sopel97/Stockfish/tr ... riment_107
trainer: https://github.com/Sopel97/nnue-pytorch ... riment_107
Daniel Shawul wrote: Thu Jul 01, 2021 11:19 pm I have been using this since like forever :) Horizontal mirroring halves the number of king squares to 32, and then I go one step further via bucketing to bring the number of king squares down to 16. This results in 4x fewer weights to train than Stockfish's, with barely any strength loss ...
Looks like we arrived at the same thing :). It's always good to have some independent verification.
AndrewGrant
Posts: 1766
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: NNUE Question - King Placements

Post by AndrewGrant »

Sopel wrote: Thu Jul 01, 2021 11:57 pm
AndrewGrant wrote: Thu Jul 01, 2021 11:41 pm
Potentially brilliant. A question arises, though I suppose it's a non-issue: implementation-wise, this would make incremental updates tricky, but since king moves already reset the entire set of inputs ..., there's no problem there. I may try to work this into my trainer in some way. Cutting the network down is free speed, and perhaps better training due to a pseudo-factorization (?).
The only way in which the perspective can become mirrored is when the king moves, and we refresh on each king move, so there is indeed no issue.
One reason why it might show faster learning near the start is that each weight is hit twice as often, potentially increasing the effective learning rate. BTW, this is how I implemented it for the test:
player: https://github.com/Sopel97/Stockfish/tr ... riment_107
trainer: https://github.com/Sopel97/nnue-pytorch ... riment_107
Daniel Shawul wrote: Thu Jul 01, 2021 11:19 pm I have been using this since like forever :) Horizontal mirroring halves the number of king squares to 32, and then I go one step further via bucketing to bring the number of king squares down to 16. This results in 4x fewer weights to train than Stockfish's, with barely any strength loss ...
Looks like we arrived at the same thing :). It's always good to have some independent verification.
Do you see any speed gains in terms of NPS in the engine with smaller networks? Slightly more operations to compute the index, but fewer weights to juggle in memory/cache?
Sopel
Posts: 389
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: NNUE Question - King Placements

Post by Sopel »

AndrewGrant wrote: Fri Jul 02, 2021 12:31 am Do you see any speed gains in terms of NPS in the engine with smaller networks? Slightly more operations to compute the index, but fewer weights to juggle in memory/cache?
I haven't done any performance measurements, as it should stay about the same. The only possible speed improvements can come from 1. lower cache contention, 2. fewer refreshes in some positions (with the lower number of "king squares", as Daniel Shawul mentioned); but the effects are minuscule.
AndrewGrant
Posts: 1766
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: NNUE Question - King Placements

Post by AndrewGrant »

Sopel wrote: Fri Jul 02, 2021 12:41 am
AndrewGrant wrote: Fri Jul 02, 2021 12:31 am Do you see any speed gains in terms of NPS in the engine with smaller networks? Slightly more operations to compute the index, but fewer weights to juggle in memory/cache?
I haven't done any performance measurements, as it should stay about the same. The only possible speed improvements can come from 1. lower cache contention, 2. fewer refreshes in some positions (with the lower number of "king squares", as Daniel Shawul mentioned); but the effects are minuscule.
Fewer refreshes? I don't follow how that is possible. Movement from square A -> B always forces a refresh, I would imagine? Even if A ends up being the same as B after the mirroring, you still need to refresh, since all pieces on the board now need to be updated to reflect the mirroring change?
Sopel
Posts: 389
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: NNUE Question - King Placements

Post by Sopel »

AndrewGrant wrote: Fri Jul 02, 2021 1:09 am
Sopel wrote: Fri Jul 02, 2021 12:41 am
AndrewGrant wrote: Fri Jul 02, 2021 12:31 am Do you see any speed gains in terms of NPS in the engine with smaller networks? Slightly more operations to compute the index, but fewer weights to juggle in memory/cache?
I haven't done any performance measurements, as it should stay about the same. The only possible speed improvements can come from 1. lower cache contention, 2. fewer refreshes in some positions (with the lower number of "king squares", as Daniel Shawul mentioned); but the effects are minuscule.
Fewer refreshes? I don't follow how that is possible. Movement from square A -> B always forces a refresh, I would imagine? Even if A ends up being the same as B after the mirroring, you still need to refresh, since all pieces on the board now need to be updated to reflect the mirroring change?
I mean when Daniel Shawul's "fewer king squares" idea is applied, where some square clusters share a king bucket. For example, I used this without much loss in strength:

Code: Select all

KingBuckets = [ // order is A1 ... H1 ... H8
  // one bucket per file on ranks 1-2, 2x1 file pairs on ranks 3-4,
  // 2x2 clusters on ranks 5-8: resolution is finest where kings usually sit
  24, 25, 26, 27, 28, 29, 30, 31,
  16, 17, 18, 19, 20, 21, 22, 23,
  12, 12, 13, 13, 14, 14, 15, 15,
   8,  8,  9,  9, 10, 10, 11, 11,
   4,  4,  5,  5,  6,  6,  7,  7,
   4,  4,  5,  5,  6,  6,  7,  7,
   0,  0,  1,  1,  2,  2,  3,  3,
   0,  0,  1,  1,  2,  2,  3,  3
];
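For what it's worth, a sketch of the refresh test such a table enables; it assumes the KingBuckets array above and omits the mirror-state comparison that horizontal mirroring would add:

Code: Select all

extern const int KingBuckets[64];   // the table above

// When feature indices use KingBuckets[kingSq] instead of the raw king
// square, a king move inside one bucket leaves every index unchanged,
// so no accumulator refresh is needed for that move.
bool refreshNeeded(int oldKingSq, int newKingSq) {
    return KingBuckets[oldKingSq] != KingBuckets[newKingSq];
}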
chrisw
Posts: 4321
Joined: Tue Apr 03, 2012 4:28 pm

Re: NNUE Question - King Placements

Post by chrisw »

Daniel Shawul wrote: Thu Jul 01, 2021 11:19 pm
Sopel wrote: Thu Jul 01, 2021 10:55 pm
syzygy wrote: Sun Nov 01, 2020 3:00 pm
If you would force the king to be in a-d, the difference between flip and rotate disappears.
This sentence is a hidden gem. It sparked me to try forcing each perspective in HalfKP to put the king on the e..h files, mirroring the board for that perspective if the king is on the a..d files instead. Not only does this reduce the size of the network by a factor of 2, it is also, so far, the best way I have found to cut the size of the net - there is no visible impact on strength, and it even learns faster near the start. One could think this might have a disastrous effect, because sometimes white's perspective is flipped while black's isn't, but miraculously it is not an issue at all!

blue = standard, orange = this stuff above
I have been using this since like forever :) Horizontal mirroring halves the number of king squares to 32, and then I go one step further via bucketing to bring the number of king squares down to 16. This results in 4x fewer weights to train than Stockfish's, with barely any strength loss ...
Me too. As with many things, if you think about it you'll come up with the same ideas and similar solutions. Buckets are going to be a standard thought: most of those king squares will hardly ever be touched, so not only is one wasting a lot of 'space', one is also creating regions of the game space where the 'learning' will be partial and results from that space error-prone. The problem seems to be how to distribute 'sparsity' around the base weight space in a chess-effective manner. I'm not at all sure that borrowing the king-relative idea for sparsity from shogi was going to be the most effective encoding of king-attack-safety concepts, although maybe - it's relatively efficient (i.e. it rarely requires a full accumulator recompute), if it does in fact capture desirable king factors.
chrisw
Posts: 4321
Joined: Tue Apr 03, 2012 4:28 pm

Re: NNUE Question - King Placements

Post by chrisw »

Sopel wrote: Fri Jul 02, 2021 11:51 am
AndrewGrant wrote: Fri Jul 02, 2021 1:09 am
Sopel wrote: Fri Jul 02, 2021 12:41 am
AndrewGrant wrote: Fri Jul 02, 2021 12:31 am Do you see any speed gains in terms of NPS in the engine with smaller networks? Slightly more operations to compute the index, but fewer weights to juggle in memory/cache?
I haven't done any performance measurements, as it should stay about the same. The only possible speed improvements can come from 1. lower cache contention, 2. fewer refreshes in some positions (with the lower number of "king squares", as Daniel Shawul mentioned); but the effects are minuscule.
Fewer refreshes? I don't follow how that is possible. Movement from square A -> B always forces a refresh, I would imagine? Even if A ends up being the same as B after the mirroring, you still need to refresh, since all pieces on the board now need to be updated to reflect the mirroring change?
I mean when Daniel Shawul's "fewer king squares" idea is applied, where some square clusters share a king bucket. For example, I used this without much loss in strength:

Code: Select all

KingBuckets = [ // order is A1 ... H1 ... H8
  // one bucket per file on ranks 1-2, 2x1 file pairs on ranks 3-4,
  // 2x2 clusters on ranks 5-8: resolution is finest where kings usually sit
  24, 25, 26, 27, 28, 29, 30, 31,
  16, 17, 18, 19, 20, 21, 22, 23,
  12, 12, 13, 13, 14, 14, 15, 15,
   8,  8,  9,  9, 10, 10, 11, 11,
   4,  4,  5,  5,  6,  6,  7,  7,
   4,  4,  5,  5,  6,  6,  7,  7,
   0,  0,  1,  1,  2,  2,  3,  3,
   0,  0,  1,  1,  2,  2,  3,  3
];
I read that you also initialise all weights to zero. Coupled with the bog-standard factoriser, this is also going to help (a lot) for the weight space of, say, wK=a8, which I doubt learns very much. I.e. the factoriser doesn't just help generalisation; coupled with zero initialisation, it prevents a lot of stupidity.
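For readers unfamiliar with the factoriser: during training each real (king bucket, piece-square) feature has a shared king-independent "virtual" counterpart, and at export time the virtual weights are folded into every real slot. A rough sketch with illustrative sizes and names, not the actual nnue-pytorch code:

Code: Select all

#include <vector>

constexpr int KING_BUCKETS = 32;    // real, king-dependent buckets
constexpr int PIECE_SQ     = 640;   // 10 piece types x 64 squares
constexpr int HIDDEN       = 256;   // first-layer width

// Fold the virtual weights into the real ones, so a rarely visited king
// placement (say wK = a8) still inherits the generalised piece-square
// knowledge instead of keeping its zero initialisation.
void coalesceFactorisedWeights(std::vector<float>& realW,
                               const std::vector<float>& virtualW) {
    for (int kb = 0; kb < KING_BUCKETS; ++kb)
        for (int ps = 0; ps < PIECE_SQ; ++ps)
            for (int h = 0; h < HIDDEN; ++h)
                realW[(kb * PIECE_SQ + ps) * HIDDEN + h]
                    += virtualW[ps * HIDDEN + h];
}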