NNUE Question - King Placements

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: NNUE Question - King Placements

Post by Daniel Shawul »

Sopel wrote: Thu Jul 01, 2021 10:55 pm
syzygy wrote: Sun Nov 01, 2020 3:00 pm
If you would force the king to be in a-d, the difference between flip and rotate disappears.
This sentence is a hidden gem. It sparked me to try forcing each perspective in HalfKP to put the king on the e..h files, mirroring the board for that perspective if the king is on the a..d files instead. Not only does this reduce the size of the network by a factor of 2, it is also, so far, the best way I have found to cut the size of the net - there is no visible impact on strength, and it even learns faster near the start. One could think this might have a disastrous effect, because sometimes white's perspective is flipped while black's isn't, but miraculously it is not an issue at all!

blue = standard, orange = this stuff above
I have been using this since like forever :) Horizontal mirroring halves the number of king squares to 32, and then I go one step further via bucketing to bring the number of king squares down to 16. This results in 4x fewer weights to train than Stockfish's, with barely any strength loss ...
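To make the transform concrete, here is a minimal sketch of a HalfKP-style feature index with the forced e..h king files and 32 king squares; the names (featureIndex, kingBucket, NUM_PIECE_TYPES) and the exact index layout are illustrative, not the actual Scorpio or Stockfish code:

Code: Select all

// Horizontal mirror: a-file <-> h-file (squares are 0..63, file = sq & 7).
constexpr int mirrorSq(int sq) { return sq ^ 7; }

// A perspective is mirrored whenever its king stands on files a..d,
// so after the transform the king is always on files e..h.
constexpr bool needsMirror(int kingSq) { return (kingSq & 7) <= 3; }

// With the king confined to e..h only 32 king squares remain; further
// bucketing (as described above) could merge these down to 16.
constexpr int kingBucket(int kingSq) {
    return (kingSq >> 3) * 4 + ((kingSq & 7) - 4);   // 0..31
}

constexpr int NUM_PIECE_TYPES = 10;  // illustrative: 5 piece types x 2 colors

int featureIndex(int kingSq, int pieceSq, int pieceType) {
    if (needsMirror(kingSq)) {       // flip the whole perspective
        kingSq  = mirrorSq(kingSq);
        pieceSq = mirrorSq(pieceSq);
    }
    return (kingBucket(kingSq) * NUM_PIECE_TYPES + pieceType) * 64 + pieceSq;
}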
AndrewGrant
Posts: 1766
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: NNUE Question - King Placements

Post by AndrewGrant »

Sopel wrote: Thu Jul 01, 2021 10:55 pm
syzygy wrote: Sun Nov 01, 2020 3:00 pm
If you would force the king to be in a-d, the difference between flip and rotate disappears.
This sentence is a hidden gem. It sparked me to try forcing each perspective in HalfKP to put the king on the e..h files, mirroring the board for that perspective if the king is on the a..d files instead. Not only does this reduce the size of the network by a factor of 2, it is also, so far, the best way I have found to cut the size of the net - there is no visible impact on strength, and it even learns faster near the start. One could think this might have a disastrous effect, because sometimes white's perspective is flipped while black's isn't, but miraculously it is not an issue at all!

blue = standard, orange = this stuff above
Potentially brilliant. A question arises, though I suppose it's a non-issue: implementation-wise, this would make incremental updates tricky, but since king moves already reset the entire set of inputs ..., there's no problem there. I may try to work this into my trainer in some way. Cutting the network down is free speed, and perhaps better training due to a pseudo-factorization (?).
Sopel
Posts: 389
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: NNUE Question - King Placements

Post by Sopel »

AndrewGrant wrote: Thu Jul 01, 2021 11:41 pm
Potentially brilliant. A question arises, though I suppose it's a non-issue: implementation-wise, this would make incremental updates tricky, but since king moves already reset the entire set of inputs ..., there's no problem there. I may try to work this into my trainer in some way. Cutting the network down is free speed, and perhaps better training due to a pseudo-factorization (?).
The only way in which the perspective can become mirrored is when the king moves, and we refresh on each king move, so there is indeed no issue.
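A minimal sketch of that update path, with assumed names (Accumulator, refreshAccumulator) rather than the real Stockfish code:

Code: Select all

struct Position;                      // engine position, details omitted
struct Accumulator {
    bool mirrored;                    // is this perspective flipped a<->h?
    // ... per-perspective first-layer sums would live here
};

// Full rebuild of all active features for one perspective (stub).
void refreshAccumulator(Accumulator& acc, const Position& pos);

// A king move is the only event that can change the mirror state, and it
// already forces a full refresh, so applying the flip costs nothing extra.
void onKingMove(Accumulator& acc, const Position& pos, int newKingSq) {
    acc.mirrored = (newKingSq & 7) <= 3;   // king on files a..d -> mirror
    refreshAccumulator(acc, pos);
}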
One reason why it might show faster learning near the start is that each weight is hit twice as often, potentially increasing the effective learning rate. BTW, this is how I implemented it for the test:
player: https://github.com/Sopel97/Stockfish/tr ... riment_107
trainer: https://github.com/Sopel97/nnue-pytorch ... riment_107
Daniel Shawul wrote: Thu Jul 01, 2021 11:19 pm I have been using this since like forever :) Horizontal mirroring halves the number of king squares to 32, and then I go one step further via bucketing to bring the number of king squares down to 16. This results in 4x fewer weights to train than Stockfish's, with barely any strength loss ...
Looks like we arrived at the same thing :). It's always good to have some independent verification.
AndrewGrant
Posts: 1766
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: NNUE Question - King Placements

Post by AndrewGrant »

Sopel wrote: Thu Jul 01, 2021 11:57 pm
AndrewGrant wrote: Thu Jul 01, 2021 11:41 pm
Potentially brilliant. A question arises, though I suppose it's a non-issue: implementation-wise, this would make incremental updates tricky, but since king moves already reset the entire set of inputs ..., there's no problem there. I may try to work this into my trainer in some way. Cutting the network down is free speed, and perhaps better training due to a pseudo-factorization (?).
The only way in which the perspective can become mirrored is when the king moves, and we refresh on each king move, so there is indeed no issue.
One reason why it might show faster learning near the start is that each weight is hit twice as often, potentially increasing the effective learning rate. BTW, this is how I implemented it for the test:
player: https://github.com/Sopel97/Stockfish/tr ... riment_107
trainer: https://github.com/Sopel97/nnue-pytorch ... riment_107
Daniel Shawul wrote: Thu Jul 01, 2021 11:19 pm I have been using this since like forever :) Horizontal mirroring halves the number of king squares to 32, and then I go one step further via bucketing to bring the number of king squares down to 16. This results in 4x fewer weights to train than Stockfish's, with barely any strength loss ...
Looks like we arrived at the same thing :). It's always good to have some independent verification.
Do you see any speed gains in terms of NPS in the engine with smaller networks? Slightly more operations to compute the index, but fewer weights to juggle in memory/cache?
Sopel
Posts: 389
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: NNUE Question - King Placements

Post by Sopel »

AndrewGrant wrote: Fri Jul 02, 2021 12:31 am Do you see any speed gains in terms of NPS in the engine with smaller networks? Slightly more operations to compute the index, but fewer weights to juggle in memory/cache?
I haven't done any performance measurements, as it should stay about the same. The only possible speed improvements can come from 1. lower cache contention, 2. fewer refreshes in some positions (with the lower number of "king squares", as Daniel Shawul mentioned); but the effects are minuscule.
AndrewGrant
Posts: 1766
Joined: Tue Apr 19, 2016 6:08 am
Location: U.S.A
Full name: Andrew Grant

Re: NNUE Question - King Placements

Post by AndrewGrant »

Sopel wrote: Fri Jul 02, 2021 12:41 am
AndrewGrant wrote: Fri Jul 02, 2021 12:31 am Do you see any speed gains in terms of NPS in the engine with smaller networks? Slightly more operations to compute the index, but fewer weights to juggle in memory/cache?
I haven't done any performance measurements, as it should stay about the same. The only possible speed improvements can come from 1. lower cache contention, 2. fewer refreshes in some positions (with the lower number of "king squares", as Daniel Shawul mentioned); but the effects are minuscule.
Fewer refreshes? I don't follow how that is possible. Movement from square A -> B always forces a refresh, I would imagine? Even if A ends up being the same as B after the mirroring, you still need to refresh, since all pieces on the board now need to be updated to reflect the mirroring change?
Sopel
Posts: 389
Joined: Tue Oct 08, 2019 11:39 pm
Full name: Tomasz Sobczyk

Re: NNUE Question - King Placements

Post by Sopel »

AndrewGrant wrote: Fri Jul 02, 2021 1:09 am
Sopel wrote: Fri Jul 02, 2021 12:41 am
AndrewGrant wrote: Fri Jul 02, 2021 12:31 am Do you see any speed gains in terms of NPS in the engine with smaller networks? Slightly more operations to compute the index, but fewer weights to juggle in memory/cache?
I haven't done any performance measurements, as it should stay about the same. The only possible speed improvements can come from 1. lower cache contention, 2. fewer refreshes in some positions (with the lower number of "king squares", as Daniel Shawul mentioned); but the effects are minuscule.
Fewer refreshes? I don't follow how that is possible. Movement from square A -> B always forces a refresh, I would imagine? Even if A ends up being the same as B after the mirroring, you still need to refresh, since all pieces on the board now need to be updated to reflect the mirroring change?
I mean when Daniel Shawul's "fewer king squares" idea is applied, where some square clusters share a king bucket. For example, I used this without much loss in strength:

Code: Select all

KingBuckets = [ // order is A1 ... H1 ... H8
  // one bucket per file on ranks 1-2, 2x1 file pairs on ranks 3-4,
  // 2x2 clusters on ranks 5-8: resolution is finest where kings usually sit
  24, 25, 26, 27, 28, 29, 30, 31,
  16, 17, 18, 19, 20, 21, 22, 23,
  12, 12, 13, 13, 14, 14, 15, 15,
   8,  8,  9,  9, 10, 10, 11, 11,
   4,  4,  5,  5,  6,  6,  7,  7,
   4,  4,  5,  5,  6,  6,  7,  7,
   0,  0,  1,  1,  2,  2,  3,  3,
   0,  0,  1,  1,  2,  2,  3,  3
];
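For what it's worth, a sketch of the refresh test such a table enables; it assumes the KingBuckets array above and omits the mirror-state comparison that horizontal mirroring would add:

Code: Select all

extern const int KingBuckets[64];   // the table above

// When feature indices use KingBuckets[kingSq] instead of the raw king
// square, a king move inside one bucket leaves every index unchanged,
// so no accumulator refresh is needed for that move.
bool refreshNeeded(int oldKingSq, int newKingSq) {
    return KingBuckets[oldKingSq] != KingBuckets[newKingSq];
}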
chrisw
Posts: 4321
Joined: Tue Apr 03, 2012 4:28 pm

Re: NNUE Question - King Placements

Post by chrisw »

Daniel Shawul wrote: Thu Jul 01, 2021 11:19 pm
Sopel wrote: Thu Jul 01, 2021 10:55 pm
syzygy wrote: Sun Nov 01, 2020 3:00 pm
If you would force the king to be in a-d, the difference between flip and rotate disappears.
This sentence is a hidden gem. It sparked me to try forcing each perspective in HalfKP to put the king on the e..h files, mirroring the board for that perspective if the king is on the a..d files instead. Not only does this reduce the size of the network by a factor of 2, it is also, so far, the best way I have found to cut the size of the net - there is no visible impact on strength, and it even learns faster near the start. One could think this might have a disastrous effect, because sometimes white's perspective is flipped while black's isn't, but miraculously it is not an issue at all!

blue = standard, orange = this stuff above
I have been using this since like forever :) Horizontal mirroring halves the number of king squares to 32, and then I go one step further via bucketing to bring the number of king squares down to 16. This results in 4x fewer weights to train than Stockfish's, with barely any strength loss ...
Me too. As with many things, if you think about it you'll come up with the same ideas and similar solutions. Buckets are going to be a standard thought: most of those king squares will hardly ever be touched, so not only is one wasting a lot of 'space', one is also creating regions of the game space where the 'learning' will be partial and results from that space error-prone. The problem seems to be how to distribute 'sparsity' around the base weight space in a chess-effective manner. I'm not at all sure that borrowing the king-relative idea for sparsity from shogi was going to be the most effective encoding of king-attack-safety concepts, although maybe - it's relatively efficient (i.e. it rarely requires a full accumulator recompute), if it does in fact capture desirable king factors.
chrisw
Posts: 4321
Joined: Tue Apr 03, 2012 4:28 pm

Re: NNUE Question - King Placements

Post by chrisw »

Sopel wrote: Fri Jul 02, 2021 11:51 am
AndrewGrant wrote: Fri Jul 02, 2021 1:09 am
Sopel wrote: Fri Jul 02, 2021 12:41 am
AndrewGrant wrote: Fri Jul 02, 2021 12:31 am Do you see any speed gains in terms of NPS in the engine with smaller networks? Slightly more operations to compute the index, but fewer weights to juggle in memory/cache?
I haven't done any performance measurements, as it should stay about the same. The only possible speed improvements can come from 1. lower cache contention, 2. fewer refreshes in some positions (with the lower number of "king squares", as Daniel Shawul mentioned); but the effects are minuscule.
Fewer refreshes? I don't follow how that is possible. Movement from square A -> B always forces a refresh, I would imagine? Even if A ends up being the same as B after the mirroring, you still need to refresh, since all pieces on the board now need to be updated to reflect the mirroring change?
I mean when Daniel Shawul's "fewer king squares" idea is applied, where some square clusters share a king bucket. For example, I used this without much loss in strength:

Code: Select all

KingBuckets = [ // order is A1 ... H1 ... H8
  // one bucket per file on ranks 1-2, 2x1 file pairs on ranks 3-4,
  // 2x2 clusters on ranks 5-8: resolution is finest where kings usually sit
  24, 25, 26, 27, 28, 29, 30, 31,
  16, 17, 18, 19, 20, 21, 22, 23,
  12, 12, 13, 13, 14, 14, 15, 15,
   8,  8,  9,  9, 10, 10, 11, 11,
   4,  4,  5,  5,  6,  6,  7,  7,
   4,  4,  5,  5,  6,  6,  7,  7,
   0,  0,  1,  1,  2,  2,  3,  3,
   0,  0,  1,  1,  2,  2,  3,  3
];
I read that you also initialise all weights to zero. Coupled with the bog-standard factoriser, this is also going to help (a lot) for the weight space of, say, wK=a8, which I doubt learns very much. I.e. the factoriser doesn't just help generalisation; coupled with zero initialisation, it prevents a lot of stupidity.
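For readers unfamiliar with the factoriser: during training each real (king bucket, piece-square) feature has a shared king-independent "virtual" counterpart, and at export time the virtual weights are folded into every real slot. A rough sketch with illustrative sizes and names, not the actual nnue-pytorch code:

Code: Select all

#include <vector>

constexpr int KING_BUCKETS = 32;    // real, king-dependent buckets
constexpr int PIECE_SQ     = 640;   // 10 piece types x 64 squares
constexpr int HIDDEN       = 256;   // first-layer width

// Fold the virtual weights into the real ones, so a rarely visited king
// placement (say wK = a8) still inherits the generalised piece-square
// knowledge instead of keeping its zero initialisation.
void coalesceFactorisedWeights(std::vector<float>& realW,
                               const std::vector<float>& virtualW) {
    for (int kb = 0; kb < KING_BUCKETS; ++kb)
        for (int ps = 0; ps < PIECE_SQ; ++ps)
            for (int h = 0; h < HIDDEN; ++h)
                realW[(kb * PIECE_SQ + ps) * HIDDEN + h]
                    += virtualW[ps * HIDDEN + h];
}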