OliverBr wrote: ↑Thu Nov 12, 2020 5:59 am
I wonder what the "true limit" would be if everybody were only using their own code, based on their own ideas.
Is there anybody who wrote their own NNUE code and uses their own nets?
The three engines so far with their own NNUE architecture code are Seer, Halogen (which its author, Kieren Pearson, has tested at around 3050 Elo), and Dragon by Komodo Chess. Everybody else so far has been grafting a copy of the NNUE code from Hisayori Noda's Stockfish fork onto their engine, including the official Stockfish team themselves.
Edit: I forgot Minic, which since Minic 3 also uses its own NNUE code.
FYI, we don't know for sure what the Komodo team is doing, but in all likelihood they're using the exact same training code and network architecture from SF (if someone from the Komodo team can correct me here, please do). This means that, to avoid the GPL, they likely rewrote just the inference code, much like many top FOSS engines have done (see Vajolet and RubiChess for examples).
Currently Minic is actually using Seer's NN implementation, but Vivien is taking it in his own direction from there. Regardless, I'm about to make some pretty serious changes to my network architecture anyway (moving away from HalfKP features altogether to what I'm dubbing adjacent-piece-piece features, which I believe to be superior for chess).
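For readers unfamiliar with the HalfKP features mentioned here: roughly, each input feature is a triple of (own king square, piece plane, piece square), with the ten non-king piece planes covering five piece types for each color. The sketch below is purely illustrative and deliberately simplified; real engines add board orientation/mirroring and an extra plane, so the constants and the function name here are assumptions, not Stockfish's actual code.

```python
# Illustrative sketch of HalfKP-style feature indexing (simplified:
# ignores the orientation/mirroring and extra plane real engines use).

NUM_SQUARES = 64
# 10 planes: {pawn, knight, bishop, rook, queen} x {friendly, enemy};
# kings are excluded because the king square is part of the index itself.
NUM_PIECE_PLANES = 10

def halfkp_index(king_sq: int, piece_plane: int, piece_sq: int) -> int:
    """Map (own king square, piece plane, piece square) to a feature
    index in [0, 64 * 10 * 64). Active features feed the sparse
    first layer of the network."""
    assert 0 <= king_sq < NUM_SQUARES
    assert 0 <= piece_plane < NUM_PIECE_PLANES
    assert 0 <= piece_sq < NUM_SQUARES
    return (king_sq * NUM_PIECE_PLANES + piece_plane) * NUM_SQUARES + piece_sq

# A position activates one feature per non-king piece; a normal move
# changes only a couple of features, which is what makes the first
# layer cheap to keep updated during search.
```

In this simplified scheme there are 64 × 10 × 64 = 40,960 features per perspective; the actual HalfKP set is slightly larger because of the extra plane omitted here.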
Congrats to Kieren! It looks like I have a lot of catch-up work to do if I want to surpass Halogen again.
connor_mcmonigle wrote: ↑Thu Nov 12, 2020 9:22 am
...
Huh, the Komodo team said
lkaufman wrote: ↑Mon Nov 02, 2020 6:08 pm
We are also announcing our new "Dragon" version of Komodo, which is now playing in the chess.com CCC tournament as "Mystery" and which we expect to release soon. It uses the new NNUE technology that was developed for the game of shogi, but not the NNUE code. The search is Komodo search (with some parameters tuned), and the nets we use are all trained on Komodo games and Komodo evals. The net is embedded so the user need not do anything special to use it (though it can be turned off).
which I interpreted to be referring to the search used in training Dragon (Stockfish's trainer uses Stockfish search), but it could just refer to the fact that Dragon uses the same search as Komodo.
And btw, Vajolet hasn't been updated in a year, so perhaps you are thinking of a different engine?
connor_mcmonigle wrote: ↑Thu Nov 12, 2020 9:22 am
...
And btw, Vajolet hasn't been updated in an year so perhaps you are thinking of a different engine?
Madeleine Birchfield wrote: ↑Thu Nov 12, 2020 10:04 am
...which I interpreted to be referring to the search used in training Dragon (Stockfish's trainer uses Stockfish search), but it could just refer to the fact that Dragon uses the same search as Komodo.
Yes. They claim to be, and very likely are, using Komodo's games as training data, but this doesn't mean they implemented new training code and made improvements/changes to the network architecture. That is exceedingly improbable imho.
Likely, what they did for training is the same as what DKappe has been doing for a while: converting data obtained from self-play games of a different engine into the packed fen format used by the SF trainer. It seems rather likely they didn't even bother swapping out the SF qsearch code used by the trainer.
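The conversion being described here can be pictured as follows. This is a hedged sketch of the *idea* only: the real SF trainer consumes a compact binary "packed sfen" format, not this hypothetical plain-text layout, and the record fields and names below are illustrative assumptions.

```python
# Hypothetical sketch: turning self-play game data into simple training
# records pairing each position with an eval and the final game outcome.
# The real packed-sfen format is binary and more compact; this is only
# meant to show what information the conversion carries across.

from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Position:
    fen: str        # position before the move was played
    score_cp: int   # engine eval in centipawns, from the side to move

def to_training_records(positions: Iterable[Position],
                        game_result: int) -> List[str]:
    """game_result: +1 white win, 0 draw, -1 black win.
    The trainer's target is typically an interpolation between the
    search score and the final game outcome."""
    return [f"{p.fen};{p.score_cp};{game_result}" for p in positions]

records = to_training_records(
    [Position("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", 25)],
    game_result=0,
)
```

The point is that the heavy lifting (the trainer itself) is unchanged; only the data fed into it comes from a different engine's games.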
To then actually run the networks produced by this process in their engine, they presumably got someone to rewrite just the inference code so they could circumvent the GPL restrictions. If this is the case, I would personally like to see the computer shogi developers, who invested a lot of effort into writing the incredibly optimized and clever training code, added to the Dragon authors list. They are responsible for the large majority of the work behind the increase in strength.
Both Halogen and Seer are comparatively all original. Both just happen to rely on the "efficiently updatable" idea. They probably shouldn't be lumped into the same category as Komodo+NNUE.
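The "efficiently updatable" idea referenced above can be sketched in a few lines: rather than recomputing the first layer from scratch after every move, keep an accumulator of first-layer sums and add/subtract only the weight columns of features that changed. The toy sizes and names below are illustrative, not any engine's actual layout.

```python
# Toy sketch of the NNUE accumulator trick. Real nets use much larger
# hidden sizes (e.g. 256 per perspective) and quantized integer math.

import numpy as np

HIDDEN = 8          # toy hidden size
NUM_FEATURES = 64   # toy feature count

rng = np.random.default_rng(0)
W = rng.standard_normal((NUM_FEATURES, HIDDEN))  # first-layer weights

def full_refresh(active_features):
    """Recompute the first-layer sums from scratch (the slow path)."""
    acc = np.zeros(HIDDEN)
    for f in active_features:
        acc += W[f]
    return acc

def incremental_update(acc, removed, added):
    """Apply a move by touching only the features that changed."""
    for f in removed:
        acc = acc - W[f]
    for f in added:
        acc = acc + W[f]
    return acc

# A "move" that deactivates feature 3 and activates feature 17 yields
# the same accumulator either way:
before = full_refresh([3, 5, 9])
after_inc = incremental_update(before.copy(), removed=[3], added=[17])
after_full = full_refresh([5, 9, 17])
assert np.allclose(after_inc, after_full)
```

This is the one idea Halogen, Seer, and every NNUE derivative share; everything else (architecture, training, inference code) can differ.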
connor_mcmonigle wrote: ↑Thu Nov 12, 2020 5:09 pm
Both Halogen and Seer are comparatively all original. Both just happen to rely on the "efficiently updatable" idea. They probably shouldn't be lumped into the same category as Komodo+NNUE.
+1, Agreed.
But maybe we might be able to add Ethereal to the list soon.
Madeleine Birchfield wrote: ↑Thu Nov 12, 2020 10:04 am
...which I interpreted to be referring to the search used in training Dragon (Stockfish's trainer uses Stockfish search), but it could just refer to the fact that Dragon uses the same search as Komodo.
...
(Also see Vajolet's NNUE branch)
Your wild speculations are amusing, stating as fact or high likelihood things you wish to be true. It’s good that things in developer land are generally more friendly. I’ve been encouraging the SF devs to port their trainer to pytorch for a while and been giving them small suggestions in a few areas now that they are on the way. I was afraid they were going to run into a development roadblock without this port, but they are making good progress. I am happy about this.
Just as a note, I’ve been training distilled, endgame and specialist nets in tensorflow and pytorch (and have started to use julia/flux) for several years. These aren’t new concepts to me. It’s my hobby. Don’t assume because you are helpless and out of your depth with regard to training neural nets that others are too.
P.S. On a more useful note, I’ve started using Tord Romstad’s excellent Chess.jl library (https://github.com/romstad/Chess.jl), though it has one major castling bug that I’m working to fix. Pretty speedy for stuff like qsearch.
Fat Titz by Stockfish, the engine with the bodaciously big net. Remember: size matters. If you want to learn more about this engine just google for "Fat Titz".
Madeleine Birchfield wrote: ↑Thu Nov 12, 2020 10:04 am
...which I interpreted to be referring to the search used in training Dragon (Stockfish's trainer uses Stockfish search), but it could just refer to the fact that Dragon uses the same search as Komodo.
...
AndrewGrant wrote: ↑Sat Nov 21, 2020 7:49 am
Ill note that you failed to deny the claims.
You mean the baseless speculations? Note what you like Andrew, but your rage posts are somewhat tiring.
Madeleine Birchfield wrote: ↑Thu Nov 12, 2020 10:04 am
...which I interpreted to be referring to the search used in training Dragon (Stockfish's trainer uses Stockfish search), but it could just refer to the fact that Dragon uses the same search as Komodo.
...
Your wild speculations are amusing, stating as fact or high likelihood things you wish to be true. It’s good that things in developer land are generally more friendly. I’ve been encouraging the SF devs to port their trainer to pytorch for a while and been giving them small pointers in a few areas now that they are on the way. I was afraid they were going to run into a development roadblock without this, but they are making good progress. I am happy about this.
Just as a note, I’ve been training distilled, endgame and specialist nets in tensorflow and pytorch (and have started to use julia/flux) for several years. These aren’t new concepts to me. It’s my hobby. Don’t assume because you are helpless and out of your depth with regard to training neural nets that others are too.
P.S. On a more useful note, I’ve started using Tord Romstad’s excellent Chess.jl library (https://github.com/romstad/Chess.jl), though it has one major castling bug that I’m working to fix. Pretty speedy for stuff like qsearch.
Whoa. I believe you totally misinterpret my words (perhaps a little too much speculation on my part, in all fairness). I wish precisely the opposite! I'm very hopeful that the networks used in Komodo NNUE were trained using original training code and a unique network architecture. In fact, a separate forum post describing the training process/unique features of the implementation, as well as lessons learned from implementing the training code and inference code from the ground up (is this what you're claiming?), would be much appreciated...
That said, what's with the personal attacks? Perhaps you interpreted what I had written as an attack? That was certainly not my intention. I'm not "helpless and out of my depth" when it comes to training neural networks. In fact, the PyTorch NNUE training code you refer to largely originated from the PyTorch training code I wrote for training networks for my personal project (you'll find it referenced in Gary's repository's readme).