New Guest Engine: Maverick NNUE...

chrisw · Post by **chrisw** » Tue May 26, 2026 8:00 pm

Steve Maughan wrote: ↑Tue May 26, 2026 7:24 pm I'm shocked that Maverick-NNUE is stronger than S13-NNUE.

My first thought is that this implies Maverick has a better positional evaluation than Shredder 13, which I highly doubt. Could the reason be that Maverick was trained on more positions (Leela positions were added)? I guess it's possible. Or could it be that S13 is much more selective than Maverick, which in normal games results in much deeper searches, whereas in shallow 7-ply fixed-depth games the selectivity hurts Shredder's playing strength compared to Maverick's less selective search?

Thoughts?

— Steve

Your search should not be fixed depth, but fixed iteration (where an iteration is a sort of nominal depth, modified by extensions and pruning). Presumably that's what you mean?

Steve Maughan · Post by **Steve Maughan** » Tue May 26, 2026 8:38 pm

chrisw wrote: ↑Tue May 26, 2026 8:00 pm<SNIP>
Your search should not be fixed depth, but fixed iteration (where an iteration is a sort of nominal depth, modified by extensions and pruning). Presumably that's what you mean?

Yes — exactly!

Rebel · Post by **Rebel** » Tue May 26, 2026 10:30 pm

Steve Maughan wrote: ↑Tue May 26, 2026 7:24 pm I'm shocked that Maverick-NNUE is stronger than S13-NNUE.

My first thought is that this implies Maverick has a better positional evaluation than Shredder 13, which I highly doubt. Could the reason be that Maverick was trained on more positions (Leela positions were added)? I guess it's possible. Or could it be that S13 is much more selective than Maverick, which in normal games results in much deeper searches, whereas in shallow 7-ply fixed-depth games the selectivity hurts Shredder's playing strength compared to Maverick's less selective search?

Thoughts?

— Steve

S13 reached ~3400 elo by its own HCE evaluation without the better Leela data.

jorose · Post by **jorose** » Wed May 27, 2026 8:58 am

Rebel wrote: ↑Tue May 26, 2026 4:17 pm
jorose wrote: ↑Tue May 26, 2026 8:21 am
I think you will remember the energy and computer time you needed to create data for NNUE evaluation playing millions of self play games especially with limited hardware. For guest engines I demand an absolute minimum of one billion positions which practically means playing 14-15 million self play games, depending on your hardware weeks or months.

And 1B is peanuts, AI flourishes with more data, more data. I offer authors to increase the volume by playing another 14-15 million self play games and have 2B positions or (to ease the pain) add ready to use 1B Leela positions. After the 1B self play experience you may guess the choice.

While I don't have much experience training NNUEs, I do have experience training my own models. Indeed, the compute is a limiting factor. That being said, if the objective is to have an engine which behaves similar to the base engine on steroids, you could just train a smaller model. Of course the engine might be a bit weaker than if you had 2B positions and a larger net, but it should still be plenty strong. I think Winter training data was 500M or so positions, though I would have to double check the exact number.

Rebel wrote: ↑Tue May 26, 2026 4:17 pm And mixing incompatible data of 2 engines with colliding moves and scores has its own set of problems.

I would imagine it is advantageous to have engines that might make different moves in your dataset, I don't understand why that is an issue unless we are training policy nets nowadays? I could see there being an issue training on different evaluation outputs, as they might be in different scales. I am unsure how big an issue it is, iirc Stockfish has some Leela data in its training dataset.
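The scale concern can be made concrete with a small sketch. It assumes both engines report centipawn-like scores and that each can be mapped to an expected game score with a logistic curve. The function names and the per-engine scaling constants (400 and 250) are hypothetical placeholders, not values used by Stockfish, Leela, or any trainer mentioned in this thread.

```python
import math

def cp_to_expected_score(cp: float, scale: float) -> float:
    """Map a centipawn-like score to an expected score in (0, 1) via a logistic curve."""
    return 1.0 / (1.0 + math.exp(-cp / scale))

def normalise_labels(records, scale_by_engine):
    """Convert (fen, cp, engine) records to (fen, expected_score) on one common scale.

    scale_by_engine is an assumed, hand-tuned mapping; in practice each constant
    would have to be fitted against game results for that engine.
    """
    return [
        (fen, cp_to_expected_score(cp, scale_by_engine[engine]))
        for fen, cp, engine in records
    ]

# Hypothetical scales: engine_a reports "larger" centipawns than engine_b.
scales = {"engine_a": 400.0, "engine_b": 250.0}
records = [
    ("startpos", 400.0, "engine_a"),
    ("startpos", 250.0, "engine_b"),
]
labels = normalise_labels(records, scales)
```

With these assumed constants, +400 from one engine and +250 from the other map to the same target, so the network is not asked to fit two conflicting numbers for positions the engines judge equally.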

Rebel wrote: ↑Tue May 26, 2026 4:17 pm Regarding strength, a quick robin match between the 4 guest engines.

```
No. Name             Win Draw Loss Unf.  Score Games       %
------------------------------------------------------------
  1 SP3-NNUE        +222 =208  -39   *0  326.0   469   69.5%
  2 Maverick-NNUE   +198 =225  -47   *0  310.5   470   66.1%
  3 S13-NNUE        +128 =221 -121   *0  238.5   470   50.7%
  4 Strong-Malt-1.0  +14 =100 -355   *0   64.0   469   13.6%

Total Games:     939
White Wins:      337 (35.9%)
Black Wins:      225 (24.0%)
Draws:           377 (40.1%)
```
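For readers who want to turn the table into rating differences, here is a minimal sketch. It recomputes each score percentage from the win/draw/loss counts above and converts it with the standard logistic Elo formula. The function names are illustrative, and the resulting figures are performance relative to this four-engine pool, not pairwise ratings.

```python
import math

def score_fraction(wins: int, draws: int, losses: int) -> float:
    """Score as a fraction of games played: a win counts 1, a draw counts 0.5."""
    games = wins + draws + losses
    return (wins + 0.5 * draws) / games

def elo_diff(p: float) -> float:
    """Elo difference implied by an expected score p (0 < p < 1), logistic model."""
    return -400.0 * math.log10(1.0 / p - 1.0)

# Win/draw/loss counts copied from the round-robin table.
results = {
    "SP3-NNUE": (222, 208, 39),
    "Maverick-NNUE": (198, 225, 47),
    "S13-NNUE": (128, 221, 121),
    "Strong-Malt-1.0": (14, 100, 355),
}

for name, (w, d, l) in results.items():
    p = score_fraction(w, d, l)
    print(f"{name:16s} {100 * p:5.1f}%  {elo_diff(p):+7.1f} Elo vs. pool")
```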

Interesting stuff, thank you!

Rebel wrote: ↑Tue May 26, 2026 4:17 pm Regarding similarity : https://rebel7775.wixsite.com/rebel/gue ... similarity

Thank you! Could you run simex with the base engines (e.g. add non-NNUE Maverick to the simex you already ran)? I understand it is probably not designed for comparing engines with very different strength, but I think the results might be interesting.

Rebel · Post by **Rebel** » Wed May 27, 2026 4:01 pm

jorose wrote: ↑Wed May 27, 2026 8:58 am
Rebel wrote: ↑Tue May 26, 2026 4:17 pm Regarding similarity : https://rebel7775.wixsite.com/rebel/gue ... similarity
Thank you! Could you run simex with the base engines (e.g. add non-NNUE Maverick to the simex you already ran)? I understand it is probably not designed for comparing engines with very different strength, but I think the results might be interesting.

Why not, interesting indeed.

Note: Single-Malt does not support the UCI "depth" command, so it was run at 100 ms per position instead.

Rebel · Post by **Rebel** » Wed May 27, 2026 4:21 pm

jorose wrote: ↑Wed May 27, 2026 8:58 am
Rebel wrote: ↑Tue May 26, 2026 4:17 pm
jorose wrote: ↑Tue May 26, 2026 8:21 am
I think you will remember the energy and computer time you needed to create data for NNUE evaluation playing millions of self play games especially with limited hardware. For guest engines I demand an absolute minimum of one billion positions which practically means playing 14-15 million self play games, depending on your hardware weeks or months.

And 1B is peanuts, AI flourishes with more data, more data. I offer authors to increase the volume by playing another 14-15 million self play games and have 2B positions or (to ease the pain) add ready to use 1B Leela positions. After the 1B self play experience you may guess the choice.

While I don't have much experience training NNUEs, I do have experience training my own models. Indeed, the compute is a limiting factor. That being said, if the objective is to have an engine which behaves similar to the base engine on steroids, you could just train a smaller model. Of course the engine might be a bit weaker than if you had 2B positions and a larger net, but it should still be plenty strong. I think Winter training data was 500M or so positions, though I would have to double check the exact number.

I can assure you a 2B network would give Winter 50+ elo.

jorose wrote: ↑Wed May 27, 2026 8:58 am
Rebel wrote: ↑Tue May 26, 2026 4:17 pm And mixing incompatible data of 2 engines with colliding moves and scores has its own set of problems.

I would imagine it is advantageous to have engines that might make different moves in your dataset, I don't understand why that is an issue unless we are training policy nets nowadays? I could see there being an issue training on different evaluation outputs, as they might be in different scales. I am unsure how big an issue it is, iirc Stockfish has some Leela data in its training dataset.

Last time I checked Stockfish was using 95% Leela data and 5% Stockfish data.

Our trainer / learner software can't handle input from 2 engines the normal way. I had to write a merge tool that shuffles data on a one-to-one basis. And I still get a horrible loss graph. But it works.
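One possible reading of a one-to-one merge is sketched below: shuffle each engine's data separately, then interleave the two streams record by record so neither source dominates any stretch of the training file. This is an assumption about what such a tool could look like, not the merge tool described above; `merge_one_to_one` and the sample records are hypothetical.

```python
import random
from itertools import zip_longest

def merge_one_to_one(a, b, seed=0):
    """Shuffle two datasets independently, then alternate records a, b, a, b, ...

    When one dataset is longer, its leftover records are appended at the end.
    """
    rng = random.Random(seed)
    a, b = list(a), list(b)
    rng.shuffle(a)
    rng.shuffle(b)
    missing = object()  # marks the exhausted shorter stream
    merged = []
    for x, y in zip_longest(a, b, fillvalue=missing):
        if x is not missing:
            merged.append(x)
        if y is not missing:
            merged.append(y)
    return merged

# Toy records tagged with their source so the alternation is visible.
engine_a = [("A", i) for i in range(5)]
engine_b = [("B", i) for i in range(3)]
merged = merge_one_to_one(engine_a, engine_b)
```

The design trade-off is that a strict alternation keeps every batch balanced between the two sources, but it does nothing about differing score scales, which may be one reason the loss curve still looks rough.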

chrisw · Post by **chrisw** » Wed May 27, 2026 5:47 pm

Rebel wrote: ↑Wed May 27, 2026 4:21 pm
jorose wrote: ↑Wed May 27, 2026 8:58 am
Rebel wrote: ↑Tue May 26, 2026 4:17 pm
jorose wrote: ↑Tue May 26, 2026 8:21 am
I think you will remember the energy and computer time you needed to create data for NNUE evaluation playing millions of self play games especially with limited hardware. For guest engines I demand an absolute minimum of one billion positions which practically means playing 14-15 million self play games, depending on your hardware weeks or months.

And 1B is peanuts, AI flourishes with more data, more data. I offer authors to increase the volume by playing another 14-15 million self play games and have 2B positions or (to ease the pain) add ready to use 1B Leela positions. After the 1B self play experience you may guess the choice.

While I don't have much experience training NNUEs, I do have experience training my own models. Indeed, the compute is a limiting factor. That being said, if the objective is to have an engine which behaves similar to the base engine on steroids, you could just train a smaller model. Of course the engine might be a bit weaker than if you had 2B positions and a larger net, but it should still be plenty strong. I think Winter training data was 500M or so positions, though I would have to double check the exact number.

I can assure you a 2B network would give Winter 50+ elo.

jorose wrote: ↑Wed May 27, 2026 8:58 am
Rebel wrote: ↑Tue May 26, 2026 4:17 pm And mixing incompatible data of 2 engines with colliding moves and scores has its own set of problems.

I would imagine it is advantageous to have engines that might make different moves in your dataset, I don't understand why that is an issue unless we are training policy nets nowadays? I could see there being an issue training on different evaluation outputs, as they might be in different scales. I am unsure how big an issue it is, iirc Stockfish has some Leela data in its training dataset.

Last time I checked Stockfish was using 95% Leela data and 5% Stockfish data.

Our trainer / learner software can't handle input from 2 engines the normal way. I had to write a merge tool that shuffles data on a one-to-one basis. And I still get a horrible loss graph. But it works.

NB we do NOT use the Stockfish trainer; we feed in pre-shuffled EPDs which are processed sequentially. The SF trainer reads in its own format of PGNs (wot I call Linrock data) and processes them by jumping forward N positions each time, which is effectively a much better way of shuffling. Our method involves masses of pre-processing work building shuffled EPD files. If I do this again I'll use the SF method.
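The "jump forward N positions" idea can be illustrated with a minimal sketch. It only shows the visiting order that results from striding through sequentially stored positions over several passes; it is an interpretation of the description above, not the Stockfish trainer's actual code, and `strided_order` is a hypothetical name.

```python
def strided_order(num_positions: int, stride: int):
    """Yield every index exactly once, stepping forward by `stride` each time.

    Pass 0 visits 0, stride, 2*stride, ...; pass 1 visits 1, 1+stride, ...
    Neighbouring positions (often from the same game) end up far apart in
    the training order without building a pre-shuffled file on disk.
    """
    for offset in range(stride):
        for index in range(offset, num_positions, stride):
            yield index

order = list(strided_order(10, 3))
print(order)  # [0, 3, 6, 9, 1, 4, 7, 2, 5, 8]
```

Compared with pre-shuffling, this kind of scheme needs no extra pre-processing step; the cost is that decorrelation depends on the stride being large relative to the length of a game.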
