New Guest Engine: Maverick NNUE...
Moderator: Ras
-
chrisw
- Posts: 4963
- Joined: Tue Apr 03, 2012 4:28 pm
- Location: Anywhere but the Western Empire
- Full name: Christopher Whittington
Re: New Guest Engine: Maverick NNUE...
Steve Maughan wrote: ↑Tue May 26, 2026 7:24 pm
I'm shocked that Maverick-NNUE is stronger than S13-NNUE.
My first thought is that this implies Maverick has a better positional evaluation than Shredder 13, which I highly doubt. Could the reason be that Maverick was trained on more positions (Leela positions were added)? I guess it's possible. Or could it be that S13 is much more selective than Maverick, which in normal games results in much deeper searches? In shallow 7-ply fixed-depth games, however, that selectivity would hurt Shredder's playing strength compared to Maverick's less selective search.
Thoughts?
- Steve

Your search should not be fixed depth, but fixed iteration (where an iteration is a sort of nominal depth, but with extensions and pruning applied). Presumably that's what you mean?
-
Steve Maughan
- Posts: 1331
- Joined: Wed Mar 08, 2006 8:28 pm
- Location: Florida, USA
Re: New Guest Engine: Maverick NNUE...
Yes — exactly!
http://www.chessprogramming.net - Juggernaut & Maverick Chess Engine
-
Rebel
- Posts: 7559
- Joined: Thu Aug 18, 2011 12:04 pm
- Full name: Ed Schröder
Re: New Guest Engine: Maverick NNUE...
Steve Maughan wrote: ↑Tue May 26, 2026 7:24 pm
I'm shocked that Maverick-NNUE is stronger than S13-NNUE.
My first thought is that this implies Maverick has a better positional evaluation than Shredder 13, which I highly doubt. Could the reason be that Maverick was trained on more positions (Leela positions were added)? I guess it's possible. Or could it be that S13 is much more selective than Maverick, which in normal games results in much deeper searches? In shallow 7-ply fixed-depth games, however, that selectivity would hurt Shredder's playing strength compared to Maverick's less selective search.
Thoughts?
- Steve

S13 reached ~3400 Elo with its own HCE evaluation, without the better Leela data.
90% of coding is debugging, the other 10% is writing bugs.
-
jorose
- Posts: 393
- Joined: Thu Jan 22, 2015 3:21 pm
- Location: Zurich, Switzerland
- Full name: Jonathan Rosenthal
Re: New Guest Engine: Maverick NNUE...
Rebel wrote: ↑Tue May 26, 2026 4:17 pm
I think you will remember the energy and computer time you needed to create data for NNUE evaluation by playing millions of self-play games, especially with limited hardware. For guest engines I demand an absolute minimum of one billion positions, which in practice means playing 14-15 million self-play games; depending on your hardware, that takes weeks or months.
And 1B is peanuts; AI flourishes with more data, more data. I offer authors the choice of increasing the volume by playing another 14-15 million self-play games to reach 2B positions or (to ease the pain) adding 1B ready-to-use Leela positions. After the 1B self-play experience you may guess the choice.

While I don't have much experience training NNUEs, I do have experience training my own models. Indeed, compute is a limiting factor. That being said, if the objective is to have an engine which behaves like the base engine on steroids, you could just train a smaller model. Of course the engine might be a bit weaker than if you had 2B positions and a larger net, but it should still be plenty strong. I think Winter's training data was 500M or so positions, though I would have to double check the exact number.
I would imagine it is advantageous to have engines that might make different moves in your dataset. I don't understand why that is an issue, unless we are training policy nets nowadays? I could see there being an issue with training on different evaluation outputs, as they might be on different scales. I am unsure how big an issue it is; iirc Stockfish has some Leela data in its training dataset.
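On the point about evaluation outputs being on different scales: one common way to line them up is to map each engine's score to an expected game result with a logistic curve before using it as a training target. This is only a minimal sketch. The engine names and scale constants below are hypothetical, picked for illustration, and are not values used by any engine in this thread.

```python
import math

# Hypothetical per-source scales (in centipawns). An engine whose scores run
# "hotter" gets a larger scale, so equal winning chances map to equal targets.
SCALES = {"engine_a": 400.0, "engine_b": 250.0}


def cp_to_expected_score(cp: float, scale: float) -> float:
    """Map a centipawn-like score to an expected result in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-cp / scale))


def training_target(source: str, cp: float) -> float:
    """Convert a raw score from a named source into a shared-scale target."""
    return cp_to_expected_score(cp, SCALES[source])
```

With these made-up scales, +400 from engine_a and +250 from engine_b produce the same target, which is the whole idea: the net trains on comparable numbers regardless of which engine labelled the position.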
Rebel wrote: ↑Tue May 26, 2026 4:17 pm
Regarding strength, a quick robin match between the 4 guest engines.

No. Name             Win   Draw  Loss  Unf.  Score  Games  %
------------------------------------------------------------
1   SP3-NNUE         +222  =208  -39   *0    326.0  469    69.5%
2   Maverick-NNUE    +198  =225  -47   *0    310.5  470    66.1%
3   S13-NNUE         +128  =221  -121  *0    238.5  470    50.7%
4   Strong-Malt-1.0  +14   =100  -355  *0    64.0   469    13.6%

Total Games: 939
White Wins: 337 (35.9%)
Black Wins: 225 (24.0%)
Draws: 377 (40.1%)

Interesting stuff, thank you!

Rebel wrote: ↑Tue May 26, 2026 4:17 pm
Regarding similarity : https://rebel7775.wixsite.com/rebel/gue ... similarity

Thank you! Could you run simex with the base engines (e.g. add non-NNUE Maverick to the simex you already ran)? I understand it is probably not designed for comparing engines with very different strength, but I think the results might be interesting.
-Jonathan
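For anyone wanting to check the round-robin table quoted above, the Score and % columns follow from the win/draw/loss counts: a win counts 1, a draw counts 0.5, and the percentage is score divided by games. A small sketch that recomputes them (the function names are just illustrative; the numbers are copied from the table):

```python
# (name, wins, draws, losses) copied from the quoted round-robin table
RESULTS = [
    ("SP3-NNUE", 222, 208, 39),
    ("Maverick-NNUE", 198, 225, 47),
    ("S13-NNUE", 128, 221, 121),
    ("Strong-Malt-1.0", 14, 100, 355),
]


def score_line(wins, draws, losses):
    """Return (score, games, percentage) for one engine."""
    games = wins + draws + losses
    score = wins + 0.5 * draws
    return score, games, 100.0 * score / games


def summarize(results):
    """Map each engine name to its (score, games, percentage)."""
    return {name: score_line(w, d, l) for name, w, d, l in results}


def total_games(results):
    """Each game appears in two engines' rows, so halve the row total."""
    return sum(w + d + l for _, w, d, l in results) // 2
```

The recomputed values agree with the table, including the total of 939 games.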
-
Rebel
Re: New Guest Engine: Maverick NNUE...
jorose wrote: ↑Wed May 27, 2026 8:58 am
Rebel wrote: ↑Tue May 26, 2026 4:17 pm
Regarding similarity : https://rebel7775.wixsite.com/rebel/gue ... similarity

Thank you! Could you run simex with the base engines (e.g. add non-NNUE Maverick to the simex you already ran)? I understand it is probably not designed for comparing engines with very different strength, but I think the results might be interesting.

Why not, interesting indeed.

Note: Single-Malt does not support the "depth" option of the UCI "go" command, so it used 100 ms per position instead.
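In UCI terms the difference looks roughly like this. "go depth" and "go movetime" are standard UCI search commands; whether the similarity test actually sent "go movetime", and which depth it used for the other engines, are assumptions here (the default depth below is a placeholder).

```python
def go_command(supports_depth: bool, depth: int = 7, movetime_ms: int = 100) -> str:
    """Build the UCI search command for one test position.

    depth=7 is a placeholder default; movetime_ms=100 mirrors the
    100 ms per position mentioned in the note above.
    """
    if supports_depth:
        return f"go depth {depth}"
    return f"go movetime {movetime_ms}"
```

An engine on a time budget reaches whatever depth it can in 100 ms, so its results are not strictly comparable with engines searched to a fixed depth.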
-
Rebel
Re: New Guest Engine: Maverick NNUE...
jorose wrote: ↑Wed May 27, 2026 8:58 am
Rebel wrote: ↑Tue May 26, 2026 4:17 pm
I think you will remember the energy and computer time you needed to create data for NNUE evaluation by playing millions of self-play games, especially with limited hardware. For guest engines I demand an absolute minimum of one billion positions, which in practice means playing 14-15 million self-play games; depending on your hardware, that takes weeks or months.
And 1B is peanuts; AI flourishes with more data, more data. I offer authors the choice of increasing the volume by playing another 14-15 million self-play games to reach 2B positions or (to ease the pain) adding 1B ready-to-use Leela positions. After the 1B self-play experience you may guess the choice.

While I don't have much experience training NNUEs, I do have experience training my own models. Indeed, compute is a limiting factor. That being said, if the objective is to have an engine which behaves like the base engine on steroids, you could just train a smaller model. Of course the engine might be a bit weaker than if you had 2B positions and a larger net, but it should still be plenty strong. I think Winter's training data was 500M or so positions, though I would have to double check the exact number.

I can assure you a network trained on 2B positions would give Winter 50+ Elo.

jorose wrote: ↑Wed May 27, 2026 8:58 am
I would imagine it is advantageous to have engines that might make different moves in your dataset. I don't understand why that is an issue, unless we are training policy nets nowadays? I could see there being an issue with training on different evaluation outputs, as they might be on different scales. I am unsure how big an issue it is; iirc Stockfish has some Leela data in its training dataset.

Last time I checked, Stockfish was using 95% Leela data and 5% Stockfish data.
Our trainer / learner software can't handle input from 2 engines the normal way. I had to write a merge tool that shuffles the data on a one-to-one basis. I still get a horrible loss graph, but it works.
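For readers wondering what a one-to-one merge could look like, here is a minimal sketch that alternates records from two sources. It is not the actual merge tool; the function name and the handling of leftovers (appended once the shorter source runs out) are assumptions.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel, so None stays a legal record value


def interleave(first, second):
    """Yield records alternately from two iterables.

    When one source is exhausted, the rest of the other follows as-is.
    """
    for a, b in zip_longest(first, second, fillvalue=_MISSING):
        if a is not _MISSING:
            yield a
        if b is not _MISSING:
            yield b
```

For example, `list(interleave(["s1", "s2", "s3"], ["l1", "l2"]))` gives `["s1", "l1", "s2", "l2", "s3"]`. Because it works on iterators, it can stream two large position files without loading either into memory.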
-
chrisw
Re: New Guest Engine: Maverick NNUE...
Rebel wrote: ↑Wed May 27, 2026 4:21 pm
jorose wrote: ↑Wed May 27, 2026 8:58 am
Rebel wrote: ↑Tue May 26, 2026 4:17 pm
I think you will remember the energy and computer time you needed to create data for NNUE evaluation by playing millions of self-play games, especially with limited hardware. For guest engines I demand an absolute minimum of one billion positions, which in practice means playing 14-15 million self-play games; depending on your hardware, that takes weeks or months.
And 1B is peanuts; AI flourishes with more data, more data. I offer authors the choice of increasing the volume by playing another 14-15 million self-play games to reach 2B positions or (to ease the pain) adding 1B ready-to-use Leela positions. After the 1B self-play experience you may guess the choice.

While I don't have much experience training NNUEs, I do have experience training my own models. Indeed, compute is a limiting factor. That being said, if the objective is to have an engine which behaves like the base engine on steroids, you could just train a smaller model. Of course the engine might be a bit weaker than if you had 2B positions and a larger net, but it should still be plenty strong. I think Winter's training data was 500M or so positions, though I would have to double check the exact number.

I can assure you a network trained on 2B positions would give Winter 50+ Elo.

jorose wrote: ↑Wed May 27, 2026 8:58 am
I would imagine it is advantageous to have engines that might make different moves in your dataset. I don't understand why that is an issue, unless we are training policy nets nowadays? I could see there being an issue with training on different evaluation outputs, as they might be on different scales. I am unsure how big an issue it is; iirc Stockfish has some Leela data in its training dataset.

Last time I checked, Stockfish was using 95% Leela data and 5% Stockfish data.
Our trainer / learner software can't handle input from 2 engines the normal way. I had to write a merge tool that shuffles the data on a one-to-one basis. I still get a horrible loss graph, but it works.

NB: we do NOT use the Stockfish trainer. We feed in pre-shuffled EPDs, which are processed sequentially. The SF trainer reads its own format of PGNs (wot I call Linrock data) and processes them by jumping forward N positions each time, which is effectively a much better way of shuffling. Our method involves masses of pre-processing work to build shuffled EPD files. If I do this again I'll use the SF method.
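To illustrate why jumping forward N positions decorrelates neighbouring samples without any pre-shuffled files, here is a rough sketch of a strided read order. This is one reading of "jumping forward N positions each time", not the Stockfish trainer's actual code; the function name and the multi-pass scheme are assumptions.

```python
def strided_order(num_positions: int, stride: int):
    """Yield every index exactly once, stepping `stride` apart within a pass.

    Pass k visits k, k + stride, k + 2 * stride, ... so consecutive samples
    are never adjacent positions from the same game when stride is large.
    """
    for offset in range(stride):
        yield from range(offset, num_positions, stride)
```

For example, `list(strided_order(10, 3))` yields `[0, 3, 6, 9, 1, 4, 7, 2, 5, 8]`. Every position is still seen once, but the order is computed on the fly from the sequential file instead of being baked into a shuffled copy.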