TDLeaf Learning for NNUE


dchoman
Posts: 177
Joined: Wed Dec 28, 2011 8:44 pm
Location: United States
Full name: Daniel Homan

TDLeaf Learning for NNUE

Post by dchoman »

As described in my earlier thread, I've been working on a reboot of EXchess as "Leaf", which has an NNUE evaluation with built-in TDLeaf training. Initial results were promising, but it took some time to improve and stabilize the learning algorithm. There is certainly still room for improvement, but the gains from depth=6 rapid games alone are encouraging. Here are the results from the first 1.6 million games:

Code: Select all

Bayesian Elo ratings — 17 PGN files combined
8500 games loaded, 10 players rated

Rank  Name                   Elo     ±  Games   Score   Oppo  Draws
-------------------------------------------------------------------
   1  Leaf_v260410-1.6e6g   +345    18   1000     55%   +313    36%
   2  Leaf_vclassic_eval    +336    12   3500     72%   +116    18%
   3  Leaf_v260410-1.5e6g   +289    15   1500     48%   +308    25%
   4  Leaf_v260410-8e5g     +244    15   1500     48%   +261    28%
   5  Leaf_v260410-5e5g     +158    15   1500     42%   +222    25%
   6  Leaf_v260410-2e5g      +88    16   2000     57%    -42    14%
   7  Leaf_v260410-5e4g      -50    18   2000     54%   -112    10%
   8  Leaf_v260410-1e4g     -259    22   1500     38%   -109     9%
   9  Leaf_v260410-0g       -539    25    500     62%   -612    41%
  10  Leaf_vmaterial_eval   -612    19   2000     14%   -190    14%
The net is initialized at 0 games with material values, random weights, and zero biases. On my 16-processor laptop, 1.6M games at depth=6 can be played in less than a day. The set above took a few days because I did the learning in batches and tested after each batch.

The first 800,000 games were entirely self-play, with both sides learning. After that, it seemed better to use fixed opponents: either the most recent best learned version or the classic_eval. I moved between those two options, using quick progress checks to decide whether to discard a set of games. Sets ranged anywhere from 2,000 to 50,000 games... not very scientific, but enough to make progress. At 1.6M games, things have stalled. We are probably in a local minimum, as the next 400,000 games played last night were a regression.
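For anyone unfamiliar with the mechanics: the heart of TDLeaf(λ) is the error signal computed after each game, where every position's leaf eval is nudged toward a λ-discounted sum of the later temporal differences. A simplified Python sketch of that signal (not the actual Leaf code; the λ value and the 400-centipawn sigmoid scale here are just illustrative choices):

```python
import math

def tdleaf_targets(leaf_scores, lam=0.7):
    """Compute TDLeaf(lambda) error signals for one game.

    leaf_scores: PV-leaf evaluations (centipawns, one side's view),
    one per searched position, in game order.

    Each score is squashed to a win-probability estimate with a
    sigmoid, then delta_t = sum_{j>=t} lam^(j-t) * d_j, where
    d_j = p[j+1] - p[j]. The weight update for position t is then
    proportional to delta_t times the gradient of the net at that
    position's PV leaf.
    """
    # Squash to (0, 1); the 400 scale is an illustrative constant.
    p = [1.0 / (1.0 + math.exp(-s / 400.0)) for s in leaf_scores]
    d = [p[j + 1] - p[j] for j in range(len(p) - 1)]

    # Backward pass: acc accumulates the lambda-discounted future TDs.
    deltas = [0.0] * len(d)
    acc = 0.0
    for t in range(len(d) - 1, -1, -1):
        acc = d[t] + lam * acc
        deltas[t] = acc
    return deltas
```

With a flat game (no eval changes) every delta is zero, so nothing is learned; a late swing propagates backward, discounted by λ at each step.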

A few observations:

1) It will probably take an order of magnitude (or more) additional games to make significantly more progress, regardless of the other changes I make. That is not necessarily a problem, other than time, but this is not a learning technique that can produce obvious gains in a few hundred or few thousand games (except very early in the learning process).

2) Experience-based learning like this is time-consuming to optimize, as one cannot use a bank of previously played games to test learning parameters. I need to think a bit about this and come up with a strategy for moving forward. Possibilities include:

(a) Deeper games (depth = 8, 10, or more, or timed games, although timed games add noise). It will take time before I can tell if these are the right step.

(b) Changing to opponents that play different lines, exposing the TDLeaf learning to a wider variety of positions.

(c) Adjusting the learning parameters. I am currently using AdamW, so there is not a lot to adjust, but there are still several choices: learning rates for various parts of the net, how much to batch before initiating a learning step, how to merge learned values from multiple threads, etc.
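For (c), in case it helps the discussion: decoupled weight decay is the only real knob AdamW adds over plain Adam, and the learning rate and decay per layer are the main things left to tune. A single-weight sketch of the update rule (illustrative hyperparameters, not Leaf's actual settings):

```python
import math

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-4):
    """One AdamW update for a single scalar weight.

    m, v are the first/second moment estimates carried between steps;
    t is the 1-based step count used for bias correction. The weight
    decay is decoupled: applied directly to w, not mixed into grad.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v
```

The per-layer learning rates I mentioned just mean calling this with a different lr for the feature-transformer weights than for the small dense layers.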

No real question here, except to see whether anyone has experience with TDLeaf learning to adjust NNUE weights and biases through experiential play rather than off-line learning with a large database of games. Any wisdom on areas to explore next is welcome.

- Dan