Understanding neural networks in chess... from an RL point of view.

Discussion of chess software programming and technical issues.

Moderator: Ras

osvitashev
Posts: 13
Joined: Tue Sep 07, 2021 6:17 pm
Full name: Alex S

Understanding neural networks in chess... from an RL point of view.

Post by osvitashev »

Armchair chess engine programmer here...

Below is my broad understanding of how neural networks are usually used in chess engines; a rough sketch in code follows the list. (Yes, we use a lot of tricks on top of minimax to make it run faster, and there are some very smart optimizations in NNUEs, but for the purpose of this question that is not relevant.)
Please correct me if I am totally wrong...

1. Use a static collection of chess games to train a network to predict the game outcome from a position.
2. Plug this network into a minimax algorithm as the evaluation function. A position that is more 'won' corresponds to a higher eval score.
3. Optionally, generate more games/positions through self-play and repeat step #1
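
In code, I picture steps 1 and 2 roughly like this (a toy sketch, not how a real engine is written; encode(), legal_moves(), make() and is_terminal() are placeholders, and a real NNUE updates the first layer incrementally instead of running a full forward pass at every leaf):

import torch
import torch.nn as nn

# Step 1: a small value network trained to predict the game outcome
# (+1 win / 0 draw / -1 loss, from the side to move's point of view).
value_net = nn.Sequential(
    nn.Linear(768, 256),  # 768 = 12 piece types x 64 squares, one-hot
    nn.ReLU(),
    nn.Linear(256, 1),
    nn.Tanh(),            # output in [-1, 1], so negation flips perspective
)

# Step 2: plug the network into plain negamax as the leaf evaluation.
def negamax(position, depth):
    if depth == 0 or position.is_terminal():
        with torch.no_grad():
            return value_net(encode(position)).item()
    best = -float("inf")
    for move in position.legal_moves():  # placeholder move generator
        best = max(best, -negamax(position.make(move), depth - 1))
    return best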

I have started reading up on reinforcement learning algorithms and got confused about how this plugs into chess engines.

Step #1 is effectively an offline deep Q-learning (DQL) algorithm. Offline DQL is generally terrible at handling distribution shift, as it tends to overestimate the value of actions underrepresented in the training data set.

So, my question is... how does it all work?
More specifically:
Does filtering a potentially bad (biased) evaluation function through several levels of minimax somehow average it out and make it usable?
Is there some deep connection between the Bellman equations that are all over RL and the minimax algorithm? (See the comparison after these questions.)
Is DQL an oversimplification on my part, and does the evaluation network need to be trained via something more sophisticated, like CQL (conservative Q-learning)?
Or... is the solution mainly the iterative exploration in step #3, i.e. self-play, so that it is not really an offline RL problem?
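
To make the resemblance behind the second question concrete, here are the two recursions side by side:

V*(s) = max_a [ r(s, a) + gamma * V*(s') ]    (Bellman optimality)
V(s)  = max_m [ -V(s after move m) ]          (negamax form of minimax)

With gamma = 1, rewards only at terminal positions, and the sign flip standing in for the opponent's turn, negamax reads like the Bellman optimality equation for a two-player zero-sum game, which is what makes me suspect the connection is not a coincidence.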
ZirconiumX
Posts: 1355
Joined: Sun Jul 17, 2011 11:14 am
Full name: Hannah Ravensloft

Re: Understanding neural networks in chess... from an RL point of view.

Post by ZirconiumX »

I'm going to draw a distinction here for clarity. "Value networks" take a position as input and output one score, and are used for the evaluation. "Policy networks" take a position as input and output scores for all possible moves, and are used for move scoring. (Policy networks are much rarer.)
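
Concretely, the two shapes look like this (a minimal PyTorch sketch; the layer sizes and the 1858-move encoding are just examples, not anyone's actual architecture):

import torch.nn as nn

N_FEATURES = 768  # e.g. 12 piece types x 64 squares, one-hot
N_MOVES = 1858    # e.g. a fixed from/to move encoding like Lc0's

# Value network: position in, one score out.
value_net = nn.Sequential(
    nn.Linear(N_FEATURES, 256), nn.ReLU(),
    nn.Linear(256, 1), nn.Tanh(),  # single eval score in [-1, 1]
)

# Policy network: position in, one score per encodable move out.
policy_net = nn.Sequential(
    nn.Linear(N_FEATURES, 256), nn.ReLU(),
    nn.Linear(256, N_MOVES),       # logits, masked to legal moves at use time
)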

Value network training is not reinforcement learning, but supervised learning (specifically regression). The value network itself makes no judgement on what the best move in a position is; that's entirely left to minimax.

As such, simple methods like gradient descent, or variants thereof like Adam, are sufficient for training.
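
In other words, the whole training loop can be as plain as this (a sketch; it assumes a DataLoader yielding pre-encoded (position, outcome) batches):

import torch
import torch.nn as nn

value_net = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))
optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # plain regression against the game outcome

for x, y in loader:                  # assumed: x = encoded positions, y = outcomes
    optimizer.zero_grad()
    pred = value_net(x).squeeze(-1)  # one predicted score per position
    loss = loss_fn(pred, y)
    loss.backward()
    optimizer.step()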
tu ne cede malis, sed contra audentior ito
jdart
Posts: 4405
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Understanding neural networks in chess... from an RL point of view.

Post by jdart »

The first step is supervised learning. The third step is a form of reinforcement learning, or at least that term is commonly used for it, because the training starts with and builds on top of the existing network. It typically uses a different learning rate and schedule, because you are presumably already close to the optimum point.
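
As a sketch of what that retraining phase might look like (the checkpoint name, epoch count, helper function and exact schedule are all made up; every engine does this differently):

import torch
import torch.nn as nn

value_net = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))
value_net.load_state_dict(torch.load("existing_net.pt"))  # start from the current net

# Lower learning rate than from-scratch training, decayed further over time,
# since the network is presumably already close to the optimum.
optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

for epoch in range(20):
    train_one_epoch(value_net, selfplay_loader, optimizer)  # assumed helper
    scheduler.step()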
Aleks Peshkov
Posts: 903
Joined: Sun Nov 19, 2006 9:16 pm
Location: Russia
Full name: Aleks Peshkov

Re: Understanding neural networks in chess... from an RL point of view.

Post by Aleks Peshkov »

I have a question about the practical usage of NNs.

I do not want the NN to measure WDL or even a full centipawn evaluation. I want to use the NN as a small positional correction on top of a basic 1-3-3-5-9 or 1-4-4-6-12 material count as the static evaluation.

Can we use a simple and fast NN as a PST substitute? It should make an NN playable, rather than a random mover, even with almost random values.
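
In pseudo-Python, what I mean is roughly this (the position API and encode() are placeholders):

import torch
import torch.nn as nn

MATERIAL = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}  # or the 1-4-4-6-12 scale

# A tiny net whose output is only a small positional correction.
correction_net = nn.Sequential(nn.Linear(768, 32), nn.ReLU(), nn.Linear(32, 1))

def evaluate(position):
    material = sum(v * (position.count(p, "white") - position.count(p, "black"))
                   for p, v in MATERIAL.items())        # placeholder API
    with torch.no_grad():
        corr = correction_net(encode(position)).item()  # placeholder encode()
    return material + max(-0.9, min(0.9, corr))  # correction stays below a pawn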
ZirconiumX
Posts: 1355
Joined: Sun Jul 17, 2011 11:14 am
Full name: Hannah Ravensloft

Re: Understanding neural networks in chess... from an RL point of view.

Post by ZirconiumX »

Aleks Peshkov wrote: Thu Aug 14, 2025 12:48 pm I have a question about the practical usage of NNs.

I do not want the NN to measure WDL or even a full centipawn evaluation. I want to use the NN as a small positional correction on top of a basic 1-3-3-5-9 or 1-4-4-6-12 material count as the static evaluation.

Can we use a simple and fast NN as a PST substitute? It should make an NN playable, rather than a random mover, even with almost random values.
Sure you can, although a simple NNUE usually uses PST inputs anyway.
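
To spell that out: a single linear layer over one-hot (piece, square) inputs is literally a learned piece-square table, so even near-random weights give you arbitrary-but-consistent square preferences rather than a random mover. Sketch:

import torch.nn as nn

# One-hot (piece type, square) inputs: 12 * 64 = 768 features.
pst_net = nn.Linear(768, 1, bias=False)

# Each weight is "the value of this piece standing on this square", so
# pst_net.weight.view(12, 64) is, by construction, a piece-square table.
# NNUE-style nets simply stack hidden layers on top of these same inputs.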
tu ne cede malis, sed contra audentior ito