NNOM++ - Move Ordering Neural Networks?

Discussion of chess software programming and technical issues.


smatovic
Posts: 3231
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

NNOM++ - Move Ordering Neural Networks?

Post by smatovic »

I read now and then that people have tried to use NNUE for move ordering, with little success. May I ask what the blocker is here?

Obviously we would like to replace all the selective search techniques and move ordering/selection with a neural-network-driven approach.

My naive idea would be to use the current NNUE, trained on quiet positions, for quiet-move selection and a dedicated network for ordering non-quiet moves.

Further, both networks could be used to obtain a position score, a predictor, and this score could be used with fine-tuned parameters for pruning, extensions and reductions.
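To make that concrete, here is a hypothetical sketch of how such a predictor score could gate reductions; the thresholds, the centipawn-like scale and the LMR-style base rule are all made-up assumptions:

Code: Select all

// Hypothetical use of a network's position score as a predictor that
// scales reductions; all thresholds here are illustrative assumptions.
int reduction(int depth, int moveNumber, int predictorScore) {
    int r = 0;
    if (depth >= 3 && moveNumber > 4) r = 1;    // classic LMR-style base rule
    if (predictorScore < -200) r += 1;          // predictor: unpromising, reduce more
    if (predictorScore > 200 && r > 0) r -= 1;  // predictor: promising, reduce less
    return r;
}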

Maybe both would be used in only ~10% of nodes, with quiescence search still relying on classic methods.

Considering an EBF (effective branching factor) of maybe ~2 in modern AB engines, the goal should be to come up with a good predictor for AB engines and a lower EBF?

Makes sense?

--
Srdja
smatovic
Posts: 3231
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: NNOM++ - Move Ordering Neural Networks?

Post by smatovic »

Hmm, okay, let's assume 1000 clock cycles per NNUE inference and 36 moves per node: that is 36,000 clock cycles per node, which cannot work out.
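Spelled out, with an assumed 3.6 GHz core on top of the cycle estimates above:

Code: Select all

// Back-of-envelope cost of NNUE-based move ordering at every node.
#include <cstdio>

int main() {
    const double cyclesPerInference = 1000.0;  // assumed NNUE eval cost
    const double movesPerNode = 36.0;          // assumed branching factor
    const double clockHz = 3.6e9;              // assumed 3.6 GHz core
    double cyclesPerNode = cyclesPerInference * movesPerNode;  // 36,000
    double nodesPerSecond = clockHz / cyclesPerNode;           // ~100k nps
    printf("%.0f cycles/node -> ~%.0f nodes/s just for ordering\n",
           cyclesPerNode, nodesPerSecond);
    return 0;
}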

--
Srdja
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: NNOM++ - Move Ordering Neural Networks?

Post by dangi12012 »

smatovic wrote: Sun Jul 24, 2022 11:30 pm I read now and then that people have tried to use NNUE for move ordering, with little success. May I ask what the blocker is here?

Obviously we would like to replace all the selective search techniques and move ordering/selection with a neural-network-driven approach.

My naive idea would be to use the current NNUE, trained on quiet positions, for quiet-move selection and a dedicated network for ordering non-quiet moves.

Further, both networks could be used to obtain a position score, a predictor, and this score could be used with fine-tuned parameters for pruning, extensions and reductions.

Maybe both would be used in only ~10% of nodes, with quiescence search still relying on classic methods.

Considering an EBF (effective branching factor) of maybe ~2 in modern AB engines, the goal should be to come up with a good predictor for AB engines and a lower EBF?

Makes sense?

--
Srdja
Predictor networks in general do work - and are a good idea: https://lczero.org/dev/wiki/technical-e ... hess-zero/
Predicting what is a candidate move and evaluating what is a good position are two separate problems, and the topology of AlphaZero/Lc0 proves as much.

What is currently in the works for computer chess is an engine that runs fully on the GPU, instead of keeping half of its algorithm on the CPU and the other half on the GPU. This cannot be an AB engine for technical reasons, and it cannot use the CPU for latency reasons. You have to pick one of the many breadth-first algorithms that lend themselves to parallelism.
In return you get 1-2 orders of magnitude more FLOPS for NN evaluation (the performance goes into a decent predictor network to bring the node count down as much as possible).

Also, who is to say that the NNUE topology of HalfKP -> 32x2x256 -> 32x32 -> 32x1 is the perfect topology?
Not to mention that BFloat16 support is on the roadmap for all future Intel/AMD CPUs and GPUs.

smatovic wrote: Mon Jul 25, 2022 12:12 am Hmm, okay, let's assume 1000 clock cycles per NNUE inference and 36 moves per node: that is 36,000 clock cycles per node, which cannot work out.
This can be calculated exactly, and I counter that you overestimate: one (incremental) NNUE eval is much cheaper.
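To illustrate, a minimal sketch of why the incremental eval is cheap, assuming HalfKP-sized features and a 256-wide first layer (real engines differ in the details):

Code: Select all

// Incremental accumulator update: a move toggles only a few features,
// so only those columns are added/subtracted instead of recomputing
// the whole first layer.
#include <cstdint>
#include <vector>

constexpr int kHidden = 256;            // first-layer width (assumed)
int16_t weights[41024][kHidden];        // HalfKP-sized feature weights

struct Accumulator {
    int16_t v[kHidden];

    void update(const std::vector<int>& removed,
                const std::vector<int>& added) {
        for (int f : removed)
            for (int i = 0; i < kHidden; ++i) v[i] -= weights[f][i];
        for (int f : added)
            for (int i = 0; i < kHidden; ++i) v[i] += weights[f][i];
    }
};
// A quiet move changes ~2 features: about 2 * 256 int16 adds, which
// vectorizes to a few dozen SIMD instructions, far below 1000 cycles.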
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer
hgm
Posts: 28353
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: NNOM++ - Move Ordering Neural Networks?

Post by hgm »

Techniques for move ordering in AB engines are extremely primitive. NNs are the other extreme, and might be overkill. I am convinced there will be techniques that are enormously superior to the current industry standard of MVV/LVA, SEE, killers and history, but take only a fraction of the time of even a simple NN.
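For reference, that industry standard fits in a few lines. A sketch with illustrative piece values and priority bands (none of it taken from a particular engine):

Code: Select all

// MVV/LVA for captures, then killer moves, then history counters.
enum Piece { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING };
static const int kValue[6] = {100, 320, 330, 500, 900, 20000};

struct Move { int from, to; Piece attacker, victim; bool isCapture; };

int orderingScore(const Move& m, const Move killers[2],
                  const int history[64][64]) {
    if (m.isCapture)                            // most valuable victim first,
        return 1000000 + 10 * kValue[m.victim]  // least valuable attacker
                       - kValue[m.attacker];    // as tie-break
    for (int k = 0; k < 2; ++k)
        if (m.from == killers[k].from && m.to == killers[k].to)
            return 900000;                      // killers next
    return history[m.from][m.to];               // quiet moves by history
}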
Joost Buijs
Posts: 1632
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: NNOM++ - Move Ordering Neural Networks?

Post by Joost Buijs »

smatovic wrote: Mon Jul 25, 2022 12:12 am Hmm, okay, let's assume 1000 clock cycles per NNUE inference and 36 moves per node: that is 36,000 clock cycles per node, which cannot work out.
Since most networks are trained on game data, you can use a classification network with softmax activation to predict the probability of each move. Of course this will be slow, but it is perfectly usable for nodes closer to the root. Move ordering near the root is more important than move ordering near the leaves anyway.
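A rough sketch of that idea: the network emits one logit per legal move, and a softmax turns the logits into probabilities used for ordering (the interface is an assumption for illustration):

Code: Select all

// Softmax over per-move logits; sorting moves by descending
// probability gives the predicted ordering.
#include <algorithm>
#include <cmath>
#include <vector>

std::vector<double> moveProbabilities(const std::vector<double>& logits) {
    std::vector<double> p(logits.size());
    if (logits.empty()) return p;
    double maxLogit = *std::max_element(logits.begin(), logits.end());
    double sum = 0.0;
    for (size_t i = 0; i < logits.size(); ++i) {
        p[i] = std::exp(logits[i] - maxLogit);  // subtract max for stability
        sum += p[i];
    }
    for (double& x : p) x /= sum;
    return p;
}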
smatovic
Posts: 3231
Joined: Wed Mar 10, 2010 10:18 pm
Location: Hamburg, Germany
Full name: Srdja Matovic

Re: NNOM++ - Move Ordering Neural Networks?

Post by smatovic »

dangi12012 wrote: Mon Jul 25, 2022 1:11 am Predictor networks in general do work - and are a good idea: https://lczero.org/dev/wiki/technical-e ... hess-zero/
Predicting what is a candidate move and evaluating what is a good position are two separate problems, and the topology of AlphaZero/Lc0 proves as much.
As far as I understand it, Lc0 uses the same network with different heads? Policy and value?
dangi12012 wrote: Mon Jul 25, 2022 1:11 am What is currently in the works for computer chess is an engine that runs fully on the GPU, instead of keeping half of its algorithm on the CPU and the other half on the GPU. This cannot be an AB engine for technical reasons, and it cannot use the CPU for latency reasons. You have to pick one of the many breadth-first algorithms that lend themselves to parallelism.
In return you get 1-2 orders of magnitude more FLOPS for NN evaluation (the performance goes into a decent predictor network to bring the node count down as much as possible).
A predictor-driven breadth-first search?
dangi12012 wrote: Mon Jul 25, 2022 1:11 am Also, who is to say that the NNUE topology of HalfKP -> 32x2x256 -> 32x32 -> 32x1 is the perfect topology?
Not me.
dangi12012 wrote: Mon Jul 25, 2022 1:11 am Not to mention that BFloat16 support is on the roadmap for all future Intel/AMD CPUs and GPUs.
I agree, BF16 might be interesting, on CPU and GPU.
dangi12012 wrote: Mon Jul 25, 2022 1:11 am
smatovic wrote: Mon Jul 25, 2022 12:12 am Hmm, okay, let's assume 1000 clock cycles per NNUE inference and 36 moves per node: that is 36,000 clock cycles per node, which cannot work out.
This can be calculated exactly, and I counter that you overestimate: one (incremental) NNUE eval is much cheaper.
Hmm, if applied to only ~10% of the nodes the search speed may halve, so it will lose one ply of search depth; such an approach could then catch up again if the EBF is lowered enough, or the like.
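That trade-off can be put into numbers. For a uniform tree, reachable depth is roughly log(nodes)/log(EBF); a quick sketch with an assumed node budget:

Code: Select all

// Halving NPS costs ~1 ply at EBF 2; a lower EBF can more than buy
// that ply back for the same node budget (illustrative numbers).
#include <cmath>
#include <cstdio>

int main() {
    const double budget = 1e9;  // assumed node budget for one search
    const double ebfs[] = {2.0, 1.8, 1.6};
    for (double ebf : ebfs)
        printf("EBF %.1f: depth ~%.1f, at half the budget ~%.1f\n",
               ebf, std::log(budget) / std::log(ebf),
               std::log(budget / 2.0) / std::log(ebf));
    return 0;
}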

--
Srdja
j.t.
Posts: 263
Joined: Wed Jun 16, 2021 2:08 am
Location: Berlin
Full name: Jost Triller

Re: NNOM++ - Move Ordering Neural Networks?

Post by j.t. »

I recently had another semi-naive try at this with a non-efficiently-updatable neural network, (768 + 8192) -> 256 -> 64 -> 1 (ReLU activated, the last layer sigmoid; the first layer is sparse; the 768 inputs stand for the position, the 8192 for a given move).

My idea was not to interfere with the move ordering of quiet moves, but simply to decide whether I want to reduce the depth of a given move by 1 or not. That means the network doesn't need to run on all quiet moves every time, but only when there hasn't been a cutoff before. Together with using the network only when a depth of 6 or more is still left, this led to an NPS decrease to roughly 50%.
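For concreteness, a sparse first layer of this shape can be evaluated by summing only the columns of the active features; a minimal sketch (layer sizes as described above, everything else assumed):

Code: Select all

// Sparse (768 + 8192) -> 256 first layer: only active features
// contribute, so evaluation costs a few hundred adds per move.
#include <algorithm>
#include <vector>

constexpr int kInputs = 768 + 8192;  // position + move features
constexpr int kHidden = 256;

float w1[kInputs][kHidden];  // first-layer weights
float b1[kHidden];           // first-layer biases

std::vector<float> firstLayer(const std::vector<int>& activeFeatures) {
    std::vector<float> h(b1, b1 + kHidden);     // start from the biases
    for (int f : activeFeatures)                // skip all zero inputs
        for (int i = 0; i < kHidden; ++i) h[i] += w1[f][i];
    for (float& x : h) x = std::max(x, 0.0f);   // ReLU
    return h;
}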

However, in the end it didn't work (an Elo loss of almost 100), presumably either because the training was inadequate, or because it was too slow, or because the history heuristic is already one of the best heuristics we can have. If you think about it, the fact that the history heuristic is perfectly matched to exactly the type of position we're currently searching might mean that it could be very hard to design a move-pruning strategy (since pruning is usually coupled with LMR/P) that is better than this.
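That adaptivity comes straight from the update rule: every beta cutoff in the current search feeds the table that orders the remaining nodes. A sketch of one common "gravity"-style form (the constants are illustrative):

Code: Select all

// History heuristic update on a beta cutoff; the table is filled by
// the search itself, so it matches the positions being searched.
int history[64][64] = {};

void onBetaCutoff(int from, int to, int depth) {
    int bonus = depth * depth;  // deeper cutoffs carry a stronger signal
    history[from][to] += bonus - history[from][to] * bonus / 16384;
}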

The only training method I tried was reinforcement learning: letting my engine play against itself, recording slight adjustments to the output of the neural network, and then later using gradient descent to update the network according to whether a recorded (input, output) pair belonged to the winning or the losing player.

I can't remember exactly, but I think I had 200 epochs with 100,000 to 1,000,000 records from ~50 games for each epoch.

The big issue is probably that the records are way too noisy to get any useful information from a won/lost game. I thought about an alternative training scheme, where the search with the neural pruning would be optimized such that on many positions it gives (almost) the same answer as the search without neural pruning, but with fewer nodes searched. But I haven't thought up a concrete implementation (e.g. how should "correct answer" and "fewer nodes" be balanced out?).
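One hypothetical way to balance the two terms, just to make the question concrete (lambda and the whole interface are assumptions):

Code: Select all

// Scalar fitness for a pruning network: agreement with the full search
// plus a weighted reward for searching fewer nodes.
#include <cstdint>

double fitness(bool sameBestMove, uint64_t nodesPruned, uint64_t nodesFull,
               double lambda = 0.1) {
    double agreement = sameBestMove ? 1.0 : 0.0;  // did we keep the answer?
    double savings = 1.0 - double(nodesPruned) / double(nodesFull);
    return agreement + lambda * savings;  // maximize over network weights
}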

Before I start experimenting with this again, I will try to automatically optimize all my existing search parameters.
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: NNOM++ - Move Ordering Neural Networks?

Post by lithander »

I wonder how the best move sequence is defined?

Let's say I go back to the first version of Leorik, the one that didn't implement any "lossy" search optimizations, the version that basically produces the same result as plain alpha-beta search. Now I run a search on a position with a hypothetical perfect move ordering. Would that produce the smallest possible search tree that still guarantees the correct result?
So, given a position and a depth, could you use the size of the tree as a metric to grade the quality of your move ordering?
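That metric is easy to instrument. A compilable sketch with stub types (all names are placeholders, not Leorik's code):

Code: Select all

// Grade a move ordering by the node count of a plain alpha-beta search.
#include <cstdint>
#include <vector>

struct Move { int id; };
struct Position {                 // stand-in; a real engine supplies these
    void make(Move) {}
    void unmake(Move) {}
};
static int evaluate(const Position&) { return 0; }
static std::vector<Move> generateMoves(const Position&) { return {}; }
static std::vector<Move> orderMoves(std::vector<Move> ms) { return ms; }

uint64_t nodes = 0;  // the metric: fewer nodes, same result = better ordering

int alphaBeta(Position& pos, int depth, int alpha, int beta) {
    ++nodes;
    std::vector<Move> moves = generateMoves(pos);
    if (depth == 0 || moves.empty()) return evaluate(pos);
    for (Move m : orderMoves(moves)) {
        pos.make(m);
        int score = -alphaBeta(pos, depth - 1, -beta, -alpha);
        pos.unmake(m);
        if (score >= beta) return beta;  // earlier cutoffs, smaller tree
        if (score > alpha) alpha = score;
    }
    return alpha;
}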

And if that is true, then I wonder how transferable what you learn about perfect move ordering in the simple case above is to an engine that uses all the tricks like pruning, reductions, extensions, null-move and zero-window searches, etc.?
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
j.t.
Posts: 263
Joined: Wed Jun 16, 2021 2:08 am
Location: Berlin
Full name: Jost Triller

Re: NNOM++ - Move Ordering Neural Networks?

Post by j.t. »

lithander wrote: Mon Jul 25, 2022 4:49 pm Would that produce the smallest possible search tree that still guarantees the correct result?
From a theoretical standpoint I am not sure about this, because there might be some awesome heuristic that just by chance makes the tree smaller (e.g. by pruning) but never misses the best move that way. Maybe there has been some research on the general optimality of alpha-beta, but a quick look at Wikipedia suggests it has only been proven optimal for the case where leaf nodes are evaluated randomly, so some deep property of chess might be missed there that would allow for such a perfect heuristic.

lithander wrote: Mon Jul 25, 2022 4:49 pm So, given a position and a depth, could you use the size of the tree as a metric to grade the quality of your move ordering?
Yes, I think that makes sense. However, as soon as you introduce pruning based on that move ordering, you might find a new move-ordering strategy that makes the tree smaller because it puts hard positions (i.e. those with a big branching factor) at the end of the move list, but that plays much worse, because it searches important moves less deeply.

I made some small experiments that suggested that this is indeed an important fact to consider when optimizing move ordering:
j.t. wrote: Fri Feb 18, 2022 4:17 am What I previously assumed was that making move ordering worse (in the sense that objectively good moves are searched late) would automatically increase node count, but I realized that that's not necessarily true. So you're right, that I should've targeted Elo directly.
You could try to train your move ordering on pure alpha-beta with no pruning, though; that is definitely an interesting idea.
For learning how to prune correctly, however, I believe that you also have to look at the quality of the search result (e.g. by playing games or by comparing it to a more thorough search).
dangi12012
Posts: 1062
Joined: Tue Apr 28, 2020 10:03 pm
Full name: Daniel Infuehr

Re: NNOM++ - Move Ordering Neural Networks?

Post by dangi12012 »

lithander wrote: Mon Jul 25, 2022 4:49 pm I wonder how the best move sequence is defined?
The theoretical best that can be achieved is reducing the effective branching factor to its square root.
Not the total nodes searched - but the branching factor per ply.
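For a uniform tree this has a closed form (Knuth and Moore, 1975): with perfect ordering the minimal tree has b^ceil(d/2) + b^floor(d/2) - 1 leaves, i.e. an effective branching factor of roughly sqrt(b). A quick check with assumed numbers:

Code: Select all

// Minimal alpha-beta tree size under perfect move ordering.
#include <cmath>
#include <cstdio>

int main() {
    const double b = 36.0;  // assumed branching factor
    const int d = 10;       // assumed depth
    // (d + 1) / 2 is ceil(d/2), d / 2 is floor(d/2) in integer division.
    double minimal = std::pow(b, (d + 1) / 2) + std::pow(b, d / 2) - 1;
    printf("minimal leaves ~%.3g vs %.3g for the full tree\n",
           minimal, std::pow(b, d));
    return 0;
}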

So yes, the smallest possible (alpha-beta) search tree that can exist. Either way, it's good to know that a position's "interestingness" and its "eval" are separate concepts.

An NN will bring out heuristics that humans don't know about, have missed, or cannot easily express in code as a simple surjective function, and NN inference happens to get more and more hardware acceleration as the years go by.
Worlds-fastest-Bitboard-Chess-Movegenerator
Daniel Inführ - Software Developer