Hi... I'm working on engine research

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

hgm
Posts: 28435
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Hi... I'm working on engine research

Post by hgm »

syzygy wrote: Thu Jan 01, 2026 7:04 pm
hgm wrote: Thu Jan 01, 2026 8:59 am
syzygy wrote: Wed Dec 31, 2025 9:58 am
What counts is the average performance, i.e. maximum Elo.
Says who? It seems to me that you are assuming this as an axiom, which of course is the easiest way to prove anything...
Says I, and I explained why. In chess you win some, you lose some. Every engine will make mistakes in some positions. If you focus on not making any mistakes, you end up with an engine that plays hundreds of Elo weaker than the strongest engine, which means it will just make MORE mistakes, at least in practical play.
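(As an aside, the size of this effect can be made concrete with the standard logistic Elo model; this is a generic sketch, not anything from the posts above.)

```python
# Expected score per game for a player rated `diff` Elo below the
# opponent, under the standard logistic Elo model.
def expected_score(diff):
    return 1.0 / (1.0 + 10.0 ** (diff / 400.0))

# A few hundred Elo translates directly into lost points:
for gap in (100, 200, 400):
    print(gap, round(expected_score(gap), 3))
```

At a 200 Elo deficit the expected score is already below a quarter of a point per game, which is the sense in which a weaker engine "makes more mistakes" on average.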

I understand that you don't care about practical play but about doing well in artificial positions that trick most engines but(/because) they will never show up in a real game. That is fine!
The point of course is that having an engine play games against other engines is not a significant use case. Outside the small community of developers and testers there is no one who wants to do that. The majority of engine users want to analyze positions with it. Positions from games that the engine did not play.

To get high Elo, you only have to perform well in positions that can occur in your own games. Positions the engine helps selecting. It doesn't hurt your Elo when you perform dismally in positions of a type the engine has learned to avoid. But users could encounter such positions all the time in their own games.

This is not just hypothetical. LC0 trained for high Elo plays like crap in Knight-odds games. Just training for high Elo leaves it without even the slightest idea of what you can best do when you are down a piece early in the game.
Uri Blass
Posts: 11139
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Hi... I'm working on engine research

Post by Uri Blass »

hgm wrote: Fri Jan 02, 2026 8:28 am
syzygy wrote: Thu Jan 01, 2026 7:04 pm
hgm wrote: Thu Jan 01, 2026 8:59 am
syzygy wrote: Wed Dec 31, 2025 9:58 am
What counts is the average performance, i.e. maximum Elo.
Says who? It seems to me that you are assuming this as an axiom, which of course is the easiest way to prove anything...
Says I, and I explained why. In chess you win some, you lose some. Every engine will make mistakes in some positions. If you focus on not making any mistakes, you end up with an engine that plays hundreds of Elo weaker than the strongest engine, which means it will just make MORE mistakes, at least in practical play.

I understand that you don't care about practical play but about doing well in artificial positions that trick most engines but(/because) they will never show up in a real game. That is fine!
The point of course is that having an engine play games against other engines is not a significant use case. Outside the small community of developers and testers there is no one who wants to do that. The majority of engine users want to analyze positions with it. Positions from games that the engine did not play.

To get high Elo, you only have to perform well in positions that can occur in your own games. Positions the engine helps selecting. It doesn't hurt your Elo when you perform dismally in positions of a type the engine has learned to avoid. But users could encounter such positions all the time in their own games.

This is not just hypothetical. LC0 trained for high Elo plays like crap in Knight-odds games. Just training for high Elo leaves it without even the slightest idea of what you can best do when you are down a piece early in the game.
Note only that Stockfish usually knows how to hold a winning position when it is up a piece, while LC0's advantage is in scoring points against weaker opponents when it is a piece down.

Finding the best move in a losing position (the move that loses in the most moves) is not always the best practical move for scoring points, but I believe it gives better practical chances than Stockfish's move.

For analysis I may be interested both in the best move to delay mate and in the best move to give practical chances (where in both cases Stockfish's move is not the best).
FireDragon761138
Posts: 18
Joined: Sun Dec 28, 2025 7:25 am
Full name: Aaron Munn

Re: Hi... I'm working on engine research

Post by FireDragon761138 »

OK, now I am digging into the weeds of SPRT testing using Stockfish and Theoria. What I am learning through research is really illuminating about the weaknesses of Stockfish for human interpretability. Standard Fishtest testing involves only 200ms/move, which is far too short for lc0-trained networks to draw out meaningful strategic patterns through search. Basically, Stockfish is good at finding checks, threats, and capture-forcing moves, but its telos isn't towards human-like ideas in the slightest.

Some oddities in testing. In Stockfish vs. Theoria play, using UHO_2022, Black never wins, no matter who is playing it. But using a regular opening book (Silver Suite) results in a huge number of draws (nearly 90 percent).

Stockfish is probably slightly stronger than my engine at 3s/move time controls, but I'm having some trouble drawing out really good testing data. AI analysis of the results says there are games being dropped, maybe from the engines crashing.

Tentative hypothesis based on what I'm seeing: Stockfish is not a great engine for human analysis. Almost any other engine would be just about as good. Rodent IV, Rybka, Crafty, whatever. It doesn't see chess in a fundamentally different way; there's no attempt at drawing out holistic patterns beyond "find me a forcing move quickly".
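(For what it's worth, the log-likelihood-ratio bookkeeping behind an SPRT run can be sketched in a few lines of Python. This uses the trinomial GSPRT approximation popularized by Fishtest; the W/D/L numbers below are made up for illustration, not results from these tests.)

```python
import math

def gsprt_llr(wins, draws, losses, elo0, elo1):
    """Approximate log-likelihood ratio for H1 (elo1) vs H0 (elo0)
    over a win/draw/loss sample (trinomial GSPRT approximation)."""
    n = wins + draws + losses
    w, d = wins / n, draws / n
    score = w + d / 2                  # mean score per game
    var = (w + d / 4) - score ** 2     # per-game score variance
    var_s = var / n                    # variance of the mean score
    s0 = 1 / (1 + 10 ** (-elo0 / 400))
    s1 = 1 / (1 + 10 ** (-elo1 / 400))
    return (s1 - s0) * (2 * score - s0 - s1) / (2 * var_s)

# The test stops when the LLR crosses log(beta/(1-alpha)) or
# log((1-beta)/alpha); with alpha = beta = 0.05 that is about +/-2.94.
lower, upper = math.log(0.05 / 0.95), math.log(0.95 / 0.05)
llr = gsprt_llr(1000, 2000, 900, elo0=0, elo1=5)
```

With this made-up sample the LLR is positive but has not yet reached the upper bound, so the run would continue.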
chrisw
Posts: 4757
Joined: Tue Apr 03, 2012 4:28 pm
Location: Midi-Pyrénées
Full name: Christopher Whittington

Re: Hi... I'm working on engine research

Post by chrisw »

FireDragon761138 wrote: Fri Jan 02, 2026 2:17 pm
OK, now I am digging into the weeds of SPRT testing using Stockfish and Theoria. What I am learning through research is really illuminating about the weaknesses of Stockfish for human interpretability. Standard Fishtest testing involves only 200ms/move, which is far too short for lc0-trained networks to draw out meaningful strategic patterns through search. Basically, Stockfish is good at finding checks, threats, and capture-forcing moves, but its telos isn't towards human-like ideas in the slightest.

Some oddities in testing. In Stockfish vs. Theoria play, using UHO_2022, Black never wins, no matter who is playing it. But using a regular opening book (Silver Suite) results in a huge number of draws (nearly 90 percent).

Stockfish is probably slightly stronger than my engine at 3s/move time controls, but I'm having some trouble drawing out really good testing data. AI analysis of the results says there are games being dropped, maybe from the engines crashing.

Tentative hypothesis based on what I'm seeing: Stockfish is not a great engine for human analysis. Almost any other engine would be just about as good. Rodent IV, Rybka, Crafty, whatever. It doesn't see chess in a fundamentally different way; there's no attempt at drawing out holistic patterns beyond "find me a forcing move quickly".
It’s established you’re an engine programmer newbie, but you talk about “my engine” of which Stockfish is “only slightly stronger”. “My engine” is oxymoronic here.

It’s every engine newbie that tells us SF is somehow fundamentally flawed and that his new wonder approach (often padded out with BS, philosophical rambling and unprovable terminology, as here) is going to miraculously discover some deep patterns that SF (after aeons of development) is unable to see, through either idiocy or just ineptitude on the part of its developers.

Well, I call BS.
syzygy
Posts: 5825
Joined: Tue Feb 28, 2012 11:56 pm

Re: Hi... I'm working on engine research

Post by syzygy »

hgm wrote: Fri Jan 02, 2026 8:28 am
This is not just hypothetical. LC0 trained for high Elo plays like crap in Knight-odds games. Just training for high Elo leaves it without even the slightest idea of what you can best do when you are down a piece early in the game.
OK, I agree that finding the "best" move in a non-artificial position that is objectively clearly lost is an interesting and useful goal.

I could imagine having different networks for "undecided" and "decided" positions.
However, an alpha/beta-type engine might simply not be the best choice for finding moves that might trick a winning opponent into a draw. The better its search, the quicker it will realize that no move can prevent the position from going further downhill in the next 10-15 moves. So I doubt that loading a specially trained NNUE network into SF could do the trick.

So there is room for chess programmers to develop a new type of engine (insofar as Leela Odds hasn't already solved this problem). Perhaps one should try to train an LLM for this. Something that is inherently bad at deep tactical search.
syzygy
Posts: 5825
Joined: Tue Feb 28, 2012 11:56 pm

Re: Hi... I'm working on engine research

Post by syzygy »

Uri Blass wrote: Thu Jan 01, 2026 12:10 pm
For me different things are important and I would like to buy an engine that can force mate in a smaller number of moves in games with odds.
Stockfish is not the best, and old Wasp can mate faster against Stockfish in all my queen-odds tests.
To make SF convert a clearly won game into mate more quickly one could probably train a dedicated NNUE network.
Joerg Oster
Posts: 990
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany
Full name: Jörg Oster

Re: Hi... I'm working on engine research

Post by Joerg Oster »

syzygy wrote: Fri Jan 02, 2026 3:41 pm
Uri Blass wrote: Thu Jan 01, 2026 12:10 pm
For me different things are important and I would like to buy an engine that can force mate in a smaller number of moves in games with odds.
Stockfish is not the best, and old Wasp can mate faster against Stockfish in all my queen-odds tests.
To make SF convert a clearly won game into mate more quickly one could probably train a dedicated NNUE network.
Why do you think this is a matter of the NNUE net and not of the search?
Just curious.
Jörg Oster
syzygy
Posts: 5825
Joined: Tue Feb 28, 2012 11:56 pm

Re: Hi... I'm working on engine research

Post by syzygy »

Joerg Oster wrote: Fri Jan 02, 2026 3:58 pm
syzygy wrote: Fri Jan 02, 2026 3:41 pm
Uri Blass wrote: Thu Jan 01, 2026 12:10 pm
For me different things are important and I would like to buy an engine that can force mate in a smaller number of moves in games with odds.
Stockfish is not the best, and old Wasp can mate faster against Stockfish in all my queen-odds tests.
To make SF convert a clearly won game into mate more quickly one could probably train a dedicated NNUE network.
Why do you think this is a matter of the NNUE net and not of the search?
Just curious.
I am assuming this is not about actually finding the (shortest) mate but about more efficiently converting an easily won game into mate. So this is not about, say, disabling certain pruning techniques that cause SF to do a bit worse on mate finding but about generally giving higher evaluations to positions that are closer to mate. (At least this is how I understand Uri's observations.)

Perhaps there are search parameters in SF that are badly tuned for already decided positions, but that seems less likely than its evaluation not having been tuned to distinguish between 100% won positions on the basis of the remaining number of moves to actual mate.

Perhaps Uri could explain what SF is doing "wrong" in chess terms. Should SF just start simplifying the position into a trivially won endgame (which is what most humans would do but does not necessarily result in the actual fastest mate) or should SF try to force a quick mate while still in the middle game? I suppose it is the former?
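(The kind of evaluation shaping described here could look something like the following. This is a hypothetical training-target tweak for illustration only, not anything Stockfish's NNUE training actually does; `shaped_target`, `base` and `k` are made-up names.)

```python
# Hypothetical training target for positions known to be 100% won:
# instead of a flat 1.0, nudge the target upward the closer the
# position is to mate, so the evaluation gives the search a
# gradient to follow inside the "already decided" region.
def shaped_target(moves_to_mate, base=0.95, k=0.05):
    # base < target <= base + k; shorter mates score higher
    return base + k / (1 + moves_to_mate)
```

The point is only that the targets must differ between won positions at all; the exact shape of the nudge is a tuning question.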
Joerg Oster
Posts: 990
Joined: Fri Mar 10, 2006 4:29 pm
Location: Germany
Full name: Jörg Oster

Re: Hi... I'm working on engine research

Post by Joerg Oster »

syzygy wrote: Fri Jan 02, 2026 8:16 pm
Joerg Oster wrote: Fri Jan 02, 2026 3:58 pm
syzygy wrote: Fri Jan 02, 2026 3:41 pm
Uri Blass wrote: Thu Jan 01, 2026 12:10 pm
For me different things are important and I would like to buy an engine that can force mate in a smaller number of moves in games with odds.
Stockfish is not the best, and old Wasp can mate faster against Stockfish in all my queen-odds tests.
To make SF convert a clearly won game into mate more quickly one could probably train a dedicated NNUE network.
Why do you think this is a matter of the NNUE net and not of the search?
Just curious.
I am assuming this is not about actually finding the (shortest) mate but about more efficiently converting an easily won game into mate. So this is not about, say, disabling certain pruning techniques that cause SF to do a bit worse on mate finding but about generally giving higher evaluations to positions that are closer to mate. (At least this is how I understand Uri's observations.)

Perhaps there are search parameters in SF that are badly tuned for already decided positions, but that seems less likely than its evaluation not having been tuned to distinguish between 100% won positions on the basis of the remaining number of moves to actual mate.

Perhaps Uri could explain what SF is doing "wrong" in chess terms. Should SF just start simplifying the position into a trivially won endgame (which is what most humans would do but does not necessarily result in the actual fastest mate) or should SF try to force a quick mate while still in the middle game? I suppose it is the former?
No, SF doesn't make this distinction, neither during training of the net nor in the eval provided by the net. Afaik, that is.
I'm afraid this is in fact only a matter of the search.
Jörg Oster
syzygy
Posts: 5825
Joined: Tue Feb 28, 2012 11:56 pm

Re: Hi... I'm working on engine research

Post by syzygy »

Joerg Oster wrote: Fri Jan 02, 2026 9:40 pm
No, SF doesn't make this distinction, neither during training of the net nor in the eval provided by the net. Afaik, that is.
I'm afraid this is in fact only a matter of the search.
If the evaluation is not tuned to give higher evaluations to 100% winning positions that are closer to mate than other 100% winning positions, then how can the search know how to make progress? Of course SF's search is good enough that it will essentially always convert a 100% winning position eventually, but this could be the reason why current SF is perhaps worse at converting than the old HCE versions. (I'm not saying it is worse, I'm just trying to explain Uri's observations.) This is not a search issue, i.e. there is nothing in the search code that could improve this.
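(To make the "no gradient, no progress" point concrete: if every winning move gets the same score, the choice among them is arbitrary. A toy illustration with made-up move names and scores, not engine code.)

```python
# With a flat evaluation among won positions, max() cannot prefer
# the move that actually makes progress; it just keeps the first tie.
flat = {"simplify": 10.0, "shuffle": 10.0, "advance": 10.0}
best_flat = max(flat, key=flat.get)

# Shade the score by a (hypothetical) distance to mate and the tie
# breaks in favour of real progress.
moves_to_mate = {"simplify": 12, "shuffle": 40, "advance": 8}
shaded = {m: 10.0 - 0.01 * moves_to_mate[m] for m in flat}
best_shaded = max(shaded, key=shaded.get)  # "advance"
```

A deep enough search will still stumble onto mate eventually, which is why the problem shows up as slow conversion rather than failure to convert.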

(If I misunderstand Uri's observation and the observed problem is simply that SF less often reports the shortest mate than other engines, then this would very likely be a search issue. But solving it would cost Elo. I believe there are already enough SF forks that address this to improve mate finding at the cost of general playing strength, and people can use those forks.)