Devlog of Leorik

Discussion of chess software programming and technical issues.

Moderator: Ras

User avatar
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

It's time for another Devlog update! :)

For the upcoming version 2.4 I want to focus on the search again. So far I have two small changes that are uncontroversial:

1.) I had already implemented SEE a while ago, but only now managed to make it fast enough that it gives a strength increase when used in QSearch.
2.) I also added the condition that a position should at least have a static evaluation greater than alpha to be eligible for null move pruning (because the null move tries to find positions that are too good, it doesn't make much sense to even start it on positions that look terrible).
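
In code this boils down to something like the following (a rough sketch with placeholder names, not the exact Leorik implementation):

Code: Select all

//rough sketch (placeholder names, not the actual Leorik code):
//only bother with a null move when the static eval already beats alpha
bool nullMoveAllowed = !inCheck && remaining >= 2 && staticEval > alpha;
if (nullMoveAllowed)
{
    //...play the null move and verify with a reduced-depth, zero-window search...
}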

And that brought my attention to the other use-case of the static eval in my search:

Code: Select all

const int MAX_GAIN_PER_PLY = 70; //upper bound on the amount of cp you can hope to make good in a ply
const int FUTILITY_RANGE = 4;
             
bool interesting = playState.Stage == Stage.PV || inCheck || next.InCheck();
if (remaining <= FUTILITY_RANGE && !interesting)
{
	//if the static eval looks much worse than alpha also skip it
	float futilityMargin = alpha - remaining * MAX_GAIN_PER_PLY;
	if (next.RelativeScore(current.SideToMove) < futilityMargin)
		continue;
}
The idea is to ignore moves that result in a static evaluation way below alpha. I never really tuned the constants, so I created a version where I could change the values for 'MAX_GAIN_PER_PLY' and 'FUTILITY_RANGE' via the UCI setoption command to experiment.

The FUTILITY_RANGE proved interesting: no matter how much I increased it, the engine did not get weaker. In disbelief I ran a test against the 300 WAC positions at depth 15, and with the extreme case of FUTILITY_RANGE > 15 the number of solved positions dropped by 11. Not surprisingly, the engine had started to fail on positions where sacrifices are required. These counterintuitive moves cause a bad static eval and thus get skipped without further evaluation.

Here is an example of what Leorik could no longer solve:

[fen]1nbq1r1k/3rbp1p/p1p1pp1Q/1p6/P1pPN3/5NP1/1P2PPBP/R4RK1 w - -[/fen]
The best move is to sacrifice the Knight with 'Nfg5'.

But in actual play this version didn't do so badly, so I thought I should give it a real chance. After a few iterations the code looked like this:

Code: Select all

if (!interesting && !Evaluation.IsCheckmate(Score))
{
	//if the static eval looks much worse than alpha also skip it
	int margin = playState.Stage == Stage.Quiets ? _options.LateFutilityMargin : _options.FutilityMargin;
	float futilityMargin = alpha - remaining * margin;
	if (next.RelativeScore(current.SideToMove) < futilityMargin)
		continue;
}
Now I had two different values to tune (one for late quiet moves and one for the rest), and it seemed like a perfect opportunity to try https://github.com/kiudee/chess-tuning-tools, which I had bookmarked a while ago. After some problems getting it to run, it worked as advertised: I gave it the task of finding the best values for LateFutilityMargin and FutilityMargin as used in the above code snippet, and it produced correct results and pretty pictures.
Image
...and the predicted Elo gain was actually reproducible, both in self-play matches
Image
and in a gauntlet against other engines.

I could be very happy about the +20 Elo gain if it didn't come with the above-described weakness of making the engine much worse at reasoning about one of the coolest types of moves: sacrifices!

Since then I have tried to find a way of keeping the strength increase from skipping *all* bad-looking moves regardless of depth while also retaining the ability to solve all the puzzles that Leorik could solve before the change. Sadly, so far I haven't found the right approach, and I'm starting to fear these goals are mutually exclusive.

Suggestions are welcome! :)
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
User avatar
Ras
Posts: 2695
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Devlog of Leorik

Post by Ras »

lithander wrote: Sun Jan 29, 2023 2:13 am 2.) I also added the condition that a position should at least have a static evaluation greater than alpha to be eligible for null move pruning (because the null move tries to find positions that are too good, it doesn't make much sense to even start it on positions that look terrible).
I think testing against beta would make even more sense because "too good" means that it would fall under beta pruning anyway, but only after investing more computing time.
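
In code that would amount to something like (sketch, placeholder names):

Code: Select all

//sketch: gate the null move on the static eval reaching beta, where a
//fail-high (and thus a successful null-move cutoff) is actually plausible
bool nullMoveAllowed = !inCheck && remaining >= 2 && staticEval >= beta;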
Rasmus Althoff
https://www.ct800.net
User avatar
j.t.
Posts: 263
Joined: Wed Jun 16, 2021 2:08 am
Location: Berlin
Full name: Jost Triller

Re: Devlog of Leorik

Post by j.t. »

lithander wrote: Sun Jan 29, 2023 2:13 am Suggestions are welcome! :)
Not something about the sacrifice dilemma, but another idea:

One thing you could try is not only skipping moves for which the static eval falls below the futility margin, but also reducing the search depth for moves by the amount that would still be allowed. E.g. if a specific position would be skipped because of its bad static eval at depth 4, but not at depth 5, then instead of not skipping at depth 5 you could reduce the search to depth 1.
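
A rough sketch of that idea, reusing the names from the snippet in the first post (so MAX_GAIN_PER_PLY etc. are carried-over assumptions, not actual engine code):

Code: Select all

//sketch: instead of skipping outright, reduce by the largest depth at
//which the futility condition would still have pruned this move
int deficit = (int)(alpha - next.RelativeScore(current.SideToMove));
if (!interesting && deficit > 0)
{
    //largest 'remaining' for which staticEval < alpha - remaining * MAX_GAIN_PER_PLY holds
    int prunedDepth = (deficit - 1) / MAX_GAIN_PER_PLY;
    if (prunedDepth >= remaining)
        continue;                 //would have been pruned at this depth anyway: skip
    remaining -= prunedDepth;     //otherwise: search the move, but shallower
}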
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Devlog of Leorik

Post by Mike Sherwin »

lithander wrote: Sun Jan 29, 2023 2:13 am Suggestions are welcome! :)
It has been a long time since I have thought about this problem. The solution, I believe, lies in understanding how strong human players deal with it. When a strong human considers making a sacrifice they switch from 'normal' thinking about the position: they do not value material as much and instead value activity more. It is my opinion that an engine can do the same. If the first move is a sacrifice then increase the positional values like mobility and king safety or whatever else in the eval is appropriate.

Another approach might be to increase margins if the first (or, if depth > n, the current) move loses material (see the sketch at the end of this post).

And yet another idea would be to do a shallower search with all pruning turned down, compare that score with the score of a normal-depth search, and pick the better-scoring move.

Also, extending a line that loses material but has a better positional score.

Or just not pruning a line that has positional compensation for the material.

And so on and so on.
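
To illustrate the margin idea from above in the style of the earlier snippets (purely illustrative; the flag and the factor are made up):

Code: Select all

//illustrative sketch: widen the futility margin for lines that start with
//a material sacrifice, so such lines are pruned less eagerly
int margin = remaining * MAX_GAIN_PER_PLY;
if (lineStartsWithSacrifice)       //e.g. SEE of the line's first move < 0
    margin *= 2;                   //made-up factor: prune sacrifice lines less
float futilityMargin = alpha - margin;
if (next.RelativeScore(current.SideToMove) < futilityMargin)
    continue;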
JoAnnP38
Posts: 253
Joined: Mon Aug 26, 2019 4:34 pm
Location: Clearwater, Florida USA
Full name: JoAnn Peeler

Re: Devlog of Leorik

Post by JoAnnP38 »

Mike Sherwin wrote: Sun Jan 29, 2023 9:00 am
lithander wrote: Sun Jan 29, 2023 2:13 am Suggestions are welcome! :)
It has been a long time since I have thought about this problem. The solution, I believe, lies in understanding how strong human players deal with it. When a strong human considers making a sacrifice they switch from 'normal' thinking about the position: they do not value material as much and instead value activity more. It is my opinion that an engine can do the same. If the first move is a sacrifice then increase the positional values like mobility and king safety or whatever else in the eval is appropriate.
Isn't that what your evaluation function provides, though? If you are going to sacrifice a piece or pawn you have to gain something in return (i.e. positional value: mobility, center control, attack on the enemy king, etc.). To me at least, it seems like a well-weighted evaluation would support a sacrifice of material for position provided the gain was significant. And if the sacrifice led to a forced checkmate (or a draw if you are behind) then the search should theoretically be able to work that out. Unfortunately, what I am now learning is that selectivity often eliminates moves from the search that might lead to such a beneficial sacrifice, so it isn't even considered. It would be very difficult for most engines to sacrifice a piece that leads to a checkmate 52 moves later (or in Pedantic's case sometimes as few as 4 moves later :oops: ). So instead of increasing positional values as you suggest, it's almost like you need less selectivity in cases where a sacrifice might lead to a beneficial position, or perhaps some adjustment to move ordering?
Mike Sherwin
Posts: 965
Joined: Fri Aug 21, 2020 1:25 am
Location: Planet Earth, Sol system
Full name: Michael J Sherwin

Re: Devlog of Leorik

Post by Mike Sherwin »

JoAnnP38 wrote: Sun Jan 29, 2023 11:07 pm
So instead of increasing positional values as you suggest, it's almost like you need less selectivity in cases where a sacrifice might lead to a beneficial position, or perhaps some adjustment to move ordering?
Why do you think that increasing positional values leads to more selectivity? It just changes what the selectivity is triggered by. The larger the positional values are, the less likely it is that a material-deficient line will be pruned when there is positional compensation.
Pio
Posts: 335
Joined: Sat Feb 25, 2012 10:42 pm
Location: Stockholm

Re: Devlog of Leorik

Post by Pio »

lithander wrote: Sun Jan 29, 2023 2:13 am Since then I have tried to find a way of keeping the strength increase from skipping *all* bad-looking moves regardless of depth while also retaining the ability to solve all the puzzles that Leorik could solve before the change. Sadly, so far I haven't found the right approach, and I'm starting to fear these goals are mutually exclusive.

Suggestions are welcome! :)
I would try to reduce bad captures less at higher depths (closer to the root), for two reasons. Firstly, reducing bad captures less near the root than near the leaves lowers the branching factor, because you simplify the position sooner and on average reduce the number of legal moves. Secondly, it doesn't cost much to reduce bad captures more near the leaves (low remaining depth), since the horizon is too close for the move to prove itself before the search returns.
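
As a rough sketch (hypothetical names; the SEE check and the reduction amounts are just placeholders):

Code: Select all

//sketch: reduce bad captures (negative SEE) harder near the leaves than
//near the root, where their compensation still has room to show up
int reduction = 0;
if (isBadCapture && !interesting)              //e.g. SEE of the move < 0
    reduction = remaining <= 4 ? 2 : 1;        //placeholder amounts
int score = -Search(next, remaining - 1 - reduction, -beta, -alpha);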

Good luck!
/Pio
User avatar
lithander
Posts: 915
Joined: Sun Dec 27, 2020 2:40 am
Location: Bremen, Germany
Full name: Thomas Jahn

Re: Devlog of Leorik

Post by lithander »

j.t. wrote: Sun Jan 29, 2023 4:22 am One thing you could try is not only skipping moves for which the static eval falls below the futility margin, but also reducing the search depth for moves by the amount that would still be allowed. E.g. if a specific position would be skipped because of its bad static eval at depth 4, but not at depth 5, then instead of not skipping at depth 5 you could reduce the search to depth 1.
Pio wrote: Mon Jan 30, 2023 11:30 pm I would try to reduce bad captures less at higher depths (closer to the root), for two reasons. Firstly, reducing bad captures less near the root than near the leaves lowers the branching factor, because you simplify the position sooner and on average reduce the number of legal moves. Secondly, it doesn't cost much to reduce bad captures more near the leaves (low remaining depth), since the horizon is too close for the move to prove itself before the search returns.
Currently I'm not reducing bad captures at all. In QSearch I skip them completely; in the main search all captures are MVV-LVA sorted but otherwise treated the same. But what both of you suggest makes sense: I should probably just reduce such moves rather than skip them entirely.

By now I have spent a dozen hours on the problem described in the last post. I was trying different ways of computing the futility margin to retain some of the performance gains of the aggressively pruning version without losing too much accuracy on the WAC test set. I've tried lookup tables for the margins based on the remaining depth (plies yet to search). I've also tried polynomials of the form margin(ply) = a * ply^2 + b * ply + c.
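
For illustration, the polynomial variant looked roughly like this (the coefficients here are placeholders, not the values I actually tried):

Code: Select all

//margin(ply) = a * ply^2 + b * ply + c, based on the remaining depth
//(placeholder coefficients)
const int A = 5, B = 60, C = 25;
int futilityMargin = alpha - (A * remaining * remaining + B * remaining + C);
if (next.RelativeScore(current.SideToMove) < futilityMargin)
    continue;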

To help me tune the margin formula I compiled a version that does not prune but instead logs to the console whenever it would have mispredicted and pruned erroneously. The resulting log looks scary. A careful selection of margins and preconditions (no checks, no mate-score, not in the endgame, only when <X plies remaining...) can only attempt to minimize the damage but won't make it entirely safe. And the nodes-to-depth quickly increases with every safety measure until the entire endeavor is no longer worth it.
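
The instrumented version is essentially the same condition, but it logs instead of pruning (a sketch with a placeholder Search call):

Code: Select all

//sketch: search the move anyway, but log whenever the futility condition
//would have pruned a move that actually ends up raising alpha
bool wouldPrune = !interesting &&
    next.RelativeScore(current.SideToMove) < alpha - remaining * MAX_GAIN_PER_PLY;
int score = -Search(next, remaining - 1, -beta, -alpha);
if (wouldPrune && score > alpha)
    Console.WriteLine($"Misprediction! depth={remaining} score={score} alpha={alpha}");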

What I realized only through these experiments: any reasonable margin would still be capable of missing some really good moves. A mate in one can be missed just because the move before the checkmate results in a position that doesn't beat alpha when statically evaluated! Or the pruning misjudges the situation because it isn't aware of 3-fold repetitions. Or the transposition table could have returned a better value for the move, but it was never looked up. So many ways this could go wrong... and only the fact that you get to search a thinner tree to a deeper depth makes it a potentially winning trade. But how many blunders will it cause? Is it still a net win at the long time controls of tournaments?

I'm tempted to remove it entirely and live with the ensuing Elo loss.
Last edited by lithander on Sat Feb 04, 2023 2:45 am, edited 1 time in total.
Minimal Chess (simple, open source, C#) - Youtube & Github
Leorik (competitive, in active development, C#) - Github & Lichess
JVMerlino
Posts: 1396
Joined: Wed Mar 08, 2006 10:15 pm
Location: San Francisco, California

Re: Devlog of Leorik

Post by JVMerlino »

lithander wrote: Sun Jan 29, 2023 2:13 am Here is an example of what Leorik could no longer solve:

[fen]1nbq1r1k/3rbp1p/p1p1pp1Q/1p6/P1pPN3/5NP1/1P2PPBP/R4RK1 w - -[/fen]
The best move is to sacrifice the Knight with 'Nfg5'.
Ah yes, WAC 293, which is a similar thorn in Myrddin's side. Current Myrddin finds it at d15 in about 14 seconds with one CPU. Normally I test WAC at 5s per position. So this is one of the four that are not found within the time limit.
User avatar
j.t.
Posts: 263
Joined: Wed Jun 16, 2021 2:08 am
Location: Berlin
Full name: Jost Triller

Re: Devlog of Leorik

Post by j.t. »

lithander wrote: Sat Feb 04, 2023 2:39 am The resulting log looks scary. A careful selection of margins and preconditions (no checks, no mate-score, not in the endgame, only when <X plies remaining...) can only attempt to minimize the damage but won't make it entirely safe. And the nodes-to-depth quickly increases with every safety measure until the entire endeavor is no longer worth it.

What I realized only through these experiments: any reasonable margin would still be capable of missing some really good moves. A mate in one can be missed just because the move before the checkmate results in a position that doesn't beat alpha when statically evaluated! Or the pruning misjudges the situation because it isn't aware of 3-fold repetitions. Or the transposition table could have returned a better value for the move, but it was never looked up. So many ways this could go wrong... and only the fact that you get to search a thinner tree to a deeper depth makes it a potentially winning trade. But how many blunders will it cause? Is it still a net win at the long time controls of tournaments?

I'm tempted to remove it entirely and live with the ensuing Elo loss.
Don't you have a similar problem quite generally every time you stop the search before the game is over? Futility pruning just cuts the tree at exactly those nodes that are statistically less likely to be in any PV, in exchange for more search depth for the lines that probably matter.

For preconditions, I found that futility pruning only works well if I'm in a null-window search with no checks (not in check and the move doesn't give check; I do my futility reductions in the move loop).
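
In code the gate might look roughly like this (a sketch with placeholder names, not actual engine code):

Code: Select all

//sketch of the preconditions: only consider futility in null-window nodes,
//never when in check and never for moves that give check
bool nullWindow = beta - alpha == 1;
bool futilityCandidate = nullWindow && !inCheck && !next.InCheck();
//the futility reduction in the move loop then only applies to futilityCandidate moves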