More accurate evaluation function leads to worse play?

xylist · Post by **xylist** » Thu Jul 17, 2025 4:44 am

Hi everyone. I've recently been trying to improve my engine's evaluation function to make it more accurate. To measure accuracy, I selected 50k quiet positions (side to move not in check, best move not a capture or check), analyzed them with both my engine and Stockfish at low depth, and compared the results with the static evaluation. I computed R^2 scores (coefficient of determination), and there is indeed an improvement (from 0.38 to 0.52). However, when I test the engine, the playing strength actually dropped (-180 Elo). This feels so counterintuitive. Shouldn't a more accurate evaluation function result in a gain in playing strength?
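
For reference, the R^2 measurement described above could be computed roughly as follows. This is only a sketch: the function name `r_squared` and the sample scores are made up for illustration, and it assumes R^2 means 1 - SS_res / SS_tot with the search score as the reference value.

```python
def r_squared(reference, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot

# Made-up centipawn scores for four positions (illustration only).
search_scores = [10, 40, -30, 80]   # low-depth search result per position
static_evals = [20, 30, -20, 70]    # static evaluation of the same positions

score = r_squared(search_scores, static_evals)

# The same evaluation with every score doubled ranks positions identically,
# yet this form of R^2 drops because it is sensitive to scale.
score_doubled = r_squared(search_scores, [2 * e for e in static_evals])
```

Note that R^2 in this form rewards matching the reference scores numerically, which is not the same thing as ordering positions correctly.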

benvining · Post by **benvining** » Thu Jul 17, 2025 5:30 am

Maybe not if the upgraded eval function takes way more time than the simpler one?

xylist · Post by **xylist** » Thu Jul 17, 2025 6:42 am

benvining wrote: ↑Thu Jul 17, 2025 5:30 am Maybe not if the upgraded eval function takes way more time than the simpler one?

The upgraded one is around 1.3x slower, so that shouldn't be a huge problem. However, I noticed that the branching factor has increased a little, and the engine is now looking at a lot more nodes. I have no idea why this is happening.

Bo Persson · Post by **Bo Persson** » Thu Jul 17, 2025 11:43 am

When you have a more "accurate" evaluation, you might also get more distinct scores. If you have scores 10, 10, 10 you can get cut-offs from "no improvement", but scores 11, 10, 12 might require more search to tell them apart.
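
A toy sketch of this effect, not taken from any real engine: plain alpha-beta over a small uniform tree, once with every leaf scoring 10 and once with all leaf scores distinct. The tree size, `flat_eval`, and `varied_eval` are hypothetical choices made only for the illustration.

```python
import math

BRANCH = 3  # moves per position (hypothetical toy value)
DEPTH = 4   # search depth in plies (hypothetical toy value)

def alphabeta(depth, alpha, beta, maximizing, path, leaf_eval, counter):
    """Plain alpha-beta on a uniform tree; counter[0] counts evaluated leaves."""
    if depth == 0:
        counter[0] += 1
        return leaf_eval(path)
    best = -math.inf if maximizing else math.inf
    for move in range(BRANCH):
        value = alphabeta(depth - 1, alpha, beta, not maximizing,
                          path + (move,), leaf_eval, counter)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:  # cut-off: "no improvement" is enough to stop here
            break
    return best

def flat_eval(path):
    """Coarse evaluation: every leaf gets the same score."""
    return 10

def varied_eval(path):
    """Fine-grained evaluation: all 81 leaves get distinct scores."""
    index = 0
    for move in path:
        index = index * BRANCH + move
    return (index * 37) % 81

def count_leaves(leaf_eval):
    counter = [0]
    alphabeta(DEPTH, -math.inf, math.inf, True, (), leaf_eval, counter)
    return counter[0]

flat_leaves = count_leaves(flat_eval)
varied_leaves = count_leaves(varied_eval)
print(flat_leaves, varied_leaves)
```

With identical scores the search evaluates only 17 of the 81 leaves, the minimum for this tree shape. With distinct scores and no move ordering it has to evaluate more of them before it can cut off.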

It is common to have to balance speed and "accuracy" in the program, and to realize that some evaluation terms might just be too expensive to compute. Getting the correct answer too late doesn't help.

Tibono · Post by **Tibono** » Thu Jul 17, 2025 6:35 pm

I think a better evaluation is one that your engine is more "comfortable" (i.e. efficient) with.
Getting closer to Stockfish's eval drifted it away from positions it manages best.
Just my 2 cents...

xylist · Post by **xylist** » Fri Jul 18, 2025 7:29 am

Tibono wrote: ↑Thu Jul 17, 2025 6:35 pm I think a better evaluation is one that your engine is more "comfortable" (i.e. efficient) with.
Getting closer to Stockfish's eval drifted it away from positions it manages best.
Just my 2 cents...

What do you mean by "positions it manages best"?

Tibono · Post by **Tibono** » Fri Jul 18, 2025 8:19 am

xylist wrote: ↑Fri Jul 18, 2025 7:29 am What do you mean by "positions it manages best"?

Like any engine, yours has strengths and weaknesses. I mean positions where it can show and rely on its strengths rather than expose itself to danger because of some potential weakness.
Tuning the eval towards SF's may lead to moves that are unnatural with regard to your engine's skills, IMHO.

Uri Blass · Post by **Uri Blass** » Fri Jul 18, 2025 11:21 am

Changing the evaluation may change the engine's search, so even simply multiplying the evaluation by 2 can reduce or improve playing strength because of different pruning, even if the evaluation has the same accuracy.
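
As a rough illustration, take futility pruning with a fixed margin. The post does not say which pruning is meant, so this technique, the margin of 100, and the sample numbers are all assumptions chosen for the example.

```python
FUTILITY_MARGIN = 100  # fixed margin in evaluation units (hypothetical value)

def is_futile(static_eval, alpha, margin=FUTILITY_MARGIN):
    """Prune when even static_eval plus the margin cannot reach alpha."""
    return static_eval + margin <= alpha

# One hypothetical node: static eval -50, alpha 30.
static_eval, alpha = -50, 30
pruned_original = is_futile(static_eval, alpha)

# Same node after doubling the evaluation. Alpha doubles as well, because it
# is derived from evaluation scores, but the margin constant stays the same.
pruned_doubled = is_futile(2 * static_eval, 2 * alpha)

print(pruned_original, pruned_doubled)  # False True
```

The evaluation ranks positions exactly as before, but a node that used to be searched is now pruned, so the search tree differs unless the margins are rescaled too.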

Uri Blass · Post by **Uri Blass** » Fri Jul 18, 2025 11:26 am

I can add that an evaluation that is more accurate overall can be less accurate on the question of whether position A is better than position B.

If an engine is too optimistic in every position, it may still judge correctly that position A is better than B, and judge it better than it would after you change the evaluation to be correct in only some of the cases.

The engine may now prefer a losing move over a drawing move that it knows is a draw, whereas earlier it preferred the drawing move because it scored it as +2 and the losing alternative only as +1.
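
The +2 / +1 example can be put into numbers with a small sketch. The "true" values below (0 for the draw, -1 for the losing move) are hypothetical, as are the names; only the +2 and +1 scores come from the example above.

```python
# Hypothetical true values of the two candidate moves.
true_value = {"drawing_move": 0, "losing_move": -1}

# Old evaluation: too optimistic everywhere (+2 on both moves).
old_eval = {"drawing_move": 2, "losing_move": 1}

# New evaluation: corrected for the draw only, still optimistic on the loss.
new_eval = {"drawing_move": 0, "losing_move": 1}

def squared_error(evals):
    """Sum of squared differences from the true values."""
    return sum((evals[m] - true_value[m]) ** 2 for m in true_value)

def preferred_move(evals):
    """Move with the highest evaluation score."""
    return max(evals, key=evals.get)

old_error, new_error = squared_error(old_eval), squared_error(new_eval)
old_choice, new_choice = preferred_move(old_eval), preferred_move(new_eval)
print(old_error, old_choice)  # 8 drawing_move
print(new_error, new_choice)  # 4 losing_move
```

The new evaluation has half the squared error of the old one, yet it now picks the losing move, because the uniform bias that kept the ordering intact was removed in only one of the two positions.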
