More accurate evaluation function leads to worse play?
Moderator: Ras
-
- Posts: 3
- Joined: Fri Feb 07, 2025 6:06 am
- Full name: Zhongle C. Qu
More accurate evaluation function leads to worse play?
Hi everyone. I've been trying to improve the evaluation function of my engine recently to make it more accurate. To measure the accuracy, I selected 50k quiet positions (side to move not in check, best move not a capture or check), analyzed them at low depth with both my engine and Stockfish, and compared those search scores with the static evaluation. I computed R^2 scores (coefficient of determination) and there is indeed an improvement (from 0.38 to 0.52). However, when I test the engine, the playing strength actually dropped (-180 Elo). This feels so counterintuitive. Shouldn't a more accurate evaluation function result in a gain in playing strength?
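For reference, the R^2 measurement described above could look roughly like this. It is only a minimal sketch, not the poster's actual script: the name `r_squared` and the sample scores are hypothetical, and it assumes the low-depth search scores are the targets and the static evaluations are the predictions, both in centipawns.

```python
def r_squared(targets, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_target = sum(targets) / len(targets)
    # Residual sum of squares: how far the predictions are from the targets.
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    # Total sum of squares: how far the targets are from their own mean.
    ss_tot = sum((t - mean_target) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


# Hypothetical centipawn scores for five quiet positions (illustration only).
search_scores = [35, -120, 10, 240, -15]  # low-depth search results (targets)
static_evals = [20, -80, 0, 150, 30]      # static evaluation (predictions)

print(round(r_squared(search_scores, static_evals), 3))
```

Note that this score only measures squared error against the reference scores. It says nothing directly about whether two positions are ranked in the right order relative to each other.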
-
- Posts: 31
- Joined: Fri May 30, 2025 10:18 pm
- Full name: Ben Vining
Re: More accurate evaluation function leads to worse play?
Maybe not if the upgraded eval function takes way more time than the simpler one?
-
- Posts: 3
- Joined: Fri Feb 07, 2025 6:06 am
- Full name: Zhongle C. Qu
Re: More accurate evaluation function leads to worse play?
The upgraded one is around 1.3x slower, so that shouldn't be a huge problem. However, I noticed that the branching factor has increased a little, and the engine is now searching a lot more nodes. I have no idea why this is happening.
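One way to put a number on the branching factor change is to compare node counts between consecutive iterations of iterative deepening. The sketch below assumes the engine can report how many nodes each iteration took; the function name and the node counts are hypothetical and are not measurements from the engine in this thread.

```python
def effective_branching_factors(nodes_per_depth):
    """Ratio of node counts between each pair of consecutive depths."""
    return [
        deeper / shallower
        for shallower, deeper in zip(nodes_per_depth, nodes_per_depth[1:])
    ]


# Hypothetical node counts for depths 1..4 (illustration only).
old_eval_nodes = [1_000, 2_500, 6_250, 15_625]
new_eval_nodes = [1_000, 3_000, 9_000, 27_000]

print(effective_branching_factors(old_eval_nodes))
print(effective_branching_factors(new_eval_nodes))
```

Because the tree grows roughly like the branching factor raised to the depth, even a small increase compounds: going from 2.5 to 3.0 is a factor of 1.2 per ply, which is about 6x more nodes at depth 10.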
-
- Posts: 259
- Joined: Sat Mar 11, 2006 8:31 am
- Location: Malmö, Sweden
- Full name: Bo Persson
Re: More accurate evaluation function leads to worse play?
When you have a more "accurate" evaluation, you might also get more distinct scores. If you have scores 10, 10, 10 you can get cut-offs from "no improvement", but scores 11, 10, 12 might require more search to tell them apart.
It is common to have to balance speed and "accuracy" in the program, and to realize that some evaluation terms might just be too expensive to compute. Getting the correct answer too late doesn't help.
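The cut-off effect from equal scores can be seen on a toy tree. The following is a minimal sketch of plain alpha-beta that counts visited leaves; the tree and its values are made up for illustration, and the coarse tree is simply the fine-grained one rounded to the nearest 10. A real engine's search is far more elaborate than this.

```python
def alphabeta(node, alpha, beta, maximizing, counter):
    """Plain alpha-beta on a nested-list tree; ints are leaf scores."""
    if isinstance(node, int):
        counter[0] += 1  # count every leaf that actually gets evaluated
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, counter))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # cut-off: the opponent will avoid this line
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, counter))
        beta = min(beta, best)
        if alpha >= beta:
            break  # cut-off: this move cannot improve on what we have
    return best


def search(tree):
    """Return (root score, number of leaves visited)."""
    counter = [0]
    score = alphabeta(tree, float("-inf"), float("inf"), True, counter)
    return score, counter[0]


# Hypothetical two-ply tree: three root moves, three replies each.
fine_tree = [[11, 12, 13], [12, 10, 14], [13, 14, 12]]
# The same tree with every score rounded to the nearest 10.
coarse_tree = [[10 * round(v / 10) for v in child] for child in fine_tree]

print(search(coarse_tree))  # every leaf is 10, so ties cut off early
print(search(fine_tree))    # distinct scores need more leaves resolved
```

On this toy tree the coarse scores need 5 leaf evaluations and the fine-grained scores need 8, because a reply that merely equals the current best is already enough to refute a move.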
-
- Posts: 129
- Joined: Sat Aug 01, 2015 6:16 pm
- Location: France
- Full name: Eric Bonneau
Re: More accurate evaluation function leads to worse play?
I think a better evaluation is one that your engine is more "comfortable" (i.e. efficient) with.
Moving the eval closer to Stockfish's pulled your engine away from the positions it handles best.
Just my 2 cents...
-
- Posts: 3
- Joined: Fri Feb 07, 2025 6:06 am
- Full name: Zhongle C. Qu
-
- Posts: 129
- Joined: Sat Aug 01, 2015 6:16 pm
- Location: France
- Full name: Eric Bonneau
Re: More accurate evaluation function leads to worse play?
Like any engine, yours has strengths and weaknesses. I mean positions where it can rely on its strengths rather than expose itself to danger through some potential weakness.
Tuning the eval towards SF's may lead to moves that are unnatural for your engine's skills, IMHO.
-
- Posts: 10803
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: More accurate evaluation function leads to worse play?
Changing the evaluation may change the search of the engine. Even simply multiplying the evaluation by 2 can reduce or improve playing strength because the pruning behaves differently, even though the evaluation has the same accuracy.
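As an illustration of how scaling alone can change pruning, here is a minimal sketch that assumes the engine uses futility pruning with a fixed centipawn margin. The thread does not say which pruning techniques the engine has, so the margin, the function name and the numbers are all hypothetical.

```python
FUTILITY_MARGIN = 100  # hypothetical fixed margin, in centipawns


def is_futile(static_eval, alpha, margin=FUTILITY_MARGIN):
    """Prune when even eval + margin cannot reach alpha."""
    return static_eval + margin <= alpha


# Hypothetical node: static eval -30, current alpha 50.
static_eval, alpha = -30, 50

# Original scale: -30 + 100 = 70 > 50, so the node is searched.
print(is_futile(static_eval, alpha))

# Eval multiplied by 2: alpha doubles too because it comes from eval
# scores, but the margin stays fixed: -60 + 100 = 40 <= 100, so pruned.
print(is_futile(2 * static_eval, 2 * alpha))
```

The positions are ranked exactly as before, yet the doubled scale makes the fixed margin effectively half as wide, so more nodes get pruned. Any margin or threshold tuned for the old scale is affected in the same way.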
-
- Posts: 10803
- Joined: Thu Mar 09, 2006 12:37 am
- Location: Tel-Aviv Israel
Re: More accurate evaluation function leads to worse play?
I can add that an evaluation that is more accurate overall can be less accurate on the question of whether position A is better than position B.
If an engine is too optimistic in every position, it may still judge whether A is better than B more reliably than after you change the evaluation to be correct in only some of the cases.
The engine may then prefer a losing move over a drawing move that it knows is a draw, whereas earlier it preferred the drawing move because it scored it as +2 and the losing alternative as only +1.
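The +2 versus +1 example can be worked through with numbers. In the sketch below the "true" values (0 for the draw, -5 for the loss) are assumptions picked for illustration, as are the names; only the +2 and +1 scores come from the post above.

```python
# Hypothetical "true" values: the draw is worth 0, the losing move -5.
true_value = {"drawing move": 0.0, "losing move": -5.0}

# Old eval: too optimistic everywhere, but it ranks the moves correctly.
optimistic_eval = {"drawing move": 2.0, "losing move": 1.0}

# New eval: fixed for the draw only; the losing move is still scored +1.
partly_fixed_eval = {"drawing move": 0.0, "losing move": 1.0}


def squared_error(evals):
    """Sum of squared differences from the assumed true values."""
    return sum((evals[move] - true_value[move]) ** 2 for move in true_value)


def preferred_move(evals):
    """The move the engine would choose: the highest-scored one."""
    return max(evals, key=evals.get)


for name, evals in [("optimistic", optimistic_eval),
                    ("partly fixed", partly_fixed_eval)]:
    print(name, squared_error(evals), preferred_move(evals))
```

Under these assumed values the partly fixed evaluation has the lower squared error (36 versus 40), so an error-based accuracy score improves, yet it is the one that picks the losing move.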