Engine evaluation consistency/stability


gordonr
Posts: 194
Joined: Thu Aug 06, 2009 8:04 pm
Location: UK

Engine evaluation consistency/stability

Post by gordonr »

Hi,

Sometimes when I'm analysing a position with Stockfish, the evaluation can vary quite significantly while stepping into the PV. Of course, I don't expect the eval to stay the same since, after all, Stockfish may then be searching the subposition to a different depth, etc. However, my question is: do some engines tend to have a more consistent/stable evaluation than others when doing one step into the PV?

cheers
Gordon
Ferdy
Posts: 4833
Joined: Sun Aug 10, 2008 3:15 pm
Location: Philippines

Re: Engine evaluation consistency/stability

Post by Ferdy »

gordonr wrote: Wed Oct 09, 2019 1:15 pm Hi,

Sometimes when I'm analysing a position with Stockfish, the evaluation can vary quite significantly while stepping into the PV. Of course, I don't expect the eval to stay the same since, after all, Stockfish may then be searching the subposition to a different depth, etc. However, my question is: do some engines tend to have a more consistent/stable evaluation than others when doing one step into the PV?

cheers
Gordon
I did a test some months ago. Search for EARS, or engine analysis reliability score. I think I did a 6-ply comparison. Source code is available, so you can do some experiments with the latest engines.
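For anyone who wants a quick number before digging into EARS, here is a minimal sketch of one way to quantify how much the eval moves while stepping along the PV. This is not the EARS method; the function name maxPvDrift and the input layout are assumptions made for illustration. It assumes you have already collected one centipawn score per step, each from the side to move's point of view as UCI engines report it, and it ignores mate scores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper, not taken from any engine or from EARS.
// scores[0] is the score at the root position; scores[i] is the score
// reported after stepping i plies into the PV. Every score is in
// centipawns from the side to move's point of view.
// Returns the largest absolute difference from the root score once all
// scores are converted to the root side's point of view.
int maxPvDrift(const std::vector<int>& scores) {
    assert(!scores.empty());  // at least the root score is required
    int worst = 0;
    for (std::size_t ply = 1; ply < scores.size(); ++ply) {
        // After an odd number of plies the opponent is to move,
        // so the sign has to be flipped before comparing.
        int fromRootSide = (ply % 2 == 1) ? -scores[ply] : scores[ply];
        int drift = std::abs(fromRootSide - scores[0]);
        if (drift > worst)
            worst = drift;
    }
    return worst;
}
```

A smaller result means a steadier eval along the PV. To compare engines fairly you would probably want the same positions and the same time or depth per step for each one.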
jdart
Posts: 4366
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: Engine evaluation consistency/stability

Post by jdart »

I prefer Houdini for most analysis because its eval doesn't bounce around so much.
Jouni
Posts: 3281
Joined: Wed Mar 08, 2006 8:15 pm

Re: Engine evaluation consistency/stability

Post by Jouni »

In TCEC, SF has many games (as White and as Black) with a 0.00 evaluation for the entire game! Some games are over 100 moves. Will chess soon be solved?
Jouni
Uri
Posts: 473
Joined: Thu Dec 27, 2007 9:34 pm

Re: Engine evaluation consistency/stability

Post by Uri »

Jouni wrote: Wed Oct 09, 2019 4:05 pm In TCEC, SF has many games (as White and as Black) with a 0.00 evaluation for the entire game! Some games are over 100 moves. Will chess soon be solved?
I believe that chess is still very far away from being solved. Even in the year 4020 (we are now in the year 2020) chess would still not be completely solved.

You see, chess is so complex that engines still have many weaknesses in their understanding and knowledge of the game compared to humans.
gordonr
Posts: 194
Joined: Thu Aug 06, 2009 8:04 pm
Location: UK

Re: Engine evaluation consistency/stability

Post by gordonr »

Thanks everyone for their help. Ferdy, I found your excellent post. Very interesting and useful indeed.

viewtopic.php?f=2&t=70151&p=792684&hilit=ears#p792684
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Engine evaluation consistency/stability

Post by Dann Corbit »

gordonr wrote: Wed Oct 09, 2019 1:15 pm Hi,

Sometimes when I'm analysing a position with Stockfish, the evaluation can vary quite significantly while stepping into the PV. Of course, I don't expect the eval to stay the same since, after all, Stockfish may then be searching the subposition to a different depth, etc. However, my question is: do some engines tend to have a more consistent/stable evaluation than others when doing one step into the PV?

cheers
Gordon
You can fix the Stockfish "sewing machine" (the eval jumping up and down as fail-high/fail-low lines are printed) with this simple change:

In ucioption.cpp, add a new option (set it to false from the GUI to turn the output off):

    o["Show Fail High and Fail Low"] << Option(true);

In search.cpp:

    bool bSewingMachine = Options["Show Fail High and Fail Low"];

Then add the flag to the condition that prints the fail-high/fail-low update:

    // When failing high/low give some update (without cluttering
    // the UI) before a re-search.
    if (   mainThread
        && multiPV == 1
        && (bestValue <= alpha || bestValue >= beta)
        && Time.elapsed() > 3000
        && bSewingMachine)
        sync_cout << UCI::pv(rootPos, rootDepth, alpha, beta) << sync_endl;
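To make the effect of the extra flag easier to see outside the engine, here is a stand-alone restatement of that condition as a plain function. It is only a sketch: the struct BoundReportState and the function shouldPrintBoundUpdate are hypothetical names that do not exist in Stockfish, and the fields simply mirror the variables used in the snippet above.

```cpp
// Hypothetical stand-alone types; none of these names come from Stockfish.
struct BoundReportState {
    bool mainThread;       // only the main thread reports to the GUI
    int multiPV;           // number of PV lines requested
    int bestValue;         // score returned by the aspiration search
    int alpha;             // lower bound of the aspiration window
    int beta;              // upper bound of the aspiration window
    long long elapsedMs;   // time spent searching so far, in milliseconds
    bool showFailHighLow;  // value of the new UCI option
};

// Returns true when a fail-high/fail-low update would be printed.
bool shouldPrintBoundUpdate(const BoundReportState& s) {
    // The search failed if the score landed on or outside the window.
    bool failed = s.bestValue <= s.alpha || s.bestValue >= s.beta;
    return s.mainThread
        && s.multiPV == 1
        && failed
        && s.elapsedMs > 3000
        && s.showFailHighLow;
}
```

With showFailHighLow set to false the function never returns true, which corresponds to the GUI no longer receiving the intermediate bound lines; the final PV for each depth would still be printed by the rest of the search code.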

Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Engine evaluation consistency/stability

Post by Ovyron »

Jouni wrote: Wed Oct 09, 2019 4:05 pm In TCEC, SF has many games (as White and as Black) with a 0.00 evaluation for the entire game! Some games are over 100 moves. Will chess soon be solved?
Chess isn't close at all to being solved. People have claimed that they can produce perfect chess moves on the fly, but if this were true, they could make an opening book in which their moves were played up to the point where an unassisted engine could draw the game from there at bullet chess. That this hasn't been done, and that bullet chess is still far from all draws, means those people still have to work hard to produce "perfect chess", and the only reason they haven't lost yet is that they haven't played enough games.
Your beliefs create your reality, so be careful what you wish for.
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Engine evaluation consistency/stability

Post by Laskos »

Hmmm, no word about Leela?
It is stable both over time (search) and along the PV, as long as there are not a lot of tactics. Compared to the AB engines I know, it is much more stable. And for tactics, AB engines are complementary to Leela: they find them and stick to them.
Ovyron
Posts: 4556
Joined: Tue Jul 03, 2007 4:30 am

Re: Engine evaluation consistency/stability

Post by Ovyron »

Laskos wrote: Thu Oct 10, 2019 11:26 am Hmmm, no word about Leela?
It is stable both over time (search) and along the PV, as long as there are not a lot of tactics.
Or one big tactic. But you never know, the analysis will become inconsistent once she sees it, so it's not stable.
Your beliefs create your reality, so be careful what you wish for.