A bizarre evaluation.

hgm
Posts: 28499
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: A bizarre evaluation.

Post by hgm »

Whether or not these terms would be real if the conditions they are based on were to persist is one thing. But when the opponent has the move and can wipe out their entire basis in a single move by castling, counting them is just asking for trouble.

There really isn't much difference between this kind of evaluation and forgoing QS and counting a hanging Bishop among your material assets when the opponent has the move. 'Advantages' that the opponent will make disappear with his move should just not be counted. The larger they are, the bigger the damage this will do, of course. Have you ever tried to run the engine without QS, and looked at what that does to the branching factor? Sure, when you search deep enough it will probably find the right move. But it won't be cheap!
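To make the point concrete, here is a minimal Python sketch of an evaluation that refuses to count an "advantage" the opponent can remove with the move he is about to play. The `Position` fields, the 150-centipawn bonus, and both function names are hypothetical stand-ins, not code from any engine discussed here.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """Toy position summary; every field is a hypothetical stand-in for real board state."""
    material: int            # centipawns, from White's point of view
    white_threat_bonus: int  # bonus for White threats that depend on the current setup
    black_can_defuse: bool   # True if one Black move (e.g. castling) removes the basis
    white_to_move: bool


def naive_eval(pos: Position) -> int:
    """Counts the threat bonus no matter who is to move."""
    return pos.material + pos.white_threat_bonus


def tempo_aware_eval(pos: Position) -> int:
    """Drops the threat bonus when the opponent is to move and can defuse it at once."""
    if not pos.white_to_move and pos.black_can_defuse:
        return pos.material
    return pos.material + pos.white_threat_bonus


pos = Position(material=0, white_threat_bonus=150,
               black_can_defuse=True, white_to_move=False)
print(naive_eval(pos))        # 150
print(tempo_aware_eval(pos))  # 0
```

The naive version reports a 150-centipawn edge that vanishes after one reply, which is the same kind of error as counting a hanging Bishop without QS.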
lkaufman
Posts: 6297
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: A bizarre evaluation.

Post by lkaufman »

hgm wrote: Whether or not these terms would be real if the conditions they are based on were to persist is one thing. But when the opponent has the move and can wipe out their entire basis in a single move by castling, counting them is just asking for trouble.

There really isn't much difference between this kind of evaluation and forgoing QS and counting a hanging Bishop among your material assets when the opponent has the move. 'Advantages' that the opponent will make disappear with his move should just not be counted. The larger they are, the bigger the damage this will do, of course. Have you ever tried to run the engine without QS, and looked at what that does to the branching factor? Sure, when you search deep enough it will probably find the right move. But it won't be cheap!
I completely agree with your comments. This huge weighting of a temporary and easily remedied thing like "threatening" to make a spite check or a silly queen sac is illogical and does seem to be asking for trouble. But both Stockfish and Komodo have found that drastically lowering the weights on terms like this hurts Elo. So what would you propose that we do?
Komodo rules!
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: A bizarre evaluation.

Post by cdani »

lkaufman wrote: I completely agree with your comments. This huge weighting of a temporary and easily remedied thing like "threatening" to make a spite check or a silly queen sac is illogical and does seem to be asking for trouble. But both Stockfish and Komodo have found that drastically lowering the weights on terms like this hurts Elo. So what would you propose that we do?
Those highly weighted threats probably help mostly with move ordering, e.g. in a typical position where a check wins a piece. If you find a way outside the evaluation to achieve the same result, you will have a better, more human-readable evaluation, and maybe even a stronger engine.

In fact I have a more evaluated quiet move ordering function, exactly to avoid the need for some things in the evaluation function.
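As a rough illustration of what moving threat knowledge out of the evaluation and into quiet move ordering could look like, here is a small Python sketch. The `QuietMove` flags, the bonus values, and the function names are invented for this example; they are not taken from any particular engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuietMove:
    """Toy quiet move; the flags are hypothetical values a real engine would compute."""
    name: str
    gives_check: bool = False
    attacks_undefended_piece: bool = False
    history_score: int = 0  # e.g. from a history heuristic table


# Illustrative weights only, not tuned values.
CHECK_BONUS = 400
THREAT_BONUS = 250


def ordering_score(move: QuietMove) -> int:
    """Score used only for sorting moves; it never enters the static evaluation."""
    score = move.history_score
    if move.gives_check:
        score += CHECK_BONUS
    if move.attacks_undefended_piece:
        score += THREAT_BONUS
    return score


def order_quiet_moves(moves):
    """Return quiet moves sorted so that the most forcing ones are searched first."""
    return sorted(moves, key=ordering_score, reverse=True)


moves = [
    QuietMove("h3", history_score=120),
    QuietMove("Nd5", attacks_undefended_piece=True, history_score=50),
    QuietMove("Bb5+", gives_check=True, history_score=10),
]
print([m.name for m in order_quiet_moves(moves)])  # ['Bb5+', 'Nd5', 'h3']
```

The checks and threats still get searched early, but the score at the leaf is no longer inflated by them.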
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: A bizarre evaluation.

Post by cdani »

cdani wrote: In fact I have a more evaluated quiet move ordering function, exactly to avoid the need for some things in the evaluation function.
I mean "In fact I have a more elaborate quiet move ordering function".
Cardoso
Posts: 363
Joined: Thu Mar 16, 2006 7:39 pm
Location: Portugal
Full name: Alvaro Cardoso

Re: A bizarre evaluation.

Post by Cardoso »

cdani wrote:
lkaufman wrote: The point is that the position after 7.Qb3 is given an absurd evaluation by both engines. Never mind what search leads to this position, shouldn't the evaluation be a reasonable one? Any human who evaluated such a position as easily winning for White would be called a moron. Why should such poor evaluation work for engines? Yet it does.
One idea that comes to my mind is that an engine is a "function" of search + static evaluation, and at depth 1 you are basically seeing only the static evaluation.

But I agree with you that this evaluation is absurd. My idea, which I have already expressed in other words elsewhere, concerns an intrinsic weakness of current engines: they have a bad evaluation function because it is very limited in parameters and algorithms. So I expect that in the coming years improvements will come mostly from the evaluation function, not the search, because the latter already resolves most tactical issues for the best engines.
I've been thinking the same for quite some time. Search enhancements are well known, and after so much effort spent going so deep in the search, it would be a good thing to evaluate realistically at the tips. Unfortunately that is very, very hard to do, harder than programming the search. Maybe that's why we call the eval "static": it has few or no dynamic elements, whereas a realistic evaluation should have dynamic features.
I remember the Deep Blue team saying they could increase Deep Blue's nps by a factor of 2 by tuning the software alone, so they might have reached 400 million nodes per second, but they felt their time and effort were much better spent on the evaluation function, and they were right.
I think there is much more to invent in the evaluation function than in the search (software-wise).
Going slightly off topic, they treated the 4-ply hardware search + hardware eval as an evaluation function, so we can say their eval did have dynamic features up to a point. It was so sad IBM decided to stop the project, since Feng-hsiung Hsu wanted to implement a hash table in hardware. Going further, he also intended to form an independent start-up to design a chess chip for consumers that could beat the world champion using a desktop PC by the year 2000. So surely he intended to have a considerably deeper search in hardware, and I believe he could have done it; Feng was really competent at his job. Just as a curiosity, the 1997 chess chips had two evaluation functions: a fast one and a slow one. The fast one could be executed in 3 cycles and the slow one in 11 cycles. I wonder how far he could have gone if IBM had continued their support.
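For readers wondering how a fast and a slow evaluation can coexist, here is a Python sketch of one common software scheme, usually called lazy evaluation. This is an assumption for illustration only: I am not claiming the chip combined its two evaluations exactly this way, and the feature names and the 200-centipawn margin are made up.

```python
# Assumed bound on how much the slow terms can change the score (centipawns).
LAZY_MARGIN = 200


def fast_eval(features: dict) -> int:
    """Cheap terms only; hypothetical feature names."""
    return features["material"] + features["piece_square"]


def slow_eval(features: dict) -> int:
    """Expensive positional terms; hypothetical feature names."""
    return features["king_safety"] + features["pawn_structure"]


def evaluate(features: dict, alpha: int, beta: int):
    """Return (score, used_slow). Skip the slow terms when they cannot matter.

    If the fast score is so far outside the (alpha, beta) window that even
    LAZY_MARGIN could not bring it back inside, the slow part is not computed.
    """
    score = fast_eval(features)
    if score + LAZY_MARGIN <= alpha or score - LAZY_MARGIN >= beta:
        return score, False
    return score + slow_eval(features), True


lopsided = {"material": 900, "piece_square": 20, "king_safety": -30, "pawn_structure": 10}
balanced = {"material": 0, "piece_square": 15, "king_safety": -30, "pawn_structure": 10}
print(evaluate(lopsided, -50, 50))  # (920, False)
print(evaluate(balanced, -50, 50))  # (-5, True)
```

The trade-off is that the margin must really bound the slow terms; if it does not, the shortcut returns scores the full evaluation would have contradicted.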

Alvaro
lkaufman
Posts: 6297
Joined: Sun Jan 10, 2010 6:15 am
Location: Maryland USA
Full name: Larry Kaufman

Re: A bizarre evaluation.

Post by lkaufman »

cdani wrote:
lkaufman wrote: I completely agree with your comments. This huge weighting of a temporary and easily remedied thing like "threatening" to make a spite check or a silly queen sac is illogical and does seem to be asking for trouble. But both Stockfish and Komodo have found that drastically lowering the weights on terms like this hurts Elo. So what would you propose that we do?
Those highly weighted threats probably help mostly with move ordering, e.g. in a typical position where a check wins a piece. If you find a way outside the evaluation to achieve the same result, you will have a better, more human-readable evaluation, and maybe even a stronger engine.

In fact I have a more evaluated quiet move ordering function, exactly to avoid the need for some things in the evaluation function.
Thanks, I think that is a very astute comment, and one that we should explore.
Komodo rules!