A bizarre evaluation.

hgm · Post by **hgm** » Sun Mar 20, 2016 6:54 pm

Whether these terms would be real or not if the conditions on which they are based would persist is one thing, but when the opponent has the move and can wipe out all basis for it in a single move by castling, it is just asking for trouble.

There really isn't much difference between this kind of evaluation, and foregoing QS and counting a hanging Bishop to your material assets when the opponent has the move. 'Advantages' that the opponent will make disappear with his move should just not be counted. The larger they are, the bigger the damage this will do, of course. Have you ever tried to run the engine without QS, and looked at what that does to the branching factor? Sure, when you search deep enough it will probably find the right move. But it won't be cheap!
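A minimal sketch of the idea in this post, assuming an evaluation that can separate lasting terms from temporary ones. The function name, its parameters, and the 300 cp figure are all hypothetical and come from no engine discussed here. It shows how a bonus held by the side that is not to move could be discounted when the side to move can erase it at once.

```python
def tempo_aware_eval(stable_cp: int,
                     mover_volatile_cp: int,
                     waiter_volatile_cp: int,
                     waiter_weight: float = 0.0) -> int:
    """Score in centipawns from the point of view of the side to move.

    stable_cp: lasting terms (material, structure), mover minus waiter.
    mover_volatile_cp: temporary bonuses held by the side to move.
    waiter_volatile_cp: temporary bonuses held by the side NOT to move,
        which the mover can erase with one move (e.g. by castling).
    waiter_weight: 1.0 counts those bonuses in full, 0.0 ignores them.
    """
    return stable_cp + mover_volatile_cp - round(waiter_weight * waiter_volatile_cp)


# Hypothetical numbers: the waiting side holds a 300 cp temporary bonus.
naive = tempo_aware_eval(0, 0, 300, waiter_weight=1.0)       # bonus counted in full
discounted = tempo_aware_eval(0, 0, 300, waiter_weight=0.0)  # bonus ignored
print(naive, discounted)
```

Whether such a discount helps playing strength is an empirical question; the weight would have to be tuned by testing.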

lkaufman · Post by **lkaufman** » Sun Mar 20, 2016 7:17 pm

hgm wrote:Whether these terms would be real or not if the conditions on which they are based would persist is one thing, but when the opponent has the move and can wipe out all basis for it in a single move by castling, it is just asking for trouble.

There really isn't much difference between this kind of evaluation, and foregoing QS and counting a hanging Bishop to your material assets when the opponent has the move. 'Advantages' that the opponent will make disappear with his move should just not be counted. The larger they are, the bigger the damage this will do, of course. Have you ever tried to run the engine without QS, and looked at what that does to the branching factor? Sure, when you search deep enough it will probably find the right move. But it won't be cheap!

I completely agree with your comments: this huge weighting of a temporary and easily remedied thing like "threatening" to make a spite check or a silly queen sac is illogical and does seem to be asking for trouble. But both Stockfish and Komodo have found that drastically lowering the weights on terms like this hurts Elo. So what would you propose that we do?

cdani · Post by **cdani** » Sun Mar 20, 2016 7:38 pm

lkaufman wrote: I completely agree with your comments: this huge weighting of a temporary and easily remedied thing like "threatening" to make a spite check or a silly queen sac is illogical and does seem to be asking for trouble. But both Stockfish and Komodo have found that drastically lowering the weights on terms like this hurts Elo. So what would you propose that we do?

Those highly evaluated threats probably help mostly with move ordering, e.g. in a typical position where a check wins a piece. If you find a way outside of the evaluation to achieve the same result, you will have a better, more human-readable evaluation, and maybe even a stronger engine.

In fact I have a more evaluated quiet move ordering function, exactly to avoid the need of some things in evaluation function.
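A rough sketch of what moving threat knowledge out of the evaluation and into quiet move ordering could look like. This is not the poster's actual function: the `QuietMove` fields, the bonus values, and the example moves are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuietMove:
    name: str
    gives_check: bool = False
    creates_threat: bool = False  # e.g. attacks an undefended or more valuable piece
    history: int = 0              # history-heuristic counter from earlier cutoffs


# Hypothetical bonuses, chosen so that checks and threats outrank plain history scores.
CHECK_BONUS = 2000
THREAT_BONUS = 1000


def ordering_score(move: QuietMove) -> int:
    """Higher score means the move is searched earlier."""
    score = move.history
    if move.gives_check:
        score += CHECK_BONUS
    if move.creates_threat:
        score += THREAT_BONUS
    return score


def order_quiet_moves(moves):
    """Return the quiet moves sorted from most to least promising."""
    return sorted(moves, key=ordering_score, reverse=True)


moves = [
    QuietMove("a3", history=50),
    QuietMove("Qh5+", gives_check=True),
    QuietMove("Nd5", creates_threat=True, history=10),
]
print([m.name for m in order_quiet_moves(moves)])
```

With this split, the search still tries the forcing moves first, but the static score no longer has to carry a large bonus for a threat that may vanish a move later.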

cdani · Post by **cdani** » Sun Mar 20, 2016 8:14 pm

cdani wrote:In fact I have a more evaluated quiet move ordering function, exactly to avoid the need of some things in evaluation function.

I mean "In fact I have a more elaborated quiet move ordering function"

Cardoso · Post by **Cardoso** » Sun Mar 20, 2016 9:22 pm

cdani wrote:
lkaufman wrote:The point is that the position after 7.Qb3 is given an absurd evaluation by both engines. Never mind what search leads to this position, shouldn't the evaluation be a reasonable one? Any human who evaluated such a position as easily winning for White would be called a moron. Why should such poor evaluation work for engines? Yet it does.
One idea that comes to my mind is that an engine is a "function" of search + static evaluation, and going to depth 1 basically you are viewing only static evaluation.

But I'm with you that this evaluation is absurd. My idea, which I have already stated in other words elsewhere, concerns the intrinsic weaknesses of current engines: they have a bad evaluation function because it is very limited in parameters/algorithms. So I think that in the coming years improvements will come mostly from the evaluation function, not the search, because the latter already resolves most tactical stuff for the best engines.

I've been thinking the same for quite some time. Search enhancements are well known, and after so much effort spent going so deep in the search, it would be a good thing to evaluate realistically at the tips. Unfortunately that is very, very hard to do, harder than programming the search. Maybe that's why we call the eval "static": it has few or no dynamic elements, whereas a realistic evaluation should have dynamic features.
I remember the Deep Blue team saying they could increase Deep Blue's nps by a factor of 2 by tuning the software alone, so they might have reached 400 million nodes per second, but they felt their time and effort were much better spent on the evaluation function, and they were right.
I think there is much more to invent in the evaluation function than in the search (software-wise).
Going slightly off topic, they treated the 4-ply hardware search + hardware eval as an evaluation function, so we can say their eval did have dynamic features up to a point. It was so sad IBM decided to stop the project, since Feng-hsiung Hsu wanted to implement a hash table in hardware. Going further, he also intended to form an independent start-up to design a chess chip for consumers that could beat the world champion using a desktop PC by the year 2000. So surely he intended to have a considerably deeper search in hardware, and I believe he could have done it; Feng was really competent at his job. Just as a curiosity, the 1997 chess chips had two evaluation functions, a fast one and a slow one: the fast one could be executed in 3 cycles and the slow one in 11 cycles. I wonder how far he could have gone if IBM had continued their support.
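To illustrate the idea of treating a shallow search plus static eval as one "evaluation function", here is a toy sketch in Python. It uses plain fixed-depth negamax on a hand-built tree and models nothing of the Deep Blue hardware; the `Node` type, the function name, and all scores are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    static: int  # static evaluation, from the point of view of the side to move here
    children: List["Node"] = field(default_factory=list)


def dynamic_eval(node: Node, depth: int) -> int:
    """Fixed-depth negamax used as an evaluation function.

    depth == 0 gives the pure static score; a larger depth lets the side
    to move answer temporary features before the position is scored.
    """
    if depth == 0 or not node.children:
        return node.static
    return max(-dynamic_eval(child, depth - 1) for child in node.children)


# Hypothetical position: statically the side to move looks 300 cp worse,
# but one reply (say, castling) leads to a position scored as level.
root = Node(static=-300, children=[Node(static=0), Node(static=350)])
print(dynamic_eval(root, 0), dynamic_eval(root, 1))
```

At depth 0 the toy position keeps its alarming static score, while a single ply of lookahead already shows that the alarm can be answered.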

Alvaro

lkaufman · Post by **lkaufman** » Sun Mar 20, 2016 9:37 pm

cdani wrote:
lkaufman wrote: I completely agree with your comments: this huge weighting of a temporary and easily remedied thing like "threatening" to make a spite check or a silly queen sac is illogical and does seem to be asking for trouble. But both Stockfish and Komodo have found that drastically lowering the weights on terms like this hurts Elo. So what would you propose that we do?
Those highly evaluated threats probably help mostly with move ordering, e.g. in a typical position where a check wins a piece. If you find a way outside of the evaluation to achieve the same result, you will have a better, more human-readable evaluation, and maybe even a stronger engine.

In fact I have a more evaluated quiet move ordering function, exactly to avoid the need of some things in evaluation function.

Thanks, I think that is a very astute comment, and one that we should explore.
