What separates the top engines from the rest?

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

kgburcham
Posts: 2016
Joined: Sun Feb 17, 2008 4:19 pm

Re: What separates the top engines from the rest?

Post by kgburcham »

Edit:
just to illustrate my point: an engine has no concept of the common chess term "positional exchange sacrifice" - it might or might not see the consequences of such a move. If the evaluation function gives a higher score at the end of the given line than it gave on any other line, it will play the move. Almost zero "positional" magic is involved (although static rules might help greatly with pruning obviously bad lines).
Richard, do you have one position - or preferably two or three - so that we can better see what you mean by a positional exchange sacrifice?
thanks
kgburcham
User avatar
rvida
Posts: 481
Joined: Thu Apr 16, 2009 12:00 pm
Location: Slovakia, EU

Re: What separates the top engines from the rest?

Post by rvida »

yanquis1972 wrote:Richard, surely this wasn't the case with Rybka 3 Dynamic (loved exchange sacs) vs R3? Recall that R3D was rated almost as high as R3. I assume the search was the same, so the difference must've been in how it evaluated positions/pieces.
With Rybka 4 you can adjust the relative piece values... It is possible to adjust them in such a way that she will examine lines with seemingly unfavorable material trades, not unlike R3D. I don't want to imply that all R3 versions (Normal, Human, Dynamic) had exactly the same search function - I really don't know.
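
To make the effect of such tuning concrete, here is a minimal sketch - the numbers and names are illustrative assumptions, not Rybka's actual values or option names:

```cpp
// Minimal sketch (illustrative numbers): relative piece values in centipawns.
// Lowering the rook and raising the minor pieces shrinks the nominal value of
// "the exchange", so lines where a rook is given up for a minor piece plus
// positional compensation score closer to equal and get examined more seriously.
#include <cstdio>

struct PieceValues {
    int pawn, knight, bishop, rook, queen;
};

int main() {
    PieceValues standard {100, 325, 325, 500, 975};
    PieceValues dynamic  {100, 350, 350, 480, 975};  // hypothetical "dynamic" tuning

    std::printf("exchange = %d cp (standard), %d cp (dynamic)\n",
                standard.rook - standard.bishop,   // 175
                dynamic.rook  - dynamic.bishop);   // 130
    return 0;
}
```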

I released a few unofficial versions of Critter - which were like 20-30 Elo points weaker than the public one - solely for solving chess puzzles. They had some built-in heuristics for common mate patterns (epaulette mate, back-rank mate, two rooks on the 7th, suffocation mate, mate by two bishops, etc.), but detecting these patterns is too slow to be practical in actual gameplay. Perhaps Houdini's "tactical mode" does something similar, but polished enough to be publicly released?
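
For illustration, a crude static back-rank check might look something like the sketch below (plain 64-bit bitboards with bit 0 = a1 and bit 63 = h8; all names are assumptions, and this is not Critter's actual code):

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;

constexpr Bitboard RANK_7 = 0x00FF000000000000ULL;
constexpr Bitboard RANK_8 = 0xFF00000000000000ULL;

// True if the black king sits on the 8th rank, all of its potential escape
// squares on the 7th rank are blocked by its own pawns, and a white rook or
// queen attacks some square on the 8th rank (the caller is assumed to pass
// in that precomputed attack set).
bool blackBackRankVulnerable(Bitboard blackKing, Bitboard blackPawns,
                             Bitboard whiteHeavyAttacks) {
    if (!(blackKing & RANK_8))
        return false;
    // Squares directly and diagonally in front of the king; the RANK_7 mask
    // also discards any file wrap-around produced by the shifts.
    Bitboard escape = ((blackKing >> 7) | (blackKing >> 8) | (blackKing >> 9)) & RANK_7;
    bool escapesBlocked = (escape & ~blackPawns) == 0;
    return escapesBlocked && (whiteHeavyAttacks & RANK_8) != 0;
}
```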
Uri Blass
Posts: 10348
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: What separates the top engines from the rest?

Post by Uri Blass »

rvida wrote:
Graham Banks wrote:In my opinion (based on watching an awful lot of engine v engine games), the top engines excel in three areas:

- they value active play and mobility highly. They are regularly able to put the squeeze on their opponents.
- they better understand the importance of passed pawns - both in how to create potentially dangerous passers and how to take advantage of them.
- they better understand material imbalances, particularly in giving up the exchange. This would seem to indicate that they value pawns differently and ties in a lot with the previous aspect mentioned.

If those authors of engines not too far away from the top could do some extensive work on those three aspects, they would be more competitive.
Having said all that, I do realise that the search is also a big differentiating factor.
Sorry to disappoint you, but these concepts - as we humans perceive them - are quite out of reach of current (top) engines. Sure, everyone who is aiming to be in the top 5 must have a _reasonable_ eval function, but in the last few years the real progress was mostly due to search improvements (rather than better evaluation).

Edit:
just to illustrate my point: an engine has no concept of the common chess term "positional exchange sacrifice" - it might or might not see the consequences of such a move. If the evaluation function gives a higher score at the end of the given line than it gave on any other line, it will play the move. Almost zero "positional" magic is involved (although static rules might help greatly with pruning obviously bad lines).
I do not see that Graham claimed that top engines have a concept of a positional exchange sacrifice.

His words:
"they better understand material imbalances, particularly in giving up the exchange."

In other words, the claim is that top engines evaluate the exchange as being worth less than weaker engines do.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: What separates the top engines from the rest?

Post by Sven »

rvida wrote:but in the last few years the real progress was mostly due to search improvements (rather than better evaluation).
We should not underestimate the benefit of massive parameter tuning, which has also contributed to the strength increase of the last few years.
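
As a rough illustration of what such tuning can look like - a generic sketch, not any particular engine's framework; playMatch() is a hypothetical stand-in for running a gauntlet and returning the candidate's score:

```cpp
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <vector>

// One-parameter-at-a-time hill climbing over evaluation weights. Each trial
// change is kept only if the candidate scores better than the current best
// in a match against a reference version.
std::vector<int> tuneWeights(std::vector<int> weights, int step, int rounds,
                             const std::function<double(const std::vector<int>&)>& playMatch) {
    double best = playMatch(weights);
    for (int r = 0; r < rounds; ++r) {
        for (std::size_t i = 0; i < weights.size(); ++i) {
            for (int delta : {+step, -step}) {
                std::vector<int> trial = weights;
                trial[i] += delta;
                double score = playMatch(trial);  // e.g. fraction of points scored
                if (score > best) {               // keep the change only if it gains
                    best = score;
                    weights = trial;
                    break;                        // move on to the next weight
                }
            }
        }
    }
    return weights;
}
```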

Sven
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: What separates the top engines from the rest?

Post by Don »

rvida wrote:
Graham Banks wrote:In my opinion (based on watching an awful lot of engine v engine games), the top engines excel in three areas:

- they value active play and mobility highly. They are regularly able to put the squeeze on their opponents.
- they better understand the importance of passed pawns - both in how to create potentially dangerous passers and how to take advantage of them.
- they better understand material imbalances, particularly in giving up the exchange. This would seem to indicate that they value pawns differently and ties in a lot with the previous aspect mentioned.

If those authors of engines not too far away from the top could do some extensive work on those three aspects, they would be more competitive.
Having said all that, I do realise that the search is also a big differentiating factor.
Sorry to disappoint you, but these concepts - as we humans perceive them - are quite out of reach of current (top) engines. Sure, everyone who is aiming to be in the top 5 must have a _reasonable_ eval function, but in the last few years the real progress was mostly due to search improvements (rather than better evaluation).

Edit:
just to illustrate my point: an engine has no concept of the common chess term "positional exchange sacrifice" - it might or might not see the consequences of such a move. If the evaluation function gives a higher score at the end of the given line than it gave on any other line, it will play the move. Almost zero "positional" magic is involved (although static rules might help greatly with pruning obviously bad lines).
Search improvements have no doubt been a major factor in computer chess progress, but I feel that evaluation is a bigger one than most people give it credit for. Several of what I consider our biggest improvements are in evaluation. That has been a major surprise for me, but I now believe that you cannot have a top program unless you have an exceptional evaluation function.

In fact I believe evaluation will be the next big "breakthrough" in computer chess and today's programs will be considered naive bean counters. I don't mean quantity, I mean quality.

There are numerous move decisions a chess program makes in a game; different top programs will choose different moves, and most of those choices commit you to one path instead of another. We call this stylistic bias, but of course these choices are determined pretty much solely by the evaluation function. Each decision affects the rest of the game and how it will be played. In most cases a strong program will choose a reasonable move among the candidates, but I believe that even moves which are "game-theoretically correct" are not all equal. I'll call it a "micro blunder" if the move played makes it more difficult to win, or to draw when it needs to. The opening move 1. h3, for example, probably leads to a draw with best play, but it also probably makes it less likely you will win that game. Even the choice between 1. d4 and 1. e4 affects your chances of winning (statistically, 1. d4 gets slightly better results).

I remember being hit over the head with this concept in a tournament many years ago. My program (*Socrates) had the choice of allowing the opponent a very intrusive pawn center, which would have let it get connected a- and b-pawns, though not very advanced ones. It was totally unclear to me which was best, as the pawn center was dangerous and crippling and there was no way any program at the time could see significant progress. The alternative, which was *Socrates' choice, would have left *Socrates with a good game but not a winning one. When I examined the position later I saw there was only a small scoring difference between the two choices. Whether it's true or not I do not know, but it occurred to me during the game that whether *Socrates won or lost might very well be decided by the "wisdom" of this one choice. That is not at all uncommon in computer chess. And after decades of progress in computer chess and amazing progress in search, it STILL often comes down to whether a program's preference is the right one. In many ways a program with a sub-par evaluation is constantly shooting itself in the foot.

We are still in the early days of evaluation - we have not progressed much beyond the linear-weight phase, where the evaluation function is expressed as a table of weights applied as if it were a linear polynomial. It's true we have built on that somewhat, but we still have a long way to go. What is clearly the case in computer chess is that having some fixed weight for a feature is naive and stupid. We code around that in ad-hoc fashion by making up rules (as we go) which restrict its usage. A trivial example is the value of putting a rook on the 7th rank. The most naive approach is a fixed bonus for doing that - but any strong program appreciates the importance of putting at least some basic conditions on that rule, such as having targets or the enemy king being restricted to the 8th rank. So we have a rule about this, but the actual value used is still fixed (or phase-based), and a single weight is not equally relevant in every case.
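
A trivial sketch of that difference, with illustrative names and weights rather than Komodo's actual code:

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;

constexpr Bitboard RANK_7 = 0x00FF000000000000ULL;  // 7th rank from White's point of view
constexpr Bitboard RANK_8 = 0xFF00000000000000ULL;

// Naive version: any rook on the 7th gets the same fixed bonus.
int rookOn7thNaive(Bitboard whiteRooks) {
    return (whiteRooks & RANK_7) ? 20 : 0;          // 20 cp regardless of context
}

// Conditioned version: only score it when there are pawn targets on the 7th
// or the enemy king is confined to the 8th - and even then the single fixed
// weight is not equally relevant in every position.
int rookOn7thConditioned(Bitboard whiteRooks, Bitboard blackPawns, Bitboard blackKing) {
    if (!(whiteRooks & RANK_7))
        return 0;
    bool targets    = (blackPawns & RANK_7) != 0;
    bool kingCutOff = (blackKing  & RANK_8) != 0;
    return (targets || kingCutOff) ? 20 : 0;
}
```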

I cannot prove to you or anyone that we are at the "baby just learning to walk" phase of evaluation, but I can at least present some anecdotal evidence. Many times a program is torn between two moves (or perhaps three or more) and one of them clearly wins. A deep search reveals which one wins for sure, but the program may play that move anyway, well before discovering the win tactically and without needing the deep search - or it might not. The choice is often an accident. The solution is not just to search a ply deeper, because that is like using a hammer to swat a fly. There will always be a new layer of "misunderstandings" to deal with unless you can search the entire game tree. If the solution were simply to search another ply, then we could use evaluation functions from 30 years ago and do just fine.

Of course I realize there is magic in searching another ply: a 7-ply search is pretty much a "cure" for most of the shortcomings of a 6-ply search, and so on. A deeper search improves both tactics and positional play, but it's foolish not to deal with them in the evaluation if you can. Imagine a program that gives a queen the same weight as a pawn. Given enough depth it will avoid playing stupid queen sacrifices, but only when it looks deep enough to see that the damage to its position is overwhelming. In particular, a bad position can be held for dozens of moves in the endgame, so if your program thinks a bad position is a good one it might require 30 or 40 additional plies of depth to "wake up."

To summarize, I think we are still in the bean-counter phase of computer chess, and search has worked so well for us that we are inclined to focus even more on it. Doing evaluation correctly is the most difficult problem in computer chess, but in my opinion that is the future.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Modern Times
Posts: 3554
Joined: Thu Jun 07, 2012 11:02 pm

Re: What separates the top engines from the rest?

Post by Modern Times »

Don wrote: In fact I believe evaluation will be the next big "breakthrough" in computer chess and today's programs will be considered naive bean counters. I don't mean quantity, I mean quality.
Yes I think you are right about this. Time will tell.
User avatar
Aser Huerga
Posts: 812
Joined: Tue Jun 16, 2009 10:09 am
Location: Spain

Re: What separates the top engines from the rest?

Post by Aser Huerga »

I'm happy to read this, Don, because I like Komodo very much for its evaluation. Keep up the good work on it; in analysis this quality is much appreciated and not as common as deep search.
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: What separates the top engines from the rest?

Post by Don »

Aser Huerga wrote:I'm happy to read this, Don, because I like Komodo very much for its evaluation. Keep up the good work on it; in analysis this quality is much appreciated and not as common as deep search.
Thanks for the kind words. Although I believe Komodo is exceptional in evaluation, I think we are in the same place as all the other programs. The issues I mentioned are not ones that I claim Komodo has solved or is even starting to solve. The top programs pretty much do the same basic things and each has strengths and weaknesses - but in the grand scheme of things they all evaluate poorly, just as Richard says, and that includes Komodo.

Larry and I are aware of several basic problems that all programs face, with examples. For instance, almost every program is prone to mis-evaluating positions where one side has a lot of space. There are openings that are theoretically equal, or close to it, for which most programs return big scores.

The reason is that any term you make up is an approximation to the abstract concept of winning chances. Mobility of pieces is assumed to be a good thing, but that is not always the case. Put the two kings on the board and then add a white bishop. The bishop will have exceptional mobility, but it does nothing to increase the winning chances.
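
A minimal sketch of the kind of guard this calls for (illustrative, not Komodo's actual code) - with only a bare minor piece left the position is a dead draw, so no amount of bishop mobility should be worth anything:

```cpp
// Totals are for both sides combined; a finer-grained rule would track each
// side separately, but this is enough to illustrate the point.
struct MaterialCount {
    int pawns, knights, bishops, rooks, queens;
};

bool insufficientMaterial(const MaterialCount& m) {
    if (m.pawns || m.rooks || m.queens)
        return false;
    return (m.knights + m.bishops) <= 1;  // K vs K, KB vs K, KN vs K
}

// In the evaluation, mobility (and everything else) would be overridden:
//   if (insufficientMaterial(count)) return DRAW_SCORE;
```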

In the past Larry and I have done studies using the logistic function, which basically converts a score to a "winning percentage" - itself a rather abstract concept. Our evaluation tries to avoid these "impedance mismatches", where two equal but extremely different positions return significantly different scores. That is one major source of error, especially as programs search deeper and deeper. If you don't have a rule to cover bishop and king vs king, for example, you get a major "impedance mismatch" and you pave the way for bad decisions. You might trade a winning advantage away for the BKvsK draw.
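
The mapping itself is simple; a minimal sketch follows, where the scale constant is an assumption for illustration rather than Komodo's actual value:

```cpp
#include <cmath>
#include <cstdio>

// Convert an evaluation in centipawns to an expected score in [0, 1] with a
// logistic curve. With scale = 400 this is the Elo expectancy formula if one
// centipawn is treated as one rating point.
double winProbability(double scoreCp, double scale = 400.0) {
    return 1.0 / (1.0 + std::pow(10.0, -scoreCp / scale));
}

int main() {
    std::printf("+100 cp -> %.2f\n", winProbability(100.0));  // ~0.64
    std::printf("+300 cp -> %.2f\n", winProbability(300.0));  // ~0.85
    return 0;
}
```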

Programs are exceptionally good at comparing positions that are very similar, but exceptionally poor at comparing positions that are very different.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
Tom Likens
Posts: 303
Joined: Sat Apr 28, 2012 6:18 pm
Location: Austin, TX

Re: What separates the top engines from the rest?

Post by Tom Likens »

Don,

Very nice post, which I happen to agree with to a large extent. As an engineer one of the things
I always like to do to understand a physical phenomenon is to take things to their boundaries.
In this case, if you could couple Kasparov's evaluation function with a modern chess program's
search, you'd have something practically unbeatable. I think the dilemma is creating such an
evaluation function without slowing the search to a crawl.

regards,
--tom
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: What separates the top engines from the rest?

Post by Don »

Tom Likens wrote:Don,

Very nice post, which I happen to agree with to a large extent. As an engineer one of the things
I always like to do to understand a physical phenomenon is to take things to their boundaries.
In this case, if you could couple Kasparov's evaluation function with a modern chess program's
search, you'd have something practically unbeatable. I think the dilemma is creating such an
evaluation function without slowing the search to a crawl.

regards,
--tom
I think modern programs in general have much slower evaluation functions than the top programs of 20 years ago. At least that is true for Komodo versus my earlier work. So the trick is always giving something up to get something of greater value.

I think it will turn out that as programs search deeper and deeper, the speed of the evaluation function will become less of a factor and its quality will play an even bigger role than it does now in the overall strength of the program. Of course engineers will always want to do as much as they can to minimize the overhead, so it will be a question of balance.

There are other issues that evaluation functions do not generally address, too, such as the concept of "never". You can have pieces that are permanently trapped, or a king that can never break through locked pawns. You can be a queen up permanently and yet a win is impossible. Chess has trillions upon trillions of positions that cannot be evaluated correctly by any current evaluation function, even given reasonable search depths. Fortunately they are rare enough not to have a huge impact, but these are the things that set humans apart from computers.
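
One narrow slice of "never" that static rules do catch is the classic trapped bishop on a7/h7; a minimal sketch (illustrative squares and penalty, not Komodo's code; bit 0 = a1, bit 63 = h8):

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;

constexpr Bitboard bit(int sq) { return 1ULL << sq; }
constexpr int A7 = 48, B6 = 41, C7 = 50, F7 = 53, G6 = 46, H7 = 55;

// A white bishop that has grabbed the pawn on a7 (or h7) and is shut in by
// enemy pawns on b6/c7 (or g6/f7) is effectively lost, even though a shallow
// search may not see it being rounded up for many plies.
int trappedBishopPenalty(Bitboard whiteBishops, Bitboard blackPawns) {
    int penalty = 0;
    if ((whiteBishops & bit(A7)) && (blackPawns & bit(B6)) && (blackPawns & bit(C7)))
        penalty -= 150;  // assumed magnitude, for illustration
    if ((whiteBishops & bit(H7)) && (blackPawns & bit(G6)) && (blackPawns & bit(F7)))
        penalty -= 150;
    return penalty;
}
```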
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.