Couple more ideas

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Gandalf cross

Post by Lyudmil Tsvetkov »

[d]6k1/8/8/3P4/2PBP3/3P4/8/6K1 w - - 0 1

This is a beauty, isn't it?

Btw., I thought a bit about why some obviously good eval terms fail in SF, and one reasonable explanation is that the engine simply cannot make sufficient use of some such terms, because of specific idiosyncrasies: the way the search works, the way all eval terms interact, and so on.

I have watched many games where SF has some very nice eval advantages at its disposal and simply cannot make use of them. Sometimes, or even frequently, it tends to destroy them, either as soon as possible or at a later point, or it simply neglects them entirely.

Obviously, move ordering, the search, and the interaction of all the eval terms tell it to do something else.

I wonder: if the engine cannot see how to proceed optimally in specific cases even with longer thinking time, how will it be able to do so in a game lasting 15 seconds?

But of course, good eval terms still help, especially if people find a smart way to introduce them.

Obviously, SF still has a long, winding way to go until its search and eval are improved to the point where it recognises most good chess knowledge patterns sufficiently well.

Also, I wonder: if SF cannot quite make use of available knowledge in STC games, how do other, even weaker engines manage to make use of it, given that testers sometimes test at lightning speed?

Any feedback on that?
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Gandalf cross

Post by zullil »

Lyudmil Tsvetkov wrote: Obviously, SF still has a long, winding way to go until its search and eval are improved to the point where it recognises most good chess knowledge patterns sufficiently well.
Most good chess "knowledge patterns" are crutches needed by humans. Why do you continue to believe that so many should be hammered into Stockfish? Stockfish isn't human.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Gandalf cross

Post by Lyudmil Tsvetkov »

zullil wrote:
Lyudmil Tsvetkov wrote: Obviously, SF still has a long, winding way to go until its search and eval are improved to the point where it recognises most good chess knowledge patterns sufficiently well.
Most good chess "knowledge patterns" are crutches needed by humans. Why do you continue to believe that so many should be hammered into Stockfish? Stockfish isn't human.
Because the additional cores that will become available in the future need to compute something, not just bean-count.

Look at what happens here when one of the pawns of the cross advances or is eliminated by an enemy piece:

[d]6k1/8/8/3P4/2P1P3/8/8/6K1 w - - 0 1
if d3 is eliminated, a very strong structure remains: a doubly defended pawn

[d]6k1/8/8/3P4/4P3/3P4/8/6K1 w - - 0 1
if d5 is eliminated and c4 recaptures, this one remains: a structure with 2 defended pawns forming a single whole; the same holds if e4 recaptures

[d]6k1/8/8/3PP3/2PP4/8/8/6K1 w - - 0 1
if d3 advances once and e4 advances once, this one is left: a structure with 2 defended pawns side by side, which is very good; the same if d3 advances once and c4 advances once

[d]6k1/8/4P3/3P4/2P5/3P4/8/6K1 w - - 0 1
if e4 advances two squares, this one is left: a structure with 3 defended pawns (!) constituting a single whole, plus a longer chain of 3 pawns, c4-d5-e6; very nice. The same if c4 advances two squares
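The cross in the first diagram can even be checked mechanically. Below is a minimal, self-contained Python sketch (the FEN parsing and all names are my own illustration, not code from any engine) that reads the piece-placement field of a FEN and reports whether White's pawns form a "cross", i.e. occupy all four orthogonal neighbours of some central square:

```python
# Illustrative sketch: detect a white "pawn cross" (four pawns on the four
# orthogonal neighbours of some square, as in 6k1/8/8/3P4/2PBP3/3P4/8/6K1).

def white_pawn_squares(fen):
    """Return white pawn squares as (file, rank) pairs, both 0-7."""
    board = fen.split()[0]           # piece-placement field only
    squares = set()
    for row_idx, row in enumerate(board.split('/')):  # rank 8 comes first
        file_idx = 0
        for ch in row:
            if ch.isdigit():
                file_idx += int(ch)  # run of empty squares
            else:
                if ch == 'P':        # uppercase P = white pawn
                    squares.add((file_idx, 7 - row_idx))
                file_idx += 1
    return squares

def has_pawn_cross(pawns):
    """True if some square has pawns on all four orthogonal neighbours."""
    for f in range(1, 7):
        for r in range(1, 7):
            if all((f + df, r + dr) in pawns
                   for df, dr in ((1, 0), (-1, 0), (0, 1), (0, -1))):
                return True
    return False

cross = "6k1/8/8/3P4/2PBP3/3P4/8/6K1 w - - 0 1"
no_d3 = "6k1/8/8/3P4/2P1P3/8/8/6K1 w - - 0 1"
print(has_pawn_cross(white_pawn_squares(cross)))  # True: c4, e4, d5, d3 surround d4
print(has_pawn_cross(white_pawn_squares(no_d3)))  # False: the cross is broken
```

The derived structures in the other diagrams (defended pawns, chains) would of course need separate predicates; this only recognises the full cross.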

There must be some magic behind it, don't you think? :D

Believe me, Louis, half of the reasonable eval terms that do not succeed in SF fail not because they are bad, but because the engine is incapable of recognising their importance. I have watched many games of SF, and by now I am fully convinced this is precisely so.

On the other hand, good terms have a bigger chance to succeed than plainly bad terms lacking any chess knowledge, as the engine's interpretation of the latter is simply chaotic.

So we need good terms, but we also need to somehow find a way to entice the engine into recognising them.
vincenegri
Posts: 73
Joined: Wed Feb 11, 2015 9:19 am

Re: Gandalf cross

Post by vincenegri »

I think what you are saying, in a rather roundabout way, is something that always needs to be remembered:

From a playing strength point of view, the 'correct' eval for a position by an engine is the one that gives the best score to the position from which the engine is most likely to win.

There is no point telling any engine that such-and-such a position is wonderful if the engine doesn't know how to win from that position.

But, once the engine knows how to exploit that advantage, it will start to see those positions as good naturally, via search.

As Ed Schroeder found some time ago:
Some specific chess knowledge has become outdated through the years due to the speed of today's computers. An example: in the early days of computer chess, say the period 1985-1989, I had as hardware a 6502 running at 5 MHz. Rebel at that time could only search 5-7 plies at tournament time control. Such a low depth guarantees you one thing: horizon effects all over, and thus lost games.

To escape from the horizon effect, all kinds of tricks were invented: chess knowledge about dangerous pins, knight forks, double attacks and overloaded pieces, with those aspects rewarded in eval. Complicated and processor-time-consuming software it was (15-20% less performance), but it did the trick of escaping the horizon effect in a reasonable way.

Today we run chess programs on 1500 MHz machines, and instead of 5-7 plies Rebel now gets 13-15 plies in the middlegame, so the horizon effect that was a major problem at 5 MHz has slowly faded away.

So I wondered: what if I throw that complicated "anti-horizon" code out of Rebel, is it still needed? I tried, and found that Rebel played as well without the "anti-horizon" code as with it. In other words, the net gain was a "free" speed gain of 15-20%, thus an improvement.
As computing power goes up, the most efficient way of representing chess knowledge in the engine moves away from an encoding of rules as we humans have formulated them and toward an encoding of the more primitive 'atoms' behind those rules. Unfortunately this is considerably harder to do!


Btw, I am not surprised you like Gandalf, Lyudmil, as it was an engine with a lot of human-style rules built into it and consequently was known to have quite a humanistic style.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Gandalf cross

Post by Lyudmil Tsvetkov »

vincenegri wrote: From a playing strength point of view, the 'correct' eval for a position by an engine is the one that gives the best score to the position from which the engine is most likely to win.

There is no point telling any engine that such-and-such a position is wonderful if the engine doesn't know how to win from that position.

But, once the engine knows how to exploit that advantage, it will start to see those positions as good naturally, via search.

Absolutely.
Fully subscribe to the above.

We must find ways of enticing the engine into recognising that some knowledge terms are good for it.

Yeah, I liked Gandalf a lot, but recently I started to beat it all too often. :)
It had that magical touch about it, not easy to describe in words: great sacrificial attacks and very nice harmony between the pieces.

Btw., Vince, since I am able to find you here (I do not post on fishcooking): maybe in a month's time, when the framework is freer, it will make sense to push one other test on hotspots, maybe with a lower penalty, maybe with a bigger one, maybe with stricter conditions. You never know: a test that you run on your home computer at 9 sec and 10,000 games might succeed on your machine but fail on the framework, while a test that fails under those conditions on your machine might succeed on the framework.

But of course, I am convinced that one of the reasons the hotspot tests failed is precisely that SF is unable to use such positions well. I see many positions with such hotspots, and as soon as such spots appear, SF wants to get rid of them or does not find a way to proceed; a pain to watch. It sometimes recognises the benefit of long-term advantages, but in more than half of the cases, I would say, it does not. So tests with similar long-term positional advantages are very hard to pass.

SF's search is simply not deep and accurate enough. I watch some games, and SF sees practically all moves that require shallow or relatively shallow search, but almost never sees moves that require deep or very deep search. So this is very much search-dependent: engines simply need deeper search to cope with some positions. How to make the search deeper, I do not have a clue.

Anyway, many thanks for pushing so many patches onto the framework for my stupid ideas!
User avatar
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Gandalf cross

Post by cdani »

Lyudmil Tsvetkov wrote: Btw., I thought a bit about why some obviously good eval terms fail in SF, and one reasonable explanation is that the engine simply cannot make sufficient use of some such terms, because of specific idiosyncrasies: the way the search works, the way all eval terms interact, and so on.
Sure. It is as if an engine were a type of expert system that ignores many things outside the ones it was trained on. It is very far from being intelligent, as everybody knows.

My idea is this: since Stockfish and other engines understand only a subset of moves and ideas, for most of the games it plays you can find an engine that would win them. Let me explain a bit better. Put Stockfish in any position, and there is a great chance of finding an engine that will win from it. Stockfish will beat most engines, but you have plenty of chances of finding one that will win.

How is this possible? Because it is far from understanding the complete ideas of chess, of course.

Maybe many of you will think that this is not possible, that it is only imagination. Think of it another way: instead of making it play against just any other engine, make it play against a 4000 Elo engine, and then think about why Stockfish will lose most of its games.

So the holes that Stockfish and other engines have are both tactical and in knowledge. A 4000 Elo engine will know, or at least seem to know, most of the positional ideas of a GM, and of course will play so differently that many other ideas or concepts will permeate through its moves.
zullil
Posts: 6442
Joined: Tue Jan 09, 2007 12:31 am
Location: PA USA
Full name: Louis Zulli

Re: Gandalf cross

Post by zullil »

Lyudmil Tsvetkov wrote: Believe me, Louis, half of the reasonable eval terms that do not succeed in SF fail not because they are bad, but because the engine is incapable of recognising their importance.
Here you make an interesting point. I suppose certain positional features, especially ones that are permanent or hard to remove, do call for certain long-term strategies of play in order to yield their reward. If an engine is steered to such a position but lacks the ability to convert its advantage, then the correctness of the original evaluation is lost.

One way to approach this, I suppose, is to create an evaluation that "self-modifies" with the nature of the position, changing the weights of various features to help steer the search toward the correct long-term goal.
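A simple, long-established relative of this idea is tapered evaluation: each term carries a middlegame and an endgame weight, and the score is interpolated by a material-based game phase, so the effective weights already shift with the nature of the position. A minimal sketch (the function name and the 0-24 phase scale are illustrative assumptions, not any engine's actual code):

```python
# Sketch of tapered evaluation: each term has a middlegame (mg) and an
# endgame (eg) value, blended by a "phase" derived from remaining material.
# The 0..24 phase scale and the example weights are illustrative only.

PHASE_MAX = 24  # e.g. full non-pawn material still on the board

def tapered_score(mg, eg, phase):
    """Interpolate between mg (phase=PHASE_MAX) and eg (phase=0)."""
    return (mg * phase + eg * (PHASE_MAX - phase)) // PHASE_MAX

# A long-term feature can be worth little in the middlegame but a lot in
# the endgame; the blend "self-modifies" as material comes off the board:
print(tapered_score(10, 60, 24))  # 10  (pure middlegame weight)
print(tapered_score(10, 60, 12))  # 35  (halfway)
print(tapered_score(10, 60, 0))   # 60  (pure endgame weight)
```

A fully position-dependent reweighting, as suggested above, would generalise this from one material-based knob to many structural ones.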

In any case, the real purpose of my previous message was to stop you from talking to yourself. :wink: Now we have a thread with multiple posters, as a thread should be.
vincenegri
Posts: 73
Joined: Wed Feb 11, 2015 9:19 am

Re: Gandalf cross

Post by vincenegri »

zullil wrote:
One way to approach this, I suppose, is to create an evaluation that "self-modifies" with the nature of the position, changing the weights of various features to help steer the search toward the correct long-term goal.
Picking up on this point I'd like to share something that I realised when reviewing the "hotspot" local test run.

One reason I do local test runs is that you get the opportunity to data-mine the resulting games afterward (something that doesn't happen with the fishtest framework; I wonder if this could be changed one day… it would need a good lump of online storage).

After "hotspot" did not pass in the framework, I went back to the local VSTC test and ran it through chessbase to see which games actually had the e5/g5 bind (or its various equivalents).

The pawn arrangement for which the code is looking actually happened in about 200 out of 5000 games. But in the large majority of those games, the overall position was very, very different from the paradigmatic one Lyudmil originally gave. In most of them the bind was not pertinent to the evaluation of the position or how play ought to be conducted - too many pieces had been exchanged, or there were overarching concrete factors, or the centre was too open, etc etc.

In only about 11 games was there a "proper" bind (and indeed in those games the side with such a bind did in general prevail, although one has to take such a small subsample with a large grain of salt).

The point being that one has to be very careful, if you make properties of search or eval conditional on other properties of the position, to check that your condition actually matches the positions, and only the positions, you expect it to.


Oftentimes we will say something like "it's good to put your knight on such-and-such a square" when what is really meant is "… in these particular circumstances, and other things being equal, and when there are no concrete considerations overriding this". These caveats are automatically inserted by our brains when we read the advice. But the computer has no such autocorrect and blindly follows the advice we give it. So we have to be doubly careful in our definitions.

The difficulty then is that, if we very carefully and accurately specify the specific conditions in which an evaluation adjustment is justified, these conditions will occur only infrequently in a random book-based test. So even if the change produces a large improvement in that particular species of position, the overall Elo gain may be too low to measure. Yet we cannot restrict our testing to a narrow book, since it is important to verify that the patch does not have a negative effect in general. One really requires two distinct parts to the test: a defensible test, agreed with a third party, to validate the objective, and a broad regression test to confirm no undesired side effects.
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Gandalf cross

Post by Lyudmil Tsvetkov »

vincenegri wrote: The point being that one has to be very careful, if you make properties of search or eval conditional on other properties of the position, to check that your condition actually matches the positions, and only the positions, you expect it to.


So we have to be doubly careful in our definitions.


That is a professional approach.

In the case of hotspots, stricter conditions might really help, for example:

- considering only the f6 bind with a backward-fated f-pawn: the e5, g5, e6, f7, g6 pawn structure, f7 being backward for Black
- applying only when npm is bigger than 2/3 of tnpm
- restricting the king location to the h8, g8, h7, g7 squares
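Those three conditions are easy to combine into a single guard. A hypothetical sketch (the function name, the reading of npm as the bind side's non-pawn material and tnpm as the total non-pawn material, and the square encoding are all my assumptions, not the actual patch):

```python
# Hypothetical guard for the stricter f6-hotspot conditions listed above.
# npm = non-pawn material of the side exploiting the bind, tnpm = total
# non-pawn material, in whatever units the engine uses. Names illustrative.

BIND_PAWNS = {"e5", "g5", "e6", "f7", "g6"}   # required black pawn structure
KING_ZONE = {"h8", "g8", "h7", "g7"}          # allowed black king squares

def f6_hotspot_applies(black_pawns, black_king, npm, tnpm):
    """Apply the hotspot bonus only when all three conditions hold."""
    has_bind = BIND_PAWNS <= black_pawns       # structure is a subset match
    material_ok = 3 * npm > 2 * tnpm           # npm > 2/3 tnpm, no floats
    king_ok = black_king in KING_ZONE
    return has_bind and material_ok and king_ok

pawns = {"e5", "g5", "e6", "f7", "g6", "a7", "b7"}
print(f6_hotspot_applies(pawns, "g8", npm=20, tnpm=26))  # True
print(f6_hotspot_applies(pawns, "e8", npm=20, tnpm=26))  # False: king outside zone
```

Whether f7 is actually backward would need a separate check against white pawn control of f6; the set test above only verifies the raw structure.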
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: Gandalf cross

Post by Lyudmil Tsvetkov »

cdani wrote:
Lyudmil Tsvetkov wrote: Btw., I thought a bit about why some obviously good eval terms fail in SF, and one reasonable explanation is that the engine simply cannot make sufficient use of some such terms, because of specific idiosyncrasies: the way the search works, the way all eval terms interact, and so on.
Sure. It is as if an engine were a type of expert system that ignores many things outside the ones it was trained on. It is very far from being intelligent, as everybody knows.

My idea is this: since Stockfish and other engines understand only a subset of moves and ideas, for most of the games it plays you can find an engine that would win them. Let me explain a bit better. Put Stockfish in any position, and there is a great chance of finding an engine that will win from it. Stockfish will beat most engines, but you have plenty of chances of finding one that will win.

How is this possible? Because it is far from understanding the complete ideas of chess, of course.

Maybe many of you will think that this is not possible, that it is only imagination. Think of it another way: instead of making it play against just any other engine, make it play against a 4000 Elo engine, and then think about why Stockfish will lose most of its games.

So the holes that Stockfish and other engines have are both tactical and in knowledge. A 4000 Elo engine will know, or at least seem to know, most of the positional ideas of a GM, and of course will play so differently that many other ideas or concepts will permeate through its moves.
And the 4000 Elo engine will be crushed by a 5000 Elo engine. :D