A poor man's testing environment

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: A poor man's testing environment

Post by Lyudmil Tsvetkov »

HGM wrote: In fact many times Larry has noticed that chess programs (and Komodo specifically) might score some opening much higher or lower than theory suggests and we will spend a significant amount of time trying to understand why and determining what corrections to make - which sometimes involve inventing more evaluation features.
I really think that the right approach should be not to correct the lines of the engine, but to correct the lines of theory.
Theory, although checked and rechecked by many people over a prolonged period of time, and although it has a sacrosanct status in many people's minds, is fallible, just as the people who invented the lines are.
Chances are very good (especially when theory extends further into the middlegame, but not only then) that an engine on multiple cores will suggest the right correction to an opening line, and it does so quite often. In most cases it would be wrong to trust theory instead of a powerful engine. Humans will miss at least some of the tactical underpinnings that could change the whole assessment of a variation. If I had to choose between trusting theory and trusting a powerful engine, I would choose the engine in 80% of cases.

Btw, a good example of the opening superiority of computers is the fact that most reliable engines almost never play the Sicilian as black, as this opening is simply very unreliable. There might be some drawing lines, and even some lines that are better for black, but they represent just a tiny portion of the possible variations, and unless you are a wizard, it is better not to play the Sicilian as black. I think performance statistics from human tournaments also hint at the unreliability of this opening.

Lyudmil
Michel
Posts: 2272
Joined: Mon Sep 29, 2008 1:50 am

Re: A poor man's testing environment

Post by Michel »

I tend to agree with this, but I think a lot of people try to use CLOP to tune 4, 5, 6+ terms simultaneously, without running enough games to let things really converge. If you're tuning one parameter you might be able to get away with 10,000 or so games to get a decent result, but as the number of terms goes up you need to run a whole lot more (in the 100,000+ range). I haven't run the math in a while, but the numbers become fairly daunting, fairly quickly. Most people simply don't have enough respect for the Law of Large Numbers.
I think the real issue is that most people have not taken the time to study how CLOP works.
So they have only a vague idea what it does.
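To put rough numbers on the game counts mentioned above, here is a back-of-envelope sketch (my own approximation, using the standard Elo model and a normal confidence interval, not CLOP's actual algorithm):

```python
import math

# Back-of-envelope estimate (my own, not from the CLOP paper):
# how many games before a measured match score reliably
# distinguishes a small Elo difference from zero?

def elo_to_score(elo_diff):
    """Expected score for the stronger side under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))

def games_needed(elo_diff, draw_ratio=0.5, z=1.96):
    """Games needed so a z-sigma confidence interval on the measured
    score excludes 50%, assuming independent games.  With outcomes
    1 / 0.5 / 0 and draw ratio d, the per-game score variance is
    p*(1-p) - d/4."""
    p = elo_to_score(elo_diff)
    var = p * (1.0 - p) - draw_ratio / 4.0
    return math.ceil(z * z * var / (p - 0.5) ** 2)

print(games_needed(5))   # a ~5 Elo effect: on the order of 9,000 games
print(games_needed(2))   # a ~2 Elo effect: tens of thousands
```

With a typical 50% draw ratio, resolving even a ~5 Elo effect already takes on the order of 9,000 games, and a ~2 Elo effect close to 60,000; and that is for a single parameter, before any multi-dimensional exploration overhead.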
Tom Likens
Posts: 303
Joined: Sat Apr 28, 2012 6:18 pm
Location: Austin, TX

Re: A poor man's testing environment

Post by Tom Likens »

Michel wrote:I think the real issue is that most people have not taken the time to study how CLOP works.
So they have only a vague idea what it does.
Agreed. Remi's paper is a good read, but you do have to read it. The need to run thousands of
games is discouraging to people, but really there's no other way. CLOP isn't magic.

regards,
--tom
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: A poor man's testing environment

Post by jdart »

I think you are saying two somewhat contradictory things: first, that computers can successfully evaluate openings better than humans, and second, that some openings are just too complex for computers.

I also disagree about the Sicilian. It's in most computer opening books. It is also still popular in correspondence play, and correspondence players really know their openings. If you think it is bad for Black you should look at GM Ftacnik's book "Grandmaster Repertoire 6: The Sicilian Defense", which covers the Black side in depth. However, note that he recommends the ...d6 lines (Najdorf etc.) instead of ...e6 (Scheveningen), and my computer book also does this. I think the Keres Attack is quite unpleasant for Black.

--Jon
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: A poor man's testing environment

Post by Don »

jdart wrote:I think you are saying two somewhat contradictory things: first, that computers can successfully evaluate openings better than humans, and second, that some openings are just too complex for computers.

I also disagree about the Sicilian. It's in most computer opening books. It is also still popular in correspondence play, and correspondence players really know their openings. If you think it is bad for Black you should look at GM Ftacnik's book "Grandmaster Repertoire 6: The Sicilian Defense", which covers the Black side in depth. However, note that he recommends the ...d6 lines (Najdorf etc.) instead of ...e6 (Scheveningen), and my computer book also does this. I think the Keres Attack is quite unpleasant for Black.

--Jon
I think computers CAN evaluate openings better than humans now. However, humans still do a much better job with the help of computers than computers could do alone. We pretty much ignore that factor these days and simply give humans all the credit, don't we?

I looked at some YouTube videos analyzing openings such as the Morra Gambit, and there were several references to computers showing the way. I don't remember the exact wording, but for example one GM said, "we used to think this obvious move was bad due to this continuation (then he gives the continuation), but the computers have shown us that you can go ahead and play it thanks to this clever line (and he shows why it's OK)."

I doubt anyone has documented this but I think that modern theory has been overhauled due to computers in the past 20 or 30 years.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: A poor man's testing environment

Post by jdart »

I doubt anyone has documented this but I think that modern theory has been overhauled due to computers in the past 20 or 30 years.

I don't doubt that at all, but many lines are still evaluated by both engines and humans as "unclear". So there is still room for human judgement.

--Jon
Richard Allbert
Posts: 792
Joined: Wed Jul 19, 2006 9:58 am

Re: A poor man's testing environment

Post by Richard Allbert »

Totally agree.

Computers are strong, no doubt, especially with tactics, but I think far too much emphasis is put on their evaluations.

For example, say after move 14 you see +0.50 from one engine playing another -> after 10 more moves this score is usually different.

This means that, by definition, the +0.50 was maybe not accurate after all.

IIRC, on Houdini's website (or in an interview with CB) Houdart said that Houdini wins 90% of games where it thinks it is > +0.8 in the middlegame.

That's a big margin of evaluation error, and thus I have never understood the obsession with differentiating between lines that differ by 0.1 :)

This was especially so during the London Chess Classic - many observers were constantly claiming "mistakes" by the players, due to a +0.5 swing in evaluation by their engine.

Interestingly, the GMs commentating were quite the opposite - they rarely believed the computer assessment unless it was massively in favour of one player (over +2.0 or so), or tablebases were accessed. Otherwise they used the engines for tactics checking.
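For illustration, that 90%-at-+0.8 figure can be turned into a crude eval-to-expected-score curve. The logistic shape and the fitted scale below are my own assumptions, fitted to that single quoted data point, not anything published by the Houdini author:

```python
import math

# Crude sketch: fit a logistic eval-to-expected-score curve to the
# single data point quoted above (score 0.90 at +0.8 pawns) and see
# what smaller evals imply.  Form and scale are assumptions.

def scale_from_point(eval_pawns, score):
    """Solve score = 1 / (1 + exp(-eval / k)) for the scale k."""
    return eval_pawns / math.log(score / (1.0 - score))

def expected_score(eval_pawns, k):
    """Expected game score implied by an evaluation of eval_pawns."""
    return 1.0 / (1.0 + math.exp(-eval_pawns / k))

k = scale_from_point(0.8, 0.90)          # roughly 0.36 pawns
print(round(expected_score(0.5, k), 2))  # about 0.80
print(round(expected_score(0.1, k), 2))  # about 0.57
```

On this crude fit, +0.5 corresponds to roughly an 80% expected score, while a 0.1-pawn edge moves the expected score only about seven percentage points above 50% - which is why arguing over 0.1 differences seems futile.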

Richard
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: A poor man's testing environment

Post by Lyudmil Tsvetkov »

jdart wrote:I think you are saying two somewhat contradictory things: first, that computers can successfully evaluate openings better than humans, and second, that some openings are just too complex for computers.

I also disagree about the Sicilian. It's in most computer opening books. It is also still popular in correspondence play, and correspondence players really know their openings. If you think it is bad for Black you should look at GM Ftacnik's book "Grandmaster Repertoire 6: The Sicilian Defense", which covers the Black side in depth. However, note that he recommends the ...d6 lines (Najdorf etc.) instead of ...e6 (Scheveningen), and my computer book also does this. I think the Keres Attack is quite unpleasant for Black.

--Jon
Hi Jon.
I cannot find a contradiction in my post, but maybe you are referring to some other text as well.
Top engines on multiple cores, even without an opening book, definitely play openings better than humans, but I still believe they do not play them perfectly. I.e., they have surpassed humans in opening understanding (primarily due to their tactical ability), but their opening play is still not perfect; overall they play middlegame positions even better than opening positions, simply because engines are tested starting from middlegame variations.
Theory is a very broad concept and includes a large number of lines that are more or less inferior: 1. c4, for example, should not be considered optimal, but it is still part of theory. The same holds true for other lines.

Computer opening books are prepared by humans, who are biased, fallible, guided by preconceived ideas, and still not fully aware of the revolution in chess currently under way with the appearance of very sophisticated software running on powerful hardware that is accessible to all. Correspondence players have the same shortcomings as the people preparing computer opening books. Moreover, if correspondence players do not use engine help, it is very likely that they will miss subtle computer improvements or refutations even when spending an enormous amount of time on opening preparation. The way computers play (tactically) is simply different, inaccessible to the human mentality.

I do not know about the Najdorf and the Scheveningen, but in most lines black has to contend with insufficient control of the center, with hardly anything in exchange for it. That is why performance statistics for human tournaments show that the Sicilian is one of the most difficult openings for black. Players still play it because it is interesting, even if a bit inferior. Fischer and Kasparov played it because they were able to find and thoroughly analyse lines (a tiny portion of the possible variations) offering black equal or even better play, something others had been unable to do; that is why most world champions have been loath to play it.

Best regards, Lyudmil
Lyudmil Tsvetkov
Posts: 6052
Joined: Tue Jun 12, 2012 12:41 pm

Re: A poor man's testing environment

Post by Lyudmil Tsvetkov »

Richard Allbert wrote:Totally agree.

Computers are strong, no doubt, especially with tactics, but I think far too much emphasis is put on their evaluations.

For example, say after move 14 you see +0.50 from one engine playing another -> after 10 more moves this score is usually different.

This means that, by definition, the +0.50 was maybe not accurate after all.

IIRC, on Houdini's website (or in an interview with CB) Houdart said that Houdini wins 90% of games where it thinks it is > +0.8 in the middlegame.

That's a big margin of evaluation error, and thus I have never understood the obsession with differentiating between lines that differ by 0.1 :)

This was especially so during the London Chess Classic - many observers were constantly claiming "mistakes" by the players, due to a +0.5 swing in evaluation by their engine.

Interestingly, the GMs commentating were quite the opposite - they rarely believed the computer assessment unless it was massively in favour of one player (over +2.0 or so), or tablebases were accessed. Otherwise they used the engines for tactics checking.

Richard
Hi Richard.
The better an engine's evaluation is, the less it is going to jump (swing).
Older engines like Fritz, etc., would register even more astounding jumps in their evaluations. But next to the evaluations of humans (and humans also evaluate subliminally, and subliminally compute millions of variations per second, just as computers do), engine evaluations are simply outstanding. If you could see the jumps in human evaluations (you can judge them by the larger number of mistakes humans make in comparison to computers), you would know that human evaluations sometimes jump not by half a pawn but by much bigger margins. The inability to foresee a tactical shot, for example, should register as a jump of a couple of whole pawns.
I think that not only 0.1 pawns makes a difference, but even 0.01 pawns. In chess every single detail counts. An engine showing +30cp against an opponent of roughly equal strength actually has very good chances of gradually increasing its score up to a winning point, simply because the number of variations available to the defending side that still maintain the score decreases with every passing move. The chances that an advantage in score will increase are much bigger than the chances that it will decrease or even just be maintained.
That is why it is important to evaluate every single detail.

Best regards, Lyudmil
jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: A poor man's testing environment

Post by jdart »

Most correspondence chess nowadays is computer-aided (it is allowed). Correspondence players are very aware of what computers can do, and what they can't.

--Jon