diep wrote:
bob wrote:
chrisw wrote:
diep wrote:
lkaufman wrote:
diep wrote:
Rebel wrote:
Don wrote: I personally believe that Komodo has the best evaluation function of any chess program in the world.
I see a new form of (fun!) competition arising on the horizon: who has the best eval?
Its basic framework:
1. Root search (1-ply) only with standard QS.
2. QS needs to be defined by mutual agreement.
3. No extensions allowed.
25,000 - 50,000 games (or so) to weed out most of the noise caused by the lack of search.
Details to be worked out of course.
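For concreteness, a minimal sketch of what rules 1-3 could look like (Position, Move, generateLegalMoves, generateCaptures, makeMove and evaluate are placeholders for whatever each engine provides, and the exact QS is whatever rule 2 settles on):

    #include <algorithm>
    #include <vector>

    struct Position { /* engine-specific */ };
    struct Move { /* engine-specific */ };
    std::vector<Move> generateLegalMoves(const Position&);  // placeholder
    std::vector<Move> generateCaptures(const Position&);    // placeholder
    Position makeMove(const Position&, const Move&);        // placeholder
    int evaluate(const Position&);  // the eval under test, side-to-move relative

    // Rule 2: standard quiescence search - stand pat, then captures only.
    // Rule 3: no extensions anywhere.
    int qsearch(const Position& pos, int alpha, int beta) {
        int standPat = evaluate(pos);
        if (standPat >= beta) return standPat;
        alpha = std::max(alpha, standPat);
        for (const Move& m : generateCaptures(pos)) {
            int score = -qsearch(makeMove(pos, m), -beta, -alpha);
            if (score >= beta) return score;
            alpha = std::max(alpha, score);
        }
        return alpha;
    }

    // Rule 1: a 1-ply root search; every root move is scored by QS alone.
    Move rootSearch1Ply(const Position& pos) {
        Move best{};
        int bestScore = -1000000;
        for (const Move& m : generateLegalMoves(pos)) {
            int score = -qsearch(makeMove(pos, m), -1000000, 1000000);
            if (score > bestScore) { bestScore = score; best = m; }
        }
        return best;
    }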
Great idea, Ed. We need an independent tester who also verifies that no cheating occurs. Do you volunteer?
With some luck we'll then see how strongly mobility and coordinated piece evaluation play.
Oh, I remember - Diep also knows everything about pins, and has extensive king safety that will directly attack the opponent's king with all pieces, probably with the usual computer bug of not using many pawns to do so. It will give spectacular attacking games!
This is the problem. Knowledge about pins is generally considered tactical, not evaluation, even if you put it in the eval function. So Diep would probably look great in a one-ply test due to this pin knowledge, but this has no bearing on which program has the better evaluation. There is no limit to how much tactical knowledge can be put into an eval function; whether it justifies the slowdown in search is the question.
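To make that concrete, such a pin term can sit in a purely static eval, something like the toy sketch below (isAbsolutelyPinned, piecesOf and the 25-centipawn value are purely illustrative, not any engine's actual code):

    #include <vector>

    enum Color { WHITE, BLACK };
    using Square = int;
    struct Position;                                        // engine-specific
    std::vector<Square> piecesOf(const Position&, Color);   // placeholder
    bool isAbsolutelyPinned(const Position&, Square);       // placeholder

    // "Tactical" pin knowledge expressed as a static eval term: penalize
    // every piece of 'side' that is pinned against its own king.
    int pinPenalty(const Position& pos, Color side) {
        int penalty = 0;
        for (Square sq : piecesOf(pos, side))
            if (isAbsolutelyPinned(pos, sq))
                penalty += 25;   // illustrative centipawn value
        return penalty;
    }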
Regarding your request for a Komodo 5 version without PST: Richard Vida posted a patch to Komodo 5 making all eval terms configurable. Since we don't condone this I won't post the link here, but if you can find his patch, all you need to do is set the "xtm" terms ("pawn table multiplier", etc.) to zero and you'll have what you want.
You are trying to talk your way out of the 1-ply match?
King safety is also tactical, mobility is also tactical, and evaluating attacks, which Diep does massively, is also tactical?
Yet evaluating material is suddenly the most important 'positional term' of an evaluation?
Oh, come on, we can call everything tactical.
I want a 1-ply match.
Ed?
Make some noise!
Completely agree with Vincent. Only beancounter programmers would oppose Ed's idea, always using the same false dichotomy: search = tactics, eval = positional. Nonsense, of course. I'd take it further: ban the QS, which can contain all manner of check-search tricks btw, and force the beancounters to write a SEE. Then we'll see how crap their evals really are.
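For reference, the bare-bones recursive SEE meant here is only a few lines (smallestAttackerTo, pieceValueOn and isNull are placeholders for an engine's own machinery):

    #include <algorithm>

    struct Position { /* engine-specific */ };
    struct Move { /* engine-specific */ };
    using Square = int;
    enum Color { WHITE, BLACK };

    Move smallestAttackerTo(const Position&, Square, Color);  // placeholder
    bool isNull(const Move&);                                 // placeholder
    Position makeMove(const Position&, const Move&);          // placeholder
    int pieceValueOn(const Position&, Square);                // placeholder
    Color opposite(Color c) { return c == WHITE ? BLACK : WHITE; }

    // Static exchange evaluation on 'sq': material won by 'side' from the
    // best capture sequence, where each side may stop capturing whenever
    // continuing would lose material (hence the max(0, ...)).
    int see(const Position& pos, Square sq, Color side) {
        Move m = smallestAttackerTo(pos, sq, side);
        if (isNull(m)) return 0;
        int captured = pieceValueOn(pos, sq);
        return std::max(0, captured - see(makeMove(pos, m), sq, opposite(side)));
    }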
Another way you can test, btw, is to put the zero-search program on ICC and test it against rated players. Then shoot any programmer who can't get 2000 Elo out of raw evaluation only.
There is a major flaw in your reasoning. You are going back to the '70s, when the mantra was "you must do chess on a computer like a human does it." The problem was, then, and still is now, "We don't know HOW a human plays chess." So saying "no search" is a meaningless constraint.
Not to mention the obvious dichotomy: one can write a complex eval, or use search to fill in the issues the eval doesn't handle well, and either should eventually reach exactly the same level of skill. But with computers it is easier to rely on recursive high-speed search than on overly complex code that contains too many bugs to ever work well.
We know very well how a human plays chess.
In fact we have known the most important clue since De Groot's research in 1946.
That clue is simply that it's knowledge-based, not search-based.
That also explains why so many players who are analytically very strong still lose games: they make search mistakes, sometimes even 2-ply ones, simply missing the opponent's move entirely.
After 1946, research in this area has usually focused on the wrong people.
Everyone always wants to research the world champion. From a scientific viewpoint the world champion is NOT interesting to research.
In computer chess we also know very clearly that if you have a bug somewhere, you ARE gonna lose everything that's based upon it. So avoiding bugs is what you want.
So such studies keep making the same beginner's mistake, a zillion times over: what you want is to avoid making the same mistakes the common man makes when he plays chess.
But researching a guy who is 1200 Elo is not sexy, huh?
All the ladies who work for government always want to research those in society who are weird/more intelligent, right?
What is interesting is researching the common guys who, without knowing anything, can still win a game.
What is interesting is knowing why a correspondence player who himself is 1100 Elo over the board, not even knowing which endgames are clear wins for him, is world top at correspondence chess.
No one is researching those cases.
Simply because we have all known the truth since 1946. There is nothing secret there.
With human players it's all knowledge-based.
Of course the really interesting question then is: to what extent is accurate parameter tuning important to humankind?
To me it seems chess engines are far better tuned than any human has his knowledge tuned in his brain.
Yet THAT is an open discussion and an interesting one.
My claim is that the beancounter chess engines have been tuned far beyond what is interesting from a research viewpoint. It's totally useless to even spend money on developing engines like Komodo, Stockfish, Rybka or any similar engine further. It's totally trivial that they play far above the Elo strength that their knowledge supports. Not interesting, in short.
So if someone claims his evaluation function is better, then we can simply do the 1-ply test.
They now back off and claim their eval is OK when they add that 30-ply search they're doing.
The next attempt will be: "but we must both run on 1 core, as Diep nearly always wins against engines when it has a big hardware advantage, and being SMP is a hardware advantage" (that was the case even when Diep was rated a lot lower than it is now).
Then after that the attempt will be: "but the only thing that matters is superbullet".
And after all that, their evaluation is still not a penny better than DeepSjeng 2011's
(which definitely has a much better evaluation than Rybka).
In fact it's very similar to it as well.
Yet DeepSjeng 2011 is from spring 2011, and Komodo 5 is far more recent...
Who copied who?