Laskos wrote: ↑Tue Aug 13, 2019 8:16 am
Uri Blass wrote: ↑Tue Aug 13, 2019 5:46 am
I think that if we claim that Stockfish does not find the right move, then we need some evidence that the move we claim to be better really is better (for example, some tree such that, going forward and backward through it, Stockfish can learn to play the right move).
... I guess Stockfish of today has very little idea about the outcome (the finality) of a quiet balanced position in openings or midgames. ...
You are absolutely right! Stockfish has trouble making plans in the opening and early middlegame.
However,
"IF" you have good analysis technique, you can improve its performance significantly. And above all else, you do need strong verification that the best move is actually best.
I am against picking the solutions in these positional test suites based on engine analysis. I checked when exactly I built Openings200: it was built almost 3 years ago. It could easily have been proven junk by Lc0 when it arrived, and that would probably have happened if I had built the positional suite by engine analysis (in fact, that happened to the STS suite). To my surprise, my approach was somehow validated by Lc0: it comes out in Openings200 (revised or not) far ahead of regular-eval AB engines.
I'm not against using computer analysis provided it meets certain criteria. The primary one is that the position's evaluation must eventually lead to a decided advantage for one side or the other. If you can't “prove” this with an AB-type engine (NN engines are too weak tactically to do this, I think), then the position should be excluded or put into a separate category.
There are enough instances where this is possible, even when the position appears to be drawn with “normal” analysis techniques, to make this a workable solution. It does, however, require extensive analysis of the positions involved.
Laskos wrote: ↑Tue Aug 13, 2019 11:02 am
While I had great doubts myself about my positional test suites, I have few doubts about the positional superiority of Lc0 on a reasonable GPU with a strong net. Maybe I don't understand what "positional" means. What is obvious from games and test suites is that Lc0 is clearly weaker tactically than not only Stockfish, but even much weaker modern AB engines with a regular eval. It is again obvious to me that, to be the strongest engine on my PC from regular openings, Lc0 compensates with its very strong "positional" play, perhaps in my wrong understanding of the notion of "positional" as some sort of conjugate of "tactical". I am curious, do you have some confidence that a strong Lc0 is superior "positionally" to a strong regular-eval AB engine?
That's why I was pretty happy that Lc0 comes out far ahead in my dubious positional suites. And after all, is it common sense to say: databases of human games are wrong, Lc0 is not that strong positionally so validation by it means nothing, and the only way to check for correctness is to go back and forth from each position with Stockfish for hours on end? My common sense tells me that is the wrong methodology for what I understand to be a positional test suite.
Depending solely on Lc0 for validation is risky. Its tactical weakness allows errors to creep into the analysis. Ideally, you would want SF and Lc0 to agree on a move: Lc0 provides the “strategic” insight while SF ensures that the plan/move is “tactically” sound. I don't believe that “normal” analysis using an AB engine is sufficient by itself to guarantee tactical soundness. If you want to use these positions as a “test” suite, then you need to go to “extraordinary” measures to see that no unforeseen tactical motifs rear their ugly heads. I leave the definition of “extraordinary” up to you, but note that it should be better than just turning the engine on and letting it run for a while on a given position. At the very least, after a “normal” analysis is done, a MultiPV=2 (or more) search should be done to the same depth to verify that there is truly only one “best” move in the position. Having multiple “best” moves dilutes the value of the position even if you know all of them. (e.g., a position that has 36 legal moves, 12 of which are “best”, is almost worthless)
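The MultiPV verification step described above can be sketched in plain Python speaking UCI over a pipe. This is only an illustration: the engine binary name "stockfish", the search depth, and the 30 cp "clearly best" margin are my assumptions, not anything prescribed in the post.

```python
# Sketch: after a "normal" analysis, re-run the position at MultiPV=2 and
# keep it for the test suite only if PV1 is clearly ahead of PV2.
import shutil
import subprocess

def is_unique_best(scores_cp, margin_cp=30):
    """scores_cp: MultiPV scores in centipawns, best line first.
    The top move counts as 'uniquely best' only if it beats the
    runner-up by at least margin_cp (30 cp is an arbitrary choice)."""
    if len(scores_cp) < 2:
        return True  # engine returned a single PV (e.g. one legal move)
    return scores_cp[0] - scores_cp[1] >= margin_cp

def multipv_scores(fen, depth=18, multipv=2, engine="stockfish"):
    """Talk raw UCI to an engine process and collect the final
    'score cp' value reported for each PV (mate scores are skipped
    in this sketch)."""
    proc = subprocess.Popen([engine], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)
    proc.stdin.write(f"uci\nsetoption name MultiPV value {multipv}\n"
                     f"position fen {fen}\ngo depth {depth}\n")
    proc.stdin.flush()
    scores = {}
    for line in proc.stdout:
        parts = line.split()
        if "multipv" in parts and "cp" in parts:
            pv_n = int(parts[parts.index("multipv") + 1])
            scores[pv_n] = int(parts[parts.index("cp") + 1])
        if line.startswith("bestmove"):
            break
    proc.stdin.write("quit\n")
    proc.stdin.flush()
    proc.wait()
    return [scores[k] for k in sorted(scores)]

if __name__ == "__main__" and shutil.which("stockfish"):
    start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    # From the starting position several moves are near-equal, so one
    # would expect this filter to reject it.
    print(is_unique_best(multipv_scores(start)))
```

The margin test, rather than strict equality of scores, is the point: two PVs a few centipawns apart are, for test-suite purposes, "multiple best moves" and dilute the position's value as described above.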
zullil wrote: ↑Tue Aug 13, 2019 12:39 pm
What does "best move" or "right move" even mean here? Surely there are many moves in most of these positions that are equal with correct play---likely all leading to draws.
A VERY good fundamental question. In most “truly” even middle-game positions there can be many “equal” moves, so there IS NO single “BEST” move. What is needed is a position that “looks” even under normal analysis but is “in fact” won, or at least highly favorable to one side.
And forgive me, but in my opinion relying on statistics from completed games, especially human games, is like relying on noise created by cosmic radiation. My guess is that 95% of games that contain one of these positions are decided by significant blunders made after the position occurs.
True enough!
Laskos wrote: ↑Tue Aug 13, 2019 2:32 pm
… Are there many bottlenecks with one single best move? What usually dominates in regular games: many equivalent best moves? Maybe the oddities occurring in databases do mean something close to bottlenecks? We do have 6- and even 7-men tablebases, and although I haven't sat down to analyze them, both of these cases occur there. Interestingly enough, an engine like Stockfish by itself (no TBs), with much hardcoded endgame knowledge, often finds the solutions (bm) even in these bottlenecks, often showing itself to be a good estimator.
I spent a week or so looking at SF analysis of very long EGTB mates. The “raw” number of moves for the losing side seems to be epicyclic, with both the amplitude (# of moves) and the cycle length (# of moves between the min and max) varying in a bounded, semi-random fashion.
Uri Blass wrote: ↑Wed Aug 14, 2019 4:51 am
I think that testing the level of engines, even those from the year 2000, should not be done at short time control, because people could give the computer hours for analysis at home in order to find novelties.
I wonder if there is a top engine that would suggest 2.Nc3 after 1.Nf3 d5 when given one hour to search. Note that using engines to find novelties does not always mean choosing the engines' top moves.
Note that I used chess engines from the opening stage in correspondence games and did not trust opening theory when I played correspondence chess (I do not play correspondence chess today).
Yes, opening theory is terrible compared to deep analysis with AB engines. GM games are almost worthless in correspondence chess. I used them when I first started, but found that they contain so many errors that they are more of a hindrance than anything else. Even using ICCF games as guides is dangerous. Many contain errors and can't be trusted unless you examine every move in a given line of play in depth.
Regards,
Zenmastur