Zenmastur wrote:The people that develop Stockfish and Komodo use “tournaments” to test their patches. They play the old version against the new version in a contest to see which is better. When you look at SF testing you see lines that look like this:
Code:
24-05-16 Vo sp-killer diff LLR: -2.95 (-2.94,2.94) [0.00,5.00] sprt @ 10+0.1 th 1 Don't SEE prune main killer. (Sometimes killer moves act like a bait)
Total: 12889 W: 2322 L: 2392 D: 8175
This is in fact the results of a tournament held between two different versions of SF to determine which is better. So, I have to strongly disagree!
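A W/L/D line like the one quoted can be turned into a rough Elo point estimate with the standard logistic formula. This is only a back-of-the-envelope conversion, not the SPRT likelihood-ratio machinery fishtest actually runs:

```python
import math

def elo_estimate(wins, losses, draws):
    """Logistic Elo point estimate from a W/L/D record.
    (Point estimate only; fishtest's SPRT stopping rule is not reproduced.)"""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n          # average score per game
    return 400 * math.log10(score / (1 - score))

# Numbers from the fishtest line quoted above
print(round(elo_estimate(2322, 2392, 8175), 1))  # about -1.9 Elo
```

So the quoted run measured the patch at roughly -1.9 Elo, which is why the LLR walked to the lower bound (-2.95) and the patch was rejected.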
Well, it is pretty stupid to use that for testing modifications to the hashing scheme, as you know that hash-table size only weakly affects speed, and speed only weakly affects Elo. Doubling the hash size typically produces only a 6% speedup (the 12th root of 2), which corresponds to about 6 Elo. A 10% more efficient use of the hash table would thus cause only about 0.8 Elo. It would take >100,000 games to see that, while the speedup could already be measured accurately from a few hundred test positions (the equivalent of 10 games or so).
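The arithmetic behind those figures is easy to check. The sketch below uses the rule of thumb implied in the post (about 1 Elo per 1% speed at these time controls) and an assumed per-game score standard deviation of 0.3, which is typical for draw-heavy self-play; both are assumptions, not measurements:

```python
import math

# One extra ply per ~12 hash doublings, i.e. a factor 2**(1/12) in speed
speedup = 2 ** (1 / 12) - 1
print(f"{speedup:.1%}")            # ~5.9% speedup per hash doubling

# Rule of thumb from the post: ~1 Elo per 1% speed
elo_per_doubling = speedup * 100   # ~6 Elo

# A 10% more effective hash table is worth log2(1.1) of a doubling
elo_gain = math.log2(1.1) * elo_per_doubling
print(f"{elo_gain:.2f} Elo")       # ~0.82 Elo

# Games needed to resolve ~0.8 Elo at 2 standard deviations.
# Near a 50% score, d(Elo)/d(score) = 1600 / ln(10) ~ 695 Elo per score point;
# 0.3 is an assumed per-game score s.d. for draw-heavy engine self-play.
sigma_elo_per_game = (1600 / math.log(10)) * 0.3
games = (2 * sigma_elo_per_game / elo_gain) ** 2
print(f"{games:,.0f} games")       # well over 100,000 games
```

Which is the point: a few hundred fixed test positions pin down the speedup directly, while the Elo route needs a few hundred thousand games for the same information.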
When a program like Komodo participates in tournaments like TCEC there is a lot of risk, since a bad result could have bad economic consequences for the authors. A lot of buyers look at a program's tournament performance when considering a purchase. They want good value for their money, and a poor or very poor performance may make them think twice about buying. So the developers may not want to risk their program's tournament prowess on a patch that will only be “good” during long analysis sessions.
Volkswagen also took a lot of risk when they had their cars tested for emission of pollutants, because if they scored badly on that count they would lose a lot of business from people who wanted to buy environmentally friendly cars and from governments that offered subsidies for those. So they had the engine run in a different mode during tests than it ran in when doing what people actually bought it for: driving on roads from one place to another.
So now they are widely known as frauds and cheaters, who swindled people out of their money with false promises, because that would have better 'economic consequences' for them.
What you propose is that the Komodo developers should likewise defraud their customers, by beefing up their test performance with methods they know would fail badly in actual use. I would not recommend that method of doing business.
This is true, but the more complex the change, the more time, code, and testing is required. People with limited resources (which is every chess engine developer on the planet) would be reluctant to make massive changes if there is little to be gained compared to a simpler implementation. So, unless you want to demonstrate that your method is superior and that the difference is worth the extra effort to implement it, I think most people will choose a simpler option.
It is fortunate then that equidistributed-draft replacement is pretty simple to implement.
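The scheme itself is not spelled out in this thread, so the following is only one plausible reading of “equidistributed-draft” replacement, with all names, the bucket size, and the victim-selection rule being illustrative assumptions rather than the author's actual code: keep a global histogram of the drafts (depths) stored in the table, and on a collision overwrite the in-bucket entry whose draft is currently the most numerous, which pushes the draft distribution toward uniform and protects rare deep entries:

```python
# Illustrative sketch only; names and constants are assumptions.
MAX_DRAFT = 64
BUCKET_SIZE = 4

draft_count = [0] * MAX_DRAFT      # global histogram: entries per draft

def pick_victim(bucket):
    """Index of the bucket entry to overwrite: the one whose draft is
    currently most numerous table-wide, so overwriting it drives the
    draft distribution toward uniform."""
    return max(range(BUCKET_SIZE), key=lambda i: draft_count[bucket[i]])

def store(bucket, new_draft):
    v = pick_victim(bucket)
    draft_count[bucket[v]] -= 1    # old entry leaves the table
    bucket[v] = new_draft
    draft_count[new_draft] += 1    # new entry enters it

# Usage: shallow draft-1 entries flood the table, so draft 1 keeps being
# picked as the victim and the deep entries (9 and 12) survive.
bucket = [1, 1, 9, 12]
for d in bucket:
    draft_count[d] += 1
store(bucket, 1)
print(bucket)   # a draft-1 slot was recycled; 9 and 12 survive
```

The appeal of such a rule is that it needs no tuned aging parameter: over-represented drafts automatically become the preferred victims.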
If you get Larry Kaufman to post in this thread that he couldn't care less how Komodo does in TCEC, then I might take this point seriously. The fact is, winning tournaments sells Komodo better than any advertisement ever could. So I don't think you're going to convince me, or many other people, that “no one cares” about tournament results.
I was talking about users. Of course a supplier often has interests opposed to his customers'. He would earn much more by selling them cheap crap at a high price, while the customers want high quality for a low price. So a dishonest supplier could indeed consider it very important to create false expectations in his customers. That doesn't mean, however, that the customers find it equally important to have false expectations...
What I propose is a simple, quick, and easy method to solve an analysis problem without screwing with the program's tournament performance.
And what I propose is a simple, safe, fundamentally correct, and robust method to solve the analysis problem and improve tournament performance at the same time.
Well, if it's so damn easy, why don't you lay down some numbers that support your claim?
As I said, there is little to gain from improving hash performance, so for me this doesn't have a very high priority even for my own engines. I am currently working on a tablebase generator, which must be finished before the ICGA event.
Until it's shown to be “inferior” I have no reason to believe that it is.
What you believe or not is of course entirely your business. You could believe that the Earth is flat, for all I care. Technological progress is usually brought about by people with a vision, who recognize the reasons why something would work better before it has been shown to them that it does. Imagine that Edison had said "I won't spend any time on building a light bulb, because I have no reason to believe that would work, unless you first show me one that does"...
In fact there are of course many theoretical reasons why some things are inferior to others, and ignoring those as a matter of principle doesn't get you very far. You yourself suggested that it would be more important to protect exact entries from overwriting than bounds. And it is easily calculated that a minimal alpha-beta tree has many more nodes in the PV sub-trees than in those hanging from cut- or all-nodes.
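That last claim can be checked by counting nodes in a minimal (perfectly ordered) alpha-beta tree, using the usual node-type model: a PV node has one PV child and b-1 cut children, a cut node needs to search only one (all) child, and an all node searches every child, each of which is a cut node. The branching factor here is an arbitrary illustrative choice:

```python
from functools import lru_cache

B = 3   # branching factor; illustrative, any b > 1 shows the same pattern

@lru_cache(None)
def pv(d):    # nodes in the subtree below a PV node, remaining depth d
    return 1 if d == 0 else 1 + pv(d - 1) + (B - 1) * cut(d - 1)

@lru_cache(None)
def cut(d):   # a cut node searches just one child, an all node
    return 1 if d == 0 else 1 + all_(d - 1)

@lru_cache(None)
def all_(d):  # an all node searches all B children, each a cut node
    return 1 if d == 0 else 1 + B * cut(d - 1)

for d in (4, 8):
    print(d, pv(d), cut(d), all_(d))
# At every depth the PV subtree is the largest, dwarfing the cut subtrees
```

For B = 3 at depth 8 this gives 393 nodes under a PV node against 161 under a cut node, so an exact (PV) entry represents far more search effort than a bound of the same draft.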
In case of doubt, I would always try the simplest method first. I.e., the “KISS” (keep it simple, stupid) strategy is a well-known strategy for a reason. It works!
Well, "simple" doesn't necessarily always translate to "dumb". When in doubt, the first thing I would do is remove the doubt...