Re: TCEC stage 3 , New Houdini starts with a bang

Posted: Sun Jul 24, 2016 8:47 pm
by Laskos
Branko Radovanovic wrote:
Laskos wrote:The 5 ELO points difference in Komodo's settings, with a probability of 98%, had no impact on the outcome of the first game, and these 5 ELO points are negligible in a single game. It is the replaying of the game that seriously impacted the stage. It flipped the fortunes of a single game by approximately the value of the drawelo, 250 ELO points. Replaying the game had a pretty devastating, and a priori expected, effect on the outcome of a single game previously won by Houdini. Come on, humans have pretty good intuition on these matters; everybody knew that the decision massively favors Komodo for that game.
"Someone stole a lottery ticket from Komodo and gave it to Houdini. Then, the ticket won the jackpot. Taking the jackpot from Houdini and giving it to Komodo would have a devastating effect on Houdini because it would cost him millions, while the value of the lottery ticket in the moment it was stolen was pretty much negligible, being a million-to-one longshot. So, this decision favors Komodo massively."

That's really just a rephrasing of my earlier reply to Ronald de Man: one either compares a priori with a priori or a posteriori with a posteriori - one cannot mix the two, because it leads to faulty conclusions like the one above.
You simply don't realize a simple fact, observed intuitively by almost everybody (but you): flipping a coin until it gives a desired result is not science about the coin, and I am afraid it makes for a somewhat flawed stage of TCEC.
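
For a sense of scale, the 5 and 10 Elo figures mentioned in this thread can be put through the standard logistic Elo formula. This is only a sketch: the function name is made up for illustration, and the plain Elo model ignores draws, so it says nothing about the drawelo figure of 250 quoted above.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score per game for the side that is elo_diff points stronger.

    Uses the standard logistic Elo model; draws are not modelled separately.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# Compare an even match with a 5 and a 10 Elo edge.
for diff in (0, 5, 10):
    print(f"{diff:>3} Elo -> expected score {expected_score(diff):.4f}")
```

Under this model a 5 Elo edge shifts the expected score of a single game by less than one percentage point.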

Posted: Sun Jul 24, 2016 9:13 pm
by Uri Blass
syzygy wrote:
Uri Blass wrote:
Laskos wrote:
IanO wrote:
carldaman wrote:What was wrong with Komodo's settings? People should at least be entitled to a proper explanation by the organizers.
The differences can be found in the initial game comment.

Before:

BlackEngineOptions: Use Syzygy=true; Syzygy Probe Depth=2; Syzygy Probe Limit=6; Dynamism=110;
After:

BlackEngineOptions: Contempt=7;
To summarize, removing custom Syzygy probe options and "Dynamism" (whatever that is), and adding Contempt 7.

It would be nice to hear from Mark what the effects of these changes are likely to be.
The values "Before" were close to optimal. The ELO loss was no larger than 5-10 ELO points and hardly the reason for the lost game. Seems unfair to Houdini to replay the game.
I do not think that we know the elo loss at long time control and there is no basis to claim that the wrong setting is not the reason for the lost game.

I see nothing unfair for Houdini.
It is only fair if the game had been replayed no matter what.

That this game would have been replayed no matter what is far from clear. The rules are silent on this situation and what if it had been discovered only much later?

What if Komodo had drawn or won "despite its disadvantage"? I think most people would have considered it fair not to replay the game in that case.
Replying to:
"What if Komodo had drawn or won "despite its disadvantage"?"

I do not know the opinion of most people, but my point of view is that it would be unfair not to replay the game in this case as well.

I suggest that in the future everybody should be able to see the default settings of the programs before the tournament (so that everybody, and not only the programmers, can complain and ask for games to be replayed if they see something is wrong).

Posted: Sun Jul 24, 2016 9:50 pm
by Ralf Müller
It's not about replaying a game, it's about restarting the whole tournament. All engines start at 0 points and have the same possibilities.

Also it would be fair to restart a whole coin flip tournament (if there are enough rounds to go).

Posted: Sun Jul 24, 2016 10:02 pm
by Branko Radovanovic
Laskos wrote:
Branko Radovanovic wrote:
Laskos wrote:The 5 ELO points difference in Komodo's settings, with a probability of 98%, had no impact on the outcome of the first game, and these 5 ELO points are negligible in a single game. It is the replaying of the game that seriously impacted the stage. It flipped the fortunes of a single game by approximately the value of the drawelo, 250 ELO points. Replaying the game had a pretty devastating, and a priori expected, effect on the outcome of a single game previously won by Houdini. Come on, humans have pretty good intuition on these matters; everybody knew that the decision massively favors Komodo for that game.
"Someone stole a lottery ticket from Komodo and gave it to Houdini. Then, the ticket won the jackpot. Taking the jackpot from Houdini and giving it to Komodo would have a devastating effect on Houdini because it would cost him millions, while the value of the lottery ticket in the moment it was stolen was pretty much negligible, being a million-to-one longshot. So, this decision favors Komodo massively."

That's really just a rephrasing of my earlier reply to Ronald de Man: one either compares a priori with a priori or a posteriori with a posteriori - one cannot mix the two, because it leads to faulty conclusions like the one above.
You simply don't realize a simple fact, observed intuitively by almost everybody (but you): flipping a coin until it gives a desired result is not science about the coin, and I am afraid it makes for a somewhat flawed stage of TCEC.
Flipping a coin until it gives a desired result is indeed not science, but that's not at all what happened here. The game which was played in violation of the rules was null and void in the moment when it started, null and void while it was played, and null and void after it ended - there is no difference. That's a "legalist" point of view, so to say, and I don't think it can be reasonably disputed.

Here is a real question: what if Komodo had won the game with the wrong settings? Would it be fair then to: a) accept this result as valid, since the engine won while handicapped, or b) replay the game regardless? Mathematically, it's both intuitive and easy to show that the correct answer is b), even if this could lead to a seemingly grossly unfair outcome, such as Komodo losing with the correct settings afterwards. In fact, only if one decides whether to replay based on the outcome of the tainted game does the thing become equivalent to what you've described as "flipping a coin until it gives a desired result".

Unconditional replay is both "legally" right and mathematically fair, as it results in the same expectancy as if everything was correct from the start.
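
The expectancy argument above can be checked with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the win/draw/loss probabilities for Komodo are invented purely for illustration (they are not measured values), and the function names are hypothetical.

```python
# Each outcome distribution is (win, draw, loss) from Komodo's point of view.
# Both sets of numbers are illustrative assumptions, not measurements.
CORRECT = (0.20, 0.65, 0.15)   # assumed odds with the intended settings
TAINTED = (0.19, 0.65, 0.16)   # assumed odds with the wrong settings


def expected_score(dist):
    """Expected points from one game: 1 for a win, 0.5 for a draw."""
    win, draw, _loss = dist
    return win + 0.5 * draw


def replay_always(tainted, correct):
    """The tainted game is discarded whatever its result; only the replay counts."""
    return expected_score(correct)


def replay_only_after_loss(tainted, correct):
    """A tainted win or draw stands; only a tainted loss is replayed."""
    win, draw, loss = tainted
    return win + 0.5 * draw + loss * expected_score(correct)


print("no error at all       :", round(expected_score(CORRECT), 4))
print("unconditional replay  :", round(replay_always(TAINTED, CORRECT), 4))
print("replay only after loss:", round(replay_only_after_loss(TAINTED, CORRECT), 4))
```

With these assumed numbers, unconditional replay gives exactly the expectancy of an error-free game, while a replay that is triggered only by a loss raises Komodo's expectancy above it. That second policy is the "flip until you get the desired result" case.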

Posted: Sun Jul 24, 2016 11:25 pm
by syzygy
Ralf Müller wrote:It's not about replaying a game, it's about restarting the whole tournament. All engines start at 0 points and have the same possibilities.

Also it would be fair to restart a whole coin flip tournament (if there are enough rounds to go).
You are making a basic error.

If someone thinks he might have a patch that improves SF, he can run it on fishtest. If the patch passes all tests, it will normally be added to the official SF.

Now suppose you try this and the patch fails some of the tests.
You try again... you "restart the whole tournament". Now it passes.

What do you expect to happen? Will your patch be added to the official Stockfish?
No, your patch will not be added and for very good reasons.

Restarting the tournament for whatever reason (provided it is linked, however remotely, to the outcome of the earlier games) is inherently unfair. (And in any event, it is absolutely irrelevant that the game that was restarted was the first game of the series.)
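
The effect of "try again until it passes" can be put in numbers. This is a rough sketch, assuming each test run is independent and lets a worthless patch through with probability alpha; the 5% used here is an illustrative assumption, not a statement about fishtest's actual settings, and the function name is made up.

```python
def pass_probability_with_retries(alpha: float, attempts: int) -> float:
    """Chance that a worthless patch passes at least once in `attempts`
    independent test runs, each with false-positive rate `alpha`."""
    return 1.0 - (1.0 - alpha) ** attempts


# Assumed per-run false-positive rate of 5% (illustrative only).
for attempts in (1, 2, 5, 10):
    p = pass_probability_with_retries(0.05, attempts)
    print(f"{attempts:>2} attempt(s): {p:.3f}")
```

Every extra attempt that is granted because the previous one failed pushes the real error rate above the nominal one, which is why a restart linked to earlier results is not neutral.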


Completely separate question: what if Komodo had won the game despite its disadvantage and the stage had continued without a replay? Would you have objected?

Posted: Sun Jul 24, 2016 11:33 pm
by syzygy
Branko Radovanovic wrote:The game which was played in violation of the rules was null and void in the moment when it started, null and void while it was played, and null and void after it ended - there is no difference. That's a "legalist" point of view, so to say, and I don't think it can be reasonably disputed.
If the rules had been so clear, I would have agreed. (But I wonder how the rules would have had to be formulated. What do you do if the error is detected only very late in the stage and affects many games? Replay everything?)
Here is a real question: what if Komodo had won the game with the wrong settings? Would it be fair then to: a) accept this result as valid, since the engine won while handicapped, or b) replay the game regardless? Mathematically, it's both intuitive and easy to show that the correct answer is b), even if this could lead to a seemingly grossly unfair outcome, such as Komodo losing with the correct settings afterwards. In fact, only if one decides whether to replay based on the outcome of the tainted game does the thing become equivalent to what you've described as "flipping a coin until it gives a desired result".
I fully agree. Unfortunately I could not avoid seeing phrases such as "Komodo has the right to a replay". (Please note: not coming from Mark.)

I know this is speculative, but if Komodo had won (or drawn or even lost) and the game had not been replayed, I don't think any discussion would be taking place now. Of course everybody would have agreed that Komodo should play its remaining games with the correct settings.

Did you check the written rules?

Posted: Sun Jul 24, 2016 11:58 pm
by MikeB
syzygy wrote:
Ralf Müller wrote:It's not about replaying a game, it's about restarting the whole tournament. All engines start at 0 points and have the same possibilities.

Also it would be fair to restart a whole coin flip tournament (if there are enough rounds to go).
You are making a basic error.

If someone thinks he might have a patch that improves SF, he can run it on fishtest. If the patch passes all tests, it will normally be added to the official SF.

Now suppose you try this and the patch fails some of the tests.
You try again... you "restart the whole tournament". Now it passes.

What do you expect to happen? Will your patch be added to the official Stockfish?
No, your patch will not be added and for very good reasons.

Restarting the tournament for whatever reason (provided it is linked, however remotely, to the outcome of the earlier games) is inherently unfair. (And in any event, it is absolutely irrelevant that the game that was restarted was the first game of the series.)


Completely separate question: what if Komodo had won the game despite its disadvantage and the stage had continued without a replay? Would you have objected?
+1 Absolutely correct. It's so unfortunate that such a poor decision was made, and this illustrates the downfall of a single organization (or person) making all the decisions. Going forward, they should have an independent body of seasoned chess programmers who do not have an engine involved to make such decisions. They might have come to the same conclusion, but unfortunately even the slightest hint of perceived favoritism becomes reality in cases like this.

Posted: Mon Jul 25, 2016 12:24 am
by Peter Berger
You have to realize that these situations happen ALL THE TIME in engine tournaments. They just go unnoticed often.

I attended 3 WCCCs as an operator. Each year I observed unexpected situations that demanded a judgement by an official. Some I agreed to, some I didn't.

E.g. Crafty-Shredder, Ramat-Gan, 1st round, when Crafty crashed and wasn't set back up in time. It should have been a loss for Crafty in a perfect world IMHO. The game ended as a draw mainly because SMK was very friendly and resisted the visible temptation to insist on a win by default. The scoresheet eventually just showed a pretty boring draw.

It is the same with ALL basement tourneys. I have run quite a few myself, and OFTEN something unexpected happened at some point, despite utmost care with the setup.

How do you judge the case when, at some point in the game, some scheduled monthly process you have completely forgotten about fires off and steals processor time from an engine?

How do you judge crashes when it can't be worked out who is to blame (engine, setup, machine)? Every judgement changes the probabilities of the final result.

You can try to do your best though.

If the one to blame is the operator, because he set up the engine incorrectly, the only reasonable decision is to ALWAYS replay the game(s) though IMHO. No matter the result of the game.

So I agree with the TCEC decision.

Peter

Posted: Mon Jul 25, 2016 1:10 am
by Ralf Müller
Your example of the patch misses one important thing: if you have completed a test series in which the new patch passes all tests, and before that you played one other SINGLE game under slightly OTHER circumstances, which the patch lost, then of course you will add the patch to the official Stockfish, for good reasons.

And yes, I would also insist on a replay if Komodo had won this game. If the result is to be worth something, all games must be played under similar circumstances.

Posted: Mon Jul 25, 2016 1:21 am
by Rochester
The sensible thing is to play only with the default settings. Then the programmer must always code them correctly. He can make the program use the settings he wants. He should make a new program when he changes the settings.

And if the programmer goes to the movies, he can't complain when he returns. Is a movie more important? NO!