Martin on the SF loss on time

bob · Post by **bob** » Mon Oct 12, 2015 12:03 am

michiguel wrote:
hgm wrote: So why is crashing a play-limiting bug, and a too-generous time allocation not, when the observable results are exactly the same?

They are not the same. SF lost on time, but the games were properly "resolved" (SF just played after the time limit) and the whole communication with the GUI was OK (exiting properly, etc.). That is what I understood. By the definition and the examples given in the rules (crashing or communication bugs with the interface), this does not fall under the category. Just by looking at the behaviour, you do not even know whether this is a bug or an extremely (crazy) aggressive time management that tries to use the last possible millisecond.

But this is not really up for debate, since the organizers decided it was NOT a play-limiting bug. The reason SF was allowed to continue was that the opponents allowed it by a unanimous vote, if I understood correctly.

Miguel

It is not that I want to argue that the Stockfish team doesn't get what they deserve; I am just puzzled by the fact that they consider these things to be different. It seems that 'play-limiting' is not so much defined by the symptoms, as by the underlying technical causes for these symptoms. To be fair to the Stockfish team it should be said that in TCEC they seem to be running on quite special hardware so that they really didn't have much of an opportunity to discover that on this particular hardware things might be a few msec slower.

There is a risk of skewing results, however, which ought to be avoided. I.e., suppose YOU play against a program that loses both games to you on time, and then you are asked, "Should they be allowed to fix this?" Of course you would say, "Yes. Give 'em all the time they need. That will give others more losses and might let me advance where I would not if they also picked up time wins..."

It sounds like a good choice to make, but there are hidden undercurrents that are often overlooked. If it were a Swiss, for example, this would even affect future round pairings.

And don't forget the obvious pressure this puts on each opponent to say "yes, let's fix this." It is VERY difficult to say "no," no matter how you feel. Peer pressure is quite real.

michiguel · Post by **michiguel** » Mon Oct 12, 2015 12:13 am

I should have said "The reason why SF was allowed to continue with a fix".
I am not giving an opinion, I am just describing what happened.

Miguel

bob · Post by **bob** » Mon Oct 12, 2015 12:27 am

That was my understanding of your post. My comments were not directed toward you, but toward the decision. The best solution is to always have rules that are clear, concise, all-inclusive and unambiguous. "All-inclusive" is the challenge. But here, "clear" would eliminate the discussion. NASCAR rules come to mind... If the rules say your front air dam must be at least 1 7/8" above the ground when the vehicle is fully fueled and with driver on board, they mean 1 7/8" at least, NOT 1 13/16" or 1 3/4".

S.Taylor · Post by **S.Taylor** » Mon Oct 12, 2015 2:11 am

Also, the rules need to be correct (e.g., all-inclusive) so they don't NEED to be changed unexpectedly. But if they are faulty, then there's no choice.

bob · Post by **bob** » Mon Oct 12, 2015 2:41 am

The only problem here is the software engineering no-no of having an "ambiguous specification".

It is a very difficult problem. Sometimes precise wording is not possible (in the case of "originality" for example). But for this specific case, they could eliminate the "show stopper" problems with a quick validation blitz tourney or whatever up front. Then you "run what you brung."

S.Taylor · Post by **S.Taylor** » Mon Oct 12, 2015 7:06 am

RJN wrote: 2nd video update from Martin:

https://vid.me/9XV0

So DID Martin make another video update as promised?

RJN · Post by **RJN** » Mon Oct 12, 2015 7:26 am

I do not believe so, and the other videos are deleted, so I guess they served their purpose as the situation unfolded. I don't recall that he "promised" to do a 3rd one.

S.Taylor · Post by **S.Taylor** » Mon Oct 12, 2015 8:44 am

He said he would decide by the next day what to do, and I think he said that, even though he doesn't like making videos, he would be back one more time with the answer.

mcostalba · Post by **mcostalba** » Mon Oct 12, 2015 1:02 pm

MikeB wrote: They got lucky - they submitted an untested beta version and were getting burnt. Hopefully next time, wiser decisions are made as to which version to submit.

No, we didn't get lucky. I really don't see where the luck is.

For the record, the decision was taken unilaterally by the tournament managers (we were not involved in this specific decision, but were informed afterwards like everybody else). Of course the tournament is theirs and they can do what they want.

At the moment we are discussing internally whether, and what, to reply as an official answer.

I post this as a personal comment, not as SF spokesperson.

mjlef · Post by **mjlef** » Mon Oct 12, 2015 3:22 pm

Marco,

I think by "lucky" he meant that the other competitors voted to allow the time-exceeding version of Stockfish to be changed. This did not happen for other programs that lost on time. All the votes cast said yes, showing that the programmers wanted to have a more stable version and improve Stockfish's winning chances. Since this time-control bug would likely make it much more probable that another program would win, I find it very noble that they voted this way.
