Re: what happened to scorpio NN in TCEC?

Posted: Sat Nov 17, 2018 6:19 pm
by Daniel Shawul
From what I gather it is not the fault of Scorpio but cutechess-cli.

Apparently cutechess-cli only waits 10 seconds for an engine to load, but loading Scorpio's neural networks may take up to 30 seconds.
It is not a problem for winboard if an engine takes an hour to initialize, because the winboard protocol says this:
done (integer, no default)
If you set done=1 during the initial two-second timeout after xboard sends you the "xboard" command, the timeout will end and xboard will not look for any more feature commands before starting normal operation. If you set done=0, the initial timeout is increased to one hour; in this case, you must set done=1 before xboard will enter normal operation.
The xboard protocol provides a way to counter this by doing:
"feature done=0" ... then time-taking operation ... "feature done=1"
So the engine can take up to 1 hour initializing its stuff and there shouldn't be a problem.
I implemented that and expect it to work in every GUI, but I guess cutechess-cli just resumes normal operation after waiting only 10 seconds...
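
For illustration, the engine side of that exchange could look roughly like the sketch below. This is not Scorpio's actual code. It assumes the engine talks to the GUI over plain text streams, and run_handshake, slow_init and demo_transcript are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>

// Engine-side xboard startup: answer "protover" with done=0, do the slow
// work, then send done=1. `slow_init` is a hypothetical stand-in for
// something like loading a neural network.
// Returns true once the handshake has been completed.
bool run_handshake(std::istream& in, std::ostream& out,
                   const std::function<void()>& slow_init) {
    assert(slow_init);  // a callable must be supplied
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("protover", 0) == 0) {      // line starts with "protover"
            out << "feature done=0" << std::endl;  // ask the GUI to extend its timeout
            slow_init();                           // may take a long time
            out << "feature done=1" << std::endl;  // GUI may now start normal operation
            return true;
        }
    }
    return false;  // input ended before the GUI sent "protover"
}

// Feeds a fake GUI transcript through the handshake and returns what the
// engine wrote, so the exchange can be inspected without a real GUI.
std::string demo_transcript() {
    std::istringstream gui("xboard\nprotover 2\n");
    std::ostringstream engine;
    run_handshake(gui, engine, [] { /* pretend to load a network */ });
    return engine.str();
}
```

std::endl is used on purpose: it flushes the stream, so the GUI sees "feature done=0" before the slow part starts rather than after it.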

I am fine with Scorpio being removed from the tournament due to its hangs, but it should not be implied that Scorpio is the cause of the tournament being restarted, especially when I did things the right way and their GUI (cutechess-cli) happens not to implement winboard correctly.

Daniel

Re: what happened to scorpio NN in TCEC?

Posted: Sat Nov 17, 2018 6:27 pm
by Branko Radovanovic
kasinp wrote: Sat Nov 17, 2018 5:46 pm Imagine a Stockfish version with a particularly nasty bug that causes it to crash after capturing a knight on f4. This scenario will occur infrequently, but will cause SF to lose the game on the spot. A tournament in which this happens in, say 6 games out of 30, will lead to a distorted view of the strength of other engines.
Are you saying that engines that won points from SF's crashes will falsely appear stronger relative to engines that did not happen to win points in the same way? That is true but, as you noted, we don't know a priori which engines will fall into each group.
kasinp wrote: Sat Nov 17, 2018 5:46 pm Wins and losses are not random as with the growing number of games they reflect the relative strength of engines with increasing accuracy. OTOH the selection of opponents against which an engine happens to crash is random.
What I meant is that individual wins, losses and draws are random (not flip-of-the-perfect-coin-random, of course, but distributed in accordance with relative strength). If "fairness" of a certain ruleset means "maximizing the prior probability that a stronger engine gets promoted over the weaker one", then removing valid game outcomes will inevitably work against fairness. Using your example, removing 24 actual or potential "valid" outcomes as well as 6 "tainted" outcomes will have the net negative effect in measuring the relative strength of competitors. (That is my hypothesis anyway - I believe it could be proven with a Monte Carlo simulation, for example.)
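
One possible shape for such a Monte Carlo test is sketched below. It only sets up the experiment and does not claim which ruleset wins. Everything in it is an assumption made for illustration: three engines (A stronger than B by `edge` Elo, and a crasher C rated like B), a fixed draw rate, the standard Elo expected-score formula, and "A outscores B" as the measure of a correct result (ties count as a miss under both rulesets).

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Result of comparing two rulesets over many simulated tournaments.
struct Fractions {
    double keep_all;     // share of tournaments where A outscored B, all games counted
    double discard_all;  // same, but with every game against crasher C removed
};

// Elo expected score for a player rated `diff` points above the opponent.
double expected_score(double diff) {
    return 1.0 / (1.0 + std::pow(10.0, -diff / 400.0));
}

// Plays one game and returns the first player's result in half-points
// (2 = win, 1 = draw, 0 = loss). `draw_rate` is an assumed fixed draw probability.
int play(double diff, double draw_rate, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    const double win = expected_score(diff) - draw_rate / 2.0;
    const double r = u(rng);
    if (r < win) return 2;
    if (r < win + draw_rate) return 1;
    return 0;
}

// A is `edge` Elo stronger than B; C is rated like B but crashes (forfeits)
// with probability `crash_prob` in each of its games.
Fractions simulate(int tournaments, int games_per_pair, double edge,
                   double draw_rate, double crash_prob, unsigned seed) {
    assert(tournaments > 0 && games_per_pair > 0 && edge >= 0.0);
    assert(expected_score(-edge) >= draw_rate / 2.0);  // keeps probabilities valid
    std::mt19937 rng(seed);
    std::bernoulli_distribution crash(crash_prob);
    int keep = 0, discard = 0;
    for (int t = 0; t < tournaments; ++t) {
        int a_head = 0, b_head = 0;  // half-points from A-vs-B games
        int a_vs_c = 0, b_vs_c = 0;  // half-points from games against C
        for (int g = 0; g < games_per_pair; ++g) {
            const int ab = play(edge, draw_rate, rng);
            a_head += ab;
            b_head += 2 - ab;
            a_vs_c += crash(rng) ? 2 : play(edge, draw_rate, rng);
            b_vs_c += crash(rng) ? 2 : play(0.0, draw_rate, rng);
        }
        if (a_head + a_vs_c > b_head + b_vs_c) ++keep;  // ruleset 1: keep everything
        if (a_head > b_head) ++discard;                 // ruleset 2: drop C's games
    }
    return {static_cast<double>(keep) / tournaments,
            static_cast<double>(discard) / tournaments};
}
```

Calling something like simulate(100000, 10, 30.0, 0.4, 0.2, 1u) and comparing the two fractions would then show which ruleset ranks A above B more often under these particular assumptions. A realistic test would need more engines and the actual TCEC format.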

Re: what happened to scorpio NN in TCEC?

Posted: Sat Nov 17, 2018 6:41 pm
by Daniel Shawul
Daniel Shawul wrote: Sat Nov 17, 2018 6:19 pm From what I gather it is not the fault of Scorpio but cutechess-cli.

Apparently cutechess-cli only waits 10 seconds for an engine to load, but loading Scorpio's neural networks may take up to 30 seconds.
It is not a problem for winboard if an engine takes an hour to initialize, because the winboard protocol says this:
done (integer, no default)
If you set done=1 during the initial two-second timeout after xboard sends you the "xboard" command, the timeout will end and xboard will not look for any more feature commands before starting normal operation. If you set done=0, the initial timeout is increased to one hour; in this case, you must set done=1 before xboard will enter normal operation.
The xboard protocol provides a way to counter this by doing:
"feature done=0" ... then time-taking operation ... "feature done=1"
So the engine can take up to 1 hour initializing its stuff and there shouldn't be a problem.
I implemented that and expect it to work in every GUI, but I guess cutechess-cli just resumes normal operation after waiting only 10 seconds...

I am fine with Scorpio being removed from the tournament due to its hangs, but it should not be implied that Scorpio is the cause of the tournament being restarted, especially when I did things the right way and their GUI (cutechess-cli) happens not to implement winboard correctly.

Daniel
I am not even sure this is the case at all. It played 80 blitz games without a problem, so if NN loading taking too long were a problem, it would have caused way too many hangs there...

Anyway one would assume cutechess-cli probably implemented the xboard protocol correctly.

Edit:
Indeed, cutechess implements things correctly, like I suspected. It waits for a "feature done=1" before initializing.

Code:

	else if (name == "done")
	{
		write("accepted done", Unbuffered);
		m_initTimer->stop();
		
		if (val == "1")
			initialize();
		return;
	}
The only explanation for me is that it is not strong enough for Div4, so let's blame it on its hangs and then say it was causing cutechess-cli to hang or whatever...

Re: what happened to scorpio NN in TCEC?

Posted: Sat Nov 17, 2018 7:58 pm
by syzygy
Branko Radovanovic wrote: Sat Nov 17, 2018 5:09 pm Confusion aside, the very idea of removing the engine with three crashes from competition and discarding all of its games - supposedly in the interest of fairness - is absolutely misguided, because crashes are essentially random events, just as normal wins, losses and draws are, and discarding valid games actually hurts fairness instead of improving it.
Crashes are random events with a probability distribution that is entirely unrelated to that of normal wins, losses and draws. The inclusion of games of a randomly crashing engine clearly hurts fairness. Discarding all the games of such an engine seems quite reasonable. (Of course the decision to disqualify an engine should be made on the basis of predetermined criteria.)

Re: what happened to scorpio NN in TCEC?

Posted: Sat Nov 17, 2018 8:11 pm
by syzygy
Branko Radovanovic wrote: Sat Nov 17, 2018 6:27 pm If "fairness" of a certain ruleset means "maximizing the prior probability that a stronger engine gets promoted over the weaker one", then removing valid game outcomes will inevitably work against fairness. Using your example, removing 24 actual or potential "valid" outcomes as well as 6 "tainted" outcomes will have the net negative effect in measuring the relative strength of competitors. (That is my hypothesis anyway - I believe it could be proven with a Monte Carlo simulation, for example.)
It seems clear to me that there is no net negative effect. Removing all the games of the randomly crashing engine simply results in a normal tournament with one fewer participant.

You could argue the more participants the better, but that does not apply if the extra participant introduces severe noise.

edit:
OK, thinking about it again I think I now see your point. Your point is probably that the random crashes are equivalent to game-losing blunders, so an engine that crashes is not different from an engine that randomly blunders away the game.

I suppose it is hard to argue against that.
Still, it is intuitively clear to me that such unpredictable engines are not desirable in a tournament. In the long run (as the number of games approaches infinity), the randomly crashing engine clearly will not affect the relative ranking of the remaining engines (or at least not more than the addition of any regular engine to a tournament could). But an unpredictably crashing engine makes the results more volatile, so that more games are needed. Just like you need more games at STC than at LTC.

Suppose engines A and B are equally strong.
If A and B always draw, then any tournament will give accurate results.
If A and B never draw, then it is highly likely that a tournament will suggest that one is stronger than the other, even though they are equally strong.

So the higher the draw ratio, the "better" (not always better for spectators, but still).
Random crashes artificially reduce that draw ratio (and they will not make spectators happy).
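
The draw-ratio argument above can be made concrete with a small worked calculation. It is a sketch under simplifying assumptions that are not in the posts: A and B are equally strong, games are independent, the draw probability d is the same in every game, and colour is ignored. X is A's score in a single game and n is the number of games.

```latex
% Assumptions: A and B are equally strong, games are independent,
% draw probability d, and wins and losses are equally likely otherwise.
\begin{align*}
P(X = 1) &= P(X = 0) = \frac{1-d}{2}, \qquad P(X = \tfrac12) = d \\
\mathbb{E}[X] &= \tfrac12 \\
\operatorname{Var}[X] &= \frac{1-d}{2}\left(\tfrac12\right)^2 + \frac{1-d}{2}\left(\tfrac12\right)^2 = \frac{1-d}{4} \\
\sigma_{\bar X} &= \sqrt{\frac{1-d}{4n}} \quad \text{(standard deviation of A's mean score over $n$ games)}
\end{align*}
```

With d = 1 the spread is zero, which is the "always draw" case. With d = 0 it is 1/(2*sqrt(n)), the largest it can be. Anything that lowers d, such as a crash turning a likely draw into a decisive result, pushes the spread towards the second case.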

Re: what happened to scorpio NN in TCEC?

Posted: Sat Nov 17, 2018 8:19 pm
by Branko Radovanovic
syzygy wrote: Sat Nov 17, 2018 7:58 pm The inclusion of games of a randomly crashing engine clearly hurts fairness.
How can one hurt fairness by including non-crashing games of a crashing engine? On the contrary, it is their removal that hurts fairness. If it's unfair for an engine to receive a point just because its opponent crashed, it is then equally unfair to deduct a fully earned point against such engine from a game in which it didn't crash. If the choices are discard all or discard none, discarding all only seems fairer.

Re: what happened to scorpio NN in TCEC?

Posted: Sat Nov 17, 2018 8:32 pm
by syzygy
Branko Radovanovic wrote: Sat Nov 17, 2018 8:19 pm
syzygy wrote: Sat Nov 17, 2018 7:58 pm The inclusion of games of a randomly crashing engine clearly hurts fairness.
How can one hurt fairness by including non-crashing games of a crashing engine?
I now agree that strictly speaking it does not hurt fairness. But it does hurt the tournament because it distorts the results. See my previous post (which I have edited).
On the contrary, it is their removal that hurts fairness. If it's unfair for an engine to receive a point just because its opponent crashed, it is then equally unfair to deduct a fully earned point against such engine from a game in which it didn't crash. If the choices are discard all or discard none, discarding all only seems fairer.
It is not unfair to deduct the point if you consider that there is no difference with a tournament in which the disqualified engine had not started to begin with.

Re: what happened to scorpio NN in TCEC?

Posted: Sat Nov 17, 2018 8:39 pm
by syzygy
syzygy wrote: Sat Nov 17, 2018 8:32 pm
Branko Radovanovic wrote: Sat Nov 17, 2018 8:19 pm
syzygy wrote: Sat Nov 17, 2018 7:58 pm The inclusion of games of a randomly crashing engine clearly hurts fairness.
How can one hurt fairness by including non-crashing games of a crashing engine?
I now agree that strictly speaking it does not hurt fairness. But it does hurt the tournament because it distorts the results. See my previous post (which I have edited).
To add to my explanation why it distorts even though it does not hurt fairness:

Suppose TCEC had a rule that after the tournament is finished, each engine is awarded random bonus points. For all engines the bonus points are drawn from the same random distribution, so this is absolutely and completely fair. But clearly this would just add noise to the tournament results, distorting the tournament's outcome.

A randomly crashing engine is hardly different from such a rule. Compared to a hypothetical tournament in which the engine had not randomly crashed, the crashing engine randomly awards half and full points to some of the other engines.

Re: what happened to scorpio NN in TCEC?

Posted: Sat Nov 17, 2018 8:49 pm
by syzygy
I think another problem with a randomly crashing engine is that its performance relative to other engines contradicts the otherwise reasonable assumption that Elo differences are additive. So it's not just draw ratio that is the problem.

(So even if draws were impossible, as in tennis, it would be highly undesirable to have a randomly crashing player if we care about the reliability of the relative ranking of the non-crashing players at the end of the tournament.)
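
One way to read the additivity point, sketched under assumptions of my own (the standard Elo expected-score formula, and a crash always counting as a loss): a crashing engine appears to lose a different number of rating points depending on who it plays, so no single rating describes it. The function names are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Elo expected score for a player rated `diff` points above the opponent.
double expected_score(double diff) {
    return 1.0 / (1.0 + std::pow(10.0, -diff / 400.0));
}

// Inverse of expected_score: the rating difference implied by a score in (0, 1).
double implied_diff(double score) {
    assert(score > 0.0 && score < 1.0);
    return 400.0 * std::log10(score / (1.0 - score));
}

// Rating difference an observer would infer for an engine whose true edge is
// `true_diff` but which forfeits each game with probability `crash_prob`.
double apparent_diff(double true_diff, double crash_prob) {
    assert(crash_prob >= 0.0 && crash_prob < 1.0);
    return implied_diff((1.0 - crash_prob) * expected_score(true_diff));
}
```

With a 20% crash probability, an engine that is really 200 Elo stronger than its opponent looks only about 76 Elo stronger (roughly 124 lost). The same engine against an opponent 200 Elo above it looks about 249 Elo weaker (roughly 49 lost). The penalty is not a fixed offset, which is the non-additivity.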

Re: what happened to scorpio NN in TCEC?

Posted: Sat Nov 17, 2018 9:01 pm
by Branko Radovanovic
syzygy wrote: Sat Nov 17, 2018 8:39 pm Suppose TCEC had a rule that after the tournament is finished, each engine is awarded random bonus points. For all engines the bonus points are drawn from the same random distribution, so this is absolutely and completely fair. But clearly this would just add noise to the tournament results, distorting the tournament's outcome.

A randomly crashing engine is hardly different from such a rule. Compared to a hypothetical tournament in which the engine had not randomly crashed, the crashing engine randomly awards half and full points to some of the other engines.
Absolutely true: points from crashes are randomly won points (as you've noted, and I've discussed it in a recent thread about Leela, this is similar to random major blunders), and that by itself hurts fairness. However, on the other hand, points won fair-and-square contribute to fairness. This is particularly important if the probability of crashing for a given engine is fairly low (i.e. not close to say 0.5), which is usually the case.

There is actually a difference between a tournament with all engines and the same tournament without the crashing engine: the latter has fewer valid games, which essentially means more random chance and therefore less fairness - that is, if one adopts the definition I've given earlier (note I'm not saying it is the best or the only definition of "fairness" - the term is a bit hard to define).