UCI protocol issue

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

jdart
Posts: 4367
Joined: Fri Mar 10, 2006 5:23 am
Location: http://www.arasanchess.org

Re: UCI protocol issue

Post by jdart »

But if the move sequence in the "position" command ends with a ponder move and the next command is "go ponder", then the game is not finished; it would only be finished if the opponent actually played the move that enforces the draw. That is not the case in your scenario! And the GUI sends exactly the ponder move that your engine provided with its previous "bestmove xxxx ponder yyyy" command. I would understand your trouble if the GUI sent "ponderhit", meaning that it does not terminate the game despite the rep draw, but what it does in your scenario is abort your ponder search (whatever it did) with a "stop" before telling you the opponent's different move. And there your engine should be able to have *some* move to reply with as "bestmove", knowing that "stop" only aborts the ponder search.
I don't dispute this. By the way, I have implemented the "bestmove 0000" hack (I think hack is the right word here): https://github.com/jdart1/arasan-chess/ ... 65b29bdcb9.
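A minimal sketch of that fallback on the engine side. `SearchState` and `bestmove_line` are hypothetical names, not Arasan's actual code (the linked commit is the real implementation); the only UCI facts relied on are the `bestmove <move> [ponder <move>]` reply format, the requirement to answer every "stop" with a bestmove, and `0000` as the null move.

```python
from dataclasses import dataclass
from typing import Optional

NULL_MOVE = "0000"  # UCI notation for the null move


@dataclass
class SearchState:
    # Hypothetical record of what the (ponder) search produced so far.
    best_move: Optional[str] = None
    ponder_move: Optional[str] = None


def bestmove_line(state: SearchState) -> str:
    """Build the line the engine sends after a search finishes or is stopped."""
    if state.best_move is None:
        # Nothing to report (e.g. the ponder move ended the game): answer with
        # the null move rather than staying silent, since the GUI waits for a
        # bestmove after every "stop".
        return f"bestmove {NULL_MOVE}"
    if state.ponder_move:
        return f"bestmove {state.best_move} ponder {state.ponder_move}"
    return f"bestmove {state.best_move}"


print(bestmove_line(SearchState()))                # bestmove 0000
print(bestmove_line(SearchState("e2e4", "e7e5")))  # bestmove e2e4 ponder e7e5
```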

--Jon
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: UCI protocol issue

Post by Don »

Sven Schüle wrote:
Don wrote: So simply as a matter of preference I would like to see that Komodo is NEVER sent a position to search that has no legal moves
That is certainly OK, but in the given case the GUI does not do so: it sends the ponder move that your engine provided with its last "bestmove xxxx ponder yyyy" line, and even if "yyyy" leads to a rep draw, this is not a situation with no legal moves. When instructed to do a ponder search with "go ponder" you are free to decide how to ponder, and if your own proposal "yyyy" that you had previously sent enforces a draw, you might prepare for the case that the opponent goes for a win with a different move, and use your resources that way.

Sven
As someone said, a draw by repetition is not something a UCI engine can claim. Therefore Komodo relies on the GUI to handle it correctly, so we have no choice but to play it out if the GUI doesn't.

What I stated was a preference, not something I can do anything about, because my preference is not supported by the UCI protocol.

Pondering, as you say, is of course the trickiest case, and a minor improvement to Komodo would be to never send ponder moves that lead to repetition draws.
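One way such a filter could look, as a sketch: before sending "bestmove xxxx ponder yyyy", drop the ponder move if the opponent's reply would complete a threefold repetition. Position keys are assumed to be something like a Zobrist hash or a FEN without move counters (so equal keys mean the same side to move and the same rights); move generation is left out, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def completes_threefold(keys: Sequence[str], new_key: str) -> bool:
    """True if new_key would be the third occurrence of that position.

    keys should hold only the positions since the last irreversible move
    (capture or pawn move); earlier positions cannot repeat.
    """
    return list(keys).count(new_key) >= 2


def safe_ponder_move(keys: Sequence[str], key_after_best: str,
                     ponder: Optional[str], key_after_ponder: str) -> Optional[str]:
    """Return the ponder move to announce, or None if it would be a repetition draw."""
    if ponder is None:
        return None
    # The position after our own best move belongs to the history the reply is judged against.
    history: List[str] = list(keys) + [key_after_best]
    if completes_threefold(history, key_after_ponder):
        return None  # send a plain "bestmove <move>" instead
    return ponder
```

Returning `None` simply means the engine omits the `ponder` token, which UCI allows.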

What it comes down to is that the ponder move really is the engine's decision, regardless of how it's implemented or what protocol is used.

The UCI method is a great convenience for the engine author, taking some of the burden of managing the complexities of pondering off his hands. But this discussion has made me realize that perhaps the engine should take a little more responsibility for this.

To be brutally honest, both the WinBoard and UCI protocols make a huge number of assumptions about how a chess program works. From a purely idealistic point of view that is pretty dicey. Three examples are the assumptions in the protocols (and GUIs use these to do what they do) that ALL programs iterate, that all programs produce a "score" as a by-product of how they select a move, and that all programs work their way down a move list.

But it's easy to imagine someone building a Monte Carlo searcher that doesn't have some of this "classical" behavior. It could be made to work as a hack, of course, but it would look really strange in a modern GUI that is built around all these assumptions.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.
hgm
Posts: 27811
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: UCI protocol issue

Post by hgm »

Sven Schüle wrote: Do the UCI specs say anywhere that an engine should not be able to swim through the pacific ocean? :-)
No, but neither do FIDE rules say that a player has to swim through the pacific ocean. FIDE rules do specify you have the option to resign (at any time) or claim a draw (subject to conditions), though. So the default assumption (i.e. unless explicitly contradicted in the specs) for a Chess-engine communication protocol is that there is no 'swim' command, but that there is a 'claim draw' command.
The UCI specs do not provide any means for an engine to claim a draw. That is a fact. If you disagree then show me the rule that specifies the handling of a draw claim.
Well, that is easy, of course:
UCI specs wrote: A nullmove from the Engine to the GUI should be send as 0000.
A position that is a rep draw is not a terminal position where no legal moves exist, that is another fact. So at least "bestmove 0000" must not be sent if its intention would be to state that the engine has no legal moves.
An intention that springs purely from your (and other people's) imagination, and is not even hinted at anywhere in the UCI specs...
The UCI specs, under "move format", specify indeed that an engine can send a "null move" as "0000", but the specs do not mention what that means. So the GUI would be free to take your "bestmove 0000" as a statement that you refuse to move.
A null move is by definition an explicit expression of the refusal to move anything. So I don't think the GUI is 'free' to take it that way, any more than it would be 'free' to take e2e4 as meaning you want to move from e2 to e4. It is obliged to take it that way. (How it reacts to that can of course depend on the situation, e.g. whether e2e4 was legal or not.)
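At the syntax level the null move is just one more token in the UCI move format, and the GUI has to decide what to do with it like any other move. A sketch of such a parser for the standard 8x8 board (the function name is hypothetical):

```python
import re
from typing import Optional, Tuple

# Long algebraic notation as used by UCI, e.g. e2e4, e1g1, e7e8q.
MOVE_RE = re.compile(r"([a-h][1-8])([a-h][1-8])([nbrq]?)")
NULL_MOVE = "0000"


def parse_uci_move(text: str) -> Optional[Tuple[str, str, str]]:
    """Return (from, to, promotion) for a normal move, or None for the null move.

    Raises ValueError for anything else; a GUI would treat that as an illegal move.
    """
    if text == NULL_MOVE:
        return None  # the engine explicitly moves nothing
    match = MOVE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"malformed move: {text!r}")
    return match.group(1), match.group(2), match.group(3)
```

Deciding whether `None` means resignation, a draw claim, or something else is then a separate, context-dependent step.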

If not for claiming a draw or resigning, what other need would there ever be for an engine to send a null move to the GUI? Why is this sentence in the protocol specs at all? Is it because the GUI is expected to send positions where there are no legal moves (i.e. checkmate or stalemate) to the engine, and is it by inference a non-compliancy of the GUI if it does not do that?

Some people here insist even that it would be a GUI bug to send positions where the engine is check/stalemated. What do those people think 0000 should be used for, then?
Since in the given case of Jon we are talking about a ponder search that is stopped after the opponent played a different move, and the engine did not receive "ponderhit" but "stop", we might assume that our "bestmove" reply will be ignored anyway but we cannot be sure since the semantics of "bestmove 0000" is not fully specified,
You cannot even be sure that 'bestmove e2e4' in the startpos would not lead the GUI to forfeit you... Such things are called 'GUI bugs', and IMO engine authors had better not cater to them. If a GUI does not ignore whatever bestmove is given in an aborted ponder search, that GUI is 100% certified broken. By definition the result of a ponder search is what you would do in the hypothetical case that the opponent played that move. If that doesn't materialize, what you intended to do, even if you make it known to the world, should never have any consequences.
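On the GUI side, the behavior hgm describes amounts to discarding the reply to a stopped ponder search. A sketch, assuming the GUI remembers whether it ended pondering with "ponderhit" or "stop"; the names are hypothetical:

```python
from enum import Enum, auto
from typing import Optional


class PonderOutcome(Enum):
    HIT = auto()   # opponent played the ponder move; GUI sent "ponderhit"
    MISS = auto()  # opponent played something else; GUI sent "stop"


def move_to_play(outcome: PonderOutcome, line: str) -> Optional[str]:
    """Return the move to play from a bestmove line, or None to discard it."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "bestmove":
        raise ValueError(f"not a bestmove line: {line!r}")
    if outcome is PonderOutcome.MISS:
        # The search answered a position that never arose: ignore the reply,
        # whether it is a real move or 0000. The GUI then sends the actual
        # position followed by "go".
        return None
    return tokens[1]
```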
... therefore I would refrain from using it. The logical implication is that the engine should do *some* ponder search at least after "go ponder" so that it has *something* to reply in case of a "stop". Pondering on a different move than the one enforcing the rep draw would be a better use of resources as you already confirmed.
Although this would not violate any specs, I think it is undesirable that the engine would have to lie to the GUI about what it is doing. One obvious disadvantage is that the GUI will not be able to correctly interpret any PV info from the ponder search.
No. So let's call it not a "protocol violation" but a "dangerous behaviour" of the engine, as I described above :-)
Never yield to GUI bugs!
I did not mean that "bestmove 0000" is a form of a draw claim. There is no draw claim in UCI.
So you imagine. But according to the specs 'bestmove 0000' means the engine refuses to play a move. (0000 is defined as a null move, and a null move is commonly understood to leave the board unchanged.) Which happens to be within the player's rights according to FIDE rules, in any position.
Don

Re: UCI protocol issue

Post by Don »

H.G.,

I think you are being unreasonable here. You are saying that we are imagining things and "making things up" on the one hand and then claiming that 0000 IS a draw claim even though there is no hint of that in the UCI specification. Who is making things up?

You are so enamored with your chain of reasoning on this that you declare it to be how it really is - wishful thinking on your part.
hgm

Re: UCI protocol issue

Post by hgm »

Don wrote: Komodo doesn't count the number of repetitions. I could easily add that, but I like that I don't have to.
In that case this situation causes no problems, because the detected repeat could be a second occurrence, in which case the GUI (if it is FIDE-compliant) would have to let you play on anyway. So if it inadvertently sends you a third occurrence, you would automatically do the same (not being aware of it). Whether the GUI actually ends the game is a moot point. Komodo apparently sees the repetition as the best option, and so does the opponent. So they will keep repeating the same loop of moves forever. This is within the (FIDE-specified) rights of the engine, and if the user is not happy with it, he can only blame the GUI.

Note, however, that ending games at 3-fold repetition or the 50-move limit is an adjudication, not enforcement of FIDE rules. But adjudications are a highly praised feature of GUIs, and in general it is very useful to make all kinds of adjudications not based on FIDE rules (e.g. KRKR, KBKN...). That is a GUI-design issue, but it should not be dictated by the protocol.
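As a sketch of such an adjudication, here are the two FIDE-based draw checks a GUI might run after every move, independent of whether either engine claims. The position-key representation and the result strings are assumptions:

```python
from typing import Optional, Sequence


def adjudicate(keys: Sequence[str], halfmove_clock: int) -> Optional[str]:
    """GUI-side draw adjudication.

    keys: position keys since the game start, current position last.
    halfmove_clock: plies since the last capture or pawn move (as in FEN).
    """
    if halfmove_clock >= 100:  # 50 moves by each side
        return "1/2-1/2 {50-move rule}"
    if keys and list(keys).count(keys[-1]) >= 3:
        return "1/2-1/2 {3-fold repetition}"
    return None  # no adjudication: let the engines play on
```

Further adjudications (material-based ones such as KRKR) would be extra checks in the same function, which is why they sit naturally in the GUI rather than in the protocol.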

So we have a situation where the GUI should be doing this anyway (a GUI that doesn't know if a draw claim is valid is broken) so why should Komodo have to do it?
I admit there is room for debate here, but my vision of an ideal protocol is one that requires as little as possible from the engine author - he should be free to focus on the things that really matter and not the intricacies of GUI design. The engine should not have to "help" the GUI. So anything that a GUI has to do anyway, the engine should not have to do.
By that same logic it would also be better if the GUI decided what the engine should move... Most engines would work vastly better if the GUI incorporated the public-domain Ippolit code and replaced the engine's moves with the moves obtained from that. (Only in case they differed, of course...) But it has become kind of a tradition to grant the engine (or the book) a monopoly on game-playing decisions. And, like it or not, FIDE rules make the decision to claim part of the game.

I really think your complaints about this should be grouped with those of people who complain that implementing e.p. capture is such a hassle, and that the GUI should be made responsible for playing an e.p. capture whenever this is legally possible. Tough luck! If you don't like e.p. capture, program a Shatranj engine...
In a way we each have totally different perspectives that probably cannot be considered fully objective. You are lazy, I am lazy. I want the GUI to do as much for me as possible and you are exactly the opposite. You want the engine to do all the work because you are primarily a GUI maker.

Part of the issue with you is that you are supporting so many games that you are almost forced to lean on the engine. I don't think your GUI even understands what moves are legal in some of those games - is that correct? I'm not being critical; I fully understand what a pain it would be to build a fully legal move-checking system for so many games. It's not easy even to build a bug-free legal move generator for chess that understands all the dark corners of the rules.
The latter is a more accurate analysis than 'that I am primarily a GUI maker' (which I consider nonsense). It would indeed be an immense waste of effort to have to implement the rules for these games both in the GUI and in the engine, if through a handful of very simple protocol commands the GUI can be kept completely general and serve engines for games it has never even heard of. Have you ever tried to run Daniel Shawul's Nebiyu engine with the WinBoard Alien Edition, to play Ultima, Checkers, Go or Amazons, for example? If you really want to experience first-hand how powerful this type of design can be, I can strongly recommend it.

Indeed the GUI is totally unaware of the rules for most games. Nevertheless, when you pick up a piece, all the squares where you can move to will be highlighted by colored dots (the color indicating capture / non-capture / promotion or other specials, like the first 'leg' of a multi-leg move such as castling), because the engine will send the GUI a 'color FEN' in response to the 'lift SQUARE' command. The GUI will then refuse any move where you do not release the piece on one of the marked squares, automatically trigger selection of a promotion piece when you drop it on a square colored for promotion, prompt for a second leg if the square was color-coded as a first-leg destination, etc. An extremely powerful protocol addition. The GUI really doesn't have to know anything at all about the game, yet has all the functionality people expect in dedicated GUIs. (And of course the engine provides the start position and sets the board size at the beginning of every game.)
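The 'color FEN' can be pictured as an ordinary FEN board field in which each square carries a color code instead of a piece, with runs of unmarked squares compressed to digits. The sketch below shows only that run-length encoding, for an arbitrary board size; the color letters used in the example (`R`, `Y`) are placeholders, not the codes the WinBoard protocol actually defines, and the function name is hypothetical.

```python
from typing import Dict


def marks_to_fen(marks: Dict[str, str], files: int = 8, ranks: int = 8) -> str:
    """Encode square marks (e.g. {"e4": "Y"}) as a FEN-style board string.

    Ranks are listed from the highest down, as in FEN; empty runs become digits,
    which may have more than one digit on boards wider than 9 files.
    """
    rows = []
    for rank in range(ranks, 0, -1):
        row, empty = "", 0
        for f in range(files):
            square = f"{chr(ord('a') + f)}{rank}"
            mark = marks.get(square)
            if mark is None:
                empty += 1
                continue
            if empty:
                row += str(empty)
                empty = 0
            row += mark
        if empty:
            row += str(empty)
        rows.append(row)
    return "/".join(rows)
```

For instance, marking d5 as a capture target and e4 as a plain destination gives `8/8/8/3R4/4Y3/8/8/8`.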
hgm

Re: UCI protocol issue

Post by hgm »

Don wrote: I think you are being unreasonable here. You are saying that we are imagining things and "making things up" on the one hand
I don't think I said that in any case where it was not an indisputable fact. I constantly have to face all kinds of claims in this discussion that come from absolutely nowhere. To name a few:
*) 'bestmove 0000' means there are no legal moves
*) Sending 'bestmove 0000' violates the UCI specs
*) The ponder move given in bestmove-ponder must be the move the engine expects the opponent to play
*) It violates the UCI specs to 'go' the engine in a position where there are no legal moves
*) It is not possible for a UCI engine to refuse playing a move
*) UCI does not provide any tools to claim a draw
and then claiming that 0000 IS a draw claim even though there is no hint of that in the UCI specification. Who is making things up?

You are so enamored with your chain of reasoning on this that you declare it to be how it really is - wishful thinking on your part.
Well, let me state more precisely what I am trying to say, so you can point out what exactly I am imagining:

*) Null move means not moving anything
*) UCI specs say the engine can send a null move to the GUI
*) Not moving anything is allowed by the FIDE rules (e.g. resigning)
*) According to FIDE rules, you are not obliged to move in a 3-fold-repeated position, or after 50 reversible moves, and the alternative is not necessarily resign.

So FIDE rules say you have the option not to move, and UCI has a command not to move. Why is it so overwhelmingly important to everyone to insist that these could not possibly be the same thing, and that when the UCI engine refuses to move, it must mean something entirely different from choosing the option granted to it by FIDE rules not to move? Although they cannot come up with a consistent explanation for what it means instead...

As to inconsistencies:
*) If it is a bug when a GUI presents a mated position to the engine, what is the intended use of the null move mentioned in the UCI specs?
*) If 'bestmove 0000' would exclusively describe a position with no legal moves, why isn't it a problem that this does not uniquely describe whether it is checkmate (a loss) or stalemate (a draw)?
*) If the above is not a problem, because the GUI can determine that, why is it suddenly a problem when 'bestmove 0000' is used to encode the other cases where FIDE rules allow you to not move, that in some cases it is a loss (resign), and in other cases a draw (rep-claim)? Is this now suddenly too difficult for the GUI to determine whether a draw claim can be made?

I don't think it is unreasonable to demand a satisfactory answer to any of these questions.
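hgm's reading boils down to a small decision table in the GUI. A sketch, assuming the GUI can already compute the three inputs (whether legal moves exist, whether the side to move is in check, whether a repetition or 50-move claim is valid). This encodes his proposed interpretation; the UCI specification itself does not state it.

```python
def interpret_null_move(has_legal_moves: bool, in_check: bool, draw_claimable: bool) -> str:
    """How a GUI could read "bestmove 0000" under hgm's proposed interpretation.

    Every case is decided from the position alone, just as the GUI already
    has to tell checkmate from stalemate.
    """
    if not has_legal_moves:
        return "loss: checkmated" if in_check else "draw: stalemate"
    if draw_claimable:  # 3-fold repetition or 50-move rule
        return "draw: claimed"
    return "loss: resigned"
```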
Don

Re: UCI protocol issue

Post by Don »

hgm wrote: Indeed the GUI is totally unaware of the rules for most games. Nevertheless, when you pick up a piece, all the squares where you can move to will be highlighted by colored dots (the color indicating capture / non-capture / promotion or other specials, like the first 'leg' of a multi-leg move such as castling), because the engine will send the GUI a 'color FEN' in response to the 'lift SQUARE' command. The GUI will then refuse any move where you do not release the piece on one of the marked squares, automatically trigger selection of a promotion piece when you drop it on a square colored for promotion, prompt for a second leg if the square was color-coded as a first-leg destination, etc. An extremely powerful protocol addition. The GUI really doesn't have to know anything at all about the game, yet has all the functionality people expect in dedicated GUIs. (And of course the engine provides the start position and sets the board size at the beginning of every game.)
I'm not being critical of this kind of design - I'm just pointing out that it's not appropriate for an auto-testing interface, or at least not nearly as simple. Imagine testing two Foo-playing programs against each other where one of them claims a win even though there is no win. Such a protocol would work, of course, if you insist that all decisions are passed through both programs. So if white claims a draw, black has to agree; if black claims a win, white must agree. If one side plays an illegal move, the other side must object, and of course in any case where there is disagreement one side is going to get screwed unless it's resolved by a third party. In chess interfaces the GUI is the 3rd-party authority in all matters. It will know if there is a problem and can forfeit the non-conforming side.

So I think such a protocol would work fine as long as this agreement protocol was also built into the testing procedure assuming the GUI supports automated testing between programs. When there is a disagreement however you cannot just let it pass, you have to report the conflict and then determine which program is buggy offline. An alternative is to have a 3rd trusted program resolve disputes.

For a single playing program just using the GUI as a canvas, it's not a problem, as the two programs together can be considered a single system, and it's no different than in the old days when you didn't build just a chess program but had to provide a GUI too. This is actually a partial regression to those days when the chess program was everything. I don't want my GUI to be just a low-level library that I might as well link into the chess program (and not need an external GUI).

My take on all of this is that GUI programming is really difficult and I don't want to spend my time as an engine developer writing half your GUI for you. It's appropriate when it's an interface designed to support many different games, where perhaps only one version of each game will ever be made, but it's not optimal for chess, where there are 4000+ engines out there.
Don

Re: UCI protocol issue

Post by Don »

hgm wrote:
Don wrote: I think you are being unreasonable here. You are saying that we are imagining things and "making things up" on the one hand
I don't think I said that in any case where it was not an indisputable fact. I constantly have to face all kinds of claims in this discussion that come from absolutely nowhere. To name a few:
*) 'bestmove 0000' means there are no legal moves
*) Sending 'bestmove 0000' violates the UCI specs
*) The ponder move given in bestmove-ponder must be the move the engine expects the opponent to play
*) It violates the UCI specs to 'go' the engine in a position where there are no legal moves
*) It is not possible for a UCI engine to refuse playing a move
*) UCI does not provide any tools to claim a draw
and then claiming that 0000 IS a draw claim even though there is no hint of that in the UCI specification. Who is making things up?

You are so enamored with your chain of reasoning on this that you declare it to be how it really is - wishful thinking on your part.
Well, let me state more precisely what I am trying to say, so you can point out what exactly I am imagining:

*) Null move means not moving anything
*) UCI specs say the engine can send a null move to the GUI
*) Not moving anything is allowed by the FIDE rules (e.g. resigning)
*) According to FIDE rules, you are not obliged to move in a 3-fold-repeated position, or after 50 reversible moves, and the alternative is not necessarily resign.
What is the point of giving a list of things that we agree on? Is it supposed to make it seem that you are not imagining other things? If you want to debate this, at least be intellectually honest and don't use misdirection tactics or sound like a politician.

The point you are imagining is not in your above list. It's your assertion that "0000" means resign or "draw claim" depending on context. It's not in the spec, but it must be that way because you said it was? Or is it because that makes sense to you?

The FIDE business just illustrates what I already said, that UCI is broken. And by the way YOU are the one that claims it is not broken, not me. We entered into this discussion as a result of YOU trying to justify the 0000 in terms that make some sort of sense if you assign your own meaning to them - a meaning that is not in the spec.

So FIDE rules say you have the option not to move, and UCI has a command not to move. Why is it so overwhelmingly important to everyone to insist that these could not possibly be the same thing, and that when the UCI engine refuses to move, it must mean something entirely different from choosing the option granted to it by FIDE rules not to move? Although they cannot come up with a consistent explanation for what it means instead...
FIDE has no pass rule. You cannot say, "I do not wish to move, it's your turn."

I don't think you realize that what you are doing is imposing your own interpretation of what it must mean. It's not even consistent either, because you say it means "claim draw" in one context and "resign" in another. In other words YOU are making up a new FIDE rule that you are not allowed to resign if the position is a repetition? So you either have to claim the draw, or play on, but only in that one specific case you are not allowed to resign? And you consider this the official meaning of the 0000 stuff but we are making things up?

As to inconsistencies:
*) If it is a bug when a GUI presents a mated position to the engine, what is the intended use of the null move mentioned in the UCI specs?
*) If 'bestmove 0000' would exclusively describe a position with no legal moves, why isn't it a problem that this does not uniquely describe whether it is checkmate (a loss) or stalemate (a draw)?
*) If the above is not a problem, because the GUI can determine that, why is it suddenly a problem when 'bestmove 0000' is used to encode the other cases where FIDE rules allow you to not move, that in some cases it is a loss (resign), and in other cases a draw (rep-claim)? Is this now suddenly too difficult for the GUI to determine whether a draw claim can be made?

I don't think it is unreasonable to demand a satisfactory answer to any of these questions.
The 0000 is completely illogical. First of all you MUST play a move by the FIDE rules to claim a draw. You don't just say, "my move is to claim a draw" but you say, "with this move I claim a draw by repetition."

Secondly, you don't ask a player what his move is in a position where there is no legal move. I argue that it's a bug in the GUI to do that and that there should not be a provision to respond to a bug in such an ill-defined way in the protocol either. I'm ok if the spec says you respond to a position with no legal moves with 0000 - but the UCI doesn't say that. Yes, I agree that is the most reasonable response and that is what Komodo does.
hgm

Re: UCI protocol issue

Post by hgm »

Don wrote: What point is there giving a list of things that we agree on?
Well, for the record: this was a list I emphatically do not agree with, where each item can be disproven by hard facts from the UCI specs...
FIDE has no pass rule. You cannot say, "I do not wish to move, it's your turn."


No, but it has a rule that you can say "I do not wish to move, so we are done with this game". This is also known as "resigning" (in most contexts).
I don't think you realize that what you are doing is imposing your own interpretation of what it must mean. It's not even consistent either, because you say it means "claim draw" in one context and "resign" in another.
So why is this suddenly a problem? Others are claiming 'bestmove 0000' means there are no legal moves, i.e. checkmate in one context, stalemate in another. (I am not sure what is your position in this.) And doesn't 'bestmove e2e4' mean move e2 to e4 in one context, and forfeit by illegal move in another? (Or have you abandoned the idea that a GUI should do legality checking, and always accept the move?) Doesn't the meaning of a move always depend on the context?

But I still think you don't address the real point. Which is this:

The UCI specs very explicitly state that the engine can send a null move to the GUI. So there must be a situation where the engine should do this. I am merely making a best effort to deduce what this situation could be, in all likelihood. Which is a tad different from just dreaming up things out of nothing, as others so frequently do here.

You are very vocal in criticizing me, but what is your 'best effort' to assign a use to the 0000 the specs mention? That the author of the specs had no idea what he was talking about, and just felt like adding some 'glitches' to the specs to confuse people? That doesn't sound very credible to me...
In other words YOU are making up a new FIDE rule that you are not allowed to resign if the position is a repetition? So you either have to claim the draw, or play on, but only in that one specific case you are not allowed to resign?
This is actually the first argument you give that holds any water. But only very little. Who would want to resign if you can claim a draw?

I am not making up any new FIDE rules, by the way. The worst you can accuse me of is that, in the interpretation of the UCI specs I consider most likely, the protocol is still not perfect: it cannot handle the contrived situation where you want to resign in a position where you can claim a draw. OK, so resign one move earlier, then! This is not nearly as defective an implementation of FIDE rules as your version, where the draw is automatic. Because then you also cannot resign in that same drawn position. And you cannot play on there. And you cannot resign one move earlier... That UCI sucks is of course no news, but why would you want it to suck that much?

And you consider this the official meaning of the 0000 stuff but we are making things up?
The 0000 is completely illogical. First of all you MUST play a move by the FIDE rules to claim a draw. You don't just say, "my move is to claim a draw" but you say, "with this move I claim a draw by repetition."
No, no, no! You only have to say that if the repetition occurs AFTER the move. If the opponent plays a move that brings on a repetition, and does not claim by himself, then you can indeed claim without moving.
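The two situations hgm distinguishes can be told apart from the position history alone. A sketch, assuming position keys encode side to move, castling and en-passant rights, so that equal keys mean "the same position" in the FIDE sense; the function name and return strings are hypothetical.

```python
from typing import Optional, Sequence


def repetition_claim(keys: Sequence[str], key_after_move: Optional[str] = None) -> Optional[str]:
    """Classify a threefold-repetition claim by the player to move.

    keys: positions so far, current position last.
    key_after_move: position after the move the claimant intends to play, if any.
    """
    history = list(keys)
    if history and history.count(history[-1]) >= 3:
        # The opponent's move completed the repetition: claim without moving.
        return "claim without moving"
    if key_after_move is not None and history.count(key_after_move) >= 2:
        # The intended move completes it: the claim must come with that move.
        return "claim with this move"
    return None
```

Only the second case requires the claim to be tied to a move, which is the part hgm concedes both protocols handle poorly.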

I admit that there still is a difficulty in claiming repetitions after the move. Even WB protocol sucks in that respect, and must make use of a kludge. (Making it impossible to OFFER a draw in a position where you can claim one. Believe it or not, I actually received a complaint about an engine that intended to claim a draw under WinBoard, but instead was given a draw by agreement, because the opponent had a draw offer pending. He did not want to accept that offer, but instead play a move to claim a draw...)
Secondly, you don't ask a player what his move is in a position where there is no legal move.
You ask him to concede that he has lost... When I play Chess I always make the mating move on the board and press the clock (although I am aware that the latter is not strictly needed, but just so that there is no discussion possible afterwards if I made the move within time or not). I do not stand up from the board, go to the TD, and whisper in his ear "I can mate my opponent, please write 1-0 on the match form". I don't know how you do it...
I argue that it's a bug in the GUI to do that and that there should not be a provision to respond to a bug in such an ill-defined way in the protocol either. I'm ok if the spec says you respond to a position with no legal moves with 0000 - but the UCI doesn't say that.
OK, great. Because we actually do agree on that. There is no point in sending the engine such a position. (But beware, the engine could bring it upon itself to request that move as a ponder move! I would also consider it extremely bad form when the GUI would overrule the ponder move specified by the engine, and substitute its own.)

But given your stance on this, that UCI forbids sending such positions to the engine for the purpose of actually moving (as opposed to pondering)... What is the intended function of the 0000 the engine could send to the GUI, according to the specs? In what situation that is not forbidden by the protocol must the engine send it?

You don't consider it strange that the protocol specs make an explicit provision for what the engine must do in a situation that according to the same specs would be (implicitly) forbidden to create? That is what I would call 'inconsistent'. Consistent is to either assume that the GUI should send such positions to the engine (to inform it of the way the game ended, so the engine can use it as a trigger for its learn code, say), and the engine should respond with 'bestmove 0000' (which I agree the specs do not say), or to assume that it is indeed a GUI bug, and the 0000 must be intended for something else (but what?).
hgm

Re: UCI protocol issue

Post by hgm »

Don wrote: I'm not being critical of this kind of design - I'm just pointing out that it's not appropriate for an auto-testing interface, or at least not nearly as simple. Imagine testing two Foo-playing programs against each other where one of them claims a win even though there is no win. Such a protocol would work, of course, if you insist that all decisions are passed through both programs. So if white claims a draw, black has to agree; if black claims a win, white must agree. If one side plays an illegal move, the other side must object, and of course in any case where there is disagreement one side is going to get screwed unless it's resolved by a third party. In chess interfaces the GUI is the 3rd-party authority in all matters. It will know if there is a problem and can forfeit the non-conforming side.
This is completely true, and we (Daniel and I) are aware of it. The design we would really prefer is one that uses a trusted engine as referee (it would be in force mode, never searching but only checking the legality of the moves fed to it), while untrusted engines play against each other. But then you are already talking about a later stage of the evolution of software for that game. It always starts with having only a single engine. And you will always have a problem making the first implementation of the move generator flawless. Testing an engine against itself doesn't make you catch move-generator bugs. Perft could do that. Hard to make GUIs do a perft. Much easier to make an engine do a perft... (But still not much use if no perft numbers are known from another source!) If the one writing the first engine also had to adapt the legality checking of the GUI he uses, he is likely to use the same code and make the same errors in both. And even if they are independent, there is no guarantee that in case of a disagreement the GUI is right. Life is always hard if you start with nothing. (That is why I like it!)
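Perft simply counts the leaf nodes of the legal-move tree to a fixed depth, which is why it exposes move-generator bugs once reference counts from an independent source are available. A generic sketch; the toy take-1-or-2-stones game stands in for a real chess move generator, which is far too long to show here.

```python
from typing import Callable, Iterable, List, TypeVar

P = TypeVar("P")  # position type
M = TypeVar("M")  # move type


def perft(pos: P, depth: int,
          gen: Callable[[P], Iterable[M]],
          make: Callable[[P, M], P]) -> int:
    """Count the leaf nodes of the legal-move tree to the given depth."""
    if depth == 0:
        return 1
    return sum(perft(make(pos, m), depth - 1, gen, make) for m in gen(pos))


# Toy game: a pile of n stones; each move removes 1 or 2 of them.
def toy_moves(n: int) -> List[int]:
    return [k for k in (1, 2) if k <= n]


def toy_make(n: int, k: int) -> int:
    return n - k


print(perft(4, 2, toy_moves, toy_make))  # 4
```

For chess, `gen` and `make` would be the engine's own legal-move generator and make-move routine, and the counts compared against published perft tables.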
So I think such a protocol would work fine as long as this agreement protocol was also built into the testing procedure assuming the GUI supports automated testing between programs. When there is a disagreement however you cannot just let it pass, you have to report the conflict and then determine which program is buggy offline. An alternative is to have a 3rd trusted program resolve disputes.
In practice the problem is not that bad. When playing two untrusted engines with an ignorant GUI, you will get some disagreements. They are usually well indicated in the PGN file (e.g. "forfeit by illegal move"). You just select the games that have such a result comment, and look at a few by hand to see who was right, and what misconceptions the side that is wrong seems to have.
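That manual triage step can be partly automated. A sketch that splits a PGN file into games and keeps those whose final comment (just before the result) mentions an illegal move; the exact comment wording differs between GUIs, so the default keyword is an assumption.

```python
import re
from typing import List

# Comment directly before the game result, e.g. "{... illegal move ...} 0-1".
RESULT_COMMENT = re.compile(r"\{([^}]*)\}\s*(1-0|0-1|1/2-1/2|\*)\s*$")


def split_games(pgn_text: str) -> List[str]:
    """Split a multi-game PGN file at each [Event ...] tag."""
    parts = re.split(r"(?m)^(?=\[Event )", pgn_text)
    return [p.strip() for p in parts if p.strip()]


def disputed_games(pgn_text: str, keyword: str = "illegal") -> List[str]:
    """Return the games whose result comment mentions the keyword."""
    hits = []
    for game in split_games(pgn_text):
        m = RESULT_COMMENT.search(game)
        if m and keyword.lower() in m.group(1).lower():
            hits.append(game)
    return hits
```

The handful of games returned are then the ones worth replaying by hand to see which side was wrong.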
For a single playing program just using the GUI as a canvas, it's not a problem, as the two programs together can be considered a single system, and it's no different than in the old days when you didn't build just a chess program but had to provide a GUI too. This is actually a partial regression to those days when the chess program was everything. I don't want my GUI to be just a low-level library that I might as well link into the chess program (and not need an external GUI).

My take on all of this is that GUI programming is really difficult and I don't want to spend my time as an engine developer writing half your GUI for you. It's appropriate when it's an interface designed to support many different games, where perhaps only one version of each game will ever be made, but it's not optimal for chess, where there are 4000+ engines out there.
The approach with a referee engine would also work very well when so many engines are available. I think it is more a matter of the number of independent implementations than of whether these implementations are GUIs or engines. I don't expect it would be any problem in practice to test an untrusted engine against a trusted one in a two-party ignorant GUI. You get a number of disagreements, and they are all the untrusted engine's fault. Unless your engines are extremely crappy, the number of disagreements should be rather small, and it should be very easy to be the 3rd-party referee yourself. Eventually you would have to look at the disagreements anyway, because an automated 3rd-party referee won't fix the engine bugs that cause them for you!

There is no reason why a GUI would be intrinsically more trustworthy than an engine. Some engines have many more users than some GUIs... The only requirement is that the GUI would flag all disagreements. Which is a completely game-rule-independent task.
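Flagging needs no knowledge of the rules: the GUI only has to notice when an engine rejects the move it was sent, and record that instead of silently resolving it. A sketch using the WinBoard protocol's "Illegal move" reply; what the GUI then does with a flagged game (e.g. ending it and marking the PGN) is left open, and the function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# WinBoard-protocol engines report rejected moves as
# "Illegal move: MOVE" or "Illegal move (REASON): MOVE".
ILLEGAL_RE = re.compile(r"^Illegal move(?: \(([^)]*)\))?: (\S+)")


def check_for_dispute(engine_name: str, line: str) -> Optional[Tuple[str, str, str]]:
    """Return (engine, disputed move, reason) if this output line rejects a move."""
    m = ILLEGAL_RE.match(line.strip())
    if m is None:
        return None
    return engine_name, m.group(2), m.group(1) or ""
```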