UCI2WB 4.0

Discussion of chess software programming and technical issues.

Moderators: bob, hgm, Harvey Williamson

Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
User avatar
Posts: 24020
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: UCI2WB 4.0

Post by hgm » Wed Jun 19, 2019 10:51 am

The last two lines should give a very strong clue. To prevent recursive calling of GameEnd I have it set a variable 'endingGame' when we enter it, to be cleared again when it returns. But before doing that it checks the variable, and immediately returns without doing anything if it is set. If ExitEvent() finds the variable set, it prints this message. So apparently GameEnds() somehow managed to return control to the event loop (keeping WinBoard responsive) without completing its task, and without clearing the 'endingGame' variable.

Can you also find what the debug file contains around the time that the engine submitted the last move? This should help in establishing which part of GameEnds() was still executed, as there are several other places in it where it writes something to the debug file. E.g.

Code: Select all

GameEnds(28, XBoard adjudication: repetition draw, 4)
72338 >first : result 1/2-1/2 {XBoard adjudication: repetition draw}
72339 >second: result 1/2-1/2 {XBoard adjudication: repetition draw}
72339 >first : quit
72339 >second: quit
results = '*'
Stockfish V\012Stockfish V\012Stockfish V\012Stockfish V\012
pick next game from '=': 1
Reserved, next=1, nr=0
cycle=0, round=0, pairing=0 curGame=2

Posts: 4189
Joined: Sun Aug 10, 2008 1:15 pm
Location: Philippines

Re: UCI2WB 4.0

Post by Ferdy » Wed Jun 19, 2019 11:54 am

Game 12, the game that ends in mate, where the last move was not executed on the board.
404343 <second: # engine said: info depth 12 seldepth 22 multipv 1 score mate 1 time 471 nodes 409355 nps 869118 pv h4d8
404343 <second: 12 100001 47 409355 h4d8
404343 <second: # engine said: info time 472 nodes 409355
404343 <second: # engine said: bestmove h4d8
404343 <second: move h4d8
404343 >first : usermove 404343 >first : h4d8
GameEnds(26, Xboard adjudication: Checkmate, 4)
GE(26, Xboard adjudication: Checkmate, 4) bare king k=8 color=0
404345 >first : result 1-0 {Xboard adjudication: Checkmate}
404345 >second: result 1-0 {Xboard adjudication: Checkmate}
404345 >first : quit
404345 >second: quit
results = '+--=+-=-+*******'
Deuterium_v2019.2.37.52\012Deuterium_v2019.1.36.50_x64_pop\012RubiChess-1.4\012amoeba_3.0_win64\012Bobcat 8.0 x64\012Wasp360-x64\012Vajolet2_2.7\012TogaII401_intelPGO_x64\012texel_1.07_x64_pop\012SmarThink_v198_x64_SSE3_standalone\012senpai_20-x64\012arasanx-64-popcnt\012baron344-x64\012Rodent III 0.277\012rofChade\012
pick next game from '+--=+-=-+**+****': 16

User avatar
Posts: 24020
Joined: Fri Mar 10, 2006 9:06 am
Location: Amsterdam
Full name: H G Muller

Re: UCI2WB 4.0

Post by hgm » Wed Jun 19, 2019 1:44 pm

That surely localizes the trouble, but I still cannot say I understand it. ReserveGame(), called from GameEnd() ends as follows:

Code: Select all

    if(appData.debugMode) fprintf(debugFP, "pick next game from '%s': %d\n", q, nextGame);
    if(nextGame <= appData.matchGames && resChar != ' ' && !abortMatch) { // reserve next game if tourney not yet done
	if(q[nextGame] == NULLCHAR) q[nextGame+1] = NULLCHAR; // append one char
	q[nextGame] = '*';
    fseek(tf, -(strlen(p)+4), SEEK_END);
    c = fgetc(tf);
    if(c != '"') // depending on DOS or Unix line endings we can be one off
	 fseek(tf, -(strlen(p)+2), SEEK_END);
    else fseek(tf, -(strlen(p)+3), SEEK_END);
    fprintf(tf, "%s\"\n", q); fclose(tf); // update, and flush by closing
    DisplayMessage(buf, "");
    free(p); appData.results = q;
    if(nextGame <= appData.matchGames && resChar != ' ' && !abortMatch &&
       (gameNr < 0 || nextGame / appData.defaultMatchGames != gameNr / appData.defaultMatchGames)) {
      int round = appData.defaultMatchGames * appData.tourneyType;
      if(gameNr < 0 || appData.tourneyType < 1 ||  // gauntlet engine can always stay loaded as first engine
	 appData.tourneyType > 1 && nextGame/round != gameNr/round) // in multi-gauntlet change only after round
	UnloadEngine(&first);  // next game belongs to other pairing;
	UnloadEngine(&second); // already unload the engines, so TwoMachinesEvent will load new ones.
    if(appData.debugMode) fprintf(debugFP, "Reserved, next=%d, nr=%d\n", nextGame, gameNr);

We see the printf at the start of that section, with correct values (it changed the result of game 12 from * to +, and picked game 16 to do next, which is just beyond the end of the -result string. Then it does some seeking and writing on the tourneyfile 'tf' (which I know to be succesfully opened), restores the message field (DisplayMessage) with the message it contained previously (which was saved in 'buf'), overwriting the "pick new game" which it displayed earlier (and indeed is not in the message field, which still contains Thinking Output). So it seems up to that point everything worked.

The final printf, "Reserved ..." never appears, though. (Or the DisplayMessage("pick next game") was even never executed; this is always tricky, because much stuff happens asynchronously, just putting a draw or expose event in the event queue, and then only executing that after the current code (still handling the consequences of the engine move) has returned. So what is in the display is not as reliable an indicator as what is in the debug file.) So it must somehow have gone astray in the "if(nextGame..." clause. Which I suppose should indeed be executed, as game 16 and game 12 will not have the same engines in a multi-gauntlet with 2 games per pairing.

So UnloadEngine should have been called. And this also contains a debug print (which never appeared!):

Code: Select all

UnloadEngine (ChessProgramState *cps)
	/* Kill off first chess program */
	if (cps->isr != NULL)
	cps->isr = NULL;

	if (cps->pr != NoProc) {
            DoSleep( appData.delayBeforeQuit );
	    SendToProgram("quit\n", cps);
	    DestroyChildProcess(cps->pr, 4*!cps->isUCI + cps->useSigterm);
	cps->pr = NoProc;
	if(appData.debugMode) fprintf(debugFP, "Unload %s\n", cps->which);
Now this code can potentially stall at the DoSleep (if -delayBeforeQuit has a strange value, but it should be configured to 0, in which case it is a no-op), which is just a platform-independent wrapper for the Win API Sleep() call. The DestroyChildProcess() should just close the pipes to the engine here (as these are UCI engines, and WB will not try to kill the adapter after some timeout). In fact it should do nothing here, as UCI2WB specifies feature reuse=0, which would already have caused GameEnd() to terminate the engines (and set their cps->pr field to NoProc and cps->isr to NULL) before it called ReserveGame(). So this routine should be a complete no-op, except for the debug printing, which does not appear!

The point is that even when it would stall in this code, or in its caller ReserveGame(), I don't see how it could ever remain responsive and handle the ExitEvent() later. WinBoard is not really a multi-threaded program; there is just one thread monitoring the event queue (blocking while that is empty), and executing the various event handlers (which might seem indpendent processes) one after the other. So if one event handler gets stuck, it will never return to monitoring the event queue for handling later events. So it is like it mysteriously managed to skip several printfs, and return to the caller of GameEnd() without clearing endingGame. This is of course in theory possible through corruption of the stack, which stores return addresses of the function, and if a number of those addresses gets popped of the stack, or an earlier return address copied to the top, a return from a function could skip several levels of callers, and return to the main program directly. This is at best extremely unlikely, and I just don't see how that could happen here at all.

Post Reply