Winboard/Xboard patch request

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

jhaglund
Posts: 173
Joined: Sun May 11, 2008 7:43 am

Re: Winboard/Xboard patch request

Post by jhaglund »

I have now uploaded a version that prints the raw error number together with the message (same link). I could not test it because I could not think of an easy way togenerate errors. Could you try that for me? It should still give the same problem, but I would want to know what exactly that problem is.

The solution I am thinking of now is something like you propsed, except in a loop, so that it tries as often as needed. I don't want it to miss saving a file ever. But I don'twant it to get stuck in an infinite loop either when there is a genuine reason why the save does not work (like an invalid path name). So I want to loop back only when the error number for the failure that is given is equal to this mysterious case.
I uploaded a version now that does upto 2 retries on a fail with errno == 13. This should alleviate the problem, although there is no guarantee it will always work. I guess it would be better to "stat" the file or diretory, to check what the permissions are, and if they are such that the operation inprinciple should suceed, keep trying as long as needed.
Testing shows two errors:

---------------------------------
Can't open "C:\chess.pgn":
The data is invalid.

[13]
---------------------------------

The other is not a pop_up, but rather a fail-soft exit, and it's just like it exits the /mg loop and closes the GUI as if it were done. I cannot tell if it saves the game before these exits occur or pop_ups happen currently.

Does the GUI exit after 2 failed attempts? Does the attempt counter reset after each game played or each game successfully saved?
---------------------------------

What about dumping the pgn to an errno.pgn file so the game isn't lost?
What about storing all the games in memory, and then dumping them all when /mg is done to a file rather individually?
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Winboard/Xboard patch request

Post by hgm »

Well, this was how I made the codein that last version

Code: Select all

/* Save the current game to the given file */
int
SaveGameToFile(filename, append)
     char *filename;
     int append;
{
    FILE *f;
    char buf[MSG_SIZ];
    int result;

    if (strcmp(filename, "-") == 0) {
	return SaveGame(stdout, 0, NULL);
    } else {
	f = fopen(filename, append ? "a" : "w");
	if(!f && errno == 13) { DoSleep(1); f = fopen(filename, append ? "a" : "w"); }
	if(!f && errno == 13) { DoSleep(1); f = fopen(filename, append ? "a" : "w"); }
	if (f == NULL) {
	    snprintf(buf, sizeof(buf), _("Can't open \"%s\""), filename);
	    DisplayError(buf, errno);
	    return FALSE;
	} else {
	    safeStrCpy(buf, lastMsg, MSG_SIZ);
	    DisplayMessage(_("Waiting for access to save file"), "");
	    flock(fileno(f), LOCK_EX); // [HGM] lock: lock file while we are writing
	    DisplayMessage(_("Saving game"), "");
	    if(lseek(fileno(f), 0, SEEK_END) == -1) DisplayError("Bad Seek", errno);     // better safe than sorry...
	    result = SaveGame(f, 0, NULL);
	    DisplayMessage(buf, "");
	    return result;
	}
    }
}
Apparently two retries is not enough to makethe problem go away, or it was a bad idea to wait for a fixed time. I am not completely sure what DoSleep(1) means anyway whether that would make it sleep 1 sec or 1 msec. If it is 1 msec, it might just be too short, and retries this fast could be completely futile. I am pretty sure that in XBoard it would mean 1 sec, though.

I don't understand what could trigger an exit here. The code I added certainly should not do that. Was this the standard Vista system popup "winboard.exe is no longer working" you always get when a program is crashing,or was it WinBoard-generated?

Ialsodon't gethow there could be square brackets around the errno ("[13]")in the error popup; the code change I made there (in winboard.c) was:

Code: Select all

VOID
DisplayError(char *str, int error)
{
  char buf[MSG_SIZ*2], buf2[MSG_SIZ];
  int len;

  if (error == 0) {
    safeStrCpy(buf, str, sizeof(buf)/sizeof(buf[0]) );
  } else {
    len = FormatMessage(FORMAT_MESSAGE_FROM_SYSTEM,
			NULL, error, LANG_NEUTRAL,
			(LPSTR) buf2, MSG_SIZ, NULL);
    if (len > 0) {
      snprintf(buf, 2*MSG_SIZ, "%s:\n%s(%d)", str, buf2, error);
    } else {
      ErrorMap *em = errmap;
      while (em->err != 0 && em->err != error) em++;
      if (em->err != 0) {
	snprintf(buf, 2*MSG_SIZ, "%s:\n%s(%d)", str, em->msg, error);
      } else {
	snprintf(buf, 2*MSG_SIZ, "%s:\nError code %d", str, error);
      }
    }
  }
  
  ErrorPopUp(_("Error"), buf);
}
i.e. parentheses around it (in the snprintf after if(len > 0)).
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Winboard/Xboard patch request

Post by hgm »

I checked it out, and in WinBoard DoSleep(n) is #defined to Sleep(n), which definitely interprets n as msec. That makes it extra tricky to write something here that works both in WinBoard and XBoard, because for XBoard it is #defined as sleep(n), which uses n as seconds.

This causes a difference in the interpretation of the command-line options -delayBeforeQuit and -delayAfterQuit between WinBoard and XBoard. Which is very undesirable, but unfortunately also what people currently are used to.

Best solution would be to take out these #defines completely, and declare a wrapper routine DoSleep in the respective front-ends. In winboard.c this would then simply be Sleep, which is a msec sleep system call offered by Windows. In xboard.c there is a routine FrameDelay which implements a msec sleep, through either the usleep system call (if the configure says it is available) or through timer interrupts and signals (when usleep is not available). This would then redefine the meaning of the -delayBefore/AfterQuit options as msec in XBoard, but that seems acceptable.

We could then always count on DoSleep to be msec, and use it in a random-wait before retry.
jhaglund
Posts: 173
Joined: Sun May 11, 2008 7:43 am

Re: Winboard/Xboard patch request

Post by jhaglund »

I checked it out, and in WinBoard DoSleep(n) is #defined to Sleep(n), which definitely interprets n as msec.
Yes, this is correct. Sorry for a late reply.
We could then always count on DoSleep to be msec, and use it in a random-wait before retry.
I am trying your code with my random waits of about 1000 milliseconds.
Ialsodon't gethow there could be square brackets around the errno ("[13]")in the error popup;
That was just typed from my memory, I just remember something on each side of 13, I was just acknowledging there was a 13 included...
I don't understand what could trigger an exit here. The code I added certainly should not do that. Was this the standard Vista system popup "winboard.exe is no longer working" you always get when a program is crashing,or was it WinBoard-generated?
It doesn't display anything it just closes as if were done... but wasn't suppose to.

Thanks again though. It is moving in the right direction because I have completed some small 1k game tests (5x200) 600 games played fine, but I lost 45 games, 955/1000.

Testing code... more later...
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Winboard/Xboard patch request

Post by hgm »

Hmm, the closing is very weird.

Note that the code you originally posted here (I don't know if that is the code you actually used) is seriously buggy. The closing brace of the for loop is probably not where you intended it to be, so there never are any retries of fopen, just a lot of assignments to min and max. And what is worse, max is used in the stop-test of the for loop while it is still uninitialized. As a result the for loop could be not executedat all, with as a result that also range remains uninitialized, so that it might sleep for a million seconds effectively hanging WinBoard....

I agree,though, that retrying after a random wait still seems the appropriate solution in view of our current knowledge.

In order to know what a good choice for the range of waiting times is, I still would lile to have a better idea what itis waiting for. Is the file ust locked for a short transient period while the system is opening it, or does itreally wait until WinBoard has written the complete PGN on it, and closed it again? To shed some more light on it, could you run the program I gave in the 5th post of this thread (you probably haveto change the filename), and see whether, while you run it (before typing something that closes it), and it has the mentioned file open for append, WinBoard is capable to save a game on that same file, on your system?

If this is the case (so that we are apparently dealing with a short locked period during the act of opening itself), I would like to have an idea how longthis blackout period approximately lasts. Could you keep statistics on that? I.e. when you succeed opening the file only after retries, write the total time that was waited before the fopen succeeded to the filewith the game (e.g. as a phoney PGN tag, [Delay "3 tries, 23msec"])? Then we could later check the PGN to see what would be the best waiting time.
ernest
Posts: 2041
Joined: Wed Mar 08, 2006 8:30 pm

Re: Winboard/Xboard patch request

Post by ernest »

hgm wrote:Please let me know if this makes the problem go away.
http://hgm.nubati.net/WinBoard-TM.zip
Hi HGM,
I am a bit lost in your versions.
This is 4.5.2011.906
Previously I had 4.5.2011.506
Is there a link for an overview of all these versions?
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Winboard/Xboard patch request

Post by hgm »

The numbers are the dates I compiled them: 906 = September 6. That does not always uniquely identify them, because sometimes I fix a bug the same day, and compile & upload the fixed version. But if two versions with the same number are really the same usually follows from the file size. Sometimes I forget to update the number as well, although I almost always do it for versions with a new feature that I announce here.

I can't say it is an exact science... :oops:
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Winboard/Xboard patch request

Post by hgm »

My current version has this:

Code: Select all

int
SaveGameToFile(filename, append)
     char *filename;
     int append;
{
    FILE *f;
    char buf[MSG_SIZ];
    int result, i, t,tot=0;

    if (strcmp(filename, "-") == 0) {
	return SaveGame(stdout, 0, NULL);
    } else {
	for&#40;i=0; i<10; i++) &#123; // upto 10 tries
	     f = fopen&#40;filename, append ? "a" &#58; "w");
	     if&#40;f && i&#41; fprintf&#40;f, "&#91;Delay \"%d retries, %d msec\"&#93;\n",i,tot&#41;;
	     if&#40;f || errno != 13&#41; break;
	     DoSleep&#40;t = 5 + random&#40;)%11&#41;; // wait 5-15 msec
	     tot += t;
	&#125;
	if &#40;f == NULL&#41; &#123;
	    snprintf&#40;buf, sizeof&#40;buf&#41;, _("Can't open \"%s\""), filename&#41;;
	    DisplayError&#40;buf, errno&#41;;
	    return FALSE;
	&#125; else &#123;
	    safeStrCpy&#40;buf, lastMsg, MSG_SIZ&#41;;
	    DisplayMessage&#40;_("Waiting for access to save file"), "");
	    flock&#40;fileno&#40;f&#41;, LOCK_EX&#41;; // &#91;HGM&#93; lock&#58; lock file while we are writing
	    DisplayMessage&#40;_("Saving game"), "");
	    if&#40;lseek&#40;fileno&#40;f&#41;, 0, SEEK_END&#41; == -1&#41; DisplayError&#40;"Bad Seek", errno&#41;;     // better safe than sorry...
	    result = SaveGame&#40;f, 0, NULL&#41;;
	    DisplayMessage&#40;buf, "");
	    return result;
	&#125;
    &#125;
&#125;
jhaglund
Posts: 173
Joined: Sun May 11, 2008 7:43 am

Re: Winboard/Xboard patch request

Post by jhaglund »

Getting to this took too long. Your latest code looks good. Tests good.

The average appearance was somewhere in 3-8 msec, so I would think it would be "lag" = hdd_seek_time + multiple attempts of 10 should work given that the odds of 10 attempts would equal atleast one full hdd_seek_time ~10 msec, which it seems to do. Good.

Testing of 125 games per minute (1250 games in 10 minutes) worked without errors, since I had the chance to use a "newer" computer this weekend.

The only concern is on "slower" I/O systems and performance I get this error:

This is what the weird error ended up being... (pop_up exit was turned off)

Code: Select all

 snprintf&#40;buf, MSG_SIZ, _("Error&#58; %s chess program (%s&#41; exited unexpectedly"), _&#40;cps->which&#41;, cps->program&#41;; 
Could this be from the same kind of problem as opening a file to save a game as calling a fcp or scp after each game, causing an exit? Calling the same program 50+ different times for example.

Would adding the same type of solution to calling fcp & scp resolve this for computers less than quad core...

Thanks,

Joshua D. Haglund
User avatar
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Winboard/Xboard patch request

Post by hgm »

This error does not occur when you start up the engine, but when reading from the program through the pipe returns EOF (i.e., zero characters read). It really means what it says: that the engine must have died (or otherwise closed its stdout, but engines never do that). The engine was successfully started,or you would have seen "Starup failure on %s" instead.

So I don't think this can be solved from WinBoard. It is a problem in the engine. It could be that an engine trying to open its ini or log file runs into the same problem that WinBoard ran into when it tried to open the PGN, and croaks because of that.

If this satisfactory solves it for the PGN file, I guess I will have to implement a similar solution for the tourney file, and perhaps the -loadGameFile. Perhaps I should just replace calls to fopen anywhere by calls to Robust_fopen, and let that do the retries.