Any WinBoard bugs I missed?


hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Any WinBoard bugs I missed?

Post by hgm »

I don't do an explicit fork(), but I suppose that system() does that for me. I don't know if it does it in a smart way, to avoid the zombie problem. We cannot afford to wait(), because that would be blocking on a command that might take a long time to complete. In principle you could fork() & wait(), let the child fork() a grand-child and exit() immediately, to wake up the waiting parent, and have the child die gracefully. The grand-child is then orphaned, and inherited by process 1, which is performing a wait() all the time, and thus liberates the grand-child out of the zombie state.

Killing it would be no solution: zombie processes are already dead.
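
A minimal sketch of the fork() & wait() scheme described above, written in Python rather than XBoard's C. The name `spawn_detached` is hypothetical and nothing here is taken from the XBoard source; it only illustrates the idea on a Unix-like system.

```python
import os


def spawn_detached(argv):
    """Start argv without blocking on it and without leaving a zombie."""
    pid = os.fork()
    if pid == 0:
        # First child: fork the grand-child, then exit immediately.
        try:
            if os.fork() == 0:
                # Grand-child: becomes the actual command.
                try:
                    os.execvp(argv[0], argv)
                finally:
                    os._exit(127)  # only reached when exec failed
        finally:
            os._exit(0)
    # Parent: this wait is short, because the first child exits right
    # after forking. It reaps the first child; the orphaned grand-child
    # is reaped by whichever process adopts it (traditionally process 1).
    os.waitpid(pid, 0)
```

The parent only ever waits for the short-lived first child, so a long-running command (a sound player, say) does not hold it up.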
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Thread madness

Post by sje »

Use a separate thread which does two things:

1. It polls for a command from its calling thread to execute a system() call.
2. It polls its child processes with wait() so that there are no zombies.

This will help keep the process table clean.
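
One possible shape for such a thread, as a rough Python sketch. It makes two assumptions that go beyond the suggestion above: it launches commands with `subprocess.Popen` instead of system(), so the worker itself never blocks, and the class name `SoundRunner` and its `reaped` counter are made up for illustration.

```python
import queue
import subprocess
import threading


class SoundRunner:
    """Hypothetical helper: one worker thread launches commands and reaps them."""

    def __init__(self):
        self._commands = queue.Queue()
        self._running = []   # Popen objects of children not yet reaped
        self.reaped = 0      # number of children collected so far
        worker = threading.Thread(target=self._loop, daemon=True)
        worker.start()

    def submit(self, argv):
        """Called from the main thread; returns immediately."""
        self._commands.put(argv)

    def _loop(self):
        while True:
            try:
                # Task 1: poll for a command from the calling thread.
                argv = self._commands.get(timeout=0.1)
                self._running.append(subprocess.Popen(argv))
            except queue.Empty:
                pass
            except OSError:
                pass  # the command could not be started; nothing to reap
            # Task 2: poll the children; poll() reaps any that have exited.
            still_running = [p for p in self._running if p.poll() is None]
            self.reaped += len(self._running) - len(still_running)
            self._running = still_running
```

The main thread would call `submit([...])` with the command's argument list and carry on; finished children are collected within roughly a tenth of a second.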
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: Any WinBoard bugs I missed?

Post by mvk »

Perhaps this is helpful:
Wikipedia wrote:On modern UNIX-like systems (that comply with SUSv3 specification in this respect), the following special case applies: if the parent explicitly ignores SIGCHLD by setting its handler to SIG_IGN (rather than simply ignoring the signal by default) or has the SA_NOCLDWAIT flag set, all child exit status information will be discarded and no zombie processes will be left.
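
A small Python sketch of the SIG_IGN variant from the quote, assuming a system that follows this rule (Linux and OS X do, as far as I know). The function names are hypothetical.

```python
import os
import signal


def ignore_child_exits():
    """Explicitly ignore SIGCHLD so exited children are not kept as zombies."""
    # Must be called from the main thread of the program.
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)


def start_command(argv):
    """Fork and exec argv; the parent returns at once and never waits."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached when exec failed
    return pid
```

The price is that exit statuses are thrown away for the whole process, so any later code that relies on collecting a child's status (system() included) can no longer get it.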
[Account deleted]
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Any WinBoard bugs I missed?

Post by bob »

hgm wrote:I don't do an explicit fork(), but I suppose that system() does that for me. I don't know if it does it in a smart way, to avoid the zombie problem. We cannot afford to wait(), because that would be blocking on a command that might take a long time to complete. In principle you could fork() & wait(), let the child fork() a grand-child and exit() immediately, to wake up the waiting parent, and have the child die gracefully. The grand-child is then orphaned, and inherited by process 1, which is performing a wait() all the time, and thus liberates the grand-child out of the zombie state.

Killing it would be no solution: zombie processes are already dead.
system() should not do anything other than go off and execute the command. Last time I looked (and I use system() fairly frequently) it also blocks SIGCHLD and then waits for the process to terminate. That dumps the exit status and no zombie is left (if you can't catch SIGCHLD, obviously you don't care about the exit status). I just ran a trivial test on my MacBook and it works just as I had remembered. This should not be causing you any problems, at least for Unix/Linux/OS X/etc. No clue about Windows. Your scheme would work to eliminate the latency, but I don't think it is needed.

How are you avoiding blocking on system("aplay ..."), since it certainly blocks for me? Perhaps a "&" on the end? That might produce a zombie, perhaps.
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Any WinBoard bugs I missed?

Post by hgm »

Ah yes, you are right. XBoard does put a & at the end of the command.

But that means it follows exactly the strategy I outlined above: system() forks off a child, which does an execv on the shell; the shell forks off a grand-child doing an execv on aplay, and does not wait() for it because of the &. So the shell exits immediately (because it was requested through its arguments to do only one command), and orphans the aplay process.

There can be no zombies this way. If aplay processes remain, it must be because aplay is hanging. My guess is that at some point moves come in so fast that several aplays overlap (especially 'gong' is a long sound!) and try to access the sound hardware at the same time, and that it somehow cannot handle that and gets stuck.
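
The effect of the trailing & is easy to check with a few lines of Python. This is only an illustration, not XBoard's code; `run_in_background` is a hypothetical name, and the command string is passed to the shell unchanged.

```python
import os


def run_in_background(command):
    """Run a shell command via system() with a trailing '&'.

    system() waits only for the shell, and the shell exits as soon as it
    has started the command in the background, so this returns quickly.
    """
    return os.system(command + " &")
```

With a command such as `sleep 2`, the call comes back well before the two seconds are over, and the sleep process carries on as an orphan.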
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Any WinBoard bugs I missed?

Post by bob »

hgm wrote:I am preparing a new WinBoard/XBoard release (4.8.0), but I have been busy with other stuff for the past 6 months, and I am not sure if I have solved all issues that have come up since then.

If you noticed a bug these past months, could you remind me of it (even if you already reported it)?

I have noticed a quirk that is perhaps intentional, but it is confusing.

When I get output like this from Crafty:

Code:

         25->  18.72/3:36     3.62   31. Qxd5+ Bd6 32. Re5 Qb6 33. b4 Qc6 34.
                                     Nc5+ Kc7 35. Qf7+ Kb8 36. Nd7+ Qxd7 37.
                                     Qxd7 Bxe5 38. Qxh7 Rd8 39. g3 f4 40. gxf4
                                     Rh8 41. Qf7 Rxh6 42. fxe5 Rxh4 43. e6 Rg4+
                                     44. Kf1 Rxb4 45. e7 Rd1+ 46. Kg2
         26    39.42/3:36       ++   31. Qxd5+! (>+3.78)                  
         26    56.24/3:36       ++   31. Qxd5+! (>+3.94)                  
         26     1:43/3:36     3.93   31. Qxd5+ Bd6 32. Re5 g5 33. Bxg5 Rg6 34.
                                     Bd2 Rexe6 35. Bxb4 Rxe5 36. Qf7+ Kc6 37.
                                     Qxh7 Rge6 38. Bxd6 Rxd6 39. Qh8 Kd5 40.
                                     Qa8+ Ke6 41. Qg8+ Kf6 42. Qg5+ Ke6 43.
                                     h5 Kf7 44. h6 Rg6 45. Qh5 Re1+ 46. Kh2
                                     Kf6
xboard reorders that in the engine output window by descending score, which means the fail high on > 3.94 appears above the actual 3.93 PV I get back. The above is the order the stuff appears in my log, but in the engine output it looks like this:

Code:

         26    56.24/3:36       ++   31. Qxd5+! (>+3.94)                  
         26     1:43/3:36     3.93   31. Qxd5+ Bd6 32. Re5 g5 33. Bxg5 Rg6 34.
                                     Bd2 Rexe6 35. Bxb4 Rxe5 36. Qf7+ Kc6 37.
                                     Qxh7 Rge6 38. Bxd6 Rxd6 39. Qh8 Kd5 40.
                                     Qa8+ Ke6 41. Qg8+ Kf6 42. Qg5+ Ke6 43.
                                     h5 Kf7 44. h6 Rg6 45. Qh5 Re1+ 46. Kh2
                                     Kf6
         26    39.42/3:36       ++   31. Qxd5+! (>+3.78)             
         25->  18.72/3:36     3.62   31. Qxd5+ Bd6 32. Re5 Qb6 33. b4 Qc6 34.
                                     Nc5+ Kc7 35. Qf7+ Kb8 36. Nd7+ Qxd7 37.
                                     Qxd7 Bxe5 38. Qxh7 Rd8 39. g3 f4 40. gxf4
                                     Rh8 41. Qf7 Rxh6 42. fxe5 Rxh4 43. e6 Rg4+
                                     44. Kf1 Rxb4 45. e7 Rd1+ 46. Kg2
Which is to me a bit confusing...
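
The reordering can be reproduced with a plain sort, assuming the output window orders lines by depth and then by score, both descending, and that a fail-high line is keyed by its bound. That is a guess at the behaviour, not a reading of the XBoard source; the PV texts are shortened.

```python
# (depth, score used for sorting, shortened text), in the order they were logged.
engine_lines = [
    (25, 3.62, "25->  3.62  31. Qxd5+ Bd6 32. Re5 Qb6"),
    (26, 3.78, "26    ++    31. Qxd5+! (>+3.78)"),
    (26, 3.94, "26    ++    31. Qxd5+! (>+3.94)"),
    (26, 3.93, "26    3.93  31. Qxd5+ Bd6 32. Re5 g5"),
]


def display_order(lines):
    """Sort by depth, then score, highest first (assumed display rule)."""
    return sorted(lines, key=lambda line: (line[0], line[1]), reverse=True)


for _, _, text in display_order(engine_lines):
    print(text)
```

This puts the >+3.94 fail high on top and the 3.93 PV directly below it, as in the second listing.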
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Any WinBoard bugs I missed?

Post by hgm »

This sorting is indeed intentional, because it gives better display in case of multi-PV output. My original reasoning was that it would not hurt in single-PV mode, because you only get new PVs when the old one is overturned by one with a higher score.

I had not counted on engines printing fail-low or fail-high lines, though. Fail low is obviously a problem, because after the upper bound in the failed search it usually finds an exact score which is lower. Fail highs are usually not a problem, because after the lower limit it normally finds an exact score that is higher. In your example there must be a case of search instability, also producing a lower score.

I did address this problem in the upcoming 4.8 release, though, which should greatly improve the situation: it would now never swap the order of lines that start with the same move. Instead it would use the score of the latest such line to correct the score of the previous lines (of the same depth) with that move, assuming that these must have been failed searches.

Furthermore the engine can explicitly indicate a search failed by printing a ! or ? as last character of the PV.
xmas79
Posts: 286
Joined: Mon Jun 03, 2013 7:05 pm
Location: Italy

Re: Any WinBoard bugs I missed?

Post by xmas79 »

hgm wrote:I did address this problem in the upcoming 4.8 release, though, which should greatly improve the situation: it would now never swap the order of lines that start with the same move. Instead it would use the score of the latest such line to correct the score of the previous lines (of the same depth) with that move, assuming that these must have been failed searches.
Ah very good, I will definitely put ! or ? at the end, since this problem has been annoying me for about a year! I assume that the output (with ! or ? printed explicitly) will now exactly reflect the output of the engine over time. Right?
JoshPettus
Posts: 730
Joined: Fri Oct 19, 2012 2:23 am

Re: Any WinBoard bugs I missed?

Post by JoshPettus »

I'm not sure if it's just me on OS X, but if I open the Tournament window, click OK or Cancel and then open it again, XBoard quits with a segfault 11. It's the only window that does this as far as I can see.

Also, we discussed this before, but I don't remember if you intended to fix it for this release.

If you put xboard in ICS mode without the Terminal (e.g. via an XOP file), you get an "end of file from keyboard" error, and clicking OK quits xboard. Closing the error box leaves xboard somewhat unstable and liable to quit at random times when you click on the board (e.g. sometimes while you are observing).
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Any WinBoard bugs I missed?

Post by hgm »

xmas79 wrote:Ah very good, I will definitely put ! or ? at the end since this problem is annoying me for about one year! I assume that the output (by explictly printing ! or ?) will now exactly reflect the output of the engine over time. Right?
New lines bubble up from the bottom (of the lines of that depth) to the point where they encounter a higher score. But when that higher score is due to a fail (or belongs to a line with the same move), the new line passes it anyway. When it passes such a line, it adjusts that line's score (not in the display, but in the sort key), so that lines starting with other moves can now pass it too, even if there was no explicit fail indicator.

I think that should guarantee temporal ordering in single-PV mode, but that is not an obvious truth. It depends on two assumptions: that you cannot get fail-low lines after the engine found an exact score, and that a line is not printed when, say, after a printed fail high on a later move, search instability causes the re-search of that move to score lower than the PV move (or to fail low). E.g. the sequence

13 +0.12 d4 Nf6 ...
13 +0.20 e4!
13 +0.10 e4 e5

would sort the last line below the first one, because neither is a fail and the first one has the higher score. If you want the engine to give explicit output to recall the fail high, it should repeat the best line in this situation. That repeated line would bubble to above the same line printed earlier (because they have the same move, even though they also have the same score, 0.12), and then also to above the fail-high line that was on top.
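
A rough Python model of the bubbling rule as described in this post, applied to the example sequence. It is one reading of the description, not XBoard's actual code: the names `PVLine` and `add_line` are made up, and correcting the sort key only for lines with the same first move is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PVLine:
    depth: int
    score: float          # score as displayed
    move: str             # first move of the PV
    failed: bool = False  # PV ended in '!' or '?'
    key: float = 0.0      # sort key; may be corrected later


def add_line(display, new):
    """Insert new into display (ordered top-first) and let it bubble up."""
    new.key = new.score
    # Start at the bottom of the lines of the same depth.
    pos = len(display)
    while pos > 0 and display[pos - 1].depth < new.depth:
        pos -= 1
    while pos > 0:
        above = display[pos - 1]
        if above.depth > new.depth:
            break
        # A higher score stops the new line, unless the line above is a
        # fail or starts with the same move.
        if above.key > new.key and not above.failed and above.move != new.move:
            break
        if above.move == new.move:
            above.key = new.key  # correct the sort key of the earlier line
        pos -= 1
    display.insert(pos, new)


display = []
add_line(display, PVLine(13, 0.12, "d4"))
add_line(display, PVLine(13, 0.20, "e4", failed=True))
add_line(display, PVLine(13, 0.10, "e4"))
print([(l.move, l.score) for l in display])
# [('e4', 0.2), ('d4', 0.12), ('e4', 0.1)]
```

Adding a repeated `PVLine(13, 0.12, "d4")` afterwards moves it past all three lines to the top, which matches the advice to repeat the best line.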