Two more suggestions:
1) If an engine is unresponsive, c-chess-cli exits without killing the engine processes which keep on lingering as zombies. Child process kill could be added upon exit.
2) For tournaments, it would be useful to have some final output of the tournament table. Right now, this requires loading the PGN into some other software, or not using the tournament mode and running the encounters separately. Optionally giving a result file name and outputting that as CSV table (just a text file) would be nice. Using a semicolon as separator could work around locales that use the comma as decimal point.
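As a rough illustration of suggestion 1), here is a minimal Python sketch of a parent that remembers its children and kills them when it exits. This is not c-chess-cli's code (which is C); the names `children`, `spawn`, `kill_children` and `install_cleanup` are hypothetical and exist only for this example.

```python
import atexit
import signal
import subprocess
import sys

children = []  # hypothetical registry of spawned engine processes


def spawn(argv):
    """Start a child with piped stdin/stdout and remember it for cleanup."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    children.append(proc)
    return proc


def kill_children():
    """Kill every child that is still running, then reap it."""
    for proc in children:
        if proc.poll() is None:      # still alive
            proc.kill()              # SIGKILL on POSIX
        proc.wait()                  # reap it so no defunct entry remains
        for stream in (proc.stdin, proc.stdout):
            if stream is not None:
                stream.close()
    children.clear()


def install_cleanup():
    """Run kill_children() on normal exit, Ctrl+C and SIGTERM.

    SIGKILL cannot be caught, so that case is not covered by this sketch.
    """
    atexit.register(kill_children)
    # atexit handlers are skipped when an unhandled signal kills the process,
    # so SIGTERM is turned into an ordinary interpreter exit.
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(1))
```

Calling `install_cleanup()` once at startup and creating every engine through `spawn()` would be enough for this sketch. It still cannot cover the parent itself being killed with SIGKILL.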
c-chess-cli
- Full name: Rasmus Althoff
Re: c-chess-cli
Rasmus Althoff
https://www.ct800.net
- Full name: lucasart
Re: c-chess-cli
Ras wrote: ↑Mon Nov 02, 2020 4:22 pm
Two more suggestions:
1) If an engine is unresponsive, c-chess-cli exits without killing the engine processes which keep on lingering as zombies. Child process kill could be added upon exit.
2) For tournaments, it would be useful to have some final output of the tournament table. Right now, this requires loading the PGN into some other software, or not using the tournament mode and running the encounters separately. Optionally giving a result file name and outputting that as CSV table (just a text file) would be nice. Using a semicolon as separator could work around locales that use the comma as decimal point.

1) I was hoping to handle this portably and simply by relying on EOF, which is the idiomatic way with Unix pipes. When the engine gets an EOF reading from stdin, it should exit (or crash if the programmer was not careful, which amounts to the same). And vice versa (child dies, parent can't read from broken pipe, exits).
But you're right. A hanging engine could be stuck forever without ever doing a read from stdin. The problem is I need a solution that catches every case, not just this one. For example a Ctrl+C, and who knows what else the operating system could throw at us.
2) Tournament table… Not sure about this. I don't want to use a broken Elo model, and I don't want to reimplement BayesElo either. The only thing I could print are pair stats, just a WLD triplet per pair. But there are N(N-1)/2 pairs in a RR!
I like the CSV solution. Is there a format I can use that other tools use? E.g. doesn't Ordo have such an input format?
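To make the "WLD triplet per pair" idea concrete, here is a small Python sketch that tallies results per pair and writes them as semicolon-separated CSV. The column names and the functions `pair_table` and `write_csv` are made up for illustration. This is not a format used by Ordo, BayesElo or c-chess-cli.

```python
import csv
import sys
from collections import defaultdict


def pair_table(results):
    """Tally (white, black, outcome) tuples into per-pair W/L/D counts.

    Returns {(a, b): [wins_a, losses_a, draws]} with a sorted before b.
    Outcomes other than "1-0", "0-1" and "1/2-1/2" (e.g. "*") are ignored.
    """
    table = defaultdict(lambda: [0, 0, 0])
    for white, black, outcome in results:
        a, b = sorted((white, black))
        if outcome == "1/2-1/2":
            table[(a, b)][2] += 1
        elif outcome in ("1-0", "0-1"):
            winner = white if outcome == "1-0" else black
            table[(a, b)][0 if winner == a else 1] += 1
    return table


def write_csv(table, out=sys.stdout):
    """Write one row per pair, using ';' so a decimal comma cannot clash."""
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerow(["engine_a", "engine_b", "wins_a", "losses_a", "draws"])
    for (a, b), (wins, losses, draws) in sorted(table.items()):
        writer.writerow([a, b, wins, losses, draws])
```

A round robin with N engines yields at most N(N-1)/2 rows, one per pair, so the output grows quadratically but stays a flat table that a spreadsheet can open.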
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
- Full name: lucasart
Re: c-chess-cli
lucasart wrote: ↑Tue Nov 03, 2020 2:01 am
1) I was hoping to handle this portably and simply by relying on EOF, which is the idiomatic way with Unix pipes. When the engine gets an EOF reading from stdin, it should exit (or crash if the programmer was not careful, which amounts to the same). And vice versa (child dies, parent can't read from broken pipe, exits).

Actually, I think 1) is purely academic. When the parent dies, both stdin and stdout in the child process are broken, which will result in termination. If an engine doesn't terminate in such conditions, the engine is at fault.
Do you have a real example where this zombie child scenario happens?
- Full name: Andrew Grant
Re: c-chess-cli
Agreed fully. One of the things I test for each engine I add to OpenBench is whether or not it respects the closure of stdin.
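For reference, "respecting the closure of stdin" amounts to something like the following input loop. This is a minimal Python sketch, not taken from any of the engines mentioned; `uci_loop` and its return values are hypothetical names, and only `isready` and `quit` are handled.

```python
import sys


def uci_loop(inp=sys.stdin, out=sys.stdout):
    """Process commands until 'quit' or EOF and return why the loop ended."""
    while True:
        line = inp.readline()
        if line == "":               # readline() returns "" only at EOF
            return "eof"             # stdin was closed: stop instead of spinning
        cmd = line.strip()
        if cmd == "quit":
            return "quit"
        if cmd == "isready":
            out.write("readyok\n")
            out.flush()
```

The important part is the EOF check: an engine that ignores it either loops forever on empty reads or never notices that its GUI is gone.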
#WeAreAllDraude #JusticeForDraude #RememberDraude #LeptirBigUltra
"Those who can't do, clone instead" - Eduard ( A real life friend, not this forum's Eduard )
- Full name: Rasmus Althoff
Re: c-chess-cli
Ordo has only PGN as input and uses CSV as output. That would be the other solution if printing tournament stats (just the table without Elo) didn't make sense for c-chess-cli.
Yes, that's how I spotted it. Take Raven 1.1, modified in chess.c line 62 to check the return value of fgets() and exit if it's 0 (i.e. NULL), and match that against Zevra 2.1.2 with 8 threads (I have a 4C/8T CPU). I can't reproduce it with fewer workers because Zevra doesn't become unresponsive then.
Zevra becomes unresponsive after about 10-20 games, and while the Zevra processes are killed (most of the time, but not always), the Raven ones linger. The lingering engines are sleeping in wait channel "pipe_wait" according to my system monitor, with FD 0 and 1 shown as open files of type pipe.
Using my engine instead of Raven has a similar effect, but I'm using read() directly on stdin (with error checking). The hanging processes of my engine are sleeping in waiting channel futex_wait_queue_me though.
It looks like the pipes aren't closed. I'm on kernel 5.4.0, but have also tried 5.8.0 - same results.
However, matching Raven and my engine works. Matching Demolito against Zevra works without Zevra becoming unresponsive. Pretty strange.
Killing c-chess-cli with CTRL-C before Zevra is unresponsive makes all processes exit as expected.
AndrewGrant wrote: ↑Tue Nov 03, 2020 7:33 am
One of the things I test for each engine I add to OpenBench is whether or not it respects the closure of stdin.

What's your test case for that (Linux)?
- Full name: Rasmus Althoff
Re: c-chess-cli
Could the issue be a race condition between pipe() and fork()?
https://stackoverflow.com/questions/380 ... rom-thread
"If a thread switch occurs and fork is called between the pipe and fork system calls, the pipe file descriptors are duplicated, causing the write/read ends to be open multiple times."
Even when matching Raven against my engine, where things work with 8 game threads in parallel, there are e.g. 31 file descriptors open in each engine process, 28 of them pipes. There shouldn't be that many pipes open. FD 0 and 1 are pipes (stdin/stdout), 3 FDs are files (/dev/pts/0, the EPD and the PGN file; yes, open in the engine processes), and then the 26 FDs from 5 to 30 are pipes. Some engine processes have fewer pipes open, only up to FD 8, others more, up to FD 36.
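The leak itself is easy to reproduce in isolation. The sketch below is Python, not c-chess-cli's code; `PROBE`, `child_view_of` and `demo` are hypothetical names. It marks a pipe end as inheritable, which matches what plain pipe() gives you in C (no FD_CLOEXEC), and then checks whether a freshly spawned process sees that descriptor. On Linux, pipe2() with O_CLOEXEC is one way to avoid the inherited copy, though whether that is the right fix here is an assumption.

```python
import os
import subprocess
import sys

# Child-side probe: report whether the given descriptor number is an open pipe.
PROBE = """
import os, stat, sys
try:
    mode = os.fstat(int(sys.argv[1])).st_mode
    print("pipe" if stat.S_ISFIFO(mode) else "other")
except OSError:
    print("closed")
"""


def child_view_of(fd, close_fds):
    """Spawn a fresh process and return how it sees descriptor number fd."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE, str(fd)],
        stdout=subprocess.PIPE, text=True, check=True, close_fds=close_fds,
    )
    return result.stdout.strip()


def demo():
    r, w = os.pipe()                 # stands in for another engine's pipe
    os.set_inheritable(w, True)      # like plain pipe() in C: no FD_CLOEXEC
    try:
        leaked = child_view_of(w, close_fds=False)  # child inherits the copy
        clean = child_view_of(w, close_fds=True)    # child starts without it
    finally:
        os.close(r)
        os.close(w)
    return leaked, clean
```

On a typical Linux system `demo()` should report the descriptor as a pipe in the first child and as not a pipe in the second. The first case is the situation described above: an unrelated engine holding another engine's pipe end open.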
- Full name: Andrew Grant
Re: c-chess-cli
Ras wrote: ↑Tue Nov 03, 2020 10:20 am
What's your test case for that (Linux)?

Well, in practice it's
1) pkilling cutechess from the command line while it's running OpenBench; and
2) Stopping a test on the OpenBench webpage, and seeing how a worker running said test reacts.
Nothing fancy.
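A more direct variant of the same check can be scripted: start the engine, close its stdin, and see whether it exits within a timeout. This Python sketch is not OpenBench's test, just an assumption about how such a check could look; `exits_on_stdin_close` is a hypothetical helper.

```python
import subprocess
import sys


def exits_on_stdin_close(argv, timeout=5.0):
    """Return True if the process exits by itself once its stdin is closed."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                            stdout=subprocess.DEVNULL)
    proc.stdin.close()               # the child now reads EOF on stdin
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()                  # do not leave the offender running
        proc.wait()
        return False
```

Usage would be e.g. `exits_on_stdin_close(["./engine"])`, where the path is a placeholder. Note that this only tests the engine in isolation. It cannot detect the case where some other process still holds a copy of the write end.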
- Full name: Rasmus Althoff
Re: c-chess-cli
AndrewGrant wrote: ↑Tue Nov 03, 2020 6:24 pm
1) pkilling cutechess from the command line while it's running OpenBench

If I do that with c-chess-cli, my engine exits correctly if there is only one concurrent game going on. Otherwise, the duplicate pipes are still open in the spawned engine processes, so reading stdin just waits forever.
- Full name: Andrew Grant
Re: c-chess-cli
Ras wrote: ↑Tue Nov 03, 2020 8:18 pm
If I do that with c-chess-cli, my engine exits correctly if there is only one concurrent game going on. Otherwise, the duplicate pipes are still open in the spawned engine processes, so reading stdin just waits forever.

I'm not sure why having multiple concurrent games changes things?
- Full name: Rasmus Althoff
Re: c-chess-cli
AndrewGrant wrote: ↑Tue Nov 03, 2020 8:36 pm
I'm not sure why having multiple concurrency changes things?

Because reading stdin only returns EOF (or an error) once the pipe is closed. That, however, requires every copy of the other end to be closed, because it's reference counted. Since the pipes are also duplicated in the spawned engine processes, killing c-chess-cli doesn't close all the other pipe ends, so reading stdin just blocks with no input. Basically, the zombies keep each other alive. With only one game, i.e. two engines, the second engine dupes the first one's stdin/stdout but has no one duping its own stdin/stdout. So the second engine exits, and then the first one as well. At least, that's what I think happens.
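The reference counting can be shown without any engines or even a second process. In this Python sketch (hypothetical names `read_state` and `demo`), `os.dup()` stands in for the copy of the write end that leaked into another engine. The read end is made non-blocking only so that the demo cannot hang.

```python
import os


def read_state(rfd):
    """Classify what a read on the pipe's read end reports right now."""
    try:
        data = os.read(rfd, 1)
    except BlockingIOError:
        return "would block"         # empty pipe, but some writer still exists
    return "eof" if data == b"" else "data"


def demo():
    r, w = os.pipe()
    os.set_blocking(r, False)        # a blocking read would wait forever here
    w_dup = os.dup(w)                # stands in for a leaked copy elsewhere
    states = []
    os.close(w)                      # the "parent" closes its write end
    states.append(read_state(r))     # no EOF yet: the duplicate is still open
    os.close(w_dup)                  # the last copy of the write end goes away
    states.append(read_state(r))     # only now does the reader see EOF
    os.close(r)
    return states
```

`demo()` returns `["would block", "eof"]`. With a blocking read, as in an engine waiting on stdin, the first state corresponds to a read that never returns while the leaked copy exists.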