
Help request for debugging ICS

Posted: Sat Jan 07, 2017 5:46 pm
by hgm
After many months of flawless running, the ICS on which I organize the monthly blitz tournaments suddenly became crash-prone in October, and now typically crashes within the hour. Nothing was changed in either the ICS or the OS of the VPS it runs on. My experience with Linux is quite limited, and I am not sure how to figure out what causes these crashes.

The ICS itself is friendly enough to catch segfaults, with the following handler:

Code: Select all

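#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
  give a decent backtrace on segv:
  run an external 'backtrace' helper on our own pid, write its
  output to chessd/segv_<pid>, then exit immediately
*/
static void segv_handler(int sig)
{
	char cmd[100];
	(void)sig;
	snprintf(cmd, sizeof(cmd), "/home/mics/bin/backtrace %d > /home/mics/chessd/segv_%d 2>&1",
		 (int)getpid(), (int)getpid());
	system(cmd);	/* note: snprintf() and system() are not async-signal-safe */
	_exit(1);
}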
Indeed, many segv_* files were created in the ~/chessd directory, at the times when we suffered crashes:

Code: Select all

mics@www:~/chessd$ ls -lt
total 140
-rw-r--r--  1 mics mics    40 2016-12-24 14:13 segv_9841
-rw-------  1 mics mics  8192 2016-12-23 21:30 config.tdb
-rw-------  1 mics mics   696 2016-12-23 21:30 news.tdb
-rw-r--r--  1 mics mics    40 2016-12-03 12:32 segv_15471
-rw-r--r--  1 mics mics    40 2016-11-19 23:09 segv_1092
drwxr-xr-x  2 mics mics 20480 2016-11-19 23:05 spool
-rw-------  1 mics mics 27360 2016-11-19 21:10 admin.log
-rw-r--r--  1 mics mics    40 2016-11-19 20:38 segv_628
-rw-r--r--  1 mics mics    40 2016-11-19 19:24 segv_15291
-rw-r--r--  1 mics mics    40 2016-11-19 18:27 segv_11847
-rw-r--r--  1 mics mics    40 2016-11-19 15:43 segv_5887
-rw-r--r--  1 mics mics    40 2016-11-16 23:07 segv_28079
-rw-r--r--  1 mics mics    40 2016-10-23 08:35 segv_19347
-rw-r--r--  1 mics mics    40 2015-07-16 03:33 segv_28315
-rw-r--r--  1 mics mics    40 2015-05-08 12:56 segv_27428
-rw-r--r--  1 mics mics    40 2015-05-08 10:55 segv_27290
-rw-r--r--  1 mics mics    40 2015-05-08 10:43 segv_26709
-rw-r--r--  1 mics mics    40 2015-05-06 17:33 segv_13933
-rw-r--r--  1 mics mics    40 2015-02-24 02:38 segv_28503
drwxr-xr-x  2 mics mics  4096 2014-04-29 12:24 bin
drwxr-xr-x 10 mics mics  4096 2014-04-29 12:18 data
drwxr-xr-x  5 mics mics  4096 2014-04-29 12:18 games
drwxr-xr-x  2 mics mics  4096 2014-04-29 12:18 lib
drwxr-xr-x 28 mics mics  4096 2014-04-29 12:18 players
(Note that during the apparently crash-free periods since October the ICS was simply not running. According to the change log, I fixed a buffer overrun on May 8, 2015.) Unfortunately the segv_* files do not contain any useful information; they all contain the single line

Code: Select all

sh: /home/mics/bin/backtrace: not found
Apparently I am missing a tool called 'backtrace' that ancient operators of the ICS still had, but which is no longer part of the project.

How could I figure out where the ICS is crashing? I suppose I could run it under gdb. Unfortunately, gdb is not installed on the VPS where the ICS is running, and "apt-get install" does not work on that machine, because the repositories for its OS (Ubuntu 10.04) no longer exist.

Are there other ways to get a working gdb on that system? Or other ways to get a stack trace? Would it be possible to force a core dump from the segfault handler, which could be taken to another machine for post-mortem debugging? If so, how?
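One way to get both a stack trace without gdb and a core dump is sketched below. It assumes Linux with glibc, whose <execinfo.h> provides backtrace() and backtrace_symbols_fd(); the names dump_backtrace, crash_handler and install_crash_handler are hypothetical and not part of the ICS. The handler writes the raw frames to stderr (which would have to be redirected to a file), then restores the default action and re-raises the signal, so the kernel can still write a core file if the core-size limit allows it.

```c
/* Sketch (assumes Linux + glibc): print a backtrace on a fatal signal,
   then re-raise it with the default action so a core file can be written. */
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Write the current call stack to fd and return the number of frames.
   backtrace_symbols_fd() writes straight to fd without calling malloc. */
static int dump_backtrace(int fd)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, n, fd);
    return n;
}

static void crash_handler(int sig)
{
    static const char msg[] = "*** fatal signal, backtrace follows:\n";
    ssize_t r = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)r;
    dump_backtrace(STDERR_FILENO);
    /* Restore the default action and re-raise. The signal stays blocked
       until the handler returns, then the default action terminates the
       process and dumps core (subject to RLIMIT_CORE). */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Install the handler for the usual crash signals; returns 0 on success.
   backtrace() is not formally async-signal-safe: calling it once here
   makes glibc load its unwinder up front, so the call inside the
   handler should not need to allocate. */
static int install_crash_handler(void)
{
    void *warmup[1];
    struct sigaction sa;

    backtrace(warmup, 1);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;
    if (sigaction(SIGBUS, &sa, NULL) != 0) return -1;
    if (sigaction(SIGFPE, &sa, NULL) != 0) return -1;
    return 0;
}
```

install_crash_handler() would be called early in main(). Function names only show up in the trace for exported symbols (linking with -rdynamic helps); for a non-PIE executable the printed addresses can also be mapped to source lines with addr2line -e on the binary.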

Re: Help request for debugging ICS

Posted: Sat Jan 07, 2017 6:14 pm
by tttony
You can install gdb by adding this PPA: https://code.launchpad.net/~nitrof22/+a ... ubuntu/ppa (I got it from here: http://askubuntu.com/questions/100296/i ... or-gdb-7-4)

Add these lines to /etc/apt/sources.list:

Code: Select all

deb http://ppa.launchpad.net/nitrof22/ppa/ubuntu lucid main 
deb-src http://ppa.launchpad.net/nitrof22/ppa/ubuntu lucid main 
Then install gdb:

Code: Select all

sudo apt-get install gdb

Re: Help request for debugging ICS

Posted: Sat Jan 07, 2017 6:44 pm
by jdart
1. Compile the code with -g. Consider not optimizing; if you do keep optimization flags on, add -fno-omit-frame-pointer.

2. You can start gdb and attach to the process once it is running. Just type "attach <pid>" on the gdb command line, where "<pid>" is the process id obtained from "ps". Attaching automatically halts the program at its current location.

3. Set a breakpoint on the segv handler: "break handler_file.cpp:line_no".

4. Type "continue" to resume program execution.

5. When the segv happens you will hit the breakpoint. Type "where" to see the stack trace. You can also examine variables, etc.
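Once gdb is installed, it can also stand in for the missing /home/mics/bin/backtrace helper: the segv handler can run gdb in batch mode against its own pid, so the trace lands in the segv_<pid> file without anyone watching an interactive session. This is a sketch assuming gdb is on the PATH and that the kernel lets a child process attach to its parent (Yama's ptrace_scope setting on newer kernels can forbid this); build_gdb_cmd is a hypothetical helper.

```c
/* Sketch: segv handler that asks gdb for a backtrace of the crashing
   process instead of calling the missing backtrace script.
   Assumes gdb is on the PATH and may ptrace-attach to this process. */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: build a gdb command that attaches to pid, prints
   a full backtrace of every thread, and detaches when gdb exits.
   Returns 0 on success, -1 if out is unsafe or the command does not fit. */
static int build_gdb_cmd(char *buf, size_t size, pid_t pid, const char *out)
{
    int n;
    /* refuse output paths the shell would misinterpret */
    if (out[0] == '\0' || strpbrk(out, " \t'\";&|<>") != NULL)
        return -1;
    n = snprintf(buf, size,
                 "gdb -batch -p %d -ex 'thread apply all bt full' > %s 2>&1",
                 (int)pid, out);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

static void segv_handler(int sig)
{
    char out[100], cmd[256];
    int rc;
    (void)sig;
    snprintf(out, sizeof out, "/home/mics/chessd/segv_%d", (int)getpid());
    /* system() is not async-signal-safe, but this mirrors the original
       handler; the process blocks here until gdb has finished. */
    if (build_gdb_cmd(cmd, sizeof cmd, getpid(), out) == 0) {
        rc = system(cmd);
        (void)rc;
    }
    _exit(1);
}
```

Installed with signal(SIGSEGV, segv_handler), this would take the place of the original handler. Because gdb attaches while the handler is running, the trace starts with the handler's own frames; the faulting function appears just below the "<signal handler called>" frame.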

For buffer overflow issues, also consider using a tool like valgrind, or the gcc6 flags -fsanitize=bounds and -fsanitize=address (these must be passed to both the compiler and the linker). These will detect issues at runtime. You can also submit the code to Coverity Scan (https://scan.coverity.com/), which will do a static analysis for you.

--Jon

Re: Help request for debugging ICS

Posted: Sat Jan 07, 2017 7:55 pm
by hgm
tttony wrote:You can install gdb by adding this PPA: https://code.launchpad.net/~nitrof22/+a ... ubuntu/ppa (I got it from here: http://askubuntu.com/questions/100296/i ... or-gdb-7-4)

Add these lines to /etc/apt/sources.list:

Code: Select all

deb http://ppa.launchpad.net/nitrof22/ppa/ubuntu lucid main
deb-src http://ppa.launchpad.net/nitrof22/ppa/ubuntu lucid main
Then install gdb:

Code: Select all

sudo apt-get install gdb
OK, after an extra "apt-get update" this worked, so I now have gdb on that machine. Thanks!

Fortunately, -g is already among the standard compile flags in the Makefile. I also removed the catching of SIGSEGV, and enabled unlimited core dumps.

Now it is just a matter of waiting for it to segfault again. You are all invited to log in and start games, in order to stress it!

Re: Help request for debugging ICS

Posted: Sat Jan 07, 2017 9:11 pm
by Sven
I get "connection closed". Is the ICS already online?

Re: Help request for debugging ICS

Posted: Sat Jan 07, 2017 9:16 pm
by Sven
Sven Schüle wrote:I get "connection closed". Is the ICS already online?
I could also rephrase: did it crash already?

Re: Help request for debugging ICS

Posted: Sat Jan 07, 2017 9:17 pm
by flok
If you need more detailed debug info, try -ggdb3 instead of -g; it also records macro definitions.

Re: Help request for debugging ICS

Posted: Sat Jan 07, 2017 10:42 pm
by hgm
Yes, it crashed already. Unfortunately I cannot find any core dump, even though I commented out the catching of SIGSEGV. I had hoped to do a post-mortem debug.

I have restarted it interactively under gdb. The problem will be keeping the ssh connection to the VPS, through which I operate gdb, open until the crash occurs, as it tends to time out when idle.

Re: Help request for debugging ICS

Posted: Sat Jan 07, 2017 10:46 pm
by Volker Annuss
Try compiling the ICS with -g and running it under valgrind with

Code: Select all

valgrind --log-file=ics.log <your ICS start command>
and you'll get your stack trace and more. See the valgrind manual for more options. Expect the ICS to run about 30 times slower than normal.

You can download valgrind from http://valgrind.org and install it with the triple jump

Code: Select all

./configure
make
sudo make install
No need to get it from the old Ubuntu repositories.

Re: Help request for debugging ICS

Posted: Sat Jan 07, 2017 10:56 pm
by Evert
hgm wrote:Yes, it crashed already. Unfortunately I cannot find any core dump, even though I commented out the catching of SIGSEGV. I had hoped to do a post-mortem debug.
Make sure core dumps are enabled (type "ulimit -c unlimited" in the shell that starts the ICS).
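Rather than relying on the shell's limit, the server could raise its own core-size limit at startup, as in the sketch below (enable_core_dumps and show_core_pattern are hypothetical names). It assumes Linux: an unprivileged process may raise its soft RLIMIT_CORE only up to the hard limit, and /proc/sys/kernel/core_pattern decides where the kernel puts the core. With the default pattern the file is named core and written to the process's working directory, which must be writable; a value starting with | means cores are piped to a helper program (such as apport on Ubuntu) instead.

```c
/* Sketch (assumes Linux): enable core dumps from inside the server,
   comparable to "ulimit -c unlimited", and report where cores go. */
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft core-size limit to the hard limit (RLIM_INFINITY if
   the hard limit permits). Returns 0 on success, -1 on error. */
static int enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Print the kernel's core_pattern: a leading '|' means cores are piped
   to a helper program rather than written to the working directory. */
static void show_core_pattern(FILE *out)
{
    char line[256];
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");
    if (f == NULL) {
        fputs("core_pattern: unavailable\n", out);
        return;
    }
    if (fgets(line, sizeof line, f) != NULL)
        fprintf(out, "core_pattern: %s", line);
    fclose(f);
}
```

Calling enable_core_dumps() at the top of main() and logging show_core_pattern(stderr) would show at a glance whether a crash can leave a core file. The core could then be examined on another machine with "gdb <binary> <corefile>", provided the identical -g binary is used.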