Help request for debugging ICS

hgm
Posts: 27808
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Help request for debugging ICS

Post by hgm »

When I type 'ulimit' it replies 'unlimited'. So I guess the default setting should do. Still, I could not find any file 'core'. Perhaps it does not make it into the directory I would expect it in (the directory where I started the ICS).

Very annoying. Now that I am waiting for a crash, it doesn't crash. I guess I have to abandon the test now, as it is already midnight. That means that when it crashes, I won't be there to give the 'where' command to gdb, and by the time I am there the ssh connection will be closed.

Is there a software method to create a core dump, in a place you can specify explicitly?
Last edited by hgm on Sun Jan 08, 2017 12:15 am, edited 1 time in total.
flok

Re: Help request for debugging ICS

Post by flok »

Run it in "screen"

sudo apt-get install screen

then:

screen -S ics

then start gdb with ics etc

then when you want to go to sleep, press ctrl+a and then d
then next morning, log in to your server and enter:
screen -r
and presto: gdb with all that!
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: Help request for debugging ICS

Post by Evert »

hgm wrote: When I type 'ulimit' it replies 'unlimited'. So I guess the default setting should do. Still, I could not find any file 'core'. Perhaps it does not make it into the directory I would expect it in (the directory where I started the ICS).
Maybe. Under OS X, core dumps end up in /cores, but I never used a Linux system that was configured to use anything other than the default location.
Very annoying. Now that I am waiting for a crash, it doesn't crash. I guess I have to abandon the test now, as it is already midnight. That means that when it crashes, I won't be there to give the 'where' command to gdb, and by the time I am there the ssh connection will be closed.
Did you try disabling the ssh timeout on the server?
Is there a software method to create a core dump, in a place you can specify explicitly?
Sure. Try this: http://stackoverflow.com/questions/1604 ... -core-dump
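A side note on the 'ulimit' check quoted above: in bash, a bare 'ulimit' reports the file-size limit, and the core-size limit is shown by 'ulimit -c', so 'unlimited' from the bare command does not by itself prove that core dumps are enabled. The following is a minimal sketch, assuming Linux, of how a process could check and raise its own core limit and see where the kernel would put the dump. The function names are hypothetical and not part of the ICS source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Returns 1 if the soft core-size limit permits a dump, 0 if it is zero,
 * -1 if the limit could not be read. */
int core_dumps_allowed(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    return rl.rlim_cur != 0;
}

/* Raises the soft core-size limit to the hard limit (no privileges needed).
 * Returns 0 on success, -1 on failure. */
int raise_core_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Linux only: copies the kernel's core file name pattern into buf.
 * A pattern starting with '|' means dumps are piped to a helper program
 * and will not show up as a plain 'core' file in the working directory.
 * Returns 0 on success, -1 if the file cannot be read. */
int read_core_pattern(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';  /* strip trailing newline */
    return 0;
}
```

Calling raise_core_limit() early in main() and logging the result of read_core_pattern() at startup would show whether, and where, a dump can be expected.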
Volker Annuss
Posts: 180
Joined: Mon Sep 03, 2007 9:15 am

Re: Help request for debugging ICS

Post by Volker Annuss »

I should add that valgrind gives stack traces for many situations that can cause crashes, such as using memory that has not been allocated or has already been freed, or branching that depends on uninitialized memory.
hgm

Re: Help request for debugging ICS

Post by hgm »

If you are waiting for it to crash, it never does. The ICS has been running since last night. I started a new gdb this morning, and attached it to the running ICS process.

I have now discovered the command 'gcore', which seems to be able to fulfil the function of the missing 'backtrace' program. Currently the ICS launches the command 'backtrace' from its SIGSEGV handler (which in the current test I disabled by commenting out the catching of SIGSEGV):

Code: Select all

	snprintf(cmd, sizeof(cmd), "/home/mics/bin/backtrace %d > /home/mics/chessd/segv_%d 2>&1", (int)getpid(), (int)getpid());
	system(cmd);
	_exit(1);
I guess I can replace this by

Code: Select all

	/* gcore expects its options before the pid: gcore [-o prefix] pid */
	snprintf(cmd, sizeof(cmd), "gcore -o /home/mics/chessd/segv_%d %d", (int)getpid(), (int)getpid());
	system(cmd);
to restore this functionality.
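For reference, the whole mechanism could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names build_gcore_cmd and install_segv_handler and the CORE_PREFIX macro are hypothetical, and only the handler name segv_handler and the output directory are taken from this thread. It relies on gcore taking its options before the pid and appending ".<pid>" to the given prefix.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed output prefix; gcore appends ".<pid>" to it. */
#define CORE_PREFIX "/home/mics/chessd/segv"

/* Formats "gcore -o <prefix> <pid>" into cmd.
 * Returns 0 on success, -1 if the buffer is too small. */
int build_gcore_cmd(char *cmd, size_t size, const char *prefix, int pid)
{
    assert(cmd != NULL && prefix != NULL);
    int n = snprintf(cmd, size, "gcore -o %s %d", prefix, pid);
    return (n > 0 && (size_t)n < size) ? 0 : -1;
}

/* Last-gasp handler: ask gcore to dump this process, then exit.
 * system() is not async-signal-safe, so this is a debugging aid only. */
static void segv_handler(int sig)
{
    char cmd[256];
    (void)sig;
    if (build_gcore_cmd(cmd, sizeof(cmd), CORE_PREFIX, (int)getpid()) == 0)
        system(cmd);
    _exit(1);
}

/* Installs the handler for SIGSEGV. Returns 0 on success, -1 on failure. */
int install_segv_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Because the dump is taken while the handler is still on the stack, the resulting backtrace will show the handler and the system() call on top of the frame that actually faulted.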
flok

Re: Help request for debugging ICS

Post by flok »

Try starting a tournament. I think it is related to that.
hgm

Re: Help request for debugging ICS

Post by hgm »

Good point. Mamer was not logged in this time. Perhaps some mamer commands cause the crashing. In the previous run, where we did crash, I did launch mamer. (Although there was no tournament.)
hgm

Re: Help request for debugging ICS

Post by hgm »

Umm, that did not take long. So perhaps the crashes are mamer-related.

Unfortunately gdb let us down: it claims there is no stack when I use the "where" command.

Code: Select all

mics@www:~$ gdb
GNU gdb (Ubuntu/Linaro 7.4-2012.02-0~71~lucid1) 7.4-2012.02
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>.
(gdb) attach 21473
Attaching to process 21473
Reading symbols from /home/mics/chessd/bin/chessd...done.
warning: Could not load shared library symbols for ./lib/chessd.so.
Do you need "set solib-search-path" or "set sysroot"?
Reading symbols from /lib32/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib32/libm.so.6
Reading symbols from /lib32/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib32/libdl.so.2
Reading symbols from /lib32/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib32/libc.so.6
Reading symbols from /lib/ld-linux.so.2...warning: the debug information found in "/lib/ld-2.11.1.so" does not match "/lib/ld-linux.so.2" (CRC mismatch).

(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
0xf7fdf430 in __kernel_vsyscall ()
(gdb) continue
Continuing.






Program received signal SIGSEGV, Segmentation fault.
0xf7decbf3 in ?? ()
(gdb)
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb)
The program is not being run.
(gdb)
The program is not being run.
(gdb)
The program is not being run.
(gdb)
The program is not being run.
(gdb) where
No stack.
(gdb)
The messages that the program is not being run are a delayed response to the empty lines I typed while it was running, to keep the ssh connection alive.

It is also worrisome that it says it could not find the symbols for chessd.so. Chessd.so is basically the entire ICS; the executable chessd is merely a front-end for it, which can upgrade the ICS without needing to stop it, by changing to another chessd.so library.

I will try to modify the ICS to catch the SIGSEGV again, and invoke gcore to create a core dump before exiting.
hgm

Re: Help request for debugging ICS

Post by hgm »

OK, progress. The ICS crashed while running a tourney, and in the terminal that was used to start it the following message appeared:

Code: Select all

warning: the debug information found in "/lib/ld-2.11.1.so" does not match "/lib/ld-linux.so.2" (CRC mismatch).

0xf7767430 in __kernel_vsyscall ()
Saved corefile /home/mics/chessd/core_5436.5436
due to gcore being invoked from the SIGSEGV handler.

When I run gdb on this core dump:

Code: Select all

mics@www:~/chessd$ gdb bin/chessd core_5436.5436
GNU gdb (Ubuntu/Linaro 7.4-2012.02-0~71~lucid1) 7.4-2012.02
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/mics/chessd/bin/chessd...done.
[New LWP 5436]

warning: the debug information found in "/lib/ld-2.11.1.so" does not match "/lib/ld-linux.so.2" (CRC mismatch).

Core was generated by `chessd'.
#0  0xf7767430 in __kernel_vsyscall ()
(gdb) where
#0  0xf7767430 in __kernel_vsyscall ()
#1  0xf7673e33 in waitpid () from /lib32/libc.so.6
#2  0xf7610b83 in ?? () from /lib32/libc.so.6
#3  0x0804936e in segv_handler (sig=11) at ficsmain.c:177
#4  <signal handler called>
#5  0xf7574bf3 in game_ended (g=4, winner=128, why=0) at gameproc.c:349
#6  0xf7576ef0 in process_move (p=7, command=0xffa9c6d7 "b2d2")
    at gameproc.c:783
#7  0xf756092a in process_prompt (command=0xffa9c6d7 "b2d2", p=<optimized out>)
    at command.c:733
#8  process_input (fd=15, com_string=0xffa9c6d7 "b2d2") at command.c:818
#9  0xf75845f8 in select_loop () at network.c:605
#10 0x080491d2 in main_event_loop () at ficsmain.c:90
#11 main (argc=3, argv=0xffa9cc44) at ficsmain.c:232
(gdb)
So it seems the ICS is crashing in game_ended(4, BLACK, 0).

Now what? How can I find out what statement in this routine caused the crash? I have no experience in using gdb.
hgm

Re: Help request for debugging ICS

Post by hgm »

Code: Select all

(gdb) list *0xf7574bf3
0xf7574bf3 is in game_ended (gameproc.c:349).
344	        if (((player_globals.parray[gg->black].b_stats.rating <= pp->availmax) && (player_globals.parray[gg->black].b_stats.rating >= pp->availmin)) || (!pp->availmax)) {
345	          pprintf (p,"\n%s",avail_black);
346	          avail_printed = 1;
347	        }
348	        if (gl == -1) /* bughouse ? */ {
349	          if (((player_globals.parray[game_globals.garray[gl].white].b_stats.rating <= pp->availmax) && (player_globals.parray[game_globals.garray[gl].white].b_stats.rating >= pp->availmin)) || (!pp->availmax)) {
350	            pprintf (p,"\n%s",avail_bugwhite);
351	            avail_printed = 1;
352	          }
353	          if (((player_globals.parray[game_globals.garray[gl].black].b_stats.rating <= pp->availmax) && (player_globals.parray[game_globals.garray[gl].black].b_stats.rating >= pp->availmin)) || (!pp->availmax)) {
It seems the address from which the SIGSEGV handler was called is in line 349. This line is in a code section that, according to the comment, should only be executed in bughouse games, though. :shock:

It is quite suspect in the first place that when gl is found to be -1 it should be used as an index in game_globals.garray anyway...
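To make the suspicion concrete: if -1 means "no linked partner game" (an assumption based on the "bughouse ?" comment), then the test in line 348 lets through exactly the one value that must not be used as an index. Below is a minimal sketch of a guarded version of the line 349 check. The struct layouts and the function name are simplified, hypothetical stand-ins, not the real ICS types.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the ICS game and player records. */
struct game   { int white, black; };
struct player { int rating; };

/* Returns 1 if the white player of linked game gl falls inside the
 * [availmin, availmax] window (availmax == 0 means "no window", as in the
 * original '!pp->availmax' test), and 0 otherwise. The index is validated
 * before use, so gl == -1 (no linked game) can never reach garray[gl]. */
int partner_white_in_range(const struct game *garray, size_t ngames,
                           const struct player *parray, size_t nplayers,
                           int gl, int availmin, int availmax)
{
    assert(garray != NULL && parray != NULL);

    if (gl < 0 || (size_t)gl >= ngames)
        return 0;                       /* no (valid) linked game */

    int w = garray[gl].white;
    if (w < 0 || (size_t)w >= nplayers)
        return 0;                       /* player index out of range */

    if (availmax == 0)
        return 1;                       /* no rating window requested */

    return parray[w].rating >= availmin && parray[w].rating <= availmax;
}
```

With a guard of this shape, a non-bughouse game (gl == -1) simply skips the partner check instead of reading before the start of the game array.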