Your favorite crash

What makes your engine crash most?

Due to a division by zero: 1 vote (4%)
Infinite loop: 3 votes (12%)
Illegal addressing, such as a pointer out of memory: 21 votes (84%)

Total votes: 25

Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Your favorite crash

Post by Ras »

My engine simply does not crash, but only because I debugged its stability issues. Of those, my favourite was illegal addressing.

NG-Play still has a hash table bug where an out-of-bounds write may occur. I converted the dynamic allocation to a static one, and in that scenario the out-of-bounds write can overwrite the variable placed after the table, which was another pointer. That one was really hard to debug, though the fix was easy.
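To illustrate the kind of guard that prevents a hash table store from running past a static array, here is a minimal sketch. It is not NG-Play's code: the entry layout, the table size and the names `tt_entry`, `tt_index`, `tt_store` and `tt_probe` are all assumptions made for the example. The idea is simply that a power-of-two table combined with an index mask cannot produce an out-of-range slot.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size and entry layout; not taken from NG-Play. */
#define TT_ENTRIES 1024u   /* must be a power of two for the mask below */

_Static_assert((TT_ENTRIES & (TT_ENTRIES - 1u)) == 0,
               "TT_ENTRIES must be a power of two");

typedef struct {
    uint64_t key;    /* full position key, used to verify a hit */
    int16_t  score;
    uint8_t  depth;
} tt_entry;

static tt_entry tt[TT_ENTRIES];

/* Map a 64-bit key to a slot. Masking with (size - 1) keeps the index
   inside [0, TT_ENTRIES) for any key, so a store cannot touch whatever
   variable the linker placed after the array. */
static size_t tt_index(uint64_t key)
{
    return (size_t)(key & (TT_ENTRIES - 1u));
}

static void tt_store(uint64_t key, int16_t score, uint8_t depth)
{
    tt_entry *e = &tt[tt_index(key)];
    e->key = key;
    e->score = score;
    e->depth = depth;
}

/* Returns the matching entry, or NULL if the slot holds another key. */
static const tt_entry *tt_probe(uint64_t key)
{
    const tt_entry *e = &tt[tt_index(key)];
    return (e->key == key) ? e : NULL;
}
```

With this layout, even a key of all ones lands in the last valid slot rather than beyond it.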

The second one (fixed as of NG-Play 9.87, btw) was an algorithmic one. I noticed that the engine was throwing away a queen for nothing. Debugging showed that this position was so bad that all of the moves were cut off in futility pruning - and since zero moves remained, that was mis-interpreted as stalemate. The fix was to also count legal but pruned moves for the stalemate detection.
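The fix described above can be sketched as two separate counters in the move loop. This is a toy model under stated assumptions, not the engine's actual search: `toy_move`, `node_result` and `classify_node` are hypothetical names, and the legality and futility decisions are reduced to precomputed flags.

```c
#include <stdbool.h>

/* Toy stand-in for a generated move; both flags are assumed to be
   computed elsewhere by the engine. */
typedef struct {
    bool legal;    /* move does not leave the own king in check */
    bool futile;   /* move would be skipped by futility pruning */
} toy_move;

typedef enum {
    NODE_SEARCHED,           /* at least one move was actually searched */
    NODE_ALL_PRUNED,         /* legal moves exist, but all were pruned */
    NODE_MATE_OR_STALEMATE   /* no legal move at all */
} node_result;

static node_result classify_node(const toy_move *moves, int count)
{
    int legal_moves = 0;     /* every legal move, pruned or not */
    int searched_moves = 0;  /* only moves that survive pruning */

    for (int i = 0; i < count; i++) {
        if (!moves[i].legal)
            continue;
        legal_moves++;       /* count BEFORE the pruning decision */
        if (moves[i].futile)
            continue;
        searched_moves++;
    }

    if (legal_moves == 0)
        return NODE_MATE_OR_STALEMATE;
    if (searched_moves == 0)
        return NODE_ALL_PRUNED;   /* bad position, but not stalemate */
    return NODE_SEARCHED;
}
```

The bug corresponds to testing `searched_moves == 0` where `legal_moves == 0` was meant.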
lauriet
Posts: 199
Joined: Sun Nov 03, 2013 9:32 am

Re: Your favorite crash

Post by lauriet »

Rebel wrote: During the years I noticed that when my engine crashes during development it's because of a division by zero, apparently my favorite sloppiness.

Yours is?

I program in Pascal, therefore it never crashes :P
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Your favorite crash

Post by AlvaroBegue »

After 15 years writing and maintaining software for a trading system, I don't seem to make software that crashes much. :)

Sometimes I hack something together, I get sloppy, and then the flavor of crash I get is generally something like a buffer overrun.
hgm
Posts: 27789
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Your favorite crash

Post by hgm »

Ras wrote: Debugging showed that this position was so bad that all of the moves were cut off in futility pruning - and since zero moves remained, that was mis-interpreted as stalemate. The fix was to also count legal but pruned moves for the stalemate detection.
Indeed, I had a similar problem in accounting for the validity depth of a node: when all moves were futile, that depth remained at infinity. Logically one should treat a futility-pruned move as if it had score = currentEval + value[victim] + MARGIN, and depth = 1 (because at larger depth you would not do futility pruning). I had forgotten to do that.
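A minimal sketch of that bookkeeping follows. Only the formula score = currentEval + value[victim] + MARGIN and depth = 1 comes from the post; the piece values, the margin, the `DEPTH_INFINITE` sentinel and the way a node folds move results together (maximum score, minimum depth) are assumptions made for illustration.

```c
/* Hypothetical piece codes and values in centipawns. */
enum { EMPTY, PAWN, KNIGHT, BISHOP, ROOK, QUEEN, PIECE_TYPES };
static const int piece_value[PIECE_TYPES] = { 0, 100, 300, 300, 500, 900 };

#define FUTILITY_MARGIN 50     /* assumed margin */
#define DEPTH_INFINITE  1000   /* assumed "no move accounted yet" sentinel */

typedef struct { int score; int depth; } move_bound;
typedef struct { int best_score; int valid_depth; } node_info;

/* Bound credited to a move that futility pruning skipped: the best it
   could plausibly achieve, valid only to depth 1. */
static move_bound futile_move_bound(int current_eval, int victim)
{
    move_bound b;
    b.score = current_eval + piece_value[victim] + FUTILITY_MARGIN;
    b.depth = 1;
    return b;
}

/* Fold one move's result into the node (assumed scheme): the node keeps
   the best score and is only as deep as its shallowest move. */
static void account_move(node_info *n, move_bound b)
{
    if (b.score > n->best_score)
        n->best_score = b.score;
    if (b.depth < n->valid_depth)
        n->valid_depth = b.depth;
}
```

If pruned moves are never passed to `account_move`, a node whose moves are all futile keeps `valid_depth == DEPTH_INFINITE`, which is the symptom described above.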
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Your favorite crash

Post by Gerd Isenberg »

Not computer chess related: while parsing a null-terminated C string via a char* ptr, I dereferenced *(ptr+1) before checking *ptr, for some wild speculative optimizations. That crashed once in a while, when ptr was the last address of a 4K page ...
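The safe ordering can be shown with a small sketch. The function `count_pair` is a hypothetical example, not the original parser: it looks one character ahead, but only after the current character is known not to be the terminator, so the furthest byte it ever reads is the terminator itself.

```c
#include <stddef.h>

/* Count occurrences of the character pair (a, b) in a NUL-terminated
   string. Overlapping matches are counted. */
static size_t count_pair(const char *s, char a, char b)
{
    size_t n = 0;
    for (const char *p = s; *p != '\0'; p++) {
        /* *p was checked by the loop condition, so p[1] is at worst
           the terminating NUL, which belongs to the string. Reading
           p[1] first could touch the byte after the terminator,
           which may lie in an unmapped page. */
        if (p[0] == a && p[1] == b)
            n++;
    }
    return n;
}
```

For example, `count_pair("abab", 'a', 'b')` scans the string once and never reads beyond its terminator, even if the string ends on the last byte of a page.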
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: Your favorite crash

Post by lucasart »

Crashes are the tip of the iceberg. Most of the iceberg is under water.

It's amazing how many hidden bugs one finds when testing carefully. Some are Elo-draining bugs that don't crash and produce no visible effect. Others don't impact Elo, but are still worth fixing to strengthen the codebase.

The worst kind of bugs are, by far, SMP bugs, such as races and deadlocks, and those caused by compiler optimization (things allowed by the C standard that break SMP code that was not carefully written).
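One common instance of the optimization problem is a stop flag polled in a search loop. With a plain `bool`, the C standard lets the compiler assume no other thread writes it and read it only once. A minimal sketch using C11 atomics follows; the names `stop_requested`, `request_stop`, `should_stop` and `search_loop` are hypothetical, and the loop body stands in for real search work.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Shared between the search thread and the thread that handles input.
   Static storage, so it starts out false. */
static atomic_bool stop_requested;

/* Called from the input thread, e.g. when a "stop" command arrives. */
static void request_stop(void)
{
    atomic_store(&stop_requested, true);
}

/* Called from the search thread. The atomic load must be performed on
   every call; the compiler may not hoist it out of a loop. */
static bool should_stop(void)
{
    return atomic_load(&stop_requested);
}

/* Stand-in for a search: counts "nodes" until the limit or a stop. */
static long search_loop(long max_nodes)
{
    long nodes = 0;
    while (nodes < max_nodes && !should_stop())
        nodes++;
    return nodes;
}
```

This only addresses visibility of the flag. Races on larger shared structures such as the hash table need their own treatment.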
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Your favorite crash

Post by cdani »

The strangest crash I have had happened today. It seems to have been caused by a punctual RAM incoherence or similar, on a Ryzen 7 1800X. I have this much detail because I generate crash dumps.

000000013F2A6F36  call        marcar_captures_bones_q (013F2BAFA0h)  
			estat_actual->actual = &estat_actual->moviments[0];
000000013F2A6F3B  mov         qword ptr [rdi+0B08h],rdi  
000000013F2A6F42  mov         eax,0FFFFh  
000000013F2A6F47  mov         ebx,0C000h  
			while ((m = seguent_moviment(estat_actual))) {
000000013F2A6F4C  mov         rdx,qword ptr [rdi+0B08h]  
000000013F2A6F53  cmp         dword ptr [rdx],eax  
000000013F2A6F55  je          $no_fer_hash+108Fh (013F2A7381h)  
000000013F2A6F5B  cmp         dword ptr [rdx],0  
000000013F2A6F5E  je          $no_fer_hash+452h (013F2A6744h)  
000000013F2A6F64  mov         r8d,dword ptr [rdx+8]  
000000013F2A6F68 to 000000013F2A6FCF  ?? ?? (104 consecutive unreadable bytes, one per address)
					continue;
				nummovimentlegal++;
				if (provades > 0
					&& !EsPromocio(m)
					&& abs(beta) < MATE
					&& ss->tau.c[Desti(m)] == 0
					&& see(ss, m, -50, (e_colors)estat_actual->mou) < 0
					)
000000013F2A6FD0  xor         byte ptr [rbp+198F0FC9h],al  
000000013F2A6FD6  add         dword ptr [rax],eax  
000000013F2A6FD8  add         bh,bh  
					continue;
MahmoudUthman
Posts: 234
Joined: Sat Jan 17, 2015 11:54 pm

Re: Your favorite crash

Post by MahmoudUthman »

What is a "punctual RAM incoherence" ?
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Your favorite crash

Post by cdani »

MahmoudUthman wrote: What is a "punctual RAM incoherence" ?
I mean a RAM fault that happens only once, but of course I cannot be sure. Maybe it is related to the CPU cache. I don't know if this is common.
cdani
Posts: 2204
Joined: Sat Jan 18, 2014 10:24 am
Location: Andorra

Re: Your favorite crash

Post by cdani »

This is the disassembly of the place where the crash happened, at 000000013F2A6FD0, just after the ?? ?? invalid instructions. Where the ?? ?? entries are, there should be part of the engine's assembly code, and instead there are a lot of invalid instructions. So the RAM got corrupted, and this is not a bug in the engine: as I understand it, a process does not have permission to overwrite its executable memory.