Help with Debugging My Chess Engine

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Help with Debugging My Chess Engine

Post by matthewlai »

universecoder wrote: I did what you suggested(declared them as static and then initialized them), but I still have a problem, the generated keys are still different.
They will be different between runs still unless you switch to fixed seed. Even then it will depend on other parts of your program if you use rand() elsewhere. Using C++11's random number generation is much safer.
I did that, but I am getting a host of weird messages that I am unable to understand.

------------------------------------------------

0x0000000000407186 in std::__detail::_Insert_base<int, int, std::allocator<int>, std::__detail::_Identity, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::insert(int const&) ()

--------------------------------------------------

0x000000000040ae35 in std::_Hashtable<int, int, std::allocator<int>, std::__detail::_Identity, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::_M_remove_bucket_begin(unsigned long, std::__detail::_Hash_node<int, false>*, unsigned long) ()
--------------------------------------------------

Program received signal SIGINT, Interrupt.
0x0000000000409ff2 in std::allocator<int>::~allocator() ()
--------------------------------------------------
That's where the program is when you interrupted it. If you type "bt" (backtrace) it will give you the whole call stack, so you can see where in your program you are calling it.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
universecoder
Posts: 53
Joined: Mon Sep 19, 2016 6:51 am

Re: Help with Debugging My Chess Engine

Post by universecoder »

They will be different between runs still unless you switch to fixed seed. Even then it will depend on other parts of your program if you use rand() elsewhere. Using C++11's random number generation is much safer.
They are not different between runs, they are different in the SAME run, which they shouldn't be since the hash lists are now static.
That's where the program is when you interrupted it. If you type "bt" (backtrace) it will give you the whole call stack, so you can see where in your program you are calling it.
I am getting the following output:
----------------------------------------------------

#0 0x0000000000409acd in std::_Hashtable<int, int, std::allocator<int>, std::__detail::_Identity, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::_M_find_before_node(unsigned long, int const&, unsigned long) const ()
#1 0x000000000040850c in std::_Hashtable<int, int, std::allocator<int>, std::__detail::_Identity, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::_M_erase(std::integral_constant<bool, true>, int const&) ()
#2 0x000000000040720f in std::_Hashtable<int, int, std::allocator<int>, std::__detail::_Identity, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::erase(int const&) ()
#3 0x00000000004066f5 in std::unordered_set<int, std::hash<int>, std::equal_to<int>, std::allocator<int> >::erase(int const&) ()
#4 0x000000000040313f in chessboard::playMove(Move&) ()
#5 0x000000000040516e in chessboard::isMoveValid(Move&) ()
#6 0x000000000040522b in chessboard::addMove(Move&) ()
#7 0x000000000040c7f1 in chessboard::generateAllMoves() ()
#8 0x000000000040ea30 in main ()

----------------------------------------------------------

But this is not helping me, I knew that something is causing that infinite loop in isMoveValid(), I am unable to figure out what it is.
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Help with Debugging My Chess Engine

Post by AlvaroBegue »

You can inspect the values of your variables using gdb. But you should probably find a tutorial online to learn how to do it.
universecoder
Posts: 53
Joined: Mon Sep 19, 2016 6:51 am

Re: Help with Debugging My Chess Engine

Post by universecoder »

Ok, thanks. I'll learn the GNU debugger tool. But did you find any fault with the code?
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Help with Debugging My Chess Engine

Post by matthewlai »

universecoder wrote:
They will be different between runs still unless you switch to fixed seed. Even then it will depend on other parts of your program if you use rand() elsewhere. Using C++11's random number generation is much safer.
They are not different between runs, they are different in the SAME run, which they shouldn't be since the hash lists are now static.
That's where the program is when you interrupted it. If you type "bt" (backtrace) it will give you the whole call stack, so you can see where in your program you are calling it.
I am getting the following output:
----------------------------------------------------

#0 0x0000000000409acd in std::_Hashtable<int, int, std::allocator<int>, std::__detail::_Identity, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::_M_find_before_node(unsigned long, int const&, unsigned long) const ()
#1 0x000000000040850c in std::_Hashtable<int, int, std::allocator<int>, std::__detail::_Identity, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::_M_erase(std::integral_constant<bool, true>, int const&) ()
#2 0x000000000040720f in std::_Hashtable<int, int, std::allocator<int>, std::__detail::_Identity, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::erase(int const&) ()
#3 0x00000000004066f5 in std::unordered_set<int, std::hash<int>, std::equal_to<int>, std::allocator<int> >::erase(int const&) ()
#4 0x000000000040313f in chessboard::playMove(Move&) ()
#5 0x000000000040516e in chessboard::isMoveValid(Move&) ()
#6 0x000000000040522b in chessboard::addMove(Move&) ()
#7 0x000000000040c7f1 in chessboard::generateAllMoves() ()
#8 0x000000000040ea30 in main ()

----------------------------------------------------------

But this is not helping me, I knew that something is causing that infinite loop in isMoveValid(), I am unable to figure out what it is.
Like Álvaro mentioned, you can look at the values of variables, at any frame in the call stack, as well as the exact line number. But for that you need to learn a bit more about GDB. There are many tutorials online, and it only takes about 15 minutes to get a good handle of the basics. It will be one of your most important tools in your toolbox.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: Help with Debugging My Chess Engine

Post by Ras »

universecoder wrote:C) I am unable to figure to reset the castle permissions in undoMove().
NG-Play is using a separate status flags stack. MakeMove() pushes the current status on the stack while UndoMove() pops it back. The castling rights are done with a flag each for the two kings and the four rooks. There is also one flag each whether white or black has castled, useful for the static eval.

However, there is a little trap - if a king moves, it obviously looses its castling right. Now the rooks ALSO have to be flagged as "no castling" because otherwise, the repetition logic may not work correctly. Same after a castling, the remaining rook has to be flagged as "no castling". Took me quite some time to hut that bug down once I had encountered it.

This flag variable has also to be taken into account for the hashing.

But it's not only castling that is preserved there, also the flag which square is an en-passant-square, or 0 is the last move didn't enable en passant. However, this is not included in the hashing for the opening book, or else a position based book becomes difficult.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Help with Debugging My Chess Engine

Post by Sven »

universecoder wrote:
They will be different between runs still unless you switch to fixed seed. Even then it will depend on other parts of your program if you use rand() elsewhere. Using C++11's random number generation is much safer.
They are not different between runs, they are different in the SAME run, which they shouldn't be since the hash lists are now static.
So you are not talking about the zobrist keys (which you create and maintain once for the whole program) but about the hash key of the board which differs between "updated incrementally in playMove()" and "calculated from scratch", right?

If that is true then the problem is usually in the incremental update code (and your method chessboard::initUniqueKey() is trivial enough to trust it ...), so you could try to isolate the part of your incremental update code that does not work properly (assuming the root cause is actually there). Updating the hash key can be divided into four parts:

1) piece placement related part,
2) castling rights related part,
3) en passant square related part,
4) side to move related part.

What I do in such a case (which I already had a couple of times, at least once with each new engine) is to start with part 1 and strip down (by commenting out) the incremental hash key update code as well as the corresponding "calculate hash key from scratch" function so that it only contains code for part 1. If this works, i.e. now both versions of the hash key are identical, then the problem is in one of the other parts, so now enable part 2 in both functions and continue until you found out which part is responsible for the problem. In rare cases you find one problem but there is still a second one.

What you can do additionally is that you try to isolate the move type that causes the failure, e.g. normal move, en passant, castling, promotion, double step, capture, ... To increase the probability that the occurrence of the problem for, say, an ep capture does not happen by chance you could take different starting positions resp. move sequences and try to reach the same problem scenario. If it is always the same move type that triggers the bug then you have most probably found the "offending" piece of code.

If the problem already occurs with your current test code in main() where 1.e4 is the only move that is made from the initial opening position then isolating the move type of course does not help.
universecoder wrote:I am getting the following output:
----------------------------------------------------

#0 0x0000000000409acd in std::_Hashtable<int, int, std::allocator<int>, std::__detail::_Identity, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::_M_find_before_node(unsigned long, int const&, unsigned long) const ()
#1 0x000000000040850c in std::_Hashtable<int, int, std::allocator<int>, std::__detail::_Identity, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::_M_erase(std::integral_constant<bool, true>, int const&) ()
#2 0x000000000040720f in std::_Hashtable<int, int, std::allocator<int>, std::__detail::_Identity, std::equal_to<int>, std::hash<int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, true, true> >::erase(int const&) ()
#3 0x00000000004066f5 in std::unordered_set<int, std::hash<int>, std::equal_to<int>, std::allocator<int> >::erase(int const&) ()
#4 0x000000000040313f in chessboard::playMove(Move&) ()
#5 0x000000000040516e in chessboard::isMoveValid(Move&) ()
#6 0x000000000040522b in chessboard::addMove(Move&) ()
#7 0x000000000040c7f1 in chessboard::generateAllMoves() ()
#8 0x000000000040ea30 in main ()

----------------------------------------------------------

But this is not helping me, I knew that something is causing that infinite loop in isMoveValid(), I am unable to figure out what it is.
This stacktrace shows that chessboard::playMove() calls the erase() method of std::unordered_set<int, ...>, where the real template arguments instead of my "..." actually look quite awkward. Looking into your code reveals that you use std::unordered_set for your piece list. I have taken a brief look into playMove() and could not find an obvious bug in the erase() calls so maybe the problem is already triggered somewhere else (undoMove?).

Many people who are frequently reading this forum know very well what my next proposal will be ... You should consider adding some "assert(...)" statements to check certain conditions at runtime. For instance in playMove() and undoMove() you do a bunch of "atomic" operations which mainly depend on the move type. Now within each move-type-specific branch you could assert() conditions like these:

- when moving a piece from X to Y without capturing, square X must be occupied and Y empty;
- for an ep capture there must be a friendly pawn and an enemy pawn on well-defined squares, and also certain empty squares are required;
- similar for a castling move, a promotion, a pawn double step, ...

Most important for the problem above would also be to assert() that the piece (more exactly: the square number) that you want to erase from the piece list is actually contained in it, and the same vice versa for inserting.

I could imagine that undoMove() misses some special case and leaves the board in an inconsistent state which leads to a crash in playMove() in some other case. But it could as well be something else.
universecoder
Posts: 53
Joined: Mon Sep 19, 2016 6:51 am

Re: Help with Debugging My Chess Engine

Post by universecoder »

Using gdb, I got to the source of the error:
It's the line 316 in chessboard.cpp.
pieceList[move.currPiece].insert(move.to);
Unable to figure why this causes havoc.
universecoder
Posts: 53
Joined: Mon Sep 19, 2016 6:51 am

Re: Help with Debugging My Chess Engine

Post by universecoder »

I found an error on line 316 in chessboard.cpp using gdb
pieceList[move.currPiece].insert(move.to);

Why, I am trying to find out.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Help with Debugging My Chess Engine

Post by matthewlai »

universecoder wrote:Using gdb, I got to the source of the error:
It's the line 316 in chessboard.cpp.
pieceList[move.currPiece].insert(move.to);
Unable to figure why this causes havoc.
Note that it doesn't actually means that's where the error is. That just means the CPU happens to be executing that line when you interrupted the program.

What that means is up to you to figure out.

When you have a function that doesn't return, it's usually either an infinite loop or infinite recursion. It doesn't look like this code is recursive, so it's probably an infinite loop. If you go up on the callstack (using the "frame" command) and examine the lines of code on each frame, that may help you figure out which loop is not ending.
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.