Progress on Belofte

ydebilloez
Posts: 163
Joined: Tue Jun 27, 2017 11:01 pm
Location: Lubumbashi
Full name: Yves De Billoëz

Progress on Belofte

Post by ydebilloez »

My chess engine has had its ups and downs, so I want to give some explanation here and have a topic in which to announce further evolution of the program.

Version 0.9.12 was a good, stable version but was riddled with memory leaks, with part of the code in C++ but most of it in C, and every later version (0.9.14 through 0.9.20) had issues compiling on Mac and Windows. I therefore decided to restart from scratch in plain C++, hoping to progress faster. Unfortunately, my C++ knowledge was rusty, and it took until version 2.0.8.2 to have something more or less stable that was more than a pure random mover (2.0.0).

After that, 2.1.0 performed very well, but it was time to start working on the position evaluator. In doing so, the NPS dropped to only 40% of the earlier speed in 2.1.2... All of the performance loss is located in 2 functions: calcHash and calcPieces... On top of this, Gabor Szots reported that version 2.1.2 was faster in 32 bit mode than in 64 bit mode. Strangely enough, 2.1.1 performed better in my tests, while on the CCRL lists it performed much worse than 2.1.0. Time for an evaluation.

Please find below the NPS readings for different versions:

Performance results

after xboard + new + perft 5 commands
Nodes: 4865609

Linux 64 bit - Mint - LLVM 10.0 compiles

Version - bits - time -  NPS
2.0.8.2 - 32 -  4.581 - 1062128 (c) 
2.0.8.2 - 64 -  3.819 - 1274053 (c)
2.0.9   - 32 -  4.587 - 1060738 (c)
2.0.9   - 64 -  3.778 - 1287879 (c)
2.1.1.2 - 32 -  5.778 -  841999
2.1.1.2 - 64 -  4.704 - 1034237
2.1.2   - 32 - 12.787 -  380502
2.1.2   - 64 - 10.599 -  459053
2.1.2.2 - 32 - 13.231 -  367726
2.1.2.2 - 64 - 11.356 -  428430

Linux 32 bit - Mint - LLVM 10.0 compiles
Linux 64 bit - Mint - GCC 9.3 optimized compiles

2.1.2.2 - 64 -  7.547 -  644623

VM tests (VBOX 2GB 2 cores)

GCC 4.9.2 compiles

Win7-32  - 2.1.2   - 32 - 1:06.859 -  72773
Win7-32  - 2.1.2.2 - 32 - 1:07.125 -  72485

Win7-64  - 2.0.9   - 32 -   27.546 - 176635 (c)
Win7-64  - 2.0.9   - 64 -   32.500 - 149711 (c)
Win7-64  - 2.1.2   - 32 - 1:08.205 -  71337
Win7-64  - 2.1.2   - 64 - 1:08.828 -  70692
Win10-64 - 2.1.2.2 - 64 - 1:18.437 -  62031

Windows 32 bit - GCC 4.9.2 compiles
Windows 64 bit - GCC 8.1 compiles

Win7-64  - 2.1.2.1 - 64 -   20.562 - 236625
Win7-64  - 2.1.2.2 - 64 -   21.671 - 224512

Win10-64 - 2.1.2.2 - 32 - 1:13.112 -  66550
Win10-64 - 2.1.2.2 - 64 -   18.890 - 257569

VM tests (VBOX 4GB 4 cores)

GCC 4.9.2 compiles

Win7-32  - 2.1.2   - 32 - 1:04.859 -  75362

Wine tests (6.8 staging)

Wine-6.8 - 2.1.2.2 - 32 - 1:09.206 -  70305

Early conclusions: GCC 8.1 builds run about 4 times as fast as GCC 4.9.2 builds (roughly 3.2x to 4.2x in the Windows readings above). Time to move the default Windows build to GCC 8.1.

I have reposted release 2.1.2 for Windows 64 bit as 2.1.2.2, which is compiled with GCC 8.1.

Upcoming speedups:
I have to upgrade my Windows toolchain to GCC 9/10, also for 32 bit. (400% performance gain)
I have to do tests with LLVM compiles as well, and play with compile options beyond -O3.
I have to change my board representation from uchar[8][8] into int8_t[64] (11% performance gain)
I have to change my piece representation from 'K', 'Q' into int8_t values (15% performance gain)
I have to change my hash calculation and piece calculation into incremental (>25% performance gain)
I have to change my move generation (? gain)
I have to remove move history and other burden from board copy and only copy it when needed - lazy copy (20% performance gain)
Maybe it is time to do makeMove/unMakeMove instead of board copy and move apply in the search.
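
To illustrate how several of the items above fit together, here is a minimal sketch of a flat int8_t[64] board with numeric piece codes, an incrementally updated Zobrist hash, and makeMove/unmakeMove in place of board copying. This is not Belofte's code: the piece codes, the Board, Undo and Zobrist types and all function names are assumptions made for illustration, and castling, en passant and promotion are left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical numeric piece codes replacing 'K', 'Q', ...; 0 means empty.
enum Piece : std::int8_t { EMPTY = 0, W_PAWN, W_KNIGHT, W_BISHOP, W_ROOK, W_QUEEN,
                           W_KING, B_PAWN, B_KNIGHT, B_BISHOP, B_ROOK, B_QUEEN, B_KING };
constexpr int kPieceCount = 13;

// One random 64-bit key per (piece, square) plus one for the side to move.
struct Zobrist {
    std::array<std::array<std::uint64_t, 64>, kPieceCount> key{};
    std::uint64_t sideToMove = 0;
    Zobrist() {
        std::mt19937_64 rng(0x9E3779B97F4A7C15ULL);  // fixed seed: reproducible keys
        for (int p = 1; p < kPieceCount; ++p)
            for (auto& k : key[p]) k = rng();
        sideToMove = rng();
    }
};

inline const Zobrist& zobrist() { static const Zobrist z; return z; }

inline std::uint64_t keyOf(std::int8_t piece, int square) {
    return zobrist().key[static_cast<std::size_t>(piece)][static_cast<std::size_t>(square)];
}

struct Board {
    std::array<std::int8_t, 64> sq{};  // flat board, a1 = 0 ... h8 = 63
    bool whiteToMove = true;
    std::uint64_t hash = 0;            // kept up to date by makeMove/unmakeMove
};

struct Undo { std::int8_t captured; };  // what unmakeMove needs to restore

// Slow reference: recompute the hash from scratch (useful for debugging only).
std::uint64_t fullHash(const Board& b) {
    std::uint64_t h = b.whiteToMove ? 0 : zobrist().sideToMove;
    for (int s = 0; s < 64; ++s)
        if (b.sq[s] != EMPTY) h ^= keyOf(b.sq[s], s);
    return h;
}

// Apply a plain move or capture and update the hash with a few XORs.
Undo makeMove(Board& b, int from, int to) {
    const std::int8_t piece = b.sq[from];
    const std::int8_t captured = b.sq[to];
    assert(piece != EMPTY);
    b.hash ^= keyOf(piece, from);                         // piece leaves 'from'
    if (captured != EMPTY) b.hash ^= keyOf(captured, to); // captured piece disappears
    b.hash ^= keyOf(piece, to);                           // piece arrives on 'to'
    b.hash ^= zobrist().sideToMove;
    b.sq[to] = piece;
    b.sq[from] = EMPTY;
    b.whiteToMove = !b.whiteToMove;
    return Undo{captured};
}

// Exact inverse of makeMove; XOR is its own inverse, so the same keys are reused.
void unmakeMove(Board& b, int from, int to, Undo u) {
    const std::int8_t piece = b.sq[to];
    b.hash ^= zobrist().sideToMove;
    b.hash ^= keyOf(piece, to);
    if (u.captured != EMPTY) b.hash ^= keyOf(u.captured, to);
    b.hash ^= keyOf(piece, from);
    b.sq[from] = piece;
    b.sq[to] = u.captured;
    b.whiteToMove = !b.whiteToMove;
}
```

In a debug build, comparing b.hash against fullHash(b) after every makeMove is a cheap way to catch a missed update.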

Other changes:
Even with C++, I still have plenty of memory leaks. ... to be continued

In short, belofte is not finished yet...
Yves De Billoëz @ macchess belofte chess
Once owner of a Mephisto I, II, challenger, ... chess computer.
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Progress on Belofte

Post by mvanthoor »

If your C++ knowledge is rusty, maybe you should move to Rust, and you'll be right at home 8-)

Sorry, couldn't resist. But all kidding aside: you seem to write memory leaks as a part-time hobby, because you're always pointing those out as the main problem in your code. Rust hates memory leaks and makes it very hard to write one by accident. If you get ownership wrong, the compiler kicks you in the head and refuses to compile the code. If you moved to Rust, you'd have a hell of a time getting started (because you will run into the compiler A LOT in the beginning if you don't have a good handle on memory management), but in the end, you _will_ get rid of the memory leaks.

(And problems with compiling on Windows or Mac, or Linux for that matter, will also disappear.)
Author of Rustic, an engine written in Rust.
ydebilloez

Re: Progress on Belofte

Post by ydebilloez »

mvanthoor wrote: Wed May 12, 2021 10:38 am ...you seem to write memory leaks as a part-time hobby...
Full-time hobby :D

For the 0.9.x series, this was normal. Everything was allocated using malloc, and threads were aborted from a separate TimeObserver thread when the time elapsed.
For the 2.x series in C++, the only special things I do are raising exceptions to unwind the recursive search and making everything pluggable: search is run-time pluggable, evaluation is run-time pluggable, UI interactions are run-time pluggable, and even log writing should become run-time pluggable.
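
As an illustration of unwinding a recursive search with an exception, here is a minimal sketch under stated assumptions: SearchAborted, Searcher and the fixed branching factor are invented for the example and do not come from Belofte. The search polls a deadline every 1024 nodes and throws; the single catch at the root replaces "abort" checks at every level of the recursion.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical exception type thrown when the time budget is exhausted.
struct SearchAborted {};

class Searcher {
public:
    explicit Searcher(std::chrono::milliseconds budget)
        : deadline_(std::chrono::steady_clock::now() + budget) {}

    // Returns true if the tree was fully searched, false if time ran out.
    bool run(int depth) {
        try {
            visit(depth);
            return true;
        } catch (const SearchAborted&) {
            return false;  // the whole recursion has been unwound here
        }
    }

    std::uint64_t nodes() const { return nodes_; }

private:
    // Stand-in for a real search: walks a uniform tree of the given depth.
    void visit(int depth) {
        ++nodes_;
        // Reading the clock is relatively expensive, so poll every 1024 nodes.
        if ((nodes_ & 1023) == 0 && std::chrono::steady_clock::now() >= deadline_)
            throw SearchAborted{};
        if (depth == 0) return;
        for (int child = 0; child < kBranching; ++child)
            visit(depth - 1);
    }

    static constexpr int kBranching = 4;
    std::chrono::steady_clock::time_point deadline_;
    std::uint64_t nodes_ = 0;
};
```

Because stack unwinding runs destructors, anything the search holds in automatic variables or smart pointers is released on abort; only raw new without a matching delete on the exceptional path would leak.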

In fact, I thought there were no leaks until I discovered AddressSanitizer... Just launching the program and entering quit gives the following log (truncated because ...):

...
Direct leak of 80 byte(s) in 1 object(s) allocated from:
    #0 0x5448b7  (/home/yves/chess/belofte/project/build/Debug/Belofte+0x5448b7)
    #1 0x557688  (/home/yves/chess/belofte/project/build/Debug/Belofte+0x557688)
    #2 0x559e00  (/home/yves/chess/belofte/project/build/Debug/Belofte+0x559e00)
    #3 0x55dc67  (/home/yves/chess/belofte/project/build/Debug/Belofte+0x55dc67)
    #4 0x5515ed  (/home/yves/chess/belofte/project/build/Debug/Belofte+0x5515ed)
    #5 0x5501c4  (/home/yves/chess/belofte/project/build/Debug/Belofte+0x5501c4)
    #6 0x7ff0aea59b74  (/lib64/libc.so.6+0x27b74)

Indirect leak of 4224 byte(s) in 142 object(s) allocated from:
    #0 0x5448b7  (/home/yves/chess/belofte/project/build/Debug/Belofte+0x5448b7)
    #1 0x7ff0aeedcd9e  (/lib64/libstdc++.so.6+0x147d9e)

SUMMARY: AddressSanitizer: 18464 byte(s) leaked in 320 allocation(s).
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Progress on Belofte

Post by Sven »

What kind of data do you allocate via malloc or the new operator? Is this really necessary? I would suggest getting rid of that as much as possible, at least during search ...
Sven Schüle (engine author: Jumbo, KnockOut, Surprise)
ydebilloez

Re: Progress on Belofte

Post by ydebilloez »

Hi Sven,

No mallocs in 2.x. When generating moves, in pseudo-code:

// see at end of piece.cpp
piece = new bPiece... class - according to piece on board
movelist += piece->GenerateMoves(board)
delete piece

This will be optimized away in the next release: I will instantiate only one piece-class per piece and reuse it. I will still keep one class per piece active to keep move generation abstract per piece.
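
A minimal sketch of the "one instance per piece class, reused" idea, assuming a flat board of chars and stateless rule objects. PieceRules, KnightRules, KingRules, rulesFor and generateAll are hypothetical names, not Belofte's bPiece hierarchy, and only knight and king moves are implemented. The point is that the abstraction per piece is kept while new/delete disappears from move generation.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <vector>

using Board = std::array<char, 64>;  // 'K', 'n', ... as piece codes; ' ' = empty
struct Move { int from; int to; };
struct Step { int df; int dr; };

// Abstract per-piece move generation; implementations hold no state.
class PieceRules {
public:
    virtual ~PieceRules() = default;
    virtual void generateMoves(const Board& board, int from,
                               std::vector<Move>& out) const = 0;
};

inline bool isWhite(char piece) {
    return std::isupper(static_cast<unsigned char>(piece)) != 0;
}

// Shared helper for pieces that make single jumps (knight, king).
template <std::size_t N>
void generateSteps(const Board& board, int from,
                   const std::array<Step, N>& steps, std::vector<Move>& out) {
    const int file = from % 8, rank = from / 8;
    for (const Step& s : steps) {
        const int f = file + s.df, r = rank + s.dr;
        if (f < 0 || f > 7 || r < 0 || r > 7) continue;  // off the board
        const char target = board[r * 8 + f];
        if (target != ' ' && isWhite(target) == isWhite(board[from])) continue;  // own piece
        out.push_back(Move{from, r * 8 + f});
    }
}

class KnightRules final : public PieceRules {
public:
    void generateMoves(const Board& board, int from,
                       std::vector<Move>& out) const override {
        static constexpr std::array<Step, 8> kSteps{{{1, 2}, {2, 1}, {2, -1}, {1, -2},
                                                    {-1, -2}, {-2, -1}, {-2, 1}, {-1, 2}}};
        generateSteps(board, from, kSteps, out);
    }
};

class KingRules final : public PieceRules {
public:
    void generateMoves(const Board& board, int from,
                       std::vector<Move>& out) const override {
        static constexpr std::array<Step, 8> kSteps{{{1, 0}, {1, 1}, {0, 1}, {-1, 1},
                                                    {-1, 0}, {-1, -1}, {0, -1}, {1, -1}}};
        generateSteps(board, from, kSteps, out);
    }
};

// One shared, immutable instance per piece type; nothing is allocated per call.
const PieceRules* rulesFor(char piece) {
    static const KnightRules knight{};
    static const KingRules king{};
    switch (std::toupper(static_cast<unsigned char>(piece))) {
        case 'N': return &knight;
        case 'K': return &king;
        default:  return nullptr;  // other piece types omitted in this sketch
    }
}

// Pseudo-legal moves for the side to move, via the shared rule objects.
std::vector<Move> generateAll(const Board& board, bool whiteToMove) {
    std::vector<Move> moves;
    for (int sq = 0; sq < 64; ++sq) {
        const char piece = board[sq];
        if (piece == ' ' || isWhite(piece) != whiteToMove) continue;
        if (const PieceRules* rules = rulesFor(piece))
            rules->generateMoves(board, sq, moves);
    }
    return moves;
}
```

Since the rule objects carry no per-square state, the square is passed as an argument, which is what makes sharing one instance across all pieces of a type possible.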
ydebilloez

Re: Progress on Belofte

Post by ydebilloez »

Playing around with compiler flags gives the following results:

Next release testing: NPS reading on fd34

32 bit gcc 11.1.1 -O3: 247816 - (--reference build--)
32 bit gcc 11.1.1 --opt: 319045 - (+28%)
64 bit gcc 11.1.1 --opt: 359559 - (+45%)
64 bit clang 12.0 --opt: 375747 - (+51%)
64 bit gcc 11.1.1 --opt --static: 395967 - (+59%)
64 bit clang 12.0 --opt --static: (to be completed)

(for information: 2.1.2 64 bit on this machine : 248677)
Sven

Re: Progress on Belofte

Post by Sven »

ydebilloez wrote: Wed May 12, 2021 5:21 pm Hi Sven,

No mallocs in 2.x. When generating moves, in pseudo-code:

// see at end of piece.cpp
piece = new bPiece... class - according to piece on board
movelist += piece->GenerateMoves(board)
delete piece

This will be optimized away in the next release: I will instantiate only one piece-class per piece and reuse it. I will still keep one class per piece active to keep move generation abstract per piece.
I took a quick look into your source code. It seems there are only very few places where dynamic allocation occurs ... I don't see much potential for any memory leaks there. So maybe you are using some C/C++ library functions that do allocation?

Some more thoughts:

1) Why do you copy the piece instances at all for move generation? Why not just use the piece instances that are already there on your board?

2) I'd strongly suggest decoupling board and move list. This can greatly simplify parts of your code. Currently you maintain a move list as a data member of the board. You could maintain a move list locally wherever it is needed (that would mainly be one per node during search, kept on the stack of the current search method), and pass it by reference to the move generator and other functions dealing with it.

3) You also seem to do a lot of board copying here and there, a bit too much for my taste ... For instance, to find out whether you are at a terminal node, it seems to be a bad idea to copy the board and do a full move generation for the copy. Static conditions like draw by insufficient material, the fifty-move rule, or repetition can definitely be checked without that, and mate/stalemate detection is done by the search once you know that there are no legal moves, which is usually part of the search algorithm (which already provides the move generation), so there is no need for any additional move generation step.
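
Points 2 and 3 can be sketched as follows. To keep the example short and runnable, a toy take-away game stands in for chess (take 1 to 3 stones; a player with no move has lost, which plays the role of mate). Position, MoveList, generateMoves and search are hypothetical names, not taken from Belofte. The move list lives on the stack of each search node and is passed by reference to the generator, and the terminal case falls out of the search's own move generation, with no board copy and no extra generation step.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy stand-in for a chess position: a pile of stones.
struct Position { int stones; };

// Fixed-capacity move list, independent of the position/board.
struct MoveList {
    std::array<int, 8> moves{};
    std::size_t count = 0;
    void add(int m) {
        assert(count < moves.size());
        moves[count++] = m;
    }
};

// The generator only fills the list it is handed.
void generateMoves(const Position& pos, MoveList& out) {
    for (int take = 1; take <= 3 && take <= pos.stones; ++take)
        out.add(take);
}

// Negamax: returns +1 if the side to move wins, -1 if it loses.
int search(Position& pos) {
    MoveList list;             // one list per node, on this node's stack frame
    generateMoves(pos, list);
    if (list.count == 0)       // no moves at all: the "mated" case, detected
        return -1;             // by the search itself
    int best = -1;
    for (std::size_t i = 0; i < list.count; ++i) {
        pos.stones -= list.moves[i];      // make the move in place
        const int score = -search(pos);
        pos.stones += list.moves[i];      // unmake it; no position copy needed
        if (score > best) best = score;
    }
    return best;
}
```

In a chess engine the same shape applies, with the extra step that a position without legal moves is scored as mate or stalemate depending on whether the side to move is in check.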
ydebilloez

Re: Progress on Belofte

Post by ydebilloez »

Sven wrote: Thu May 13, 2021 12:15 pm Some more thoughts:
Thanks, included them in my todo list.
ydebilloez

Re: Progress on Belofte

Post by ydebilloez »

I finally made some progress on Belofte. Small but paving the way for big :D

I released 2.1.0.1, 2.1.1.3 and 2.1.2.3, which are the same as their predecessors but add the --bench command. It is now possible to run benchmarks against those versions. There is no need to grab them, though, as they are otherwise exactly the same as 2.1.0, 2.1.1.2 and 2.1.2.2. Hence I have not published the binaries yet.

NPS table

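Version  Build   NPS 32  NPS 64  Rel 32  Rel 64  64 vs ref 32
2.1.0.1  normal  529515  -       109 %   -       -
2.1.0.1  opt     -       831044  -       135 %   171 %
2.1.1.3  normal  485507  615345  100 %   100 %   127 %
2.1.2.3  opt     251991  277948  52 %    45 %    57 %
2.1.3 a  normal  242770  278978  50 %    45 %    57 %
2.1.3 a  opt     312733  369375  64 %    60 %    76 %
2.1.3 a  static  -       401586  -       65 %    83 %

Rel 32 and Rel 64 are relative to the 2.1.1.3 normal build of the same bitness; the last column is the 64 bit NPS relative to the 2.1.1.3 normal 32 bit build.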

I made an error by posting the 32 bit version of 2.1.2 for CCRL compiled with GCC 4.9 without optimization. Still, it performs better than 2.1.1 while running at less than half the speed.

Comparing the above table with the CCRL results shows:
2.1.0 64 (see 2.1.0.1) has 1050 Elo and runs at 135% relative NPS.
2.1.1 64 (see 2.1.1.3) has 1012 Elo and runs at 100% relative NPS.
2.1.2 32 (see 2.1.2.3) has 1031 Elo and runs at 40% relative NPS maximum*.

* Linux reading, I did not test the actual Windows 32 bit build.


For those cloning the source code: I also had to rewrite the work-in-progress belofte-next branch into optMoveGen and restart the belofte-next branch. Sorry for that. Version 2.1.3 (alpha) is contained in the next branch but is still a work in progress.

So expect the following in the next release:
- specific bmi2 and popcnt versions
- builds optimized for Windows and Linux in order to have maximum performance on all platforms
- fix the slowdowns introduced after 2.1.0 so that the NPS gets close to the original speed while adding functionality
ydebilloez

Re: Progress on Belofte

Post by ydebilloez »

While working on the next 2.1.3 release, I found some showstoppers in the existing release and decided that a new build of the existing 2.1.2.x branch would be nice. Thus I released 2.1.2.4 with the following major fixes (for full details, see README.md):

- 1 or 2 moves not evaluated when castling is possible (2.1.2.4);
- Possible crash when checking for castling possibilities when there is no piece on f1, g1 or h1 and the game is initialised from a FEN (2.1.2.4);
- Extra margin added at the last move of a time control to avoid flagging (2.1.2.4);
- Fix for building the Windows x64 version: use GCC 8.1.0 instead of 4.9, yielding much better performance (2.1.2.2).

There are some other changes to allow me to test progress against the next release, but those do not affect a default installation of the 2.1.2 release compared to the previous version. Anyone can use 2.1.2.4 to replace 2.1.2.0 through 2.1.2.3.