Progress on Loki

mvanthoor · Post by **mvanthoor** » Thu Apr 22, 2021 12:41 am

Good efforts. Good luck with the implementation. It's hard to test this accurately, and it's hard to know what each function would give you with regard to Elo gain.

Rustic Alpha 1 is in the CCRL Blitz list at 1675 (I tested it at 1695).
I tested Alpha 2 in my own gauntlet, with a result of 1840 (+165)

It ended up in the CCRL-list at 1781. (I did release a buggy version which was online for half an hour, so I inquired if the CCRL testers had used that; they hadn't.) Alpha 2 has had another test run, which put it at 1810, and now after it was included in another engine's gauntlet, it is at 1814. Still 26 points shy of my own test, but that was also the case with Alpha 1. The error bar is +/- 20, so the strength could indeed be 1834. The result is within expected parameters. (I sound like Data now.)

However, one interesting thing that happened are the tests against Bit Genie 1 and 2, which is at a similar development level as Rustic Alpha 2. Bit Genie version 1 is in the CCRL list around 1770. (It's stronger than Alpha 1 because BG 1 already has a TT.) Alpha 2 won its match with 19-13 against BG 1. After that, Bit Genie had a few small improvements, making the hash table more efficient, and optimizing some other stuff, but as far as I could see, no new functionality. It gained about 60 points in rating, and is now around 1860 in the CCRL list. BG 2 has some evaluation terms that Alpha 2 doesn't have, so it expected to be stronger. However, in their head to head match, BG 2 still lost to Alpha 2, again with 19-13.

So if Bit Genie's author would be testing against Alpha 2, it could end up looking as if he's making no progress, while he definitely did get better against other engines.

This situation is called an "Angst Gegner" in Germany: being unable to defeat an opponent that is objectively weaker than you are. In chess engines, it's common; it's because the weaker engine has some feature or trait that the stronger engine can't handle well. Many engines lose to TSCP because of it's pawn tricks, even if the other engines are objectively stronger in the CCRL list. You either have to include a hash table to outsmart TSCP's pawn shenanigans (it worked wonders for Alpha 2), or _specifically_ counteract them by including the evaluation terms yourself and giving even MORE weight to them than what TSCP is doing (which is something I don't like: I'm not going to program to defeat a sepecific engine).

So yeah... testing is hard. Your outcome can be completely different if you use different engines, different time controls, or even different opening books. If a tester would use "weird" opening books with things like b2-b3 on the first move, or g2-g3 / e2-e3 / Ng1-e2, that wouldn't jive with Rustic's current PST's, and it would thus move pieces around to get them where they are "supposed" to go. That will cause the engine to perform weaker, because it's losing tempi (time) by doing that.

niel5946 · Post by **niel5946** » Thu Apr 22, 2021 10:19 am

mvanthoor wrote: ↑Thu Apr 22, 2021 12:41 am Good efforts. Good luck with the implementation. It's hard to test this accurately, and it's hard to know what each function would give you with regard to Elo gain.

Thank you

. The reason, I am testing each feature/function this way is not to get an accurate number (since it's effect may vary greatly depending on the opponent engine), but rather to confirm whether it gains or loses elo, and what range this gain/loss is approximately in.

mvanthoor wrote: ↑Thu Apr 22, 2021 12:41 am However, one interesting thing that happened are the tests against Bit Genie 1 and 2, which is at a similar development level as Rustic Alpha 2. Bit Genie version 1 is in the CCRL list around 1770. (It's stronger than Alpha 1 because BG 1 already has a TT.) Alpha 2 won its match with 19-13 against BG 1. After that, Bit Genie had a few small improvements, making the hash table more efficient, and optimizing some other stuff, but as far as I could see, no new functionality. It gained about 60 points in rating, and is now around 1860 in the CCRL list. BG 2 has some evaluation terms that Alpha 2 doesn't have, so it expected to be stronger. However, in their head to head match, BG 2 still lost to Alpha 2, again with 19-13.

So if Bit Genie's author would be testing against Alpha 2, it could end up looking as if he's making no progress, while he definitely did get better against other engines.

I actually encountered something similar to this during my search and evaluation function rework. I tested a _very_ basic alphabeta function (with move ordering and NMP) against an engine rated ~1100-1200; I don't remember exactly. I was pretty sure that Loki would beat this opponent easily since it had won against a 1300 rated one (by won, I mean performed at a higher elo). But to my surprise, Loki was defeated, quite badly actually... This resulted in me struggling with the TT, but once I changed opponent engines, it showed really significant improvements as compared to the no-TT version.

mvanthoor wrote: ↑Thu Apr 22, 2021 12:41 am So yeah... testing is hard. Your outcome can be completely different if you use different engines, different time controls, or even different opening books. If a tester would use "weird" opening books with things like b2-b3 on the first move, or g2-g3 / e2-e3 / Ng1-e2, that wouldn't jive with Rustic's current PST's, and it would thus move pieces around to get them where they are "supposed" to go. That will cause the engine to perform weaker, because it's losing tempi (time) by doing that.

Testing is indeed hard, and very time consuming too. That is why my tests can't be as precise as I would like them to be: I only own one computer with the speed necessary to get meaningful results from a tournament (additionally, the other one isn't nearly as good at running concurrent games in a tournament). But since I am still in high-school, I can't have my computer at home running long tournaments, so I need to compromise with the accuracy of my results (because they need to be from very fast time controls)

Regarding the different engines and opening books, I usually pick new engines for my tests before releasing a new version. The reasons for this are that 1) The engines are closer to the expected elo obtained in self-play against the previous version of Loki, and 2) It is a good way to assure proper opponent diversity. Regarding opening suites, I always use the gaviota starters, that I found somewhere on this forum.

niel5946 · Post by **niel5946** » Thu Apr 22, 2021 12:01 pm

New, big to-do point: As I have stated before, I would like to see all heavy inspiration and copying of Vice being gone, and to make sure I get it done, I am planning on not releasing Loki 4.0 before.

The main point of inspiration from Vice is the UCI-implementation which will have to be completely re-written. This will both 1) Remove the largest amount of external code in Loki, and 2) Make the UCI-implementation more C++ style, like the rest of the code. My current idea is to:
1. Find a way to parse options, position and go-parameters.
2. While searching, have a thread responsible for listening for GUI input. This will maybe be done by the "main-thread" in search itself, which will then be responsible for stopping all other threads. I am not quite sure... Don't want to waste an entire thread just for listening when it could be searching instead...

Seems like a lot of work.

niel5946 · Post by **niel5946** » Sun Apr 25, 2021 12:01 am

After some consideration, I have just finished my list of changes/improvements that should be done before the release of Loki 4.0:

As stated in the latest post above, the UCI protocol implementation should be re-written to C++. Firstly because it is bad practice to have C in C++ and secondly to remove code copied from Vice.
After the 85th Amateur Series Division 8 tournament is done, some of Loki's games should be thoroughly analyzed in hopes of finding bugs in the search or evaluation function.
Late move reductions. I am currently working on this in a branch called lmr-lmp-branch, but I would like to see an improvement of > 120 elo before moving on to LMP.
Late move pruning. This is also a work in progress in the lmr-lmp-branch, but ATM it loses elo. It should ideally give Loki an improvement of > 30 elo before merging the branch into master.
Staged move generation. I have been working on this, but I am thinking of deleting the branch and starting over. I haven't worked with staged move generation before, so it is hard for me to give an expected elo increase.

On top of these points, I am thinking of porting my tuning framework to python while also implementing tuning of the search function. Depending on the amount of time the points above take, and the complexity of a search tuner, this may be postponed to version 4.5 or 5.0...

UCI rework update: I have finished re-writing the UCI implementation to C++, but the CheckUp function, called in search, is still from Vice. I don't really know how to implement that optimally. I will try using a thread to listen for input from the GUI, but I fear it will be slow..

All in all, I would say the development of Loki is going great

niel5946 · Post by **niel5946** » Tue Apr 27, 2021 11:12 am

Multithreaded hell

The biggest issue with my UCI-rework at the moment is how to listen for UCI inputs while searching. My initial idea was to have the following structs and objects in search.h:

Code: Select all

// This namespace handles communication with the GUI
namespace COMM {
	struct ListenData {
		ListenData(bool _stop, bool _quit) {
			stop = _stop;
			quit = _quit;
		}

		bool stop;
		bool quit;
	};

	// This object can be global since it'll only be written to by one thread.
	extern ListenData ld;

	// This function is handled by a separate thread which keeps track of input given to stdin
	void listen_for_input(bool stopset, long long stoptime);

	// check_stopped_search will be run by each thread every 2047 nodes. 
	void check_stopped_search(SearchThread_t* ss);
}

With the definitions:

Code: Select all

COMM::ListenData COMM::ld(true, true);

void COMM::listen_for_input(bool stopset, long long stoptime) {
	std::string input = "";

	while (true) {
		input = "";
		std::getline(std::cin, input);


		if (input.find("quit") != std::string::npos) {
			ld.quit = true;
			return;
		}
		
		if (input.find("stop") != std::string::npos) {
			ld.stop = true;
			return;
		}

		if ((stopset && getTimeMs() >= stoptime) || Search::isStop.load(std::memory_order_relaxed)) {
			return;
		}
	}
}


void COMM::check_stopped_search(SearchThread_t* ss) {

	// Step 1. Check for timed out search
	if (ss->info->timeset && getTimeMs() >= ss->info->stoptime) {
		ss->info->stopped = true;
	}

	// Step 2. Check for input from the GUI telling us to quit or stop. Only do this if we're the main thread.
	if (ss->thread_id == 0) {

		// Step 2A. Check for stop command
		if (ld.stop) {
			ss->info->stopped = true;
		}

		// Step 2B. Check for quit command
		if (ld.quit) {
			ss->info->quit = true;
			ss->info->stopped = true;
		}

		// Step 2C. If we've been told to stop or quit, tell the other threads with the std::atomic<bool> isStop flag
		if (ss->info->stopped) {
			Search::isStop = true;
		}
	}

	// Step 3. If we're not the main thread, check to see if we've been told to stop.
	else {
		if (Search::isStop.load(std::memory_order_relaxed)) {
			ss->info->stopped = true;
		}
	}
}

Then in the runSearch function, I start a thread that handles the listen_for_input() function:

Code: Select all

// Set the stop and quit flags in the listendata to false.
COMM::ld.stop = false; COMM::ld.quit = false;

// Run separate thread that listens on stdin
isStop = false;
std::thread listener(COMM::listen_for_input, info->timeset, info->stoptime);

/* Start searcher-threads and wait for them to finish */

listener.join();

/* Return */

Then while searching, I make all the searcher-threads run the check_stopped_search function to see if they should abort:

Code: Select all

// Check to see if we've been told to abort the search.
if ((ss->info->nodes & 2047) == 0) {
	COMM::check_stopped_search(ss);
}

if (ss->info->stopped) {
	return 0;
}

This implementation works very well for receiving and parsing input from the GUI; It stops when "stop" is given, and quits when "quit" is given. But there is one peculiar bug: When no input is given while searching, the listener thread won't join...
This results in one having to explicitly tell it "stop" even though it should've done that by itself after it timed out.

I will have to work on this a lot. I hate working on the UCI protocol. Not even related to the engine's strength...

Sven · Post by **Sven** » Tue Apr 27, 2021 10:23 pm

Another way could be to do the reverse: the main thread processes all (UCI) input and starts one ore more search threads when a search has been requested, but continues to process input while a search is running and signals the search thread(s) to stop if that has been requested via (UCI) protocol. Checking for timeout, though, should be checked by the search thread(s) (for instance every N nodes). So a search thread either terminates on its own due to timeout or finishing its work (in case of fixed-depth search or similar), or on request by the main thread.

Looks more natural to me ... I wrote (UCI) in brackets since it might as well be the WB protocol. And my WB engine Jumbo uses the thread model described above. And I think many other engines do it in a similar way.

niel5946 · Post by **niel5946** » Wed Apr 28, 2021 8:35 am

Sven wrote: ↑Tue Apr 27, 2021 10:23 pm Another way could be to do the reverse: the main thread processes all (UCI) input and starts one ore more search threads when a search has been requested, but continues to process input while a search is running and signals the search thread(s) to stop if that has been requested via (UCI) protocol. Checking for timeout, though, should be checked by the search thread(s) (for instance every N nodes). So a search thread either terminates on its own due to timeout or finishing its work (in case of fixed-depth search or similar), or on request by the main thread.

Looks more natural to me ... I wrote (UCI) in brackets since it might as well be the WB protocol. And my WB engine Jumbo uses the thread model described above. And I think many other engines do it in a similar way.

I hadn't actually given it any thought to do it the other way around... I see your point about it being more intuitive/natural to have the main thread handle all UCI-related issues. I just think it will take a lot of work to implement it this way in Loki. The reason being that I have already used a "program-structure" where the search is responsible for checking for input.
Thankfully, my UCI rework is in another branch from main, so I can experiment with it all I want without affecting the real Loki...

I am thinking of doing it this way:

When getting a "go" command, set the isStop = false, and start up a new thread that runs the search function.
While that thread is searching, look for input. Only handle this input if we recieve "stop" or "quit".
If the search times out by it self, it will set this isStop = true which will then signal to the main thread, that it can stop listening for search-related input.
Otherwise, if the search should stop due to request from the GUI, we'll set the isStop = true, end the listening loop, and wait for the search thread to join.

Am I understanding this correctly, or have I missed something?

Thanks for the help btw

Sven · Post by **Sven** » Wed Apr 28, 2021 8:53 am

I suggest to do it as simple as possible, so I would not care about "broken" cases where a GUI sends another "go" while a search is running.

And regarding branches and the amount of work: why not starting another branch from main, possibly cherry-picking some useful changes from your current "rework" branch?

niel5946 · Post by **niel5946** » Wed Apr 28, 2021 9:21 am

That makes sense. That is why I am thinking that I'll just look for a "stop" or a "quit", and if there are none, just reset the input and wait for a new one.

It is a good idea to cherry-pick the good parts of my current UCI-rework, considering that all UCI stuff except for the search-listening part is done.

niel5946 · Post by **niel5946** » Wed Apr 28, 2021 1:45 pm

UCI rework loses elo???
Yes, you heard it right. Something as simple - or at least i thought - as a communication protocol is making Loki lose elo:

Code: Select all

Score of Loki_dev vs Loki 3.0.0 64-bit: 112 - 161 - 127 [0.439]
...      Loki_dev playing White: 54 - 81 - 65  [0.432] 200
...      Loki_dev playing Black: 58 - 80 - 62  [0.445] 200
...      White vs Black: 134 - 139 - 127  [0.494] 400
Elo difference: -42.8 +/- 28.3, LOS: 0.2 %, DrawRatio: 31.8 %
400 of 400 games finished.

I know the number of games doesn't allow for exact numbers, but it is certainly worse than Loki 3.0.

I made the UCI implementation work, and changed the way Loki listens for input from the GUI. Thanks to Sven Schüle, I managed to get it working by making the main-thread listen for input while a separate thread handles the searching. This is the current implementation which is run after parsing the "go" command:

Code: Select all

	// Step 4. Finally, run the search.
	
	Search::isStop = false;
	std::thread searcher(Search::runSearch, std::ref(pos), std::ref(info), num_threads);

	std::string interrupt = "";

	while (!Search::isStop.load()) {

		// Check if there's input waiting
		if (InputWaiting()) {
			std::getline(std::cin, interrupt);

			if (interrupt.find("stop") != std::string::npos) {
				Search::isStop = true;
				break;
			}

			else if (interrupt.find("quit") != std::string::npos) {
				Search::isStop = true;
				info->quit = true;

				break;
			}
		}
	}

	searcher.join();

This simply runs the search in a new thread, listens to see if there's any GUI input. If there is, it sets a stop flag if the command is either "stop" or "quit", and if there isn't it doesn't do anything until the search times out by itself. Now the only function used by Vice is the InputWaiting, which I am satisfied with. Thus, if I can fix the elo loss, I can get on to working on more fun stuff and be done with my UCI-rework.
In the search function, I call the following method:

Code: Select all

void check_stopped_search(SearchThread_t* ss) {
	// Step 1. Check for timed out search and set isStop = true if we're the main thread
	if (ss->info->timeset && getTimeMs() >= ss->info->stoptime) {
		ss->info->stopped = true;

		if (ss->thread_id == 0) {
			Search::isStop = true;
		}
	}

	// Step 2. Check if we've been told to stop
	if (Search::isStop.load()) {
		ss->info->stopped = true;
	}
}

Which checks for timeout and interruption by the GUI, as set by the std::atomic<bool> isStop flag.

I can only seem to think of three things that might explain this - rather big - elo loss:
1. Separate threads are slower than the main thread and thus, NPS is lowered which in turn causes higher time-to-depth numbers.
2. It takes too long to start up a thread for searching and join it again, which makes this version of Loki lose due to time trouble. By time trouble I do not neccessarily mean lose on time, but that it has less time to search and thus doesn't see as deep.
3. It takes too long to access the std::atomic<bool> variable. Before, the main thread (in search) would not constantly access it.

I'll firstly try to balance out the third issue by only checking every 4096 nodes instead of every 2047.

I/O is probably the worst thing about C++. It is way more cumbersome than in higher-level languages...

Progress on Loki

Re: Progress on Loki

Re: Progress on Loki

Re: Progress on Loki

Re: Progress on Loki

Re: Progress on Loki

Re: Progress on Loki

Re: Progress on Loki

Re: Progress on Loki

Re: Progress on Loki

Re: Progress on Loki