Devlog of Leorik

KhepriChess · Post by **KhepriChess** » Sun Sep 24, 2023 3:20 am

Funny timing. Just a few days ago I had been looking at how adding parallel searching to a C# engine would work. For no real reason, I didn't expect the TPL to actually work as well as you have indicated. Sure makes it easier than trying to manage the threads yourself though! 150+ Elo jump is a nice gain.

connor_mcmonigle · Post by **connor_mcmonigle** » Sun Sep 24, 2023 4:03 am

lithander wrote: ↑Sat Sep 23, 2023 1:55 am ...
I don't know what other authors do to prevent the search instances to be too much in synchrony but I just implemented a Shuffle() method that randomizes the order of the root moves completely randomly at the beginning of the search.
...

This strategy will result in a lot of wasted work with threads spending too much time on garbage root moves. I'd recommend testing against a baseline in which all threads search using the normal root move ordering. In practice, different threads diverge pretty quickly and end up searching complementary components of the game tree.

When Seer was in the ~3100 Elo range, roughly 80 Elo per thread doubling could be expected in selfplay (with the 4moves_noob book). This tentatively suggests you're leaving some Elo on the table.

JoAnnP38 · Post by **JoAnnP38** » Sun Sep 24, 2023 2:25 pm

connor_mcmonigle wrote: ↑Sun Sep 24, 2023 4:03 am This strategy will result in a lot of wasted work with threads spending too much time on garbage root moves. I'd recommend testing against a baseline in which all threads search using the normal root move ordering. In practice, different threads diverge pretty quickly and end up searching complementary components of the game tree.

When Seer was in the ~3100 Elo range, roughly 80 Elo per thread doubling could be expected in selfplay (with the 4moves_noob book). This tentatively suggests you're leaving some Elo on the table.

I have seen others try to encourage different paths for each thread by deferring examining root moves already being examined by other threads, varying depth or varying aspiration windows. Do you think this is a waste of effort or do you think there is any benefit at all?

JoAnnP38 · Post by **JoAnnP38** » Sun Sep 24, 2023 2:31 pm

lithander wrote: ↑Sat Sep 23, 2023 1:55 am Luckily there's an ingenious idea described in the chess programming wiki https://www.chessprogramming.org/Shared ... #Lock-less The idea is that before you write the key into the TT you XOR it with the data you're also going to write. When you want to know the key you can just do the same operation again provided that you have still the same data available. But if the table get's corrupted and key and data don't match anymore this won't produce the key that you expect. It's like a checksum but without dedicating any space to it.
Code: Select all
[StructLayout(LayoutKind.Explicit)]
public struct HashEntry
{
	[FieldOffset(0)]
	public ulong Key;        //8 Bytes
	[FieldOffset(8)]
	public short Score;      //2 Bytes
	[FieldOffset(10)]
	public byte Depth;       //1 Byte
	[FieldOffset(11)]
	public byte Age;         //1 Byte
	[FieldOffset(12)]
	public int MoveAndType;  //4 Byte
	//=================================
	//                        16 Bytes
	[FieldOffset(8)]
	public ulong Data;

	public ScoreType GetScoreType() => (ScoreType)(MoveAndType >> 29);
	public Move GetMove() => (Move)(MoveAndType & 0x1FFFFFFF);
	public ulong GetHash() => Key ^ Data;
	public void SetHash(ulong hash) => Key = hash ^ Data;
}
In C++ you can use 'union' to map different members of a struct to the same memory. In C# that's rarely done but possible with using explicit struct layouts as seen above. And I was happy to find that the performance overhead is so negligible that I can just use the new lockless TT in single-threaded search as well!

I implemented the lockless TT from the beginning thinking that lazy SMP was going to be implemented "any day now." Unfortunately, it seems I never get there. However, I had been manually setting bits in Data and retrieving that information using the intrinsic Bmi1.X64.BitFieldExtract (why isn't there a parallel function for BitFieldSet?). However, your implementation of using FieldOffset may be better. I like it! I think maybe I'm packing bits in that don't align on a C# data type boundary:

Code: Select all

    data = Move.ClearScore(bestMove) |
              (((ulong)score & 0x0fffful) << 27) |
              (((ulong)ttFlag & 0x03ul) << 43) |
              (((ulong)depth & 0x0fful) << 45) |
              (((ulong)generation & 0x07fful) << 53); 

    this.hash = hash ^ data;

lithander · Post by **lithander** » Sun Sep 24, 2023 10:21 pm

connor_mcmonigle wrote: ↑Sun Sep 24, 2023 4:03 am
lithander wrote: ↑Sat Sep 23, 2023 1:55 am ...
I don't know what other authors do to prevent the search instances to be too much in synchrony but I just implemented a Shuffle() method that randomizes the order of the root moves completely randomly at the beginning of the search.
...
This strategy will result in a lot of wasted work with threads spending too much time on garbage root moves. I'd recommend testing against a baseline in which all threads search using the normal root move ordering. In practice, different threads diverge pretty quickly and end up searching complementary components of the game tree.

When Seer was in the ~3100 Elo range, roughly 80 Elo per thread doubling could be expected in selfplay (with the 4moves_noob book). This tentatively suggests you're leaving some Elo on the table.

I tested this now at 5s + 200ms increment, each engine using 3 threads. The only difference is that 2.4.8b does not shuffle the root moves randomly for each instance at the beginning of the search, and 2.4.8a does.

Code: Select all

Score of Leorik-2.4.8a vs Leorik-2.4.8b: 643 - 604 - 1448  [0.507] 2695
Elo difference: 5.0 +/- 8.9, LOS: 86.5 %, DrawRatio: 53.7 %

The advantage of shuffling, if any, is negligible but it isn't making it worse.

Maybe shuffling is only competitive because my root move sorting is so poor: In the root I just take the moves as they come from the generator (captures before quiets) and whenever a move raises alpha I put it to the front of the moves. So the pv move is always in the slot 0 followed by a bunch of former pv moves. But the rest remains completely unsorted.

Sounds like an area with room for improvement.

JVMerlino · Post by **JVMerlino** » Mon Sep 25, 2023 4:20 am

lithander wrote: ↑Sun Sep 24, 2023 10:21 pm Maybe shuffling is only competitive because my root move sorting is so poor: In the root I just take the moves as they come from the generator (captures before quiets) and whenever a move raises alpha I put it to the front of the moves. So the pv move is always in the slot 0 followed by a bunch of former pv moves. But the rest remains completely unsorted.

Sounds like an area with room for improvement.

I was about to suggest that implementing MVV/LVA should give you a decent little elo bump, but looking at your source it appears to already be there?
https://github.com/lithander/Leorik/blo ... 57-L147C57

connor_mcmonigle · Post by **connor_mcmonigle** » Mon Sep 25, 2023 5:07 am

JoAnnP38 wrote: ↑Sun Sep 24, 2023 2:25 pm ...
I have seen others try to encourage different paths for each thread by deferring examining root moves already being examined by other threads, varying depth or varying aspiration windows. Do you think this is a waste of effort or do you think there is any benefit at all?

There are some tricks (which I believe Laser pioneered) which involve searching the root at different starting depths and tweaking the move ordering iirc. These tricks pretty consistently gain at STC, but have been largely proven to be neutral at LTC. Given how sensitive the dynamics of a modern chess engine's search are to slight changes in initial conditions, I don't think there's a lot of value in these tricks

lithander · Post by **lithander** » Mon Sep 25, 2023 11:46 am

JVMerlino wrote: ↑Mon Sep 25, 2023 4:20 am I was about to suggest that implementing MVV/LVA should give you a decent little elo bump, but looking at your source it appears to already be there?
https://github.com/lithander/Leorik/blo ... 57-L147C57

I have standard move ordering based on MVVLVA, killers and history everywhere except at the root level. At the root you'll typically not find a capture where the attacker is less valuable than the target because both players play with some sophistication. Starting with the previous PV is clearly your best bet. That move is expected to raise alpha the most and thus reduce the search effort for the following moves so it needs to go first.
But regarding the remaining moves I haven't had much success sorting them so far. For example I tried something based on the idea to measure how hard it is to refute a move (I count the nodes visited in the move's subtree) and sorted all moves based on the subtree size but that didn't gain any Elo.

lithander · Post by **lithander** » Tue Sep 26, 2023 11:54 pm

The root move order is a weird one. I don't know if others do Late-Move-Reductions for root moves but I'm not. So every root move is going to get played in each iteration and at a depth that does not depend on whether it's early or late int he list. The PV move comes first and set's alpha and unless a replacement is found all the remaining moves will have to be tested with the same zero-window.

The only way one ordering could better than the other is when the PV-replacing moves are sorted statistically more at the beginning so you find them quicker and raise alpha quicker and save some nodes on the remaining moves. I didn't expect much but still played around with it: For example I sorted all moves via QSearch from best to worst but the strength was about the same as unsorted. Then, to see how much strength a "bad" order would cost sorted them by their QSearch score again but in reverse: The worst move first, and then gradually better moves.

And to my utter surprise this actually gained a bit of elo in a statistically relevant way against unsorted.

Code: Select all

Score of Leorik-B vs Leorik-A: 2485 - 2287 - 5228  [0.510] 10000
...      Leorik-B playing White: 1490 - 885 - 2625  [0.560] 5000
...      Leorik-B playing Black: 995 - 1402 - 2603  [0.459] 5000
...      White vs Black: 2892 - 1880 - 5228  [0.551] 10000
Elo difference: 6.9 +/- 4.7, LOS: 99.8 %, DrawRatio: 52.3 %

...I refuse to commit something like that because it's just unaesthetic but I'm still somewhat confused because I can't explain why that unintuitive order is actually a good one? The depth-1 search, which is basically dropping right into QSearch now starts with the worst move so pretty much every move is better than the one before, raising alpha and replacing the PV at the front of the list. Searching Depth 1 will effectively reverse the order of moves. Only if two QSearch values are exactly equal the 2nd one won't put to the front but would remained in it's slot. Afterwards it's gradually going towards the end of the list as later moves continue to raise alpha and become PV. Okay... but why is that good?

mathmoi · Post by **mathmoi** » Wed Sep 27, 2023 7:21 pm

lithander wrote: ↑Tue Sep 26, 2023 11:54 pm ...I refuse to commit something like that because it's just unaesthetic but I'm still somewhat confused because I can't explain why that unintuitive order is actually a good one?

Did you by any chance forget to reverse the returned value of your qsearch (assuming negamax) before ordering? That would explain why the reverse order performed better.

Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik

Re: Devlog of Leorik