Efficiently Generated Legal moves question

pedrojdm2021 · Post by **pedrojdm2021** » Thu Jul 08, 2021 6:52 pm

Hello, i have been doing some research into my engine's code, and i have found that one of the main causes of the slowness (that the speed actually is nearly 300.000 nps) caused by the move generation:

In my engine i generate the moves in the following way:
- Generated all the pseudo legal moves in my "Generate moves" function, i don't verify if the move is "legal" at all or not.

In the "Make move" function the legality is checked after almost the function reach to the end:
Something like:

Code: Select all

public bool MakeMove(int move)
{
     .........
     make move code here
     .........
     
     if (in_check(player))
     {
          UnmakeMove(move);
          return false;
     }
     return true;
}

you can see in the code that each non-legal move will require the computation force of:
- make move
- unmake move
- in check verification (based on bitwise operations like: https://www.chessprogramming.org/Square_Attacked_By)

and the number of illegal moves can be really big when playing higher level depth's (depth 9 and up based of pruning techniques)
so that can be very bad when one look for speed...

I found this interesting article on internet:
https://peterellisjones.com/posts/gener ... ficiently/

my question is, any of you have implemented the "pin rays" ? if yes how did you do that? i have some ideas but i don't have a very clear solution in mind, thank you all!

mvanthoor · Post by **mvanthoor** » Thu Jul 08, 2021 8:22 pm

pedrojdm2021 wrote: ↑Thu Jul 08, 2021 6:52 pm Hello, i have been doing some research into my engine's code, and i have found that one of the main causes of the slowness (that the speed actually is nearly 300.000 nps) caused by the move generation...

This should not be the cause of a slow engine. Rustic generates and executes its moves in exactly the same way, and on my 5 year old i7-6700K, it manages 5.000.000 nodes per second. (With just a tapered evaluation). Perft speed is 32 million leaves/second (without hash table and bulk counting). Rustic keeps both sets of PST's and the Zobrist key incrementally. This slows down playing the moves because the PST values are changed at that point. This negatively impacts perft. However, because only two values are changed when moving a piece (subtract when picking it up, add when putting it down, and maybe a third when capturing a piece), this speeds up the evaluation by a lot, because you don't have to re-add up to 32 values there.

my question is, any of you have implemented the "pin rays" ? if yes how did you do that? i have some ideas but i don't have a very clear solution in mind, thank you all!

It's on my todo list, but because Rustic already runs fast enough for now, it's way down the list. The move generator is only a small part of the engine's speed. The evaluation function has much more impact. (If you generate 35 moves, you'll only run the move generator once, but when evaluating, you'll run the evaluation up to 35 times.)

algerbrex · Post by **algerbrex** » Thu Jul 08, 2021 10:43 pm

pedrojdm2021 wrote: ↑Thu Jul 08, 2021 6:52 pm Hello, i have been doing some research into my engine's code, and i have found that one of the main causes of the slowness (that the speed actually is nearly 300.000 nps) caused by the move generation:

I would agree with what Marcel said. I'm also in the same position as you and very much still a novice when it comes to chess programming. But one important lesson I've learned from talking with more experienced chess programmers on here and from my own research is that move generation speed isn't nearly as important as you might think it is when trying to create a decently strong engine.

Instead, focus on getting a working move generator, and then focus your efforts on creating a solid search and evaluation phase using alpha-beta pruning with good move ordering (e.g. Most-valuable-victim, Least-valuable-attacker), material count, and piece-square tables. You might be surprised at how much quicker your engine seems during the search phase, since, if you have good pruning, your engine has to search only a fraction of the full move tree to get the same quality results as searching the whole tree.

For example, in one of my previous iterations of Blunder (my engine), when I went from pure negamax, to adding alpha-beta pruning and MvvLva move ordering, my engine went from searching ~4M nodes in the Kikipete position to only searching ~4k, a fraction of the move tree. And it still found the best move.

So my advice to you as someone who's in a similar position is to focus right now on getting a working move generator. And really make sure it's working. If you google "perft test positions" or "perft test suite", you'll find different collections of positions to test your engine with. Once you finish your move generator, write a little testing script to go through each position and make sure you get the correct numbers at each depth.

Or if you'd like, I'd also be happy to post the testing suite I've been using. It's an amalgamation of tests I've found myself online and it seemed to work well and rooting out any nasty bugs I had left in the move generator.

pedrojdm2021 wrote: ↑Thu Jul 08, 2021 6:52 pm my question is, any of you have implemented the "pin rays" ? if yes how did you do that? i have some ideas but i don't have a very clear solution in mind, thank you all!

I'm currently in the process of re-writing Blunder since I wasn't happy with the code base, and because I realized I needed a better plan for developing Blunder. But anyway, the way I'm detecting and finding pin rays in this version is by having a 64x64 array. The array has a pre-calculated bitboard containing the line between any two squares that are aligned. Here's the relevant code:

Code: Select all

func sqOnBoard(sq int) bool {
	return sq >= 0 && sq <= 63
}

func distance(sq1, sq2 int) int {
	return max(abs(rankOf(sq2)-rankOf(sq1)), abs(fileOf(sq2)-fileOf(sq1)))
}

func pseduoSlidingAttacks(deltas []int, square int) (attacks Bitboard) {
	for _, delta := range deltas {
		for to := square + delta; sqOnBoard(to) &&
			distance(to, to-delta) == 1; to += delta {
			setBit(&attacks, to)
		}
	}
	return attacks
}

func lineBetween(deltas []int, sq1, sq2 int) (line Bitboard) {
	for _, delta := range deltas {
		for to := sq1 + delta; sqOnBoard(to) &&
			distance(to, to-delta) == 1; to += delta {
			setBit(&line, to)
			if to == sq2 {
				break
			}
		}
		if hasBitSet(line, sq2) {
			break
		} else {
			line = 0
		}
	}
	return line
}

var LinesBewteen [64][64]Bitboard
var Directions []int = []int{North, South, East, West, NorthEast, SouthEast, NorthWest, SouthWest}

func init() {
	for from := 0; from < 64; from++ {
		attacks := pseduoSlidingAttacks(Directions, from)
		for attacks != 0 {
			to, _ := popLSB(&attacks)
			LinesBewteen[from][to] = lineBetween(Directions, from, to)
		}
	}
}

And here's how I use it to detect pinned pieces:

Code: Select all

func genPinnedPieces(board *Board) (pinnedBB uint64) {
	enemyBB := board.Sides[board.ColorToMove^1]
	enemyBishops := board.Pieces[board.ColorToMove^1][Bishop]
	enemyRooks := board.Pieces[board.ColorToMove^1][Rook]
	enemyQueens := board.Pieces[board.ColorToMove^1][Queen]

	usBB := board.Sides[board.ColorToMove]
	kingBB := board.Pieces[board.ColorToMove][King]

	pinnersBB := (genIntercardianlMovesBB(kingBB, enemyBB)&(enemyBishops|enemyQueens) |
		genCardianlMovesBB(kingBB, enemyBB)&(enemyRooks|enemyQueens))
	kingPos := getLSBPos(kingBB)
	for pinnersBB != 0 {
		pinnerPos, _ := popLSB(&pinnersBB)
		pinnedBB |= LinesBewteen[kingPos][pinnerPos] & usBB
	}
	return pinnedBB
}

The nice thing about this approach is that both of the arrays used can be pre-computed before the engine starts, so the pin ray doesn't need to be generated on the fly.

I can't say this approach is original to me though. I got the idea from reading through the source code of Stockfish and hearing some members on here talk about similar ideas.

Jakob Progsch · Post by **Jakob Progsch** » Thu Jul 08, 2021 10:46 pm

pedrojdm2021 wrote: ↑Thu Jul 08, 2021 6:52 pm my question is, any of you have implemented the "pin rays" ? if yes how did you do that? i have some ideas but i don't have a very clear solution in mind, thank you all!

Perform ray attacks from your kings position. If the ray hits one of your own pieces remove it from the blocker set and repeat the ray attack. If you then hit an opposing piece that can move along that ray the first piece is pinned. Conveniently the ray you just calculated happens to be the squares the pinned piece can move to.

hgm · Post by **hgm** » Thu Jul 08, 2021 11:20 pm

pedrojdm2021 wrote: ↑Thu Jul 08, 2021 6:52 pmmy question is, any of you have implemented the "pin rays" ? if yes how did you do that? i have some ideas but i don't have a very clear solution in mind, thank you all!

In Joker / Qperft Ithe move generator is aware of pinned pieces, so that it would never generate a move with a non-royal piece that exposes its own King. The only illegal moves it can generate is when a King steps into check.

This is implemented by running through the slider section of the opponent's piece list (typically 5 pieces, QRRBB), to test their alignment with the stm King. If there is alignment (which happens only rarely), it test whether there only is a single piece on the connecting ray. If there is, and it belongs to the stm, it uses a special move generator for that piece, which only generates moves along the pin ray. And then it temporarily marks the piece as captured in the piece list during the normal move generation (which is also done by running through the piece list), so that no other moves are generated for that piece. After which it restores the true location of the piece.

The only moves that have to be tested for legality this way are King moves and e.p. captures. (The latter because they can have the capturing and captured Pawn both on the pin ray, a constellation overlooked by the pin detection.)

Note that a list of sliders aligned with the King could be maintained incrementally; you then only have to do the alignment test for a slider that moves, or retest all when the King moves.

mvanthoor · Post by **mvanthoor** » Fri Jul 09, 2021 3:45 am

algerbrex wrote: ↑Thu Jul 08, 2021 10:43 pm I would agree with what Marcel said. I'm also in the same position as you and very much still a novice when it comes to chess programming. But one important lesson I've learned from talking with more experienced chess programmers on here and from my own research is that move generation speed isn't nearly as important as you might think it is when trying to create a decently strong engine.

While this is true, 300.000 nodes/s is really slow.

I've been using Lithander's MinimalChess as a sparring partner, because it's at around the same development stage as Rustic. It's a much slower engine than Rustic (5 M nodes/second), but it still reaches 1.5 M nodes/s. With MinimalChess however, that's by design. It isn't designed to be fast, but simple to understand and maintain; and it's written in C#, so it loses some speed there (compared to Rust, C, and C++).

I would run a profiler on the engine to see where the slowest code is and improve that. It is necessary to do so, because Rustic is 16 times faster. Every speed doubling gains 50-70 Elo, so this engine would be at a 200-250 Elo disadvantage when all other things are equal.

That's 200-250 Elo that's lost forever. (Until the code is fixed.)

So my advice to you as someone who's in a similar position is to focus right now on getting a working move generator. And really make sure it's working. If you google "perft test positions" or "perft test suite", you'll find different collections of positions to test your engine with. Once you finish your move generator, write a little testing script to go through each position and make sure you get the correct numbers at each depth.

That's the first thing to do, and after the move generator works, spend some time redoing the slow code by using a profiler to spot it.

The nice thing about this approach is that both of the arrays used can be pre-computed before the engine starts, so the pin ray doesn't need to be generated on the fly.

That is also important. If there are things you can pre-compute, or keep incrementally when the pieces move so you don't have to re-compute during move generation or evaluation, that will gain you a lot of speed.

I can't say this approach is original to me though. I got the idea from reading through the source code of Stockfish and hearing some members on here talk about similar ideas.

Many of the ideas have been around so long that they were already invented before they could actually be programmed properly... the important thing is that you understand the idea, then implement it yourself, fully understanding what each function and line of code does.

algerbrex · Post by **algerbrex** » Fri Jul 09, 2021 6:03 am

mvanthoor wrote: ↑Fri Jul 09, 2021 3:45 am While this is true, 300.000 nodes/s is really slow.

I've been using Lithander's MinimalChess as a sparring partner, because it's at around the same development stage as Rustic. It's a much slower engine than Rustic (5 M nodes/second), but it still reaches 1.5 M nodes/s. With MinimalChess however, that's by design. It isn't designed to be fast, but simple to understand and maintain; and it's written in C#, so it loses some speed there (compared to Rust, C, and C++).

I would run a profiler on the engine to see where the slowest code is and improve that. It is necessary to do so, because Rustic is 16 times faster. Every speed doubling gains 50-70 Elo, so this engine would be at a 200-250 Elo disadvantage when all other things are equal.

That's 200-250 Elo that's lost forever. (Until the code is fixed.)

That's fair. In retrospect, I do think I overstated things in my reply. I think, as someone who was in a position similar to his, I wanted to very much emphasize the importance of not getting to caught up in move generation speed since I think that's definitely a rabbit hole that you can fall down.

And I agree, a good profiler should be used to identify bottlenecks in the program.

This is unrelated, but I am a little curious though to see you mention Rust in the same vein as C and C++. Are their speeds really comparable? I don't know much at all about Rust, but it was one of the languages I considered using when I first started planning my engine. I ultimately ended up choosing Go since it seemed like a very clean language, easy to learn language.

But it's been a bit of a pain in the butt trying to work with Go's garbage collector to optimize my code, and I do wonder what kind of speed up I'd see using a language without automatic memory management.

amanjpro · Post by **amanjpro** » Fri Jul 09, 2021 7:10 am

algerbrex wrote: ↑Fri Jul 09, 2021 6:03 am
mvanthoor wrote: ↑Fri Jul 09, 2021 3:45 am While this is true, 300.000 nodes/s is really slow.

I've been using Lithander's MinimalChess as a sparring partner, because it's at around the same development stage as Rustic. It's a much slower engine than Rustic (5 M nodes/second), but it still reaches 1.5 M nodes/s. With MinimalChess however, that's by design. It isn't designed to be fast, but simple to understand and maintain; and it's written in C#, so it loses some speed there (compared to Rust, C, and C++).

I would run a profiler on the engine to see where the slowest code is and improve that. It is necessary to do so, because Rustic is 16 times faster. Every speed doubling gains 50-70 Elo, so this engine would be at a 200-250 Elo disadvantage when all other things are equal.

That's 200-250 Elo that's lost forever. (Until the code is fixed.)
That's fair. In retrospect, I do think I overstated things in my reply. I think, as someone who was in a position similar to his, I wanted to very much emphasize the importance of not getting to caught up in move generation speed since I think that's definitely a rabbit hole that you can fall down.

And I agree, a good profiler should be used to identify bottlenecks in the program.

This is unrelated, but I am a little curious though to see you mention Rust in the same vein as C and C++. Are their speeds really comparable? I don't know much at all about Rust, but it was one of the languages I considered using when I first started planning my engine. I ultimately ended up choosing Go since it seemed like a very clean language, easy to learn language.

But it's been a bit of a pain in the butt trying to work with Go's garbage collector to optimize my code, and I do wonder what kind of speed up I'd see using a language without automatic memory management.

Counter Go is written in Go, it has no NNUE, not even Syzygy support and it is rated around 3000 or a bit more. The language was clearly not an issue for the author. RubiChess written in Rubi and rated 3200 or more, I've even seen JavaScript engines up in 3000s

algerbrex · Post by **algerbrex** » Fri Jul 09, 2021 7:15 am

amanjpro wrote: ↑Fri Jul 09, 2021 7:10 am
algerbrex wrote: ↑Fri Jul 09, 2021 6:03 am
mvanthoor wrote: ↑Fri Jul 09, 2021 3:45 am While this is true, 300.000 nodes/s is really slow.

I've been using Lithander's MinimalChess as a sparring partner, because it's at around the same development stage as Rustic. It's a much slower engine than Rustic (5 M nodes/second), but it still reaches 1.5 M nodes/s. With MinimalChess however, that's by design. It isn't designed to be fast, but simple to understand and maintain; and it's written in C#, so it loses some speed there (compared to Rust, C, and C++).

I would run a profiler on the engine to see where the slowest code is and improve that. It is necessary to do so, because Rustic is 16 times faster. Every speed doubling gains 50-70 Elo, so this engine would be at a 200-250 Elo disadvantage when all other things are equal.

That's 200-250 Elo that's lost forever. (Until the code is fixed.)
That's fair. In retrospect, I do think I overstated things in my reply. I think, as someone who was in a position similar to his, I wanted to very much emphasize the importance of not getting to caught up in move generation speed since I think that's definitely a rabbit hole that you can fall down.

And I agree, a good profiler should be used to identify bottlenecks in the program.

This is unrelated, but I am a little curious though to see you mention Rust in the same vein as C and C++. Are their speeds really comparable? I don't know much at all about Rust, but it was one of the languages I considered using when I first started planning my engine. I ultimately ended up choosing Go since it seemed like a very clean language, easy to learn language.

But it's been a bit of a pain in the butt trying to work with Go's garbage collector to optimize my code, and I do wonder what kind of speed up I'd see using a language without automatic memory management.
Counter Go is written in Go, it has no NNUE, not even Syzygy support and it is rated around 3000 or a bit more. The language was clearly not an issue for the author. RubiChess written in Rubi and rated 3200 or more, I've even seen JavaScript engines up in 3000s

Thank you, both those programs sound interesting. But my question wasn't so much whether or not a strong engine could be written in Go, I assumed it could, which is why I've stuck with it. I was simply wondering how automatic memory management impacted the raw speed of the engine.

mvanthoor · Post by **mvanthoor** » Fri Jul 09, 2021 9:56 am

algerbrex wrote: ↑Fri Jul 09, 2021 6:03 am This is unrelated, but I am a little curious though to see you mention Rust in the same vein as C and C++. Are their speeds really comparable? I don't know much at all about Rust, but it was one of the languages I considered using when I first started planning my engine. I ultimately ended up choosing Go since it seemed like a very clean language, easy to learn language.

Yes, they are of comparable speed. Clang and RustC as both front-end compilers for LLVM. The only thing in Rust is that it's harder to create large variables such as array's when in a loop. For example, for your move list. In C, you can just create a variable (pointer) without initializing anything, and then start filling up your move list. In Rust, the language doesn't accept that and initializes everything with 0. That costs too much time, so you'll have to write some unsafe code to get it to just make the pointer.

But it's been a bit of a pain in the butt trying to work with Go's garbage collector to optimize my code, and I do wonder what kind of speed up I'd see using a language without automatic memory management.

Rust isn't an easy language to learn if you're coming from garbage collected languages. C seems easier to start with, but it's much harder to make memory-related mistakes. C is easy to learn, but hard to write correctly. Rust is hard to learn, but easy to write correctly. As soon as you do something that's not memory-safe or not thread-safe, the compiler will slap you over the head with it.

Efficiently Generated Legal moves question

Efficiently Generated Legal moves question

Re: Efficiently Generated Legal moves question

Re: Efficiently Generated Legal moves question

Re: Efficiently Generated Legal moves question

Re: Efficiently Generated Legal moves question

Re: Efficiently Generated Legal moves question

Re: Efficiently Generated Legal moves question

Re: Efficiently Generated Legal moves question

Re: Efficiently Generated Legal moves question

Re: Efficiently Generated Legal moves question