Further progress on new move generator

Michael Sherwin · Post by **Michael Sherwin** » Thu Mar 14, 2019 11:10 pm

Joost Buijs wrote: ↑Thu Mar 14, 2019 7:20 pm Well, I don't know how fast LC0 runs perft(kiwipete, 5), my engine which is also bitboard based does the 193690690 moves in 1.039 second on a 3800 MHz. Intel CPU. This is ~186 million moves/sec. As said, move generation speed is not very important. Positional evaluation, static exchange analysis and probing the transposition table take the largest part of the consumed time.

For LC0 it is even less important because a NN based evaluation function runs roughly 1000 times slower than a traditional hand crafted one.

How many times do I have to read that node rate does not matter? Yes, I agree it is less important than search and evaluation. I think that we all know that! But, we all know, or should know, that it does not hurt to have a faster move generator. However, on top of that there is no information on how your perft works. For example, do you make and unmake all moves generated? Do you do any kind of bulk counting? Do you traverse all possible moves to a certain depth? Or do you just generate moves for a list of positions? Without this information 193690690 moves in 1.039 seconds does not mean anything.

I give all this information for my perft. I make and unmake all generated moves. I do no bulk counting of any kind. I traverse all possible moves to a fixed depth. And in the original position (something else you failed to specify) I search 75 million nodes per second using only one thread of execution.

It matters not if you reply. I'm only pointing out the value of your post. So yes, ALL YOU BEGINNERS OUT THERE, Joost Buijs advises you not to worry about node rate. And he is absolutely correct!

Joost Buijs · Post by **Joost Buijs** » Fri Mar 15, 2019 7:59 am

Michael Sherwin wrote: ↑Thu Mar 14, 2019 11:10 pm
Joost Buijs wrote: ↑Thu Mar 14, 2019 7:20 pm Well, I don't know how fast LC0 runs perft(kiwipete, 5), my engine which is also bitboard based does the 193690690 moves in 1.039 second on a 3800 MHz. Intel CPU. This is ~186 million moves/sec. As said, move generation speed is not very important. Positional evaluation, static exchange analysis and probing the transposition table take the largest part of the consumed time.

For LC0 it is even less important because a NN based evaluation function runs roughly 1000 times slower than a traditional hand crafted one.
How many times do I have to read that node rate does not matter? Yes, I agree it is less important than search and evaluation. I think that we all know that! But, we all know, or should know, that it does not hurt to have a faster move generator. However, on top of that there is no information on how your perft works. For example, do you make and unmake all moves generated? Do you do any kind of bulk counting? Do you traverse all possible moves to a certain depth? Or do you just generate moves for a list of positions? Without this information 193690690 moves in 1.039 seconds does not mean anything.

I give all this information for my perft. I make and unmake all generated moves. I do no bulk counting of any kind. I traverse all possible moves to a fixed depth. And in the original position (something else you failed to specify) I search 75 million nodes per second using only one thread of execution.

It matters not if you reply. I'm only pointing out the value of your post. So yes, ALL YOU BEGINNERS OUT THERE, Joost Buijs advises you not to worry about node rate. And he is absolutely correct!

I do exactly the same as you, single core, traverse all moves to a fixed depth, no hashing and no bulk counting. When I use bulk counting on the last ply it is hardly any faster because the function I use to determine move legality runs slow compared to move do/undo. Maybe it wasn't clear but the position I used was the 'Kiwipete' to depth 5 as Fulvio mentioned for LC0 somewhere earlier in this thread.

My guess is that Perft runs almost entirely from the cache because it has a low memory footprint, in a full fledged search things will be different though.

Joost Buijs · Post by **Joost Buijs** » Fri Mar 15, 2019 9:31 am

Michael Sherwin wrote: ↑Thu Mar 14, 2019 11:10 pm It matters not if you reply. I'm only pointing out the value of your post. So yes, ALL YOU BEGINNERS OUT THERE, Joost Buijs advises you not to worry about node rate. And he is absolutely correct!

And I have to add that I never said that node rate is not important. I was talking about the speed of move generation, and that is something completely different.

In the example of 75 mln. nodes/sec you gave above you forgot to mention what kind of CPU you tested this on, this is not very useful either.

hgm · Post by **hgm** » Fri Mar 15, 2019 9:55 am

Perft is a poor measure for engine move-generation speed anyway, for several reasons:
* It emulates full-width search, while an engine is spending most of its time in QS.
* It requires explicit checking of the moves for legality, while most engines just test for King capture in the daughter node
* It is measured under conditions where there is much less cache pressure

Joost Buijs · Post by **Joost Buijs** » Fri Mar 15, 2019 10:46 am

hgm wrote: ↑Fri Mar 15, 2019 9:55 am Perft is a poor measure for engine move-generation speed anyway, for several reasons:
* It emulates full-width search, while an engine is spending most of its time in QS.
* It requires explicit checking of the moves for legality, while most engines just test for King capture in the daughter node
* It is measured under conditions where there is much less cache pressure

Right, but I'm not sure how many engines test for King capture on the next ply instead of checking for full legality.
Crafty did this in the past, that is the only engine I know of, I always disliked this, somehow it doesn't feel right.
I stopped looking at other sources many years ago, maybe most engines do it this way, but my engine is not one of them.

Fulvio · Post by **Fulvio** » Fri Mar 15, 2019 10:58 am

Joost Buijs wrote: ↑Fri Mar 15, 2019 9:31 am In the example of 75 mln. nodes/sec you gave above you forgot to mention what kind of CPU you tested this on, this is not very useful either.

That's why I posted the code, and I should have also posted how to compile it.
After downloading the lc0 code from github and saving that code into lc0/src/chess/perft.cc compile with:
g++ -O3 -march=native -I.. perft.cc board.cc ../utils/logging.cc

Using it as a shared reference measure it is possible to say: this move generator does perft x time faster than the lc0 one.
Not perfect, but more useful in my opinion.

Fulvio · Post by **Fulvio** » Fri Mar 15, 2019 11:06 am

hgm wrote: ↑Fri Mar 15, 2019 9:55 am Perft is a poor measure for engine move-generation speed anyway, for several reasons:
* It emulates full-width search, while an engine is spending most of its time in QS.
* It requires explicit checking of the moves for legality, while most engines just test for King capture in the daughter node
* It is measured under conditions where there is much less cache pressure

We all agree, the only real benchmark for chess engines is playing games.
However this does not mean that it is useless to have different benchmarks that try to isolate the various components.
Obviously, always keeping in mind that they have a relative value.

Joost Buijs · Post by **Joost Buijs** » Fri Mar 15, 2019 11:45 am

Fulvio wrote: ↑Fri Mar 15, 2019 10:58 am
Joost Buijs wrote: ↑Fri Mar 15, 2019 9:31 am In the example of 75 mln. nodes/sec you gave above you forgot to mention what kind of CPU you tested this on, this is not very useful either.
That's why I posted the code, and I should have also posted how to compile it.
After downloading the lc0 code from github and saving that code into lc0/src/chess/perft.cc compile with:
g++ -O3 -march=native -I.. perft.cc board.cc ../utils/logging.cc

Using it as a shared reference measure it is possible to say: this move generator does perft x time faster than the lc0 one.
Not perfect, but more useful in my opinion.

Actually my comment was meant for Michael.

I didn't try to compile LC0 because I don't use g++, probably it compiles with Intel C++ or MSVC just fine, since it is not important for me to know how fast the move generator of LC0 is I don't want to spend time on it.

Most move generators based on bitboard representation with magic multiplication will perform in the same realm because there are not many different ways you can do it.

hgm · Post by **hgm** » Fri Mar 15, 2019 1:53 pm

Fulvio wrote: ↑Fri Mar 15, 2019 11:06 amWe all agree, the only real benchmark for chess engines is playing games.

Well, I guess we don't even agree on that, because LC0 is doing fine in games, but prone to tactical blunders that 2000-Elo engine have no difficulty spotting. So being good at playing games still can give you an engine that sucks.

However this does not mean that it is useless to have different benchmarks that try to isolate the various components.

That is not really the point I wanted to make. Sure, making any of the components faster, even just a bit, and even a component that consumes only a minor fraction of the time, would in principle benefit the engine.

But my point was that to know what move generation would be fastest in satisfying the engines needs would not be the same as one that gives you the fastest perft. Not only because the conditions of program execution change due to interaction with other components (e.g. higher cache pressure), but also because you will use the same code for an entirely different task (generating all moves rather than just captures).

Further progress on new move generator

Re: Further progress on new move generator

Re: Further progress on new move generator

Re: Further progress on new move generator

Re: Further progress on new move generator

Re: Further progress on new move generator

Re: Further progress on new move generator

Re: Further progress on new move generator

Re: Further progress on new move generator

Re: Further progress on new move generator