Comparing 4 move generators: 0x88 vs 10x12 vs 10x12 + bitboards HYBRID vs Pure MAGIC BITBOARDS

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Comparing 4 move generators: 0x88 vs 10x12 vs 10x12 + bitboards HYBRID vs Pure MAGIC BITBOARDS

Post by mvanthoor »

OliverBr wrote: Fri Sep 18, 2020 10:40 pm On the leafs, it's only counting the legal moves, not actually generating it.
OK; so this is bulk counting. I could add this to my engine easily as an option... but I'm hesitant about it because it doesn't do anything for playing chess. You can't NOT include those leaves in your move lists when you're playing, of course ... in perft, you can just count the length of the list and throw it away because you know all moves will be legal.

Maybe I'll add it just to see what perft speed would become... and then take it out again. Or make it an option for people who'd want to test with it. I'd like my perft to be brute force to test everything down to the last leaf.
Hardware is not that importan, as it looks like one 1 core.
The result comes from: "gcc8 64bit AMD EPYC 7502P 1/32-Core Processor", which looks stunning, but only 1 core is being used.
On my i7-8850H MacBookPro it's about 80% performance (of course only 1 core being used, too).

This move generator has been built in 2008 with the main purpose to be as fast as possible. I invented an own style of bitboard representation. Later I learned it is called "Kindergarten bitboard".

The performance gain since 2008 may be interesting (using hash):

Code: Select all

/* OliPerft 1.0.5 - Bitboard Magic Move (c) Oliver Brausch 16.Sep.2020, ob112@web.de */
/* oliperft 6 "r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1"
Nodes: 8229523927 cs: 1156 knps: 711527 (gcc8 64bit AMD EPYC 7502P 1/32-Core Processor)
Nodes: 8229523927 cs: 1526 knps: 539145 (gcc4 OSX i7 8850H 2.6 GHz)
Nodes: 8229523927 ms: 40610 knps: 202647 (VS2005 64bit AMD64 4600+) (1.0.2)
Nodes: 8229523927 ms: 64860 knps: 126881 (VS2005 32bit AMD64 4600+) (1.0.2)
Nodes: 8229523927 ms: 97251 knps: 84621 (gcc4 32bit AMD Opteron 1210HE) (1.0.2)
the last 3 lines and version 1.0.2 are from 2008 with hardware of that time. The speed for 1 core increased only about a factor 3 in 12 years. Who had ever thought this?
Don't know if I'm missing anything, but I see the following node counts:
84.621 knps (84 million)
126.881 knps (127 million)
202.647 knps (202 million)
539.145 knps (539 million)
711.527 knps (712 million)

That is a speedup of 8.47x. Where do you get a factor 3 in 12 years?

What really stands out to me is the massive jump in speed when you switched to 64-bit. I see that with my engine as well: if I compile it as a 32-bit engine, the speed drops by 40-50%.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
Jhoravi
Posts: 291
Joined: Wed May 08, 2013 6:49 am

Re: Comparing 4 move generators: 0x88 vs 10x12 vs 10x12 + bitboards HYBRID vs Pure MAGIC BITBOARDS

Post by Jhoravi »

is BMI2 support activated in the bitboard tests?
OliverBr
Posts: 725
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Comparing 4 move generators: 0x88 vs 10x12 vs 10x12 + bitboards HYBRID vs Pure MAGIC BITBOARDS

Post by OliverBr »

mvanthoor wrote: Sat Sep 19, 2020 12:36 am
OliverBr wrote: Fri Sep 18, 2020 10:40 pm

Code: Select all

Nodes: 8229523927 cs: 1526 knps: 539145 (gcc4 OSX i7 8850H 2.6 GHz)
Nodes: 8229523927 ms: 40610 knps: 202647 (VS2005 64bit AMD64 4600+) (1.0.2)
That is a speedup of 8.47x. Where do you get a factor 3 in 12 years?
Between those two lines lie 12 years and it's even less than a factor 3. The 8.47x speedup was against an inferior cpu and 32bit.
What really stands out to me is the massive jump in speed when you switched to 64-bit. I see that with my engine as well: if I compile it as a 32-bit engine, the speed drops by 40-50%.
This is because bitboard move generator are designed for 64bit machine- How many squares does a chess board have? 64...
Chess Engine OliThink: http://brausch.org/home/chess
OliThink GitHub:https://github.com/olithink
User avatar
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: Comparing 4 move generators: 0x88 vs 10x12 vs 10x12 + bitboards HYBRID vs Pure MAGIC BITBOARDS

Post by maksimKorzh »

mvanthoor wrote: Fri Sep 18, 2020 8:23 pm
maksimKorzh wrote: Fri Aug 28, 2020 7:27 pm ...
thanks for this post :) I'll have to download the bitboard chess engine and compare it with Rustic on my own computer. (I'll compile it myself in MSYS2 / GCC, at the fastest speed setting, with --target-cpu set at "Skylake", just as with my own engine.) I'm curious to see a speed comparison on the same computer.

OliverBr wrote: Fri Sep 18, 2020 12:10 pm 5. OliThink5 / OliPerft (Kindergarten Bitboards)

- independent developed bitboard representation
- no piece lists
https://github.com/olithink/OliThink/bl ... oliperft.c

1. Without hash:

Code: Select all

Pa2-a3: 4463267
Pa2-a4: 5363555
Pb2-b3: 5310358
Pb2-b4: 5293555
Pc2-c3: 5417640
Pc2-c4: 5866666
Pd2-d3: 8073082
Pd2-d4: 8879566
Pe2-e3: 9726018
Pe2-e4: 9771632
Pf2-f3: 4404141
Pf2-f4: 4890429
Pg2-g3: 5346260
Pg2-g4: 5239875
Ph2-h3: 4463070
Ph2-h4: 5385554
Nb1-a3: 4856835
Nb1-c3: 5708064
Ng1-f3: 5723523
Ng1-h3: 4877234

Nodes: 119060324 cs: 51 knps: 232539
0.51 seconds, 232 mnps

2. With hash:

Code: Select all

Nodes: 119060324 cs: 43 knps: 271208
0.43 seconds, 271 mnps
As you said your engine uses a legal-only move generator, I assume you're using bulk counting? That is a MASSIVE nps count. (Personally I count 'leaves found per second'; do you count every node you traverse, or only the leaves with regard to speed?)

Also... the speed of the computer is of great importance of course.
The results I've provided are achieved with the ONLY flag "-Ofast" that as far as I've understood adds gcc's "O2", "O3" flags together or somathing like that.

And one more thing - like I've been kindly pointed out by someone on this forum a couple of years ago(not sure) - move generator's speed is important but it's definitely not the most important part making engine fast and strong. The optimizations of alpha beta search are way more important compared to movegen because even if you have a lighting speed movegen still with a poor move ordering it would be as slow as hell.
User avatar
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: Comparing 4 move generators: 0x88 vs 10x12 vs 10x12 + bitboards HYBRID vs Pure MAGIC BITBOARDS

Post by maksimKorzh »

Jhoravi wrote: Sat Sep 19, 2020 5:46 am is BMI2 support activated in the bitboard tests?
Nope. For bit manipulations does not rely on hardware instructions, only software algorithms.
OliverBr
Posts: 725
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

Re: Comparing 4 move generators: 0x88 vs 10x12 vs 10x12 + bitboards HYBRID vs Pure MAGIC BITBOARDS

Post by OliverBr »

maksimKorzh wrote: Sat Sep 19, 2020 12:22 pm And one more thing - like I've been kindly pointed out by someone on this forum a couple of years ago(not sure) - move generator's speed is important but it's definitely not the most important part making engine fast and strong. The optimizations of alpha beta search are way more important compared to movegen because even if you have a lighting speed movegen still with a poor move ordering it would be as slow as hell.
Yep, good move ordering and pruning are the most important elements for strength in a chess engine.
Chess Engine OliThink: http://brausch.org/home/chess
OliThink GitHub:https://github.com/olithink
User avatar
mvanthoor
Posts: 1784
Joined: Wed Jul 03, 2019 4:42 pm
Location: Netherlands
Full name: Marcel Vanthoor

Re: Comparing 4 move generators: 0x88 vs 10x12 vs 10x12 + bitboards HYBRID vs Pure MAGIC BITBOARDS

Post by mvanthoor »

maksimKorzh wrote: Sat Sep 19, 2020 12:23 pm ...
By the way Maksim... I've reached your video's where you show that your engine is playing against TSCP. Good job to reach that stage :) (I know you're much further along already.) There's one thing I'd like to mention.

If you don't implement time control and let the engines go on their own way within that time limit but use a fixed depth search, you are not testing one engine against the other: you're testing the effectiveness of evaluation functions. (Because you set the engines to a fixed depth, you're negating any time management, search function implementations, and speed advantages/disadvantages of both engines.) So, you could basically beat ANY engine you want if you set both engines to depth 5 or 6 and then you start expanding your evaluation until it is better than the other one.
Author of Rustic, an engine written in Rust.
Releases | Code | Docs | Progress | CCRL
User avatar
maksimKorzh
Posts: 771
Joined: Sat Sep 08, 2018 5:37 pm
Location: Ukraine
Full name: Maksim Korzh

Re: Comparing 4 move generators: 0x88 vs 10x12 vs 10x12 + bitboards HYBRID vs Pure MAGIC BITBOARDS

Post by maksimKorzh »

mvanthoor wrote: Sat Sep 19, 2020 8:18 pm
maksimKorzh wrote: Sat Sep 19, 2020 12:23 pm ...
By the way Maksim... I've reached your video's where you show that your engine is playing against TSCP. Good job to reach that stage :) (I know you're much further along already.) There's one thing I'd like to mention.

If you don't implement time control and let the engines go on their own way within that time limit but use a fixed depth search, you are not testing one engine against the other: you're testing the effectiveness of evaluation functions. (Because you set the engines to a fixed depth, you're negating any time management, search function implementations, and speed advantages/disadvantages of both engines.) So, you could basically beat ANY engine you want if you set both engines to depth 5 or 6 and then you start expanding your evaluation until it is better than the other one.
I've added time handling several videos ago: