Fastest 32 (64?) bit BishopAttacks()

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Fastest 32 (64?) bit BishopAttacks()

Post by Gerd Isenberg »

Michael Sherwin wrote: A philosophical note:

During the time that I have been trying to come up with a supper fast bitboard move generator for 32 bit machines, Tord Romstad has written a complete new and very strong chess program. I wonder which one of us has used our time more wisley? I consider his sliding-piece bitboard move generator to be horibly slow and yet he has a new program and I do not. My sliding-piece bitboard move generators will run circles around Tord's, but have no legs with which to do it. Anyway, what percentage of the time is used up by just the sliding-piece generators and does it really matter much how fast they are?

Since I have come this far already I will make both BishopAttacks() and RookAttacks() as 32 bit friendly as possible. Then I will build my next program with them regardless of how fast they are or arn't, just because I made them.
That was discussed often before.

Exponential speedup, search, reduction/extension decisions, domain dependent knowledge for eval and such is much, much more inportant, not to mention bugs and strategies or designs to avoid them.

One may consider those bit-twiddling stuff to save a few percent as a waste of time. For me it is hobby and fun and also a nice variation from that what i professionally do as a software developer. And it keeps the brain fit and trains the mathematical understanding ;-)

Otoh it is nice to have atomic functions with simple scalar register parameters. To work on a higher abstraction level with it, even with another programming language. It is more convenient and efficient to pass a temporary altered occupied set rather than four rotated bitboards into a function. One may become encouraged to use it more often for hopefully "better" knowledge and algorithms to gain exponential speedup. Whether those functions are a few percent slower or faster doesn't matter so much in terms of elo points ;-)

But one percent to the other. There are some thresholds and non-linearities, where a program suddenly becomes a lot slower (or faster), while adding new code and data.

Gerd
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Fastest 32 (64?) bit BishopAttacks()

Post by Tord Romstad »

Michael Sherwin wrote:During the time that I have been trying to come up with a supper fast bitboard move generator for 32 bit machines, Tord Romstad has written a complete new and very strong chess program. I wonder which one of us has used our time more wisley?
Hi Michael,

In my opinion, both of us have spent our time wisely. I prefer to see chess programming not as a sport, but as a community effort with the aim of writing the strongest possible chess program. For this aim, we need people specializing in many kinds of tasks: Testing, inventing new low-level data structures and algorithms, discovering new improvments to alpha-beta search, improving the parallel search algorithms, thinking about new ways to prevent and detect bugs (this has been where I have spent most of my effort during the development of Glaurung 2, by the way), and so on. From this perspective, the work you do in finding clever new ways of generating sliding attacks is at least as valuable as the work I do in writing a complete chess program, or the work Graham, Werner and Patrick (among others) do in testing.

Glaurung is not just my program, it is also yours. I have only a small part of the honor. Without the help from you (and others, of course), Glaurung wouldn't be nearly as strong as it is.

Honestly, I must confess that I sometimes have bad conciousness because I haven't spent more time thinking about techniques for bitboard attack generation. I have enough confidence to think that I could have made a few small contributions of my own in this area, but I've been too lazy, and left all the hard work to people like you, Gerd, Pradu, and H.G. Muller. This is one of the reasons my program is open source: If I kept if private, I would feel like a parasite, using the results of your hard work without contributing anything back.
I consider his sliding-piece bitboard move generator to be horibly slow
I'm actually very happy to hear that. :)
Anyway, what percentage of the time is used up by just the sliding-piece generators and does it really matter much how fast they are?
I think it does. My program doesn't spend a lot of time on it right now, but to a great extent this is because I make a lot of potentially unsound approximations in order to avoid generating too many sliding attacks. If I could generate losts of sliding attacks without worrying about the price, I could afford to do much more interesting things in my search and eval.

Tord
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Fastest 32 (64?) bit BishopAttacks()

Post by Michael Sherwin »

Thanks Tord,

I do not think that I totally agree, however, I am not even sure of that :?

Since I read somewhere that you plan to support 32 bit for awhile to come and if you like the 32 bit friendly alternatives that I have come up with that are also quite good for 64 bit then please consider using them in Glaurung, especially if it will help you to do other interesting things that will make Glaurung stronger. :D

Mike
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Tord Romstad
Posts: 1808
Joined: Wed Mar 08, 2006 9:19 pm
Location: Oslo, Norway

Re: Fastest 32 (64?) bit BishopAttacks()

Post by Tord Romstad »

Michael Sherwin wrote:Since I read somewhere that you plan to support 32 bit for awhile to come
I do. I prefer to optimize for the hardware people are actually using now, and not for the hardware people will be using a few years from now. For the next few years, the overwhelming majority of computer users will still be using 32-bit CPUs. It might be true that most of the computers sold have 64-bit CPUs, but most users don't upgrade or replace their computers nearly as often as the CCC crowd tend to believe. Single-CPU 32-bit comptuers are the norm, and will remain so for quite some time.
and if you like the 32 bit friendly alternatives that I have come up with that are also quite good for 64 bit then please consider using them in Glaurung, especially if it will help you to do other interesting things that will make Glaurung stronger. :D
I'll steal your new algorithms in an instant if they turn out to be significantly faster, and if I can understand how they work. I must admit that I haven't tried reading your code so far.

Tord
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 3:19 am
Location: Atlanta, GA

Re: Fastest 32 (64?) bit BishopAttacks()

Post by Pradu »

Tord Romstad wrote: I'll steal your new algorithms in an instant if they turn out to be significantly faster, and if I can understand how they work. I must admit that I haven't tried reading your code so far.
Tord
The idea is to shift the occupancy bits down to the lower 16 bits without collisions, then use the high and low bytes of the 16 bits to index a move database.
Spock

Re: Fastest 32 (64?) bit BishopAttacks()

Post by Spock »

Tord Romstad wrote: I prefer to optimize for the hardware people are actually using now, and not for the hardware people will be using a few years from now. For the next few years, the overwhelming majority of computer users will still be using 32-bit CPUs. It might be true that most of the computers sold have 64-bit CPUs, but most users don't upgrade or replace their computers nearly as often as the CCC crowd tend to believe. Single-CPU 32-bit comptuers are the norm, and will remain so for quite some time.
Hi Tord,

Well you may optimise for single CPU 32-bit, but you seem to have done pretty well on high end hardware as well. On both our 40/4 and 40/40 lists, Glaurung gains about 100 ELO moving from 1CPU to 4CPUs, which I think is impressive :)
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Fastest 32 (64?) bit BishopAttacks()

Post by Michael Sherwin »

Tord Romstad wrote:
Michael Sherwin wrote:Since I read somewhere that you plan to support 32 bit for awhile to come
I do. I prefer to optimize for the hardware people are actually using now, and not for the hardware people will be using a few years from now. For the next few years, the overwhelming majority of computer users will still be using 32-bit CPUs. It might be true that most of the computers sold have 64-bit CPUs, but most users don't upgrade or replace their computers nearly as often as the CCC crowd tend to believe. Single-CPU 32-bit comptuers are the norm, and will remain so for quite some time.
and if you like the 32 bit friendly alternatives that I have come up with that are also quite good for 64 bit then please consider using them in Glaurung, especially if it will help you to do other interesting things that will make Glaurung stronger. :D
I'll steal your new algorithms in an instant if they turn out to be significantly faster, and if I can understand how they work. I must admit that I haven't tried reading your code so far.

Tord
Then check out my final renditions that I am posting in a new thread! :D
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Fastest 32 (64?) bit BishopAttacks()

Post by Gerd Isenberg »

Michael Sherwin wrote:I conceed that trying to beat Kindergarden RookAttacks() on a 64 bit machine is a waste of time! However, 32 bit is a different story. I want to take one last try at RookAttacks() targeted for 32 bit machines, with:

Code: Select all

U64 RookAttacks(square *sq, u64 *occ) {
  u32 occ1 = (u32)(*(occ + sq->off1));
  u32 occ2 = (u32)(*(occ + sq->off2));
  u32 idx  = (occ1 & sq->rankMask) >> sq->rankShift;
      idx |= (occ1 & sq->colMask1) >> sq->colShift1;
      idx |= (occ2 & sq->colMask2) >> sq->colShift2;
      idx |= (idx  &   0xffff0000) >>  sq->idxShift;
      return rookAttacks[sq->rMinimal1[idx & 255] | sq->rMinimal2[(idx >> 8) & 255]];
}
Notice that the only 64 bit instruction is the final return!

How does this stack up against 32 bit friendly Kindergarden RookAttacks()?

.
Your folded bishop-attacks clearly is very interesting, but i had not have the time yet for a closer look (it will probably take some weeks, due to my job and the coming events in the netherlands, ict7 and wccc).

In rook-attacks I count 13 32-bit memory reads, but that are likely only four cachelines from L1. The four variable shifts are a bottleneck since each needs cl. This avoids possible parallel computatation of otherwise independent instruction chains.

32-bit kindergarten rook-attacks has one conditional jump for shifting right the rank and two 32-bit imuls so far.

It is your own approach - if you don't care for a few percent (if at all) of speed, it is absolutely fine to make it run and to use it. Although it is simple to replace.

Cheers,
Gerd