Bitboards using 2 DOUBLE's??

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Bitboards using 2 DOUBLE's??

Post by Gerd Isenberg »

bob wrote:
Makes no difference because I'm curious about the bit operations as floating points.
Those do not naturally exist in any processor I know of (IEEE math processor). One can do any arbitrary boolean function as a mathematical operation. "not" means subtract from all 1's, AND means to multiply bit by bit. Etc. Not going to be fast at all when compared to using two normal 32 bit integer values which do have AND/OR/XOR operations that operate on all bits at once...
Still, SSE has explicit bitwise boolean opcodes for doubles (and floats): ANDNPD, ANDPD, ORPD and XORPD, which work exactly like their SSE2 integer counterparts but avoid int-double stalls. Besides clearing an xmm register with XORPD (setting two doubles to zero), they might be used to clear/set/toggle the sign bit, or to apply a multiplication by {0.0, 1.0} with ANDPD by {0.0, -1#QNAN}. No idea what ANDNPD (x = ~x & y) is good for; maybe there are some freaky bit-twiddling tricks or even algorithms. It looks like the ones' complement of x acts like -4.0 / x:

~ 0.5 -> -7.999...
~ 1.0 -> -3.999...
~ 1.5 -> -2.999...
~ 1.75 -> -2.499...
~ 2.0 -> -1.999...
~ 4.0 -> -0.999...
~ 8.0 -> -0.499...
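
For illustration, a minimal C sketch of those packed-double bitwise ops via the SSE2 intrinsics that map to ANDPD/ORPD/XORPD/ANDNPD. The bit patterns are arbitrary illustrations, not real bitboard masks:

Code:

#include <emmintrin.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* two raw 64-bit patterns loaded into xmm registers as "doubles" */
    uint64_t a_bits[2] = { 0x00000000000000FFULL, 0x0F0F0F0F0F0F0F0FULL };
    uint64_t b_bits[2] = { 0x0000000000000F0FULL, 0x00FF00FF00FF00FFULL };
    __m128d a = _mm_loadu_pd((const double *)a_bits);
    __m128d b = _mm_loadu_pd((const double *)b_bits);

    __m128d r_and  = _mm_and_pd(a, b);     /* ANDPD          */
    __m128d r_or   = _mm_or_pd(a, b);      /* ORPD           */
    __m128d r_xor  = _mm_xor_pd(a, b);     /* XORPD          */
    __m128d r_andn = _mm_andnot_pd(a, b);  /* ANDNPD: ~a & b */

    uint64_t out[2];
    _mm_storeu_pd((double *)out, r_and);   /* store and inspect the raw bits */
    printf("and: %016llx %016llx\n",
           (unsigned long long)out[0], (unsigned long long)out[1]);
    (void)r_or; (void)r_xor; (void)r_andn;
    return 0;
}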
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Bitboards using 2 DOUBLE's??

Post by bob »

Gerd Isenberg wrote:
bob wrote:
Makes no difference because I'm curious about the bit operations as floating points.
Those do not naturally exist in any processor I know of (IEEE math processor). One can do any arbitrary boolean function as a mathematical operation. "not" means subtract from all 1's, AND means to multiply bit by bit. Etc. Not going to be fast at all when compared to using two normal 32 bit integer values which do have AND/OR/XOR operations that operate on all bits at once...
Still, SSE has explicit bitwise boolean opcodes for doubles (and floats): ANDNPD, ANDPD, ORPD and XORPD, which work exactly like their SSE2 integer counterparts but avoid int-double stalls. Besides clearing an xmm register with XORPD (setting two doubles to zero), they might be used to clear/set/toggle the sign bit, or to apply a multiplication by {0.0, 1.0} with ANDPD by {0.0, -1#QNAN}. No idea what ANDNPD (x = ~x & y) is good for; maybe there are some freaky bit-twiddling tricks or even algorithms. It looks like the ones' complement of x acts like -4.0 / x:

~ 0.5 -> -7.999...
~ 1.0 -> -3.999...
~ 1.5 -> -2.999...
~ 1.75 -> -2.499...
~ 2.0 -> -1.999...
~ 4.0 -> -0.999...
~ 8.0 -> -0.499...
I understand that. The way I interpreted his question, he wanted to use floating point specifically, not just the SSE extensions to alter how the FP registers are used...
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Bitboards using 2 DOUBLE's??

Post by bob »

Carey wrote:
bob wrote:
BTW, one error above. You have 52 bits, not 53. The 53rd bit _must_ be a 1 bit (the IEEE "hidden" bit).
Sorry Bob, but no. Obvious example is the number zero. It has no bits set. (Yes, I know it uses a special exponent code to indicate zero.)
Here we are going to have to disagree, because I teach IEEE floating point in my asm class.
Then you don't teach it very well, Sir.

I mean no disrespect, especially considering how much I respect your chess programming abilities and history. When I tell somebody that "Hyatt said..." I know you are right because I know you've tested it thoroughly.

But if you are going to brag about teaching float in your asm class, then I have to stand up and say that I don't think you do it well. That in this case, you are wrong.
Zero is a special case. When the exponent field is all zero, we now have an unnormalized number (perfectly legal) which eliminates that hidden 1 bit. But you can't use it. For 32 bit IEEE, you have 1 sign bit, 8 exponent bits, and 32 - 1 - 8 leaves 23 bits, not 24. Ditto for 64 bit IEEE, which has 52 bits stored in the value, plus the hidden 1 bit that is always present except for non-normalized numbers.

But you show me any way to represent a 53 bit number with the high-order bit 0 or 1, and I'll kiss you. :)
The specs say that there *are* 52 bits of genuine data space physically available, with the 53rd bit being implicitly and unchangeably set as '1'. You are saying that as well.

So, for a 53 bit integer where the 53rd bit *is* set, then there's no problem. Because you have the high bit set and 52 bits of other data. The very thing the float specs explicitly allow with the 53rd bit being set but not physically present.
Not so fast. There _is_ a problem. Notably when I want to turn that 1 bit into a zero. That is no longer a "single bit operation". Which was my point.


For the case where you have 53 bits of data where the 53rd bit is *NOT* set then you don't actually have 53 bits of data. You have 52 bits. Or 51 or 50 or wherever the highest bit is set. In which case the 52 bits that are physically there are enough to hold it without issue. The FPU will normalize things so the highest bit is set and adjust its exponent accordingly.

For the case where there are no bits at all set, the FPU has a special flag to indicate that.

So you've covered all the bases. Bit 53 set and 52 bits of other data. Bit 53 not set and up to 52 bits of other data. No bits set.

A total of 53 bits of user data.
You are completely missing my point. My point was, and is, that you do not have any specific operation that can change that upper bit from 0 to 1 or 1 to zero, when dealing with a single bit. For example, I have all "53 bits" set to zero. By using the IEEE fp 0.0 representation. Now I want to turn that leftmost bit to a 1. I have to mangle the exponent to do this, there is no arithmetic operation that will accomplish that. I can turn any of the other 52 bits on or off whenever I want. Simple subtraction or addition will work. But that 53rd bit won't work the same way, which is a problem IMHO, and this is _the_ problem I have been trying (without success) to explain. Using that single extra bit is going to make this much more complicated to do, when there is no benefit to the idea in the first place.


You can get it to be zero with a zero exponent. You can get it to be 1 with a normal exponent. But you can't do boolean operations on that bit. It would require fiddling with the exponent instead since that 53rd bit does not exist in the actual number in IEEE format.
Doing boolean operations would require some fiddling with the numbers.

It's possible some sort of MAGIC multiplication would work, but I'm not sure. I suspect it would, but I don't know the values or how many bits you could do at a time.

The best I've been able to come up with that I know would work is that since we would have to use two doubles to hold 64 bits of real data, we'd keep 32 bits in each one. (Just to make things consistent with both doubles.)

We'd then add a huge value to shift our 32 bits down to the lower word.

We then use the lowest 8 bits as an index into an array of truth tables. Repeat with the other bytes of data. We are using an integer to grab the lowest 8 bits and use it as an index, but we aren't really doing math as integers.

I'm not real fond of using truth tables but that's the best I've come up with.

(The addition of huge values to shift our data to a lower part of the word was a classic x87 hack for conversion to integer, truncating, etc. The x87 was so slow doing those operations it was actually faster to do them by hand with some clever math. Not common today, but the ideas are well tested and would still work.)
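
A hedged sketch of that "add a huge constant" trick (hypothetical helper name, assuming strict double-precision arithmetic, round-to-nearest, and integer values below 2^51): adding 1.5 * 2^52 to a double holding a small non-negative integer forces that integer into the low mantissa bits, where it can be read back and used as a truth-table index.

Code:

#include <stdint.h>
#include <stdio.h>

/* force the integer value of x into the low mantissa bits and read them back */
static uint32_t low_bits_of(double x) {
    union { double d; uint64_t u; } v;
    v.d = x + 6755399441055744.0;    /* 1.5 * 2^52, the classic magic constant */
    return (uint32_t)v.u;
}

int main(void) {
    /* pretend these doubles each hold one byte of a bitboard half */
    double a = 0xA5, b = 0x3C;

    /* 8-bit x 8-bit AND truth table, built once at startup in a real program */
    static uint8_t and_table[256][256];
    for (int i = 0; i < 256; i++)
        for (int j = 0; j < 256; j++)
            and_table[i][j] = (uint8_t)(i & j);

    uint32_t r = and_table[low_bits_of(a) & 0xFF][low_bits_of(b) & 0xFF];
    printf("0x%02X\n", r);           /* 0x24, i.e. 0xA5 & 0x3C */
    return 0;
}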


You can safely have any arbitrary 53 bit integer stored into a double without any risk of corruption. The FPU will represent it however it needs to.
See above. One of those 53 bits does not physically exist. Which means to manipulate the rightmost 52 bits, you twiddle with those. To manipulate that non-existent bit, you have to twiddle with the entire left end (exponent field) of the number rather than a single bit.
1) We don't have to actually store a full 53 bits into each double. We could store just 32, that way we have some extra space to mess with if we want.

2) If we did do some floating point operation that involved the hidden 53rd bit, then the FPU would take care of that automatically. It's designed to do that.

No, it's not designed to do boolean operations, but with the right kind of math we might be able to trick it. If not, then there's always the 'cheating' method of using truth tables like I mentioned above.


Yes, there are only 52 bits available, but the FPU does what it needs to hold the full 53 bits of integer data.

It normalizes things and has a special code for zero, etc. If the 53rd bit is '1' then things are fine. If it's a zero then the FPU normalizes it and the exponent is adjusted. It all works out that you can safely hold 53 bits of integer data.


*HOWEVER*, just to make you happy, I'll say that you can only hold 40 bits of mantissa in a double. Doesn't matter because I'm talking about using software based 'double-double' math that doubles the number of mantissa bits anyway.

So in this case instead of 53*2=106 it'd be 40*2=80 which is still more than enough for what I'm talking about.

Makes no difference because I'm curious about the bit operations as floating points.
Those do not naturally exist in any processor I know of (IEEE math processor). One can do any arbitrary boolean function as a mathematical operation. "not" means subtract from all 1's, AND means to multiply bit by bit. Etc. Not going to be fast at all when compared to using two normal 32 bit integer values which do have AND/OR/XOR operations that operate on all bits at once...
No, they don't exist. I thought I was pretty clear about that in my first post. And was one of the reasons I was asking if anybody had thought about this kind of useless stuff.

And no, it's not going to be fast. But who cares? As I've said several times this wasn't a practical method. It was just something I got curious about.

Just a "Can it be done?" kind of thing.
Anything "can be done". Whether it "should be done" is another matter, of course.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: Bitboards using 2 DOUBLE's??

Post by sje »

bob wrote:
sje wrote:What is old is new again, for this is what Slate and Atkin did thirty five years ago on a CDC 6400 mainframe with their Chess 4.x program.

But I'm unable to see why the idea should be making a comeback.
I don't remember them using floating point numbers. They just ran on a 60 bit word CDC and had to use 2 words to store one value...
Well, they used floating point instructions on 32 bit integers contained in 60 bit words, two words per bitboard. This is how they managed a (relatively) fast FirstSq()/NextSq().

For the record, the CDCs of that era had a 60 bit float (48 bit mantissa) and a 2 word, 120 bit float (96 bit mantissa). Also, integer multiplication and division were performed with floating point hardware, thus limiting the values to 48 bits. But integer addition and subtraction could use the full 60 bits in a word. Oh, and everything was one's complement, including addresses, so there was the negative zero headache as well.
Carey
Posts: 313
Joined: Wed Mar 08, 2006 8:18 pm

Re: Bitboards using 2 DOUBLE's??

Post by Carey »

bob wrote:
Carey wrote:
bob wrote:
BTW, one error above. You have 52 bits, not 53. The 53rd bit _must_ be a 1 bit (the IEEE "hidden" bit).
Sorry Bob, but no. Obvious example is the number zero. It has no bits set. (Yes, I know it uses a special exponent code to indicate zero.)
Here we are going to have to disagree, because I teach IEEE floating point in my asm class.
Then you don't teach it very well, Sir.

I mean no disrespect, especially considering how much I respect your chess programming abilities and history. When I tell somebody that "Hyatt said..." I know you are right because I know you've tested it thoroughly.

But if you are going to brag about teaching float in your asm class, then I have to stand up and say that I don't think you do it well. That in this case, you are wrong.
Zero is a special case. When the exponent field is all zero, we now have an unnormalized number (perfectly legal) which eliminates that hidden 1 bit. But you can't use it. For 32 bit IEEE, you have 1 sign bit, 8 exponent bits, and 32 - 1 - 8 leaves 23 bits, not 24. Ditto for 64 bit IEEE, which has 52 bits stored in the value, plus the hidden 1 bit that is always present except for non-normalized numbers.

But you show me any way to represent a 53 bit number with the high-order bit 0 or 1, and I'll kiss you. :)
The specs say that there *are* 52 bits of genuine data space physically available, with the 53rd bit being implicitly and unchangeably set as '1'. You are saying that as well.

So, for a 53 bit integer where the 53rd bit *is* set, then there's no problem. Because you have the high bit set and 52 bits of other data. The very thing the float specs explicitly allow with the 53rd bit being set but not physically present.
Not so fast. There _is_ a problem. Notably when I want to turn that 1 bit into a zero. That is no longer a "single bit operation". Which was my point.
No, it's not a single *bit* operation. It's a math operation.

FloatNum - 2^53.

Simple subtraction. The FPU will then perform the subtraction and renormalize.

Since I'd be treating the numbers as floating point numbers, I don't care if the FPU normalizes them. It doesn't matter that bit 1 is really in bit position one. I let the FPU worry about aligning the bits properly to its math.

The numbers are no longer 64 bit sets but instead floating point numbers. It's a mental shift.
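
A minimal sketch of that add/subtract style, assuming the caller already knows whether a square is currently occupied (hypothetical helper names, squares numbered 0-63):

Code:

#include <math.h>
#include <stdio.h>

static double set_sq(double board, int sq)   { return board + ldexp(1.0, sq); }
static double clear_sq(double board, int sq) { return board - ldexp(1.0, sq); }

int main(void) {
    double board = 0.0;
    board = set_sq(board, 0);     /* "a1" */
    board = set_sq(board, 27);    /* "d4" */
    board = clear_sq(board, 0);   /* the piece leaves a1 */
    printf("%.0f\n", board);      /* 134217728, i.e. only 2^27 remains */
    return 0;
}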

In the bitboard programs I've done, I rarely need to set specific bits. It's not a case of setting Bit 53, for example. It's usually a matter of clearing a bit and setting a bit like while making a move.

In that case, I already know whether the bits are set or clear. So I can safely add and subtract. I don't have to do OR to set it or AND-NOT to clear it. I can't be clever and XOR it to flip it, but that's just a convenience.

Even when I do need to set a specific bit, I don't actually care that it's stored at a certain location. What I do care about is that when I do an operation, things will align themselves so the correct bit is affected. The FPU does guarantee that.

It may normalize things for its own convenience but it doesn't matter as long as it's consistent and knows how to align things when I do math on them. Which it does.

I don't have to treat them as binary operations but can do it as arithmetic operations and let the FPU normalize the numbers however it wants.

The other operations, such as shifting, can also be done without too much effort. It's just multiplication by 2 and will require a bit of software renormalization. (By 'without too much effort' I mean it's possible to do wholly within floating point. I don't have to do it as integers or precomputed tables, etc.)

The only problems I see are AND, OR, PopCount and NextSetBit.

The first two could be done via truth table. The third by a look up table. The fourth could be done several ways including tables or maybe even using frexp().

Some operations will need software 'normalization'. Such as trying to shift bits from one double to another. But that's entirely do-able. The 'double-double' math packages already handle things just like this. And do so reliably. (From a math point of view, those 'double-double' packages look just like a real 128 bit floating point number with a 106 bit mantissa.)
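
For the NextSetBit part, a small sketch of the frexp() idea (illustrative only): for a double holding an integer value, frexp() returns an exponent that is one more than the index of the highest set bit, and subtracting that power of two peels the bits off one at a time.

Code:

#include <math.h>
#include <stdio.h>

int main(void) {
    double board = 1.0 + 16.0 + 1048576.0;   /* "bits" 0, 4 and 20 set */
    while (board > 0.0) {
        int exp;
        frexp(board, &exp);                  /* board = m * 2^exp, 0.5 <= m < 1 */
        int highest = exp - 1;               /* index of the highest set bit */
        printf("bit %d is set\n", highest);
        board -= ldexp(1.0, highest);        /* clear it by subtraction */
    }
    return 0;
}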


For the case where you have 53 bits of data where the 53rd bit is *NOT* set then you don't actually have 53 bits of data. You have 52 bits. Or 51 or 50 or wherever the highest bit is set. In which case the 52 bits that are physically there are enough to hold it without issue. The FPU will normalize things so the highest bit is set and adjust its exponent accordingly.

For the case where there are no bits at all set, the FPU has a special flag to indicate that.

So you've covered all the bases. Bit 53 set and 52 bits of other data. Bit 53 not set and up to 52 bits of other data. No bits set.

A total of 53 bits of user data.
You are completely missing my point. My point was, and is, that you do not have any specific operation that can change that upper bit from 0 to 1 or 1 to zero, when dealing with a single bit. For example, I have all "53 bits" set to zero. By using the IEEE fp 0.0 representation. Now I want to turn that leftmost bit to a 1. I have to mangle the exponent to do this, there is no arithmetic operation that will accomplish that. I can turn any of the other 52 bits on or off whenever I want. Simple subtraction or addition will work. But that 53rd bit won't work the same way, which is a problem IMHO, and this is _the_ problem I have been trying (without success) to explain. Using that single extra bit is going to make this much more complicated to do, when there is no benefit to the idea in the first place.
Bob, you are still thinking of the doubles as if they were bit sets. Where a specific bit *HAS* to be in a specific location. Where bit 53 really is bit 53 and bit 17 really is at bit 17.

And for some reason you are really hung up about the exponent ....

When you treat them as math, you no longer have to worry about that. It doesn't matter if the FPU normalizes things so bit 17 is really that hidden 53rd bit.

There is no 'leftmost' or 'rightmost' bit. You just have a floating point number that just happens to be able to hold 53 bits of data. How it organizes it is irrelevant.

If you have all zeros and you want to set that 53rd bit, then set it. Just add 2^53 to it.

If you have all zeros and you want to set bit 1, then just add 1. And guess what, the FPU will normalize that and put it into that hidden 53rd bit. That's OKAY.

So WHAT if the fpu messes with the exponent to normalize things. That's what it does. Let it. It's not part of your 53 bits of data so let it normalize its little heart out.

With only a few exceptions we don't have to care how the bits are stored in the double. The few times we would care, we can take steps to force alignment so that bit 1 of data really is at bit 1 location. But we don't have to keep them that way when doing math because the FPU will adjust things to keep itself happy.


We are trying to do this as MATH not bit / set operations. (Which is why AND & OR are so difficult....)


And, as I've said several times, we don't have to use 53 bits. If you are uncomfortable with it, then don't.... In fact, for convenience it would actually be better to just do 32 bits into each double. Some things will have to change but are do-able. And some things would actually be more convenient.

And if we went all the way and actually used some pre-existing "double-double" math package (like David Bailey's) then our mantissa would actually be 106 bits. We wouldn't have to worry about hidden bits or how they behave. The whole 64 bit bitboard would fit into the mantissa with room to spare.
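
For reference, the core step such 'double-double' packages are built on is the error-free two-sum: a value is carried as an unevaluated sum hi + lo of two doubles, giving roughly 106 mantissa bits in total. A minimal sketch (a real package like Bailey's adds renormalization, multiplication and so on):

Code:

#include <stdio.h>

typedef struct { double hi, lo; } dd;

static dd two_sum(double a, double b) {      /* Knuth's error-free addition */
    dd r;
    r.hi = a + b;
    double bb = r.hi - a;
    r.lo = (a - (r.hi - bb)) + (b - bb);     /* exact rounding error of the add */
    return r;
}

int main(void) {
    /* 2^60 + 1 does not fit in one double, but it does as hi + lo */
    dd x = two_sum(1152921504606846976.0, 1.0);
    printf("hi = %.0f, lo = %.0f\n", x.hi, x.lo);   /* hi = 2^60, lo = 1 */
    return 0;
}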





You can get it to be zero with a zero exponent. You can get it to be 1 with a normal exponent. But you can't do boolean operations on that bit. It would require fiddling with the exponent instead since that 53rd bit does not exist in the actual number in IEEE format.
Doing boolean operations would require some fiddling with the numbers.

It's possible some sort of MAGIC multiplication would work, but I'm not sure. I suspect it would, but I don't know the values or how many bits you could do at a time.

The best I've been able to come up with that I know would work is that since we would have to use two doubles to hold 64 bits of real data, we'd keep 32 bits in each one. (Just to make things consistent with both doubles.)

We'd then add a huge value to shift our 32 bits down to the lower word.

We then use the lowest 8 bits as an index into an array of truth tables. Repeat with the other bytes of data. We are using an integer to grab the lowest 8 bits and use it as an index, but we aren't really doing math as integers.

I'm not real fond of using truth tables but that's the best I've come up with.

(The addition of huge values to shift our data to a lower part of the word was a classic x87 hack for conversion to integer, truncating, etc. The x87 was so slow doing those operations it was actually faster to do them by hand with some clever math. Not common today, but the ideas are well tested and would still work.)


You can safely have any arbitrary 53 bit integer stored into a double without any risk of corruption. The FPU will represent it however it needs to.
See above. One of those 53 bits does not physically exist. Which means to manipulate the rightmost 52 bits, you twiddle with those. To manipulate that non-existent bit, you have to twiddle with the entire left end (exponent field) of the number rather than a single bit.
1) We don't have to actually store a full 53 bits into each double. We could store just 32, that way we have some extra space to mess with if we want.

2) If we did do some floating point operation that involved the hidden 53rd bit, then the FPU would take care of that automatically. It's designed to do that.

No, it's not designed to do boolean operations, but with the right kind of math we might be able to trick it. If not, then there's always the 'cheating' method of using truth tables like I mentioned above.


Yes, there are only 52 bits available, but the FPU does what it needs to hold the full 53 bits of integer data.

It normalizes things and has a special code for zero, etc. If the 53rd bit is '1' then things are fine. If it's a zero then the FPU normalizes it and the exponent is adjusted. It all works out that you can safely hold 53 bits of integer data.


*HOWEVER*, just to make you happy, I'll say that you can only hold 40 bits of mantissa in a double. Doesn't matter because I'm talking about using software based 'double-double' math that doubles the number of mantissa bits anyway.

So in this case instead of 53*2=106 it'd be 40*2=80 which is still more than enough for what I'm talking about.

Makes no difference because I'm curious about the bit operations as floating points.
Those do not naturally exist in any processor I know of (IEEE math processor). One can do any arbitrary boolean function as a mathematical operation. "not" means subtract from all 1's, AND means to multiply bit by bit. Etc. Not going to be fast at all when compared to using two normal 32 bit integer values which do have AND/OR/XOR operations that operate on all bits at once...
No, they don't exist. I thought I was pretty clear about that in my first post. And was one of the reasons I was asking if anybody had thought about this kind of useless stuff.

And no, it's not going to be fast. But who cares? As I've said several times this wasn't a practical method. It was just something I got curious about.

Just a "Can it be done?" kind of thing.
Anything "can be done". Whether it "should be done" is another matter, of course.
That's probably the same attitude many people had in the 60's about using expensive, serious business computers for something as frivolous as chess....

Frankly, I doubt many people in this forum would find it too odd that somebody got curious about something and wondered if it could be done. They may not care about this particular thing, but I doubt they'd find it odd.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Bitboards using 2 DOUBLE's??

Post by bob »

Carey wrote:
bob wrote:
Carey wrote:
bob wrote:
BTW, one error above. You have 52 bits, not 53. The 53rd bit _must_ be a 1 bit (the IEEE "hidden" bit).
Sorry Bob, but no. Obvious example is the number zero. It has no bits set. (Yes, I know it uses a special exponent code to indicate zero.)
Here we are going to have to disagree, because I teach IEEE floating point in my asm class.
Then you don't teach it very well, Sir.

I mean no disrespect, especially considering how much I respect your chess programming abilities and history. When I tell somebody that "Hyatt said..." I know you are right because I know you've tested it thoroughly.

But if you are going to brag about teaching float in your asm class, then I have to stand up and say that I don't think you do it well. That in this case, you are wrong.
Zero is a special case. When the exponent field is all zero, we now have an unnormalized number (perfectly legal) which eliminates that hidden 1 bit. But you can't use it. For 32 bit IEEE, you have 1 sign bit, 8 exponent bits, and 32 - 1 - 8 leaves 23 bits, not 24. Ditto for 64 bit IEEE, which has 52 bits stored in the value, plus the hidden 1 bit that is always present except for non-normalized numbers.

But you show me any way to represent a 53 bit number with the high-order bit 0 or 1, and I'll kiss you. :)
The specs say that there *are* 52 bits of genuine data space physically available, with the 53rd bit being implicitly and unchangeably set as '1'. You are saying that as well.

So, for a 53 bit integer where the 53rd bit *is* set, then there's no problem. Because you have the high bit set and 52 bits of other data. The very thing the float specs explicitly allow with the 53rd bit being set but not physically present.
Not so fast. There _is_ a problem. Notably when I want to turn that 1 bit into a zero. That is no longer a "single bit operation". Which was my point.
No, it's not a single *bit* operation. It's a math operation.

FloatNum - 2^53.

Simple subtraction. The FPU will then perform the subtraction and renormalize.
"renormalize"? Shifts the entire fraction. I don't think you want to do that.

Since I'd be treating the numbers as floating point numbers, I don't care if the FPU normalizes them. It doesn't matter that bit 1 is really in bit position one. I let the FPU worry about aligning the bits properly to its math.

The numbers are no longer 64 bit sets but instead floating point numbers. It's a mental shift.
OK, that has a possibility. But it makes absolutely no sense to me, because to use bitboards one has to visualize what the bits represent. For example, "is the E-file open?" The bits can't move around all over, or you can never visualize how to set up the masks to ask that kind of question.

The concept of bitboards is that a specific bit represents a specific square. Going one level up as you are describing appears to me to be an impossible task, mentally...


In the bitboard programs I've done, I rarely need to set specific bits. It's not a case of setting Bit 53, for example. It's usually a matter of clearing a bit and setting a bit like while making a move.
I have to set specific bits every time I make or unmake a move. And then for the special case moves it is even more involved (castling, en passant, pawn promotions). And then there are the masks set up to ask specific questions like "is the pawn on e4 passed?" or "is the e-file open?" or "is the pawn on f3 backward?"



In that case, I already know whether the bits are set or clear. So I can safely add and subtract. I don't have to do OR to set it or AND-NOT to clear it. I can't be clever and XOR it to flip it, but that's just a convenience.
What about captures? Bit is already set, you want to set it again as you move the piece to make the capture...

Even when I do need to set a specific bit, I don't actually care that it's stored at a certain location. What I do care about is that when I do an operation, things will align themselves so the correct bit is affected. The FPU does guarantee that.
Again, that sounds like no chess program/engine I know of. Chess knowledge is pattern-based. The patterns are expressed as specific sets of 1 bits or 0 bits, depending on how it will be used. I very much care what is what.

It may normalize things for its own convenience but it doesn't matter as long as it's consistent and knows how to align things when I do math on them. Which it does.

I don't have to treat them as binary operations but can do it as arithmetic operations and let the FPU normalize the numbers however it wants.

The other operations, such as shifting, can also be done without too much effort. It's just multiplication by 2 and will require a bit of software renormalization. (By 'without too much effort' I mean it's possible to do wholly within floating point. I don't have to do it as integers or precomputed tables, etc.)
Now you go too far. Previously you said you don't care where the bits are, but now you are talking about shifting 1 bit. So either you _do_ care where the bits are and what they represent, or else you would _never_ want to shift 1 bit since you don't know what it represents...

This sounds like a bad dream at best, approaching a nightmare.

The only problems I see are AND, OR, PopCount and NextSetBit.

The first two could be done via truth table. The third by a look up table. The fourth could be done several ways including tables or maybe even using frexp().

Some operations will need software 'normalization'. Such as trying to shift bits from one double to another. But that's entirely do-able. The 'double-double' math packages already handle things just like this. And do so reliably. (From a math point of view, those 'double-double' packages look just like a real 128 bit floating point number with a 106 bit mantissa.)


For the case where you have 53 bits of data where the 53rd bit is *NOT* set then you don't actually have 53 bits of data. You have 52 bits. Or 51 or 50 or wherever the highest bit is set. In which case the 52 bits that are physically there are enough to hold it without issue. The FPU will normalize things so the highest bit is set and adjust its exponent accordingly.

For the case where there are no bits at all set, the FPU has a special flag to indicate that.

So you've covered all the bases. Bit 53 set and 52 bits of other data. Bit 53 not set and up to 52 bits of other data. No bits set.

A total of 53 bits of user data.
You are completely missing my point. My point was, and is, that you do not have any specific operation that can change that upper bit from 0 to 1 or 1 to zero, when dealing with a single bit. For example, I have all "53 bits" set to zero. By using the IEEE fp 0.0 representation. Now I want to turn that leftmost bit to a 1. I have to mangle the exponent to do this, there is no arithmetic operation that will accomplish that. I can turn any of the other 52 bits on or off whenever I want. Simple subtraction or addition will work. But that 53rd bit won't work the same way, which is a problem IMHO, and this is _the_ problem I have been trying (without success) to explain. Using that single extra bit is going to make this much more complicated to do, when there is no benefit to the idea in the first place.
Bob, you are still thinking of the doubles as if they were bit sets. Where a specific bit *HAS* to be in a specific location. Where bit 53 really is bit 53 and bit 17 really is at bit 17.

And for some reason you are really hung up about the exponent ....

When you treat them as math, you no longer have to worry about that. It doesn't matter if the FPU normalizes things so bit 17 is really that hidden 53rd bit.

There is no 'leftmost' or 'rightmost' bit. You just have a floating point number that just happens to be able to hold 53 bits of data. How it organizes it is irrelevant.

If you have all zeros and you want to set that 53rd bit, then set it. Just add 2^53 to it.

If you have all zeros and you want to set bit 1, then just add 1. And guess what, the FPU will normalize that and put it into that hidden 53rd bit. That's OKAY.

So WHAT if the fpu messes with the exponent to normalize things. That's what it does. Let it. It's not part of your 53 bits of data so let it normalize its little heart out.

With only a few exceptions we don't have to care how the bits are stored in the double. The few times we would care, we can take steps to force alignment so that bit 1 of data really is at bit 1 location. But we don't have to keep them that way when doing math because the FPU will adjust things to keep itself happy.


We are trying to do this as MATH not bit / set operations. (Which is why AND & OR are so difficult....)


And, as I've said several times, we don't have to use 53 bits. If you are uncomfortable with it, then don't.... In fact, for convenience it would actually be better to just do 32 bits into each double. Some things will have to change but are do-able. And some things would actually be more convenient.

And if we went all the way and actually used some pre-existing "double-double" math package (like David Bailey's) then our mantissa would actually be 106 bits. We wouldn't have to worry about hidden bits or how they behave. The whole 64 bit bitboard would fit into the mantissa with room to spare.





You can get it to be zero with a zero exponent. You can get it to be 1 with a normal exponent. But you can't do boolean operations on that bit. It would require fiddling with the exponent instead since that 53rd bit does not exist in the actual number in IEEE format.
Doing boolean operations would require some fiddling with the numbers.

It's possible some sort of MAGIC multiplication would work, but I'm not sure. I suspect it would, but I don't know the values or how many bits you could do at a time.

The best I've been able to come up with that I know would work is that since we would have to use two doubles to hold 64 bits of real data, we'd keep 32 bits in each one. (Just to make things consistent with both doubles.)

We'd then add a huge value to shift our 32 bits down to the lower word.

We then use the lowest 8 bits as an index into an array of truth tables. Repeat with the other bytes of data. We are using an integer to grab the lowest 8 bits and use it as an index, but we aren't really doing math as integers.

I'm not real fond of using truth tables but that's the best I've come up with.

(The addition of huge values to shift our data to a lower part of the word was a classic x87 hack for conversion to integer, truncating, etc. The x87 was so slow doing those operations it was actually faster to do them by hand with some clever math. Not common today, but the ideas are well tested and would still work.)


You can safely have any arbitrary 53 bit integer stored into a double without any risk of corruption. The FPU will represent it however it needs to.
See above. One of those 53 bits does not physically exist. Which means to manipulate the rightmost 52 bits, you twiddle with those. To manipulate that non-existent bit, you have to twiddle with the entire left end (exponent field) of the number rather than a single bit.
1) We don't have to actually store a full 53 bits into each double. We could store just 32, that way we have some extra space to mess with if we want.

2) If we did do some floating point operation that involved the hidden 53rd bit, then the FPU would take care of that automatically. It's designed to do that.

No, it's not designed to do boolean operations, but with the right kind of math we might be able to trick it. If not, then there's always the 'cheating' method of using truth tables like I mentioned above.


Yes, there are only 52 bits available, but the FPU does what it needs to hold the full 53 bits of integer data.

It normalizes things and has a special code for zero, etc. If the 53rd bit is '1' then things are fine. If it's a zero then the FPU normalizes it and the exponent is adjusted. It all works out that you can safely hold 53 bits of integer data.


*HOWEVER*, just to make you happy, I'll say that you can only hold 40 bits of mantissa in a double. Doesn't matter because I'm talking about using software based 'double-double' math that doubles the number of mantissa bits anyway.

So in this case instead of 53*2=106 it'd be 40*2=80 which is still more than enough for what I'm talking about.

Makes no difference because I'm curious about the bit operations as floating points.
Those do not naturally exist in any processor I know of (IEEE math processor). One can do any arbitrary boolean function as a mathematical operation. "not" means subtract from all 1's, AND means to multiply bit by bit. Etc. Not going to be fast at all when compared to using two normal 32 bit integer values which do have AND/OR/XOR operations that operate on all bits at once...
No, they don't exist. I thought I was pretty clear about that in my first post. And was one of the reasons I was asking if anybody had thought about this kind of useless stuff.

And no, it's not going to be fast. But who cares? As I've said several times this wasn't a practical method. It was just something I got curious about.

Just a "Can it be done?" kind of thing.
Anything "can be done". Whether it "should be done" is another matter, of course.
That's probably the same attitude many people had in the 60's about using expensive, serious business computers for something as frivolous as chess....

Frankly, I doubt many people in this forum would find it too odd that somebody got curious about something and wondered if it could be done. They may not care about this particular thing, but I doubt they'd find it odd.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Bitboards using 2 DOUBLE's??

Post by bob »

Carey wrote:
bob wrote:
Carey wrote:
bob wrote:
BTW, one error above. You have 52 bits, not 53. The 53rd bit _must_ be a 1 bit (the IEEE "hidden" bit).
Sorry Bob, but no. Obvious example is the number zero. It has no bits set. (Yes, I know it uses a special exponent code to indicate zero.)
Here we are going to have to disagree, because I teach IEEE floating point in my asm class.
Then you don't teach it very well, Sir.

I mean no disrespect, especially considering how much I respect your chess programming abilities and history. When I tell somebody that "Hyatt said..." I know you are right because I know you've tested it thoroughly.

But if you are going to brag about teaching float in your asm class, then I have to stand up and say that I don't think you do it well. That in this case, you are wrong.
Zero is a special case. When the exponent field is all zero, we now have an unnormalized number (perfectly legal) which eliminates that hidden 1 bit. But you can't use it. For 32 bit IEEE, you have 1 sign bit, 8 exponent bits, and 32 - 1 - 8 leaves 23 bits, not 24. Ditto for 64 bit IEEE, which has 52 bits stored in the value, plus the hidden 1 bit that is always present except for non-normalized numbers.

But you show me any way to represent a 53 bit number with the high-order bit 0 or 1, and I'll kiss you. :)
The specs say that there *are* 52 bits of genuine data space physically available, with the 53rd bit being implicitly and unchangeably set as '1'. You are saying that as well.

So, for a 53 bit integer where the 53rd bit *is* set, then there's no problem. Because you have the high bit set and 52 bits of other data. The very thing the float specs explicitly allow with the 53rd bit being set but not physically present.
Not so fast. There _is_ a problem. Notably when I want to turn that 1 bit into a zero. That is no longer a "single bit operation". Which was my point.
No, it's not a single *bit* operation. It's a math operation.

FloatNum - 2^53.

Simple subtraction. The FPU will then perform the subtraction and renormalize.
"renormalize"? Shifts the entire fraction. I don't think you want to do that.

Since I'd be treating the numbers as floating point numbers, I don't care if the FPU normalizes them. It doesn't matter that bit 1 is really in bit position one. I let the FPU worry about aligning the bits properly to its math.

The numbers are no longer 64 bit sets but instead floating point numbers. It's a mental shift.
OK, that has a possibility. But it makes absolutely no sense to me, because to use bitboards one has to visualize what the bits represent. For example, "is the E-file open?" The bits can't move around all over, or you can never visualize how to set up the masks to ask that kind of question.

The concept of bitboards is that a specific bit represents a specific square. Going one level up as you are describing appears to me to be an impossible task, mentally...


In the bitboard programs I've done, I rarely need to set specific bits. It's not a case of setting Bit 53, for example. It's usually a matter of clearing a bit and setting a bit like while making a move.
I have to set specific bits every time I make or unmake a move. And then for the special case moves it is even more involved (castling, en passant, pawn promotions). And then there are the masks set up to ask specific questions like "is the pawn on e4 passed?" or "is the e-file open?" or "is the pawn on f3 backward?"



In that case, I already know whether the bits are set or clear. So I can safely add and subtract. I don't have to do OR to set it or AND-NOT to clear it. I can't be clever and XOR it to flip it, but that's just a convenience.
What about captures? Bit is already set, you want to set it again as you move the piece to make the capture...

Even when I do need to set a specific bit, I don't actually care that it's stored at a certain location. What I do care about is that when I do an operation, things will align themselves so the correct bit is affected. The FPU does guarantee that.
Again, that sounds like no chess program/engine I know of. Chess knowledge is pattern-based. The patterns are expressed as specific sets of 1 bits or 0 bits, depending on how it will be used. I very much care what is what.

It may normalize things for its own convenience but it doesn't matter as long as it's consistent and knows how to align things when I do math on them. Which it does.

I don't have to treat them as binary operations but can do it as arithmetic operations and let the FPU normalize the numbers however it wants.

The other operations, such as shifting, can also be done without too much effort. It's just multiplication by 2 and will require a bit of software renormalization. (By 'without too much effort' I mean it's possible to do wholly within floating point. I don't have to do it as integers or precomputed tables, etc.)
Now you go too far. Previously you said you don't care where the bits are, but now you are talking about shifting 1 bit. So either you _do_ care where the bits are and what they represent, or else you would _never_ want to shift 1 bit since you don't know what it represents...

This sounds like a bad dream at best, approaching a nightmare.

The only problems I see are AND, OR, PopCount and NextSetBit.

The first two could be done via truth table. The third by a look up table. The fourth could be done several ways including tables or maybe even using frexp().

Some operations will need software 'normalization'. Such as trying to shift bits from one double to another. But that's entirely do-able. The 'double-double' math packages already handle things just like this. And do so reliably. (From a math point of view, those 'double-double' packages look just like a real 128 bit floating point number with a 106 bit mantissa.)


For the case where you have 53 bits of data where the 53rd bit is *NOT* set then you don't actually have 53 bits of data. You have 52 bits. Or 51 or 50 or wherever the highest bit is set. In which case the 52 bits that are physically there are enough to hold it without issue. The FPU will normalize things so the highest bit is set and adjust its exponent accordingly.

For the case where there are no bits at all set, the FPU has a special flag to indicate that.

So you've covered all the bases. Bit 53 set and 52 bits of other data. Bit 53 not set and up to 52 bits of other data. No bits set.

A total of 53 bits of user data.
You are completely missing my point. My point was, and is, that you do not have any specific operation that can change that upper bit from 0 to 1 or 1 to zero, when dealing with a single bit. For example, I have all "53 bits" set to zero. By using the IEEE fp 0.0 representation. Now I want to turn that leftmost bit to a 1. I have to mangle the exponent to do this, there is no arithmetic operation that will accomplish that. I can turn any of the other 52 bits on or off whenever I want. Simple subtraction or addition will work. But that 53rd bit won't work the same way, which is a problem IMHO, and this is _the_ problem I have been trying (without success) to explain. Using that single extra bit is going to make this much more complicated to do, when there is no benefit to the idea in the first place.
Bob, you are still thinking of the doubles as if they were bit sets. Where a specific bit *HAS* to be in a specific location. Where bit 53 really is bit 53 and bit 17 really is at bit 17.

And for some reason you are really hung up about the exponent ....

When you treat them as math, you no longer have to worry about that. It doesn't matter if the FPU normalizes things so bit 17 is really that hidden 53rd bit.

There is no 'leftmost' or 'rightmost' bit. You just have a floating point number that just happens to be able to hold 53 bits of data. How it organizes it is irrelevant.

If you have all zeros and you want to set that 53rd bit, then set it. Just add 2^53 to it.

If you have all zeros and you want to set bit 1, then just add 1. And guess what, the FPU will normalize that and put it into that hidden 53rd bit. That's OKAY.

So WHAT if the fpu messes with the exponent to normalize things. That's what it does. Let it. It's not part of your 53 bits of data so let it normalize its little heart out.

With only a few exceptions we don't have to care how the bits are stored in the double. The few times we would care, we can take steps to force alignment so that bit 1 of data really is at bit 1 location. But we don't have to keep them that way when doing math because the FPU will adjust things to keep itself happy.


We are trying to do this as MATH not bit / set operations. (Which is why AND & OR are so difficult....)


And, as I've said several times, we don't have to use 53 bits. If you are uncomfortable with it, then don't.... In fact, for convenience it would actually be better to just do 32 bits into each double. Some things will have to change but are do-able. And some things would actually be more convenient.

And if we went all the way and actually used some pre-existing "double-double" math package (like David Bailey's) then our mantissa would actually be 106 bits. We wouldn't have to worry about hidden bits or how they behave. The whole 64 bit bitboard would fit into the mantissa with room to spare.





You can get it to be zero with a zero exponent. You can get it to be 1 with a normal exponent. But you can't do boolean operations on that bit. It would require fiddling with the exponent instead since that 53rd bit does not exist in the actual number in IEEE format.
Doing boolean operations would require some fiddling with the numbers.

It's possible some sort of MAGIC multiplication would work, but I'm not sure. I suspect it would, but I don't know the values or how many bits you could do at a time.

The best I've been able to come up with that I know would work is that since we would have to use two doubles to hold 64 bits of real data, we'd keep 32 bits in each one. (Just to make things consistent with both doubles.)

We'd then add a huge value to shift our 32 bits down to the lower word.

We then use the lowest 8 bits as an index into an array of truth tables. Repeat with the other bytes of data. We are using an integer to grab the lowest 8 bits and use it as an index, but we aren't really doing math as integers.

I'm not real fond of using truth tables but that's the best I've come up with.

(The addition of huge values to shift our data to a lower part of the word was a classic x87 hack for conversion to integer, truncating, etc. The x87 was so slow doing those operations it was actually faster to do them by hand with some clever math. Not common today, but the ideas are well tested and would still work.)


You can safely have any arbitrary 53 bit integer stored into a double without any risk of corruption. The FPU will represent it however it needs to.
See above. One of those 53 bits does not physically exist. Which means to manipulate the rightmost 52 bits, you twiddle with those. To manipulate that non-existent bit, you have to twiddle with the entire left end (exponent field) of the number rather than a single bit.
1) We don't have to actually store a full 53 bits into each double. We could store just 32, that way we have some extra space to mess with if we want.

2) If we did do some floating point operation that involved the hidden 53rd bit, then the FPU would take care of that automatically. It's designed to do that.

No, it's not designed to do boolean operations, but with the right kind of math we might be able to trick it. If not, then there's always the 'cheating' method of using truth tables like I mentioned above.


Yes, there are only 52 bits available, but the FPU does what it needs to hold the full 53 bits of integer data.

It normalizes things and has a special code for zero, etc. If the 53rd bit is '1' then things are fine. If it's a zero then the FPU normalizes it and the exponent is adjusted. It all works out that you can safely hold 53 bits of integer data.


*HOWEVER*, just to make you happy, I'll say that you can only hold 40 bits of mantissa in a double. Doesn't matter because I'm talking about using software based 'double-double' math that doubles the number of mantissa bits anyway.

So in this case instead of 53*2=106 it'd be 40*2=80 which is still more than enough for what I'm talking about.

Makes no difference because I'm curious about the bit operations as floating points.
Those do not naturally exist in any processor I know of (IEEE math processor). One can do any arbitrary boolean function as a mathematical operation. "not" means subtract from all 1's, AND means to multiply bit by bit. Etc. Not going to be fast at all when compared to using two normal 32 bit integer values which do have AND/OR/XOR operations that operate on all bits at once...
No, they don't exist. I thought I was pretty clear about that in my first post. And was one of the reasons I was asking if anybody had thought about this kind of useless stuff.

And no, it's not going to be fast. But who cares? As I've said several times this wasn't a practical method. It was just something I got curious about.

Just a "Can it be done?" kind of thing.
Anything "can be done". Whether it "should be done" is another matter, of course.
That's probably the same attitude many people had in the 60's about using expensive, serious business computers for something as frivolous as chess....

Not the same thing. I can represent chess positions by sets of 64 bit integer values, and manipulate them directly using hardware, including finding the first or last one bit or popcnt. My point was: why on earth would anyone _want_ to do something that is going to be so much more complex? We already have a requirement to do 64 bit multiplies for the magic move generation stuff. That doesn't have a prayer with this kind of approach. So we start to give up things that are quite common.

Frankly, I doubt many people in this forum would find it too odd that somebody got curious about something and wondered if it could be done. They may not care about this particular thing, but I doubt they'd find it odd.
Carey
Posts: 313
Joined: Wed Mar 08, 2006 8:18 pm

Re: Bitboards using 2 DOUBLE's??

Post by Carey »

bob wrote:
Carey wrote:
bob wrote:
Carey wrote:
bob wrote:
BTW, one error above. You have 52 bits, not 53. The 53rd bit _must_ be a 1 bit (the IEEE "hidden" bit).
Sorry Bob, but no. Obvious example is the number zero. It has no bits set. (Yes, I know it uses a special exponent code to indicate zero.)
Here we are going to have to disagree, because I teach IEEE floating point in my asm class.
Then you don't teach it very well, Sir.

I mean no disrespect, especially considering how much I respect your chess programming abilities and history. When I tell somebody that "Hyatt said..." I know you are right because I know you've tested it thoroughly.

But if you are going to brag about teaching float in your asm class, then I have to stand up and say that I don't think you do it well. That in this case, you are wrong.
Zero is a special case. When the exponent field is all zero, we now have an unnormalized number (perfectly legal) which eliminates that hidden 1 bit. But you can't use it. For 32 bit IEEE, you have 1 sign bit, 8 exponent bits, and 32 - 1 - 8 leaves 23 bits, not 24. Ditto for 64 bit IEEE, which has 52 bits stored in the value, plus the hidden 1 bit that is always present except for non-normalized numbers.

But you show me any way to represent a 53 bit number with the high-order bit 0 or 1, and I'll kiss you. :)
The specs say that there *are* 52 bits of genuine data space physically available, with the 53rd bit being implicitly and unchangeably set as '1'. You are saying that as well.

So, for a 53 bit integer where the 53rd bit *is* set, then there's no problem. Because you have the high bit set and 52 bits of other data. The very thing the float specs explicitly allow with the 53rd bit being set but not physically present.
Not so fast. There _is_ a problem. Notably when I want to turn that 1 bit into a zero. That is no longer a "single bit operation". Which was my point.
No, it's not a single *bit* operation. It's a math operation.

FloatNum - 2^53.

Simple subtraction. The FPU will then perform the subtraction and renormalize.
"renormalize"? Shifts the entire fraction. I don't think you want to do that.
Doesn't matter what I *want*.

The reality is that's what the FPU does. So you live with it.

You work with abstractions instead.

If I have squares A1, A3 and D5 set, then the FPU is going to put them wherever it wants. I have no real control over that (without jumping through lots of hoops and kludges, etc., and I don't want to do that).

Sure, it'll normalize things and move the bits up or down in the FPU word. But it keeps track of it. *It* knows where the bits are really located even if I don't.

So, if I later want to clear A1, then I load from a var that has A1 set. That var will also be normalized. (In fact, since it's a single bit, it'll actually be in that hidden 53rd bit.)

BUT, when I do Board - Sqr the FPU will automatically align things so that when it removes the A1 square from the board, it'll affect the right square.

That's just basic floating point math.

As long as you don't try to peek at the float's bits, it just doesn't matter because the math works.



Since I'd be treating the numbers as floating point numbers, I don't care if the FPU normalizes them. It doesn't matter that bit 1 is really in bit position one. I let the FPU worry about aligning the bits properly to its math.

The numbers are no longer 64 bit sets but instead floating point numbers. It's a mental shift.
OK, that has a possibility. But it makes absolutely no sense to me, because to use bitboards one has to visualize what the bits represent. For example, "is the E-file open?" The bits can't move around all over, or you can never visualize how to set up the masks to ask that kind of question.
That would require doing an AND operation. Which, as I've said before, is definitely a problem in floating point. AND & OR seem to be the only two operations that will be difficult to simulate or work around.

The only three things I can think of are:

1) To actually go ahead and un-normalize the numbers and use the data as indices into a truth table. Un-normalizing it is fairly simple. You add a huge value to it so the FPU's auto-normalization will force the bits down into the proper spot. This is trivial stuff. A few implementation issues, such as it would be more convenient to not put 53 bits into a double. Probably better to put 32 into both doubles just to make things the same.

But those are implementation issues & personal choices. The point is, doing that kind of forced un-normalization has been around a long, long time and is quite solid.

2) Do the software un-normalization like in #1 but skip the truth tables and cheat and just use integer AND & OR operations. Accept that it's too darn difficult to do AND & OR as floating point ops. I really don't like this option because I want to pretend we're on a cpu that either doesn't have integers or at least they are slow & difficult to use.

3) Still hoping that there might be some sort of MAGIC multiplication that could prepare the numbers for binary operations.

Converting 1 to the bit pair 01 and then add your other number. If the result is 10 then AND was true and your result is 1, else it's 0.

Trying to do that in large scale though.... I really don't know, but then I'm still amazed at all the MAGIC multiplications being done for bitboard attacks.
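A small sketch of that 01 + 01 = 10 observation (my illustration of the idea, not code from the post): if every data bit gets its own two-bit slot, adding two spread values never carries between slots, and the high bit of each slot is the AND of the two original bits. Spreading and compressing are done the "cheating" way here with integer ops; only the middle step is floating point.

#include <stdio.h>
#include <math.h>

/* put bit i of a 4-bit value into slot 2*i of a double */
static double spread4(unsigned x) {
    double s = 0.0;
    for (int i = 0; i < 4; i++)
        if (x & (1u << i)) s += ldexp(1.0, 2 * i);
    return s;
}

/* read the AND result back out of the high bit of each slot */
static unsigned unspread4(double s) {
    unsigned r = 0;
    for (int i = 0; i < 4; i++) {
        double pair = floor(s / ldexp(1.0, 2 * i));  /* slot i and everything above it */
        if (fmod(pair, 4.0) >= 2.0) r |= 1u << i;    /* slot value 2 => both bits were 1 */
    }
    return r;
}

int main(void) {
    unsigned a = 0xB, b = 0x6;                 /* 1011 & 0110 = 0010 */
    double sum = spread4(a) + spread4(b);      /* the only "real" floating point step */
    printf("%X\n", unspread4(sum));            /* prints 2 */
    return 0;
}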


But anyway, I've said from the beginning that doing the AND & OR operations seem to be the big problems. It can be worked around, but not in a nice floating point way.


The concept of bitboards is that a specific bit represents a specific square. Going one level up as you are describing appears to me to be an impossible task, mentally...
It's not really. In fact, I bet you've been doing it for years.

Think of C's regular Long Long and doing operations on it.

If you do LL |= 1 or LL <<= 8 do you *really* know how the bits are arranged?

C guarantees the operations will work the same regardless of how it chooses to arrange things in the CPU.

You could be working on a 64 bit big endian system. Or a 64 bit little endian system. Or a 32 bit little endian system simulating 64 bit integers. Or a 32 bit big endian system simulating 64 bit integers. Or even an old 8 bit micro that is simulating 64 bit integers.

C even allows the bits to be out of order in the CPU registers!! As long as it's consistent and does the operations correctly, C doesn't care.

You don't truly know where the bits are. All you really know is how the C language represents it in the source code you write, and you just trust that the compiler knows how the cpu will do things and that it'll be consistent.

Same thing when doing it with floating point. (Except for AND & OR, of course.) As long as you don't look at where the FPU wants to put the bits, the math is going to work.

If I say A1= 2^0 and B1=2^1 and so on, then I can keep using those values without caring how the FPU arranges the data.

In the bitboard programs I've done, I rarely need to set specific bits. It's not a case of setting Bit 53, for example. It's usually a matter of clearing a bit and setting a bit like while making a move.
I have to set specific bits every time I make or unmake a move. And then for the special case moves it is even more involved (castling, en passant, pawn promotions).
Setting a bit is addition. Clearing a bit is subtraction. That's true whether you are working in LongLong or floating point. In my programs, I never try to set a bit that is already set or clear a bit that is already clear. I already know based on program flow. So addition & subtraction work fine. In fact, in one of my early bitboard programs I did it all as add & sub of LongLong.

And then there are the masks set up to ask specific questions like "is the pawn on e4 passed?" or "is the e-file open?" or "is the pawn on f3 backward?"
And yet again, as I have said repeatedly.... Those are AND & OR operations and those are the very ones that I'm having trouble with.

They can be done, but not as pure float math. Which is what I'd like.

As for XOR, that's just a luxury. A convenience. I've never truly needed it.

The only place it's a problem is in the Zobrist hashing. But there are other methods that would work. Addition & subtraction of large numbers would also generate a unique signature. You just have to allow enough extra room that they don't overflow and cause data loss.
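A tiny sketch of that add/subtract alternative (the keys below are made-up constants, not anything from a real program): piece placements add a per-(piece,square) value, removals subtract it, so the running sum depends only on the final position and not on the move order, just as XOR keys do.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    const uint64_t wp_e2 = 0x9E3779B97F4A7C15ULL;   /* hypothetical keys */
    const uint64_t wp_e4 = 0xC2B2AE3D27D4EB4FULL;
    const uint64_t bn_f6 = 0x165667B19E3779F9ULL;

    uint64_t h = 0;
    h += wp_e2;                 /* start: white pawn on e2           */
    h += bn_f6;                 /* black knight on f6                */
    h -= wp_e2; h += wp_e4;     /* play e2-e4: subtract old, add new */

    uint64_t h2 = bn_f6 + wp_e4;   /* same position built directly   */
    printf("%d\n", h == h2);       /* prints 1 */
    return 0;
}

In the all-double version the keys would have to be kept small enough that the running sum never outgrows the 53-bit mantissa.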



In that case, I already know whether the bits are set or clear. So I can safely add and subtract. I don't have to do OR to set it or NAND to clear it. I can't be clever and XOR it to flip it, but that's just a convenience.
What about captures? Bit is already set, you want to set it again as you move the piece to make the capture...
In my early programs, just to make sure I was doing things right, I always removed the capture first.

if (Capture) Board -= Capture;
Board -= From;
Board += To;

It is not as convenient as being able to do Board |= To but it works.

Even later on, with only a few convenience factors like above, I was never in a position where I tried to clear a bit that was already clear or set one that was already set.

I knew from the program flow what things were.




Even when I do need to set a specific bit, I don't actually care that it's stored at a certain location. What I do care about is that when I do an operation, things will align themselves so the correct bit is affected. The FPU does guarantee that.
Again, that sounds like no chess program/engine I know of. Chess knowledge is pattern-based. The patterns are expressed as specific sets of 1 bits or 0 bits, depending on how it will be used. I very much care what is what.
You can think of them as being in specific locations, if it'll make you happy.

The FPU math will take care of aligning the binary point to make sure the right bits are operated on.

And as I pointed out, even in C with LongLong you don't really know where the bits are truly located. You just trust the compiler and cpu to be consistent and do the right things.



It may normalize things for its own convenience but it doesn't matter as long as it's consistent and knows how to align things when I do math on them. Which it does.

I don't have to treat them as binary operations but can do it as arithmetic operations and let the FPU normalize the numbers however it wants.

The other operations, such as shifting, can also be done without too much effort. It's just multiplication by 2 and will require a bit of software renormalization. (By 'without too much effort' I mean it's possible to do wholly within floating point. I don't have to do it as integers or precomputed tables, etc.)
Now you go too far. Previously you said you don't care where the bits are, but now you are talking about shifting 1 bit. So either you _do_ care where the bits are and what they represent, or else you would _never_ want to shift 1 bit since you don't know what it represents...
No Bob, you are still very confused.

I do know what it represents. Square A1, B1, C1 etc. I know that the numeric value representing A1 is less than B1 (because I choose for it to be.)

I just don't know (or care) where the FPU puts it. That's two different things.

I care that the FPU holds them and is consistent. I don't care *how* the fpu holds them. I don't care whether they are normalized or not. I don't care whether I'm really using an 80 bit Long Double. I don't care whether I am using a software based 'double-double'. I don't even care if the FPU is working in binary or decimal (with the exception of AND & OR, which would be rather difficult to do in decimal).

All I care is that the FPU can hold my data without data loss and lets me do the basic mathematical operations. (Plus AND & OR, of course....)

If I need to shift a row of the BITBOARD, then that would involve multiplying by 256 or dividing by 256.
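A short sketch of that shift-as-multiply idea (illustrative only; it keeps just 32 board bits in the double so nothing can spill past the 53-bit mantissa, and it ignores wraparound off the top of the board):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t pawns = 0x0000FF00u;          /* white pawns on rank 2  */
    double   dpawns = (double)pawns;       /* same set, as a number  */

    uint32_t pushed_i = pawns << 8;        /* integer version        */
    double   pushed_d = dpawns * 256.0;    /* floating point version */

    printf("%08X vs %08X\n",
           (unsigned)pushed_i, (unsigned)pushed_d);  /* 00FF0000 vs 00FF0000 */
    return 0;
}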

I don't care how the FPU holds the data as long as it understands basic math.

I DO care about the BITBOARD, but unlike you I'm not thinking of the bitboard as a 64 bit SET. I'm thinking of it as 64 binary numbers. 2^0, 2^1, 2^2, 2^3... 2^63.

There are still chess concepts of rows and columns and shifting, but those aren't applied to a bit set. They are applied to a numeric value. And the FPU is free to mangle that value all it wants (ie: normalize it) as long as it doesn't actually change the value of that number it is holding (like it would if we overflowed and we lost some bits.)
This sounds like a bad dream at best, approaching a nightmare.

The only problems I see are AND, OR, PopCount and NextSetBit.

The first two could be done via truth table. The third by a lookup table. The fourth could be done several ways, including tables or maybe even using frexp().
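A sketch of the frexp() idea for NextSetBit (my illustration, under the same "stay below 2^53" assumption): frexp reports where the most significant set bit is, and subtracting that power of two peels it off, so repeating the step serializes the whole board; counting the iterations gives PopCount as well.

#include <stdio.h>
#include <math.h>

int main(void) {
    double board = 1.0 + 65536.0 + 34359738368.0;   /* bits 0, 16, 35 */

    while (board > 0.0) {
        int e;
        frexp(board, &e);              /* board = f * 2^e, 0.5 <= f < 1 */
        int msb = e - 1;               /* index of the highest set bit  */
        printf("square %d\n", msb);    /* 35, then 16, then 0           */
        board -= ldexp(1.0, msb);      /* clear that bit and continue   */
    }
    return 0;
}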

Some operations will need software 'normalization'. Such as trying to shift bits from one double to another. But that's entirely do-able. The 'double-double' math packages already handle things just like this. And do so reliably. (From a math point of view, those 'double-double' packages look just like a real 128 bit floating point number with a 106 bit mantissa.)


For the case where you have 53 bits of data where the 53rd bit is *NOT* set, then you don't actually have 53 bits of data. You have 52 bits. Or 51 or 50 or wherever the highest bit is set. In which case the 52 bits that are physically there are enough to hold it without issue. The FPU will normalize things so the highest bit is set and adjust its exponent accordingly.

For the case where there are no bits at all set, the FPU has a special flag to indicate that.

So you've covered all the bases. Bit 53 set and 52 bits of other data. Bit 53 not set and up to 52 bits of other data. No bits set.

A total of 53 bits of user data.
You are completely missing my point. My point was, and is, that you do not have any specific operation that can change that upper bit from 0 to 1 or 1 to zero, when dealing with a single bit. For example, I have all "53 bits" set to zero. By using the IEEE fp 0.0 representation. Now I want to turn that leftmost bit to a 1. I have to mangle the exponent to do this, there is no arithmetic operation that will accomplish that. I can turn any of the other 52 bits on or off whenever I want. Simple subtraction or addition will work. But that 53rd bit won't work the same way, which is a problem IMHO, and this is _the_ problem I have been trying (without success) to explain. Using that single extra bit is going to make this much more complicated to do, when there is no benefit to the idea in the first place.
Bob, you are still thinking of the double's as if they were bit sets. Where a specific bit *HAS* to be in a specific location. Where Bit 53 really is bit 53 and bit 17 really is at bit 17.

And for some reason you are really hung up about the exponent ....

When you treat them as math, you no longer have to worry about that. It doesn't matter if the FPU normalizes things so bit 17 is really that hidden 53rd bit.

There is no 'leftmost' or 'rightmost' bit. You just have a floating point number that just happens to be able to hold 53 bits of data. How it organizes it is irrelevant.

If you have all zeros and you want to set that 53rd bit, then set it. Just add 2^52 to it.

If you have all zeros and you want to set bit 1, then just add 1. And guess what, the FPU will normalize that and put it into that hidden 53rd bit. That's OKAY.

So WHAT if the fpu messes with the exponent to normalize things. That's what it does. Let it. It's not part of your 53 bits of data so let it normalize its little heart out.

With only a few exceptions we don't have to care how the bits are stored in the double. The few times we would care, we can take steps to force alignment so that bit 1 of data really is at bit 1 location. But we don't have to keep them that way when doing math because the FPU will adjust things to keep itself happy.
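A tiny check of that point (illustrative): the top bit of a 53-bit payload behaves like any other bit under exact add/subtract, even though the format stores it only implicitly.

#include <stdio.h>
#include <math.h>

int main(void) {
    double b = 0.0;
    b += ldexp(1.0, 52);        /* set the top bit of a 53-bit payload */
    b += 1.0;                   /* set bit 0 as well                   */
    b -= ldexp(1.0, 52);        /* clear the top bit again             */
    printf("%.0f\n", b);        /* prints 1 */
    return 0;
}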


We are trying to do this as MATH not bit / set operations. (Which is why AND & OR are so difficult....)


And, as I've said several times, we don't have to use 53 bits. If you are uncomfortable with it, then don't.... In fact, for convenience it would actually be better to just do 32 bits into each double. Some things will have to change but are do-able. And some things would actually be more convenient.

And if we went all the way and actually used some pre-existing "double-double" math package (like David Bailey's) then our mantissa will actually be 106 bits. We wouldn't have to worry about hidden bits or how they behave. The whole 64 bit bitboard would fit into the mantissa with room to spare.





You can get it to be zero with a zero exponent. You can get it to be 1 with a normal exponent. But you can't do boolean operations on that bit. It would require fiddling with the exponent instead since that 53rd bit does not exist in the actual number in IEEE format.
Doing boolean operations would require some fiddling with the numbers.

It's possible some sort of MAGIC multiplication would work, but I'm not sure. I suspect it would, but I don't know the values or how many bits you could do at a time.

The best I've been able to come up with that I know would work is that since we would have to use two doubles to hold 64 bits of real data, we'd keep 32 bits in each one. (Just to make things consistent with both doubles.)

We'd then add a huge value to shift our 32 bits down to the lower word.

We then use the lowest 8 bits as an index into an array of truth tables. Repeat with the other bytes of data. We are using an integer to grab the lowest 8 bits and use it as an index, but we aren't really doing math as integers.

I'm not real fond of using truth tables but that's the best I've come up with.

(The addition of huge values to shift our data to a lower part of the word was a classic x87 hack for conversion to integer, truncating, etc. The x87 was so slow doing those operations it was actually faster to do them by hand with some clever math. Not common today, but the ideas are well tested and would still work.)
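A hedged sketch of that combination (my reconstruction of the idea, not code from the thread): adding 1.5 * 2^52 parks a 32-bit payload at a fixed spot in the low mantissa bits, and the AND itself goes one byte at a time through a precomputed 256x256 table. The table is filled with the integer '&' here purely for brevity, and IEEE 754 doubles are assumed.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

static unsigned char and_tab[256][256];

static uint32_t payload(double x) {          /* x holds a 32-bit integer value */
    double shifted = x + 6755399441055744.0; /* 1.5 * 2^52 forces the bits down */
    uint64_t raw;
    memcpy(&raw, &shifted, sizeof raw);      /* peek at the stored bit pattern  */
    return (uint32_t)raw;                    /* low 32 mantissa bits == payload */
}

static double fand32(double a, double b) {   /* AND of two 32-bit "doubles" */
    uint32_t ia = payload(a), ib = payload(b), r = 0;
    for (int byte = 0; byte < 4; byte++) {
        unsigned ba = (ia >> (8 * byte)) & 0xFF;
        unsigned bb = (ib >> (8 * byte)) & 0xFF;
        r |= (uint32_t)and_tab[ba][bb] << (8 * byte);
    }
    return (double)r;                        /* back to the number world */
}

int main(void) {
    for (int i = 0; i < 256; i++)
        for (int j = 0; j < 256; j++)
            and_tab[i][j] = (unsigned char)(i & j);

    printf("%08X\n", (unsigned)fand32(0xF0F0F0F0u, 0x3C3C3C3Cu)); /* 30303030 */
    return 0;
}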


You can safely have any arbitrary 53 bit integer stored into a double without any risk of corruption. The FPU will represent it however it needs to.
See above. One of those 53 bits does not physically exist. Which means to manipulate the rightmost 52 bits, you twiddle with those. To manipulate that non-existent bit, you have to twiddle with the entire left end (exponent field) of the number rather than a single bit.
1) We don't have to actually store a full 53 bits into each double. We could store just 32, that way we have some extra space to mess with if we want.

2) If we did do some floating point operation that involved the hidden 53rd bit, then the FPU would take care of that automatically. It's designed to do that.

No, it's not designed to do boolean operations, but with the right kind of math we might be able to trick it. If not, then there's always the 'cheating' method of using truth tables like I mentioned above.


Yes, there are only 52 bits available, but the FPU does what it needs to hold the full 53 bits of integer data.

It normalizes things and has a special code for zero, etc. If the 53rd bit is '1' then things are fine. If it's a zero then the FPU normalizes it and the exponent is adjusted. It all works out that you can safely hold 53 bits of integer data.


*HOWEVER*, just to make you happy, I'll say that you can only hold 40 bits of mantissa in a double. Doesn't matter, because I'm talking about using software based 'double-double' math that doubles the number of mantissa bits anyway.

So in this case instead of 53*2=106 it'd be 40*2=80 which is still more than enough for what I'm talking about.

Makes no difference because I'm curious about the bit operations as floating points.
Those do not naturally exist in any processor I know of (IEEE math processor). One can do any arbitrary boolean function as a mathematical operation. "not" means subtract from all 1's, AND means to multiply bit by bit. Etc. Not going to be fast at all when compared to using two normal 32 bit integer values which do have AND/OR/XOR operations that operate on all bits at once...
No, they don't exist. I thought I was pretty clear about that in my first post. And was one of the reasons I was asking if anybody had thought about this kind of useless stuff.

And no, it's not going to be fast. But who cares? As I've said several times this wasn't a practical method. It was just something I got curious about.

Just a "Can it be done?" kind of thing.
Anything "can be done". Whether it "should be done" is another matter, of course.
That's probably the same attitude many people had in the 60's about using expensive, serious business computers for something as frivolous as chess....

Not the same thing. I can represent chess positions by sets of 64 bit integer values, and manipulate them directly using hardware, including finding the first or last one bit or popcnt.
Bob, there really isn't that much effort in what I'm talking about.

It wouldn't be high performance. But most of what I'm talking about will work equivalently to how 64 bit integers work.

Just differently.

The big exception, of course, still seems to be AND & OR. They can be done, but not conveniently.

And so what if you can use a 64 bit integer in a 64 bit cpu? It's not even slightly relevant.


Why on earth would anyone _want_ to do something that is going to be so much more complex was my point.
For a teacher, that's a pretty surprising attitude!

For a chess programmer it's a pretty surprising attitude.

Look at the classic Slate & Atkin bitboards. It was a mess to do, especially on the hardware they had. And look how little they actually used all that data they generated.

But yet they did it anyway and discovered there were benefits and now most high performance programs depend on the concepts they pioneered.

Look at Magic Multiplication for the attacks. In the beginning I think a lot of people didn't expect it to really work, much less end up being so efficient. It was just a toy.

(I didn't take it seriously. I thought it was a toy. I accept that it works, but I'm still not comfortable with using 'random' numbers to do necessary stuff. But I do admit that it works and is fast.)

But yet just a couple years later it's turned into a very fast way to do attacks and move generation and has become the standard way to do things.


Will my ideas join those two greats? Of course not. But that doesn't mean I can't be curious about something.

As I said.... that was a very odd attitude for a teacher.
We already have a requirement to do 64 bit multiplies for the magic move generation stuff.
Really? Gee... all these years people have been managing to do bitboard move gen stuff without 64 bit magic muls....

Who says you *have* to do 64 bit magic multiplication?

You could do 32 bit magic multiplication methods.

Or no magic multiplication and depend on FirstBit() / LastBit() and some precomputed rays etc.

Or some other method.

There are lots of ways that don't require 64 bit magic muls. (Especially since magic muls tend to require AND operations)

But I had already thought of that. Instead of using two doubles to have enough space to hold 64 bits, it might be better to work with three floats instead, and put 22 bits into each.

That way we can do multiplications in a 'double' and hold the whole result.

So instead of doing two 32 bit Magic Muls for attacks (like on 32 bit cpus), we've changed things and are now doing three 22 bit Magic Muls for attacks.

But the point is, you don't *HAVE* to use 64 bit magic multiplication. There are alternatives.
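A small sketch of that three-piece split (illustrative; the 22/22/20 partition is one arbitrary choice): each piece is small enough that multiplying it by a magic constant of roughly 31 bits still fits exactly in a 53-bit mantissa. This only checks that the split and reassembly are lossless.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t board = 0x8100000000000081ULL;            /* rooks in the corners */

    double lo  = (double)( board        & 0x3FFFFF);   /* bits  0..21 */
    double mid = (double)((board >> 22) & 0x3FFFFF);   /* bits 22..43 */
    double hi  = (double)( board >> 44);               /* bits 44..63 */

    uint64_t back = (uint64_t)lo
                  | ((uint64_t)mid << 22)
                  | ((uint64_t)hi  << 44);
    printf("%d\n", back == board);                     /* prints 1 */
    return 0;
}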


That doesn't have a prayer on this kind of approach. So we start to give up things that are quite common.
They weren't "quite common" two years ago....

Amazing how quickly you can latch onto one idea while discarding the old and rejecting yet others just because they are different.



I never said this would be *practical*. In fact, I've actually explicitly said several times that it would NOT be practical.

Remember, this all started out as simply the question "Is this possible?"

It seems to be a firm yes, with the caveat that AND & OR are likely to be slow & inconvenient, or to require cheating and using integers in those two spots. (At least unless somebody comes up with a better idea.)

If we were working with an FPU that actually had AND & OR, and were willing to lose some portability, I guess that would be tolerable.

Like the SSE stuff Gerd pointed out.

I'd rather not do SSE stuff though. I would rather depend on normal stuff in C.

Frankly, I doubt many people in this forum would find it too odd that somebody got curious about something and wondered if it could be done. They may not care about this particular thing, but I doubt they'd find it odd.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Bitboards using 2 DOUBLE's??

Post by bob »

Carey wrote:
bob wrote:
Carey wrote:
bob wrote:
Carey wrote:
bob wrote:
BTW, one error above. You have 52 bits, not 53. The 53rd bit _must_ be a 1 bit (the IEEE "hidden" bit).
Sorry Bob, but no. Obvious example is the number zero. It has no bits set. (Yes, I know it uses a special exponent code to indicate zero.)
Here we are going to have to disagree, because I teach IEEE floating point in my asm class.
Then you don't teach it very well, Sir.

I mean no disrespect, especially considering how much I respect your chess programming abilities and history. When I tell somebody that "Hyatt said..." I know you are right because I know you've tested it thourghly.

But if you are going to brag about teaching float in your asm class, then I have to stand up and say that I don't think you do it well. That in this case, you are wrong.
Zero is a special case. When the exponent field is all zero, we have an unnormalized (denormal) number (perfectly legal), which eliminates that hidden 1 bit. But you can't use it. For 32 bit IEEE, you have 1 sign bit, 8 exponent bits, and 32 - 1 - 8 leaves 23 bits, not 24. Ditto for 64 bit IEEE, which has 52 bits stored in the value, plus the hidden 1 bit that is always present except for non-normalized numbers.

But you show me any way to represent a 53 bit number with the high-order bit 0 or 1, and I'll kiss you. :)
The specs say that there *are* 52 bits of genuine data space physically available, with the 53rd bit being implicitly and unchangeably set as '1'. You are saying that as well.

So, for a 53 bit integer where the 53rd bit *is* set, then there's no problem. Because you have the high bit set and 52 bits of other data. The very thing the float specs explicitly allow with the 53rd bit being set but not physically present.
Not so fast. There _is_ a problem. Notably when I want to turn that 1 bit into a zero. That is no longer a "single bit operation". Which was my point.
No, it's not a single *bit* operation. It's a math operation.

FloatNum - 2^53.

Simple subtraction. The FPU will then perform the subtraction and renormalize.
"renormalize"? Shifts the entire fraction. I don't think you want to do that.
Doesn't matter what I *want*.

The reality is that's what the FPU does. So you live with it.

You work with abstractions instead.
Let's get back to the _real_ world. The chess board can not be abstracted, because it has to be evaluated, and I have to explicitly prepare patterns to operate on the board representation. If the bits are going to get moved, then I have no idea how to prepare any patterns for anything from generating moves to evaluating positions...

If I have squares A1 A3 and D5 set, then the FPU is going to put them wherever it wants. I have no real control over that (without jumping through lots of hoops and kludges etc. and I don't want to do that.)
Then this is a pointless discussion. Because I have to know where A1 is all the time. It can't "dance around" because I now need thousands of patterns to cover all the places where it might be, which is useless...

My original observations dealt with real-world realizable algorithms that might work. We have now gone far beyond that, to something that appears to have no usefulness no matter how far out into the twilight one might want to venture.

The entire concept of bitboards is that each square on the board is represented by a specific bit in the bitmap. If that is not valid, then nothing I have said previously applies, as the idea is not just bad, it is unworkable.

Sure, it'll normalize things and move the bits up or down in the FPU word. But, it keeps track of it. *It* knows where the bits are really located even if I don't.

So, if I later want to clear A1, then I load from a var that has A1 set. That var will also be normalized. (In fact, since it's a single bit, it'll actually be in that hidden 53rd bit.)

BUT, when I do Board - Sqr the FPU will automatically align things so that when it removes the A1 square from the board, it'll affect the right square.

That's just basic floating point math.

As long as you don't try to peek at the float's bits, it just doesn't matter because the math works.

Doesn't sound like it will work in a way that is _usable_ however. Because worrying about that 53rd bit that you ought not be using in the first place is going to cause _other_ bits to move around. So I now have to have patterns for cases where that barely usable bit is set and other patterns for where it is not set. That's beyond a kludge... The bits are all independent, and they can't move around and be usable by a real human programmer.



Since I'd be treating the numbers as floating point numbers, I don't care if the FPU normalizes them. It doesn't matter that bit 1 is really in bit position one. I let the FPU worry about aligning the bits properly to its math.

The numbers are no longer 64 bit sets but instead floating point numbers. It's a mental shift.
OK, that has a possibility. But it makes absolutely no sense to me, as to use bitboards, one has to visualize what the bits represent. For example, "is the E-file open?" The bits can't move around all over, or you can never visualize how to set up the masks to ask that kind of question.
That would require doing an AND operation. Which, as I've said before, is definitely a problem in floating point. AND & OR seem to be the only two operations that will be difficult to simulate or work around.

The only three things I can think of are:

1) To actually go ahead and un-normalize the numbers and use the data as indices into a truth table. Un-normalizing it is fairly simple. You add a huge value to it so the FPU's auto-normalization will force the bits down into the proper spot. This is trivial stuff. A few implementation issues, such as it would be more convenient to not put 53 bits into a double. Probably better to put 32 into both 'double's just to make things the same.

But that's implementation issues & personal choices. The point is, doing that kind of forced un-normalization has been around a long long time and is quite solid.

2) Do the software un-normalization like in #1 but skip the truth tables and cheat and just use integer AND & OR operations. Accept that it's too darn difficult to do AND & OR as floating point ops. I really don't like this option because I want to pretend we're on a cpu that either doesn't have integers or at least they are slow & difficult to use.

3) Still hoping that there might be some sort of MAGIC multiplication that could prepare the numbers for binary operations.

Converting 1 to the bit pair 01 and then add your other number. If the result is 10 then AND was true and your result is 1, else it's 0.

Trying to do that in large scale though.... I really don't know, but then I'm still amazed at all the MAGIC multiplications being done for bitboard attacks.


But anyway, I've said from the beginning that doing the AND & OR operations seem to be the big problems. It can be worked around, but not in a nice floating point way.


The concept of bitboards is that a specific bit represents a specific square. Going one level up as you are describing appears to me to be an impossible task, mentally...
It's not really. In fact, I bet you've been doing it for years.

Think of C's regular Long Long and doing operations on it.

If you do LL |= 1 or LL <<= 8 do you *really* know how the bits are arranged?
I _absolutely_ do know this. If I didn't I could not generate moves, evaluate positions, move pieces around, clear occupied squares, etc... I also know which bit is bit 0 and which bit is bit 63, and how they are distributed in between, and how each bit corresponds to a single square, etc... I care that the LSB is bit number 0, and I equate that to square A1. I care that the MSB is bit 63, and that square is H8. And I demand that the same bit be in the same place for all time. If the CPU wants to multiplex bits internally, I do not care, but I do care that the bits are numbered 0 thru 63, right to left, and that their positions _never_ change. This doesn't seem to fit the FP model you are describing, because I am going to have to pay attention to the exponent field to see if the bits have been shifted around to unusual locations...

C guarantees the operations will work the same regardless of how it chooses to arange things in the CPU.
Not quite. It guarantees that x << 1 multiplies x by 2. If the bits were not adjacent, with the rightmost bit as the LSB, this would not work. So the view of the integer registers that I am presented is one that is consistent and which does not change based on the contents of the value. The sign bit will always be the MSB. Negative numbers are always stored in 2's complement with the high-order bits set to 1, etc...

You could be working on a 64 bit big endian system. Or a 64 bit little endian system. Or a 32 bit little endian system simulating 64 bit integers. Or a 32 bit big endian system simulating 64 bit integers. Or even an old 8 bit micro that is simulating 64 bit integers.
Endian is a memory issue, not an internal data representation issue. The bits are always numbered from LSB to MSB, right to left, when you work on them. I don't care how they are stored in memory. I very much care how they are ordered when in a register where I can get at 'em and do something with them.

C even allows the bits to be out of order in the CPU registers!! As long as its consistant and does the operations correctly, C doesn't care.
Now we are out of the real world. I know where the LSB of a word is. C knows, otherwise shift and multiply would not work correctly. And there is the problem of BSF/BSR as well, which also correlate a specific bit to a specific number. C has the nasty habit of presenting binary data to me exactly as it is presented by the CPU. The ISA of the machine specifies all this in clear detail and C complies perfectly, at least for the cases I have used.

So the "C doesn't care" is a hyperbolic statement, because C does specify how its data types are represented and how the various operators modify those values. C certainly doesn't care how the _cpu pipelines_ do their thing, but it definitely cares about the bit order. otherwise signed/unsigned could not work at any level.

Carey
Posts: 313
Joined: Wed Mar 08, 2006 8:18 pm

Re: Bitboards using 2 DOUBLE's??

Post by Carey »

Big snipping of dead wood to get this thing down to size....
bob wrote:
Carey wrote: Doesn't matter what I *want*.

The reality is that's what the FPU does. So you live with it.

You work with abstractions instead.
Let's get back to the _real_ world. The chess board can not be abstracted, because it has to be evaluated, and I have to explicitly prepare patterns to operate on the board representation. If the bits are going to get moved, then I have no idea how to prepare any patterns for anything from generating moves to evaluating positions...
Simple Bob, the FPU will move them around as needed. You don't have to worry about them.

Instead of worrying that A1 is in bit position 0, you instead know that A1 has a *value* of 2^0 and B1 has a value of 2^1 and so on. (Or however you want to rotate the bitboard and whether you want A1, H1 A8 or H8 to start at 2^0....)

It's *exactly* like doing an FPU addition.

If you are adding 1 + 234235437 then nothing is going to be aligned while in the FPU register or stored in memory.

The '1' is going to be in that hidden 53rd bit and the other number is going to be spread out over a couple dozen upper bits.

But, as soon as you do FADD, the fpu is automagically going to align those binary points and do the math right.

With the exception of the binary AND & OR (which are a problem), all the rest will work out as plain ordinary FPU math operations.

The FPU doesn't know it's really representing a chess bitmap. Since we are treating it like a number, it's happy and will give us the right answer.
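A two-line check of that alignment claim (illustrative): the FPU lines the operands up itself, so the sum comes out exact even though the two numbers use completely different exponents internally.

#include <stdio.h>

int main(void) {
    double a = 1.0, b = 234235437.0;
    printf("%.0f\n", a + b);        /* prints 234235438, exactly */
    return 0;
}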


If I have squares A1 A3 and D5 set, then the FPU is going to put them wherever it wants. I have no real control over that (without jumping through lots of hoops and kludges etc. and I don't want to do that.)
Then this is a pointless discussion. Because I have to know where A1 is all the time. It can't "dance around" because I now need thousands of patterns to cover all the places where it might be, which is useless...

My original observations dealt with real-world realizable algorithms that might work. We have now gone far beyond that, to something that appears to have no usefulness no matter how far out into the twilight one might want to venture.

The entire concept of bitboards is that each square on the board is represented by a specific bit in the bitmap. If that is not valid, then nothing I have said previously applies, as the idea is not just bad, it is unworkable.
We are still working with essentially bitboards. We're just letting the FPU think it's really a number.

You insist on treating them as a SET. A classic Pascal style SET. Which technically they are.

I'm treating them like they were numbers. Something the FPU is pretty good at doing (with the exception of AND & general purpose ORs.)

You shift by 8, I multiply by 256.

There really isn't that much difference, Bob.

Honestly.

There's very little difference in the two (with the exception of certain areas) except which ALU operates on the bitboard.

In yours, its the integer ALU. In my idea, it's the FPU's ALU.


Sure, it'll normalize things and move the bits up or down in the FPU word. But, it keeps track of it. *It* knows where the bits are really located even if I don't.

So, if I later want to clear A1, then I load from a var that has A1 set. That var will also be normalized. (In fact, since it's a single bit, it'll actually be in that hidden 53rd bit.)

BUT, when I do Board - Sqr the FPU will automatically align things so that when it removes the A1 square from the board, it'll affect the right square.

That's just basic floating point math.

As long as you don't try to peek at the float's bits, it just doesn't matter because the math works.
Doesn't sound like it will work in a way that is _usable_ however. Because worrying about that 53rd bit that you ought not be using in the first place is going to cause _other_ bits to move around. So I now have to have patterns for cases where that barely usable bit is set and other patterns for where it is not set. That's beyond a kludge... The bits are all independent, and they can't move around and be usable by a real human programmer.
Hmmm.... Maybe I see part of your problem....

You are thinking of bitboard patterns. And what do you do with bitboard patterns? You end up AND'ing them against each other.

In many cases, where you know you are setting or clearing a bit that is already clear or set, you can do it with addition & subtraction.

Try it with LongLong. It'll work. Stuff it into an Intel Long Double and it'll work. Stick it into some software based decimal math package and do it and it'll still work.

General purpose OR's are a problem though. Fortunately they are rare.

With the AND, though, you are definitely going to have a problem. The FPU just doesn't do that. (Unless somebody pops up with a MAGIC mul way to do even small ANDs.)

For general AND's, we will have to align things and do them with the CPU's integer hardware or via table lookups because I don't see any reasonable way to do them in the FPU.

But I've said that since probably the first or second message, so that's not some new admission on my part.


Now when you do AND's, that 53rd bit can be a problem because the FPU doesn't know how to do an AND and you have to do it yourself.

BUT, if you are setting or clearing bits that you already know the state of (like in moving a piece), then it's not a problem. Addition and subtraction will work properly and the FPU will deal with that hidden 53rd bit just like it's designed to.

That 53rd bit can only be a problem when you try to do a general purpose AND, like if you are masking an evaluation pattern.

And the only reason that becomes a problem is that you are no longer trying to do things in a math oriented way. You are trying to shift back and do things in classic SET ways.

Is there a math oriented way to do AND that will make the FPU happy? Dunno. I don't know of any, but I can't rule it out.

The concept of bitboards is that a specific bit represents a specific square. Going one level up as you are describing appears to me to be an impossible task, mentally...
It's not really. In fact, I bet you've been doing it for years.

Think of C's regular Long Long and doing operations on it.

If you do LL |= 1 or LL <<= 8 do you *really* know how the bits are arranged?
I _absolutely_ do know this. If I didn't I could not generate moves, evaluate positions, move pieces around, clear occupied squares, etc... I also know which bit is bit 0 and which bit is bit 63, and how they are distributed in between, and how each bit corresponds to a single square, etc... I care that the LSB is bit number 0, and I equate that to square A1. I care that the MSB is bit 63, and that square is H8. And I demand that the same bit be in the same place for all time. If the CPU wants to multiplex bits internally, I do not care, but I do care that the bits are numbered 0 thru 63, right to left, and that their positions _never_ change. This doesn't seem to fit the FP model you are describing, because I am going to have to pay attention to the exponent field to see if the bits have been shifted around to unusual locations...
Gotcha.

No, Bob, you don't know how the CPU stores the bits.

Take a simple, generic C based chess program around to nearly every system and it's going to work right.

Why? Because the C language requires that it work right. And the CPU takes steps to make sure things work right, no matter how it chooses to do things internally. If a CPU can't do it right, C compilers have to fake it in software. (Think 'long long' on a 16 or 32 bit system.)

It may be a little, big or even middle endian system.

Even the CPU registers themselves may store the bytes of a chess board in some out of order way.

All you know is that the C language organizes things as a series of 64 bits. You assume that 0x000000001 really is the lowest bit in the cpu register. (A safe assumption, but still technically an assumption.)

But you don't know what really goes on in the CPU. For all you know, the CPU *could* be storing those integer values as FPU data internally and only converting to pure integer when storing them.

Or it could even be a decimal based computer. C would still work right and it would generate appropriate (complicated) code to do those 'simple' ANDs.

You don't know because the CPU and compiler hides all that from you.

You just said you don't care how the CPU handles them internally. Fine.

As long as the C oriented operations you do give you the right answer, you don't care.

So why are you so upset about the idea of it being in an FPU register? With the exception of AND, it'll still give you the same answer provided you concentrate on what the FPU can do well.

Meaning only clear bits that are already set or set bits that are already clear. Meaning addition and subtraction.

And so on.

The only thing you can't do reasonably easily is a general AND.
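
For completeness, the integer fallback for a general AND looks something like this, per 32-square half (same made-up two-doubles layout as earlier). The round trip through integer registers is exactly what the scheme is trying to avoid, which is why ANDs are the weak spot:

#include <stdint.h>

static double fb_and_half(double a, double b)
{
    uint32_t ia = (uint32_t)a;     /* exact: the half's value is < 2^32     */
    uint32_t ib = (uint32_t)b;
    return (double)(ia & ib);      /* exact: result fits in 53-bit mantissa */
}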


C guarantees the operations will work the same regardless of how it chooses to arrange things in the CPU.
Not quite. It guarantees that x << 1 multiplies x by 2. If the bits were not adjacent, with the rightmost bit as the LSB, this would not work. So the view of the integer registers that I am presented with is one that is consistent and which does not change based on the contents of the value. The sign bit will always be the MSB. Signed numbers are always stored in 2's complement, with the high-order bits set to 1 for negative values, etc...
Really...

What about doing it in decimal? Are you telling me that x<<1 still won't give x*2?

Of course it will.

Like you said, the C language requires (and common convention does anyway) that you be presented with a consistent interface & result.

That means what it wants to do internally is entirely up to it, as long as things look to you the way you expect.

The FPU does that too, with the exception of AND and general-purpose OR, provided you start treating the bitboard as a NUMBER rather than a SET.


You could be working on a 64 bit big endian system. Or a 64 bit little endian system. Or a 32 bit little endian system simulating 64 bit integers. Or a 32 bit big endian system simulating 64 bit integers. Or even an old 8 bit micro that is simulating 64 bit integers.
Endian is a memory issue, not an internal data representation issue. The bits are always numbered from LSB to MSB, right to left, when you work on them. I don't care how they are stored in memory. I very much care how they are ordered when in a register where I can get at 'em and do something with them.
Endian is a *storage* issue, not a memory issue.

You have endianness even in the CPU registers. Especially when you combine registers to make a longer one. But it knows how to make things work right.

The FPU is very similar. It stores the data in a way it likes. But it knows how to operate on it properly.
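
A small illustration of that: however the bytes are laid out in memory, composing two 32-bit halves with shifts and ORs gives the same 64-bit value on every host. The byte-order question only shows up when you store the value or peek at it through a char pointer:

#include <stdint.h>

static uint64_t combine(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}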





C even allows the bits to be out of order in the CPU registers!! As long as it's consistent and does the operations correctly, C doesn't care.
Now we are out of the real world. I know where the LSB of a word is. C knows; otherwise shift and multiply would not work correctly.
C would generate correct code to fake it.

(I'm not saying it does that. Just that if you were on a system where it had to do so, it would, and the program would still work correctly.)
And there is the problem of BSF/BSR as well, which also correlate a specific bit to a specific number. C has the nasty habit of presenting binary data to me exactly as it is presented by the CPU. The ISA of the machine specifies all this in clear detail and C complies perfectly, at least for the cases I have used.
I didn't say your system didn't behave like you think.

I said C allows quite a few odd things. And yes, that includes bits in the middle of words that don't affect many things.

That's straight from one of the C language designers themselves. I used to read a lot of P. J. Plauger's articles in CUJ and Embedded Systems Programming and in several of them he talked about some of the odd things that C has to tolerate. Some of the decisions they made, and why. And some of the odd architectures that existed & were being built that they wanted to allow C to run on.

C was deliberately specified in a way that it would work right even on odd architectures. If the cpu architecture itself couldn't handle something the way C wanted, then the compiler writer was obligated to fake it in software.

C is very flexible in many ways, but it does have certain requirements and expectations.
So the "C doesn't care" is a hyperbolic statement, because C does specify how its data types are represented and how the various operators modify those values. C certainly doesn't care how the _cpu pipelines_ do their thing, but it definitely cares about the bit order. otherwise signed/unsigned could not work at any level.
C doesn't care how a lot of the hardware operates. If the hardware can't do what C wants, C has to fake it. On most systems today, C rarely has to do much faking.

That's not just something I popped out of my mouth. That's from the C89 language designers themselves.

If you were running on a pure decimal system that provided enough decimal digits to hold 2^64, then your program would still work. (With a couple of issues like overflow, but that would cause problems even if you worked on a real 128-bit CPU, since you probably 'carelessly' depend on exactly 64 bits.)