two values in one integer

Rebel · Post by **Rebel** » Mon Jan 23, 2012 9:15 am

Sven Schüle wrote:
Pierre Bokma wrote: Hi friends,

I want my evaluation function to return two evaluations: one for the endgame and one for the middlegame. By scaling these two values, the value of the position will be calculated the way Fruit and Stockfish do it.

I am wondering: can this be done using only one integer? E.g. the endgame value in bits 0-15 and the midgame value in the other bits? I have experimented with this approach, but I cannot get it to work. Negative values in the least significant bits in particular cause problems.

Any help appreciated.
It can be done correctly, as discussed in detail in the thread given by Richard. In my opinion, a struct with two 16-bit values is easier to handle and causes less "headache".
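#include <cstdint>

struct Score {
    int16_t mgPart;
    int16_t egPart;
};

inline void setScore(Score & score, int16_t mg, int16_t eg) {
    score.mgPart = mg;
    score.egPart = eg;
}

inline void addScore(Score & score, int16_t mg, int16_t eg) {
    score.mgPart += mg;
    score.egPart += eg;
}

// Hypothetical bonus values, for illustration only
const int16_t RookOn7thRankBonusMG = 20;
const int16_t RookOn7thRankBonusEG = 40;

void example() {
    Score myScore = { 0, 0 };
    addScore(myScore, RookOn7thRankBonusMG, RookOn7thRankBonusEG);
}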
Hard to beat in terms of coding effort, correctness, and runtime performance.

Sven

Hi Sven,

Have you tried compiling the above and looking at the generated ASM code?

I find it hard to believe that it will be more efficient than just 2 x add:

  add     EDI,100                  // midgame
  add     ESI,50                   // endgame

or, in 64-bit mode, since the processor has all these extra registers:

  add     r9d,100                  // midgame
  add     r8d,50                   // endgame

It just runs in one clock cycle.

Sven · Post by **Sven** » Mon Jan 23, 2012 1:29 pm

Rebel wrote:
Sven Schüle wrote: SNIP (struct solution ...)

Hard to beat in terms of coding effort, correctness, and runtime performance.
Hi Sven,

Have you tried compiling the above and looking at the generated ASM code?

I find it hard to believe that it will be more efficient than just 2 x add:
  add     EDI,100                  // midgame
  add     ESI,50                   // endgame 
or, in 64-bit mode, since the processor has all these extra registers:
  add     r9d,100                  // midgame
  add     r8d,50                   // endgame
It just runs in one clock cycle.

Hi Ed,

what did you mean by "compile the above": the "struct" solution or the "tricky" solution? I guess you mean the latter, since the "struct" solution IS "2x add", only the C++ source looks different.

Sven

Rebel · Post by **Rebel** » Mon Jan 23, 2012 3:32 pm

My worry is that packing two 16-bit scores into one 32-bit integer will result in slower code, as nothing can beat:

  add     r9d,100                  // midgame 
  add     r8d,50                   // endgame

because the processor can execute the two instructions at the same time. In other words, the processor has basically done the work already; there is no need to pack two C statements into one.

I could not compile your code, which is why I asked you to have a look yourself at what code your compiler generates.

Don · Post by **Don** » Mon Jan 23, 2012 3:44 pm

Rebel wrote: My worry is that packing two 16-bit scores into one 32-bit integer will result in slower code, as nothing can beat:
  add     r9d,100                  // midgame 
  add     r8d,50                   // endgame 
because the processor can execute the two instructions at the same time. In other words, the processor has basically done the work already; there is no need to pack two C statements into one.

I could not compile your code, which is why I asked you to have a look yourself at what code your compiler generates.

In my experience, packing sped up my code by half a percent, which to me means it is probably a speedup, but not enough to say with certainty. Anything much less than 1% can easily be noise.

In fact, it's especially hard to say because I reorganized the code to make this happen, so all I really got out of it was a code cleanup. My suggestion: if you are not packing already, it would be a big waste of time to start doing it this way.

Ed, are you working on a new program? I would LOVE to see a new 64-bit modernized version of Rebel (or whatever you decide to call it). We need more diversity at the top. The bitboard stuff is easy to get used to; you will quickly master it even if at first it seems different from what you are used to. Us old-timers need to show these punk kids a thing or two...

Or, as they say in the movies, we still have a few tricks up our sleeves.

Rebel · Post by **Rebel** » Mon Jan 23, 2012 4:20 pm

Don wrote:
In my experience, packing sped up my code by half a percent, which to me means it is probably a speedup, but not enough to say with certainty. Anything much less than 1% can easily be noise.

In fact, it's especially hard to say because I reorganized the code to make this happen, so all I really got out of it was a code cleanup. My suggestion: if you are not packing already, it would be a big waste of time to start doing it this way.

There is always external noise, indeed. I can imagine a small performance gain in a 32-bit compile, since packing saves one register and a PC in 32-bit mode has only 6-7 usable registers. Without packing, though, a 64-bit compile should deliver the fastest code, as it has so many extra registers.

Ed, are you working on a new program? I would LOVE to see a new 64-bit modernized version of Rebel (or whatever you decide to call it). We need more diversity at the top. The bitboard stuff is easy to get used to; you will quickly master it even if at first it seems different from what you are used to. Us old-timers need to show these punk kids a thing or two... Or, as they say in the movies, we still have a few tricks up our sleeves.

I am toying with the idea of converting my 32-bit ASM code to C in order to make use of 64-bit, and in the process replacing mailbox with bitboards. What's holding me back is not the conversion work (that's all nice and creative) but the inevitable process of bugs, bugs, Bugs, BUGS. From experience, I have learned to hate the latter with a passion.

Don · Post by **Don** » Mon Jan 23, 2012 4:54 pm

Rebel wrote:
I am toying with the idea of converting my 32-bit ASM code to C in order to make use of 64-bit, and in the process replacing mailbox with bitboards. What's holding me back is not the conversion work (that's all nice and creative) but the inevitable process of bugs, bugs, Bugs, BUGS. From experience, I have learned to hate the latter with a passion.

I'm pretty sure that if you convert to bitboards, it will make you want to rethink everything. Bitboards come with different trade-offs, but mostly positive ones. You don't just want a 64-bit clone of Rebel.

It will take you a while to get a really strong program so you should expect that and not get discouraged. Believe me, I know.

Gerd Isenberg · Post by **Gerd Isenberg** » Mon Jan 23, 2012 5:05 pm

Rebel wrote: My worry is that packing two 16-bit scores into one 32-bit integer will result in slower code, as nothing can beat:
  add     r9d,100                  // midgame 
  add     r8d,50                   // endgame 
because the processor can execute the two instructions at the same time. In other words, the processor has basically done the work already; there is no need to pack two C statements into one.

I could not compile your code, which is why I asked you to have a look yourself at what code your compiler generates.

32-bit registers would be fine, but having two 16-bit short ints is a minor issue: working with 16-bit registers/instructions carries penalties and has weak compiler/optimizer support on x86 or x86-64.

struct Score {
    int16_t mgPart;
    int16_t egPart;
};

; to avoid on x86
mov	 r8w, WORD PTR  [...]
mov	 r9w, WORD PTR  [... + 2]
add	 r8w, 100
add	 r9w, 50

add	 r8w, WORD PTR  [...]
add	 r9w, WORD PTR  [... + 2]...
...
mov	 WORD PTR  [...], r8w
mov	 WORD PTR  [... + 2], r9w

Rebel · Post by **Rebel** » Mon Jan 23, 2012 6:47 pm

Gerd Isenberg wrote: 32-bit registers would be fine, but having two 16-bit short ints is a minor issue: working with 16-bit registers/instructions carries penalties and has weak compiler/optimizer support on x86 or x86-64.

Yep, that's what I remember from the Pentium days. Apparently nothing has changed: 16-bit is (still) bad news.

wgarvin · Post by **wgarvin** » Mon Jan 23, 2012 9:47 pm

Rebel wrote:
Gerd Isenberg wrote: 32-bit registers would be fine, but having two 16-bit short ints is a minor issue: working with 16-bit registers/instructions carries penalties and has weak compiler/optimizer support on x86 or x86-64.
Yep, that's what I remember from the Pentium days. Apparently nothing has changed: 16-bit is (still) bad news.

Actually, any math you do on 16-bit quantities will probably be done in a 32-bit register because of the penalties that 16-bit instructions have (partial register stalls, false dependencies on previous contents, etc.). What x86 chips do have is excellent support for zero- or sign-extension into 32 or 64 bits while loading an 8- or 16-bit value from memory. So you can expect the compiler to use movzx or movsx to load it into a 32-bit register, do 32-bit math on it, and then store the 16 bits back when done. Occasionally it will have to insert extra instructions to preserve the semantics of the 16-bit type (truncation before multiplying, or something like that?). It's still a good idea to declare temporary variables as a 32-bit type and just truncate back to 8 or 16 bits when you store them in a data structure.

But the "two scores in one register" trick: Does it save instructions? Yes. Will it execute any faster? Maybe a cycle faster here or there, but probably you won't notice any difference. So is it worth doing at all? If so, only for code cleanliness reasons (thinking of a "Score Pair" for MG/EG and always updating both at the same time)

mcostalba · Post by **mcostalba** » Mon Jan 23, 2012 10:06 pm

wgarvin wrote:So is it worth doing at all? If so, only for code cleanliness reasons (thinking of a "Score Pair" for MG/EG and always updating both at the same time)

I fully second this. When we moved to Score Pair we measured a speedup of around 1% (in line with Don's measurements), but we also removed more than 80 lines of code, and the evaluation as a whole (that is where this is used most) became more readable. So we kept it.
