I want my evaluation function to return two evaluations, one for the endgame and one for the middlegame. The value of the position is then calculated by scaling between these two values, like Fruit and Stockfish do.
I am wondering, can this be done using only one integer? E.g. the endgame value in bits 0-15 and the midgame value in the other bits? I have experimented with this approach but I cannot get it to work. Negative values in the least significant bits in particular cause problems.
Any help appreciated.
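For readers who have not seen the scaling step mentioned in the question, here is a minimal sketch of a linear blend between the two values. The name `interpolate` and the constant `MAX_PHASE = 256` are assumptions made for illustration only; real engines each define their own phase scale and rounding.

```cpp
#include <cassert>

// Hypothetical phase scale: MAX_PHASE means "pure middlegame", 0 means "pure endgame".
constexpr int MAX_PHASE = 256;

// Blend the middlegame and endgame values linearly according to the phase.
constexpr int interpolate(int mg, int eg, int phase) {
    return (mg * phase + eg * (MAX_PHASE - phase)) / MAX_PHASE;
}

// Small usage check: the two extremes return the corresponding value unchanged.
inline void interpolate_self_check() {
    assert(interpolate(100, 20, MAX_PHASE) == 100);
    assert(interpolate(100, 20, 0) == 20);
}
```

With these numbers a phase of 128 (halfway) gives 60, the midpoint of 100 and 20.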
It can be done correctly, as discussed in detail in the thread given by Richard. In my opinion a struct with two 16-bit values is easier to handle and causes less "headache".
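To make the "tricky" variant concrete, here is one way the packing could look, using the layout from the question (endgame in bits 0-15, middlegame above it). This is a sketch and not the code from the referenced thread; the function names are chosen for illustration, and it assumes a 32-bit two's-complement `int` and that both halves always stay within the signed 16-bit range. The key point is that a negative low half borrows 1 from the high half, so the high half has to be extracted after adding 0x8000.

```cpp
#include <cassert>
#include <cstdint>

// Pack: middlegame in the upper 16 bits, endgame in the lower 16 bits.
// The work is done in unsigned arithmetic so that negative inputs are well defined.
constexpr int make_score(int mg, int eg) {
    return static_cast<int>((static_cast<unsigned>(mg) << 16) + static_cast<unsigned>(eg));
}

// Low half: take bits 0-15 and reinterpret them as a signed 16-bit value.
constexpr int eg_value(int s) {
    return static_cast<int16_t>(static_cast<uint16_t>(static_cast<unsigned>(s)));
}

// High half: adding 0x8000 first undoes the borrow caused by a negative low half.
constexpr int mg_value(int s) {
    return static_cast<int16_t>(
        static_cast<uint16_t>((static_cast<unsigned>(s) + 0x8000u) >> 16));
}

// Usage check: one integer addition updates both halves at once.
inline void packed_score_self_check() {
    const int sum = make_score(10, -5) + make_score(-3, 7);
    assert(mg_value(sum) == 7);
    assert(eg_value(sum) == 2);
}
```

Extracting the high half with a plain `s >> 16` is the usual cause of the "negative values in the least significant bits" problem: it comes out one too low whenever the endgame half is negative.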
What did you mean by "compile the above": the "struct" solution or the "tricky" solution? I guess you mean the latter, since the "struct" solution IS "2x add", only the C++ source looks different.
due to the capability of the processor to execute 2 instructions at the same time. Meaning that basically the processor has done the work already, no need to pack 2 C-instructions into 1.
I could not compile your code, hence I asked you to have a look yourself at what code your compiler generates.
My experience has been that the packing has sped up my code by about half a percent, which to me means it is probably a speedup but not enough to say with certainty. Anything much less than 1% can easily be noise.
In fact, it's especially hard to say as I reorganized the code to make this happen and so all I really got out of it was a code cleanup. My suggestion is that if you are not packing it would be a big waste of time to start doing it this way.
Ed, are you working on a new program? I would LOVE to see a new 64-bit modernized version of Rebel (or whatever you decide to call it). We need to have more diversity at the top. The bitboard stuff is easy to get used to; you will quickly master it even if at first it seems different from what you are used to. Us old-timers need to show these punk kids a thing or two ... or as they say in the movies, we still have a few tricks up our sleeves.
There is always external noise indeed. I can imagine a small performance gain with a 32-bit compile, since packing saves one register and the PC in 32-bit mode has only 6-7 usable ones. Without packing, though, a 64-bit compile should deliver the fastest code, as it has so many extra registers.
I am toying with the idea of converting my 32-bit ASM stuff to C in order to make use of 64-bit, and in the process indeed replacing mailbox with bitboards. What's holding me back is not the conversion work, that's all nice and creative, but the inevitable process of bugs, bugs, Bugs, BUGS. Experience has taught me to hate the latter with a passion.
I'm pretty sure that if you convert to bitboards, it will cause you to want to rethink everything. Bitboards come with different trade-offs, but mostly positive. You don't just want a 64-bit clone of Rebel.
It will take you a while to get a really strong program so you should expect that and not get discouraged. Believe me, I know.
32-bit registers would be fine, but a minor issue is to have two 16-bit short ints. Working with 16-bit registers/instructions has penalties and weak compiler/optimizer support on x86 or x86-64.
Gerd Isenberg wrote: 32-bit registers would be fine, but a minor issue is to have two 16-bit short ints. Working with 16-bit registers/instructions has penalties and weak compiler/optimizer support on x86 or x86-64.
Yep. That's what I remember from the Pentium days. Apparently nothing has changed since then. 16-bit is (still) bad news.
struct Score {
    int16_t mgPart;
    int16_t egPart;
};
; to avoid on x86
mov r8w, WORD PTR [...]
mov r9w, WORD PTR [... + 2]
add r8w, 100
add r9w, 50
add r8w, WORD PTR [...]
add r9w, WORD PTR [... + 2]...
...
mov WORD PTR [...], r8w
mov WORD PTR [... + 2], r9w
Actually, any math you do on 16-bit quantities will probably be done in a 32-bit register because of the penalties that 16-bit instructions have (partial register stalls or false dependencies on previous contents, etc.). What x86 chips do have is excellent support for zero- or sign-extension into 32 or 64 bits while loading from an 8- or 16-bit memory value. So you can expect the compiler to do a movzx or movsx to load it into a 32-bit register, do 32-bit math on it, and then store back the 16 bits when done. Occasionally it will have to insert extra instructions to preserve the semantics of the 16-bit type (truncation before multiplying, or something like that?). It's still a good idea to declare temporary variables as a 32-bit type and just truncate them back to 8 or 16 bits when you store them in a data structure.
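As an illustration of that last piece of advice, here is a small sketch that keeps the 16-bit fields for storage only and does the arithmetic in plain `int` temporaries. The helper name `add_bonus` is hypothetical, and what instructions a particular compiler actually emits for it is something to check in the generated assembly rather than assume.

```cpp
#include <cassert>
#include <cstdint>

// Storage type: two 16-bit fields, as in the struct shown earlier in the thread.
struct Score {
    int16_t mgPart;
    int16_t egPart;
};

// Hypothetical helper: add a bonus using 32-bit temporaries.
inline void add_bonus(Score& s, int mgBonus, int egBonus) {
    int mg = s.mgPart;  // widened to int on load
    int eg = s.egPart;
    mg += mgBonus;      // arithmetic is done on plain ints
    eg += egBonus;
    s.mgPart = static_cast<int16_t>(mg);  // narrowed to 16 bits only on store
    s.egPart = static_cast<int16_t>(eg);
}

// Usage check with the same constants as the assembly listing (100 and 50).
inline void add_bonus_self_check() {
    Score s{100, -50};
    add_bonus(s, 100, 50);
    assert(s.mgPart == 200);
    assert(s.egPart == 0);
}
```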
But the "two scores in one register" trick: Does it save instructions? Yes. Will it execute any faster? Maybe a cycle faster here or there, but probably you won't notice any difference. So is it worth doing at all? If so, only for code cleanliness reasons (thinking of a "Score Pair" for MG/EG and always updating both at the same time)
wgarvin wrote: So is it worth doing at all? If so, only for code cleanliness reasons (thinking of a "Score Pair" for MG/EG and always updating both at the same time)
I fully second this. When we moved to Score Pair we measured a speedup of around 1% (in line with Don's measurements), but we also removed more than 80 lines of code, and the evaluation overall (which is where this is used most) became more readable. So we kept it.
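To show where the readability gain comes from, here is a rough sketch of a pair type with overloaded operators, so that each evaluation term is a single line instead of separate MG and EG updates. The type `ScorePair`, the bonus names and their values are all made up for illustration and are not taken from any engine discussed in this thread.

```cpp
#include <cassert>

// A middlegame/endgame pair that is always updated as one unit.
struct ScorePair {
    int mg;
    int eg;
};

constexpr ScorePair operator+(ScorePair a, ScorePair b) {
    return ScorePair{a.mg + b.mg, a.eg + b.eg};
}

constexpr ScorePair operator*(ScorePair a, int n) {
    return ScorePair{a.mg * n, a.eg * n};
}

inline ScorePair& operator+=(ScorePair& a, ScorePair b) {
    a.mg += b.mg;
    a.eg += b.eg;
    return a;
}

// Hypothetical bonus values, for illustration only.
constexpr ScorePair KnightOutpost{20, 5};
constexpr ScorePair RookOpenFile{25, 10};

// Each term updates both halves in one statement.
inline ScorePair example_eval(int outposts, int openFiles) {
    ScorePair score{0, 0};
    score += KnightOutpost * outposts;
    score += RookOpenFile * openFiles;
    return score;
}

inline void score_pair_self_check() {
    const ScorePair r = example_eval(2, 1);
    assert(r.mg == 65);
    assert(r.eg == 20);
}
```

Whether the pair is a struct like this or a packed integer underneath is then an implementation detail hidden behind the operators.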