couple of questions about stockfish code ?

BeyondCritics
Posts: 400
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: couple of questions about stockfish code ?

Post by BeyondCritics »

I like that code, although I noticed that bug too. I think you know what you are doing, and that shows clearly in the implementation. I believe I can improve on it a tiny bit, but first I have to figure out how to work with fishtest. Please be patient.
BeyondCritics
Posts: 400
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Post by BeyondCritics »

I figured something out, but I somehow need an arithmetic right shift (it is ironic, I know). Would you assume that every Stockfish target uses an arithmetic right shift, or is it better not to rely on that?
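
On the portability question: before C++20, right-shifting a negative signed integer gives an implementation-defined result, so the language itself does not guarantee an arithmetic shift, even though mainstream compilers provide one. Below is a minimal sketch of a shift that avoids depending on that behaviour. The name `asr32` is hypothetical and not taken from Stockfish, and the code assumes a 32-bit `int`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Stockfish code): arithmetic right shift of a
// 32-bit signed value that never shifts a negative number.
inline std::int32_t asr32(std::int32_t x, unsigned n) {
    assert(n < 32);  // shifting by the full width or more is undefined
    // For x < 0, ~x is non-negative, so the shift is fully defined;
    // complementing the result again propagates the sign bit.
    return x < 0 ? ~(~x >> n) : (x >> n);
}
```

On compilers that already shift arithmetically, an optimizer can usually reduce this to a single shift instruction, but that should be checked on the actual target rather than assumed.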
Karlo Bala
Posts: 373
Joined: Wed Mar 22, 2006 10:17 am
Location: Novi Sad, Serbia
Full name: Karlo Balla

Post by Karlo Bala »

Fulvio wrote:
Sven Schüle wrote: Would you see a difference between a struct of two 32-bit integers and a struct of two 16-bit integers?
This is a wonderful tool:
https://godbolt.org/
I quickly tried this code:

Code:

#include <stdint.h>

int main() {
  volatile struct { int a; int b; } test1;
  test1.a += 1;
  test1.b += 1;
  volatile struct { int16_t a; int16_t b; } test2;
  test2.a += 1;
  test2.b += 1;
}
and clang on x86-64 compiles to

Code:

        inc     dword ptr [rsp - 4]
        inc     dword ptr [rsp - 8]
        inc     word ptr [rsp - 10]
        inc     word ptr [rsp - 12]
        xor     eax, eax
        ret
so the only difference here is the size of the object.
This makes sense considering the implicit integer promotions:

Code:

int16_t a, b;
a + b;
in reality is:

Code:

static_cast<int>(a) + static_cast<int>(b);
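
The promotion described above can be checked at compile time. The following is a small sketch, not part of the original post; `check_promotion` is a hypothetical name, and the runtime check assumes `int` is at least 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// The expression a + b on two int16_t operands has type int, not int16_t.
static_assert(std::is_same<decltype(std::int16_t{} + std::int16_t{}), int>::value,
              "int16_t operands are promoted to int before the addition");

// Hypothetical demo: the sum is computed in int, so it does not wrap at
// 16 bits unless it is stored back into an int16_t.
inline void check_promotion() {
    const std::int16_t a = 30000;
    const std::int16_t b = 30000;
    assert(a + b == 60000);  // assumes int is wider than 16 bits
}
```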
Here is what Agner Fog says (http://www.agner.org/optimize/optimizing_cpp.pdf):

The compiler will always select the most efficient integer size if you declare an int, without specifying the size. Integers of smaller sizes (char, short int) are only slightly less efficient. In many cases, the compiler will convert these types of integers to the default size when doing calculations, and then use only the lower 8 or 16 bits of the result. You can assume that the type conversion takes zero or one clock cycle. In 64 bit systems, there is only a minimal difference between the efficiency of 32 bit integers and 64 bit integers, as long as you are not doing divisions.

It is recommended to use the default integer size in cases where the size does not matter and there is no risk of overflow, such as simple variables, loop counters, etc. In large arrays, it may be preferred to use the smallest integer size that is big enough for the specific purpose in order to make better use of the data cache. Bit fields of sizes other than 8, 16, 32 and 64 bits are less efficient. In 64 bit systems, you may use 64 bit integers if the application can make use of the extra bits.
Best Regards,
Karlo Balla Jr.
mar
Posts: 2564
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Post by mar »

Fulvio wrote:I already posted the link that explains that
You miss the point.
You waste 1 register that might be used to cache something else; the compiler (optimizer actually) might then need to spill and reload (this is slow).

Similar thing holds for pipeline and instruction cache as well.

Also (from my experience) caching temporaries in 16-bit shorts can be measurably slower than using ints.

The only rule that always holds is to measure everything, but that's obvious.

Sometimes results are unexpected and surprising: I've seen cases where an extra spill made a tight inner loop run faster.

Compiler X can generate faster code than compiler Y in most cases, but there can be cases where Y outperforms X.

Code A can run faster on processor X while code B runs faster on processor Y (even if you use the same compiler).
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Post by syzygy »

syzygy wrote:
Fulvio wrote:The only thing in which we agree is "you should test and measure" that contrasts pretty badly with your "I'm pretty sure using a struct of two 16-bit ints would be measurably slower"
"Pretty sure" allows for the possibility that I turn out to be dead wrong, in which case I will simply have to admit that and will do so. But I don't think you'll prove me wrong here. And I'm talking about Stockfish, not about a simple loop that does not suffer from register pressure.

The reason for being "pretty sure" is that a single register for holding 1 value is pretty certain to be more efficient than two registers for holding 2 values. And while it is true that modern CPUs can perform many operations in parallel, reducing the number of operations is not going to hurt and will leave execution units free for performing other operations.
https://github.com/syzygy1/Stockfish/co ... ore_struct

The slowdown is about 3.5% on my system.

Please let me know if the operators can somehow be implemented more efficiently.

I hope we don't get a discussion now on whether a 3.5% slowdown should be somehow acceptable for a program like Stockfish just to make a few lines of code in types.h perhaps easier to read. (Code that one does not even need to understand when working on any other part of Stockfish.)
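
For readers following the argument, the two representations being compared can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the linked branch or from Stockfish's types.h: all names (`PackedScore`, `make_packed`, `mg_of`, `eg_of`, `StructScore`) are hypothetical, the packed variant is kept in an unsigned word so that addition wraps in a well-defined way, and `int` is assumed to be at least 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Illustration only; names are hypothetical, not taken from Stockfish.

// Variant 1: both 16-bit halves packed into one 32-bit word
// (endgame value in the upper half, middlegame value in the lower half).
// Two packed scores are added with a single 32-bit addition.
using PackedScore = std::uint32_t;

inline PackedScore make_packed(int mg, int eg) {
    assert(mg >= -32768 && mg <= 32767);
    assert(eg >= -32768 && eg <= 32767);
    return (static_cast<std::uint32_t>(eg) << 16) + static_cast<std::uint32_t>(mg);
}

// Interpret the low 16 bits of `bits` as a signed value.
inline int to_signed16(std::uint32_t bits) {
    return bits >= 0x8000u ? static_cast<int>(bits) - 0x10000
                           : static_cast<int>(bits);
}

inline int mg_of(PackedScore s) { return to_signed16(s & 0xFFFFu); }

// Adding 0x8000 undoes the borrow that a negative mg took from the upper half.
inline int eg_of(PackedScore s) { return to_signed16((s + 0x8000u) >> 16); }

// Variant 2: a plain struct of two 16-bit members.
// Adding two scores takes two separate 16-bit additions.
struct StructScore {
    std::int16_t mg;
    std::int16_t eg;
};

inline StructScore operator+(StructScore a, StructScore b) {
    return StructScore{static_cast<std::int16_t>(a.mg + b.mg),
                       static_cast<std::int16_t>(a.eg + b.eg)};
}
```

In the packed variant, `make_packed(100, -200) + make_packed(-30, 50)` is one addition whose halves decode to 70 and -150; the struct variant reaches the same values with one addition per member. Which of the two is faster in a full engine is exactly what the measurement above is about.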
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Post by Sven »

syzygy wrote:
syzygy wrote:
Fulvio wrote:The only thing in which we agree is "you should test and measure" that contrasts pretty badly with your "I'm pretty sure using a struct of two 16-bit ints would be measurably slower"
"Pretty sure" allows for the possibility that I turn out to be dead wrong, in which case I will simply have to admit that and will do so. But I don't think you'll prove me wrong here. And I'm talking about Stockfish, not about a simple loop that does not suffer from register pressure.

The reason for being "pretty sure" is that a single register for holding 1 value is pretty certain to be more efficient than two registers for holding 2 values. And while it is true that modern CPUs can perform many operations in parallel, reducing the number of operations is not going to hurt and will leave execution units free for performing other operations.
https://github.com/syzygy1/Stockfish/co ... ore_struct

The slowdown is about 3.5% on my system.

Please let me know if the operators can somehow be implemented more efficiently.

I hope we don't get a discussion now on whether a 3.5% slowdown should be somehow acceptable for a program like Stockfish just to make a few lines of code in types.h perhaps easier to read. (Code that one does not even need to understand when working on any other part of Stockfish.)
Thank you for testing it. A slowdown of 3.5% confirms, in my view, that both solutions differ only marginally in performance. The rating difference, if it is possible to measure it at all, would be around 3-4 Elo points based on the well-known rule "doubling speed = roughly 70-80 Elo". I will certainly accept that a world class engine should not give away those 3-4 points (so no discussion about it from my side ;-) ). In my own engine which is still more than 1000 Elo weaker than SF I will stick to the "struct" solution, however, for the simple reason that it does not hurt my brain :-) And I also believe, after the discussion above, that it is a solution that simply works everywhere, on each relevant platform, without thinking a lot about it, while we are not 100% sure about it with the "sophisticated" solution. Please note that we were not talking about "readability" here, the doubts were mostly about being "legal" C++/C++11 code or not.

One last point: how did you determine the value of 3.5% slowdown? Did you perform a fixed-depth search for a number of different positions, or the usual SF benchmark, or something else?
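
The 3-4 Elo estimate in this post follows from the quoted rule of thumb if Elo is taken to scale with the base-2 logarithm of speed. A minimal sketch of that arithmetic is below; `elo_loss` is a hypothetical helper, and it assumes that "3.5% slowdown" means the runtime grows by a factor of 1.035 and that the per-doubling figure can be applied to such a small change.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: Elo lost when the runtime grows by `slowdown`
// (0.035 for 3.5%), assuming a fixed Elo gain per doubling of speed.
inline double elo_loss(double slowdown, double elo_per_doubling) {
    assert(slowdown > -1.0);  // the speed ratio must stay positive
    return elo_per_doubling * std::log2(1.0 + slowdown);
}
```

With 70 and 80 Elo per doubling, `elo_loss(0.035, 70.0)` and `elo_loss(0.035, 80.0)` come out at roughly 3.5 and 4.0, which matches the range given in the post.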
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Post by syzygy »

Sven Schüle wrote:
syzygy wrote:I hope we don't get a discussion now on whether a 3.5% slowdown should be somehow acceptable for a program like Stockfish just to make a few lines of code in types.h perhaps easier to read. (Code that one does not even need to understand when working on any other part of Stockfish.)
Thank you for testing it. A slowdown of 3.5% confirms, in my view, that both solutions differ only marginally in performance.
So we do get the discussion, but I am not going to join it. An overall slowdown of 3.5% is massive for such a change. End of discussion for me.
One last point: how did you determine the value of 3.5% slowdown? Did you perform a fixed-depth search for a number of different positions, or the usual SF benchmark, or something else?
bench 128 1 20 (and I am able to get very reproducible timings on my system)
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Post by Sven »

syzygy wrote:
Sven Schüle wrote:
syzygy wrote:I hope we don't get a discussion now on whether a 3.5% slowdown should be somehow acceptable for a program like Stockfish just to make a few lines of code in types.h perhaps easier to read. (Code that one does not even need to understand when working on any other part of Stockfish.)
Thank you for testing it. A slowdown of 3.5% confirms, in my view, that both solutions differ only marginally in performance.
So we do get the discussion, but I am not going to join it. An overall slowdown of 3.5% is massive for such a change. End of discussion for me.
Ending a discussion that I did not start was of course easier than continuing to read what I wrote, and then quoting me correctly:
Sven Schüle wrote:I will certainly accept that a world class engine should not give away those 3-4 points (so no discussion about it from my side ;-) )
Karlo Bala
Posts: 373
Joined: Wed Mar 22, 2006 10:17 am
Location: Novi Sad, Serbia
Full name: Karlo Balla

Post by Karlo Bala »

syzygy wrote: https://github.com/syzygy1/Stockfish/co ... ore_struct

The slowdown is about 3.5% on my system.

Please let me know if the operators can somehow be implemented more efficiently.

I hope we don't get a discussion now on whether a 3.5% slowdown should be somehow acceptable for a program like Stockfish just to make a few lines of code in types.h perhaps easier to read. (Code that one does not even need to understand when working on any other part of Stockfish.)
It would be interesting to see the results when using int32_t instead of int16_t.
Best Regards,
Karlo Balla Jr.
syzygy
Posts: 5569
Joined: Tue Feb 28, 2012 11:56 pm

Post by syzygy »

Karlo Bala wrote:
syzygy wrote: https://github.com/syzygy1/Stockfish/co ... ore_struct

The slowdown is about 3.5% on my system.

Please let me know if the operators can somehow be implemented more efficiently.

I hope we don't get a discussion now on whether a 3.5% slowdown should be somehow acceptable for a program like Stockfish just to make a few lines of code in types.h perhaps easier to read. (Code that one does not even need to understand when working on any other part of Stockfish.)
It would be interesting to see the results when using int32_t instead of int16_t.
I tested that too and somewhat surprisingly it was again slower than using a struct with two int16_t members.