couple of questions about stockfish code ?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

BeyondCritics
Posts: 396
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: couple of questions about stockfish code ?

Post by BeyondCritics »

Hint: It would be helpful for others, to post complete compilable code.
Anyway, if i enter this into http://gcc.godbolt.org/:

Code: Select all

#include <cstdint>

enum Value &#58; int32_t &#123;&#125;;
enum Score &#58; int32_t &#123;&#125;;

Value mg_value&#40;Score s&#41; &#123; 
  int v = s & 0xffff; 
  return Value&#40;v < 0x8000 ? v &#58; v - 0x10000&#41;; 
&#125;
and use these options:

Code: Select all

-std=c++11 -O3
i get this for all tried compilers targeting x86-64:

Code: Select all

mg_value&#40;Score&#41;&#58;
        movzx   edi, di
        lea     eax, &#91;rdi-65536&#93;
        cmp     edi, 32768
        cmovl   eax, edi
        ret
I believe this does as expected. The first assembly instruction implements the first line of the function. Next the second expression of the ?-operator is calculated. Seemingly as an optimization, the compiler uses the 64-bit register rdi, to do this. This is not a problem, since the higher 32 bit of the outcome are discarded later. In the next step, the condition is evaluated and finally the correct alternative moved into eax, according to the flags set.

Therefore this variation is still correct, but far from optimal.
BeyondCritics
Posts: 396
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

2nProposal requiring arithmetic right shift + 2^s complement

Post by BeyondCritics »

Here is my final proposal:

Code: Select all

#include <climits>
#include <cstdint>

// SHRT_MIN < v <= SHRT_MAX must hold for all valid Values v.
enum Value &#58; int &#123;&#125;;
enum Score &#58; int &#123;&#125;;

/// mg, eg must be valid Values
Score make_score&#40;int mg, int eg&#41; &#123;
  static const int val_bits = 16;

  return Score&#40;&#40;eg << val_bits&#41; + mg&#41;;
&#125;

/// Extract the end game value from a score
Value eg_value&#40;Score s&#41; &#123;
  static const int val_bits = 16;
  static const int min_val = SHRT_MIN;

  return Value&#40;&#40;s - min_val&#41; >> val_bits&#41;;
&#125;

/// Extract the middle game value from a score
Value mg_value&#40;Score s&#41; &#123;
  return Value&#40;int16_t&#40;s&#41;);
&#125;
Enter this into http://gcc.godbolt.org/ and you will see, that it always elicits seemingly optimal assembly.
It relies on 2^s complement and arithmetic right shift though.
syzygy
Posts: 5557
Joined: Tue Feb 28, 2012 11:56 pm

Re: 2nProposal requiring arithmetic right shift + 2^s comple

Post by syzygy »

BeyondCritics wrote:Here is my final proposal:

Code: Select all

#include <climits>
#include <cstdint>

// SHRT_MIN < v <= SHRT_MAX must hold for all valid Values v.
enum Value &#58; int &#123;&#125;;
enum Score &#58; int &#123;&#125;;

/// mg, eg must be valid Values
Score make_score&#40;int mg, int eg&#41; &#123;
  static const int val_bits = 16;

  return Score&#40;&#40;eg << val_bits&#41; + mg&#41;;
&#125;

/// Extract the end game value from a score
Value eg_value&#40;Score s&#41; &#123;
  static const int val_bits = 16;
  static const int min_val = SHRT_MIN;

  return Value&#40;&#40;s - min_val&#41; >> val_bits&#41;;
&#125;

/// Extract the middle game value from a score
Value mg_value&#40;Score s&#41; &#123;
  return Value&#40;int16_t&#40;s&#41;);
&#125;
Enter this into http://gcc.godbolt.org/ and you will see, that it always elicits seemingly optimal assembly.
It relies on 2^s complement and arithmetic right shift though.
Getting optimal assembly is not so difficult. The much-criticised current SF approach already does that.

The question is how to do it reliably without undefined behavior and, preferably, without relying on any implementation-defined behavior.

Your make_score() has undefined behavior: left-shift of negative ints. Fixing this by casting eg to unsigned will give implementation-defined behavior: casting of unsigned int to int of values which don't all fit in int because they can be >= 0x80000000.

Your eg_value() relies on implementation-defined behavior: right-shift of negative ints.

Your mg_value() also relies on implementation-defined behavior: cast to int16_t of values outside the range of int16_t.

What I proposed earlier does not suffer from such problems (I believe), but icc seems unable to optimise it fully. gcc and clang have no problems with it.

Code: Select all

#include <cstdint>

enum Value &#58; int &#123; VALUE_ZERO &#125;; 
enum Score &#58; uint32_t &#123; SCORE_ZERO &#125;;

Score make_score&#40;int mg, int eg&#41; &#123;
  return Score&#40;&#40;uint32_t&#40;eg&#41; << 16&#41; + mg&#41;;
&#125;

inline int16_t cast_to_int16_t&#40;uint16_t v&#41; &#123;
  return v < 0x8000 ? v &#58; v - 0x10000;
&#125;

Value eg_value&#40;Score s&#41; &#123;
  return Value&#40;cast_to_int16_t&#40;&#40;s + 0x8000&#41; >> 16&#41;);
&#125;

Value mg_value&#40;Score s&#41; &#123;
  return Value&#40;cast_to_int16_t&#40;s&#41;);
&#125;
If we decide that full optimisation on all relevant compilers is more important than making sure it works on Xbox 360, then I propose:

Code: Select all

#include <cstdint>

enum Value &#58; int &#123; VALUE_ZERO &#125;; 
enum Score &#58; uint32_t &#123; SCORE_ZERO &#125;;

Score make_score&#40;int mg, int eg&#41; &#123;
  return Score&#40;&#40;uint32_t&#40;eg&#41; << 16&#41; + mg&#41;;
&#125;

Value eg_value&#40;Score s&#41; &#123;
  return Value&#40;int16_t&#40;&#40;s + 0x8000&#41; >> 16&#41;);
&#125;

Value mg_value&#40;Score s&#41; &#123;
  return Value&#40;int16_t&#40;s&#41;);
&#125;
Now both eg_value() and mg_value() rely on the compiler casting out-of-range values to the unique int16_t value that is equal to the original value mod 2^16 and that's it - if I don't overlook anything.

Alternatively, we could implement SF's approach correctly (icc 13.0.1 messes it up, but icc 16 and 17 get it):

Code: Select all

#include <cstdint>
#include <cstring>

enum Value &#58; int &#123; VALUE_ZERO &#125;;
enum Score &#58; uint32_t &#123; SCORE_ZERO &#125;;

Score make_score&#40;int mg, int eg&#41; &#123;
  return Score&#40;&#40;uint32_t&#40;eg&#41; << 16&#41; + mg&#41;;
&#125;

Value eg_value&#40;Score score&#41; &#123;
  int16_t s;
  uint16_t u = &#40;score + 0x8000&#41; >> 16;
  memcpy&#40;&s, &u, sizeof&#40;int16_t&#41;);
  return Value&#40;s&#41;;
&#125;

Value mg_value&#40;Score score&#41; &#123;
  int16_t s;
  uint16_t u = score;
  memcpy&#40;&s, &u, sizeof&#40;int16_t&#41;);
  return Value&#40;s&#41;;
&#125;
This relies on bit representation of int16_t and uint16_t being "the same". I think that can go wrong only in theory.
BeyondCritics
Posts: 396
Joined: Sat May 05, 2012 2:48 pm
Full name: Oliver Roese

Re: 2nProposal requiring arithmetic right shift + 2^s comple

Post by BeyondCritics »

syzygy wrote:Your make_score() has undefined behavior: left-shift of negative ints.
Damned, this is true. So please replace make_score with this:

Code: Select all

inline Score make_score&#40;int mg, int eg&#41; &#123;
  static const int val_bits = 16;

  return Score&#40;eg * &#40;1 << val_bits&#41; + mg&#41;;
&#125;
syzygy wrote:What I proposed earlier does not suffer from such problems (I believe), but icc seems unable to optimise it fully
You name it. Other variants i tried do not work well with icc, therefore i decided to go greater length. Even your high performance proposal does not work well with icc i believe, since the compiler fails to utilize the adress logic for calculation using LEA, which increases pressure on the alu.
syzygy wrote: Now both eg_value() and mg_value() rely on the compiler casting out-of-range values to the unique int16_t value that is equal to the original value mod 2^16 and that's it - if I don't overlook anything.
You rely on 2^s complement architecture, as i do. My further assumptions are less convenient, but it think could go wrong only in theory.

As a side note: make_score is not a total function (consider SHRT_MIN and operator-) in any case, which is noticeable since some tuning script might stumble over that.
corres
Posts: 3657
Joined: Wed Nov 18, 2015 11:41 am
Location: hungary

Re: couple of questions about stockfish code ?

Post by corres »

[quote="Milos"]
Increase available hash in fishtest and suddenly you'll notice 16bit version becoming much worse.
[/quote]

I think maybe this issue is one of the causes for the strange behavior of Stockfish showed in some cases during TCEC-s.
Because above TCEC there lot of peoples who use Stockfish to analysis with large hash it would be needed a version of Stockfish with 32 bits (or the supposed 24b bits) hash table.
Ras
Posts: 2487
Joined: Tue Aug 30, 2016 8:19 pm
Full name: Rasmus Althoff

Re: couple of questions about stockfish code ?

Post by Ras »

MahmoudUthman wrote:3-I stumbled upon an old post here that claims that storing only the 32bit upper portion of the key degrades performance "I know it causes more collisions"
It doesn't have to. There are also the implicit lower bits of the hash value, these are in the index, assuming that the table size is a power of two.

Robert Hyatt had a paper somewhere where he examined what's going on with collisions and hash bit numbers. The result boiled down to:
1) 32 bits is not enough.
2) 64 bits is totally fine.
3) 48 bits is also fine, but does not fit common architecture integer sizes.

Now if you store 32 bits and want to get up to 48, then your index needs 16 bits. That means, a table size of at least 65k entries will do the trick.

Another possibility is of course to fiddle in some extra hash bits to table entry variables that are not completely used up. Only recommended for systems where memory is scarce since this approach is a bit confusing to read.