My experience with Linux/GCC

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
rvida
Posts: 481
Joined: Thu Apr 16, 2009 12:00 pm
Location: Slovakia, EU

My experience with Linux/GCC

Post by rvida »

I just want to share my experience with porting from win/msvc to linux/gcc.

GCC with -O2 setting optimizes very aggressively. In most cases where it makes a potentially 'unsafe' assumption it gives at least a warning. I had to rewrite some code pieces due to pointer-aliasing but that was pretty straightforward.
The biggest headache was with the following code:

Code: Select all

inline Score eg_value(ScorePair s) {
  return Score(short(s & 0xffff));
}

inline Score mg_value(ScorePair s) {
  return Score((int(s) + 0x8000) >> 16);
}
MSVC (and GCC with -O1) does understand this, and correctly sign-extends by emmitting movsx/cwd/cdq instructions depending on context. However, GCC with -O2 completely optimized away eg_value(), and did some very weird things with mg_value().

In desperation, I took a look at StockFish's value.h and indeed found a solution therein (SF guys are much better at C/C++ stuff than me).
Now this code works correctly:

Code: Select all

#ifdef __GNUC__

inline Score eg_value(ScorePair s) {
  return Score((int)(unsigned(s) & 0x7fffu) - (int)(unsigned(s) & 0x8000u));
}

inline Score mg_value(ScorePair s) {
  return Score(((int(s) + 0x8000) & ~0xffff) / 0x10000);
}

#else

inline Score eg_value(ScorePair s) {
  return Score(short(s & 0xffff));
}

inline Score mg_value(ScorePair s) {
  return Score((int(s) + 0x8000) >> 16);
}

#endif
Big thanks to the StockFish team! I owe them some beer ;)

Contrary to my fears, the threading, locking and signaling stuff wasn't very hard to port. In case someone finds it useful here is my smp.h:

Code: Select all

#pragma once

#ifdef _WIN32

#include <windows.h>

class Event &#123;
  HANDLE hEvent;
public&#58;
  inline Event&#40;) &#123;&#125;;

  inline void init&#40;) &#123; 
    hEvent = CreateEvent&#40;NULL, false, false, NULL&#41;; 
  &#125;

  inline void destroy&#40;) &#123; 
    CloseHandle&#40;hEvent&#41;; 
  &#125;

  inline void signal&#40;) &#123; 
    SetEvent&#40;hEvent&#41;; 
  &#125;

  inline void wait_for&#40;) &#123; 
    WaitForSingleObject&#40;hEvent, INFINITE&#41;; 
  &#125;

  inline bool wait_for&#40;int timeout&#41; &#123;
    return &#40;WaitForSingleObject&#40;hEvent, timeout&#41; == WAIT_TIMEOUT&#41;;
  &#125;

&#125;;

#define ThreadProcResult DWORD WINAPI
typedef LPVOID ThreadProcParam;
typedef ThreadProcResult ThreadProc&#40;ThreadProcParam&#41;;

inline void start_thread&#40;ThreadProc proc, ThreadProcParam param&#41; &#123;
  DWORD tid;
  CloseHandle&#40;CreateThread&#40;NULL, 0, proc, param, 0, &tid&#41;);
&#125;

typedef CRITICAL_SECTION Lock;

#define lock_init&#40;lock&#41; InitializeCriticalSection&#40;&&#40;lock&#41;)
#define lock_acquire&#40;lock&#41; EnterCriticalSection&#40;&&#40;lock&#41;)
#define lock_release&#40;lock&#41; LeaveCriticalSection&#40;&&#40;lock&#41;)
#define lock_delete&#40;lock&#41; DeleteCriticalSection&#40;&&#40;lock&#41;)

#else

#include <pthread.h>
#include <errno.h>

class Event &#123;
  pthread_cond_t cond;
  pthread_mutex_t mutex;
  volatile bool signaled;
public&#58;
  inline Event&#40;) &#123;&#125;;

  inline void init&#40;) &#123;
    pthread_cond_init&#40;&cond, NULL&#41;;
    pthread_mutex_init&#40;&mutex, NULL&#41;;
    signaled = false;
  &#125;

  inline void destroy&#40;) &#123;
    pthread_cond_destroy&#40;&cond&#41;;
    pthread_mutex_destroy&#40;&mutex&#41;;
  &#125;

  inline void signal&#40;) &#123;
    pthread_mutex_lock&#40;&mutex&#41;;
    pthread_cond_broadcast&#40;&cond&#41;;
    signaled = true;
    pthread_mutex_unlock&#40;&mutex&#41;;
  &#125;

  inline void wait_for&#40;) &#123;
    pthread_mutex_lock&#40;&mutex&#41;;
    if (!signaled&#41;
      pthread_cond_wait&#40;&cond, &mutex&#41;;
    signaled = false;
    pthread_mutex_unlock&#40;&mutex&#41;;
  &#125;

  inline bool wait_for&#40;int timeout&#41; &#123;
    pthread_mutex_lock&#40;&mutex&#41;;
    int res = 0;
    if (!signaled&#41; &#123;
      struct timespec abstime;
      abstime.tv_sec  = timeout / 1000000;
      abstime.tv_nsec = timeout % 1000000;
      res = pthread_cond_timedwait&#40;&cond, &mutex, &abstime&#41;;
    &#125;
    signaled = false;
    pthread_mutex_unlock&#40;&mutex&#41;;
    return &#40;res == ETIMEDOUT&#41;;
  &#125;
&#125;;

#define ThreadProcResult void *
typedef void * ThreadProcParam;
typedef ThreadProcResult ThreadProc&#40;ThreadProcParam&#41;;

inline void start_thread&#40;ThreadProc proc, ThreadProcParam param&#41; &#123;
  pthread_t pthread&#91;1&#93;;
  pthread_attr_t attrib;
  pthread_attr_init&#40;&attrib&#41;;
  pthread_create&#40;pthread, &attrib, proc, param&#41;;
&#125;

typedef pthread_mutex_t Lock;

#define lock_init&#40;x&#41; pthread_mutex_init&#40;&&#40;x&#41;, NULL&#41;
#define lock_acquire&#40;x&#41; pthread_mutex_lock&#40;&&#40;x&#41;)
#define lock_release&#40;x&#41; pthread_mutex_unlock&#40;&&#40;x&#41;)
#define lock_delete&#40;x&#41; pthread_mutex_destroy&#40;&&#40;x&#41;)

#endif
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: My experience with Linux/GCC

Post by wgarvin »

rvida wrote:GCC with -O2 setting optimizes very aggressively. In most cases where it makes a potentially 'unsafe' assumption it gives at least a warning. I had to rewrite some code pieces due to pointer-aliasing but that was pretty straightforward.
Usually its the programmer who makes the potentially unsafe assumptions! ;)

Anyone who's never seen it before might find this set of slides interesting: Dangerous Optimizations and the Loss of Causality

It has examples of various undefined or implementation-defined behaviours such as signed int overflow, pointer math overflow, and aliasing violations. Programmers used to do these things all the time and get away with it, but compilers are allowed to optimize away things with undefined results, and so some of these common tricks can have surprising results with an aggressive compiler.
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: My experience with Linux/GCC

Post by mcostalba »

rvida wrote: The biggest headache was with the following code:
I remember this code chunk very very well !!!

It took us a terrible, painful and several days long debugging session to find it !

As is reported on the comment

Code: Select all

// Extracting the _signed_ lower and upper 16 bits it not so trivial
// because according to the standard a simple cast to short is
// implementation defined and so is a right shift of a signed integer.
It means that gcc can and do aggresively optimize using also the smallest trick and obscure corner case that the C++ standard allows, and this potentially breaks a lot of implementations based on "common practices" or "well known" tricks.

The problem anyhow is in the code that is not fully 100% standard compliant, not in gcc.

The best approach is to _always_ avoid tricks, like sneak in two values in a single variable. When, as in this case, we really want to be "cool" then we also must be very careful about what we are doing.
User avatar
jshriver
Posts: 1342
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: My experience with Linux/GCC

Post by jshriver »

This was an enjoyable read. Sorry I didn't realise you were the author of Critter. I've enjoyed playing it.
User avatar
hgm
Posts: 27787
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: My experience with Linux/GCC

Post by hgm »

Wouldn't it be better to simply define it as a struct/union?

Code: Select all

typedef union &#123;
  int i;
  struct &#123;
    short int eg_value;
    short int mg_value;
  &#125; s;
&#125; ScorePair;
Then you can write score.s.eg_value in stead of eg_value(score), and you can still access it as an int through score.i. Or do you rely on the order in which the two halves are stored in the int?
User avatar
jshriver
Posts: 1342
Joined: Wed Mar 08, 2006 9:41 pm
Location: Morgantown, WV, USA

Re: My experience with Linux/GCC

Post by jshriver »

Will you also be releasing the linux version anytime soon :) been anxious to try 0.5 especially if there is a linux port. Been using the win32 under wine.

-Josh
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: My experience with Linux/GCC

Post by bob »

rvida wrote:I just want to share my experience with porting from win/msvc to linux/gcc.

GCC with -O2 setting optimizes very aggressively. In most cases where it makes a potentially 'unsafe' assumption it gives at least a warning. I had to rewrite some code pieces due to pointer-aliasing but that was pretty straightforward.
The biggest headache was with the following code:

Code: Select all

inline Score eg_value&#40;ScorePair s&#41; &#123;
  return Score&#40;short&#40;s & 0xffff&#41;);
&#125;

inline Score mg_value&#40;ScorePair s&#41; &#123;
  return Score&#40;&#40;int&#40;s&#41; + 0x8000&#41; >> 16&#41;;
&#125;

First, are you _really_ storing two signed values in 32 bits? Why not use excess-N notation so that you don't have to worry about the sign extension stuff.

For example, assuming your values are -32768 < N < +32767 (normal signed short range), and given two values s1 and s2, why not:

uint32_t merge = (s1 + 32768) << 16 + (s2 + 32768};

To get 'em back:

s1 = (merge >> 16) - 32768;
s2 = (merge & 0xffff) - 32768;

Now you don't have to worry about sign extension at all. (parens added for clarity).

What is the type "Score"? short or int?

One of the two returns has to do something since one returns an int value, one returns a short value.

That is not really something that I would complain about with respect to gcc, it is a really bad coding style in the first place... Not all machines have a sar instruction (x86 does). The C standards group left a bit too much "unspecified".

The way I did it above, you don't shift signed values, so that shl,shr are used and the sign bit is just a part of the value, not a sign at all...


MSVC (and GCC with -O1) does understand this, and correctly sign-extends by emmitting movsx/cwd/cdq instructions depending on context. However, GCC with -O2 completely optimized away eg_value(), and did some very weird things with mg_value().

In desperation, I took a look at StockFish's value.h and indeed found a solution therein (SF guys are much better at C/C++ stuff than me).
Now this code works correctly:

Code: Select all

#ifdef __GNUC__

inline Score eg_value&#40;ScorePair s&#41; &#123;
  return Score&#40;&#40;int&#41;&#40;unsigned&#40;s&#41; & 0x7fffu&#41; - &#40;int&#41;&#40;unsigned&#40;s&#41; & 0x8000u&#41;);
&#125;

inline Score mg_value&#40;ScorePair s&#41; &#123;
  return Score&#40;(&#40;int&#40;s&#41; + 0x8000&#41; & ~0xffff&#41; / 0x10000&#41;;
&#125;

#else

inline Score eg_value&#40;ScorePair s&#41; &#123;
  return Score&#40;short&#40;s & 0xffff&#41;);
&#125;

inline Score mg_value&#40;ScorePair s&#41; &#123;
  return Score&#40;&#40;int&#40;s&#41; + 0x8000&#41; >> 16&#41;;
&#125;

#endif
Big thanks to the StockFish team! I owe them some beer ;)

Contrary to my fears, the threading, locking and signaling stuff wasn't very hard to port. In case someone finds it useful here is my smp.h:

Code: Select all

#pragma once

#ifdef _WIN32

#include <windows.h>

class Event &#123;
  HANDLE hEvent;
public&#58;
  inline Event&#40;) &#123;&#125;;

  inline void init&#40;) &#123; 
    hEvent = CreateEvent&#40;NULL, false, false, NULL&#41;; 
  &#125;

  inline void destroy&#40;) &#123; 
    CloseHandle&#40;hEvent&#41;; 
  &#125;

  inline void signal&#40;) &#123; 
    SetEvent&#40;hEvent&#41;; 
  &#125;

  inline void wait_for&#40;) &#123; 
    WaitForSingleObject&#40;hEvent, INFINITE&#41;; 
  &#125;

  inline bool wait_for&#40;int timeout&#41; &#123;
    return &#40;WaitForSingleObject&#40;hEvent, timeout&#41; == WAIT_TIMEOUT&#41;;
  &#125;

&#125;;

#define ThreadProcResult DWORD WINAPI
typedef LPVOID ThreadProcParam;
typedef ThreadProcResult ThreadProc&#40;ThreadProcParam&#41;;

inline void start_thread&#40;ThreadProc proc, ThreadProcParam param&#41; &#123;
  DWORD tid;
  CloseHandle&#40;CreateThread&#40;NULL, 0, proc, param, 0, &tid&#41;);
&#125;

typedef CRITICAL_SECTION Lock;

#define lock_init&#40;lock&#41; InitializeCriticalSection&#40;&&#40;lock&#41;)
#define lock_acquire&#40;lock&#41; EnterCriticalSection&#40;&&#40;lock&#41;)
#define lock_release&#40;lock&#41; LeaveCriticalSection&#40;&&#40;lock&#41;)
#define lock_delete&#40;lock&#41; DeleteCriticalSection&#40;&&#40;lock&#41;)

#else

#include <pthread.h>
#include <errno.h>

class Event &#123;
  pthread_cond_t cond;
  pthread_mutex_t mutex;
  volatile bool signaled;
public&#58;
  inline Event&#40;) &#123;&#125;;

  inline void init&#40;) &#123;
    pthread_cond_init&#40;&cond, NULL&#41;;
    pthread_mutex_init&#40;&mutex, NULL&#41;;
    signaled = false;
  &#125;

  inline void destroy&#40;) &#123;
    pthread_cond_destroy&#40;&cond&#41;;
    pthread_mutex_destroy&#40;&mutex&#41;;
  &#125;

  inline void signal&#40;) &#123;
    pthread_mutex_lock&#40;&mutex&#41;;
    pthread_cond_broadcast&#40;&cond&#41;;
    signaled = true;
    pthread_mutex_unlock&#40;&mutex&#41;;
  &#125;

  inline void wait_for&#40;) &#123;
    pthread_mutex_lock&#40;&mutex&#41;;
    if (!signaled&#41;
      pthread_cond_wait&#40;&cond, &mutex&#41;;
    signaled = false;
    pthread_mutex_unlock&#40;&mutex&#41;;
  &#125;

  inline bool wait_for&#40;int timeout&#41; &#123;
    pthread_mutex_lock&#40;&mutex&#41;;
    int res = 0;
    if (!signaled&#41; &#123;
      struct timespec abstime;
      abstime.tv_sec  = timeout / 1000000;
      abstime.tv_nsec = timeout % 1000000;
      res = pthread_cond_timedwait&#40;&cond, &mutex, &abstime&#41;;
    &#125;
    signaled = false;
    pthread_mutex_unlock&#40;&mutex&#41;;
    return &#40;res == ETIMEDOUT&#41;;
  &#125;
&#125;;

#define ThreadProcResult void *
typedef void * ThreadProcParam;
typedef ThreadProcResult ThreadProc&#40;ThreadProcParam&#41;;

inline void start_thread&#40;ThreadProc proc, ThreadProcParam param&#41; &#123;
  pthread_t pthread&#91;1&#93;;
  pthread_attr_t attrib;
  pthread_attr_init&#40;&attrib&#41;;
  pthread_create&#40;pthread, &attrib, proc, param&#41;;
&#125;

typedef pthread_mutex_t Lock;

#define lock_init&#40;x&#41; pthread_mutex_init&#40;&&#40;x&#41;, NULL&#41;
#define lock_acquire&#40;x&#41; pthread_mutex_lock&#40;&&#40;x&#41;)
#define lock_release&#40;x&#41; pthread_mutex_unlock&#40;&&#40;x&#41;)
#define lock_delete&#40;x&#41; pthread_mutex_destroy&#40;&&#40;x&#41;)

#endif
mcostalba
Posts: 2684
Joined: Sat Jun 14, 2008 9:17 pm

Re: My experience with Linux/GCC

Post by mcostalba »

hgm wrote:Wouldn't it be better to simply define it as a struct/union?
No. The trick here is that you can sum/subtract two values in a single operation:

Code: Select all

s = s1 + s2;
Instead of writing

Code: Select all

s.eg_value = s1.eg_value + s2.eg_value;
s.mg_value = s1.mg_value+ s2.mg_value;
User avatar
Evert
Posts: 2929
Joined: Sat Jan 22, 2011 12:42 am
Location: NL

Re: My experience with Linux/GCC

Post by Evert »

mcostalba wrote:
hgm wrote:Wouldn't it be better to simply define it as a struct/union?
No. The trick here is that you can sum/subtract two values in a single operation:

Code: Select all

s = s1 + s2;
You can do that using the i field in the union.
You don't know whether the middle-game or the end-game value is the high or the low bit, but you don't need to know that either, since you'd use the struct inside the union to acces them when you need to.
UncombedCoconut
Posts: 319
Joined: Fri Dec 18, 2009 11:40 am
Location: Naperville, IL

Re: My experience with Linux/GCC

Post by UncombedCoconut »

hgm wrote:Wouldn't it be better to simply define it as a struct/union?

Code: Select all

typedef union &#123;
  int i;
  struct &#123;
    short int eg_value;
    short int mg_value;
  &#125; s;
&#125; ScorePair;
Then you can write score.s.eg_value in stead of eg_value(score), and you can still access it as an int through score.i. Or do you rely on the order in which the two halves are stored in the int?
I once tried replacing Stockfish's Score type with a struct that used two 16-bit fields, and got identical benchmark results. The result surprised me at the time. (I had a pre-conceived notion that the compiler would generate inefficient code for loading and storing bitfields.)
Last edited by UncombedCoconut on Wed Mar 23, 2011 7:21 pm, edited 1 time in total.