Compiler switches

abik
Posts: 819
Joined: Fri Dec 01, 2006 10:46 pm
Location: Mountain View, CA, USA
Full name: Aart Bik

Re: Compiler switches

Post by abik »

Dear Jarkko,
It pains me to admit, but this is a compiler bug all right (in 9.1, no longer in 10.0). I downloaded the source and could reproduce and debug the difference with “go depth 15” exactly. By a very strange coincidence, but most fitting, the bug was in my own module, namely automatic vectorization. Thanks to your sharp eye, I am able to correct this mistake in the 9.1 version! Ironic how my hobby and job met here.
Thanks again,
Aart Bik
http://www.aartbik.com/
jarkkop
Posts: 198
Joined: Thu Mar 09, 2006 2:44 am
Location: Helsinki, Finland

Re: Compiler switches

Post by jarkkop »

Nice that I could help you and was not imagining things, as is sometimes the case.
As an expert, can you say which switches would get the most out of "your" compiler to make the Toga executable even faster? And with your fixed version, does /QxT make Toga any faster than /QxP on an E4300?

Jarkko
abik
Posts: 819
Joined: Fri Dec 01, 2006 10:46 pm
Location: Mountain View, CA, USA
Full name: Aart Bik

Re: Compiler switches

Post by abik »

Dear Jarkko,

FWIW, I just committed the compiler fix to our development workspace, which means that it will eventually find its way into a product update. As for your performance question, some good suggestions were already made in this thread. Below, I show some results with the fixed 9.1 and the upcoming 10.0 on a 2.4 GHz Conroe. Keep in mind that the results you reported earlier for -QxT after about 9 seconds exposed the bug, which would change the variation at depth 15 a few seconds later; the results below are the only variation reported for depth 15. Chess engines pose challenges for compiler optimization, partly due to the nature of the application and probably partly because most chess programmers understand compilers well enough to do a lot of optimization at the source level already. So I am glad to see that at least some performance benefits are obtained.
-O2 (9.1)
info multipv 1 depth 15 seldepth 44 score cp 16 time 24156 nodes 21493535 pv b1c3 g8f6 d2d4 d7d5 c1f4 c7c5 e2e3 c5d4 e3d4 d8b6 d1d3 c8d7 e1c1 b8a6 d3f3 a6b4 c1b1

-Qprof_use -O3 -Qipo -QxP (9.1)
info multipv 1 depth 15 seldepth 44 score cp 16 time 19172 nodes 21493535 pv b1c3 g8f6 d2d4 d7d5 c1f4 c7c5 e2e3 c5d4 e3d4 d8b6 d1d3 c8d7 e1c1 b8a6 d3f3 a6b4 c1b1

-Qprof_use -O3 -Qipo -QxT (9.1)
info multipv 1 depth 15 seldepth 44 score cp 16 time 19094 nodes 21493535 pv b1c3 g8f6 d2d4 d7d5 c1f4 c7c5 e2e3 c5d4 e3d4 d8b6 d1d3 c8d7 e1c1 b8a6 d3f3 a6b4 c1b1

-Qprof_use -O3 -Qipo -QxP (10.0)
info multipv 1 depth 15 seldepth 44 score cp 16 time 18828 nodes 21493535 pv b1c3 g8f6 d2d4 d7d5 c1f4 c7c5 e2e3 c5d4 e3d4 d8b6 d1d3 c8d7 e1c1 b8a6 d3f3 a6b4 c1b1

-Qprof_use -O3 -Qipo -QxT (10.0)
info multipv 1 depth 15 seldepth 44 score cp 16 time 18672 nodes 21493535 pv b1c3 g8f6 d2d4 d7d5 c1f4 c7c5 e2e3 c5d4 e3d4 d8b6 d1d3 c8d7 e1c1 b8a6 d3f3 a6b4 c1b1
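
To put the timings above in perspective, here is a small sketch that turns the "time" and "nodes" fields into nodes per second and a speedup relative to plain -O2. The struct, function, and constant names are made up for illustration and are not part of Toga or the UCI protocol; the numbers are copied from the depth-15 lines above.

```cpp
#include <cstdint>

// One depth-15 result reduced to the two fields used here.
// (Illustrative type, not taken from Toga or any UCI library.)
struct SearchResult {
    std::int64_t time_ms;  // "time" field, in milliseconds
    std::int64_t nodes;    // "nodes" field
};

// Nodes searched per second.
double nodesPerSecond(const SearchResult& r) {
    return static_cast<double>(r.nodes) * 1000.0 / static_cast<double>(r.time_ms);
}

// How many times faster `candidate` finished than `baseline`.
// Only a fair comparison when both searched the same number of nodes,
// which is the case for all five runs above.
double speedup(const SearchResult& baseline, const SearchResult& candidate) {
    return static_cast<double>(baseline.time_ms) / static_cast<double>(candidate.time_ms);
}

// Figures copied from the result lines above.
const SearchResult kO2_91   = {24156, 21493535};  // -O2 (9.1)
const SearchResult kQxP_91  = {19172, 21493535};  // -Qprof_use -O3 -Qipo -QxP (9.1)
const SearchResult kQxT_91  = {19094, 21493535};  // -Qprof_use -O3 -Qipo -QxT (9.1)
const SearchResult kQxP_100 = {18828, 21493535};  // -Qprof_use -O3 -Qipo -QxP (10.0)
const SearchResult kQxT_100 = {18672, 21493535};  // -Qprof_use -O3 -Qipo -QxT (10.0)
```

With these figures the profile-guided builds come out roughly 1.26x to 1.29x faster than -O2, while -QxT is less than 1% faster than -QxP within the same compiler version. This is a single position on a single machine, so small differences should be read with care.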
Thanks again for bringing this bug to my attention. One final comment: I did not peek at the Toga source other than to debug the compiler (the “weakness” of my own chess engine gives sufficient proof of that).
:wink:

Aart Bik
http://www.aartbik.com/
jwes
Posts: 778
Joined: Sat Jul 01, 2006 7:11 am

Re: Compiler switches

Post by jwes »

I read in the Intel optimization manual that the bit operations are now very fast on the Core 2 Duo. Does the Intel compiler use these? For example, does it translate

if (x & (1 << n))
    do something
x &= ~(1 << n)

to

BTR x,n
JNC xx
do something
xx:
abik
Posts: 819
Joined: Fri Dec 01, 2006 10:46 pm
Location: Mountain View, CA, USA
Full name: Aart Bik

Re: Compiler switches

Post by abik »

If you simply are referring to bit-test instructions, then yes, see below. If I miss a subtle detail in your question, please forgive my ignorance and elaborate.

int x, n;

if (x & (1 << n))
    global = 0;

translates by default (-O2) to:

mov ecx, DWORD PTR [_n]
mov eax, 1
shl eax, cl
test DWORD PTR [_x], eax
je skip

mov DWORD PTR [_global], 0
skip:

but when compiled for Core 2 Duo (-QxT) to:

mov eax, DWORD PTR [_x]
mov edx, DWORD PTR [_n]
bt eax, edx
jae skip

mov DWORD PTR [_global], 0
skip:
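
For anyone who wants to inspect the generated code themselves, a self-contained version of the snippet could look like the sketch below. It assumes x, n, and global are file-scope variables (as the _x, _n, and _global symbols in the listings suggest); the wrapper name check is hypothetical.

```cpp
// Globals rather than locals, so the loads, the bit test, and the store
// all remain visible in the generated assembly.
int x, n;
int global = 1;

// Same test as in the snippet above; `check` is a made-up wrapper name.
// Assumes 0 <= n < 31 so that the shift is well defined for int.
void check() {
    if (x & (1 << n))
        global = 0;
}
```

Compiling this with and without the processor-specific switch, using the compiler's assembly-listing option, should make the shl/test versus bt difference easy to compare.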
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Compiler switches

Post by Gerd Isenberg »

abik wrote: If you simply are referring to bit-test instructions, then yes, see below. If I miss a subtle detail in your question, please forgive my ignorance and elaborate.
Hi Aart,

I guess Wesley's question was whether the compiler understands the semantics of resetting the bit and uses btr instead of bt. E.g., what is the assembly for this inlined bitTestAndReset routine:

bool bitTestAndReset(unsigned int &set, unsigned int bitIndex)
{
    unsigned int bit = 1 << bitIndex;
    bool isSet = (set & bit) != 0;
    set &= ~bit;
    return isSet;
}

if ( bitTestAndReset(x, n))
   doSomething();
Does it translate to something like this?

mov eax, DWORD PTR [_x]
mov edx, DWORD PTR [_n]
btr eax, edx
mov DWORD PTR [_x], eax
jnc skip
Or do we explicitly need the _bittestandreset (or _bittestandreset64) intrinsics?

if ( _bittestandreset(&x, n))
   doSomething();
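
If the intrinsic does turn out to be necessary, one way to keep the source portable is a small wrapper along the lines of the sketch below. It assumes an MSVC-compatible compiler (one that defines _MSC_VER, where _bittestandreset is declared in <intrin.h> and takes a long* and a long) and falls back to plain C++ elsewhere. The name testAndResetBit is hypothetical, and whether either path actually compiles to a single btr is up to the compiler.

```cpp
#if defined(_MSC_VER)
#include <intrin.h>  // declares _bittestandreset on MSVC-compatible compilers
#endif

// Returns whether bit `bitIndex` of `set` was set, and clears that bit.
// Assumes 0 <= bitIndex < 32. Hypothetical helper, not a library function.
inline bool testAndResetBit(unsigned int& set, unsigned int bitIndex) {
#if defined(_MSC_VER)
    // The intrinsic operates on long, which is 32 bits wide on Windows,
    // so the value is copied through a long rather than aliasing `set`.
    long value = static_cast<long>(set);
    const bool wasSet = _bittestandreset(&value, static_cast<long>(bitIndex)) != 0;
    set = static_cast<unsigned int>(value);
    return wasSet;
#else
    // Portable fallback: test the bit, then clear it.
    const unsigned int bit = 1u << bitIndex;
    const bool wasSet = (set & bit) != 0;
    set &= ~bit;
    return wasSet;
#endif
}
```

The call site then stays the same on every compiler: if (testAndResetBit(x, n)) doSomething();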
Thanks,
Gerd
abik
Posts: 819
Joined: Fri Dec 01, 2006 10:46 pm
Location: Mountain View, CA, USA
Full name: Aart Bik

Re: Compiler switches

Post by abik »

Thanks for the detailed explanation, Gerd, which was very helpful. In that case the answer is unfortunately no, or perhaps not yet, as I am going to discuss this idea with our code generator experts.