Diep tested on latest AMD and Intel processors

Discussion of chess software programming and technical issues.

diep
Posts: 1780
Joined: Thu Mar 09, 2006 10:54 pm
Location: The Netherlands

Diep tested on latest AMD and Intel processors

Post by diep » Mon Mar 31, 2008 1:01 am

http://arstechnica.com/reviews/hardware ... view.ars/3

Most important is to look at the last page: power consumption, which was also measured with Diep.

(Thanks to Joel Hruska for using Diep for that.)

A 95-watt difference. Boy oh boy.

Vincent

Gerd Isenberg
Posts: 2105
Joined: Wed Mar 08, 2006 7:47 pm
Location: Hattingen, Germany

Re: Diep tested on latest AMD and Intel processors

Post by Gerd Isenberg » Mon Mar 31, 2008 4:14 pm

Hi Vincent,

How many microwatt-seconds (i.e. microjoules) does Diep consume per node on those processors?
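Since one watt is one joule per second, energy per node is simply power divided by search speed. A minimal C sketch of that arithmetic follows; the function name is hypothetical, and the thread gives no node rates, so any nodes-per-second figure plugged in is an assumption.

```c
#include <assert.h>

/* Hypothetical helper: energy per searched node in microjoules
   (= micro wattseconds). watts is J/s and nodes_per_second is nodes/s,
   so watts / nodes_per_second is J/node; the factor 1e6 converts to uJ. */
double microjoules_per_node(double watts, double nodes_per_second)
{
    assert(nodes_per_second > 0.0);
    return watts * 1e6 / nodes_per_second;
}
```

For example, at an assumed 1,000,000 nodes per second, the 95 watt difference from the review corresponds to 95 microjoules per node.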

Intel clearly has the edge now. Future Nehalem, and even more so Sandy Bridge with its 256-bit-wide Advanced Vector Extensions (AVX) and a three-operand, RISC-like instruction set similar to AMD's announced 128-bit SSE5, sound very interesting. Four bitboards in one register! Each bitboard may be shifted by a different, generalized ± amount. A lot of shuffling and permutation stuff. Wow!
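To illustrate "each bitboard shifted by a different generalized amount", here is a scalar C sketch that emulates one 256-bit register as four 64-bit lanes. The QuadBB type, the direction tables and the square numbering (a1 = bit 0, h8 = bit 63) are assumptions for illustration; the per-lane loop stands in for a single variable-shift vector instruction and uses no real intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one 256-bit register: four bitboard lanes. */
typedef struct { uint64_t lane[4]; } QuadBB;

/* Generalized shift: positive amounts shift left, negative amounts right. */
static uint64_t shift_gen(uint64_t bb, int amount)
{
    assert(amount > -64 && amount < 64); /* shifts by 64 or more are undefined in C */
    return amount >= 0 ? bb << amount : bb >> -amount;
}

/* Per-lane shift amounts and post-shift masks for north, south, east, west.
   The east/west masks clear bits that wrapped around to the opposite edge. */
static const int      DIR_AMOUNT[4] = { 8, -8, 1, -1 };
static const uint64_t DIR_MASK[4]   = {
    0xFFFFFFFFFFFFFFFFULL, /* north: no wrap possible */
    0xFFFFFFFFFFFFFFFFULL, /* south: no wrap possible */
    0xFEFEFEFEFEFEFEFEULL, /* east:  clear the a-file */
    0x7F7F7F7F7F7F7F7FULL  /* west:  clear the h-file */
};

/* Copy one bitboard into all four lanes (the "shuffle"), then shift each
   lane by its own amount: one step in four directions at once. */
static QuadBB quad_one_step(uint64_t bb)
{
    QuadBB q;
    for (int i = 0; i < 4; ++i)
        q.lane[i] = shift_gen(bb, DIR_AMOUNT[i]) & DIR_MASK[i];
    return q;
}
```

With a piece on d4 (bit 27), the four lanes come out as d5, d3, e4 and c4.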

Cheers,
Gerd

Pradu
Posts: 287
Joined: Sat Mar 11, 2006 2:19 am
Location: Atlanta, GA

Re: Diep tested on latest AMD and Intel processors

Post by Pradu » Mon Mar 31, 2008 9:32 pm

Gerd Isenberg wrote: Four bitboards in one register! Each bitboard may be shifted by a different generalized ± amount. A lot of shuffling and permutation stuff. Wow!

How much faster do you guess fills will become with quad-bitboards? I'm sure there'll be a plethora of new bitboard tricks we can do with 256 bits if it becomes competitively fast :).

Gerd Isenberg
Posts: 2105
Joined: Wed Mar 08, 2006 7:47 pm
Location: Hattingen, Germany

Re: Diep tested on latest AMD and Intel processors

Post by Gerd Isenberg » Tue Apr 01, 2008 6:32 am

Pradu wrote: How much faster do you guess fills will become with quad-bitboards? I'm sure there'll be a plethora of new bitboard tricks we can do with 256 bits if it becomes competitively fast :).
With recent Intel CPUs or AMD's K10 we already have a throughput of three independent 128-bit instructions per cycle, so this is guesswork. Assuming 256-bit ALUs and buses (which might not be the case in the first Sandy Bridge generation), we may perform some operations 1.5 to 2 times faster.

The generalized, independent shifts are definitely useful for single bitboards: you initialize a 256-bit register with four copies (via a shuffle) of one bitboard and shift in four directions with one instruction. With distinct bitboards inside a quad, however, you will likely shift each vector by the same immediate amount per direction, and do several directions in parallel using multiple registers.

In fact, MSVC schedules the C++ code, based on an SSE2-intrinsic wrapper, quite nicely. It fills a quad-bitboard with Kogge-Stone (e.g. wSliders:bSliders, bKing:wKing), interlacing two opposite directions (north, south) in one run. With wider registers we often need additional shuffles or unpacks to arrange bitboards horizontally so they can be combined further.
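The interlaced north/south Kogge-Stone fill described above can be sketched in portable C as follows. This is not the SSE2 wrapper code from the post (which isn't shown here): the QuadBB type and function name are hypothetical, the per-lane loop stands in for vector instructions, and a1 = bit 0 is assumed. Each lane uses the same shift amount per direction, matching the "same immediate amount" case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a quad-bitboard. */
typedef struct { uint64_t lane[4]; } QuadBB;

/* Kogge-Stone occluded fill, north (<<) and south (>>) interlaced in one pass.
   gen:   slider sets per lane (e.g. wSliders, bSliders, bKing, wKing)
   empty: empty squares, shared by all lanes
   Writes the attack sets: each fill shifted one more step, which adds
   the first blocker in that direction. */
static void quad_attacks_north_south(QuadBB gen, uint64_t empty,
                                     QuadBB *north, QuadBB *south)
{
    assert(north != NULL && south != NULL);
    for (int i = 0; i < 4; ++i) {
        uint64_t gn = gen.lane[i], pn = empty; /* north generator/propagator */
        uint64_t gs = gen.lane[i], ps = empty; /* south generator/propagator */
        gn |= pn & (gn << 8);    gs |= ps & (gs >> 8);
        pn &= pn << 8;           ps &= ps >> 8;
        gn |= pn & (gn << 16);   gs |= ps & (gs >> 16);
        pn &= pn << 16;          ps &= ps >> 16;
        gn |= pn & (gn << 32);   gs |= ps & (gs >> 32);
        north->lane[i] = gn << 8;
        south->lane[i] = gs >> 8;
    }
}
```

Interlacing the two directions gives the scheduler two independent dependency chains per lane, which is presumably why the compiler can overlap them well.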

PPERM, as specified in AMD's SSE5 and able to reverse bitboards, looks interesting for the pure calculation of attacks à la Hyperbola Quintessence. With 256-bit registers we would be able to calculate the four attack lines of a queen in one run, likely at maximum IPC.
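Below is a portable C sketch of Hyperbola Quintessence in which a software 64-bit reversal stands in for PPERM; no SSE5 intrinsics are used. A full bit reversal, unlike a byte swap, also reverses the square order along a rank, so one routine covers all four queen lines. The function names and the a1 = bit 0 numbering are assumptions, and the four-iteration loop represents what a 256-bit register might compute in one run.

```c
#include <assert.h>
#include <stdint.h>

/* Portable full reversal: bit i moves to bit 63 - i. */
static uint64_t reverse_bits(uint64_t x)
{
    x = ((x >> 1)  & 0x5555555555555555ULL) | ((x & 0x5555555555555555ULL) << 1);
    x = ((x >> 2)  & 0x3333333333333333ULL) | ((x & 0x3333333333333333ULL) << 2);
    x = ((x >> 4)  & 0x0F0F0F0F0F0F0F0FULL) | ((x & 0x0F0F0F0F0F0F0F0FULL) << 4);
    x = ((x >> 8)  & 0x00FF00FF00FF00FFULL) | ((x & 0x00FF00FF00FF00FFULL) << 8);
    x = ((x >> 16) & 0x0000FFFF0000FFFFULL) | ((x & 0x0000FFFF0000FFFFULL) << 16);
    return (x >> 32) | (x << 32);
}

/* All squares on the line through sq in direction (df, dr), excluding sq. */
static uint64_t line_mask_ex(int sq, int df, int dr)
{
    uint64_t m = 0;
    for (int sign = -1; sign <= 1; sign += 2) {
        int f = (sq & 7) + sign * df, r = (sq >> 3) + sign * dr;
        for (; f >= 0 && f < 8 && r >= 0 && r < 8; f += sign * df, r += sign * dr)
            m |= 1ULL << (r * 8 + f);
    }
    return m;
}

/* Hyperbola Quintessence on one line: (o - 2r) ^ reverse(o' - 2r'),
   written as o - r because the mask already excludes the slider's bit. */
static uint64_t line_attacks(uint64_t occ, int sq, uint64_t mask)
{
    uint64_t r = 1ULL << sq;
    uint64_t forward = occ & mask;
    uint64_t reverse = reverse_bits(forward);
    forward -= r;
    reverse -= reverse_bits(r);
    forward ^= reverse_bits(reverse);
    return forward & mask;
}

/* Queen attacks as four independent lines: rank, file, diagonal, anti-diagonal. */
static uint64_t queen_attacks(uint64_t occ, int sq)
{
    static const int dir[4][2] = { {1, 0}, {0, 1}, {1, 1}, {1, -1} };
    assert(sq >= 0 && sq < 64);
    uint64_t attacks = 0;
    for (int i = 0; i < 4; ++i)
        attacks |= line_attacks(occ, sq, line_mask_ex(sq, dir[i][0], dir[i][1]));
    return attacks;
}
```

For a rook on d1 with blockers on b1 and g1, the rank attacks are b1, c1, e1, f1 and g1.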

Dot products of bit[64] with char[64] or short[64] will clearly profit, even more so with future 512-bit register sets.
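For reference, here is what such a dot product computes: the sum of the weights of all squares whose bit is set (for example, a bitboard against a table of square weights). This is a scalar C sketch with hypothetical names and a1 = bit 0 assumed. One common SIMD approach expands each bit into a byte or word mask, ANDs it with the weight vector and sums horizontally, so wider registers cover more of the 64 weights per instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* bit[64] * char[64]: sum of weight[sq] over all set bits sq. */
static int dot_bb_char(uint64_t bb, const signed char weight[64])
{
    assert(weight != NULL);
    int sum = 0;
    for (int sq = 0; sq < 64; ++sq)
        sum += (int)((bb >> sq) & 1) * weight[sq];
    return sum;
}

/* bit[64] * short[64]: the same with 16-bit weights. */
static long dot_bb_short(uint64_t bb, const short weight[64])
{
    assert(weight != NULL);
    long sum = 0;
    for (int sq = 0; sq < 64; ++sq)
        sum += (long)((bb >> sq) & 1) * weight[sq];
    return sum;
}
```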
