Diep tested on latest AMD and Intel processors


diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Diep tested on latest AMD and Intel processors

Post by diep »

http://arstechnica.com/reviews/hardware ... view.ars/3

The most important page to look at is the last one: power consumption, which was also measured with Diep.

(thanks Joel Hruska for using diep for that).

95 watt difference. Boy oh boy.

Vincent
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Diep tested on latest AMD and Intel processors

Post by Gerd Isenberg »

diep wrote:http://arstechnica.com/reviews/hardware ... view.ars/3

The most important page to look at is the last one: power consumption, which was also measured with Diep.

(thanks Joel Hruska for using diep for that).

95 watt difference. Boy oh boy.

Vincent
Hi Vincent,

How many microwatt-seconds (i.e. microjoules) does Diep take per node on those processors?

Intel clearly has the edge now. The future Nehalem, and even more so Sandy Bridge with its 256-bit wide vector extensions (AVX) and a three-operand, RISC-like instruction set similar to AMD's announced 128-bit SSE5, sounds very interesting. Four bitboards in one register! Each bitboard may be shifted by a different generalized +/- amount. A lot of shuffling and permutation stuff - Wow!
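To make the "different shift per bitboard" idea concrete, here is a minimal portable sketch. It only emulates the four 64-bit lanes with a `std::array`; the names `Quad`, `shiftLane` and `shiftFourDirections`, the a1 = bit 0 square mapping, and the choice of directions are assumptions for illustration, not any real instruction or engine code.

```cpp
#include <array>
#include <cstdint>

using Bitboard = std::uint64_t;          // assumed mapping: a1 = bit 0, h8 = bit 63
using Quad = std::array<Bitboard, 4>;    // stand-in for one 256-bit register

const Bitboard notAFile = 0xfefefefefefefefeULL;
const Bitboard notHFile = 0x7f7f7f7f7f7f7f7fULL;

// Signed "generalized" shift: positive amounts shift left, negative shift right.
inline Bitboard shiftLane(Bitboard b, int amount) {
    return amount > 0 ? b << amount : b >> -amount;
}

// Broadcast one bitboard into four lanes and shift each lane by its own amount.
// Lanes: 0 = north (+8), 1 = east (+1), 2 = south (-8), 3 = west (-1).
Quad shiftFourDirections(Bitboard bb) {
    const Quad src{bb, bb, bb, bb};
    const std::array<int, 4> amount{8, 1, -8, -1};
    // Masks that drop bits which wrapped around the board edge.
    const Quad wrap{~Bitboard{0}, notAFile, ~Bitboard{0}, notHFile};
    Quad out{};
    for (int i = 0; i < 4; ++i)
        out[i] = shiftLane(src[i], amount[i]) & wrap[i];
    return out;
}
```

With real vector hardware the loop body would ideally be a single per-lane shift plus one AND; whether a given compiler vectorizes this emulation is not guaranteed.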

Cheers,
Gerd
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 3:19 am
Location: Atlanta, GA

Re: Diep tested on latest AMD and Intel processors

Post by Pradu »

Gerd Isenberg wrote: Four bitboards in one register! Each bitboard may be shifted by a different generalized +/- amount. A lot of shuffling and permutation stuff - Wow!

Cheers,
Gerd
How much faster do you guess fills become with quadbitboards? I'm sure there'll be a plethora of new bitboard tricks we can do with 256-bits if it becomes competitively fast :).
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Diep tested on latest AMD and Intel processors

Post by Gerd Isenberg »

Pradu wrote:How much faster do you guess fills become with quadbitboards? I'm sure there'll be a plethora of new bitboard tricks we can do with 256-bits if it becomes competitively fast :).
With recent Intel CPUs or K10 we already have a throughput of three independent 128-bit instructions per cycle, so this is guesswork. Assuming 256-bit ALUs and buses (which might not be the case in the first processor generation of Sandy Bridge), we may perform some stuff 1.5-2 times faster.

The generalized, independent shifts are definitely useful for single bitboards, where you initialize a 256-bit register with copies (shuffle) of one bitboard, to shift in four directions with one instruction. But with distinct bitboards inside a quad you'll likely shift each vector by the same immediate amount per direction, to do several directions in parallel with multiple registers. Actually MSVC schedules the C++ code, based on an SSE2-intrinsic wrapper, quite nicely. It fills a quadbitboard by Kogge-Stone (e.g. wSliders:bSliders, bKing:wKing) interlaced with two opposite directions (north, south) in one run. With wider registers we often need additional shuffles or unpacks to arrange bitboards horizontally to further combine them.
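For readers who have not seen a Kogge-Stone fill on several bitboards at once, a rough portable sketch follows. It is not the SSE2 wrapper mentioned above: the four lanes are emulated with a `std::array`, and `Quad`, `northFill`, `southFill` and the a1 = bit 0 mapping are assumed names and conventions. Every lane is shifted by the same amount per step, as described for distinct bitboards in a quad.

```cpp
#include <array>
#include <cstdint>

using Bitboard = std::uint64_t;          // assumed mapping: a1 = bit 0, h8 = bit 63
using Quad = std::array<Bitboard, 4>;    // four distinct bitboards, e.g. slider sets

// Kogge-Stone occluded fill towards north: each lane's sliders (gen) flood
// through empty squares (the propagator) in three doubling steps: 8, 16, 32.
Quad northFill(Quad gen, Bitboard empty) {
    Quad pro{empty, empty, empty, empty};
    for (int shift = 8; shift <= 32; shift *= 2) {
        for (int i = 0; i < 4; ++i) {
            gen[i] |= pro[i] & (gen[i] << shift);
            pro[i] &= pro[i] << shift;
        }
    }
    return gen;
}

// Same fill in the opposite direction (south), using right shifts.
Quad southFill(Quad gen, Bitboard empty) {
    Quad pro{empty, empty, empty, empty};
    for (int shift = 8; shift <= 32; shift *= 2) {
        for (int i = 0; i < 4; ++i) {
            gen[i] |= pro[i] & (gen[i] >> shift);
            pro[i] &= pro[i] >> shift;
        }
    }
    return gen;
}
```

The fill includes the slider squares and stops in front of the first blocker; shifting the result one more step north or south gives the attack set including the blocker. North and south need no wrap masks, which is one reason they interleave so cheaply.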

PPERM as specified by AMD's SSE5, able to reverse bitboards, looks interesting for pure calculation of attacks à la Hyperbola Quintessence. With 256-bit registers we would be able to calculate the four attacking lines of queen attacks in one run, likely with max IPC.
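As a reminder of what Hyperbola Quintessence computes per line, here is a scalar sketch for one file. It uses a plain byte swap as the "reverse" step, which is enough for files and diagonals; ranks would need a true bit reversal, which is where something like PPERM would come in. The helper names (`byteSwap`, `fileMaskEx`, `lineAttacks`) and the a1 = bit 0 mapping are assumptions for illustration.

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;  // assumed mapping: a1 = bit 0, h8 = bit 63

// Portable byte swap: reverses the order of the eight ranks.
inline Bitboard byteSwap(Bitboard x) {
    x = ((x & 0x00ff00ff00ff00ffULL) << 8)  | ((x >> 8)  & 0x00ff00ff00ff00ffULL);
    x = ((x & 0x0000ffff0000ffffULL) << 16) | ((x >> 16) & 0x0000ffff0000ffffULL);
    return (x << 32) | (x >> 32);
}

// File of the given square, excluding the square itself.
inline Bitboard fileMaskEx(int sq) {
    return (0x0101010101010101ULL << (sq & 7)) ^ (Bitboard{1} << sq);
}

// Hyperbola Quintessence along one line (file or diagonal):
// subtract the slider from the occupancy in forward and in reversed order,
// then combine both results with xor and keep only the line.
inline Bitboard lineAttacks(Bitboard occ, int sq, Bitboard maskEx) {
    const Bitboard slider = Bitboard{1} << sq;
    Bitboard forward = occ & maskEx;
    Bitboard reverse = byteSwap(forward);
    forward -= slider;
    reverse -= byteSwap(slider);
    forward ^= byteSwap(reverse);
    return forward & maskEx;
}

inline Bitboard fileAttacks(Bitboard occ, int sq) {
    return lineAttacks(occ, sq, fileMaskEx(sq));
}
```

With four 64-bit lanes, the same subtract/reverse/xor sequence could in principle run on the four queen lines at once, each lane with its own mask.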

bit[64]*char[64] or bit[64]*short[64] dot-products will clearly profit - even more with future 512-bit register sets.
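For clarity, the bit[64]*char[64] dot-product means summing a per-square weight for every set bit of a bitboard. A scalar reference sketch is below; the name `dotProduct` and the signed 8-bit weight type are assumptions, and a SIMD version would expand the bits to byte masks and use packed multiply-adds instead of this loop.

```cpp
#include <array>
#include <cstdint>

using Bitboard = std::uint64_t;

// Reference implementation: weight[sq] is added once for each set bit sq.
// Written branchless so that each iteration is a multiply by 0 or 1.
int dotProduct(Bitboard bb, const std::array<std::int8_t, 64>& weights) {
    int sum = 0;
    for (int sq = 0; sq < 64; ++sq)
        sum += weights[sq] * static_cast<int>((bb >> sq) & 1);
    return sum;
}
```

A typical use would be an evaluation term such as a piece-square or attack-weight sum over one bitboard.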