http://arstechnica.com/reviews/hardware ... view.ars/3
Most important is to look at one last page: power consumption. Measured also with Diep.
(thanks Joel Hruska for using diep for that).
95 watt difference. Boy oh boy.
Vincent
Diep tested on latest AMD and Intel processors
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
- Posts: 2250
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Diep tested on latest AMD and Intel processors
diep wrote:
http://arstechnica.com/reviews/hardware ... view.ars/3
Most important is to look at one last page: power consumption. Measured also with Diep.
(thanks Joel Hruska for using diep for that).
95 watt difference. Boy oh boy.
Vincent

Hi Vincent,
How many micro wattseconds does Diep take per node on those processors?
Intel clearly has the edge now. Future Nehalem, and even more so Sandy Bridge with its 256-bit wide vector extensions (AVX) and a three-operand, RISC-like instruction set similar to AMD's announced 128-bit SSE5, sounds very interesting. Four bitboards in one register! Each bitboard may be shifted by a different generalized +- amount. A lot of shuffling and permutation stuff - Wow!
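To make the "different generalized +- amount per bitboard" idea concrete, here is a minimal scalar sketch in portable C++. It only emulates the lanes with a plain struct; it is not AVX or SSE5 code, and the names Quad, genShift, shiftEach and oneStepNSEW are made up for illustration. Square numbering is assumed to be a1 = bit 0, h8 = bit 63.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 256-bit register holding four bitboards.
struct Quad { std::uint64_t lane[4]; };

// Generalized shift: positive amounts shift left, negative amounts shift right.
std::uint64_t genShift(std::uint64_t x, int s) {
    assert(s > -64 && s < 64);
    return s >= 0 ? x << s : x >> -s;
}

// Copy one bitboard into all four lanes (what a shuffle would do).
Quad broadcast(std::uint64_t bb) { return Quad{{bb, bb, bb, bb}}; }

// Shift every lane by its own signed amount; the loop stands in for one vector instruction.
Quad shiftEach(Quad q, const int amount[4]) {
    for (int i = 0; i < 4; ++i) q.lane[i] = genShift(q.lane[i], amount[i]);
    return q;
}

// One step north, south, east and west from the same source bitboard.
Quad oneStepNSEW(std::uint64_t bb) {
    static const int amount[4] = {8, -8, 1, -1};
    const std::uint64_t fileA = 0x0101010101010101ULL;
    const std::uint64_t fileH = 0x8080808080808080ULL;
    // East must not wrap onto the a-file, west must not wrap onto the h-file.
    const std::uint64_t keep[4] = {~0ULL, ~0ULL, ~fileA, ~fileH};
    Quad q = shiftEach(broadcast(bb), amount);
    for (int i = 0; i < 4; ++i) q.lane[i] &= keep[i];
    return q;
}
```

With a real vector unit the two loops would each collapse into a single instruction, which is where the hoped-for gain comes from.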
Cheers,
Gerd
- Posts: 287
- Joined: Sat Mar 11, 2006 3:19 am
- Location: Atlanta, GA
Re: Diep tested on latest AMD and Intel processors
Gerd Isenberg wrote:
Four bitboards in one register! Each bitboard may be shifted by a different generalized +- amount. A lot of shuffling and permutation stuff - Wow!
Cheers,
Gerd

How much faster do you guess fills become with quadbitboards? I'm sure there'll be a plethora of new bitboard tricks we can do with 256 bits if it becomes competitively fast.
- Posts: 2250
- Joined: Wed Mar 08, 2006 8:47 pm
- Location: Hattingen, Germany
Re: Diep tested on latest AMD and Intel processors
Pradu wrote:
How much faster do you guess fills become with quadbitboards? I'm sure there'll be a plethora of new bitboard tricks we can do with 256-bits if it becomes competitively fast .

With recent Intel CPUs or K10 we already have a throughput of three independent 128-bit instructions per cycle. So it is guesswork. Assuming 256-bit ALUs and buses (which might not be the case in the first processor generation of Sandy Bridge), we may perform some stuff 1.5-2 times faster.
The generalized, independent shifts are definitely useful for single bitboards, where you initialize a 256-bit register with copies (shuffle) of one bitboard, to shift in four directions with one instruction. But with distinct bitboards inside a quad you'll likely shift each vector by the same immediate amount per direction, and do several directions in parallel with multiple registers. Actually MSVC schedules the C++ code, based on an SSE2-intrinsic wrapper, quite nicely. It fills a quadbitboard by Kogge-Stone (e.g. wSliders:bSliders, bKing:wKing), interlaced with two opposite directions (north, south) in one run. With wider registers we often need additional shuffles or unpacks to arrange bitboards horizontally to further combine them.
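For readers who have not seen a Kogge-Stone fill, a rough scalar sketch of the north and south occluded fills on four bitboards at once follows. This is not the SSE2 wrapper mentioned above: the Quad struct and the function names are hypothetical, the inner loop over lanes merely stands in for one vector operation, and a1 = bit 0 is assumed. gen holds the sliders, pro the empty squares they may propagate through.

```cpp
#include <cstdint>

// Hypothetical stand-in for a quadbitboard (four 64-bit lanes).
struct Quad { std::uint64_t lane[4]; };

// Occluded fill towards the north: three doubling steps (8, 16, 32).
Quad northFill(Quad gen, Quad pro) {
    for (int shift = 8; shift <= 32; shift *= 2) {
        for (int i = 0; i < 4; ++i) {
            gen.lane[i] |= pro.lane[i] & (gen.lane[i] << shift);
            pro.lane[i] &= pro.lane[i] << shift;  // squares reachable over 2*shift
        }
    }
    return gen;
}

// Same fill towards the south, using right shifts.
Quad southFill(Quad gen, Quad pro) {
    for (int shift = 8; shift <= 32; shift *= 2) {
        for (int i = 0; i < 4; ++i) {
            gen.lane[i] |= pro.lane[i] & (gen.lane[i] >> shift);
            pro.lane[i] &= pro.lane[i] >> shift;
        }
    }
    return gen;
}
```

Since northFill and southFill do not depend on each other, a compiler is free to interleave their instructions, which is presumably the kind of scheduling described above.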
PPERM as specified by AMD's SSE5, able to reverse bitboards, looks interesting for pure calculation of attacks à la hyperbola quintessence. With 256-bit registers we would be able to calculate the four attack lines of a queen in one run, likely with max IPC.
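As a reminder of what hyperbola quintessence computes, here is a small scalar sketch for one line. It uses a portable byte swap as the reversal step, which is only valid for files and diagonals; ranks would need a full bit reversal of the kind PPERM is said to offer. The function names are illustrative, a1 = bit 0 is assumed, and lineMaskEx is the line through the slider with the slider's own square excluded.

```cpp
#include <cassert>
#include <cstdint>

// Portable byte swap: mirrors the board vertically (rank 1 <-> rank 8).
std::uint64_t byteSwap(std::uint64_t x) {
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}

// Attacks of a single slider along one file or diagonal.
// occ: all occupied squares, slider: one-bit bitboard of the slider.
std::uint64_t lineAttacks(std::uint64_t occ, std::uint64_t slider,
                          std::uint64_t lineMaskEx) {
    assert((lineMaskEx & slider) == 0);
    std::uint64_t forward = occ & lineMaskEx;    // o - s on this line
    std::uint64_t reverse = byteSwap(forward);   // o' - s'
    forward -= slider;                           // o - 2s
    reverse -= byteSwap(slider);                 // o' - 2s'
    forward ^= byteSwap(reverse);                // combine both ray directions
    return forward & lineMaskEx;
}
```

A queen needs four such lines (file, rank, diagonal, anti-diagonal), so four 64-bit lanes would be enough to process them side by side.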
bit[64]*char[64] or bit[64]*short[64] dot-products will clearly profit - even more with future 512-bit register sets.
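What a bit[64]*char[64] dot-product means in scalar terms is sketched below: sum the weight of every square whose bit is set. The function name and the weights array are placeholders for illustration only; a SIMD version would expand the bits to byte masks and add many weights per instruction.

```cpp
#include <cstdint>

// Sum weights[i] over all set bits i of bb (bit[64] * char[64]).
int dotProduct(std::uint64_t bb, const std::int8_t weights[64]) {
    int sum = 0;
    for (int i = 0; i < 64; ++i) {
        // Multiply by the bit value (0 or 1) to avoid a branch.
        sum += weights[i] * static_cast<int>((bb >> i) & 1ULL);
    }
    return sum;
}
```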