Improvements in AMD's K10 - good for Diep

diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Improvements in AMD's K10 - good for Diep

Post by diep »

hi,

More and more articles are slowly appearing about AMD's new flagship, which they want to launch quickly as a high-end Opteron CPU. By the way, I am praying that the chip won't be too expensive, but I fear otherwise, despite its low clock of 2.0 GHz.

http://www.xbitlabs.com/articles/cpu/print/amd-k10.html

For Diep there is really good news.

A step back in history: I remember GCP was unsure what to do about Sjeng's submission to SPEC. Of course, at the top of my list was the following: indirect execution of function pointers. For years I have been wondering whether to rewrite that part of Diep, as it always gets mispredicted on, for example, K8.
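For readers who have not met the pattern, here is a minimal sketch of what calling through function pointers looks like. The piece codes and scoring functions are hypothetical and are not taken from Diep or Sjeng; the point is only that the call in the loop goes through a pointer whose value depends on the data.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical piece codes, for illustration only.
enum PieceType { PAWN, KNIGHT, ROOK, PIECE_TYPE_COUNT };

// Hypothetical per-piece scoring functions.
static int scorePawn(int square)   { return 100 + square % 8; }
static int scoreKnight(int square) { return 300 + square % 4; }
static int scoreRook(int /*square*/) { return 500; }

using ScoreFn = int (*)(int);

// Table of function pointers indexed by piece type.
static const ScoreFn kScoreTable[PIECE_TYPE_COUNT] = {
    scorePawn, scoreKnight, scoreRook
};

int scoreBoard(const PieceType* pieces, const int* squares, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        assert(pieces[i] < PIECE_TYPE_COUNT);
        // Indirect call: the target depends on pieces[i], so the CPU has to
        // predict which function comes next.
        total += kScoreTable[pieces[i]](squares[i]);
    }
    return total;
}
```

Whenever consecutive entries in `pieces` differ, the call target changes, which is the situation described above as being mispredicted on K8.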

Given this great opportunity, we decided to add it to sjeng-spec. After that, of course, it was just a matter of waiting a few years for new processors instead of rewriting that part of Diep.

Now I'm really happy that AMD's engineers have taken action for SPECint2006, which includes sjeng-spec.

See here: "As we have expected, K10 boasts improved conditional branch prediction algorithms"

This will boost Diep a lot on K10!

Vincent
Dan Andersson
Posts: 442
Joined: Wed Mar 08, 2006 8:54 pm

Re: Improvements in AMD's K10 - good for Diep

Post by Dan Andersson »

It's not only the sjeng benchmark that stresses indirect branches.
hmmer is a protein sequence analysis program that works with hidden Markov models. If it is reasonably state of the art, it would be using the Viterbi algorithm, which is a state machine.
The same goes for gobmk, the GNU Go derivative, which contains a lot of pattern recognition. Patterns are usually recognized by state machines.
This is all good, though. The giant-switch penalty is mitigated (a boon for virtual machines), and dynamic and functional programming will also benefit greatly.
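To make the "giant switch" concrete, here is a minimal sketch of a bytecode interpreter loop. The opcodes and the `run` function are made up for illustration and do not come from any real virtual machine. Compilers typically lower a dense switch like this into a jump table, so every dispatched instruction goes through a single indirect jump.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes for a toy stack machine.
enum Op : std::uint8_t { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

// Runs the program and returns the top of the stack at OP_HALT,
// or -1 on an unknown opcode.
int run(const std::vector<std::uint8_t>& code) {
    std::vector<int> stack;
    std::size_t pc = 0;
    for (;;) {
        assert(pc < code.size());
        switch (code[pc++]) {  // usually compiled to one indirect jump
        case OP_PUSH:
            assert(pc < code.size());
            stack.push_back(code[pc++]);  // operand follows the opcode
            break;
        case OP_ADD: {
            assert(stack.size() >= 2);
            int b = stack.back();
            stack.pop_back();
            stack.back() += b;
            break;
        }
        case OP_MUL: {
            assert(stack.size() >= 2);
            int b = stack.back();
            stack.pop_back();
            stack.back() *= b;
            break;
        }
        case OP_HALT:
            return stack.empty() ? 0 : stack.back();
        default:
            return -1;
        }
    }
}
```

Because all opcodes share that one jump, a predictor that only remembers the last target is wrong whenever two consecutive opcodes differ.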

MvH Dan Andersson
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Improvements in AMD's K10 - good for Diep

Post by Michael Sherwin »

Hi Vincent,

I have a perft example program called Godzilla which is a hand coded assembler program that uses exclusively indirect jump tables for virtually everything. I wrote it before I knew anything about branch prediction and the penalties associated with it. Please, tell me, what will the new K10 do for code like the following?

Thanks,
Mike

Code:

wqmf        dd          0
            dd          wqmnd,wqmnd,wqmnd,wqmnd,wqmnd,wqmnd,wqmnd,wqmnd
            dd          wqmnd,wqmnd,wqmnd,wqmnd,wqmnd,wqmnd,wqmnd,wqmnd
            dd          0,0,0
            dd          wqmrm
            dd          0
            dd          wqmrc,wqmrc,wqmrc,wqmrc,wqmrc,wqmrc,wqmrc,wqmrc
            dd          wqmrc,wqmrc,wqmrc,wqmrc,wqmrc,wqmrc,wqmrc,wmcbki
            dd          wnxtm

wqm:        mov         ecx,[ps+edi*4]
            mov         esi,[qol+ecx*4]
            movsx       ebx,[qns+esi+ecx]
            mov         edx,[brd+ebx*4]
            jmp         [wqmf+edx*4]
        
wqmrm:      mov         [tree.fsq+eax*8],cl
            mov         [tree.tsq+eax*8],bl
            mov         [tree.typ+eax*8],QMOV
            inc         eax
            movsx       ebx,[qns+esi+ebx]
            mov         edx,[brd+ebx*4]
            jmp         [wqmf+edx*4]

wqmrc:      mov         [tree.fsq+eax*8],cl
            mov         [tree.tsq+eax*8],dl
            mov         [tree.typ+eax*8],QCAP
            inc         eax
wqmnd:      movsx       ebx,[qnd+esi+ebx]
            mov         edx,[brd+ebx*4]
            jmp         [wqmf+edx*4]
:D
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: Improvements in AMD's K10 - good for Diep

Post by Gerd Isenberg »

Michael Sherwin wrote:Hi Vincent,

I have a perft example program called Godzilla which is a hand coded assembler program that uses exclusively indirect jump tables for virtually everything. I wrote it before I knew anything about branch prediction and the penalties associated with it. Please, tell me, what will the new K10 do for code like the following?

Thanks,
Mike
I am not Vincent, but it is a bit hard to say without profiling information. I have no idea how random your indirect jumps are or how long your cases run, not to mention how exactly the K10 heuristics work, e.g. which patterns are easy and which are hard to predict for indirect jumps. Assuming the target changes every time, the prediction rate may vary between, let's say, 5% and 95%, depending on the randomness and the cycle length of the pattern.
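The dependence on the pattern can be illustrated with a toy model. This is not a description of how K10 or K8 actually predict; both predictors below are simplified assumptions. `lastTargetHitRate` always guesses the previous target, while `historyHitRate` remembers which target followed each target the last time it was seen.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: predict that the next target equals the previous one.
double lastTargetHitRate(const std::vector<int>& targets) {
    if (targets.size() < 2) return 0.0;
    std::size_t hits = 0;
    for (std::size_t i = 1; i < targets.size(); ++i)
        if (targets[i] == targets[i - 1]) ++hits;
    return static_cast<double>(hits) / (targets.size() - 1);
}

// Toy model: remember, per previous target, which target followed it last
// time, and predict that one. Targets must lie in [0, numTargets).
double historyHitRate(const std::vector<int>& targets, int numTargets) {
    if (targets.size() < 2) return 0.0;
    std::vector<int> nextAfter(static_cast<std::size_t>(numTargets), -1);
    std::size_t hits = 0;
    for (std::size_t i = 1; i < targets.size(); ++i) {
        int prev = targets[i - 1];
        assert(prev >= 0 && prev < numTargets);
        if (nextAfter[prev] == targets[i]) ++hits;
        nextAfter[prev] = targets[i];  // learn the most recent successor
    }
    return static_cast<double>(hits) / (targets.size() - 1);
}
```

On a strictly alternating sequence of two targets, the last-target model never hits, while the history model only misses during warm-up. On a truly random sequence neither model has anything to learn from.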

Some OO approaches, e.g. virtual pieces, will benefit from K10, and Java virtual machines as well:

Code:

class Piece { ...
  virtual int getValue() = 0; // abstract base
};

class Pawn : public Piece { ...
  virtual int getValue() {return 100;}
};
class Rook : public Piece { ...
  virtual int getValue() {return 500;}
};

...
while (p = getNextPiece())
  x += p->getValue(); 
with some assembly like this:

Code:

mov  edx, [board+4*esi]         ; pointer to next "piece"
push edx                        ; "this" as actual parameter for the function call
mov  eax, [edx.vptr]            ; pointer to virtual table of the concrete class
call [eax+OffsetGetValueInVTBL] ; indirect call
Each time the next piece changes, K8 (and Core 2 Duo?) will mispredict the indirect branch. K10 will hopefully do better, assuming the piece pattern doesn't change too randomly. State machines inside a recursive or iterative search likely have patterns that are harder to predict across all plies. Cases where the last target is always wrong obviously gain more than cases where the last target is most often right.
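For anyone who wants to experiment, a self-contained variant of the virtual-piece sketch might look like the following. The container of `std::unique_ptr` and the `sumValues` helper are assumptions added here to make it compile; they stand in for the elided `getNextPiece()` loop.

```cpp
#include <cassert>
#include <memory>
#include <vector>

class Piece {
public:
    virtual ~Piece() = default;
    virtual int getValue() const = 0;  // abstract base
};

class Pawn : public Piece {
public:
    int getValue() const override { return 100; }
};

class Rook : public Piece {
public:
    int getValue() const override { return 500; }
};

// Hypothetical helper replacing the getNextPiece() loop.
int sumValues(const std::vector<std::unique_ptr<Piece>>& pieces) {
    int x = 0;
    for (const auto& p : pieces) {
        assert(p != nullptr);
        x += p->getValue();  // indirect call through the vtable
    }
    return x;
}
```

The order of pieces in the vector decides the sequence of call targets: a list sorted by piece type changes target rarely, while an interleaved pawn/rook list changes it on every iteration.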