eval pieces

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: eval pieces

Post by Joost Buijs »

Theory is nice but in practice things will almost always be different.

I have two versions of my evaluation function, an old one with a lot of branches depending on color and another one with templates.
I made the one with templates because I didn't like the messy code of the old version; it was difficult to maintain and not very easy to read.
Both evaluation functions are otherwise identical. When I check with RDTSC how much time each function takes, the difference is at most a few nanoseconds, which might as well be noise.

I prefer the one with templates because it is much cleaner in every way.
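
For reference, a minimal sketch of the two styles being compared might look like this (the Position layout and function names are invented for illustration, not taken from any engine):

Code: Select all

#include <cstdint>

enum Color { WHITE, BLACK };

// Minimal stand-in for an engine's position type (invented for this sketch).
struct Position {
    uint64_t knights[2];   // knight bitboards, indexed by color
};

// Style 1: one function that tests the color at run time.
int evalKnightsBranchy(const Position& pos, Color us) {
    const uint64_t ours    = pos.knights[us];
    const int      pushDir = (us == WHITE) ? 8 : -8;  // this test repeats for every color-dependent detail
    int score = 0;
    // ... mobility, outposts, etc. would go here ...
    (void)ours; (void)pushDir;
    return score;
}

// Style 2: the color is a compile-time template parameter; the compiler emits
// two specialized copies with the color tests folded away.
template<Color Us>
int evalKnightsTemplated(const Position& pos) {
    constexpr int pushDir = (Us == WHITE) ? 8 : -8;   // resolved at compile time
    const uint64_t ours   = pos.knights[Us];
    int score = 0;
    // ... identical scoring code, but with no run-time color branches ...
    (void)ours; (void)pushDir;
    return score;
}

int evaluate(const Position& pos) {
    // Both sides are evaluated either way; the template version simply
    // instantiates the body twice instead of branching inside it.
    return evalKnightsTemplated<WHITE>(pos) - evalKnightsTemplated<BLACK>(pos);
}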
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: eval pieces

Post by sje »

Symbolic has only a very few conditional statements of the form color == ColorWhite. These few are used only when normalizing some data for I/O and not deep in any calculation.

What Symbolic uses instead are sets of constant value arrays with at least one dimension indexed by color. That's how the program knows about pawn advance direction and the like. These arrays are small and live comfortably in the L1 data cache. There's little need to trust in branch prediction here because there are no test-on-color branches.
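
A rough sketch of that idea (invented table names and values, not Symbolic's actual data):

Code: Select all

enum Color { ColorWhite, ColorBlack };

// Small constant tables indexed by color; no test-on-color branches needed.
// (0..63 square numbering assumed for illustration.)
constexpr int PawnAdvanceDelta[2] = { +8, -8 };  // square delta for a single pawn push
constexpr int PromotionRank[2]    = {  7,  0 };  // rank a pawn promotes on
constexpr int PawnStartRank[2]    = {  1,  6 };  // rank the pawns start on

inline int pawnPushTarget(int square, Color c) {
    // One indexed load from a table that sits in L1, instead of a predicted branch.
    return square + PawnAdvanceDelta[c];
}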

Symbolic does use templates, but not with color or piece-type parameters, partly as a matter of style and partly because the array indexing by color works well. However, I don't think templates add much to code cache pressure; the L1 cache on better CPUs is large enough that a few template instantiations based on color/piece/man won't make much difference anyway.
Joost Buijs
Posts: 1563
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: eval pieces

Post by Joost Buijs »

sje wrote:Symbolic has only a very few conditional statements of the form color == ColorWhite. These few are used only when normalizing some data for I/O and not deep in any calculation.

What Symbolic uses instead are sets of constant value arrays with at least one dimension indexed by color. That's how the program knows about pawn advance direction and the like. These arrays are small and live comfortably in the L1 data cache. There's little need to trust in branch prediction here because there are no test-on-color branches.

Symbolic does use templates, but not with color or piece-type parameters, partly as a matter of style and partly because the array indexing by color works well. However, I don't think templates add much to code cache pressure; the L1 cache on better CPUs is large enough that a few template instantiations based on color/piece/man won't make much difference anyway.
In my evaluation function most constants are also indexed by color, with a few exceptions, and in my case the template parameter serves as the index.

As you already mentioned, the L1 instruction cache on modern CPUs is large enough (32 KB per core) to easily hold a few template instantiations of an evaluation function.

I've tested it thoroughly and could not find any speed disadvantage when using templates.
Maybe things will be different on older CPUs or with larger evaluation functions; I really can't tell.
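
For anyone who wants to reproduce that kind of measurement, a minimal RDTSC harness might look roughly like this (a sketch assuming GCC/Clang on x86-64; evaluate() is a stand-in for the function under test):

Code: Select all

#include <cstdint>
#include <cstdio>
#include <x86intrin.h>       // __rdtsc() on GCC/Clang; MSVC has it in <intrin.h>

// Stand-in for the evaluation function being measured.
static int evaluate() { return 0; }

int main() {
    constexpr int iterations = 1'000'000;
    volatile int sink = 0;   // keeps the compiler from optimizing the calls away

    uint64_t start = __rdtsc();
    for (int i = 0; i < iterations; ++i)
        sink = sink + evaluate();
    uint64_t cycles = __rdtsc() - start;

    // A serious measurement would also serialize (CPUID/LFENCE) and pin the
    // clock frequency; this only gives a rough cycles-per-call figure.
    std::printf("%.1f cycles per call (sink=%d)\n",
                double(cycles) / iterations, (int)sink);
    return 0;
}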
Daniel Anulliero
Posts: 759
Joined: Fri Jan 04, 2013 4:55 pm
Location: Nice

Re: eval pieces

Post by Daniel Anulliero »

Thanks for the answers
I will try to write an eval with templates and we'll see 😉
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: eval pieces

Post by bob »

Joost Buijs wrote:Theory is nice but in practice things will almost always be different.

I have two versions of my evaluation function, an old one with a lot of branches depending on color and another one with templates.
I made the one with templates because I didn't like the messy code of the old version; it was difficult to maintain and not very easy to read.
Both evaluation functions are otherwise identical. When I check with RDTSC how much time each function takes, the difference is at most a few nanoseconds, which might as well be noise.

I prefer the one with templates because it is much cleaner in every way.
I did not say that either (a) duplicate black/white or (b) combined black/white was significantly better for everyone. There are lots of things that affect this.

(1) how much stress do you put on L1/L2/L3 cache? If you are stressing it heavily, adding more code to the cache footprint increases that stress, and it can definitely affect performance.

(2) the branches in the combined vs separate cases are irrelevant. Modern branch prediction, particularly correlated branch prediction, is simply too good for that to be an issue at all.

So the ONLY consideration is cache footprint. If you have not tried to optimize your code to manage this, then it probably won't matter what you do. But if you have carefully laid out functions in memory based on locality, and carefully grouped variables in memory based on locality, then any cache mismanagement can be measured.

It is a no-brainer to realize that duplicated code is not a good idea, period. How much it hurts will depend on the program, but it definitely hurts unless everything fits in cache either way. There is certainly no performance benefit to duplicated code, whether it is written using templates (the better option, since at least you don't have duplicated source to maintain) or done in plain C-style code where the procedures are manually duplicated. Eliminating branches that are predicted with 100% accuracy won't speed anything up unless the code executes with zero stalls of any kind, which is not going to happen in a chess engine.
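
One way such layout intent can be expressed (a sketch using GCC/Clang attributes; not a claim about how any particular engine does it):

Code: Select all

#include <cstdint>

// GCC/Clang attribute hints that keep hot code grouped together and push
// rarely executed code out of the hot region (a sketch of the idea only).
__attribute__((hot))  int  evaluate()        { return 0; }  // on the critical path
__attribute__((cold)) void printStatistics() { }            // at most once per search

// Variables touched on every node kept together, so they share as few
// cache lines as possible.
struct alignas(64) HotSearchData {
    uint64_t nodes;
    int      ply;
    int      selDepth;
};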

C++ is teaching sloppy programming practices, unfortunately. For example, in Crafty, the ENTIRE source compiles into an object file of about 300 KB. egtb.cpp (the Nalimov code with massive template usage) compiles into a 1 MB object file. From a performance perspective, that certainly sucks with two straws... L3 cache might be able to cope with that, but L1/L2 are not going to do so well...
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: eval pieces

Post by sje »

bob wrote:C++ is teaching sloppy programming practices, unfortunately. For example, in Crafty, the ENTIRE source compiles into an object file of about 300 KB. egtb.cpp (the Nalimov code with massive template usage) compiles into a 1 MB object file. From a performance perspective, that certainly sucks with two straws... L3 cache might be able to cope with that, but L1/L2 are not going to do so well...
I'm rather doubtful about the merit of correlating object size with run time efficiency. Many of the routines in an executable may be called only a few times or even just once, so they add no real code cache pressure. Many run-time support routines are doing slow I/O, so any overhead there will be hard to measure.

In Symbolic, there are no chess-specific templates. The templates which are used are for ornamenting various classes with handy linkage pointers and routines for organizing class instances into stacks, queues, lists, and trees. Using these templates is no different at run time than explicitly coding the C language equivalents.
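
A rough sketch of that kind of non-chess-specific linkage template (invented names, not Symbolic's code):

Code: Select all

// The template only supplies the linkage pointer and the push/pop plumbing;
// at run time this is identical to hand-written C with an embedded 'next' field.
template<typename T>
struct Linked {
    T* next = nullptr;
};

template<typename T>
struct IntrusiveStack {
    T* top = nullptr;

    void push(T* node) { node->next = top; top = node; }
    T*   pop()         { T* n = top; if (n) top = n->next; return n; }
};

// A class gains stack/list membership simply by inheriting the linkage.
struct SearchNode : Linked<SearchNode> { int value = 0; };

// Usage: IntrusiveStack<SearchNode> freeList; freeList.push(&someNode); ...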
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: eval pieces

Post by bob »

sje wrote:
bob wrote:C++ is teaching sloppy programming practices, unfortunately. For example, in Crafty, the ENTIRE source compiles into an object file of about 300 KB. egtb.cpp (the Nalimov code with massive template usage) compiles into a 1 MB object file. From a performance perspective, that certainly sucks with two straws... L3 cache might be able to cope with that, but L1/L2 are not going to do so well...
I'm rather doubtful about the merit of correlating object size with run time efficiency. Many of the routines in an executable may be called only a few times or even just once, so they add no real code cache pressure. Many run-time support routines are doing slow I/O, so any overhead there will be hard to measure.

In Symbolic, there are no chess-specific templates. The templates which are used are for ornamenting various classes with handy linkage pointers and routines for organizing class instances into stacks, queues, lists, and trees. Using these templates is no different at run time than explicitly coding the C language equivalents.
Without initialized data, the size of an object file is pretty much proportional to the instruction count. egtb is huge in that regard, since there are so many endgame combinations (with 6 pieces) to enumerate... Yes, unused stuff won't bother the cache, but at least for evaluation you call both sides most of the time.

Certainly if you template evaluation for black and white, you are going to duplicate that code. If you do what Rybka did, I don't remember the final count, but there were so many eval instances that everyone got tired of counting. Think of having one eval when white has no rooks, one when black has no rooks, one when both have no rooks, as that eliminates the test and the procedure calls. Repeat for pawns, knights, bishops and queens, 3 combinations each. 3^5 = huge code bloat to save a few branches that are predicted correctly anyway...
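
A toy sketch of that kind of per-material specialization (illustrative only; the names and the Position stub are invented, not Rybka's or anyone's actual code):

Code: Select all

// Toy illustration only.
struct Position { /* bitboards, piece counts, ... */ };

enum Presence { NEITHER, WHITE_ONLY, BLACK_ONLY };   // 3 states per piece type, as above

template<Presence Rooks, Presence Queens /* , knights, bishops, pawns ... */>
int evaluate(const Position& pos) {
    int score = 0;
    if constexpr (Rooks  != NEITHER) { /* rook terms, compiled in only when needed  */ }
    if constexpr (Queens != NEITHER) { /* queen terms, compiled in only when needed */ }
    (void)pos;
    return score;
}

// Extending this to all 5 piece types at 3 states each means 3^5 = 243 distinct
// instantiations of the body above: that is the code bloat being described.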
mar
Posts: 2555
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: eval pieces

Post by mar »

sje wrote:I'm rather doubtful about the merit of correlating object size with run time efficiency.
Yes, and I bet most of that is symbols anyway :)
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: eval pieces

Post by lucasart »

bob wrote: C++ is teaching sloppy programming practices, unfortunately. For example, in Crafty, the ENTIRE source compiles into an object file of about 300 KB. egtb.cpp (the Nalimov code with massive template usage) compiles into a 1 MB object file. From a performance perspective, that certainly sucks with two straws... L3 cache might be able to cope with that, but L1/L2 are not going to do so well...
What matters is the size of the executable, not the size of the various object files used before linking.

The problem with C++ is more the compilation speed, which is extremely slow (partly because the syntax rules are extremely complicated and context sensitive, and partly because the standard libraries are so crufty). But the linker can throw away what isn't needed, so the result should not be as large as the sum of the object file sizes. The speed of the compiled code is equivalent to C (assuming the programmer knows what he's doing, though most people who use C++ really don't).
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
User avatar
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: eval pieces

Post by sje »

mar wrote:
sje wrote:I'm rather doubtful about the merit of correlating object size with run time efficiency.
Yes and I bet most of that is symbols anyway :)
About 5%:

Code: Select all

gail:Symbolic sje$ ls -l Symbolic
-rwxr-xr-x  1 sje  staff  1010200 Jun 19 22:52 Symbolic
gail:Symbolic sje$ strip Symbolic
gail:Symbolic sje$ ls -l Symbolic
-rwxr-xr-x  1 sje  staff  955168 Jun 20 04:36 Symbolic