Initializing portions of arrays

matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Initializing portions of arrays

Post by matthewlai »

Robert Pope wrote:This is related to C programming. I currently initialize a number of arrays using the format, e.g.:
int pcVal[]={0,103,300,9999,325,500,900};
int nPcSq[] = {
-10, -30, -10, -10, -10, -10, -30, -10,
-10, 0, 0, 0, 0, 0, 0, -10,
-10, 0, 6, 6, 6, 6, 0, -10,
-10, 3, 6, 12, 12, 6, 3, -10,
-10, 3, 6, 12, 12, 6, 3, -10,
-10, 0, 6, 6, 6, 6, 0, -10,
-10, 0, 0, 0, 0, 0, 0, -10,
-10, -30, -10, -10, -10, -10, -30, -10 };

Now that I am trying to do some optimization, I've put all the piece square tables and bonus values into one big array, with pointers for the different components:

int evalTerm[EVALNUM];
int* nPcSq;
int* bPcSq;
int* kPcSq;

nPcSq=&evalTerm[0];
bPcSq=&evalTerm[64];
kPcSq=&evalTerm[128];

But now is there a good way to set the values of the piece square tables, etc., without resorting to nPcSq[0]=-10; nPcSq[1]=-30; etc. ?
Just a heads up: unless you are very careful, accessing those fields through pointers can make the compiler miss a lot of optimization opportunities because of aliasing concerns (e.g., in some cases the compiler cannot prove that two pointers never point to the same object, so it has to allow for that possibility when optimizing).

So it has the potential to make your code much slower.

If you need to access all parameters for tuning, I would use separate arrays as they were, and make an additional array of pointer-length pairs. Then you get the performance hit in tuning, not in playing.
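
A minimal sketch of the "separate arrays plus pointer-length pairs" idea, assuming the tuner wants to see all parameters as one flat list. The names ParamBlock, tunables, param_count and param_at are hypothetical and not from the original code, and the nPcSq contents are left zeroed for brevity:

```c
#include <assert.h>
#include <stddef.h>

/* Separate arrays, as before: the evaluation indexes these directly. */
int pcVal[] = {0, 103, 300, 9999, 325, 500, 900};
int nPcSq[64];  /* left zero-filled here; use the usual { ... } initializer */

/* Hypothetical descriptor: one (pointer, length) pair per tunable array. */
struct ParamBlock {
    int    *data;
    size_t  len;
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Only the tuner walks this table; the evaluation never touches it. */
static const struct ParamBlock tunables[] = {
    { pcVal, ARRAY_LEN(pcVal) },
    { nPcSq, ARRAY_LEN(nPcSq) },
};

/* Total number of tunable parameters across all registered arrays. */
size_t param_count(void)
{
    size_t total = 0;
    for (size_t i = 0; i < ARRAY_LEN(tunables); i++)
        total += tunables[i].len;
    return total;
}

/* Map a flat parameter index to the address of the underlying int. */
int *param_at(size_t flat)
{
    for (size_t i = 0; i < ARRAY_LEN(tunables); i++) {
        if (flat < tunables[i].len)
            return &tunables[i].data[flat];
        flat -= tunables[i].len;
    }
    assert(!"flat index out of range");
    return NULL;
}
```

The tuner pays for the lookup in param_at, while the playing code keeps using pcVal[...] and nPcSq[...] directly.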
Disclosure: I work for DeepMind on the AlphaZero project, but everything I say here is personal opinion and does not reflect the views of DeepMind / Alphabet.
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Initializing portions of arrays

Post by hgm »

This is why I preferred the #define method. You can probably achieve the same with 'const':

const int *nPcSq = &evalTerm[0];

etc. Then the compiler will know at all times that it is just a fixed address.
matthewlai
Posts: 793
Joined: Sun Aug 03, 2014 4:48 am
Location: London, UK

Re: Initializing portions of arrays

Post by matthewlai »

hgm wrote:This is why I preferred the #define method. You can probably achieve the same with 'const':

const int *nPcSq = &evalTerm[0];

etc. Then the compiler will know at all times that it is just a fixed address.
That is a better idea.
Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: Initializing portions of arrays

Post by Robert Pope »

Thanks for the warning. I am working on hgm's #define method.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Initializing portions of arrays

Post by wgarvin »

My suggestion is to define a struct with your arrays inside it. (If you are in C++, do not put any private: or public: declarations as that allows the compiler freedom to mess up your hand-made layout at that point... not that I've ever seen one that does). Then declare a global var that is an instance of the struct, and you can initialize it like any other global var (i.e. with a { } initializer that contains initializers for each array).

You should always access it through the name of the global var, so the compiler can pre-compute addresses and have the best info about (non)aliasing of the accesses. If you find that ugly, you can make macro(s) that hide the ugly bit.

Maybe something like this:

struct Globals
{
    int  board[10*18];
    int  holdings[3*18];  // or whatever
};

// the 'struct' keyword is required here in C and optional in C++
struct Globals g_Globals =
{
    /* board */ { ... },
    /* holdings */ { ... },
};

#define G_Board  (g_Globals.board)
#define G_Holdings  (g_Globals.holdings)
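
For the arrays from the original question, the same pattern might look like the sketch below. The names EvalTerms, g_eval, G_PcVal, G_NPcSq and knight_square_bonus are made up for illustration; only the table values come from the first post:

```c
#include <assert.h>

/* All evaluation terms live in one struct, so they are contiguous in memory. */
struct EvalTerms {
    int pcVal[7];
    int nPcSq[64];
};

/* One global instance, initialized like any other global variable. */
struct EvalTerms g_eval = {
    /* pcVal */ {0, 103, 300, 9999, 325, 500, 900},
    /* nPcSq */ {
        -10, -30, -10, -10, -10, -10, -30, -10,
        -10,   0,   0,   0,   0,   0,   0, -10,
        -10,   0,   6,   6,   6,   6,   0, -10,
        -10,   3,   6,  12,  12,   6,   3, -10,
        -10,   3,   6,  12,  12,   6,   3, -10,
        -10,   0,   6,   6,   6,   6,   0, -10,
        -10,   0,   0,   0,   0,   0,   0, -10,
        -10, -30, -10, -10, -10, -10, -30, -10,
    },
};

/* Macros hide the struct name but still expand to a direct global access. */
#define G_PcVal (g_eval.pcVal)
#define G_NPcSq (g_eval.nPcSq)

/* Knight piece-square bonus for a square numbered 0..63. */
int knight_square_bonus(int sq)
{
    assert(sq >= 0 && sq < 64);
    return G_NPcSq[sq];
}
```

A tuner could still treat the whole struct as one block of ints, provided the compiler inserts no padding between the members, which is worth checking with sizeof rather than assuming.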
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Initializing portions of arrays

Post by hgm »

This would indeed be a cleaner solution. But I abandoned it because there is one thing it cannot do: interleave arrays. When I use a 0x88 board, the square numbers run up to 128, but half of them are unused. So PSTs indexed by square number waste a lot of space when you implement them as separate arrays. To prevent that I often use the off-board square numbers of the white table to store the black table. E.g.

#define whitePawn (PST + 0)
#define blackPawn (PST + 8)
#define whiteKnight (PST + 128 + 0)
#define blackKnight (PST + 128 + 8)
#define whiteBishop (PST + 2*128 + 0)
#define blackBishop (PST + 2*128 + 8)

Due to the irregularity this requires a list of pointers pointing into the tables, which can then be indexed by piece or piece type. To avoid that extra level of indirection I use a slightly different technique in the code excerpt from Shokidoki I posted above: the board is 18x12 there, only the left 9 columns in use, and the PST are accessed as PST[117*pieceType + square]. As 117 is an odd multiple of 9, this interleaves the tables, apart from 9 wasted bytes between one PST and that of two piece types higher.
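
A minimal sketch of the 0x88 interleaving with the macros above, assuming int tables and three piece types. The enum, the pstOf pointer list, the piece code 2*type + colour, and the helpers sq0x88 and pst_value are hypothetical names added for illustration:

```c
#include <assert.h>

enum { PAWN, KNIGHT, BISHOP, NUM_TYPES };   /* hypothetical piece-type codes */

/* One 128-entry block per piece type. White uses files 0-7 of each 16-wide
   row (the on-board half), black uses files 8-15 (the off-board half). */
int PST[NUM_TYPES * 128];

#define whitePawn   (PST + 0)
#define blackPawn   (PST + 8)
#define whiteKnight (PST + 128 + 0)
#define blackKnight (PST + 128 + 8)
#define whiteBishop (PST + 2*128 + 0)
#define blackBishop (PST + 2*128 + 8)

/* List of pointers into the tables, indexed by 2*type + colour
   (colour 0 = white, 1 = black). */
int *const pstOf[2 * NUM_TYPES] = {
    whitePawn, blackPawn, whiteKnight, blackKnight, whiteBishop, blackBishop,
};

/* 0x88 square number from rank and file, both 0..7. */
int sq0x88(int rank, int file)
{
    assert(rank >= 0 && rank < 8 && file >= 0 && file < 8);
    return 16 * rank + file;
}

/* Look up a PST value; sq must be an on-board 0x88 square. */
int pst_value(int type, int colour, int sq)
{
    assert((sq & 0x88) == 0);
    return pstOf[2 * type + colour][sq];
}
```

Adding 8 to the base pointer moves every black entry into the off-board half of the same row, so both colours fit in the 128 entries that a single 0x88-indexed table would occupy anyway.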
AlvaroBegue
Posts: 931
Joined: Tue Mar 09, 2010 3:46 pm
Location: New York
Full name: Álvaro Begué (RuyDos)

Re: Initializing portions of arrays

Post by AlvaroBegue »

hgm wrote:This is why I preferred the #define method. You can probably achieve the same with 'const':

const int *nPcSq = &evalTerm[0];

etc. Then the compiler will know at all times that it is just a fixed address.
No, that's not necessarily a fixed address. You probably meant

int const * const nPcSq = &evalTerm[0];

This first `const' means that the `int's being pointed to are constant. The second `const' means the address `nPcSq' points to is constant.

The order of the `int' and the first `const' doesn't matter. I prefer the order I used because you can then read the type backwards from the variable name: "A constant pointer to constant ints".
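
To make the difference concrete, here is a small sketch of the three declarations side by side. The variable viewPtr and the function const_demo are hypothetical; the lines that would be rejected by the compiler are left commented out:

```c
#include <assert.h>

int evalTerm[192];

/* Pointer to const int: the pointer can be re-aimed, but the ints are
   read-only when accessed through it. */
const int *viewPtr = &evalTerm[0];

/* Const pointer to int: the pointer cannot be re-aimed, the ints stay writable. */
int *const bPcSq = &evalTerm[64];

/* Const pointer to const int: neither can be changed through this name. */
int const *const kPcSq = &evalTerm[128];

void const_demo(void)
{
    viewPtr = &evalTerm[1];      /* allowed: only the pointee is const */
    /* *viewPtr = 5; */          /* rejected: write through pointer to const */

    bPcSq[0] = 5;                /* allowed: only the pointer itself is const */
    /* bPcSq = &evalTerm[0]; */  /* rejected: bPcSq is a const pointer */

    evalTerm[128] = 7;           /* allowed: the array itself is not const */
    assert(kPcSq[0] == 7);       /* and the new value is visible through kPcSq */
}
```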
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Initializing portions of arrays

Post by hgm »

Ah, thanks for the correction. I have never actually used const; when my programs contain constants I always #define them as macros. So what I meant would actually be

int * const nPcSq = &evalTerm[0];

Then the address is constant, but the evalTerm data can still be changed by learning, or when switching to a different variant.
wgarvin
Posts: 838
Joined: Thu Jul 05, 2007 5:03 pm
Location: British Columbia, Canada

Re: Initializing portions of arrays

Post by wgarvin »

AlvaroBegue wrote:
hgm wrote:This is why I preferred the #define method. You can probably achieve the same with 'const':

const int *nPcSq = &evalTerm[0];

etc. Then the compiler will know at all times that it is just a fixed address.
No, that's not necessarily a fixed address. You probably meant

int const * const nPcSq = &evalTerm[0];

This first `const' means that the `int's being pointed to are constant. The second `const' means the address `nPcSq' points to is constant.

The order of the `int' and the first `const' doesn't matter. I prefer the order I used because you can then read the type backwards from the variable name: "A constant pointer to constant ints".
Actually that's not quite what const means. Const gives us a way to have read access to something for a while, without also being granted permission to write to it. So if my function accepts a const Foo* then callers can think of my function as promising to "maybe read from, but not write to" whatever Foo object they pass to it. Without const, the ability to read and ability to write were always paired together. But if const is being consistently used throughout the program, then when a function accepts a Foo* the caller can think of that as "this function needs to write to the Foo". Similarly, if you mark all of your local variables as const except the ones that you actually need to modify, then suddenly 90% of your variables are const and very easy to reason about, and you only have to worry about the other 10% when reviewing or auditing the code.

So you can use it to express your intention to human readers, and make your code clearer and easier to reason about. But it is mostly useless to the optimizer, because of aliasing.

If two different pointers A and B of compatible types point to the same storage, then writes to *A will be visible when you read *B, even if B's type was "const int* B". That declaration basically means "B is a pointer to some int(s) which I don't want to write to through B", however it can't guarantee that some other code somewhere doesn't write to that same int through some other pointer. A compiler can do alias analysis and sometimes it can prove that at the point where you use B, nothing else is aliased to it (or some other known pointer is definitely aliased to it). But const won't help it at all for this kind of analysis. Even local variables can be aliased, if you take the address of the variable and that address "escapes" to a global context (e.g. if it was passed as a parameter to some function that the optimizer can't see the source to and doesn't know the semantics of).

Any two pointers to compatible types might or might not alias, and unless the optimizer can prove that it's one or the other, it has to conservatively generate code that works correctly for both cases. :) So basically, our promise to not modify it is unnecessary when there's no aliasing happening, and it's not strong enough to be useful to the optimizer when there is (or might be) aliasing happening.
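
A small sketch of why the const qualifier alone does not let the compiler cache *b. The function name read_write_read is made up for illustration:

```c
#include <assert.h>

/* Returns how much *b changed across the write to *a. Because a and b have
   compatible types and may point to the same int, the compiler has to load
   *b again after the store, even though b is declared as pointer to const. */
int read_write_read(const int *b, int *a)
{
    assert(a && b);
    int before = *b;
    *a = before + 1;      /* may or may not modify the int that b points to */
    int after = *b;       /* cannot simply be assumed equal to 'before' */
    return after - before;
}
```

Calling it as read_write_read(&x, &x) returns 1, while read_write_read(&x, &y) with distinct variables returns 0, so the generated code has to handle both cases.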

Const can be a bit subtle. There's a distinction between "logical constness" and "physical constness", but there's only one keyword, and they usually work out to the same thing, except when they don't.
The keyword is actually a "type qualifier", just like volatile; together they are called "cv-qualifiers". In C++ you can cv-qualify methods and overload them on the qualifier, which is a big part of why const is so useful there: you can write const methods that "only read" the object and non-const methods that "might write" to it, and have transitive logical constness everywhere, while opting out is easy in the special cases where you don't want it. Anyway, in some ways "const int" and "int" will be treated as the same type, but in other ways they won't.

And then there are a few special cases like "const int X = 5" where the language says the actual X variable can be elided and all uses of it replaced with the value 5. But for "const float Y = 5.0f", I think it's allowed to replace uses with 5.0f but it can't elide the variable Y. Or something. I guess the moral of the story is: const is a weird animal! :lol:

Anyway, I'm beating a dead horse. tl;dr : const is really for humans, not compilers.


Back on topic...
hgm wrote:int * const nPcSq = &evalTerm[0];

Then the address is constant, but the evalTerm data can still be changed by learning, or when switching to a different variant.
The address is "constant" in the sense that you aren't allowed to assign a new address to the nPcSq variable, but that has nothing to do with whether it will be "a fixed address" at the spot where you use it.

If and only if evalTerm is a global variable of an array type (NOT pointer type!) then &evalTerm[0] will be a "fixed address" known to the linker. If it's anything else (function parameter, array variable on the stack, etc.) then it won't be. Whether or not you mark it const, the compiler can generate instructions that access that address directly, instead of indirectly through some register/pointer, if and only if it knows that nPcSq points at a "fixed address" at the point where you use it. (It doesn't need to know the actual address, just that it will be fixed once the program is loaded into memory; the linker, and perhaps the loader, will help resolve the actual value.)
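
A sketch of the two situations, assuming the layout from the original question (second table at offset 64). The function names are hypothetical, and what instructions a given compiler actually emits depends on the compiler, its flags, and things like position-independent code:

```c
#include <assert.h>

int evalTerm[192];              /* global array: address fixed at link/load time */
#define bPcSq (evalTerm + 64)   /* expands to a constant offset from that address */

/* The table address is a link-time constant here, so the access can be
   encoded as "fixed address + sq" without first loading a pointer. */
int bishop_bonus_global(int sq)
{
    assert(sq >= 0 && sq < 64);
    return bPcSq[sq];
}

/* Here the table address arrives at run time, so the access has to go
   through whatever register holds 'table', const or not. */
int bishop_bonus_param(const int *table, int sq)
{
    assert(table && sq >= 0 && sq < 64);
    return table[sq];
}
```

Both functions return the same value when the second one is handed bPcSq; the difference is only in what the compiler knows about the address at the point of use.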
mar
Posts: 2554
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: Initializing portions of arrays

Post by mar »

wgarvin wrote:And then there's a few special cases like "const int X = 5" where the language says the actual X variable can be elided and all uses of it replaced with the value 5. But for "const float Y = 5.0f", I think its allowed to replace uses with 5.0f but it can't elide the variable Y. Or something. I guess the moral of the story is: const is a weird animal ! :lol:
A bit of OT (sorry):

Yes a similar thing happened to me as well (sigh).
I'm not sure if it was a float, but I had a static const name=value declared in a class.
Then somewhere in the code (later) I used Max( x, y ), where say x was that constant.
Max is just a simple template that takes arguments by const reference (just like std::max).
And the compiler (I think it was gcc) complained at link phase (I think) because it probably wanted to pass const by reference (debug build IIRC).
When I changed Max to accept args by value, it worked, but of course that's not what I want for generic types :)
Problem is I can't "fix" this by instantiating it as a variable like type Class::name = value (or maybe without the assignment).
While gcc/clang are happy with this, msc gave me a redefinition error, so obviously this is not standardized.
So somehow I still haven't found a fully portable and clean way to declare pure constants (except for enum when working with ints).
And I certainly don't want to use preprocessor macros :)
Maybe this has been addressed in modern C++11 and onwards?