Evaluation functions. Why integer?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

User avatar
xsadar
Posts: 147
Joined: Wed Jun 06, 2007 10:01 am
Location: United States
Full name: Mike Leany

Re: Evaluation functions. Why integer?

Post by xsadar »

Uri Blass wrote:
xsadar wrote:
Uri Blass wrote:
Zach Wegner wrote:
Uri Blass wrote:I agree about point 1 but I disagree about 2.

You can represent every integer as a double.
You can even have exactly the same evaluation when you replace integers by doubles; the only difference is that you are going to be slower.
Doubles can represent integers exactly, but that's not what we're doing. We're trying to represent fixed-point decimals of the form x.yy. You cannot represent 1/100 exactly in a double, as you must round at some point:

Code:

0.0100000000000000002081668171172168513294309377670288085937500000000000000000000000000000000000000000
And though there is a one-to-one correspondence between every centipawn evaluation and its double representation, the rounding will crop up. Notice how 1/100 is rounded so that the double representation is bigger than 1/100? Try running this program:

Code:

#include <stdio.h>

int main(void) {
    double d = 0.01, s = 0;
    int i;
    for (i = 0; i < 1000; i++)
        s += d;
    printf("%.100f\n", s);
    return 0;
}
You would expect s to be either exactly 10, or maybe a bit bigger, right? Here's what it actually is:

Code:

9.9999999999998312461002569762058556079864501953125000000000000000000000000000000000000000000000000000
Which is _less_ than 10.

Now if you wanted to use "binary" values, and have your base unit be 1/256 or whatever of a pawn, go ahead. Or you could also use doubles with each number scaled up by 100 or so, but then what's the point?
I get almost the same result:

9.9999999999998312000000000000000000000000000000000000000000000000000000000000000000000000000000000000

I understand that 0.01 is stored as a binary approximation of 0.01, and it seems that C is misleading here.
I think they simply should not allow me to write d = 0.01 without the compiler warning that d is not 0.01 but a binary approximation of that value, and that double means the finite set of non-zero values of the form s * m * 2^e, where s is 1 or -1, 0 < m < 2^53, and -1075 <= e <= 970.

I wonder why they do not do it.


I could not even find this by clicking Help on the word double; I had to search Google to find the following link, which gives the exact range of the data:

http://www.dotgnu.org/pnetlib-doc/System/Double.html

When I simply click Help on double and then on data type ranges, all I get is that the range of values of double is

1.7E +/- 308 (15 digits)

Clearly misleading.

Uri
Your compiler doesn't give you a warning when you write d = 0.01, because (1) when you use a feature it's expected that you already know how it works, and (2) it would have to give you an approximation warning (very nearly) every time you use a floating-point literal, which would be annoying to those who actually understand and need to use floating point arithmetic, not to mention that it would hide legitimate warnings.

However, it does appear that your documentation could have been a bit better. I'll have to admit, though, that I'm quite surprised you didn't know floating-point values were binary approximations. I even thought it was implied in your response to Bob when I first read it.
I disagree.
The compiler should allow the user to disable certain types of warnings, but I think it should warn the user about anything that is not correct.

Not everybody studied computer programming at university, and people who learn C in other ways may not know this.

It is obvious that the computer cannot compute every division exactly because of finite memory, but it is not obvious that the computer cannot represent exactly constants like 0.01.

It is simply counterintuitive. If I wanted numbers in base 2 and were designing a computer language, my first thought would be simply not to allow expressions like 0.01, and to allow instead only numbers in base 2, base 4, base 8, or base 16.

My second thought would be that maybe I want to allow 0.01 for people who prefer base 10 and do not care about the error, but if people use a base that is not a power of 2 and do not disable the warning, they should get a warning from the compiler.

Note that if the compiler did not allow 0.1 but allowed 1/10, I could probably guess, without further hints, that 1/10 is not an exact value, for the same reason I know that 1/3 is not exact.

Uri
Compiler warnings are not intended for those who haven't had adequate instruction. They're intended for cases when those with adequate instruction make potential mistakes.

But if we were to have a compiler warning, it should perhaps just be a warning not to use floating-point variables unless you know what you're getting yourself into. Not representing decimal and other fractions exactly is only one of the potential problems. Many of the rules of math don't work as you expect. As others have mentioned, with floating-point arithmetic ((a + b) + c) is not equivalent to (a + (b + c)). I've taken entire courses on the subject, and I prefer to stay away from floating-point values unless it's absolutely necessary.
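
For instance, this small C program (literals chosen purely for illustration) shows the two groupings rounding differently:

Code:

#include <stdio.h>

int main(void) {
    double a = 0.1, b = 0.2, c = 0.3;
    /* Each literal is already rounded to binary, and each addition
       rounds again, so the grouping changes where the error lands. */
    printf("%.17f\n", (a + b) + c);   /* 0.60000000000000009 */
    printf("%.17f\n", a + (b + c));   /* 0.59999999999999998 */
    return 0;
}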

Regarding writing floating-point constants in other bases so you have exact values: floating point was not intended for representing exact values. If you want exact values, you use integers (which was Bob's original point). And I see no reason why you would ever want to write typical floating-point constants (like pi, the speed of light, or the mass of an object) in another base.
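
For reference, the integer alternative under discussion is plain fixed-point arithmetic: keep scores as whole centipawns (or in Zach's 1/256-of-a-pawn units) and convert to a decimal only for display. A minimal sketch, with invented names and values:

Code:

#include <stdio.h>

/* Scores in centipawns: 100 = one pawn. All arithmetic is exact. */
typedef int score_t;

#define PAWN_VALUE   100
#define KNIGHT_VALUE 320

/* Hypothetical evaluation terms; only the representation matters here. */
static score_t evaluate(int pawns, int knights, score_t mobility_bonus) {
    return pawns * PAWN_VALUE + knights * KNIGHT_VALUE + mobility_bonus;
}

int main(void) {
    score_t s = evaluate(8, 2, 37);
    /* Convert to pawns only when printing: 1477 -> 14.77 */
    printf("score = %d.%02d pawns\n", s / 100, s % 100);
    return 0;
}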
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Evaluation functions. Why integer?

Post by bob »

Uri Blass wrote:
xsadar wrote:
Your compiler doesn't give you a warning when you write d = 0.01, because (1) when you use a feature it's expected that you already know how it works, and (2) it would have to give you an approximation warning (very nearly) every time you use a floating-point literal, which would be annoying to those who actually understand and need to use floating point arithmetic, not to mention that it would hide legitimate warnings.
I disagree.
The compiler should allow the user to disable certain types of warnings, but I think it should warn the user about anything that is not correct.

Not everybody studied computer programming at university, and people who learn C in other ways may not know this.

It is obvious that the computer cannot compute every division exactly because of finite memory, but it is not obvious that the computer cannot represent exactly constants like 0.01.

It is simply counterintuitive. If I wanted numbers in base 2 and were designing a computer language, my first thought would be simply not to allow expressions like 0.01, and to allow instead only numbers in base 2, base 4, base 8, or base 16.

My second thought would be that maybe I want to allow 0.01 for people who prefer base 10 and do not care about the error, but if people use a base that is not a power of 2 and do not disable the warning, they should get a warning from the compiler.

Note that if the compiler did not allow 0.1 but allowed 1/10, I could probably guess, without further hints, that 1/10 is not an exact value, for the same reason I know that 1/3 is not exact.

Uri
Unfortunately, what you are describing is _not_ incorrect. Every computer science major knows that floating-point math is not exact when doing computation, so nobody would consider that to be wrong.

As for the "guessing": that is just a nonsensical way of programming. You have to know how the hardware you are using functions, and use it in the way it was intended. 0.1 is not the only issue; there are a ton of others that are just as problematic. And you can't expect the compiler to produce diagnostics in one place and not another: given that logic, every multiply and divide would need to be flagged. Any number whose fractional part can't be expressed as a sum of reciprocals of powers of two is going to be a problem. Floating-point fractions are simply binary values, where the first bit to the right of the binary point represents 1/2, the next bit 1/4, and the next bit 1/8. Any number that can't be represented by some combination of those fractions is a problem. There are too many to flag, and they can be produced by computation as well as by constants.
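
Two lines make the point concrete: a fraction that is a sum of reciprocal powers of two is stored exactly, anything else is rounded:

Code:

#include <stdio.h>

int main(void) {
    printf("%.20f\n", 0.375);  /* 1/4 + 1/8: exact, prints 0.37500000000000000000 */
    printf("%.20f\n", 0.1);    /* no finite binary form: prints 0.10000000000000000555 */
    return 0;
}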
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Evaluation functions. Why integer?

Post by bob »

One of the classic mistakes students make when using C/C++ is using a float to control a loop. They think that this loop:

for (f = 0; f < 1.0; f += .000001) will execute exactly one million times. It won't, and that leads to interesting problems later on in the program.
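
The usual fix, for what it is worth, is to let an exact integer control the loop and derive the float from it, so the rounding error never accumulates:

Code:

#include <stdio.h>

int main(void) {
    int i;
    double f = 0.0;
    for (i = 0; i < 1000000; i++) {
        f = i * 0.000001;  /* one rounding per iteration, no accumulation */
        /* ... loop body using f ... */
    }
    printf("iterations: %d\n", i);  /* always exactly 1000000 */
    return 0;
}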
User avatar
xsadar
Posts: 147
Joined: Wed Jun 06, 2007 10:01 am
Location: United States
Full name: Mike Leany

Re: Evaluation functions. Why integer?

Post by xsadar »

bob wrote:One of the classic mistakes students make when using C/C++ is using a float to control a loop. They think that this loop:

for (f = 0; f < 1.0; f += .000001) will execute exactly one million times. It won't, and that leads to interesting problems later on in the program.
Ever seen any students accidentally do something like this?

Code:

#include <iostream>
using namespace std;

int main()
{
  float f;
  int i = 0;

  for (f = 0.0; f < 3.0; f += 0.0000001f)
    i++;

  cout << "Iterations: " << i << endl;
}
Not only does it not output 30000000 (30 million) as they might expect, but it never even completes the loop. f maxes out at 2.0 (which it reaches in 17845299 iterations).
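
The sticking point is where the increment drops below half the gap between adjacent floats: in [2, 4) consecutive floats are 2^-22 (about 2.4e-7) apart, so adding 0.0000001 rounds straight back to the same value. A small check:

Code:

#include <stdio.h>

int main(void) {
    float f = 2.0f;
    float g = f + 0.0000001f;  /* the assignment forces rounding to float */
    printf("%s\n", g == f ? "stuck at 2.0" : "still advancing");
    return 0;
}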
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Evaluation functions. Why integer?

Post by bob »

xsadar wrote:
bob wrote:One of the classic mistakes students make when using C/C++ is using a float to control a loop. They think that this loop:

for (f = 0; f < 1.0; f += .000001) will execute exactly one million times. It won't, and that leads to interesting problems later on in the program.
Ever seen any students accidentally do something like this?

Code:

#include <iostream>
using namespace std;

int main()
{
  float f;
  int i = 0;

  for (f = 0.0; f < 3.0; f += 0.0000001f)
    i++;

  cout << "Iterations: " << i << endl;
}
Not only does it not output 30000000 (30 million) as they might expect, but it never even completes the loop. f maxes out at 2.0 (which it reaches in 17845299 iterations).
That probably also depends on the platform. For example, the older IBM RS/6000 machines do everything in double precision, and when the code is optimized so that f is not stored to memory on each iteration, it would work OK.
Sven
Posts: 4052
Joined: Thu May 15, 2008 9:57 pm
Location: Berlin, Germany
Full name: Sven Schüle

Re: Evaluation functions. Why integer?

Post by Sven »

xsadar wrote:
bob wrote:One of the classic mistakes students make when using C/C++ is using a float to control a loop. They think that this loop:

for (f = 0; f < 1.0; f += .000001) will execute exactly one million times. It won't, and that leads to interesting problems later on in the program.
Ever seen any students accidentally do something like this?

Code:

#include <iostream>
using namespace std;

int main()
{
  float f;
  int i = 0;

  for (f = 0.0; f < 3.0; f += 0.0000001f)
    i++;

  cout << "Iterations: " << i << endl;
}
Not only does it not output 30000000 (30 million) as they might expect, but it never even completes the loop. f maxes out at 2.0 (which it reaches in 17845299 iterations).
I know two very common platforms where the loop completes and the output is 30 million. Therefore your example might not be an optimal one, although I understand what you intended to show.

After putting the code above into "t.cpp" and adding a final "return 0;" statement, I get:

1) with g++ (an older 3.4.6 version):

Code:

% /usr/bin/g++ -O3 -o t t.cpp
% time ./t
Iterations: 30000000
0.061u 0.000s 0:00.06 100.0%    0+0k 0+0io 0pf+0w

% g++ --version
g++ (GCC) 3.4.6 20060404 (Red Hat 3.4.6-8)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

% uname -s -r -v -m -p -i -o
Linux 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:27:17 EDT 2006 i686 i686 i386 GNU/Linux
2) with MSVC++ 6.0 (SP 5):

Code:

c:\temp>cl -GX /O2 /Fet.exe t.cpp
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 12.00.8804 for 80x86
Copyright (C) Microsoft Corp 1984-1998. All rights reserved.

t.cpp
Microsoft (R) Incremental Linker Version 6.00.8447
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.

/out:t.exe
t.obj

c:\temp>t.exe
Iterations: 30000000
The optimization options are just to get the result much faster; without optimization I see identical results.

It may be platform dependent, so I suggest being careful with statements that might be understood as if they were not platform dependent.

Sven
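
One plausible explanation for the divergent results (assuming x86 hardware here) is x87 excess precision: if the compiler keeps f in an 80-bit floating-point register across iterations instead of storing it back to a 32-bit float each time, the tiny increment is never rounded away and the loop terminates normally. With g++ this can be probed directly:

Code:

% g++ -O3 -ffloat-store -o t t.cpp        # force every store of f to round to float
% g++ -O3 -msse2 -mfpmath=sse -o t t.cpp  # do the arithmetic in single-precision SSE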
User avatar
xsadar
Posts: 147
Joined: Wed Jun 06, 2007 10:01 am
Location: United States
Full name: Mike Leany

Re: Evaluation functions. Why integer?

Post by xsadar »

Sven Schüle wrote:
I know two very common platforms where the loop completes and the output is 30 million. Therefore your example might not be an optimal one, although I understand what you intended to show.

It may be platform dependent, so I suggest being careful with statements that might be understood as if they were not platform dependent.

Sven
Sorry, yes, the results can depend on the platform, the compiler, and optimization settings. Mine was run on x86 using 32-bit MSVC++ 2005, and I get identical results with or without optimizations. I'm kind of surprised that our VC++ results don't match. It looks like mine stores and then reloads f from memory on each iteration (even with optimization turned on), whereas yours probably does not (even with optimizations turned off).

As a side note, I'm kind of disappointed that the compiler let me omit the return statement without even a warning. I'll have to find the option to turn those warnings on before I port my chess engine to Windows.
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Evaluation functions. Why integer?

Post by bob »

Without looking carefully, I believe the intent was to "overflow" the calculation, so that the sum becomes so much bigger than the small fraction being added to it that the sum stops changing once it reaches the "key value". The only way I can see that this will not fail is if the platform you are using somehow always uses 64-bit floats rather than 32-bit floats...

Can you compile with the "-S" option and see what the compiler is doing? Is it somehow using a double when it should be using a float? Double is better for obvious reasons, but it is easy enough to break that as well...

The issue is that big + small = big when you can only keep the N most significant digits and big is big enough that small becomes zero when you align the exponents and add.
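
A concrete double-precision instance of that alignment effect (constants chosen for illustration): a double keeps 53 significant bits, 1e16 already needs 54, so a 1.0 added to it falls below the last kept bit and vanishes:

Code:

#include <stdio.h>

int main(void) {
    double big = 1e16, small = 1.0;
    double sum = big + small;  /* small is shifted out when exponents are aligned */
    printf("%s\n", sum == big ? "big + small == big" : "sum changed");
    return 0;
}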