An undetected bug for 10 years

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

OliverBr
Posts: 725
Joined: Tue Dec 18, 2007 9:38 pm
Location: Munich, Germany
Full name: Dr. Oliver Brausch

An undetected bug for 10 years

Post by OliverBr »

When starting to develop OliThink 5 in year 2008 I started with a bug. The following function getDir() is supposed to get the direction between two fields "f" and "t" (from/to):

Code: Select all

char getDir(int f, int t) {
	if (!((f ^ t) & 56)) return 8;
	if (!((f ^ t) & 7)) return 16;
	return ((f - t) % 7) ? 32 : 64;
}
Actually this is buggy. The bug lasted for 10 years, nobody detected it. Probably because it's only a wrong when t = 0, f = 63 and vice versa, so for the movement from one black corner to the other.
Francisco Modesto, the author of the Java engine chess (https://github.com/fmodesto/chessy/tree ... ech/chessy) wrote me an email in the year 2018 (!) about the bug. The correct implementation is:

Code: Select all

char getDir(int f, int t) {
	if (!((f ^ t) & 56)) return 8;
	if (!((f ^ t) & 7)) return 16;
	return (!((f - t) % 9)) ? 32 : 64;
}
Thanks again very much to F. Modesto!

I fixed it in 5.3.3, six years after the release of 5.3.2.

Meanwhile, I noted that this bug not only has impact on strength, it crashes the engine in one game of about 200...300 games. Every version before 5.3.3 had this issue. This is why a created a branch "stable" in

https://github.com/olithink/OliThink/tree/stable

where every version before 5.3.3 has the fix. Thus 5.3.2 in "stable" is equal to "5.3.3" in master.

Every version since 5.0.0 has been stabilised and can play now thousands of games without any crash.
Chess Engine OliThink: http://brausch.org/home/chess
OliThink GitHub:https://github.com/olithink
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: An undetected bug for 10 years

Post by hgm »

This is quite slow code, right? Why not just return dir[t-f+63];, with a suitably filled char table dir[127];?
User avatar
flok
Posts: 481
Joined: Tue Jul 03, 2018 10:19 am
Full name: Folkert van Heusden

Re: An undetected bug for 10 years

Post by flok »

hgm wrote: Tue Aug 18, 2020 7:05 pm This is quite slow code, right? Why not just return dir[t-f+63];, with a suitably filled char table dir[127];?
Maybe cache pressure?
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: An undetected bug for 10 years

Post by Gerd Isenberg »

hgm wrote: Tue Aug 18, 2020 7:05 pm This is quite slow code, right? Why not just return dir[t-f+63];, with a suitably filled char table dir[127];?
or dir[t^f] with a suitably filled char table dir[64];?
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: An undetected bug for 10 years

Post by hgm »

That cannot work. With XOR you cannot see if the coordinate went up or down, as a^b = b^a. In particular 044 to 055 (diagonal) would give the same as 045 to 054 (anti-diagonal), namely 011.

But 127 bytes should not give you that much cache pressure, compared to code with a lot of difficult-to-predict branches. Modulo is usually also quite expensive. Especially if the optimizer cannot know the value to divide is always small. But even optimized it would require multiplication with the inverse times some factor, like 1024/7, followed by >>10 to get the rounded-down quotient, which you then have to multiply back and subtract from the original to get the modulo. If the value can be any 32-bit integer, you would haveto do that in 64-bit precision.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: An undetected bug for 10 years

Post by Gerd Isenberg »

hgm wrote: Tue Aug 18, 2020 10:32 pm That cannot work. With XOR you cannot see if the coordinate went up or down, as a^b = b^a. In particular 044 to 055 (diagonal) would give the same as 045 to 054 (anti-diagonal), namely 011.
Of course! :oops:
But 127 bytes should not give you that much cache pressure, compared to code with a lot of difficult-to-predict branches. Modulo is usually also quite expensive. Especially if the optimizer cannot know the value to divide is always small. But even optimized it would require multiplication with the inverse times some factor, like 1024/7, followed by >>10 to get the rounded-down quotient, which you then have to multiply back and subtract from the original to get the modulo. If the value can be any 32-bit integer, you would haveto do that in 64-bit precision.
Yes.
Dann Corbit
Posts: 12538
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: An undetected bug for 10 years

Post by Dann Corbit »

Since there are only 4 distinct answers, the response is really two bits.
So you could use 32 bytes (256 bits) in a pre-computed bit table, if there was a worry about cache pressure.
An algorithm to pick which const unsigned long long and then shift and with a mask should be pretty fast.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
User avatar
lucasart
Posts: 3232
Joined: Mon May 31, 2010 1:29 pm
Full name: lucasart

Re: An undetected bug for 10 years

Post by lucasart »

OliverBr wrote: Tue Aug 18, 2020 5:45 pm When starting to develop OliThink 5 in year 2008 I started with a bug. The following function getDir() is supposed to get the direction between two fields "f" and "t" (from/to):

Code: Select all

char getDir(int f, int t) {
	if (!((f ^ t) & 56)) return 8;
	if (!((f ^ t) & 7)) return 16;
	return ((f - t) % 7) ? 32 : 64;
}
Actually this is buggy. The bug lasted for 10 years, nobody detected it. Probably because it's only a wrong when t = 0, f = 63 and vice versa, so for the movement from one black corner to the other.
Francisco Modesto, the author of the Java engine chess (https://github.com/fmodesto/chessy/tree ... ech/chessy) wrote me an email in the year 2018 (!) about the bug. The correct implementation is:

Code: Select all

char getDir(int f, int t) {
	if (!((f ^ t) & 56)) return 8;
	if (!((f ^ t) & 7)) return 16;
	return (!((f - t) % 9)) ? 32 : 64;
}
Thanks again very much to F. Modesto!

I fixed it in 5.3.3, six years after the release of 5.3.2.

Meanwhile, I noted that this bug not only has impact on strength, it crashes the engine in one game of about 200...300 games. Every version before 5.3.3 had this issue. This is why a created a branch "stable" in

https://github.com/olithink/OliThink/tree/stable

where every version before 5.3.3 has the fix. Thus 5.3.2 in "stable" is equal to "5.3.3" in master.

Every version since 5.0.0 has been stabilised and can play now thousands of games without any crash.
Code like this lights all the red flags upon code review. It is way too clever to be trusted. If it's not obvious, it can't be trusted, and must be tested (leaving a unit test to allow later modifications of the codebase).
Theory and practice sometimes clash. And when that happens, theory loses. Every single time.
User avatar
hgm
Posts: 27788
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: An undetected bug for 10 years

Post by hgm »

Gerd Isenberg wrote: Tue Aug 18, 2020 11:15 pm
hgm wrote: Tue Aug 18, 2020 10:32 pm That cannot work. With XOR you cannot see if the coordinate went up or down, as a^b = b^a. In particular 044 to 055 (diagonal) would give the same as 045 to 054 (anti-diagonal), namely 011.
Of course! :oops:
Sadly my proposal doesn't work either: +7 can be 7 steps right along a rank, or diagonally left-down. That reminds me why I always use 0x88 square numbering.
Gerd Isenberg
Posts: 2250
Joined: Wed Mar 08, 2006 8:47 pm
Location: Hattingen, Germany

Re: An undetected bug for 10 years

Post by Gerd Isenberg »

hgm wrote: Wed Aug 19, 2020 7:07 pm
Gerd Isenberg wrote: Tue Aug 18, 2020 11:15 pm
hgm wrote: Tue Aug 18, 2020 10:32 pm That cannot work. With XOR you cannot see if the coordinate went up or down, as a^b = b^a. In particular 044 to 055 (diagonal) would give the same as 045 to 054 (anti-diagonal), namely 011.
Of course! :oops:
Sadly my proposal doesn't work either: +7 can be 7 steps right along a rank, or diagonally left-down. That reminds me why I always use 0x88 square numbering.
Hehe, I see. 0x88 difference has also the advantage to get other directions as well.

Code: Select all

int arrDirectionBy0x88Diff[240];
int direction(int sq1, int  sq2) {
   return arrDirectionBy0x88Diff [120 + (sq2|7) - (sq1|7) + sq2 - sq1];
}
The compiler explorer with x86-64 gcc 10.2 -O3

Code: Select all

direction(int, int):
        mov     eax, esi
        mov     edx, edi
        or      eax, 7
        or      edx, 7
        add     eax, 120
        sub     eax, edx
        add     eax, esi
        sub     eax, edi
        cdqe
        mov     eax, DWORD PTR arrDirectionBy0x88Diff[0+rax*4]
        ret
which would be likely faster than some mispredicted branches ...