As I was preparing to implement hash tables in Abbess, I ran across a bit of code that looked like a no-brainer fix, but the change had a terrible impact on performance. After looping through the pseudo-legal moves, if none of them was legal, I want to return a checkmate or stalemate score. Since I use fail-hard alpha-beta, I thought the code below was wrong to return scores that were potentially above beta:
//------- E: IF NO MOVES WERE LEGAL, IT'S EITHER STALEMATE OR CHECKMATE --------
if(stale) {
    // No move has been found legal.
    // If we are not in check, this is stalemate.
    if(inCheck==0) {
        space->ply--;
        return Max(alpha,0);
    }
    else {
        // We are in a checkmate position.
        val=-MATE_SCORE+space->ply;
        space->ply--;
        if(-MATE_SCORE+space->ply>alpha)
            return -MATE_SCORE+space->ply;
        else
            return alpha;
    }
}
So I changed it to clamp the returned score into the [alpha, beta] window:
//------- E: IF NO MOVES WERE LEGAL, IT'S EITHER STALEMATE OR CHECKMATE --------
if(stale) {
    // No move has been found legal.
    // If we are not in check, this is stalemate.
    if(inCheck==0) {
        space->ply--;
        if(0>=beta)
            return beta;
        else if(0>alpha)
            return 0;
        else
            return alpha;
    }
    else {
        // We are in a checkmate position.
        val=-MATE_SCORE+space->ply;
        space->ply--;
        if(val>=beta)
            return beta;
        else if(val>alpha)
            return val;
        else
            return alpha;
    }
}
But in self-play, this performs significantly worse. Am I missing something? I don’t understand why this "fix" would be so detrimental.
I think that this tournament is a little too short to assess such a small Elo difference.
Could you try a tournament between A) & d) with more than 1000 games? 10000 should be a good number of games.
How long is each game? If the games are fast enough you can do more than 1000 games in a day; you can achieve more than 1000 games a day with games shorter than a minute.
elcabesa wrote:I think that this tournament is a little too short to assess such a small Elo difference.
Could you try a tournament between A) & d) with more than 1000 games? 10000 should be a good number of games.
How long is each game? If the games are fast enough you can do more than 1000 games in a day; you can achieve more than 1000 games a day with games shorter than a minute.
I play 50/1' for most of my testing, and only test overnight, so this is about the limit of what I can do in a day. I haven't been happy with cutechess, so that's about as fast as I can go.
With a 46 Elo difference between the versions with and without the change, and only ±28 Elo error bars, I think it's pretty clear even with this few games.
I will try running the two of them through cutechess tonight at a faster TC, though.
If you are running on Windows, you should use an older version of CuteChess; the latest one is unstable there.
Robert Pope wrote:With a 46 Elo difference with and without the change and only 28 Elo bars, I think it's pretty clear even with this few games
If I understand the Elo statistics correctly, there is a 95% probability that the true Elo lies within the two limits. So if the intersection of the intervals is not empty, you can't say anything about the relative strength of the two engine versions.
JVMerlino wrote:He said he uses fail-hard, so he always returns a score within the [alpha, beta] bounds.
Yes, I missed that, I always use fail-soft.
Still, if you're at a leaf node and return a score outside the alpha-beta bounds, wouldn't that cause a cutoff in the parent node, and you're back to fail-hard anyway? It seems to me that if you fail hard at any point the result would be the same as if you failed hard everywhere.
If that reasoning is correct, what's the point of all those tests to keep the score within the bounds?
I'm probably not thinking straight, that Famous Grouse went down way too easily.
Without knowing more specifics, I'd guess you get more cutoffs farther up the tree when you return the fail-soft scores in branches containing checkmates and stalemates. And more cutoffs possibly means fewer re-searches and more time spent examining other, less futile lines. But that depends on your implementation. Are you using PVS, MTD(f), or something else? If PVS, do you use the return value to decide whether to do a re-search on a fail high? Or do you always re-search fail highs no matter what the return value was? Etc...
There are many factors that come into play here. Doing fail-hard vs fail-soft isn't just about what values you return from a search, but also how you use the values returned from a search.
Also, is this still pre-transposition table? Or did this test run use transposition table?
So, maybe it wasn't so clear after all. 1000 games at a faster time control gave +8 and -8 Elo, with 10 Elo error bars. So it's within the margin, but this test again gave the higher score to the original code.