Discussion of chess software programming and technical issues.
Moderators: hgm, Harvey Williamson, bob
Forum rules
This textbox is used to restore diagrams posted with the [d] tag before the upgrade.
-
Robert Pope
- Posts: 392
- Joined: Sat Mar 25, 2006 7:27 pm
Post
by Robert Pope » Tue Jun 30, 2015 4:41 pm
As I was preparing to implement hash tables in Abbess, I ran across a bit a code that looked like a no-brainer fix, but it had a terrible impact on performance. After I loop through the pseudo-legal moves, if none of them were legal, then I want to return a checkmate or stalemate score. Since I use fail-hard alpha-beta, I thought the below code was wrong to return scores that were potentially above beta:
Code: Select all
//------- E: IF NO MOVES WERE LEGAL, IT'S EITHER STALEMATE OR CHECKMATE --------
if(stale) {
// No move has been found legal
// if we are not in check, this is stalemate
if(inCheck==0) {
space->ply--;
return Max(alpha,0);
}
else {
// we are in a checkmate position
val=-MATE_SCORE+space->ply;
space->ply--;
if(-MATE_SCORE+space->ply>alpha)
return -MATE_SCORE+space->ply;
else
return alpha;
}
}
So, I changed the returns to the following:
Code: Select all
//------- E: IF NO MOVES WERE LEGAL, IT'S EITHER STALEMATE OR CHECKMATE --------
if(stale) {
// No move has been found legal
// if we are not in check, this is stalemate
if(inCheck==0) {
space->ply--;
if(0>=beta)
return beta;
else if(0>alpha)
return 0;
else
return alpha;
}
else {
// we are in a checkmate position
val=-MATE_SCORE+space->ply;
space->ply--;
if(val>=beta)
return beta;
else if(val>alpha)
return val;
else
return alpha;
}
}
But in self-play, this performs significantly worse. Am I missing something? I don’t understand why this "fix" would be so detrimental.
Code: Select all
Version ELO +/- games score draw
A) Original Code 18 28 262 53% 16%
B) Updated Mate score return 5 26 300 51% 16%
C) Two-month old Code 4 29 262 51% 14%
D) Updated Mate&Stalemate return -28 26 300 45% 16%
-
elcabesa
- Posts: 677
- Joined: Sun May 23, 2010 11:32 am
-
Contact:
Post
by elcabesa » Tue Jun 30, 2015 6:06 pm
I think that this tournament is a little too short to asses a so small elo difference.
could you try doing a tournament between A) & d) with more than 1000 games. 10000 should be a good number of games
how long is each game? igf the game is fast enough you can do more than 1000 games in a day. you can achieve more than 1000 games a days with games shorther than a minute
-
elpapa
- Posts: 184
- Joined: Sun Jan 18, 2009 10:27 pm
- Location: Sweden
Post
by elpapa » Tue Jun 30, 2015 7:05 pm
I'm not sure I follow, but how can you tell you found a mate if you return a bound?
And since a stalemate is a draw, why not just return 0 no questions asked?
Try this and see how it fares:
Code: Select all
if(!inCheck)
return 0;
else
return space->ply - MATE_SCORE;
-
JVMerlino
- Posts: 932
- Joined: Wed Mar 08, 2006 9:15 pm
- Location: San Francisco, California
Post
by JVMerlino » Tue Jun 30, 2015 7:41 pm
elpapa wrote:I'm not sure I follow, but how can you tell you found a mate if you return a bound?
And since a stalemate is a draw, why not just return 0 no questions asked?
Try this and see how it fares:
Code: Select all
if(!inCheck)
return 0;
else
return space->ply - MATE_SCORE;
He said he uses fail-hard, so he always returns a score within the [alpha, beta] bounds.
-
Robert Pope
- Posts: 392
- Joined: Sat Mar 25, 2006 7:27 pm
Post
by Robert Pope » Tue Jun 30, 2015 8:28 pm
elcabesa wrote:I think that this tournament is a little too short to asses a so small elo difference.
could you try doing a tournament between A) & d) with more than 1000 games. 10000 should be a good number of games
how long is each game? igf the game is fast enough you can do more than 1000 games in a day. you can achieve more than 1000 games a days with games shorther than a minute
I play 50/1' for most of my testing, and only test overnight, so this is about the limit of what I can do in a day. I haven't been happy with cutechess, so that's about as fast as I can go.
With a 46 elo difference with and without the change and only 28 elo bars, I think it's pretty clear with even this few games.
I will try running the two of them through cutechess tonight at a faster TC, though.
-
Guenther
- Posts: 2259
- Joined: Wed Oct 01, 2008 4:33 am
- Location: Regensburg, Germany
-
Contact:
Post
by Guenther » Tue Jun 30, 2015 9:52 pm
Robert Pope wrote:elcabesa wrote:I think that this tournament is a little too short to asses a so small elo difference.
could you try doing a tournament between A) & d) with more than 1000 games. 10000 should be a good number of games :)
how long is each game? igf the game is fast enough you can do more than 1000 games in a day. you can achieve more than 1000 games a days with games shorther than a minute
I play 50/1' for most of my testing, and only test overnight, so this is about the limit of what I can do in a day. I haven't been happy with cutechess, so that's about as fast as I can go.
With a 46 elo difference with and without the change and only 28 elo bars, I think it's pretty clear with even this few games.
I will try running the two of them through cutechess tonight at a faster TC, though.
If you are running on Windows you should use an older version of CuteChess the latest one is unstable there.
Guenther
-
nionita
- Posts: 153
- Joined: Fri Oct 22, 2010 7:47 pm
- Location: Austria
Post
by nionita » Tue Jun 30, 2015 10:12 pm
Robert Pope wrote:With a 46 elo difference with and without the change and only 28 elo bars, I think it's pretty clear with even this few games
If I understand correctly the elo story, then you have 95% probability that the real elo is within the 2 limits. So if the interval intersection is not empty, you can't say anything about the relative strenght of the 2 engine versions.
-
elpapa
- Posts: 184
- Joined: Sun Jan 18, 2009 10:27 pm
- Location: Sweden
Post
by elpapa » Tue Jun 30, 2015 10:50 pm
JVMerlino wrote:He said he uses fail-hard, so he always returns a score within the [alpha, beta] bounds.
Yes, I missed that, I always use fail-soft.
Still, if you're at a leaf node and return a score outside the alpha-beta bounds, wouldn't that cause a cutoff in the parent node, and you're back to fail-hard anyway? It seems to me that if you fail hard at any point the result would be the same as if you failed hard everywhere.
If that reasoning is correct, what's the point of all those tests to keep the score within the bounds?
I'm probably not thinking straight, that Famous Grouse went down way too easily.
-
zd3nik
- Posts: 193
- Joined: Wed Mar 11, 2015 2:34 am
- Location: United States
-
Contact:
Post
by zd3nik » Wed Jul 01, 2015 5:04 am
Without knowing more specifics I'd guess you get more cutoffs farther up the tree when you return the fail soft scores in branches containing checkmates and stalemates. And more cutoffs possibly means fewer re-searches and more time spent examining other less futile lines. But that depends on your implementation. Are you using PVS, MTD(f), or something else? If PVS do you use the return value to decide whether to do a re-search on an a fail high? Or do you always re-search fail highs no matter what the return value was? Etc...
There are many factors that come into play here. Doing fail-hard vs fail-soft isn't just about what values you return from a search, but also how you use the values returned from a search.
Also, is this still pre-transposition table? Or did this test run use transposition table?
-
Robert Pope
- Posts: 392
- Joined: Sat Mar 25, 2006 7:27 pm
Post
by Robert Pope » Thu Jul 02, 2015 2:51 am
So, maybe it wasn't so clear after all. 1000 games at a faster time control gave +8 and -8 elo, with 10 elo error bars. So, within the margin, but this test again had a higher score to the original code.