Tuning again


hgm
Posts: 27929
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Tuning again

Post by hgm »

Rebel wrote:
michiguel wrote:I limit my search by nodes and I have been happy thereafter. I enthusiastically recommend it.
Excellent to test search issues, eval IMO by fixed depth. Knowledge vs Knowledge not influenced by search randomness.
The problem is that at unrealistically low depth you might need different knowledge than you would otherwise. E.g. pawn-push bonuses tend to get highly exaggerated when promotions cannot come within the horizon.
BubbaTough
Posts: 1154
Joined: Fri Jun 23, 2006 5:18 am

Re: Tuning again

Post by BubbaTough »

Rebel wrote:
michiguel wrote:I limit my search by nodes and I have been happy thereafter. I enthusiastically recommend it.
Excellent to test search issues, eval IMO by fixed depth. Knowledge vs Knowledge not influenced by search randomness.
If you are looking for Elo, fixed-depth search is MUCH worse than node-based in my experience. Eval has a huge effect on the search tree. HUGE. A depth-based cutoff misses this effect; a timed or node-based cutoff does not. When I switched from depth-based to node-based optimization, my evaluation optimization results markedly improved.

-Sam
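A minimal sketch of what "limit the search by nodes" means in practice. It uses a toy take-1-to-3-stones game instead of chess, and all names (`OutOfNodes`, `negamax`, `search`) are made up for illustration; it is not how any engine in this thread does it. The point is only that the budget caps the work, so the depth reached varies with the shape of the tree:

```python
class OutOfNodes(Exception):
    """Raised when the node budget for this move is used up."""


def negamax(stones, depth, budget):
    """Depth-limited negamax for a toy game: take 1 to 3 stones,
    whoever takes the last stone wins. budget is a one-item list."""
    if budget[0] <= 0:
        raise OutOfNodes
    budget[0] -= 1                      # one node visited
    if stones == 0:
        return -1                       # side to move lost: opponent took the last stone
    if depth == 0:
        return 0                        # horizon reached, result unknown
    return max(-negamax(stones - take, depth - 1, budget)
               for take in (1, 2, 3) if take <= stones)


def search(stones, node_limit):
    """Iterative deepening that stops when the node budget runs out.
    Returns (last completed depth, score from that depth)."""
    budget = [node_limit]
    completed_depth, score = 0, 0
    try:
        for depth in range(1, 64):
            score = negamax(stones, depth, budget)
            completed_depth = depth     # only updated after a full iteration
    except OutOfNodes:
        pass                            # keep the result of the last full iteration
    return completed_depth, score


print(search(20, 100))
print(search(20, 10000))
```

With a fixed depth the loop would instead stop at a given `depth`, and the node count would be whatever the tree happens to cost.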
Rebel
Posts: 7038
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Tuning again

Post by Rebel »

hgm wrote:
Rebel wrote:1. introduce a special parameter for internal testing;
2. when the flag is on increase the depth with 1 when queens are exchanged;
3. depth+2 entering the endgame
4. depth+5 entering the simple endgame
That was how Usurpator II did it in the eighties! :shock: I had not learned the blessings of iterative deepening yet, in those days. :lol:
The best generation ever :wink:
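The four-step plan quoted above could be sketched roughly as follows. The `Phase` names, the function name `adjusted_depth` and the `testing` flag are hypothetical, and the sketch assumes the bonuses do not stack (the largest applicable one wins), which the plan itself leaves open:

```python
from enum import Enum


class Phase(Enum):
    MIDDLEGAME = 0
    ENDGAME = 1
    SIMPLE_ENDGAME = 2


def adjusted_depth(base_depth, queens_exchanged, phase, testing=True):
    """Fixed search depth for self-play testing, deepened in later phases."""
    if not testing:                 # step 1: special flag for internal testing
        return base_depth
    bonus = 0
    if queens_exchanged:            # step 2: +1 once queens are off
        bonus = 1
    if phase is Phase.ENDGAME:      # step 3: +2 entering the endgame
        bonus = 2
    elif phase is Phase.SIMPLE_ENDGAME:
        bonus = 5                   # step 4: +5 entering the simple endgame
    return base_depth + bonus       # assumption: bonuses do not add up


print(adjusted_depth(8, False, Phase.MIDDLEGAME))
print(adjusted_depth(8, True, Phase.SIMPLE_ENDGAME))
```

How an engine decides that it has entered the endgame or the simple endgame is not specified in the thread and is left out here.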
Steve B
Posts: 3697
Joined: Tue Jul 31, 2007 4:26 pm

Re: Tuning again

Post by Steve B »

Rebel wrote:
hgm wrote:
Rebel wrote:1. introduce a special parameter for internal testing;
2. when the flag is on increase the depth with 1 when queens are exchanged;
3. depth+2 entering the endgame
4. depth+5 entering the simple endgame
That was how Usurpator II did it in the eighties! :shock: I had not learned the blessings of iterative deepening yet, in those days. :lol:
The best generation ever :wink:
Exactly..
It was the era of the dedicated computers

The Mephisto Academy Sends Its Regards
Steve
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: Tuning again

Post by rbarreira »

michiguel wrote:
Rebel wrote:
Rebel wrote: Eval tuning I strictly do at fixed depth. I don't want external factors like time control or permanent brain to interfere. Enough volume will flatten all the horizon effects eventually, both sides.
mcostalba wrote:IMHO the main drawbacks are: impossible to test depth sensible stuff like king safety and artificial same depth for midgame and endgame. But I agree for some evaluation parameters could be good, actually I will give it a try.
For self-play-ply-depth testing I am planning the following:

1. introduce a special parameter for internal testing;
2. when the flag is on increase the depth with 1 when queens are exchanged;
3. depth+2 entering the endgame
4. depth+5 entering the simple endgame

Or something like that.
I limit my search by nodes and I have been happy thereafter. I enthusiastically recommend it.

Miguel
I suppose that's good for parameter tuning but not for bigger eval changes (due to it not accounting for eval speed).
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: Tuning again

Post by bob »

rbarreira wrote:
michiguel wrote:
Rebel wrote:
Rebel wrote: Eval tuning I strictly do at fixed depth. I don't want external factors like time control or permanent brain to interfere. Enough volume will flatten all the horizon effects eventually, both sides.
mcostalba wrote:IMHO the main drawbacks are: impossible to test depth sensible stuff like king safety and artificial same depth for midgame and endgame. But I agree for some evaluation parameters could be good, actually I will give it a try.
For self-play-ply-depth testing I am planning the following:

1. introduce a special parameter for internal testing;
2. when the flag is on increase the depth with 1 when queens are exchanged;
3. depth+2 entering the endgame
4. depth+5 entering the simple endgame

Or something like that.
I limit my search by nodes and I have been happy thereafter. I enthusiastically recommend it.

Miguel
I suppose that's good for parameter tuning but not for bigger eval changes (due to it not accounting for eval speed).
The other issue is one I have pointed out repeatedly. If your program speeds up (or slows down) in nps in a certain phase of the game, you would normally search deeper (if it speeds up, for example). But if you slow down, say in a complicated attacking position, a fixed node count makes it appear you do not. And as a result, you can tune your program to try to reach those kinds of positions, where you do better because you are not getting penalized by slowing down when doing a fixed node test. That NPS variation adds a new variable that is not obvious, and tuning against that is not always a good idea. I've tried both fixed depth, and fixed nodes. Each has places where they work reasonably. But NOTHING replaces using time, overall, because that is how you actually have to play the game, and tuning like you play is much safer overall...
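The NPS argument above can be put into numbers. The speeds and limits below are invented for illustration only; the sketch just shows that a time limit halves the nodes searched when speed halves, while a node limit hides the slowdown completely:

```python
def nodes_searched(nps, time_limit_s=None, node_limit=None):
    """Nodes searched for one move under a node limit or a time limit."""
    if node_limit is not None:
        return node_limit               # independent of how fast the engine runs
    return int(nps * time_limit_s)      # slower positions get fewer nodes


quiet_nps = 2_000_000    # hypothetical speed in a quiet position
attack_nps = 1_000_000   # hypothetical speed in a complicated attacking position

for label, nps in (("quiet", quiet_nps), ("attack", attack_nps)):
    timed = nodes_searched(nps, time_limit_s=1.0)
    fixed = nodes_searched(nps, node_limit=1_500_000)
    print(f"{label:6s} timed: {timed:>9,d} nodes   fixed-nodes: {fixed:>9,d} nodes")
```

Under the node limit the slow position simply takes twice as long on the clock, and nothing in the test result penalizes that.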
rbarreira
Posts: 900
Joined: Tue Apr 27, 2010 3:48 pm

Re: Tuning again

Post by rbarreira »

bob wrote:
rbarreira wrote:
michiguel wrote:
Rebel wrote:
Rebel wrote: Eval tuning I strictly do at fixed depth. I don't want external factors like time control or permanent brain to interfere. Enough volume will flatten all the horizon effects eventually, both sides.
mcostalba wrote:IMHO the main drawbacks are: impossible to test depth sensible stuff like king safety and artificial same depth for midgame and endgame. But I agree for some evaluation parameters could be good, actually I will give it a try.
For self-play-ply-depth testing I am planning the following:

1. introduce a special parameter for internal testing;
2. when the flag is on increase the depth with 1 when queens are exchanged;
3. depth+2 entering the endgame
4. depth+5 entering the simple endgame

Or something like that.
I limit my search by nodes and I have been happy thereafter. I enthusiastically recommend it.

Miguel
I suppose that's good for parameter tuning but not for bigger eval changes (due to it not accounting for eval speed).
The other issue is one I have pointed out repeatedly. If your program speeds up (or slows down) in nps in a certain phase of the game, you would normally search deeper (if it speeds up, for example). But if you slow down, say in a complicated attacking position, a fixed node count makes it appear you do not. And as a result, you can tune your program to try to reach those kinds of positions, where you do better because you are not getting penalized by slowing down when doing a fixed node test. That NPS variation adds a new variable that is not obvious, and tuning against that is not always a good idea. I've tried both fixed depth, and fixed nodes. Each has places where they work reasonably. But NOTHING replaces using time, overall, because that is how you actually have to play the game, and tuning like you play is much safer overall...
You are definitely right that in an ideal situation, testing with time is the best approach. But I can see why people would test with fixed nodes, especially in situations where the computer running the test is doing other things which might affect program speed, favoring one or another program under test (along with other advantages others mentioned like being able to merge together results from different hardware).
hgm
Posts: 27929
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Tuning again

Post by hgm »

You can of course simply count nodes in such a way that your program does not speed up. How you count nodes is pretty much a matter of taste anyway. Some people count MakeMoves, others count MoveGens, still others count Evals. Would you count nodes that are hash pruned or not, etc.
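To illustrate how much the "node count" depends on what is counted, here is a toy negamax over a nested-list tree that keeps three counters at once. The tree and the counter names are made up for this sketch; a real engine would put the counters in its own MakeMove, move generator and evaluation:

```python
from collections import Counter


def negamax(node, counts):
    """Plain negamax over a nested-list tree; integer leaves are static scores."""
    if isinstance(node, int):
        counts["evals"] += 1            # convention: count evaluations
        return node
    counts["movegens"] += 1             # convention: count move generations
    best = -10**9
    for child in node:
        counts["makemoves"] += 1        # convention: count moves made
        best = max(best, -negamax(child, counts))
    return best


tree = [[3, 5], [2, [4, 1]], [6]]       # hypothetical toy tree
counts = Counter()
score = negamax(tree, counts)
print(score, dict(counts))
```

The same search gives three different totals, so a "fixed nodes" budget means something different under each convention.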
Sergei S. Markoff
Posts: 227
Joined: Mon Sep 12, 2011 11:27 pm
Location: Moscow, Russia

Re: Tuning again

Post by Sergei S. Markoff »

Hi!

Let me add my two cents to this topic :)

I think we should apply the following principles in tuning:

1. Use a node limit per move instead of a time limit (more stability)
2. Use a fixed set of starting positions (huge, I suppose, but strictly predefined) (more stability)

Let's assume that we have several "species", each with its current Elo value and number of games played.
At every step we find the pair of "species" with the highest upper bounds of the Elo confidence interval and play a game between them.
There should also be a "birth mechanism" which should
1. Randomly select two species; the selection probability should depend on Elo (higher Elo, higher probability)
2. Produce a "child" using some random recombination technique and random "mutation".
And of course we should also have a "garbage collector" that removes the species with the lowest upper confidence bound.

This framework should be distributed, of course :)
I think it's really a good idea to create such a common framework which will provide tuning ability for every chess programmer. And of course it will be really interesting to watch the results :)
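A rough sketch of how the pieces of such a framework might fit together. Everything concrete here is an assumption made for illustration: the `Species` class, the crude confidence bound that shrinks as 1/sqrt(games), the Elo-style update with K = 16, uniform crossover with Gaussian mutation, and giving the child its parents' mean Elo. Playing the actual game and distributing the work are left out:

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Species:
    params: list              # evaluation parameters being tuned
    elo: float = 0.0
    games: int = 0

    def upper_bound(self, z=2.0, spread=400.0):
        # Assumption: a crude interval that narrows as 1/sqrt(games).
        return self.elo + z * spread / math.sqrt(self.games + 1)


def pick_pair(pop):
    """The two species with the highest upper confidence bound."""
    return sorted(pop, key=Species.upper_bound, reverse=True)[:2]


def record_game(a, b, score_a, k=16.0):
    """Elo-style update; score_a is 1, 0.5 or 0 from a's point of view."""
    expected_a = 1.0 / (1.0 + 10 ** ((b.elo - a.elo) / 400.0))
    delta = k * (score_a - expected_a)
    a.elo += delta
    b.elo -= delta
    a.games += 1
    b.games += 1


def breed(pop, rng, sigma=1.0):
    """Elo-weighted parent choice, uniform crossover, Gaussian mutation."""
    lowest = min(s.elo for s in pop)
    weights = [s.elo - lowest + 1.0 for s in pop]     # higher Elo, higher weight
    # Note: choices() samples with replacement, so both parents may coincide.
    mum, dad = rng.choices(pop, weights=weights, k=2)
    child = [rng.choice(pair) + rng.gauss(0.0, sigma)
             for pair in zip(mum.params, dad.params)]
    return Species(params=child, elo=(mum.elo + dad.elo) / 2)


def collect_garbage(pop, max_size):
    """Remove the species with the lowest upper bound until pop fits."""
    pop.sort(key=Species.upper_bound, reverse=True)
    del pop[max_size:]
```

One tuning step would then be: `a, b = pick_pair(pop)`, play a node-limited game from a predefined position, call `record_game(a, b, result)`, and every so often append `breed(pop, rng)` and run `collect_garbage(pop, max_size)`.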
The Force Be With You!
diep
Posts: 1822
Joined: Thu Mar 09, 2006 11:54 pm
Location: The Netherlands

Re: Tuning again

Post by diep »

Rebel wrote:
mcostalba wrote:Running games at fixed depth (especially so low like 8 plies) has some drawback,
Eval tuning I strictly do at fixed depth. I don't want external factors like time control or permanent brain to interfere. Enough volume will flatten all the horizon effects eventually, both sides.
running in a GUI like Arena has even more drawbacks, I'd suggest a command line tournament manager like cutechess-cli and run on time.
Downloaded...

I like Arena because it supports nodes-matches. IMO a better way to test search related changes than on time.
BTW your C is very assemblish, lovely stuff, really, no joking: it has a kind of vintage fashion.
All my engines were in assembler. I just can't get used to these brackets.


{  
   { 
       { 
           {
           }
       }
   }
}
Things like that drive me crazy :wink:
Hi Ed,

In itself, running fixed-depth matches is not a bad idea. However, it tunes a lot better if you get through the tactical barrier. That barrier is far above 8 ply.

Just go tune at something like 1 minute for the entire game plus a 0.1 second increment.

The idea is that as an engine gets 'better' and more 'well tuned', you also start to search deeper because of the improvements, which scales up the experiment.

Soon you'll move to 5 minutes a game and so on.
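For a sense of scale, the suggested time control can be turned into an average thinking time per move. The function name and the 60-move game length are assumptions made for this sketch; real games vary in length and engines do not spend their time evenly:

```python
def average_seconds_per_move(base_s, increment_s, moves):
    """Average thinking time per move for one side over a whole game."""
    return (base_s + increment_s * moves) / moves


# 1 minute for the game plus 0.1 s per move, over a hypothetical 60-move game
print(round(average_seconds_per_move(60, 0.1, 60), 2))
```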

If your engine is capable of running as a WinBoard engine you'll need to use other tools. Most GUIs simply aren't stable enough to play that many games, nor fast enough, as they need to update the graphics and do all sorts of centrally locked I/O.

A core or 30 is no luxury to do stuff like this.

Don't believe, by the way, that just playing games is the holy grail; they do more than just play games for parameter tuning.

Vincent