Tuning again

hgm · Post by **hgm** » Wed Nov 02, 2011 9:56 pm

Rebel wrote:
michiguel wrote:I limit my search by nodes and I have been happy thereafter. I enthusiastically recommend it.
Excellent to test search issues, eval IMO by fixed depth. Knowledge vs Knowledge not influenced by search randomness.

The problem is that you might need other knowledge at unrealistically low depth than otherwise. E.g. Pawn-push bonuses tend to get highly exaggerated when you don't allow promotions to be within the horizon.

BubbaTough · Post by **BubbaTough** » Wed Nov 02, 2011 11:45 pm

Rebel wrote:
michiguel wrote:I limit my search by nodes and I have been happy thereafter. I enthusiastically recommend it.
Excellent to test search issues, eval IMO by fixed depth. Knowledge vs Knowledge not influenced by search randomness.

If you are looking for elo, Fixed depth search is MUCH worse than node based in my experience. Eval has a huge effect on the search tree. HUGE. This is missed in depth based cutoffs but not in timed or node based. When I switched from depth to node based optimization, my evaluation optimization results markedly improved.

-Sam

Rebel · Post by **Rebel** » Thu Nov 03, 2011 10:16 am

hgm wrote:
Rebel wrote:1. introduce a special parameter for internal testing;
2. when the flag is on increase the depth with 1 when queens are exchanged;
3. depth+2 entering the endgame
4. depth+5 entering the simple endgame
That was how Usurpator II did it in the eighties! I had not learned the blessings of iterative deepening yet, in those days.

The best generation ever

Steve B · Post by **Steve B** » Thu Nov 03, 2011 12:20 pm

Rebel wrote:
hgm wrote:
Rebel wrote:1. introduce a special parameter for internal testing;
2. when the flag is on increase the depth with 1 when queens are exchanged;
3. depth+2 entering the endgame
4. depth+5 entering the simple endgame
That was how Usurpator II did it in the eighties! I had not learned the blessings of iterative deepening yet, in those days.
The best generation ever

Exactly..
It was the era of the dedicated computers

The Mephisto Academy Sends Its Regards
Steve

rbarreira · Post by **rbarreira** » Thu Nov 03, 2011 2:23 pm

michiguel wrote:
Rebel wrote:
Rebel wrote: Eval tuning I strictly do at fixed depth. I don't want external factors like time control or permanent brain to interfere. Enough volume will flatten all the horizon effects eventually, both sides.

mcostalba wrote:IMHO the main drawbacks are: impossible to test depth sensible stuff like king safety and artificial same depth for midgame and endgame. But I agree for some evaluation parameters could be good, actually I will give it a try.
For self-play-ply-depth testing I am planning the following:

1. introduce a special parameter for internal testing;
2. when the flag is on increase the depth with 1 when queens are exchanged;
3. depth+2 entering the endgame
4. depth+5 entering the simple endgame

Or something like that.
I limit my search by nodes and I have been happy thereafter. I enthusiastically recommend it.

Miguel

I suppose that's good for parameter tuning but not for bigger eval changes (due to it not accounting for eval speed).

bob · Post by **bob** » Thu Nov 03, 2011 8:11 pm

rbarreira wrote:
michiguel wrote:
Rebel wrote:
Rebel wrote: Eval tuning I strictly do at fixed depth. I don't want external factors like time control or permanent brain to interfere. Enough volume will flatten all the horizon effects eventually, both sides.

mcostalba wrote:IMHO the main drawbacks are: impossible to test depth sensible stuff like king safety and artificial same depth for midgame and endgame. But I agree for some evaluation parameters could be good, actually I will give it a try.
For self-play-ply-depth testing I am planning the following:

1. introduce a special parameter for internal testing;
2. when the flag is on increase the depth with 1 when queens are exchanged;
3. depth+2 entering the endgame
4. depth+5 entering the simple endgame

Or something like that.
I limit my search by nodes and I have been happy thereafter. I enthusiastically recommend it.

Miguel
I suppose that's good for parameter tuning but not for bigger eval changes (due to it not accounting for eval speed).

The other issue is one I have pointed out repeatedly. If your program speeds up (or slows down) in nps in a certain phase of the game, you would normally search deeper (if it speeds up, for example). But if you slow down, say in a complicated attacking position, a fixed node count makes it appear you do not. And as a result, you can tune your program to try to reach those kinds of positions, where you do better because you are not getting penalized by slowing down when doing a fixed node test. That NPS variation adds a new variable that is not obvious, and tuning against that is not always a good idea. I've tried both fixed depth, and fixed nodes. Each has places where they work reasonably. But NOTHING replaces using time, overall, because that is how you actually have to play the game, and tuning like you play is much safer overall...
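
A small worked example of the NPS point, with made-up numbers: the speeds, the move time, and the node budget below are assumptions, not measurements from any engine. It compares how much work an engine gets per move under a time limit versus a node limit when its speed halves in an attacking position.

```python
def nodes_in_time(nps: float, seconds: float) -> int:
    """Nodes the engine can search in a fixed amount of time."""
    return int(nps * seconds)


def seconds_for_nodes(nodes: int, nps: float) -> float:
    """Wall-clock time the engine needs to search a fixed number of nodes."""
    return nodes / nps


# Assumed figures for illustration only.
QUIET_NPS = 2_000_000    # speed in quiet positions
ATTACK_NPS = 1_000_000   # speed in complicated attacking positions
MOVE_TIME = 0.5          # seconds per move in a timed test
NODE_BUDGET = 1_000_000  # nodes per move in a fixed-node test

if __name__ == "__main__":
    # Timed test: the slowdown costs nodes, and therefore depth.
    print("timed, quiet :", nodes_in_time(QUIET_NPS, MOVE_TIME), "nodes")
    print("timed, attack:", nodes_in_time(ATTACK_NPS, MOVE_TIME), "nodes")
    # Fixed-node test: the slowdown is invisible; the engine just takes longer.
    print("nodes, quiet :", seconds_for_nodes(NODE_BUDGET, QUIET_NPS), "s")
    print("nodes, attack:", seconds_for_nodes(NODE_BUDGET, ATTACK_NPS), "s")
```

Under these numbers the timed test gives the attacking position half the nodes, while the fixed-node test silently gives it twice the time, which is the hidden variable described above.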

rbarreira · Post by **rbarreira** » Thu Nov 03, 2011 9:39 pm

bob wrote:
rbarreira wrote:
michiguel wrote:
Rebel wrote:
Rebel wrote: Eval tuning I strictly do at fixed depth. I don't want external factors like time control or permanent brain to interfere. Enough volume will flatten all the horizon effects eventually, both sides.

mcostalba wrote:IMHO the main drawbacks are: impossible to test depth sensible stuff like king safety and artificial same depth for midgame and endgame. But I agree for some evaluation parameters could be good, actually I will give it a try.
For self-play-ply-depth testing I am planning the following:

1. introduce a special parameter for internal testing;
2. when the flag is on increase the depth with 1 when queens are exchanged;
3. depth+2 entering the endgame
4. depth+5 entering the simple endgame

Or something like that.
I limit my search by nodes and I have been happy thereafter. I enthusiastically recommend it.

Miguel
I suppose that's good for parameter tuning but not for bigger eval changes (due to it not accounting for eval speed).
The other issue is one I have pointed out repeatedly. If your program speeds up (or slows down) in nps in a certain phase of the game, you would normally search deeper (if it speeds up, for example). But if you slow down, say in a complicated attacking position, a fixed node count makes it appear you do not. And as a result, you can tune your program to try to reach those kinds of positions, where you do better because you are not getting penalized by slowing down when doing a fixed node test. That NPS variation adds a new variable that is not obvious, and tuning against that is not always a good idea. I've tried both fixed depth, and fixed nodes. Each has places where they work reasonably. But NOTHING replaces using time, overall, because that is how you actually have to play the game, and tuning like you play is much safer overall...

You are definitely right that in an ideal situation, testing with time is the best approach. But I can see why people would test with fixed nodes, especially in situations where the computer running the test is doing other things which might affect program speed, favoring one or another program under test (along with other advantages others mentioned like being able to merge together results from different hardware).

hgm · Post by **hgm** » Thu Nov 03, 2011 10:43 pm

You can of course simply count nodes in such a way that your program does not speed up. How you count nodes is pretty much a matter of taste anyway. Some people count MakeMoves, others count MoveGens, still others count Evals. Would you count nodes that are hash pruned or not, etc.
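
To make the counting conventions concrete, here is a rough sketch of a counter set where the node limit can be checked against any of them. The class, field, and function names are hypothetical and not taken from any engine.

```python
from dataclasses import dataclass


@dataclass
class SearchCounters:
    """Hypothetical per-search counters; an engine would increment these in its search."""
    make_moves: int = 0
    move_gens: int = 0
    evals: int = 0
    hash_cutoffs: int = 0

    def nodes(self, convention: str, count_hash_cutoffs: bool = False) -> int:
        """Node count under the chosen convention: 'make_moves', 'move_gens' or 'evals'."""
        base = {
            "make_moves": self.make_moves,
            "move_gens": self.move_gens,
            "evals": self.evals,
        }[convention]
        # Optionally treat hash-pruned nodes as searched nodes too.
        return base + (self.hash_cutoffs if count_hash_cutoffs else 0)


def budget_exhausted(counters: SearchCounters, limit: int,
                     convention: str, count_hash_cutoffs: bool = False) -> bool:
    """True once the node count under the chosen convention reaches the limit."""
    return counters.nodes(convention, count_hash_cutoffs) >= limit
```

The same search can hit a 500-node limit under one convention and not under another, so picking the convention is one way to decide which parts of the program a fixed-node test charges for.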

Sergei S. Markoff · Post by **Sergei S. Markoff** » Thu Nov 03, 2011 11:11 pm

Hi!

Let me add my two cents to this thread :)

I think we should use the following principles in tuning:

1. Use a node limit per move instead of a time limit (more stability)
2. Use a fixed set of starting positions (huge, I suppose, but strictly predefined) (more stability)

Let's assume that we have several "species", each with its own current Elo value and number of games played.
At each step we should find the pair of "species" with the highest upper confidence bounds on their Elo, and play a game between them.
There should also be some "birth mechanism", which should:
1. Randomly select two species; the probability of being chosen should depend on Elo (higher Elo, higher probability)
2. Produce a "child" using some random recombination technique and random "mutation".
And, of course, we should also have some "garbage collector" that removes the species with the lowest upper confidence bound.
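
The three mechanisms above (pairing, birth, garbage collection) could be sketched roughly as follows. Everything concrete here is an assumption for illustration: the `Species` class, the crude `upper_bound` formula that shrinks with the square root of the games played, the Elo-offset selection weights, and Gaussian mutation. Playing the games and updating Elo is left out.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Species:
    """Hypothetical parameter set with its current rating estimate."""
    params: List[float]
    elo: float = 0.0
    games: int = 0

    def upper_bound(self, spread: float = 400.0) -> float:
        # Assumed crude interval: uncertainty shrinks as 1/sqrt(games).
        return self.elo + spread / math.sqrt(self.games + 1)


def next_pairing(pop: List[Species]) -> Tuple[Species, Species]:
    """Pick the two species with the highest upper confidence bounds."""
    ranked = sorted(pop, key=Species.upper_bound, reverse=True)
    return ranked[0], ranked[1]


def breed(pop: List[Species], rng: random.Random, sigma: float = 0.05) -> Species:
    """Create a child from two parents chosen with Elo-dependent probability."""
    low = min(s.elo for s in pop)
    weights = [s.elo - low + 1.0 for s in pop]  # higher Elo, higher probability
    # Sampling is with replacement, so both parents may be the same species.
    mother, father = rng.choices(pop, weights=weights, k=2)
    # Recombination: take each parameter from a random parent, then mutate it.
    child = [rng.choice(pair) + rng.gauss(0.0, sigma)
             for pair in zip(mother.params, father.params)]
    return Species(child)


def collect_garbage(pop: List[Species], max_size: int) -> List[Species]:
    """Keep the max_size species with the highest upper bounds, drop the rest."""
    survivors = sorted(pop, key=Species.upper_bound, reverse=True)
    return survivors[:max_size]
```

Because a species with few games has a wide interval, a newborn child tends to get paired quickly, and it is only collected once enough games have pulled its upper bound below the rest of the population.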

This framework should be distributed, of course

I think it's really a good idea to create such a common framework which will provide tuning ability for every chess programmer. And of course it will be really interesting to watch the results

diep · Post by **diep** » Sun Nov 06, 2011 4:50 pm

Rebel wrote:
mcostalba wrote:Running games at fixed depth (especially so low like 8 plies) has some drawback,
Eval tuning I strictly do at fixed depth. I don't want external factors like time control or permanent brain to interfere. Enough volume will flatten all the horizon effects eventually, both sides.

running in a GUI like Arena has even more drawbacks, I'd suggest a command line tournament manager like cutechess-cli and run on time.
Downloaded...

I like Arena because it supports nodes-matches. IMO a better way to test search related changes than on time.

BTW your C is very assemblish, lovely stuff, really, no joking: it has a kind of vintage fashion.
All my engines were in assembler. I just can't get used to these brackets.
Code:
{
   {
       {
           {
           }
       }
   }
}
Things like that drive me crazy

Hi Ed,

In itself, running fixed-depth matches is not a bad idea. However, it tunes a lot better if you get through the tactical barrier. That barrier is far above 8 ply.

Just tune at something like 1 minute for the entire game plus a 0.1 second increment.

The idea is that after an engine gets 'better' and more 'well tuned', you also start to search deeper because of the improvements, which scales up the experiment.

Soon you'll move to 5 minutes a game and so on.

If your engine is capable of running as a WinBoard engine, you'll need to use other tools. Most GUIs simply aren't stable enough to play that many games, nor fast enough, as they need to update the graphics and do all sorts of centrally locked I/O.

A core or 30 is no luxury to do stuff like this.

Don't believe by the way that just playing games is the holy grail, they do more than just play games for parameter tuning.

Vincent
