comparing engines - a new way?

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

mambofish

comparing engines - a new way?

Post by mambofish »

In electronics, the selectivity of a filter (i.e. how well it tunes out unwanted signals and admits only the desired ones) is characterized by its quality factor, denoted Q.

It seems to me that chess engines are closely analogous to such filters. At the end of a search to depth d in any given position, the effective branching factor implied by the number of evaluated nodes n is, in effect, the engine's 'Q':

Code:

Q = n ^ (1/d)

So I have been wondering whether there would be any merit in using Q as a formal comparative measure of different engines. The idea is that an engine that reaches the same (correct!) solution as another while evaluating fewer nodes is presumably "better".
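As a concrete illustration, here is a minimal sketch of the proposed measure: given a node count and a search depth, Q is just the d-th root of n.

```python
def effective_branching_factor(nodes: int, depth: int) -> float:
    """Q = n^(1/d): the per-ply growth rate implied by
    searching `nodes` nodes to reach `depth` plies."""
    return nodes ** (1.0 / depth)

# Example: an engine reporting 3,000,000 nodes for a depth-12 search.
q = effective_branching_factor(3_000_000, 12)
print(f"Q = {q:.2f}")  # roughly 3.47
```

An engine with a lower Q on the same position prunes more aggressively (or orders moves better) per ply of depth reached.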

There seem to be two main advantages to such a scheme. First, engines could be compared on entirely different hardware. Second, given a wide enough mix of test positions - strategic, tactical, opening, end-game - it would be possible to identify clearly where any engine's individual strengths and weaknesses lie.

So, does the panel think there is any merit in the idea?
Pradu
Posts: 287
Joined: Sat Mar 11, 2006 3:19 am
Location: Atlanta, GA

Re: comparing engines - a new way?

Post by Pradu »

mambofish wrote:In electronics, the selectivity of a filter (i.e. how well it tunes out unwanted signals and admits only the desired ones) is characterized by its quality factor, denoted Q.

It seems to me that chess engines are closely analogous to such filters. At the end of a search to depth d in any given position, the effective branching factor implied by the number of evaluated nodes n is, in effect, the engine's 'Q':

Code:

Q = n ^ (1/d)

So I have been wondering whether there would be any merit in using Q as a formal comparative measure of different engines. The idea is that an engine that reaches the same (correct!) solution as another while evaluating fewer nodes is presumably "better".

There seem to be two main advantages to such a scheme. First, engines could be compared on entirely different hardware. Second, given a wide enough mix of test positions - strategic, tactical, opening, end-game - it would be possible to identify clearly where any engine's individual strengths and weaknesses lie.

So, does the panel think there is any merit in the idea?
IMO node count is not a good measure. Engine speed is a big factor, the value of a node can differ between engines, and engines may count nodes slightly differently. Nor do I think a branching factor based on the number of nodes searched is a good idea, because the time spent inside a node can vary considerably: an instant hash-table cutoff, or a game-decidable cutoff, takes much less time than a leaf node within the bounds, which has to generate qsearch moves, evaluate the position, and all that. Perhaps a better measure of the branching factor between plies would be the time spent to reach the current ply divided by the time spent to reach the previous ply. In practice the branching factor by nodes is frequently off from the branching factor by time by more than 50%, so the two definitions of branching factor can't be used interchangeably.
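The time-based branching factor described above could be sketched like this; `search` is a hypothetical hook standing in for an engine's fixed-depth search call, which no particular engine API here provides.

```python
import time

def timed_branching_factors(search, max_depth):
    """Time-based branching factor between successive plies:
    BF(d) = (time to complete depth d) / (time to complete depth d-1).
    `search(depth)` is an assumed fixed-depth search entry point."""
    elapsed = []
    for d in range(1, max_depth + 1):
        t0 = time.perf_counter()
        search(d)  # run the engine's search to the given depth
        elapsed.append(time.perf_counter() - t0)
    # Ratio of consecutive per-depth times gives the per-ply growth.
    return [elapsed[i] / elapsed[i - 1] for i in range(1, len(elapsed))]
```

With a real engine, comparing this list against the node-count-based Q for the same position would show how far the two definitions diverge.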

I think the best comparison would be games against each other and perhaps time to solution in test suites like you mentioned.