Finding equivalent times for different engines

Uri Blass · Post by **Uri Blass** » Thu May 19, 2011 9:12 am

I think it may be a good idea to have a utility that tests engines automatically with a changing time control.

The idea is that the stronger engine always gets 1 minute per game, while the weaker engine gets more time in the next game after every loss and less time after every win.

After every game, the weaker engine's time per game can be multiplied (after a loss) or divided (after a win) by (1+1/sqrt(n)), where n is the number of previous games in the match. (There may be a better formula; the idea is to try to find the time equivalence between engines.)

The idea is that after 1000 games you may say something like:
1 minute for houdini1.5 is equivalent to 1.8 minutes for stockfish in a direct match (with a possible statistical error of 0.2 minutes).

I think this idea may help people see whether there are engines that gain more from extra time (because you would see that the factor the weaker engine needs is smaller with more time).

Note that in most cases I expect to see a bigger factor with more time: if 1 minute of houdini is equivalent to 1.8 minutes of stockfish, then 10 minutes of houdini is probably equivalent to more than 18 minutes of stockfish. (I do not know whether this is the case for houdini and stockfish, but I guess that something similar happens in most cases.)

hgm · Post by **hgm** » Thu May 19, 2011 9:35 am

I think it would be more reliable to conduct a normal time-odds tournament, where you enter each engine with several different time-odds factors (e.g. 1, 3, 9, 24, 72), and then extract ratings in the usual way. That way the strength estimates are based on playing a variety of opponents, which should make them more meaningful quantities. The Elo vs TC curves would tell you how efficient the engine uses time.

LaurenceChen · Post by **LaurenceChen** » Thu May 19, 2011 11:48 am

Uri Blass wrote:I think that it may be a good idea to have a utility to test engines automatically at changing time control.

The idea is that the stronger engine get always 1 minute per game when the weaker engine get more time in the next game after every loss and less time after every win

The time per game for the weaker engine after every game can be multiplied(after a loss) or divided after a win by (1+1/sqrt(n)) minutes when n is the number
of previous games in the match(there may be a better formula and the idea is to try to find time equivalence between engines).

The idea is that after 1000 games you may say something like
1 minute by houdini1.5 is equivalent to 1.8 minutes by stockfish in a direct match(with possible statistical error of 0.2 minute)

I think that this idea may help people to see if there are engines that earns more from time(because you can see that the factor that the weaker engine get is smaller with more time.

Note that I expect in most cases to see bigger factor with more time and if 1 minutes of houdini is equivalent to 1.8 minutes of stockfish then probably 10 minutes of houdini is equivalent to more than 18 minutes of stockfish(I do not know if it is the case for houdini and stockfish but I guess that a similiar idea happen in most cases).

I think such a method is a total waste of time... You are assuming that "search" is a linear method used by strong engines...
If the "search" function were truly linear, such a method would equalize, but there are positional features that a strong engine may understand better; therefore this makes such a method non-linear...

Sven · Post by **Sven** » Thu May 19, 2011 4:20 pm

The formula (1+1/sqrt(n)) gives higher weight to the results of the first games since the values of that expression are about 2.00, 1.71, 1.58, 1.50, 1.45, 1.41, ... with increasing n. It also does not take into account that a draw is better than expected from the weaker engine's viewpoint, so it should lead to decreasing the time, too.
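The factors quoted above can be reproduced in a few lines of Python. This is only a quick check of the arithmetic; `step_factor` is an illustrative name, not part of any existing tool.

```python
import math

def step_factor(n: int) -> float:
    """Factor 1 + 1/sqrt(n) used after n games have been played (n >= 1)."""
    return 1.0 + 1.0 / math.sqrt(n)

# Factors after games 1..6, rounded to two decimals.
factors = [round(step_factor(n), 2) for n in range(1, 7)]
print(factors)
```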

In my opinion a better formula might be based on ELO calculation.

Sven

bob · Post by **bob** » Thu May 19, 2011 6:29 pm

hgm wrote:I think it would be more reliable to conduct a normal time-odds tournament, where you enter each engine with several different time-odds factors (e.g. 1, 3, 9, 24, 72), and then extract ratings in the usual way. That way the strength estimates are based on playing a variety of opponents, which should make them more meaningful quantities. The Elo vs TC curves would tell you how efficient the engine uses time.

I like that idea better. The constantly changing time control would probably result in a lot more games being required for the thing to reach some sort of stabilized state. Even with two equal programs, there are going to be wins and losses. Sometimes significant strings of each. That would tend to make this idea "hunt around" quite a bit.

I think the idea of a fixed T/C match with one side getting time odds is easier to do. And one could then vary the handicap until equality is reached. Of course, that begs the question "what does this number mean?"

Uri Blass · Post by **Uri Blass** » Thu May 19, 2011 8:53 pm

Sven Schüle wrote:The formula (1+1/sqrt(n)) gives higher weight to the results of the first games since the values of that expression are about 2.00, 1.71, 1.58, 1.50, 1.45, 1.41, ... with increasing n. It also does not take into account that a draw is better than expected from the weaker engine's viewpoint, so it should lead to decreasing the time, too.

In my opinion a better formula might be based on ELO calculation.

Sven

The reason for my formula is that I want the time control to converge.
Initially I do not know the time handicap needed to get 50%, so I make big changes to the time at first and smaller changes later.

Elo calculation has no meaning here because the games are not played under the same conditions.

The time control for the weaker side in the first game may be based on a guess by the user.
Let's say I give Houdini 1 minute and I guess that engine X needs 10 minutes to get 50%.

I am unsure about my guess, so I want the 10 minutes to change significantly in the first games, and later I want smaller changes.

This means that the second game may be 5, 10 or 20 minutes per game for the weaker side.

If the second game is 5 minutes per game for the weaker side, the third game may be 5*1.71, 5 or 5/1.71 minutes per game for the weaker side.

I do not know which side is weaker in the unequal time conditions so I cannot say that a draw is better than expected for one side.
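The update rule described in this post could be sketched roughly as follows. This is a minimal illustration, not an existing utility: the function name `next_time` and the result labels are made up here, and a draw is assumed to leave the time unchanged, as the 5 / 10 / 20 minute example implies.

```python
import math

def next_time(current_minutes: float, games_played: int, result: str) -> float:
    """Time for the adjusted engine in the next game.

    result is seen from the adjusted engine's side: "win", "loss" or "draw".
    games_played is the number of games finished so far (n >= 1).
    """
    factor = 1.0 + 1.0 / math.sqrt(games_played)
    if result == "loss":
        return current_minutes * factor  # lost: give it more time
    if result == "win":
        return current_minutes / factor  # won: give it less time
    if result == "draw":
        return current_minutes           # assumption: a draw changes nothing
    raise ValueError(f"unknown result: {result!r}")

# Start from a 10-minute guess; the engine wins game 1, then loses game 2.
t = 10.0
t = next_time(t, 1, "win")    # factor 2.00
t = next_time(t, 2, "loss")   # factor about 1.71
print(t)
```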

michiguel · Post by **michiguel** » Thu May 19, 2011 8:56 pm

Uri Blass wrote:I think that it may be a good idea to have a utility to test engines automatically at changing time control.

The idea is that the stronger engine get always 1 minute per game when the weaker engine get more time in the next game after every loss and less time after every win

The time per game for the weaker engine after every game can be multiplied(after a loss) or divided after a win by (1+1/sqrt(n)) minutes when n is the number
of previous games in the match(there may be a better formula and the idea is to try to find time equivalence between engines).

The idea is that after 1000 games you may say something like
1 minute by houdini1.5 is equivalent to 1.8 minutes by stockfish in a direct match(with possible statistical error of 0.2 minute)

I think that this idea may help people to see if there are engines that earns more from time(because you can see that the factor that the weaker engine get is smaller with more time.

Note that I expect in most cases to see bigger factor with more time and if 1 minutes of houdini is equivalent to 1.8 minutes of stockfish then probably 10 minutes of houdini is equivalent to more than 18 minutes of stockfish(I do not know if it is the case for houdini and stockfish but I guess that a similiar idea happen in most cases).

That is what scripting languages are for. Not GUIs.
It should not be difficult to write. I may do it for fun.

Miguel

michiguel · Post by **michiguel** » Thu May 19, 2011 8:58 pm

bob wrote:
hgm wrote:I think it would be more reliable to conduct a normal time-odds tournament, where you enter each engine with several different time-odds factors (e.g. 1, 3, 9, 24, 72), and then extract ratings in the usual way. That way the strength estimates are based on playing a variety of opponents, which should make them more meaningful quantities. The Elo vs TC curves would tell you how efficient the engine uses time.
I like that idea better. The constantly changing time control would probably result in a lot more games being required for the thing to reach some sort of stabilized state. Even with two equal programs, there are going to be wins and losses. Sometimes significant strings of each. That would tend to make this idea "hunt around" quite a bit.

I have the feeling that with Uri's idea the convergence will be faster, if done properly. Besides, no concept of ELO is needed.

Miguel

I think the idea of a fixed T/C match with one side getting time odds is easier to do. And one could then vary the handicap until equality is reached. Of course, that begs the question "what does this number mean?"

Uri Blass · Post by **Uri Blass** » Thu May 19, 2011 9:20 pm

LaurenceChen wrote:
Uri Blass wrote:I think that it may be a good idea to have a utility to test engines automatically at changing time control.

The idea is that the stronger engine get always 1 minute per game when the weaker engine get more time in the next game after every loss and less time after every win

The time per game for the weaker engine after every game can be multiplied(after a loss) or divided after a win by (1+1/sqrt(n)) minutes when n is the number
of previous games in the match(there may be a better formula and the idea is to try to find time equivalence between engines).

The idea is that after 1000 games you may say something like
1 minute by houdini1.5 is equivalent to 1.8 minutes by stockfish in a direct match(with possible statistical error of 0.2 minute)

I think that this idea may help people to see if there are engines that earns more from time(because you can see that the factor that the weaker engine get is smaller with more time.

Note that I expect in most cases to see bigger factor with more time and if 1 minutes of houdini is equivalent to 1.8 minutes of stockfish then probably 10 minutes of houdini is equivalent to more than 18 minutes of stockfish(I do not know if it is the case for houdini and stockfish but I guess that a similiar idea happen in most cases).
I think such a method is a total waste of time... You are assuming that "search" is a linear method used by strong engines...
If the "search" function is truly linear... such method would equalize, but, there are positional features which a strong engine may have a better understanding... therefore, this make the such non-linear...

I do not see how the fact that stronger engines have a better understanding of some positional feature is relevant here.

The fact is that every engine gains from more time, so with enough time compensation the weaker engine is going to get 50% against the stronger engine.

Playing many games with 1:3, 1:9 and 1:27 time handicaps seems to me a very slow way to find the time that you need to give the weaker engine to get 50%. If you see 800:200 in the 1:3 handicap games, 600:400 in the 1:9 handicap games and 300:700 in the 1:27 handicap games, then you know that the weaker engine needs something between 9 minutes and 27 minutes to get 50%, and most of the 1:3 handicap games are clearly useless, because you can be practically sure that a 1:3 time handicap is not enough compensation for the weaker engine.
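For what it is worth, the bracket in that example can be narrowed by interpolation. The sketch below assumes that the stronger engine's score falls linearly with the logarithm of the time ratio between the two measured handicaps; that is only an assumption made for illustration, and `interpolate_even_handicap` is a hypothetical helper.

```python
import math

def interpolate_even_handicap(ratio_lo: float, score_lo: float,
                              ratio_hi: float, score_hi: float,
                              target: float = 0.5) -> float:
    """Estimate the time ratio at which the stronger engine scores `target`.

    score_lo / score_hi are the stronger engine's scores at the smaller and
    larger handicap. Assumes the score is linear in log(time ratio).
    """
    if not (score_hi <= target <= score_lo) or score_lo == score_hi:
        raise ValueError("target score is not bracketed by the two results")
    frac = (score_lo - target) / (score_lo - score_hi)
    log_ratio = math.log(ratio_lo) + frac * (math.log(ratio_hi) - math.log(ratio_lo))
    return math.exp(log_ratio)

# 600:400 at 1:9 and 300:700 at 1:27, from the stronger engine's side.
estimate = interpolate_even_handicap(9, 0.6, 27, 0.3)
print(round(estimate, 2))
```

With those example scores the estimate lands about a third of the way (on a log scale) from 9 to 27, that is, near 13 minutes against 1 minute.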

Sven · Post by **Sven** » Fri May 20, 2011 5:08 pm

Uri Blass wrote:
Sven Schüle wrote:The formula (1+1/sqrt(n)) gives higher weight to the results of the first games since the values of that expression are about 2.00, 1.71, 1.58, 1.50, 1.45, 1.41, ... with increasing n. It also does not take into account that a draw is better than expected from the weaker engine's viewpoint, so it should lead to decreasing the time, too.

In my opinion a better formula might be based on ELO calculation.
The reason for my formula is that I want the time control to converge.
Initially I do not know the time handicap that I need to get 50% so I make big changes in the time and later I make small changes in the time.

I have understood that. However, the first results may drive you quite far away from the target. For instance the weaker engine may start with winning 5 games in a row, which is not unusual. My impression is still that your formula gives too much weight to the first games while it is not clear at all why these games should have more significance for the convergence process. That's why I proposed to do something like ELO calculation which always takes into account all games equally, where the time-odds of each single game are used for the calculation. (I don't have the formula available for that, though; it would have to be developed and tested.)

Uri Blass wrote:There is no meaning for elo calculation here because the games are not at the same conditions.

It is certainly possible to do ELO calculation for time-odds games, and we could certainly find some approximated ELO rating formula even for the case of N games with different time-odds conditions.

I also think that giving time-odds is just another way of expressing different ELO strength of two programs.
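To make that correspondence concrete, the standard logistic Elo model converts an average match score into a rating difference. A minimal sketch follows; the function name is made up for illustration, and whether the logistic model describes time-odds games well is an assumption, not something established in this thread.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by an average score in (0, 1), logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

# A 75% score at some fixed time-odds setting, as a hypothetical input.
print(round(elo_from_score(0.75), 1))
```

Under this model a time handicap that brings the score to 50% corresponds to an Elo difference of zero at those conditions.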

Uri Blass wrote:[...] I do not know which side is weaker in the unequal time conditions so I cannot say that a draw is better than expected for one side.

O.k., so you say the following:
If at some point during the match the second engine gets T minutes per game, and both engines then play several draws in a row, the second engine still gets T minutes under your formula, which means that you may regard T as your desired result.

I think it might be worthwhile to perform some simulation to show that this formula (or a different one) actually tends to converge to a reasonable result.
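Such a simulation could look roughly like the sketch below. Everything about the game model is assumed for illustration: the expected score follows a logistic Elo curve in log2 of the time, with a made-up gain of 100 Elo per doubling and a made-up draw rate, and `simulate_match` is a hypothetical name. Only the (1+1/sqrt(n)) update comes from the proposal.

```python
import math
import random

def simulate_match(true_minutes: float, start_minutes: float, games: int = 1000,
                   elo_per_doubling: float = 100.0, draw_rate: float = 0.3,
                   seed: int = 0) -> float:
    """Run the adaptive rule against a synthetic opponent; return the final time.

    true_minutes is the (hidden) time at which the adjusted engine scores 50%.
    """
    rng = random.Random(seed)
    t = start_minutes
    for n in range(1, games + 1):
        # Assumed model: Elo gain proportional to log2 of the time ratio.
        elo_diff = elo_per_doubling * math.log2(t / true_minutes)
        expected = 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))
        # Assumed draw model: draw_rate at equality, fewer draws when lopsided.
        p_draw = 2.0 * draw_rate * min(expected, 1.0 - expected)
        p_win = expected - p_draw / 2.0
        roll = rng.random()
        factor = 1.0 + 1.0 / math.sqrt(n)
        if roll < p_win:
            t /= factor          # adjusted engine won: less time next game
        elif roll >= p_win + p_draw:
            t *= factor          # adjusted engine lost: more time next game
        # otherwise a draw: time unchanged
    return t

# Twenty independent runs, each starting from a poor guess of 10 minutes.
finals = [simulate_match(1.8, 10.0, seed=s) for s in range(20)]
print(min(finals), max(finals))
```

Note that the step after 1000 games is still about 3% (1+1/sqrt(1000) is roughly 1.03), so the final value keeps fluctuating around the target. Averaging the times of the last few hundred games might give a steadier estimate than the last value alone.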

Sven
