Diminishing returns in fixed depth testing revisited.

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

abulmo
Posts: 151
Joined: Thu Nov 12, 2009 6:31 pm

Re: Diminishing returns in fixed depth testing revisited.

Post by abulmo »

Uri Blass wrote:I think that there is no constant number for it; it clearly depends on who plays against whom.

Suppose A,B,C are strong enough to be unbeatable with white but they can lose with black so they are not perfect players.

A against B may give 50 elo advantage for A
B against C may give 50 elo advantage for B
A against C may give only 75 elo for A. So if you play only
A against B and B against C, you get a 100 elo advantage for A relative to C,
while if you play only A against C, you get a 75 elo advantage for A
relative to C.
I agree with that, and I wonder if there is a rating system that takes this into account in its error calculation. This error is easy to compute:
error_elo_A = sqrt(((elo_A - elo_AB)^2 + (elo_A - elo_AC)^2) / 2)
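As a sketch of that calculation in Python (the pooled rating and the per-opponent estimates below are hypothetical, taken from Uri's A/B/C numbers):

```python
import math

def intransitivity_error(elo_pooled, per_opponent_elos):
    """RMS deviation of a pooled Elo estimate from the per-opponent
    estimates, as in the formula above."""
    n = len(per_opponent_elos)
    return math.sqrt(sum((elo_pooled - e) ** 2 for e in per_opponent_elos) / n)

# A is +100 over C via the A-B-C chain, but only +75 head-to-head.
pooled = (100 + 75) / 2                          # naive pooled estimate: 87.5
print(intransitivity_error(pooled, [100, 75]))   # 12.5
```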
Richard
User avatar
Ajedrecista
Posts: 1971
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Influence of hash table size?

Post by Ajedrecista »

Hello Filip:
tvrzsky wrote:What about the influence of hash table size? It seems fair to give both opponents the same value, but then, unless it is big enough, the engine searching deeper has relatively less space in the table than its opponent. Are you, for example, sure that 16 MB for a 10-ply search is enough? Otherwise it could imply some false diminishing effect, IMHO.
I also thought about it and I do not know the influence. Please bear in mind that I am a newbie in computer chess.

I chose Quazar 0.4 for various reasons:

· It is a strong engine, so I suppose that it is stable, reliable, etc.
· It is single-core, so I do not have to care about the number of threads.
· Quazar runs to high depths very fast (like SF) and I wanted an engine like this, for accumulating more data points. It is the opposite of Rybka, which reports very low depths.

That said, it looks like depth 10 of Quazar cannot be compared with depth 10 of other engines, such as Houdini, Komodo, Critter, Rybka, etc. IMHO the only possible comparison with another top engine could be made with SF. You can run the following match: Rybka 4.1 x64 versus SF 2.2.2 x64 (both single-core): they are very close in rating lists (for example in IPON), but if you try a fixed-depth match (let us say an 8-ply search), then I am sure that Rybka will slaughter SF by a large margin. Depths between engines are not comparable, the same as node counts.

Another question is how to calculate the best hash table size for a given depth. If I manage to run depth 10 vs. depth 11 of Quazar 0.4, I can set the hash to 32 MB, 64 MB, ... but which value is the best one? I do not know.

At least Adam took special care with the hash table size when I did not, but anyway both of us got a very similar shape of the curve: it is like a side view of a spoon: non-linear at very low depths, then almost linear at higher depths... always plotting as I did: (ln(depth=1), rating_depth=1); (ln(depth=2), rating_depth=2); ...
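A minimal sketch of fitting that almost-linear region, rating against ln(depth), with an ordinary least-squares line (the depths and ratings below are made-up illustration data, not measured results):

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit: returns (slope, intercept) of y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

depths = [4, 6, 8, 10]                       # hypothetical fixed search depths
ratings = [1800.0, 2000.0, 2150.0, 2250.0]   # hypothetical ratings
slope, intercept = fit_line([math.log(d) for d in depths], ratings)
# slope is the Elo gained per unit of ln(depth) in the linear region
```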

Thanks for your interest, Filip. This topic is open to more results from different engines.

Regards from Spain.

Ajedrecista.
User avatar
Rebel
Posts: 6993
Joined: Thu Aug 18, 2011 12:04 pm

Re: Diminishing returns in fixed depth testing revisited.

Post by Rebel »

tvrzsky wrote:What about the influence of hash table size? It seems fair to give both opponents the same value, but then, unless it is big enough, the engine searching deeper has relatively less space in the table than its opponent. Are you, for example, sure that 16 MB for a 10-ply search is enough? Otherwise it could imply some false diminishing effect, IMHO.
Absolutely true. With a full HT the branching factor will go up, and thus the diminishing return (DR) with it. When I am finished with the DR work I plan to do some separate research in this area.

I am not familiar with MP programming, but I can imagine that each doubling of cores needs a 50-75-100% HT size increase to remain effective, at least at longer time controls. Are there known numbers?
tvrzsky
Posts: 128
Joined: Sat Sep 23, 2006 7:10 pm
Location: Prague

Re: Influence of hash table size?

Post by tvrzsky »

Ajedrecista wrote:Hello Filip:
tvrzsky wrote:What about the influence of hash table size? It seems fair to give both opponents the same value, but then, unless it is big enough, the engine searching deeper has relatively less space in the table than its opponent. Are you, for example, sure that 16 MB for a 10-ply search is enough? Otherwise it could imply some false diminishing effect, IMHO.
I also thought about it and I do not know the influence. Please bear in mind that I am a newbie in computer chess.

I chose Quazar 0.4 for various reasons:

· It is a strong engine, so I suppose that it is stable, reliable, etc.
· It is single-core, so I do not have to care about the number of threads.
· Quazar runs to high depths very fast (like SF) and I wanted an engine like this, for accumulating more data points. It is the opposite of Rybka, which reports very low depths.

That said, it looks like depth 10 of Quazar cannot be compared with depth 10 of other engines, such as Houdini, Komodo, Critter, Rybka, etc. IMHO the only possible comparison with another top engine could be made with SF. You can run the following match: Rybka 4.1 x64 versus SF 2.2.2 x64 (both single-core): they are very close in rating lists (for example in IPON), but if you try a fixed-depth match (let us say an 8-ply search), then I am sure that Rybka will slaughter SF by a large margin. Depths between engines are not comparable, the same as node counts.

Another question is how to calculate the best hash table size for a given depth. If I manage to run depth 10 vs. depth 11 of Quazar 0.4, I can set the hash to 32 MB, 64 MB, ... but which value is the best one? I do not know.
I think the best strategy could be to use the most memory available and to divide it in a ratio proportional to the average branching factor of the test engine between the two depths you are testing.
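A minimal sketch of that split, under the assumption that the tree searched at depth d grows as bf**d (the function name and the BF = 2 value are illustrative, not measured):

```python
def split_hash(total_mb, bf, d1, d2):
    """Split a memory budget between two fixed-depth opponents in
    proportion to their expected tree sizes, bf ** depth."""
    t1, t2 = bf ** d1, bf ** d2
    return total_mb * t1 / (t1 + t2), total_mb * t2 / (t1 + t2)

# 96 MB total, BF = 2, depth 10 vs. depth 11: the deeper side gets twice as much.
print(split_hash(96, 2.0, 10, 11))  # (32.0, 64.0)
```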
Uri Blass
Posts: 10296
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Diminishing returns in fixed depth testing revisited.

Post by Uri Blass »

Rebel wrote:
tvrzsky wrote:What about the influence of hash table size? It seems fair to give both opponents the same value, but then, unless it is big enough, the engine searching deeper has relatively less space in the table than its opponent. Are you, for example, sure that 16 MB for a 10-ply search is enough? Otherwise it could imply some false diminishing effect, IMHO.
Absolutely true. With a full HT the branching factor will go up, and thus the diminishing return (DR) with it. When I am finished with the DR work I plan to do some separate research in this area.

I am not familiar with MP programming, but I can imagine that each doubling of cores needs a 50-75-100% HT size increase to remain effective, at least at longer time controls. Are there known numbers?
Doubling the speed helps regardless of hash, and I see no problem with using the same hash size for all engines.

I use engines for correspondence games, and I give them many hours per move; I do not see the branching factor going up significantly at big depths.
tvrzsky
Posts: 128
Joined: Sat Sep 23, 2006 7:10 pm
Location: Prague

Re: Diminishing returns in fixed depth testing revisited.

Post by tvrzsky »

Uri Blass wrote:Doubling the speed helps regardless of hash, and I see no problem with using the same hash size for all engines.

I use engines for correspondence games, and I give them many hours per move; I do not see the branching factor going up significantly at big depths.
I think it could be because the tables are already completely full and less effective anyway :) In my opinion the biggest difference can occur when one engine is still in the region where the tables are most effective (i.e. not yet full) while the other one is not.
User avatar
Ajedrecista
Posts: 1971
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: Influence of hash table size?

Post by Ajedrecista »

Hello again:
tvrzsky wrote:I think the best strategy could be to use the most memory available and to divide it in a ratio proportional to the average branching factor of the test engine between the two depths you are testing.
That idea sounds good, but I do not know the branching factor of any engine (I suppose that top modern engines have a BF near 2).

I stopped testing Quazar, as the next 2500-game match would take more than twenty hours on my computer, which is too much for me. OTOH I started to test EXChess 6.50b w32 thanks to the comments of some people in the General subforum: it runs to high depths even faster than Quazar! EXChess is not as strong as Quazar, but it is also single-core and does not seem buggy at all... it is enough for me.

I replaced the names of the engines in the engines.json file, but this trick does not work in cutechess-cli 0.5.1, so I rightly supposed that EXChess is not UCI-compatible as Quazar is; I changed to the winboard protocol, and cutechess-cli crashed; I finally changed to the xboard protocol and everything works OK except the hash size: it seems that cutechess-cli does not recognize the hash option for an xboard engine, although that is not a problem for me. I just added the -ratinginterval 1 flag to the command line to get the Elo difference after each game (thanks for the reminder, Ilari!).

I guess that both copies of EXChess are not using any hash table, but my graphs continue to look very similar to each other, although I need more data points, aiming for a better fit. I have just finished the 2500-game match between (depth 7) and (depth 8) of EXChess in record time, around two hours. The clearest example: I ran (depth 1) vs. (depth 2) of Quazar 0.4 w32 in around 75 minutes, while the same match for EXChess took only ten minutes on the same PC under the same conditions! I do not know anything about chess programming, but I guess that EXChess has extremely fast move generation... just a guess.

If nothing goes wrong, I will continue the fixed-depth testing of EXChess tomorrow; I have 17500 games at this moment. I will upload the EXChess results once I consider the test finished, or when I consider that I have enough data to share.

Thanks again for your interest!

Regards from Spain.

Ajedrecista.
User avatar
ilari
Posts: 750
Joined: Mon Mar 27, 2006 7:45 pm
Location: Finland

Re: Influence of hash table size?

Post by ilari »

Ajedrecista wrote:I replaced the names of the engines in the engines.json file, but this trick does not work in cutechess-cli 0.5.1, so I rightly supposed that EXChess is not UCI-compatible as Quazar is; I changed to the winboard protocol, and cutechess-cli crashed;
Yeah, there's a bug that makes cutechess-cli crash if it encounters an unknown chess protocol in the engines.json file. The next version won't have this problem.
Ajedrecista wrote:I finally changed to the xboard protocol and everything works OK except the hash size: it seems that cutechess-cli does not recognize the hash option for an xboard engine, although that is not a problem for me.
If you used something like "option.Hash=32" then that is expected, because Xboard doesn't have a hash option. But it does have a "memory" option which you can use (e.g. "option.memory=32"). Of course the "memory" option only works with engines that support it; I think most Xboard engines still use a configuration file to set engine options.
Daniel Shawul
Posts: 4185
Joined: Tue Mar 14, 2006 11:34 am
Location: Ethiopia

Re: Diminishing returns in fixed depth testing revisited.

Post by Daniel Shawul »

I just read this interesting quote from the wiki page you linked but I am not sure I agree with it.
If two programs play with 5 vs 6 ply search, the second engine has a 20% depth advantage. With 10 vs 11 it's only 10%. So of course the difference in wins is smaller. ...

Diminishing returns are only proven (IMO) if 6 vs 5 wins more games than 12 vs 10 because only then are you comparing something linear and you give a linear advantage.
So the argument is that at higher depths 11/10 is less than 6/5, which is why the return is diminishing. But what if it is the nodes searched that give strength? In that case BF^11/BF^10 = BF and BF^6/BF^5 = BF, so nothing can be deduced from this. Yes, experiment tells us there are diminishing returns with additional search, but I am not sure about the explanation given above.
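Daniel's two ratios can be checked with a few lines (BF = 2 is an assumed branching factor for illustration):

```python
def depth_ratio(d):
    """Relative depth advantage of d+1 over d: shrinks as d grows."""
    return (d + 1) / d

def node_ratio(bf, d):
    """Node-count advantage of d+1 over d: always equals bf, for any d."""
    return bf ** (d + 1) / bf ** d

print(depth_ratio(5), node_ratio(2.0, 5))    # 1.2 2.0
print(depth_ratio(10), node_ratio(2.0, 10))  # 1.1 2.0
```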
User avatar
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: Diminishing returns in fixed depth testing revisited.

Post by Don »

Daniel Shawul wrote:I just read this interesting quote from the wiki page you linked but I am not sure I agree with it.
If two programs play with 5 vs 6 ply search, the second engine has a 20% depth advantage. With 10 vs 11 it's only 10%. So of course the difference in wins is smaller. ...

Diminishing returns are only proven (IMO) if 6 vs 5 wins more games than 12 vs 10 because only then are you comparing something linear and you give a linear advantage.
So the argument is that at higher depths 11/10 is less than 6/5, which is why the return is diminishing. But what if it is the nodes searched that give strength? In that case BF^11/BF^10 = BF and BF^6/BF^5 = BF, so nothing can be deduced from this. Yes, experiment tells us there are diminishing returns with additional search, but I am not sure about the explanation given above.
There are diminishing returns, but part of that is due to the fact that there are more draws as programs play stronger.

In fact, many years ago it appeared that the gain for adding 1 ply of search was nearly linear, in other words a constant amount for each ply of depth. It turns out it is not quite linear but slightly decreases with depth. The depth ratio has almost nothing to do with it.
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.