LMR Research

Discussion of chess software programming and technical issues.

Moderators: hgm, Rebel, chrisw

Ajedrecista
Posts: 1968
Joined: Wed Jul 13, 2011 9:04 pm
Location: Madrid, Spain.

Re: LMR research.

Post by Ajedrecista »

Hello:
ZirconiumX wrote:
lucasart wrote:
bob wrote: I didn't say "test positions". I said "positions". 2000 games is a marginal test with a significant error bar. Probably too big to measure which of the two is better.
Wrong!
Of course if the score is 50.1% after 2000 games, it's not significant, but the 54.5% I got *is* significant. Just do the math to convince yourself!
We need Jesus Munoz's magic tool for this.

Matthew: out
Not magic at all! It is even open source and freeware. Lucas is right: only a few elementary calculations are needed. But thank you for thinking of me. ;) I think I will upload my programmes again with their improvements in the next few days, if I do not make further improvements before then. I hope I will not forget!

It depends on what you mean by significant: for a sample of 2000 games I have assumed a draw ratio of 50% (the exact draw ratio would have only a small impact in most cases) and computed the thresholds for several confidence/LOS values (in a one-sided test):

Code:

LOS = 90%:

Theoretical minimum score for no regression: 51.0127 %
Theoretical standard deviation in this case:  1.0127 %
Minimum number of won points for the engine in this match: 1020.5 points.

------------------------

LOS = 95%:

Theoretical minimum score for no regression: 51.2995 %
Theoretical standard deviation in this case:  1.2995 %
Minimum number of won points for the engine in this match: 1026.0 points.

------------------------

LOS = 98%:

Theoretical minimum score for no regression: 51.6219 %
Theoretical standard deviation in this case:  1.6219 %
Minimum number of won points for the engine in this match: 1032.5 points.

------------------------

LOS = 99%:

Theoretical minimum score for no regression: 51.8367 %
Theoretical standard deviation in this case:  1.8367 %
Minimum number of won points for the engine in this match: 1037.0 points.

------------------------

LOS = 99.5%:

Theoretical minimum score for no regression: 52.0330 %
Theoretical standard deviation in this case:  2.0330 %
Minimum number of won points for the engine in this match: 1041.0 points.

------------------------

LOS = 99.9%:

Theoretical minimum score for no regression: 52.4372 %
Theoretical standard deviation in this case:  2.4372 %
Minimum number of won points for the engine in this match: 1049.0 points.
I hope that all the numbers are correct. Bear in mind that the model I use is not exact: it is a normal approximation rather than the correct and exact trinomial distribution, although the errors will be very small for 2000 games. I hope that Lucas (or anyone else) will soon verify my approximate results. Good luck with the development of your engines!
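
Below is a minimal sketch of one way to reproduce the table above (a reconstruction for illustration only, not the actual tool: it assumes the observed score s of N games has the trinomial variance (s*(1 - s) - d/4)/N for draw ratio d, and solves s = 0.5 + z_LOS*sigma(s) by fixed-point iteration; the real programme may well use a different formula):

Code:

import math
from statistics import NormalDist

def min_score_for_los(n_games: int, los: float, draw_ratio: float = 0.5) -> float:
    """Smallest score fraction whose one-sided LOS of being above 50% reaches `los`."""
    z = NormalDist().inv_cdf(los)                # one-sided normal quantile
    s = 0.5
    for _ in range(50):                          # fixed-point iteration on the threshold
        sigma = math.sqrt((s * (1.0 - s) - draw_ratio / 4.0) / n_games)
        s = 0.5 + z * sigma
    return s

for los in (0.90, 0.95, 0.98, 0.99, 0.995, 0.999):
    s = min_score_for_los(2000, los)
    points = math.ceil(2.0 * s * 2000) / 2.0     # round up to the next half point
    print(f"LOS = {los:.1%}: minimum score = {100.0 * s:.4f} %, i.e. {points} points")

Under the same assumptions, a score of 54.5% over 2000 games lies roughly 5.7 standard deviations above 50%, far beyond every threshold in the table.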

Regards from Spain.

Ajedrecista.
Don
Posts: 5106
Joined: Tue Apr 29, 2008 4:27 pm

Re: LMR Research

Post by Don »

bob wrote:
lucasart wrote:
bob wrote: I didn't say "test positions". I said "positions". 2000 games is a marginal test with a significant error bar. Probably too big to measure which of the two is better.
Wrong!
Of course if the score is 50.1% after 2000 games, it's not significant, but the 54.5% I got *is* significant. Just do the math to convince yourself!
I am not talking about the result of the change. I was talking about your statement about how re-searches, and where they are done, affect the size of the tree. For me, the first rule of investigating a change is to understand how it affects the tree. If you play very fast games and you are changing the shape of the tree significantly, the results can easily be skewed for or against the change, while the results might be completely different at a longer time control. In fast games, reduced searches often drop right into the q-search, which is cheap. In long games, they don't...
I have had a lot of discussions with Larry over this same phenomenon. Some of our testing is done at such fast levels (we have no real choice) that we cannot properly test changes that have a significant impact on the tree. He is quite satisfied that it makes little difference; I am quite a bit more paranoid about that.

For a simple example, imagine testing progressive null move pruning, where you increase the R factor slowly based on depth. That is now a very popular thing to do. I have argued that this change is not something that we can reasonably test at hyper-lightning levels such as game in 3 seconds or fixed-depth 12-ply searches. You already have to be fairly deep to switch to a reduction of 4. So even though a fast test may see some reductions of 4 or 5, it's also true that the shape of the tree is different with a deeper search. In fact, even doing something different on the LAST PLY is not necessarily the same at every depth, and that applies even more to changes that only kick in at fairly deep depths, such as extended null move reductions. The nodes of the last ply of a 2-ply search are different from the nodes of the last ply of a 20-ply search.
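
As an illustration of the kind of depth-dependent scheme being described, here is a hypothetical sketch (not code from Komodo, Crafty or any particular engine; the constants are invented for the example) in which the null-move reduction R grows slowly with the remaining depth:

Code:

# Hypothetical progressive null-move reduction: R grows with remaining depth.
# The constants (base 2, +1 per 8 plies, cap at 5) are illustrative only.
def null_move_reduction(depth: int) -> int:
    return min(5, 2 + depth // 8)

# At the depths reached by game-in-3-seconds or fixed depth-12 tests, R stays
# at 2 or 3; only the deeper searches of a long time control use R = 4 or 5.
for depth in (6, 12, 18, 24, 30):
    print(f"remaining depth {depth:2d} -> R = {null_move_reduction(depth)}")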
Capital punishment would be more effective as a preventive measure if it were administered prior to the crime.