Move ordering contest

Uri Blass · Post by **Uri Blass** » Fri May 31, 2013 2:05 am

Daniel Shawul wrote: I won't buy into his 'appeal to humility' because I do not think being bad at move ordering indicates higher Elo. It is absurd to try to justify a negative correlation between move ordering and higher strength just because a strong engine performed badly. Strength is influenced by far too many factors to draw any conclusions. Everyone believed Rybka had a super evaluation until it was proved not to be so. Therefore we need to test each move ordering component properly, so that the test is not flawed by the 'perceived Elo of the engine'; otherwise it is no better than the Rybka PR.

1) I see no proof that Rybka did not have a superior evaluation relative to weaker programs.

Note that more complex does not mean better.

2) I think that the test proves nothing about the quality of move ordering,
and I suspect that Komodo would get a "better" move order if Don changed Komodo's evaluation to a material-only evaluation.

I do not see a way to compare move ordering between different programs.
Statistics about cutoffs show nothing here.

Even when we talk about the same program, the statistics may be misleading, because they can change even when we do not change the order of moves.

For example, one of the changes in the latest Stockfish
is changing

const int GrainSize = 8;
to
const int GrainSize = 4;

GrainSize = 4 means that the evaluation is more accurate (finer-grained), and I think that a more accurate evaluation means more fail highs that do not occur on the first move.

Note that Stockfish has the following code in the evaluation:

return Value((result + GrainSize / 2) & ~(GrainSize - 1));
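To make the effect of that line concrete, here is a minimal sketch of the same rounding expression as a standalone function. The name `round_to_grain` is hypothetical and not part of Stockfish; the sketch assumes the grain is a power of two, which the bit mask requires.

```cpp
// Round `value` to the nearest multiple of `grain`, with ties rounding up.
// Assumption: `grain` is a power of two, otherwise the mask below is wrong.
// `round_to_grain` is an illustrative name, not a Stockfish function.
constexpr int round_to_grain(int value, int grain) {
    // Adding half a grain and clearing the low bits snaps the value
    // onto the grid of multiples of `grain`.
    return (value + grain / 2) & ~(grain - 1);
}
```

With this sketch, a raw score of 13 becomes 16 with a grain of 8 and 12 with a grain of 4, so the smaller grain keeps apart scores that the larger grain would merge.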

Daniel Shawul · Post by **Daniel Shawul** » Fri May 31, 2013 4:58 am

Uri Blass wrote:
Daniel Shawul wrote: I won't buy into his 'appeal to humility' because I do not think being bad at move ordering indicates higher Elo. It is absurd to try to justify a negative correlation between move ordering and higher strength just because a strong engine performed badly. Strength is influenced by far too many factors to draw any conclusions. Everyone believed Rybka had a super evaluation until it was proved not to be so. Therefore we need to test each move ordering component properly, so that the test is not flawed by the 'perceived Elo of the engine'; otherwise it is no better than the Rybka PR.
1) I see no proof that Rybka did not have a superior evaluation relative to weaker programs.

Note that more complex does not mean better.

That may well be the case, i.e. a well-tuned eval, but it was not as groundbreaking as first reported. I myself believed at the time that it had a superior evaluation; I started adding hanging-piece evaluation and other evaluation terms of a tactical nature after that claim. Little did I know that it was the search I was actually seeing, so I am careful about claims that correlate strength with a single component of an engine.

2) I think that the test proves nothing about the quality of move ordering,
and I suspect that Komodo would get a "better" move order if Don changed Komodo's evaluation to a material-only evaluation.

I do not see a way to compare move ordering between different programs.
Statistics about cutoffs show nothing here.

In fact, the major advantage of ordering the later moves is not to produce cutoffs as early as possible but to make the LMR tree search efficient. Before the days of late move reductions, it was common to sort only the first 2 or 3 non-capture moves and then stop sorting, because on average sorting further is a waste of time compared to the expected gain in early cutoffs. The motive to sort really late moves was weak, but this has changed with heavy LMR every 8th move or so. Minor improvements in move ordering there can bring major gains, because we decide which move gets searched deeper based on something as subtle as move order. This important behavior is not captured by simple cutoff statistics.
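As a rough illustration of why the position of a late move matters under LMR, here is a minimal sketch. It assumes a hypothetical schedule that adds one ply of reduction for every 8 moves already searched, which is only one reading of "every 8th move or so"; the function names and the formula are illustrative and are not taken from any engine.

```cpp
#include <algorithm>

// Hypothetical reduction schedule: one extra ply of reduction for every
// `step` moves that precede this one in the move list (0-based index).
// Real engines use different, usually more elaborate, formulas.
constexpr int lmr_reduction(int move_index, int step = 8) {
    return move_index / step;
}

// Depth at which the child position of the move at `move_index` is searched:
// the normal one-ply decrement plus the reduction, never below zero.
constexpr int child_depth(int depth, int move_index, int step = 8) {
    return std::max(0, depth - 1 - lmr_reduction(move_index, step));
}
```

Under these assumptions, at depth 10 the move sorted into slot 7 is searched to depth 9 while the move in slot 8 gets only depth 8, so swapping two neighbouring moves changes how deeply each is examined even if neither produces a cutoff.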

Even when we talk about the same program, the statistics may be misleading, because they can change even when we do not change the order of moves.

For example, one of the changes in the latest Stockfish
is changing

const int GrainSize = 8;
to
const int GrainSize = 4;

GrainSize = 4 means that the evaluation is more accurate (finer-grained), and I think that a more accurate evaluation means more fail highs that do not occur on the first move.

Note that Stockfish has the following code in the evaluation:

return Value((result + GrainSize / 2) & ~(GrainSize - 1));

Lower grain sizes never worked for me, but keep in mind that Stockfish uses a pawn value of 256, which is almost 3x bigger than the standard 100 centipawns. With grainsize=4, i.e. 64, it is coarser than 100 but not far off, compared to grainsize=8, i.e. 32 centipawns, which sounds too coarse to me for it to be successful in the first place. A finer eval badly affects the performance of tight aspiration windows or MTD, as you have to do re-searches for the slightest of improvements (fail highs). As always, this is a compromise between an accurate eval (fine grain) and a better search.

rjgibert · Post by **rjgibert** » Fri May 31, 2013 7:17 am

Grainsize in centipawns is c = grainsize*100/256, so for a grainsize of 8 it's 8*100/256 = 3.125 centipawns and not 32 centipawns. For a grainsize of 4 it's 4*100/256 = 1.5625 centipawns.

Daniel Shawul · Post by **Daniel Shawul** » Fri May 31, 2013 11:48 am

rjgibert wrote:Grainsize in centipawns is c = grainsize*100/256, so for a grainsize of 8 it's 8*100/256 = 3.125 centipawns and not 32 centipawns. For a grainsize of 4 it's 4*100/256 = 1.5625 centipawns.

Note that I computed the value a pawn would have, i.e. 100/3.125=32 and 100/1.5625=64, to show how far off it is from the standard 100 centipawns. That assumes a resolution of 1 unit is used throughout, with no need to round. Of course, the value of the resolution itself is much smaller; grainsize=8 'hexipawns' is the same as grainsize=3.125 centipawns.
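The two readings of the grain size in this exchange can be checked with a small sketch. It assumes the pawn value of 256 internal units quoted earlier in the thread; the function names are hypothetical.

```cpp
// Size of one evaluation step expressed in centipawns.
// Assumption: a pawn is worth `pawn_value` internal units (256 as quoted
// in the thread) and 100 centipawns by definition.
constexpr double grain_in_centipawns(int grain, int pawn_value = 256) {
    return grain * 100.0 / pawn_value;
}

// Number of distinct evaluation steps that fit into one pawn, i.e. the
// value a pawn would have if one grain were the unit of evaluation.
constexpr int grains_per_pawn(int grain, int pawn_value = 256) {
    return pawn_value / grain;
}
```

For a grain of 8 this gives a step of 3.125 centipawns and 32 steps per pawn; for a grain of 4 it gives 1.5625 centipawns and 64 steps per pawn. Both sets of figures in the posts above describe the same resolution from opposite directions.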

mvk · Post by **mvk** » Thu Jun 13, 2013 10:10 pm

Daniel Shawul wrote:
Rebel wrote:Final results are up: http://www.top-5000.nl/moresu.htm

Marcel wins the trophy

Don loses by a great distance

It seems I had it all backwards.

I will make a new one as suggested, just for the fun of it, and see where we end up. But I won't call it a contest any longer!
I will venture a guess that such contests will end up being an ego trip for some after the recent 'counter move' buzz. I bet you expected Don to win because he suggested it to Stockfish and it got +20 Elo. The eventual winner, Rookie, wins by a huge margin again because it has the counter move heuristic.

This version doesn't have the counter move heuristic because that tested negative in the previous code base and I haven't tried it ever since. At best the result is caused by very conservative LMR (R=1). I expect the ratio to nosedive once work on that can start.
