comparing match results

hgm · Post by **hgm** » Fri Dec 12, 2014 8:34 pm

There is never any need for Polyglot scripts. Just pass the option -fUCI or -sUCI to XBoard to run a UCI engine as the first or second engine.
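As a rough illustration of the command line this implies, here is a minimal Python sketch that only assembles the XBoard arguments. The engine binary name `my_uci_engine` is a placeholder, not something from this thread, and the `-fcp`, `-scp` and `-mg` options (first engine, second engine, number of match games) are added here as assumptions about how the match would be set up.

```python
import shlex


def xboard_match_command(first_engine, second_engine,
                         first_is_uci=False, second_is_uci=False, games=10):
    """Build an XBoard argument list for an engine-vs-engine match."""
    cmd = ["xboard", "-fcp", first_engine, "-scp", second_engine]
    if first_is_uci:
        cmd.append("-fUCI")   # treat the first engine as UCI
    if second_is_uci:
        cmd.append("-sUCI")   # treat the second engine as UCI
    cmd += ["-mg", str(games)]  # number of games in the match
    return cmd


# "my_uci_engine" is a hypothetical binary name used only for illustration.
cmd = xboard_match_command("my_uci_engine", "fairymax",
                           first_is_uci=True, games=10)
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run` unchanged, which avoids any shell quoting problems with engine paths.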

If you have no GUI library on the machine it would of course be a problem. But I can hardly imagine you would not have access to a Linux machine that can run XBoard. Remember that running 10 games against N.E.G. might tell you infinitely more than running 10,000 games against Fairy-Max that result in 10,000 losses.
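To make the point about lopsided results concrete, here is a small sketch, assuming the usual logistic Elo model and a normal approximation for the error margin. The 4 wins / 2 draws / 4 losses record is a made-up placeholder, not a result from this thread.

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction (logistic model)."""
    if score <= 0.0:
        return -math.inf
    if score >= 1.0:
        return math.inf
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins, draws, losses, z=1.96):
    """Return (estimate, low, high) in Elo for a match result."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)
    low = elo_diff(max(0.0, score - margin))
    high = elo_diff(min(1.0, score + margin))
    return elo_diff(score), low, high


# Hypothetical 10-game match against an opponent of similar strength.
print(elo_interval(4, 2, 4))
# 10,000 straight losses: the estimate is unbounded, so it pins down nothing.
print(elo_interval(0, 0, 10000))
```

A balanced 10-game match gives a wide but finite interval, while a whitewash of any length only says "much weaker" without saying by how much.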

flok · Post by **flok** » Fri Dec 12, 2014 9:19 pm

hgm wrote: There never is any need for Polyglot scripts. Just pass the option -fUCI or -sUCI to XBoard to run UCI engines as first or second.

If you have no GUI library on the machine it would of course be a problem. But I can hardly imagine you would not have access to a Linux machine that can run XBoard. Remember that running 10 games against N.E.G. might tell you infinitely more than running 10,000 games against Fairy-Max that result in 10,000 losses.

I do, but the machines that run these tournaments don't.

hgm · Post by **hgm** » Fri Dec 12, 2014 10:53 pm

So you might learn more from a tournament with 10 games per pairing on that one machine than from all the games you have played so far on the others.

The thing I am afraid of is that playing all the QueenBee MC versions against each other and POS might give misleading information, when you start from a situation where none of them plays a realistic strategy. The versions that perform best could merely be best because they speculate more heavily that the opponent will do something stupid, and this could backfire against any non-stupid opponent. So 'optimizing' their mutual results might actually make them weaker. You need at least some opponents that 'do the right thing' in order to bootstrap yourself out of the 'swamp of randomness'. And opponents that score 100%, like Fairy-Max and HoiChess seem to do, are zero help in discriminating between the various QueenBee versions.

You could try using Fairy-Max with a depth limit, to see if you can weaken it enough to score ~50%. In fact you could add an entire 'spectrum' of depth-limited Fairy-Maxes, searching 1, 2, 3... ply, to use them as a calibration scale. Fairy-Max should support the sd N command.
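One way to use such a calibration scale is sketched below: collect the tested engine's score fraction against each depth-limited Fairy-Max and pick the depth that comes closest to 50%. The scores in `example_scores` are invented placeholders, not measurements, and both helper names are hypothetical.

```python
def sd_command(depth):
    """Return the 'sd N' command that limits search depth to N ply."""
    if depth < 1:
        raise ValueError("depth must be at least 1 ply")
    return f"sd {depth}"


def best_calibration_depth(score_by_depth):
    """Return the depth whose score fraction is closest to 50%."""
    return min(score_by_depth, key=lambda d: abs(score_by_depth[d] - 0.5))


# Made-up placeholder data: score of the tested engine against
# Fairy-Max limited to 1, 2, 3 and 4 ply.
example_scores = {1: 0.85, 2: 0.55, 3: 0.20, 4: 0.05}

depth = best_calibration_depth(example_scores)
print(depth, sd_command(depth))
```

The opponents on either side of that depth are the ones most likely to separate the QueenBee versions, since results near 0% or 100% carry little information.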
