Testing thread summary for the weary

bob · Post by **bob** » Sat Aug 16, 2008 4:35 pm

hgm wrote:
bob wrote: The only thing left is to figure out how many games are necessary. It would be nice to make sure that the large number of positions (and their variety/balance) are a reasonable set, which I will be happy to provide. Then to refine this to determine how many are needed for a +/- 1 Elo comparison, a +/- 10 Elo, and maybe something even bigger, so that a major change can be accepted/rejected more quickly.
Left for you to figure out, that is. The rest of the world of course already knows this, as they do read my posts. But take your time, and eventually you will get there. (Well, any bets on this? We were looking for dead-cert betting opportunities in the other thread, weren't we?) Even if it takes another 11 months...

Go right ahead. You might find a bit of unexpected correlation in the starting test positions, which will introduce a bit more variation in the results. Choosing N different chess positions that are really independent of each other is not quite as easy as you might imagine. I've already seen a few such correlated positions, and I have neither the time nor the will to compare 4,000 positions to each other by eye.
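For a rough sense of the game counts being argued about, here is a minimal sketch. It is not from either poster: it assumes every game is statistically independent (exactly the assumption questioned above), that the two versions are close in strength, and a hypothetical per-game score standard deviation of 0.5 (the no-draws worst case). The function name and parameters are made up for illustration.

```python
import math

def games_needed(margin_elo, z=1.96, score_sd=0.5):
    """Approximate games needed so the z-sigma error bar is +/- margin_elo.

    Assumptions (not from the thread): independent games, near-equal
    engines, and per-game score standard deviation `score_sd`.
    """
    # Slope of the logistic Elo curve at a 50% score:
    # d(Elo)/d(score) = 400 / (ln(10) * 0.25) = 1600 / ln(10)
    elo_per_score = 1600.0 / math.log(10)
    n = (z * elo_per_score * score_sd / margin_elo) ** 2
    return math.ceil(n)

if __name__ == "__main__":
    for margin in (1, 10, 30):
        print(f"+/- {margin} Elo: about {games_needed(margin)} games")
```

The count grows with the inverse square of the margin, so tightening from +/- 10 to +/- 1 Elo costs roughly 100 times as many games. Draws lower `score_sd` and reduce the count; correlated starting positions do the opposite, since the effective number of independent games is then smaller than the number played.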

bob · Post by **bob** » Sat Aug 16, 2008 4:39 pm

Harald Johnsen wrote: I've not read the other threads; they were too long.

The question is: how to test two versions of an engine?

What about methods that eliminate all randomness? Has anybody tried that?

HJ.

Yes, it is simple to do, and I reported on the issue. But doing so disables specific parts of an engine that need to be tested. The randomness comes from the way time measurements on an operating system have an element of "jitter" in them. If you eliminate time measurement, you can reproduce the same game over and over. But then you also shift part of the influence on the game from time measurement over to whatever method you use to decide how to limit the search times. Using node counts rather than time is a solution, but it is not so easy to choose a fair number of nodes, since the NPS of programs varies dramatically over the course of a game.

The most logical way of testing is to test in the same environment you will use to play real games in tournaments, which means using time to limit the search.
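The trade-off can be illustrated with a toy sketch. This is not engine code: `toy_search`, `node_limit`, and `time_limit` are hypothetical names, and the "search" just counts nodes in a made-up tree that triples in size at each iteration.

```python
import time

def toy_search(should_stop):
    """Stand-in for iterative deepening.

    Visits 3**(depth + 1) nodes per iteration and returns the number of
    iterations completed before `should_stop(nodes)` became true.
    """
    nodes = 0
    depth = 0
    while True:
        for _ in range(3 ** (depth + 1)):
            nodes += 1
            if should_stop(nodes):
                return depth
        depth += 1

def node_limit(max_nodes):
    # Deterministic: depends only on how many nodes were visited.
    return lambda nodes: nodes >= max_nodes

def time_limit(seconds):
    # Non-deterministic: depends on the clock, hence on OS scheduling.
    deadline = time.perf_counter() + seconds
    return lambda nodes: time.perf_counter() >= deadline

if __name__ == "__main__":
    print("node-limited:", [toy_search(node_limit(50_000)) for _ in range(3)])
    print("time-limited:", [toy_search(time_limit(0.01)) for _ in range(3)])
```

The node-limited runs always stop at the same point, so a game played this way can be replayed exactly. The time-limited runs may stop at different points from run to run, which is the jitter described above, but they are the ones that exercise the time-management code used in real games.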

Fritzlein · Post by **Fritzlein** » Mon Aug 18, 2008 2:22 am

tiger wrote: Karl, I would like to take this opportunity to thank you for your contribution. Your comments were clear and showed a scientific mind at work.
[...]
I would also like to thank Bob for all the time and computing power he has contributed, as well as HGM for his tireless refusals when something looked obviously wrong.

And thank you to all the people who have contributed to this topic in one way or another.

Thank you for the kind words, Christophe. If I have contributed a drop to the ocean of knowledge, I am content, although it feels like less than a drop in hindsight. Unfortunately, the ongoing discussion of who is an idiot, although it has given me grounds to formulate my own opinions, does not draw me back to contribute either those opinions or something more substantive. I wanted to jump into the fray because I thought I had something to say, but I'm not going to be able to stand this forum. I don't know what the solution is for the collective, but the solution for me personally is to stick to friendlier discussions. You I admire for appreciating the individual strength of each individual. Thank you for bestowing smiles and warm fuzzies in a place that desperately needs them. Bless you.

tiger · Post by **tiger** » Mon Aug 18, 2008 8:43 am

Fritzlein wrote:
tiger wrote: Karl, I would like to take this opportunity to thank you for your contribution. Your comments were clear and showed a scientific mind at work.
[...]
I would also like to thank Bob for all the time and computing power he has contributed, as well as HGM for his tireless refusals when something looked obviously wrong.

And thank you to all the people who have contributed to this topic in one way or another.
Thank you for the kind words, Christophe. If I have contributed a drop to the ocean of knowledge, I am content, although it feels like less than a drop in hindsight. Unfortunately, the ongoing discussion of who is an idiot, although it has given me grounds to formulate my own opinions, does not draw me back to contribute either those opinions or something more substantive. I wanted to jump into the fray because I thought I had something to say, but I'm not going to be able to stand this forum. I don't know what the solution is for the collective, but the solution for me personally is to stick to friendlier discussions. You I admire for appreciating the individual strength of each individual. Thank you for bestowing smiles and warm fuzzies in a place that desperately needs them. Bless you.

The harsh tone of some messages is quite usual here, and I think we do not pay much attention to it anymore.

If you have a look at the general forum, you will find that I'm not that friendly myself at times, and for this I get even harsher replies.

It should not matter too much. The interest of the scientific discussion stands above the tone of the messages and the few individuals who come here just for trouble (I'm talking about the other forum, not about this one).

If you don't keep on reading the discussion, I hope someone will call you for help at the right moment. And that you will come back for a while.

All the best, Karl.

// Christophe
