How does Monte Carlo analysis work?

Zach Wegner · Post by **Zach Wegner** » Tue Jul 22, 2008 5:17 am

hgm wrote:
CRoberson wrote:What you suggest is not because it would not compensate for the engine being temporarily suspended due to other job priorities.
I don't understand this sentence, not only because of the doublr negation. Whatt dors 'it' refer to?

I propose this because at speeds of 25 msec per move the wall clock is too unreliable and inaccurate. So it would be a useful feature for those playing sub-second games.

It would also remove any non-determinism due to timing jitter, which could be a reason for people to even use it at long TC.

Programmers note this: for proper implmentation, please also announce the engine knows this feature by including 'nps=1' in the features command. You can even do this in protocol version 2.

I believe he means:

"What you suggest is not, because it would not compensate for the engine being temporarily suspended due to other job priorities."

I.e. your method doesn't take the jitters into account. Which is the point I believe.

My problem is that it is too easy to cheat. It's open to the same sort of abuses as depth or node limited search, engines can simply lie about how many nodes per second they are searching. Nobody uses these methods really, unless they might have open source engines that they can verify their honesty. I wouldn't trust closed source engines to comply, in this age of Rybkian output obfuscation...

michiguel · Post by **michiguel** » Tue Jul 22, 2008 6:02 am

Zach Wegner wrote:
hgm wrote:
CRoberson wrote:What you suggest is not because it would not compensate for the engine being temporarily suspended due to other job priorities.
I don't understand this sentence, not only because of the doublr negation. Whatt dors 'it' refer to?

I propose this because at speeds of 25 msec per move the wall clock is too unreliable and inaccurate. So it would be a useful feature for those playing sub-second games.

It would also remove any non-determinism due to timing jitter, which could be a reason for people to even use it at long TC.

Programmers note this: for proper implmentation, please also announce the engine knows this feature by including 'nps=1' in the features command. You can even do this in protocol version 2.
I believe he means:

"What you suggest is not, because it would not compensate for the engine being temporarily suspended due to other job priorities."

I.e. your method doesn't take the jitters into account. Which is the point I believe.

My problem is that it is too easy to cheat. It's open to the same sort of abuses as depth or node limited search, engines can simply lie about how many nodes per second they are searching. Nobody uses these methods really, unless they might have open source engines that they can verify their honesty. I wouldn't trust closed source engines to comply, in this age of Rybkian output obfuscation...

This is not for tournaments, it would be for testing.

Miguel

bob · Post by **bob** » Tue Jul 22, 2008 7:10 am

michiguel wrote:
Zach Wegner wrote:
hgm wrote:
CRoberson wrote:What you suggest is not because it would not compensate for the engine being temporarily suspended due to other job priorities.
I don't understand this sentence, not only because of the doublr negation. Whatt dors 'it' refer to?

I propose this because at speeds of 25 msec per move the wall clock is too unreliable and inaccurate. So it would be a useful feature for those playing sub-second games.

It would also remove any non-determinism due to timing jitter, which could be a reason for people to even use it at long TC.

Programmers note this: for proper implmentation, please also announce the engine knows this feature by including 'nps=1' in the features command. You can even do this in protocol version 2.
I believe he means:

"What you suggest is not, because it would not compensate for the engine being temporarily suspended due to other job priorities."

I.e. your method doesn't take the jitters into account. Which is the point I believe.

My problem is that it is too easy to cheat. It's open to the same sort of abuses as depth or node limited search, engines can simply lie about how many nodes per second they are searching. Nobody uses these methods really, unless they might have open source engines that they can verify their honesty. I wouldn't trust closed source engines to comply, in this age of Rybkian output obfuscation...
This is not for tournaments, it would be for testing.

Miguel

But how can you test and draw reliable conclusions when the players themselves have to own up to how many nodes they have searched? If I knew others were testing against me using this, I might be tempted to search a different number of nodes than told to. Which would make drawing conclusions from the games very difficult. For example, on even hours, I search 2x the specified number of nodes, on odd hours, I search 1/2. That would throw a kink into measuring improvements...

Who would do this you say? I can think of several commercial programs. Heaven knows we have had more than enough mischief over the years. Remember the chessbase case where we couldn't ponder because we got a new position, or later a complete new set of moves each time we were supposed to search? Or we couldn't set hash above a certain size. Or ...

So it would happen...

hgm · Post by **hgm** » Tue Jul 22, 2008 8:46 am

Zach Wegner wrote:I.e. your method doesn't take the jitters into account. Which is the point I believe.

It seems your / his point is wrong, then. It does compensate for time jitter due to OS scheduling. If I tell the engine 'nps 100000' (i.e. equating 1 sec to 100,000 nodes), an WinBoard would tell it (through the 'time' command) that it has 1 sec per move, it would search for 100,000 nodes, no matter what. Even if I closed the lid of my laptop, and re-opened it 10 minutes later, so that it had been suspended, it would stll search the full 100,000 nodes, and not suffer for the remaining moves either.

Cheating is no issue, it will be the norm from the outset, as no two programs will count nodes in the same way, an even if they do, will not have the same nps on equal hardware. This is why the tester can set the nps value independently for both engines. If I would see that Crafty, set to nps 1,000,000 would take 10 min for a 1 min game on an unloaded machine, and Joker would take only 1 min, I would simply set Crafty to nps 100,000, and play it against Joker set at nps 1,000,000.

I was thinking of using the WinBoard /firstTimeOdds=F and /secondTimeOdds=F options for this: with positive numbers they divide the wall-clock time to create virtual engine time, and with negative numbers they would divide node counts in stead (through the absolute value of F, of course).

Uri Blass · Post by **Uri Blass** » Tue Jul 22, 2008 11:01 am

bob wrote:
michiguel wrote:
Zach Wegner wrote:
hgm wrote:
CRoberson wrote:What you suggest is not because it would not compensate for the engine being temporarily suspended due to other job priorities.
I don't understand this sentence, not only because of the doublr negation. Whatt dors 'it' refer to?

I propose this because at speeds of 25 msec per move the wall clock is too unreliable and inaccurate. So it would be a useful feature for those playing sub-second games.

It would also remove any non-determinism due to timing jitter, which could be a reason for people to even use it at long TC.

Programmers note this: for proper implmentation, please also announce the engine knows this feature by including 'nps=1' in the features command. You can even do this in protocol version 2.
I believe he means:

"What you suggest is not, because it would not compensate for the engine being temporarily suspended due to other job priorities."

I.e. your method doesn't take the jitters into account. Which is the point I believe.

My problem is that it is too easy to cheat. It's open to the same sort of abuses as depth or node limited search, engines can simply lie about how many nodes per second they are searching. Nobody uses these methods really, unless they might have open source engines that they can verify their honesty. I wouldn't trust closed source engines to comply, in this age of Rybkian output obfuscation...
This is not for tournaments, it would be for testing.

Miguel
But how can you test and draw reliable conclusions when the players themselves have to own up to how many nodes they have searched? If I knew others were testing against me using this, I might be tempted to search a different number of nodes than told to. Which would make drawing conclusions from the games very difficult. For example, on even hours, I search 2x the specified number of nodes, on odd hours, I search 1/2. That would throw a kink into measuring improvements...

Who would do this you say? I can think of several commercial programs. Heaven knows we have had more than enough mischief over the years. Remember the chessbase case where we couldn't ponder because we got a new position, or later a complete new set of moves each time we were supposed to search? Or we couldn't set hash above a certain size. Or ...

So it would happen...

I see no reason to have non deterministic searching when there are fixed number of nodes.

People can easily find that the program produce different games when you use fixed number of nodes so nobody is going to use it for testing.

Programmers can report wrong number of nodes (for example by not counting qsearch nodes)
but I expect x nodes to be nearly twice the time of x/2 nodes(otherwise people can easily find that there is a problem).

I also expect x nodes to be deterministic and not to give different results at different times.

Uri

hgm · Post by **hgm** » Tue Jul 22, 2008 1:06 pm

Note the aim is not to make the program deterministic per se. In fact, in monte-Carlo testing you would need non-determinism, (as controlled by the WB command 'random'), or you would play 80k identical games.

The 'nps' feature is in the first place to allow accurate timing of ultra-short CPU-time intervals, in the range where the wall-clock (GetTicksCount() and such) would be hopelessly inaccurate, and no measure of CPU usage at all due to scheduling and communication delays (i.e. the msec range).

Uri Blass · Post by **Uri Blass** » Tue Jul 22, 2008 3:30 pm

hgm wrote:Note the aim is not to make the program deterministic per se. In fact, in monte-Carlo testing you would need non-determinism, (as controlled by the WB command 'random'), or you would play 80k identical games.

The 'nps' feature is in the first place to allow accurate timing of ultra-short CPU-time intervals, in the range where the wall-clock (GetTicksCount() and such) would be hopelessly inaccurate, and no measure of CPU usage at all due to scheduling and communication delays (i.e. the msec range).

deterministic testing is good not for one position but for testing changes
in the program and you use many different positions as starting positions so you do not get identical games.

The Rybka team is using fast games of the program against itself in testing and it is clearly possible that they do not use the real clock but use number of nodes to get estimate for the time.

Uri

Zach Wegner · Post by **Zach Wegner** » Tue Jul 22, 2008 3:47 pm

hgm wrote:
Zach Wegner wrote:I.e. your method doesn't take the jitters into account. Which is the point I believe.
It seems your / his point is wrong, then. It does compensate for time jitter due to OS scheduling. If I tell the engine 'nps 100000' (i.e. equating 1 sec to 100,000 nodes), an WinBoard would tell it (through the 'time' command) that it has 1 sec per move, it would search for 100,000 nodes, no matter what. Even if I closed the lid of my laptop, and re-opened it 10 minutes later, so that it had been suspended, it would stll search the full 100,000 nodes, and not suffer for the remaining moves either.

Yes, I agree. I meant that your method will compensate for the jitters (which is the point of your technique), while I believe Charles was faulting it because it's not affected by the jitters. Reading it now again, he could easily have meant something different though. Charles?

hgm · Post by **hgm** » Tue Jul 22, 2008 9:03 pm

Uri Blass wrote:deterministic testing is good not for one position but for testing changes in the program and you use many different positions as starting positions so you do not get identical games.

Agreed. This is why I like the controls to be orthogonal as much as possible. Use 'random' to switch on/off randomness, and use nodes for accurate fast timing insensitive to machine load.

Perhaps it would also be an idea to provide a mode where WinBoard uses the time reported by the engine to decrement the WinBoard clock, accompanied with a command to the engine that can instruct it to use CPU time, rather than wall-clock time for its time management. Perhaps I could use the command 'nps 0' for that.

For engine programmers wanting to implement this feature, note that any 'new' command should switch back the engine in standard mode (i.e. using wall-clock time). WinBoard will send an nps command (if any) immediately after the 'level' command.
'

The Rybka team is using fast games of the program against itself in testing and it is clearly possible that they do not use the real clock but use number of nodes to get estimate for the time.

They probably do not even use a GUI, but just implemented a self-play mode in the engine. This is how I did it in micro-Max, when I was developing it.

bob · Post by **bob** » Tue Jul 22, 2008 10:45 pm

hgm wrote:
Uri Blass wrote:deterministic testing is good not for one position but for testing changes in the program and you use many different positions as starting positions so you do not get identical games.
Agreed. This is why I like the controls to be orthogonal as much as possible. Use 'random' to switch on/off randomness, and use nodes for accurate fast timing insensitive to machine load.

Perhaps it would also be an idea to provide a mode where WinBoard uses the time reported by the engine to decrement the WinBoard clock, accompanied with a command to the engine that can instruct it to use CPU time, rather than wall-clock time for its time management. Perhaps I could use the command 'nps 0' for that.

For engine programmers wanting to implement this feature, note that any 'new' command should switch back the engine in standard mode (i.e. using wall-clock time). WinBoard will send an nps command (if any) immediately after the 'level' command.
'
The Rybka team is using fast games of the program against itself in testing and it is clearly possible that they do not use the real clock but use number of nodes to get estimate for the time.
They probably do not even use a GUI, but just implemented a self-play mode in the engine. This is how I did it in micro-Max, when I was developing it.

I doubt that would work. Suppose you change the search. How would "self-play" be used to play the old against the new? Would be far harder than to just use a simple referee program as I do. Otherwise you introduce new bugs by trying to use two different searches, or two different evaluations in the same program/process...

How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?

Re: How does Monte Carlo analysis work?