Null Move Help

stevemulligan · Post by **stevemulligan** » Sun Feb 12, 2012 3:32 pm

lucasart wrote:I have a dual core and my program is single threaded so with cutechess-cli I run 2 games in parallel.

Is there an option I have to enable to use multiple cores, or is it automatic? I also have a dual core but it looks like it's only playing 1 game at a time.

My engine isn't very fast yet so I don't think I can run 6s+0.1s games. Between moves I'm losing about 600ms according to the cutechess pgn output:

46. Re2 {+87.78/6 1.2s} Kf5 {-5.46/6 1.4s}

Those moves were both about 600-700ms according to my engine. Not sure where I'm losing time.

lucasart · Post by **lucasart** » Sun Feb 12, 2012 4:51 pm

stevemulligan wrote:
lucasart wrote:I have a dual core and my program is single threaded so with cutechess-cli I run 2 games in parallel.
Is there an option I have to enable to use multiple cores, or is it automatic? I also have a dual core but it looks like it's only playing 1 game at a time.

My engine isn't very fast yet so I don't think I can run 6s+0.1s games. Between moves I'm losing about 600ms according to the cutechess pgn output:

46. Re2 {+87.78/6 1.2s} Kf5 {-5.46/6 1.4s}
Those moves were both about 600-700ms according to my engine. Not sure where I'm losing time.

type

Code: Select all

cutechess-cli --help

everything you're looking for is there

hgm · Post by **hgm** » Sun Feb 12, 2012 6:14 pm

I did some more stats. Instead of just printing the time stats, I now let Fairy-Max also measure the real time, user time and system time, and print those. Unfortunately the latter two are only given in centiseconds. This should average out, though.
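A minimal sketch of how such measurements could be taken on a POSIX system is given below. It is not Fairy-Max's code: the function names `wall_ms`, `cpu_ms` and `stolen_ms` are hypothetical, and it assumes that the time "stolen" from the engine is taken as real time minus the CPU time (user + system) charged to the engine process. The `times()` call typically has a resolution of one clock tick, which is often 10 msec.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/times.h>
#include <time.h>
#include <unistd.h>

/* Wall-clock ("real") time in msec, from a monotonic clock. */
long wall_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* CPU time charged to this process so far, in msec.
   The resolution is one clock tick (sysconf(_SC_CLK_TCK) ticks per second). */
void cpu_ms(long *user_ms, long *sys_ms)
{
    struct tms t;
    long ticks = sysconf(_SC_CLK_TCK);
    assert(ticks > 0);
    times(&t);
    *user_ms = (long)t.tms_utime * 1000L / ticks;
    *sys_ms  = (long)t.tms_stime * 1000L / ticks;
}

/* Assumed definition: time the engine did not get = real - (user + system).
   Clamped at zero, because the coarse CPU tick can round above real time. */
long stolen_ms(long real_ms, long user_ms, long sys_ms)
{
    long stolen = real_ms - (user_ms + sys_ms);
    return stolen > 0 ? stolen : 0;
}
```

An engine would sample `wall_ms()` and `cpu_ms()` when it starts and when it finishes a search, and pass the differences to `stolen_ms()`.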

So I played some 40/10sec games with different settings, to see not only the delay between sending a move, and the opponent receiving it, but also how much CPU the GUI (well, all other processes really, but nothing was running) was stealing away from the engine.

The result was that the delay is virtually independent of GUI settings, and almost always 1 msec. As I took histograms, I can look at the worst case as well as the average, and there are just a few very large delays. Inspecting the debug file, it turns out these all occur at the end, when the engines are mated: they move instantly in that case, so that GUI processing which normally can be done in parallel with the engine thinking now results in a delay. This is to be expected, and not really a problem.

As for the time stolen from the engine (compare to an average time per move of 250 msec):

Code: Select all

90 msec: 72x72, EO, SAN
84 msec: 33x33, EO, SAN
61 msec: 33x33, hide thinking
 6 msec: noGUI
 4 msec: noGUI, nopost

NxN = square size
EO = engine output window open
SAN = PVs were converted to SAN
hide thinking = no PVs printed above the board
nopost = engine did not send thinking output (no scores in PGN)

The raw numbers are below:

Code: Select all

-size 33, engine-output, SAN
delay
  0.     21
  1.    120
  2.      3
  3.      2
  4.      1
122.      1
134.      1
152.      1
156.      1
usage (avg = 84)
  0.      2
 12.      1
 16.      1
 30.      1
 34.      1
 36.      1
 46.      1
 48.      1
 50.      1
 56.      1
 60.      2
 62.      3
 64.      1
 66.      1
 68.      3
 70.      5
 72.      5
 74.      7
 76.      6
 78.      4
 80.      8
 82.      9
 84.      6
 86.      7
 88.      9
 90.      9
 92.     10
 94.      7
 96.      3
 98.      6
100.      6
102.      7
104.      4
106.      2
110.      2
112.      1
116.      2
120.      1
124.      1
128.      1
140.      1

-size 72, engine-output, SAN
delay
  0.     15
  1.     94
  2.      2
140.      1
146.      1
151.      1
usage (avg = 90)
  0.      1
  2.      1
 14.      1
 68.      3
 70.      1
 72.      2
 74.      1
 76.      1
 78.      4
 80.      9
 82.      7
 84.      8
 86.      4
 88.      9
 90.      8
 92.      4
 94.      6
 96.      6
 98.      4
100.      6
102.      5
104.      4
106.      3
108.      6
110.      3
112.      1
114.      2
122.      1
124.      1
130.      1

-noGUI
delay
  0.     18
  1.    146
  2.      3
usage (avg = 6)
  0.     15
  2.      7
  4.     24
  6.     26
  8.     14
 10.     21
 12.     17
 14.      7
 16.      6
 18.      1
 20.      1
 22.      1
 24.      1

-noGUI, nopost
delay
  1.     64
  2.      6
 14.      1
usage (avg = 4)
  0.      9
  2.      6
  4.      8
  6.      3
  8.      5
 10.      5
 12.      6
 14.      3
 16.      1
 28.      1
 32.      1

-size 33, no EO, hide thinking
delay
  0.      6
  1.    124
  2.      5
 13.      1
usage (avg = 61)
  0.      2
 10.      1
 20.      1
 46.      1
 48.      3
 50.      5
 52.      6
 54.     12
 56.     18
 58.     10
 60.     20
 62.     11
 64.     12
 66.     18
 68.      1
 72.      4
 74.      2
 76.      2
 78.      1
 84.      1
 90.      1
106.      1
110.      1
170.      1

hgm · Post by **hgm** » Sun Feb 12, 2012 10:07 pm

I managed to convert Fruit so that it prints the times as an info string. (Not so difficult, because it already contained routines now_real() and now_cpu() that I could draw on.) So I ran two Fruits against each other under XBoard + 2 x UCI2WB, where I modified UCI2WB so that it always passes the info strings containing the time stamps to the GUI.

The delay (between sending a 'bestmove' and the opponent receiving a 'go') is only slightly increased: instead of being overwhelmingly 1 msec, it is now 2 msec in half the cases (so presumably 1.5 msec). The average CPU time stolen from the engine went up from 6 msec to 11 msec, however. When I switch off WB thinking output, this drops to 9 msec. Like in the case of Fairy-Max, the processing of the thinking output by the GUI is apparently responsible for 2 msec.

The extra 5 msec consumed by UCI engines compared to WB engines must thus be due to the data traffic between the engine and UCI2WB, and the processing of it by UCI2WB. Even when I switch thinking output off, Fruit of course keeps flooding UCI2WB with info stuff. There is very little processing of this by UCI2WB; in nopost mode it immediately discards anything that starts with "info". So it is probably just the OS managing the pipes.
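The kind of filtering described above can be illustrated with a small sketch. This is not UCI2WB's actual code; the function name `should_forward` and its `post` parameter are hypothetical, and it only shows how cheap a "discard everything starting with info" test is.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if a line read from the engine should be passed on to the GUI.
   post == false models nopost mode: any line starting with "info" is dropped. */
bool should_forward(const char *line, bool post)
{
    assert(line != NULL);
    if (!post && strncmp(line, "info", 4) == 0)
        return false;
    return true;
}
```

Even with such a filter the adapter still has to read every line from the pipe before it can discard it, which fits the observation that the remaining cost is mostly pipe handling.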

The raw data is:

Code: Select all

-noGUI, UCI2WB
delay
  1.     64
  2.     53
usage (avg = 11)
  0.     14
  4.      5
  6.      9
  8.     23
 10.     22
 12.     17
 14.      6
 16.      5
 18.      3
 20.      6
 22.      3
 24.      1
 40.      1
 56.      1
 66.      1
 68.      1

-noGUI, UCI2WB, nopost
delay
  1.     67
  2.     72
  3.      4
  4.      1
usage (avg = 9)
  0.      4
  2.     14
  4.     18
  6.     29
  8.     25
 10.     23
 12.     13
 14.      9
 16.      2
 18.      1
 26.      2
 32.      1
 60.      1
104.      1

Evert · Post by **Evert** » Sun Feb 12, 2012 11:15 pm

hgm wrote: So I played some 40/10sec games with different settings, to see not only the delay between sending a move, and the opponent receiving it, but also how much CPU the GUI (well, all other processes really, but nothing was running) was stealing away from the engine.

The result was that the delay is virtually independent of GUI settings, and almost always 1 msec. As I took histograms, I can look at the worst case as well as the average, and there are just a few very large delays. Inspecting the debug file, it turns out these all occur at the end, when the engines are mated: they move instantly in that case, so that GUI processing which normally can be done in parallel with the engine thinking now results in a delay. This is to be expected, and not really a problem.

I don't really understand why processing by the GUI would "steal time" from the engine. To my way of thinking, the process would be:

1. Start clock of engine 1
2. Send "go" command (if needed) or send opponent move
3. Receive move from engine 1
4. Stop clock of engine 1
5. Do GUI processing stuff
6. Start clock of engine 2
7. Send move of engine 1 to engine 2
8. Receive move from engine 2
9. Stop clock of engine 2
10. Do GUI processing stuff
11. Goto 1

So whenever the GUI is doing "stuff" neither clock is running. This results in an (undesirable?) slowdown of the GUI, but since neither clock is running this shouldn't affect gameplay. Right? Is this not how it works? Am I missing something?
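The accounting implied by the numbered steps can be sketched as follows. This only models the scheme as proposed in this post, not how any particular GUI works; the type `MatchClock` and the function `account_move` are hypothetical names.

```c
#include <assert.h>

typedef struct {
    long remaining_ms[2];   /* clock of engine 1 (index 0) and engine 2 (index 1) */
    long uncharged_gui_ms;  /* GUI processing time charged to neither engine */
} MatchClock;

/* One half-move under the proposed scheme: the mover's clock runs only
   between sending it the move and receiving its reply (steps 1-4 and 6-9);
   GUI processing (steps 5 and 10) happens while both clocks are stopped. */
void account_move(MatchClock *mc, int side, long think_ms, long gui_ms)
{
    assert(side == 0 || side == 1);
    mc->remaining_ms[side] -= think_ms;
    mc->uncharged_gui_ms   += gui_ms;
}
```

Under this scheme the GUI time lengthens the game in real time but never appears on either engine's clock.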

hgm · Post by **hgm** » Mon Feb 13, 2012 10:46 am

The way it is implemented is that always one of the clocks is running. There are several reasons for not doing it as you describe. For one, it would maximize waiting time by putting all processing on the critical path, while conceivably there could have been a speedup (in terms of real time, not CPU load) by doing things in parallel. After all, there could have been an unused core available. And in some cases the GUI processing involves waiting (i.e. time, but no CPU load). For instance, when the user has 'animate moves' on, the animation must take place over a time span of 50-100 msec, and it would be wasteful to have the engine wait for that.

Another problem is that the GUI is not idle during the engine search. The engine sends it thinking output, which has to be read by the GUI, and processed (e.g. converted to SAN, and written in the Engine-Output window, which must then be made to scroll...). This is even significant in the case of WB protocol, where I see the CPU load during engine search drop from 6 msec to 4 msec by suppressing the thinking output. And UCI engines usually send many times as much output as WB engines. It is not unusual to have 95-99% of all engine->GUI traffic be thinking output / info commands, which all happens during the engine search. The remaining 1-5% needed to communicate the moves is the only part that could be taken out by stopping the clock.

Besides, it has a ring of justice that engines are made to suffer in terms of the CPU they get for bombarding the GUI with messages. If you can't spare the time, don't do the crime!

Evert · Post by **Evert** » Mon Feb 13, 2012 2:19 pm

hgm wrote:The way it is implemented is that always one of the clocks is running. There are several reasons for not doing it as you describe. For one, it would maximize waiting time by putting all processing on the critical path, while conceivably there could have been a speedup (in terms of real time, not CPU load) by doing things in parallel. After all, there could have been an unused core available. And in some cases the GUI processing involves waiting (i.e. time, but no CPU load). For instance, when the user has 'animate moves' on, the animation must take place over a time span of 50-100 msec, and it would be wasteful to have the engine wait for that.

That's true, but I guess there are slightly different requirements when running automated engine matches compared to running the GUI to play or view a game.

Another problem is that the GUI is not idle during the engine search. The engine sends it thinking output, which has to be read by the GUI, and processed (e.g. converted to SAN, and written in the Engine-Output window, which must then be made to scroll...). This is even significant in the case of WB protocol, where I see the CPU load during engine search drop from 6 msec to 4 msec by suppressing the thinking output. And UCI engines usually send many times as much output as WB engines. It is not unusual to have 95-99% of all engine->GUI traffic be thinking output / info commands, which all happens during the engine search. The remaining 1-5% needed to communicate the moves is the only part that could be taken out by stopping the clock.

That's true, but using up some of the resources while the engine is thinking will only reduce the depth of the search (marginally). Having the clock run after the engine is done thinking and has sent its move is what may cause a time loss, unless the GUI does not "call the flag" when the engine is over the time limit by less than (say) a few ms (could be a GUI option?).

Besides, it has a ring of justice that engines are made to suffer in terms of the CPU they get for bombarding the GUI with messages.

Yes.
But it's not fair to penalise the engine with a time loss if it takes the GUI too long to stop the engine clock.

Of course that would only become an issue for games at ultra-short time controls.
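The grace-margin idea suggested above could look roughly like this. It is a sketch of a hypothetical GUI option, not an existing setting of any GUI; the names `flag_fallen` and `grace_ms` are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether to call the flag. remaining_ms is the mover's clock after
   its move was received (negative if it overstepped); grace_ms is a
   hypothetical tolerance meant to absorb delays caused by the GUI itself. */
bool flag_fallen(long remaining_ms, long grace_ms)
{
    assert(grace_ms >= 0);
    return remaining_ms < -grace_ms;
}
```

With `grace_ms` set to 0 this reduces to a strict check, so the margin only matters when the overstep is within a few msec.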

stevemulligan · Post by **stevemulligan** » Tue Feb 21, 2012 11:15 pm

I'm getting a lot of draw results, even with the transposition table disabled. About 50% of the games end in a draw.

I have repetition detection and I return a score of 0 when a repetition is encountered. Am I doing that right? Should I be returning a lower score when it sees a repetition, or skip the move entirely if there is another move available? Are that many draws expected in self-play?

micron · Post by **micron** » Wed Feb 22, 2012 12:00 am

stevemulligan wrote:I'm getting a lot of draw results, even with the transposition table disabled. About 50% of the games end in a draw.

I have repetition detection and I return a score of 0 when a repetition is encountered. Am I doing that right? Should I be returning a lower score when it sees a repetition, or skip the move entirely if there is another move available? Are that many draws expected in self-play?

Strong engines at long TC show draw rates of 60% and up. Weak engines at short TC are likely to have below 30% draws, because they make many losing blunders.

A value of 50% for an engine in early development is suggestive of a bug.

The repetition test is typically placed at the top of Search.

Code: Select all

#define DRAW_SCORE 0
int Search()
{
  if ( IsFiftyMoveDraw() || IsRepetition() ) return DRAW_SCORE;
  ...
}
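For illustration, the two tests used in the snippet above might be implemented along the following lines. This is a sketch under stated assumptions, not code from any engine in this thread: the `History` struct, `MAX_PLY`, and the lower-case function names are hypothetical, positions are assumed to be identified by a 64-bit hash key (e.g. Zobrist), a single earlier occurrence is treated as a draw inside the search, and the fifty-move test ignores the special case of checkmate on the final ply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PLY 1024

typedef struct {
    uint64_t key[MAX_PLY];  /* hash key of the position after each ply */
    int ply;                /* index of the current position in key[] */
    int halfmove_clock;     /* plies since the last capture or pawn move */
} History;

/* True if the current position already occurred since the last
   irreversible move. Only positions with the same side to move can match,
   so the scan steps back two plies at a time, starting four plies ago. */
bool is_repetition(const History *h)
{
    assert(h->ply >= 0 && h->ply < MAX_PLY);
    int first = h->ply - h->halfmove_clock;
    if (first < 0)
        first = 0;
    for (int i = h->ply - 4; i >= first; i -= 2)
        if (h->key[i] == h->key[h->ply])
            return true;
    return false;
}

/* Fifty moves by each side without a capture or pawn move = 100 plies. */
bool is_fifty_move_draw(const History *h)
{
    return h->halfmove_clock >= 100;
}
```

A common bug is forgetting to reset `halfmove_clock` on captures and pawn moves, or not storing the key of the root position's game history, either of which can distort the draw rate.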
