Standard candles

Don · Post by **Don** » Fri Aug 30, 2013 7:44 pm

Michel wrote:
I have already implemented it. I can now tell Komodo to do 1 thousand nodes per second and it will do so. Of course it cannot go faster than the hardware permits but it can go slower.
The xboard protocol has the nps option for this. Perhaps the corresponding UCI option should be standardized as well. Perhaps UCI_nps since it is similar to UCI_elo?

I guess a cheap UCI_elo implementation could just translate it into UCI_nps.

It's not clear to me what nps in xboard means. When I described NIT, HG told me that it's already supported in xboard - you just treat nodes as time.

But what I'm proposing is not the same and would not require GUI support - you simple set the program to SLOW DOWN so that it plays at some specified nodes per second rate. If I set it for 1000 nodes per second it should play exactly the same on a slower Android device as it would on the fastest i7 hardware.

The program of course must constantly monitor it's NPS rate and impose the appropriate delays to keep the rate constant.

I don't believe this is the same thing HG described - but maybe I'm wrong?

Don

Michel · Post by **Michel** » Fri Aug 30, 2013 8:20 pm

But what I'm proposing is not the same and would not require GUI support - you simple set the program to SLOW DOWN so that it plays at some specified nodes per second rate. If I set it for 1000 nodes per second it should play exactly the same on a slower Android device as it would on the fastest i7 hardware.

No this indeed not the same. The nps in the xboard protocol simply converts time to nodes and back.

sje · Post by **sje** » Fri Aug 30, 2013 11:05 pm

Don wrote:The program of course must constantly monitor it's NPS rate and impose the appropriate delays to keep the rate constant.

I do not understand the need for this complexity. If you want the program to spend 10,000,000 nodes worth of time on a search, then just rip out all the time routine calls and install a simple, one line node count check which triggers on the 10,000,000th node. If you want a variance of search effort calculated from the state of the game, then allow a 1,000,000,000 nodes per game limit and allocate a node limited search effort from move to move.

Any pseudorandomness needed should be gotten from a deterministic generator seeded so that its output sequence is exactly replicated across different platforms.

Don · Post by **Don** » Fri Aug 30, 2013 11:42 pm

sje wrote:
Don wrote:The program of course must constantly monitor it's NPS rate and impose the appropriate delays to keep the rate constant.
I do not understand the need for this complexity. If you want the program to spend 10,000,000 nodes worth of time on a search, then just rip out all the time routine calls and install a simple, one line node count check which triggers on the 10,000,000th node. If you want a variance of search effort calculated from the state of the game, then allow a 1,000,000,000 nodes per game limit and allocate a node limited search effort from move to move.

Any pseudorandomness needed should be gotten from a deterministic generator seeded so that its output sequence is exactly replicated across different platforms.

It's not that it's necessary, none of this is. But the idea is that you can get hardware independence and STILL run at any standard chess time controls. You cannot do that with your scheme. You can do game in X nodes but that's actually a lot more complicated than this because your GUI has to support it too. Xboard has a mode that supports this but it won't work on other GUI's. My scheme will work on any GUI.

You said that you don't understand all the complexity. But it's not complex. All you have to do is a single UCI option that says, "do X nodes per second" and you are done. I implemented it in Komodo and the actual routine that controls this is something like 5 lines of code - it's trivial.

Something like this should be done right away and some program (I recommend Stockfish 3 or 4) needs to be squirreled away and declared THE benchmark standard. The NPS option is not absolutely necessary for this but it really needs to be in there as soon as possible so that we can start documenting older hardware (e.g. run it at NPS = 700000 to get Core 2 "Conroe" circa 2006 equivalent performance.)

20 years from now we could answer questions such as how strong would program XYZ be running on older hardware compared to the original Houdini 2.0, etc. We would have ALL the data needed to extrapolate and the ability to run these tests as we would have preserved an "ancient" 2013 model of Stockfish for all time.

Another issue is that as programs get stronger, the ratings on the list go up but only in response to the software, not the hardware. If Ingo suddenly was able to procure a machine 10x faster to run his tests, Shredder 12 would still be set to 2800 ELO and the relative differences in hardware continues to go undocumented. I'm not saying he should change that, but we should not paint ourselves into a corner either. But at least Stockfish could be assigned as the reference program with an ELO of (perhaps 3000.)

So this is not just a mental exercise but something that should be done in my view - something that future generations might appreciate. For the same reason there are history museums and such.

How does your program compare to Sargon 2? How about Sargon 2 running on an i7? How would your program do on a 6502 if it were possible to run it on such a machine? Yes, I know that the 6502 probably would not support a modern chess program - but in principle we should have a way that is not unduly complicated to extrapolate information like this.

As the years tick by I'm sure there would have to be adjustments and updates but I think this is something worthwhile. It doesn't seem important now for the same reason that polluting the air doesn't seem important to us either - someone else can deal with that. So this would be something that would benefit future generations of chess authors (and many of us will surely be around in 10, 20 or 30 years to benefit too.)

Don

Michel · Post by **Michel** » Sat Aug 31, 2013 6:11 am

I implemented it in Komodo and the actual routine that controls this is something like 5 lines of code - it's trivial.

How do you control the NPS? By inserting sleeps? Or do you just sleep
at the end of the allotted time?

hgm · Post by **hgm** » Sat Aug 31, 2013 8:16 am

I think this is a functionality that should better be implemented in the GUI. Indeed I did consider implementing an option -maskTimeOdds in XBoard that would sleep (N-1) times the time the engine used thinking after receiving the engine move (i.e. just schedule an event for processing of the move with this delay), when the time odds factor is set to 1.

There is no need for every engine to implement this independently.

It should be easy to not only apply this with real time odds, but also with virtual time odds when running in NPS mode. It can compare the virtual time obtained from converting the engine's reported node count with the elapsed wall-clock time, and insert a delay to make up the difference.

Don · Post by **Don** » Sat Aug 31, 2013 1:36 pm

hgm wrote:I think this is a functionality that should better be implemented in the GUI. Indeed I did consider implementing an option -maskTimeOdds in XBoard that would sleep (N-1) times the time the engine used thinking after receiving the engine move (i.e. just schedule an event for processing of the move with this delay), when the time odds factor is set to 1.

There is no need for every engine to implement this independently.

It should be easy to not only apply this with real time odds, but also with virtual time odds when running in NPS mode. It can compare the virtual time obtained from converting the engine's reported node count with the elapsed wall-clock time, and insert a delay to make up the difference.

There is no way for a GUI to implement this in a natural way because each program has it's own time control algorithm. For example at the end of each iteration Komodo makes the decision whether to move to the next iteration or not - and then there is the decision when to stop searching and if you stop in the middle of an iteration.

Of course there is the "nodes is time" mode that xboard supports but then it's not implemented by the GUI - it STILL has to be implemented by the engine AS WELL AS having GUI support.

The end result of course should be the same - it's just two different ways of implementing the same concept - but the GUI way requires demands support from the GUI and most GUI's, perhaps NONE of them except xboard supports this.

So any engine author can get this without being forced to use just one GUI.

Having said that, my real point is more about nailing down a standard engine, this is just a detail and a personal opinion about a feature that I believe would make a program hardware neutral - and requiring GUI support makes it a little less so as we don't really know what GUI's will be around in 20 years.

Don · Post by **Don** » Sat Aug 31, 2013 2:35 pm

Michel wrote:
I implemented it in Komodo and the actual routine that controls this is something like 5 lines of code - it's trivial.
How do you control the NPS? By inserting sleeps? Or do you just sleep
at the end of the allotted time?

You can do either:

1. Sleep every N nodes where N is monitored and dynamically adjusted

2. You can sleep for S units of time where S is monitored and dynamic.

3. You can do a combination of both.

I tried using usleep and even with usleep(0) Komodo does about 17,000 nodes per second so to go faster you must either use a much higher resolution sleep function or just sleep every N nodes. And to hit your target you must constantly monitor your nodes per second and make adjustments.

To accommodate a ridiculously low number of nodes per second (such as 50) and also be able to run almost full speed you have to do a combination of both.

It's very easy to get close and estimate what you need when the option is set and then to make very small dynamic adjustments.

If you usleep 10,000 between nodes:

usleep(10000);

You will get 100 nodes per second on any unix machine. If you want to go faster than 17,000 nodes per second usleep on every node won't cut it, you would have to skip nodes, for example usleep every 10 or every 100 nodes.

You have finer control with the usleep delay but only if the usleep value is not too low - it's notoriously inaccurate in the low order bits due to all sorts of O/S overheads and is only guaranteed to delay AT LEAST the value you set. It's a very expensive call. usleep(1) is hardly any different than usleep(2) for example or even usleep(0).

So the idea is that when the option is set, you should ESTIMATE a skip factor and a sleep factor - and try to keep the skip factor as low as possible while still having a large enough sleep value for fine tuning (probably at least a 100 or more) and as the search progresses monitor the nodes per second and make fine adjustments in the sleep delay as you go to maintain the target NPS.

It probably sounds a lot more complicated than it is, but it's actually pretty simple and less than about 10 lines of code.

By the way, if your target is close to the maximum nodes per second your program is capable of, you will need a very large skip factor and a very low sleep delay and it will more difficult nailing down the target NPS - but you probably shouldn't be using this for targets more than half the speed of what the program is capable of.

On windows I have not tested this - I know that in the past windows has had issues with high resolution timers and sleeps so it will take some experimentation. You can build your own super high resolution (high CPU utilization) timer by making a complex calculation N times but it won't be hardware independent so the delay amount will vary on each platform and complicate this calculation.

hgm · Post by **hgm** » Sat Aug 31, 2013 2:38 pm

I agree that for the NIT feature the engine has to be involved. (But fortunately it can be trivially implemented, with only changes in the routine that reads the clock.) But normal time odds can be implemented entirely in the GUI; it can just multiply all times by a factor before sending them to the engine, and multiply by the inverse factor when receiving them.

My statement specifically addressed the possibility to hide such time odds (or implicit time odds cause by NIT) from the user by inserting delays. That should better be left to the GUI.

The argument that you can do things in the engine in order to not be dependent on a single GUI is not really convincing. You could say that for everything a GUI does. My Xiangqi engine could pop up its own window to display a Xiangqi board, so that it can run more easily under GUIs that do not support Xiangqi, etc. UCI engines could print the PV in SAN (possibly wrapped as info strings), to force SAN display on GUIs that would not by themselves convert the UCI long algebraic to SAN, etc. It is all true. But it is not what you would want.

Do not try to solve tasks that should be on the plate of GUI developers in the engine! (HGM's Golden Rule!

)

Don · Post by **Don** » Sat Aug 31, 2013 3:03 pm

hgm wrote:I agree that for the NIT feature the engine has to be involved. (But fortunately it can be trivially implemented, with only changes in the routine that reads the clock.) But normal time odds can be implemented entirely in the GUI; it can just multiply all times by a factor before sending them to the engine, and multiply by the inverse factor when receiving them.

My statement specifically addressed the possibility to hide such time odds (or implicit time odds cause by NIT) from the user by inserting delays. That should better be left to the GUI.

The argument that you can do things in the engine in order to not be dependent on a single GUI is not really convincing. You could say that for everything a GUI does. My Xiangqi engine could pop up its own window to display a Xiangqi board, so that it can run more easily under GUIs that do not support Xiangqi, etc. UCI engines could print the PV in SAN (possibly wrapped as info strings), to force SAN display on GUIs that would not by themselves convert the UCI long algebraic to SAN, etc. It is all true. But it is not what you would want.

Do not try to solve tasks that should be on the plate of GUI developers in the engine! (HGM's Golden Rule! )

The HGM golden rules is to relegate as much to the engine as possible - that it the xboard philosophy. You cannot have it both way! You have this rule when it's convenient and the other rule when bashing other protocols.

In this particular case I think the "xboard" philosophy of letting the engine handle it is correct - especially for a reference program. It's a kind of non-standard feature and in 20 years I don't want the "golden standard" reference chess program have features that are tied to 20 year old GUI's, at least not any more than absolutely necessary for basic game play. There is a chance that in 20 years xboard protocol and UCI protocol won't even be common.

Anyway, what choice would I have if I did not want to use xboard for this? I really don't have much confidence that I could convince GUI authors to support this handshaking protocol for this esoteric feature and it's so easy to simply implement it in an engine.

But it's really a non-issue. You CAN have it both ways. It's simple to implement it in an engine and an engine author can ALSO or INSTEAD OF implement the version that requires GUI support that you advocate. In this way an author can have a version guaranteed to work or he can choose the method that only work with xboard or he can do both.

Standard candles

Re: Not so fast

Re: Not so fast

Re: Not so fast

Re: Not so fast

Re: Not so fast

Re: Not so fast

Re: Not so fast

Re: Not so fast

Re: Not so fast

Re: Not so fast