Asymmetric Node/move in WinBoard for UCI engines

Post by **bob** » Thu Oct 01, 2009 6:56 am

hgm wrote:
bob wrote:Some programs vary in speed significantly from opening to middlegame to endgame. I have seen a factor of 2-3X quite often, with some being even larger. Ferret was 4x-5x faster in endgames, as an example.
Then just don't pick engines like that as opponents, and make sure your own way of counting nodes gives a reasonable impression of the time your own engine uses.

This is just a tool, and in the hands of people who know how to wield it, it can be a powerful tool. But even the most powerful tool, placed in the hands of the stupid or clumsy, will not make them anything but stupid and clumsy. In fact, even more so. That a tool can cause disasters when used improperly never implies that the tool is bad.

How do I do that? My nps varies by a factor of 2-3x from opening to endgame. Which NPS do I pick? And how do I prevent the distortion I just mentioned? Try to pick opponents that have exactly the same NPS changes over the game? Not so easy.

Post by **bob** » Thu Oct 01, 2009 7:02 am

michiguel wrote:
bob wrote:
hgm wrote:Actually, you can use it for serious testing, and that is exactly what François will be using it for. Of course you cannot use it to measure the relative strength of completely different engines A and B, for the reasons you give. But when you develop an engine, that is immaterial. What you want to test is how versions A and A' of your engine perform relative to each other when you play them against B (and C and D). And when A and A' count nodes in the same way and have the same nps, the results of such gauntlets are directly comparable.

And even if A and A' have different nps, because (say) you added a very expensive evaluation term, you can either test them at the same node budget and record the time they take, so you can correct the results with an empirical rating-vs-time formula, or you can restrict the node budgets so that A and A' effectively use the same time.

The latter strategy can even be used when you play A against B. It does not matter much how exactly the engines report nodes. You just adapt their node budgets until they use equal time, which is easily done by benchmarking their real nps. Note that when you specify a node-based TC in WinBoard, the TC dialog lets you specify the nps conversion factors for each engine independently.
It is not so immaterial, as I have explained previously. And, for the Nth time, here is why this is not the best form of testing.

Some programs vary in speed significantly from opening to middlegame to endgame. I have seen a factor of 2-3X quite often, with some being even larger. Ferret was 4x-5x faster in endgames, as an example.

If you do fixed-node searches, you have to somehow come up with a value that represents a "fair" approximation to equal time for the two opponents. Doable.
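The "doable" step can be sketched as follows: benchmark each engine's real speed once, then give each engine the number of nodes it would search in the intended time per move. This produces different (asymmetric) node budgets per engine, which is what per-engine nps factors allow. The engine names and benchmark figures below are hypothetical.

```python
def node_budget(seconds_per_move: float, benchmark_nps: float) -> int:
    """Nodes an engine would search in the given time at its benchmarked speed."""
    return round(seconds_per_move * benchmark_nps)


# Hypothetical benchmark results in nodes per second, not real measurements.
BENCHMARK_NPS = {"EngineA": 1_000_000, "EngineB": 400_000}
SECONDS_PER_MOVE = 10.0

budgets = {name: node_budget(SECONDS_PER_MOVE, nps) for name, nps in BENCHMARK_NPS.items()}
print(budgets)  # {'EngineA': 10000000, 'EngineB': 4000000}
```

A single benchmark figure per engine only approximates equal time if each engine's speed stays roughly constant through the game, which is the assumption questioned in the rest of this post.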

But then comes the rest of the story. Suppose you play against an opponent who speeds up by 5x, while you show little or no speed improvement. In an endgame, he will be 5x faster than you. In the opening, you will be pretty equal. Irrelevant, you say?

Hardly. What happens if the changes you make, although small, improve your king safety enough that you start surviving the middlegame more frequently? Say you were losing 75% of middlegame positions and almost all endgame positions because of the speed advantage your opponent has. Now you have learned to avoid the busted king positions, and 1/2 of your games reach the endgame, where before only about 25% did. So you lost 25% of total games (100% of endgames, because you get badly out-searched), and you were losing 3 of 4 of the other 75%. Now you no longer lose 3 of 4 in the middlegame; you are losing only 1/2 as many. But those games now become endgame positions where you lose 'em all. And you conclude you are doing _worse_ and throw the change away.

I see your point, but your numbers are misleading. If the games you used to lose in the middlegame become endgames, now you will lose them because your endgame is weak. OK, but the games you won in the middlegame, you still win. All that happens in your example is that you pick a different poison: a potentially beneficial change now looks neutral. This already happens in many situations.
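The disputed arithmetic can be checked with a toy model. It assumes, following Bob's example, that every game reaching the endgame is lost and that draws can be ignored, and, following Miguel's reading, that the extra endgames come only from games previously lost in the middlegame. The 400-game totals are hypothetical, chosen so that Bob's percentages give whole numbers.

```python
def results(games: int, reach_endgame: int, middlegame_wins: int) -> dict:
    # Toy assumption from Bob's scenario: every game that reaches the endgame is lost.
    endgame_losses = reach_endgame
    # Games decided in the middlegame are either won or lost (draws ignored).
    middlegame_losses = games - reach_endgame - middlegame_wins
    return {"wins": middlegame_wins, "losses": endgame_losses + middlegame_losses}


GAMES = 400
# Before the change: 25% reach the endgame, and 3 of 4 middlegame games are lost.
before = results(GAMES, reach_endgame=100, middlegame_wins=75)
# After the change: 50% reach the endgame. Assuming the extra endgames come only
# from former middlegame losses, the 75 middlegame wins are kept.
after = results(GAMES, reach_endgame=200, middlegame_wins=75)
print(before, after)  # {'wins': 75, 'losses': 325} {'wins': 75, 'losses': 325}
```

Under these assumptions the total is unchanged, so the change looks neutral rather than harmful; in this model the score can only drop if some former middlegame wins also turn into endgame losses.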

The traditional way of testing suffers from the same problem if you pick engines with the same behavior (e.g. engines that play endgames better than middlegames, or vice versa). Picking few sparring partners increases the chance of suffering from this hypothetical problem.

Miguel

This can happen in other ways as well, depending on who speeds up where in the game. You might win more endgames because you are faster. But if all your evaluation change does is push the game into a phase where you are either faster or slower, the results will have little to do with the actual changes you made, and the conclusions you reach are wrong.

If the programs do not change their speed significantly, I would agree this will work. But even Crafty varies by a factor of 3x or a little more, as there is some slow code I execute in the opening that I do not execute in the middlegame/endgame, because by then it has castled. If you play real games using a time limit and you tune using something else, you really are inviting trouble, and you are going to make mistakes.
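The phase-dependence objection can be made concrete with a small calculation. Under a clock, each engine searches roughly its speed times the time per move, so the node ratio between the engines follows their speeds in each game phase; a fixed node budget calibrated once freezes that ratio. The speeds below are hypothetical, chosen to match the 5x endgame speed-up in Bob's example.

```python
SECONDS_PER_MOVE = 10.0

# Hypothetical speeds in nodes per second, by game phase (not measurements).
NPS = {
    "opening": {"me": 1_000_000, "opponent": 1_000_000},
    "endgame": {"me": 1_000_000, "opponent": 5_000_000},
}

# One node budget, calibrated on opening speed and used by both engines all game.
BUDGET = round(SECONDS_PER_MOVE * NPS["opening"]["me"])


def compare(phase: str) -> tuple[float, float]:
    speeds = NPS[phase]
    # Under a real clock: opponent's nodes per move divided by mine.
    clock_ratio = speeds["opponent"] / speeds["me"]
    # Under the fixed budget both sides search BUDGET nodes; this is the
    # wall-clock time the opponent actually spends per move.
    opponent_seconds = BUDGET / speeds["opponent"]
    return clock_ratio, opponent_seconds


for phase in NPS:
    print(phase, compare(phase))
# opening (1.0, 10.0)
# endgame (5.0, 2.0)
```

In the endgame the opponent would out-search its rival 5:1 in a timed game, but under the node budget both sides get equal nodes and the opponent finishes each move in 2 of its 10 seconds, so the test no longer reproduces timed conditions in that phase.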

Trying for repeatability is futile also, since we now know that varying the search space by only one node per move leads to different games anyway. Repeatability implies accuracy, but it is a mirage.

The point is this: does your evaluation change help or hurt? If your change extends games into the endgame, those games suggest the change hurts. Did it really hurt? Or did it just push you into a part of the game where the speed difference is the issue? If you were searching at the proper time limit, would you win or lose?

Too much variability. I'd rather "dance with the one that brung me." I have to play real games based on time. I want to make sure my testing is indicating expected results in actual game conditions, since that is how I am forced to play real games.

I believe that this approach, while it does have some merit, will introduce another level of inaccuracy because most are not going to think about these kinds of distortions, and will take their testing as gospel, when it will be anything but...

Post by **bob** » Thu Oct 01, 2009 7:06 am

michiguel wrote:
bob wrote:
michiguel wrote:
Daniel Mehrmann wrote:As expected, you are trying to race a mouse, an elephant and a tiger, just for example.

Your idea doesn't work at all, because every programmer handles search differently. It already starts with "how to enter a node", or rather "how to count nodes".

There is no model that covers all possibilities. Furthermore, results might be handled differently, and of course a search that isn't finished (we might not get through all stages) gives unusable results.

You can't define it in a protocol either, because you'll never know every possible idea and implementation of each programmer.

However, it's funny, yes! But you can't use it for serious testing. It's more of a running gag for the users out there.

Best,
Daniel

ps: Every reliable engine developer will tell you the same.
I cannot disagree more! In fact, I think it would be fantastic to have as many engines as possible support this feature. This is a tool, so it depends on how you use it. For instance, with this feature you can run matches of your debug version as if it were the release version (using a factor to compensate for the difference in speed). Moreover, the games become deterministic (if I understand correctly), so it will be much easier to reproduce the situation where a rare bug happened. That is what I call serious testing. You may be thinking about testing the relative strengths of different engines, but that is not what this feature is about.

Miguel
It is fine for testing to expose and fix errors. It is not so fine to measure whether a change is better or worse as I pointed out in another post in this thread.
Fixing errors is not a small thing...

Miguel

No. But that represents less than 1% of what I am doing. When I am doing that kind of debugging, I often use fixed-node testing. But _only_ when I am debugging. Not when I am evaluating changes.

Post by **Daniel Mehrmann** » Thu Oct 01, 2009 8:55 am

michiguel wrote:
Daniel Mehrmann wrote:Well, I think Bob has already made a lot of arguments against it, so there is no need to write more. It doesn't look like you'll accept any point of view.

And how about the pros?

For professional testing (search/eval)? None! It's just gambling.

It might be interesting for bugfixing, as Bob already pointed out. But for node-related bugfixing it's much better to dump the search tree and analyze it! That's how I handle these kinds of problems.

Have you ever tried to dump your search tree and use your own tool for graphical analysis?

Best,
Daniel

Post by **Daniel Mehrmann** » Thu Oct 01, 2009 9:06 am

hgm wrote:
I don't have to accept any point of view. I only supply opportunities.

And of course the most important thing: this feature is useful to _me_,

I expected these views from your side!

It's good for your engine, okay. But the protocol extensions you started adding are not just for _YOU_; you are adding things for other programmers!!

You can't say that what's good for _me_ is good for the rest of the world.
You just added things which are interesting for _YOU_.

Not everything you added to the WB protocol is bad, of course. And why not add this NPS stuff? But I'm missing a general view of things from your side. You only see your own problems and ideas, and you ignore that you are not alone in the chess programming world.

You should take some time and think about it.
I don't want to blame you, and your work on X-/WinBoard isn't bad at all, but I just want to show you that you missed something.

Best,
Daniel

Post by **hgm** » Thu Oct 01, 2009 10:54 am

bob wrote:How do I do that? My nps varies by a factor of 2-3x from opening to endgame. Which NPS do I pick? And how do I prevent the distortion I just mentioned? Try to pick opponents that have exactly the same NPS changes over the game? Not so easy.

If I were you, I would try to fix the method of node counting in my own engine so that it would no longer have this variation. There must be a reason for this variation. You mention a very expensive evaluation term, used before castling. Well, if 85% of your time is spent in evaluation, count evaluations, and count those that contain that term double. If tablebase probes take 5 times as long as evaluations, add the probes to the count and count each one as 5.
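The weighted counting suggested here can be sketched as a counter that adds a cost per unit of work rather than one per tree node. The class name and weights are hypothetical, taken from the examples in this post (an evaluation with the expensive pre-castling term counts double, a tablebase probe counts five); a real engine would calibrate them by profiling.

```python
class WeightedNodeCounter:
    """Counts units of work instead of raw tree nodes (hypothetical weights)."""

    EVAL = 1              # ordinary evaluation
    EVAL_PRECASTLING = 2  # evaluation that includes the expensive pre-castling term
    TABLEBASE_PROBE = 5   # probe assumed to cost about five evaluations

    def __init__(self) -> None:
        self.count = 0

    def on_evaluate(self, uses_precastling_term: bool) -> None:
        self.count += self.EVAL_PRECASTLING if uses_precastling_term else self.EVAL

    def on_tablebase_probe(self) -> None:
        self.count += self.TABLEBASE_PROBE


counter = WeightedNodeCounter()
for _ in range(3):
    counter.on_evaluate(uses_precastling_term=False)
counter.on_evaluate(uses_precastling_term=True)
counter.on_tablebase_probe()
print(counter.count)  # 10
```

If the weights track real cost, this count grows at a roughly constant rate per second in every game phase, which is the property node-based time control relies on.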

WB protocol does not specify exactly what the node count means, and it would be impossible to do so anyway, as that depends on implementation details. So there is no need to report a true count of nodes in the tree. Micro-Max counts IID iterations instead, because it does move generation anew for every iteration and spends most of its time on that. And as it does IID in every node, each internal node is counted as many times as it has iterations. Who cares? As long as you report the same count in the Thinking Output as the one you base your timing decisions on, node-based play under WinBoard will work.
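The one requirement stated here, that the count shown in the Thinking Output is the same count the engine bases its timing decisions on, can be sketched as a virtual clock driven by a single counter. This assumes WinBoard's node-based mode, in which a given node rate converts the engine's count into virtual seconds; the class and function names are hypothetical. The output line follows WB protocol's `ply score time nodes pv` thinking-output layout, with time in centiseconds.

```python
class NodeClock:
    """Virtual clock: elapsed time is derived from the engine's own counter."""

    def __init__(self, node_rate: int) -> None:
        self.node_rate = node_rate  # counted units per virtual second
        self.nodes = 0              # whatever the engine counts (nodes, evals, IID iterations)

    def elapsed_centiseconds(self) -> int:
        return self.nodes * 100 // self.node_rate

    def out_of_time(self, budget_centiseconds: int) -> bool:
        return self.elapsed_centiseconds() >= budget_centiseconds


def thinking_line(depth: int, score: int, clock: NodeClock, pv: str) -> str:
    # Report exactly the counter that out_of_time() uses for timing decisions.
    return f"{depth} {score} {clock.elapsed_centiseconds()} {clock.nodes} {pv}"


clock = NodeClock(node_rate=100_000)
clock.nodes = 250_000  # pretend the search has counted this much so far
print(thinking_line(7, 31, clock, "e2e4 e7e5"))  # 7 31 250 250000 e2e4 e7e5
print(clock.out_of_time(200))                    # True
```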

And if you cannot or do not want to do that, then this method is simply not for you, because your engine has too variable a node rate. But that doesn't mean it is not worth having, as others have engines that do not have this problem, or use the facility for purposes where they do not care about this problem. Unlike Crafty, WinBoard is not a "Bob-only" project...

Post by **hgm** » Thu Oct 01, 2009 11:05 am

Daniel Mehrmann wrote:Have you ever tried to dump your search tree and use your own tool for graphical analysis?

I do equivalent things, but before I can do them, I need an error that occurs reproducibly. I learn nothing from a tree dump where no error occurs. And it is not feasible to save tree dumps of every search in a few thousand games just to be able to analyze why my engine crashed in that last game.

So the first step in debugging is always to find a position where a reproducible error occurs. Otherwise you cannot even start debugging, no matter what fancy tools you have developed for it. And node-based time controls are one convenient way to achieve that, without any need to clear the hash table before every move. They make the games _exactly_ reproducible (my engines randomize, but of course I save the random seed for every game in their log file), and if my engine crashes in move 89 of game 998, I just have to replay that game (i.e. using that random seed) and switch on the tree dump at move 89.
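The replay workflow can be sketched with a stand-in for an engine game in which the only sources of variation are a seeded random generator and a per-move node limit, so no wall-clock time enters the result and a logged seed is enough to reproduce the game. `play_game`, its toy "search" and the seeding scheme are hypothetical; a real engine would also have to avoid any other source of nondeterminism.

```python
import random


def play_game(seed: int, node_budget: int, moves: int = 10) -> list[int]:
    """Toy engine game: randomness comes only from the seeded RNG, and each
    move's 'search' stops after exactly node_budget nodes (no clock involved)."""
    rng = random.Random(seed)
    record = []
    for _move in range(moves):
        best = -1
        for _node in range(node_budget):               # one iteration per counted node
            best = max(best, rng.randrange(1000))      # stand-in for scoring a node
        record.append(best)
    return record


# Log the seed of every game; a crash in game 998 can then be replayed exactly.
seeds = {game_no: 1000 + game_no for game_no in range(1, 1001)}  # hypothetical seeding
first_run = play_game(seeds[998], node_budget=50)
replay = play_game(seeds[998], node_budget=50)
print(first_run == replay)  # True
```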

Post by **Daniel Mehrmann** » Thu Oct 01, 2009 11:21 am

hgm wrote:
Daniel Mehrmann wrote:Have you ever tried to dump your search tree and use your own tool for graphical analysis?
I do equivalent things, but before I can do them, I need to have an error that occurs reproducibly... (rest deleted)

Do you really think I didn't know all that? I didn't say how I use a dump or at exactly which stages I do it. So please don't tell me how you do it or in which cases. If I had a problem, I would ask.

Best,
Daniel

ps: It seems that all your postings go this way, and I guess you just want to help. Thanks for that, but I don't need it.

Post by **Daniel Mehrmann** » Thu Oct 01, 2009 11:28 am

hgm wrote: If I were you, I would try to fix the method of node counting in my own engine so that it would no longer have this variation.

You're kidding, aren't you?

Every engine has this NPS variation across the stages of a chess game. That's simply a fact, based on material and moves.

So your suggestion is that programmers should start hiding the real NPS like Rybka does? Also, Rybka's NPS changes in tiny steps during a game.

Best,
Daniel

Post by **hgm** » Thu Oct 01, 2009 11:33 am

Daniel Mehrmann wrote:I expected these views from your side!

It's good for your engine, okay. But the protocol extensions you started adding are not just for _YOU_; you are adding things for other programmers!!

You can't say that what's good for _me_ is good for the rest of the world.
You just added things which are interesting for _YOU_.

Not everything you added to the WB protocol is bad, of course. And why not add this NPS stuff? But I'm missing a general view of things from your side. You only see your own problems and ideas, and you ignore that you are not alone in the chess programming world.

You should take some time and think about it.
I don't want to blame you, and your work on X-/WinBoard isn't bad at all, but I just want to show you that you missed something.

As I am not paid to work on WinBoard, I do d*mned well as I please. Most of the time, what I want is pretty useful in general. Playing by nodes in particular has been requested many times by different people in this forum. And, like you say, no one is _forced_ to use it, or even implement it. But there are enough people who have an application for it to make it a worthwhile addition. Note that UCI supports a command to have engines play by a number of nodes, and that this alone is reason enough to support a mechanism in WinBoard protocol for specifying this too. Otherwise no upward-compatible translation of UCI into WB protocol would be possible, and UCI engines would lose part of their functionality when running through an adapter.
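The translation argument can be illustrated for a UCI engine running under WinBoard through an adapter. This sketch assumes WB protocol's `nps` command, which tells the engine to treat that many nodes as one second, combined with a fixed time per move set by `st`; under that assumption the adapter can forward the search to the UCI engine as `go nodes` with the product. The function is hypothetical and is not meant to describe how any particular adapter (such as Polyglot) actually works.

```python
from typing import Optional


def translate_to_uci_go(wb_commands: list[str]) -> Optional[str]:
    """Hypothetical adapter step: turn WB 'nps' + 'st' into a UCI 'go nodes' command."""
    node_rate = None         # nodes that count as one second for this engine
    seconds_per_move = None
    for line in wb_commands:
        cmd, _, arg = line.partition(" ")
        if cmd == "nps":
            node_rate = int(arg)
        elif cmd == "st":
            seconds_per_move = int(arg)
    if node_rate is None or seconds_per_move is None:
        return None  # not a node-based, fixed-time-per-move setup; handled elsewhere
    return f"go nodes {node_rate * seconds_per_move}"


# WinBoard can give each engine its own node rate (the asymmetric case).
print(translate_to_uci_go(["nps 50000", "st 10", "go"]))   # go nodes 500000
print(translate_to_uci_go(["nps 200000", "st 10", "go"]))  # go nodes 2000000
```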

A general view on my side for developing WB protocol is missing, because I don't really have any desire to develop WB protocol _at all_. You could say the general view is "if it ain't broken, don't fix it!". One should clearly distinguish between my development work on WinBoard/XBoard and development of WB protocol. For a long time I refrained from making any alterations to the protocol at all.

But there were simply some things that could not be done with WB protocol at all, very common things that no serious engine wants to do without these days: setting the hash table size and the number of SMP cores, and learning the installation path of the tablebases. IMO the fact that WB protocol was not able to convey this information to an engine meant that it was _badly_ broken. These omissions had to be repaired, and so I did. So you could say that the overall philosophy / policy is:

1) Change as little as possible; developing protocol is not a goal in itself but a means to an end, and any extension should be demand-driven.
2) Maintain backward compatibility.
3) Make sure UCI can be translated to WB protocol without limitation.

Note that a corollary of (3) is that WinBoard engines will be able to do anything a UCI engine can do, although this is not an aim in itself.
