Stockfish hangs

hgm · Post by **hgm** » Thu Jan 21, 2021 9:36 am

You will always have to test whether your engine really works. But it would not take any extra time to start with a version that works like this. Why waste time on an inferior design if you can start with the superior design? I never saw much logic in the strategy "before I can try a good design, I must test first whether a crappy design for doing the same thing would work too".

If you want reproducibility, you should clear the hash table before every move. That would certainly come at the expense of your time, and might hurt performance even if it did not. But reproducibility has a price, which in the debugging phase can certainly be worth paying.

BTW, your assumption that clearing the hash table after ucinewgame would not count against your time seems baseless. Even if your interpretation is correct that the isready after ucinewgame is mandatory (which is not the way I interpret it), so that undefined behavior could occur when a GUI skips it, the specs do not say the GUI could not have started the engine's clock before it received the readyok (or sent the isready). The specs do not even say that it is forbidden to send new commands before you receive the readyok reply. When I was still using isready at this point (so that most Shogi engines would hang), the isready, position and go command were just sent as one bunch.

Ras · Post by **Ras** » Thu Jan 21, 2021 11:05 am

hgm wrote: ↑Thu Jan 21, 2021 9:36 am

Why waste time on an inferior design if you can start with the superior design?

The superior design is properly resetting state because it reduces one case to another, already known one. That's a well-established design pattern.

I never saw much logic in the strategy "before I can try a good design, I must test first whether a crappy design for doing the same thing would work too".

Good. Then implement UCI properly, and you won't have the problems mentioned in this thread. Stockfish does run with all other GUIs, after all.

If you want reproducibility, you should clear the hash table before every move.

That would be even nicer, but also cost Elo. Properly resetting state for a new game doesn't, and it's exactly what ucinewgame/isready are specified for (see the spec excerpt I gave earlier).

the specs do not say the GUI could not have started the engine's clock before it received the readyok (or sent the isready).

Of course there are no limits to how bad and useless a UCI implementation can be.

When I was still using isready at this point (so that most Shogi engines would hang)

Maybe they should fix their broken engines. Btw., from the USI spec under http://hgm.nubati.net/usi.html:

As the engine's reaction to usinewgame can take some time the GUI should always send isready after usinewgame to wait for the engine to finish its operation.

the isready, position and go command were just sent as one bunch.

That's not how isready is to be used because the whole point is to determine when the engine is ready. Same for USI, see above.
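The handshake Ras describes can be sketched from the GUI side as follows. This is only an illustration of that reading of the spec, not code from any real GUI: `FakeEngine` is a hypothetical stand-in for an engine process, and `start_game` and its `clock` parameter are names made up for this example.

```python
import time


class FakeEngine:
    """Hypothetical stand-in for a UCI engine process (illustration only)."""

    def __init__(self):
        self.received = []   # every command the GUI has sent so far
        self._pending = []   # replies waiting to be read by the GUI

    def send(self, command):
        self.received.append(command)
        if command == "isready":
            # A real engine answers only after finishing earlier commands.
            self._pending.append("readyok")

    def read_line(self):
        return self._pending.pop(0)


def start_game(engine, clock=time.monotonic):
    """Reset the engine, wait for readyok, and only then start the clock."""
    engine.send("ucinewgame")
    engine.send("isready")
    while engine.read_line() != "readyok":
        pass                         # skip any other engine output
    clock_start = clock()            # game clock starts after readyok
    engine.send("position startpos")
    engine.send("go wtime 1000 btime 1000")
    return clock_start
```

Under hgm's reading, nothing in the spec forces the GUI to take the clock reading at that point; it could just as well be taken before `ucinewgame` is sent.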

hgm · Post by **hgm** » Thu Jan 21, 2021 5:23 pm

Ras wrote: ↑Thu Jan 21, 2021 11:05 am

Good. Then implement UCI properly, and you won't have the problems mentioned in this thread. Stockfish does run with all other GUIs, after all.

That is a premature conclusion. As other people mentioned, the hanging should not be due to this at all. Even if Stockfish would try to perform the initialization it does in response to ucinewgame asynchronously (and why would it? just to make it possible to crash it?), and would crash when it tries to do something in parallel due to a subsequent command, it could hardly be the case here, as more than 1.5 sec elapsed after the ucinewgame.

That would be even nicer, but also cost Elo. Properly resetting state for a new game doesn't, and it's exactly what ucinewgame/isready are specified for (see the spec excerpt I gave earlier).

Only because you cheat. It would cost Elo if you would have to do it in your own time. You just compensate for the Elo loss by allocating extra time for the match. You could also do that when you clear on every move. Just give the engine a larger increment. For people that run ultra-fast test games, only the amount of thinking per wall-clock minute matters. You don't save anything by letting the GUI clock run a smaller fraction of the time. That is just fooling yourself.

Of course there are no limits to how bad and useless a UCI implementation can be.

'Bad' and 'useless' are subjective qualifications. In this case they just reflect your own, objectively counter-productive opinion. I am of the opinion that encouraging completely needless waste of CPU time is a bad thing, globally detrimental, and that good implementations should therefore punish it.

Maybe they should fix their broken engines. Btw., from the USI spec under http://hgm.nubati.net/usi.html:
As the engine's reaction to usinewgame can take some time the GUI should always send isready after usinewgame to wait for the engine to finish its operation.

Problem is that they are using a GUI that only sends isready after usiok, and they are happy if the engine only runs on that. If you point out the problem, they simply shrug, and say "but these are not the specs my engine uses".

the isready, position and go command were just sent as one bunch.
That's not how isready is to be used because the whole point is to determine when the engine is ready. Same for USI, see above.

But I am not really interested in when the engine is ready. That is the engine's problem. It would become my problem if the engine would be vulnerable for crashing when it receives commands before the readyok. But it seems to me that this would pretty much require an intentionally malicious implementation of UCI. Any normal implementation would receive ucinewgame, do the desired initialization, print readyok, and only then start to read from the input again. Would there really be engines that launch a separate I/O thread during initialization just to detect if there is input they could use as an excuse to crash?
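A minimal sketch of the "normal implementation" described above, assuming a single-threaded engine that handles one command at a time. The function name `run_engine` and the dummy hash table are hypothetical, and the search is replaced by a placeholder that reports the table size and returns the UCI null move.

```python
def run_engine(input_lines):
    """Handle commands strictly one at a time, in arrival order."""
    output = []
    hash_table = {"stale": 1}        # pretend state left over from a previous game
    for line in input_lines:         # the next line is not read until this one is done
        command = line.strip()
        if command == "ucinewgame":
            hash_table.clear()       # initialization completes before anything else
        elif command == "isready":
            output.append("readyok")
        elif command.startswith("position"):
            pass                     # a real engine would set up the board here
        elif command.startswith("go"):
            output.append(f"info string hash entries {len(hash_table)}")
            output.append("bestmove 0000")   # placeholder instead of a real search
    return output
```

Because input is only read between commands, sending `isready`, `position` and `go` as one bunch cannot interrupt the reset; the commands simply wait in the pipe until the engine gets to them.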

Ras · Post by **Ras** » Thu Jan 21, 2021 11:51 pm

hgm wrote: ↑Thu Jan 21, 2021 5:23 pm

That is a premature conclusion. As other people mentioned, the hanging should not be due to this at all. Even if Stockfish would try to perform the initialization it does in response to ucinewgame asynchronously (and why would it? just to make it possible to crash it?), and would crash when it tries to do something in parallel due to a subsequent command, it could hardly be the case here, as more than 1.5 sec elapsed after the ucinewgame.

Yeah, that's why I suggested the other issue with the ponder toggle, and I'm not convinced that there is no race condition. Stockfish is designed for Elo, not protocol robustness.

Only because you cheat. It would cost Elo if you would have to do it in your own time. You just compensate for the Elo loss by allocating extra time for the match. You could also do that when you clear on every move. Just give the engine a larger increment. For people that run ultra-fast test games, only the amount of thinking per wall-clock minute matters. You don't save anything by letting the GUI clock run a smaller fraction of the time. That is just fooling yourself.

Nope. In CECP, engines may do this simply by setting reuse to 0 and do their stuff on every startup. In fact, WB2UCI, as you said, triggers this anyway. If that's not "cheating", my re-init isn't either. Just as initially setting up one's pieces on OTB games doesn't count against one's clock.

Of course there are no limits to how bad and useless a UCI implementation can be.

'Bad' and 'useless' are subjective qualifications. In this case they just reflect your own, objectively counter-productive opinion. I am of the opinion that encouraging completely needless waste of CPU time is a bad thing, globally detrimental, and that good implementations should therefore punish it.

Problem is that they are using a GUI that only sends isready after usiok, and they are happy if the engine only runs on that. If you point out the problem, they simply shrug, and say "but these are not the specs my engine uses".

usiok has to be answered at all times. It's broken enough if they don't implement that during search, but there is no reason why this would hang an engine outside of search.

But I am not really interested in when the engine is ready. That is the engine's problem. It would become my problem if the engine would be vulnerable for crashing when it receives commands before the readyok. But it seems to me that this would pretty much require an intentionally malicious implementation of UCI. Any normal implementation would receive ucinewgame, do the desired initialization, print readyok, and only then start to read from the input again. Would there really be engines that launch a separate I/O thread during initialization just to detect if there is input they could use as an excuse to crash?

Well yeah, but so should USI engines with usiok.

[Moderation] I accidentally erased this posting, but I tried to recover most of it from the quotations in the reply. Very sorry about that. HGM

hgm · Post by **hgm** » Fri Jan 22, 2021 1:11 pm

Ras wrote: ↑Thu Jan 21, 2021 5:23 pm

Yeah, that's why I suggested the other issue with the ponder toggle, and I'm not convinced that there is no race condition. Stockfish is designed for Elo, not protocol robustness.

Be that as it may. I cannot imagine any reasonable implementation that would suffer from sending as many Ponder toggles or confirmations as it wants. The Ponder option is not supposed to trigger any action; it is there only to be consulted during (or even only before) a timed search, for translating clock time to nominal thinking time. It would pretty much require dedicated code to distinguish the Ponder option from others, and add some code that would schedule malicious behavior on some future commands.

Nope. In CECP, engines may do this simply by setting reuse to 0 and do their stuff on every startup. In fact, WB2UCI, as you said, triggers this anyway. If that's not "cheating", my re-init isn't either. Just as initially setting up one's pieces on OTB games doesn't count against one's clock.

What CECP engines do has no impact on whether UCI engines cheat or not. It is a completely unrelated subject. Fact is that you waste (apparently significant) testing time for absolutely no benefit. Reproducibility only at the start of the game is useless. This careless and irresponsible behavior was encouraged by the (as yet unverified) belief that GUIs would not bill the use of that time to the engine.

usiok has to be answered at all times. It's broken enough if they don't implement that during search, but there is no reason why this would hang an engine outside of search.

I suppose you mean 'isready' here. And yes, that is what you think and what I think. Unfortunately it is not what many Japanese think. They think more along the line "don't send it, then you don't have to complain that it isn't answered".

Well yeah, but so should USI engines with usiok.

Apparently those engines have two separate command loops, one that only recognizes 'usi', 'setoption', and 'isready' to break out of it. Then they enter a second command loop, which only recognizes 'position' and 'go', and ignores (or exits on) anything else. That this is so widespread is probably because they all mindlessly copy each other's USI code, which is probably all derived from the demo engine LesserKai.
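The two-loop structure described above might look roughly like the sketch below. It is reconstructed only from that description and is not taken from LesserKai or any other engine; `two_loop_engine` is a hypothetical name, and the second loop is shown in its "ignores anything else" variant.

```python
def two_loop_engine(input_lines):
    """Mimic an engine with separate setup and game command loops."""
    lines = iter(input_lines)
    output = []

    # First loop: only 'usi', 'setoption' and 'isready' are recognized.
    for line in lines:
        command = line.strip()
        if command == "usi":
            output.append("usiok")
        elif command.startswith("setoption"):
            pass                         # option handling omitted
        elif command == "isready":
            output.append("readyok")
            break                        # 'isready' ends the setup phase

    # Second loop: only 'position' and 'go'; everything else is ignored.
    for line in lines:
        command = line.strip()
        if command.startswith("position"):
            pass                         # board setup omitted
        elif command.startswith("go"):
            output.append("bestmove resign")   # placeholder for a real search
    return output
```

With the input `usi`, `isready`, `usinewgame`, `isready`, `position startpos`, `go`, this sketch prints only one `readyok`. A GUI that blocks until the second `readyok` arrives would wait forever, which matches the hang described for these engines.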

Ras · Post by **Ras** » Fri Jan 22, 2021 8:40 pm

hgm wrote: ↑Fri Jan 22, 2021 1:11 pm

The Ponder option is not supposed to trigger any action; it is there only to be consulted during (or even only before) a timed search

That's what I was thinking, that Stockfish might start pondering right away. If you don't give any "position" command after startup, it appears to be set to the initial position, after all. However, you're right, I checked in the process monitor that just switching ponder on after startup doesn't cause CPU load, so no calculations happen.

What CECP engines do has no impact

It has because a fair match requires billing parity. If just setting reuse=0 gets you that time for free, it's also cheating, and allowed by the protocol.

Reproducibility only at the start of the game is useless.

It's not because at least I don't have to think about state spilling over between games. Saving CPU cycles is not the only objective for software. Reproducibility means having the same behaviour on different occasions, which makes testing and troubleshooting easier. In short, such software tends to have fewer bugs. Correctness is also a software objective. It's also why we have layered systems in computers instead of just lumping everything together.

This careless and irresponsible behavior was encouraged by the (as yet unverified) belief that GUIs would not bill the use of that time to the engine.

It's rooted in the UCI spec. I don't care about GUIs that don't implement UCI, only that they won't make my engine crash. However, it's a bit of a theoretical question with Winboard specifically because it speaks CECP with centiseconds resolution on its end. That's not suited for such short games, and with longer games, it doesn't matter.

I suppose you mean 'isready' here.

Oh, yeah, sorry.

And yes, that is what you think and what I think. Unfortunately it is not what many Japanese think. They think more along the line "don't send it, then you don't have to complain that it isn't answered".

I know someone who used to argue against catering to broken software because it would cause erosion of standards...

Apparently those engines have two separate command loops, one that only recognizes 'usi', 'setoption', and 'isready' to break out of it. Then they enter a second command loop, which only recognizes 'position' and 'go', and ignores (or exits on) anything else.

OMG, that's a new level of bad protocol implementation.

OTOH, if you want to deal with broken USI engines, you have the choice of either limiting UCI to broken USI, or to discern between them. Because for USI engines, you send usi and usinewgame anyway so that you know who's who here.

So I'd suggest to modify a test version of WB2UCI to implement full UCI (and no ponder toggle, just to be sure) regardless of USI just to find out whether that would fix the Stockfish problem.

hgm · Post by **hgm** » Fri Jan 22, 2021 11:12 pm

Ras wrote: ↑Fri Jan 22, 2021 8:40 pm

That's what I was thinking, that Stockfish might start pondering right away. If you don't give any "position" command after startup, it appears to be set to the initial position, after all. However, you're right, I checked in the process monitor that just switching ponder on after startup doesn't cause CPU load, so no calculations happen.

And so it should. In UCI ponder searches have to be explicitly started by the GUI, through "go ponder".

It has because a fair match requires billing parity. If just setting reuse=0 gets you that time for free, it's also cheating, and allowed by the protocol.

CECP reuse=0 is also bad practice, but my excuse for using it in some engines is that I am lazy, and that my engines practically start instantly. And your assumption that it would not be billed to the engine is a bit unfounded. Originally it certainly was: The only point where WinBoard waited for an engine was at startup (of WinBoard, not of the engine!), when it set the timeout to see if there would be features, and then waited for feature done. During a match (with reuse=0) it just forked off the engine process, sent the commands to start it, and set the clock running. No waiting of any kind. It would send 'ping' before the 'go', but it would not wait for the 'pong'. The purpose of the 'ping' was just to determine whether a move produced by the engine would be the result of the 'go', or from the previous game.
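As a rough illustration of that use of 'ping', the sketch below shows one way a GUI could classify incoming moves without ever blocking on the 'pong'. It assumes the CECP convention that the engine answers `ping N` with `pong N` only after it has dealt with all earlier commands; the function name `filter_stale_moves` is hypothetical and this is not WinBoard's actual code.

```python
def filter_stale_moves(engine_output, ping_id):
    """Return only the moves that arrive after 'pong <ping_id>'.

    Moves seen before the matching pong are taken to belong to the
    previous game and are dropped.
    """
    accepted = []
    synced = False
    for line in engine_output:
        if line == f"pong {ping_id}":
            synced = True                      # engine has caught up with the new game
        elif line.startswith("move ") and synced:
            accepted.append(line.split()[1])   # keep just the move text
    return accepted
```

The GUI keeps reading engine output as it arrives, so the clock keeps running the whole time; the pong is used purely as a marker in the output stream.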

It's not because at least I don't have to think about state spilling over between games. Saving CPU cycles is not the only objective for software. Reproducibility means having the same behaviour on different occasions, which makes testing and troubleshooting easier.

Yes, and you won't have it if you do not clear the hash before every move. So you are 'talking with a double tongue' here: full reproducibility is not worth the minuscule Elo loss it would cause, but it is suddenly very important to have reproducibility in only 1% of the cases...

It's rooted in the UCI spec. I don't care about GUIs that don't implement UCI, only that they won't make my engine crash. However, it's a bit of a theoretical question with Winboard specifically because it speaks CECP with centiseconds resolution on its end. That's not suited for such short games, and with longer games, it doesn't matter.

Centisecond resolution is fine, if you play classical TC or sudden death. As long as your engine can read a clock that is more precise. I expect 40 moves / 0.1 sec should not be much of a problem with CECP. And you could always make it 80 moves per 0.2 sec.

I know someone who used to argue against catering to broken software because it would cause erosion of standards...

But it becomes a bit questionable whether a standard that virtually no one complies with can still be considered a standard, or that it just makes the only person who insists it is delusional. The 'balance of power' is also a bit different in Shogi. There are more than a thousand Chess engines, and no one would care if a crappy mediocre one doesn't run (perhaps except its author). If there are only 5 public Shogi engines, and only one of them can run on your GUI... What use then is the GUI?

OMG, that's a new level of bad protocol implementation.

Indeed, it is disgusting. Of course the whole idea to fork a protocol that could have been perfectly used as it was is disgusting in itself.

OTOH, if you want to deal with broken USI engines, you have the choice of either limiting UCI to broken USI, or to discern between them. Because for USI engines, you send usi and usinewgame anyway so that you know who's who here.

So I'd suggest to modify a test version of WB2UCI to implement full UCI (and no ponder toggle, just to be sure) regardless of USI just to find out whether that would fix the Stockfish problem.

Indeed, there is plenty of protocol-dependence already, so I suppose this one could be added as well. Historically I did not care very much how UCI2WB handled UCI, as it was mainly intended for handling the dialects UCI / UCCI / UCI-Cyclone. Now that I want to package it as the main UCI adapter with WinBoard, instead of Polyglot, perhaps requires a more common approach. I should probably also drop the reuse=0; I don't think the current version would need that at all. (In UCI at least; it could be a problem in UCCI.) Of course UCI2WB could wait all it wanted for the 'isready', but WinBoard's clock would still be ticking, as it should.

Ras · Post by **Ras** » Fri Jan 22, 2021 11:49 pm

hgm wrote: ↑Fri Jan 22, 2021 11:12 pm

During a match (with reuse=0) it just forked off the engine process, sent the commands to start it, and set the clock running. No waiting of any kind.

That's interesting. Is it still like that in the days of engines loading EGTBs, NNUEs and whatnot?

Yes, and you won't have it if you do not clear the hash before every move. So you are 'talking with a double tongue' here: full reproducibility is not worth the minuscule Elo loss it would cause, but it is suddenly very important to have reproducibility in only 1% of the cases...

Between moves of the same game, the data from the previous turn are helpful for move sorting, so that brings Elo. Most notably in case of PV hits of course. It's not just black and white, it's a trade-off.

I expect 40 moves / 0.1 sec should not be much of a problem with CECP.

And how would the GUI give the time? Rounding up/down to the nearest 10ms and giving a time forfeit buffer?

But it becomes a bit questionable whether a standard that virtually no one complies with can still be considered a standard, or that it just makes the only person who insists it is delusional. The 'balance of power' is also a bit different in Shogi.

Good point for a bad situation.

I should probably also drop the reuse=0; I don't think the current version would need that at all. (In UCI at least; it could be a problem in UCCI.)

Yeah, UCI engines should not need that.

Of course UCI2WB could wait all it wanted for the 'isready', but WinBoard's clock would still be ticking, as it should.

Well, it's at least good to know that Winboard is doing its own thing so that possible tournament results at very short time controls don't mean much compared to other engine drivers.

hgm · Post by **hgm** » Sat Jan 23, 2021 10:54 am

Ras wrote: ↑Fri Jan 22, 2021 11:49 pm

That's interesting. Is it still like that in the days of engines loading EGTBs, NNUEs and whatnot?

For an NNUE engine I definitely would not recommend a reuse=0 design. For EGT it would mostly be a consequence of poor design, encouraged by detrimental GUI policies. It should be perfectly possible to initialize the EGT during game play, as the info is not required in the opening phase, and the procedure doesn't significantly load the CPU.

Between moves of the same game, the data from the previous turn are helpful for move sorting, so that brings Elo. Most notably in case of PV hits of course. It's not just black and white, it's a trade-off.

The direction in which you make that trade-off shows how little value you attach to having 100% reproducibility. Given that, making a big deal out of having about 1% of that reproducibility just doesn't sound very genuine. In practice you will never have bug-exposing blunders during that first move; they will always come later in the game, when unusual situations have had time to develop.

And how would the GUI give the time? Rounding up/down to the nearest 10ms and giving a time forfeit buffer?

Indeed, it would round, and this should not hurt the engine very much. It probably would already keep a buffer itself, and if you define enough moves in the session it would never reach the point where the buffer would be needed anyway. But it seems to me the example is already highly unrealistic. Are there really test setups that use sub-second games? The engine should always be prepared for being stalled by the OS for several msec, so that in TCs where you can get close to being flagged (as in classical TC at the end of a session, or in incremental TCs with relatively small base time) it would always have to apply a buffer of several msec, or it would regularly forfeit. It would be far better to play by node count, at such high speeds.
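The rounding and buffer arithmetic can be made concrete with a small sketch. It assumes the GUI reports remaining time through CECP's `time` command in centiseconds and rounds to the nearest 10 ms; the 5 ms safety buffer and both function names are hypothetical values chosen for illustration.

```python
def cecp_time_command(remaining_ms):
    """Format the remaining clock time for CECP's 'time' command (centiseconds)."""
    centiseconds = (remaining_ms + 5) // 10    # round to the nearest 10 ms, halves up
    return f"time {centiseconds}"


def engine_budget_ms(reported_cs, moves_left, buffer_ms=5):
    """Per-move budget after holding back a safety buffer against OS stalls."""
    usable_ms = max(reported_cs * 10 - buffer_ms, 0)
    return usable_ms / moves_left
```

For the 40 moves / 0.1 sec example, `cecp_time_command(100)` gives `time 10`, and `engine_budget_ms(10, 40)` leaves 2.375 ms per move. The rounding error of at most 5 ms only matters when the engine lets its clock run down close to zero.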

But it becomes a bit questionable whether a standard that virtually no one complies with can still be considered a standard, or that it just makes the only person who insists it is delusional. The 'balance of power' is also a bit different in Shogi.

Good point for a bad situation.

I should probably also drop the reuse=0; I don't think the current version would need that at all. (In UCI at least; it could be a problem in UCCI.)

Yeah, UCI engines should not need that.

Well, it's at least good to know that Winboard is doing its own thing so that possible tournament results at very short time controls don't mean much compared to other engine drivers.

But why would you want to compare them in the first place? (Apart from the fact that it makes no sense to use a GUI for sub-second games.) E.g. if your engine gets much lower Elo when testing on GUI A instead of GUI B, because it forfeits a large fraction of the games, would that bother you? Would you just shrug it off, and use GUI B because it suggests higher Elo? Would you keep using GUI A, and just discard all forfeits before the Elo calculation, to improve the Elo?

Ras · Post by **Ras** » Sat Jan 23, 2021 2:42 pm

hgm wrote: ↑Sat Jan 23, 2021 10:54 am

For EGT it would mostly be a consequence of poor design, encouraged by detrimental GUI policies.

The problem is that the time needed will suddenly depend on the I/O capabilities of the computer. When UCI was specified, we didn't have SSDs, but HDDs, so that was a lot more serious, depending on what and how much to load. That's why such up-front disk I/O was moved out of the way.

The direction in which you make that trade-off shows how little value you attach to having 100% reproducibility.

As I said, it's a trade-off. It's not about 100%, it's about reducing the scope while not sacrificing Elo.

In practice you will never have bug-exposing blunders during that first move; they will always come later in the game, when unusual situations have had time to develop.

What I dislike here is that I won't be able to reproduce the problem from that game alone if state spills over between games. Especially when using different key schemes, that will cause the hash table entries to be overwritten in different ways, and that may well have an effect upon move sorting. It should be equivalent in terms of Elo, but still different.

Are there really test setups that use sub-second games?

I can test at one second per game, depending on what I want to test. Obviously, I can't test things that require large search depths to kick in because such depths won't be reached at that time control, but e.g. move sorting or eval tuning work for seeing whether things are going in a good direction so that I should invest more time for testing at longer time controls.

it would always have to apply a buffer of several msec

That's more or less what I'm doing, but I also allocate a minimum thinking time depending on the game move number. That minimum is designed to work out if the time control is not faster than 1s/game. Anything lower than that will not work properly, but I state that in the engine spec.

It would be far better to play by node count, at such high speeds.

Not really, because that skews the metrics. If you have two identical engines, but one has only half the NPS, then you would get something like 50 Elo difference at time based controls, but none at node based ones.

E.g. if your engine gets much lower Elo when testing on GUI A instead of GUI B, because it forfeits a large fraction of the games, would that bother you?

If I knew that a GUI doesn't stick to the UCI spec as I read it, and that this is actually the underlying reason and not something else in my engine, I would shrug it off and say "don't use this GUI for such fast games".
