Hello everyone!
I've been on hiatus (so to speak) for a few months now, but over the past few weeks I finally got around to a related project: formalizing the UCI specification.
You can find a copy of the first draft at https://expositor.dev/uci-2022-12-29.pdf
Feedback would be enormously appreciated; I'm particularly interested in comments from engine devs and client devs (by client I mean a user interface or a utility that interfaces with engines, for example).
Formalizing the Universal Chess Interface
Moderator: Ras
-
- Posts: 60
- Joined: Sat Dec 11, 2021 5:03 am
- Full name: expositor
-
- Posts: 253
- Joined: Mon Aug 26, 2019 4:34 pm
- Location: Clearwater, Florida USA
- Full name: JoAnn Peeler
Re: Formalizing the Universal Chess Interface
Very nice and right on time I might say. I have to create a stripped-down client for my own purposes so this will be very useful. I noticed that a couple of options that the old spec mentioned are not included -- namely OwnBook, Clear Hash, Style and NalimovPath. Is there a reason for the omission?
-
- Posts: 253
- Joined: Mon Aug 26, 2019 4:34 pm
- Location: Clearwater, Florida USA
- Full name: JoAnn Peeler
Re: Formalizing the Universal Chess Interface
One more thing... wouldn't it remove some ambiguity if we made the names provided for options be case insensitive? And why not everything including tokens? It's not like this domain specific spec needs the richness of varied case to accomplish everything it needs to do. I guess I could be wrong and there is a good reason for the following options: Hash, hAsh, haSh, etc...
-
- Posts: 60
- Joined: Sat Dec 11, 2021 5:03 am
- Full name: expositor
Re: Formalizing the Universal Chess Interface
“I noticed that a couple of options that the old spec mentioned are not included -- namely OwnBook, Clear Hash, Style and NalimovPath. Is there a reason for the omission?”
Engine options have to be implementation-defined,* but Hash and Threads seemed sufficiently universal to warrant mention – although that should have been a comment (set in light grey), I think.
*At least, this is in line with the rest of the document: the meaning of most messages depends on the engine, and so, in the interest of supporting non-traditional engines, the specification is concerned with syntax, not semantics.
“Wouldn't it remove some ambiguity if we made the names provided for options be case insensitive?”
Case-insensitivity is very difficult to do properly in Unicode (that is, for languages other than English), so this was to simplify engine implementations – the client must use the option names that the engine provides verbatim. At least, that was the idea.
-
- Posts: 253
- Joined: Mon Aug 26, 2019 4:34 pm
- Location: Clearwater, Florida USA
- Full name: JoAnn Peeler
Re: Formalizing the Universal Chess Interface
Displaying options in the given case is one thing, but recognizing Hash, hAsh, or haSh as three different options doesn't seem purposeful. What is the issue with case insensitivity with non-English languages?
-
- Posts: 60
- Joined: Sat Dec 11, 2021 5:03 am
- Full name: expositor
Re: Formalizing the Universal Chess Interface
“Displaying options in the given case is one thing, but recognizing Hash, hAsh, or haSh as three different options doesn't seem purposeful.”
Well, it makes it a bit simpler for engines – it lets them rely on the fact that the client will use the same name, so they can simply do a byte comparison.
“What is the issue with case insensitivity with non-English languages?”
Well, it's not really possible to do without additional context. For example, what's the lowercase version of U+0049 LATIN CAPITAL LETTER I? If the word in question is an English word, then it's U+0069 LATIN SMALL LETTER I, but if the word in question is a Turkish word, it should be U+0131 LATIN SMALL LETTER DOTLESS I.
There are other questions, too – should engines be required to perform Unicode normalization? For example, what's the lowercase version of U+0041 LATIN CAPITAL LETTER A plus U+0301 COMBINING ACUTE ACCENT? Does that match U+00E1 LATIN SMALL LETTER A WITH ACUTE, or only U+0061 LATIN SMALL LETTER A plus U+0301 COMBINING ACUTE ACCENT?
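To make both problems concrete, here is a small sketch in Python (chosen only for illustration; it is not part of the draft). It relies on Python's `str.lower()` and `str.upper()`, which apply the locale-independent default Unicode case mapping, and on `unicodedata.normalize` from the standard library. The variable names are made up for the example.

```python
import unicodedata

# The default mapping always sends capital I to U+0069,
# which is the wrong answer for a Turkish word.
assert "I".lower() == "\u0069"

# Dotless i (U+0131) uppercases to plain I, so a round trip
# through upper() and lower() does not give back the original.
dotless = "\u0131"
round_trip = dotless.upper().lower()

# A decomposed string lowercases to a decomposed string, which does
# not compare equal to the precomposed form until it is normalized.
decomposed = "A\u0301".lower()   # a + combining acute accent
precomposed = "\u00e1"           # a with acute as one scalar value
equal_raw = decomposed == precomposed
equal_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
```

An engine that wanted "proper" case-insensitive matching would therefore need both a normalization step and some notion of language, which is a lot to ask of a UCI parser.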
One possible compromise is only allowing case insensitivity between U+0041 through U+005A and U+0061 through U+007A.
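A rough sketch of what that compromise could look like on the engine side, assuming option names arrive as UTF-8 bytes. The function names `ascii_fold` and `same_option` are hypothetical and do not come from the draft.

```python
def ascii_fold(name: bytes) -> bytes:
    """Lowercase only bytes 0x41-0x5A (A-Z); leave every other byte untouched."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in name)


def same_option(a: bytes, b: bytes) -> bool:
    """Compare two option names, ignoring case for ASCII letters only."""
    return ascii_fold(a) == ascii_fold(b)
```

Because every byte of a multi-byte UTF-8 sequence is 0x80 or above, the fold never touches non-ASCII characters: `Hash` and `hAsh` match, while names that differ only in the case of an accented letter stay distinct.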
-
- Posts: 60
- Joined: Sat Dec 11, 2021 5:03 am
- Full name: expositor
Re: Formalizing the Universal Chess Interface
(I forgot to thank you, by the way, for the comments! Just because I'm pushing back, that doesn't mean I'm ignoring them – they're causing me to reevaluate some decisions, and I'm explaining the reasoning I had at the time so that my reasoning can be validated or refuted.)
-
- Posts: 28353
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Formalizing the Universal Chess Interface
I don't think this is a correct description of UCI as it is in common use. The official specs do not stipulate that a client cannot send anything after sending 'stop' and before receiving 'bestmove'. Some existing clients in fact send stop + position + go ponder simultaneously. The only thing that is forbidden is sending other stuff to a searching engine before you send 'stop'. The same goes for 'isready' / 'readyok', except that there it is even allowed to send it to a searching engine. The 'isready' command exists purely for the convenience of the client, for probing whether the engine is done processing all preceding commands (where a 'go' command is considered processed after launching the search); the engine doesn't derive any rights from it (and in particular not the right to receive no input).
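As a rough illustration of the behaviour described above, here is a toy sketch in Python. It is not taken from any real engine: `ToyEngine` is a hypothetical, synchronous stand-in that answers 'stop' with a placeholder `bestmove 0000` immediately, whereas a real engine would search on a separate thread.

```python
class ToyEngine:
    """Hypothetical, synchronous stand-in for an engine's command handling."""

    def __init__(self):
        self.searching = False
        self.position = None
        self.out = []  # lines the engine would write to stdout

    def handle(self, line: str) -> None:
        tokens = line.split()
        if not tokens:
            return
        cmd = tokens[0]
        if cmd == "isready":
            # Answered in any state, including during a search.
            self.out.append("readyok")
        elif cmd == "stop":
            if self.searching:
                self.searching = False
                self.out.append("bestmove 0000")  # placeholder move
        elif self.searching:
            # Anything else sent before 'stop' is the forbidden case.
            self.out.append("info string ignored during search: " + cmd)
        elif cmd == "position":
            self.position = tokens[1:]
        elif cmd == "go":
            self.searching = True


engine = ToyEngine()
for line in ["position startpos", "go infinite", "isready",
             "stop", "position startpos moves e2e4", "go ponder"]:
    engine.handle(line)
```

The last three commands are the stop + position + go ponder burst: the engine processes them in order, so nothing needs to wait for 'bestmove' on the client side.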
Your state diagram also ignores that 'go' is illegal without a preceding 'position'. It would be better to split the 'idle' state into 'idle' and 'loaded', where 'position' would cause the transition between the two.
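A minimal sketch of the split suggested here, written as a transition table in Python. Only 'idle' and 'loaded' come from the suggestion; the 'searching' state name, and the choice to return to 'loaded' after 'stop', are assumptions made for the example.

```python
# (current state, command) -> next state
TRANSITIONS = {
    ("idle", "position"): "loaded",
    ("loaded", "position"): "loaded",
    ("loaded", "go"): "searching",
    ("searching", "stop"): "loaded",  # assumed: the position stays set
}


def step(state: str, command: str) -> str:
    """Return the next state, or raise if the command is illegal here."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"'{command}' is not allowed in state '{state}'")
```

With this table, `step("idle", "go")` raises, which captures the rule that 'go' needs a preceding 'position'.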
-
- Posts: 300
- Joined: Mon Apr 30, 2018 11:51 pm
Re: Formalizing the Universal Chess Interface
“The sequence of bytes that the client sends to the engine (via the
engine’s stdin) must be valid UTF-8, and the Unicode scalar values that
the client sends must be U+000A LINE FEED, U+000D CARRIAGE RETURN,
or in the range U+0020 to U+007E inclusive (in other words, printable
ASCII characters and space). When the scalar value U+000D appears it
must be immediately followed by U+000A.”
Why do you specify UTF-8 and not ASCII, when you only ask for a subset of ASCII?
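To illustrate the point, here is a sketch of a validator for the quoted rule, in Python. The function name `valid_client_bytes` is hypothetical. Since every permitted scalar value is below U+0080, the check can work on raw bytes and never needs a UTF-8 decoder.

```python
def valid_client_bytes(data: bytes) -> bool:
    """Check a client byte stream against the quoted rule."""
    for i, b in enumerate(data):
        if b == 0x0D:
            # CARRIAGE RETURN must be immediately followed by LINE FEED.
            if i + 1 >= len(data) or data[i + 1] != 0x0A:
                return False
        elif b != 0x0A and not (0x20 <= b <= 0x7E):
            # Neither LINE FEED nor space/printable ASCII.
            return False
    return True
```

Any stream that passes is byte-for-byte identical whether it is labelled ASCII or UTF-8, for example `"uci\n".encode("ascii") == "uci\n".encode("utf-8")`.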
-
- Posts: 28353
- Joined: Fri Mar 10, 2006 10:06 am
- Location: Amsterdam
- Full name: H G Muller
Re: Formalizing the Universal Chess Interface
I guess there is also ASCII encoded as UTF-16 (wide char).
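A quick way to see the difference, sketched in Python with the standard codecs (the variable names are just for the example): the same ASCII-only command has identical bytes in ASCII and UTF-8, but not in UTF-16.

```python
command = "uci\n"

as_ascii = command.encode("ascii")
as_utf8 = command.encode("utf-8")
as_utf16 = command.encode("utf-16-le")  # little-endian, no byte order mark

same_as_utf8 = as_ascii == as_utf8
same_as_utf16 = as_ascii == as_utf16
```

In UTF-16 every character takes two bytes, so an engine reading bytes from stdin would see a zero byte after each letter.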