hgm wrote: Sun Nov 14, 2021 11:45 am
> The problem is that it would pretty much require a complete rewrite of the WinBoard front-end to make it use wide characters. If the back-end is to remain using UTF-8 and normal characters (as would be required for XBoard), you would have to do back and forth conversions at any point where these interact.
Actually, this is exactly what I do in my recent projects: internally I store everything in UTF-8 and, through wrapped calls, do an on-the-fly conversion to a wide-char buffer on the stack when I want to open a file and so on. I also store paths in a canonical (normalized) form where I replace backslashes with forward slashes, but that's another problem.
Originally I was concerned about losing direct indexing: with wide chars you can index a character at a specific position directly (assuming UCS-2 or UTF-32; UTF-16 is a bit more complicated due to surrogate pairs, but the BMP fits nicely and I don't think anyone really needs code points outside it). With UTF-8 this can be solved as well, by iterating and/or caching where performance is critical.
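To illustrate the "iterating" part, here is a minimal sketch of finding the n-th code point in a UTF-8 string. The name utf8_at is made up for this example, and the code assumes the input is already valid UTF-8; it only relies on the fact that continuation bytes always look like 10xxxxxx.

```c
#include <stddef.h>

/* Hypothetical helper (name invented for this sketch).
 * Returns a pointer to the first byte of the n-th code point (0-based)
 * in a NUL-terminated UTF-8 string, or NULL if the string has fewer
 * than n + 1 code points. Assumes the input is valid UTF-8. */
static const char *utf8_at(const char *s, size_t n)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        /* Bytes of the form 10xxxxxx are continuation bytes;
         * anything else starts a new code point. */
        if ((*p & 0xC0) != 0x80) {
            if (n == 0)
                return (const char *)p;
            --n;
        }
        ++p;
    }
    return NULL;
}
```

This is O(n) per lookup, so where it matters one could cache the byte offset of the last position visited and continue from there, instead of rescanning from the start each time.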
Overall I'm happy with this decision (using UTF-8 for everything), as it simplified a lot of things for me. On Unix-based systems I don't have to do any conversions whatsoever.
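For reference, a rough sketch of what such a wrapped call could look like. The names fopen_utf8 and normalize_slashes are invented for this example, and the fixed MAX_PATH stack buffer is an assumption (longer paths simply fail here). On Windows it uses MultiByteToWideChar and _wfopen; elsewhere it falls straight through to fopen, with no conversion at all.

```c
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: bring a path into a canonical form in place
 * by turning every backslash into a forward slash. */
static void normalize_slashes(char *path)
{
    for (; *path; ++path)
        if (*path == '\\')
            *path = '/';
}

/* Hypothetical wrapper: open a file whose name is stored as UTF-8.
 * Returns NULL on failure, like fopen. */
static FILE *fopen_utf8(const char *path, const char *mode)
{
#ifdef _WIN32
    /* Wide-char buffers on the stack; MAX_PATH is an assumed limit. */
    wchar_t wpath[MAX_PATH];
    wchar_t wmode[16];

    /* Length -1 means the input is NUL-terminated and the output
     * includes the terminator. A return value of 0 signals failure
     * (invalid UTF-8 or buffer too small). */
    if (!MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                             path, -1, wpath, MAX_PATH))
        return NULL;
    if (!MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                             mode, -1, wmode, 16))
        return NULL;
    return _wfopen(wpath, wmode);
#else
    /* Unix-like systems take the UTF-8 bytes as they are. */
    return fopen(path, mode);
#endif
}
```

The rest of the program then only ever sees char strings, and the wide-char API stays confined to a handful of wrappers like this one.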
> Whether this should be UTF-8 or UTF-16, and whether this should be announced through a BOM, is really outside the scope of a standard for game notation: it is an OS property. It is unfortunate that different encodings still exist, but as long as they do, one can expect there will be file-conversion tools for these formats.
UTF-8 seems nicer, since it's backward-compatible with ASCII. So a proper PGN that sticks to the standard by using only ASCII characters would still load fine as UTF-8.
A BOM at the beginning of the file wouldn't hurt either, as comparing the first 3 bytes of a file should be easy enough. Even that isn't necessary, though, because it's easy to check whether a sequence of non-ASCII bytes is a valid UTF-8 sequence.
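Both checks are short enough to sketch. The function names below are made up for this example. The BOM test compares against the bytes EF BB BF, and the validator follows the usual UTF-8 rules: correct continuation bytes, no overlong forms, no surrogates, nothing above U+10FFFF. Pure ASCII input passes trivially.

```c
#include <stddef.h>

/* Hypothetical helper: does the buffer start with the UTF-8 BOM? */
static int has_utf8_bom(const unsigned char *buf, size_t len)
{
    return len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF;
}

/* Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8,
 * 0 otherwise. */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        unsigned long cp;   /* decoded code point */
        size_t need;        /* number of continuation bytes expected */
        size_t k;

        if (c < 0x80) { i++; continue; }                     /* ASCII */
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; }
        else return 0;              /* stray continuation or bad lead */

        if (len - i - 1 < need)
            return 0;               /* sequence cut off at end of data */

        for (k = 1; k <= need; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;           /* not a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

        /* Reject overlong encodings, surrogates and out-of-range values. */
        if ((need == 1 && cp < 0x80) ||
            (need == 2 && cp < 0x800) ||
            (need == 3 && cp < 0x10000))
            return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        if (cp > 0x10FFFF)
            return 0;

        i += need + 1;
    }
    return 1;
}
```

A loader could skip the BOM if has_utf8_bom reports one, and otherwise fall back to a legacy code page only when is_valid_utf8 fails. Text in an 8-bit encoding such as Latin-1 that contains accented letters will usually fail the check, though that is a heuristic and not a guarantee.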
> I have not looked into this lately, but the problem used to be that Windows API supported UTF-16, and not UTF-8. So to properly display the non-ascii characters in dialogs, or allow their entry in text edits there, you would have to use the wchar versions of the API calls.
Yes, absolutely. It's unfortunate that Microsoft decided to go down this path back then. I guess in text edits it might work out of the box (I mean the editing itself), though. I haven't done any WinAPI-based UI programming in ages, so I'm not sure.