WinBoard translations

wgarvin · Post by **wgarvin** » Mon Aug 23, 2010 11:50 pm

I strongly suggest you store the translated strings for each language in a UTF-8 or UCS-2 text file, one file per language. All the Windows API functions that deal with strings have wide-char versions, which will work correctly with any language as long as the user has the appropriate fonts/language packs installed. Using Unicode will make it much easier to switch between languages, etc.

If Winboard uses a lot of char*, then passing around UTF-8 internally might be easier than changing to wide chars... but then you'd have to wrap the API functions to convert the strings. Using wchar_t internally is easier.

hgm · Post by **hgm** » Tue Aug 24, 2010 10:00 am

Well, you obviously know more about this than I do, as I had never heard of UTF-8 or UCS-2. Is this the difference between variable-length encoding or fixed-length encoding?

The sources of the static Chinese translation apparently use variable-length encoding: in my NotePad, which does not understand this format, each Chinese character shows up as two garbage characters (codes >= 128) amongst the normal ASCII text. Apparently Chinese Windows systems have some low-level mechanism to interpret this format, join the garbage codes and display them as Chinese. This works even for a hello-world program printing such strings, when I send them the .exe, or for menu strings set at run time from such strings. It does not work when the strings were in an .rc file that I compiled, but it does when they compile it. (And then it even works on my Windows system.)

Indeed WinBoard does pass a lot of (char*) around.

What I do not understand yet is how the font and code page specified in the resource file affect matters, and if there are ways to overrule that at run time. I noticed that the static Chinese 4.2.7 translation uses other fonts (whose names I cannot even read), and also another code page. I use the standard routines SetDlgItemText() and ModifyMenu() to alter the menu texts in the dynamic translation. On my system, this happily sets the menu texts to the same garbage whether the normal font and code page or the Chinese font and code page are specified in the .rc file. But I have no idea what a Chinese compilation of it would do, or what my compilation would print on a Chinese system.

What I use in the dynamic translation is a normal text-format file, which could contain such variable-length encodings which my system is not aware of. I use plain fgetc() to read that file byte by byte, and pass the bytes as normal char strings to the Windows API routines, not as wide chars.
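A minimal sketch of the byte-by-byte reading described above, to show that bytes >= 128 pass through a plain char buffer unchanged. The function name `read_line` and the one-string-per-line layout are assumptions for illustration, not WinBoard's actual reader or file format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Read one line (without the trailing newline) into buf, byte by byte.
   Returns the number of bytes stored, or -1 at end of file when nothing
   was read. Hypothetical helper, not taken from the WinBoard sources. */
int read_line(FILE *f, char *buf, size_t size)
{
    size_t n = 0;
    int c; /* int, not char: EOF must stay distinguishable from byte 0xFF */

    assert(f != NULL && buf != NULL && size > 0);
    while ((c = fgetc(f)) != EOF && c != '\n') {
        if (n + 1 < size)
            buf[n++] = (char)c; /* bytes >= 128 are stored as-is */
        /* bytes that do not fit are silently dropped */
    }
    buf[n] = '\0';
    if (c == EOF && n == 0)
        return -1;
    return (int)n;
}
```

Note that truncating an over-long line this way can cut a multi-byte character in half, so the buffer should be sized generously.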

michiguel · Post by **michiguel** » Tue Aug 24, 2010 11:20 am

hgm wrote:Great! I just extracted the Spanish translation made by Fransisco Garcia from WinBoard 4.2.6, and put it at:

http://hgm.nubati.net/espanol.lng

This was a very complete translation: even the help file was translated into Spanish for a large part! (I did not put that on line.) Of course there are many new strings now, and some strings are slightly altered, so they are not recognized anymore. Anyway, I already shaped up the main menus a little bit, so that these are recognized again. So when you put this file in the same folder as winboard.exe, starting with

winboard /language=espanol.lng

you should already see a lot of Spanish in the menus and dialogs. It would just be a matter of adding the missing strings, copying them from lang450.txt, and completing the Spanish translation.

IMHO, this version needs a bit of work to make it a more "standard" Spanish so people from many places could understand, but I understand that technical neologisms are difficult to translate. Is it possible to set up some sort of a wiki (not only for the Spanish version) so that people could make additions? I guess that Google Documents will work great for this.

Miguel

michiguel · Post by **michiguel** » Tue Aug 24, 2010 11:28 am

hgm wrote:Well, you obviously know more about this than I do, as I had never heard of UTF-8 or UCS-2. Is this the difference between variable-length encoding or fixed-length encoding?

The sources of the static Chinese translation apparently use variable-length encoding: in my NotePad, which does not understand this format, each Chinese character shows up as two garbage characters (codes >= 128) amongst the normal ASCII text. Apparently Chinese Windows systems have some low-level mechanism to interpret this format, join the garbage codes and display them as Chinese. This works even for a hello-world program printing such strings, when I send them the .exe, or for menu strings set at run time from such strings. It does not work when the strings were in an .rc file that I compiled, but it does when they compile it. (And then it even works on my Windows system.)

Indeed WinBoard does pass a lot of (char*) around.

What I do not understand yet is how the font and code page specified in the resource file affect matters, and if there are ways to overrule that at run time. I noticed that the static Chinese 4.2.7 translation uses other fonts (whose names I cannot even read), and also another code page. I use the standard routines SetDlgItemText() and ModifyMenu() to alter the menu texts in the dynamic translation. On my system, this happily sets the menu texts to the same garbage whether the normal font and code page or the Chinese font and code page are specified in the .rc file. But I have no idea what a Chinese compilation of it would do, or what my compilation would print on a Chinese system.

What I use in the dynamic translation is a normal text-format file, which could contain such variable-length encodings which my system is not aware of. I use plain fgetc() to read that file byte by byte, and pass the bytes as normal char strings to the Windows API routines, not as wide chars.

We should have had a different format with the feature option command, with an extra -label "Whatever is displayed in WB" beside NAME, which is used for communication between the engine and the GUI, not for display.

something like

feature option=NAME -label "LABEL" -button

etc.

Maybe there is a way to get this in somehow w/o breaking the engines that support this feature, i.e. Gaviota and Fairy-max, maybe putting it as an option at the end...

Miguel

hgm · Post by **hgm** » Tue Aug 24, 2010 11:50 am

I don't think that it is a good idea to let engines decide what GUIs display. The latter is a matter between user and GUI, and the engine should have no say in it. Unfortunately there seems to be no alternative for displaying the engine options than what the engines themselves use as names for these options. They are intentionally left outside any standard, so we cannot anticipate a translation.

I don't see much disadvantage in using the text that is displayed for the user as the name by which the option is indicated in the protocol. It might be needlessly verbose, but so what? I don't think anything would be gained by letting the engine define a shorthand for communicating the option.

I still don't see how it would contribute to solving the translation problem. I think we cannot burden engines with an obligation to provide translations, even though they are the only instances able to do it.

People that do want to provide multi-lingual service, can define buttons for various languages. E.g. I could send:

feature option="English -reset"
feature option="Français -reset"
feature option="Español -reset"
feature option="pawn hash size (MB) -spin 64 8 256"

and on reception of the option Français command have the engine resend all its options in French:

feature option="English -reset"
feature option="Français -reset"
feature option="Español -reset"
feature option="magnitude de tableau de hachée pour les péons (Mo.) -spin 128 8 256"

where it would now quote as default values the current settings, which might have already been changed automatically by the GUI in reaction to settings programmed into it for the English options.
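To make the format of these option strings concrete, here is a rough sketch of how a GUI could split the quoted text of a spin option into its name and its three numbers. It only handles the `-spin` form shown in the examples above; the struct and function names are made up for illustration and are not taken from the WinBoard sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for a parsed spin option. */
typedef struct {
    char name[128];
    int value, min, max;
} SpinOption;

/* Parse the text between the quotes of
       feature option="NAME -spin VALUE MIN MAX"
   Returns 1 on success, 0 if the text is not a well-formed spin option. */
int parse_spin_option(const char *text, SpinOption *opt)
{
    const char *tag;
    size_t name_len;

    assert(text != NULL && opt != NULL);
    tag = strstr(text, " -spin ");
    if (tag == NULL)
        return 0;
    name_len = (size_t)(tag - text);
    if (name_len == 0 || name_len >= sizeof opt->name)
        return 0;
    memcpy(opt->name, text, name_len); /* name bytes are copied verbatim */
    opt->name[name_len] = '\0';
    /* " -spin " is 7 bytes long; the three numbers follow it */
    if (sscanf(tag + 7, "%d %d %d", &opt->value, &opt->min, &opt->max) != 3)
        return 0;
    return 1;
}
```

Because the name is treated as an opaque run of bytes, the same code handles the English and the French option name alike, whatever 8-bit encoding the engine used.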

michiguel · Post by **michiguel** » Tue Aug 24, 2010 12:14 pm

hgm wrote:Not sure what you intend this command to do.

I don't think that it is a good idea to let engines decide what GUIs display. The latter is a matter between user and GUI, and the engine should have no say in it. Unfortunately there seems to be no alternative for displaying the engine options than what the engines themselves use as names for these options. They are intentionally left outside any standard, so we cannot anticipate a translation.

The engines are already choosing what to display in the engine settings, it is the NAME in the feature option=NAME.

Right now NAME is both the keyword used for communication and what is displayed. What I am saying is that it is better to keep the roles separated, so that the translation is much easier from the engine's point of view.

It is kind of awkward to have, for instance, all of WB translated to Spanish and the engine menu settings in English. The GUI has no idea what the engine wants to display in the menu, so the only way to do it is for the engine to be set to the language that the user chooses. For instance, running it with "./gaviota -spanish", Gaviota will choose the proper way to show the options in Spanish in the engine settings.

Miguel

michiguel · Post by **michiguel** » Tue Aug 24, 2010 12:22 pm

hgm wrote:I don't think that it is a good idea to let engines decide what GUIs display. The latter is a matter between user and GUI, and the engine should have no say in it. Unfortunately there seems to be no alternative for displaying the engine options than what the engines themselves use as names for these options. They are intentionally left outside any standard, so we cannot anticipate a translation.

I don't see much disadvantage in using the text that is displayed for the user as the name by which the option is indicated in the protocol. It might be needlessly verbose, but so what? I don't think anything would be gained by letting the engine define a shorthand for communicating the option.

I still don't see how it would contribute to solving the translation problem. I think we cannot burden engines with an obligation to provide translations, even though they are the only instances able to do it.

People that do want to provide multi-lingual service, can define buttons for various languages. E.g. I could send:

feature option="English -reset"
feature option="Français -reset"
feature option="Español -reset"
feature option="pawn hash size (MB) -spin 64 8 256"

and on reception of the option Français command have the engine resend all its options in French:

feature option="English -reset"
feature option="Français -reset"
feature option="Español -reset"
feature option="magnitude de tableau de hachée pour les péons (Mo.) -spin 128 8 256"

where it would now quote as default values the current settings, which might have already been changed automatically by the GUI in reaction to settings programmed into it for the English options.

IMHO, it is a complication to communicate between WB and the engine in different languages, particularly if it needs to be done in wide characters, both for debugging purposes and for consistency. It is simpler if the communication is always the same, in regular char*. Otherwise, the engine will have to be prepared to receive commands in wide chars, for instance.

Miguel

hgm · Post by **hgm** » Tue Aug 24, 2010 1:17 pm

I don't see how you avoid the use of wide characters. If the names you want to be displayed to the user can only be spelled with wide characters, you would still have to send them to the GUI in the LABEL field of the format you propose.

If engines want to force display of option names that contain wide characters, let them deal with wide characters on input as well as output. WinBoard will simply send them back what it got on setting the option, however wide it was. In the end any communication is a stream of bytes.

Of course it looks stupid to see English engine-option names in a Spanish WB. But it is an illusion that you could avoid that this way. Fact is that virtually no engine would provide a Spanish translation, even if you would define a mechanism by which it could do it. You simply cannot ask that of engine authors.

I think the better solution is to define a list of standard names for options with a certain function, and put those standard names in a translation table in possession of the GUI. Then translators could make translations for these options available in any language. It seems to me that this would in the end lead to more and better translation than putting the burden on engine authors. The only thing it requires of them is to use a standard name, when there is one available.
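Such a GUI-side table could look roughly like the sketch below. The two table entries and all identifiers are invented purely for illustration; no list of standard option names exists yet, and an option name that is not in the table is simply shown as the engine sent it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a hypothetical GUI-side translation table. */
typedef struct {
    const char *standard;   /* name the engine sends in feature option= */
    const char *translated; /* text the GUI shows to the user */
} OptionName;

/* Made-up example entries; not a proposed standard. */
static const OptionName spanish_options[] = {
    { "Hash size (MB)", "Tabla hash (MB)" },
    { "Resign",         "Rendirse" },
};

/* Return the translated label for an option name, or the name itself
   when the table has no entry for it. */
const char *translate_option(const OptionName *table, size_t count,
                             const char *name)
{
    size_t i;

    assert(table != NULL && name != NULL);
    for (i = 0; i < count; ++i) {
        if (strcmp(table[i].standard, name) == 0)
            return table[i].translated;
    }
    return name; /* unknown (non-standard) option: show it untranslated */
}
```

The engine keeps receiving the standard name in the option command; only the label on screen changes.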

wgarvin · Post by **wgarvin** » Tue Aug 24, 2010 3:15 pm

hgm wrote:Well, you obviously know more about this than I do, as I had never heard of UTF-8 or UCS-2. Is this the difference between variable-length encoding or fixed-length encoding?

I think so, yes. UCS-2 is the older standard and only supports Unicode characters in the BMP, the "Basic Multilingual Plane" (U+0000 through U+FFFF), so it fits every character into a single 16-bit short. However, the BMP contains virtually all of the Unicode characters that typically get used, and all characters being the same size is convenient. Unicode characters at U+10000 and above are not very commonly used.

UTF-8 is a multi-byte encoding that can encode all of Unicode (U+0000 through U+10FFFF) and is designed so that all ASCII characters 0-127 are a single byte with the high bit set to zero (the same as they are in ASCII itself). Other characters are represented by two to four bytes, in such a way that you can detect whether a byte is the start of a new character or not, and no ASCII byte (including 0, the nul) will occur anywhere in a multi-byte character. UTF-8 is useful if you want to keep char* around, and things like strcpy and strcmp still work fine with it. However, number of bytes != number of characters.
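A small sketch of the "bytes != characters" point: continuation bytes in UTF-8 always look like 10xxxxxx, so counting the bytes that do not match that pattern gives the number of characters. The function name is made up, and the code assumes the input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string.
   Every byte that is NOT a continuation byte (10xxxxxx) starts a new
   character. No validation is attempted. */
size_t utf8_length(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For "Français" stored as UTF-8, strlen() reports 9 bytes while utf8_length() reports 8 characters, because the ç takes two bytes.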

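UTF-16 is like UTF-8 in that it is variable-length, but it is based on 16-bit shorts instead of 8-bit bytes: characters in the BMP take a single 16-bit unit, and characters above U+FFFF take two (a "surrogate pair").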

You have to decide what format is most convenient for you to store and pass around the strings in, and you also have to figure out what Windows wants and convert them to that whenever you pass them to it. If you can choose the same format as Windows, that second step is easier. Early versions of Windows with Unicode support used the fixed-size UCS-2 encoding, but I think they have mostly or completely transitioned to using UTF-16. Anyway, it's very unlikely that you will need any characters (for any language) which actually take more than 16 bits in UTF-16.
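For the rare characters that do need more than 16 bits, this is roughly what the UTF-16 encoding step looks like. It is a sketch with a made-up function name, not code from Windows or WinBoard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
   Returns the number of 16-bit units written to out (1 or 2),
   or 0 if cp is not a valid code point to encode. */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                 /* out of range, or a lone surrogate */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;    /* BMP: identical to UCS-2 */
        return 1;
    }
    cp -= 0x10000;                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate */
    return 2;
}
```

Everything in the BMP comes out as one unit with the same value as the code point, which is why code written for UCS-2 keeps working for almost all text.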

You can put a "byte order mark" at the beginning of a file, to distinguish what format the file is in: http://en.wikipedia.org/wiki/Byte_order_mark
I think Notepad will open, read and write all of these Unicode encodings.
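Checking for a byte order mark when loading a language file could be sketched like this. The enum and function names are made up for illustration; a file without a BOM is reported as unknown and the caller has to fall back on some default.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ENC_UNKNOWN, ENC_UTF8, ENC_UTF16_LE, ENC_UTF16_BE } Encoding;

/* Look at the first bytes of a file and report which byte order mark,
   if any, they form. UTF-32 is not handled: its little-endian BOM
   (FF FE 00 00) would be reported as UTF-16 LE here. */
Encoding detect_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return ENC_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return ENC_UTF16_LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return ENC_UTF16_BE;
    return ENC_UNKNOWN; /* no BOM: plain ASCII, BOM-less UTF-8, a code page, ... */
}
```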

Anyway, here are some links about Unicode.
http://en.wikipedia.org/wiki/Comparison ... _encodings
http://www.unicode.org/faq/

I think you have to install a unicode font pack in order to see all the translated strings properly in their native language, without changing any system language settings. If you install language packs they probably come with these fonts. In Windows XP Pro, I notice that "Control Panel > Regional and Language Options > Supplemental language support" has some options for extra language support.

wgarvin · Post by **wgarvin** » Tue Aug 24, 2010 3:19 pm

hgm wrote:I don't see how you avoid the use of wide characters. If the names you want to be displayed to the user can only be spelled with wide characters, you would still have to send them to the GUI in the LABEL field of the format you propose.

If engines want to force display of option names that contain wide characters, let them deal with wide characters on input as well as output. WinBoard will simply send them back what it got on setting the option, however wide it was. In the end any communication is a stream of bytes.

Of course it looks stupid to see English engine-option names in a Spanish WB. But it is an illusion that you could avoid that this way. Fact is that virtually no engine would provide a Spanish translation, even if you would define a mechanism by which it could do it. You simply cannot ask that of engine authors.

I think the better solution is to define a list of standard names for options with a certain function, and put those standard names in a translation table in possession of the GUI. Then translators could make translations for these options available in any language. It seems to me that this would in the end lead to more and better translation than putting the burden on engine authors. The only thing it requires of them is to use a standard name, when there is one available.

It might be worthwhile to redefine all of the strings sent to and from the engines to be UTF-8 strings. Those strings have largely been ASCII until now anyway, and UTF-8 is backward-compatible with ASCII.

Once you have the string in Winboard, you can losslessly convert among UTF-8, UTF-16 and UCS-4 (32-bit per char). Even if you convert them all to UCS-2 you won't lose anything from the BMP so that would be 99.9% good enough.
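To illustrate the lossless conversion, here is a sketch that decodes one character from UTF-8 into a 32-bit code point (UCS-4) and encodes it back. The function names are made up, and the decoder is deliberately simple: it checks the byte patterns but does not reject overlong encodings, so it should not be relied on for untrusted input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from a NUL-terminated UTF-8 string.
   Returns the number of bytes consumed (1-4), or 0 if malformed. */
size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) | (uint32_t)(s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
        *cp = ((uint32_t)(s[0] & 0x07) << 18) |
              ((uint32_t)(s[1] & 0x3F) << 12) |
              ((uint32_t)(s[2] & 0x3F) << 6) | (uint32_t)(s[3] & 0x3F);
        return 4;
    }
    return 0; /* stray continuation byte or truncated sequence */
}

/* Encode one code point as UTF-8. Returns bytes written (1-4), or 0. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

Decoding and re-encoding a character gives back exactly the bytes you started with, which is what makes it safe to pick whichever form is most convenient inside the program.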

If an engine doesn't understand UTF-8, then any strings you send it will just contain some weird characters with high bit set. But ASCII characters will still be fine, and nul termination is still the same, so engines can pass those strings around, write them to their log files, etc. without even knowing that they are UTF-8.
