"Unicode" support under MS-Windows

wpkg works with UTF-8 strings. Under Linux and other Unix systems that is sufficient, because most operating system functions accept UTF-8 strings whatever characters they contain.

Under MS-Windows, UTF-8 does not work as is. You have to convert those strings to UTF-16, which Microsoft improperly calls Unicode. If you are wondering what Unicode really is, check the unicode.org website.


Note that older versions of the MS-Windows operating system are NOT capable of using UTF-16. Instead, they were limited to UCS-2. I do not know exactly when the switch occurred, or whether the characters I tried a while back were invalid, but I think it was around WinXP. So if you have Windows 2000 or older, you are probably limited to UCS-2 (about 63,486 characters).
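To make the difference between UCS-2 and UTF-16 concrete, here is a minimal sketch of how one code point becomes UTF-16 code units. The function name encode_utf16 is made up for this illustration and is not part of wpkg or libdebpackages. Characters up to U+FFFF fit in one 16-bit unit, which is all UCS-2 can express; anything above needs a surrogate pair, which only UTF-16 understands.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for illustration only; not a wpkg function.
// Encodes one Unicode code point as one or two UTF-16 code units.
std::u16string encode_utf16(char32_t cp)
{
    // Surrogate values and anything past U+10FFFF cannot be encoded.
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));

    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single unit, valid in UCS-2 too.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split the remaining 20 bits into a
        // high surrogate (top 10 bits) and a low surrogate (low 10 bits).
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}
```

For example, U+00E9 comes out as the single unit 0x00E9, while U+1F600 comes out as the pair 0xD83D 0xDE00. A system limited to UCS-2 would see that pair as two unrelated, invalid characters.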

Since version 0.8.0, the wpkg tool accepts UTF-16 strings on the command line. These are converted to UTF-8 and later converted back to UTF-16 when calling an I/O function such as open() or unlink(). This is done under the hood, so you should not have to do anything special about it.
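The conversion performed before an I/O call can be sketched in portable C++ as follows. This is only an illustration of the technique, not the code wpkg uses: the name utf8_to_utf16 is an assumption, and on MS-Windows a program would more commonly call the Win32 function MultiByteToWideChar() with CP_UTF8 and pass the result to a wide-character API.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration only; not a wpkg function.
// Converts a UTF-8 string to UTF-16, throwing on malformed input.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // continuation bytes expected after lead
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        assert(i <= in.size());

        // Reject overlong forms, surrogates, and out-of-range values.
        static const char32_t min_cp[] = { 0x0, 0x80, 0x800, 0x10000 };
        if (cp < min_cp[extra] || cp > 0x10FFFF
                || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid UTF-8 code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Above U+FFFF: emit a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

A plain ASCII filename converts unit for unit, whereas a name containing a character outside the Basic Multilingual Plane grows a surrogate pair. Rejecting malformed input instead of guessing is a design choice of this sketch; it keeps a bad byte sequence from silently naming a different file.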

If you are developing against the libdebpackages library, then make sure to always pass UTF-8 strings.

Obviously, package names are limited to ASCII characters (a-z, 0-9, and a few other characters like dash and underscore) and as such are not directly affected by such problems. However, the system still has to get the string format right, because the path to a package may include characters outside of the ASCII range and, depending on the locale, ASCII may not even be available as is on the platform.

If you run into any problem, please report it.

Note that the content of all the files the library deals with is expected to be UTF-8 as well. So the control file, the substvar file, the final .deb, the different .tar files, etc. all use UTF-8 strings. Only filenames are converted to UTF-16, and only when accessing the hard drive.

See also: MinGW Unicode Limitations