tl;dr: Bing Translation + 500 lines of new code + my old ResXCheck project = Paint.NET now has its own Tower of Babel factory.

Currently, Paint.NET ships in 9 languages. Soon, that number may jump all the way up to 33*.

Earlier today I needed to figure out what some short phrase in Spanish meant, because the 2 years I took in high school were completely lost to me. After I learned what the now-forgotten phrase meant, curiosity took over and I thought, “Don’t they have an API for this? Hmm … maybe I can use it to translate Paint.NET. I really doubt it’ll be that easy, but why not check it out?”

As some quick background, a group within Microsoft Developer Division had handled translation for about 4 years, spanning versions 3.0 through 3.5 of Paint.NET. It was a volunteer effort done in their precious free time, and the impact has been enormous. Once 3.5 was done, shifting priorities and responsibilities led us to amicably part ways, leaving me without their graciously offered translation abilities. This is why all updates to 3.5 (currently up to 3.5.8) have had almost no changes to the string resources**. New features need new translations, and even new error messages need them. I was in a bit of a bind: not only could I not add new features, but I couldn’t even improve error handling! My plan was to hire a translation agency to handle things for the release of 4.0, as it would be cost-prohibitive to do incremental translation updates (let’s say ~10 words for every minor update, on average). Anyway, back to the main plot.

I found the API documentation for Bing Translation, signed up for an API Key or whatever they’re calling it nowadays, and started up a new C# command-line project in Visual Studio. I added a “service reference” to their SOAP API and it all magically fell into place with a simple, imperative .NET API. I could create a service object and send queries and get results. It even worked.

“This can’t be that easy. No way.” But it was. I already had a paragraph of code to parse out a RESX file into a key/value pair list (no really, it’s a paragraph of LINQ-to-XML: check out the method “FromResX” from my old ResXCheck project), so I was already halfway there (so to speak).
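For readers who haven’t seen ResXCheck, here is a minimal sketch of what a “paragraph of LINQ-to-XML” RESX reader can look like. It is not the actual FromResX method; the class name `ResXReader` is hypothetical, and skipping entries that carry a `type` attribute (images, icons, file references) is an assumption about which entries count as translatable strings.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ResXReader
{
    // Reads the string entries of a .resx document into a name -> value dictionary.
    // Each string lives in <data name="..."><value>...</value></data>.
    // Assumption: entries with a "type" attribute are not plain strings, so skip them.
    public static Dictionary<string, string> FromResX(XDocument resx)
    {
        return resx.Root
            .Elements("data")
            .Where(data => data.Attribute("type") == null)
            .ToDictionary(
                data => (string)data.Attribute("name"),
                data => (string)data.Element("value") ?? string.Empty);
    }
}
```

Because LINQ-to-XML undoes XML escaping on load, an entry stored on disk as `&amp;File` comes back as `&File`, which matters once keyboard accelerators enter the picture.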

I didn’t want to retranslate everything that I already had; human translation is usually better than machine translation. So the first requirement was to support incremental translation. This necessitated the ability to specify the source ResX file (English in my case), the previous version of the source ResX, and the latest translation of the source ResX for a given language. With this it’s a few simple LINQ queries and set algebra to determine which strings are new, which are changed, and which ones already have translations (provided they are neither new nor changed). The bulk of the code is devoted to keeping count of these and printing it to the console, for funsies.
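The set algebra above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool’s actual code: the names `IncrementalTranslation`, `TranslationPlan`, and `Plan` are hypothetical, and keys that are unchanged but missing from the existing translation are assumed to need translating as well.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class TranslationPlan
{
    public HashSet<string> NewKeys { get; set; }
    public HashSet<string> ChangedKeys { get; set; }
    public Dictionary<string, string> Reused { get; set; }      // existing translations kept as-is
    public Dictionary<string, string> ToTranslate { get; set; } // English text to send to the translator
}

public static class IncrementalTranslation
{
    public static TranslationPlan Plan(
        IReadOnlyDictionary<string, string> source,          // latest English strings
        IReadOnlyDictionary<string, string> previousSource,  // English strings the translation was made from
        IReadOnlyDictionary<string, string> translation)     // latest translation for one language
    {
        // New: present now, but not in the previous source.
        var newKeys = new HashSet<string>(source.Keys.Except(previousSource.Keys));

        // Changed: present in both, but the English text differs.
        var changedKeys = new HashSet<string>(
            source.Keys.Intersect(previousSource.Keys).Where(k => source[k] != previousSource[k]));

        // Reusable: neither new nor changed, and a translation already exists.
        var reused = source.Keys
            .Where(k => !newKeys.Contains(k) && !changedKeys.Contains(k) && translation.ContainsKey(k))
            .ToDictionary(k => k, k => translation[k]);

        // Everything else goes to the translator. Keys deleted from the source
        // drop out automatically because only source.Keys is enumerated.
        var toTranslate = source.Keys
            .Where(k => !reused.ContainsKey(k))
            .ToDictionary(k => k, k => source[k]);

        return new TranslationPlan
        {
            NewKeys = newKeys,
            ChangedKeys = changedKeys,
            Reused = reused,
            ToTranslate = toTranslate
        };
    }
}
```

The counts of `NewKeys`, `ChangedKeys`, and `Reused` are exactly the numbers worth printing to the console. Reused entries also carry forward any hand-made fixups, since only `ToTranslate` is ever sent to the service.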

As another quick backgrounder for those who haven’t worked with localization: string resources are specified as a name and a value. In code, you use the name in order to look up the localized text, e.g. Resources.GetString(“MainWindow.FileMenu.Text”), which nets you “File” for English, or whatever is appropriate for the chosen language. It really is just a key/value dictionary, and normal set algebra applies. Also, this is all stuffed into an XML file with the extension “resx.” Now you know.

The next hurdle was handling keyboard accelerators, which are specified in WinForms with an ampersand. For instance, the “File” menu’s name is stored as “&File” to indicate that F is the keyboard key you use to access it. I handled this by finding the accelerator (String.IndexOf), removing it before translation, and then adding it back into the translated text. If the translated string contained that character, I reused it by inserting an ampersand at the appropriate spot. Otherwise, I employed the convention of appending “ (&X)” to the string, where ‘X’ is the accelerator key (you see this, for instance, in Japanese translations). This would be trivial to adapt for WPF, which uses a single underscore instead of the ampersand, presumably because ampersands are obnoxious to type in XAML or something.
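A rough sketch of that strip-and-reapply round trip follows. The class and method names are hypothetical. Two details go beyond what the post describes and are assumptions: “&&” is skipped because WinForms treats it as a literal ampersand, and the search for the accelerator letter in the translated text is case-insensitive.

```csharp
using System;

public static class Accelerators
{
    // Returns the index of the '&' that marks the accelerator, or -1 if there is none.
    // WinForms treats "&&" as a literal ampersand, so those pairs are skipped.
    public static int FindMarker(string text)
    {
        int i = text.IndexOf('&');
        while (i >= 0 && i + 1 < text.Length)
        {
            if (text[i + 1] != '&')
                return i;
            i = text.IndexOf('&', i + 2); // step past the "&&" pair
        }
        return -1;
    }

    // Removes the marker before translation and reports which key it pointed at.
    public static string Strip(string text, out char? key)
    {
        int i = FindMarker(text);
        if (i < 0)
        {
            key = null;
            return text;
        }
        key = text[i + 1];
        return text.Remove(i, 1);
    }

    // Puts the accelerator back: on the same letter if the translation contains it
    // (case-insensitive match is an assumption), otherwise as a " (&X)" suffix.
    public static string Reapply(string translated, char key)
    {
        int pos = translated.IndexOf(key.ToString(), StringComparison.OrdinalIgnoreCase);
        return pos >= 0
            ? translated.Insert(pos, "&")
            : translated + " (&" + char.ToUpperInvariant(key) + ")";
    }
}
```

With this sketch, stripping “&File” yields “File” and the key F. A French “Fichier” becomes “&Fichier”, while a Spanish “Archivo”, which has no F, becomes “Archivo (&F)”. Adapting it for WPF would mostly mean swapping the marker character.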

Another problem I ran into, comically enough, was blank strings. Bing doesn’t like to translate String.Empty, and Paint.NET has a few of those for enumeration values that don’t actually show up in the UI. They are present in the strings file because I have a utility class for handling enumeration value localization lookup, EnumLocalizer, and it requires all enumeration values to be accounted for. I was able to handle this with my own proprietary translation algorithm (hint: copy by value).
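The “proprietary algorithm” amounts to filtering before the service is ever called. A minimal sketch, with hypothetical names: empty values are copied straight into the output, and only the remaining entries are returned for translation.

```csharp
using System.Collections.Generic;

public static class BlankStrings
{
    // Copies empty values straight into the output ("copy by value") and returns
    // only the non-empty entries, which are the ones worth sending to the translator.
    public static Dictionary<string, string> CopyBlanks(
        IReadOnlyDictionary<string, string> source,
        IDictionary<string, string> output)
    {
        var toSend = new Dictionary<string, string>();
        foreach (var pair in source)
        {
            if (string.IsNullOrEmpty(pair.Value))
                output[pair.Key] = pair.Value;
            else
                toSend[pair.Key] = pair.Value;
        }
        return toSend;
    }
}
```

Keeping the blank entries in the output matters here because EnumLocalizer expects every enumeration value to have an entry, even one that never reaches the UI.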

One hurdle I was expecting and dreading didn’t turn out to be a problem at all. If you’ve worked with localization in .NET, then you know that many strings contain placeholders. For instance, “There are {0} files.” The goofy-looking {0} is a placeholder for an integer in this case, e.g. “There are 23 files.” For some reason, and much to my pleasant surprise, almost all of these survived translation. Only 7 instances required manual fixing: 3 for each of the Chinese variants, and 1 in another language where it goofed up a URL suffix. I highly doubt these were translated with perfect grammar, but I wasn’t really expecting that anyway. I can live with having to manage 7 fixups out of ~24,000 strings. (This also strengthens the case for incremental translation: these fixups will survive the next time I update the translations, since only strings that are new or changed get translated.)
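Finding those 7 fixups by eye would be tedious, so here is a sketch of an automated check. It is an assumption about how such a check could work, not how ResXCheck does it. It compares the .NET composite-format items (`{0}`, `{1:N2}`, `{2,-5}`) in the source and the translation as sorted lists, so reordering is allowed but anything dropped or mangled is flagged. Escaped braces (`{{` and `}}`) are not handled specially.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class Placeholders
{
    // Matches composite-format items such as {0}, {1:N2} or {2,-5}.
    private static readonly Regex FormatItem = new Regex(@"\{\d+(,-?\d+)?(:[^{}]*)?\}");

    // True when the translation contains exactly the same format items as the source,
    // in any order, since a translation may legitimately need to reorder them.
    public static bool Survived(string source, string translated)
    {
        return Items(source).SequenceEqual(Items(translated));
    }

    private static string[] Items(string text)
    {
        return FormatItem.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value)
            .OrderBy(v => v, StringComparer.Ordinal)
            .ToArray();
    }
}
```

For example, “Hay {0} archivos.” passes against “There are {0} files.”, while a mangled “Hay { 0 } archivos.” fails.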

The last hurdle was that the Bing API limits you to 50 requests per minute before it starts giving you a Denial-of-Service error. Throttling is easy enough with Thread.Sleep(), along with retry logic in case that doesn’t work.
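A minimal sketch of Thread.Sleep throttling plus retry, under stated assumptions: `ThrottledCaller` is a hypothetical name, the actual service call is passed in as a delegate, and every exception is treated as retryable. Real code would catch only the service’s throttling fault.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public sealed class ThrottledCaller
{
    private readonly TimeSpan minInterval;
    private readonly Stopwatch sinceLastRequest = new Stopwatch();

    // 50 requests per minute works out to at least 1.2 seconds between requests.
    public ThrottledCaller(int requestsPerMinute)
    {
        if (requestsPerMinute <= 0)
            throw new ArgumentOutOfRangeException(nameof(requestsPerMinute));
        minInterval = TimeSpan.FromTicks(TimeSpan.FromMinutes(1).Ticks / requestsPerMinute);
    }

    // Sleeps just long enough to keep requests at or below the configured rate.
    private void WaitForSlot()
    {
        if (sinceLastRequest.IsRunning)
        {
            TimeSpan remaining = minInterval - sinceLastRequest.Elapsed;
            if (remaining > TimeSpan.Zero)
                Thread.Sleep(remaining);
        }
        sinceLastRequest.Restart();
    }

    // Runs the request under the throttle. If it throws, waits retryDelay and tries
    // again, up to maxAttempts in total; the final failure propagates to the caller.
    public T Call<T>(Func<T> request, int maxAttempts, TimeSpan retryDelay)
    {
        for (int attempt = 1; ; attempt++)
        {
            WaitForSlot();
            try
            {
                return request();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                Thread.Sleep(retryDelay);
            }
        }
    }
}
```

At 1.2 seconds per request, ~1000 single-string requests take at least 20 minutes. That lines up with the 23 minutes per language measured before switching to batch queries.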

At this point it took 23 minutes to translate ~1000 strings from English into any one language. With 9 of the 33 languages already done (English plus 8 translations), the remaining 24 would take about 9 hours (24 × 23 minutes). That’s a long time! Still, that’s much shorter and much cheaper than human translation, so it would still have counted as a success for me. Fortunately, Bing Translation provides a method overload for doing a batch query, which knocked the time down to about 23 seconds. Yes, seconds. Per language. About 9 minutes total. It would be even faster if the API (or protocol? I don’t know) weren’t limited to a 64KB query string.
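The post doesn’t say how the batches were sized. One plausible approach is greedy packing: keep adding strings to a batch until the next one would push the estimated query length past the limit. In this sketch, `QueryBatcher` is hypothetical, and estimating size with Uri.EscapeDataString plus a fixed per-item overhead is an assumption; the real request encoding may differ.

```csharp
using System;
using System.Collections.Generic;

public static class QueryBatcher
{
    // Greedily packs strings into batches whose estimated encoded length stays within
    // maxLength. perItemOverhead is a guess at per-string framing (parameter names,
    // separators). A single string larger than maxLength still gets its own batch.
    public static List<List<string>> Batch(IEnumerable<string> texts, int maxLength, int perItemOverhead)
    {
        var batches = new List<List<string>>();
        var current = new List<string>();
        int size = 0;

        foreach (string text in texts)
        {
            int cost = Uri.EscapeDataString(text).Length + perItemOverhead;
            if (current.Count > 0 && size + cost > maxLength)
            {
                batches.Add(current); // this batch is full; start a new one
                current = new List<string>();
                size = 0;
            }
            current.Add(text);
            size += cost;
        }

        if (current.Count > 0)
            batches.Add(current);
        return batches;
    }
}
```

Leaving some headroom below the 64KB ceiling (for example, a `maxLength` of a few thousand characters less than 65,536) guards against the estimate being slightly off. Each batch would then go through the throttled caller as a single request.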

Along with ResXCheck, which lets me verify the structural correctness of a translation, I now have an almost completely automated method of bootstrapping a new language for Paint.NET. I currently have a small group of private testers checking out a beta version of what I’m tentatively calling the “Paint.NET Bing Translation Pack”. We’ve run into a few crashes, but these are the result of my goofy aforementioned EnumLocalizer class, and it will be easy enough to fix (although it will require a 3.5.9 release, eventually). If things go well then I’ll start to include them in the main release as-is, after which I can periodically collect proofreading fixes and incorporate them. It’s hardly perfect, but good enough for the usual “ship now and improve next week” cycle that we’ve all grown accustomed to.

This is one of those magical moments in software development where you write code for an hour or two, fueled only by a Red Bull (or 2***) and some new music, and are amazed that not only does it compile … but it works. At this point my main question is why I didn’t think of this sooner. You don’t often get this kind of bang for the buck.

I may even release this “BingTranslateResX” utility, complete with source code, on its own. Yes there are other automatic ResX translator apps out there, but I didn’t think to look until after I’d already written the code to my exact requirements. And, the ones that I found didn’t handle my need for incremental translation, which makes mine the only utility I know of that can work well in real, production projects. I think Paint.NET is a pretty good litmus test.

Hopefully now I can appease all the e-mails I’ve been getting with requests for Dutch and Czech translations.

* The diligent reader will note that Bing Translation currently supports 34 languages, and will reasonably ask “Why only 33?” The answer: the Thai translation goofs up all of the formatting codes, such as {0} and {1} placeholders, within the strings.

** The only change was to “XP SP2” and “Vista” in the installer’s error message stating the minimum system requirements. They were updated to “XP SP3” and “Vista SP1”. I handled that … all by myself!

*** or 3
