Today I'm going to give you some tips for globalizing (or internationalizing) your applications. But first, let me define what globalization is and isn't.
Globalization is the process of preparing your application for a global audience.
Globalization is not the same as localization. Localization is the act of tailoring your application to any number of different languages and cultures (or locales). It involves things like translating your user interface into the target language, and adding culture-specific information. When you complete the localization process, you will have a customized copy of your application for each target language/culture. But globalization is not about tailoring your application to any one language or culture—just the opposite, in fact.
Globalization is about removing all assumptions about the target language or culture. You can—and should—always globalize your applications, even if you don't localize them.
Now that we have defined the word, let's look at some globalization best practices you can follow to ensure that your software is truly world-ready. We will look at the first four tips this month; the remainder will be posted next month.
1. Don't use national flags to represent languages
Wrong:
A lot of websites use flag icons as a compact, visually appealing way for users to select the UI language. This is wrong for the simple reason that languages and countries are not interchangeable. A language can be spoken in multiple countries, and a country can have multiple languages spoken within its borders. Consider: India has more than 20 officially recognized languages. Suppose your application will support the Hindi, Punjabi, and Sanskrit languages. Are they all going to be represented by the Indian flag? On the other side of the coin, consider all the countries that have English as an official language. Which flag should represent English? The United States? The United Kingdom? Canada? Australia? New Zealand? South Africa? Liberia? (and so on...)
Solution: Just use the names of the languages themselves (with each written in that language).
Correct:
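As a rough sketch of what such a picker could be built from, here is a small Python example (Python rather than VB.NET, purely for illustration). The `LANGUAGE_NAMES` table and the `language_menu` function are hypothetical names invented for this sketch; the point is only that each option is labelled with the language's own name and no flags are involved.

```python
# Hypothetical lookup table for a UI language picker.
# Keys are language tags; values are each language's name written in that language.
LANGUAGE_NAMES = {
    "en": "English",
    "hi": "हिन्दी",
    "pa": "ਪੰਜਾਬੀ",
    "sa": "संस्कृतम्",
}


def language_menu(supported):
    """Return (tag, label) pairs for the picker, in the order given."""
    return [(tag, LANGUAGE_NAMES[tag]) for tag in supported]


# Print one menu entry per supported language.
for tag, label in language_menu(["en", "hi", "pa", "sa"]):
    print(f"{tag}: {label}")
```

Because the menu is keyed by language rather than by country, adding another language spoken in India (or another English-speaking country) needs no change to the design.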
2. Don't build sentences dynamically
An incorrect example of dynamically building a sentence in code by concatenating strings (in VB.NET):
"Log me out after " & minutes.ToString() & " minutes"
This is wrong because different languages have different sentence structures. If you translate those fragments individually and then try to concatenate them back in the same order, the resulting sentence may not make any sense.
Instead, you should keep the whole sentence together in a single string; if you need to fill in values, then you can use placeholders. A better way to implement the previous example is:
String.Format("Log me out after {0} minutes", minutes)
This way, the entire sentence will be translated as a single linguistic unit, and the translator will put the placeholder in the correct position in the translated sentence.
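To see why the placeholder matters, here is a minimal Python sketch (Python rather than VB.NET, purely for illustration). The `TEMPLATES` table and `logout_message` function are hypothetical, and the German and Japanese strings are illustrative translations, not reviewed ones. Notice that the placeholder lands in a different position in each language, which concatenating fragments in a fixed order could not achieve.

```python
# Whole-sentence templates, one per locale. In a real application these
# would come from translated resource files; the dict is an assumption
# made to keep the sketch self-contained.
TEMPLATES = {
    "en": "Log me out after {0} minutes",
    "de": "Nach {0} Minuten abmelden",
    "ja": "{0}分後にログアウトする",
}


def logout_message(locale, minutes):
    """Fill the placeholder in the complete translated sentence."""
    return TEMPLATES[locale].format(minutes)


for locale in ("en", "de", "ja"):
    print(logout_message(locale, 15))
```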
Corollary: Don't build sentences out of UI controls
Wrong:

This is wrong for the same reason that dynamically building sentences in code is wrong. If you translate both halves of that sentence individually, your control may end up in the wrong part of the sentence when you're done. The correct way to handle this situation is to take the control out of the sentence completely.
Correct:

Also correct:

3. Don't use ASCII encoding
Hey, 1985 called, and it wants its character encoding back. Seriously, the only excuse you have to be using ASCII in this day and age is if you are writing embedded software for some sort of device that has only 4 kilobytes of memory, so you absolutely cannot spare more than 1 byte per character. For all other purposes, get in the habit of using Unicode for all your string-handling needs. Then you can rest assured that your application won't break when one of your Russian customers enters his name in Cyrillic script. As far as string encodings go, UTF-8 is almost always a good choice: it uses a single byte for characters in the ASCII range and up to four bytes for any other Unicode character.
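The difference is easy to demonstrate. The following Python sketch (the sample name is made up) tries to encode a Cyrillic name as ASCII, which fails, and then as UTF-8, which round-trips cleanly. It also shows the byte counts: one byte per character for ASCII-range text, two bytes per character for these Cyrillic letters.

```python
name = "Иван"  # four Cyrillic letters

# ASCII has no code points for Cyrillic, so encoding raises an error.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent the same text and decode it back unchanged.
utf8_bytes = name.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

print(ascii_ok)                     # False
print(len("Ivan".encode("utf-8")))  # 4 bytes for four ASCII-range letters
print(len(utf8_bytes))              # 8 bytes for four Cyrillic letters
print(round_trip == name)           # True
```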
4. Accept special characters as input
I'm going to tell you a story. I work for the U.S. government, and I use a lot of web applications at work. They are very expensive applications—like tens of millions of dollars each. Now, you might think that applications that cost that much money would be cutting-edge, shining examples of best coding practices. Well, you would be wrong.
My last name has an apostrophe in it. On my first day of work, when I received my work email address, I noticed that IT decided to include the apostrophe in my address; I personally would have left it out, but I thought little of it at first. But then I tried registering for some of these web apps that are needed to do my job. And naturally, three of them wouldn't accept my apostrophe email address. These apps are fairly central to my work, so not being able to register really cut down my productivity. The first application was fixed almost immediately (which I greatly appreciate). The second took about 10 months to be patched (it also took the complaints of a second employee who had the same problem as me). But the third application is still broken, and will be for the foreseeable future; when I asked their help desk what I could do, they told me that the only solution was to get a new email address. Um, thanks a lot.
There is no conceivable reason to reject apostrophes (or any other special characters) in email address fields. Apostrophes are obviously legal in addresses, because I send and receive email every day. I believe there are three possible explanations:
- The developer gave special meaning to apostrophes for some reason. Maybe the developer was splitting fields on apostrophes, or perhaps the developer was building an SQL database query without escaping his parameters (if you don't know why that's bad, try researching "SQL injection attack").
- The developer made a conscious, arbitrary decision to disallow apostrophes in email addresses. Not for any technical reason, but because "I have never seen an email address like that, so they must not exist."
- The developer was lazy or naive when he wrote the validation function and didn't consider all possibilities for valid characters. I cringe every time I see [A-Za-z] in a regular expression. There are more than 26 letters in the world, people!
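To make the third explanation concrete, here is a small Python sketch comparing an `[A-Za-z]` check with a more permissive one. The `is_acceptable_name` function and the sample names are hypothetical, and the rule it applies (reject only empty input and control characters) is just one assumption about what a name field might reasonably refuse; your own rules may differ.

```python
import re
import unicodedata

# The kind of pattern that causes the problem: only the 26 unaccented
# Latin letters, so apostrophes, accents, and other scripts all fail.
ASCII_ONLY = re.compile(r"[A-Za-z]+")


def is_acceptable_name(text):
    """Accept any non-blank text that contains no control characters.

    Rejecting only Unicode category "Cc" is an assumption of this sketch,
    not a standard; it deliberately avoids listing "allowed" letters.
    """
    stripped = text.strip()
    if not stripped:
        return False
    return not any(unicodedata.category(ch) == "Cc" for ch in stripped)


SAMPLE_NAMES = ["O'Brien", "Łukasz", "Иван", "José"]

for sample in SAMPLE_NAMES:
    strict = ASCII_ONLY.fullmatch(sample) is not None
    print(sample, strict, is_acceptable_name(sample))
```

Every sample name fails the `[A-Za-z]` pattern and passes the permissive check, which is the whole argument in four lines of test data.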
Please, fellow developers, please learn from this story. When you are writing your software, consider the plight of those who have extended characters in their names. Just make sure, when you apply this lesson, that you replace "email address" with "all fields" and "apostrophe" with "any character".
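Accepting any character is safe on the database side as long as values are passed as query parameters instead of being pasted into the SQL text. Here is a minimal sketch using Python's built-in `sqlite3` module; the `users` table, the `store_and_fetch` function, and the email address are all made up for the example.

```python
import sqlite3


def store_and_fetch(email):
    """Store an email address and read it back using parameterized queries."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (email TEXT)")
        # The ? placeholder sends the value separately from the SQL text,
        # so an apostrophe is stored as data and never parsed as SQL.
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        row = conn.execute(
            "SELECT email FROM users WHERE email = ?", (email,)
        ).fetchone()
        return row[0]
    finally:
        conn.close()


address = "pat.o'brien@example.com"
print(store_and_fetch(address) == address)  # True
```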

Come back next month for the last four globalization tips!