
14.1 How Java Supports Internationalization and Localization

Java was designed with internationalization in mind and includes a number of classes to make the effort as painless as possible. The primary class used for internationalization represents a specific geographical region. Instances of this class are used by other classes to format dates and numbers, and include localized strings and other objects in an application. There are also classes for dealing with different character encodings, as you will see later in the chapter.

14.1.1 The Locale Class

All Java classes that provide localization support use a class named java.util.Locale. An instance of this class represents a particular geographical, political, or cultural region, as specified by a combination of a language code and a country code. Java classes that perform tasks that differ depending on a user's language and local customs (so-called locale-sensitive operations) use a Locale instance to decide how to operate. Examples of locale-sensitive operations are interpreting date strings and formatting numeric values.

You create a Locale instance using a constructor that takes the language code and the country code, in that order, as arguments:

java.util.Locale usLocale = new java.util.Locale("en", "US");

Here, a Locale for U.S. English is created. George Bernard Shaw (a famous Irish playwright) once observed that "England and America are two countries divided by a common language," so it's no surprise that both a language code and a country code are needed to describe some locales completely. The language code, a lowercase two-letter combination, is defined by the ISO 639 standard available at http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt. The country code, an uppercase two-letter combination, is defined by the ISO 3166 standard, available at http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html. Table 14-1 and Table 14-2 show some of these codes.

Table 14-1. ISO-639 language codes

Language code   Language
af              Afrikaans
da              Danish
de              German
el              Greek
en              English
es              Spanish
fr              French
ja              Japanese
pl              Polish
ru              Russian
sv              Swedish
zh              Chinese

Table 14-2. ISO-3166 country codes

Country          Country code
Denmark          DK
Germany          DE
Greece           GR
Mexico           MX
New Zealand      NZ
South Africa     ZA
United Kingdom   GB
United States    US
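To make the codes concrete, here is a minimal sketch that builds Locale instances from the language and country codes in the two tables and reads the codes back. The class name LocaleBasics and the describe( ) helper are made up for this illustration; only the java.util.Locale calls are part of the standard API. Note that recent JDK releases deprecate the Locale constructors in favor of factory methods, but the constructor form used in this chapter still compiles.

```java
import java.util.Locale;

class LocaleBasics {
    // Hypothetical helper: language code, country code, and the combined identifier.
    static String describe(Locale locale) {
        return locale.getLanguage() + " / " + locale.getCountry() + " / " + locale;
    }

    public static void main(String[] args) {
        // Language code first, then country code.
        Locale mexicanSpanish = new Locale("es", "MX");
        // A language-only locale: "any type of English".
        Locale anyEnglish = new Locale("en");

        System.out.println(describe(mexicanSpanish)); // es / MX / es_MX
        System.out.println(describe(anyEnglish));     // en /  / en

        // A human-readable name, here rendered in English.
        System.out.println(mexicanSpanish.getDisplayName(Locale.ENGLISH));
    }
}
```

The combined identifier (es_MX) uses an underscore, which is the same form used later for resource bundle filenames, whereas the HTTP header uses a hyphen.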

As luck would have it, these two standards are also used to define language and country codes in HTTP. As you may remember from Chapter 2, a browser can send an Accept-Language header with a request for a web resource such as a JSP page. The value of this header contains one or more codes for languages the user prefers, based on how the browser is configured. If you use a Netscape 6 or Mozilla browser, you can specify your preferred languages in the Edit → Preferences dialog, under the Navigator → Languages tab. In Internet Explorer 5, you find the same thing in Tools → Internet Options when you click the Languages button under the General tab. If you specify more than one language, they are included in the header as a comma-separated list:

Accept-Language: en-US, en, sv

The languages are listed in order of preference, with each language represented by just the language code or the language code and country code separated by a dash (-). This example header specifies the first choice as U.S. English, followed by any type of English, and finally Swedish. The HTTP specification allows an alternative to listing the codes in order of preference, namely adding a so-called q-value to each code. The q-value is a value between 0.0 and 1.0, indicating the relative preference between the codes. Very few browsers use this alternative today, however.
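If you are curious what a q-value list looks like once parsed, the following sketch may help. It assumes Java 8 or later, where java.util.Locale.LanguageRange can parse an Accept-Language style value and sort the entries by descending q-value; this class did not exist when the container APIs described in this chapter were designed. The class name AcceptLanguageRanges and the sample header value are made up for the illustration.

```java
import java.util.List;
import java.util.Locale;

class AcceptLanguageRanges {
    // Parses an Accept-Language value; the result is sorted by descending q-value.
    // An entry without an explicit q-value gets the maximum weight, 1.0.
    static List<Locale.LanguageRange> parse(String headerValue) {
        return Locale.LanguageRange.parse(headerValue);
    }

    public static void main(String[] args) {
        // Hypothetical header value: the order in the string does not matter,
        // only the q-values do.
        String header = "sv;q=0.5, en-US, en;q=0.8";
        for (Locale.LanguageRange range : parse(header)) {
            // getRange() returns the code in lowercase, for example "en-us".
            System.out.println(range.getRange() + " q=" + range.getWeight());
        }
    }
}
```

In a web application you rarely need to do this by hand, because the container performs the equivalent work before your page runs.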

The Accept-Language header helps you localize your application. You could write code that reads this header and creates the corresponding Locale instances. The good news is you don't have to do this yourself; the web container takes care of it for you and makes the locale information available through properties of the implicit pageContext object:

${pageContext.request.locale}
${pageContext.request.locales}

The locale property contains the Locale with the highest preference ranking; the locales (plural) property contains a collection of all locales in order of preference. All you have to do is match the preferred locales to the ones your web application supports. The easiest way to do this is to loop through the preferred locales and stop when you find one your application supports. As you will see later, the JSTL I18N actions relieve you of this as well, but now you know how it can be done.
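The loop described above can be sketched in plain Java as follows. This is one possible approach, not the algorithm JSTL uses: the class name PreferredLocalePicker, the pickLocale( ) method, and the rule of falling back from a language-plus-country locale to the language alone are all assumptions made for the illustration. In a servlet or JSP page, the Enumeration would typically come from the request object's getLocales( ) method; here it is built by hand so the example runs on its own.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class PreferredLocalePicker {
    // Walks the preferred locales in order and returns the first supported one.
    static Locale pickLocale(Enumeration<Locale> preferred,
                             Set<Locale> supported,
                             Locale fallback) {
        while (preferred.hasMoreElements()) {
            Locale candidate = preferred.nextElement();
            if (supported.contains(candidate)) {
                return candidate; // exact match, such as en_US
            }
            // Assumed rule: accept a language-only match, such as sv for sv_SE.
            Locale languageOnly = new Locale(candidate.getLanguage());
            if (supported.contains(languageOnly)) {
                return languageOnly;
            }
        }
        return fallback; // nothing the user prefers is supported
    }

    public static void main(String[] args) {
        Set<Locale> supported = new HashSet<>(
            Arrays.asList(new Locale("en"), new Locale("sv")));
        Enumeration<Locale> preferred = Collections.enumeration(Arrays.asList(
            new Locale("es", "MX"), new Locale("sv", "SE"), new Locale("en")));

        // es_MX and es are unsupported; sv_SE falls back to sv.
        System.out.println(pickLocale(preferred, supported, new Locale("en"))); // sv
    }
}
```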

14.1.2 Formatting Numbers and Dates

Let's look at how a locale can be used. One thing we who live on this planet have a hard time agreeing on is how to write dates and numbers. The order of the month, the day, and the year; whether the numeric value or the name should be used for the month; what character to use to separate the fractional part of a number; all of these details differ between countries, even between countries that share the same language. And even though these details may seem picky, using an unfamiliar format can cause a great deal of confusion. For instance, if you ask for something to be done by 5/2, an American thinks you mean May 2 while a Swede believes that it's due by February 5.

Java provides two classes to deal with formatting of numbers and dates for a specific locale, appropriately named java.text.NumberFormat and java.text.DateFormat, respectively.

The JSTL <fmt:formatNumber> action, used in Chapter 10 to format the price information for items in a shopping cart, is based on the NumberFormat class. By default, the NumberFormat class formats numbers based on the JVM's default locale, which is normally derived from the locale of the underlying operating system. If used on a server configured to use a U.S. English locale, it formats them according to American customs; on a server configured with an Italian locale, it formats them according to Italian customs, and so forth. But you can also explicitly specify a locale, to format numbers according to the rules for locales other than the one used by the operating system. You will soon see how to tell the JSTL formatting actions to use a specific locale or figure out which one to use based on the Accept-Language header.
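As a small illustration of the difference between the default locale and an explicitly specified one, the following sketch formats the same value with NumberFormat for two locales. The class name NumberFormatDemo and the sample value are made up; the NumberFormat calls are standard.

```java
import java.text.NumberFormat;
import java.util.Locale;

class NumberFormatDemo {
    // Formats a value according to the conventions of the given locale.
    static String format(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double price = 1234.5;

        System.out.println(format(price, Locale.US));      // 1,234.5
        System.out.println(format(price, Locale.GERMANY)); // 1.234,5

        // Without an explicit locale, the JVM's default locale decides,
        // so the output of this line depends on where the code runs.
        System.out.println(NumberFormat.getNumberInstance().format(price));
    }
}
```

Note how the roles of the comma and the period are swapped between the two locales, which is exactly the kind of detail that confuses users when the wrong locale is applied.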

The DateFormat class works basically the same way, but how dates are written differs a lot more between locales than numbers do, because the day and month names are sometimes spelled out in the local language. The JSTL <fmt:formatDate> action, used to format date and time values, is based on the DateFormat class.
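The 5/2 confusion mentioned earlier is easy to reproduce with DateFormat. The sketch below formats February 5, 2003 for a few locales; the class name DateFormatDemo and the chosen date are made up for the illustration. The exact output, for instance whether the year is written with two or four digits, can vary with the Java version and its locale data, so no expected output is shown.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class DateFormatDemo {
    // Formats a date with one of the predefined styles for the given locale.
    static String format(Date date, int style, Locale locale) {
        return DateFormat.getDateInstance(style, locale).format(date);
    }

    public static void main(String[] args) {
        // February 5, 2003 in the JVM's default time zone.
        Date date = new GregorianCalendar(2003, Calendar.FEBRUARY, 5).getTime();

        // Short style: month first for the U.S., day first for the U.K.
        System.out.println(format(date, DateFormat.SHORT, Locale.US));
        System.out.println(format(date, DateFormat.SHORT, Locale.UK));

        // Long style: the month name is spelled out in the local language.
        System.out.println(format(date, DateFormat.LONG, Locale.GERMANY));
    }
}
```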

14.1.3 Using Localized Text

Automatic translation of numbers and dates into the local language is a great help, but until automatic translation software is a lot smarter than it is today, you have to translate all the text used in the application yourself. A set of Java classes can help you pick the right version for a specific locale.

The main class for dealing with localized resources (such as text, images, and sounds) is named java.util.ResourceBundle. This class is actually the abstract superclass for the two subclasses that do the real work, ListResourceBundle and PropertyResourceBundle, but it provides methods that let you get an appropriate subclass instance, hiding the details about which subclass actually provides the resources. Details about the difference between these two subclasses are beyond the scope of this book. Suffice it to say that the JSTL actions can use resources provided through either of them.

For most web applications, an instance of the PropertyResourceBundle is used. A PropertyResourceBundle instance is associated with a named set of localized text resources; a key identifies each resource. The keys and their corresponding text values are stored in a regular text file, known as a resource bundle file:

site_name=The Big Corporation Inc.
company_logo=/images/logo_en.gif
welcome_msg=Hello!

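Here three keys, site_name, company_logo, and welcome_msg, have been assigned values. The key is a string without spaces or other special characters, and the value is any text. If the value spans more than one line, the line break must be escaped with a backslash character (\). The backslash, the line break, and any leading whitespace on the next line are discarded when the file is read, so keep a space before the backslash if the words should remain separated:

multi_line_msg=This text value \
continues on the next line.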

The file must use the .properties extension, for instance sitetext.properties, and be located in the classpath used by the Java Virtual Machine (JVM). In the case of web applications, you can store the file in the application's WEB-INF/classes directory, because this directory is always included in the classpath.
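To experiment with the file format without placing a file on the classpath, you can feed the same key/value lines to a PropertyResourceBundle through a Reader. This is only a sketch for trying out the syntax, not how a web application normally loads its bundles; the class name BundleFormatDemo and the load( ) helper are made up, and the Reader-based constructor assumes Java 6 or later.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class BundleFormatDemo {
    // Parses text in .properties format into a resource bundle.
    static ResourceBundle load(String fileContent) {
        try {
            return new PropertyResourceBundle(new StringReader(fileContent));
        } catch (IOException e) {
            // Not expected when reading from a String.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The same lines you would put in sitetext.properties. The "\\" is a
        // single backslash in the data, escaping the line break that follows.
        String content =
            "site_name=The Big Corporation Inc.\n" +
            "welcome_msg=Hello!\n" +
            "multi_line_msg=This text value \\\n" +
            "    continues on the next line.\n";

        ResourceBundle bundle = load(content);
        System.out.println(bundle.getString("welcome_msg"));    // Hello!
        // The continuation is joined into one line; the indentation is dropped.
        System.out.println(bundle.getString("multi_line_msg"));
    }
}
```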

To localize an application, you create separate resource bundle files for each locale, all with the same main name (called the base name) but with unique suffixes to identify the locale. For instance, a file named sitetext_es_MX.properties, where es is the language code for Spanish, and MX is the country code for Mexico, can contain the text for the Mexican Spanish locale. The JSTL actions that deal with localized text find the resource bundle that most closely matches the selected locale or a default bundle if there is no match. We'll look at an example in detail in the next section.
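The naming scheme can be explored with the standard ResourceBundle.Control class (Java 6 or later), which lists the candidate locales for a requested locale from most to least specific. The sketch below turns them into filenames; the class name BundleNameDemo and the candidateFiles( ) helper are made up for the illustration. Keep in mind that this shows the scheme used by java.util.ResourceBundle itself, which may also consult the JVM's default locale before settling on the base bundle; the JSTL lookup may differ in such details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

class BundleNameDemo {
    // Lists the properties files considered for a locale, most specific first.
    static List<String> candidateFiles(String baseName, Locale locale) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        List<String> files = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            // toBundleName appends the locale suffix, or nothing for the base bundle.
            files.add(control.toBundleName(baseName, candidate) + ".properties");
        }
        return files;
    }

    public static void main(String[] args) {
        // Prints [sitetext_es_MX.properties, sitetext_es.properties, sitetext.properties]
        System.out.println(candidateFiles("sitetext", new Locale("es", "MX")));
    }
}
```

A practical consequence is that text shared by all Spanish-speaking locales can live in sitetext_es.properties, with only the country-specific differences placed in sitetext_es_MX.properties.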

Besides the ResourceBundle class, there's a class named java.text.MessageFormat you can use for messages composed of fixed text plus variable values, such as "An earthquake measuring 6.7 on the Richter scale hit Northridge, CA, on January 17, 1994.", where the magnitude (6.7), the place (Northridge, CA), and the date (January 17, 1994) are variable values. The JSTL actions support this type of formatted message as well, as you will see in the next section.
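Here is a minimal sketch of how the earthquake message could be produced with MessageFormat. The class name QuakeMessage, the format( ) helper, and the exact pattern string are assumptions made for the illustration; in a real application the pattern would normally be stored in a resource bundle so it can be translated.

```java
import java.text.MessageFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class QuakeMessage {
    // Placeholders {0}, {1}, and {2} are replaced by the variable values.
    // The number and date placeholders are formatted for the given locale.
    static final String PATTERN =
        "An earthquake measuring {0,number} on the Richter scale hit {1}, on {2,date,long}.";

    static String format(Locale locale, double magnitude, String place, Date when) {
        MessageFormat message = new MessageFormat(PATTERN, locale);
        return message.format(new Object[] { magnitude, place, when });
    }

    public static void main(String[] args) {
        Date when = new GregorianCalendar(1994, Calendar.JANUARY, 17).getTime();
        // Prints: An earthquake measuring 6.7 on the Richter scale hit
        // Northridge, CA, on January 17, 1994.
        System.out.println(format(Locale.US, 6.7, "Northridge, CA", when));
    }
}
```

Because the placeholders are numbered, a translated pattern is free to put the values in a different order than the English original.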
