Nota Bene: This is a republication of a post that originally appeared in the 2013 Perl Advent Calendar.
These days, gettext is far and away the most widely-used localization (l10n) and internationalization (i18n) library for open-source software. So far, it has not been widely used in the Perl community, even though it’s the most flexible, capable, and easy-to-use solution, thanks to Locale::TextDomain.1 How easy? Let’s get started!
First, just use Locale::TextDomain. Say you’re creating an awesome new module, Awesome::Module. The CPAN distribution will be named Awesome-Module, so that’s the “domain” to use for its localizations. Just tell Locale::TextDomain the domain when you load it.
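That takes a single line; the import argument names the domain:

```perl
use Locale::TextDomain 'Awesome-Module';
```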
Locale::TextDomain will later use this string to look for the appropriate translation catalogs. But don’t worry about that just yet. Instead, start using it to translate user-visible strings in your code. With the assistance of Locale::TextDomain’s [comprehensive documentation], you’ll find it second nature to internationalize your modules in no time. For example, simple strings are denoted with the __ function.
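An illustrative call (the message itself is just an example):

```perl
say __ 'Greetings puny human!';
```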
If you need to specify variables, use __x.
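It interpolates named values into placeholders in the message; a sketch, assuming a $username variable:

```perl
say __x(
    'Thank you {sir}, may I have another?',
    sir => $username,
);
```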
Need to manage plurals? Use __n.
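It takes singular and plural messages followed by the count that decides between them; the messages here are illustrative:

```perl
say __n(
    'I found one matching record.',
    'I found multiple matching records.',
    $num_records,
);
```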
If $num_records is 1, the first phrase will be used. Otherwise the second.
Sometimes you gotta do both, mixing variables and plurals. __nx has got you covered.
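It combines the two calling conventions: the singular and plural messages, the count, and then the key/value pairs. An illustrative call, assuming a $num_files count:

```perl
say __nx(
    'One file has been deleted.',
    '{count} files have been deleted.',
    $num_files,
    count => $num_files,
);
```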
Congratulations! Your module is now internationalized. Wasn’t that easy? Make a habit of using these functions in all the modules in your distribution, always with the Awesome-Module domain, and you’ll be set.
Encode da Code
Locale::TextDomain is great, but it dates from a time when Perl character encoding was, shall we say, sub-optimal. It therefore took it upon itself to try to do the right thing, which is to detect the locale from the runtime environment and automatically encode as appropriate. Which might work okay if all you ever do is print localized messages — and never anything else.
If, on the other hand, you will be manipulating localized strings in your code, or emitting unlocalized text (such as that provided by the user or read from a database), then it’s probably best to coerce Locale::TextDomain to return Perl strings, rather than encoded bytes. There’s no formal interface for this in Locale::TextDomain, so we have to hack it a bit: set the $OUTPUT_CHARSET environment variable to “UTF-8” and then bind a filter. Don’t know what that means? Me neither. Just put this code somewhere in your distribution where it will always run early, before anything gets localized.
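A sketch of that hack, assuming the Awesome-Module domain (bind_textdomain_filter comes from Locale::Messages, which ships alongside Locale::TextDomain in libintl-perl):

```perl
use Locale::Messages qw(bind_textdomain_filter);
use Encode;

BEGIN {
    # Ask gettext for UTF-8 octets, then decode them into Perl strings.
    $ENV{OUTPUT_CHARSET} = 'UTF-8';
    bind_textdomain_filter 'Awesome-Module' => \&Encode::decode_utf8;
}
```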
You only have to do this once per domain. So even if you use Locale::TextDomain with the Awesome-Module domain in a bunch of your modules, the presence of this code in a single early-loading module ensures that strings will always be returned as Perl strings by the localization functions.
So what about output? There’s one more bit of boilerplate you’ll need to throw in. Or rather, put this into the main package that uses your modules to begin with, such as the command-line script the user invokes to run an application.

First, on the shebang line, follow Tom Christiansen’s advice and put -CAS in it (or set the $PERL_UNICODE environment variable to AS). Then use the POSIX setlocale function to set the appropriate locale for the runtime environment. How?
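Something along these lines, assuming Win32::Locale on Windows and the empty-string locale (meaning “defer to the environment”) everywhere else:

```perl
use v5.12;
use warnings;
use utf8;
use POSIX qw(setlocale LC_ALL);

BEGIN {
    if ($^O eq 'MSWin32') {
        # POSIX::setlocale cannot read the Windows locale settings,
        # so look them up with Win32::Locale.
        require Win32::Locale;
        setlocale LC_ALL, Win32::Locale::get_locale();
    } else {
        # The empty string tells setlocale to use the locale
        # environment variables (LC_ALL, LANG, etc.).
        setlocale LC_ALL, '';
    }
}

use Awesome::Module;
```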
Locale::TextDomain will notice the locale and select the appropriate translation catalog at runtime.
Is That All There Is?
Now what? Well, you could do nothing. Ship your code and those internationalized phrases will be handled just like any other string in your code.
But what’s the point of that? The real goal is to get these things translated. There are two parts to that process:
1. Parsing the internationalized strings from your modules and creating language-specific translation catalogs, or “PO files”, for translators to edit. These catalogs should be maintained in your source code repository.
2. Compiling the PO files into binary files, or “MO files”, and distributing them with your modules. These files should not be maintained in your source code repository.
Until a year ago, there was no Perl-native way to manage these processes. Locale::TextDomain ships with a sample Makefile demonstrating the appropriate use of the GNU gettext command-line tools, but that seemed a steep price for a Perl hacker to pay.
A better fit for the Perl hacker’s brain, I thought, is Dist::Zilla. So I wrote Dist::Zilla::LocaleTextDomain to encapsulate the use of the gettext utilities. Here’s how it works.
First, configure Dist::Zilla to compile localization catalogs for distribution by adding a couple of lines to your dist.ini.
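Assuming the stock ShareDir plugin distributes the shared files, the minimal configuration looks like:

```ini
[ShareDir]
[LocaleTextDomain]
```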
There are configuration attributes for the LocaleTextDomain plugin, such as where to find the PO files and where to put the compiled MO files. In case you didn’t use your distribution name as your localization domain in your modules, you’d set the textdomain attribute so that the plugin can find the translation catalogs.
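For instance, if the domain were My-App (an illustrative name) rather than the distribution name:

```ini
[LocaleTextDomain]
textdomain = My-App
```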
Check out the configuration docs for details on all available attributes.
At this point, the plugin doesn’t do much, because there are no translation
catalogs yet. You might see this line from
dzil build, though:
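It should be a note that language compilation was skipped for lack of a po directory, roughly:

```
[LocaleTextDomain] Skipping language compilation: directory po does not exist
```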
Let’s give it something to do!
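To create a French catalog, for example, pass the language code to the msg-init command (output approximate):

```
> dzil msg-init fr
Created po/fr.po.
```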
The msg-init command uses the GNU gettext utilities to scan your Perl source code and initialize the French catalog, po/fr.po. This file is now ready for translation! Commit it into your source code repository so your agile-minded French-speaking friends can find it. Use msg-init to create as many language files as you like:
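Language arguments may also carry regions and encodings; an illustrative run (output approximate):

```
> dzil msg-init de ja.JIS en_US.UTF-8 en_UK.UTF-8
Created po/de.po.
Created po/ja.po.
Created po/en_US.po.
Created po/en_UK.po.
```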
Each language has its own PO file. You can even have region-specific catalogs, such as the en_US and en_UK variants here. Each time a catalog is updated, the changes should be committed to the repository, like code. This allows the latest translations to always be available for compilation and distribution.
The output from
dzil build now looks something like:
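The exact lines depend on your setup, but the compilation step should show up amid the usual Dist::Zilla chatter:

```
> dzil build
[DZ] beginning to build Awesome-Module
[LocaleTextDomain] Compiling language files in po
[DZ] writing Awesome-Module in Awesome-Module-0.01
```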
The resulting MO files will be in the shared directory of your distribution:
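Each language gets an MO file under LocaleData/<language>/LC_MESSAGES, named for the text domain; for the catalogs above:

```
> find Awesome-Module-0.01/share -type f
Awesome-Module-0.01/share/LocaleData/de/LC_MESSAGES/Awesome-Module.mo
Awesome-Module-0.01/share/LocaleData/en_UK/LC_MESSAGES/Awesome-Module.mo
Awesome-Module-0.01/share/LocaleData/en_US/LC_MESSAGES/Awesome-Module.mo
Awesome-Module-0.01/share/LocaleData/fr/LC_MESSAGES/Awesome-Module.mo
Awesome-Module-0.01/share/LocaleData/ja/LC_MESSAGES/Awesome-Module.mo
```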
From here Module::Build or ExtUtils::MakeMaker will install these MO files
with the rest of your distribution, right where Locale::TextDomain can find
them at runtime. The PO files, on the other hand, won’t be used at all, so you
might as well exclude them from the distribution. Add this line to your
MANIFEST.SKIP to prevent the
po directory and its contents from being
included in the distribution:
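The pattern just anchors on the directory name:

```
^po/
```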
Mergers and Acquisitions
Of course no code base is static. In all likelihood, you’ll change your code
— and end up adding, editing, and removing localizable strings as a result.
You’ll need to periodically merge these changes into all of your translation
catalogs so that your translators can make the corresponding updates. That’s
what the msg-merge command is for:
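It reports each catalog as it goes; output approximately:

```
> dzil msg-merge
extracting gettext strings
Merging gettext strings into po/de.po
Merging gettext strings into po/en_UK.po
Merging gettext strings into po/en_US.po
Merging gettext strings into po/fr.po
Merging gettext strings into po/ja.po
```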
This command re-scans your Perl code and updates all of the language files. Old messages will be commented-out and new ones added. Commit the changes and give your translators a holler so they can keep the awesome going.
A confession: the msg-init and msg-merge commands don’t actually scan your source code. Sort of lied about that. Sorry. What they actually do is merge a template file into the appropriate catalog files. If this template file does not already exist, a temporary one will be created and discarded when the initialization or merging is done.
But projects commonly maintain a permanent template file, stored in the source code repository along with the translation catalogs. For this purpose, we have the msg-scan command. Use it to create or update the template, or POT file:
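By default it writes the template to the po directory, named for the text domain (output approximate):

```
> dzil msg-scan
extracting gettext strings into po/Awesome-Module.pot
```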
From here on in, the resulting .pot file will be used by msg-init and msg-merge instead of scanning your code all over again. But keep in mind that, if you do maintain a POT file, future merges will be a two-step process: run msg-scan to update the POT file, then msg-merge to merge its changes into the PO files:
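That is, roughly:

```
> dzil msg-scan
extracting gettext strings into po/Awesome-Module.pot
> dzil msg-merge
Merging gettext strings into po/de.po
Merging gettext strings into po/en_UK.po
Merging gettext strings into po/en_US.po
Merging gettext strings into po/fr.po
Merging gettext strings into po/ja.po
```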
Lost in Translation
One more thing, a note for translators. They can, of course, also use
msg-merge to update the catalogs they’re working on. But how
do they test their translations? Easy: use the
msg-compile command to
compile a single catalog:
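Point it at the PO file in question (the message counts here are illustrative):

```
> dzil msg-compile po/fr.po
[LocaleTextDomain] po/fr.po: 155 translated messages, 24 fuzzy translations.
```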
The resulting compiled catalog will be saved to the LocaleData subdirectory of the current directory, so it’s easily available to your app for testing.
Just be sure to tell Perl to include the current directory in the search path,
and set the
$LANGUAGE environment variable for your language. For example,
here’s how I test the [Sqitch] French catalog:
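Something like this (the paths and output are illustrative):

```
> dzil msg-compile po/fr.po
[LocaleTextDomain] po/fr.po: 155 translated messages, 24 fuzzy translations.
> LANGUAGE=fr perl -CAS -Ilib -I. bin/sqitch foo
"foo" n'est pas une commande valide
```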
Just be sure to delete the
LocaleData directory when you’re done — or at
least don’t commit it to the repository.
This may seem like a lot of steps, and it is. But once you have the basics in place — configuring the Dist::Zilla::LocaleTextDomain plugin, setting up the “textdomain filter”, and setting the locale in the application — there are just a few habits to get into:
- Use the functions __, __x, __n, and __nx to internationalize user-visible strings
- Run msg-scan and msg-merge to keep the catalogs up-to-date
- Keep your translators in the loop.
The Dist::Zilla::LocaleTextDomain plugin will do the rest.
What about Locale::Maketext, you ask? It has not, alas, withstood the test of time. For details, see Nikolai Prokoschenko’s epic 2009 polemic, “On the state of i18n in Perl.” See also Steffen Winkler’s presentation, Internationalisierungs-Framework auswählen (and the English translation by Aristotle Pagaltzis), from German Perl Workshop 2010.↩