
theory.pm

Sqitch on FLOSS Weekly

§

Yours truly was featured in this week’s episode of FLOSS Weekly, talking about Sqitch. I feel pretty good about this interview, despite continually banging on my legs, the table, and the mic. It’s interesting to try to communicate what Sqitch is about purely by talking.

If it’s enough to get you interested in giving it a try, try [installing it] and working through one of the tutorials.

Localize Your Perl Apps with this One Weird Trick

§

Nota Bene: This is a republication of a post that originally appeared in the 2013 Perl Advent Calendar.


These days, gettext is far and away the most widely-used localization (l10n) and internationalization (i18n) library for open-source software. So far, it has not been widely used in the Perl community, even though it’s the most flexible, capable, and easy-to-use solution, thanks to Locale::TextDomain.1 How easy? Let’s get started!

Module Internationale

First, just use Locale::TextDomain. Say you’re creating an awesome new module, Awesome::Module. The CPAN distribution will be named Awesome-Module, so that’s the “domain” to use for its localizations. Just let Locale::TextDomain know:

Establish Dominion
use Locale::TextDomain 'Awesome-Module';

Locale::TextDomain will later use this string to look for the appropriate translation catalogs. But don’t worry about that just yet. Instead, start using it to translate user-visible strings in your code. With the assistance of the Locale::TextDomain’s [comprehensive documentation], you’ll find it second nature to internationalize your modules in no time. For example, simple strings are denoted with __:

Trekkie Greetings
say __ 'Greetings puny human!';

If you need to specify variables, use __x:

Animal Thanks
say __x(
   'Thank you {sir}, may I have another?',
   sir => $username,
);

Need to manage plurals? Use __n:

Obligatory Python Reference
say __n(
    'I will not buy this record, it is scratched.',
    'I will not buy these records, they are scratched.',
    $num_records
);

If $num_records is 1, the first phrase will be used. Otherwise the second.

Sometimes you gotta do both, mix variables and plurals. __nx has got you covered there:

Deeper Understanding
say __nx(
    'One item has been grokked.',
    '{count} items have been grokked.',
    $num_items,
    count => $num_items,
);

Congratulations! Your module is now internationalized. Wasn’t that easy? Make a habit of using these functions in all the modules in your distribution, always with the Awesome-Module domain, and you’ll be set.

Encode da Code

Locale::TextDomain is great, but it dates from a time when Perl character encoding was, shall we say, sub-optimal. It therefore took it upon itself to try to do the right thing, which is to detect the locale from the runtime environment and automatically encode as appropriate. Which might work okay if all you ever do is print localized messages — and never anything else.

If, on the other hand, you will be manipulating localized strings in your code, or emitting unlocalized text (such as that provided by the user or read from a database), then it’s probably best to coerce Locale::TextDomain to return Perl strings, rather than encoded bytes. There’s no formal interface for this in Locale::TextDomain, so we have to hack it a bit: set the $OUTPUT_CHARSET environment variable to “UTF-8” and then bind a filter. Don’t know what that means? Me neither. Just put this code somewhere in your distribution where it will always run early, before anything gets localized:

Control Decode
use Locale::Messages qw(bind_textdomain_filter);
use Encode;
BEGIN {
    $ENV{OUTPUT_CHARSET} = 'UTF-8';
    bind_textdomain_filter 'Awesome-Module' => \&Encode::decode_utf8;
}

You only have to do this once per domain. So even if you use Locale::TextDomain with the Awesome-Module domain in a bunch of your modules, the presence of this code in a single early-loading module ensures that strings will always be returned as Perl strings by the localization functions.

Environmental Safety

So what about output? There’s one more bit of boilerplate you’ll need to throw in. Or rather, put this into the main package that uses your modules to begin with, such as the command-line script the user invokes to run an application.

First, on the shebang line, follow Tom Christiansen’s advice and put -CAS in it (or set the $PERL_UNICODE environment variable to AS). Then use the POSIX setlocale function to set the appropriate locale for the runtime environment. How? Like this:

Localization Boilerplate
#!/usr/bin/perl -CAS

use v5.12;
use warnings;
use utf8;
use POSIX qw(setlocale);
BEGIN {
    if ($^O eq 'MSWin32') {
        require Win32::Locale;
        setlocale POSIX::LC_ALL, Win32::Locale::get_locale();
    } else {
        setlocale POSIX::LC_ALL, '';
    }
}

use Awesome::Module;

Locale::TextDomain will notice the locale and select the appropriate translation catalog at runtime.

Is that All There Is?

Now what? Well, you could do nothing. Ship your code and those internationalized phrases will be handled just like any other string in your code.

But what’s the point of that? The real goal is to get these things translated. There are two parts to that process:

  1. Parsing the internationalized strings from your modules and creating language-specific translation catalogs, or “PO files”, for translators to edit. These catalogs should be maintained in your source code repository.

  2. Compiling the PO files into binary files, or “MO files”, and distributing them with your modules. These files should not be maintained in your source code repository.

Until a year ago, there was no Perl-native way to manage these processes. Locale::TextDomain ships with a sample Makefile demonstrating the appropriate use of the GNU gettext command-line tools, but that seemed a steep price for a Perl hacker to pay.

A better fit for the Perl hacker’s brain, I thought, is Dist::Zilla. So I wrote Dist::Zilla::LocaleTextDomain to encapsulate the use of the gettext utilities. Here’s how it works.

First, configure Dist::Zilla to compile localization catalogs for distribution by adding these lines to your dist.ini file:

LocaleTextDomain in your dist.ini
[ShareDir]
[LocaleTextDomain]

There are configuration attributes for the LocaleTextDomain plugin, such as where to find the PO files and where to put the compiled MO files. In case you didn’t use your distribution name as your localization domain in your modules, for example:

Go with a Java-y Domain
use Locale::TextDomain 'com.example.perl-libawesome';

Then you’d set the textdomain attribute so that the LocaleTextDomain plugin can find the translation catalogs:

The Domain in DNS
[LocaleTextDomain]
textdomain = com.example.perl-libawesome

Check out the configuration docs for details on all available attributes.

At this point, the plugin doesn’t do much, because there are no translation catalogs yet. You might see this line from dzil build, though:

Nothing to See Here
[LocaleTextDomain] Skipping language compilation: directory po does not exist

Let’s give it something to do!

Locale Motion

To add a French translation file, use the msg-init command2:

French Nouveau
% dzil msg-init fr
Created po/fr.po.

The msg-init command uses the GNU gettext utilities to scan your Perl source code and initialize the French catalog, po/fr.po. This file is now ready for translation! Commit it into your source code repository so your agile-minded French-speaking friends can find it. Use msg-init to create as many language files as you like:
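A freshly initialized catalog is a plain-text file of msgid/msgstr pairs. The greeting from earlier, for example, would appear something like this (the source reference line here is illustrative), with the msgstr left empty for the translator to fill in:

```
#: lib/Awesome/Module.pm:10
msgid "Greetings puny human!"
msgstr ""
```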

Multilingualism
% dzil msg-init de ja.JIS en_US.UTF-8 en_UK.UTF-8
Created po/de.po.
Created po/ja.po.
Created po/en_US.po.
Created po/en_UK.po.

Each language has its own PO file. You can even have region-specific catalogs, such as the en_US and en_UK variants here. Each time a catalog is updated, the changes should be committed to the repository, like code. This allows the latest translations to always be available for compilation and distribution. The output from dzil build now looks something like:

Catalogs Plugged
po/fr.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/ja.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/en_US.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.
po/en_UK.po: 10 translated messages, 1 fuzzy translation, 0 untranslated messages.

The resulting MO files will be in the shared directory of your distribution:

She Want Mo Mo Mo
% find Awesome-Module-0.01/share -type f
Awesome-Module-0.01/share/LocaleData/de/LC_MESSAGES/Awesome-Module.mo
Awesome-Module-0.01/share/LocaleData/en_UK/LC_MESSAGES/Awesome-Module.mo
Awesome-Module-0.01/share/LocaleData/en_US/LC_MESSAGES/Awesome-Module.mo
Awesome-Module-0.01/share/LocaleData/ja/LC_MESSAGES/Awesome-Module.mo

From here Module::Build or ExtUtils::MakeMaker will install these MO files with the rest of your distribution, right where Locale::TextDomain can find them at runtime. The PO files, on the other hand, won’t be used at all, so you might as well exclude them from the distribution. Add this line to your MANIFEST.SKIP to prevent the po directory and its contents from being included in the distribution:

No PO
^po/

Mergers and Acquisitions

Of course no code base is static. In all likelihood, you’ll change your code — and end up adding, editing, and removing localizable strings as a result. You’ll need to periodically merge these changes into all of your translation catalogs so that your translators can make the corresponding updates. That’s what the msg-merge command is for:

Merge Management
% dzil msg-merge
extracting gettext strings
Merging gettext strings into po/de.po
Merging gettext strings into po/en_UK.po
Merging gettext strings into po/en_US.po
Merging gettext strings into po/ja.po

This command re-scans your Perl code and updates all of the language files. Old messages will be commented-out and new ones added. Commit the changes and give your translators a holler so they can keep the awesome going.
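In the updated catalogs, entries for strings that no longer appear in the code are kept as obsolete entries, which gettext marks with `#~`. A hypothetical example (the French text is my own illustration):

```
#~ msgid "Greetings puny human!"
#~ msgstr "Salutations, chétif humain !"
```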

Template Scan

The msg-init and msg-merge commands don’t actually scan your source code. Okay, I sort of lied about that. Sorry. What they actually do is merge a template file into the appropriate catalog files. If this template file does not already exist, a temporary one will be created and discarded when the initialization or merging is done.

But projects commonly maintain a permanent template file, stored in the source code repository along with the translation catalogs. For this purpose, we have the msg-scan command. Use it to create or update the template, or POT file:

Scanners
% dzil msg-scan
extracting gettext strings into po/Awesome-Module.pot

From here on in, the resulting .pot file will be used by msg-init and msg-merge instead of scanning your code all over again. But keep in mind that, if you do maintain a POT file, future merges will be a two-step process: First run msg-scan to update the POT file, then msg-merge to merge its changes into the PO files:

% dzil msg-scan
extracting gettext strings into po/Awesome-Module.pot
% dzil msg-merge
Merging gettext strings into po/de.po
Merging gettext strings into po/en_UK.po
Merging gettext strings into po/en_US.po
Merging gettext strings into po/ja.po

Lost in Translation

One more thing: a note for translators. They can, of course, also use msg-scan and msg-merge to update the catalogs they’re working on. But how do they test their translations? Easy: use the msg-compile command to compile a single catalog:

Translation Compilation
% dzil msg-compile po/fr.po
[LocaleTextDomain] po/fr.po: 195 translated messages.

The resulting compiled catalog will be saved to the LocaleData subdirectory of the current directory, so it’s easily available to your app for testing. Just be sure to tell Perl to include the current directory in the search path, and set the $LANGUAGE environment variable for your language. For example, here’s how I test the [Sqitch] French catalog:

Translation Compilation
% dzil msg-compile po/fr.po
[LocaleTextDomain] po/fr.po: 148 translated messages, 36 fuzzy translations, 27 untranslated messages.
% LANGUAGE=fr perl -Ilib -CAS -I. bin/sqitch foo
"foo" n'est pas une commande valide

Just be sure to delete the LocaleData directory when you’re done — or at least don’t commit it to the repository.

TL;DR

This may seem like a lot of steps, and it is. But once you have the basics in place — configuring the Dist::Zilla::LocaleTextDomain plugin, setting up the “textdomain filter”, and setting the locale in the application — there are just a few habits to get into:

  • Use the functions __, __x, __n, and __nx to internationalize user-visible strings
  • Run msg-scan and msg-merge to keep the catalogs up-to-date
  • Keep your translators in the loop.

The Dist::Zilla::LocaleTextDomain plugin will do the rest.


  1. What about Locale::Maketext, you ask? It has not, alas, withstood the test of time. For details, see Nikolai Prokoschenko’s epic 2009 polemic, “On the state of i18n in Perl.” See also Steffen Winkler’s presentation, Internationalisierungs-Framework auswählen (and the English translation by Aristotle Pagaltzis), from German Perl Workshop 2010.

  2. The msg-init function — like all of the dzil msg-* commands – uses the GNU gettext utilities under the hood. You’ll need a reasonably modern version in your path, or else it won’t work.

Lexical Subroutines

§

Ricardo Signes:

One of the big new experimental features in Perl 5.18.0 is lexical subroutines. In other words, you can write this:

my sub quickly { ... }
my @sorted = sort quickly @list;

my sub greppy (&@) { ... }
my @grepped = greppy { ... } @input;

These two examples show cases where lexical references to anonymous subroutines would not have worked. The first argument to sort must be a block or a subroutine name, which leads to awful code like this:

sort { $subref->($a, $b) } @list

With our greppy, above, we get to benefit from the parser-affecting behaviors of subroutine prototypes.
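To make the sort case concrete, here’s a minimal sketch of my own (requires Perl 5.18, with the experimental feature explicitly enabled):

```perl
use v5.18;
use feature 'lexical_subs';
no warnings 'experimental::lexical_subs';

# A lexical comparator: callable by name in sort,
# but invisible outside this scope and to method dispatch.
my sub by_length { length($a) <=> length($b) }

my @sorted = sort by_length qw(pear fig banana);
say "@sorted";    # fig pear banana
```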

My favorite tidbit about this feature? Because lexical subs are lexical, and method-dispatch is package-based, lexical subs are not subject to method lookup and dispatch! This just might alleviate the confusion of methods and subs, as chromatic complained about just yesterday. Probably doesn’t solve the problem for imported subs, though.

Multirow Database Updates

§

William Blunn:

So, given a list of updates to apply we could effect them using the following steps:

  1. Use CREATE TEMPORARY TABLE to create a temporary table to hold the updates
  2. Use INSERT to populate the temporary table with the updates
  3. Use UPDATE … FROM to update the target table using updates in the temporary table
  4. Use DROP TABLE to drop the temporary table

So in the example above we can reduce five statements to four. This isn’t a significant improvement in this case. But now the number of statements is no longer directly dependent on the number of rows requiring updates.

Even if we wanted to update a thousand rows with different values, we could still do it with four statements.

Or you could just use one statement. Here’s how to do it with a CTE on PostgreSQL 9.2 and higher:

PostgreSQL Multirow Update with CTE
WITH up(name, salary) AS ( VALUES
     ('Jane',  1200),
     ('Frank', 1100),
     ('Susan', 1175),
     ('John',  1150)
)
UPDATE staff
   SET salary = up.salary
  FROM up
 WHERE staff.name = up.name;

Still on PostgreSQL 9.1 or lower? Use a subselect in the FROM clause instead:

PostgreSQL Multirow Update with Subselect
UPDATE staff
   SET salary = up.salary
   FROM (VALUES
       ('Jane',  1200),
       ('Frank', 1100),
       ('Susan', 1175),
       ('John',  1150)
   ) AS up(name, salary)
 WHERE staff.name = up.name;

Stuck with MySQL or Oracle? Use a UNION query in a second table:

MySQL & Oracle Multirow Update
UPDATE staff, (
         SELECT 'Jane' AS name, 1200 AS salary
   UNION SELECT 'Frank',        1100
   UNION SELECT 'Susan',        1175
   UNION SELECT 'John',         1150
) AS up
   SET staff.salary = up.salary
 WHERE staff.name = up.name;

Using SQLite? Might make sense to use a temporary table for thousands or millions of rows. But for just a few, use a CASE expression:

SQLite Multirow Update
UPDATE staff
   SET salary = CASE name
       WHEN 'Jane'  THEN 1200
       WHEN 'Frank' THEN 1100
       WHEN 'Susan' THEN 1175
       WHEN 'John'  THEN 1150
   END
 WHERE name IN ('Jane', 'Frank', 'Susan', 'John');

If you need to support multiple database architectures, sure, use something like DBIx::MultiRow to encapsulate things. But if, like most of us, you’re on one database for an app, I can’t recommend strongly enough how well it pays to get to know your database well.

TPF To Revamp Grants

§

Alberto Simões:

Nevertheless, this lack of “lower than $3000” grant proposals, and the fact that lot of people have been discussing (and complaining) about this value being too low, the Grants Committee is starting a discussion on rewriting and reorganizing the way it works. Namely, in my personal blog I opened a discussion about the Grants Committee some time ago, and had plenty of feedback, that will be helpful for our internal discussion.

This is great news. I would love to see more and more ambitious grant proposals, as well as awards people can subsist on. I look forward to seeing the new rules.

Mopping the Moose

§

Stevan Little:

I spent much of last week on vacation with the family so very little actual coding got done on the p5-mop, but instead I did a lot of thinking. My next major goal for the p5-mop is to port a module written in Moose, in particular, one that uses many different Moose features. The module I have chosen to port is Bread::Board and I chose it for two reasons; first, it was the first real module that I wrote using Moose and second, it makes heavy use of a lot of Moose’s features.

I’m so happy to see Stevan making progress on the Perl 5 MOP again.

A Perl Blog

§

I have been unsatisfied with Just a Theory for some time. I started that blog in 2004 more or less for fun, thinking it would be my permanent home on the internet. And it has been. But the design, while okay in 2004, is just awful by today’s standards. A redesign is something I have planned to do for quite some time.

I had also been thinking about my audience. Or rather, audiences. I’ve blogged about many things, but while a few dear family members might want to read everything I ever post, most folks, I think, are interested in only a subset of topics. Readers of Just a Theory came for posts about Perl, or PostgreSQL, or culture, travel, or politics. But few came for all those topics, in my estimation.

More recently, a whole bunch of top-level domains have opened up, often with the opportunity for anyone to register them. I was lucky enough to snag theory.pm and theory.pl, thinking that perhaps I would create a site just for blogging about Perl. I also nabbed theory.so, which I might dedicate to database-related blogging, and theory.me, which would be my personal blog (travel, photography, cultural essays, etc.).

And then there is Octopress. A blogging engine for hackers. Perfect for me. Hard to imagine something more appropriate (unless it was written in Perl). It seemed like a good opportunity to partition my online blogging.

So here we are with my first partition. theory.pm is a Perl blog. Seemed like the perfect name. I fiddled with it off and on for a few months, often following Matt Gemmell’s Advice, and I’m really happy with it. The open-source fonts Source Sans Pro and Source Code Pro, from Adobe, look great. The source code examples are beautifully marked up and displayed using the Solarized color scheme (though presentation varies in feed readers). Better still, it’s equally attractive and readable on computers, tablets and phones, thanks to the foundation laid by Aron Cedercrantz’s BlogTheme.

I expect to fork this code to create a database blog soon, and then perhaps put together a personal blog. Maybe the personal blog will provide link posts for posts on the other sites, so that if anyone really wants to read everything, they can. I haven’t decided yet.

In the meantime, now that I have a dedicated Perl blog, I guess I’ll have to start writing more Perl-related stuff. I’m starting with some posts about the state of exception handling in Perl 5, the first of which is already up. Stay tuned for more.

Trying Times

§

Exception handling is a bit of a pain in Perl. Traditionally, we use eval {}:

Perl eval
eval {
    foo();
};
if (my $err = $@) {
    # Inspect $err…
}

The use of the if block is a bit unfortunate; worse is the use of the global $@ variable, which has inflicted unwarranted pain on developers over the years1. Many Perl hackers put Try::Tiny to work to circumvent these shortcomings:

Try::Tiny
try {
    foo();
} catch {
    # Inspect $_…
};

Alas, Try::Tiny introduces its own idiosyncrasies, particularly its use of subroutine references rather than blocks. While a necessity of a pure-Perl implementation, it prevents returning from the calling context. One must work around this deficiency by checking return values:

Return from Try::Tiny
my $rv = try {
   f();
} catch {
   # …
};

if (!$rv) {
   return;
}

I can’t tell you how often this quirk burns me.