google translate

Benjamin R. Haskell rxvt-unicode at benizi.com
Sun Jun 27 21:03:12 CEST 2010


On Sun, 27 Jun 2010, Marc Lehmann wrote:

> On Sun, Jun 27, 2010 at 02:38:17AM -0400, "Benjamin R. Haskell" <rxvt-unicode at benizi.com> wrote:
> > Your commented-out code was close.  I think you want locale_decode 
> > instead of locale_encode.
> 
> I don't know anything about google webservices (and this is unlikely 
> to have anything to do with urxvt, which expects unicode), but I 
> somehow don't believe that google delivers text in your local encoding 
> (how would it know :).

Google knows all... :-)

You're right, of course.  I thought that locale_{encode,decode} might 
follow the sometimes-confusing directionality of 
Encode::{encode,decode}.  That is, I assumed locale_decode would take a 
bunch of bytes containing UTF-8 and decode them (into a 
locale-appropriate encoding).


> My guess is that the webservice expects utf-8 and wants utf-8, so as 
> rxvt delivers and expects text in unicode, you first have to 
> encode/decode to/from utf-8, using e.g. the Encode module.

Not sure what distinction between unicode and utf-8 (a particular 
Unicode encoding) you're drawing here, unless you're just saying that 
rxvt expects strings to be perl-internal Unicode (UTF-8 flag on).
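
For illustration only, here is the direction of travel I have in mind, 
sketched in Python rather than Perl (bytes.decode and str.encode play 
roughly the roles of Encode::decode and Encode::encode).  The sample 
string is made up, and the assumption that the web service speaks UTF-8 
on the wire is just that, an assumption:

```python
# "encode" goes from characters to bytes: this is what would be sent
# to a web service that (by assumption) expects UTF-8.
text_out = "Grüße, 世界"
wire_bytes = text_out.encode("utf-8")

# "decode" goes from bytes to characters: this is the form a
# Unicode-aware terminal API would want to be handed.
text_in = wire_bytes.decode("utf-8")

print(len(text_in))     # counts characters
print(len(wire_bytes))  # counts bytes; larger, since some characters need 2-3 bytes
```

The point is only that "decode" always produces characters and "encode" 
always produces bytes; neither one converts between two byte encodings 
in a single step.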


> But again, I don't know what encoding WebService::Google::Language 
> expects, but it will almost certainly not match your local encoding 
> except by chance.

The chance of UTF-8 being the local encoding is pretty good these days 
apparently :-)

An updated version that uses Encode, allows multiple language pairs, 
and updates the selection when used from the selection popup is at: 
http://benizi.com/urxvt/google-translate

The perl:google-translate command is modified:

perl:google-translate:src:dst   (translates from src to dst)
perl:google-translate:src:dst:1 (...also updates the selection)
(If 'src' is empty, it's auto-detected)
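
A rough sketch of how such an argument string could be taken apart, 
written in Python for illustration.  This is not the code from the 
extension: parse_command and TranslateCommand are hypothetical names, 
and it assumes the handler receives the text after the "perl:" prefix:

```python
from typing import NamedTuple, Optional


class TranslateCommand(NamedTuple):
    src: Optional[str]       # None means "auto-detect"
    dst: str
    update_selection: bool


def parse_command(arg: str) -> TranslateCommand:
    """Parse 'google-translate:src:dst' or 'google-translate:src:dst:1'."""
    parts = arg.split(":")
    if len(parts) not in (3, 4) or parts[0] != "google-translate":
        raise ValueError(f"unrecognized command: {arg!r}")
    src = parts[1] or None                      # empty src -> auto-detect
    dst = parts[2]
    update = len(parts) == 4 and parts[3] == "1"
    return TranslateCommand(src, dst, update)


print(parse_command("google-translate:en:fr"))
print(parse_command("google-translate::de:1"))
```

Language codes such as zh-CN contain no colon, so splitting on ':' is 
safe for the codes listed in this message.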

It still uses the gt_lang.src and gt_lang.dst resources, but now as 
comma-separated lists.  The default src is '' (auto-detect); the default 
dst is equivalent to 'en,fr,it,de,es,pt,ru,hi,zh-CN,zh-TW,ja,ko,el,ar,iw', 
which covers English and the most common translation languages (in the 
U.S. English market): FIGS (France, Italy, Germany, Spain), BRIC (Brazil, 
Russia, India, China), and CJK, plus Greek, Arabic, and Hebrew (to test 
some other charsets).

It also adds the gt_lang.pairs resource, which is a comma-separated set 
of src:dst pairs.  (The .src and .dst resources get expanded via 
Cartesian product, which can be unwieldy.)  The gt_lang.label resource 
provides the label text and defaults to 'Google translate'.  If the 
gt_lang.codes resource is non-empty, the language codes aren't converted 
into names.  (Disables the 'en' => 'English' mapping.)
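
To show why the Cartesian product gets unwieldy, here is a small Python 
sketch of the expansion.  The helper names are hypothetical, and the way 
explicit pairs are appended after the product is an assumption made for 
the example, not a description of what the extension does:

```python
from itertools import product
from typing import List, Tuple


def split_list(value: str) -> List[str]:
    """Split a comma-separated resource value; '' yields [''] (auto-detect)."""
    return [item.strip() for item in value.split(",")]


def expand_pairs(src: str, dst: str, pairs: str = "") -> List[Tuple[str, str]]:
    """Every src combined with every dst, then any explicit src:dst pairs."""
    combos = list(product(split_list(src), split_list(dst)))
    for pair in split_list(pairs):
        if pair:                                # skip the empty default
            s, _, d = pair.partition(":")
            combos.append((s, d))
    return combos


default_dst = "en,fr,it,de,es,pt,ru,hi,zh-CN,zh-TW,ja,ko,el,ar,iw"
print(len(expand_pairs("", default_dst)))       # one entry per dst language
print(len(expand_pairs("en,fr,de", default_dst)))
print(expand_pairs("en", "fr", pairs="ru:en,ja:en"))
```

With the default dst list, each additional src language adds another 
fifteen menu entries, which is the motivation for listing only the pairs 
you care about.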

Anything with wide/variable-width/RTL characters displays weirdly (as 
might be expected), but seems to copy-paste via X selection just fine.

-- 
Best,
Ben



