run_cmd && utf8 streams

Marc Lehmann schmorp at schmorp.de
Fri Jul 6 08:23:39 CEST 2012


On Thu, Jul 05, 2012 at 06:45:34PM -0600, Darin McBride <darin.mcbride at shaw.ca> wrote:
> Is there a recommended way to run commands that might output utf8 streams 
> (e.g., Japanese)?

The recommended way, really, is to run them in whatever way you like.
Nowadays, all I/O is done in octets, so that's what you receive.

If you want unicode text, gather all the output in a scalar first and
then use Encode::decode to turn the utf-8 octets into unicode codepoints.
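
A minimal sketch of that whole-buffer approach, assuming the command's
output has already been collected. The $octets value here is a hypothetical
stand-in for that collected output, not something produced by a real command:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for the gathered command output: the utf-8
# octets for three Japanese characters followed by a newline.
my $octets = "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e\n";

# Decode once, after everything has arrived.
my $text = Encode::decode("UTF-8", $octets);

# length counts octets before decoding and codepoints afterwards.
printf "%d octets, %d characters\n", length $octets, length $text;
```

Decoding only once at the end sidesteps the question of characters that are
split across reads, at the cost of holding all output in memory.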

As for incremental decoding, perl unfortunately doesn't have an interface for
this kind of thing (Encode cannot incrementally encode or decode).

The perlio :encoding layer comes closest (it employs some hacks in more
recent perls), but still cannot decode multibyte data incrementally.

For incremental decoding, your best bet is to implement some state machine
and parse the utf-8 with a regex.
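
One possible shape for such a regex-based decoder is sketched below. The
names $utf8_char, $pending and feed_chunk are made up for illustration, and
the sketch assumes the input is valid utf-8: a stray invalid byte would sit
in $pending forever, so real code would need to detect and skip it.

```perl
use strict;
use warnings;
use Encode ();

# Matches one octet sequence shaped like a complete utf-8 character
# (a lead byte followed by the right number of continuation bytes).
my $utf8_char = qr/
      [\x00-\x7f]
    | [\xc2-\xdf][\x80-\xbf]
    | [\xe0-\xef][\x80-\xbf]{2}
    | [\xf0-\xf4][\x80-\xbf]{3}
/x;

# The only state kept between calls: octets of a character that has
# not fully arrived yet.
my $pending = "";

# Hypothetical helper: feed one chunk of octets, get back the text
# that can be decoded so far.
sub feed_chunk {
    my ($chunk) = @_;
    $pending .= $chunk;

    # Cut off the longest prefix made up only of complete characters;
    # whatever is left stays in $pending for the next call.
    $pending =~ s/\A((?:$utf8_char)*)//;
    my $complete = $1;

    return Encode::decode("UTF-8", $complete);
}

# Two characters, with the read boundary in the middle of the second one.
my @chunks = ("\xe6\x97\xa5\xe6", "\x9c\xac");
my $text = join "", map { feed_chunk($_) } @chunks;

printf "%d characters decoded, %d octets pending\n",
    length $text, length $pending;
```

The regex only checks the shape of each sequence; Encode::decode still does
the actual conversion, so overlong or otherwise illegal sequences that happen
to match the shape are left to its usual handling.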

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp at schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\


