run_cmd && utf8 streams
schmorp at schmorp.de
Fri Jul 6 14:08:45 CEST 2012
On Fri, Jul 06, 2012 at 12:54:28PM +0200, Zsbán Ambrus <ambrus at math.bme.hu> wrote:
> Perl's encode can actually handle incremental decoding of utf-8.
> Here's some example code for how to use Encode this way at the end of
That example doesn't decode incrementally: it simply gives up at the first
error, and it assumes that a partial character is the same as an encoding
error (which isn't allowed in utf-8).
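
To make the distinction concrete, here is a minimal sketch of what a
stateful incremental decoder does. It is in Python rather than Perl, only
because Python's standard codecs module exposes such a decoder directly; it
illustrates the behaviour being asked for and says nothing about what Encode
provides. The helper names (decode_chunks, is_encoding_error) are made up
for this example.

```python
import codecs

def decode_chunks(chunks):
    """Decode byte chunks as UTF-8, carrying partial characters across chunks."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    out = [decoder.decode(chunk) for chunk in chunks]
    # final=True tells the decoder no more input is coming, so a character
    # that is still incomplete at this point really is an error.
    out.append(decoder.decode(b"", final=True))
    return "".join(out)

def is_encoding_error(chunks):
    """Return True if the chunk sequence is not valid UTF-8 as a whole."""
    try:
        decode_chunks(chunks)
    except UnicodeDecodeError:
        return True
    return False

# "é" is the two bytes C3 A9; here it is split across two chunks.
text = decode_chunks([b"caf\xc3", b"\xa9 ok"])

# A character split across chunks is merely incomplete, not an error.
split_char_is_error = is_encoding_error([b"caf\xc3", b"\xa9"])
# C3 followed by a non-continuation byte is a real encoding error.
bad_sequence_is_error = is_encoding_error([b"caf\xc3", b"(ok"])
# An incomplete character only becomes an error at end of input.
truncated_is_error = is_encoding_error([b"caf\xc3"])
```

The point is that "incomplete so far" and "malformed" are reported
differently, and the first only turns into the second once the caller says
the input has ended.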
> There are some caveats. Encode still might not be able to
It definitely cannot - it has no state to store the shift state, and no way
to detect code shifts.
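
As an illustration of why shift state matters, here is a small sketch using
ISO-2022-JP, where an escape sequence switches the meaning of the bytes that
follow. Again this is Python's codecs module, chosen only because it keeps
the shift state inside the decoder object; the sample string and the cut
position are arbitrary choices for the example.

```python
import codecs

original = "abc こんにちは xyz"
data = original.encode("iso2022_jp")

# Cut just after the escape sequence (3 bytes) plus the first byte of a
# two-byte character, so the second chunk starts mid-character and has no
# escape sequence of its own to say which character set is active.
cut = data.index(b"\x1b") + 4
chunks = [data[:cut], data[cut:]]

# The incremental decoder remembers the shift state and the pending byte.
decoder = codecs.getincrementaldecoder("iso2022_jp")()
incremental = "".join(decoder.decode(c) for c in chunks)
incremental += decoder.decode(b"", final=True)

# A stateless decode of the second chunk starts in the ASCII state again,
# so the same bytes are misread.
stateless_tail = chunks[1].decode("iso2022_jp", errors="replace")
```

Without somewhere to keep that state between calls, the second chunk cannot
be decoded correctly no matter how the input is split.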
> > The perlio :encoding layer comes closest (it employs some hacks in more
> > recent perls), but still cannot decode multibyte data incrementally.
> My problem with the encoding layer is that it still has some bugs even
> in recent perl versions. It seems that some of these bugs I can't
> even work around without throwing away the layer completely.
It's very frustrating, yes - the original API was designed while completely
neglecting prior art, and at some point, somebody realised this is not
going to work and added another layer of hacks (cat_decode) that is not
really exposed via the normal Encode API and still fails to work properly.
Of course, the world is full of botched APIs (C multibyte has no way
to deal with generic encodings or Unicode, the new char16/char32 API
still uses an implementation-defined encoding, POSIX iconv is unusable as
originally specified, etc.).
But none of these are as botched as Encode :)
--
Marc Lehmann
schmorp at schmorp.de
The choice of a GNU generation
Deliantra, the free code+content MORPG
http://www.deliantra.net