run_cmd && utf8 streams

Marc Lehmann schmorp at schmorp.de
Sat Jul 7 04:07:57 CEST 2012


On Fri, Jul 06, 2012 at 02:17:38PM +0200, Zsbán Ambrus <ambrus at math.bme.hu> wrote:
> It does not assume that.  Note the length checks I've left in.

The length checks don't solve the problem.

> believe these should guarantee that any error is detected at most a
> couple of input bytes later.

Or never...

> I did acknowledge that the errors might be detected later than would be
> possible,

Or never...

The length check increases the detection ability, but doesn't guarantee
detection.

> > It definitely cannot - it has no state to store the shift state, and no way
> > to detect code shifts.
> 
> It could leave appropriate shift bytes in the byte string buffer,

*If* somebody *added* some state ("byte string buffer" is just another name
for state), *then* you could store that in there, yes.

But since Encode has no byte string buffer or other means of storing state
it definitely cannot.
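
The shift-state problem is not specific to Perl. The following is a minimal
sketch in Python, using its standard codecs module rather than Encode, so it
only illustrates the general point. The sample text, the split position and
the helper names are assumptions made for the demonstration. Decoding each
chunk of an ISO-2022-JP stream on its own forgets the shift state at the
chunk boundary, whereas a decoder object that carries state across calls
does not.

```python
import codecs

ENCODING = "iso2022_jp"  # stateful encoding: escape sequences act as shifts

def decode_stateless(chunks):
    # Every chunk is decoded on its own; nothing is remembered between calls.
    return "".join(chunk.decode(ENCODING) for chunk in chunks)

def decode_stateful(chunks):
    # The decoder object keeps the shift state from one call to the next.
    decoder = codecs.getincrementaldecoder(ENCODING)()
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)

text = "日本"
data = text.encode(ENCODING)
# Assumed split point: after the 3-byte escape sequence and the first
# 2-byte character, so the second chunk starts without any shift bytes.
chunks = [data[:5], data[5:]]

stateless = decode_stateless(chunks)
stateful = decode_stateful(chunks)
print(stateless == text, stateful == text)
```

In this sketch the stateless variant raises no exception for the second
chunk, it simply reads the bytes in the wrong shift state, which is the
"or never" case for error detection.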

While it might have some merit to think about how to change Encode to
support state, the current API doesn't support it, and thus cannot decode
any kind of multibyte encoding incrementally (including utf-8).
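
For comparison, here is a rough sketch of what an API with state looks like,
again in Python's codecs module and not in Encode. The sample string and the
split position are assumptions. The point it shows is that the pending bytes
of a split utf-8 sequence have to live somewhere between calls, and that this
"somewhere" is decoder state.

```python
import codecs

text = "é€"
data = text.encode("utf-8")      # a 2-byte sequence followed by a 3-byte sequence
chunks = [data[:4], data[4:]]    # assumed split inside the 3-byte sequence

def try_stateless(chunk):
    # A one-shot decode has no place to keep an incomplete trailing sequence.
    try:
        return chunk.decode("utf-8")
    except UnicodeDecodeError:
        return None

stateless_results = [try_stateless(chunk) for chunk in chunks]

# The incremental decoder holds the incomplete sequence until more input arrives.
decoder = codecs.getincrementaldecoder("utf-8")()
first = decoder.decode(chunks[0])
pending, _flags = decoder.getstate()   # bytes buffered inside the decoder
second = decoder.decode(chunks[1], final=True)

print(stateless_results, repr(first), pending, repr(second))
```

The buffered bytes returned by getstate() are exactly the kind of state the
paragraph above says the current Encode API has no place for.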

There is also no way to extend the public API to do it properly.

"It works for some inputs but not others" doesn't qualify :)

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp at schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\


