Re: Reliable character encodings conversion



On Sep 30, 2008, at 8:20 AM, Hubert Łępicki wrote:

2008/9/30 James Gray <james@xxxxxxxxxxxxxxxxxxx>:
On Sep 30, 2008, at 7:30 AM, Hubert Łępicki wrote:

I am using the Iconv library wrapper to convert texts to UTF-8, but it's
throwing an "Iconv::IllegalSequence" exception.

You can add //TRANSLIT to the end of the "to" encoding to have Iconv
attempt to convert characters to reasonable equivalents in that encoding.
This is usually more helpful when your input is all one encoding and just
has some characters that won't translate well (like a UTF-8 … going to
ISO-8859-1).

Your case of mixed encodings is probably best handled with //IGNORE instead,
which asks Iconv to skip over any characters that cannot be converted. You
will lose some data with this, but it will convert what it can.

You can also use //TRANSLIT//IGNORE to convert what can be converted and
skip the rest.


Thanks, //IGNORE//TRANSLIT seems to help a bit - but it's not perfect.

You listed those backwards. Is that really what you tried? Does
reversing them make any difference?

James Edward Gray II
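
For readers on Ruby 2.0 or later, where Iconv is no longer part of the
standard library, the "skip what cannot be converted" behaviour discussed
above can be sketched with String#encode instead. This is a swapped-in
technique, not what was used in the thread. The helper names
convert_dropping and convert_marking are hypothetical, and as far as I know
String#encode has no direct counterpart to //TRANSLIT, so only the
//IGNORE-like behaviour is shown.

```ruby
# Hypothetical helper (not from the thread): drop anything that cannot be
# converted, roughly what //IGNORE asks Iconv to do.
def convert_dropping(text, from, to)
  text.dup.force_encoding(from)
      .encode(to, invalid: :replace, undef: :replace, replace: "")
end

# Hypothetical helper: leave a visible marker instead of silently dropping,
# so the data loss is at least noticeable in the output.
def convert_marking(text, from, to, marker = "?")
  text.dup.force_encoding(from)
      .encode(to, invalid: :replace, undef: :replace, replace: marker)
end

name = "Łępicki" # "Ł" and "ę" have no ISO-8859-1 equivalents

dropped = convert_dropping(name, "UTF-8", "ISO-8859-1") # => "picki"
marked  = convert_marking(name, "UTF-8", "ISO-8859-1")  # => "??picki"

# Bytes that are not valid in the source encoding are handled the same way.
cleaned = convert_dropping("abc\xFFdef", "UTF-8", "ISO-8859-1") # => "abcdef"
```

As with //IGNORE, the dropping variant loses data without any trace, so the
marking variant may be preferable when you need to see where characters
went missing.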