Re: Saving Unicode emails from the server as HTML files, or something like that..



wolfgang wrote:
I have a problem with Japanese people :-) They send UTF-8 mails.

Your example is ISO-2022-JP, though, not UTF-8.


I fetch emails from an IMAP server and save them, including
attachments, as HTML files on disk. For a long time this has worked
roughly like this (in essence; to keep the explanation short I threw out
loads of stuff, e.g. the attachment handling and the HTML conversion):

use Mail::IMAPClient; # for the imap server connection
use MIME::Parser; # for the mime mail parsing
use MIME::Parser::Filer; # to store parser objects to disk
use MIME::WordDecoder; # to handle iso-8859-1 encoded strings

According to its manpage, MIME::WordDecoder only handles ISO-8859-*; for
everything else you have to write your own decoders. A quick test confirms this:

% cat test-mime-worddecoder
#!/usr/bin/perl
use MIME::WordDecoder;

my $h =
'=?iso-2022-jp?B?U09OWRskQiEhGyhCSWNoaW5vbWl5YS9GYWlsdXJlIEFuYWx5c2k=?=
=?iso-2022-jp?B?cyBJbml0aWF0aW9uIGZvcm06IDA2LTA4OTsgUURCIEpvYiAxNTk3Mg==?=
';

my $h2 = unmime($h);

print "$h2\n";
% ./test-mime-worddecoder
ignoring text in character set `ISO-2022-JP'
at ./test-mime-worddecoder line 9
ignoring text in character set `ISO-2022-JP'
at ./test-mime-worddecoder line 9
11

So now I remember why I don't use it and wrote my own implementation
instead. It's only a few lines anyway:

---8<------8<------8<------8<------8<------8<------8<------8<---
=head2 _decode_rfc2047

decode a string encoded according to RFC 2047

The returned string is in internal perl string representation and has
the UTF-8 flag set if it contains any non-ascii characters.

=cut

use Encode qw(decode);      # decode() comes from Encode, not from MIME::Words
use MIME::Words qw(:all);

sub _decode_rfc2047 {
    my ($enc) = @_;

    # split the header into [data, charset] pairs; charset is undef for plain text
    my @words = decode_mimewords($enc);

    my $dec = "";
    for (@words) {
        eval {
            $dec .= $_->[1] ? decode($_->[1], $_->[0]) : $_->[0];
        };
        if ($@) {
            # if decoding fails for any reason (usually unknown charset)
            # we just append the encoded word.
            $dec .= $_->[0];
        }
    }
    return $dec;
}
---8<------8<------8<------8<------8<------8<------8<------8<---
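
For illustration, here is a minimal sketch of the two steps hidden behind
decode_mimewords() and decode(), using only core modules (Encode and
MIME::Base64). The helper name decode_b_word is hypothetical and not part of
the code above. It handles a single B-encoded word only, not Q encoding and
not whole headers. The sample is the first encoded word from the test script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(decode_base64);

# Hypothetical helper for illustration: decode one B-encoded RFC 2047 word.
sub decode_b_word {
    my ($word) = @_;

    # An encoded word looks like =?charset?B?payload?=
    my ($charset, $payload) = $word =~ /^=\?([^?]+)\?[Bb]\?([^?]*)\?=$/
        or return $word;                  # not an encoded word: pass through

    my $bytes = decode_base64($payload);  # step 1: base64 -> raw bytes
    return decode($charset, $bytes);      # step 2: bytes -> Perl characters
}

my $word = '=?iso-2022-jp?B?U09OWRskQiEhGyhCSWNoaW5vbWl5YS9GYWlsdXJlIEFuYWx5c2k=?=';
my $text = decode_b_word($word);

# The result is a character string, so encode it explicitly on output.
binmode(STDOUT, ':encoding(UTF-8)');
print "$text\n";
```

The only non-ASCII character in this word is the ideographic space (U+3000)
after "SONY". Without the binmode line, printing it would trigger a "Wide
character in print" warning.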

hp

--
_ | Peter J. Holzer | You could also spare yourself [the
|_|_) | Sysadmin WSR/LUGA | discussion] if you simply spared
| | | hjp@xxxxxx | yourself it.
__/ | http://www.hjp.at/ | -- Ralph Angenendt in dang 2006-04-15