Re: How to clean an xml files from non-utf-8 chars?
- From: Gregory Brown <gregory.t.brown@xxxxxxxxx>
- Date: Wed, 17 Sep 2008 13:31:17 -0500
On Wed, Sep 17, 2008 at 12:47 PM, Jeremy Hinegardner
<jeremy@xxxxxxxxxxxxxxx> wrote:
On Wed, Sep 17, 2008 at 09:44:23PM +0900, James Gray wrote:
On Sep 17, 2008, at 4:07 AM, Krzysieq wrote:
I have a problem. I'm trying to use Ruby to parse some JMeter test results
that are stored in XML files. Unfortunately, while they should be UTF-8,
some of them aren't, probably because some of the database data isn't. In
any case, this breaks other tools, such as XSLT transformation and
anything else that relies on the XML files being UTF-8.
Does anyone know how to get rid of such characters?
If you can figure out the encoding they are actually in, I recommend using
Iconv's transliterate mode:
require "iconv"
Iconv.conv("UTF-8//TRANSLIT", old_encoding_name, data)
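On Ruby 1.9 and later, where Iconv was deprecated and then removed from the standard library, roughly the same conversion can be sketched with String#encode once the real source encoding is known. This swaps in String#encode for Iconv and does no transliteration. The ISO-8859-1 source encoding and the sample bytes are assumptions for illustration only:

```ruby
# Hypothetical sample data: "cafe" with an accented e, as ISO-8859-1 bytes
# (0xE9 is that letter in Latin-1, but is not valid UTF-8 on its own).
data = "caf\xE9".b

# Assumed source encoding; substitute whatever the data is really in.
old_encoding_name = "ISO-8859-1"

# Tag the raw bytes with their real encoding, then transcode to UTF-8.
utf8 = data.force_encoding(old_encoding_name).encode("UTF-8")

puts utf8
puts utf8.valid_encoding?
```

This only helps when the whole input really is in one known encoding; if the guess is wrong, the result is valid UTF-8 but the wrong characters.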
This is the approach we have taken in some of our code; basically we wanted to
replicate the 'iconv -c' behavior. Does TRANSLIT do this? I've never used
that mode before.
module UTF8
module Cleanable
#
# Converts the string representation of this class to a utf8 clean
# string. This assumes that #to_s on the object will result in a utf8
# string. All chars that are not valid utf8 char sequences will be
# silently dropped.
To silently drop chars with Iconv, you'd want to do:
Iconv.conv("UTF-8//IGNORE", old_encoding_name, data)
TRANSLIT just works a little harder: when a character can't be
represented in the target encoding, it tries to approximate it with one
or more similar-looking characters, if possible.
I'm not sure if it drops chars that can't be transliterated...
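For data that is mostly UTF-8 with a few stray bytes, a comparable drop-the-bad-bytes cleanup can be sketched without Iconv on Ruby 2.1 or later using String#scrub. This is a substitute technique, not the Iconv call above, and the helper name clean_utf8 and the sample bytes are hypothetical:

```ruby
# Hypothetical helper: treat raw bytes as UTF-8 and drop any byte
# sequences that are not valid UTF-8.
def clean_utf8(data)
  # dup so the caller's string is not re-tagged in place.
  data.dup.force_encoding("UTF-8").scrub("")
end

# Hypothetical sample: a lone 0xE9 byte (Latin-1 data) inside UTF-8 XML.
dirty = "<name>caf\xE9 ok</name>".b

cleaned = clean_utf8(dirty)
puts cleaned
puts cleaned.valid_encoding?
```

Dropping bytes loses data, so this is best kept as a last resort for when the original encoding cannot be determined.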
-greg
--
Technical Blaag at: http://blog.majesticseacreature.com | Non-tech
stuff at: http://metametta.blogspot.com