Re: HTML Parser: Which one is better?
- From: Erik Hollensbe <erik@xxxxxxxxxxxxx>
- Date: Thu, 31 May 2007 23:11:06 -0700
On 2007-05-31 02:36:57 -0700, "Richard Conroy" <richard.conroy@xxxxxxxxx> said:
> On 5/31/07, *** Davies <rasputnik@xxxxxxxxx> wrote:
>> Hpricot is a good starting point.
>
> Yeah, Hpricot is good, but in general the quality of the Ruby web scraping
> choices is pretty impressive. There are variants that are built on top
> of Hpricot but provide an even simpler API.
>
> However, your second problem, where you encounter alternate encodings,
> is a bit trickier. To do any kind of real work with multiple code
> pages, you want to convert each page to Unicode (UTF-8) at fetch time.

I've had great success with this. Just make sure you're using Ruby 1.8.5 or later (which includes the NKF library), and you should be fine.
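On a current Ruby, the convert-at-fetch-time idea can be sketched roughly as follows. This sketch swaps NKF for String#encode (Ruby 1.9+) and String#scrub (Ruby 2.1+), so it is not the NKF approach described above. The helper names `charset_from_content_type`, `to_utf8` and `fetch_as_utf8` are hypothetical. The sketch also assumes the server declares the charset in its Content-Type header and does not look at `<meta>` tags.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: pull the charset parameter out of a Content-Type
# value, e.g. "text/html; charset=Shift_JIS" -> "Shift_JIS".
def charset_from_content_type(content_type)
  return nil if content_type.nil?
  match = content_type.match(/charset\s*=\s*["']?([\w.:-]+)/i)
  match && match[1]
end

# Hypothetical helper: tag the raw bytes with their declared encoding and
# transcode them to UTF-8. A missing or unknown charset is assumed to be
# UTF-8; invalid or unmappable bytes become U+FFFD instead of raising.
def to_utf8(raw_body, content_type)
  charset = charset_from_content_type(content_type) || "UTF-8"
  source =
    begin
      Encoding.find(charset)
    rescue ArgumentError # unknown encoding name
      Encoding::UTF_8
    end
  raw_body.dup
          .force_encoding(source)
          .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
          .scrub # also cleans UTF-8 input, where encode may be a no-op
end

# Hypothetical fetch step: download a page and normalise it before parsing.
def fetch_as_utf8(url)
  response = Net::HTTP.get_response(URI(url))
  to_utf8(response.body, response["Content-Type"])
end
```

Falling back to UTF-8 when no charset is declared is a simplification. Many pages declare their encoding only in a `<meta>` tag, and this sketch ignores that case.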