Re: WWW::Mechanize with frames
- From: James Britt <james_b@xxxxxxxxxxxxx>
- Date: Wed, 30 Nov 2005 11:14:24 +0900
AlexG wrote:
> Hi,
> I'm trying to do some screen scraping from a site using frames. Using WWW::Mechanize gives back an 'error' page from the site rather than the data I wanted:
This is the content of the frame page. It, in turn, fetches other pages and loads them into its frames. Browsers that do not support frames see the content in the noframes element.
If you want to snarf a framed page, you'll need to treat each framed item as the separate HTML page that it is.
Here they appear to be the pages flat_navigation.php4?ecno=1.2.1.12 , flat_head.php4?ecno=1.2.1.12&organism= and flat_result.php4?ecno=1.2.1.12&organism%5B%5D= .
You'll need to supply the complete URL for each one, of course.
I do not think that Mechanize handles frames by default, but you could teach it to grab the frame elements and parse the src attribute, then construct the full URL.
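Something along these lines might do for the "grab the frame elements and build the full URL" part. It is only a rough sketch using the standard library: frame_urls is a made-up helper name, the frameset markup is my guess at what the page looks like, and http://www.example.com/ stands in for the real host, which I don't know. It pulls the src values out with a regex rather than a real HTML parser, so it will miss unquoted attributes.

```ruby
require 'uri'

# Hypothetical helper: return the absolute URL of every <frame> or
# <iframe> found in the HTML of a frameset page.
# base_url is the address the frameset page itself was fetched from.
def frame_urls(html, base_url)
  # Grab the quoted src attribute of each frame tag (not <frameset>).
  srcs = html.scan(/<i?frame\b[^>]*?\bsrc\s*=\s*["']([^"']*)["']/i).flatten
  srcs.map do |src|
    # Undo HTML escaping of '&', then resolve against the base URL.
    URI.join(base_url, src.gsub('&amp;', '&')).to_s
  end
end

# Assumed markup; only the three src values come from the page in question.
frameset = <<~HTML
  <frameset cols="20%,80%">
    <frame name="nav" src="flat_navigation.php4?ecno=1.2.1.12">
    <frameset rows="30%,70%">
      <frame name="head" src="flat_head.php4?ecno=1.2.1.12&amp;organism=">
      <frame name="result" src="flat_result.php4?ecno=1.2.1.12&amp;organism%5B%5D=">
    </frameset>
    <noframes>Your browser does not support frames.</noframes>
  </frameset>
HTML

# Placeholder base URL, not the real site.
urls = frame_urls(frameset, 'http://www.example.com/php/index.php4')
puts urls
```

Each URL in the resulting list can then be fetched and scraped as an ordinary page.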
James

--
http://www.ruby-doc.org - Ruby Help & Documentation
http://www.artima.com/rubycs/ - Ruby Code & Style: Writers wanted
http://www.rubystuff.com - The Ruby Store for Ruby Stuff
http://www.jamesbritt.com - Playing with Better Toys
http://www.30secondrule.com - Building Better Tools