Re: Need a regex searching html code
- From: Jari Williamsson <jari.williamsson@xxxxxxxxxxxxxxxxxx>
- Date: Fri, 29 Feb 2008 14:36:37 -0500
Todd Benson wrote:
On Fri, Feb 29, 2008 at 1:19 PM, Jari Williamsson
<jari.williamsson@xxxxxxxxxxxxxxxxxx> wrote:
Mark Thomas wrote:
> All the regex solutions provided will break with the following
> perfectly valid HTML:
>
> <div class="info">
> <h5 >Tagline:</h5>
> Yippee Ki Yay Mo - John 6:27
> </div>
>
> This is one of many reasons it is a BAD idea to use regexes to parse
> HTML. Regular expressions are simply not the right tool for the job.
Sorry if I'm missing the point:
---
the_text = %q{
<div class="info">
<h5 >Tagline:</h5>
Yippee Ki Yay Mo - John 6:27
</div>
}
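# The flip-flop (start)..(stop) switches on at a line matching the opening
# tag and stays on through the line matching the closing tag.
the_text.each_line do |line|
  puts "Within DIV tags: #{line}" if (line=~/<div/)..(line=~/<\/div/)
  puts "Within H5 tags: #{line}" if (line=~/<h5/)..(line=~/<\/h5/)
end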
---
Result:
Within DIV tags: <div class="info">
Within DIV tags: <h5 >Tagline:</h5>
Within H5 tags: <h5 >Tagline:</h5>
Within DIV tags: Yippee Ki Yay Mo - John 6:27
Within DIV tags: </div>
What if you have a div inside a div? Although the OP said "any"
legitimate HTML inside a div, part of me wants to ask:
which div?
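To make the nested case concrete, here is a minimal sketch that runs the same flip-flop idea over a div inside a div. The input is made up for illustration and does not come from the thread. The flip-flop switches off at the first closing tag it sees, so the rest of the outer div is lost:

```ruby
# Hypothetical nested input, not from the original thread.
nested = <<~HTML
  <div class="outer">
  <div class="inner">
  inner text
  </div>
  outer text
  </div>
HTML

# Same flip-flop as in the example: on at a line with <div,
# off after the next line with </div.
inside = nested.each_line.select do |line|
  true if (line =~ /<div/)..(line =~ /<\/div/)
end

puts inside
# Prints only the first four lines; "outer text" and the final
# </div> are dropped because the inner </div> ended the range.
```

So the answer to "which div?" is effectively "whichever one closes first", which is rarely the one you wanted.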
Sure, for real-life HTML with nested tags it'll break. I just wanted to
point out that for simple parsing needs (such as the example I replied
to), regexps can find both the opening and the closing tags.
Best regards,
Jari Williamsson
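For anyone who wants to stay line-based but survive nested divs, one possible workaround is to count nesting depth instead of toggling a single on/off state. This is only a sketch under stated assumptions: the helper name `outermost_div_lines` and the sample input are hypothetical, and it still relies on regexes to spot tags, so comments, scripts, or attribute values containing `<div` will fool it. It is not a substitute for a real HTML parser.

```ruby
# Hypothetical helper (not from the thread): collect every line that belongs
# to a top-level <div>, counting nesting depth rather than toggling on/off.
def outermost_div_lines(html)
  depth = 0
  kept = []
  html.each_line do |line|
    opens  = line.scan(/<div\b/i).size       # opening <div ...> tags on this line
    closes = line.scan(/<\/div\s*>/i).size   # closing </div> tags on this line
    kept << line if depth > 0 || opens > 0
    depth = [depth + opens - closes, 0].max  # never drop below zero
  end
  kept
end

# Made-up sample input with one div nested inside another.
sample = <<~HTML
  before
  <div class="outer">
  <div class="inner">
  inner text
  </div>
  outer text
  </div>
  after
HTML

result = outermost_div_lines(sample)
puts result
```

With this input, the six lines from the outer opening tag through the outer closing tag are kept, and "before" and "after" are skipped.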