Re: Need a regex searching html code



Todd Benson wrote:
On Fri, Feb 29, 2008 at 1:19 PM, Jari Williamsson
<jari.williamsson@xxxxxxxxxxxxxxxxxx> wrote:
Mark Thomas wrote:
> All the regex solutions provided will break with the following
> perfectly valid HTML:
>
> <div class="info">
> <h5 >Tagline:</h5>
> Yippee Ki Yay Mo - John 6:27
> </div>
>
> This is one of many reasons it is a BAD idea to use regexes to parse
> HTML. Regular expressions are simply not the right tool for the job.

Sorry if I'm missing the point:
---
the_text = %q{

<div class="info">
<h5 >Tagline:</h5>
Yippee Ki Yay Mo - John 6:27
</div>
}

the_text.each_line do |line|
  puts "Within DIV tags: #{line}" if (line=~/<div/)..(line=~/<\/div/)
  puts "Within H5 tags: #{line}" if (line=~/<h5/)..(line=~/<\/h5/)
end
---

Result:
Within DIV tags: <div class="info">
Within DIV tags: <h5 >Tagline:</h5>
Within H5 tags: <h5 >Tagline:</h5>
Within DIV tags: Yippee Ki Yay Mo - John 6:27
Within DIV tags: </div>

What if you have a div inside a div? Although the OP said "any"
legitimate HTML inside a div, part of me has to ask:
which div?

Sure, for real-life HTML with nested tags it'll break. I just wanted to
point out that for simple parsing needs (such as the example that I
replied to) regexps can find both the beginning and the end tags.
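
To make the nested case concrete, here is a minimal sketch. The sample markup and both helper names (flip_flop_div_lines, depth_counted_div_lines) are made up for illustration and are not from the original post. The first helper is the same flip-flop idea as above; the second counts opening and closing div tags per line. Both are still line-based heuristics, not a parser, so comments, attribute values containing "<div" and the like will fool them.

```ruby
# Hypothetical sample: a div nested inside another div.
NESTED_HTML = <<~HTML
  <div class="outer">
  <div class="inner">
  Inner text
  </div>
  Outer text
  </div>
HTML

# Same flip-flop technique as in the post: the range switches on at the
# first "<div" and off at the first "</div", regardless of nesting.
def flip_flop_div_lines(html)
  inside = []
  html.each_line do |line|
    inside << line.strip if (line =~ /<div/)..(line =~ /<\/div/)
  end
  inside
end

# Assumed alternative: track nesting depth by counting tags on each line.
def depth_counted_div_lines(html)
  depth = 0
  inside = []
  html.each_line do |line|
    opens  = line.scan(/<div\b/).size
    closes = line.scan(/<\/div\s*>/).size
    inside << line.strip if depth > 0 || opens > 0
    depth += opens - closes
  end
  inside
end

# The flip-flop stops at the inner </div>, so "Outer text" is missed.
puts flip_flop_div_lines(NESTED_HTML).include?("Outer text")     # false
puts depth_counted_div_lines(NESTED_HTML).include?("Outer text") # true
```

The depth counter only postpones the problem; for anything beyond markup you control, a real HTML parser is the safer choice.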



Best regards,

Jari Williamsson
