Re: Regex Help



David-
Thanks, the regex you posted works great. I had considered just trimming the text inside the tags and then untrimming until a word end, but I figured there would be a regex that would do it all at once.


Thanks Dave-
Ezra

On Aug 22, 2005, at 2:53 PM, David A. Black wrote:

Hi --

On Tue, 23 Aug 2005, John Halderman wrote:


Seems to me that you're trying to do too much with one regular expression. I
would just grab the content between your tags and then trim that down to 50
characters and reassemble it afterwards.
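A minimal sketch of that two-step approach, for comparison. Nothing here comes from the thread itself: the helper name `close_ftditm`, the `limit` parameter, and the assumption that each line ends with an `<endad>` tag are all illustrative.

```ruby
# Hypothetical helper (not from the thread): split the line around the ad
# body, cut the body at `limit` characters, push the cut forward to the end
# of the current word, then reassemble with </ftditm> inserted.
def close_ftditm(line, limit = 50)
  m = line.match(/\A(.*?<begad:[^>]+>)(.*?)(<endad>.*)\z/m)
  return line unless m # leave lines without the expected tags untouched

  head, body, tail = m[1], m[2], m[3]
  cut = limit
  # While the cut sits between two word characters, we are mid-word.
  cut += 1 while cut < body.length && body[cut - 1] =~ /\w/ && body[cut] =~ /\w/

  kept = body[0, cut]
  rest = body[cut..-1] || "" # nil when the body is shorter than the cut
  "#{head}#{kept}</ftditm>#{rest}#{tail}"
end

ad = "<ftditm><begad:11559303>Selah Country Home 1.5 acres. 3 bdrm, 2 bath, " \
     "irrigation, horse barn. $122,000. 509-697-6519<endad>"
puts close_ftditm(ad)
```

For the sample ad the cut lands inside "irrigation" and is moved to the end of that word. An ad body shorter than 50 characters simply gets the closing tag at its end.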



I'm not sure what you mean by "too much". I think the substitution I suggested does what Ezra said he needed. Is there an error in it?


David



-j

On 8/22/05, David A. Black <dblack@xxxxxxxxxxxx> wrote:


Hi --

On Tue, 23 Aug 2005, Ezra Zygmuntowicz wrote:


Hey Guys-
I have a regex problem that I am not sure how to tackle. I am parsing
some classified ads in order to format them for display online. I have
most of the parsing done but I need help with the final step. So the
file has one ad per line and a line looks like this:

<ftditm><begad:11559303>Selah Country Home 1.5 acres. 3 bdrm, 2 bath,
irrigation, horse barn. $122,000. 509-697-6519<endad>


Now I have already parsed everything to get it to this state, but what I
need to do next is count 50 chars after the <begad:11559303> tag and
insert </ftditm>. The tricky part is that I need to place the </ftditm>
50 characters into the line, but if the 50 chars end in the middle of a
word then I need to match the rest of the word as well. So I need a way
to match at least 50 chars plus the rest of the current word if the 50th
char lands in the middle of a word.
So for this particular ad 50 chars makes it to here:

<ftditm><begad:11559303>Selah Country Home 1.5 acres. 3 bdrm, 2 bath,
irri #<= 50 chars ends here# gation, horse barn. $122,000.
509-697-6519<endad>

So it ends in the middle of the word irrigation and I need it to consume
the whole word.



Here's one idea:

str.sub(/(<begad:[^>]+>.{1,50}.*?\b)/, "\\1<\/ftditm>")
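As a quick check, the substitution can be run against the sample ad from earlier in the thread. This is only a sketch: the variable names `ad` and `result` are assumptions, and the ad is joined onto a single line because the file is described as having one ad per line.

```ruby
# Sample ad from the thread, joined onto one line (assumed layout).
ad = "<ftditm><begad:11559303>Selah Country Home 1.5 acres. 3 bdrm, 2 bath, " \
     "irrigation, horse barn. $122,000. 509-697-6519<endad>"

# .{1,50} greedily takes up to 50 characters after the <begad:...> tag;
# .*?\b then extends the match lazily to the next word boundary, so a cut
# that lands inside "irrigation" runs on to the end of that word.
result = ad.sub(/(<begad:[^>]+>.{1,50}.*?\b)/, "\\1<\/ftditm>")

puts result
# prints:
# <ftditm><begad:11559303>Selah Country Home 1.5 acres. 3 bdrm, 2 bath, irrigation</ftditm>, horse barn. $122,000. 509-697-6519<endad>
```

Note that `.` does not match a newline by default, so this relies on each ad sitting on one line.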


David -- David A. Black dblack@xxxxxxxxxxxx








-Ezra Zygmuntowicz Yakima Herald-Republic WebMaster 509-577-7732 ezra@xxxxxxxxxxxxxxxxx


