Re: RegExp to capture IMAGE SRC only.
- From: "Evertjan." <exjxw.hannivoort@xxxxxxxxxxxx>
- Date: 14 Jul 2007 15:50:43 GMT
-Lost wrote on 14 jul 2007 in comp.lang.javascript:
Evertjan. wrote:
-Lost wrote on 14 jul 2007 in comp.lang.javascript:
Given this string:
"200%" 000 1111 222222 <img src="image.jpg" alt="Image 1. Fooling
The RegExp!" /> 1-888-555-5555 () / w00f $ ^ @-Lost <a
href="http://foomanchu.com/" title="foomanchu dot com">foomanchu</a>
<a href="relative.htm">relative</a> <a
href="relative.with.problems.htm>relative.with.problems</a>
I wish to extract *only* "image.jpg" sans the quotes.
Having a very limited knowledge of regular expressions I gave:
\w+\.\w{3}
<script type='text/javascript'>
var s ='"200%" 000 1111 222222 <img src="image.jpg"'+
' alt="Image 1. Fooling The RegExp!" /> 1-888-555-5555'+
' () / w00f $ ^ @-Lost <a href="http://foomanchu.com/"'+
' title="foomanchu dot com">foomanchu</a> <a'+
' href="relative.htm">relative</a> <a'+
' href="relative.with.problems.htm>'+
'relative.with.problems</a>'
s = s.replace(/.*?src="([^"]*)".*/,'$1')
alert(s)
</script>
Thank you, Evertjan., but one thing (actually 2).
1. If I added <object src="image.jpg"> before the IMG, it snags the
OBJECT's SRC. Can this be modified to only accept a SRC if it is
enclosed within <IMG ... >?
These regex replace()-es can always be defeated with an alternative
string.
You will have to define the expected string clearly.
Like: What if the containing string has
<img src="image.jpg"> xx <img src="image.jpg">
and I only want the second one.
Or: What if ' is used instead of "
So you will have to fine tune your regex again and again, if the source
is not of your own making and tends to change over time, like in data
mining.
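One possible tightening for point 1, as a minimal sketch rather than a general HTML parser: anchor the pattern on the IMG tag itself and accept either quote character. The helper name firstImgSrc and the sample string are made up for illustration. It assumes that src is quoted and that no attribute value before it contains a > character.

```javascript
// Hypothetical helper (not from the thread): returns the src of the
// first <img> tag in the string, or null if there is none.
function firstImgSrc(html) {
  // <img\b     the tag name, case-insensitive because of the i flag
  // [^>]*?     any attributes before src, without leaving this tag
  // \bsrc=     the attribute name
  // (["'])     group 1: the opening quote, either kind
  // (.*?)\1    group 2: the value, up to the same quote character again
  var m = /<img\b[^>]*?\bsrc=(["'])(.*?)\1/i.exec(html);
  return m ? m[2] : null;
}

var s = '<object src="movie.swf"></object> ' +
        '<img alt="x" src=\'image.jpg\' /> <img src="second.png">';

console.log(firstImgSrc(s)); // image.jpg
```

The \1 inside the pattern is a back reference in the strict sense: it refers to what group 1 matched earlier in the same regex. The OBJECT's SRC is skipped because the pattern only starts matching at <img.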
2. I inserted another IMG after the first, and replaced $1 with $2.
Why did it not show me the 2nd IMG's SRC?
No, you cannot simply do that. You will have to learn regex by
scrutinizing the specs and doing trial-and-error code experiments.
Changing a regex without thoroughly understanding how it works is bound
to fail.
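Why $2 does not help: $2 names the second pair of parentheses in the pattern, not the second match in the string, and the pattern has only one pair. To reach the second IMG, one option is to run the regex repeatedly with the g flag and collect group 1 from each match. This is a sketch under the same assumptions as before (double-quoted src, well-formed tags); allImgSrcs is a hypothetical name.

```javascript
// Hypothetical helper: collect the src of every <img> in document order.
function allImgSrcs(html) {
  // With the g flag, each exec() call resumes after the previous match.
  var re = /<img\b[^>]*?\bsrc="([^"]*)"/gi;
  var found = [];
  var m;
  while ((m = re.exec(html)) !== null) {
    found.push(m[1]); // m[1] is the one capture group, filled once per match
  }
  return found;
}

var s = '<img src="first.jpg"> xx <img src="second.jpg">';
var srcs = allImgSrcs(s);

console.log(srcs[1]); // second.jpg
```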
I realize the ()'s create
the first back reference, but I guess I'm still a little confused as
to back references in general.
I should say so.
Keep in mind, I don't even understand your regular expression. I'm
not that advanced yet,
Even the best regex gurus have been at that stage, so do not worry, but
try and learn.
or simply not that swift, whichever you prefer.
;)
I put your regular expression in both The Regex Coach, and RegexBuddy
and in those applications the regular expression you provided matches
the *entire* string,
I do not know what those are [Regex Coach, RegexBuddy], and probably do
not want to; the proof of the pudding is in the eating.
so, the chance to kind of "teach" myself what
your regular expression does was out of the question.
Short explanation:
s = s.replace(/.*?src="([^"]*)".*/,'$1')
===============
.* = match all chars [ . is any char except a line break, * is zero or more of them]
? = till [ ? after * makes it "lazy" (non-greedy),
otherwise .* would be "greedy" till the end of string]
src=" = till this string
[^"]* = [^"] matches any char EXCEPT ", * is zero or more of them
" = matches " [could be left out, just for fun]
.* = all the remaining chars in your string
What is IN the () is represented in the replacing string as $1
[a back reference in regex is, to my knowledge, something else, a
reference to an earlier part IN the regex itself, but that is semantics]
===============
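The effect of the ? can be seen by running the same replace() with and without it on a string holding two images. The sample string is made up for illustration. This also shows why a regex tool reports that the pattern matches the *entire* string: the leading and trailing .* swallow everything around the captured part, and replace() then swaps that whole match for $1.

```javascript
var s = '<img src="first.jpg"> xx <img src="second.jpg">';

// Lazy .*? stops at the first src=" it can reach,
// so group 1 holds the first image.
var lazy = s.replace(/.*?src="([^"]*)".*/, '$1');

// Greedy .* runs to the end of the string and backtracks
// to the last src=", so group 1 holds the last image.
var greedy = s.replace(/.*src="([^"]*)".*/, '$1');

console.log(lazy);   // first.jpg
console.log(greedy); // second.jpg
```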
This is really simple regex, so go and learn. ;-)
--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)