regex: capture groups and term binding



Hi All,

Let's get down to it...

I have a long string of the form:

string = <<-EOVAR
XD 1 * 100000436 3441863 1550663 1161254 951982
XD 1 479903531056 47988002622 21360568539 18276299303 15476234490
XD 1 66934 5552 321640438 40297830 0
XD 1 0 3235 2197 10907 1631621
XD 1 15488078 210564267 574075997 2405132745 7805716381
XD 1 0 4949 0 58361 0
(goes for about 17 lines, all separated by \n)
EOVAR

I'm building a regex for this string and it's pretty straightforward.
The only requirement is to capture all the numbers for later Ruby fun:

regex = %r{XD\s1\s\*\s(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\n ...etc... }mx

I would like to pare it down a bit, using term binding:

regex = %r{XD\s1\s\*\s(\d+\s+){5} ...etc...}mx

If I do this, then only the last group is captured:

pp string.scan(regex)
[["951982\n"]]
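
For reference, here is a minimal reproduction of what I'm seeing. It is only a sketch: `sample` is just the first line of the real data, and the names `repeated` and `explicit` are made up for illustration. As far as I can tell, a quantified group is a single capture slot that gets overwritten on each repetition.

```ruby
# Hypothetical shortened sample: only the first line of the real string.
sample = "XD 1 * 100000436 3441863 1550663 1161254 951982\n"

# One group repeated five times: each repetition overwrites the same
# capture slot, so only the text of the final repetition is kept.
repeated = %r{XD\s1\s\*\s(\d+\s+){5}}
repeated_captures = sample.scan(repeated)

# Five groups written out explicitly: one capture slot per number.
explicit = %r{XD\s1\s\*\s(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)}
explicit_captures = sample.scan(explicit)

p repeated_captures  # => [["951982\n"]]
p explicit_captures  # => [["100000436", "3441863", "1550663", "1161254", "951982"]]
```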

If this worked, I could shorten it much much more.. all of the lines
after the first one have exactly the same format and I need to capture
all of the variables.

mother_of_all_regexen = %r{XD\s1\s\*\s((\d+\s+){5})
(XD\s1\s(\d+\s+){5}){17} }mx

or something :-)

So,

- Can I use capture groups and term binding?
- Why am I only capturing the last term?
- Should I just stop trying to be clever and explicitly match against
all parts of the string?
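
One direction that might work, sketched under assumptions rather than tested in the framework: wrap the whole repeated run in a single outer capture, make the inner repetition non-capturing, and split the captured text afterwards. The two-line `sample` and the names `line_regex` and `rows` are hypothetical.

```ruby
# Hypothetical two-line sample in the same format as the real string.
sample = "XD 1 * 100000436 3441863 1550663 1161254 951982\n" \
         "XD 1 66934 5552 321640438 40297830 0\n"

# The outer group captures the whole run of five numbers once per line;
# the inner (?:...) group repeats without using a capture slot.
# The "* " after "XD 1" is optional so the first line matches as well.
line_regex = %r{^XD\s1\s(?:\*\s)?((?:\d+\s+){5})}

# scan yields one single-element array per matching line; split the
# captured text on whitespace and convert each piece to an Integer.
rows = sample.scan(line_regex).flatten.map { |numbers| numbers.split.map(&:to_i) }

p rows
```

This still needs a split step after the match, so it only helps if the YAML-driven action is allowed to post-process the capture.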

The reason I want to do this as a single regex is that I've written a
framework that grabs files, monkeys around with them and then applies
a rule-set from a YAML file to create output. For each "signature" in
the YAML file one can choose a predefined action (match, count, compare,
etc.), each of which maps to a method in the main code. This allows the
editor of the YAML to add signatures etc. to their heart's desire... And
more importantly, it means that I won't have to maintain the rule-set.
(woohoo!)

Thanks in advance for any suggestions

SM

--
Simon Mullis
_________________
simon@xxxxxxxxxxxx
