Re: regex help partial code



On Jan 13, 5:41 pm, Junkone <junko...@xxxxxxxxx> wrote:
On Jan 13, 4:25 pm, Junkone <junko...@xxxxxxxxx> wrote:



Hello
With great effort I have started on the regex implementation and am
stuck at one point.
Here is my code:
(add)\s+\S+\s+[i|s|f]\s+[pei]?\s+

The string is as follows, and the tokens are equivalent to:
1. token = Add
2. token = any text
3. token = one of the following: i or s or f
4. token = 0 or 1 of the following characters: [p e i]

I am stuck at the 4th token. I thought a ? after the token would give
the 0 or 1 option.
Please correct me if I made an error.

seede

irb(main):046:0> a=Regexp.new('(add)\s+(\S+)\s+([i|s|f])\s+([pei])?\s+',true)
=> /(add)\s+(\S+)\s+([i|s|f])\s+([pei])?\s+/i
irb(main):047:0> a =~("Add comp i pe")
=> nil

I don't get a match at the 4th token.

There are a few problems here, I think.
Regexp.new('(add)\s+(\S+)\s+([i|s|f])\s+([pei])?\s+',true)

As written, the second token can't have whitespace. Is that what you
intended?

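You want ([isf]) for your third token. Inside a character class, | is a
literal character rather than alternation, so [i|s|f] would also match
a pipe. ([isf]) will match a single instance of one of the three
characters.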
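
A quick check of the difference between the two character classes. The
constant names and sample inputs are made up for illustration:

```ruby
# Anchored with \A and \z so each pattern must match the whole one-character string.
LOOSE  = /\A[i|s|f]\z/   # '|' is a literal member of this character class
STRICT = /\A[isf]\z/     # only i, s or f

puts LOOSE.match?("|")   # the pipe itself is accepted
puts STRICT.match?("|")  # the pipe is rejected
puts STRICT.match?("s")  # a real third token still matches
```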

Also, I'm not clear on what you want for the fourth token. Do you want
0 or 1 of [pei], or 0 or 1 of *each* of [pei]? In other words, which of
these are valid for the fourth token: e, or pe?

You should also end the regex with \s* unless there's always going to
be whitespace. You're currently checking for one or more whitespace
characters at the end, but your test didn't have any.
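
To illustrate the trailing-whitespace point, here is a small comparison,
assuming the third token has already been tidied to [isf]. The constant
names are only for this sketch:

```ruby
# Trailing \s+ demands at least one whitespace character after the fourth token.
WITH_PLUS = /(add)\s+(\S+)\s+([isf])\s+([pei])?\s+/i
# Trailing \s* makes that whitespace optional.
WITH_STAR = /(add)\s+(\S+)\s+([isf])\s+([pei])?\s*/i

p WITH_PLUS.match("Add comp i pe")             # nil: nothing satisfies the final \s+
p WITH_PLUS.match("Add comp i p ")&.captures   # ["Add", "comp", "i", "p"]
p WITH_STAR.match("Add comp i pe")&.captures   # ["Add", "comp", "i", "p"]
```

Note that ([pei])? still captures only one character, so the "e" in "pe"
is left unmatched in the last call.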

Assuming you do allow 0 or 1 of each of [pei] for the fourth token and
0 or more whitespace characters at the end, I would use:

r = /(add)\s+(\S+)\s+([isf])\s+(p?e?i?)\s*/i

Though, if there can't be whitespace in any of the tokens, you might
want to use split rather than a regex. The regex above can only get
the fourth token right if the letters are in order - p will match, ei
will match, ip will not. Split wouldn't have that problem. There may
be a regex way around that, but I can't think of it at the moment.
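
A rough sketch of the split approach with per-token validation, so the
fourth token can hold p, e and i in any order. parse_add is a
hypothetical helper, and rejecting extra tokens and repeated letters
are assumptions of mine, not requirements stated in the question:

```ruby
# Hypothetical helper: tokenize with split, then validate each token separately.
def parse_add(line)
  tokens = line.split
  return nil unless tokens.size.between?(3, 4)   # assumption: no extra tokens

  cmd, name, type, flags = tokens
  flags ||= ""                                   # fourth token is optional
  return nil unless cmd.casecmp?("add")
  return nil unless type.match?(/\A[isf]\z/i)
  # Only p, e, i are allowed, in any order ...
  return nil unless flags.match?(/\A[pei]*\z/i)
  # ... and (assumption) each letter at most once.
  return nil unless flags.downcase.chars.uniq.size == flags.size

  [cmd, name, type, flags]
end

p parse_add("Add comp i ip")   # ["Add", "comp", "i", "ip"]
p parse_add("Add comp i pp")   # nil, because p is repeated
p parse_add("Add comp i")      # ["Add", "comp", "i", ""]
```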

"Add comp i ep".scan r # => [["Add", "comp", "i", "e"]]
"Add comp i ep".split # => ["Add", "comp", "i", "ep"]