Re: RegExp as Finite State Machine



Lasse Reichstein Nielsen wrote:
Thomas 'PointedEars' Lahn <PointedEars@xxxxxx> writes:
Lasse Reichstein Nielsen wrote:
Classsical counter-example of regularness:
/^(11+)\1+$/
which recognizes composite numbers (non-primes) in unary notation.
In which way is that proof that ECMAScript Regular Expressions are not a
regular language?

The language is not regular (It's composite, the language of "1"'s of
prime length, is not regular. This is failry easily proved by assuming
it's regular, and using pumping to derive a contradition).

Ah, but I think you confuse the language that a regular expression can
match, L_m(r) -- he above matched language is context-free (Chomsky Type-2)
but not regular (Type-3), if I'm not mistaken -- with the Regular Expression
language, L_r. In order to prove that the ECMAScript Regular Expression
language is not regular, you have to apply the pumping lemma for regular
languages to the expression itself, not to the strings it could match.

The expression is a valid Javascript regular expression that recognizes
a non-regular language, so Javascript regexps are not restricted to the
regular languages (and therefore not possible to implement using only
a finite state machine).

Sorry, I don't think you are correct here.

Of the current crop of browsers, AFAIK none of them use a FSM
based implementation of regular expressions.

So sure are you, despite other engines that also support backreferences
(like egrep) have been demonstrated to use FSM implementations?
<http://oreilly.com/catalog/regex/chapter/ch04.html>

Referring to this?
"You might, however, notice that GNU's version of egrep does support
backreferences. It does so by having two complete engines under the
hood! It first uses a DFA engine to see whether a match is likely,
and then uses an NFA engine (which supports the full flavor,
including backreferences) to confirm the match."

Yes, exactly.

I'm only talking about the current regexp implementations because I
have actually been looking at them.

We'll see.

What I probably /should/ have also said, is that by FSM I mean a
deterministic machine (i.e., a DFA, one that is run linearly over the
input). A nondeterministic machine can use any number of tricks to
implement the nondeterminism, including backtracking.

But your premise is incorrect. Nondeterministic Finite Automata (NFAs),
also called "nondeterministic finite state machines", are definitely a
subset of Finite State Machines (FSMs).

<http://en.wikipedia.org/wiki/Nondeterministic_finite_state_machine>


PointedEars
.



Relevant Pages

  • Re: Armenian, Sumerian, Burushaski, and Turkic languages
    ... indicating the importance of finding regular changes. ... There are certainly plenty of known regular correspondences ... Note that due to sporadic language change, ... sound changes, large variance in what counts as "similar"), you're ...
    (sci.lang)
  • Re: American as creolish [was] Re: Baltic Is Gothic
    ... > language and language evolution. ... Forming "regular" preterits, participles, and plurals in English ... > introduces redundancy with formal tokens. ... expressions became "normal", driving out the old expressions with no ...
    (sci.lang)
  • Re: RegExp as Finite State Machine
    ... but not regular, if I'm not mistaken -- with the Regular Expression ... The language recognized by the regexp above is actually not even ... If all Javascript regexps can be implemented using only finite state ... subset of Finite State Machines. ...
    (comp.lang.javascript)
  • Re: Search for multiple things in a string
    ... > language in places. ... > Regular expressions ... None of those state that regular expressions aren't a language. ... readability is a very important part of choosing the best ...
    (microsoft.public.dotnet.languages.csharp)
  • Re: Reguar expression - what with *
    ... Which of the pieces does not belong to the language a*? ... It does not contain bbb but is not in your regular expression. ... automaton for the complement looks the same with final states becoming ...
    (sci.math)