Re: RegExp as Finite State Machine
- From: Lasse Reichstein Nielsen <lrn.unread@xxxxxxxxx>
- Date: Mon, 05 Jan 2009 07:15:51 +0100
Thomas 'PointedEars' Lahn <PointedEars@xxxxxx> writes:
Lasse Reichstein Nielsen wrote:
[/^(11+)\1+$/]
Ah, but I think you confuse the language that a regular expression can
match, L_m(r) -- he above matched language is context-free (Chomsky Type-2)
but not regular (Type-3), if I'm not mistaken -- with the Regular Expression
language, L_r.
Probably. The latter is much less interesting (and definitly context free)
The language recognized by the regexp above is actually not even
context-free. For a language over an alphabet with only one element,
the pumping lemma for context free languages is equivalent to the
one for regular languages. It's probably Type-1.
IIRC, it's known to be in (NP intersect co-NP).
In order to prove that the ECMAScript Regular Expression
language is not regular, you have to apply the pumping lemma for regular
languages to the expression itself, not to the strings it could match.
It's even simpler than that. The regular expression language has
matched parentheses to an arbitrary depth. That alone is enough to
rule out being regular.
The expression is a valid Javascript regular expression that recognizes
a non-regular language, so Javascript regexps are not restricted to the
regular languages (and therefore not possible to implement using only
a finite state machine).
Sorry, I don't think you are correct here.
I should be, unless there is a flaw in the logic.
If all Javascript regexps can be implemented using only finite state
machines, then all the languages recognized by them are regular (a FSM
can only recognize a regular language).
Since the language of /^(11+)\1+$/ is not regular, and is a Javascript
regexp, there is at least one language that cannot be implemented using
a FSM.
...
What I probably /should/ have also said, is that by FSM I mean a
deterministic machine (i.e., a DFA, one that is run linearly over the
input). A nondeterministic machine can use any number of tricks to
implement the nondeterminism, including backtracking.
But your premise is incorrect. Nondeterministic Finite Automata (NFAs),
also called "nondeterministic finite state machines", are definitely a
subset of Finite State Machines (FSMs).
That is definitly true. However, we don't have the hardware yet to
implement NFA's directly, so we have to emulate the nondeterminism
somehow. The typical way is to implement a nondeterministic choice by
trying one branch and setting up a backtrack point to try the other in
case the first fails. This gets you a stack of backtrack points, which
means that the implementation no longer uses a finite amount of
space. The space used can depend on the input, not only the automaton.
This is why I don't think of this implementation as really being a
finite state machine, although it is emulating one at a certain
level of abstraction.
There are other ways to implement a NFA that doesn't use a stack,
but instead determinizes the DFA (possibly lazily during execution).
This can instead lead to exponential space blow-up, which is still
finite, but not always manageable.
/L
--
Lasse Reichstein Holst Nielsen
'Javascript frameworks is a disruptive technology'
.
- References:
- Re: RegExp as Finite State Machine
- From: Thomas 'PointedEars' Lahn
- Re: RegExp as Finite State Machine
- From: Lasse Reichstein Nielsen
- Re: RegExp as Finite State Machine
- From: Thomas 'PointedEars' Lahn
- Re: RegExp as Finite State Machine
- From: Lasse Reichstein Nielsen
- Re: RegExp as Finite State Machine
- From: Thomas 'PointedEars' Lahn
- Re: RegExp as Finite State Machine
- Prev by Date: Re: questions about form action= and onsubmit() handling
- Next by Date: Intercept calls to 'document.cookie'?
- Previous by thread: Re: RegExp as Finite State Machine
- Next by thread: Re: RegExp as Finite State Machine
- Index(es):
Relevant Pages
|