Re: Leftmost longest match with DFA search



Stefan Monnier <monnier@xxxxxxxxxxxxxxxx> writes:

[snip]
The "obvious" solution of turning the problem "search for RE" into the
problem "match .*RE" (where I use "match" here to mean "anchored
search") only gives you the leftmost shortest match.
[snip]
Stefan

I've used the approach of compiling a DFA for the reversed RE, say ER,
and first matching .*ER against the reversed text to find the leftmost
anchor point. Then I match RE forward from that point to find the
longest span.
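
For readers following along, here is a minimal Python sketch of that two-pass idea. It is not the poster's code. The DFAs are hypothetical, hand-built tables for RE = ab+ and ER = b+a, and the .*ER automaton is simulated by tracking a set of live DFA states rather than by precomputing the determinized table. The names FWD, REV, search_ends, longest_end and leftmost_longest are made up for this illustration.

```python
# Hypothetical hand-built DFAs: (start state, accepting states, transitions).
FWD = (0, {2}, {(0, "a"): 1, (1, "b"): 2, (2, "b"): 2})  # RE = ab+
REV = (0, {2}, {(0, "b"): 1, (1, "b"): 1, (1, "a"): 2})  # ER = b+a


def search_ends(dfa, text):
    """Yield each i such that .*RE matches text[:i] (unanchored search)."""
    start, accepting, delta = dfa
    active = {start}
    if active & accepting:
        yield 0
    for i, ch in enumerate(text, 1):
        # Advance every live state, then restart at the start state: that is
        # the effect of the leading ".*".
        active = {delta[(s, ch)] for s in active if (s, ch) in delta} | {start}
        if active & accepting:
            yield i


def longest_end(dfa, text, pos):
    """Anchored match from pos; return the last accepting position seen."""
    start, accepting, delta = dfa
    state = start
    end = pos if state in accepting else None
    for i in range(pos, len(text)):
        state = delta.get((state, text[i]))
        if state is None:  # dead state: no longer match is possible
            break
        if state in accepting:
            end = i + 1
    return end


def leftmost_longest(fwd, rev, text):
    # Pass 1: scan the reversed text with .*ER. The last accepting position
    # corresponds to the smallest start index in the original text.
    ends = list(search_ends(rev, text[::-1]))
    if not ends:
        return None
    pos = len(text) - ends[-1]
    # Pass 2: run RE forward from that start and keep the longest span.
    return pos, longest_end(fwd, text, pos)
```

With these tables, leftmost_longest(FWD, REV, "xxabbbxabb") returns (2, 6), i.e. the span "abbb". Note that pass 1 reads the whole text.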

Interesting. But doesn't it basically force you to scan the complete text?
That can be impractical.

You don't (always) need to scan the complete text. You can perform
that reversed search on increasingly longer prefixes of the text. If
you increase the prefix lengths by a constant factor, the total work
is O(K) to find a leftmost match when that match lies completely
within the first K characters.
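
A rough Python sketch of the growing-prefix variant, under stated assumptions: the DFAs are hypothetical hand-built tables for RE = ab+ and ER = b+a, the names (leftmost_start, longest_end, search_doubling) and the parameters first and factor are invented for illustration, and the sketch accepts the first prefix that contains any match. It does not check for a match that starts earlier but ends beyond the current prefix, which can happen for some REs and would need extra handling.

```python
# Hypothetical hand-built DFAs: (start state, accepting states, transitions).
FWD = (0, {2}, {(0, "a"): 1, (1, "b"): 2, (2, "b"): 2})  # RE = ab+
REV = (0, {2}, {(0, "b"): 1, (1, "b"): 1, (1, "a"): 2})  # ER = b+a


def leftmost_start(rev, prefix):
    """Scan prefix right to left with .*ER; return the smallest match start."""
    start, accepting, delta = rev
    n = len(prefix)
    active = {start}
    best = n if start in accepting else None
    for j, ch in enumerate(reversed(prefix), 1):
        active = {delta[(s, ch)] for s in active if (s, ch) in delta} | {start}
        if active & accepting:
            best = n - j  # later hits in the reversed scan start further left
    return best


def longest_end(fwd, text, pos):
    """Anchored match of RE from pos; return the last accepting position."""
    start, accepting, delta = fwd
    state = start
    end = pos if state in accepting else None
    for i in range(pos, len(text)):
        state = delta.get((state, text[i]))
        if state is None:
            break
        if state in accepting:
            end = i + 1
    return end


def search_doubling(fwd, rev, text, first=4, factor=2):
    """Try prefixes of length first, first*factor, ... until one has a match."""
    k = max(1, first)
    while True:
        k = min(k, len(text))
        pos = leftmost_start(rev, text[:k])
        if pos is not None:
            # Extend forward over the full text, not just the prefix.
            return pos, longest_end(fwd, text, pos)
        if k == len(text):
            return None
        k *= factor
```

Since the prefix lengths grow geometrically, the reversed scans before the successful one add up to a constant multiple of the final prefix length. For example, on "xxxxxxabbb" followed by 1000 more "x" characters, the prefixes tried have lengths 4 and 8, and the result is (6, 10).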

Danny
