Independent Study - Assistance?



I have a question for the language experts, if they don't mind. I've
been interested in language theory for quite some time, and I've done
'OK' in studying it on my own for the past year+ or so. I'm
interested in generative programming, and one of the things I want to
create in the future is a Compiler Compiler (of which I posted about
here recently).

Great goal, except there's a few things I don't yet know. One is how
to automate state machine generation for regular expressions, I just
haven't been able to get this part down (close, but not quite,
certainly not the least state version, I forgot to consider repeat
cycles -_-). I'm thinking a large part is due to my educational
background (or lack thereof). I don't know set theory, which I think
large parts of regular expression talk is based upon. I've read
articles that speak about regular languages, and creating NFAs to
transition into a DFA of the same functional nature. What I don't
know is starting to catch up to me, and it's been making it difficult
to even read most of the articles of late interest. They define a
series of symbols that equal theoretical sets and other stuff and use
mathematical operations I've never seen, so the comprehension just
isn't there. It's not that I'm incapable of getting it I just don't
know what they're saying (it's greek to me, if you'll pardon the pun).

( http://compilers.iecc.com/comparch/article/93-05-083 )
Here's an example, I read an article on minimal DFA Regex generation
(from comp.compilers to be exact, circa 1993). I understand what he's
saying but I can't see the practical side to things:
"0x" [0-9A-Fa-f]+ ([Uu] [Ll]? | [Ll] [Uu]?)? |
[0-9]+ (([Uu] [Ll]? | [Ll] [Uu]? | [Mm]) |
('.' [0-9]+ ([Ee] [+-]? [0-9]+)?)? ([Dd] | [FF] | [Mm]))?

[0-9] = x1
[Uu] = x2
[Ll] = x3
[Mm] = x4
[Dd] = x5
[FF] = x6
'0' = x7
[1-9] = x8
'x' = x9
[A-Fa-f] = x10
(x1|x10) = x11
x2 x3? = x12
x3 x2? = x13
x12|x13 = x14
x14|x4 = x15
'.' = x16
x1 x1* = x17
[Ee] = x18
[+-] = x19
x5 | x6 = x20
x20|x4 = x21
x7|x8 = x22

x7 x9 x11 x11* x14? |
x22 x1* (x15 | (x16 x17 (x18 x19? x16)?)? x21)?

Given the concept of taking a regular expression and lifting from it
the used symbols and sets, and using replacements for them, you should
obtain the least-state result regular expression. The problem as I
see it is how do you handle that from a practical standpoint,
generating useful information out of the above. Even assuming you
could write a program to handle it, how would you prevent a faux pas,
or since the replacements are at times a replacement of a replacement,
how would you ensure the movements you're making don't lead from one
area to another that you're not supposed to be in (ie. you follow the
-wrong- path)?

Mind you as far as my education goes, I have an Associate's of Science
in Computer Information Systems. That might give you perspective as
to what I'm lacking as far as traditional theory/math goes. If you're
thinking I should wait until I can study this formally, I don't know
if that's plausible because I don't know when I can afford to go back
to school.

I turned to this newsgroup because the further I go in my own
independent research, the less others around can assist (understand)
me. I'm in a relatively tech-free area, so there's very few tech
jobs, and programming jobs: there are none to speak of.
.



Relevant Pages

  • Re: Queries and OO
    ... >> language other than the programming language. ... regular expression engine is encapsulated and does not pollute the ... FitNesse has such a design. ...
    (comp.object)
  • Re: Why want to do the things Lisp is best at?
    ... Each language typically has a "catch" ... My two favorite languages to program in are Forth and Lisp. ... An example of this was when I got to compiling the regular expression. ... (format t "KEYWORD: ~A~%" text))) ...
    (comp.lang.lisp)
  • Re: Programmers unpaid overtime.
    ... > relevant to this newsgroup, by removing all the irrelevant stuff. ... You've discussed the C language in this ng as the C ... But because incompetent programmers delight in pointing out other ... The regular expression is (if blanks only constitute white ...
    (comp.programming)
  • Re: LLVM was Results of the memswap() smackdown . . .
    ... I guess what we are looking at here is some sort of regular expression ... A regular expression parser in the pre-processor would have ... in a general purpose language is to also validate the context. ... The translation probably should happen earlier in pre-processing ...
    (comp.programming)
  • Re: "It works well enough" (was Re: $640.00 to fill the tanks...)
    ... typical Algol derived procedure oriented language, ... replacements for real news clients, but RSS feeds are ... If RSS had come first, ... as in USENET or "news" as in current events? ...
    (rec.aviation.piloting)

Loading