LaTeX, misallocated effort?
- From: "tsy" <tsy@xxxxxxxxxx>
- Date: 29 Apr 2006 09:38:40 -0700
Dear c.t.t. people, I am posting a sketch below to start a discussion
of some philosophical matters that interest me greatly at the moment. I
hope for an interesting discussion. Thanks in advance.
When Lamport introduced LaTeX, it was a great improvement over TeX
because it brought generic markup to TeX. LaTeX is very reasonable
as a markup language (and at the time it was a hit). However, LaTeX
most probably misused TeX. The problem is that TeX is not really a
programming language; it is simply not well suited to this sort of
task. LaTeX and the other macro-package extensions to TeX are rather
heroic. TeX is a great typesetting engine, but it is a nightmare when
used for the parsing and data manipulation that a serious
document-preparation system needs. As it stands, TeX does not separate
the different stages of document preparation. In particular, it does
not separate the parsing of logically marked-up text, the manipulation
of string and integer variables (and other, more complex, data), and
the typesetting itself.
Here is an example.
\begin{enumerate}
\item Text1
\item Text2
\item Text3
\end{enumerate}
This needs to be parsed into markup commands (\begin{enumerate}, \item,
\end{enumerate}) and text. \begin{enumerate} should start preparing the
output: it produces the typesetting commands that start a list, and it
also sets up and updates the data needed for typesetting the list, such
as changing the left margin and initializing the item counter.
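As a rough illustration of that first, purely syntactic step, here is a minimal Python sketch that splits the example into markup commands and text. It assumes a toy grammar in which a command is a backslash followed by letters, optionally followed immediately by a single {...} argument; real LaTeX syntax (catcodes, optional arguments, nested braces) is far richer, and the function name `tokenize` is hypothetical.

```python
import re

# Toy pattern: a control word such as \item or \begin, optionally followed
# directly by one brace-delimited argument such as {enumerate}. Nested braces,
# optional [...] arguments and catcode changes are deliberately ignored.
TOKEN = re.compile(r"\\([A-Za-z]+)(?:\{([^{}]*)\})?")


def tokenize(source):
    """Split source into ('cmd', name, arg) and ('text', string) tokens."""
    tokens = []
    pos = 0
    for match in TOKEN.finditer(source):
        text = source[pos:match.start()].strip()
        if text:
            tokens.append(("text", text))
        tokens.append(("cmd", match.group(1), match.group(2)))
        pos = match.end()
    tail = source[pos:].strip()
    if tail:
        tokens.append(("text", tail))
    return tokens


example = r"\begin{enumerate} \item Text1 \item Text2 \item Text3 \end{enumerate}"
for token in tokenize(example):
    print(token)
```

Note that the result is still a flat sequence: nothing here knows that the items belong to the list, which is exactly the structure a later stage has to recover.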
Most of these tasks are better done in some programming language other
than TeX, and they are better done in several steps. TeX as a
programming language precludes dividing text processing into separate
steps: one needs to do everything in place.
Consider an alternative. First, a parser module recognizes the logical
structure of the document. This can be conveniently represented as XML:
<enumerate>
<item> Text1 </item>
<item> Text2 </item>
<item> Text3 </item>
</enumerate>
Note the tree structure of the parsed document (as implied by the XML
Document Object Model), as compared to the linear, stack-oriented
structure of TeX code.
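To make the contrast concrete, the XML above can be loaded with Python's standard `xml.etree.ElementTree` module, which yields exactly this kind of tree: one `enumerate` node whose children are the `item` nodes. This is only an illustration of the tree view; the helper `show` is hypothetical and not part of any proposed system.

```python
import xml.etree.ElementTree as ET

parsed = """
<enumerate>
<item> Text1 </item>
<item> Text2 </item>
<item> Text3 </item>
</enumerate>
"""

root = ET.fromstring(parsed.strip())


def show(node, depth=0):
    """Print each element indented by its depth in the tree."""
    text = (node.text or "").strip()
    print("  " * depth + node.tag + (": " + text if text else ""))
    for child in node:
        show(child, depth + 1)


show(root)
```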
Next, this parsed document should be processed. The second module
should manipulate data and produce TeX typesetting instructions. A
useful concept here is XML SAX (which stands for "Simple API for XML").
SAX treats the document linearly again: it fires events at start tags
and end tags, and actions are bound to those events. For example, "on
start enumerate" a counter is started, the left margin is changed, and
some typesetting instructions are produced. "On start item" a list
label is typeset using the value of the counter. And so on.
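A minimal sketch of such a second module, using Python's standard `xml.sax` package. The handler binds the "on start enumerate" and "on start item" actions described above and emits a flat list of instructions. The instruction names (`set_margin`, `label`, `text`, `end_paragraph`) and the margin step of 2 are hypothetical, chosen only for this example; they are not TeX primitives.

```python
import xml.sax


class ListHandler(xml.sax.ContentHandler):
    """Bind actions to enumerate/item events and emit flat instructions.

    Instruction names and the 2-unit margin step are hypothetical.
    """

    def __init__(self):
        super().__init__()
        self.instructions = []
        self.counters = []   # one item counter per open enumerate
        self.margin = 0
        self.buffer = []     # character data may arrive in several chunks

    def flush_text(self):
        text = "".join(self.buffer).strip()
        self.buffer = []
        if text:
            self.instructions.append(("text", text))

    def startElement(self, name, attrs):
        self.flush_text()
        if name == "enumerate":            # "on start enumerate"
            self.counters.append(0)
            self.margin += 2
            self.instructions.append(("set_margin", self.margin))
        elif name == "item":               # "on start item"
            self.counters[-1] += 1
            # The label leaves this module as plain string data.
            self.instructions.append(("label", f"{self.counters[-1]}."))

    def endElement(self, name):
        self.flush_text()
        if name == "enumerate":
            self.counters.pop()
            self.margin -= 2
            self.instructions.append(("set_margin", self.margin))
        elif name == "item":
            self.instructions.append(("end_paragraph",))

    def characters(self, content):
        self.buffer.append(content)


doc = "<enumerate><item> Text1 </item><item> Text2 </item><item> Text3 </item></enumerate>"
handler = ListHandler()
xml.sax.parseString(doc.encode("utf-8"), handler)
for instruction in handler.instructions:
    print(instruction)
```

Keeping a stack of counters rather than a single one means nested lists number themselves independently, without any special casing.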
Finally, a typesetting engine processes the instructions. It does not
know anything about, for example, the item counter. It just gets an
instruction to typeset "1." (plain string data) for the first list
item, "2." for the second one, and so on.
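A deliberately ignorant stand-in for this last stage might look like the Python sketch below. It consumes hypothetical instruction tuples (`set_margin`, `label`, `text`, `end_paragraph`) and renders them as indented plain text. It never sees an item counter; labels arrive as ready-made strings. A real engine would of course be TeX itself, not this toy.

```python
def render(instructions):
    """Render hypothetical instruction tuples as plain-text lines."""
    lines = []
    margin = 0
    current = []
    for op, *args in instructions:
        if op == "set_margin":
            margin = args[0]
        elif op in ("label", "text"):
            current.append(args[0])   # both are just strings to set
        elif op == "end_paragraph":
            lines.append(" " * margin + " ".join(current))
            current = []
    return "\n".join(lines)


instructions = [
    ("set_margin", 2),
    ("label", "1."), ("text", "Text1"), ("end_paragraph",),
    ("label", "2."), ("text", "Text2"), ("end_paragraph",),
    ("label", "3."), ("text", "Text3"), ("end_paragraph",),
    ("set_margin", 0),
]
print(render(instructions))
```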
The scheme is superficially the same as with LaTeX. The difference is
that in LaTeX the code for all the stages is mixed: LaTeX lacks
modularity, and rather tricky TeX code handles everything
_simultaneously_.
So there should be at least three modules, which do the following:
Module1. Parse the document.
Module2. Represent the parsed document as a list of typesetting
instructions (such as characters, boxes, spaces, glue).
Module3. Typeset the document (choose line breaks, make paragraphs,
etc.)
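The hand-off between Module2 and Module3 can be illustrated with a toy box-and-glue model. In this Python sketch, `to_instructions` (a hypothetical Module2) turns words into boxes and interword spaces into glue, using character counts as widths in place of real font metrics; `break_lines` (a hypothetical Module3) then breaks lines greedily. Greedy first-fit breaking is a deliberate simplification: TeX itself uses the Knuth-Plass algorithm, which chooses the breaks for a whole paragraph at once.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Rigid material, e.g. a word; width in arbitrary units."""
    content: str
    width: int


@dataclass
class Glue:
    """Interword space with a natural width (stretch/shrink omitted)."""
    width: int


def to_instructions(text):
    """Toy Module2: turn a string into alternating boxes and glue."""
    items = []
    for i, word in enumerate(text.split()):
        if i:
            items.append(Glue(1))
        items.append(Box(word, len(word)))
    return items


def break_lines(items, line_width):
    """Toy Module3: greedy first-fit line breaking at glue."""
    lines, current, used = [], [], 0
    pending_glue = 0
    for item in items:
        if isinstance(item, Glue):
            pending_glue = item.width      # space before the next box
            continue
        needed = item.width + (pending_glue if current else 0)
        if current and used + needed > line_width:
            lines.append(" ".join(current))  # break at the preceding glue
            current, used = [], 0
            needed = item.width              # glue at a break is dropped
        current.append(item.content)
        used += needed
        pending_glue = 0
    if current:
        lines.append(" ".join(current))
    return lines


text = "the quick brown fox jumps over the lazy dog"
for line in break_lines(to_instructions(text), 15):
    print(line)
```

The point of the split is that `break_lines` knows nothing about words, lists or counters, only widths.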
Module1 and Module2 are better done with some programming language
other than TeX. Module3 is a typesetting engine, so TeX is possibly the
best choice.
One problem is that on some occasions (not very often, though) Module2
has to get information from Module3. For example, Module2 sometimes
needs the dimensions of typeset material in order to proceed, and page
numbers are needed to produce page references. This has to be solved
somehow (e.g. PyTeX manages to do this indirectly, by reading TeX's
output). It is not a trivial problem, but it could be solved.
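One familiar way to close this loop is the multi-pass approach LaTeX itself uses with its .aux file: run the later stages, record where each label landed, and rerun until the information stops changing. The Python sketch below models that fixed-point iteration. Everything in it is hypothetical: the token format, the `module2`/`module3` functions, and the fake pagination that simply puts `WORDS_PER_PAGE` words on each page.

```python
WORDS_PER_PAGE = 4  # hypothetical page capacity, in words


def module2(doc, pages_of):
    """Produce a word stream, resolving ref tokens from the last pass."""
    out = []
    for token in doc:
        if isinstance(token, tuple) and token[0] == "ref":
            out.append(str(pages_of.get(token[1], "??")))
        else:
            out.append(token)   # words and label tokens pass through
    return out


def module3(stream):
    """Fake pagination: report which page each label fell on."""
    pages_of = {}
    words = 0
    for token in stream:
        if isinstance(token, tuple) and token[0] == "label":
            pages_of[token[1]] = words // WORDS_PER_PAGE + 1
        else:
            words += 1
    return pages_of


def resolve(doc, max_passes=5):
    """Rerun Module2 and Module3 until the label-to-page map is stable."""
    pages_of = {}
    for passes in range(1, max_passes + 1):
        stream = module2(doc, pages_of)
        new_pages = module3(stream)
        if new_pages == pages_of:
            return stream, pages_of, passes
        pages_of = new_pages
    raise RuntimeError("references did not stabilise")


doc = ["See", "page", ("ref", "fig"), "for", "the", "plot", "of",
       "results", ("label", "fig"), "Figure", "1"]
stream, pages_of, passes = resolve(doc)
print(" ".join(t for t in stream if isinstance(t, str)), pages_of, passes)
```

Here the first pass leaves the forward reference as "??" and the second pass fills it in and confirms that the label did not move. With real typesetting, resolving a reference can change the layout, which is why a pass limit is kept.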
In return, this approach brings several valuable advantages.
First, the document markup language may be just about anything: LaTeX,
XML (such as DocBook, TEI, tbook), GELLMU, wikitext. Anything that can
be parsed and is capable of producing a stream of events in SAX style.
To simplify matters, some variant of EBNF (extended Backus-Naur form)
metasyntax can be used to describe custom markup languages.
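To illustrate "anything capable of producing a stream of events", here is a minimal Python sketch of a front end for a wikitext-like numbered list, where each line starting with "# " is an item. Its toy grammar is written in EBNF in the comment. The event tuples and the function name `wiki_events` are hypothetical, chosen to reuse the enumerate/item vocabulary, so that the same downstream handlers could consume them regardless of which markup the author wrote.

```python
# Toy grammar (EBNF) for a wikitext-style numbered list:
#   list = item , { item } ;
#   item = "# " , text , newline ;
#   text = character , { character } ;   (* any character except newline *)
# This sketch additionally skips blank lines.
def wiki_events(source):
    """Yield SAX-style (event, name, text) tuples for a '# ' list."""
    lines = [line for line in source.splitlines() if line.strip()]
    if not lines:
        return
    yield ("start", "enumerate", None)
    for line in lines:
        if not line.startswith("# "):
            raise ValueError(f"expected '# ' item, got {line!r}")
        yield ("start", "item", None)
        yield ("characters", None, line[2:].strip())
        yield ("end", "item", None)
    yield ("end", "enumerate", None)


for event in wiki_events("# Text1\n# Text2\n# Text3\n"):
    print(event)
```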
Second, the task of programming extensions for the basic system is
greatly simplified, as one can use a _real_ programming language
instead of TeX acrobatics.
Third, many important document-preparation tasks could be done
internally: for instance, sorting index entries, producing
bibliographic references, determining the dimensions of included
graphics, etc.
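For instance, sorting and collapsing index entries, which LaTeX users normally hand off to the external MakeIndex program, becomes ordinary list processing. The Python sketch below assumes hypothetical (term, page) pairs collected during processing; `build_index` sorts terms case-insensitively, merges duplicate pages and groups entries by initial letter. Real index sorting also has to handle accents, subentries and page ranges, all omitted here.

```python
from collections import defaultdict


def build_index(entries):
    """Group (term, page) pairs into {letter: [(term, sorted pages)]}."""
    pages_by_term = defaultdict(set)
    for term, page in entries:
        pages_by_term[term].add(page)   # a set drops duplicate pages
    index = defaultdict(list)
    for term in sorted(pages_by_term, key=str.casefold):
        index[term[0].upper()].append((term, sorted(pages_by_term[term])))
    return dict(index)


entries = [("glue", 12), ("Box", 3), ("glue", 4), ("badness", 12),
           ("glue", 12), ("kerning", 7)]
for letter, items in build_index(entries).items():
    print(letter)
    for term, pages in items:
        print(f"  {term}, {', '.join(map(str, pages))}")
```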
Fourth, outputs other than a typeset document (HTML, various XML, ...)
are produced more easily, insofar as they do not need a full
typesetting stage. Module3 is not needed for them.
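As a sketch of this point, the same kind of SAX events that would drive Module2 can be routed to a different handler that writes HTML directly, with no typesetting stage at all. This uses Python's standard `xml.sax` and `html.escape`; the `HTMLWriter` class and its element-to-tag mapping (`enumerate` to `ol`, `item` to `li`) are assumptions made for the example.

```python
import xml.sax
from html import escape


class HTMLWriter(xml.sax.ContentHandler):
    """Map the toy vocabulary straight onto HTML; no Module3 involved."""

    # Hypothetical mapping; an unknown element raises KeyError.
    TAGS = {"enumerate": "ol", "item": "li"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def startElement(self, name, attrs):
        self.parts.append(f"<{self.TAGS[name]}>")

    def endElement(self, name):
        self.parts.append(f"</{self.TAGS[name]}>")

    def characters(self, content):
        # Escaping chunk by chunk is safe: escape() works per character.
        self.parts.append(escape(content))


doc = "<enumerate><item>Text1</item><item>Q &amp; A</item></enumerate>"
writer = HTMLWriter()
xml.sax.parseString(doc.encode("utf-8"), writer)
print("".join(writer.parts))
```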
Fifth, a better separation between generic markup and ad hoc markup
could be achieved.
There must be other advantages.