Re: Sane Syntax



This thread seems to be dying out and before it dies completely I
would like to summarize what I have learnt from it and tell you how I
would envision a successor to the current LaTeX system.

It seems that we all agree that one way or another XML should play a
vital role in the future of TeX but we need some more human friendly
syntax to write it. I will get back to this aspect later on but for
now let’s just assume that we have an equivalence (XML)<->(new syntax)
and the conversion between the two happens without any loss of
information.
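
To make that assumption a bit more concrete, here is a toy sketch in
Python of what a lossless round trip could look like. The brace
notation is purely hypothetical (nobody in this thread proposed it),
attributes are ignored, and the parser assumes well-formed input. It
only shows that element nesting and mixed text survive the trip in
both directions.

```python
import xml.etree.ElementTree as ET

def _esc(text):
    # Backslash-escape the three characters the brace notation reserves.
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")

def to_brace(elem):
    """Serialize an element as {tag content}; attributes are ignored."""
    body = _esc(elem.text or "")
    for child in elem:
        body += to_brace(child) + _esc(child.tail or "")
    return "{" + elem.tag + (" " + body if body else "") + "}"

def _attach(elem, last, buf):
    # Text before the first child is elem.text, later text is a child's tail.
    text = "".join(buf)
    buf.clear()
    if text and last is None:
        elem.text = text
    elif text:
        last.tail = text

def _parse(src, pos):
    pos += 1  # skip the opening "{"
    start = pos
    while src[pos] not in " }":
        pos += 1
    elem = ET.Element(src[start:pos])
    if src[pos] == " ":
        pos += 1  # exactly one separator space, the rest is content
    buf, last = [], None
    while src[pos] != "}":
        if src[pos] == "\\":      # escaped character, taken literally
            buf.append(src[pos + 1])
            pos += 2
        elif src[pos] == "{":     # nested element
            _attach(elem, last, buf)
            last, pos = _parse(src, pos)
            elem.append(last)
        else:
            buf.append(src[pos])
            pos += 1
    _attach(elem, last, buf)
    return elem, pos + 1

def from_brace(src):
    elem, end = _parse(src, 0)
    if end != len(src):
        raise ValueError("trailing input after the root element")
    return elem

xml = "<p>Hello <hi>brave {new}</hi> world<lb/>!</p>"
root = ET.fromstring(xml)
brace = to_brace(root)
print(brace)  # {p Hello {hi brave \{new\}} world{lb}!}
# The round trip reproduces the same XML serialization.
assert ET.tostring(from_brace(brace)) == ET.tostring(root)
```

A real syntax would of course also need attributes, namespaces and
entities, which is where most of the design work would be.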

Let me first focus on the issue of legacy stuff and transitioning to
the new system. Generating well-formed LaTeX2e documents from XML
should be quite straightforward with XSLT, and at that point reuse of
old documents and packages becomes possible. That's one way of doing it.
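
As a rough illustration of that direction, here is a minimal sketch
that turns a tiny TEI-like fragment into LaTeX. It uses Python's
standard ElementTree instead of XSLT, simply to keep the example
self-contained. The element names (div, head, p, hi) and their
mapping to \section and \emph are my assumptions for the example, not
a real TEI stylesheet.

```python
import xml.etree.ElementTree as ET

# Characters that need escaping in ordinary LaTeX text.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

CONTAINERS = {"div", "body"}  # hypothetical: hold blocks, no text of their own

def escape(text):
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)

def to_latex(elem):
    if elem.tag in CONTAINERS:
        # Whitespace between block elements is ignored.
        return "".join(to_latex(child) for child in elem)
    inner = escape(elem.text or "")
    for child in elem:
        inner += to_latex(child) + escape(child.tail or "")
    if elem.tag == "head":
        return "\\section{" + inner.strip() + "}\n\n"
    if elem.tag == "p":
        return inner.strip() + "\n\n"
    if elem.tag == "hi":
        return "\\emph{" + inner + "}"
    return inner  # unknown elements: keep the text, drop the markup

def to_document(root):
    return ("\\documentclass{article}\n\\begin{document}\n"
            + to_latex(root) + "\\end{document}\n")

SAMPLE = """<div>
  <head>Costs &amp; benefits</head>
  <p>Reuse is <hi>mostly</hi> free: 100% of old packages still work.</p>
</div>"""

print(to_document(ET.fromstring(SAMPLE)))
```

An XSLT stylesheet would express the same mapping declaratively, with
one template per element.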

Another approach is to convert existing documents to XML and go from
there. I know that doing this reliably is close to impossible, and I
wouldn't even mention it if not for the fact that a tool already
exists that can do it: Tralics.

Tralics is an open source converter (GPL-like license) that
reimplements a fair part of TeX's guts, so that it can understand even
something as convoluted as xii.tex. More importantly, it understands
a lot of LaTeX's markup and packages (this is configurable) and
successfully captures the high-level semantics of the document in a
TEI-like format. I know of at least one positive experience from its
application in the publishing industry [1], which says a lot about
the maturity of this tool.

Of course the two solutions above are not mutually exclusive and
could be used together, but converting old documents to the new format
should be preferable as it could significantly accelerate the death of
the old format. It also makes learning the new system easier as you
don’t start with a blank page. Both of those factors would be
important for quick uptake.

If Tralics were used in the new system then it would also make sense
to use TEI as the XML format. TEI and DocBook are the two formats at
the very top of “the entropy graph” referred to earlier in the
discussion. I don’t know which one would be better, but based on my
brief research on the subject, the TEI specification seems to have
several desirable features:
-It covers most of the markup elements typically encountered in
documents
-It is a well established format backed by a formal organization
-It doesn’t try to reinvent the wheel but rather reuses existing
specifications like MathML, SVG and so on (good for interoperability
with other formats)
-It allows for extensibility (more on that below)

The TEI specification distinguishes three types of documents:
-conformant (using only markup elements defined in the specification)
-conformable (algorithmically conformant, i.e. automatically
transformable into a conformant document without any loss of
information)
-extensions (valid against the TEI schema but containing additional
concepts not present in the TEI abstract model)

So TEI not only covers a lot of existing markup but also allows for
extensions that could be used for some stuff that \newcommand is
currently used for. There are several types of extensions that are
allowed but their discussion goes beyond this post (and I also don’t
feel competent enough in that matter).

XML brings a number of other technologies with it: stylesheets that
can to a large extent replace LaTeX packages, XSLT for generating
other formats, and XSL-FO, which provides formatting capabilities and
could be used as an alternative way of generating PDFs (either via
direct conversion or by using PassiveTeX).

All of the above components can be used more or less straight away, so
let’s now turn to what’s still missing to tie them up into one system.

First of all, the very subject of this thread: a syntax that would be
suitable for human writing and reading. Here, no clear consensus has
been reached in this discussion. I argued that it should closely
reflect the underlying XML format, just like GELLMU does. After
skimming through the TEI specification I’ve changed my mind a bit.
Some concepts don’t seem to map well onto XML’s restrictions and
therefore might be expressed in rather complex ways. For such cases
some other notation could be preferable, but otherwise I would stay
close to XML. This new syntax doesn’t even have to be LaTeX-like;
maybe it would be even better if it weren’t, so that people don’t
judge it from that perspective.

Next would be access to the typesetter and low-level dirty tweaks.
Some posters here suggested that this shouldn’t be allowed at all.
Personally, I think that nothing good has ever come from restricting
people’s choices and desires, even if it means that they occasionally
shoot themselves in the foot. I think that not only should some
typesetting model be offered (maybe in the form of a Lua API), but it
should be further extended to a page description and graphics model
so that things requiring more control over layout (posters,
brochures, presentations, newspapers) also become easier. There is a
clear need for some graphics and page description capabilities in
TeX, as can be judged from the growing popularity of packages like
PGF.

Finally, there is the question of general programmability. I think
that TeX users will settle for nothing short of Turing completeness.
LuaTeX will bring the Lua language with it and I see no reason why it
shouldn’t be available at the authoring level. I would go even
further and allow (optional) use of other scripting languages. Again,
I don’t see this causing the kind of problem that exists with LaTeX
macros. I can browse the web with JavaScript or Flash disabled and it
still mostly works (unless somebody is stupid enough to make an entire
page in Flash), because those elements are clearly separated from the
rest of the content and can simply be omitted. Thanks to this better
structuring it is possible to exercise fine-grained control over
what's admissible and what's not.
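
Here is a small sketch of what I mean by "clearly separated", again
in Python. The <script> element and its lang attribute are
hypothetical markup invented for this example; the point is only that
a processor can drop every script in a language it doesn't allow and
still be left with a well-formed, readable document.

```python
import xml.etree.ElementTree as ET

def strip_scripts(elem, allowed=frozenset()):
    """Remove <script> children whose lang is not allowed.

    Text following a removed script is kept. Returns the number of
    removed elements.
    """
    removed = 0
    prev = None  # last child that was kept
    for child in list(elem):
        if child.tag == "script" and child.get("lang") not in allowed:
            # Re-attach the text that followed the script element.
            tail = child.tail or ""
            if prev is None:
                elem.text = (elem.text or "") + tail
            else:
                prev.tail = (prev.tail or "") + tail
            elem.remove(child)
            removed += 1
        else:
            removed += strip_scripts(child, allowed)
            prev = child
    return removed

DOC = ('<p>Total: <script lang="lua">tex.print(6*7)</script> items, see '
       '<script lang="python">print(1)</script><hi>below</hi>.</p>')

strict = ET.fromstring(DOC)
print(strip_scripts(strict), ET.tostring(strict, encoding="unicode"))

lua_only = ET.fromstring(DOC)
print(strip_scripts(lua_only, allowed={"lua"}),
      ET.tostring(lua_only, encoding="unicode"))
```

With no languages allowed both scripts disappear and only the prose
remains; with Lua allowed only the Python script is dropped.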

To conclude this already long post, let me run the above ideas against
the four prerequisites of a successful system that I listed early on
in this thread:
1) Smooth transition from the old to the new system. Achieved by
converting LaTeX2e documents to XML and/or generating LaTeX from XML.
2) Abstraction from TeX idiosyncrasies. This would depend on
providing improved abstractions in the form of Lua APIs or similar.
3) Strict, parsable syntax; reuse of existing standards. This would be
ensured by using TEI with its underlying XML format.
4) Get support from prominent figures in the TeX community. That one
might be the toughest, but I think that a system such as the one
above could offer enough advantages that it might actually convince a
few people to support it.

At this point all this is just my pipe dream, however. Do you think
this could become a reality some day (hopefully soon)?


Cheers,

Tomek


[1] http://www.river-valley.tv/conferences/dml2008/#0101-Thierry_Bouche