Re: Why eliminate <br>?



On Wed, 02 Aug 2006 23:39:54 +0200, Jukka K. Korpela <jkorpela@xxxxxxxxx> wrote:

Punctuation marks can really be treated as conventional markup. This becomes rather clear if we think about quotation marks (especially asymmetric quotation marks, with opening quote different from closing quote) or Spanish-style questions, like "¿Cómo?", where "¿" really corresponds to an opening tag and "?" a closing tag.

I started writing a reply trying to argue against this point of view, but I managed to confuse myself in the process. The following is a train of thought, without a clear conclusion...

First idea:
Yes, punctuation does constitute a (perhaps primitive) form of markup. A text can be parsed based on punctuation. But the parsing mechanism is language-dependent. IMHO, it makes more sense to see punctuation as an integral part of grammar, because markup should be language-independent.

Pro punctuation = markup:
On a very abstract level, you could do something like:

<question lang="es">Cómo se va</question>
<question lang="en">How do you do</question>
<question lang="el">Τι κάνετε</question>

which is completely equivalent to:

<span lang="es">¿Cómo se va?</span>
<span lang="en">How do you do?</span>
<span lang="el">Τι κάνετε;</span>

A parser can always transform the punctuated form into the markup form, provided that it knows what language the phrase is in and knows the grammar of that language. Under the same conditions, it can also transform the markup form into the punctuated form. A smart enough text-to-speech engine should be able to render the punctuated form correctly.
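
As a rough sketch of that two-way transformation (assuming a hypothetical lookup table QUESTION_MARKS that covers only the three languages above, and ignoring everything else a real grammar would require):

```python
# Hypothetical table: language code -> (opening mark, closing mark) for a question.
# Only the three languages from the example are covered.
QUESTION_MARKS = {
    "es": ("¿", "?"),
    "en": ("", "?"),
    "el": ("", ";"),
}


def to_punctuated(text, lang):
    """Markup form -> punctuated form: add the marks the language requires."""
    opening, closing = QUESTION_MARKS[lang]
    return f'<span lang="{lang}">{opening}{text}{closing}</span>'


def to_markup(text, lang):
    """Punctuated form -> markup form: strip the marks and tag the phrase."""
    opening, closing = QUESTION_MARKS[lang]
    if not (text.startswith(opening) and text.endswith(closing)):
        raise ValueError(f"not a question by the rules assumed for {lang!r}")
    core = text[len(opening):len(text) - len(closing)]
    return f'<question lang="{lang}">{core}</question>'


print(to_punctuated("Cómo se va", "es"))
print(to_markup("How do you do?", "en"))
```

Both functions fail without the language code, which is exactly the condition stated above.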

Contra punctuation = markup:
If the markup form is presented as text without any styling applied, it will look like

Cómo se va
How do you do
Τι κάνετε

and when this is copy-pasted to a plain text document, meaning is lost.

Rebuttal: One could argue that a rendering engine must always put the appropriate question marks around a phrase, and that these question marks can be copy-pasted along with the rest of the text.

Another contra argument:
On second thought, language alone is not enough to determine the punctuation. In English, one can use single or double quotation marks, and even within British English, different styles are possible. Still, one could argue that this is only a matter of style, not of content. Another example is a language that can be written in different scripts, such as Kurdish, which can be written in Latin, Cyrillic or Arabic script. I don't know enough about that language to give a good example, but I'm sure that <question lang="ku">...</question> must be rendered differently if the enclosed text is in Arabic script than if it is in Latin or Cyrillic script. So the correspondence between markup and presentation is not entirely one-to-one.

Possible rebuttal:
Along with the language, the script could be specified. RFC 1766 (Tags for the Identification of Languages, http://www.ietf.org/rfc/rfc1766.txt) says about the language subtag: "The information in the subtag may for instance be: [...] Script variations, such as az-arabic and az-cyrillic".

Rebuttal of the rebuttal:
But whereas the main tag must be an ISO 639 code, and country information must be given as an ISO 3166 code, no fixed format is given for the script information. Would ISO 15924 do? Does it cover all existing scripts? How does a parser know if the subtag gives country information, script information, or something else? In short, I don't think a reliable way exists to indicate the script unambiguously.
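
One convention that would let a parser tell the subtags apart is purely length-based: four letters for an ISO 15924 script code, two letters for an ISO 3166 country code. RFC 1766 itself does not fix this for scripts, so the sketch below rests on that assumption (it is the kind of rule adopted by later language-tag specifications), and the function name classify_subtags is made up for illustration:

```python
def classify_subtags(tag):
    """Split a language tag and guess the role of each subtag by its length.

    Assumption (not guaranteed by RFC 1766): a 4-letter subtag is an
    ISO 15924 script code, a 2-letter subtag is an ISO 3166 country code.
    Anything else is reported as "other" rather than guessed at.
    """
    primary, *rest = tag.split("-")
    result = {"language": primary.lower(), "script": None, "region": None, "other": []}
    for sub in rest:
        if len(sub) == 4 and sub.isalpha() and result["script"] is None:
            result["script"] = sub.title()   # e.g. "Arab", "Latn", "Cyrl"
        elif len(sub) == 2 and sub.isalpha() and result["region"] is None:
            result["region"] = sub.upper()   # e.g. "TR"
        else:
            result["other"].append(sub)
    return result


print(classify_subtags("ku-Arab"))
print(classify_subtags("ku-Latn-TR"))
print(classify_subtags("az-arabic"))  # RFC 1766 style: lands in "other"
```

Note that the RFC 1766 example az-arabic is not recognised as a script under this rule, which illustrates the ambiguity described above.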

But then again:
If the script cannot be determined unambiguously, a parser cannot interpret the punctuated form any more than it can generate it from the markup form. For a text-to-speech engine, it is therefore better to have markup than to have simple punctuation.

Yet another contra argument:
Apart from the details noted above, it is not unreasonable to see (in English) "xxx!" as equivalent to "<excl>xxx</excl>", "xxx?" as equivalent to "<question>xxx</question>" and "xxx." as equivalent to "<sentence>xxx</sentence>". For quotation marks, an equivalent "<q>xxx</q>" already exists. But what about other punctuation marks, such as commas and colons?

Indeed. And markup that has meaning in one presentation medium only could be called presentational, right? But there's really no way to tell from markup alone, without human judgement, whether <br> is supposed to be a replacement for something more logical (e.g., for indicating the structure of a poem or a postal address, or even to simulate paragraph division) or just a formatting tool used for practical reasons.

You are right: <br> can be eliminated completely by using <div> or, in a vocabulary that has them, more specific elements. For example:

<address>
<div>John Doe</div>
<div>123, Foobar Street</div>
<div>Bazville</div>
</address>

<poem>
<stanza>
<verse>Roses are red,</verse>
<verse>Violets are blue,</verse>
<verse>This poem is boring,</verse>
<verse>And quite silly, too.</verse>
</stanza>
</poem>
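
A mechanical version of the first rewrite could look like the sketch below. It assumes the input is a flat fragment whose lines are separated only by <br>, and the helper name br_to_divs is made up for illustration. It cannot decide whether a given <br> was logical or purely presentational; that still takes human judgement.

```python
import re


def br_to_divs(fragment, wrapper="address"):
    """Replace <br>-separated lines by one <div> per line inside a wrapper element.

    Assumes a flat fragment with no nested markup around the <br> tags.
    """
    # Split on <br>, <br/> or <br />, in any letter case.
    parts = re.split(r"<br\s*/?>", fragment, flags=re.IGNORECASE)
    lines = [part.strip() for part in parts if part.strip()]
    body = "\n".join(f"<div>{line}</div>" for line in lines)
    return f"<{wrapper}>\n{body}\n</{wrapper}>"


print(br_to_divs("John Doe<br>123, Foobar Street<br />Bazville"))
```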

--
Garmt de Vries-Uiterweerd