Theory question: U+000C in HTML 4.01
- From: "Jukka K. Korpela" <jkorpela@xxxxxxxxx>
- Date: Thu, 27 Oct 2005 23:52:32 +0300
This question is fairly theoretical (even for me), but it started to puzzle me:
According to the SGML declaration for HTML 4.01, at http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html#h-20.1 the Form Feed character, U+000C (12 in decimal), is UNUSED, i.e. forbidden:
DESCSET 0 9 UNUSED
        9 2 9
        11 2 UNUSED
Yet, the prose of the specification discusses it as if it were an allowed character. Section 9.1, "White space", says:
"In HTML, only the following characters are defined as white space characters:
- ASCII space (&#x0020;)
- ASCII tab (&#x0009;)
- ASCII form feed (&#x000C;)
- Zero-width space (&#x200B;)"
( http://www.w3.org/TR/REC-html40/struct/text.html#h-9.1 )
Is this just a slip in the SGML declaration, or in the prose? I'd suppose the latter, since the formal rule was the same in HTML 3.2, which did not mention U+000C at all in the prose. So when people wrote the HTML 4.01 prose, they just didn't check what's in the formal declaration.
The W3C validator and the WDG validator seem to report U+000C as an error ("Non-SGML character number 12"), apparently playing by the SGML declaration for HTML 4.01.
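To make the reading of the declaration concrete, here is a minimal Python sketch that interprets the three DESCSET rows quoted above as (first code, count, mapping) triples and looks up a few code positions. The function names parse_descset and status are made up for illustration, and only the quoted excerpt is covered, not the full declaration. This is not how the validators are implemented; it only shows why an SGML parser following the declaration would treat character number 12 as non-SGML.

```python
# The three DESCSET rows quoted from the SGML declaration:
# first code, number of codes, and either a base-set number or UNUSED.
DESCSET_EXCERPT = """
0 9 UNUSED
9 2 9
11 2 UNUSED
"""


def parse_descset(text):
    """Turn the rows into (start, count, target) tuples (hypothetical helper)."""
    rows = []
    for line in text.strip().splitlines():
        start, count, target = line.split()
        rows.append((int(start), int(count), target))
    return rows


def status(code, rows):
    """Return 'UNUSED', the mapped base-set number, or None if the
    excerpt says nothing about this code position."""
    for start, count, target in rows:
        if start <= code < start + count:
            if target == "UNUSED":
                return "UNUSED"
            return int(target) + (code - start)
    return None


rows = parse_descset(DESCSET_EXCERPT)
for code in (9, 10, 11, 12):
    print(code, status(code, rows))
```

Under this reading, positions 9 and 10 (tab and line feed) are mapped, while 11 and 12 (vertical tab and form feed) fall in the second UNUSED range.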
(XHTML, as XML in general, forbids U+000C explicitly. And U+000C is not useful in HTML: it's just another whitespace character, not a page eject character, as one might naively expect.)
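The XML side is easy to check with any conforming XML parser. The following sketch uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the two sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the string parses as XML, False on a parse error."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# U+0009 (tab) is an allowed XML character; U+000C (form feed) is not.
print(is_well_formed("<p>a\tb</p>"))
print(is_well_formed("<p>a\x0cb</p>"))
```

The document containing a tab should be accepted, and the one containing a form feed rejected, since U+000C is outside the Char production of XML 1.0.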