Theory question: U+000C in HTML 4.01



This question is fairly theoretical (even for me), but it started to puzzle me:

According to the SGML declaration for HTML 4.01, at
http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html#h-20.1
the Form Feed character, U+000C (12 in decimal), is UNUSED, i.e. forbidden:

         DESCSET 0       9       UNUSED
                 9       2       9
                 11      2       UNUSED
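
To make my reading of those rows concrete, here is a minimal sketch in Python. It assumes each DESCSET row is a triple (first character number, count of characters, mapping), with UNUSED meaning no mapping at all. The names DESCSET_ROWS and lookup are mine, purely illustrative, and only the three quoted rows are included, not the whole declaration.

```python
# The three DESCSET rows quoted above, as (first number, count, mapping).
# None stands for UNUSED. The remaining rows of the declaration are omitted.
DESCSET_ROWS = [
    (0, 9, None),   # 0..8   UNUSED
    (9, 2, 9),      # 9..10  mapped to 9..10 (tab, line feed)
    (11, 2, None),  # 11..12 UNUSED (vertical tab, form feed)
]


def lookup(code_point):
    """Return 'UNUSED', the mapped number, or None if the quoted rows don't cover it."""
    for start, count, mapping in DESCSET_ROWS:
        if start <= code_point < start + count:
            if mapping is None:
                return "UNUSED"
            # A mapped row assigns consecutive numbers starting at `mapping`.
            return mapping + (code_point - start)
    return None


for cp in (9, 10, 11, 12):
    print(f"U+{cp:04X}: {lookup(cp)}")
```

Under that reading, number 12 falls into the third row (11 and 12), so it comes out as UNUSED.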

Yet the prose of the specification discusses it as if it were an allowed character. Section 9.1, "White space", says:

"In HTML, only the following characters are defined as white space characters:

- ASCII space (&#x0020;)
- ASCII tab (&#x0009;)
- ASCII form feed (&#x000C;)
- Zero-width space (&#x200B;)"

( http://www.w3.org/TR/REC-html40/struct/text.html#h-9.1 )

Is this just a slip in the SGML declaration, or in the prose? I'd suppose the latter, since the formal rule was the same in HTML 3.2, whose prose did not mention U+000C at all. So the people who wrote the HTML 4.01 prose apparently just didn't check what the formal declaration says.

The W3C validator and the WDG validator seem to report U+000C as an error ("Non-SGML character number 12"), apparently playing by the SGML declaration for HTML 4.01.

(XHTML, like XML in general, forbids U+000C explicitly. And U+000C is not useful in HTML: it's just another whitespace character, not a page eject character, as one might naively expect.)
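
For comparison, the XML side can be sketched the same way. The ranges below are the Char production of XML 1.0 as I understand it. The names XML_CHAR_RANGES and is_xml_char are illustrative only, and this checks single code points, not whole documents.

```python
# Code point ranges allowed by the XML 1.0 "Char" production (inclusive).
XML_CHAR_RANGES = [
    (0x9, 0xA),           # tab, line feed
    (0xD, 0xD),           # carriage return
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
]


def is_xml_char(code_point):
    """Return True if the code point may appear in an XML 1.0 document."""
    return any(low <= code_point <= high for low, high in XML_CHAR_RANGES)


for cp in (0x09, 0x0C, 0x20):
    print(f"U+{cp:04X}: {'allowed' if is_xml_char(cp) else 'not allowed'}")
```

U+000C lies between the allowed U+000A and U+000D, so it is excluded there too.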