Re: Non-collapsing space



31.08.2011 09:25, Swifty wrote:

Multiple white-space characters in HTML collapse into a single space.

Normally, yes. But not in <pre> content, and in practice not in <textarea> content either, or if entered as <input value="foo bar">. Besides, this all applies (when it applies) to the visual rendering only. For example, if you access an element's text content in JavaScript, your code will see each space as a separate character.

I'm trying to defeat this mechanism while displaying the characters in
font Verdana.

I wonder why you wish to defeat it and why you use Verdana. But the most obvious way to defeat it is to set white-space: pre in CSS. It also makes line breaks in source imply line breaks in visual rendering, but this is irrelevant if you have no line breaks there. Alternatively you could use <pre> markup in HTML, but <pre> elements are rendered as blocks (line breaks before and after) and this might be undesirable.
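
As a minimal sketch of the CSS approach, assuming the debugging page is generated by a script: the text is escaped as before and then wrapped in an inline element styled with white-space: pre. The function name debug_span is hypothetical, and Python's html.escape stands in for the poster's own escaping routine (it also escapes ">" and quotes, which is harmless here).

```python
from html import escape


def debug_span(text: str) -> str:
    """Escape text for HTML and wrap it so runs of spaces are not collapsed."""
    # escape() replaces &, <, > and quotes with character references,
    # so the debugging text is displayed literally instead of parsed as markup.
    escaped = escape(text)
    # white-space: pre on an inline element preserves spaces without the
    # block-level line breaks that a <pre> element would introduce.
    return '<span style="white-space: pre">' + escaped + "</span>"


print(debug_span("a  <b>  &  c"))
```

The spaces stay ordinary U+0020 characters in the source, so nothing has to be translated before or after escaping.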

The text in question goes through a function which replaces '&' and
'<' characters with &amp; and &lt; respectively, so I can't just
change the blanks to '&nbsp;', as the '&nbsp;' will get changed to
'&amp;nbsp;' (the routine displays debugging information; having the
HTML render as HTML would defeat the object)

That sounds confusing, and I'm pretty sure there is a better way of doing whatever might be the ultimate goal.

No-break spaces tend to be non-collapsible, though formally their effect is undefined:
"This specification does not indicate the behavior, rendering or otherwise, of space characters other than those explicitly identified here as white space characters. For this reason, authors should use appropriate elements and styles to achieve visual formatting effects that involve white space, rather than space characters."
http://www.w3.org/TR/html401/struct/text.html#h-9.1
(No-break space is not among those indicated as white space characters, but under any reasonable interpretation, it is a space character.)
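
A rough model of the collapsing rule may make the distinction concrete. It is only a sketch under the assumption that collapsing applies to space, tab, line feed, form feed and carriage return. It does not describe how any browser is implemented, and the function name collapse is hypothetical. The character class is spelled out because Python's \s also matches U+00A0 in str patterns, which is exactly what this model must not do.

```python
import re

# Characters treated as collapsible white space in this sketch:
# space, tab, line feed, form feed, carriage return.
_COLLAPSIBLE = re.compile(r"[ \t\n\f\r]+")


def collapse(text: str) -> str:
    """Replace each run of collapsible white space with a single space."""
    return _COLLAPSIBLE.sub(" ", text)


# Ordinary spaces collapse into one.
assert collapse("a   b") == "a b"
# No-break spaces are not in the collapsible set, so they all survive.
assert collapse("a\u00a0\u00a0\u00a0b") == "a\u00a0\u00a0\u00a0b"
```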

If you wish to use no-break spaces (and they admittedly often do what you want them to do), you can enter them directly, as characters in the declared encoding. You don't need to use an entity reference like &nbsp;. On the other hand, can't you run the function that "escapes" the characters "&" and "<" _before_ changing spaces to &nbsp;?
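
Both suggestions can be sketched as follows, with the caveat that html.escape is only a stand-in for the poster's own routine (which is described as handling "&" and "<" only) and that the function names are hypothetical. The first variant escapes first and inserts the entity reference afterwards; the second inserts the U+00A0 character itself, in which case the order does not matter because the escaping step leaves that character alone.

```python
from html import escape

NBSP = "\u00a0"  # the no-break space character itself


def escape_then_entity(text: str) -> str:
    """Escape markup-significant characters first, then turn spaces into &nbsp;."""
    escaped = escape(text, quote=False)  # & -> &amp;, < -> &lt;, > -> &gt;
    # The ampersand in "&nbsp;" is added after escaping, so it is not doubled.
    return escaped.replace(" ", "&nbsp;")


def literal_nbsp_then_escape(text: str) -> str:
    """Turn spaces into literal U+00A0 characters, then escape as usual."""
    return escape(text.replace(" ", NBSP), quote=False)


print(escape_then_entity("a  & <b"))
```

With the literal character, the page has to be served in an encoding that can represent U+00A0, and that encoding has to be declared correctly.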

So, I tested my browser with all the characters 00-FF and found this:

It seems that you mean testing browser behavior for _bytes_ 00 through FF. That is somewhat pointless, because whatever the implied encoding might be, many bytes in that range either do not represent a character at all or represent a character that is forbidden in HTML.

1C-1F produce a non-collapsing blank, but it is very wide

The odds are that in the encoding used, 1C to 1F represent U+001C to U+001F, which are forbidden in HTML. So anything you see is just a browser's "error recovery" (here a euphemism for probably unplanned behavior when receiving incorrect data).

A0 produces a standard width, non-collapsing blank. Perfect!

The odds are that in the encoding used, A0 represents U+00A0, which is by definition the no-break space character.
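
The byte-to-character mapping can be checked outside the browser. The sketch below assumes the page was interpreted as ISO 8859-1 (the thread only says this is likely), and the helper name describe is hypothetical. It decodes single bytes and reports the resulting code point, its Unicode general category and its name.

```python
import unicodedata


def describe(byte: int, encoding: str = "latin-1"):
    """Decode one byte and return (code point, general category, name)."""
    ch = bytes([byte]).decode(encoding)
    # Control characters have no Unicode name, so a default is supplied.
    name = unicodedata.name(ch, "(no name)")
    return ord(ch), unicodedata.category(ch), name


for b in (0x1C, 0x1F, 0x20, 0xA0):
    code_point, category, name = describe(b)
    print(f"0x{b:02X} -> U+{code_point:04X} {category} {name}")
```

Under that assumption, bytes 1C to 1F come out in category Cc (control characters), while A0 comes out as NO-BREAK SPACE in category Zs, the same category as the ordinary space.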

So, I translate blank characters to 0xA0 before passing them to my
routine, and the webpage looks perfect. Since I'm probably the only
person who will ever see the result of this trick, I'm not concerned
about the legality/nicety of what I've done.

So this isn't really about HTML authoring for the WWW, is it? I don't object to asking non-WWW questions related to HTML here, but it would be useful to point such things out at the very beginning of the question. For example, if an HTML page is for one person's use only, we can take many liberties (like assuming specific browser capabilities) that we can't afford when authoring for the WWW.

However, I'm curious about the mechanism that creates the
non-collapsing blank.

Is 0xA0 simply a character that doesn't render in Verdana?

No, the no-break space renders quite well. Despite the somewhat confusing statement in the HTML 4.01 specification (which probably tries to warn about fixed-width space characters rather than the no-break space), a no-break space acts as a normal graphic character, with a specific advance width, but its glyph is completely empty. Conceptually, think about the vertical bar character "|". Two such characters don't touch each other ("||"), because in addition to the vertical line, there is empty space in the glyph on both sides of it. If you reduce the width of the vertical line to zero, you have a glyph similar to a glyph of the no-break space character, though probably of different width.

Technically, a browser (or some other program that renders characters) _could_ render the no-break space without using any glyph - it could just insert suitable spacing. I don't think any browser does that, though, as there is no reason to.

In practice, the no-break space character has the same width as the space character. At least I don't know any font that has things differently. However, programs often treat them differently in the sense that in adjusting lines to fixed width (justification), they typically adjust the widths of spaces but treat no-break spaces as non-stretchable and non-shrinkable (i.e., fixed width). This also happens in rendering HTML documents when an align=justify attribute or the corresponding CSS declaration (text-align: justify) is in effect.

Although justification is rarely used and even more rarely justified (no pun intended) in HTML documents, this means that the use of no-break spaces may have side effects. Moreover, as a primary effect rather than side effect, a no-break space also prohibits line breaks before and after (and browsers honor that, with few exceptions).

--
Yucca, http://www.cs.tut.fi/~jkorpela/