Re: Soft-hyphens or breakable points in a string
- From: "Norman L. DeForest" <af380@xxxxxxxxxxxxxx>
- Date: Sun, 25 Sep 2005 23:28:33 -0300
On Mon, 12 Sep 2005, Jukka K. Korpela wrote:
> Mark wrote:
>
> > My page has a table with many columns such that the right-side of the
> > table gets chopped off when printed. I specify a table width of 100%,
> > but otherwise no cell dimensions are specified. The culprits are 2 wide
> > columns which contain e-mail addresses.
>
> I would primarily consider the possibilities for reducing the amount of
> information per row. In the absence of a URL demonstrating the actual
> problem, I cannot make any more specific suggestion.
>
> Secondarily, I would consider whether it is possible to reduce the width
> requirements of _other_ columns than those containing E-mail addresses.
> The reason is that breaking an E-mail address may cause confusion and
> even give a wrong idea of what the address is.
>
> > I can get the page to fit entirely on the printer output if the browser
> > would break the e-mail address string at the '@' symbol.
>
> The Unicode line breaking rules define "@" as belonging to line breaking
> class AL, i.e. as comparable to alphabetic characters. Although those
> rules are generally highly debatable, there is wisdom behind this
> particular assignment. The at sign is typically used in contexts like
> E-mail addresses, URLs, and programming language constructs where a line
> break between "@" and an adjacent letter would not be appropriate. An
> E-mail address is basically an unbreakable string that must not contain
> whitespace (except in a comment).
Check out RFC 822.
[blockquote]
3.1.4. STRUCTURED FIELD BODIES
To aid in the creation and reading of structured fields, the
free insertion of linear-white-space (which permits folding
by inclusion of CRLFs) is allowed between lexical tokens.
Rather than obscuring the syntax specifications for these
structured fields with explicit syntax for this linear-white-
space, the existence of another "lexical" analyzer is assumed.
This analyzer does not apply for unstructured field bodies
that are simply strings of text, as described above. The
analyzer provides an interpretation of the unfolded text
composing the body of the field as a sequence of lexical sym-
bols.
These symbols are:
- individual special characters
- quoted-strings
- domain-literals
- comments
- atoms
The first four of these symbols are self-delimiting. Atoms
are not; they are delimited by the self-delimiting symbols and
by linear-white-space. For the purposes of regenerating
sequences of atoms and quoted-strings, exactly one SPACE is
assumed to exist, and should be used, between them. (Also, in
the "Clarifications" section on "White Space", below, note the
rules about treatment of multiple contiguous LWSP-chars.)
So, for example, the folded body of an address field
    ":sysmail"@ Some-Group. Some-Org,
    Muhammed.(I am the greatest) Ali @(the)Vegas.WBA
is analyzed into the following lexical symbols and types:
    :sysmail              quoted string
    @                     special
    Some-Group            atom
    .                     special
    Some-Org              atom
    ,                     special
    Muhammed              atom
    .                     special
    (I am the greatest)   comment
    Ali                   atom
    @                     atom
    (the)                 comment
    Vegas                 atom
    .                     special
    WBA                   atom
The canonical representations for the data in these addresses
are the following strings:
":sysmail"@Some-Group.Some-Org
and
Muhammed.Ali@Vegas.WBA
[/blockquote]
Muhammed.(I am the greatest) Ali @(the)Vegas.WBA
                            ^   ^
                            |   |
That example appears to have two spaces in it that are not within
parentheses.
I have received more than one request for anti-virus help sent to the
"mailto:" address on my CIH virus page at
http://www.chebucto.ns.ca/~af380/CIH.html
HREF="mailto:%20af380@( Norman )chebucto( De Forest ).ns( CIH.html ).ca"
With spaces *outside* the parentheses the address still works fine with
Lynx on my ISP's system, but some email software on some systems
(**cough**cough**Microsoft**cough**) fails to strip out the spaces outside
the comments when doing a DNS lookup on the hostname and/or when passing
the address in the SMTP RCPT TO: command (violating the "when passing such
structured information to other systems, such as mail protocol services"
clause quoted below) and thus fails to send the message.
The passage quoted above is immediately followed by:
[blockquote]
Note: For purposes of display, and when passing such struc-
tured information to other systems, such as mail proto-
col services, there must be NO linear-white-space
between <word>s that are separated by period (".") or
at-sign ("@") and exactly one SPACE between all other
<word>s. Also, headers should be in a folded form.
[/blockquote]
The "For purposes of display" wording would appear to rule out the original
poster's use of spaces, but the RFC fails to say what should happen when
an address is longer than the character width of a display (only that
any line break must be followed by a whitespace character, space or tab).
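
For illustration, here is a rough Python sketch of the lexical analysis and
canonicalization that the quoted RFC 822 passages describe. The function
names tokenize() and canonical() are made up for this example and do not
come from the RFC or from any mail library, and the sketch assumes a plain
addr-spec (no route addresses, groups, or header folding rules beyond
treating CR/LF as white space).

```python
# RFC 822 "specials"; everything else that is not white space forms atoms.
SPECIALS = set('()<>@,;:\\".[]')
WHITESPACE = " \t\r\n"


def tokenize(text):
    """Split a structured field body into (symbol, type) pairs."""
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in WHITESPACE:
            i += 1                      # linear-white-space only delimits
        elif ch == '"':
            j = i + 1
            while j < n and text[j] != '"':
                j += 2 if text[j] == "\\" else 1   # skip quoted pairs
            tokens.append((text[i:j + 1], "quoted-string"))
            i = j + 1
        elif ch == "(":
            depth, j = 1, i + 1
            while j < n and depth:      # comments may nest
                if text[j] == "\\":
                    j += 1
                elif text[j] == "(":
                    depth += 1
                elif text[j] == ")":
                    depth -= 1
                j += 1
            tokens.append((text[i:j], "comment"))
            i = j
        elif ch == "[":
            j = i + 1
            while j < n and text[j] != "]":
                j += 2 if text[j] == "\\" else 1
            tokens.append((text[i:j + 1], "domain-literal"))
            i = j + 1
        elif ch in SPECIALS:
            tokens.append((ch, "special"))
            i += 1
        else:
            j = i
            while j < n and text[j] not in SPECIALS and text[j] not in WHITESPACE:
                j += 1
            tokens.append((text[i:j], "atom"))
            i = j
    return tokens


def canonical(text):
    """Drop comments; keep one SPACE only between adjacent words."""
    out = []
    prev_word = False
    for symbol, kind in tokenize(text):
        if kind == "comment":
            continue
        is_word = kind in ("atom", "quoted-string")
        if is_word and prev_word:
            out.append(" ")
        out.append(symbol)
        prev_word = is_word
    return "".join(out)


print(canonical("Muhammed.(I am the greatest) Ali @(the)Vegas.WBA"))
print(canonical('":sysmail"@ Some-Group. Some-Org'))
```

Software that ran something along these lines before the DNS lookup and the
SMTP dialogue would never see the spaces or the comments at all.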
>
> Thus, I would avoid breaking an E-mail address at almost any cost.
>
> > What I've done
> > for now is replaced the '@' in all e-mail addresses with
> > '[space]@[space]' which now wraps nicely and my table fits.
>
> That's even worse, since it introduces whitespace on both sides of "@". A
> naive user might even think that the space is part of the address.
> (After all, few people in the world know the _exact_ syntax of E-mail
> addresses, i.e. the variations and complications and special cases that
> are allowed.)
>
> > Is there an HTML trick I can use that tells the browser that it is
> > permissible, but only if needed, to break the string at the '@' or dot
> > (.), much like the soft-hyphen does in Word?
>
> There is the <wbr> trick, e.g.
> jkorpela@<wbr>cs.<wbr>tut.<wbr>fi
> It's genuinely a trick: it works in most browsing situations but does
> not conform to any standard. There's also the standard-conforming way of
> using a zero width no-break space, which works very rarely and causes
> quite some trouble when it doesn't. See
> http://www.cs.tut.fi/~jkorpela/html/nobr.html#suggest
Don't you mean the "zero width non-joiner" there (U+200C, &#8204;) (as
opposed to the zero width joiner, U+200D, &#8205;)?
I think that the use of the zero width non-joiner should be the preferred
way of doing things and that "works very rarely and causes quite some
trouble when it doesn't" should be replaced by "however it may be
necessary to get[1] software authors to fix their buggy treatment of
this character which causes quite some trouble when it doesn't work".
If Lynx can handle it properly (as well as the soft hyphen), why can't IE
and Firefox do the same? (I haven't tried it with Opera yet.)
>
> According to the reputable "Chicago Manual of Style" (clause 7.44), if a
> URL or E-mail address needs to be broken, the break should appear
> "between elements, after a colon, a slash, a double slash, or the symbol
> @ but before a period or any other punctuation or symbols". I think
> there's a wisdom in not breaking after but before a period: a period at
> the end of line will easily be seen as terminating the address, whereas
> a period at the start of a line suggests that it is a continuation of
> the preceding line.
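
The Chicago rule quoted above is mechanical enough to apply in code. Below
is a minimal Python sketch, assuming "<wbr>" as the break marker (which, as
noted, conforms to no standard) and handling only the colon, slash, double
slash, at sign, and period; the function name add_break_opportunities() is
hypothetical, and the "any other punctuation or symbols" part of the rule
is left out.

```python
import html
import re

# Break AFTER ":", "/", "//" and "@"; break BEFORE ".".
# "//" is listed first so a double slash is treated as one unit.
_BREAK_POINTS = re.compile(r"//|[:/@.]")


def add_break_opportunities(text, marker="<wbr>"):
    """Return HTML for text with optional line-break points inserted."""
    def place(match):
        symbol = match.group(0)
        # A period starts the next line; everything else ends the current one.
        return marker + symbol if symbol == "." else symbol + marker

    # Escape first so that "&", "<" and ">" in the input stay literal text.
    return _BREAK_POINTS.sub(place, html.escape(text))


print(add_break_opportunities("jkorpela@cs.tut.fi"))
# jkorpela@<wbr>cs<wbr>.tut<wbr>.fi
```

Passing a different marker (a character reference, say) changes only what
is inserted, not where.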
>
> P.S. The soft hyphen does _not_ work the way you think in MS Word. If
> you enter a soft hyphen character, MS Word treats it as yet another
> graphic character and displays it on all occasions. You can use an MS
> Word command to add "soft hyphen", but what really happens is that a
> normal hyphen-minus "-" is inserted, together with invisible extra
> information that forbids a line break after it.
[1] The following change is optional depending on the reader's
preferences (may wrap on your display but enter as one long line):
s/get software authors to/beat software authors about the head and shoulders until they/
--
``Why don't you find a more appropiate newsgroup to post this tripe into?
This is a meeting place for a totally differnt kind of "vision impairment".
Catch my drift?'' -- "jim" in alt.disability.blind.social regarding an
off-topic religious/political post, March 28, 2005