Re: Soft-hyphens or breakable points in a string



Mark wrote:

My page has a table with many columns such that the right side of the table gets chopped off when printed. I specify a table width of 100%, but otherwise no cell dimensions are specified. The culprits are 2 wide columns which contain e-mail addresses.

I would primarily consider the possibilities for reducing the amount of information per row. In the absence of a URL demonstrating the actual problem, I cannot make any more specific suggestion.


Secondarily, I would consider whether it is possible to reduce the width requirements of _other_ columns than those containing E-mail addresses. The reason is that breaking an E-mail address may cause confusion and even give a wrong idea of what the address is.

I can get the page to fit entirely on the printer output if the browser would break the e-mail address string at the '@' symbol.

The Unicode line breaking rules define "@" as belonging to line breaking class AL, i.e. as comparable to alphabetic characters. Although those rules are generally highly debatable, there is wisdom behind this particular assignment. The at sign is typically used in contexts like E-mail addresses, URLs, and programming language constructs where a line break between "@" and an adjacent letter would not be appropriate. An E-mail address is basically an unbreakable string that must not contain whitespace (except in a comment).


Thus, I would avoid breaking an E-mail address at almost any cost.

What I've done for now is replaced the '@' in all e-mail addresses with '[space]@[space]' which now wraps nicely and my table fits.

That's even worse, since it introduces whitespace on both sides of "@". A naive user might even think that the space is part of the address. (After all, few people in the world know the _exact_ syntax of E-mail addresses, i.e. the variations, complications, and special cases that are allowed.)


Is there an HTML trick I can use that tells the browser that it is permissible, but only if needed, to break the string at the '@' or dot (.), much like the soft-hyphen does in Word?

There is the <wbr> trick, e.g.
jkorpela@<wbr>cs.<wbr>tut.<wbr>fi
It's genuinely a trick: it works in most browsing situations but does not conform to any standard. There's also the standard-conforming way of using a zero width space (U+200B), which works very rarely and causes quite some trouble when it doesn't. See
http://www.cs.tut.fi/~jkorpela/html/nobr.html#suggest
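
If the table is generated by a script, the markup can be added mechanically. The following is only a minimal sketch in Python, assuming the addresses are available as plain text; the function name add_wbr is made up for illustration. It escapes the text for HTML and then puts <wbr> after "@" and after each period, as in the example above.

```python
import html
import re


def add_wbr(address: str) -> str:
    """Return HTML for `address` with <wbr> after "@" and after each "."."""
    # Escape first, so that characters like "&" or "<" in the input are safe.
    escaped = html.escape(address)
    # Add <wbr> after "@" or "." unless it is the last character of the string.
    return re.sub(r"([@.])(?=.)", r"\1<wbr>", escaped)


print(add_wbr("jkorpela@cs.tut.fi"))  # jkorpela@<wbr>cs.<wbr>tut.<wbr>fi
```

Since <wbr> is nonstandard, the output inherits all the caveats mentioned above.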


According to the reputable "Chicago Manual of Style" (clause 7.44), if a URL or E-mail address needs to be broken, the break should appear "between elements, after a colon, a slash, a double slash, or the symbol @ but before a period or any other punctuation or symbols". I think there is wisdom in breaking before a period rather than after it: a period at the end of a line will easily be seen as terminating the address, whereas a period at the start of a line suggests that it is a continuation of the preceding line.
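
The rule quoted above could be applied by a script roughly as follows. This is a sketch of my reading of the rule, not an official implementation: the name chicago_breaks is hypothetical, only periods are handled among "other punctuation", a colon directly followed by a slash is kept together with it so that the break comes after the double slash, and the nonstandard <wbr> is used as the break marker.

```python
import html
import re

WBR = "<wbr>"

# Either a period that is not the first character (break goes BEFORE it),
# or "@", a colon not followed by "/", or a run of slashes, each followed
# by at least one more character (break goes AFTER it).
_BREAK_POINT = re.compile(r"(?<=.)\.|(?:@|:(?!/)|/+)(?=.)")


def chicago_breaks(text: str) -> str:
    """Return HTML for `text` with <wbr> at the break points described above."""

    def mark(match: re.Match) -> str:
        token = match.group(0)
        if token == ".":
            return WBR + token  # break before a period
        return token + WBR      # break after "@", ":", "/" or "//"

    # Escape first, so that special characters in the input are safe in HTML.
    return _BREAK_POINT.sub(mark, html.escape(text))


print(chicago_breaks("jkorpela@cs.tut.fi"))  # jkorpela@<wbr>cs<wbr>.tut<wbr>.fi
print(chicago_breaks("http://a.b/c"))        # http://<wbr>a<wbr>.b/<wbr>c
```

With this placement a line never ends in a period, so the address is less likely to be read as ending there.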

P.S. The soft hyphen does _not_ work the way you think in MS Word. If you enter a soft hyphen character, MS Word treats it as yet another graphic character and displays it on all occasions. You can use an MS Word command to add "soft hyphen", but what really happens is that a normal hyphen-minus "-" is inserted, together with invisible extra information that forbids a line break after it.