Re: String trim (was JavaScript Functions)



In comp.lang.javascript message <k4ydnY53BoUhyAbUnZ2dnUVZ_umWnZ2d@giganews.com>,
Tue, 17 Feb 2009 19:26:36, kangax <kangax@xxxxxxxxx> posted:
Dr J R Stockton wrote:
In comp.lang.javascript message <XOGdnfc7I6_uoAfUnZ2dnUVZ_g4LAAAA@giganews.com>,
Mon, 16 Feb 2009 23:30:43, kangax <kangax@xxxxxxxxx> posted:
Dr J R Stockton wrote:

On a 3GHz PC, XP sp3, FF3, the following takes perceptible but
insignificant time to list all non-matches to \S : it could perhaps be
done better.
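The code referred to here is not quoted in this post. The sketch below illustrates the idea and is not Dr Stockton's code. It uses a plain loop to list every code unit from 0x0000 to 0xFFFF that /\S/ fails to match. The names `listWhitespaceByLoop` and `formatCodeUnit` are hypothetical. Timings on a current engine will not resemble those of Firefox 3 on XP.

```javascript
// Sketch: list every UTF-16 code unit 0x0000..0xFFFF that /\S/ does NOT
// match, i.e. the code units this engine treats as \s.
// Characters above U+FFFF (surrogate pairs) are not examined.
const NON_SPACE = /\S/; // no "g" flag, so test() keeps no lastIndex state

function listWhitespaceByLoop() {
  const codes = [];
  for (let code = 0; code <= 0xFFFF; code++) {
    if (!NON_SPACE.test(String.fromCharCode(code))) {
      codes.push(code);
    }
  }
  return codes;
}

// Format a code unit as 0xHHHH (upper-case, zero-padded to four digits).
function formatCodeUnit(code) {
  return "0x" + code.toString(16).toUpperCase().padStart(4, "0");
}

const startMs = Date.now();
const spaceCodes = listWhitespaceByLoop();
const elapsedMs = Date.now() - startMs;

console.log(spaceCodes.map(formatCodeUnit).join(" "));
console.log(spaceCodes.length + " code units, " + elapsedMs + " ms");
```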

Why not just use Richard's test, posted earlier in this thread? It
tests the client's \s against all of the whitespace characters
(including Unicode "space separators"). Doesn't it clearly demonstrate
the above-mentioned oddities?
Richard's test considers only the characters that he thinks should be
treated by \s as spaces, etc. Mine, much quicker to write, found all
characters that don't match \S in the current browser (it now uses
S.match(/\s/g)). The tests are logically distinct.

That list seems very logical to me. /\s/ (CharacterClassEscape :: s) is
clearly defined in ES3's 15.10.2.12. WhiteSpace (7.2) and LineTerminator
(7.3), which /\s/ references, clearly list all of the character code
points. WhiteSpace also mentions Unicode space separators. Those space
separators are also clearly defined in Unicode [1] under the White_Space
section.

AFAICS, Richard's test says nothing about whether \s or \S matches
\u3000. Therefore, Richard's test cannot tell whether a browser is
fully compliant. Mine can, except for characters whose codes lie
outside 0x0000 to 0xFFFF.
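Comparing the engine against the spec makes the distinction between the two tests concrete. The sketch below builds one string S holding every code unit from 0x0000 to 0xFFFF and collects S.match(/\s/g). It then compares the result with the set that ES3 15.10.2.12 asks for, namely WhiteSpace (7.2) plus LineTerminator (7.3). The Zs list is an assumption: it is hard-coded as it stood in Unicode 5.1. All names are hypothetical.

```javascript
// Sketch: compare this engine's /\s/ with ES3 15.10.2.12, i.e.
// WhiteSpace (7.2) plus LineTerminator (7.3).
// ASSUMPTION: Unicode "Zs" space separators hard-coded as of Unicode 5.1.
const ES3_WHITESPACE = [0x0009, 0x000B, 0x000C, 0x0020, 0x00A0];
const ES3_LINE_TERMINATORS = [0x000A, 0x000D, 0x2028, 0x2029];
const ZS_UNICODE_5_1 = [0x0020, 0x00A0, 0x1680, 0x180E, 0x202F, 0x205F, 0x3000];
for (let code = 0x2000; code <= 0x200A; code++) {
  ZS_UNICODE_5_1.push(code); // EN QUAD .. HAIR SPACE
}

function codeHex(code) {
  return "0x" + code.toString(16).toUpperCase().padStart(4, "0");
}

// One string holding every code unit; a single global match collects
// whatever the engine's \s accepts (no "u" flag, so it works per code unit).
const units = [];
for (let code = 0; code <= 0xFFFF; code++) {
  units.push(String.fromCharCode(code));
}
const S = units.join("");
const actualSpaces = new Set((S.match(/\s/g) || []).map(ch => ch.charCodeAt(0)));

const expectedSpaces = new Set(
  [...ES3_WHITESPACE, ...ES3_LINE_TERMINATORS, ...ZS_UNICODE_5_1]
);
const missingFromEngine = [...expectedSpaces].filter(c => !actualSpaces.has(c));
const extraInEngine = [...actualSpaces].filter(c => !expectedSpaces.has(c));

console.log("expected, not matched:", missingFromEngine.map(codeHex).join(" ") || "(none)");
console.log("matched, not expected:", extraInEngine.map(codeHex).join(" ") || "(none)");
```

A current engine is unlikely to agree exactly with this 2009 baseline. ES5 added U+FEFF to WhiteSpace, and Unicode 6.3 moved U+180E out of Zs, so both would typically appear in the diff. The baseline list therefore has to match the edition and Unicode version being tested against.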
If there is a character, such as
  cp:"6158", codePoint:"0x180E", character:"\u180E",
  name:"MONGOLIAN VOWEL SEPARATOR", group:"Zs"
that NO browser recognises, that's not much of a worry for coders
(unless handling Mongolian) since testing on any browser will give the
same result.

Doesn't it make more sense to base tests on specs, rather than on some
vague subset of browsers? We can't really assert that "NO browser
recognizes" "MONGOLIAN VOWEL SEPARATOR"; neither can we test "all
browsers", can we?

You missed the stress in "if ... NO browser".


Test fully against specs to find out whether the tested systems are
compliant. Test browsers covering most of the Windows browser market to
find out what most (Windows) users will have in their browsers. The
tests are quite distinct.

However, after using my test, one only has to read the list of Unicode
whitespace characters to see how it compares with the result of my test.



I'm not : ) As it stands now, Firefox's \s is simply not ES3-compliant,
and its deficiencies affect the native `trim` (as that `trim` relies on \s).

But whether that is important for a particular page depends on whether
any incorrectly-classed characters can appear within it, and (if they
do) whether the difference really matters.
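Where a page cannot rely on the engine's \s, one workaround is to trim with an explicit character class. The helper below is hypothetical: `trimES3` is not a built-in. It assumes the ES3 WhiteSpace and LineTerminator code points, with Zs as of Unicode 5.1. Whether that set is the right one for a given page is exactly the question raised above.

```javascript
// Hypothetical helper: trim using an explicit list of ES3 WhiteSpace +
// LineTerminator code points (Zs as of Unicode 5.1), so the result does
// not depend on how a given engine defines \s.
// U+FEFF is deliberately absent: ES3 did not list it (ES5 added it).
const ES3_SPACE_CLASS =
  "[\\u0009-\\u000D\\u0020\\u00A0\\u1680\\u180E\\u2000-\\u200A" +
  "\\u2028\\u2029\\u202F\\u205F\\u3000]";
const LEADING_ES3_SPACE = new RegExp("^" + ES3_SPACE_CLASS + "+");
const TRAILING_ES3_SPACE = new RegExp(ES3_SPACE_CLASS + "+$");

function trimES3(value) {
  return String(value).replace(LEADING_ES3_SPACE, "").replace(TRAILING_ES3_SPACE, "");
}

console.log(JSON.stringify(trimES3("\u3000 hello\u180E\n"))); // "hello"
```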

Consider reading an ISO 8601 date-and-time, found in the text of a
document. ISO 8601:2000 required a 'T' in the middle; it does not allow
't', but that should generally be tolerated. ISO 8601:2004 allows a
space instead, without (AFAIR, ICBW) actually specifying \x20 or \xA0.
In practice, the text may get paragraph-packed, so a reader should
accept a newline followed by spaces and HTabs. But perhaps not two
newlines. But maybe a form feed surrounded by newlines should count as
a newline. And one should ignore page headers and footers. But the
chances of finding a Mongolian character (which might look like a space)
are, in non-Mongolian contexts, negligible.
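The tolerance rules above can be expressed as a single regular expression. This is a hypothetical sketch, not a full ISO 8601 parser. It accepts 'T' or 't' between date and time, or runs of spaces and tabs; counting U+00A0 as a space is an assumption. It also accepts one newline (or newline, form feed, newline) with optional spaces or tabs around it, and it rejects two newlines. It does not try to skip page headers or footers, and it deliberately does not treat characters such as U+180E as separators.

```javascript
// Hypothetical tolerant reader for an ISO 8601 date-and-time in running text.
// Separator between date and time (assumed policy):
//   'T' or 't' | spaces/tabs/U+00A0 |
//   optional spaces/tabs, one newline (or \n\f\n), optional spaces/tabs.
// Two consecutive newlines are rejected; U+180E is not a separator.
const ISO_DATE_TIME =
  /(\d{4})-(\d{2})-(\d{2})(?:[Tt]|[ \t\u00A0]+|[ \t\u00A0]*(?:\r?\n|\n\f\n)[ \t\u00A0]*)(\d{2}):(\d{2})(?::(\d{2}))?/;

function findIsoDateTime(text) {
  const m = ISO_DATE_TIME.exec(text);
  if (m === null) {
    return null;
  }
  return {
    year: Number(m[1]),
    month: Number(m[2]),
    day: Number(m[3]),
    hour: Number(m[4]),
    minute: Number(m[5]),
    second: m[6] === undefined ? 0 : Number(m[6]), // seconds are optional
  };
}

console.log(findIsoDateTime("posted 2009-02-17\n   19:26:36 by kangax"));
```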


--
(c) John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
Web <URL:http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)


