Re: Checking that user has entered a word or words in text input form using regular expressions...




Lasse Reichstein Nielsen wrote:
RobG <rgqld@xxxxxxxxxxxx> writes:

Lasse Reichstein Nielsen said on 18/04/2006 4:06 PM AEST:
or if what you want is what the Unicode specification calls "Alphabetic".
See <URL:http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt>
(You can also see why it's something of a mouthful to create a regexp
for it :)

If that is the requirement, why not:

if ( !/\d/.test(inputValue) )
{
  // inputValue doesn't have any digits
}

Because there's more (much more) to Unicode than letters and digits.
In the file linked, the Grapheme_Base and Math groups contain symbols
that are neither digits nor letters. Take, e.g., codepoint 0x3251:
"circled number twenty one", or 0x4dc0 "Hexagram for the creative
heaven". :)
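
To make the objection concrete, here is a minimal sketch of the digit test applied to the two codepoints mentioned above. The function name hasNoDigits and the sample strings are only illustrative, not something from the thread. In ECMAScript, \d matches only the ASCII digits 0-9, so both symbols pass the test even though neither is a letter.

```javascript
// Hypothetical helper wrapping the test proposed above:
// returns true when the string contains no ASCII digit (0-9).
function hasNoDigits(inputValue) {
  return !/\d/.test(inputValue);
}

// Illustrative inputs (assumptions, not from the original post).
var circledTwentyOne = "\u3251"; // CIRCLED NUMBER TWENTY ONE
var hexagram = "\u4DC0";         // HEXAGRAM FOR THE CREATIVE HEAVEN

// Both symbols slip through, because neither is an ASCII digit.
var acceptsCircled = hasNoDigits(circledTwentyOne);
var acceptsHexagram = hasNoDigits(hexagram);

// A string with an ordinary digit is rejected as expected.
var acceptsMixed = hasNoDigits("abc123");
```

So a "no digits" check says nothing about whether the remaining characters are letters.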


There are Unicode letters and Unicode blocks (like InMongolian). To
better understand what I mean, please read the "Unicode support"
section at the following URL:

<URL:http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html>

(see also: http://www.unicode.org/unicode/reports/tr18/ ).

I have not checked the ECMAScript 4 proposal/standards track, but it
should 'upgrade' regular expressions to support classes for Unicode
blocks and categories.
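
For illustration only, here is a rough sketch of what such classes look like in engines that implement Unicode property escapes (\p{...} together with the "u" flag, which arrived in ECMAScript 2018). It will not run in older engines. The helper names isAlphabetic and hasMongolian are assumptions made up for this example. Note also that Java-style block names such as InMongolian have no direct equivalent here; Script=Mongolian is the nearest property, and a script is not the same thing as a block.

```javascript
// Assumes an engine with ES2018 Unicode property escapes ("u" flag).

// Hypothetical helper: true when every character has the Unicode
// "Alphabetic" property (and the string is not empty).
function isAlphabetic(inputValue) {
  return /^\p{Alphabetic}+$/u.test(inputValue);
}

// Hypothetical helper: true when the string contains at least one
// character whose Script property is Mongolian.
function hasMongolian(inputValue) {
  return /\p{Script=Mongolian}/u.test(inputValue);
}

var plainWord = isAlphabetic("h\u00E9llo");   // letters only, including e-acute
var circled = isAlphabetic("\u3251");         // CIRCLED NUMBER TWENTY ONE
var hexagram = isAlphabetic("\u4DC0");        // HEXAGRAM FOR THE CREATIVE HEAVEN
var mongolianLetter = hasMongolian("\u1820"); // MONGOLIAN LETTER A
```

With these, the two symbols from the earlier example are rejected by isAlphabetic, while an accented Latin word is accepted.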

Best regards
Luke M.
