Re: How to identify unicode characters in record



Ana C. Dent wrote:
If I am having a good day, I can barely spell unicode.
We are in the process of upgrading our application to support unicode
characters.
CREATE TABLE LOOKUP
(ID NUMBER,
DESCRIPTION VARCHAR2(320));
This table exists in a 10gR2 database that supports the UTF-8 character set.

How do I query the database to return all the IDs where DESCRIPTION contains
1 or more unicode (non-ASCII) characters?

I am more than willing to RTFM, if you point me at which FM has the answer.

Free clues would be much appreciated.


Ana - I think both the tips from Michael and Charles will work.
(Byte value >=128 or byte count vs. char count)
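
To show why the two tips agree, here is a minimal Python sketch (not Oracle
SQL). It assumes the descriptions have already been fetched as strings; the
function names and sample rows are made up for illustration. In UTF-8 every
non-ASCII character encodes to two or more bytes, each with a value >= 128,
so both checks flag the same rows. In Oracle the second tip would presumably
compare LENGTHB(description) with LENGTH(description), but that is my guess
at what was suggested, not a quote.

```python
def has_non_ascii_by_bytes(text: str) -> bool:
    """Tip 1: any byte value >= 128 in the UTF-8 encoding means non-ASCII."""
    return any(b >= 128 for b in text.encode("utf-8"))


def has_non_ascii_by_length(text: str) -> bool:
    """Tip 2: byte count differs from char count if a multi-byte char is present."""
    return len(text.encode("utf-8")) != len(text)


# Hypothetical rows standing in for (ID, DESCRIPTION) pairs from LOOKUP.
rows = [(1, "plain ascii"), (2, "caf\u00e9"), (3, "na\u00efve text"), (4, "")]

ids_by_bytes = [row_id for row_id, desc in rows if has_non_ascii_by_bytes(desc)]
ids_by_length = [row_id for row_id, desc in rows if has_non_ascii_by_length(desc)]
```

With the sample rows above, both lists contain only the IDs of the two
accented descriptions.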

I want to make you aware of an issue with UTF-8 columns that we recently
stumbled over.
It is entirely possible to insert invalid UTF-8 strings into a UTF-8
VARCHAR2 column if the client has set the wrong character set. If the
client tells the server that its character set matches the database's, the
server does no conversion on the bytes the client sends as a string, and
whatever it sends gets inserted into the column as-is.
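
A rough way to see the problem outside the database is the Python sketch
below. It assumes a client that really works in Latin-1 but claims UTF-8, so
its bytes arrive unconverted; the helper name is_valid_utf8 is made up for
illustration and is not an Oracle function.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same text as sent by a Latin-1 client and by a genuine UTF-8 client.
latin1_bytes = "caf\u00e9".encode("latin-1")  # single byte 0xE9 for the accented e
utf8_bytes = "caf\u00e9".encode("utf-8")      # two bytes 0xC3 0xA9 for the accented e

latin1_ok = is_valid_utf8(latin1_bytes)  # 0xE9 starts a multi-byte sequence that never completes
utf8_ok = is_valid_utf8(utf8_bytes)
```

Stored unconverted, the Latin-1 bytes are not a valid UTF-8 string, which is
the kind of row that ends up in the column.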

best,
Martin
