Re: Unexpected conversion from AL32UTF8 to WE8ISO8859P1



On 13 Jan 2009 16:38:54 GMT, Laurenz Albe <invite@xxxxxxxxxxxxxxx>
wrote:

Jaap W. van Dijk <j.w.vandijk@xxxxxxxxx> wrote:
This is my testcase (comments between square brackets):

[Logging on to the source database:]

SQL*Plus: Release 10.1.0.2.0 - Production on Vr Jan 9 08:40:14 2009

Copyright (c) 1982, 2004, Oracle. All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> select *
2 from v$nls_parameters
3 where parameter = 'NLS_CHARACTERSET';

PARAMETER
----------------------------------------------------------------
VALUE
----------------------------------------------------------------
NLS_CHARACTERSET
AL32UTF8

[...]

[Now I create and fill the table at the source database:]

SQL> create table test_jvd (teken varchar2(1 character),tekst varchar2
(100));

Table created.

SQL> insert into test_jvd (teken,tekst) values (chr(49765),'Small E
acute');

1 row created.

SQL> commit;

Commit complete.

SQL> select dump(teken)
2 from test_jvd;

DUMP(TEKEN)
------------------------------------------------------------------------
Typ=1 Len=2: 194,101

Ok, now I know what your problem is!

CHR(49765) is *NOT* a small e acute.

It is in fact an invalid UTF-8 sequence, and if Oracle were not so
incredibly sloppy about encoding, it would report an error.

As it is, it stores garbage in the table, and this garbage will turn to
question marks as soon as a conversion takes place.

A small e acute in UTF-8 is actually 50089.

Try again with CHR(50089) and voila! it will work.

Yours,
Laurenz Albe

Hi Laurenz,

Thanks for diving into this! In an other posting in this thread I came
to the same conclusion. The list with erroneous codes that was handed
to me put me on the wrong track. I had some trouble determining that
the codes were wrong because I could not find a list of correct utf-8
codes for 1 and 2 bytes on the internet. Unicode code point are easily
found, but it wasn't until I found a document with the conversion
algorithm from unicode codepoint to UTF-8 byte representations that I
discovered the the codes handed to me were wrong.

How did you come by your information on valid UTF-8 codes?

Regards,
Jaap.
.



Relevant Pages

  • Re: Check Constraint and function call
    ... he requires values to be atomic that is all he says, scalars - that is a SQL ... I assume that out-of-range codes are not ... The dummy exchange used in American movies was “KLondike-5”. ... The Postcode system does not allow for easy workload re-distribution ...
    (microsoft.public.sqlserver.programming)
  • Re: Query to look up a number of zip codes or area codes for bulk
    ... I am not very familiar with code and SQL. ... >> trying to figure out how to make a query where a user can input several zip ... >> codes at the same time to pull up a list for a bulk mailing. ... > TextBox - just copy the comma-delimited string from the TextBox and.... ...
    (microsoft.public.access.queries)
  • Re: Schema design
    ... so we hate assembly language style flags and prefer ... are writing a 1950's file system in SQL. ... Does the table have description of the codes? ...
    (microsoft.public.sqlserver.programming)
  • Re: Radiologik
    ... Zeichenwirrwar. ... Praktisch brauche ich solche Codes eher selten, so dass ich sie mir dann in der Regel kurz über die Zeichentabelle ... Da News-Nachrichten sich nach den MIME-RFCs richten müssen, sollten eigentlich alle modernen News-Reader auch mit UTF-7 und UTF-8 kodierten Nachrichten zurechtkommen, denn die RFCs erlauben das. ... wenn es nur am nicht Unicode-fähigen Font liegen sollte, da gibt es zum einen mit Code2000 einen brauchbaren 'freien'* Font, oder man installiert sich ein aktuelles Java JRE, dann bekommt man IIRC den Font "Lucida Sans Unicode" installiert, den ich z.B. auch in meinem Newsreader verwende. ...
    (de.sci.mathematik)
  • Re: English versus German
    ... I use UTF-8 since 2003. ... cars do not use ISO country codes. ...
    (sci.lang)