Re: Unexpected conversion from AL32UTF8 to WE8ISO8859P1
- From: j.w.vandijk.removethis@xxxxxxxxx (Jaap W. van Dijk)
- Date: Wed, 14 Jan 2009 07:48:50 GMT
On 13 Jan 2009 16:38:54 GMT, Laurenz Albe <invite@xxxxxxxxxxxxxxx>
wrote:
Jaap W. van Dijk <j.w.vandijk@xxxxxxxxx> wrote:
This is my testcase (comments between square brackets):
[Logging on to the source database:]
SQL*Plus: Release 10.1.0.2.0 - Production on Vr Jan 9 08:40:14 2009
Copyright (c) 1982, 2004, Oracle. All rights reserved.
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, OLAP and Data Mining options
SQL> select *
2 from v$nls_parameters
3 where parameter = 'NLS_CHARACTERSET';
PARAMETER
----------------------------------------------------------------
VALUE
----------------------------------------------------------------
NLS_CHARACTERSET
AL32UTF8
[...]
[Now I create and fill the table at the source database:]
SQL> create table test_jvd (teken varchar2(1 character),tekst varchar2
(100));
Table created.
SQL> insert into test_jvd (teken,tekst) values (chr(49765),'Small E
acute');
1 row created.
SQL> commit;
Commit complete.
SQL> select dump(teken)
2 from test_jvd;
DUMP(TEKEN)
------------------------------------------------------------------------
Typ=1 Len=2: 194,101
Ok, now I know what your problem is!
CHR(49765) is *NOT* a small e acute.
It is in fact an invalid UTF-8 sequence, and if Oracle were not so
incredibly sloppy about encoding, it would report an error.
As it is, it stores garbage in the table, and this garbage will turn to
question marks as soon as a conversion takes place.
A small e acute in UTF-8 is actually 50089.
Try again with CHR(50089) and voila! it will work.
Yours,
Laurenz Albe
Hi Laurenz,
Thanks for diving into this! In an other posting in this thread I came
to the same conclusion. The list with erroneous codes that was handed
to me put me on the wrong track. I had some trouble determining that
the codes were wrong because I could not find a list of correct utf-8
codes for 1 and 2 bytes on the internet. Unicode code point are easily
found, but it wasn't until I found a document with the conversion
algorithm from unicode codepoint to UTF-8 byte representations that I
discovered the the codes handed to me were wrong.
How did you come by your information on valid UTF-8 codes?
Regards,
Jaap.
.
- Follow-Ups:
- Re: Unexpected conversion from AL32UTF8 to WE8ISO8859P1
- From: Laurenz Albe
- Re: Unexpected conversion from AL32UTF8 to WE8ISO8859P1
- References:
- Unexpected conversion from AL32UTF8 to WE8ISO8859P1
- From: Jaap W. van Dijk
- Re: Unexpected conversion from AL32UTF8 to WE8ISO8859P1
- From: Laurenz Albe
- Re: Unexpected conversion from AL32UTF8 to WE8ISO8859P1
- From: Jaap W. van Dijk
- Re: Unexpected conversion from AL32UTF8 to WE8ISO8859P1
- From: Laurenz Albe
- Re: Unexpected conversion from AL32UTF8 to WE8ISO8859P1
- From: Jaap W. van Dijk
- Re: Unexpected conversion from AL32UTF8 to WE8ISO8859P1
- From: Laurenz Albe
- Unexpected conversion from AL32UTF8 to WE8ISO8859P1
- Prev by Date: Re: Help required - URGENT PLS
- Next by Date: Re: Reviews of Dan Tow's "SQL Tuning"?
- Previous by thread: Re: Unexpected conversion from AL32UTF8 to WE8ISO8859P1
- Next by thread: Re: Unexpected conversion from AL32UTF8 to WE8ISO8859P1
- Index(es):
Relevant Pages
|