Re: Unicode Character Allocation



my_grillz_gleam wrote:
Hello all,

I have a quick question regarding how Oracle allocates storage space
for its data types. In particular, I have been tasked with developing
processes to move data between Oracle and DB2 databases, both of which
are set to use UTF-8. I have no problems moving data from the DB2 tables
to the Oracle tables; however, moving from Oracle to DB2 has been causing
records to be rejected. Note that both tables have exactly the same DDL,
and the Oracle database is using BYTE semantics (DB2 only has BYTE
semantics). My question is:

Does Oracle, in UTF-8 mode, actually allocate 4 bytes for every byte
specified in the DDL for a character field?

I.e., does VARCHAR2(100 BYTE) allocate 400 bytes or 100 bytes of disk
space? From my testing, it seems to me that the former is the case.
Unfortunately, my Oracle DBA was not able to confirm this.


You can look it up in the Oracle online documentation:
http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14225/ch6unicode.htm#g1014017
<quote>
UTF-8 is the 8-bit encoding of Unicode. It is a variable-width encoding and a strict superset of ASCII. This means that each and every character in the ASCII character set is available in UTF-8 with the same code point values. One Unicode character can be 1 byte, 2 bytes, 3 bytes, or 4 bytes in UTF-8 encoding. Characters from the European scripts are represented in either 1 or 2 bytes. Characters from most Asian scripts are represented in 3 bytes. Supplementary characters are represented in 4 bytes.
</quote>
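To see the variable widths described in the quote, here is a small Python sketch (Python is used only for illustration; the helper name `utf8_width` is hypothetical) that encodes one character from each category and counts the bytes. Python's codec produces standard UTF-8; a database using a different Unicode encoding may report different byte counts.

```python
# Hypothetical helper for illustration: bytes needed for one character in standard UTF-8.
def utf8_width(ch: str) -> int:
    return len(ch.encode("utf-8"))


# One sample per category mentioned in the Oracle documentation quote.
samples = [
    ("A", "ASCII"),
    ("\u00e9", "European script (e with acute)"),
    ("\u4e2d", "Asian script (CJK ideograph)"),
    ("\U0001F600", "supplementary character (emoji)"),
]

for ch, description in samples:
    print(f"U+{ord(ch):04X} {description}: {utf8_width(ch)} byte(s)")
```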

In other words, the number of bytes depends on which characters are stored: in UTF-8, 100 characters may take anywhere from 100 up to 400 bytes.
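A rough way to check the 100-to-400-byte range, and to spot values that would not fit a byte-limited column such as VARCHAR2(100 BYTE), is to measure the encoded length before loading. The sketch below assumes standard UTF-8 and uses two hypothetical helpers, `utf8_byte_length` and `fits_byte_limit`; neither is an Oracle or DB2 API.

```python
# Hypothetical helper: bytes `text` occupies when encoded as standard UTF-8.
def utf8_byte_length(text: str) -> int:
    return len(text.encode("utf-8"))


# Hypothetical helper: True if `text` fits a column whose length limit is
# counted in bytes (BYTE semantics), e.g. VARCHAR2(100 BYTE).
def fits_byte_limit(text: str, max_bytes: int) -> bool:
    return utf8_byte_length(text) <= max_bytes


# Four strings of 100 characters each, from 1-byte up to 4-byte characters.
strings = [
    ("ASCII", "a" * 100),
    ("European", "\u00e9" * 100),
    ("Asian", "\u4e2d" * 100),
    ("supplementary", "\U0001F600" * 100),
]

for label, s in strings:
    print(f"{label}: {len(s)} chars, {utf8_byte_length(s)} bytes, "
          f"fits 100 bytes: {fits_byte_limit(s, 100)}")
```

Note that `len()` in Python counts code points, so all four strings report 100 characters while their UTF-8 byte counts are 100, 200, 300 and 400.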

Best regards

Maxim