Re: To unicode or not?



aj (ronald@xxxxxxxxxxxxx) writes:
I am migrating a database to SQL Server that is not unicode...

Assuming that I have no plan to store different languages (other
than English), does it make sense to change my char() -> nchar()
and my varchar() -> nvarchar()?

I think it does. Your plans tomorrow may be different, and switching
to Unicode later, can be costly.

Then again, since SQL Server stores Unicode data in the UCS-2 encoding,
this means that every character takes up two bytes.

My collation is sql_latin1_general_cp1_ci_as. Is there any relationship
between the collation and unicode support? I see that I can alter
existing columns, changing from char -> nchar. So I could go unicode
later, yes? Is this a good idea?

Windows collations are entirely Unicode-based. This means that when you
use them with varchar, you simply work with a subset of the characers,
but the same routine are used, and the full rules of Unicode are applied.

SQL collations on the other hand, work with two completely different
set of rules for nchar/nvarchar on the one hand, and char/varchar on
the other. For nchar/nvarchar, an SQL collation is just a Windows collation
with another name. For char/varchar, an SQL collation works with rules
for the 255 characters in the code page, and no others. Since these rules
encompasses fewer characters, they perform better than Unicode. For
most operations this difference is moderate, but for operations as
"LIKE '%xxx%'" it can be a factor of seven.

So, this far an SQL collation may be tempting, but...

If I go unicode, am I asking for trouble from other tools/protocols that
access the database, like JDBC/ODBC? If something knows about/supports
varchar(), will it also automatically know about/support nvarchar()?

Modern API support Unicode across the board. Only ancient APIs like
DB-Library does not. In fact, it's gone that far that many API use
Unicode as default; I believe this is the case with JDBC. This can
lead to performance issues if your columns are varchar. Assume this
query:

SELECT ... FROM tbl WHERE indexedvarchar = @unicodevalue

SQL Server maintains a data-type precedence, which says that when two
type meets the type with lower precedence gets converted to the other
type. varchar has lower priority than nvarchar. This has impliciations
on how the indexed is used. If the collation is a Windows collation,
the index is still seeked, but less efficiently, and you can expect
a doubling or tripling in execution time.

But if the collation is an SQL collation, the index is completely useless,
since the rules are different. Depending on the size of the table, the
execution time can grow with a factor 100 or 1000.

For that reason, I recommend that you do with Latin1_General_CI_AS.
--
Erland Sommarskog, SQL Server MVP, esquel@xxxxxxxxxxxxx

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

.



Relevant Pages

  • Re: Is it collation or something else
    ... Unicode that the Locale is English, ... terminator is being converted to a black diamond with a question mark inside ... I am hoping I can manipulate SQL to do the work so that I can go back to ... All the collation issues were resolved by running an alter database to set ...
    (microsoft.public.sqlserver.server)
  • RE: Collation
    ... SQL 7.0 lets you indepently specify different values for the three ... server with totally inconsistent collation settings (for example, ... insensitive Unicode sorting). ... figure out what SQL 2000 setup settings you need to select to get a server ...
    (microsoft.public.sqlserver.server)
  • RE: Uni Code
    ... The best way to do that would be use Unicode but depending upon your SQL ... installation and uses, you might like to consider Collation. ... How to transfer a database from one collation to another collation in SQL ...
    (microsoft.public.sqlserver.dts)
  • Re: cannot convert between unicode and non-unicode data types
    ... change the error message I am trying to go into a nvarchar on Sql and never ... product can't actually produce a meaningful error message. ... Tried dumping to a raw file but the output is still unicode when I try to ... new SSIS. ...
    (microsoft.public.sqlserver.dts)
  • RE: Install SQL Server 2000 with different COLLATION
    ... When you upgrade an existing SQL 7.0 instance, ... always inherits the 7.0 instance's collation. ... If you want to install SQL ... Microsoft SQL Server Support ...
    (microsoft.public.sqlserver.server)