Re: To unicode or not?
- From: Erland Sommarskog <esquel@xxxxxxxxxxxxx>
- Date: Fri, 24 Oct 2008 21:36:01 +0000 (UTC)
aj (ronald@xxxxxxxxxxxxx) writes:
I am migrating a database to SQL Server that is not unicode...
Assuming that I have no plan to store different languages (other
than English), does it make sense to change my char() -> nchar()
and my varchar() -> nvarchar()?
I think it does. Your plans tomorrow may be different, and switching
to Unicode later, can be costly.
Then again, since SQL Server stores Unicode data in the UCS-2 encoding,
this means that every character takes up two bytes.
My collation is sql_latin1_general_cp1_ci_as. Is there any relationship
between the collation and unicode support? I see that I can alter
existing columns, changing from char -> nchar. So I could go unicode
later, yes? Is this a good idea?
Windows collations are entirely Unicode-based. This means that when you
use them with varchar, you simply work with a subset of the characers,
but the same routine are used, and the full rules of Unicode are applied.
SQL collations on the other hand, work with two completely different
set of rules for nchar/nvarchar on the one hand, and char/varchar on
the other. For nchar/nvarchar, an SQL collation is just a Windows collation
with another name. For char/varchar, an SQL collation works with rules
for the 255 characters in the code page, and no others. Since these rules
encompasses fewer characters, they perform better than Unicode. For
most operations this difference is moderate, but for operations as
"LIKE '%xxx%'" it can be a factor of seven.
So, this far an SQL collation may be tempting, but...
If I go unicode, am I asking for trouble from other tools/protocols that
access the database, like JDBC/ODBC? If something knows about/supports
varchar(), will it also automatically know about/support nvarchar()?
Modern API support Unicode across the board. Only ancient APIs like
DB-Library does not. In fact, it's gone that far that many API use
Unicode as default; I believe this is the case with JDBC. This can
lead to performance issues if your columns are varchar. Assume this
query:
SELECT ... FROM tbl WHERE indexedvarchar = @unicodevalue
SQL Server maintains a data-type precedence, which says that when two
type meets the type with lower precedence gets converted to the other
type. varchar has lower priority than nvarchar. This has impliciations
on how the indexed is used. If the collation is a Windows collation,
the index is still seeked, but less efficiently, and you can expect
a doubling or tripling in execution time.
But if the collation is an SQL collation, the index is completely useless,
since the rules are different. Depending on the size of the table, the
execution time can grow with a factor 100 or 1000.
For that reason, I recommend that you do with Latin1_General_CI_AS.
--
Erland Sommarskog, SQL Server MVP, esquel@xxxxxxxxxxxxx
Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
.
- References:
- To unicode or not?
- From: aj
- To unicode or not?
- Prev by Date: Re: How to store big Arrays
- Next by Date: Re: Create UDDT - good idea?
- Previous by thread: Re: To unicode or not?
- Index(es):
Relevant Pages
|