Among the new features in SQL Server 2014, the most compelling, at least from my point of view, is the in-memory tables and procedures known as project Hekaton. Because the tables are organized completely differently than traditional SQL Server tables there are, naturally, some limitations around Hekaton tables and procedures. In looking over the list in Kalen Delaney’s CTP1 whitepaper, I’m sure most relational designers would be struck by the “no check constraints, no foreign keys, no triggers” limits. I was intrigued by the repercussions with respect to collation. Hekaton tables must use a BIN2 collation on all character columns that participate in indexes. In addition, the natively compiled procedures only support BIN2 collations for comparisons, sorting, and grouping. Since, BIN2 collations are unusual in my experience, I decided to look into this. Especially since you don’t need a copy of SQL Server 2014 to look into this now.
BIN2 uses a sorting/grouping/comparison algorithm based on Unicode codepoints. This means that sorting/grouping/comparison (I’ll use just the word “sorting” as a shortcut for a while) is, by definition, case-sensitive and accent-sensitive. Any upper-case character sorts before all lower-case characters. So if I create a BIN2-version of the authors table in the pubs database, the last name “del Castillo” sorts at the end of the list using “ORDER BY au_lname”. This would be surprising to most end-users and, as Michael Kaplan wrote in his “International Features in SQL Server 2000” whitepaper: “in general, one of the most effective ways to alienate end users of an application is to get a task such as basic sorting wrong.”
Maybe I’m worrying too much about this, as many/most folks in places other than US may be used to using the COLLATE clause on expressions *everywhere*, includng string literals. As Qingsong Yao writes, “A linguistic collation usually can sort one language correctly, but not all languages. For example, the Latin1_General_CI_AS collation does not follow the rule of French language.” But for those that aren’t familiar with COLLATE clause, unless you like binary sort, get used to the COLLATE clause. The equivalent for “ORDER BY au_lname” with COLLATE would be “ORDER BY au_lname COLLATE Latin1_General_CI_AS_KS_WS ASC”, replacing “Latin1_General…” by whatever collation you’re already using to sort. And we couldn’t use this with a natively compiled procedure.
Finally, I thought it would be nice to come up with recommendations for collation on an instance and database level. Qingsong Yao has some in his blog articles, such as:
1. Don’t alter a database collation (of an existing database)
2. Changing the collation of a column is not so trivial
3. Got Collation conflict, How to avoid this?
So after reading his suggestions, how about…
1. If you’re changing some tables in a DB to Hekaton, use a collation on columns in those tables.
2. If you plan to be going mostly or all Hekaton in a DB, create a new copy of the DB with a BIN2 collation at a database level
3. Do remember that tempdb uses the instance collation, so you can specify the COLLATE on temp tables or use the COLLATE DATABASE_DEFAULT syntax Kimberly mentions here.
4. If you’re going to have an all-Hekaton instance (not sure I’d do this because of limits on linked servers, etc. with compiled procs)
you could try a BIN2 collation at an instance level.
5. Don’t specify some columns with BIN2 and some non-BIN2 in the same table for Hekaton, because it may drive your programmers crazy, unless they’re already used to it.
Feel free to argue any of these points with me; if I think you have a better argument, I’ll change it.
But, to summarize this, before going to Hekaton, MAKE SURE and try things out (especially reports) with a BIN2 collation, and add the COLLATE clause when necessary to change things back to what users are expecting. Users DO like things fast, but the also like things “correct”.
Hope this makes your implementation smoother. And thanks to SQL Server Books Online, Michael Kaplan, Qingsong Yao, and Kalen Delaney’s “Inside SQL Server 2008 – Internals” for helping to provide some clarity on BIN2 collations and collations in general.
Cheers, Bob
4 thoughts on “Thinking about Hekaton? Then think about collations.”
Just wanted to point out that most of the SAP on SQL*Server installations use BIN2 collation. I believe this is because of the Unicode/Java components that are installed as part of the SAP installation.
Thanks for your comment.
Comments are closed.