OK, so... I don't know how many of you use different collations but if you do then you know that there are two truths:

1) They're very flexible
2) They can cause you a bit of grief (changing collations and tempdb)

Flexibility

As of SQL Server 2000, collation is no longer fixed for the whole server at installation; it can be set, and changed later, at several levels. You can set the collation when a database is created (if not set, the database will use the server's default). You can set the collation for each character column when a table is created (if not set, the column will use the database's default). You can set the collation when a query is executed (which doesn't really make sense unless it's in a WHERE clause or ORDER BY clause). And - you can set the collation in a view or stored procedure to do things like case sensitive searching - on the fly. However, none of these on-the-fly changes will perform well over large result sets (at least not without indexes) so, I'd be careful of doing any ad hoc changes to collations (even in views/sps - without appropriate indexes)!
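To make the column-level and query-level options concrete, here is a minimal sketch. The #Names table and its rows are made up purely for illustration; the collation names are standard Windows collations.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE #Names
(
   LastName varchar(30) COLLATE Latin1_General_CI_AS  -- case-insensitive column
)
go

INSERT #Names VALUES ('Smith')
INSERT #Names VALUES ('SMITH')
INSERT #Names VALUES ('smith')
go

-- Uses the column's own (case-insensitive) collation: all three rows match.
SELECT LastName FROM #Names WHERE LastName = 'smith'

-- Overrides the collation for this one comparison: only the exact-case row matches.
SELECT LastName FROM #Names WHERE LastName = 'smith' COLLATE Latin1_General_CS_AS
go

DROP TABLE #Names
go
```

The second query is the "on the fly" style mentioned above, so the same caution applies: an index built under the column's original collation is unlikely to help with a comparison done under a different one.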

Anyway, the key point is that they're very flexible. In many international databases/localized databases, column collation differs by table (in order to do efficient sorting, etc.) and different language data may be separated (either with a column that describes which language/country code is used OR in different tables).

Grief in Changing Database Collations

Actually, changing database collation is *very* simple. Literally, it only takes an ALTER DATABASE to do. For example, the following code runs flawlessly:

USE master
go

-- Only drop the database if a previous run left it behind
IF DB_ID('TestCollation') IS NOT NULL
   DROP DATABASE TestCollation
go

CREATE DATABASE TestCollation
COLLATE
SQL_Latin1_General_CP1_CI_AS
go

sp_helpdb TestCollation
go

ALTER DATABASE TestCollation
COLLATE Latin1_General_CS_AS_KS_WS
go

sp_helpdb TestCollation
go

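BUT... if you go from case sensitive to case insensitive... be careful! It is important to realize that the names of ALL of the objects in the database will need to be checked against the new collation. In fact, changing database collation will not be allowed if the change would create duplicate names (for example, two tables whose names differ only by case) or if schema-bound objects such as computed columns or CHECK constraints depend on the current database collation. Also keep in mind that existing columns in your tables keep the collation they were created with; ALTER DATABASE only changes the default used for new columns. Check out this more complete script (ChangingDatabaseCollation.sql (2.85 KB)), if you want to see what happens.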
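One case where the change is refused is when object names would collide under the new collation. This is only a small sketch of that case, not the downloadable script; the database name TestCollationNames and the two tables are hypothetical.

```sql
USE master
go

IF DB_ID('TestCollationNames') IS NOT NULL
   DROP DATABASE TestCollationNames
go

CREATE DATABASE TestCollationNames
COLLATE Latin1_General_CS_AS
go

USE TestCollationNames
go

-- Under a case-sensitive collation these are two different names.
CREATE TABLE dbo.Orders (OrderID int)
CREATE TABLE dbo.ORDERS (OrderID int)
go

USE master
go

-- Expected to fail: under a case-insensitive collation
-- the two table names would become duplicates.
ALTER DATABASE TestCollationNames
COLLATE Latin1_General_CI_AS
go
```

Dropping or renaming one of the two tables first should let the ALTER DATABASE go through.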

Grief with temporary objects

So... the other area (and this seems to be the one where everyone has trouble) is with temporary objects. If you create a temp table and your database has a different collation than TempDB (which has the same collation as the server - based on installation), then comparisons/lookups/joins between the temp table and your database's tables may fail with collation conflict errors. A simple trick to get around this is to use database_default. Check out this sample and you'll see how it works:

CREATE DATABASE Test
COLLATE Icelandic_BIN
go

USE Test
go

CREATE TABLE #test1
(
   col1 varchar(12)
)
go

CREATE TABLE #test2
(
   col1 varchar(12) COLLATE database_default
)
go

USE Tempdb
go

CREATE TABLE #test3
(
   col1 varchar(12) COLLATE database_default
)
go

sp_help 'tempdb..#test1' -- Will use TempDB's collation
exec sp_help 'tempdb..#test2' -- Will use Test's collation (Icelandic BIN)
exec sp_help 'tempdb..#test3' -- Will use TempDB's collation
go

So simple, so obvious... and, well - I just found out about that one?! I used to recommend that you explicitly set the collation for every column. Now, that still works - but, it doesn't offer you any flexibility. So, you could get around that with dynamic string execution but that can also get very complicated, very quickly. So... database_default is a VERY simple and clean way of doing this.
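For a temp table that was already created without database_default, the same keyword can also be applied at query time. A rough sketch, assuming the Test database from the sample above still exists and that TempDB's collation is something other than Icelandic_BIN; dbo.Codes and #lookup are made-up names.

```sql
USE Test
go

CREATE TABLE dbo.Codes (Code varchar(12))  -- picks up Test's collation (Icelandic_BIN)
CREATE TABLE #lookup   (Code varchar(12))  -- picks up TempDB's collation
go

-- Expected to fail with a collation conflict error
-- when the two collations differ.
SELECT c.Code
FROM dbo.Codes AS c
JOIN #lookup AS l ON c.Code = l.Code
go

-- Workaround: force the temp table's column to the current database's collation
-- for this comparison only.
SELECT c.Code
FROM dbo.Codes AS c
JOIN #lookup AS l ON c.Code = l.Code COLLATE database_default
go

DROP TABLE #lookup
DROP TABLE dbo.Codes
go
```

Declaring the temp table column with COLLATE database_default in the first place is still the cleaner fix, since it avoids repeating the clause in every query.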

Have fun,
kt


This is a much needed and much overdue blog entry... In 8 Steps to Better Transaction Log throughput, I mentioned a customer that was helped by fixing TWO typical optimization problems I see. In that blog entry, I said I would write two blog entries - that one on transaction log optimization and another on common tempdb optimizations. Well, I forgot...until I was reminded with an email this morning (thanks Marcus!).

First - a bit of understanding of TempDB - what goes there?

  • Internal temporary objects needed by SQL Server in the midst of other complex operations. For example, worktables created by a hash aggregate or interim tables used in hash joins will be stored in TempDB (almost anything that shows as "hash" something in your query plan output is likely to go to tempdb).
  • User objects created with either # (for local temporary objects), ## (global temporary objects) or @ (table variables)
    • # = Local temporary object
      Local temp objects are accessible ONLY in the session that created them. They are removed automatically when that session ends (unless manually dropped first).
    • ## = Global temporary object
      Global temporary objects are accessible to ANYONE who can log in to your SQL Server. They persist only as long as the session that created them lasts (unless manually dropped), but anyone who logs in during that time can directly query, modify or drop them. They are removed automatically when the creating session ends OR, if another session is using the object at that moment, when that session finishes using it (and that's only for as long as any locks are held). If other sessions need more permanent use of a temporary object, you should consider creating a permanent object and dropping it manually.
    • @ = User-defined Table Variable
      User-defined Table Variables were introduced in SQL Server 2000 and provide an alternative to temporary tables by allowing you to create a variable defined as type TABLE, which you can then populate and use in a variety of ways. There has been A LOT of debate over whether or not you should always use table variables or always use temp tables. My response is that I ALWAYS avoid the word always! My point is that table variables are NOT always better nor are temp tables always better. There are key uses for each. I tend to like temp tables in scenarios where the object is used over a longer period of time - I can create non-key indexes on it and it's more flexible to create to begin with (SELECT INTO can be used to create the temp table). I also have the ability to use the temporary table in nested subprocedures because it's not local to the procedure in which it was created. However, if you don't need any of those things then a table variable might be better. It is likely to be better when you have smaller objects that don't need to be accessed outside of the procedure in which they were created and when you only need KEY indexes (a table variable ONLY supports the indexes created by a create table statement - meaning PRIMARY KEY and UNIQUE constraints).
  • Objects created by client applications - this is possibly a large part of your problem... Profiling can help you to determine if there's a lot of TempDB usage from your client applications.
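The three kinds of user objects described above can be sketched side by side. All table names, index names and rows here are hypothetical and exist only to show the syntax.

```sql
-- Local temp table: visible only to the session that creates it.
CREATE TABLE #Work
(
   id   int,
   name varchar(30)
)
-- Temp tables accept non-key indexes.
CREATE INDEX IX_Work_name ON #Work (name)
INSERT #Work VALUES (1, 'alpha')
INSERT #Work VALUES (2, 'beta')
go

-- SELECT INTO creates and populates a second temp table in one step.
SELECT id, name
INTO #WorkCopy
FROM #Work
go

-- Global temp table: visible to every session while it exists.
CREATE TABLE ##Shared (id int, name varchar(30))
go

-- Table variable: scoped to this batch; only constraint-based indexes.
DECLARE @Work TABLE
(
   id   int PRIMARY KEY,
   name varchar(30) UNIQUE
)
INSERT @Work (id, name)
SELECT id, name FROM #Work

SELECT id, name FROM @Work ORDER BY id
go

DROP TABLE #Work
DROP TABLE #WorkCopy
DROP TABLE ##Shared
go
```

Note that @Work is gone after the go that ends its batch, while #Work and ##Shared stay around until they are dropped or the session ends.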

OK, so now that you know what goes there - how do you make it optimal?

First and foremost, TempDB is in cache just as any other database is in cache. TempDB does not spill to disk unless you are low on cache and/or if you have a lot of inflight transactions in TempDB. Although TempDB is not persisted from shutdown to restart - it still needs to do some logging and therefore you should consider its optimization a lot like other databases.

Things you should do for TempDB (that are a lot like what you should do for every database):

  1. Isolate the data and log portion of TempDB.
  2. Place them on clean, defragmented disks.
  3. Pre-allocate them so they don't need to do a lot of autogrowth.
  4. Make sure you have sufficient memory to support active objects (check for disk activity to the disks that contain TempDB files).
  5. Make sure that transactions are written efficiently so that there are no unusually long running transactions that are unnecessarily holding resources (and therefore locks and therefore log activity).
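As a rough sketch of steps 3 and 4: the logical file names tempdev and templog are the defaults from a standard installation (check sp_helpfile on your own server), the sizes are placeholders rather than recommendations, and the last query assumes SQL Server 2005 or later.

```sql
USE tempdb
go

-- Show the current logical names, locations, sizes and growth settings.
EXEC sp_helpfile
go

-- Pre-allocate the data and log files so they rarely need to autogrow.
-- The new SIZE must be larger than the file's current size.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 1024MB, FILEGROWTH = 256MB)
go

ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, SIZE = 256MB, FILEGROWTH = 64MB)
go

-- Check cumulative I/O against TempDB's files (SQL Server 2005 and later).
SELECT file_id, num_of_reads, num_of_writes, io_stall
FROM sys.dm_io_virtual_file_stats(DB_ID('tempdb'), NULL)
go
```

The right sizes depend entirely on your workload; watching how large the files grow during a normal busy period is one reasonable way to pick a starting point.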

And - if you need to move TempDB, you should review this KB Article: Moving SQL Server databases to a new location with Detach/Attach
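Unlike user databases, TempDB is not moved with detach/attach; it is relocated by changing the file locations, which take effect when SQL Server next starts. A minimal sketch, assuming the default logical file names and entirely hypothetical drive letters and folders:

```sql
USE master
go

-- The paths below are placeholders; the target folders must already exist.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, FILENAME = 'E:\SQLData\tempdb.mdf')
go

ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, FILENAME = 'F:\SQLLogs\templog.ldf')
go

-- Restart the SQL Server service; TempDB is recreated at the new locations.
```

Putting the data file and the log file on different drives here also takes care of item 1 in the list above.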

Things you should do SPECIFICALLY for TempDB (especially if you're running on a multiproc machine):

Before I say what... let me tell you why. TempDB has a large number of objects being created all the time. For an object to be created, space must be allocated to it. Space allocation is determined by looking at some of the internal system pages (the GAM and SGAM). In the end, it is these pages that start to have significant contention (with just one file) in a VERY active TempDB. To minimize that contention you can create multiple files.

  1. Consider creating multiple files for TempDB (even if on the same physical disks) so that there is less of a bottleneck when objects are being allocated. Make sure to read the associated KB.
  2. Consider setting a trace flag to have object allocation grab extents rather than pages. Make sure to read the associated KB.

BOTH of these last two are described in detail by a KB article: FIX: Concurrency enhancements for the tempdb database.
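Here is a hedged sketch of what those two changes can look like. The file name tempdev2, its path and the sizes are hypothetical, and trace flag 1118 is the flag commonly used for extent-based allocation; read the KB article before turning it on for your version and build.

```sql
USE master
go

-- Add a second data file, sized the same as the first so that
-- allocations are spread evenly across the files.
ALTER DATABASE tempdb
ADD FILE
(
   NAME = tempdev2,                          -- hypothetical logical name
   FILENAME = 'E:\SQLData\tempdb2.ndf',      -- hypothetical path
   SIZE = 1024MB,
   FILEGROWTH = 256MB
)
go

-- Allocate full extents instead of single pages, for all sessions.
DBCC TRACEON (1118, -1)
go

-- Confirm that the flag is active.
DBCC TRACESTATUS (1118, -1)
go
```

A flag set with DBCC TRACEON does not survive a restart; to make it permanent it is normally added as a -T1118 startup parameter instead.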

OK - so that should really help! Moving forward (meaning SQL Server 2005), having multiple files can still help for TempDB.
