As part of the 2008 DBA class we're teaching down here in Melbourne, I did a demo of using predicates and file targets with extended events, so I want to blog the script for people to play with.

For background info on extended events see:

In this scenario, I'd like to track queries that make heavy usage of the CPU on my system.

First off I'll create a test database to play with.

USE MASTER;
GO
IF DATABASEPROPERTYEX ('production', 'Version') > 0 DROP DATABASE production;
GO

CREATE DATABASE production;
GO
USE production;
GO

CREATE TABLE t1 (c1 INT IDENTITY, c2 UNIQUEIDENTIFIER ROWGUIDCOL DEFAULT NEWID(), c3 CHAR (5000) DEFAULT 'a');
CREATE CLUSTERED INDEX t1_CL ON t1 (c1);
CREATE NONCLUSTERED INDEX t1_NCL ON t1 (c2);
GO

SET NOCOUNT ON;
INSERT INTO t1 DEFAULT VALUES;
GO 1000

-- Get the database ID to plug into the event session
SELECT DB_ID ('production');
GO

Notice that I grabbed the database ID of the new database I created, as I'll need that when defining the extended event predicate.

IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE name = 'EE_ExpensiveQueries')
   DROP EVENT SESSION EE_ExpensiveQueries ON SERVER;
GO

CREATE EVENT SESSION EE_ExpensiveQueries ON SERVER
ADD EVENT sqlserver.sql_statement_completed
   (ACTION (sqlserver.sql_text, sqlserver.plan_handle)
      WHERE sqlserver.database_id = 18 /*DBID*/  AND cpu > 10 /*total ms of CPU time*/)
ADD TARGET package0.asynchronous_file_target
   (SET FILENAME = N'C:\SQLskills\EE_ExpensiveQueries.xel', METADATAFILE = N'C:\SQLskills\EE_ExpensiveQueries.xem')
WITH (max_dispatch_latency = 1 seconds);
GO

I'm monitoring the event that fires whenever a statement completes, and I'm filtering by database ID (make sure you plug in the correct database ID when you try this yourself) and by the number of milliseconds of CPU time the statement used. Note: when using predicates you should always make sure that you order the predicate tests such that the most restrictive predicates (those most likely to evaluate to false) are first in the list, so the predicate evaluation 'short-circuits' as quickly as possible (a standard programming practice with boolean logic). I'm also using a file target, just to show how it can be done, instead of the ring buffer which is commonly used for extended events demos.

Now I'm going to turn on the event session and do some queries in the production database.

ALTER EVENT SESSION EE_ExpensiveQueries ON SERVER STATE = START;
GO

USE production;
GO

SELECT COUNT (*) FROM t1 WHERE c1 > 500;
GO

SELECT SUM (c1) FROM t1 WHERE c3 LIKE 'a';
GO

ALTER INDEX t1_CL ON t1 REORGANIZE;
GO

ALTER INDEX t1_CL ON t1 REBUILD;
GO

That should have generated some long-running queries. Now I'll switch in to the context of master (so the querying of the event session itself doesn't get captured by the event session) and see what I've captured.

USE master;
GO

SELECT COUNT (*) FROM sys.fn_xe_file_target_read_file
   ('C:\SQLskills\EE_ExpensiveQueries*.xel', 'C:\SQLskills\EE_ExpensiveQueries*.xem', NULL, NULL);
GO

In this case I've got 3 entries. I can pull these out using the following code:

SELECT data FROM
   (SELECT CONVERT (XML, event_data) AS data FROM sys.fn_xe_file_target_read_file
      ('C:\SQLskills\EE_ExpensiveQueries*.xel', 'C:\SQLskills\EE_ExpensiveQueries*.xem', NULL, NULL)
 ) entries;
GO

But I get three XML blobs back, like so:

 

What's more useful is to pull everything out of the XML blob programmatically using the code below:

SELECT
   data.value (
      '(/event[@name=''sql_statement_completed'']/@timestamp)[1]', 'DATETIME') AS [Time],
   data.value (
      '(/event/data[@name=''cpu'']/value)[1]', 'INT') AS [CPU (ms)],
      CONVERT (FLOAT, data.value ('(/event/data[@name=''duration'']/value)[1]', 'BIGINT')) / 1000000
      AS [Duration (s)],
   data.value (
      '(/event/action[@name=''sql_text'']/value)[1]', 'VARCHAR(MAX)') AS [SQL Statement],
      SUBSTRING (data.value ('(/event/action[@name=''plan_handle'']/value)[1]', 'VARCHAR(100)'), 15, 50)
      AS [Plan Handle]
FROM
   (SELECT CONVERT (XML, event_data) AS data FROM sys.fn_xe_file_target_read_file
      ('C:\SQLskills\EE_ExpensiveQueries*.xel', 'C:\SQLskills\EE_ExpensiveQueries*.xem', null, null)
) entries
ORDER BY [Time] DESC;
GO

Time                    CPU (ms) Duration (s) SQL Statement                              Plan Handle
----------------------- -------- ------------ ------------------------------------------ --------------------------------------------------
2009-10-16 17:59:29.623 30       1.214875     ALTER INDEX t1_CL ON t1 REBUILD;           0x06001000EB672A07B8C0C807000000000000000000000000
2009-10-16 17:59:28.407 20       0.024076     ALTER INDEX t1_CL ON t1 REORGANIZE;        0x0600100003594903B8C0C807000000000000000000000000
2009-10-16 17:59:28.343 51       0.045144     SELECT SUM (c1) FROM t1 WHERE c3 LIKE 'a'; 0x06001000FAF5B11EB820C307000000000000000000000000

Now I can plug in one of the plan handles to a query of sys.dm_exec_query_plan to get the graphical query plan:

SELECT [query_plan] FROM sys.dm_exec_query_plan (0x06001000FAF5B11EB820C307000000000000000000000000);
GO

And clicking on the resulting XML 'link' gives the query plan:

 

 

And now I can tweak the production workload to potentially behave better.

Just for kicks I went back into the context of the production database and ran the XML parsing again to capture the query plan - try it yourself - pretty gnarly! :-)

Now, if I was going to make this more useful, I'd use a ring buffer target, with a polling mechanism every few seconds to make sure that I can capture the graphical query plan for expensive queries before the plan is pushed out of cache - maybe I'll get around to doing that and publish the complete solution.

In the meantime, here's a zip file containing the entire script that you can download and play with: EE_ExpensiveQueries.zip (1.72 kb)

Enjoy!

The 35-page whitepaper on high availability I wrote for the SQL team over the summer has been published on MSDN. It's a 2-300 level whitepaper that describes the various high-availability technologies in SQL Server 2008 and how they can be used to mitigate disasters. It's chock-full of links to other whitepapers, technical articles and Books Online sections and also presents my methodology for planning a high-availability strategy.

You can get it at http://msdn.microsoft.com/en-us/library/ee523927.aspx.

Enjoy!

Here's the table of contents:

  • Introduction
  • Causes of Downtime and Data Loss
    • Planned Downtime
    • Unplanned Downtime and Data Loss
  • Planning a High-Availability Strategy
    • Requirements
    • Limitations
    • Technology Evaluation
  • SQL Server 2008 High-Availability Technologies
    • Logging and Recovery
    • Backup, Restore, and Related Technologies
      • Partial Database Availability and Online Piecemeal Restore
      • Instant File Initialization
      • Mirrored Backups
      • Backup Checksums
      • Backup Compression
    • Other Single-Instance Technologies
      • Online Operations
      • Database Snapshots
      • Hot-Add Memory and CPU
      • Resource Governor
    • Multi-Instance Technologies
      • Log Shipping
      • Transactional Replication
      • Database Mirroring
      • Failover Clustering
      • Combining Multi-Instance Technologies
      • Virtualization
  • Mitigating the Causes of Downtime and Data Loss
  • High-Availability Features Supported by SQL Server 2008 Editions
  • Conclusion

Here's an interesting bug that surfaced recently, first reported by Bryan Smith on the MSDN disaster recovery/HA forum three weeks ago.

One of the mechanisms I advise for running consistency checks on VLDBs with multiple filegroups is to run successive DBCC CHECKFILEGROUP operations on the filegroups holding the partitions (see CHECKDB From Every Angle: Consistency Checking Options for a VLDB). Prior to SQL Server 2005 SP3, if a table or index is split into multiple partitions, then DBCC CHECKFILEGROUP would skip checking the entire table or index if it was partitioned over multiple filegroups. From SQL Server 2005 SP3 onwards, DBCC CHECKFILEGROUP will validate only the partitions of tables and indexes that reside on the filegroup being checked - rather than skipping the whole table or index - a big improvement.

Now it seems that SQL Server 2008 has a bug where it essentially has regressed back to the old behavior where DBCC CHECKFILEGROUP will skip a table or index if it's not wholely contained on the filegroup being checked.

Here's a script you can use to test this. It creates a partitioned table over multiple filegroups and then runs DBCC CHECKFILEGROUP on the first partition. I'll discuss the results after the script.

CREATE DATABASE DBMaint2008;
GO
USE DBMaint2008;
GO

ALTER DATABASE DBMaint2008 ADD FILEGROUP DataPartition1;
ALTER DATABASE DBMaint2008 ADD FILEGROUP DataPartition2;
ALTER DATABASE DBMaint2008 ADD FILEGROUP DataPartition3;
GO

ALTER DATABASE DBMaint2008 ADD FILE
   (NAME = N'DataPartition1', FILENAME = N'C:\SQLskills\DataPartition1.ndf',
   SIZE = 10, FILEGROWTH = 10)
TO FILEGROUP DataPartition1;
GO
ALTER DATABASE DBMaint2008 ADD FILE
   (NAME = N'DataPartition2', FILENAME = N'C:\SQLskills\DataPartition2.ndf',
   SIZE = 10, FILEGROWTH = 10)
TO FILEGROUP DataPartition2;
GO
ALTER DATABASE DBMaint2008 ADD FILE
   (NAME = N'DataPartition3', FILENAME = N'C:\SQLskills\DataPartition3.ndf',
   SIZE = 10, FILEGROWTH = 10)
TO FILEGROUP DataPartition3;
GO

CREATE PARTITION FUNCTION Partitions_PFN (INT)
AS RANGE RIGHT FOR VALUES (1000, 2000);
GO

CREATE PARTITION SCHEME [Partitions_PS]
AS PARTITION [Partitions_PFN] TO (DataPartition1, DataPartition2, DataPartition3);
GO

CREATE TABLE TestTable (c1 INT IDENTITY, c2 DATETIME DEFAULT GETDATE ());
CREATE UNIQUE CLUSTERED INDEX TestPK ON TestTable (c1) ON Partitions_PS (c1);
GO

SET NOCOUNT ON;
GO
INSERT INTO TestTable DEFAULT VALUES;
GO 3000

DBCC CHECKFILEGROUP (DataPartition1);
GO

On SQL Server 2005 SP3, the output from the final batch contains:

DBCC results for 'TestTable'.
Cannot process rowset ID 72057594038452224 of object "TestTable" (ID 2073058421), index "TestPK" (ID 1), because it resides on filegroup "DataPartition2" (ID 3), which was not checked. 
Cannot process rowset ID 72057594038517760 of object "TestTable" (ID 2073058421), index "TestPK" (ID 1), because it resides on filegroup "DataPartition3" (ID 4), which was not checked. 
There are 999 rows in 3 pages for object "TestTable".

This shows that it processed the 1000 rows in the first partition, but not the other two - as we'd expect. 

On SQL Server 2008, the output for TestTable is limited to:

Cannot process rowset ID 72057594038910976 of object "TestTable" (ID 2105058535), index "TestPK" (ID 1), because it resides on filegroup "DataPartition2" (ID 3), which was not checked.

And that's it - nothing about it processing any rows in partition 1. This shows that DBCC didn't process the first partition as we'd expect - this becomes even more apparent with very large amounts of data, where DBCC CHECKFILEGROUP will just complete almost instantly.

As Bryan says in his post, Microsoft has acknowledged this is a bug and it should hopefully be fixed for 2008 CU5. In the meantime, this is something you should be aware of as your tables may not be being checked properly.

The SQLCAT team has published a very detailed (and very long) whitepaper on implementing failover clustering with SQL Server 2008. There are some major differences from 2000 and 2005 related to setup and SP/CU installs so this is worth reading even if you're experienced with 2005 failover clusters.

You can download it from http://download.microsoft.com/download/6/9/D/69D1FEA7-5B42-437A-B3BA-A4AD13E34EF6/SQLServer2008FailoverCluster.docx and the link's on our whitepapers page.

(And yes, we're still on vacation until the end of July - no real blogging/Twittering)

A new whitepaper has been published that comprehensively covers the Resource Governor feature of SQL Server 2008, written by fellow MVP Aaron Bertrand and SQL PM Boris Baryshnikov. I read through this a while back (but didn't have the bandwidth to do a full technical review) and it's an excellent whitepaper with lots of code examples and in-depth explanations.

Check it out at http://download.microsoft.com/download/D/B/D/DBDE7972-1EB9-470A-BA18-58849DB3EB3B/ResourceGov.docx.

It's very commonly known that you can use the Script Wizard in Management Studio to script out tables or even an entire database. It's NOT commonly known that in SQL Server 2008, the wizard was upgraded and you can now script out the data too. The feature isn't enabled by default, which is why not many people know about it, and I couldn't find it described in Books Online either, further contributing to it's obscurity - but it's definitely there.

If you go into the wizard, and go down to the Table/View Options, there's a setting 'Script Data'. Set that to True and you'll get a bunch of INSERT statements in the resulting script too. Very cool.

Here's a screenshot showing what I mean (click it for a 1024x768 version).

 

Enjoy!

Categories:
SQL Server 2008 | Tools

The SQLCAT team have published another excellent whitepaper - this time the long-awaited one on the SQL Server 2008 data compression feature. Thirteen people inside and outside Microsoft (including me) provided technical reviews and the authors (Sanjay Mishra along with Marcel van der Holst, Peter Carlin, and Sunil Agarwal) did a great job. I remember leading an effort back in 2005 to see if we (the SQL team) could get some form of data compression into SQL Server 2005 RTM (no, obviously) so it's great to see data compression out there and now with top-class proscriptive guidance on when and how to use it.

Bottom line: don't just go an turn it on without analyzing whether you'll get a decent compression ratio and your workload is suited to data compression.

You can get to the whitepaper at: Data Compression: Strategy, Capacity Planning and Best Practices and I'll add it to our Whitepaper Links page.

Enjoy!

Last week's survey was two-fold - what's the largest SQL Server database you manage, and how many SQL Server databases are you responsible for managing. Here are the results as of 5/10/2009.

 

As far as the database sizes are concerned, the distribution curve isn't surprising, but the average size of databases *is*. 70% of respondents have a database over 100GB, with half of those being 500GB or more. One lucky (or unlucky, depending on how you look at it!) person gets to play with a 20+TB database. There some truly huge SQL Server databases out there - for instance the 1.1 petabyte astronomical database described here, which adds 1.4TB every night. Back in December 2003, the largest SQL Server database was a 5TB Verizon database. Now multi-TB databases seem quite common when talking to large customers.

The distribution curve for the number of databases also seems unsurprising, except for the uptick at the 150+ databases mark. Fully 1/4 of respondents are responsible for more than 200 databases! Even if the majority of the databases are small, that's still a huge amount of management to consider and context to have to know about.

So what was the point of this survey? Well, I've heard from quite a few people this year that some of their DBA colleagues have been let go and they've had to pick up more responsibility, whether they want to or not. Just last week, Microsoft laid off 3000 more people worldwide, including some in their various IT departments. In the DBA class we were teaching last week, it was obvious that some of the attendees were affected and had a lot more on their plates.

With the average size of databases growing, and the number of databases any one person is responsible for also growing, it's imperative that the overall environment is as easily managed as possible. What does this mean? Here's a 2-minute list of stuff off the top of my head that I think helps make an environment more manageable:

  • Setting up each database so that the important parts can be recovered as quickly and easily as possible
  • Setting up each database so that there are no performance bottlenecks (these two were both kind of addressed by my previous survey Physical database layout vs. database size)
  • Making sure each database has the right backup strategy (from last week's survey Importance of having the right backups)
  • Making sure that each database has the correct monitoring set up. For instance:
    • SQL Agent alerts on the instance for high-severity errors (see Easy monitoring of high-severity errors: create Agent alerts)
    • Monitoring data and log file sizes to avoid auto-growths
    • Monitoring index fragmentation levels
    • Pro-active monitoring for things like high disk-queue lengths or runaway queries
    • Regular consistency checks to find corruption
  • Security is setup appropriately (see TechNet Magazine: feature article on Common SQL Server Security Issues and Solutions)
  • Maintenance jobs are automated as much possible (e.g. backups, consistency checks, index maintenance, and statistics maintenance). There are a bunch of scripts people have published to help with this: Tara Kizer has a lot on her blog, Ola Hallengren has a good one here, as do many others (no offense to anyone else with scripts, these are just the two that spring to mind - by all means reply with a comment to let me know of your popular script)
  • Database settings adhere to common best-practices (e.g. auto-shrink off, auto-grow on (and managed!), auto-create and auto-update statistics on, page checksums on)

And I'm barely touching the tip of the iceberg here. Most of these are just database maintenance best practices - with the idea that the healthier a database is, the less likely something will go wrong that demands DBA attention. And in an environment where potentially hundreds of databases are under the control of a single person, avoiding problems is of paramount importance.

Ok - cup of coffee later and I have more stuff to add to the list:

  • Consistency of settings and scripts across database and instances, so what you're familiar with in one database context translates to all others
  • Environment-wide run-book that covers all aspects of day-to-day operations, so all DBAs do the same thing and a DBA can help out or pick up databases easily
  • Environment-wide disaster recovery guide - written by the most senior DBA and tested by all DBAs down to the most junior
  • T-SQL source code control
  • Management of access to databases
    • Physical access to servers is limited
    • Network access to servers is limited
    • SA access is limited
    • Developers can't deploy new code without going through QA first
    • And so on
  • High-availability technologies in place to aid recoverability, with plenty of practice of failing over applications
  • Use of tools to increase productivity. Most major 3rd-party tool vendors have tools that can help in a myriad of DBA activities - I'm not going to name any in particular as I don't want to favor any over any of the others.

Talking about tools, SQL Server 2008 itself has some stuff built in that can increase productivity. Kimberly did a great post last November on Central Management Servers - see SQL Server 2008 Central Management Servers - have you seen these?. There's also Policy-Based Management, which is a good first step along the road to centralized policy setting, testing, and enforcement (although it has some nits that need to be fixed in v2), and Performance Data Collection which can be used to easily collect perf data (same kind of stuff that Activity Monitor has) for multiple instances and stored in a single location - Kimberly will be blogging more about these two features in the future.

Overall, having a haphazard way of setting up and maintaining databases and instances is going to make it really hard to scale up size and number of databases in an environment without investing heavily in more people. Taking a more disciplined, planned approach, learning from others - using others' scripts and practices, and developing streamlined processes will lead to increased productivity, decreased stress, and happier, stronger DBAs. No-one wants to be in constant fire-fighting mode.

This has been a bit of a ramble, but that's the point of these editorial-style posts. I'm sure you know a lot of this, and I'm sure I've missed some obvious points, so feel free to comment with more tips, tricks, and links!

Next up - this week's survey!

Another quickie today. If you're using Activity Monitor in SQL Server 2008, or Performance Data Collection with the Server Activity system data collection set, then you may see a constant, and large number of 'Other' SQL Server Waits. When you drill into these, you see that the highest number of waits are for FSAgent. The FSAgent is part of the FILESTREAM subsystem, and it's what fools the rest of the Storage Engine into accepting the FILESTREAM data as if it was real varbinary(max) data. If you look in the Books Online entry for sys.dm_os_wait_stats, which defines all the different wait types, it explains that the wait should only show up when FILESTREAM I/Os are occuring. However, this wait type shows up *all* the time - clearly a bug.

Turns out that it's a known bug that's been fixed - see KB 958942.

This short post is prompted by a question that came in through Twitter - I *knew* it was worth joining and spending time on it (http://twitter.com/PaulRandal).

The (paraphrased) question is "can FILESTREAM data be stored remotely?". This has been confusing people, and neither FILESTREAM BOL nor my FILESTREAM whitepaper (see here) explicitly answer the question.

The FILESTREAM data container for a database must be placed on an NTFS volume on locally-connected storage. Just like database data and log files, the directory cannot be on a UNC share. The confusion comes from the fact that FILESTREAM data *can* be *accessed* remotely through a UNC share - but as far as the host instance is concerned, the FILESTREAM storage must be local.

A second question that came up a while ago is whether FILESTREAM data containers can share the same directory or be nested. The answers are kind-of, and no, respectively. Let's see.

I'll create the first database with a FILESTREAM data container:

CREATE DATABASE FileStreamTestDB1 ON PRIMARY
    (NAME = FileStreamTestDB1_data, FILENAME = N'C:\SQLskills\FSTestDB1_data.mdf'),
FILEGROUP FileStreamFileGroup CONTAINS FILESTREAM
    (NAME = FileStreamTestDB1Documents, FILENAME = N'C:\SQLskills\FSDC\Documents')
LOG ON
    (NAME = FileStreamTestDB1_log, FILENAME = N'C:\SQLskills\FSTestDB1_log.ldf');
GO
 

And now let's try another database with the same parent directory:

CREATE DATABASE FileStreamTestDB2 ON PRIMARY
    (NAME = FileStreamTestDB2_data, FILENAME = N'C:\SQLskills\FSTestDB2_data.mdf'),
FILEGROUP FileStreamFileGroup CONTAINS FILESTREAM
    (NAME = FileStreamTestDB2Documents, FILENAME = N'C:\SQLskills\FSDC\Documents2')
LOG ON
    (NAME = FileStreamTestDB2_log, FILENAME = N'C:\SQLskills\FSTestDB2_log.ldf');
GO
 

This works fine. You can't have another database use the *same* directory as the first database (i.e. N'C:\SQLskills\FSDC\Documents'), but two FILESTREAM data containers can have the same parent directory.

And now let's try a nested one:

CREATE DATABASE FileStreamTestDB3 ON PRIMARY
    (NAME = FileStreamTestDB3_data, FILENAME = N'C:\SQLskills\FSTestDB3_data.mdf'),
FILEGROUP FileStreamFileGroup CONTAINS FILESTREAM
    (NAME = FileStreamTestDB3Documents, FILENAME = N'C:\SQLskills\FSDC\Documents\Documents3')
LOG ON
    (NAME = FileStreamTestDB3_log, FILENAME = N'C:\SQLskills\FSTestDB3_log.ldf');
GO

Msg 5136, Level 16, State 2, Line 1
The path specified by 'C:\SQLskills\FSDC\Documents\Documents3' cannot be used for a FILESTREAM container since it is contained in another FILESTREAM container.
Msg 1802, Level 16, State 2, Line 1
CREATE DATABASE failed. Some file names listed could not be created. Check related errors.

Doesn't work, as expected, as this is documented in BOL here.

Thanks

There's been a recent flurry of confusion and misconceptions about trace flag 1118 in SQL Server 2008. This trace flag switches allocations in tempdb from single-page at a time for the first 8 pages, to immediately allocate an extent (8 pages). It's used to help alleviate allocation bitmap contention in tempdb under a heavy load of small temp table creation and deletion.

There are multiple points of confusion, which I'll address in turn. Then I'll prove that the trace flag still works in SQL Server 2008.

1) Why is the trace flag usually required in 2000? In SQL 2000, whenever a temp table is created in tempdb and a row inserted, an IAM page must be allocated and a single data page must be allocated. These two pages are both 'single-page' allocations, from a mixed extent (see Inside The Storage Engine: Anatomy of an extent for more info). This means that an SGAM allocation bitmap page must be accessed, and a PFS page must be accessed (see Inside The Storage Engine: GAM, SGAM, PFS and other allocation maps for more info).

With lots of very small temp tables being created, this means the very first SGAM page and the very first PFS page in the data file are accessed/changed by all the threads, leading to latch contention problems on these two pages. When the temp tables are deleted again, the various pages are deallocated, which again needs to access and change the PFS page, and potentially the SGAM page.

There are two ways to alleviate this problem. Firstly, create multiple data files in tempdb - which splits the latch contention over multiple allocation bitmaps (from having allocations come from multiple files) and thus reduces the contention. The general rule of thumb was one tempdb data file for each processor core. Secondly, turn on TF1118, which makes the first 8 data pages in the temp table come from a dedicated extent. This means one extent is allocated from the GAM page, rather than 8 single pages (and potentially 8 accesses to the SGAM page). The pages within the extent are reserved and allocated singly from this extent, as needed. This also cuts down on contention and is documented in KB 328551.

2) What does reserved vs. allocated mean? When an extent is allocated to a table, the 8 pages in the extent are not immediately allocated as well. Allocating an extent means those 8 pages are reserved exclusively for subsequent allocation to that table. The pages are allocated individually as needed, but no other table can allocate them. This is why such extents are called 'dedicated' extents (see my blog post link above for more details). You can see the counters of reserved pages vs. allocated pages in the output from sp_spaceused.

3) Why is the trace flag not required so much in 2005 and 2008? In SQL Server 2005, my team changed the allocation system for tempdb to reduce the possibility of contention. There is now a cache of temp tables. When a new temp table is created on a cold system (just after startup) it uses the same mechanism as for SQL 2000. When it is dropped though, instead of all the pages being deallocated completely, one IAM page and one data page are left allocated, and the temp table is put into a special cache. Subsequent temp table creations will look in the cache to see if they can just grab a pre-created temp table 'off the shelf'. If so, this avoids accessing the allocation bitmaps completely. The temp table cache isn't huge (I think it's 32 tables), but this can still lead to a *big* drop in latch contention in tempdb.

4) Does the trace flag still exist in 2005 and 2008? Yes it does - KB 328551 clearly states:

Note Trace flag -T1118 is also available and supported in Microsoft SQL Server 2005 and SQL Server 2008. However, if you are running SQL Server 2005 or SQL Server 2008, you do not have to apply any hotfix.

Just to make extra-sure (as I'm always paranoid about saying absolutes), I checked with my good friend Ryan Stonecipher, who's the dev lead for the team that owns allocation (and a bunch of other stuff, including DBCC). He confirmed the code is exactly the same in 2008 as it was in 2005. And I prove it to you below too.

5) And why is it sill there in 2005 and 2008? It does the same thing in 2005/2008 as it did in 2000. If the temp table creation/deletion workload is high enough, you can still see latch contention, as the temp table cache won't be enough to completely alleviate the need for creating actual new temp tables, rather than just being able to grab one 'off the shelf'. In that case, using the trace flag to change to extent-based allocation (in *exactly* the same way as for 2000) can help, as can creating more tempdb data files.

As far as data files go though, the number has changed. Instead of a 1-1 mapping between processor cores and tempdb data files (*IF* there's latch contention), now you don't need so many - so the recommendation from the SQL team is the number of data files should be 1/4 to 1/2 the number of processor cores (again, only *IF* you have latch contention). The SQL CAT team has also found that in 2005 and 2008, there's usually no gain from having more than 8 tempdb data files, even for systems with larger numbers of processor cores. Warning: generalization - your mileage may vary - don't post a comment saying this is wrong because your system benefits from 12 data files. It's a generalization, to which there are always exceptions.

6) Why does DBCC IND still show two pages, even with the trace flag on? I've heard of some people being confused by the output of DBCC IND in SQL 2008 when the trace flag is turned on. Creating a single row temp table will only show two pages allocated in the DBCC output - one IAM page and one data page. Yes, that's completely correct - as only two pages are allocated, but the data page comes from a dedicated extent, not a mixed extent. (IAM pages are *always* single-page allocations from mixed-extents).

And now the proof, on SQL 2008.

SELECT @@VERSION;
GO

Microsoft SQL Server 2008 (RTM) - 10.0.1600.22 (Intel X86)

First off, I'll create a temp table without the trace flag enabled, and see what pages the table has allocated, by looking at the first IAM. I'll use a temp table with an 8000+ byte row size, and insert two rows - so we have two data pages for clarity.

DBCC TRACEOFF (1118, -1);
GO

USE tempdb;
GO

CREATE TABLE #temp (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
GO
INSERT INTO #temp DEFAULT VALUES;
GO 2

Now I'll figure out what is the first IAM page, using my sp_AllocationMetadata script (see here for the script and details), and dump it with DBCC PAGE to see the single-page allocations it's tracking, and which dedicated extents are allocated to the table:

EXEC sp_AllocationMetadata '#temp';
GO

Object Name    Index ID  Alloc Unit ID        Alloc Unit Type  First Page  Root Page  First IAM Page
-------------- --------- -------------------- ---------------- ----------- ---------- ---------------
#temp__<snip>  0         1152921505223016448  IN_ROW_DATA      (1:158)     (0:0)      (1:199)

DBCC TRACEON (3604);
GO
DBCC PAGE ('tempdb', 1, 199, 3);
GO

<snip>

IAM: Single Page Allocations @0x4A35C08E

Slot 0 = (1:158)                     Slot 1 = (1:200)                     Slot 2 = (0:0)
Slot 3 = (0:0)                       Slot 4 = (0:0)                       Slot 5 = (0:0)
Slot 6 = (0:0)                       Slot 7 = (0:0)                      


IAM: Extent Alloc Status Slot 1 @0x4A35C0C2

(1:0)        - (1:1016)     = NOT ALLOCATED   

As you can clearly see from the partial output of the dump of the IAM page, there are two single-page allocations and no extents allocated to the temp table. This is what should happen when the trace flag is not enabled.

Now I'll do the same thing with the trace flag 1118 enabled.

USE tempdb;
GO

DROP TABLE #temp;
GO

DBCC TRACEON (1118, -1);
GO

CREATE TABLE #temp (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
GO
INSERT INTO #temp DEFAULT VALUES;
GO 2

EXEC sp_AllocationMetadata '#temp';
GO

Object Name    Index ID  Alloc Unit ID        Alloc Unit Type  First Page  Root Page  First IAM Page
-------------- --------- -------------------- ---------------- ----------- ---------- ---------------
#temp__<snip>  0         1224979099301904384  IN_ROW_DATA      (1:208)     (0:0)      (1:158)

DBCC TRACEON (3604);
GO
DBCC PAGE ('tempdb', 1, 158, 3);
GO

<snip>

IAM: Single Page Allocations @0x4A8FC08E

Slot 0 = (0:0)                       Slot 1 = (0:0)                       Slot 2 = (0:0)
Slot 3 = (0:0)                       Slot 4 = (0:0)                       Slot 5 = (0:0)
Slot 6 = (0:0)                       Slot 7 = (0:0)                      


IAM: Extent Alloc Status Slot 1 @0x4A8FC0C2

(1:0)        - (1:200)      = NOT ALLOCATED                              
(1:208)      -              =     ALLOCATED                              
(1:216)      - (1:1016)     = NOT ALLOCATED
                              

Now as you can clearly see, there are no single-page allocations, and there's a single extent allocated to the table. Proof that trace flag 1118 still does exactly what it should in SQL Server 2008.

Now for a DBCC IND on the table: 

DBCC IND ('tempdb', '#temp', -1);
GO

PageFID PagePID     IAMFID IAMPID      ObjectID    IndexID    
------- ----------- ------ ----------- ----------- -----------
1       158         NULL   NULL        293576084   0
1       208         1      158         293576084   0
1       209         1      158         293576084   0

(I've removed some of the trailing columns for clarity.) We see that it still only lists the two data pages (1:208, 1:209) and the IAM page (1:158) - although an entire extent was allocated to the temp table, only two pages from the extent were actually allocated and used - the rest are reserved for use by that table, but remain unallocated. 

Hopefully this post has cleared up a lot of the confusion around this trace flag and what it does.

In my previous posts on FILESTREAM I discussed the directory structure of the FILESTREAM data container and how to map the directories to database tables and columns. In this post I'm going to explain how and when the FILESTREAM garbage collection process works as that doesn't seem to be documented anywhere (even in the FILESTREAM whitepaper I wrote for MS - it wasn't supposed to be that low-level). There seems to be a lot of confusion about how updates of FILESTREAM data work, and when the old versions of the FILESTREAM files are removed. I'm going to explain how it all works and then show you by example.

The basic behavior that is non-intuitive is that there's no such thing as a partial update of FILESTREAM data. If you have 10MB of data stored in a FILESTREAM column (and hence have a 10MB FILESTREAM file), then updating even a single byte of it will result in a whole new 10MB FILESTREAM file. Anything that relies on having an up-to-date version of the database (e.g. log backups, log-shipping, replication) will pick up the entire new 10MB FILESTREAM file. Every time an update is made to that data, a new 10MB FILESTREAM file is created and then subsequently backed-up, replicated, etc. This can lead to unexpectedly large log backups, or network traffic between replication nodes.

Once you realize that new versions of the FILESTREAM files are going to be created, the obvious follow-on question is: when do the old versions get removed? The answer is: it depends! Smile

The old versions are removed by a process called garbage collection - in much the same way that memory garbage collection runs for managed code and deallocates object instantiations that are no longer referenced by any variables. The key point is that nothing needs the object instantiation any more; otherwise the memory garbage collection would be corrupting the run-time memory of the managed code application. The same principle applies for FILESTREAM garbage collection - the old versions of the FILESTREAM files cannot be removed until they are no longer needed.

But what does 'no longer needed' mean for FILESTREAM files? Well, it's kind of the same as for transaction log records. An old version of a FILESTREAM file is no longer needed if the transaction that created it has committed or rolled back, AND there are no other technologies that must read it, like a log backup (when running in the FULL or BULK_LOGGED recovery models), or the transactional replication log reader. In fact, the transaction log VLF containing the log record of the creation of the FILESTREAM data file must be switched to inactive before the FILESTREAM file can be garbage collected. Note that I don't mention database mirroring - in SQL 2008 database mirroring and FILESTREAM cannot be used together.

Once the old FILESTREAM file is no longer needed, it is available for garbage collection. How does the garbage collection process know which FILESTREAM files to physically delete? The answer is that when the file is no longer needed, an entry is made in a special table called a 'tombstone' table. The garbage collection process scans the tombstone tables and removes only the FILESTREAM files with an entry in the tombstone table. You can read more about the tombstone tables in this blog post from the CSS blog.

So when does the garbage collection process actually run? It can't be part of log backups, because in the SIMPLE recovery model, you can't take log backups. The answer is that it runs as part of the database checkpoint process. This is what causes some confusion - an old FILESTREAM file will not be removed until after it is no longer needed AND a checkpoint runs.

Now let's see this stuff in action. I'm going to create a database with FILESTREAM data in and then play around with transactions, log backups, and checkpoints to show you garbage collection working.

CREATE DATABASE FileStreamTestDB ON PRIMARY
    (NAME = FileStreamTestDB_data,
    FILENAME = N'C:\Metro Demos\FileStreamTestDB\FSTestDB_data.mdf'),
FILEGROUP FileStreamFileGroup CONTAINS FILESTREAM
    (NAME = FileStreamTestDBDocuments,
    FILENAME = N'C:\Metro Demos\FileStreamTestDB\Documents')
LOG ON
    (NAME = 'FileStreamTestDB_log',
    FILENAME = N'C:\Metro Demos\FileStreamTestDB\FSTestDB_log.ldf');
GO

USE FileStreamTestDB;
GO

CREATE TABLE FileStreamTest1 (
    DocId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
    DocName VARCHAR (25),
    Document VARBINARY(MAX) FILESTREAM);
GO

Now I'm going to put the database into the FULL recovery model and take a full database backup - which means I must now take log backups to manage the size of the transaction log. It also means that a FILESTREAM file cannot be removed until it has been backed up.

ALTER DATABASE FileStreamTestDB SET RECOVERY FULL;
GO
BACKUP DATABASE FileStreamTestDB TO DISK = 'C:\SQLskills\FSTDB.bak';
GO

Now I'm going to create some FILESTREAM data.

INSERT INTO FileStreamTest1 VALUES (
    NEWID (), 'Paul Randal',
    CAST ('SQLskills.com' AS VARBINARY(MAX)));
GO

Looking in the FILESTREAM data container I created, I have the following file:

 

Remember from the previous blog posts, that the FILESTREAM file filenames are the database log sequence number at the time they were created. Now I'll update the value in an implicit transaction (no BEGIN TRAN and COMMIT TRAN).

UPDATE FileStreamTest1
    SET Document = CAST (REPLICATE ('Paul', 2000) AS VARBINARY(MAX))
    WHERE DocName LIKE '%Randal%';
GO

and we now have the following files:

 

The new file is the 8KB file and the old FILESTREAM value is the 1KB file. If I try doing an explicit CHECKPOINT, nothing changes as the old file is still required as it hasn't yet been backed up. Now I'll do a log backup.

BACKUP LOG FileStreamTestDB TO DISK = 'C:\SQLskills\FSTB_log.bak';
GO

And the files are all still there. Although the first 1KB file is no longer needed, a checkpoint hasn't occurred yet, so garbage collection hasn't run. Now running an explicit CHECKPOINT, the directory still contains the two files. What happened? The transaction log VLF containing the log record for the creation of the FILESTREAM file is still active, so the file is still needed. I have to do *another* log backup and checkpoint before garbage collection kicks in (as that will cause the log to cycle, when there's nothing happening in the database and no active transactions) and the directory view changes to:

 

The alternative would have been to generate more log records, spilling into the next transaction log VLF, then do another log backup which would mark the 'creation' VLF inactive, and then the next checkpoint would run garbage collection on the file. This, of course, would be the normal course of events in a production database.

So, don't get confused if you update a FILESTREAM file, then do a log backup and checkpoint and nothing happens. Remember the transaction log has to have progressed enough for the 'creation' VLF to be inactive too. You can prove this to yourself by creating an explicit transaction at the same time as the FILESTREAM update (in another, implicit transaction). No matter how many times you backup the log and checkpoint the database, the garbage collection will not run until the explicit transaction is committed or rolled back, and then another log backup and checkpoint is run.

I'll leave it as a fun exercise for you to play around with updates in explicit transactions and various backup scenarios to see when garbage collection can and cannot remove old files, but now you know exactly how it works.

Last week's survey was on how you should store large-value character data in SQL 2005+ (see here for the survey). Here are the result as of 4/3/2009 - and I think my favorite answer is starting to catch-on:

My favorite answer is, of course, it depends! For all those who didn't answer 'it depends', your answer is valid, but only for particular circumstances, as each method has its pros and cons and won't be applicable in all cases. It's extremely important when designing a schema to consider how to store LOB data, as making the wrong choice can lead to nasty performance issues (where 'performance' is a catch-all to include things like slow queries, fragmentation, and wasted space). Now I'd like to run through each of the options and detail what I think of as the pros and cons. A couple of definitions first: 'in-row' means the column value is stored in the data or index record with the other columns; 'out-of-row' or 'off-row' means the column value is stored in a text page somewhere in the data file(s), with a physical pointer stored in the data/index record (taking either 16 or 24 bytes itself).

  • As a N/CHAR column. This is a great choice when the data that's stored in the column is a fixed size all the time, and always uses the full width of the column. Any time that the data may be smaller than the defined wdith of the column, space is being wasted in the row. Wasted space leads to fewer rows per page, more disk space being used to store the data, more I/Os to read the data, and more memory used in the buffer pool. However, if the character values are very volatile, and can change size, then having a fixed-width column can avoid the problem of a row having to expand and there not being enough space on the page to allow that - leading to a fragmentation-causing page split in an index (or forwarding record in a heap). There's a tipping point that can be hard to identify for your particular application...
  • As a N/VARCHAR (1-8000) column. For data values less than 8000 bytes, this is the common choice as it avoids wasted space. However, if the application can change the size of the data after the initial creation of the row, then there is the possibility of fragmentation occuring through page-splits. In SQL Server 2005+, a row can also be created that is more than 8060 bytes - one or more variable-length columns is pushed into off-row storage and replaced by a physical pointer. This means any access of the column has to do an extra I/O to reach the data - and this is commonly a physical I/O as the text page is not already in memory. This can lead to hard-to-diagnose performance issues if a query selects the column and some rows have the data in-row, and some out-of-row. Also, if the data values tend towards the larger end of the 1-8000 byte spectrum, individual rows can become vary large, leading to very few rows per page - and the problems described in the first option. If the data isn't used very much, then storing it in-row like this isn't very efficient.
  • As a N/VARCHAR (MAX) column in-row. This has the same pros and cons as the option above, with the added benefit that the value can grow larger than 8000 bytes. In that case it will be pushed off-row automatically, and start incurring the extra I/O for each access. These data types also work with the intrinsic functions in the same way as the character data types discussed above. I guess one drawback of this type compared to FILESTREAM is that it's limited to 2GB. Also, if there's a LOB data column in the table definition, the table's clustered index cannot have online operations peformed on it - even if all the LOB values are NULL or stored in-row!
  • As a N/VARCHAR (MAX) column out-of-row. The drawback of storing this data out-of-row is that accessing it requires an extra I/O to retrieve it, but if the data isn't used very much then this is an efficient way to go, but still uses space in-row to store the off-row pointer. An additional benefit of storing the data off-row is that it can be placed in a separate filegroup, possibly on less expensive storage (e.g. RAID 5) - but then there's the drawback that it can't be moved after being created except with an export/import operation. This option has the same online operations drawback as storing the data in-row.
  • As a N/TEXT column in-row. This has the same pros and cons as the N/VARCHAR (MAX) column in-row option, but these data types are deprecated and don't work with the majority of the intrinsic functions.
  • As a N/TEXT column out-of-row. Same as above.
  • In a seperate table and JOIN to it when required. This option is great when the data isn't used very much, as it doesn't require any storage at all in the main table (except for a value to use for the JOIN), but it does require some extra up-front design and slightly more complicated queries. There's another HUGE benefit to doing this - by moving the LOB data to another table, online operations become available on the main table's clustered index. (This concept is 'vertical partitioning' a huge topic in itself...)
  • As a FILESTREAM column. (Yes, I didn't have this in the survey, but it's a possibility). If your data values are going to be more than 1MB, then you may want to consider using the FILESTREAM data type in SQL 2008 to allow much faster access to the data than having to read it through the buffer pool before giving it to the client. There are lots of pros and cons to using FILESTREAM - see my whitepaper for more info here.

So, as you can see, the best answer for a general question like this is definitely It Depends!. Although I haven't covered every facet of each storage option, the aim of this post is to show that it is very important to consider the implications of the method you choose, as it could lead to performance problems down the line.

Next post - this week's survey!

The May 2009 TechNet Magazine is now available online, and it's the annual security issue. In there is an article I wrote highlighting 10 common security issues (and solutions) you should worry about if you're not a security-savvy DBA. It covers:

  • Physical security
  • Network security
  • Attach surface minimzation
  • Service accounts
  • Restricting use of administrator privileges
  • Authentication
  • Authorization
  • SQL injection
  • Disaster recovery
  • Auditing

There are also two screencasts of me demonstrating Transparent Data Encryption and SQL Server Audit, both in SQL Server 2008. 

To quote myself from the end of the article:

As far as takeaways from this article are concerned, I want you to realize that there are some steps you need to go through to ensure the data you are storing in SQL Server is as secure as you need it to be. This is especially important when you inherit a SQL Server instance that someone else has been managing. It's just like buying a house from someone—you need to ask if the alarm works, if the yard is fenced in, and who has copies of the keys. Running through the list I've given in this article is a good start, but make sure you dig deeper in areas that are relevant to you.

Check it out at http://technet.microsoft.com/en-us/magazine/2009.05.sql.aspx.

Enjoy!

Last week's survey was on what method you use to run consistency checks (see here for the survey). Here are the results as of 3/27/09 - again, very encouraging:

As you can see, 70% of respondents run DBCC CHECKDB on the production server, either with PHYSICAL_ONLY or without. For those running without it, be aware that using PHYSICAL_ONLY turns DBCC CHECKDB from a CPU-bound process into an I/O-bound process, and makes it run (potentially) magnitudes faster. It will still evaluate page checksums and torn-page protection, just skipping the higher-level logical checks.

I'm surprised to see so many people using a completely separate system to run consistency checks (restoring a backup and running a CHECKDB on the restored backup) - I've been a proponent of this method for a few years now, but I didn't think it had caught on so much (assuming a reasonably representative sample of readers responded to the survey). This method allows the entire consistency checking workload to be offloaded, and completely validates the backups used - but has the downside of requiring extra disk space on another server to restore the backup (I wish they'd build the system I got a patent for - to consistency check the database inside a backup without restoring it - see here).

Three of these choices I threw in to see if anyone was doing them so I could explain why they're not good methods to use.

  • "Don't run any consistency checks at all". I'm sure I don't need to labor the point with this one - you need to run consistency checks as well as having some kind of page protection turned on. If you don't proactively check for corruption, when it does occur it will likely be more widespread, and harder to recover within your data-loss and recovery time objectives than if you'd discovered it earlier. I've written lots about this in the Corruption and CHECKDB From Every Angle categories.
  • "Run DBCC CHECKDB on a database snapshot on a mirror database". I was asked about this several times while here at SQL Connections too. Database mirroring works by shipping transaction log records between the principal and mirror databases, NOT by shipping database pages - so if a page gets corrupted on the principal, the corruption will not be transferred to the mirror. This means that DBCC CHECKDB on a snapshot of the mirror does not reflect the consistency state of the principal database at all. Saying that, it is however, possible for corruption in the principal to affect the mirror. Imagine a column value is corrupted on disk by the principal's I/O subsystem. If that value is then read and used to calculate another value, which is then persisted - that calculated value is also 'corrupt' and will be reflected in the mirror database. Kind of an insidious, second-order corruption effect.
  • "Use BACKUP WITH CHECKSUM to validate page checksums, no DBCCs". Another option that sounds feasible, but in fact isn't for a couple of reasons. Firstly, BACKUP WITH CHECKSUM will stop when it finds a bad checksum, whereas DBCC CHECKDB will continue reading the rest of the database and tell you everything that's wrong with it. Secondly, BACKUP WITH CHECKSUM will only check those pages that HAVE page checkums, whereas DBCC CHECKDB will consistency check everything, regardless of whether a page has a checksum or not. For a database upgraded from an earlier version, where not every page will have a page checksum since you enabled them after upgrading, this is a critical point.

To summarize, this week's results were great, with almost 90% of respondents running some kind of consistency check. If you're not running any, for whatever reason, rethink your decision - there's always a way to run some kind of consistency checks and give yourself more peace of mind. See CHECKDB From Every Angle: Consistency Checking Options for a VLDB for more info.

Next post - this week's survey!

While I was teaching the MCM-Database class last week, we were discussing fragmentation and the effect of a high-order GUID key on an index. Without going into too many details, having a random GUID - as generated from the NEWID() - function is bad, but having one generated by NEWSEQUENTIALID() isn't anyway near so bad (I'll discuss the details more in the fragmentation series I'm starting). As part of the demo, we wanted to change the column default for the leading key of a table from NEWID() to NEWSEQUENTIALID() - problem was that none of us could remember the exact syntax, so we worked it out together. I thought it would make an interesting post, so here it is.

First off, here's my table with a poor clustered index key:

CREATE TABLE BadKeyTable (
    c1 UNIQUEIDENTIFIER DEFAULT NEWID () ROWGUIDCOL,
    c2 SMALLDATETIME DEFAULT GETDATE (),
    c3 CHAR (400) DEFAULT 'a',
    c4 VARCHAR(MAX) DEFAULT 'b');
GO
CREATE CLUSTERED INDEX BadKeyTable_CL ON BadKeyTable (c1);
GO

INSERT INTO BadKeyTable DEFAULT VALUES;
GO

(And you'll notice that I've given up doing nice colors in the T-SQL I post - it's too time consuming). The default we're interested in is in bold above. To change the default, we first need to find the constraint name so we can drop it. There are two queries you can use:

SELECT [name] FROM sys.objects
WHERE [parent_object_id] = OBJECT_ID ('BadKeyTable');
GO

or

SELECT [name] FROM sys.default_constraints
WHERE [parent_object_id] = OBJECT_ID ('BadKeyTable');
GO

The second is obviously the more supported way, as the first will return the names of all sub-objects of this table - all constraints, and all internal tables, such as XML indexes. The second query returns:

name
-------------------------------
DF__BadKeyTable__c1__7C8480AE
DF__BadKeyTable__c2__7D78A4E7
DF__BadKeyTable__c3__7E6CC920
DF__BadKeyTable__c4__7F60ED59

The constraint we're interested in is the one for the first column - DF__BadKeyTable__c1__7C8480AE. Now we need to drop the constraint and then add the new one as there's no way to simply alter the constraint in-place. We do that using:

ALTER TABLE BadKeyTable DROP CONSTRAINT DF__BadKeyTable__c1__7C8480AE;
GO

ALTER TABLE BadKeyTable ADD CONSTRAINT DF__BadKeyTable__c1
DEFAULT NEWSEQUENTIALID() FOR c1;
GO

And we're done.

A thread cropped up on SQLServerCentral involving IAM chain corruption (see Inside the Storage Engine: IAM pages, IAM chains, and allocation units for details of IAM chains). The error from DBCC CHECKDB was:

Server: Msg 2576, Level 16, State 1, Line 1
IAM page (0:0) is pointed to by the previous pointer of IAM page (1:394336) object ID 229575856 index ID 4 but was not detected in the scan.

and there was some discussion of what the error meant, and why the initial page ID in the error of (0:0) means something special. There was a further question of how the errors would differ if the IAM page header was partially zero'd out by an I/O subsystem error.

We're on-stage here at SQL Connections doing a pre-con and I'm not on until this afternoon so I can bang out a quick blog post! I'm going to create a small database and show the difference between the two cases. The error above was from a SQL 2000 database, but the behavior is the same on SQL 2005 and 2008. Here's the script to create the database.

CREATE DATABASE CorruptIAMEXample;
GO
USE CorruptIAMExample;
GO

CREATE TABLE test (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
GO
CREATE CLUSTERED INDEX test_cl on test (c1);
GO

SET NOCOUNT ON;
GO
INSERT INTO test DEFAULT VALUES;
GO 1000

After corrupting the database, I can reproduce the error above by running DBCC CHECKDB on it:

Msg 2576, Level 16, State 1, Line 1
The Index Allocation Map (IAM) page (0:0) is pointed to by the previous pointer of IAM page (1:153) in object ID 0, index ID -1, partition ID 0, alloc unit ID 72057594042384384 (type Unknown), but it was not detected in the scan.

The error is saying that the first IAM page in the IAM chain (page 1:153) does not have a reference from metadata. The sysallocunits system table contains a link to the first IAM page, the root page, and the first page. You can see these by querying the sys.system_internals_allocation_units catalog view, or you can see this blog post  - Inside The Storage Engine: sp_AllocationMetadata - putting undocumented system catalog views to work. I corrupted the sysallocunits table so that the link to the first IAM page of the test table was removed. The clue is the first page ID in the error - if it's a (0:0), that's the missing metadata case. For SQL 2000, this can happen if someone manually updates the sysindexes table.

Now what about the other case, where the IAM page itself has a corrupt header? I recreated the database again and corrupted the header of the first IAM page of the test table. Here's the output from DBCC CHECKDB:

Msg 2575, Level 16, State 1, Line 1
The Index Allocation Map (IAM) page (1:153) is pointed to by the next pointer of IAM page (0:0) in object ID 2073058421, index ID 1, partition ID 72057594038386688, alloc unit ID 72057594042384384 (type In-row data), but it was not detected in the scan.
Msg 7965, Level 16, State 2, Line 1
Table error: Could not check object ID 2073058421, index ID 1, partition ID 72057594038386688, alloc unit ID 72057594042384384 (type In-row data) due to invalid allocation (IAM) page(s).
Msg 8928, Level 16, State 6, Line 1
Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown): Page (1:153) could not be processed. See other errors for details.

and a whole bunch of

Msg 8905, Level 16, State 1, Line 1
Extent (1:216) in database ID 21 is marked allocated in the GAM, but no SGAM or IAM has allocated it.

errors. This is because the IAM page no longer looks like an IAM page and so DBCC CHECKDB can't process it as such. If I zero out the IAM page completely, DBCC CHECKDB returns basically the same errors as above. 

I've created a zipped backup of the SQL 2005 database in each case:

You'll need to do a RESTORE FILELISTONLY and then maybe move the files when you restore. Have fun!

Last week’s survey was on what kind of regular index maintenance you perform (see here for the survey) as a way of kicking off a new series I’m writing around index maintenance. Here are the results as of 3/21/09 – I find them very encouraging:

As you can see, about 2/5 of respondents are performing some form of analysis-based fragmentation removal (answers 5+6), which I consider the best way to perform index maintenance, if you’re willing to invest the time involved to set it up. It allows the least amount of work to be performed, for the most targeted performance gains – and so is especially appropriate for 24x7 systems where there’s a minimal or non-existent maintenance window.

The next best option is to do all rebuilds or all defrags based on a fragmentation threshold (answers 3+4), which about 1/5 of respondents do. This also allows work to be limited, but by choosing only a single method of removing fragmentation, there are pros and cons. Now, the survey was limited to a number of questions so I couldn’t explore what the threshold is that people are using (e.g. logical fragmentation, page density, extent fragmentation, or something else). Some measures are good to use and some not so good, and I’ll be exploring the various counters and ways of determining fragmentation as the series progresses.

Either of the options to operate on all indexes regardless of fragmentation (options 1+2) can lead to lots of wasted resources (disk space, transaction log space, I/Os, CPU) by operating on indexes that are not fragmented in the first place, or for which fragmentation removal has no benefit for workload performance. About 1/3 of respondents do this. This isn’t surprising to me as rebuild-all-the-indexes-every-night/week is a very common index maintenance plan for “involuntary DBAs” who know that index maintenance is important, but don’t have the knowledge or training to implement a more sophisticated maintenance plan. This growing size of this group of people is one of the main reasons I’m going to write this series.

Doing absolutely nothing for index maintenance, which about 1/10 do, is usually not a good idea, as indexes in a database that’s not read-only commonly become fragmented over time. However, these people may know they don’t suffer from fragmentation issues, or that removing fragmentation has no effect on workload performance. However, my suspicion (based on what I see in the field) is that some of these respondents don’t realize the benefits of performing index maintenance.

You may be surprised to hear that I don’t consider doing nothing to be the worst choice. That dubious honor goes to answer 7 – doing any kind of index maintenance followed by a database shrink operation – as 3 respondents do. A post-maintenance shrink operation may well undo some of the benefits of the maintenance by introducing massive amounts of index fragmentation – see my blog post Auto-shrink - turn it OFF! for details of how bad this can get.

I’m actually pretty encouraged by these results though. Compared to how things were around 1999 when I wrote DBCC INDEXDEFRAG and DBCC SHOWCONTIG for SQL Server 2000, these results show that knowledge in the field (or at least in the group that reads my blog and responded) has vastly improved. My aim for the forthcoming series about index fragmentation and maintenance is to increase knowledge a lot more broadly.

Next up - this week's survey. Thanks for reading!

Way back at the start of me blogging here I wrote a comprehensive description of ghost records and the ghost cleanup process - see Inside the Storage Engine: Ghost cleanup in depth. A question came up in the class I'm teaching this week that's worth answering in a blog post - do ghost records occur in heaps? The answer is no, not during normal processing.

When snapshot isolation is enabled, deletes from a heap are ghosted, as part of the overall versioning process, which can lead to some interesting side-effects. A versioned record has an extra 14-bytes tagged on the end, so a heap record that suddenly becomes versioned is 14-bytes longer - which may mean it doesn't fit on the page any longer. This could lead to it being moved, resulting in a forwarding/forwarded record pair - just because the record was deleted! Now, the page has to be full for this time happen, and the Storage Engine will take steps to avoid this happening for rows less than 32 bytes long - but that's getting a little too deep.

Anyway, I digress. I want to show you the difference between deleting from a clustered index and from a heap. I'm going to create two such tables, then delete row from each and roll it back.

CREATE TABLE t1 (c1 CHAR (10));
CREATE CLUSTERED INDEX t1c1 on t1 (c1);
GO

CREATE TABLE t2 (c1 CHAR (10));
GO

INSERT INTO t1 VALUES ('PAUL');
INSERT INTO t1 VALUES ('KIMBERLY');

INSERT INTO t2 VALUES ('PAUL');
INSERT INTO t2 VALUES ('KIMBERLY');
GO

-- prevent random background transactions
ALTER DATABASE GhostTest SET AUTO_CREATE_STATISTICS OFF;
GO

BEGIN TRAN DelFromClust;
DELETE FROM t1 WHERE c1='KIMBERLY';
ROLLBACK TRAN;
GO

BEGIN TRAN DelFromHeap;
DELETE FROM t2 WHERE c1='KIMBERLY';
ROLLBACK TRAN;
GO

SELECT * FROM ::fn_dblog (null, null);
GO

Here's a portion of the results from looking in the transaction log. The line of code where I turn off auto-update stats is just to prevent the auto-create transactions from cluttering up my view of the transaction log.

The first (highlighted) transaction is for the delete/rollback in the clustered index. You can clearly see that the third column shows a log context of ghosting for the LOP_DELETE_ROWS log record, plus the setting of the 'this page has at least one ghost record' in the PFS byte for that page.

The second (unhighlighted) transaction is for the delete/rollback in the heap. Here you can see that it just does a straight delete.

If you look at the data page contents before the rollback in both cases, for the clustered index you'll still be able to see the deleted (ghosted) record, and for the heap you'll see the deleted record really is deleted.

Hope this helps.

I'm teaching the Microsoft Certified Masters - Database qualification this week here in Redmond, and in part of day one I discuss the FILESTREAM directory structure. I was asked the question where do the directory name GUIDs come from? so I started digging around in the system tables while Kimberly was lecturing. Take a look at my previous blog post (FILESTREAM directory structure) from last week to see the database schema I'm working with. I recreated it again and wrote some queries to find where the GUIDs are stored, as they have to be stored in the database somewhere.

Here's the query to find all the FILESTREAM directory names, for both levels of directory (and you have to run this through the DAC as it's accessing undocumented, hidden system tables):

SELECT o.name AS [Table],
    cp.name AS [Column],
    p.partition_number AS [Partition],
    r.rsguid AS [Rowset GUID],
    rs.colguid AS [Column GUID]
FROM sys.sysrowsets r
    CROSS APPLY sys.sysrscols rs
    JOIN sys.partitions p ON rs.rsid = p.partition_id
    JOIN sys.objects o ON o.object_id = p.object_id
    JOIN sys.syscolpars cp ON cp.colid = rs.rscolid
WHERE rs.colguid IS NOT NULL AND o.object_id = cp.id
AND r.rsguid IS NOT NULL AND r.rowsetid = rs.rsid;
GO

See below for a screenshot of this using my scenario.

You can see that I'm connected through the admin connection in SSMS and that the top-level directory name is derived from the rowset GUID, with the column-level directory name is derived from the column GUID. Note that some parts are byte-reversed, but they're definitely the right GUIDs.

Enjoy!

Last week I kicked off the first weekly survey - on whether you validate your backups or not (see here for the survey). The results are very interesting (as of 3/13/09):

As you can see, almost 25% of respondents never validate their backups! And a further 25% only validate them occasionally, with 30% doing some kind of regular checks, and only a handful checking all the time.

While these results may seem shocking to you, based on what I've heard when teaching, they're pretty normal. There are lots of reasons why DBAs may choose not to validate backups as often as they should, including:

  • Not enough time to restore the backups to check them
  • Not enough disk space
  • Not part of the day-to-day operations guide
  • Don't see why it's important

Kimberly and I have a saying (well, to be fair, Kimberly coined it): you don't have a backup until you've restored it. You don't know whether the backup you just took was corrupt or not and will actually work in a disaster recovery situation.

Can you ever get a guarantee? No. Here's an analogy, taken from a very old post of mine. Consider Paul, who works for the Seattle Police Department in traffic control. Paul's in a control room somewhere in the city with a large bank of monitors connected to various traffic cameras. Paul's job is to cycle through the cameras every 1/2 hour, looking for traffic accidents. At the end of the 1/2 hour cycle, if Paul han;t seen any accidents then he knows that there are no accidents in the city.

Ah - but hold on. Does Paul really know that? No. All Paul knows is that at the point he looked at a particular camera, there was not an accident at that spot in the city. The very instant he switches to another camera feed, an accident could happen at a spot covered by the previous camera.

The same is true for validating backups. As soon as you've validated a backup, it could then be corrupted by the I/O subsystem, but at least you know that it was valid at some point. But what if that happens, I hear you ask? Well, then you need to have multiple copies of your backups, and you should not rely on backups as the only method of disaster recovery. A good high-availability solution includes as many technologies as you need to mitigate all risks - and backups are just one of those technologies. You're going to have to have some kind of redundant system too that you can fall back on (or mayb even immediately failover to, depending on your particular disaster recovery plan). But, saying that, you can't rely on the redundant server either - if it goes wrong, you'll need your backups.

So - whichever way you look at it, validating backups is a really good practice to get into so you don't get bitten when it comes to the crunch. When I teach, I've got many stories of customers losing data, business, time, and money (and DBAs losing their jobs) because the backups didn't work or were destroyed along with the data. Here's one for you (simplified, and no I won't divulge names etc). Major US investment firm decides to provision new hardware, so takes a backup of the database storing all the 401k accounts for all their customers (private and corporate), flattens the hardware, and goes to restore the backup. The backup is corrupt - on SQL 2000, where there's no RESTORE ... WITH CONTINUE_AFTER_ERROR. What happened? Well, the SQL team and Product Support had to get involved to help get the data back, but people in the firm lost their jobs and it cost a lot of time and money to recover the data. If only they'd had multiple copies of the backup, and tested their backup before removing the database (or better yet, restored the database on the new hardware before flattening the old hardware). They learned a costly lesson, but they did change their practices after that.

Unfortunately this is so often the way - people don't realize they need to validate backups or have an HA plan UNTIL they have a disaster. Then suddenly its the top priority at the company. Being proactive can save a lot of grief, and make you look good when disaster strikes.

Backups can be unusable for a number of reasons, including:

  • The full backup is corrupt, because something in the I/O subsystem corrupted it.
  • A backup in the log backup chain is corrupt, meaning restore cannot continue past that point in the chain.
  • All backups following a full backup are written to the same backup set, but the WITH INIT clause is used accidentally on all backups, meaning the only backup present in the backup set is the last one taken.
  • An out-of-band backup was taken without using the WITH COPY_ONLY clause and the log backup chain was broken (see BACKUP WITH COPY_ONLY - how to avoid breaking the backup chain).
  • The backups worked but the database contained corruption before it was backed up (kind of a separate issue).

The only ones that are out of your control are the first two, but they can be mitigated by having multiples copies of backups. All of these though, can be avoided at disaster recovery time by reguarly restoring your backups as a test of what you'd do if there was a real disaster. You might be surprised what you'd find out...

This is more of an editorial style post than a deep technical or example script post - I'm going to start doing more of these around the weekly surveys. Next post - this week's survey.

Thanks!

After writing the FILESTREAM whitepaper for Microsoft, I've had lots of questions about the structure of the FILESTREAM data container. The FILESTREAM data container is the technical term for the NTFS directory structure where all the FILESTREAM data is stored.

When you want to use FILESTREAM data, you first add a filegroup (during or after database creation):

ALTER DATABASE FileStreamTestDB ADD FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM;
GO

And then add a 'file' to the filegroup:
ALTER DATABASE FileStreamTestDB ADD FILE (
     NAME = FSGroup1File, FILENAME = 'C:\Metro Labs\FileStreamTestDB\Documents')
TO FILEGROUP FileStreamGroup1;
GO

The 'file' is actually the pathname to what will become the root directory of the FILESTREAM data container. When it's initially created, it will contain a single file, filestream.hdr, and a single directory $FSLOG. Filestream.hdr is a metadata file describing the data container and the $FSLOG directory is the FILESTREAM equivalent of the database transaction log. You can think of them as equivalent, although the FILESTREAM log has some interesting semantics, which I'll cover in a separate post.

The question I most often get is: are all the FILESTREAM files for a database stored in one gigantic directory? The answer is no.

The root directory of the data container contains one sub-directory for each table (or each partition of a partitioned table). Each of those directories contains a further sub-directory for each FILESTREAM column defined in the table. An example is below, with the screen shot taken after running the following code:

CREATE TABLE FileStreamTest1 (
    DocId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
    DocName VARCHAR (25),
    Document VARBINARY(MAX) FILESTREAM);
GO

CREATE TABLE FileStreamTest2 (
    DocId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
    DocName VARCHAR (25),
    Document1 VARBINARY(MAX) FILESTREAM,
    Document2 VARBINARY(MAX) FILESTREAM);
GO

INSERT INTO FileStreamTest1 VALUES (NEWID (), 'Paul Randal', CAST ('SQLskills.com' AS VARBINARY(MAX)));
INSERT INTO FileStreamTest1 VALUES (NEWID (), 'Kimberly Tripp', CAST ('SQLskills.com' AS VARBINARY(MAX)));
GO

This image shows the FILESTREAM data container for our database that has two tables with FILESTREAM columns, each with a single partition. The first table has a two FILESTREAM columns and the second has a single FILESTREAM column. The filenames of all these directories are GUIDs. In the example, you can see two FILESTREAM files in a column-level directory. The FILESTREAM file names are actually the log-sequence number from the database transaction log at the time the files were created. You can correlate these by looking at the data with DBCC PAGE, but first finding the allocated pages using sp_AllocationMetadata (see this blog post):

EXEC sp_AllocationMetadata FileStreamTest1;
GO

Object Name     Index ID Alloc Unit ID     Alloc Unit Type   First Page Root Page First IAM Page
--------------- -------- ----------------- ---------------   ---------- --------- --------------
FileStreamTest1 0        72057594039697408 IN_ROW_DATA       (1:169)    (0:0)     (1:170)
FileStreamTest1 0        72057594039762944 ROW_OVERFLOW_DATA (0:0)      (0:0)     (0:0)
FileStreamTest1 2        72057594039828480 IN_ROW_DATA       (1:171)    (1:171)   (1:172)

(3 row(s) affected)

Notice there's a nonclustered index as well as the heap - that's the index that's enforcing the uniqueness constraint on the UNIQUEIDENTIFIER column. Now we can use DBCC PAGE to look at the first page of the heap, which will have out data records in:

DBCC TRACEON (3604);
DBCC PAGE (FileStreamTestDB, 1, 169, 3);
GO

<snip>

Slot 0 Offset 0x60 Length 88

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 88
Memory Dump @0x514EC060

00000000:   30001400 140d5047 2ca9f24f 874d35ca †0.....PG,©òO‡M5Ê
00000010:   e9e77649 03000002 00280058 80506175 †éçvI.....(.X.Pau
00000020:   6c205261 6e64616c 03000000 00000080 †l Randal........
00000030:   140d5047 2ca9f24f 874d35ca e9e77649 †..PG,©òO‡M5ÊéçvI
00000040:   01000000 68020000 00000000 17000000 †....h...........
00000050:   79000000 0c000000 †††††††††††††††††††y.......

Slot 0 Column 1 Offset 0x4 Length 16 Length (physical) 16

DocId = 47500d14-a92c-4ff2-874d-35cae9e77649

Slot 0 Column 2 Offset 0x1d Length 11 Length (physical) 11

DocName = Paul Randal

Document = [Filestream column] Slot 0 Column 3 Offset 0x28 Length 48

ColType = 3                          FileId = -2147483648                 UpdateSeq = 1
CreateLSN = 00000017:00000079:000c (23:121:12)                            TxFMiniVer = 0
XdesId = (0:616)

<snip>

You can see that the CreateLSN I've highlighted above matches the filename of the first FILESTREAM file in the example image.

Hopefully this explains how the FILESTREAM files are stored - more on this in the next post where I'll show how updates and garbage collection are implemented.

 

Microsoft SQL Server 2008 Internals finally starts to roll off the presses at Microsoft Press today! For me this is the first time in print (in book form at least) and my consistency checking chapter (about 1/10th of the book) represents a complete brain-dump of everything I know about how DBCC CHECKDB works.

Kimberly also wrote a great chapter on index internals with some nice examples of how and why they're linked together the way they are. And of course the rest of the book is excellent too, mostly by Kalen but with additional guest chapters on the query optimizer by one of its architects, Conor Cunningham, and on tracing and auditing by fellow-MVP Adam Machanic - all good friends of ours.

Check it out on Amazon.com here.

One set of features i haven't blogged about yet in SQL Server 2008 are the new security features: SQL Server Audit, Transparent Data Encryption, and Extensible Key Management. I've just finished writing a security article for the May 2009 TechNet Magazine (the annual security issue) and while trolling around TechNet I found that the security team has just published a comprehensive whitepaper on SQL Server Audit.

The whitepaper covers:

  • Functional overview of the feature
  • Technical overview of using the feature, including how to define the different levels of audit specification (database and server)
  • Audit files and the event log
  • Performance considerations (it's designed to be much faster than other auditing features are is uses the high-peformance extended events feature under the covers)
  • Examples of using it, both through T-SQL and SSMS

Check it out at http://msdn.microsoft.com/en-us/library/dd392015.aspx.

While trawling through the latest 2008 Books Online this morning to answer a question, I noticed a new section that I hadn't seen before, which explains in detail how to perform a rolling upgrade with database mirroring. The link to the MSDN page is http://msdn.microsoft.com/en-us/library/bb677181.aspx and below I've linked to the flowchart from that page.  

Books Online also has a lot of other "how-to" topics around database mirroring - here are some links:

Kevin Cox of the SQLCAT team also just blogged about a customer upgrade from 2005 to 2008 and some of the issues they faced. Btw - if you're not subscribed to their blog, you definitely should be - lots of cool stuff.

And now the bug. In 2008 RTM, if your database contains full-text then mirroring will not work when you perform a rolling upgrade. This is explained in KB 956017 (with a trace-flag workaround) and KB 957816, which points at 2008 RTM CU1 that has the fix in.

In the TechNet Magazine article I wrote on Advanced Troubleshooting with Extended Events, I mentioned the always-on event session called system_health. Jonathan Kehayias, a fellow MVP and blogging mad-man, posted a great article with SQL Server Central today about how to get historical deadlock graph info from it. His article explains some of the pre-reqs for doing this (like installing 2008 RTM CU1 - see KB 956717 - to get it working and a bug with the XML output) and gives some code.

After wrestling with the download of CU1 (it took 5 tries to get it to download!), I got it installed in one of my VPCs and gave Jon's code a whirl. I forced a deadlock by creating two tables, locking them from separate transactions and then trying to select from the other table. See the code below:

-- Connection 1:
CREATE TABLE t1 (c1 INT);
INSERT INTO t1 VALUES (1);
GO

BEGIN TRAN
UPDATE t1 SET c1 = 2;
GO

-- Connection 2:
CREATE TABLE t2 (c1 INT);
INSERT INTO t2 VALUES (1);
GO

BEGIN TRAN
UPDATE t2 SET c1 = 2;
GO

-- Connection 1:
SELECT * FROM t2;
GO

-- Connection 2:
SELECT * FROM t1;
GO

And one of them will be killed by the deadlock monitor.

Msg 1205, Level 13, State 45, Line 1
Transaction (Process ID 57) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

Now running Jon's (slightly reformatted) code:

SELECT CAST (
    REPLACE (
        REPLACE (
            XEventData.XEvent.value ('(data/value)[1]', 'varchar(max)'),
            '<victim-list>', '<deadlock><victim-list>'),
        '<process-list>', '</victim-list><process-list>')
    AS XML) AS DeadlockGraph
FROM (SELECT CAST (target_data AS XML) AS TargetData
    FROM sys.dm_xe_session_targets st
    JOIN sys.dm_xe_sessions s ON s.address = st.event_session_address
    WHERE [name] = 'system_health') AS Data
CROSS APPLY TargetData.nodes ('//RingBufferTarget/event') AS XEventData (XEvent)
    WHERE XEventData.XEvent.value('@name', 'varchar(4000)') = 'xml_deadlock_report';

Gives the XML deadlock graph (which unfortunately can't be displayed graphically because the format differs from the Profiler/trace-flag output)

<deadlock-list>
  <deadlock>
    <victim-list>
      <victimProcess id="process53d9000" />
    </victim-list>
    <process-list>
      <process id="process53d9000" taskpriority="0" logused="256" waitresource="RID: 1:1:304:0" waittime="331" ownerId="1868" transactionname="user_transaction" lasttranstarted="2009-02-23T13:32:05.100" XDES="0x4eb8c10" lockMode="S" schedulerid="1" kpid="2216" status="suspended" spid="57" sbid="0" ecid="0" priority="0" trancount="1" lastbatchstarted="2009-02-23T13:32:33.093" lastbatchcompleted="2009-02-23T13:32:05.100" clientapp="Microsoft SQL Server Management Studio - Query" hostname="CHICAGO" hostpid="2552" loginname="CHICAGO\Administrator" isolationlevel="read committed (2)" xactid="1868" currentdb="1" lockTimeout="4294967295" clientoption1="671090784" clientoption2="390200">
        <executionStack>
          <frame procname="" line="1" sqlhandle="0x02000000c70b7b1daba27f2ddf3e772600949464d524b6f2" />
        </executionStack>
        <inputbuf>
SELECT * FROM t1;
    </inputbuf>
      </process>
      <process id="process4985558" taskpriority="0" logused="256" waitresource="RID: 1:1:294:0" waittime="10181" ownerId="1877" transactionname="user_transaction" lasttranstarted="2009-02-23T13:32:08.067" XDES="0x4eb9be0" lockMode="S" schedulerid="1" kpid="2020" status="suspended" spid="56" sbid="0" ecid="0" priority="0" trancount="1" lastbatchstarted="2009-02-23T13:32:23.240" lastbatchcompleted="2009-02-23T13:32:08.067" clientapp="Microsoft SQL Server Management Studio - Query" hostname="CHICAGO" hostpid="2552" loginname="CHICAGO\Administrator" isolationlevel="read committed (2)" xactid="1877" currentdb="1" lockTimeout="4294967295" clientoption1="671090784" clientoption2="390200">
        <executionStack>
          <frame procname="" line="1" sqlhandle="0x02000000f4d34532698b7ad324df813feb2ba5024730cb79" />
        </executionStack>
        <inputbuf>
SELECT * FROM t2;
    </inputbuf>
      </process>
    </process-list>
    <resource-list>
      <ridlock fileid="1" pageid="304" dbid="1" objectname="" id="lock47aea80" mode="X" associatedObjectId="72057594039042048">
        <owner-list>
          <owner id="process4985558" mode="X" />
        </owner-list>
        <waiter-list>
          <waiter id="process53d9000" mode="S" requestType="wait" />
        </waiter-list>
      </ridlock>
      <ridlock fileid="1" pageid="294" dbid="1" objectname="" id="lock47af200" mode="X" associatedObjectId="72057594038976512">
        <owner-list>
          <owner id="process53d9000" mode="X" />
        </owner-list>
        <waiter-list>
          <waiter id="process4985558" mode="S" requestType="wait" />
        </waiter-list>
      </ridlock>
    </resource-list>
  </deadlock>
</deadlock-list>

Cool!

Cumulative Update 1 for SQL 2008 RTM contains fixes for two nasty FILESTREAM bugs (among a lot of other bug fixes).

The first one concerns restoring a 2008 database from a series of log backups when the database contains FILESTREAM info. It's possible that a race condition can cause one of the log backups to miss backing up a FILESTREAM file - resulting in a corrupt database after a restore operation. See KB 957809 for more details.

The second bug occurs when a clustered index is rebuilt using ALTER INDEX ... REBUILD and the table contains FILESTREAM data. In this case, it's possible that all the FILESTREAM files are copied, meaning the rebuild operation can take a very long time if there's lots of large FILESTREAM data values in the table. There's no need for the FILESTREAM data to be copied during an index rebuild as the FILESTREAM locations and filenames remain the same. See KB 957823 for more details.

You can get the fixes for these in CU1 - see KB 956717 for the download (right at the top of the page) and the list of all other fixes included in the update.

Microsoft has released a PDF showing all the SQL Server 2008 system tables - very cool but you'll need a big printer to print it out - or go to Kinko's. Download it from Microsoft here.

Categories:
SQL Server 2008

The January 2009 issue of TechNet Magazine is now available on the web and has a new article I wrote on Advanced Troubleshooting with Extended Events in SQL Server 2008. The article covers:

  • An overview of troubleshooting in SQL Server, along with links to a bunch of tools like DMVStats and RML
  • An overview of extended events, with descriptions of all the parts of the feature and how to investigate them with the new DMVs
  • Performance considerations of how you setup an extended events session, especially on multi-core servers
  • An example where I build an I/O chargeback mechanism to tie into a system controlled by the SQL Server 2008 resource governor

You can get to the article here, which also has a link to a screencast of me doing a demo. And you can bet that this is an area I'll be blogging about going forward too.

Enjoy!

PS This issue also has the editorial where I get made a Contributing Editor Smile

I was perusing the latest release of the SQL Server 2008 Books Online on MSDN (look, Kimberly's in Dublin this week - what else am I supposed to do to amuse myself in the evenings? Smile) and found a cool new section on change data capture in the SSIS section. It's called Improving Incremental Loads with Change Data Capture and shows how to create an SSIS package that will pull incremental change data for a single table, and for multiple tables. If you read my article on CDC in the October TechNet Magazine (see my blog post here) and have been playing around, then this BOL topic could save you a bunch of time.

Check it out at http://msdn.microsoft.com/en-us/library/bb895315.aspx.

In the previous blog post I mentioned our partner company in Australia, so I'd better explain...

We have a new partner company - SQLskills.com.au - run by our good friend and fellow MVP Greg Linwood, along-side his other company MyDBA, which provides DBA support, consulting and staffing services to customers world-wide. By extending SQLskills.com into Australia, we can provide world-class training to the burgeoning SQL Server base in and around Australia, without customers having to travel to the US. As well as providing their own custom courses, the SQLskills.com.au team will be teaching courses developed by me, Kimberly, and Bob Beauchemin - using only the best instructors that we've personally taught and approved.

Update 12/10/08: the Australia classes have been pushed out to June 2009 for various reasons - watch the blog for more details.

The courses have already been running the last two months to great success and I'm very excited to announce that Kimberly and I are coming to Australia in February 2009 to teach four classes in the SQLskills.com.au training facility in Melbourne. The classes we have planned are:

Click the links for more details and registration info. The internals course (or equivalent knowledge) is really a pre-requisite for the perf tuning and maintenance courses that immediately follow it, as these courses will be really in-depth.

Checkout the SQLskills.com.au site for other classes they offer for developers and BI specialists.

We hope to see you in Melbourne next year!

During the PASS Community Summit, Rick Heiges did an interview with Thomas Grohser, the Senior Database Engineer at the bwin Interactive Entertainment AG, the world's largest online gambling company. This is the private client Kimberly and I were onsite with for a week last month in Vienna before heading to Barcelona for TechEd. In the interview, Thomas reveals some of the cool throughput statistics of their workload and his early impressions of SQL Server 2008.

Check it out on the PASS site (you'll need to register to view it) here.

(And here are my previous posts about their backup timings, and my wanderings around Vienna (long with lots of pictures))

Categories:
Interviews | SQL Server 2008

Today's post is based on a bunch of questions I've had around the various compression features in SQL Server 2008.

Does turning on data compression or backup compression compress the transaction log files on disk?

No. The transaction log is not compressed in any way under any circumstances. Rows from tables and indexes that have compression enabled will be logged in their row compressed forms, even if page compression is enabled.

Does data compression compress LOB columns?

No. There is no native compression support for any LOB columns (n/text, image, n/varchar(max), varbinary(max), XML), whether stored in-row or out-of-row. There's also no native compression support for FILESTREAM data.

Does log shipping use compression to compress the logs being shipped?

Log shipping does not ship transaction logs - it ships log *backups*. If backup compression is enabled for the instance hosting the log shipping primary database, or the log shipping job is changed to enable backup compression, then the log backups will be compressed and less data will be sent over the wire to the log shipping secondary(s).

Is backup compression the same as log stream compression with database mirroring?

No. Backup compression compresses backups (see my previous blog post here). Log stream compression with database mirroring compresses transaction log records before sending them between the principal and the mirror (see my previous blog post here).

Should I just turn on backup compression at the instance level?

Not necessarily. It depends whether the majority of database on the instance will benefit from backup compression. Backup compression (and any compression algorithm) uses CPU whether a decent compression ratio is achieved or not. Check what compression ratio is achieved first and then enable backup compression if its worth it. Otherwise, just enable it for individual databases.

Does data compression use the same algorithm as backup compression?

No. Backup compression uses a proprietary block-based compression algorithm that is part of Windows. Data compression uses up to 3 algorithms, depending on the level of compression configured. Row compression just makes all non-LOB columns into variable-length columns. Page compression does row compression, then common-prefix compression for each table columns, then common-value dictionary compression for each page. Details can be found at the following BOL sections: Row Compression Implementation  and Page Compression Implementation.

Hope this helps!

On Tuesday we had a look around our customer's data centers here in Austria - hidden away in the labryinthine bowels of a very large building in Vienna. Typical data centre with a halon fire extinguishing system but exceptional in its neatness and organization. The star of the show was their new HP Superdome - 32 dual-core Itaniums with 1/2 a terabyte of memory. Very nice - providing a lot of headroom for their workload to grow (currently at four hundred thousand SQL statements per *second* - spelled out to show there's no accidentally added zeros).

One of the systems they showed us is responsible for doing backups. They stripe the backups across 12 devices using multiple network cards and can manage to backup 2 terabytes in two hours after tweaking the BLOCKSIZE, BUFFERCOUNT, and MAXTRANSFERSIZE! Now comes the cool(er) part - with backup compression on SQL Server 2008 they've benchmarked backing up 2 terabytes in 36 minutes! That's a pretty awesome number and makes for some excellent disaster recovery times.

(Details printed with permission)

During early SQL Server 2008 development, the Storage Engine team introduced additional minimally-logged functionality. And then it was removed again for RTM. Now it seems that they've bowed under customer pressure and have made the functionality available again (see Sunil's announcement here).

Using trace flag 610 in the RTM build, you enable the potential for minimal-logging when:

  • Bulk loading into an empty clustered index, with no nonclustered indexes
  • Bulk loading into a non-empty heap, with no nonclustered indexes

Sunil's previous blog post here gives more info on the required syntax - very useful!

The sparse columns feature in SQL Server 2008 is generating lots of interest from people looking to deploy extensible schemas. I've seen a few questions from people that are confused by some of the info in Books Online, particularly about adding and removing sparse columns from a table.

There's a section in BOL that states that sparse columns are added and removed from existing tables by creating a new copy of each row *on the same page* and then deleting the old row, and that this can fail when the row size is around 4009 bytes (1/2 the max row size when sparse columns exist). I've been trying to repro this behavior as it seemed a nonsensical design to have used (and the design was done after I'd left Microsoft so I've never seen the underlying code) - and I couldn't. I finally got around to discussing this with the dev team last week and had it confirmed that Books Online is indeed incorrect - there is no such issue with sparse columns. I've been told that BOL will be corrected.

One other issue that's come up is whether sparse columns work with row overflow (i.e. rows greater than the size of a page, where one or more variable-length columns are pushed into off-row storage). The answer is yes, it works just the same as when the column isn't sparse.

Hope this helps some of you.

The feature article on SQL Server 2008: Tracking Changes in Your Enterprise Database I wrote for the November issue of TechNet Magazine is live on the web. It also includes a 5 minute long screencast I recorded where I demo the change data capture feature.

You can get to the article at http://technet.microsoft.com/en-us/magazine/cc987538.aspx. The topics covered are:

  • The need for tracking changes
  • Tracking changes in SQL Server 2005
  • Change tracking in SQL Server 2008
  • Change data capture in SQL Server 2008

It's written around 2-300 level and presents a good overview of both features, as well as a comparison between them. It's written for DBAs and ITPros so does not go into depth on how to program with either feature.

Checkout my previous TechNet Magazine articles and Q&A columns at http://www.sqlskills.com/blogs/paul/category/TechNet-Magazine.aspx.

Enjoy!

Just found out that the 25-page FILESTREAM whitepaper I wrote recently for the SQL team has been published on MSDN.

You can get it at http://msdn.microsoft.com/en-us/library/cc949109.aspx.

Enjoy!

Here's the table of contents.

  • Introduction
  • Choices for BLOB Storage
  • Overview of FILESTREAM
    • Dual Programming Model Access to BLOB Data
    • When to Use FILESTREAM
  • Configuring Windows for FILESTREAM
    • Hardware Selection and Configuration
    • Physical Storage Layout
    • RAID Level Choice
    • Drive Interface Choice
    • NTFS Configuration
      • Optimizing NTFS Performance
      • Cluster Size
      • Managing Fragmentation
      • Compression
      • Space Management
      • Security
    • Antivirus Considerations
    • Enabling FILESTREAM in Windows
  • Configuring SQL Server for FILESTREAM
    • Security Considerations
    • Enabling FILESTREAM in SQL Server
    • Creating a Database Enabled for FILESTREAM
    • Creating a Table for Storing FILESTREAM Data
    • Configuring FILESTREAM Garbage Collection
    • Partitioning Considerations
    • Load Balancing of FILESTREAM Data
    • Feature Combinations and Restrictions
  • Performance Tuning and Benchmarking Considerations
  • Data Migration Considerations
  • FILESTREAM Usage Best Practices
  • Conclusion

At the last Connections conference in April, the conference organizers tapped a bunch of speakers for interviews, articles, and other content to put into a free "newsletter" called MyDevConnections, and now it's finally available. It covers all the Connections conferences, so isn't just limited to SQL Server. As far as SQL is concerned, Kimberly and I wrote an article about our favorite SQL Server 2008 features (data compression and filtered indexes, respectively), and Ross Mistry wrote an article on hardening a SQL Server 2005 installation. There's also an extract from a Women in Technology interview that Kimberly took part in. Overall there's 84 pages of content in the PDF, with some adverts for the Connections shows (obviously).

Check it out at http://www.devconnections.com/mydevconnections/S08_DevOnlineMag_Web.pdf

This is the second of the two sessions I recently recorded with Richard and Greg on RunAs Radio (the first one on being an "involuntary DBA" is here). I've just finished the final edited version of a whitepaper for Microsoft on the FILESTREAM feature of SQL Server 2008 and this session goes into details of why you'd want to use it and how to setup a system for optimal FILESTREAM performance. The whitepaper should be available before PASS in November, in the meantime, checkout the show!

The show is 35 minutes long and you can download it at http://www.runasradio.com/default.aspx?showNum=74.

Enjoy!

I just heard today that the first whitepaper I've written for Microsoft has been published!

The abstract is:

SQL Server Replication: Providing High-Availability using Database Mirroring

This white paper describes how to use database mirroring to increase the availability of the replication stream in a transactional environment. It covers setting up replication in a mirrored environment, the effect of mirroring partnership state changes, and the effect of mirroring failovers on replication. In addition, it describes how to use LSN-based initialization to recover from the failover of a mirrored subscriber database.

Although brief overviews are given of both replication and database mirroring, it is easier to understand this white paper if the reader has some experience with one or both of these technologies, and has at least a rudimentary knowledge of database concepts such as transactions.

You can download it from http://download.microsoft.com/download/d/9/4/d948f981-926e-40fa-a026-5bfcf076d9b9/ReplicationAndDBM.docx and I'll put a link in our whitepapers page.

Enjoy!

(Quickie post #1 while it's Kimberly's turn to lecture this morning...)

I had a comment on my post dealing with suspect databases, saying that the 2008 behavior is much better. I hadn't tried this so it was (really good) news to me.

So, trying the same steps from that blog post on 2008 - everything's the same up to the point where the server has been restarted and the database is suspect. Now, if I try to detach the corrupt database, I get the following:

EXEC sp_detach_db 'DemoSuspect';
GO

Msg 3707, Level 16, State 2, Line 1
Cannot detach a suspect or recovery pending database. It must be repaired or dropped.

VERY COOL!!!! This should *really* cut down the number of times people accidentally get themselves into trouble by detaching the database.

For the sake of completeness, if I really want to detach the suspect database, I can do it in 2008 if I put the database into EMERGENCY mode first (doesn't mean I would though!)

When I started blogging, way back in 2006 :-), the third post I made on the old Storage Engine blog was about rebuilding the msdb database in 2005 (see here). This no longer works in 2008 (fellow MVP Tibor Karaszi explains why here), and in fact the information in Books Online about how to rebuild any of the 2008 system databases is incorrect. It's not something I've tried yet but people have already needed to do it (including Tibor!). So what to do?

Step in Bob Ward, a Principal Escalation Engineer with PSS, and a very good friend of mine. He's just researched and published a comprehensive blog post giving the procedure for rebuilding the system databases in 2008 using setup.exe. Checkout his great post here.

Here's hoping that you never need to do it!

Over the last few weeks I've had lots of questions about FILESTREAM performance and how to coax NTFS into scaling nicely. I just finished writing a 30-page whitepaper about FILESTREAM for the SQL Server team, which should be published before PASS in November (I'll blog the link when I have it). Although my whitepaper isn't strictly about performance, there is a long section about setting up your system to get high-performance from FILESTREAM. What I want to do in this blog post is give a bullet list of things to do that will help you get good performance. All of these are explained in more detail in the whitepaper.

Here you go, in no particular order:

  • Make sure you're storing the right-sized data in the right way. Jim Gray (et al) published a research paper a couple of years ago based called To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?. To summarize the findings, BLOBs smaller than 256-KB should be stored in a database, and 1-MB or larger should be stored in the file-system. For those in-between, "it depends" - my favorite answer. The upshot of this is that you won't get good performance if you store lots of small BLOBs in FILESTREAM.
  • Use an appropriate RAID level for the NTFS volume that will host the FILESTREAM data container (the NTFS directory structure corresponding to the FILESTREAM filegroup in the database). Don't use RAID-5, for instance, for a write-intensive workload.
  • Use an appropriate disk technology. SCSI will be usually be faster than SATA/IDE, but more expensive. This is because SCSI drives usually have higher rotational speeds, so lower latency and seek times.
  • Whichever disk technology you choose, if SATA, ensure it supports NCQ, and if SCSI, ensure it supports CTQ. Both of these allow the drives to process multiple, interleaved IOs concurrently.
  • Separate the data containers. Separate them from each other, and separate them from other database data and log files. This avoids contention for the disk heads.
  • Defrag the NTFS volume if needed before setting up FILESTREAM, and periodically to maintain good scan performance
  • Turn off 8.3 name generation on the NTFS volume. This is an order-N algorithm that has to check that the new name generated doesn't collide with any existing names in the directory. This slows insert and update performance down *a lot*. Do this using the command line fsutil utility.
  • Turn off tracking of last access time using fsutil.
  • Set the NTFS cluster size appropriately. For BLOBs 1-MB or large, use a cluster size of 64-KB. This will help to reduce fragmentation.
  • A partial update of FILESTREAM data creates a new file. Batch lots of small updates into one large update to reduce churn.
  • When streaming the data back to the client, use an ~60-KB SMB buffer size (or multiples thereof). This is so that the buffers don't get overly fragmented as TCP/IP buffer are 64-KB.

Hope this helps!

Moving databases around is pretty common, as is moving databases between servers running different Editions of SQL Server, especially during a disaster recovery situation. You may not know this, but in SQL Server 2005, if you had partitioning anywhere in a database, you could only attach/restore that database using an Enterprise or Developer instance. I say 'you may not know this' because partitioning isn't really widely used (from what I can gather from talking to customers). The upshot of this is that SQL Server 2005 will refuse to attach/restore a database with partitioning in - even if you're in a disaster recovery situation and the only server you have available has Standard Edition.

In SQL Server 2008, a lot more people are going to bump into this issue. The list of features that are Enterprise only, and will prevent attach/restore using a lower edition has expanded to include 3 new features that WILL be much more commonly used than partitioning. The four features that are in this category are:

  • Data compression
  • Partitioning
  • Transparent data encryption
  • Change data capture

All of these require elevated permissions to enable EXCEPT data compression, which only requires ALTER permission on a table. This means someone with table-owner privileges could enable compression without the DBA knowing, and suddenly the database can't be attached/restored to, say, Standard Edition.

If you're a DBA and have just taken over a database, there's now an easy way to tell whether the database contains these features. A new DMV has been added - sys.dm_db_persisted_sku_features - that will report which of these four features are present in a database. Let's check it out.

Using a 2008 Enterprise instance:

CREATE DATABASE EnterpriseOnly;
GO
USE EnterpriseOnly;
GO

CREATE TABLE compressed (c1 INT) WITH (DATA_COMPRESSION = ROW);
GO

SELECT * FROM sys.dm_db_persisted_sku_features;
GO

feature_name    feature_id
--------------  -----------
Compression     100

Now let's try backing up and restoring the database on a different edition:

BACKUP DATABASE EnterpriseOnly TO DISK = 'C:\SQLskills\EnterpriseOnly.bck';
GO

And on a 2008 Express instance:

RESTORE DATABASE EnterpriseOnly FROM DISK = 'C:\SQLskills\EnterpriseOnly.bck'
   
WITH MOVE 'EnterpriseOnly' TO 'C:\SQLskills\EnterpriseOnly.mdf',
   
MOVE 'EnterpriseOnly_log' TO 'C:\SQLskills\EnterpriseOnly_log.ldf'
GO

Processed 160 pages for database 'EnterpriseOnly', file 'EnterpriseOnly' on file 1.
Processed 5 pages for database 'EnterpriseOnly', file 'EnterpriseOnly_log' on file 1.
Msg 3167, Level 16, State 1, Line 1
RESTORE could not start database 'EnterpriseOnly'.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.
Msg 909, Level 21, State 1, Line 1
Database 'EnterpriseOnly' cannot be started in this edition of SQL Server because part or all of object 'compressed' is enabled with data compression or vardecimal storage format. Data compression and vardecimal storage format are only supported on SQL Server Enterprise Edition.
Msg 933, Level 21, State 1, Line 1
Database 'EnterpriseOnly' cannot be started because some of the database functionality is not available in the current edition of SQL Server.

Now, it's cool that it tells you exactly why the database couldn't be restored, but did you notice the first two lines of output? The database is fully restored BEFORE the operation fails! This makes perfect sense, as the database needs to be fully recovered before the server can tell whether any of the four features are still enabled or not. However, in a disaster recovery situation, waiting many hours for a database to restore only to be told that it can't be restored on this instance would be even more disastrous.

To summarize, you should always know what's happening in databases you manage. If portability of your databases to a lower Edition is important, make sure that none of these features can be enabled without you knowing about it.

While we were at TechEd in June, Kimberly and I participated in an hour-long discussion panel (Leveraging SQL Server Technologies to Build a Solid High-Availability Strategy) that was video-taped by the TechEd Online folks. It's now been edited and is available for download/viewing. We cover everything from requirements analysis to technology details in SQL Server 2008. The other panel members were Satya Jayanty, Allan Hirt, Kevin Farlee, and Amit Bansal.

You can browse the various online videos at http://technet.microsoft.com/en-us/events/teched/cc561184.aspx or go straight to the panel discussion video here.

Enjoy!

Well, almost... Kimberly and I have agreed to co-author the upcoming SQL Server 2008 Internals book with our good friend Kalen Delaney. This is the 2008 evolution of Kalen's Inside SQL Server series of books and will be published by MS Press around February 2009. The other authors are (also our good friends) Adam Machanic and Conor Cunningham.

Kimberly's going to write the chapter on Index Internals and I'm going to write the chapter on DBCC Internals. This is very exciting as these are our respective favorite subjects, as you probably already know - these won't be short chapters :-)

It's going to be a busy rest of the year - phew!

PS Next year we have plans to write a book ourselves - watch this space...

People have been complaining that I've stopped blogging so much - vacation folks, vacation! Today I've got a few class and conference posts to get through and then I'll get back to the technical posts.

We're doing a 2.5 day public class based on the SQL Server 2008 material we developed earlier this year. This will be part of a larger conference being hosted by Dev Connections in San Francisco, USA. Our workshop will run October 6th through 8th.

You can register and get more details at http://www.devconnections.com/SFWorkshops/default.asp?s=127.

Here's the abstract:

SQL Server 2008 offers an impressive array of capabilities for professional developers that build upon key innovations introduced in SQL Server 2005. The use of many of these will have manageability and infrastructure implications for a database—and hence the DBA! There are also enhancements to existing high-availability technologies, plus a variety of significant new tools to aid in managing performance, scalability, administration, and troubleshooting. This workshop helps you understand how to exploit the new toolset and how to manage a database that makes use of the new features in SQL Server 2008. The multi-day format of this event allows us to explore each feature in more detail, with more in-depth demonstrations and labs.

Topics covered include:

  • Availability Enhancements
    • Database Mirroring
    • Backup Compression
    • Peer-to-Peer Replication
  • Security Enhancements
    • Transparent Data Encryption
    • Extensible Key Management
    • All Actions Audited
  • Policy-Based Management and Multi-Server Administration
  • Troubleshooting and Throttling
    • Resource Governor
    • Extended Events
  • New Development Technologies
    • Spatial Indexes
    • Sparse Columns
    • Filtered Indexes and Statistics
    • Change Tracking and Change Data Capture
    • FILESTREAM
  • Performance Data Collection
  • Scalability Enhancements
    • Data Compression
    • Partition-Level Lock Escalation

This workshop runs Oct 6 (9am - 4pm), Oct 7 (9am - 4pm), Oct 8 (9am - 12pm).

We're sitting here in St. Pete Beach in Florida visiting some of Kimberly's family and having some sun-kissed R&R before heading back up to Seattle on Wednesday, and I thought I'd get the next post in my sparse columns mini-series out. Before I start though, Kimberly just posted the resources for the Accidental DBA class we taught at SQL Connections last week and in Iceland at the end of March - see here.

In my first post on sparse columns (see here) I introduced the concepts and explained why sparse columns are a useful feature. In this post I'm going to use the example I gave - a document repository with 50 document types and 20 unique attributes per document-type. Yes, it's a contrived example, but scale it up be a factor of 100+ (think Sharepoint Server) and methods like normalization no longer apply.

I'm using a CTP-6 VPC on a Lenovo T60P laptop with 4GB, a dual-core 2.2GHz CPU, and the VPC is running off a 6200RPM drive. Your mileage may vary for run-times of the example scripts. The VPC is the one we gave out at SQL Connections and Iceland, and you can download the scripts and a VPC (maybe only CTP-5) from the Microsoft JumpStart site (see here for details).

The first test I'll do is just creating the schema necessary to store the 1000+ columns of information in the test scenario. I'll do one with sparse columns and one without:

-- Create two tables, one with 1000 columns and one with 1000 columns but 997 sparse.
CREATE TABLE TableWithoutSparseColumns (
   DocID INT IDENTITY, DocName VARCHAR (100), DocType INT,
   c0004 INT NULL, c0005 INT NULL, c0006 INT NULL, c0007 INT NULL, c0008 INT NULL, c0009 INT NULL,
   ...
   c0994 INT NULL, c0995 INT NULL, c0996 INT NULL, c0997 INT NULL, c0998 INT NULL, c0999 INT NULL,
   c1000 INT NULL);
GO

CREATE TABLE TableWithSparseColumns (
   DocID INT IDENTITY, DocName VARCHAR (100), DocType INT,
   c0004 INT SPARSE NULL, c0005 INT SPARSE NULL, c0006 INT SPARSE NULL, c0007 INT SPARSE NULL,
   ...
   c0996 INT SPARSE NULL, c0997 INT SPARSE NULL, c0998 INT SPARSE NULL, c0999 INT SPARSE NULL,
   c1000 INT SPARSE NULL);
GO

I won't list all the column names for the sake of brevity. Next I'll insert some values into each table (the same values in each table):

-- Insert a few rows in each
INSERT INTO TableWithSparseColumns (DocName, Doctype) VALUES ('aaaa', 1);
INSERT INTO TableWithSparseColumns (DocName, Doctype, c0945) VALUES ('bbbb', 2, 46);
INSERT INTO TableWithSparseColumns (DocName, Doctype, c0334) VALUES ('cccc', 3, 44);
INSERT INTO TableWithSparseColumns (DocName, Doctype, c0233, c0234) VALUES ('dddd', 4, 12, 34);
INSERT INTO TableWithSparseColumns (DocName, Doctype, c0233, c0234,c0235,c0236) VALUES ('eeee', 4, 12, 34, 46, 66);
GO

INSERT INTO TableWithoutSparseColumns (DocName, Doctype) VALUES ('aaaa', 1);
INSERT INTO TableWithoutSparseColumns (DocName, Doctype, c0945) VALUES ('bbbb', 2, 46);
INSERT INTO TableWithoutSparseColumns (DocName, Doctype, c0334) VALUES ('cccc', 3, 44);
INSERT INTO TableWithoutSparseColumns (DocName, Doctype, c0233, c0234) VALUES ('dddd', 4, 12, 34);
INSERT INTO TableWithoutSparseColumns (DocName, Doctype, c0233, c0234,c0235,c0236) VALUES ('eeee', 4, 12, 34, 46, 66);
GO

Now let's see how big each table is:

-- Now lets see how big the rows are
SELECT [avg_record_size_in_bytes], [page_count] FROM sys.dm_db_index_physical_stats (
   DB_ID ('SparseColumnsTest'), OBJECT_ID ('TableWithoutSparseColumns'), NULL, NULL, 'DETAILED');

SELECT [avg_record_size_in_bytes], [page_count] FROM sys.dm_db_index_physical_stats (
   
DB_ID ('SparseColumnsTest'), OBJECT_ID ('TableWithSparseColumns'), NULL, NULL, 'DETAILED');
GO

avg_record_size_in_bytes page_count
------------------------ --------------------
4135                     5

(1 row(s) affected)

avg_record_size_in_bytes page_count
------------------------ --------------------
40.6                     1

(1 row(s) affected)

Ok - so that's not a huge difference in page count (because we've only got 5 rows), but it's a *massive* difference in average record size. Scaled up to hundreds of thousands or millions of records, the space savings will be astronomical!

Now let's try selecting the data back using results-to-grid mode and a simple SELECT * statement. It takes 20 seconds to return - solely because the client still has to retrieve the metadata for 1000+ columns. Even though the columns are still defined as SPARSE, they show up in a SELECT * resultset, and that makes extracting out the non-NULL values pretty difficult...

Time for another new feature - column sets. There's a new column type available for use with sparse columns - an XML COLUMN_SET. This is a column that is only materialized when selected, and will return all the non-NULL sparse columns in a row as an XML BLOB. It will also change the behavior of a SELECT * operation - removing all the sparse columns from the resultset and replacing them with itself, representing all the non-NULL sparse columns. Redefining our TableWithSparseColumns to have an XML COLUMN_SET column called SparseColumns (using the syntax 'SparseColumns XML COLUMN_SET FOR ALL_SPARSE_COLUMNS'), and re-inserting the same values then gives the following results for a SELECT * operation:

DocID   DocName   DocType   SparseColumns
------- --------- --------- ---------------
1       aaaa      1         NULL
2       bbbb      2         <c0945>46</c0945>
3       cccc      3         <c0334>44</c0334>
4       dddd      4         <c0233>12</c0233><c0234>34</c0234>
5       eeee      4         <c0233>12</c0233><c0234>34</c0234><c0235>46</c0235><c0236>66</c0236>

Pretty cool - and it returns virtually instantaneously (obviously scaling up to hundreds of thousands or millions of rows would take longer due to the time necessary to read the pages into the buffer pool). One downside is that the XML blob only returns the column name and value - not the datatype - but if your application can cope with that then not having to wade through hundreds (or thousands by RTM) of NULL columns values is great.

Next time I'll discuss the internals of how sparse columns are stored.

It's been quite a while since I wrote a blog post on a flight but I happened to be playing with a CTP-6 VPC on the way down to Orlando and thought I'd do the first in a few posts on a cool feature of SQL Server 2008 - Sparse Columns.
 
One problem in database schema design is how to store heterogenous data types with many properties efficiently. Here's an example (contrived) scenario - consider a document repository that can store up to 50 different kinds of documents, with a a few common properties (like document type, document name, last modification time) and 20 totally different attributes for each column type. The document repository needs to store the common fields, plus the per-document-type attributes for each document.

[Edit: I had several comments - thank you - that this example is too contrived and that normalization could give some benefit to it - ok- the example is *very* simple to illustrate the concepts involved. Now imagine the same example with thousands of document types and thousands of user-defined attributes per document - normalization is no longer applicable. This is the Sharepoint Server scenario that drives this feature.]
 
What are some of the options for this in SQL Server 2005?

Single table
Define a table with 1000+ columns to have allow all the document types to be stored in a single table. The first 20 columns could be for the common properties and then each subsequent set of columns stores the attributes for a single document type (e.g. columns 20-39 store the attributes for documents of type 1, columns 40-59 store the attributes for documents of type 2, and so on).

Comments on this architecture:

  • There is a huge amount of wasted space for each table row - as only a maximum of 40 columns (common fields plus per-document-type attributes) will have values in each 1000+ column record. Even if all the attributes are stored as nullable variable length columns (e.g. SQLVARIANT) then there's still a minimum of 1-bit of storage required per column (for the null bitmap entry). There's also the CPU overhead of having to crack the SQLVARIANT columns, and the storage overhead of having them in the first place.
  • The 8060 byte record size limit effectively limits the number and datatypes of columns that can be defined per record. You could easily have 1000 4-byte INT columns per record, for instance, but combinations of wider data-types becomes tricky.
  • There is a 1024 column limit per table. This puts a hard stop on the number of document types we can stores in our example.
  • Efficient indexes are impossible. Even if the index is defined to only contain the columns representing the attributes for a particular document type, they would still have a row for every document, regardless of the document type.

Vertically partition
Vertically partition the document repository such that each document type has its own table.

Comments on this architecture:

  • This allows an effectively unlimited number of document types to be supported, with a larger number of attributes for each document type, and with much more efficient indexes.
  • However, any operation that needs to operate over the entire document repository has to join all the tables (e.g. select all documents with a last modification date in the last 7 days)

Property bag
Use a table where the per-document-type properties are stored in a LOB value (somtimes called a property bag).

Comments on this architecture:

  • This also allows an effectively unlimited number of document types and attributes.
  • Accessing attributes is very slow and expensive, requiring reading into an offset inside the LOB column.
  • Indexing over attributes is going to be very expensive - requiring a computed column (to extract the attribute from the LOB value) for each attribute to be indexed

XML
Define an XML column which effectively acts as a property bag.

Comments on this architecture:

  • This also allows an effectively unlimited number of document types and attributes.
  • Accessing attributes is faster than using a LOB property bag but slower than regular columns, and requires XQuery operations.
  • Indexing is possible using XML indexes, but they're very space inefficient (a primary XML index shreds the entire XML column, and the XML column remains)

Basically - there's no good way to do it in SQL Server 2005 or before.
 
Enter SQL Server 2008 with sparse columns.
 
A sparse column is a nullable column that's declared with the SPARSE attribute. This means that when the value is null, it takes zero space - not even the single bit for the null bitmap is required - and this works even for fixed-length columns! The trade-off is that non-null sparse columns take an extra 4-bytes of space over regular columns. Here's an example for INT columns:

  • Non-null regular INT column: 4 bytes
  • Null regular INT column: 4 bytes
  • Non-null sparse INT column: 8 bytes
  • Null regular INT column: 0 bytes

Books Online has a table showing the potential space savings for the various data types using sparse columns. You can get to this table by looking for 'Sparse Columns' in the SQL Server 2008 Books Online index.
 
In my document repository example above, declaring each of the per-document type attributes as SPARSE would allow each record to only store the attributes needed for the document it represents, rather than every defined column - a huge space saving!
 
But what about the limit on the number of columns? Well, SQL Server 2008 is also bumping the number of columns per table to 30000 (see Kimberly's blog post from yesterday) - although not until the next CTP is available.
 
But how would indexing work? SQL Server 2008 has another new feature that helps here - filtered indexes. Conor's blogged about these recently (see here).
 
Over the next few weeks I'll post more on sparse columns - using them, comparisons with other schemas, and anything else I can come up with.

Over the weekend there was a question on one of the internal aliases at MS: how can I tell what percentage of a database has changed since the last full backup, so I can choose between a differential or full backup?

No such code exists as far as I know - until now! I happened to read the thread while sitting in the airport in Washington D.C. on the way back from Iceland so I started playing around and this morning I completed the code.

The code below creates a function and a stored procedure. The basic idea behind the code is as follows:

For each online data file in the database
   For each GAM interval in the file
      Crack the DIFF map page using DBCC PAGE
      Interpret the DIFF bitmap to aggregate the changed extents
      Add the sum to the total changed extents for the database
   End
End
Report results

There's a function that I create in msdb call SQLskillsConvertToExtents that cracks some of the DBCC PAGE output, and the main procedure is called sp_SQLskillsDIFForFULL and it created as a system object in master. I tried making it a table-valued function but you can't do things like INSERT-EXEC in a function, and that's required for processing the DBCC PAGE output. So - create your own wrapper function or whatever to use it. The interface/output is:

EXEC sp_SQLskillsDIFForFULL 'msdb';
GO

Total Extents Changed Extents Percentage Changed
------------- --------------- ----------------------
102           56              54.9

I've tested it with databases with multiple files and up to around 700GB for a single file size. There's the potential for an issue with file sizes of 4TB and beyond (where PFS intervals and GAM intervals map to the same extent every 4TB or so, but I think it'll be ok and the position of the DIFF map in the extent won't change - if someone could test it with a 4+TB file I'd be grateful). It's been tested on SQL Server 2005 and 2008. It will not work on SQL Server 2000 - I'll do a 2000 version sometime soon.

Note that after doing a full backup you will never see Changed Extents equal to zero. It will always be 4 + (number of online data files - 1), and around 20 or so for msdb. This is because the extent containing the file header in each file is always marked as changed, as are three extents in the primary file containing the roots of some critical system tables.

Anyway - here it is. You can download it in a zip file from SQLskillsDIFForFULL.zip (2.65KB). Enjoy!

/*============================================================================
   
File: SQLskillsDIFForFULL.sql

   Summary: This script creates a system-wide SP SQLskillsDIFForFILL that
   
works out what percentage of a database has changed since the
   
previous full database backup.

   Date: April 2008

   SQL Server Versions:
         
10.0.1300.13 (SS2008 February CTP - CTP-6)
         
9.00.3054.00 (SS2005 SP2)
------------------------------------------------------------------------------
   
Copyright (C) 2008 Paul S. Randal, SQLskills.com
   
All rights reserved.

   For more scripts and sample code, check out 
      
http://www.sqlskills.com/

   You may alter this code for your own *non-commercial* purposes. You may
   
republish altered code as long as you give due credit.

   THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF 
   
ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED 
   
TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
   
PARTICULAR PURPOSE.

============================================================================*/

-- Create the function in MSDB
--
USE msdb;
GO

IF EXISTS (SELECT * FROM sys.objects WHERE NAME = 'SQLskillsConvertToExtents')
   
DROP FUNCTION SQLskillsConvertToExtents;
GO

-- This function cracks the output from a DBCC PAGE dump
-- of an allocation bitmap. It takes a string in the form
-- "(1:8) - (1:16)" or "(1:8) -" and returns the number
-- of extents represented by the string. Both the examples
-- above equal 1 extent.
--

CREATE FUNCTION SQLskillsConvertToExtents (
   
@extents VARCHAR (100))
RETURNS INTEGER
AS
BEGIN
   
DECLARE @extentTotal   INT;
   
DECLARE @colon         INT;
   
DECLARE @firstExtent   INT;
   
DECLARE @secondExtent  INT;

   SET @extentTotal = 0;
   
SET @colon = CHARINDEX (':', @extents);

   -- Check for the single extent case
   --
   IF (CHARINDEX (':', @extents, @colon + 1) = 0)
      
SET @extentTotal = 1;
   
ELSE
      
-- We're in the multi-extent case
      --
      BEGIN
      
SET @firstExtent = CONVERT (INT,
         
SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
      
SET @colon = CHARINDEX (':', @extents, @colon + 1);
      
SET @secondExtent = CONVERT (INT,
         
SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
      
SET @extentTotal = (@secondExtent - @firstExtent) / 8 + 1;
   
END

   RETURN @extentTotal;
END;
GO

USE master;
GO

IF OBJECT_ID ('sp_SQLskillsDIFForFULL') IS NOT NULL
   
DROP PROCEDURE sp_SQLskillsDIFForFULL;
GO

-- This SP cracks all differential bitmap pages for all online
-- data files in a database. It creates a sum of changed extents
-- and reports it as follows (example small msdb):
--
-- EXEC sp_SQLskillsDIFForFULL 'msdb';
-- GO
--
-- Total Extents Changed Extents Percentage Changed
-- ------------- --------------- ----------------------
-- 102           56              54.9
--
-- Note that after a full backup you will always see some extents
-- marked as changed. The number will be 4 + (number of data files - 1).
-- These extents contain the file headers of each file plus the
-- roots of some of the critical system tables in file 1.
-- The number for msdb may be round 20.
--
CREATE PROCEDURE sp_SQLskillsDIFForFULL (
   
@dbName VARCHAR (128))
AS
BEGIN
   
SET NOCOUNT ON;

   -- Create the temp table
   
--
   
IF EXISTS (SELECT * FROM msdb.sys.objects WHERE NAME = 'SQLskillsDBCCPage')
   
DROP TABLE msdb.dbo.SQLskillsDBCCPage;

   CREATE TABLE msdb.dbo.SQLskillsDBCCPage (
      
[ParentObject] VARCHAR (100),
      
[Object]       VARCHAR (100),
      
[Field]        VARCHAR (100),
      
[VALUE]        VARCHAR (100));

   DECLARE @fileID         INT;
   
DECLARE @fileSizePages  INT;
   
DECLARE @extentID       INT;
   
DECLARE @pageID         INT;
   
DECLARE @DIFFTotal      INT;
   
DECLARE @sizeTotal      INT;
   
DECLARE @total          INT;
   
DECLARE @dbccPageString VARCHAR (200);

   SELECT @DIFFTotal = 0;
   
SELECT @sizeTotal = 0;

   -- Setup a cursor for all online data files in the database
   
--
   
DECLARE files CURSOR FOR
      
SELECT [file_id], [size] FROM master.sys.master_files
      
WHERE [type_desc] = 'ROWS'
      
AND [state_desc] = 'ONLINE'
      
AND [database_id] = DB_ID (@dbName);

   OPEN files;

   FETCH NEXT FROM files INTO @fileID, @fileSizePages;

   WHILE @@FETCH_STATUS = 0
   
BEGIN
      
SELECT @extentID = 0;

      -- The size returned from master.sys.master_files is in
      
-- pages - we need to convert to extents
      
--
      
SELECT @sizeTotal = @sizeTotal + @fileSizePages / 8;

      WHILE (@extentID < @fileSizePages)
      
BEGIN
         
-- There may be an issue with the DIFF map page position
         
-- on the four extents where PFS pages and GAM pages live
         
-- (at page IDs 516855552, 1033711104, 1550566656, 2067422208)
         
-- but I think we'll be ok.
         
-- PFS pages are every 8088 pages (page 1, 8088, 16176, etc)
         
-- GAM extents are every 511232 pages
         
--
         
SELECT @pageID = @extentID + 6;

         -- Build the dynamic SQL
         
--
         
SELECT @dbccPageString = 'DBCC PAGE ('
            
+ @dbName + ', '
            
+ CAST (@fileID AS VARCHAR) + ', '
            
+ CAST (@pageID AS VARCHAR) + ', 3) WITH TABLERESULTS, NO_INFOMSGS';

         -- Empty out the temp table and insert into it again
         
--
         
DELETE FROM msdb.dbo.SQLskillsDBCCPage;
         
INSERT INTO msdb.dbo.SQLskillsDBCCPage EXEC (@dbccPageString);

         -- Aggregate all the changed extents using the function
         
--
         
SELECT @total = SUM ([msdb].[dbo].[SQLskillsConvertToExtents] ([Field]))
         
FROM msdb.dbo.SQLskillsDBCCPage
            
WHERE [VALUE] = '    CHANGED'
            
AND [ParentObject] LIKE 'DIFF_MAP%';

         SET @DIFFTotal = @DIFFTotal + @total;

         -- Move to the next GAM extent
         
SET @extentID = @extentID + 511232;
      
END

      FETCH NEXT FROM files INTO @fileID, @fileSizePages;
   
END;

   -- Clean up
   
--
   
DROP TABLE msdb.dbo.SQLskillsDBCCPage;
   
CLOSE files;
   
DEALLOCATE files;

   -- Output the results
   
--
   
SELECT
      
@sizeTotal AS [Total Extents],
      
@DIFFTotal AS [Changed Extents],
      
ROUND (
         
(CONVERT (FLOAT, @DIFFTotal) /
         
CONVERT (FLOAT, @sizeTotal)) * 100, 2) AS [Percentage Changed];
END;
GO

-- Mark the SP as a system object
--
EXEC sys.sp_MS_marksystemobject sp_SQLskillsDIFForFULL;
GO

-- Test to make sure everything was setup correctly
--
EXEC sp_SQLskillsDIFForFULL 'msdb';
GO

Now we're back from Iceland and I have a week to catch up with some content development before the MVP Summit next week and then SQL Connections the following week.

One of the things I struggled with earlier in the year while writing a SQL Server 2008 training course for Microsoft was how to get FILESTREAM to work with partitioning. There wasn't (and still isn't) any information in Books Online that I could find so I had to play around to figure it out.

I should say that the CTP-6/February CTP version of Books Online *does* have a bunch of code examples around using FILESTREAM, so I'm not going to write a blog post about that. Look in the Getting Started with FILESTREAM Storage section (or paste this link into the Books Online URL: window).

Back to partitioning - first I created a test database:

CREATE DATABASE FileStreamTestDB
ON PRIMARY
   
(NAME = FileStreamTestDB_data,
      
FILENAME = N'C:\Metro Demos\FileStreamTestDB\FSTestDB_data.mdf'),
FILEGROUP FileStreamFileGroup CONTAINS FILESTREAM
   
(NAME = FileStreamTestDBDocuments,
      
FILENAME = N'C:\Metro Demos\FileStreamTestDB\Documents')
LOG ON
   
(NAME = 'FileStreamTestDB_log',
      
FILENAME = N'C:\Metro Demos\FileStreamTestDB\FSTestDB_log.ldf');
GO

Then I tried the obvious, knowing that I can't partition on the ROWGUIDCOL:

CREATE PARTITION FUNCTION MyPartFunction (INT) AS RANGE RIGHT FOR VALUES (1000, 2000);

CREATE PARTITION SCHEME MyPartScheme AS PARTITION MyPartFunction ALL TO ([PRIMARY]);

CREATE TABLE FileStreamTest (
   
TestId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
   
Customer INT,
   
[
Name] VARCHAR (25),
   
[
Document] VARBINARY(MAX) FILESTREAM)
ON MyPartScheme (Customer);
GO

Partition scheme 'MyPartScheme' has been created successfully. 'PRIMARY' is marked as the next used filegroup in partition scheme 'MyPartScheme'.
Msg 1921, Level 16, State 4, Line 8
Invalid filegroup 'default' specified.

Eventually I worked out that you have to define a separate partitioning scheme just for FILESTREAM data. This is because the regular data is stored on non-FILESTREAM filegroups, so trying to use the regular partitioning scheme for FILESTREAM would mean telling the Engine to store the FILESTREAM data in non-FILESTREAM filegroups. Clearly a non-starter. Ok - try again with a separate partitioning scheme (the prior MyPartFunction partition function and MyPartScheme partition scheme already exist now remember):

CREATE PARTITION SCHEME MyFSPartScheme AS PARTITION MyPartFunction ALL TO ([FileStreamFileGroup]);

CREATE TABLE FileStreamTest (
   
TestId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
   
Customer INT,
   
[Name] VARCHAR (25),
   
[Document] VARBINARY(MAX) FILESTREAM)
ON MyPartScheme (Customer)
FILESTREAM_ON MyFSPartScheme;
GO

Partition scheme 'MyFSPartScheme' has been created successfully. 'FileStreamFileGroup' is marked as the next used filegroup in partition scheme 'MyFSPartScheme'.
Msg 1908, Level 16, State 1, Line 1
Column 'Customer' is partitioning column of the index 'UQ__FileStreamTest__03317E3D'. Partition columns for a unique index must be a subset of the index key.
Msg 1750, Level 16, State 0, Line 1
Could not create constraint. See previous errors.

Hmm - I can't partition on Customer because there's already a unique index over TestId - UNLESS I specifically set the unique index on TestId to be non-partitioned by setting a filegroup for it:

CREATE TABLE FileStreamTest (
   
TestId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE ON [PRIMARY],
   Customer INT,
   
[Name] VARCHAR (25),
   
[Document] VARBINARY(MAX) FILESTREAM)
ON MyPartScheme (Customer)
FILESTREAM_ON MyFSPartScheme;
GO

That works! Now - the BIG issue with this setup is that switching partitions won't work while the unaligned index is enabled. So how to disable it? First we need to find out what it's called:

SELECT * FROM sys.indexes WHERE [object_id] = OBJECT_ID ('FileStreamTest');
GO

And then disable it:

ALTER INDEX UQ__FileStre__8CC33161060DEAE8 ON FileStreamTest DISABLE;

Now you can do partition switching. Here's the catch - to re-enable the index you need to REBUILD it - which is a size of data operation! The upshot of all this is that partitioning can be made to work with FILESTREAM data but partition switching is no longer a metadata-only operation.

Hopefully this will be addressed for V2.

One of the drawbacks of not being in the SQL team at Microsoft any longer is that I don't know about all the undocumented features in the next release - I have to hunt around for them like everyone else :-(

So I was poking about in SSMS in 2008 CTP-6 and noticed a function called sys.fn_PhysLocCracker that I'd never heard of. Doing an sp_helptext on it gets the following output:

-- Name: sys.fn_PhysLocCracker
--
-- Description:
-- Cracks the output of %%physloc%% virtual column
--
-- Notes:
-------------------------------------------------------------------------------
create function sys.fn_PhysLocCracker (@physical_locator binary (8))
returns @dumploc_table table
(
 [file_id] int not null,
 [page_id] int not null,
 [slot_id] int not null
)
as
begin

 declare @page_id binary (4)
 declare @file_id binary (2)
 declare @slot_id binary (2)

 -- Page ID is the first four bytes, then 2 bytes of page ID, then 2 bytes of slot
 --
 select @page_id = convert (binary (4), reverse (substring (@physical_locator, 1, 4)))
 select @file_id = convert (binary (2), reverse (substring (@physical_locator, 5, 2)))
 select @slot_id = convert (binary (2), reverse (substring (@physical_locator, 7, 2)))
 
 insert into @dumploc_table values (@file_id, @page_id, @slot_id)
 return
end

Cool - but something else I've never heard of %%physloc%% - what's that? After playing around for a while, I figured out how to make it work.  Just to be confusing, there's another identical version of the function called sys.fn_PhysLocFormatter - and that's the only one I could get to work. Here's an example:

CREATE TABLE TEST (c1 INT IDENTITY, c2 CHAR (4000) DEFAULT 'a');
GO
INSERT INTO TEST DEFAULT VALUES
;
INSERT INTO TEST DEFAULT VALUES
;
INSERT INTO TEST DEFAULT VALUES;
GO

SELECT sys.fn_PhysLocFormatter (%%physloc%%) AS [Physical RID], * FROM TEST;
GO

Physical RID       c1
-----------------  -----------
(1:411:0)          1
(1:411:1)          2
(1:413:0)          3

It's a physical-record locator function! Undocumented and unsupported (obviously), but hey, some of the best features are :-) It gives the database file, page within the file, and slot number on the page in the format (file:page:slot). I can think of a *bunch* of uses for this which I'll be exploring over the next few months.

How cool is that?!?!

A couple of weeks ago I blogged about the three tracks of the SQL Server 2008 JumpStart course that SQLskills.com taught internally for Microsoft and some MVPs - see here for details. Well, the content is now available to download! Note that this was based on CTP-5 (November 2007 CTP) and there have been *lots* of behavioral changes since then (with more planned for CTP-6 Refresh and RTM), but if you want a high-level overview of a bunch of the features (albeit in bullet-point summaries on slides) then this is a good place to start.

For me, what's *REALLY* cool is that the site also has a downloadable VPC plus lab manuals for all of the AlwaysOn High-Availability hands-on labs that SQLskills.com wrote. We originally wrote these labs for SQL Server 2005 and I updated them all for CTP-5. The VPC has a long lab on each of the following:

  • Database Snapshots
  • Data Recovery and Preventative Techniques
  • Instant Initialization
  • Peer-to-Peer Replication (including the new Topology Wizard I blogged about here)
  • Table and Index Partitioning
  • Snapshot Isolation
  • Online Operations
  • Database Mirroring (including a demo I wrote of Automatic Page Repair, described here)
  • Service Oriented Database Architecture

There is some great depth in each of these - Kimberly blogged more info about the exercises in each lab here. There's also another VPC image with some higher-level labs on a variety of 2008 features and written by a number of different people- including some labs on Policy-Based Management and Performance Data Collection that Kimberly wrote.

So - where can you get these from? Go to http://sqlserver2008jumpstart.microsofttraining.com/ and hit the Download link on the right-hand side. Register and then you can get to the materials. The AlwaysOn VPC image is Collection 2 at the bottom of the page, and you'll see all the slide decks as you scroll down the page.

Enjoy!

The second part of our radio interview with TechNet has been released (see here for part 1). In this installment we discuss troubleshooting and manageability in SQL Server 2008. You can get to it by going to the March 11th 2008 show here.

Enjoy!

(Cross-posted on Paul and Kimberly's blogs)

With the Spring SQL Server Connections show coming up next month, its time to start planning for the Fall show. Once again Kimberly and I will be Co-Chairs of the Conference and we'd like to invite you to submit abstracts for the Fall 2008 SQL Connections conference, to be held in Las Vegas, November 10-14th, 2008.

The conference will take place shortly after the SQL Server 2008 launch (when it actually RTMs, not the 'launch' that happened February 27th), and will focus heavily on SQL Server 2008. Abstracts are still welcome on best practices for SQL Server 2005 and how to upgrade and migrate applications from SQL Server 200x to SQL Server 2008. However, we will consider all topics - from design to performance to troubleshooting to architectures to programming. Also, we're likely to closely follow our past conference format with three primary themes/tracks: Infrastructure and Scalability, Programming, and Business Intelligence. You should aim for the intermediate to advanced audience for any SQL Server 2005 materials and for SQL Server 2008 we'll consider any level - especially depending on the newness of the release.

Note: You will not be able to speak at a DevConnections show if you are also presenting at a competitor’s show, in the same state, within 30 days of the DevConnections show. To be more specific, if you wish to speak at SQL Connections or any other DevConnections show this Fall in Las Vegas, you are precluded from speaking at any competing conferences in the state of Nevada between October 10, 2008 and December 13, 2008.

For submitting session abstracts, please use this URL:  http://www.deeptraining.com/devconnections/abstracts

The tool will be open from March 10th to midnight EST April 9th, after which we won't accept any abstracts. If you have used this site before and have forgotten your password, you can have your secret word mailed to you. It's better to do this then to create a new account. Please keep the abstracts under 200 words each and in one paragraph. Please do not use bulleted items or line breaks, and please use a spell-checker. If you have an issue with the site, please follow up with an email to paul@sqlskills.com.

Please create a new speaker record (if you haven't submitted before) and submit at least 3 abstracts; however, it will help your chances of being selected if you submit 5 or 6 abstracts. There's one slight change for this conference only - we won't have a fourth track for speakers just delivering a single conference session, so if you only submit a single session abstract, you're unlikely to be picked.

What you will get if selected:

  • $500 per conference talk. (Additional compensation for pre/post conference workshops.)
  • Coach airfare and hotel stay paid by the conference
  • Free admission to all of the co-located conferences
  • Speaker party
  • The adoration of attendees :-)
  • etc.

Thanks for your continued support for our SQL Connections conference and we look forward to seeing your bright and shiny new abstracts!

Thanks,
Kimberly L. Tripp & Paul S. Randal
SQL Connections Conference Chairs

During the various courses I've been teaching, people are interested in how FILESTREAM performance compares with storing BLOBs in the database itself. I have some performance graphs based on measurements the dev team made - these have been presented publicly by myself and the dev team so I can share them with you here.

There are three graphs below, showing the relative performance for read, insert, and update of:

  • BLOB data stored in FILESTREAM format and accessed through the WIN32 streaming APIs. The times include getting a transaction context from SQL Server, getting the file path, doing the operation, closing the file, and committing the transaction in SQL Server.
  • BLOB data stored in FILESTREAM format and manipulated through T-SQL
  • BLOB data stored in varbinary(max) format (and obviously manipulated through T-SQL)

The data is the same in each test. The tests were performed on a 4-way box with a cold buffer pool. (Note that if this was for a warm-buffer pool, as the graph in the FILESTREAM whitepaper is for, the varbinary(max) and FILESTREAM T-SQL numbers would essentially swap.) One interesting point to note is that for smaller data sizes, it's faster to manipulate them through T-SQL than through the file system - this is expected based on research Jim Gray did when putting together the original TerraServer.

Just over a month ago I posted on how to enable FILESTREAM in CTP-5 and the pre-CTP-6 build I was working with. Now that's all changed! CTP-6 is a hybrid of the CTP-5 method and what will eventually be the methodology in RTM. The changes were made to separate the OS-level configuration from the SQL-level configuration, so a SQL admin can't invoke OS-level changes. This makes sense to me.

In a nutshell, the new way of enabling FILESTREAM will be:

  • OS admin enables FILESTREAM when installing SQL Server or through the SQL Server Configuration Manager
  • SQL admin enables FILESTREAM in the instance using sp_configure. This will always succeed, but if this is done before the OS-level enabling, then FILESTREAM operations will fail.

I won't go into all the details as Joanna Omel (the Program Manager for FILESTREAM in the Storage Engine) has blogged about them on the Storage Engine PM team blog - see here for her post.

More on CTP-6 changes as I hear about them (or trip over them!)

Categories:
FILESTREAM | SQL Server 2008

Wow - almost 10 days without a blog post - that must be a record for me! :-) Never fear - I'll be posting more over the next couple of weeks. Kimberly flew off to India yesterday to teach some Microsoft classes and unfortunately I couldn't join her this time as I'm teaching 3 classes myself:

  • an internal Microsoft class on Designing for High Availability
  • another internal Microsoft class on SQL Server 2008 for DBAs (similar to the JumpStart class I posted about here)
  • 3 days of content for the new Microsoft Certified Architect: Database qualification - see the Microsoft Learning site here for details

Anyway - the subject of this post is to let you know that last week, Kimberly and I did two interviews for TechNet Radio on SQL Server 2008 technologies. Part 1 has just been released where we discuss security and availability features. You can get to it by going to the March 4th 2008 show here. Tune in and find out how I lull myself to sleep when Kimberly's out of town...

Enjoy!

SQL Connections is in less than two months now and our pre-con and post-con workshops are filling up fast - checkout my previous blog post here for the full list of what we're presenting.

Now we've finalized our TechEd US sessions with Microsoft and can let you all know that we're going to be there as usual! This time Kimberly and I are doing a joint pre-con workshop on SQL Server 2008 - so if you can only make it to one conference this year, you can choose TechEd or SQL Connections and still catch our 2008 workshop plus other sessions :-)

There's a catch though - this year TechEd US has split into two weeks (to mirror what TechEd Europe has been doing for a while). The first week is for Developers and the second week is for IT Pros. We'll be at the IT Pro week in Orlando, June 10-13. Here's what Kimberly and I are doing (all jointly presented this year):

Pre-Conference Workshop: PRC21 Microsoft SQL Server 2008 Overview for Database Professionals (Monday 6/9)

SQL Server 2008 offers an impressive array of capabilities for professional developers that build upon key innovations introduced in SQL Server 2005. The use of many of these will have manageability and infrastructure implications for a database—and hence the DBA! There are also enhancements to existing high-availability technologies, plus a variety of new tools to aid in managing performance, scalability, administration, and troubleshooting. This workshop helps you understand how to exploit the new toolset and how to manage a database that makes use of the new features in SQL Server 2008. Come to this workshop so your developers don't surprise you with new demands once your company upgrades!

Topics covered include:

  • Availability Enhancements (Database Mirroring, Backup Compression, Peer-to-Peer Replication)
  • Security Enhancements (Transparent Data Encryption, Extensible Key Management, All Actions Audited)
  • Policy-Based Management
  • Troubleshooting and Throttling (Resource Governor, Extended Events)
  • New Development Technologies (Spatial Indexes, Sparse Columns, Filtered Indexes, Change Data Capture, FILESTREAM)
  • Performance Data Collection
  • Scalability Enhancements (Data Compression, Partition-Level Lock Escalation)

Session 1: Essential Database Maintenance

In this session, Paul and Kimberly will run you through their top-ten database maintenance recommendations, with a lot of tips and tricks along the way. These are distilled from almost 30 years combined experience working with SQL Server customers and are geared towards making your databases more performant, more available, and more easily managed (to save you time!). Everything in this session will be practical and applicable to a wide variety of databases. Topics covered and myths debunked include: backups, shrinks, fragmentation, statistics, and much more! Focus will be on 2005 but we'll explain some of the key differences for 2000 and 2008 as well.

Session 2: Corruption Survival Techniques

Your database is corrupt - what do you do? Well, it depends! How critical is the data? Do you know what's really wrong with the database? What does all that DBCC CHECKDB output mean? Should you restore or repair? It’s all about limiting downtime and data-loss when a corruption occurs - from knowing the tools to understanding the implications of choices you make. In this demo-heavy session Paul and Kimberly will give you insight into how to recover from corruption without making things worse. Most importantly you'll get step-by-step instructions for dealing with the more common scenarios.

Session 3:  Are Your Indexing Strategies Working?

So you spent a bunch of time figuring out what indexes you should have while designing and testing your database. Now you're in production and six months have gone past. Are your strategies still valid now? Is SQL Server using the indexes you created? Are the users issuing the queries you thought they would - or are you missing crucial indexes? More importantly - how can you figure any of this out? In this demo-heavy session, Paul and Kimberly will show you how to analyze what's currently going on with your database and how to bring your initial strategy up-to-date. Come along to this session to help you find out what you might not know about your workload!
The whole conference line-up looks great - with a bunch of our SQL MVP friends doing sessions too. So - no matter which conference you come to, we're really looking forward to meeting new people and seeing some of the your faces again!

Phew - last week Kimberly and I spent 3 days teaching the ins-and-outs of SQL Server 2008 for DBAs/IT-Pros to about 130 Microsoft SQL Server experts and MVPs (like Kalen Delaney, Adam Machanic and Ron Talmage). This was the (95% complete) Beta delivery of a course we've been developing for the last six months for Microsoft that they'll use to train their SQL experts around the world on the new release. It's been very interesting watching the features develop through the CTPs (especially since I left the fold last August) - and making demos work on pre-release builds of the CTPs.

Teaching the course was a *blast* - the thing I love about teaching a really geeky crowd is the plethora of great questions and opportunities for going deep with explanations. Our team actually wrote and delivered the concurrently presented Developer and BI tracks as well. As you can see from the list below (and this is just the features a DBA needs to use/know about), SQL Server 2008 isn't a dot release of Yukon at all, as some people have suggested. Over the three days we covered:

  • Database Mirroring (D)
  • Backup Compression
  • Peer-to-Peer Replication (D)
  • Transparent Data Encryption (D)
  • Extensible (Off-Box) Key Management
  • All Actions Audited (D)
  • Policy-Based Management
  • Resource Governor (D)
  • Extended Events (D)
  • Spatial Indexes
  • Integrated Full-Text Search
  • Sparse Columns (D)
  • Filtered Indexes
  • Change Tracking
  • Change Data Capture (D)
  • FILESTREAM (D)
  • Performance Data Collection
  • Query Optimizer Enhancements
  • Data Compression (D)
  • Service Broker
  • Partition-Level Lock Escalation (D)

The features marked with a (D) are ones I demo'd during the course (Kimberly demo'd a bunch of the others - especially the tools features). Some of the demos were challenging to make work in time as we only got a pre-CTP6 build mid-January just before we headed off to China.

So why am I posting this? Well, a bunch of these features are in CTP-6, which should be just around the corner, and I have some easy-to-understand demos of them that I'll be posting here over the next month or so. Also, if this course sounds interesting, Kimberly and I will be teaching it in various configurations over the next year - starting with SQL Connections in April, a soon-to-be-announced class in Iceland in March, and the ITPro portion of TechEd in June.

Watch this space starting next week (today's the last day of six straight weeks of teaching for us so this weekend's a break :-))

After many reminders (thanks Adam Machanic!) I've added Conor and Simon to the two aggregated feeds over all the SQLskills.com blogs.

There are two feeds:

  1. SQL Server 2008 Category Posts
  2. All posts

The amount of content is really growing as Simon and Conor also seem to not sleep like me :-)

Enjoy!

Categories:
General | SQL Server 2008

The second post in my series on FILESTREAM (see here) deals with how to enable FILESTREAM through T-SQL and using SSMS.

You can't just create FILESTREAM data - you have to specifically enable it at the instance level and then possibly restart the instance or host computer. The reason for this is that there's a file-system filter driver (called RsFx) installed to aid access to the data through the NTFS Streaming APIs. The rules for this are:

  • The computer needs to be restarted if this is the first time FILESTREAM has been enabled on any instance hosted on the machine
  • The instance needs to be restarted if you disable FILESTREAM after its been enabled

In the previous article I mentioned that FILESTREAM can't be enabled on a mirrored database - there's another restriction I forgot: FILESTREAM isn't supported for instances running on WOW64. I don't remember the architectural limitation that prevents this but I do remember the heated discussions in the dev team back in early 2007.

Anyway, I digress. You can enable FILESTREAM through T-SQL using the sp_filestream_configure stored procedure. It takes two parameters @enable_level and @share_name. The level of support is configured through the first and has the following options:

  • 0 - FILESTREAM is disabled for the instance
  • 1 - FILESTREAM is enabled for T-SQL access only
  • 2 - FILESTREAM is enabled for T-SQL AND local file-system access
  • 3 - FILESTREAM is enabled for T-SQL, local file-system, AND remote file-system access

If level 3 is specified, then the a share name for remote access must also be specified. For example:

EXEC sp_filestream_configure @enable_level = 3, @share_name = 'MyFilestreamSQLServerInstance';
GO

Note that once the share name is specified, it can't be changed without disabling and re-enabling FILESTREAM on the instance.

Now, if you don't want to have to remember the options you can use SSMS to do the enabling (with no change to the restart requirements listed above). To do this you right-click on the server name in the Object Explorer and select Properties. When the Server Properties window comes up, select the Advanced tab and right at the top is the FILESTREAM section. Here's what I mean:

Next up is creating a FILESTREAM filegroup and adding data.

Categories:
FILESTREAM | SQL Server 2008

Microsoft has announced that the RTM date of SQL Server 2008 has slipped out to Q3 of this year. See this official blog post here. The official launch will still go ahead as planned on February 27th.

While some people may see this as disappointing, I don't see many large customers going straight into SS2008 when it comes out so I, for one, am glad they're taking the time to ensure a high-quality release.

Categories:
SQL Server 2008

Well it's the end of our first week in China - tomorrow we fly to Beijing for another week of teaching. Hopefully we'll get to do some sightseeing on Sunday - the last time I was in Beijing (for TechEd '06) the conference hotel was about 25 miles outside the city centre - so I haven't really been to Beijing yet.

During this week I've been playing with FILESTREAM for some demos I'm writing for a Microsoft class I'll be teaching when I get back to Redmond. The class is about SQL Server 2008 for DBAs and the attendees will be a bunch of Microsoft Field personnel and SQL Server MVPs. Anyway, as I was playing, I realized that I hadn't blogged anything about FILESTREAM yet, so this is the start of a series of posts about the feature. First up - what is it?

One problem that SQL Server users face is how to store related structured and unstructured data (BLOBs) together while:

  • Providing a way to keep the data in sync (transactionally consistent)
  • Providing fast streaming access to the BLOBs
  • Keeping costs low
  • Enabling scalability
  • Providing ease of management

There's also been a strong desire for a data type that supports BLOB values greate than 2GB for many years.

Before SQL Server 2008, the solutions centered around:

  1. Storing the BLOBs in the file system
    • Advantages: low cost per GB; great streaming performance
    • Disadvantages: difficult to integrate with structured data; complex app development/deployment
  2. Storing the BLOBs on a dedicated BLOB store
    • Advantages: good scale/expandability; cost decreases as scale increases
    • Disadvantages: difficult to integrate with structured data; complex app development/deployment; streaming support/performance is solution dependent
  3. Storing the BLOBs in a database
    • Advantages: integrated management; BLOBs integrated with structured data; easy app development/deployment
    • Disadvantages: high cost per GB; poor streaming performance; 2GB size limit per BLOB

Enter FILESTREAM. It provides the following:

  • BLOB data is stored in the file system but rooted in the database (in the table of which it is part)
  • BLOB data is kept transactionally consistent with structured data
  • BLOB data is accessible through T-SQL and the NTFS Streaming APIs - with great streaming performance
  • BLOB size is limited only by the NTFS volume size
  • Manageability is integrated into SQL Server

Sounds pretty good eh? Well, it is - mostly. There are a few drawbacks with v1 however:

  • Database mirroring cannot be configured on databases with FILESTEAM data
  • Database snapshots don't snapshot FILESTEAM data
  • FILESTREAM data can't be natively encrypted by SQL Server

Personally, I think the first of these could be a major barrier to adoption in the enterprise - hopefully this restriction will be lifted in v2.

So - there's a taster. In the next few posts I'll detail how to enable FILESTREAM and create FILESTREAM data.

Categories:
FILESTREAM | SQL Server 2008

One of the cool features of SQL Server 2008 for developers is spatial data support. There have been some great posts recently about using spatial (see Simon Sabin's blog here), which is all developer stuff - but what I'm interested in are the implications of spatial support for DBAs, and they are focused on spatial indexes

Note: all images in this post are taken from November CTP Books Online

There are two kinds of spatial data that 2008 supports - planar (i.e. points, lines, polygons on a single 2-D plane) and geodetic (i.e. points, lines, polygons on a geodetic ellipsoid - for example, the Earth). These are presented in SQL as geometry and geography data respectively. A common operation that's performed on spatial data is comparing two spatial values to see if they intersect at all. Now, this is a complicated calculation, which gets more computationally expensive as the complexity of the spatial values increase. Given a problem of 'which spatial values in this table does this spatial value X intersect with', it would be great to have some way of quickly pruning out spatial values in the table that cannot possibly intersect with X, and so avoid doing the expensive calculation for them. Enter spatial indexes.

Here's the basic idea behind a spatial index:

  • A plane is broken up into a grid of cells.
  • Each spatial value is evaluated to see which cells in the grid it intersects with
  • The list of cells is stored along with the primary key of the table row that the spatial value is part of
  • Comparing two spatial values for intersection is a matter of comparing the list of grid cells - if there are no matching cells, the spatial values do not intersect, and there's no need to do the expensive intersection calculation

In practice its a bit more complicated. For planar data (i.e. the geometry data type), you need to define a bounding box (i.e. 4 corner points that define a rectangle of space in which you're interested on the 2-D plane). That bounding box on the plane will be broken down into a grid of cells. The top-level grid can be up to 16x16, giving 256 cells. The next level of granularity breaks each of those top-level grid cells into a further grid, again up to 16x16. So now there could be (16x16) x (16x16) cells in the grid - or 65536 cells. This obviously allows a more exact description of a spatial value in the list of cells. And so on and so on. There are actually 4 levels of grid that the bounding box is broken up into - and each can be 16x16, for a possible total of 168 or 4 billion cells. The picture below illustrates this with a grid size of 4x4 at each level.

The bounding box and the size of the grid at each level are specified when the spatial index is created, as well as the maximum number of matching grid cells to store in the spatial index per spatial value - to a max of 8192. Once the bounding box has been decomposed into the various levels of grid, each value in the spatial column is evaluated against the grid. The value is first decomposed against the first level grid. If the number of cells it matches is less than the max per spatial value, the decomposition then moves to the second level grid. This decomposition continues until the maximum number of matching grid cells is reached. If the max is reached while processing a deeper level for a cell, (e.g. in the middle of processing the 2nd level grid of 4x4 for cell #13 in the 1st level), the deeper level matches are thrown away and only the coarser granularity matching cell is stored (e.g. continuing that example, the 2nd level grid matches are discarded and only cell #13 in the 1st level will be stored). The picture below helps to illustrate this.

So, each geometric spatial value is approximated in the spatial index by a list of matching cells in the defined bounding box. As there is a limit to the number of matching cells that can be stored in the approximation, it is an optimistic representation. This means that if two values are compared using the approximations, there will be no false negatives, as the approximations map a larger space than the actual spatial values. There can, however, be false positives. A false (or real) positive means the spatial values need to then be compared using the complex, computationally expensive intersection algorithm using the actual spatial values. So again, the spatial index serves as a way of pruning out the need to run the expensive algorithm.

The algorithm is very similar for geodetic data (i.e. the geography data type), however there's no bounding box. Instead, the entire geodetic ellipsoid is projected onto a 2-D plane and then the grid decomposition algorithm is applied to that plane in exactly the same way as for planar data. The picture below describes how the projection is done.

You may have already realized that the effectiveness of the spatial index in pruning is directly proportional to how exactly the approximations in the index actually describe the spatial values. In other words, the higher the number of grid cells at each level, and the higher the number of grid cell matches that are stored per spatial value in the index, the better the index is at pruning. More exact approximations require storing more matching grid cells at deeper granularities - i.e. taking MORE SPACE. Creating a more exact spatial index takes more space.

With all that in mind, the interesting thing for DBAs here is that there's a trade-off between CPU use to do the real intersection algorithm and spatial index size to use in pruning calls to the algorithm. It's too early to know what best practices there are - but I'll blog them as I here about them.

Back in October 2007 I blogged about partition-level lock escalation in SQL Server 2008 (see here) and I promised to do a follow-up once CTP-5 came out with syntax etc. So here it is.

A brief recap - lock escalation in SQL Server 2005 and before only allowed table-level lock escalation. If you have a partitioned table with queries going against different partitions, then table-level escalation is a pain because the whole table is suddenly locked and concurrent queries against distinct partitions can't run. SQL Server 2008 gives the ability to escalate to a parttition lock, which won't affect the queries on the other partitions.

The lock escalation policy can only be set with ALTER TABLE after a table has been created, and the policy can only be set at the table level. The syntax is

ALTER TABLE TableName SET (LOCK_ESCALATION = TABLE | AUTO | DISABLE);

The options mean:

  • TABLE - escalation will always be to the table level. This is the default.
  • AUTO - escalation will be to the partition level if the table is partitioned; otherwise it will be to the table level
  • DISABLE - escalation will be disabled. This does not guarantee that it will NEVER occur - there are some cases where it is necessary (Books Online gives the example of scanning a heap in the SERIALIZABLE isolation level)

The only way I could find to check what the escalation policy for a table is set to is to use the sys.tables catalog view:

SELECT lock_escalation_desc FROM sys.mytables WHERE name = 'TableName';

Let's try it out. Here's a script that creates a database with an example table with 3 partitions. The partition ranges are negative infinity to 7999, 8000 to 15999, 16000 to positive infinity.

CREATE DATABASE LockEscalationTest;
GO

USE LockEscalationTest;
GO

-- Create three partitions: -7999, 8000-15999, 16000+
CREATE PARTITION FUNCTION MyPartitionFunction (INT) AS RANGE RIGHT FOR VALUES (8000, 16000);
GO

CREATE PARTITION SCHEME MyPartitionScheme AS PARTITION MyPartitionFunction
ALL TO ([PRIMARY]);
GO

-- Create a partitioned table
CREATE TABLE MyPartitionedTable (c1 INT);
GO

CREATE CLUSTERED INDEX MPT_Clust ON MyPartitionedTable (c1)
ON MyPartitionScheme (c1);
GO

-- Fill the table
SET NOCOUNT ON;
GO

DECLARE @a INT = 1;
WHILE (@a < 17000)
BEGIN
INSERT INTO MyPartitionedTable VALUES (@a);
SELECT @a = @a + 1;
END;
GO

Now I'm going to explicitly set the escalation to TABLE and start a transaction that should cause lock escalation.

ALTER TABLE MyPartitionedTable SET (LOCK_ESCALATION = TABLE);
GO

BEGIN TRAN
UPDATE
MyPartitionedTable SET c1 = c1 WHERE c1 < 7500;
GO

We should be able to see the locks being held:

SELECT [resource_type], [resource_associated_entity_id], [request_mode],
[request_type], [request_status] FROM sys.dm_tran_locks WHERE [resource_type] <> 'DATABASE';
GO

resource_type   resource_associated_entity_id request_mode   request_type   request_status
--------------- ----------------------------- -------------- -------------- ----------------
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
OBJECT          2105058535                    X              LOCK           GRANT

Just as we expected - an X table lock. Trying any query against the table fails now. Now I'll rollback that transaction, set the escalation to partition-level and try again.

ROLLBACK TRAN;
GO

ALTER TABLE MyPartitionedTable SET (LOCK_ESCALATION = AUTO);
GO

BEGIN TRAN
UPDATE
MyPartitionedTable SET c1 = c1 WHERE c1 < 7500;
GO

SELECT [partition_id], [object_id], [index_id], [partition_number]
FROM sys.partitions WHERE object_id = OBJECT_ID ('MyPartitionedTable');
GO

SELECT [resource_type], [resource_associated_entity_id], [request_mode],
[request_type], [request_status] FROM sys.dm_tran_locks WHERE [resource_type] <> 'DATABASE';
GO

partition_id         object_id   index_id    partition_number
-------------------- ----------- ----------- ----------------
72057594039042048    2105058535  1           1
72057594039107584    2105058535  1           2
72057594039173120    2105058535  1           3

resource_type   resource_associated_entity_id request_mode   request_type   request_status
--------------- ----------------------------- -------------- -------------- ----------------
HOBT            72057594039042048             X              LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
OBJECT          2105058535                    IX             LOCK           GRANT

Excellent - the object lock is now IX rather than X, and the X lock is at the partition (HOBT) level for partition 1 (see the bold highlighting to match the partition ID with the lock resource). (For an explanation of HOBTs, see my post Inside The Storage Engine: IAM pages, IAM chains, and allocation units.) So now we should be able to do something with another partition - let's see if we can cause another partition level X lock in another connection:

USE LockEscalationTest;
GO

BEGIN TRAN
UPDATE
MyPartitionedTable set c1 = c1 WHERE c1 > 8100 AND c1 < 15900
;
GO

SELECT [partition_id], [object_id], [index_id], [partition_number]
FROM sys.partitions WHERE object_id = OBJECT_ID ('MyPartitionedTable'
);
GO

SELECT [resource_type], [resource_associated_entity_id], [request_mode],
[request_type], [request_status] FROM sys.dm_tran_locks WHERE [resource_type] <> 'DATABASE'
;
GO

partition_id         object_id   index_id    partition_number
-------------------- ----------- ----------- ----------------
72057594039042048    2105058535  1           1
72057594039107584    2105058535  1           2
72057594039173120    2105058535  1           3

resource_type   resource_associated_entity_id request_mode   request_type   request_status
--------------- ----------------------------- -------------- -------------- ----------------
HOBT            72057594039107584             X              LOCK           GRANT
HOBT            72057594039042048             X              LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
METADATA        0                             Sch-S          LOCK           GRANT
OBJECT          2105058535                    IX             LOCK           GRANT
OBJECT          2105058535                    IX             LOCK           GRANT

Now we have two partition X locks, for partitions 1 and 2 (as expected - use the color coding above to match up the IDs), plus two table-level IX locks (one for each  connection, as expected). Very cool!

Now I'm going to force a deadlock - by having each connection try to read a row from the other locked partition:

Connection 1:

SELECT * FROM MyPartitionedTable WHERE c1 = 8500;
GO

Conneciton 2:

SELECT * FROM MyPartitionedTable WHERE c1 = 100;
GO

Connection 2 succeeds but on connection 1 we get (as expected):

(local)\SQLDEV01(SQLHAVPC\Administrator): Msg 1205, Level 13, State 18, Line 1
Transaction (Process ID 51) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

This illustrates a potential problem with this new mechanism - applications that used to rely on the blocking nature of X table locks may now exhibit deadlocks if partition-level escalation is turned on in production without any testing. In fact, this mode was specifically chosen NOT to be the default setting for new tables because some trial workloads exhibited deadlocks during testing. Don't just turn it on in production without testing - as with any other option or feature.

Quickie today as I'm preparing to speak at the Pacific Northwest SQL Server User Group meeting tonight on the MS Campus here in Redmond (my trademark Detection and Recovery from Database Corruptions talk).

SQL Server 2005 introduced the concept of hot-add memory, to allow for dynamic workload handling. SQL Server 2008 increases these capabilities by adding hot-add CPU as well. 'Hot-add' means being able to plug in a CPU while the machine is running and then reconfigure SQL Server to make use of the CPU ONLINE! (i.e. no application downtime required at all)

There are a few restrictions:

  • You need a 64-bit system that support hot-add CPU (obviously :-))
  • You need Enterprise Edition of SQL Server 2008
  • You need Windows Server Datacenter or Enterprise Edition

When you plug in the new CPU, SQL Server won't automatically start using it. If you think about it, it can't - you may not want that CPU to be used by SQL Server - so it has to be told that it can use it. You do that by setting the appropriate affinity masks and then running a RECONFIGURE.

Ok - now we come to the bit that needs the "(and affinity masks)" in the title. What's an affinity mask? In a nutshell, it's a bitmap of all the CPUs on the machine that specifies which ones are available for general SQL Server Engine use, and which ones are available only for SQL Server I/O. There are two kinds of masks, a regular affinity mask, and an affinity I/O mask. The regular affinity mask, if all zeroes, says that Windows decides who get's what CPU when. If the affinity mask is non-zero, then there's a bit per CPU. If it's set to 1 then SQL Server can use the CPU. If a bit is set to 1 in the affinity I/O mask, then the CPU can only be used for I/O. A bit cannot be set in both masks. Now - an affinity mask is 32-bits wide, so if you have more than 32 CPUs, you need to use two more affinity masks, called affinity64 and affinity64 I/O. These do the same thing but for CPUs 32-63 on the machine.

So, very cool, especially for those of you that can afford such hardware - I can't so I don't have a box to test it on (the 64-bit server we have here at SQL skills doesn't support it).

Back in September last year I blogged about the native Backup Compression that is in SQL Server 2008 (see here) and promised to blog more when CTP-5 came out and I ran some tests. Well, it's here and I have so here's some data for you.

I expanded the AdventureWorks database to be 322Mb (random size, but big enough to get a decent sized run-time on my server). I used System Monitor to capture %user-mode CPU time, plus backup/restore throughput for a compressed and uncompressed backup operation, and then restores.

1) For the uncompressed backup the average CPU was 5% (the green line at the bottom), the run-time was 39.5s, and, of course, it took 322Mb to store the backup.

2) For the compressed backup the average CPU was way higher at 25%, BUT the run-time was 21.6s (a 45% improvement), and the backup was stored in 76.7MB (a 4.2x compression ratio). Very cool.

3) For the restore of the uncompressed backup the average CPU was 8%, and the run-time was 71.0s.

4) For the restore of the compressed backup the average CPU was 14.5%, and the run-time was 36s (a 50% improvement).

So - to summarize, turning on compression means more CPU and smaller run-times - just what was expected. Note that if you try this on your database you will see different results - the compression ratio and CPU usage is entirely dependent on the data being compressed.

Hit another issue last night while playing with the latest CTP - VS2005 can't cope with some of the changes in SS2008. The problem arose while defining an ASP.NET connection string that was pointing to a 2008 instance (while playing with query notifications and cache invalidation, but that's unrelated) - it failed with a slightly misleading error message from VS that my SQL Server instance had to be SQL Server 2005 Beta 2 or above!

Turns out there's a fix already (which works perfectly) - see http://www.microsoft.com/downloads/details.aspx?FamilyID=e1109aef-1aa2-408d-aa0f-9df094f993bf&displaylang=en. Thanks to Bob Beauchemin for helping me out.

Categories:
SQL Server 2008 | Tools

I came across an interesting bug in Management Studio in the latest Katmai CTP today - when a connection disconnects from a database, SSMS doesn't release the shared database lock that it holds. This prevents any operations that need exclusive database access (like a RESTORE) and can be somewhat disconcerting if you don't realize what's going on. Try executing the following in SSMS in the November CTP:

USE master;
GO

CREATE DATABASE MySSMSTest;
GO

USE MySSMSTest;
GO

CREATE TABLE test (c1 INT);
GO

USE master;
GO

DROP DATABASE MySSMSTest;
GO

And you'll see:

Msg 3702, Level 16, State 4, Line 1
Cannot drop database "MySSMSTest" because it is currently in use.

It works perfectly in SQLCMD in the same CTP, and also in SQL Server 2005. You can work around this by doing something like:

ALTER DATABASE MySSMSTest SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE;

GO

Not exactly ideal - but, hey, it's only a CTP. This has already been reported in Connect as issue 320135 - I added a workaround.

I'll post any further bugs that I find here as well as making sure they're on Connect.

While we were in Barcelona we sat down with Richard Campbell and Greg Hughes from RunAs Radio to record a 1/2 hour interview on SQL Server 2008. We touch on a ton of different features (look at the number of Categories I've tagged this with!) and have a bunch of laughs along the way - check it out here.

PS There's been a ton of interest in the slide deck idea I had so we'll be going ahead with that. Look for an announcement sometime in the first few months of next year about how to get them. Thanks to everyone that replied!

One of the cool features added in SQL Server 2005 for scaling-out a workload was peer-to-peer replication. The major drawback was that to change an existing peer-to-peer topology, the entire topology had to be quiesced. In SQL Servr 2008, the Configure Peer-To-Peer Topology Wizard in Management Studio has undergone a major face-lift and a peer-to-peer topology can be altered ONLINE - very cool!

To get to the wizard, you still need to go through the clunkiness of enabling peer-to-peer subscriptions in the Subscription Options pane of the Publication Properties of a new publication (by right-clicking the publication under the Replication->Local Publications folder in Object Explorer):

Once that's set to True, you can right-click the publication and you'll see a Configure Peer-To-Peer Topology option which will bring up the new wizard. Here's what you'll see with only a single node configured - I've hovered the mouse over the node to get the tool-tip to show up:

By right-clicking anywhere on the design surface you get a menu, from which you can select to Add a New Peer Node. Of course the node you select has to already have been setup for replication otherwise you'll get an error. As long as replication is already setup, you'll get the Add a New Peer Node wizard:

You need to set the Peer Originator ID to a number that isn't already being used by another node in the topology - it would be nice if the wizard would default to an ID it knows isn't being used instead of 1.  You then decide whether to have peer-to-peer connections with all the other nodes in the topology automatically setup. If you don't check that option, the node will appear on the topology viewer, but with no connections, like below (again I've brought up the tool-tip so you can see it's a different node than the first one - in this case a different instance inside a VPC):

If you don't check the option, you can create connections manually by simply right-clicking either node and selecting Add a New Peer Connection. You'll see a rubber-banding arrow that you pull to the node you want to connect to. I tested the automatic method of creating the connections and it works nicely too. Here's a three node topology in the viewer:

The rest of the wizard is as before - setting the Log Reader Agent and Distributor Agent security settings and how to initialize the new peer. At any point you can go back into the wizard and add or remove connections or peers. I'm pretty impressed!

On the last day of SQL Connections a couple of weeks back we did a 20 minute TV interview with Steve Wynkoop of SSWUG. Apart from the mandatory mention of our favorite game Blokus, we discussed a bunch of new features coming in SQL Server 2008 while I struggled not to say 'we' instead of 'they' to describe the SQL team. I'm still in recovery I guess...

Check it out at http://www.sswug.org/sswugtv/seeshow.asp?sid=P227.

There's a well known problem that every time a backup operation succeeds a message is written into the error log and Windows event log. If you're taking very frequent log backups (say every 5 minutes) of multiple databases, that's a significant amount of clutter in the logs. Well - now there's a fix!

Kevin Farlee, the Storage Engine PM responsible for (among many other things) BACKUP and RESTORE has just blogged about a trace flag - 3226 - that's been in the product since SQL Server 2000 that will suppress the success messages. He's planning to document this (and other) trace flags in this area starting in SQL Server 2008. Excellent!

All the razzamatazz about new releases go on about what's new but hardly ever is there discussion of what's been removed. So that's the topic of this short post.

In the Books Online that comes with the July CTP of SS2008 (here's a link to the download page), the topic Discontinued Database Engine Functionality in SQL Server 2008 lists the following:

Category Discontinued feature Replacement

Aliases

sp_addalias

Replace aliases with a combination of user accounts and database roles. For more information, see CREATE USER (Transact-SQL) and CREATE ROLE (Transact-SQL). Remove aliases in upgraded databases by using sp_dropalias (Transact-SQL).

Backup and restore

DUMP statement

BACKUP

Backup and restore

LOAD statement

RESTORE

Backup and restore

BACKUP LOG WITH NO_LOG

None. The transaction log is automatically truncated when the database is using the simple recovery model. If you must remove the log backup chain from a database, switch to the simple recovery model.

Backup and restore

BACKUP LOG WITH TRUNCATE_ONLY

None. The transaction log is automatically truncated when the database is using the simple recovery model. If you must remove the log backup chain from a database, switch to the simple recovery model.

Backup and restore

BACKUP TRANSACTION

BACKUP LOG

Backup and restore

sp_helpdevice

Query the sys.backup_devices catalog view.

Compatibility level

60, 65, and 70 compatibility levels

Databases must be set to at least compatibility level 80.

DBCC

DBCC CONCURRENCYVIOLATION

None

Groups

sp_addgroup

Use roles.

Groups

sp_changegroup

Use roles.

Groups

sp_dropgroup

Use roles.

Groups

sp_helpgroup

Use roles.

Sample databases

Northwind and pubs

Use AdventureWorks. Northwind and pubs are available as downloads, or can be copied from a previous installation of SQL Server.

For more information, see AdventureWorks Sample Databases.

Most of these aren't going to bother people much I would guess but removing BACKUP LOG WITH NO_LOG is going to cause problems. Don't get me wrong, I think it's great that it's being removed (see my previous post for why I think it's evil) but it's going to break a lot of people who rely on it now for managing their transaction logs instead of taking log backups or switching to SIMPLE recovery mode.

As I mentioned in a previous post, one of the new features for database mirroring in SQL Server 2008 is log stream compression. There's a good explanation of what this is (along with some example workloads and performance graphs) on the SQL Customer Advisory Team blog so I'm not going to duplicate all that here.

Basically, the way mirroring works is by shipping all the transaction log from the principal to the mirror, and then replaying the log records on the mirror database. Network bandwidth can be a bottleneck for mirrored databases where the rate of transaction log generation is high enough to saturate the network. On synchronously mirrored databases (where a transaction cannot commit on the principal until all the transaction log for it has been hardened in the log of the mirror database), this can even lead to workload throughput degradation.

Log compression is a way to alleviate this problem. SS2008 Books Online states that compression rates of at least 12.5% are achieved - obviously the compression ratio could be much higher than that and is dependant on what's being compressed - i.e. the data that is being processed by the application. One of the tests they did for the post above shows a 5x compression ratio - that's pretty good.

The downside of this is that compression is not free - extra CPU must be used. The obvious CPU load increases comes from having to compress the transaction log on the principal and then decompress it again on the mirror before hardening it in the log. The non-obvious increase will come because the principal and mirror should end up being able to process more transactions/second, which means more CPU is needed.

This extra CPU could be a problem for some systems that are already pegged in terms of CPU load, and so they may actually see a drop in performance when log stream compression is enabled. Now, it's on by default when you upgrade to SS2008 but there is a way to turn it off. The blog post above divulges that there is a trace flag, 1462, that turns off log stream compression and effectively reverts the behavior back to SS2005.

Apart from the obvious benefits of this feature for systems that can afford the extra CPU, I'm excited because it may allow more mirrored databases to perform effective database maintenance - in the form of fragmentation control and index maintenance. When database mirroring is enabled, the database has to be running in the full recovery mode. This means that operations such as index rebuilds are fully logged - that's a lot of extra log being generated if you're used to switching to bulk-logged recovery mode to perform index operations. Of course, this isn't an issue for reorganizing an index (with ALTER INDEX ... REORGANIZE) as this has been fully logged (and has to be that way because of how it works) since I wrote the old DBCC INDEXDEFRAG for SS2000.

Anyway, for some customers the fact that these two operations are fully logged means that running them produces so much additional log that, in conjunction with the log from the workload itself, the network becomes a bottleneck for mirroring and it slows down both the maintenance and reduces the workload throughput. The addition of log stream compression means that for these systems, the log may be compressed enough so that the network is no longer a bottleneck and more regular fragmentation control can take place. This in turn will increase workload throughput as well - a double benefit!

Lots of people have been asking for us to create some aggregate feeds of the various blogs on the SQLskills site - well, now I've done it.

The three new feeds are:

SQLskills BI Team Blog

SQLskills SQL Server 2008 Category Aggregate Feed

SQLskills All Blogs Aggregate Feed

http://pipes.yahoo.com/pipes/pipe.run?_id=vv9PBOp13BGAc67kLO2fWQ&_render=rss. There's no FeedBurner equivalent for this feed as its larger than the maximum feed size that FeedbBurner can handle (512k).

Enjoy!

Categories:
General | SQL Server 2008

With the November 5-9th SQL Connections conference in Las Vegas around the corner, Shirley Brothers, the Connections Conference Manager, would like to start planning the Spring show. From the Spring show onwards, Kimberly and I will be the co-chairs of the SQL Connections conference. We would like to invite you to submit abstracts for the Spring 2008 SQL Connections conference, to be held in Orlando, April 20-24th, 2008.

The conference will take place shortly after the SQL Server 2008 launch (Feb 27, 2008), and will focus on best practices for SQL Server 2005, how to upgrade and migrate applications from SQL Server 200x to SQL Server 2008 and new features to leverage in SQL Server 2008. However, we will consider all topics - from design to performance to troubleshooting to architectures to programming. Also, we're likely to closely follow our past conference format with three primary themes/tracks: Infrastructure and Scalability, Programming, and Business Intelligence. You should aim for the intermediate to advanced audience for any SQL Server 2005 materials and for SQL Server 2008 - we'll consider any level - especially depending on the newness of the feature.

For submitting session abstracts, please use this URL: 
http://www.deeptraining.com/devconnections/abstracts

Please keep the abstracts under 200 words each and in one paragraph. Please do not use bulleted items or line breaks, and please use a spell-checker. If you have an issue with the site, please follow up with an email as well to
paul@sqlskills.com.

Please create a new speaker record (if you haven't submitted before) and submit at least 3 abstracts; however, it will help your chances of being selected if you submit 5 or 6 abstracts. We need to have all your abstracts by October 26th. 

Finally, please see some important notes about your sessions at Connections conferences. These comments are from our Connections Conference Manager - Shirley Brothers:


I would like to thank each and every one of our Connections speakers for helping us make this event so successful. Last year in Vegas we had just over 4,700 attendees; we hope to do that or better this fall. By the end of September we had over 3,000 registrations for our Fall show.

A successful show happens when you have a combination of things come together: great speakers, good venue, great partners, fun events, and ongoing relationship building. I hope we can all continue to work together to make Connections the very best event outside of Tech Ed, for years to come. As a more successful show, our competitors are trying to compete by moving their shows closer to our dates and in some cases, to the same cities.

It’s very disheartening for me to see our speakers presenting similar topics at competing shows that are scheduled so close to our own shows. So I want to make a small change in how we handle speakers. Essentially, I don’t want to schedule a speaker at a Connections show who is also presenting at one of our competitor’s shows, in the same state, within 30 days of a Connections show. Many of you have known me for years and you know that I would never discourage a speaker from doing something that is good for his/her career and company. I have never asked a speaker not to speak for a competitor and I am NOT saying that now. I am saying that if another show sits within 30 days of ours in the same state, that the speaker should choose to speak at one or the other conference, but not both. If for whatever reason a speaker does not choose Connections during that time period, there will NOT be hard feelings and they can speak for one of our shows at another time. 

I want Connections conferences to be different and unique in the minds of our attendees, our sponsors, and our speakers. I have given this a lot of thought and I think the best way to produce our Connections shows is not to have any of our speakers presenting at our show and at a competitor’s show a few weeks earlier or later in the same place.

Shirley

 
Thanks for your continued support for our SQL Connections conference and we look forward to seeing your bright and shiny new abstracts!

Thanks,
Kimberly L. Tripp & Paul S. Randal
SQL Connections Conference Chairs

Are you coming to SQL Connections in November? If you are, and you're coming to either of our interactive sessions, we invite you to send us questions in advance that we will pick from and answer during the sessions. The two sessions are:

  • SDB351: Follow the Rabbit - Interactive Q&A on Availability

In this session, Kimberly Tripp and Paul Randal will have only 5-10 slides. The focus of this session is on mixing availability technologies to create the best overall architecture to minimize downtime and data loss. In general, we're going to focus on best practices and then open up to your questions so that you can drive the discussion! This session might not seem as structured as other sessions, but you'll be surprised at how informative and fun it is! Grab your lunch and come back - we'll probably still be hanging out!

  • SDB349: Follow the Rabbit - Interactive Q&A on the Storage Engine and the Relational Engine

In this session, Kimberly Tripp and Paul Randal will have only 5-10 slides. Each slide cover topics for discussion as well as the reason(s) for why something might be behaving badly and/or things to try to solve your problems. In general we're going to focus on best practices and then open up to your questions so that you can drive the discussion! Paul will focus on the SE (Storage Engine) and internals and Kimberly will focus on the RE (Relational Engine) and query tuning/performance. This session might not seem as structured as other sessions, but you'll be surprised at how informative and fun it is!

Please send your questions to questions@sqlskills.com with a subject line of 'Follow the Rabbit'. If we read out your question and answer it during the session, we'll give you a prize - one of our DVDs full of hands-on labs to learn from (the ones we give out to pre-con and post-con attendees) - we'll definitely do a few per session. Thanks!

Paul and Kimberly

In SQL Server 2005, queries over partitioned tables use a single-thread per partition. This can cause performance problems under certain circumstances:

  1. On systems with many CPUs, if the table schema has less partitions than there are CPUs, then not all the CPUs will be used to process the query. Some examples:
    1. On a 32-way box, a query over a 12-partition table (e.g. a sales table partitioned by month) will only use 12 threads (one on each of 12 CPUs). This means 20 CPUs are potentially idle.
    2. On a 4-way box, a query over a 12-partition table only accesses a single partition so will only use one thread (on one CPU). This means 3 CPUs are potentially idle.
  2. On tables that have skewed data such that one partition is much larger than another, the length of time the query takes to complete will be bounded by the single thread processing the largest partition.

As part of the set of improvements in SQL Server 2008 for data warehousing there will be an option to change the threading behavior for queries over partitioned tables. The new, alternative model is that all available threads process part of each partition and then move into the next partition. This allows all available CPUs to take part in processing the query, which should lead to a drop in the query completion time.

The only time this model won't work is if the data is not in the buffer pool and is not spread out evenly across the available drives. For example, if an entire partition is stored on a single drive, then multiple threads will be scanning different portions of the drive, causing the disk head to thrash and IO throughput to drop sharply compared with a single thread driving the IO. For this reason, the option to use the new model will be off by default, to avoid surprising people with sudden bad performance after upgrading.

This should be available in the next CTP and then I'll post again with some example datasets and queries to see what the potential benefits and drawbacks are.

ITForum.gifITForum.gif

 

 

 

November's going to be a busy month for Kimberly and I! Hot on the heels of SQL Connections in Las Vegas is the yearly pilgrimage to Barcelona for TechEd IT Forum and this year we're packing in a ton of sessions in between us.

 

Here's what we're doing:

Sessions

  • (DAT205) The Next Release of Microsoft SQL Server: Manageability Overview

The next release of SQL Server will contain exciting new manageability features targeted at reducing total cost of ownership. Come learn more about what's in store in areas like policy-driven administration and performance data collection and analysis. The session focuses on the database engine.

  • (DAT301) SQL Server Indexing - Unravelling the Unknown

Knowing tips and tricks to indexing is extremely helpful and will help you to solve "known" query problems. But what's lurking in the unknown? Is SQL Server using your indexes? Or, do you have a bunch of indexes sitting around wasting space and negatively impacting performance? Finally, SQL Server 2005 has an answer! SQL Server 2005 DMVs (Dynamic Management Views) can provide you with valuable information about your current indexing strategies, what should be removed, and even what's missing. Do you know how to find this information, leverage it, and then programmatically respond to it? Come to this session to learn a few tips and tricks as well as how to figure out the unknown indexing problems!

  • (DAT305) Secrets to Fast Detection and Recovery from Database Corruptions

How can you tell whether your data is corrupt? If you have corruption, how do you work out what’s wrong with the database? How do you ensure you have a valid backup? If you don’t have a valid backup, how and what do you repair? If you do have a backup, how do you work out whether you should restore or repair? And at what granularity? How do you go about determining what went wrong in the first place? It’s all about limiting downtime and data-loss when a corruption occurs - from knowing the tools to understanding the choices to planning a successful strategy. Some of the features discussed:

  • Torn-page detection and page checksums
  • IO read-retry
  • Backup checksums
  • Consistency checks (DBCC CHECKDB and related commands)
  • Database repairs

Facing database corruption is almost inevitable in every DBAs career - make sure you're prepared when it happens to you.

Chalk-talks

  • (DAT01-IS) SQL Server Upgrade Best Practices, Tips, and Tricks

Even though SQL Server 2005 has been out for a while, many companies are only just getting ready to upgrade. Come to this session to learn best practices, tips, and tricks distilled from two years of customer experiences. We'll also touch on some issues you'll face going to SQL Server 2008 when it's released next year. Come to this session to learn and share - bring your questions and experiences!

  • (DAT02-IS) SQL Server 2005 Database Mirroring: Setup to Implementation to Monitoring

Database Mirroring is one of the most exciting technologies in SQL Server 2005. With more and more people including it in their Disaster Recovery Strategies, it's important to know when to implement Database Mirroring as well as the implications of the architecture you choose. In this session there will be no slides just demos that will explore how Database Mirroring works in its various configurations and how that may effect your performance. Join us and see database mirroring in action and get your questions answered!

  • (DAT07-IS) DBCC Internals

All DBAs should have heard of (and used) DBCC – especially for consistency checking. Get down deeper than 400-level with this session on how some of the most important DBCC commands work. Topics covered include CHECKDB, SHRINKFILE, INDEXDEFRAG, and more.

Instructor-led labs

  • (DAT03-ILL) SQL Server Always On Technologies Instructor-Led Lab: Part 1 - Database Mirroring

See Database Mirroring in action! From implementation to monitoring to failover, database mirroring provides an ideal solution for many disaster recovery scenarios and this session will prepare you to handle them with minimal downtime or data loss.

  • (DAT04-ILL) SQL Server Always On Technologies Instructor-Led Lab: Part 2 - Database Snapshots

Database Snapshots are useful in many situations: database maintenance, data recovery, and point-in-time data access. You can even create a snapshot on a mirror database to get better return-on-investment (ROI) on your high-avaialbility (HA) investments. In this session we will explain how database snapshots work as well as go through several exercises, including working with multiple database snapshots and creating database snapshots on a mirror database.

  • (DAT05-ILL) SQL Server Always On Technologies Instructor-Led Lab: Part 3 - Online Operations

The bane of any DBA's life is to have to take data offline to perform maintenance or recover from a disaster. The various Online Operations in SQL Server 2005 alleviate much of this frustration. This session will show you how to move a table online for better isolation and control, partition a table online, access a database that's partially damaged, and perform online piecemeal restore.

Lunchtime Demos

  • (DAT01-PD) Database Recovery Techniques

In this fast-paced demo session nasty things will be done repeatedly to database. Then the methods and approaches to recovery will be shown. Not for the faint-hearted!

So, just like SQL Connections the week before, serious amounts of info, tips, and tricks for you to geek-out on with us and take home to use. We hope to see a lot of you there!

Paul (and Kimberly)

SQL Server supports lock escalation - when the server decides to move from a large number of row or page locks on an object to a table-level lock. Sunil Agarwal posted a great description of lock escalation in SQL Server 2005 on the Storage Engine blog so I won't repeat it all here.

The problem with lock escalation is that it can be tricky to manage on systems that have conflicting requirements.

Disabling lock escalation

For example, if a table needs to support large batch updates with concurrent user queries, then having the batch update cause an escalation to a table-level exclusive lock prevents the user queries from running. There are a couple of documented trace flags that can be used to disable lock escalation:

  • 1211 - disables lock escalation totally and will allow lock memory to grow to 60% of dynamically allocated memory (non-AWE memory for 32-bit and regullar memory for 64-bit) and will then further locking will fail with an out-of-memory error
  • 1224 - disables lock escalation until 40% of memory is used and then re-enables escalation

The problem with these two trace flags are that they are instance-wide and turning them on can cause huge performance issues if a poorly-written application takes too many locks. It's not possible to disable lock escalation for a single table - until now!

SQL Server 2008 includes the ability to disable lock escalation per-table!! This is a fantastic step forward in concurrency management.

Changing the escalation mechanism

To extend the example above, what about if the table has multiple partitions? With the batch update only affecting a single partition and concurrent user queries going against other partitions, the escalation policy in SQL Server 2005 means that the batch update will escalate to a table-level exclusive lock and freeze out the user queries, even though they're going against different partitions. The only recourse is to disable lock escalation - until now!

SQL Server 2008 includes the ability to specify partition-level lock escalation instead of table-level lock escalation. And this is per-table! Very cool.

Summary

SQL Server 2008 will have ALTER TABLE syntax to specify per-table lock escalation management. The options will be:

  • Automatic determination of the level to escalate to. If the table is partitioned, locks will be escalated to the partition-level.
  • Table-level lock escalation (even if the table is partitioned).
  • Disable lock escalation

Once this feature is available in a CTP I'll blog about the syntax and supporting infrastructure, along with some examples.

Part of the improvements to database mirroring in SQL Server 2008 are the addition of 10 new performance counters under the Database Mirroring Performance Object. These add some cool troubleshooting capabilities.

SQL Server 2005 provided the following 11 counters (from Books Online):

Name Description

Bytes Received/Sec

Number of bytes received per second.

Bytes Sent/sec

Number of bytes sent per second.

Log Bytes Received/sec

Number of bytes of log received per second.

Log Bytes Sent/sec

Number of bytes of log sent per second.

Log Send Queue

Total number of bytes of log that have not yet been sent to the mirror server.

Pages Sent/sec

Number of pages sent per second.

Receives/sec

Number of mirroring messages received per second.

Redo Bytes/sec

Number of bytes of log rolled forward on the mirror database per second.

Redo Queue

Total number of bytes of hardened log that currently remain to be applied to the mirror database to roll it forward. This is sent to the Principal from the Mirror.

Sends/sec

Number of mirroring messages sent per second.

Transaction Delay

Delay in waiting for unterminated commit acknowledgement.

SQL Server 2008 now provides 21 counters, with the new ones highlighted in red. This info is taken from the 2008 July CTP 08Books Online that is downloadable here.

Name Description

Bytes Received/sec

Number of bytes received per second.

Bytes Sent/sec

Number of bytes sent per second.

Log Bytes Received/sec

Number of bytes of log received per second.

Log Bytes Redone from Cache/sec

Number of redone log bytes that were obtained from the mirroring log cache, in the last second.

This counter is used on only the mirror server. On the principal server the value is always 0.

Log Bytes Sent from Cache/sec

Number of sent log bytes that were obtained from the mirroring log cache, in the last second.

This counter is used on only the principal server. On the mirror server the value is always 0.

Log Bytes Sent/sec

Number of bytes of log sent per second.

Log Compressed Bytes Rcvd/sec

Number of compressed bytes of log received, in the last second.

Log Compressed Bytes Sent/sec

Number of compressed bytes of log sent, in the last second.

Log Harden Time (ms)

Milliseconds that log blocks waited to be hardened to disk, in the last second.

Log Remaining for Undo KB

Total kilobytes of log that remain to be scanned by the new mirror server after failover.

This counter is used on only the mirror server during the undo phase. After the undo phase completes, the counter is reset to 0. On the principal server the value is always 0.

Log Scanned for Undo KB

Total kilobytes of log that have been scanned by the new mirror server since failover.

This counter is used on only the mirror server during the undo phase. After the undo phase completes, the counter is reset to 0. On the principal server the value is always 0.

Log Send Flow Control Time (ms)

Milliseconds that log stream messages waited for send flow control, in the last second.

Sending log data and metadata to the mirroring partner is the most data-intensive operation in database mirroring and might monopolize the database mirroring and Service Broker send buffers. Use this counter to monitor the use of this buffer by the database mirroring session.

Log Send Queue KB

Total number of kilobytes of log that have not yet been sent to the mirror server.

Mirrored Write Transactions/sec

Number of transactions that wrote to the mirrored database and waited for the log to be sent to the mirror in order to commit, in the last second.

This counter is incremented only when the principal server is actively sending log records to the mirror server.

Pages Sent/sec

Number of pages sent per second.

Receives/sec

Number of mirroring messages received per second.

Redo Bytes/sec

Number of bytes of log rolled forward on the mirror database per second.

Redo Queue KB

Total number of kilobytes of hardened log that currently remain to be applied to the mirror database to roll it forward. This is sent to the Principal from the Mirror.

Send/Receive Ack Time

Milliseconds that messages waited for acknowledgement from the partner, in the last second.

This counter is helpful in troubleshooting a problem that might be caused by a network bottleneck, such as unexplained failovers, a large send queue, or high transaction latency. In such cases, you can analyze the value of this counter to determine whether the network is causing the problem.

Sends/sec

Number of mirroring messages sent per second.

Transaction Delay

Delay in waiting for unterminated commit acknowledgement.

Here's a little more explanation and what you can use these new performance counters to troubleshoot:

  • Log Bytes Redone from Cache/sec
    • This measures how much of the transaction log in the redo queue is being read by the log redo task from the mirror's in-memory transaction log cache. Reading from the cache is a lot faster than having to read from the mirror's actual transaction log. Even though the log gets hardened on the mirror database's log disk, it does not need to be removed from the cache until the cache fills up with new transaction log from the principal.
    • You could think of this as a cache hit ratio measure for the redo queue.
    • If this number is lower than usual, it means that transaction log is arriving from the principal faster than the log redo task can roll forward the transaction log in the redo queue.
  • Log Bytes Sent from Cache/sec
    • This is similar to the counter above. It measures how much of the transaction log being sent from the principal to the mirror is being read from the principal's in-memory transaction log cache. Sending from the cache is a lot faster than having to go to the transaction log itself and read from disk.
    • You could think of this as a cache hit ratio for the send queue.
    • If this number is lower than usual it means that the transaction log is being generated on the principal faster than it can be sent to the mirror.
  • Log Compressed Bytes Rcvd/sec
  • Log Compressed Bytes Sent/sec
    • These are simple counters and can be used with the Log Bytes Sent/sec and Rcvd/sec counters to determine the compression ratio. Log stream compression is another enhancement in SQL Server 2008 that I'll cover in a future post.
  • Log Harden Time (ms)
    • This measures the delay between the mirror server receiving a chunk of transaction log and it being hardened on the mirror database's log disk (i.e. the delay before the chunk of transaction log becomes part of the redo queue on the mirror server).
    • If this number is higher than normal it means the mirror database's log disk is more heavily loaded and may be  becoming saturated.
  • Log Remaining for Undo KB
  • Log Scanned for Undo KB
    • The Books Online entries for these counters are self-explanatory.
    • These counters give a way to monitor the undo phase after a failover occurs.
  • Log Send Flow Control Time (ms)
    • This measures how long a mirroring connection had to wait before it could us the mirroring flow control buffer.
    • If this number is higher than normal it means there is contention for the buffer, most likely because there are too many Database Mirroring partnerships running from a single instance.
  • Mirrored Write Transactions/sec
    • As Books Online mentions, this counts the number of transactions in the principal database that had to wait for a commit record to harden in the mirror database's transaction log.
    • If this value is lower than normal (for the same application workload) it means there is a bottleneck somewhere in the system.
  • Send/Receive Ack Time
    • As Books Online mentions, this can be used to measure network latency between the principal and mirror servers.
    • If this value is larger than normal it means that there is a network bottleneck between the principal and mirror servers.

Hopefully Microsoft will publish a whitepaper or some troubleshooting scenarios showing these counters being used.

One of the hottest features in SQL Server 2005 is database mirroring, and it's helped many companies implement successful and relatively inexpensive high-availability strategies. In SQL Server 2008, Database Mirroring has been enhanced in several ways - one of which is the ability to automatically repair corrupt pages!

This feature is based on the fact that the principal and mirror databases are exactly the same. So, if a page becomes corrupt on the principal, SQL Server should be able to read the page from the mirror and use it to fix the principal. Similarly, if a page becomes corrupt on the mirror, the page can be read from the principal to fix up the mirror. Pretty cool, eh?

Details

  • The feature works for pages that have 824 errors, 823 errors where the OS returns a CRC error while reading the page (to prevent resource issues triggering a page repair), and pages that have 829 errors (where the page is marked as restore pending). See my previous post here for more details on page errors.
  • Pages are fixed asynchronously.
    • If the page is corrupt in the principal, the query that hit the corrupt page will fail. Once discovered, a page is marked as being 829 until its fixed. This prevents an issue where a transient disk error could allow a subsequent update to change the page after it's been queued for being repaired, and then the page is overwritten with a copy from the mirror, losing the update. Nasty.
    • If the page is corrupt in the mirror (which is discovered when the page is read as part of the continual recovery of the log), the mirroring session is suspended. The mirror keeps track of all corrupt pages that need to be repaired with copies from the principal. Once all corrupt pages have been repaired, the mirroring session will be resumed automatically. This means that if a page is corrupt in both the mirror and the principal, manual intervention will be required to resolve the issue.
  • The feature is only available in Enterprise Edition. This means that if the principal is on Enterprise Edition and the mirror is on Standard Edition, then corrupt pages on the principal can repaired from the mirror but not the other way around.
  • There is a new DMV - sys.dm_db_mirroring_auto_page_repair - that allows you to track corrupt pages in mirrored databases 
    • It covers all mirrored databases on a server.
    • It provides info on the last 100 pages that were found in any mirrored database, as well as the status of the automatic page repair operation.
    • This DMV isn't yet included in the SQL Server 2008 Books Online available on TechNet but will have the following info:
      • Database ID the page is in
      • The Page ID, split into file and page-in-file
      • The error type - distinguishing between 823 errors, torn-page errors, page checksum failures, and all-other-824 errors
      • The status of the page repair operation
      • The time that the status was last updated
  • If a page repair fails for any reason (e.g. the mirroring partner couldn't supply the page) then the repair will be marked as failed. If the page is then hit again (by a query on the principal or a recovery operation on the mirror) then it will be re-queued for repair.
  • Not all pages can be repaired - the file header page, database boot page, and allocation bitmap pages (GAM, SGAM, PFS) cannot be repaired this way.

In Action

I've got a system with a few SQL Server 2008 instances running so I decided to give it a try. Here's what I did, using the TicketSalesDB from the Always-On DVDs and labs that SQLskills produces:

  • Changed the database to use page checksums
  • Rebuilt the clustered index of one of the tables (so the pages have page checksums on)
  • Took the initial backup of the database that's needed for mirroring (so that it has no corruptions in)
  • Used DBCC IND to find a page in the leaf level of the clustered index and corrupted the page
  • Setup mirroring (but using the backup I took before introducing the corruption - so the mirror database will have a clean copy of the page I corrupted)
  • Performed a query on the table with the corrupt page to force the page checksum failure and kick-off automatic page repair

And it worked! Going back into the instance with the principal database and querying the DMV gives:

C:\>sqlcmd /E /S.\KATMAI01
1> SELECT * FROM sys.dm_db_mirroring_auto_page_repair;
2> GO
database_id file_id     page_id              error_type page_status modification_time
----------- ----------- -------------------- ---------- ----------- -----------------------
          6           4                 4256         -1           5 2007-09-27 17:23:20.067

(1 rows affected)
1>

The page I corrupted was (4:4256) and page_status of 5 means the repair succeeded. Running the query again confirms that the corruption has been fixed. The page was also logged in the suspect_pages table in msdb:

1> SELECT * FROM msdb..suspect_pages;
2> GO
database_id file_id     page_id              event_type  error_count last_update_date
----------- ----------- -------------------- ----------- ----------- -----------------------
          6           4                 4256           5           1 2007-09-27 17:23:20.407

(1 rows affected)
1>

In Books Online (both 2005 and 2008), event_type of 5 means that the page was repaired.

I checked the SQL Server error log for the principal and this is what I found (the bolding is mine):

2007-09-27 17:17:10.41 spid25s     Database mirroring is active with database 'TicketSalesDB' as the principal copy. This is an informational message only. No user action is required.
2007-09-27 17:23:19.92 spid51      Error: 824, Severity: 24, State: 2.
2007-09-27 17:23:19.92 spid51      SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x55684fbe; actual: 0x16e84fbe). It occurred during a read of page (4:4256) in database ID 6 at offset 0x00000002140000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL.6\MSSQL\Data\TicketSalesFG2005Q1.NDF'.  Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
2007-09-27 17:23:19.96 spid26s     Database mirroring is attempting to repair physical page (4:4256) in database "TicketSalesDB" by requesting a copy from the partner.
2007-09-27 17:23:20.42 spid26s     Database mirroring successfully repaired physical page (4:4256) in database "TicketSalesDB" by obtaining a copy from the partner.

I also checked the error log for the mirror and there's nothing relevant in there. I guess the same info would be output to the mirror database's error log if the mirror became corrupted. It's reasonably simple to check this using similar steps as above, but corrupting the database before the initial backup is taken, fixing the corruption again after the backup and before mirroring starts (so the corruption is on the mirror but not the principal), and then triggering an update on the corrupt page. When the update is replayed on the mirror, the corruption will be hit and the page repaired in the same way. I'll check later and blog if there's any difference.

Summary

SQL Server 2008 introduces an enhancement to database mirroring that can bi-directionally pull pages between the principal and mirror databases to fix page corruptions. One word of caution I'd give is that this feature doesn't mean you can ignore these errors when they occur - you still need to do root-cause analysis on the corruption and take steps to prevent them happening again before a corruption occurs that automatic page repair cannot fix (as I mentioned above).

Nevertheless, this is a tremendously useful feature that's going to save a lot of downtime. Cool!

My previous post on Backup Compression reminded me to post about installing the latest SQL Server 2008 CTP (here's a link to the download page).

I've installed SQL Server 2008 many times, both within Microsoft and since leaving in August, but I've never done it on a system that has SQL Server 2000 installed too. Last week I decided to create a VPC image with a few instances of SQL Server 2008 on as well as SQL Server 2005 - basically updating the VPC image from the Always-On DVD that contains all the Hands-On Labs that Kimberly's written over the last few years for SQL Server 2005.

One thing I'd heard is that SQL Server 2008 won't install side-by-side yet with SQL Server 2000, or any SQL Server 2000 tools. I'm not sure whether this will be the case at RTM, but it's fair enough for a pre-RTM CTP. So, I tried it and sure-enough it failed. I then used Add/Remove Programs to de-install SQL Server 2000 and thought this would do the trick. Nope - the install still failed, thinking I had one or more SQL Server 2000 instances in the VPC. Next I tried deleting the 2000-specific directories from the C:/Program Files/Microsoft SQL Server directory. Nope - the install still failed. Finally, I fired up regedt32.exe and deleted the 2000-specific registry keys. Then the install finally worked.

So - to save you a bunch of time figuring this out, using Add/Remove Programs doesn't de-install SQL Server 2000 well enough to satisfy the SQL Server 2008 CTP setup.

Over the next few months I'll be blogging a lot about new features that are coming in SQL Server 2008 for DBAs and ITPros.

First up is Backup Compression. This has been one of most heavily requested features since before I joined the SQL team back in 1999 - for me it's really cool that it's in this coming release.

It's widely recognized that database sizes are growing significantly, and that the growth is being forced by many factors, such as companies requiring more data to be available online for longer (e.g. to comply with government regulations) or an increasing amount of data being digitized for storage. Although the per/GB cost of storage is falling, the actual cost of keeping a database available is increasing. Here's why:

  • Prior to SQL Server 2008, a backup of a VLDB (or any sized database really) takes, on average, the same amount of disk space as the data in the database itself (remember that database files can be sized much larger than the amount of data they hold). A prudent backup strategy is to have multiple backups online, and multiple redundant copies of each backup. Thus, the amount of disk space required increases rather quickly - more than offsetting the drop in storage costs.
  • Keeping safe copies of all database backups means they have to be physically copied to a separate location. Increasing database sizes translates into more bandwidth required to copy larger database backups (or lower bandwidth links get saturated for longer and longer).
  • If you have a well designed storage strategy, then you're probably using RAID - so the more backups you need to store, and the higher level of RAID you use, the more drives you need to store all these backup copies.

In addition to costs, the elapsed times of backup and restore operations need to be considered. Backup and restore operations are essentially IO-bound. For a backup in SQL Server 2005 and before, the whole database has to be read and then written to a new location - with the total IO size of the writes to the backup equalling that of the reads from the database. The same is true for a restore, with reads from the backup equalling writes to the restored database. (As an aside, a restore operation also has the added CPU cost of having to examine each page as it's read from the backup to figure out where it should be written to in the restored database - this usually makes a restore operation take 10-20% more elapsed time than a backup.)

The perfect solution to these problems, at the expense of sometimes-considerable CPU time, is to compress the database as it's written into the backup. This reduces storage space per backup, required network bandwidth per backup copy, and elapsed time for backup and restore operations. The last point - reducing elapsed time for restore operations - is especially important in disaster recovery situations, where anything that can help reduce database downtime is a good thing.

[Edit: I've added a little here based on comments I received - thanks Brent and Marc!]

A less ideal solution I've seen is to take a regular backup and then compress it after-the-fact before copying to other locations. While this is a reasonable solution, it requires more disk space than compressing the database as its backed up, and it's a more complicated procedure. It also increases the time to take the backup, as the compression is done in a seperate step.

An alternative to compressing the backup at all is to make the backup location a compressed NTFS directory. While this achieves the compression goal, it doesn't permanently compress the backup so doesn't reduce the network bandwidth required to copy the backup or the space needed to archive the backup to tape.

Up until SQL Server 2008, the only solutions for compression-during-backup have come from third-party software vendors. Although these solutions do the job of aleviating the problems I've described above, there are two major roadblocks to their adoption that I've heard from SQL Server customers:

  1. You need to buy another software license as well as SQL Server - this can be pretty expensive for a large number of SQL Server installations. Management is also a headache, to ensure that all sites that may need to decompress the backup have the correct software installed.
  2. Some IT shops are Microsoft-only, which precludes the use of any software not supplied by Microsoft.

In SQL Server 2008, Microsoft will include a long-awaited and much-asked-for backup compression solution - eliminating the two roadblocks above. This is a fantastic first step improving backup/restore functionality - hopefully in the release after SQL Server 2008 we'll see further innovations that will allow encrypted backups, table-level restore, and easier validation of the database stored in a backup.

Some points to note:

  • Adhering to the principal-of-least-surprise, backup compression will be off by default, with very simple syntax to turn it on - directly with T-SQL or through the tools.
  • The compression algorithm used is proprietary to Microsoft and has yielded similar compression ratios to well-known third-party products.
  • During Microsoft's in-house testing on real -world customer databases, average compression ratios of 5:1 have been observed.
  • A backup set will not be able to contain both compressed and uncompressed backups.
  • None of the existing functionality will be altered by compression - e.g. the operation of WITH CHECKSUM or WITH CONTINUE_AFTER_ERROR (see here for more info on those options).
  • A restore operation will be able to tell automatically whether a given backup is compressed or not and just do the right thing.

Once backup compression is available in a public CTP, I'll blog some sample scripts and report on things like:

  • the compression ratios, elapsed time differences, and CPU usages for a few sample databases
  • the varoius configuration options available
  • any differences in the MSDB backup history tables or the output from RESTORE HEADERONLY/LABELONLY

In summary, backup bompression is a very exciting feature and should enable many customers to save money, either on additional software licenses or on storage/network costs, and time, especially in all-important disaster recovery situations.

Thanks

120x240_SQLConn_IBT.jpg120x240_SQLConn_IBT.jpg

 

 

The Fall conference season is on us again! Building on the great reaction to our co-presented Database Maintenance workshop at SQL Connections in Orlando, Kimberly and I are doing a *ton* of stuff at SQL Connections this Fall. The conference is in Las Vegas (where I've never been!) and runs from November 5th to 8th, with pre-con workshops on the 5th. We've got so much cool stuff to talk about that as well as doing a pre-con on the 5th, we're also doing a pre-pre-con on the 4th AND a post-con workshop on the 9th! And if that's not enough to help us lose our voices during the week, we're also doing 5 conference sessions between us! Tuesday 6th is Microsoft day and the session line-up looks great - lots of juicy details about SQL Server 2008 - as well as some best-practices sessions for those of you who are happy with SQL Server 2000 or 2005 for now.

 

 

Here's what we're doing:

Workshops

  • November 4th - Pre-pre-con: SPR301: SQL Server Database Maintenance: From Planning to Practice to Post-Mortem

No matter how much effort you spend on the design of your database, if you don't maintain it in production then it will suffer from performance and manageability problems. The key to continued performance and smooth operations is a well thought-out and automated database maintenance plan. This full-day workshop has three sections: planning, practice, and post-mortem. Planning for database maintenance actually starts with database design, so one of the things covered will be how to avoid design choices that limit database maintenance or contribute to maintenance problems. We'll discuss a laundry-list of maintenance problems and then explore how to tell if you need to mitigate them, strategies and best-practices for doing so, and how to avoid having your mitigation choices cause unforeseen and undesirable side-effects. Topics covered will include database files (shrink, grow, virtual log files, log size/management), consistency checks and corruption detection, fragmentation, statistics, backup/restore (options, granularity, strategies) and recovery models. The workshop will vary between 200-400 level covering ALL the key concepts of maintenance features. Finally, we'll spotlight some real-world examples where people made good and bad choices and discuss how you can repeat or avoid them, respectively. If you're wondering how to bring your database back under control, and keep it there, then this full-day workshop will help you tame maintenance problems whether you're a full-time system administrator or a reluctant DBA.

  • November 5th - Pre-con: SPR303: SQL Server Disaster Recovery: From Planning to Practice to Post-Mortem

Every DBA's nightmare is having down time and data loss and not knowing how to recover. However, designing and implementing a successful disaster recovery strategy is easier said than done. It's about asking all the right questions and figuring out all the best answers for your situation. This full-day workshop has three sections: planning, practice and post-mortem. Planning is a critical part of disaster recovery, but the most-often disregarded. Topics we'll cover here include: How do you choose technologies to fit requirements and effectively use key features of SQL Server 2005? How do technology choice affect workload performance? Putting a well-thought out plan into practice requires even more planning and in this section we'll discuss technology implementation, building step-by-step recovery/operation guides for when disasters happen, and, most importantly, testing your plan by simulating real problems. In the final section, we'll spotlight some real-world examples where people made costly mistakes and show you how they could have been avoided with a little planning and practice. If you've ever had nightmares about disaster recovery (or actually had a disaster!) and been at a loss for what to do, then this full-day workshop will give you the direction and technical details you need for success!

  • November 9th - Post-con: SPS302: SQL Server - Put Your Knowledge Into Action (Bring Your Own Laptop)

After a week of learning and watching demos - spend your last conference day putting your knowledge into action and diving deeper into the implementation details. Bring your own laptop to install our VPC environment setup with hands-on lab exercises to walk you through some of our most important features in Database Maintenance and Disaster Recovery. All labs will be ILLs (instructor-led labs) with supporting hands-on lab content *and* you will walk away with your own copy of the DVD to continue the exercises back at your office. You can attend without a laptop but your experience will be significantly better with one! This is meant as an advanced workshop and you're expected to bring a reasonable laptop configuration in order to participate: * Virtual Server or Virtual PC - already installed * At least 1GB of physical memory w/512MB dedicated to the VPC environment (2GB is preferred w/1GB dedicated to VPC) * 12 GB of physical disk space (20+ GB is preferred) * DVD Drive

Sessions

  • SDB351: Follow the Rabbit - Interactive Q&A on Availability

In this session, Kimberly Tripp and Paul Randal will have only 5-10 slides. The focus of this session is on mixing availability technologies to create the best overall architecture to minimize downtime and data loss. In general, we're going to focus on best practices and then open up to your questions so that you can drive the discussion! This session might not seem as structured as other sessions, but you'll be surprised at how informative and fun it is! Grab your lunch and come back - we'll probably still be hanging out!

  • SDB350: SQL Server Table Strategies - Designing for Performance and Availability

Often tables are designed based solely on the data that needs to be tracked (here's a column name, here's a data type - done!). Unfortunately, design does not usually take into account how the data is going to be used OR how SQL Server uses the data. Knowing the internals of table structures as well as the optmizations that come with good design will make your database truly scalable. Come to this session to learn some internals as well as various design strategies such as vertical and horizonal partitioning. Additionally, are there any other features that require changes in your design and thinking? For example, online index operations impact design because of the limitations that exist with partitioning and LOB columns. If you want to scale, you need to be here!

  • SDB347: SQL Server Indexing for Performance - Finding the Right Balance

In terms of performance tuning, there are few silver bullets. If I had to choose ONE area that improves performance the most (when designed appropriately!), it's indexing. However, indexing strategies depend on the data and even more so, the usage of the data. Come to this session to see what indexing strategies help the base table the most as well as how to optimize your worst performing queries.

  • SDB348: SQL Server Indexing Strategies - Are You Sure?

Knowing tips and tricks to indexing is extremely helpful and will help you to solve "known" problems. But what's lurking in the unknown? Is SQL Server using your indexes? Or, do you have a bunch of useless indexes? Finally, SQL Server 2005 has an answer! SQL Server 2005 DMVs (Dynamic Management Views) can provide you with valuable information about your current indexing strategies, what should be removed, and even what's missing. Do you know how to find this information, leverage it, and then programmatically respond to it? Come to this session to figure it out!

  • SDB349: Follow the Rabbit - Interactive Q&A on the Storage Engine and the Relational Engine

In this session, Kimberly Tripp and Paul Randal will have only 5-10 slides. Each slide cover topics for discussion as well as the reason(s) for why something might be behaving badly and/or things to try to solve your problems. In general we're going to focus on best practices and then open up to your questions so that you can drive the discussion! Paul will focus on the SE (Storage Engine) and internals and Kimberly will focus on the RE (Relational Engine) and query tuning/performance. This session might not seem as structured as other sessions, but you'll be surprised at how informative and fun it is!

So, serious amounts of info, tips, and tricks for you to geek-out on with us and take home to use. We hope to see a lot of you there!

Paul (and Kimberly)

Theme design by Nukeation based on Jelle Druyts