Way back in the mists of time, at the end of the last century, I wrote DBCC SHOWCONTIG for SQL Server 2000, to complement my new invention DBCC INDEXDEFRAG.

I also used to wear shorts all the time, with luminous orange, yellow, or green socks.

Many things change - I now have (some) dress sense, for one. One other thing that changed was that DMVs came onto the scene with SQL Server 2005. DBCC SHOWCONTIG was replaced by sys.dm_db_index_physical_stats. Under the covers though, they both use the same code - and the I/O characteristics haven't changed.

This is a blog post I've been meaning to do for a while now, and I finally had the impetus to do it when I heard about today's T-SQL Tuesday on I/O in general being run by Mike Walsh (Twitter|blog). It's a neat idea so I decided to join in this time. In retrospect, reading this over before hitting 'publish', I got a bit carried away (spending two hours on this) - but it's one of my babies, so I'm entitled to! :-)

This isn't a post about how to use DMVs in general, how to use this DMV in particular, or anything about index fragmentation. This is a blog post about how the DMV works.

DMV is a catch-all phrase that most people (myself included) use to describe all the various utility views in SQL Server 2005 and 2008. DMV = Dynamic Management View. There's a catch with the catch-all though - some of the DMVs aren't views at all, they're functions. A pure DMV gets info from SQL Server's memory (or system tables) and displays it in some form. A DMF, on the other hand, has to go and so some work before it can give you some results. The sys.dm_db_index_physical_stats DMV (which I'm going to call 'the DMV' from now on) is by far the most expensive of these - but only in terms of I/O.

The idea of the DMV is to display physical attributes of indexes (and the special case of a heap) - to do this it has to scan the pages comprising the index, calculating statistics as it goes. Many DMVs support what's called predicate pushdown, which means if you specify a WHERE clause, the DMV takes that into account as it prepares the information. This DMV doesn't. If you ask it for only the indexes in the database that have logical fragmentation > 30%, it will scan all the indexes, and then just tell you about those meeting your criteria. It has to do this because it has no way of knowing which ones meet your criteria until it analyzes them - so can't support predicate pushdown.

This is where understanding what it's doing under the covers comes in - the meat of this post.

LIMITED

The default operating mode of the DMV is called LIMITED. Kimberly always makes fun of the equivalent option for DBCC SHOWCONTIG, which I named as a young and foolish developer - calling it WITH FAST. Hey - it's descriptive!

The LIMITED mode can only return the logical fragmentation of the leaf level plus the page count. It doesn't actually read the leaf level. It makes use of the fact that the next level up in the index contains a key-ordered list of page IDs of the pages at the leaf level - so it's trivial to examine the key-ordered list and see if the page IDs are also in allocation order or not, thus calculating logical fragmentation.

The idea behind this option is to allow you to find the fragmentation of an index by reading the minimum number of pages, i.e. in the smallest amount of time. This option can be magnitudes faster than using the DETAILED mode scan, and it depends on how big the index's fanout is. Without getting too much into the guts of indexes, the fanout is based on the index key size, and determines the number of child-page pointers an index page can hold (e.g. the number of leaf-level pages that a page in the next level up has information about).

Consider an index with a char(800) key. Each entry in a page in the level above the leaf has to include a key value (the lowest key that can possibly appear on the page being referred to), plus a page ID, plus record overhead, plus slot array entry - so 812 bytes. So a page can only hold 8096/812 = 9 such entries. The fanout is at most 9.

Consider an index with a bigint key. Each entry is 13 bytes, so a page can hold 8096/13 = 622 entries. The fanout is at most 622, but will likely be smaller, depending on operations on the index causing fragmentation at the non-leaf levels.

For a table with 1 million pages at the leaf level, the first index will have 1 million/9 = 111112 pages at least at the level above the leaf. The second index will have at least 1608 pages. The savings in I/O from using the LIMITED mode scan will clearly differ based on the fanout.

I've created a 100GB clustered index (on the same hardware as I'm using for the benchmarking series) with 13421760 leaf-level pages and a maximum fanout of 540. In reality, I populated the index using 16 concurrent threads, so there's some fragmentation. The level above the leaf has 63012 pages, an effective fanout of 213. Still, the LIMITED mode scan will read 213x less than a DETAILED scan, but will it be 213x faster?

Here's a perfmon capture of the LIMITED mode scan on my index:

 

There's nothing special going on under the covers in a LIMITED mode scan - the chain of pages at the level above the leaf is read in page-linkage order, with no readahead. The perfmon capture shows:

  • Avg. Disk Read Queue Length (light blue) is a steady 1.
  • Avg. disk sec/Read (pink) is a steady 4ms.
  • Disk Read Bytes/sec (green) is roughly 14.5million.
  • Page reads/sec (dark blue) is roughly 1800.

DETAILED 

The DETAILED mode does two things:

  • Calculate fragmentation by doing a LIMITED mode scan
  • Calculate all other statistics by reading all pages at every level of the index

And so it's obviously the slowest. It has to do the LIMITED mode scan first to be able to calculate the logical fragmentation, because it reads the leaf level pages in the fastest possible way - in allocation order. DBCC has a customized read-ahead mechanism for allocation order scans that it uses for this DMV and for DBCC CHECK* commands. It's *incredibly* aggressive and will hit the disks as hard as it possibly can, especially with DBCC CHECK* running in parallel.

Here's a perfmon capture of the DETAILED mode scan on my index:

 

Not quite as pretty as the LIMITED mode scan, but I like it :-) Here's what it's showing:

  • Avg. Disk Read Queue Length (black) is in the multiple hundreds. Clearly it's appetite for data is outstripping what my RAID array can do. It basically tries to saturate the I/O subsystem to get as much data as possible flowing into SQL Server.
  • Avg. disk sec/Read (pink line at the bottom) is actually measuring in whole seconds, rather than ms. Given the disk queue length, I'd expect that.
  • DBCC Logical Scan Bytes/sec (red) varies substantially as the readahead mechanism throttles up and down, but it's driving anywhere up to 80MB/sec. You can see around 9:49:20 AM when it drops to zero for a few seconds.
  • Readahead pages/sec (green) is tracking the DBCC scan. This is a buffer pool counter, the DBCC one is an Access Methods counter (the dev team I used to run during 2005 development). If I had Disk Read Bytes/sec and Pages reads/sec showing, they'd track the other two perfectly - I turned them off for clarity.

So the DETAILED mode not only reads more data, but it does it a heck of a lot more aggressively so has a much more detrimental effect on the overall I/O capabilities of the system while it's running.

SAMPLED

There is a third mode that was introduced just for the DMV. The idea is that if you have a very large table and you want an idea of some of the leaf level statistics, but you don't want to take the perf hit of running a DETAILED scan, you can use this mode. It does:

  • LIMITED mode scan
  • If the number of leaf level pages is < 10000, read all the pages, otherwise read every 100th pages (i.e. a 1% sample)

Summary

There's no progress reporting from the DMV (or DBCC SHOWCONTIG) but if you look at the reads column in sys.dm_exec_sessions you can see how far through the operation it is. This method works best for DETAILED scans, where can compare that number against the in_row_data_page_count for the index in sys.dm_db_partition_stats (yes, you'll need to mess around a bit if the index is actually partitioned).

In terms of timing, I ran all three scan modes to completion. The results:

  • LIMITED mode: 282 seconds
  • SAMPLED mode: 414 seconds
  • DETAILED mode: 3700 seconds

Although the LIMITED mode scan read roughly 200x less than the DETAILED scan, it was only 13 times faster, because the readahead mechanism for the DETAILED scan is way more efficient than the (necessary) follow-the-page-linkages scan of the LIMITED mode.

Just for kicks, I ran a SELECT COUNT(*) on the index to see how the regular Access Methods readahead mechanism would fare - it completed in 3870 seconds - 5% slower, and it had less processing to do than the DMV. Clearly DBCC rules! :-)

Although the DETAILED mode gives the most comprehensive output, it has to do the most work. For very large indexes, this could mean that your buffer pool effectively gets flushed out by the lazy writer making space available for the DMV to read and process the pages. One of the reasons I advise people to only run the DMV on indexes they know they're interested in - and better yet, run it on a restored backup of the database.

Hope this is helpful!

PS Oh, also beware of using the SSMS fragmentation wizard. It uses a SAMPLED mode scan, but I found it impossible to cancel!

Back in January I offered a promotion as a way of introducing our maintenance/operations auditing services. Now I've completed a bunch of them, with some excellent results for customers.

One international customer, Plex Systems, was so pleased with the outcome of my audit of their manufacturing ERP software hosting databases that they issued a press release today to their industry partners and clients. Another customer, Hose and Fittings, Etc, was amazed at the details and justifications in the report I presented them - see their testimonial on the Past Customers page.

Take a look at the new auditing page that describes how the various audits work, and let me know if you want to discuss doing one - we can do it wherever you are in the world.

Thanks!

One of the perennial problems facing both experienced and involuntary DBAs is how to tell whether a database is really in the FULL recovery model or not? This is complicated by the fact that when you switch a database into the FULL recovery mode, it actually behaves as if it's in the SIMPLE recovery mode until the log backup chain is established (this is commonly called being in 'pseudo-SIMPLE').

It's a problem for several reasons:

1) if the database is really in the FULL recovery model then log backups must be taken so the log can clear/truncate properly and it doesn't grow out of control

2) if the database is in the FULL recovery model but the log backup chain has been broken (or not established at all since the database was created) then log backups are not possible (except for the yuckiness in SQL 2000 when log backups would succeed without complaint but be totally useless during disaster recovery)

I don't know of any script to easily determine whether a database is really in the FULL recovery mode, so I knocked one together - and I present it here for you to use.

The trick to the script is finding the last LSN that's been backed up for the database. if this is non-NULL, then a log backup chain exists and the database is really in the FULL recovery mode. This is stored in the dbi_dbbackupLSN field in the database boot page (see Search Engine Q&A #20: Boot pages, and boot page corruption) but also nicely available in the DMV sys.database_recovery_status.

I've tested this on 2005 SP3 and 2008 SP1.

Note that this doesn't work on SQL 2000 - I've poked around and can't find a way to get at the LSN without reading the boot page directly, which can't be done gracefully inside a function - I'll leave that as an exercise for you. You'd expect the IsTruncLog property returned by DATABASEPROPERTY to be correct when the database is in pseudo-SIMPLE, but it's not unfortunately.

Here are some test cases for the script:

CREATE DATABASE SimpleModeDB;
CREATE DATABASE BulkLoggedModeDB;
CREATE DATABASE FullModeDB;
GO

ALTER DATABASE SimpleModeDB SET RECOVERY SIMPLE;
ALTER DATABASE BulkLoggedModeDB SET RECOVERY BULK_LOGGED;
ALTER DATABASE FullModeDB SET RECOVERY FULL;
GO

SELECT [Name], msdb.dbo.SQLSkillsIsReallyInFullRecovery ([Name]) AS 'ReallyInFULL'
FROM sys.databases
WHERE [Name] LIKE '%ModeDB';
GO

Name              ReallyInFULL
----------------- -------------
SimpleModeDB      0
BulkLoggedModeDB  0
FullModeDB        0

This makes sense - the new FullModeDB database is still in pseudo-SIMPLE. Now what if we take a full database backup?

BACKUP DATABASE FullModeDB To DISK='C:\SQLskills\FullModeDB.bck' WITH INIT;
GO
SELECT msdb.dbo.SQLSkillsIsReallyInFullRecovery ('FullModeDB') AS 'ReallyInFULL';
GO

Processed 152 pages for database 'FullModeDB', file 'FullModeDB' on file 1.
Processed 1 pages for database 'FullModeDB', file 'FullModeDB_log' on file 1.
BACKUP DATABASE successfully processed 153 pages in 0.230 seconds (5.449 MB/sec).

ReallyInFULL
------------
1

Perfect. Now what about switching it back to SIMPLE and back to FULL again?

ALTER DATABASE FullModeDB SET RECOVERY SIMPLE;
ALTER DATABASE FullModeDB SET RECOVERY FULL;
GO
SELECT msdb.dbo.SQLSkillsIsReallyInFullRecovery ('FullModeDB') AS 'ReallyInFULL';
GO

ReallyInFULL
------------
0

Just as we expect - the log backup chain has been broken and the database is back to pseudo-SIMPLE again.

Now what if we restart the log backup chain using a full database backup?

BACKUP DATABASE FullModeDB To DISK='C:\SQLskills\FullModeDB.bck' WITH INIT;
GO
SELECT msdb.dbo.SQLSkillsIsReallyInFullRecovery ('FullModeDB') AS 'ReallyInFULL';
GO

Processed 152 pages for database 'FullModeDB', file 'FullModeDB' on file 1.
Processed 1 pages for database 'FullModeDB', file 'FullModeDB_log' on file 1.
BACKUP DATABASE successfully processed 153 pages in 0.095 seconds (13.193 MB/sec).

ReallyInFULL
------------
1

Perfect. Now what about if we break the chain and try to restart it using a differential database backup?

ALTER DATABASE FullModeDB SET RECOVERY SIMPLE;
ALTER DATABASE FullModeDB SET RECOVERY FULL;
GO
BACKUP DATABASE FullModeDB To DISK='C:\SQLskills\FullModeDB_diff.bck' WITH INIT, DIFFERENTIAL;
GO
SELECT msdb.dbo.SQLSkillsIsReallyInFullRecovery ('FullModeDB') AS 'ReallyInFULL';
GO

Processed 40 pages for database 'FullModeDB', file 'FullModeDB' on file 1.
Processed 1 pages for database 'FullModeDB', file 'FullModeDB_log' on file 1.
BACKUP DATABASE WITH DIFFERENTIAL successfully processed 41 pages in 0.080 seconds (4.192 MB/sec).

ReallyInFULL
------------
1

Perfect - that works too, as I'd expect. You may wonder why a differential backup works - either a full or differential backup will work as they bridge the LSN gap since the last full or differential backup before the log backup chain was broken - both of these backups include transaction log - see More on how much transaction log a full backup includes.

And here's the script itself - enjoy!

/*============================================================================
   File: SQLskillsIsReallyInFullRecovery.sql

   Summary: This script creates a function in msdb that returns a BIT value of
   1 if the supplied database is really in the FULL recovery mode and not still
   in pseudo-SIMPLE.

   Date: October 2009

   SQL Server Versions:
      10.0.2531.00 (SS2008 SP1)
      9.00.4035.00 (SS2005 SP3)
------------------------------------------------------------------------------
   Copyright (C) 2009 Paul S. Randal, SQLskills.com
   All rights reserved.

   For more scripts and sample code, check out
     
http://www.sqlskills.com/

   You may alter this code for your own *non-commercial* purposes. You may
   republish altered code as long as you give due credit.

   THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF
   ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED
   TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
   PARTICULAR PURPOSE.
============================================================================*/

USE [msdb];
GO

IF EXISTS (SELECT * FROM sys.objects WHERE NAME = 'SQLskillsIsReallyInFullRecovery')
   DROP FUNCTION SQLskillsIsReallyInFullRecovery;
GO
 
CREATE FUNCTION SQLskillsIsReallyInFullRecovery (
   @DBName sysname)
RETURNS BIT
AS
BEGIN
  
DECLARE @IsReallyFull  BIT;
   DECLARE @LastLogBackupLSN NUMERIC (25,0);
   DECLARE @RecoveryModel  TINYINT;

   SELECT @LastLogBackupLSN = [last_log_backup_lsn]
   FROM sys.database_recovery_status
   WHERE [database_id] = DB_ID (@DBName);

   SELECT @RecoveryModel = [recovery_model]
   FROM sys.databases
   WHERE [database_id] = DB_ID (@DBName);

   IF (@RecoveryModel = 1 AND @LastLogBackupLSN IS NOT NULL)
      SELECT @IsReallyFull = 1
   ELSE
      SELECT @IsReallyFull = 0;

   RETURN (@IsReallyFull);
END;
GO

Here's an interesting bug that surfaced recently, first reported by Bryan Smith on the MSDN disaster recovery/HA forum three weeks ago.

One of the mechanisms I advise for running consistency checks on VLDBs with multiple filegroups is to run successive DBCC CHECKFILEGROUP operations on the filegroups holding the partitions (see CHECKDB From Every Angle: Consistency Checking Options for a VLDB). Prior to SQL Server 2005 SP3, if a table or index is split into multiple partitions, then DBCC CHECKFILEGROUP would skip checking the entire table or index if it was partitioned over multiple filegroups. From SQL Server 2005 SP3 onwards, DBCC CHECKFILEGROUP will validate only the partitions of tables and indexes that reside on the filegroup being checked - rather than skipping the whole table or index - a big improvement.

Now it seems that SQL Server 2008 has a bug where it essentially has regressed back to the old behavior where DBCC CHECKFILEGROUP will skip a table or index if it's not wholely contained on the filegroup being checked.

Here's a script you can use to test this. It creates a partitioned table over multiple filegroups and then runs DBCC CHECKFILEGROUP on the first partition. I'll discuss the results after the script.

CREATE DATABASE DBMaint2008;
GO
USE DBMaint2008;
GO

ALTER DATABASE DBMaint2008 ADD FILEGROUP DataPartition1;
ALTER DATABASE DBMaint2008 ADD FILEGROUP DataPartition2;
ALTER DATABASE DBMaint2008 ADD FILEGROUP DataPartition3;
GO

ALTER DATABASE DBMaint2008 ADD FILE
   (NAME = N'DataPartition1', FILENAME = N'C:\SQLskills\DataPartition1.ndf',
   SIZE = 10, FILEGROWTH = 10)
TO FILEGROUP DataPartition1;
GO
ALTER DATABASE DBMaint2008 ADD FILE
   (NAME = N'DataPartition2', FILENAME = N'C:\SQLskills\DataPartition2.ndf',
   SIZE = 10, FILEGROWTH = 10)
TO FILEGROUP DataPartition2;
GO
ALTER DATABASE DBMaint2008 ADD FILE
   (NAME = N'DataPartition3', FILENAME = N'C:\SQLskills\DataPartition3.ndf',
   SIZE = 10, FILEGROWTH = 10)
TO FILEGROUP DataPartition3;
GO

CREATE PARTITION FUNCTION Partitions_PFN (INT)
AS RANGE RIGHT FOR VALUES (1000, 2000);
GO

CREATE PARTITION SCHEME [Partitions_PS]
AS PARTITION [Partitions_PFN] TO (DataPartition1, DataPartition2, DataPartition3);
GO

CREATE TABLE TestTable (c1 INT IDENTITY, c2 DATETIME DEFAULT GETDATE ());
CREATE UNIQUE CLUSTERED INDEX TestPK ON TestTable (c1) ON Partitions_PS (c1);
GO

SET NOCOUNT ON;
GO
INSERT INTO TestTable DEFAULT VALUES;
GO 3000

DBCC CHECKFILEGROUP (DataPartition1);
GO

On SQL Server 2005 SP3, the output from the final batch contains:

DBCC results for 'TestTable'.
Cannot process rowset ID 72057594038452224 of object "TestTable" (ID 2073058421), index "TestPK" (ID 1), because it resides on filegroup "DataPartition2" (ID 3), which was not checked. 
Cannot process rowset ID 72057594038517760 of object "TestTable" (ID 2073058421), index "TestPK" (ID 1), because it resides on filegroup "DataPartition3" (ID 4), which was not checked. 
There are 999 rows in 3 pages for object "TestTable".

This shows that it processed the 1000 rows in the first partition, but not the other two - as we'd expect. 

On SQL Server 2008, the output for TestTable is limited to:

Cannot process rowset ID 72057594038910976 of object "TestTable" (ID 2105058535), index "TestPK" (ID 1), because it resides on filegroup "DataPartition2" (ID 3), which was not checked.

And that's it - nothing about it processing any rows in partition 1. This shows that DBCC didn't process the first partition as we'd expect - this becomes even more apparent with very large amounts of data, where DBCC CHECKFILEGROUP will just complete almost instantly.

As Bryan says in his post, Microsoft has acknowledged this is a bug and it should hopefully be fixed for 2008 CU5. In the meantime, this is something you should be aware of as your tables may not be being checked properly.

This survey is based on a suggestion from Jonathan Kehayias (blog/Twitter).

In this survey I'm asking two questions:

  1. When you're designing/implementing a database, do you manually create statistics or let SQL Server create them automatically?
  2. How do you maintain statistics? Manually or letting SQL Server do it for you? Or both? 

I'll probably collaborate with Kimberly on the editorial for this, as she's forgotten more about statistics than I'll ever know!

*Please* no comments on this post - wait for the survey results post to avoid skewing the answers (comments are unfortunately moderated anyway, now that my blog gets a lot of spam). I'm very interested in your reasoning, but not until everyone else responds.

   

As always, a big Thanks! for contributing to the blog/community by responding. Please shoot me an email (Contact button, bottom left of the blog), or ping me on Twitter (@PaulRandal) if you have an idea for a good survey.

Here's a question that came up recently: if I've upgraded a database from SQL 2000 or before, how can I tell if the data purity checks will be run or not?

As you may know, DBCC CHECKDB in 2005 onwards includes 'data purity' checks. These look for column values where the value is outside the valid range of values for the column's data type. For databases created on SQL 2005 onwards, these checks are always performed by DBCC CHECKDB and cannot be turned off. For databases created on earlier versions, it's a little more complicated.

In SQL Server versions prior to 2005, it was possible to import invalid data values into a database. These invalid values could cause query execution problems, or possibly even wrong results. In 2005, when the import 'holes' were closed, the data purity checks were added to DBCC CHECKDB, but not by default for upgraded databases. Because the possibility existed of upgraded database containing invalid values, the decision was made not to enable the data purity checks by default as this could lead people to suspect that the upgrade had caused corruptions that weren't there on 2000 or before.

So, if you have a database that was upgraded and you want to run the data purity checks, you need to use the WITH DATA_PURITY option for DBCC CHECKDB. This is all documented in Books Online and I put it into the Release Notes for SQL 2005 as well. Back to the question though - at what point do you NOT need to specify the option? Well, for an upgraded database, if the WITH DATA_PURITY option is used and no problems are found, a bit is irrevocably flipped in the boot page (see Search Engine Q&A #20: Boot pages, and boot page corruption) and from that point onwards the data purity checks will be performed on the database whenever DBCC CHECKDB runs.

The problem is though, how can you tell whether the bit has been flipped? You need to look at the boot page. The easiest way to do that is to use the DBCC DBINFO command (undocumented, but perfectly safe). It's the equivalent of using DBCC PAGE ('dbname', 1, 9, 3) to look at the boot page contents, as I explained in the blog post referenced in the previous paragraph.

There are two things you need to look for: what version of SQL Server created the database, and it the 'create version' is 2000 or lower, whether the special 'data purity' flag is set or not.

For a database created on SQL Server 2005:

DBCC TRACEON (3604);
GO
DBCC DBINFO ('master');
GO

DBINFO STRUCTURE:

DBINFO @0x66C8EF64

dbi_dbid = 1                         dbi_status = 65544                   dbi_nextid = 1984726123
dbi_dbname = master                  dbi_maxDbTimestamp = 16000           dbi_version = 611
dbi_createVersion = 611              dbi_ESVersion = 0                   
dbi_nextseqnum = 1900-01-01 00:00:00.000                                  dbi_crdate = 1900-01-01 00:00:00.000
dbi_filegeneration = 0              
dbi_checkptLSN

m_fSeqNo = 7612                      m_blockOffset = 224                  m_slotId = 1
dbi_RebuildLogs = 0                  dbi_dbccFlags = 0                   
dbi_dbccLastKnownGood = 2009-05-12 16:07:15.647                          
dbi_dbbackupLSN

<snip>

If the dbi_createVersion is 611 or higher, the database was created on SQL Server 2005+ and will always have data purity checks performed.

For an upgraded database (this is one of my pre-corrupted databases, you can get it from Conference corruption demo scripts and example corrupt databases):

DBCC TRACEON (3604);
GO
DBCC DBINFO ('DemoCorruptMetadata');
GO

DBINFO STRUCTURE:

DBINFO @0x6855EF64

dbi_dbid = 7                         dbi_status = 16                      dbi_nextid = 2089058478
dbi_dbname = DemoCorruptMetadata     dbi_maxDbTimestamp = 100             dbi_version = 611
dbi_createVersion = 539              dbi_ESVersion = 0                   
dbi_nextseqnum = 1900-01-01 00:00:00.000                                  dbi_crdate = 2009-06-17 15:14:49.490
dbi_filegeneration = 0              
dbi_checkptLSN

m_fSeqNo = 10                        m_blockOffset = 303                  m_slotId = 1
dbi_RebuildLogs = 0                  dbi_dbccFlags = 0                   
dbi_dbccLastKnownGood = 1900-01-01 00:00:00.000                          
dbi_dbbackupLSN

<snip>

This database has a dbi_createVersion lower than 611, so we need to look at the dbi_dbccFlags field. A value of 0 means that the data purity checks are not enabled by default. A value of 2 means they are enabled by default. You can easily check this out for your own databases.

Have fun!

A long time ago, in a blog post far, far away (well before I went offline in July) I kicked off a weekly survey about how often you run consistency checks (see here for the survey). Now I'm back online again, and so here are the results as of 8/3/09.

 

The results are actually surprising - I didn't expect so many people to be running consistency checks so frequently - 25% daily and another 37% weekly - very cool!

The number of people who don't know what consistency checks are may look a little depressing but I think it's probably a symptom of the number of people coming into the SQL world as involuntary DBAs. For those people, consistency checks are a way of proactively checking for database corruption, which is nearly always caused by the I/O subsystem. You can read a good introduction to consistency checking and other database maintenance topics in the article I wrote for TechNet Magazine in August 2008 - Top Tips for Effective Database Maintenance and bit more in the previous survey Importance of how you run consistency checks.

Basically you need to run regular consistency checks. There's a myth that you don't need to run consistency checks - this was perpetuated by various marketing folks when SQL 7.0 shipped, because SQL 6.5 used to cause allocation corruptions and the rewrite for 7.0 removed all the corruption problems. Now, of course there have been bugs in SQL Server that cause problems, but they account for a tiny fraction of the corruptions out there. Nearly all corruptions are caused by something going wrong in the I/O subsystem - and you can't predict when that will or won't happen. Jim Gray once likened the disk heads in a hard drive as akin to a 747 flying at 500mph 1/4 inch above the ground - scary stuff. You DO need to run consistency checks, because corruptions do happen. You can read more about the causes of corruptions in this blog post: Search Engine Q&A #26: Myths around causing corruption.

So - how often should you run consistency checks? Well, it depends. When I'm teaching I like to give two examples:

  • You have a dodgy I/O subsystem that is causing corruptions. You have no backups. You have a zero data-loss requirement. With no backups, your only way to get rid of corruptions is to run repair, which usually leads to data loss (see Misconceptions around database repair). In that case, you want to know about corruption as soon as possible to limit the amount of data loss from running repair. Contrived example for sure, but you'd be surprised what I've seen...
  • You have a rock-solid I/O subsystem, with all drivers and firmware up-to-date. You have a comprehensive backup strategy which you've extensively tested and you've confident you can recover from corruptions with zero-to-minimal downtime and data loss. In this case you may be comfortable running consistency checks once a week, say.

The overwhelming factor in how often to run consistency checks is you. How comfortable are you with the integrity of your I/O subsystem and your ability to recover from a corruption problem. If you have corruptions in your database today, you'll probably run DBCC CHECKDB on it every day for a month, right?!?

Part of any database maintenance of high-availability strategy is proactively making sure that corruption doesn't exist in the database - otherwise when you DO discover it, it may be more pervasive and it will take you longer to recover, and potentially with more downtime. Therefore the answer really is that you should run it as often as you can. Sometimes that can be difficult for very large databases (your definition is likely to be different than mine - I think 500GB and larger - depends on your hardware etc) as CHECKDB can take a long time to run (see CHECKDB From Every Angle: How long will CHECKDB take to run?), but there are ways you can effectively consistency check even a VLDB - see CHECKDB From Every Angle: Consistency Checking Options for a VLDB.

So - I'm very pleased to see so many people running regular consistency checks. However, maybe I'm just a pessimist but I did expect some of the less than optimal options to have higher numbers. Let's look at each in turn.

Never. The only time this is acceptable is if you don't know what consistency checks are and why you need them. Enough said. Even if you think you *can't* run consistency checks because it takes too many resources/time/etc, take a look at the blog post I mention above. There's always a way.

Only when corruption is detected some other way. By the time corruption has occurred and is detected through regular operations, it's likely to be more pervasive. Many databases do not see all their data read/updated as part of regular operations, which means that if a part of the database that's not used for a while gets corrupted, and you're not running consistency checks, you're not going to know. That means you're not going to know that your I/O subsystem is causing problems - and so more corruption will occur. You might think this isn't a big deal, but it can be depending on what part of the database (or maybe system databases) gets corrupted. Some things can be restored or repaired relatively simply - but what if, say, the boot page of master is corrupted? Your entire instance is down until you sort that out. It's always better to proactively discover corruption before it hits you in a way that really messes you up.

Only during an event like an upgrade or migration. In this category I see people running consistency checks AFTER doing an upgrade but not before. From release to release, DBCC CHECKDB has improved - especially from 2000 to 2005, where system catalog consistency checks were added. If you run consistency checks after an upgrade and find the database is corrupt, what do you do? Run repair? Go back to the earlier version and run repair? Restore your backups? The odds are that if you've got corruption, it's also in your backups too. Also, you may think that the upgrade itself caused the corruption - I've never seen this. It's always been that the corruption was there before and only got discovered after the upgrade. Always run consistency checks BEFORE doing an upgrade - to make sure you've got a chance of putting things right before (potentially irrevocably) moving to the newer version.

Only after performing a restore, or after a failover. This is, of course, a good practice, but shouldn't be the only time a consistency check is performed - for most of the same reasons as I've explained above. What do you do if the database is found to be corrupt after restoring backups? You're looking at data loss now.

To summarize: make sure you're running regular consistency checks - with the regularity in-line with your comfort zone and your maintenance/high-availability strategies.

Next up - the next survey!

In this week's survey I'd like to know how often you run consistency checks on your *most critical* production database, regardless of *how* you run them (we did that survery already - see Importance of how you run consistency checks). I'll report on the results around July 4th.

I'd only like you to answer for your *most critical* production database, as the frequency will probably vary wildly by database, server, production vs. dev/QA and so on. If everyone answers for their most critical database then we won't get skewed results.

*Please* no comments on this post - wait for the survey results post to avoid skewing the answers. I'm very interested in your reasoning, but not until everyone else responds.

As always, a big Thanks! for contributing to the blog/community by responding. Please shoot me an email (Contact button, bottom left of the blog), or ping me on Twitter (@PaulRandal) if you have an idea for a good survey.

PS Thanks to Pat Wright for suggesting this week's topic on Twitter.

In last week's survey I asked how you manage the size of your database *data* files - see here for the survey. Here are the results as of 6/24/09.

The 'other' values are as follows:

  1. 5 x 'manual file growths and a custom mom alter to when the datafiles are 98% full. autogrow set to fixed amount in case we miss the mom e-mail'
  2. 1 x 'auto-grow with a procedure to keep the log file size at 20% relative to the total data file size'
  3. 1 x 'Create with very large initial file size, set auto-growth to %'
  4. 1 x 'Lots of white space in files. Auto-grow to a fixed size (in case of emergency).'
  5. 1 x 'set initial size for 1 yr usage, monitor size, manual grow, autogrow percentage - send alert if it does grow'
  6. 1 x 'set initial size for 2 year growth, capture growth stats daily, monitor physical disk space daily'
  7. 1 x 'Set to autogrow by fixed size to cater for emergencies, otherwise maintain 80-90% free space by daily reporting then manual off-peak size increase if necessary.'

As I mentioned in the survey itself, this is just about database *data* files. I covered log file size management in a previous survey - see Importance of proper transaction log size management.

There are really four parts to data file size management, which I'll discuss below.

The first thing I'll say is that if you're able to, enable instant file initialization on 2005+ - as it will vastly reduce the time required to create data files, grow data files, and restore backups (if the restore has to create data files). We're talking minutes/hours down to single-digit seconds. See Misconceptions around instant file initialization. If you're not one of the miniscule fraction of a percent of customers who have volumes shared between secure file servers and SQL Server instances, turn this on ASAP. Most DBAs don't know about this feature, but everyone I teach it to or show it to are amazed and then go turn it on. No brainer.

Initial data file sizing. This can be tricky. Without getting into the how-to-calculate-the-database-size quagmire, I'll simply say that you should provision as much space as you can, based upon your sizing estimates. Don't just provision for the here-and-now - if you're able to, provision space for the next year's worth of anticipated growth - to prevent auto-growth having to kick-in at all. I rewrote all the Books Online pages for 2005 (links are to the 2008 BOL) on Estimating the Size of Heaps, Clustered Indexes, and Nonclustered Indexes and in the blog post Tool for estimating the size of a database I link to a tool someone put together that codified all my formulas. You can also get sizing tools from hardware vendors too.

Data file growth. If you're able to, auto-grow should ALWAYS be turned on, as an emergency measure in case your monitoring fails - you don't want the database to have to grow but it's unable to and then it stops and the application is offline. However, you shouldn't *rely* on auto-grow - it's just for emergencies. The auto-growth default for data files used to be 10% for 2000 and before, but then changed to 1MB from 2005 onwards (log file default auto-growth remained at 10%). Neither of these are good choices. A percentage-based auto-growth means that as your files get bigger, so does the auto-growth, and potentially the time it takes if you don't have instant file initialization enabled. A 1MB autogrowth is just nonsensical. Your auto-growth should be set to a fixed size - but what that should be is a great big 'it depends'. You need to decide whether you want the auto-growth to be a quick stop-gap, or whether the auto-growth will replace manual growth after monitoring. You also need to consider how long the auto-growth will take, especially without instant file initialization. I can't give any guidance here as to what a good number is, but I'd probably settle on something around 10% (fixed), with the percentage steadily falling as the database size grows. It's very important that you have alerts setup to you can tell when auto-growth does occur, so you can then take any necessary action to grow it even more or tweak your settings.

'Other' response #2 is interesting. There's been a 'best-practice' around for a while that the log file should be sized to be a percentage of the data file size. It's totally unfounded and in most cases bears no relation to reality. The vast majority of the time, the size of the log is *NOT* dependent on the data file sizes in any way. Imagine a 10TB database - would you provision a 2TB log? Of course not. Now, I can see special cases where the operations performed on the tables in the database might affect a fixed portion of the largest table in a single batch, and that could generate enough log (with reserved space too) to equal 20% of the data file size - but that's a pretty nonsensical special case, to be honest. You shouldn't use 'set the log as a percentage of the data file' as a guideline.

Data file size/usage monitoring. There's a growing movement towards monitoring the data file usage and manually growing the files as they approach 100% full - avoiding auto-growth altogether, but still having it enabled for emergencies. In my book, this is the best way to go as you have all the control over what happens and more importantly, when it happens - especially without instant file initialization. There are some quirks here though. SCOM, for instance, has logic that disables file size and usage monitoring if you enable auto-grow. It assumes that if you enable auto-grow then you're not interested in monitoring. I happened to have one of the SCOM devs in my last maintenance class I taught on the Redmond MS campus and he's going to try to get that logic fixed.

Data file shrinking. Just this morning I wrote a long blog post about this - see Why you should not shrink your data files. Running data file shrink causes index fragmentation, uses lots of resources, and the vast majority of the time when people use it, is unnecessary and wasteful. It should NEVER be part of a regular maintenance plan - as you get into the shrink-grow-shrink-grow cycle which becomes a zero-sum game with a ton of transaction log being generated. Think about this - if you run a shrink, which is fully logged, then all the log has to be backed up, log-shipped, database mirrored, scanned by the replication log reader agent, and so on. And then the database will probably auto-grow again through normal operation, or some part of the maintenance job that rebuilds indexes. And then you shrink it again. And the cycle continues...

Bottom line - make sure you size the data files with some free space, have auto-growth set appropriately, have instant file initialization enabled if you can, monitor file sizes and usage, alert on auto-grows, and don't use shrink. And don't use shrink. Really.

Next up - this week's survey!

(Been a while longer than usual since blog posts - I've been really busy flying around the country doing stuff onsite with clients. Normal service will be resumed at the start of July :-)

One of my biggest hot-buttons is around shrinking data files. Although I used to own the shrink code while I was at Microsoft, I never had a chance to rewrite it so that data file shrink is a more palatable operation. I really don't like shrink.

Now, don't confuse shrinking the transaction log with shrinking data files. Shrinking the log is necessary if your log has grown out of control, or as part of a process to remove excessive VLF fragmentation (see Kimberly's excellent posts on this here and here). However, shrinking the log should be a rare operation and should not be part of any regular maintenance you perform.

Shrinking of data files should be performed even more rarely, if at all. Here's why - data file shrink causes *massive* index fragmentation. Let me demonstrate with a simple script you can run. The script below will create a data file, create a 10MB 'filler' table at the start of the data file, create a 10MB 'production' clustered index, drop the 'filler' table and then run a shrink to reclaim the space.

USE MASTER;
GO

IF DATABASEPROPERTYEX ('DBMaint2008', 'Version') > 0
 DROP DATABASE DBMaint2008;

CREATE DATABASE DBMaint2008;
GO
USE DBMaint2008;
GO

SET NOCOUNT ON;
GO

-- Create the 10MB filler table at the 'front' of the data file
CREATE TABLE FillerTable (c1 INT IDENTITY,  c2 CHAR (8000) DEFAULT 'filler');
GO

-- Fill up the filler table
INSERT INTO FillerTable DEFAULT VALUES;
GO 1280

-- Create the production table, which will be 'after' the filler table in the data file
CREATE TABLE ProdTable (c1 INT IDENTITY,  c2 CHAR (8000) DEFAULT 'production');
CREATE CLUSTERED INDEX prod_cl ON ProdTable (c1);
GO

INSERT INTO ProdTable DEFAULT VALUES;
GO 1280

-- check the fragmentation of the production table
SELECT [avg_fragmentation_in_percent] FROM sys.dm_db_index_physical_stats (
    DB_ID ('DBMaint2008'), OBJECT_ID ('ProdTable'), 1, NULL, 'LIMITED');
GO

-- drop the filler table, creating 10MB of free space at the 'front' of the data file
DROP TABLE FillerTable;
GO

-- shrink the database
DBCC SHRINKDATABASE (DBMaint2008);
GO

-- check the index fragmentation again
SELECT [avg_fragmentation_in_percent] FROM sys.dm_db_index_physical_stats (
    DB_ID ('DBMaint2008'), OBJECT_ID ('ProdTable'), 1, NULL, 'LIMITED');
GO

avg_fragmentation_in_percent
----------------------------
0.390625

DbId   FileId      CurrentSize MinimumSize UsedPages   EstimatedPages
------ ----------- ----------- ----------- ----------- --------------
6      1           1456        152         1448        1440
6      2           63          63          56          56

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

avg_fragmentation_in_percent
----------------------------
99.296875

Look at the output from the script! The logical fragmentation of the clustered index before the shrink is a near-perfect 0.4%. After the shrink, it's almost 100%. The shrink operation *completely* fragmented the index, removing any chance of efficient range scans on it by ensuring the all range-scan readahead I/Os will be single-page I/Os.

Why does this happen? A data file shrink operation works on a single file at a time, and uses the GAM bitmaps (see Inside The Storage Engine: GAM, SGAM, PFS and other allocation maps) to find the highest page allocated in the file. It then moves it as far towards the front of the file as it can, and so on, and so on. In the case above, it completely reversed the order of the clustered index, taking it from perfectly defragmented to perfectly fragmented.

The same code is used for DBCC SHRINKFILE, DBCC SHRINKDATABASE, and auto-shrink - they're equally as bad. As well as introducing index fragmentation, data file shrink also generates a lot of I/O, uses a lot of CPU, and generates *loads* of transaction log - as everything it does is fully logged.

Data file shrink should never be  part of regular maintenance, and you should NEVER, NEVER have auto-shrink enabled. I tried to have it removed from the product for SQL 2005 and SQL 2008 when I was in a position to do so - the only reason it's still there is for backwards compatibility. Don't fall into the trap of having a maintenance plan that rebuilds all indexes and then tries to reclaim the space required to rebuild the indexes by running a shrink - that's a zero-sum game where all you do is generate a log of transaction log for no actual gain in performance.

So what if you *do* need to run a shrink? For instance, if you've deleted a large proportion of a very large database and the database isn't likely to grow, or you need to empty a file before removing it?

The method I like to recommend is as follows:

  • Create a new filegroup
  • Move all affected tables and indexes into the new filegroup using the CREATE INDEX ... WITH (DROP_EXISTING) ON <filegroup> syntax, to move the tables and remove fragmentation from them at the same time
  • Drop the old filegroup that you were going to shrink anyway (or shrink it way down if its the primary filegroup)

Basically you need to provision some more space before you can shrink the old files, but it's a much cleaner mechanism.

If you absolutely have no choice and have to run a data file shrink operation, be aware that you're going to cause index fragmentation and you should take steps to remove it afterwards if it's going to cause performance problems. The only way to remove index fragmentation without causing data file growth again is to use DBCC INDEXDEFRAG or ALTER INDEX ... REORGANIZE. These commands only require a single 8KB page of extra space, instead of needing to build a whole new index in the case of an index rebuild operation.

Bottom line - try to avoid running data file shrink at all costs!

In this week's survey I'd like to know how you manage the sizes of your database *data* files (remember we've already done log file size management). I'll report on the results around 6/21/09.

I say this every week in the PS, but I'm moving it up here because I don't like having to delete comments and send email, *please* no comments on this post - wait for the survey results post to avoid skewing the answers. I'm very interested in your reasoning, but not until everyone else responds.

As always, a big Thanks! for contributing to the blog/community by responding. Please shoot me an email (Contact button, bottom left of the blog), or ping me on Twitter (@PaulRandal) if you have an idea for a good survey.

PS Thanks to Steve Jones (of SQL Server Central fame) for suggesting this week's topic on Twitter.

There are a couple of issues that I've heard of in the last few weeks (one while onsite at a customer) and I think they might bite some people so I'd like to share them with you.

DBCC CHECKDB in 2005 onwards uses a hidden database snapshot to create the transactionally-consistent point-in-time view of the database that it requires to run the consistency checks. The hidden database snapshot is created as a set of NTFS alternate streams on the existing database data files. The alternative to having DBCC CHECKDB do this automatically is to manually create your own database snapshot and run DBCC CHECKDB against that - it's the same thing really.

More info on DBCC CHECKDB's use of snapshots, and potential problems can be found at:

The two issues that I've heard of both are around an inability of DBCC CHECKDB to create the hidden snapshot. In that case it is forced to use locks to stabilize the database, which usually fails because the exclusive database lock required for running the allocation checks portion cannot be acquired.

The first issue is around the permissions of the SQL Server service account. To be able to create the NTFS alternate streams, the service account must have the privileges to create files in the DATA directory of the SQL Server instance. This is a really difficult problem to track down as the actual NTFS failure message is not surfaced by the snapshot creation code.

The second issue is around the use of HP PolyServe. Upgrading to Matrix Server 3.6.1 disables support for alternate streams in the filesystem, effectively breaking DBCC CHECKDB. Here's the paragraph from the 3.6.1 upgrade guide (available here):

In previous releases, MxDB for SQL Server provided ADS support internally for use with various SQL Server features such as the DBCC CHECKDB command. This internal support has been removed in HP PolyServe Software for Microsoft SQL Server. Instead, after all servers are upgraded to 3.6.1, you will need to enable ADS support on all filesystems previously used with MxDB for SQL Server. During the upgrade to 3.6.1, SQL Server operations requiring ADS will fail, as the new ADS support feature is not yet in place on the nodes running 3.6.1. For continuity of SQL Server operations, it is important to upgrade all nodes to 3.6.1 and upgrade filesystems for ADS as quickly as possible.

Enabling support after the upgrade means running the PolyServe psfscheck command (which I believe just runs the NTFS fsutil command under the covers), which unfortunately means taking the volume momentarily offline.

Hope this helps!

Chad Miller over on SQLServerCentral has put together a really great collection of scripts that allow you to easily monitor and report on what backups are happening on your system. These should be especially useful for involuntary DBAs, rather than having to dig into the backup history tables themselves. I'll be including a link to his post and downloadable scripts in all my DBA classes.

Check out Chad's article and scripts at Backup Monitoring and Reporting.

In last week's survey I asked you two things, as a precursor to a whitepaper I'm writing for Microsoft

The first question was what is your maximum allowable downtime SLA (either for 24x7 operation or not). See here for the survey. Here are the results as of 5/30/09.

 

The Other values were all about not having SLAs defined. And I think that's why this survey had a poor response rate - most of you out there don't have defined SLAs.

If you take a standard 365-day year, which means the year has 524160 minutes. Some example maximum allowable downtimes for 24x7 operations:

  • 99.999% (a.k.a 'five-nines') = slightly over 5 minutes downtime per year
  • 99.99% (a.k.a 'four-nines') = almost 52.5 minutes downtime per year
  • 99.9% (a.k.a 'three-nines') = almost 8.75 hours downtime per year
  • 99% (a.k.a 'two-nines') = just over 3.5 days of downtime per year
  • 98.5% = almost 5.5 days of downtime per year
  • 98% = just over a week of downtime per year
  • 95% = just over 2.5 weeks of downtime per year

You might be thinking - wow - whoever only has a target of 95% uptime must have a pretty crappy setup BUT it totally depends on the application and business requirements. 95% may be absolutely fine for some companies and utterly devastating to others (imagine the revenue loss if Amazon.com was down for more than 2 weeks...)

On the other end of the spectrum, you might be thinking - wow - 99.999% target is completely unattainable for the majority of businesses out there, BUT again it totally depends on the application and business requirements. This may be attainable for a simple application and database that doesn't generate much transaction log, or for a large, very busy OLTP database that generates lots of log but the company has lots of money to throw at the HA problem and can afford redundant clusters in separate data-centers with fat, dark-fiber links between them.

Defining SLAs, and not just for maximum allowable downtime, is incredibly important. When done properly, defining SLAs shows that the technical staff in a company and the business management in a company are in tune. It shows they have extensively analyzed the business requirements of an application and balanced them against the technical, space, power, HVAC, manpower, budgetary, and other limitations which may prevent a higher percentage being set as the target. It provides a meaningful input into the design and architecture of systems, and the choices of technologies (note the plural) required to achieve the target SLAs. It shows the company shareholders that the business managers understand the criticality of applications involved in running the business and that they are taking steps to safeguard the shareholders' equity interests.

Note I said 'when done properly'. Technical staff picking SLAs without business input, or vice-versa is a recipe for disaster. If the technical staff don't understand the business requirements, how can they pick appropriate SLAs? If the business managers don't understand the technical limitations, how can they pick achievable SLAs? Everyone has to be involved. And one of the most important things to consider is whether the SLA covers 24x7 operation, or just the time that the application has to be available, if not 24x7.

How to go about using the SLAs to pick appropriate technologies is one of the things I'll be going into in the whitepaper (due for publication around the end of September).

The second questions was what was your measured uptime (again, either for 24x7 operation or not) over the last year. See here for the survey. Here are the results as of 5/30/09.

 

The Other values for 24x7 were 2 x '96', and 1 x 'it varies across customers'. The Other value for non-24x7 was '92'.

Of course, this would be a little more meaningful if we could correlate target downtime with actual downtime achieved, but the free survey site doesn't reach that level of sophistication (and I don't think people have enough time to get into that much detail) while casually reading a blog post.

Nevertheless, the results are interesting, although not really statistically valid from this small a sample-size. The most interesting data point is that some respondents don't know what they achieved last year, or just didn't measure. I think this is the case for the majority of readers. Having well-defined and appropriate SLAs is the key to defining a workable strategy, and what's the point of defining the SLAs if you don't measure how well you did against them? If you don't meet the target you need to revisit the strategy, the SLAs, or maybe even both.

The fact that many people don't have SLAs speaks to the general poor state of high-availability and disaster-recovery planning in the industry, IMHO. And Kimberly just chimed in stating that it also shows how the PC-based server market doesn't focus anywhere near as much on regimented policies and procedures as the older, more traditional, mainframe market did/does.

There's a lot of work to be done. Many times this stuff just isn't high up in the priority list until a disaster actually happens. Invariably it then becomes very, very important. Get ahead of the curve and be proactive - one of the things we always tell our clients.

Next post - this week's survey!

This week's survey is inspired by this morning's Kimberly+Paul hot-tub conversation around data-dependent routing vs. network load balancing, which then turned to SLAs. Yes, we lead an exciting life Smile.

This survey is a *four*-parter. Part 1 is for each of your databases, which survey option is closest to your target maximum allowable downtime SLA (Service-Level Agreement, a.k.a. RTO - Recovery Time Objective)?

  • Survey 1 is if your SLA is based on 24x7 operation
  • Survey 2 is if your SLA is based on non-24x7 operation (e.g. allowing for a 6 hour daily maintenance window, the rest of the time we must be available 99.9%)

Part 2 is for each of your databases, what can you actually achieve?

  • Survey 3 is your achievement over the last year for 24x7 operation
  • Survey 4 is your achievement over the last year for non-24x7 operation

I'll report on the result sometime during the first week of June - as this is a big survey I'd like to get lots of results. Please repost/Tweet/whatever.

What do I mean by that downtime SLA? Well, however you have it defined. It could be that your SLA is an absolute value based on 24x7 operation, or it could be based on the 9-5 work-day. In all cases, what percentage of the defined time does the system have to be available? Note that I'm calling out a 5, 4, and 3 nines of 24x7 as separate answers as I'd like to see how often these occur and can be met.

As always, a big Thanks! for contributing to the blog by responding. Please shoot me an email (Contact button, bottom left of the blog), or ping me on Twitter (@PaulRandal) if you have an idea for a good survey.

PS No comments on this post please (I'll delete them straight away) - please wait for the survey results post to avoid skewing the answers.

About a year ago, I blogged a cool script that would work out how much of a database has changed since the last full backup - i.e. how big will the next differential backup be. You can find that script at New script: How much of the database has changed since the last full backup?. I'm in the middle of writing a script that will tell you how big your next log backup will be, and as part of it, I need a way to determine how many extents from minimally logged operations will be included in the next log backup. I thought it might be useful to someone so I tidied it up and here it is for you to play with. It's almost the same as the previous script I published.

The code below creates a function and a stored procedure. The basic idea behind the code is as follows:

For each online data file in the database
   For each GAM interval in the file
      Crack the ML map page using DBCC PAGE
      Interpret the ML bitmap to aggregate the changed extents
      Add the sum to the total changed extents for the database
   End
End
Report results

There's a function that I create in msdb call SQLskillsConvertToExtents that cracks some of the DBCC PAGE output, and the main procedure is called sp_SQLskillsMLChangedData and it created as a system object in master. I tried making it a table-valued function but you can't do things like INSERT-EXEC in a function, and that's required for processing the DBCC PAGE output. So - create your own wrapper function or whatever to use it. The interface/output is:

EXEC sp_SQLskillsMLChangedData 'testdb';
GO

Total Extents Changed Extents Percentage Changed
------------- --------------- ----------------------
355           153              43.1

Anyway - here it is. You can download it in a zip file from SQLskillsMLChangedData.zip (2.50KB). Enjoy!

/*============================================================================
   File: SQLskillsMLChangedData.sql

   Summary: This script creates a system-wide SP SQLskillsMLChangedData that
   works out what percentage of a database has been changed by minimally
   logged operations since the last log backup.

   Date: May 2009

   SQL Server Versions:
      10.0.2531.00 (SS2008 SP1)
      9.00.4035.00 (SS2005 SP3)
------------------------------------------------------------------------------
   Copyright (C) 2009 Paul S. Randal, SQLskills.com
   All rights reserved.

   For more scripts and sample code, check out
      http://www.sqlskills.com/

   You may alter this code for your own *non-commercial* purposes. You may
   republish altered code as long as you give due credit.

   THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF
   ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED
   TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
   PARTICULAR PURPOSE.
============================================================================*/

-- Create the function in MSDB
--
USE msdb;
GO

IF EXISTS (SELECT * FROM sys.objects WHERE NAME = 'SQLskillsConvertToExtents')
   DROP FUNCTION SQLskillsConvertToExtents;
GO

-- This function cracks the output from a DBCC PAGE dump
-- of an allocation bitmap. It takes a string in the form
-- "(1:8) - (1:16)" or "(1:8) -" and returns the number
-- of extents represented by the string. Both the examples
-- above equal 1 extent.
--
CREATE FUNCTION SQLskillsConvertToExtents (
   @extents VARCHAR (100))
RETURNS INTEGER
AS
BEGIN
   DECLARE @extentTotal INT;
   DECLARE @colon INT;
   DECLARE @firstExtent INT;
   DECLARE @secondExtent INT;
   SET @extentTotal = 0;
   SET @colon = CHARINDEX (':', @extents);

   -- Check for the single extent case
   --
   IF (CHARINDEX (':', @extents, @colon + 1) = 0)
      SET @extentTotal = 1;
   ELSE
      -- We're in the multi-extent case
      --
      BEGIN
      SET @firstExtent = CONVERT (INT,
         SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
      SET @colon = CHARINDEX (':', @extents, @colon + 1);
      SET @secondExtent = CONVERT (INT,
         SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
      SET @extentTotal = (@secondExtent - @firstExtent) / 8 + 1;
   END

RETURN @extentTotal;
END;
GO

USE master;
GO

IF OBJECT_ID ('sp_SQLskillsMLChangedData') IS NOT NULL
   DROP PROCEDURE sp_SQLskillsMLChangedData;
GO

-- This SP cracks all minamally-logged bitmap pages for all online
-- data files in a database. It creates a sum of changed extents
-- and reports it as follows (example small msdb):
--
-- EXEC sp_SQLskillsMLChangedData 'msdb';
-- GO
--
-- Total Extents Changed Extents Percentage Changed
-- ------------- --------------- ----------------------
-- 355           153             43.1
--
CREATE PROCEDURE sp_SQLskillsMLChangedData (
   @dbName VARCHAR (128))
AS
BEGIN
   SET NOCOUNT ON;

   -- Create the temp table
   --
   IF EXISTS (SELECT * FROM msdb.sys.objects WHERE NAME = 'SQLskillsDBCCPage')
      DROP TABLE msdb.dbo.SQLskillsDBCCPage;

   CREATE TABLE msdb.dbo.SQLskillsDBCCPage (
      [ParentObject] VARCHAR (100),
      [Object] VARCHAR (100),
      [Field] VARCHAR (100),
      [VALUE] VARCHAR (100));

   DECLARE @fileID INT;
   DECLARE @fileSizePages INT;
   DECLARE @extentID INT;
   DECLARE @pageID INT;
   DECLARE @MLTotal INT;
   DECLARE @sizeTotal INT;
   DECLARE @total INT;
   DECLARE @dbccPageString VARCHAR (200);

   SELECT @MLTotal = 0;
   SELECT @sizeTotal = 0;

   -- Setup a cursor for all online data files in the database
   --
   DECLARE files CURSOR FOR
      SELECT [file_id], [size] FROM master.sys.master_files
      WHERE [type_desc] = 'ROWS'
      AND [state_desc] = 'ONLINE'
      AND [database_id] = DB_ID (@dbName);

   OPEN files;

   FETCH NEXT FROM files INTO @fileID, @fileSizePages;

   WHILE @@FETCH_STATUS = 0
   BEGIN
      SELECT @extentID = 0;

      -- The size returned from master.sys.master_files is in
      -- pages - we need to convert to extents
      --
      SELECT @sizeTotal = @sizeTotal + @fileSizePages / 8;

      WHILE (@extentID < @fileSizePages)
      BEGIN
         -- There may be an issue with the ML map page position
         -- on the four extents where PFS pages and GAM pages live
         -- (at page IDs 516855552, 1033711104, 1550566656, 2067422208)
         -- but I think we'll be ok.
         -- PFS pages are every 8088 pages (page 1, 8088, 16176, etc)
         -- GAM extents are every 511232 pages
         --
         SELECT @pageID = @extentID + 7;

         -- Build the dynamic SQL
         --
         SELECT @dbccPageString = 'DBCC PAGE ('
            + @dbName + ', '
            + CAST (@fileID AS VARCHAR) + ', '
            + CAST (@pageID AS VARCHAR) + ', 3) WITH TABLERESULTS, NO_INFOMSGS';

         -- Empty out the temp table and insert into it again
         --
         DELETE FROM msdb.dbo.SQLskillsDBCCPage;
         INSERT INTO msdb.dbo.SQLskillsDBCCPage EXEC (@dbccPageString);

         -- Aggregate all the changed extents using the function
         --
         SELECT @total = SUM ([msdb].[dbo].[SQLskillsConvertToExtents] ([Field]))
         FROM msdb.dbo.SQLskillsDBCCPage
            WHERE [VALUE] = ' MIN_LOGGED'
            AND [ParentObject] LIKE 'ML_MAP%';

         SET @MLTotal = @MLTotal + @total;

         -- Move to the next GAM extent
         SET @extentID = @extentID + 511232;
      END

      FETCH NEXT FROM files INTO @fileID, @fileSizePages;
   END;

   -- Clean up
   --
   DROP TABLE msdb.dbo.SQLskillsDBCCPage;
   CLOSE files;
   DEALLOCATE files;

   -- Output the results
   --
   SELECT
      @sizeTotal AS [Total Extents],
      @MLTotal AS [Changed Extents],
      ROUND (
         (CONVERT (FLOAT, @MLTotal) /
         CONVERT (FLOAT, @sizeTotal)) * 100, 2) AS [Percentage Changed];
END;
GO

-- Mark the SP as a system object
--
EXEC sys.sp_MS_marksystemobject sp_SQLskillsMLChangedData;
GO

-- Test to make sure everything was setup correctly
--
EXEC sp_SQLskillsMLChangedData 'msdb';
GO

This week's survey is inspired from many stories I saw on the forums and Twitter this week - mostly bad, one good (someone I'm following is spending the weekend testing their disaster recovery plan - cool!). I'd like to know whether you're ever tested your disaster recovery plan, and if so, what happened? I'll report on the results sometime of the weekend of 5/24/09.

And what do I mean by disaster recovery? The definition varies from just restoring a backup to conducting a full failover of a datacenter - but I'd like you to decide for yourself. I'll editorialize more about this next week.

As always, a big Thanks! for contributing to the blog by responding. Please shoot me an email (Contact button, bottom left of the blog) if you have an idea for a good survey.

PS No comments on this post please (I'll delete them straight away) - please wait for the survey results post to avoid skewing the answers.

PPS Probably a lot less blogging/Twitter (@PaulRandal) this coming week - I'm teaching all 5 days.

Quickie this morning to start the day off. I saw a question on a forum: if I *have* to use a GUID and *must* have a primary key, should I make the primary key clustered or nonclustered?

Now, I'm not getting into the whole GUID vs. bigint identifier, or random GUID vs. GUID generated by NEWSEQUENTIALID(), so please don't comment on those issues, they're not relevant here. I just want to address the question - what kind of index should it be?

From a Storage Engine perspective, my answer is nonclustered. Here are three reasons why:

  • If the index is clustered, then the cluster key is immediately at least 16 bytes (the size of a GUID). This doesn't change the size of the clustered index records (as the GUID column has to be stored in the table anyway, and a clustered index IS the table), but it does change the size of the nonclustered indexes. All nonclustered indexes on the table must include the cluster keys, even of they are not explicitly part of the nonclustered index keys (I'll do a post on this later). This means the GUID is present in every nonclustered index record too. From this perspective, it would be better to use a smaller clustered index key and have the GUID primary ley be nonclustered so it's only present in that one nonclustered index.
  • Random GUIDs used as the high-order key cause index fragmentation. Their random nature means the insertion point into the index is also random. This causes page splits, which cause fragmentation and are *expensive*. (I touched on this a bit a few days ago in my post How expensive are page splits in terms of transaction log?.). With a random key value, it's hard to avoid page splits and fragmentation, although you can delay them somewhat using FILLFACTOR, but at the expense of using extra space. By making the GUID index nonclustered, you can delay page splits even further. The clustered index is the table, so the records are (usually always) larger than nonclustered index records. This means you can get fewer clustered index records on an 8KB page than nonclustered index records. With fewer records per page, you can do fewer random insertions on the page before a page split occurs. So using a nonclustered index for the GUID key means you can do fewer expensive page splits.
  • Given that whatever kind of index you create for the GUID key is going to experience index fragmentation, you're probably going to want to periodically remove the fragmentation as part of your database maintenance plan. It makes sense to try to limit the amount of resources used by the fragmentation removal operation (e.g. cpu, IO, disk space, transaction log space), and so the smaller the fragmented index, the better. A nonclustered index for the GUID key will be smaller than a clustered index, so if you choose a non-fragmentation-causing clustered index key, and confine the fragmentation to the nonclustered index, you can use fewer resources during database maintenance.

And there you have it. I'm sure some of you have seen pathological cases that disprove one of the above points, but my arguments are generalizations. Maybe this is a can of worms I've opened, in which case I look forward to the comments!

PS Brent did a great post about humor when blogging, the cartoon links he includes are great. Check it out here.

This is a true story, and unfolded over the last few days. It's deliberately written this way, I'm not trying to be patronizing - just illustrating the mistakes people can make if they don't know what not to do.

Once upon a time (well, a few days ago), there was a security person who had access to a 2000 SQL Server instance. He saw that the server had run out of disk space, but had no SQL server knowledge. He decided that the best thing to do was to delete the log file, instead of contact the real DBA to take corrective action. This person should not have had any access to SQL Server, or the ability to delete the log. Nevertheless, the log was deleted. And then the database became suspect. Eventually, with no other option, someone decided to create a new transaction log file using DBCC REBUILD_LOG. This was the right thing to do. The database wasn't being backed up so restore wasn't an option. Some of the details of who did what are unclear, as it so often the case. Needless to say, the real DBA was kept in the dark.

Unfortunately, whoever rebuilt the log didn't run DBCC CHECKDB afterwards to find out what corruption had been caused in the database by having the transaction log unceremoniously ripped out from under the database's feet. Rebuilding a transaction log is one of the worst things that can happen to a database, as all active transactions are thrown away, with no chance of rolling back. This leaves the database in a transactionally inconsistent, and potentially structurally inconsistent state (see Corruption: Last resorts that people try first... filed under my Bad Advice category, and Q4 from TechNet Magazine: February 2009 SQL Q&A column: Is it ever safe to rebuild a transaction log?).

Instead they carried on running. The real DBA thought everything was cool and even upgraded the instance from SQL 2000 to SQL 2005. To be honest, after seeing the 19000+ corruptions they had in the database, I'm surprised that the database upgraded successfully. But it did. Then he tried to rebuild some indexes, which failed with corruption errors. Which brought us to this morning, when DBCC CHECKDB was run. It reported a lot of corruption, but running from SSMS only showed the first 1000 errors (which was the inspiration for one of my previous posts today How to get all the corruption messages from CHECKDB. So they started to run REPAIR_ALLOW_DATA_LOSS, which will cause data loss - we chose the name of the repair option well. I got involved at this point, and saw from the list of errors that massive data loss would be the result.

Once they had re-run DBCC CHECKDB though the command line (luckily only 20 minutes) I could see all the errors. And I must say I was astounded. It was the *worst* case of multiply-allocated extents (see Inside the Storage Engine: Anatomy of an extent) that I've ever seen. Not only were the extents allocated by multiple IAM pages (see Inside the Storage Engine: IAM pages, IAM chains, and allocation units) - in other words, two objects thought they had the same pages in the database allocated, but the two objects had actually each proceeded to overwite the other's updates in the pages - getting the two clustered indexes hopelessly interlinked. Lots of errors (1000s) like:

Msg 8904, Level 16, State 1, Server NAMESCHANGEDTOPROTECTTHEINNOCENT, Line 1
Extent (1:9528) in database ID 5 is allocated by more than one
allocation object.
...
Msg 8978, Level 16, State 1, Server NAMESCHANGEDTOPROTECTTHEINNOCENT, Line 1
Table error: Object ID 1445580188, index ID 1, partition ID
376212519911424, alloc unit ID 94737543200768 (type In-row data). Page
(1:52696) is missing a reference from previous page (1:427112).
Possible chain linkage problem.
...
Msg 8977, Level 16, State 1, Server NAMESCHANGEDTOPROTECTTHEINNOCENT, Line 1
Table error: Object ID 1445580188, index ID 1, partition ID
376212519911424, alloc unit ID 94737543200768 (type In-row data).
Parent node for page (1:143210) was not encountered.
...
Msg 2533, Level 16, State 1, Server NAMESCHANGEDTOPROTECTTHEINNOCENT, Line 1
Table error: page (1:405139) allocated to object ID 1445580188, index
ID 1, partition ID 376212519911424, alloc unit ID 94737543200768 (type
In-row data) was not seen. The page may be invalid or may have an
incorrect alloc unit ID in its header.
...
CHECKDB found 1653 allocation errors and 17646 consistency errors in
database 'NAMESCHANGEDTOPROTECTTHEINNOCENT'.

The repair for all of this? Delete it all and fix up the links. MASSIVE data loss.

Lessons to learn from this:

  • Don't give people with no clue about SQL Server access to SQL Server. 
  • Don't delete a transaction log to reclaim space. Cardinal sin.
  • Don't rebuild a transaction log UNLESS you run a full DBCC CHECKDB afterwards and satify yourself that you know the extent of the damage.
  • Don't upgrade a database without running DBCC CHECKDB first. Best case - the upgrade fails. Worst case - it upgrades and then you might not be able to fix the corruptions.
  • Don't just run repair without doing a little analysis on the errors reported, especially with a *huge* number of errors reported.

Finally, I'm really grateful to the DBA in question for letting me help him out with this, and to post this blog post - we all learn from our own and others' mistakes.

PS And I got involved in this from Twitter - I just *love* it. After initially being skeptical of how much time I'd spend on it, I'm finding the benefits of connecting to the SQL community in 'real-time' vastly outweigh the time I'm putting into it. Follow me on http://twitter.com/paulrandal and you'll see a bunch of other SQL MVPs on there too.

Last week's survey was two-fold - what's the largest SQL Server database you manage, and how many SQL Server databases are you responsible for managing. Here are the results as of 5/10/2009.

 

As far as the database sizes are concerned, the distribution curve isn't surprising, but the average size of databases *is*. 70% of respondents have a database over 100GB, with half of those being 500GB or more. One lucky (or unlucky, depending on how you look at it!) person gets to play with a 20+TB database. There some truly huge SQL Server databases out there - for instance the 1.1 petabyte astronomical database described here, which adds 1.4TB every night. Back in December 2003, the largest SQL Server database was a 5TB Verizon database. Now multi-TB databases seem quite common when talking to large customers.

The distribution curve for the number of databases also seems unsurprising, except for the uptick at the 150+ databases mark. Fully 1/4 of respondents are responsible for more than 200 databases! Even if the majority of the databases are small, that's still a huge amount of management to consider and context to have to know about.

So what was the point of this survey? Well, I've heard from quite a few people this year that some of their DBA colleagues have been let go and they've had to pick up more responsibility, whether they want to or not. Just last week, Microsoft laid off 3000 more people worldwide, including some in their various IT departments. In the DBA class we were teaching last week, it was obvious that some of the attendees were affected and had a lot more on their plates.

With the average size of databases growing, and the number of databases any one person is responsible for also growing, it's imperative that the overall environment is as easily managed as possible. What does this mean? Here's a 2-minute list of stuff off the top of my head that I think helps make an environment more manageable:

  • Setting up each database so that the important parts can be recovered as quickly and easily as possible
  • Setting up each database so that there are no performance bottlenecks (these two were both kind of addressed by my previous survey Physical database layout vs. database size)
  • Making sure each database has the right backup strategy (from last week's survey Importance of having the right backups)
  • Making sure that each database has the correct monitoring set up. For instance:
    • SQL Agent alerts on the instance for high-severity errors (see Easy monitoring of high-severity errors: create Agent alerts)
    • Monitoring data and log file sizes to avoid auto-growths
    • Monitoring index fragmentation levels
    • Pro-active monitoring for things like high disk-queue lengths or runaway queries
    • Regular consistency checks to find corruption
  • Security is setup appropriately (see TechNet Magazine: feature article on Common SQL Server Security Issues and Solutions)
  • Maintenance jobs are automated as much possible (e.g. backups, consistency checks, index maintenance, and statistics maintenance). There are a bunch of scripts people have published to help with this: Tara Kizer has a lot on her blog, Ola Hallengren has a good one here, as do many others (no offense to anyone else with scripts, these are just the two that spring to mind - by all means reply with a comment to let me know of your popular script)
  • Database settings adhere to common best-practices (e.g. auto-shrink off, auto-grow on (and managed!), auto-create and auto-update statistics on, page checksums on)

And I'm barely touching the tip of the iceberg here. Most of these are just database maintenance best practices - with the idea that the healthier a database is, the less likely something will go wrong that demands DBA attention. And in an environment where potentially hundreds of databases are under the control of a single person, avoiding problems is of paramount importance.

Ok - cup of coffee later and I have more stuff to add to the list:

  • Consistency of settings and scripts across database and instances, so what you're familiar with in one database context translates to all others
  • Environment-wide run-book that covers all aspects of day-to-day operations, so all DBAs do the same thing and a DBA can help out or pick up databases easily
  • Environment-wide disaster recovery guide - written by the most senior DBA and tested by all DBAs down to the most junior
  • T-SQL source code control
  • Management of access to databases
    • Physical access to servers is limited
    • Network access to servers is limited
    • SA access is limited
    • Developers can't deploy new code without going through QA first
    • And so on
  • High-availability technologies in place to aid recoverability, with plenty of practice of failing over applications
  • Use of tools to increase productivity. Most major 3rd-party tool vendors have tools that can help in a myriad of DBA activities - I'm not going to name any in particular as I don't want to favor any over any of the others.

Talking about tools, SQL Server 2008 itself has some stuff built in that can increase productivity. Kimberly did a great post last November on Central Management Servers - see SQL Server 2008 Central Management Servers - have you seen these?. There's also Policy-Based Management, which is a good first step along the road to centralized policy setting, testing, and enforcement (although it has some nits that need to be fixed in v2), and Performance Data Collection which can be used to easily collect perf data (same kind of stuff that Activity Monitor has) for multiple instances and stored in a single location - Kimberly will be blogging more about these two features in the future.

Overall, having a haphazard way of setting up and maintaining databases and instances is going to make it really hard to scale up size and number of databases in an environment without investing heavily in more people. Taking a more disciplined, planned approach, learning from others - using others' scripts and practices, and developing streamlined processes will lead to increased productivity, decreased stress, and happier, stronger DBAs. No-one wants to be in constant fire-fighting mode.

This has been a bit of a ramble, but that's the point of these editorial-style posts. I'm sure you know a lot of this, and I'm sure I've missed some obvious points, so feel free to comment with more tips, tricks, and links!

Next up - this week's survey!

This has come up a few times now, most recently in an email question this morning - subsequent runs of DBCC CHECKDB show varying numbers of corruptions, and sometimes no corruptions - what's going on? Even more strange - a maintenance job runs a DBCC CHECKDB, which shows errors, but then in the morning - no consistency errors. What?

I answered this back in the April 2009 SQL Q&A column in TechNet Magazine, but I want to get it here on the blog too in a bit more detail. The answer has to do with the way the database is consistency checked, and how corruptions are detected.

In 2005 onwards, you're going to be using page checksums to help detect corruption. If you created the database on 2005 onwards, page checksums are enabled by default and every allocated page will have one. If you upgraded a database from 2000 or before, then you need to manually enable page checksums with ALTER DATABASE. The nothing happens. Until a page is read in, changed, and then written back out. So your upgraded database will have a mixture of nothing/page checksums, or torn-page detection/page checksums. Note: torn-page protected pages remain torn-page protected, even with page checksums enabled, until the next time they're altered. Then they get a page checksum. See Inside The Storage Engine: Does turning on page checksums discard any torn-page protection? for an explanation and examples.

Once you've got page checksums enabled, who can you tell if there are corruptions in the database? Well, there are a number of ways corruptions will show up:

  1. You run an operation that hits a page that has been corrupted, and the page checksum test fails
  2. You run a BACKUP ... WITH CHECKSUM and it finds a page with a bad checksum
  3. You run a DBCC CHECKDB and it finds a page with a bad checksum

That's all very well, but what if a page *doesn't* have a page checksum on it (because it hasn't been changed since page checksums were enabled)? None of #1 to #3 will fail because of a bad page checksum, as there isn't a page checksum to check. #1 might fail, depending on how corrupt the page is, and it will likely fail with an obscure message that doesn't immediately scream 'corruption'. #2 won't fail, as the only time BACKUP examines what it's backing up is when WITH CHECKSUM is enabled and a page has a page checksum on it. #3 might find the corruption, depending on how the page is corrupt. If the corruption is in the middle of a large varchar field, for instance, probably not. Your best bet is to have page checksums enabled and regularly run DBCC CHECKDB.

That's how corruptions are detected. So what about the disappearing corruptions? This gets into how consistency checks work. Consistency checks only run on the pages in the database that are allocated. If a page isn't allocated to anything, then the 8192 bytes of it are meaningless and can't be interpreted. Don't get confused between reserved and allocated - I explain that in the first misconceptions post here. As long as a page is allocated, it will be consistency checked by DBCC CHECKDB, including testing the page checksum, if it exists. A corruption can seem to 'disappear' if a corrupt page is allocated at the time a DBCC CHECKDB runs, but is then deallocated by the time the next DBCC CHECKDB runs. The first time it will be reported as corrupt, but the second time it's not allocated, so it isn't consistency checked and won't be reported as corrupt. The corruption looks like it's mysteriously vanished. But it hasn't - it's just that the corrupt page is no longer allocated. There's nothing stopping SQL Server deallocating a corrupt page - in fact, that's what many of the DBCC CHECKDB repairs do - deallocate what's broken, and fix up all the links.

The maintenance job phenomenon can occur because of the order of operations in the job. If the DBCC CHECKDB is first, and then there's an index rebuild, and the index rebuild happens to rebuild an index that DBCC CHECKDB had found a corruption in, then the *new* index will have a completely different set of database pages, and won't contain the corrupt page. Bingo - disappearing corruption. A subsequent DBCC CHECKDB might not find any corruption, because the previously corrupt pages are no longer allocated.

Bottom line - any time you get corruption error messages, 99.999% of the time it's your I/O subsystem that's got problems, even if the corruptions 'disappear'.

PS Don't forget to follow along on Twitter - http://twitter.com/PaulRandal

Recently there's been a spate of people noticing strange behavior from active transactions when DBCC CHECKDB (or any of the other DBCC consistency checking commands run).

For example, I've create a database call DbccTest with a single table. In one connection I do:

BEGIN TRAN
INSERT INTO t1 VALUES (1, 1);
GO

And in another connection I do:

DBCC CHECKDB (DbccTest) WITH ALL_ERRORMSGS, NO_INFOMSGS;
GO
 

Look what gets printed in the error log:

2009-05-04 16:03:21.55 spid54      1 transactions rolled back in database 'DbccTest' (14). This is an informational message only. No user action is required.
2009-05-04 16:03:21.91 spid54      DBCC CHECKDB (DbccTest) WITH all_errormsgs, no_infomsgs executed by ROADRUNNERPR\paul found 0 errors and repaired 0 errors. Elapsed time: 0 hours 0 minutes 0 seconds.

I've highlighted the weird part in bold - it looks like crash recovery ran on the DbccTest database. What's going on?

Well, crash recovery *DID* run on the DbccTest database - only it ran crash recovery into a hidden database snapshot. In the first part of the very long (well, 13 pages - not as long as the 70 page description in the 2008 internals book) post on how DBCC CHECKDB works (see CHECKDB From Every Angle: Complete description of all CHECKDB stages), I explain why DBCC CHECKDB needs a transactionally-consistent, point-in-time view of the database to run consistency checks on. In 2005 I changed the code to use a database snapshot - which by it's very nature provides a point-in-time, transactionally-consistent view of the database. When a database snapshot is created on a database, the transaction log of that database is examined, and crash-recovery is run on it, but *into the database snapshot* - the source database is totally unaffected. The message in the error log is from DBCC CHECKDB's hidden database snapshot starting up - but it's pretty misleading, I must admit.

One more twist to this behavior is that if there is a very long-running transaction in the database being consistency-checked, the database snapshot creation could take hours (because it has to wait for the long-running transaction to rollback, in the snapshot). If you think that the consistency check has hung, and try to kill it, you won't be able to and the SPID will appear as if it's in rollback. This is confusing because DBCC CHECKB doesn't do anything - so shouldn't have anything to rollback - but it's the database snapshot creation that's the problem. Once crash recovery has started on the database snapshot, it can't be interrupted - so you have to wait for it to finish (and the database snapshot to be created) before the DBCC command will actually stop, and remove the database snapshot. It's a weird side-effect, but a necessary one unfortunately. The recovery code could be changed to check for attention signals every so often I suppose, but I'm not holding-my-breath for that change.

Hope this helps.

Jason Massie posted an interesting statistic yesterday - Facebook has 1.5 petabytes of image storage, and it grows by 25TB daily - I wonder how they store and manage it?

In this week's survey, I'm interested in two things: what's the largest single SQL Server database in your company and how many SQL Server databases are you responsible for managing? I'll report on the results sometimes over the weekend of 5/9/09.

A note on the second survey: if you manage a team of DBAs, but don't manage databases directly yourself, answer for your individual DBAs, rather than the team as a whole.

Thanks!

PS No comments on this post please (I'll delete them straight away) - please wait for the survey results post to avoid skewing the answers.

Last week's survey was on what kind of backups you take, along with the recovery model used (see here for the survey). Here are the results as of 5/2/2009.

 

The 'other' responses were combinations of the other answers.

This survey is a bit of a pre-cursor to my article on Understanding SQL Server Backups that will be in the July TechNet Magazine (available start of June). In that article I explain how the three major backup types work, and then how to combine them into an effective backup strategy, so I don't want to steal all my own thunder.  In the spirit of my weekly surveys, this is a 20-minute, brain-dump editorial, rather than a very carefully planned out article.

When I'm teaching classes on Database Maintenance or High-Availability, I always tell people not to plan a backup strategy; plan a restore strategy. You don't want to come up against a disaster recovery situation and find that the only backup you have of your multi TB 24x7 busy OLTP database is a full from several weeks ago. Kiss your job goodbye on that one. You have to make sure that you have backups that allow you to recover as quickly as possible and with the minimum of data loss.

So, you're probably taking backups so that you can restore when disaster strikes. Some of you may also be taking backups because:

  1. You periodically restore the database onto a reporting server, onto a data warehouse server, or onto test/dev servers
  2. You've got log shipping implemented (backup, copy, restore; backup, copy, restore;...)
  3. You've got database mirroring implemented (and so you must use the FULL recovery model, and thus must take log backups to manage the transaction log)
  4. You're in the FULL recovery model, even though you're not interested in up-to-the-minute or point-in-time recovery, and must take backups to manage the size of the transaction log

The survey included what recovery model you're running in too - basically SIMPLE or FULL. I don't know anyone that runs all the time in the BULK_LOGGED recovery model; and most 24x7 systems cannot switch into BULK_LOGGED because of the possibility of crashing and not being able to take a tail-of-the-log backup, if a minimally-logged operation has occurred since the last log backup.

In the SIMPLE recovery model, you cannot take log backups - so you're basically saying that you don't need to be able to do up-to-the-minute recovery, point-in-time recovery, or single-page restores. That's cool - it totally depends on your situation, and is you're running in SIMPLE it means you understand that there's no point running in FULL and having to take log backups if you're not interested in using them. If you're in the group covered by #4 above, switch to SIMPLE!

In the FULL recovery model, you have to take log backups - plain and simple - otherwise the log will grow out of control and eventually your database will grind to a halt when the log runs out of space. I've done a few blog posts about that so I won't labor the point (see Search Engine Q&A #1: Running out of transaction log space, Search Engine Q&A #23: My transaction log is full - now what?, and Importance of proper transaction log size management).

In either recovery model, there's the question of just how often should the various kinds of (I like to think of) 'mandatory' backups be taken, and whether to use the 'optional' differential backups.  Again, not stealing my own thunder from TechNet Magazine: a full backup is a copy of all the data in the database; a log backup is all the transaction log generated since the last log backup (or first full/differential following a break in the log backup chain); a differential backup is all data that has changed since the last *full* backup.

You need to take regular full backups - but just how regular depends. If the database is very large, and you need to keep your backups around for regulatory purposes, you might choose to take a full backup every couple of weeks or a month, with compression. Commonly I see people taking a full backup once a week.

Almost 20% of respondents are in SIMPLE and only take full backups - I wonder how many realize that a full backup only gives you a single point-in-time to which you can recover - and you lose all work since the last full backup. In the SIMPLE recovery model, you can't take log backups, so if you want to reduce the amount of data loss when a disaster occurs, you'll need to take differentials too, which only a handful of respondents do. Although this still means you'll have some data loss, it's vastly reduced. The amount of potential data loss will be the amount of work since your last (e.g. daily) differential backup, rather than since your last (e.g. weekly) full backup.

In the FULL recovery model, only a few % of respondents are NOT taking log backups - which means they shouldn't be in the FULL recovery model, or are being forced to (e.g. from using mirroring) and don't realize that having a redundant copy of the database isn't a good enough HA strategy - you have to have backups too in case of a secondary failure.

The vast majority of respondents use the FULL recovery model taking full and log backups (45% - which is what I expected) and about 25% are also taking differentials too. This is a more advanced strategy and can seriously REDUCE your downtime in the event of a disaster. A differential backup basically short-circuits the need to restore all the log backups that were taken in the time between the last full backup and that differential backup. You'll find out more on this in the article. For the ultimate in flexibility and fast recovery, this is the way to go - but at the cost of a little more complicated backup strategy, and extra storage space for the differential backups.

Now, what about the exotic answers at the end of the survey?

  • OS-level backups: this isn't a popular solution *at all* because of the complexity of getting the SQL data back out from the OS-level backup and recovered. I'm not surprised that no-one selected this.
  • SAN-level backups: I've seen a few customers do this, with mixed results. You must make sure that the SAN admin knows what they're doing - I heard of one customer who had data and log files on different LUNs. The SAN admin snapped the data LUN, and *then* the log LUN. Every so often, corruption would occur...
  • Shutdown SQL and copy the files: Just don't even think about this. Taking downtime to take a backup is daft, and what if the database is corrupt and won't re-attach?
  • Don't take backups: No need to discuss this one.

Ok - that's a quick blast that should give you some idea of why you'd want to make sure you've got the *right* backups, not just any old backup strategy.

Next post - this week's survey!

(PS I'm really enjoying being on Twitter - lots on interesting stuff. See me at http://twitter.com/PaulRandal)

The June edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Combining index and statistics maintenance
  • How some Enterprise-only features can prevent a database restoring on lower editions
  • Why database mirroring can switch between SYNCHRONIZED and SYNCHRONIZING
  • Use a database mirror for reporting

Check it out at http://technet.microsoft.com/en-us/magazine/dd797578.aspx.

In this week's survey I'm interested in what kind of backups you take, and also what recovery model you mostly use. If you have multiple database with different strategies, by all means respond multiple times. The more responses the better! I'll report on the survey results some time over the weekend of 5/2/2009.

A couple of notes on the survey:

  • If you're using BULK_LOGGED at all, just answer as FULL.
  • When I say full backup, I mean either full database, full filegroup, or full file backup. Same for differential.
  • I don't care whether you use a tool outside SQL Server to take your SQL backups (e.g. LiteSpeed) - it's the type of backup that's interesting.

Thanks!

PS No comments on this post please (I'll delete them straight away) - please wait for the survey results post to avoid skewing the answers.

A couple of weeks ago I kicked off the latest survey, on what the physical layout of your databases are and why you have them that way (see here for the survey). I let the survey run for a while to get a good sampling, and I wasn't disappointed, with over 1000 responses! Here are the results as of 4/27/2009.

Just like any other 'best practice' kind of topic, the question of how to design the physical layout of a database provokes a lot of (sometimes heated) discussion. There are lots of options and there are even more factors to consider - so the best answer is my perennial favorite "it depends"! In this post, I don't want to tell you how I think you should layout your database - instead I want to discuss some of the options and let you make up your own mind, with the added benefit of data on what your peers are doing with their databases. The main point of this survey was to see what people are doing, rather than as a driver for an editorial blog post.

As you can clearly see from the results above, and predictably, the distribution of layout types shifts as the database size increases - but I was very surprised by the number of single file databases over 10GB. Rather than go through each option in the survey, I'm going to talk a bit about some of the things to consider when planning a layout.

Underlying I/O subsystem

This could be the most important factor to consider. If you only have a single physical drive, for instance, there's arguably not much point creating multiple data files, as that will force the disk heads to bounce back and forth to the different file locations on the disk. On the other hand, if you have a SAN with several thousand drives grouped together into multiple LUNs, your possibilities are a lot wider (and maybe much harder to come up with the optimal layout). Several people asked if I'd go into depth around having multiple controllers, and different drive layouts in a SAN - and my answer is no. I'm not an expert at storage design, which, like indexing, is both an art and a science. There's a good whitepaper that discusses some of this: Physical Database Storage Design, which I helped review back in my MS days.

Performance, recoverability, manageability

Having multiple files with separate storage for each allows reads and writes to be parallelized for increased performance, lowering the amount of disk head contention. When a checkpoint occurs, and pages are written to disk, spreading the I/O load over multiples files can speed up the checkpoint and reduce the IOPS spikes that you may see. It can also lead to reduced contention for the various allocation bitmaps - in the same way as I've described for tempdb. In user databases with a very high rate of allocations, contention can arise on the GAM pages - but it's not common. Some people also advocate having separate filegroups for tables and indexes, and although this can sometimes be more trouble than it's worth, and often turns into a religious debate, I have heard of people getting a perf boost from this.

One of the most convincing reasons (I find) for having multiple filegroups is the ability to do much more targeted recovery. With a single file database, if it gets corrupted or lost, you have to restore the whole database, no matter how large the file is - and this can seriously affect your ability to recover within the RTO (Recovery Time Objective) agreement. By splitting the database into multiple filegroups, you can make use of partial database availability and online piecemeal restores (in Enterprise Edition) to allow the database to be online as soon as the primary filegroup is online, and then restore the remaining filegroups in priority order - bringing the application online as soon as the relevant filegroups are online. You can even use this layout to spread your backup workload - moving to filegroup-based backups instead of database backups, although this isn't very common.

As far as manageability is concerned, there are a few reasons to have multiple filegroups. Firstly, you can isolate a table that requires a lot of I/O (e.g. in terms of index maintenance) on separate storage from other tables, so that maintenance operations (and the I/O overhead of doing them) doesn't interfere with the I/O of the other tables. Also, you can provision different kinds of storage for different tables - in terms of disk speed and RAID level (redundancy), for instance. If you want to be able to move data around, you can do it much more easily if the database is split up, than if it's a single file.

Summary 

Ok - so I lied. I *am* going to offer advice - against one of the options: single filegroup, single file. For smaller databases, this is fine - but as the database size gets larger, say, over tens of GB, then having a single file can become a serious liability. With a single file database (or even a single filegroup database), you lose most of the benefits mentioned above.

Bottom line - as your databases get larger, you're going to need to think more carefully about their layout, otherwise you could run into big problems as your workload increases or when disaster strikes. As the survey results show, this is what your peers are doing.

Next post - this week's survey!

There's been a recent flurry of confusion and misconceptions about trace flag 1118 in SQL Server 2008. This trace flag switches allocations in tempdb from single-page at a time for the first 8 pages, to immediately allocate an extent (8 pages). It's used to help alleviate allocation bitmap contention in tempdb under a heavy load of small temp table creation and deletion.

There are multiple points of confusion, which I'll address in turn. Then I'll prove that the trace flag still works in SQL Server 2008.

1) Why is the trace flag usually required in 2000? In SQL 2000, whenever a temp table is created in tempdb and a row inserted, an IAM page must be allocated and a single data page must be allocated. These two pages are both 'single-page' allocations, from a mixed extent (see Inside The Storage Engine: Anatomy of an extent for more info). This means that an SGAM allocation bitmap page must be accessed, and a PFS page must be accessed (see Inside The Storage Engine: GAM, SGAM, PFS and other allocation maps for more info).

With lots of very small temp tables being created, this means the very first SGAM page and the very first PFS page in the data file are accessed/changed by all the threads, leading to latch contention problems on these two pages. When the temp tables are deleted again, the various pages are deallocated, which again needs to access and change the PFS page, and potentially the SGAM page.

There are two ways to alleviate this problem. Firstly, create multiple data files in tempdb - which splits the latch contention over multiple allocation bitmaps (from having allocations come from multiple files) and thus reduces the contention. The general rule of thumb was one tempdb data file for each processor core. Secondly, turn on TF1118, which makes the first 8 data pages in the temp table come from a dedicated extent. This means one extent is allocated from the GAM page, rather than 8 single pages (and potentially 8 accesses to the SGAM page). The pages within the extent are reserved and allocated singly from this extent, as needed. This also cuts down on contention and is documented in KB 328551.

2) What does reserved vs. allocated mean? When an extent is allocated to a table, the 8 pages in the extent are not immediately allocated as well. Allocating an extent means those 8 pages are reserved exclusively for subsequent allocation to that table. The pages are allocated individually as needed, but no other table can allocate them. This is why such extents are called 'dedicated' extents (see my blog post link above for more details). You can see the counters of reserved pages vs. allocated pages in the output from sp_spaceused.

3) Why is the trace flag not required so much in 2005 and 2008? In SQL Server 2005, my team changed the allocation system for tempdb to reduce the possibility of contention. There is now a cache of temp tables. When a new temp table is created on a cold system (just after startup) it uses the same mechanism as for SQL 2000. When it is dropped though, instead of all the pages being deallocated completely, one IAM page and one data page are left allocated, and the temp table is put into a special cache. Subsequent temp table creations will look in the cache to see if they can just grab a pre-created temp table 'off the shelf'. If so, this avoids accessing the allocation bitmaps completely. The temp table cache isn't huge (I think it's 32 tables), but this can still lead to a *big* drop in latch contention in tempdb.

4) Does the trace flag still exist in 2005 and 2008? Yes it does - KB 328551 clearly states:

Note Trace flag -T1118 is also available and supported in Microsoft SQL Server 2005 and SQL Server 2008. However, if you are running SQL Server 2005 or SQL Server 2008, you do not have to apply any hotfix.

Just to make extra-sure (as I'm always paranoid about saying absolutes), I checked with my good friend Ryan Stonecipher, who's the dev lead for the team that owns allocation (and a bunch of other stuff, including DBCC). He confirmed the code is exactly the same in 2008 as it was in 2005. And I prove it to you below too.

5) And why is it sill there in 2005 and 2008? It does the same thing in 2005/2008 as it did in 2000. If the temp table creation/deletion workload is high enough, you can still see latch contention, as the temp table cache won't be enough to completely alleviate the need for creating actual new temp tables, rather than just being able to grab one 'off the shelf'. In that case, using the trace flag to change to extent-based allocation (in *exactly* the same way as for 2000) can help, as can creating more tempdb data files.

As far as data files go though, the number has changed. Instead of a 1-1 mapping between processor cores and tempdb data files (*IF* there's latch contention), now you don't need so many - so the recommendation from the SQL team is the number of data files should be 1/4 to 1/2 the number of processor cores (again, only *IF* you have latch contention). The SQL CAT team has also found that in 2005 and 2008, there's usually no gain from having more than 8 tempdb data files, even for systems with larger numbers of processor cores. Warning: generalization - your mileage may vary - don't post a comment saying this is wrong because your system benefits from 12 data files. It's a generalization, to which there are always exceptions.

6) Why does DBCC IND still show two pages, even with the trace flag on? I've heard of some people being confused by the output of DBCC IND in SQL 2008 when the trace flag is turned on. Creating a single row temp table will only show two pages allocated in the DBCC output - one IAM page and one data page. Yes, that's completely correct - as only two pages are allocated, but the data page comes from a dedicated extent, not a mixed extent. (IAM pages are *always* single-page allocations from mixed-extents).

And now the proof, on SQL 2008.

SELECT @@VERSION;
GO

Microsoft SQL Server 2008 (RTM) - 10.0.1600.22 (Intel X86)

First off, I'll create a temp table without the trace flag enabled, and see what pages the table has allocated, by looking at the first IAM. I'll use a temp table with an 8000+ byte row size, and insert two rows - so we have two data pages for clarity.

DBCC TRACEOFF (1118, -1);
GO

USE tempdb;
GO

CREATE TABLE #temp (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
GO
INSERT INTO #temp DEFAULT VALUES;
GO 2

Now I'll figure out what is the first IAM page, using my sp_AllocationMetadata script (see here for the script and details), and dump it with DBCC PAGE to see the single-page allocations it's tracking, and which dedicated extents are allocated to the table:

EXEC sp_AllocationMetadata '#temp';
GO

Object Name    Index ID  Alloc Unit ID        Alloc Unit Type  First Page  Root Page  First IAM Page
-------------- --------- -------------------- ---------------- ----------- ---------- ---------------
#temp__<snip>  0         1152921505223016448  IN_ROW_DATA      (1:158)     (0:0)      (1:199)

DBCC TRACEON (3604);
GO
DBCC PAGE ('tempdb', 1, 199, 3);
GO

<snip>

IAM: Single Page Allocations @0x4A35C08E

Slot 0 = (1:158)                     Slot 1 = (1:200)                     Slot 2 = (0:0)
Slot 3 = (0:0)                       Slot 4 = (0:0)                       Slot 5 = (0:0)
Slot 6 = (0:0)                       Slot 7 = (0:0)                      


IAM: Extent Alloc Status Slot 1 @0x4A35C0C2

(1:0)        - (1:1016)     = NOT ALLOCATED   

As you can clearly see from the partial output of the dump of the IAM page, there are two single-page allocations and no extents allocated to the temp table. This is what should happen when the trace flag is not enabled.

Now I'll do the same thing with the trace flag 1118 enabled.

USE tempdb;
GO

DROP TABLE #temp;
GO

DBCC TRACEON (1118, -1);
GO

CREATE TABLE #temp (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
GO
INSERT INTO #temp DEFAULT VALUES;
GO 2

EXEC sp_AllocationMetadata '#temp';
GO

Object Name    Index ID  Alloc Unit ID        Alloc Unit Type  First Page  Root Page  First IAM Page
-------------- --------- -------------------- ---------------- ----------- ---------- ---------------
#temp__<snip>  0         1224979099301904384  IN_ROW_DATA      (1:208)     (0:0)      (1:158)

DBCC TRACEON (3604);
GO
DBCC PAGE ('tempdb', 1, 158, 3);
GO

<snip>

IAM: Single Page Allocations @0x4A8FC08E

Slot 0 = (0:0)                       Slot 1 = (0:0)                       Slot 2 = (0:0)
Slot 3 = (0:0)                       Slot 4 = (0:0)                       Slot 5 = (0:0)
Slot 6 = (0:0)                       Slot 7 = (0:0)                      


IAM: Extent Alloc Status Slot 1 @0x4A8FC0C2

(1:0)        - (1:200)      = NOT ALLOCATED                              
(1:208)      -              =     ALLOCATED                              
(1:216)      - (1:1016)     = NOT ALLOCATED
                              

Now as you can clearly see, there are no single-page allocations, and there's a single extent allocated to the table. Proof that trace flag 1118 still does exactly what it should in SQL Server 2008.

Now for a DBCC IND on the table: 

DBCC IND ('tempdb', '#temp', -1);
GO

PageFID PagePID     IAMFID IAMPID      ObjectID    IndexID    
------- ----------- ------ ----------- ----------- -----------
1       158         NULL   NULL        293576084   0
1       208         1      158         293576084   0
1       209         1      158         293576084   0

(I've removed some of the trailing columns for clarity.) We see that it still only lists the two data pages (1:208, 1:209) and the IAM page (1:158) - although an entire extent was allocated to the temp table, only two pages from the extent were actually allocated and used - the rest are reserved for use by that table, but remain unallocated. 

Hopefully this post has cleared up a lot of the confusion around this trace flag and what it does.

A short post to start the day (in India) and then at lunchtime I'll do how it works: FILESTREAM garbage collection.

This is a question that came up recently on an MVP discussion list: why does a 200MB log backup take 40 minutes to restore on my log-shipping secondary?

First answer I thought of was that the I/O subsystem on the log-shipping secondary was very slow, or broken in some way, so I/Os were taking ages to complete. Nope - I/O subsystem was performing well.

Next thing I suggested was the the log-shipping secondary was restoring the log backups using WITH STANDBY and there was something like an index rebuild that was being rolled back. As an aside, when you restore log backups on the log-shipping secondary, you have a choice how they recovery: either WITH NORECOVERY or WITH STANDBY. The first option doesn't allow any access to the database, as the database is still "in recovery". The second option runs the REDO part of recovery, then runs the UNDO part of recovery, but saves the details of what it had to do for UNDO into a file (who's name and location you specify). It then allows read-only access to the database, for queries/reporting/whatever. Obviously if there's a lot of transaction log that has to be undone (i.e. rolling back transactions that weren't committed at the time the log backup completed), then this could take some time. But 40 minutes? No. That wasn't it.

I was partly right on my second guess. The *previous* log backup that was restored WITH STANDBY contained a long-running index operation, and so the undo file that the restore created was *huge*. The next log backup that's restored after a RESTORE ... WITH STANDBY, must first undo everything in the undo file (i.e. put the database back to the exact state it was in as if the WITH STANDBY part of the restore never happened) before it can restore more transaction log. In this case, no-one had noticed that the previous log restore *also* took a lot of time and created the huge undo file. It just looked like the 200MB log backup was causing the problem.

My advice: in any kind of there's-suddenly-a-corruption-or-performance-problem situation, don't just look at what's happening now. Always look at what happened leading up to the problem, as the answer is often there.

Every so often (well, ok, very frequently) my internal list of 'I must blog about that' gets too big and I have an almost uncontrollable urge. Luckily, these urges end up with me blogging several times, otherwise who knows what kind of trouble I'd get into? Smile

First up today is some pretty deep internals about how the transaction log works in a specific situation. This has come up a few times in the last few internals/maintenance classes I've taught in the transaction log module so I wanted to get a blog post out there that proves my answer is correct. The question is easier to frame with a picture, borrowed from my TechNet Magazine article from February (see here).

  

The image shows the circular natue of the transaction log, and how VLFs can be overwritten once the log has wrapped around, as long as they are no longer active (red = active log/VLF, blue = inactive log/VLF). If none of this makes sense to you, go read the TechNet Magazine article and it should do.

The question is: what happens if log records continue being written to VLFs 1 and 2 in the picture, and eventually the end of VLF 2 has been reached, but VLF 3 is still active, and can't be overwritten? Does the database just stop?

Let's find out.

I'm going to create a database and then engineer the situation above.

CREATE DATABASE LogTestDB ON PRIMARY (
    NAME = LogTestDB_data,
    FILENAME = N'C:\SQLskills\LogTestDB_data.mdf')
LOG ON (
    NAME = 'LogTestDB_log',
    FILENAME = N'C:\SQLskills\LogTestDB_log.ldf',
    SIZE = 5MB);
GO

The database is in the SIMPLE recovery model. Well, to be completely honest, it's in the FULL recovery model, but there hasn't been a full database backup yet, so as far as transaction log behavior is concerned, it behaves as if it was in SIMPLE (the log can be cleared by a CHECKPOINT) - again, if this makes no sense, go read the TechNet Magazine article. 

And then see how many VLFs it has using the DBCC LOGINFO command (and this is the *only* way to see how many VLFs there are):

DBCC LOGINFO ('LogTestDB');
GO

FileId   FileSize   StartOffset   FSeqNo   Status   Parity CreateLSN
-------- ---------- ------------- -------- -------- ------ -----------
2        1245184    8192          31       2        64     0
2        1245184    1253376       0        0        0      0
2        1245184    2498560       0        0        0      0
2        1499136    3743744       0        0        0      0

We've got 4 VLFs. The Status column tells us whether the VLF is active or not. A status of 2 means the VLF is active, 0 means it's not. The sequence number (FSeqNo) is the logical order of the VLFs within the log. The FileSize is in bytes (so each VLF is about 1.25 MB). Right now there's only one active VLF.

Now I'll engineer the situation in the image above. I'm going to fill the log so that VLFs 1, 2, and 3 are active. Then I'm going to start an explicit transaction that will hold VLF 3 and onwards active. Then I'll continue filling the log so it wraps around and starts to fill up VLF 1 again.

USE LogTestDB;
GO
CREATE TABLE BigRows (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
GO

SET NOCOUNT ON;
INSERT INTO BigRows DEFAULT VALUES;
GO 300

I've filled up VLFs 1 and 2, and started on 3. Let's check with DBCC LOGINFO:

DBCC LOGINFO ('LogTestDB');
GO

FileId   FileSize   StartOffset   FSeqNo   Status   Parity CreateLSN
-------- ---------- ------------- -------- -------- ------ -----------
2        1245184    8192          31       2        64     0
2        1245184    1253376       32       2        64     0
2        1245184    2498560       33       2        64     0
2        1499136    3743744       0        0        0      0

As you can see, the first 3 VLFs now have a status of 2 (active), and they're in sequence. Now I'll create an explicit transaction that will prevent VLF 3 and onwards being cleared.

BEGIN TRAN
INSERT INTO BigRows DEFAULT VALUES;
GO

If I explicitly do a CHECKPOINT now, VLFs 1 and 2 will clear:

CHECKPOINT
GO

DBCC LOGINFO ('LogTestDB');
GO

FileId   FileSize   StartOffset   FSeqNo   Status   Parity CreateLSN
-------- ---------- ------------- -------- -------- ------ ----------
2        1245184    8192          31       0        64     0
2        1245184    1253376       32       0        64     0
2        1245184    2498560       33       2        64     0
2        1499136    3743744       0        0        0      0

Now I'll carry on filling up the BigRows table so the log wraps around and starts filling up VLFs 1 and two again.

INSERT INTO BigRows DEFAULT VALUES;
GO 300

DBCC LOGINFO ('LogTestDB');
GO

FileId   FileSize   StartOffset   FSeqNo   Status   Parity CreateLSN
-------- ---------- ------------- -------- -------- ------ ----------
2        1245184    8192          35       2        128    0
2        1245184    1253376       32       0        64     0
2        1245184    2498560       33       2        64     0
2        1499136    3743744       34       2        64     0

You can see that the log has wrapped around now, but VLFs 3 and 4 are still active. Look at the sequence numbers of the active VLFs... the active log is VLF 3 then 4 then 1, with sequence numbers 33 then 34 then 35. Now if I carry on filling up the table, what's going to happen when the log bumps up against VLF 3 that is still active? Will it stop or will it grow?

INSERT INTO BigRows DEFAULT VALUES;
GO 300

DBCC LOGINFO ('LogTestDB');
GO

FileId   FileSize   StartOffset   FSeqNo   Status   Parity CreateLSN
-------- ---------- ------------- -------- -------- ------ ------------------
2        1245184    8192          35       2        128    0
2        1245184    1253376       36       2        128    0
2        1245184    2498560       33       2        64     0
2        1499136    3743744       34       2        64     0
2        253952     5242880       37       2        64     36000000049300052
2        270336     5496832       38       2        64     36000000049300052
2        253952     5767168       0        0        0      36000000107500066
2        335872     6021120       0        0        0      36000000107500066
2        253952     6356992       0        0        0      36000000190700020
2        401408     6610944       0        0        0      36000000190700020
2        253952     7012352       0        0        0      37000000037300040
2        466944     7266304       0        0        0      37000000037300040

The answer is that it grew, and kind of skipped the active VLFs! Look at the sequence numbers. The new sequence of the active log is VLF 3 then 4 then 1 then 2 then 5 then 6, as you can see from the sequence numbers. Once the active transaction I created commits or rolls back, VLFs 3, 4, 1, and 2 can clear and then the 'normal' sequencing of VLFs in the log will resume.

Every VLF has a small header which contains the sequence number of the VLF within the transaction log, so the log can kind of do contortions to work around active VLFs in the middle of the log. Very cool.

Ok - that was fun - now I feel better!

PS In the last couple of DBCC LOGINFO dumps, where the log has wrapped around, you can see that the parity bits for the log blocks in the VLFs have changed, as I explained in one of my previous posts Search Engine Q&A #24: Why can't the transaction log use instant initialization?.

Not much to do here in Tokyo while waiting for the Bangkok flight except read and blog, and I've already done enough reading for one day on the flight here - so that leaves blogging!

Here's an interesting case that I got involved in on SQLServerCentral (here's the original thread, linked here with permission of the original poster). The problem was that every night the maintenance job would run and end up producing a stack dump, from a 211 error (which always indicates metadata corruption causing the Engine to trip over). I guessed system table corruption but the consistency checking portion of the maintenance job was apparently running fine. So I asked for one of the stack dumps, the output from the maintenance plan, and the latest SQL error log to have a look at (I have a weakness for these things, as you well know).

After having a look, I saw some strange behavior in the error log - every night there 10 or so DBCC CHECKDBs of master within about 2 minutes - corresponding to the consistency checking part of the maintenance job that was clearing running DBCC CHECKDB against all the system and user databases. Then I remembered that vanilla 2005 SP2 has two nasty maintenance plan bugs - one of which caused the consistency checking part of the maintenance plan to malfunction when run, and run DBCC CHECKDB against master every time instead of the database specified. Fixing that would then lead to the being able to properly run consistency checks and find the corrupt database.

So - moral of the story is to make sure you're not running vanilla 2005 SP2 (or one of the builds close to it, 3150 to 3158) otherwise you could get into problems with maintenance plans - worse still, you may think you've been running consistency checks all this time on your user databases, but if no-one's checking the SQL error logs, you could just be repeatedly consistency checking master.

The KB that describes the bug is 934459 (available at http://support.microsoft.com/kb/934459), or you can install SP3 (available at http://support.microsoft.com/?kbid=955706).

This week's survey is a little more complicated. I'm interested in the physical layout of your databases. I've got four surveys, for a variety of database sizes. Please vote multiple times in each survey, as you see fit - and by all means forward this link to your friends/clients/etc or re-blog/tweet it. I'm going to report on this survey in two weeks, to give a bit more time for people to respond (and because we're travelling next week). I think we're going to see some interesting statistics come out of this - the more people that respond the better. I'll report on it 4/24/09.

One thing to note - this is just for user databases, not for tempdb. In the surveys, "multiple filegroups" implies multiple files too, and if you don't have them spread exactly one per drive/etc, just choose that option - I only have 10 options to choose from in the free surveys. 

(If you're in the over 1TB range and have multiple files/filesgroups spread over multiple drives/arrays/LUNs, vote using the last option on the >1TB survey and I'll lump them together.) 

Phew - thanks!

Last week's survey was on how *you* manage the size of your transaction log (see here for the survey). Here are the results as of 4/10/2009 - the most popular survey yet:

In my opinion (guarantee that I'm going to say stuff that some of you won't agree with!), there are only two valid answers in this survey: #1 - run in the FULL recovery model and take regular log backups, and #4 - run in the SIMPLE recovery model all the time. The last answer is applicable if you run out of log space even though you're in either of these situations but isn't a general strategy like #1 or #4. IMHO, you should be in one of these two situations and in the rest of this editorial I'll explain why. I'm not going to touch on *why* your transaction log might start to fill up, instead here are some links:

Now for the survey options:

  1. Take regular log backups. I'm very pleased to see the vast majority of respondents doing this, as it's the only valid log size management strategy when in the FULL recovery model (same thing applies to BULK_LOGGED of course, but with a few twists). Once you take that first database backup in the FULL recovery model, SQL Server assumes you're going to manage the transaction log size from that point on by taking log backups. Unfortunately that isn't documented in big, red flashing letters in Books Online - so people can get themselves into troubel inadvertently. Also, the FULL recovery model is the default, and is required for database mirroring - which further adds to the potential for people (such as involuntary DBAs) to accidentally switch into I-will-take-log-backups mode and then not take them. If you don't want to take log backups for recovery purposes, or you don't want to use database mirroring, don't use the FULL recovery model - it's as SIMPLE as that (ha ha). You might argue and say that you're only using FULL because of database mirroring, and don't want to take log backups. I'd argue back and say that if you care enough to have hot standby of your database, you must also take backups - as you can't rely solely on a redundant copy of your database on a different server.
  2. Use BACKUP LOG WITH NO_LOG/TRUNCATE_ONLY regularly. These two commands do basically the same thing - allow the log to be cleared without taking a log backup. What's the point if you're not taking log backups? - just switch to SIMPLE and let the checkpoints clear the log. In fact, in 2008 these two commands have been removed. See my blog post BACKUP LOG WITH NO_LOG - use, abuse, and undocumented trace flags to stop it.
  3. Use BACKUP LOG WITH NO_LOG/TRUNCATE_ONLY when it fills up. Same as above. You might argue that you're only keeping the log around in case there's a disaster, so that you can take a log backup at that point and use it to recover up to the point of the disaster. I'd argue that's broken on two counts: 1) what if the log file is damaged and you can't back it up? 2) that's *all* the transaction log since the last full database backup you took (if you break the log backup chain and then take a full database backup, that backup becomes the base of subsequent log backups) so that may take a long time to restore and replay...
  4. Run in the SIMPLE recovery model all the time. If you don't need to use FULL, don't. Running in SIMPLE is perfectly acceptable, as long as you don't mind losing work in the event of a disaster.
  5. Switch to SIMPLE when it fills up, then back to FULL. This is like #s 2 and 3 - what's the point?
  6. Switch to SIMPLE when it fills up, shrink the log, then switch back to FULL. This is worse than 2, 3, or 5. If you shrink the log, then it's going to grow again - possibly causing VLF fragmentation (see Transaction Log VLFs - too many or too few?), and definitely causing your workload to pause while the log grows, as the log can't use instant initialization (see Search Engine Q&A #24: Why can't the transaction log use instant initialization?).
  7. Shutdown SQL Server and delete the transaction log file(s). Just don't get me started on this one - I'm glad no-one 'fessed up to doing it. There are many reasons why this is daft, including: 1) you have to shutdown to do it, so your workload is off-line 2) if the database wasn't cleanly shut down, it won't be able to be started again without using EMERGENCY mode repair, and your data will be transactionally inconsistent 3) as the log can't be instant initialized, the database won't come online until the log has been created and zero'd. Just don't do this. Ever.
  8. Add more transaction log files. As a general strategy for managing the size of the log, this is not good. Extra log files have zero effect on performance (the myth that SQL Server writes in parallel to the log files is just that - a myth) and make management more tricky. However, if your log fills up for some out-of-the-ordinary reason, then adding another log file may be the only way to allow the database to keep running. If you have to do this though, make sure that you remove the extra log file(s) when you can to keep log management uncomplicated.

To summarize, if you want to be able to take log backups to aid in point-in-time or up-to-the-second recovery, use the FULL recovery model. If not, use SIMPLE and you won't need to mess around with the log when it fills up because you're not taking log backups.

Next post - this week's survey! (And thanks to all those who are responding to them!)

PS For those of you who sent me details about your databases from the survey back at the start of March (see here) - I haven't forgotten. I was waiting to get a decent sample size and now I'm going to go through the data. If you want to send me any more data, you've got until Sunday.

In this week's survey, I'm interested in how *you* manage the size of your transaction log. I'll report on the results around 4/10/2009.

Thanks!

(No comments please... wait till the results post next week) 

Last week's survey was on how you should store large-value character data in SQL 2005+ (see here for the survey). Here are the result as of 4/3/2009 - and I think my favorite answer is starting to catch-on:

My favorite answer is, of course, it depends! For all those who didn't answer 'it depends', your answer is valid, but only for particular circumstances, as each method has its pros and cons and won't be applicable in all cases. It's extremely important when designing a schema to consider how to store LOB data, as making the wrong choice can lead to nasty performance issues (where 'performance' is a catch-all to include things like slow queries, fragmentation, and wasted space). Now I'd like to run through each of the options and detail what I think of as the pros and cons. A couple of definitions first: 'in-row' means the column value is stored in the data or index record with the other columns; 'out-of-row' or 'off-row' means the column value is stored in a text page somewhere in the data file(s), with a physical pointer stored in the data/index record (taking either 16 or 24 bytes itself).

  • As a N/CHAR column. This is a great choice when the data that's stored in the column is a fixed size all the time, and always uses the full width of the column. Any time that the data may be smaller than the defined wdith of the column, space is being wasted in the row. Wasted space leads to fewer rows per page, more disk space being used to store the data, more I/Os to read the data, and more memory used in the buffer pool. However, if the character values are very volatile, and can change size, then having a fixed-width column can avoid the problem of a row having to expand and there not being enough space on the page to allow that - leading to a fragmentation-causing page split in an index (or forwarding record in a heap). There's a tipping point that can be hard to identify for your particular application...
  • As a N/VARCHAR (1-8000) column. For data values less than 8000 bytes, this is the common choice as it avoids wasted space. However, if the application can change the size of the data after the initial creation of the row, then there is the possibility of fragmentation occuring through page-splits. In SQL Server 2005+, a row can also be created that is more than 8060 bytes - one or more variable-length columns is pushed into off-row storage and replaced by a physical pointer. This means any access of the column has to do an extra I/O to reach the data - and this is commonly a physical I/O as the text page is not already in memory. This can lead to hard-to-diagnose performance issues if a query selects the column and some rows have the data in-row, and some out-of-row. Also, if the data values tend towards the larger end of the 1-8000 byte spectrum, individual rows can become vary large, leading to very few rows per page - and the problems described in the first option. If the data isn't used very much, then storing it in-row like this isn't very efficient.
  • As a N/VARCHAR (MAX) column in-row. This has the same pros and cons as the option above, with the added benefit that the value can grow larger than 8000 bytes. In that case it will be pushed off-row automatically, and start incurring the extra I/O for each access. These data types also work with the intrinsic functions in the same way as the character data types discussed above. I guess one drawback of this type compared to FILESTREAM is that it's limited to 2GB. Also, if there's a LOB data column in the table definition, the table's clustered index cannot have online operations peformed on it - even if all the LOB values are NULL or stored in-row!
  • As a N/VARCHAR (MAX) column out-of-row. The drawback of storing this data out-of-row is that accessing it requires an extra I/O to retrieve it, but if the data isn't used very much then this is an efficient way to go, but still uses space in-row to store the off-row pointer. An additional benefit of storing the data off-row is that it can be placed in a separate filegroup, possibly on less expensive storage (e.g. RAID 5) - but then there's the drawback that it can't be moved after being created except with an export/import operation. This option has the same online operations drawback as storing the data in-row.
  • As a N/TEXT column in-row. This has the same pros and cons as the N/VARCHAR (MAX) column in-row option, but these data types are deprecated and don't work with the majority of the intrinsic functions.
  • As a N/TEXT column out-of-row. Same as above.
  • In a seperate table and JOIN to it when required. This option is great when the data isn't used very much, as it doesn't require any storage at all in the main table (except for a value to use for the JOIN), but it does require some extra up-front design and slightly more complicated queries. There's another HUGE benefit to doing this - by moving the LOB data to another table, online operations become available on the main table's clustered index. (This concept is 'vertical partitioning' a huge topic in itself...)
  • As a FILESTREAM column. (Yes, I didn't have this in the survey, but it's a possibility). If your data values are going to be more than 1MB, then you may want to consider using the FILESTREAM data type in SQL 2008 to allow much faster access to the data than having to read it through the buffer pool before giving it to the client. There are lots of pros and cons to using FILESTREAM - see my whitepaper for more info here.

So, as you can see, the best answer for a general question like this is definitely It Depends!. Although I haven't covered every facet of each storage option, the aim of this post is to show that it is very important to consider the implications of the method you choose, as it could lead to performance problems down the line.

Next post - this week's survey!

Last week's survey was on what method you use to run consistency checks (see here for the survey). Here are the results as of 3/27/09 - again, very encouraging:

As you can see, 70% of respondents run DBCC CHECKDB on the production server, either with PHYSICAL_ONLY or without. For those running without it, be aware that using PHYSICAL_ONLY turns DBCC CHECKDB from a CPU-bound process into an I/O-bound process, and makes it run (potentially) magnitudes faster. It will still evaluate page checksums and torn-page protection, just skipping the higher-level logical checks.

I'm surprised to see so many people using a completely separate system to run consistency checks (restoring a backup and running a CHECKDB on the restored backup) - I've been a proponent of this method for a few years now, but I didn't think it had caught on so much (assuming a reasonably representative sample of readers responded to the survey). This method allows the entire consistency checking workload to be offloaded, and completely validates the backups used - but has the downside of requiring extra disk space on another server to restore the backup (I wish they'd build the system I got a patent for - to consistency check the database inside a backup without restoring it - see here).

Three of these choices I threw in to see if anyone was doing them so I could explain why they're not good methods to use.

  • "Don't run any consistency checks at all". I'm sure I don't need to labor the point with this one - you need to run consistency checks as well as having some kind of page protection turned on. If you don't proactively check for corruption, when it does occur it will likely be more widespread, and harder to recover within your data-loss and recovery time objectives than if you'd discovered it earlier. I've written lots about this in the Corruption and CHECKDB From Every Angle categories.
  • "Run DBCC CHECKDB on a database snapshot on a mirror database". I was asked about this several times while here at SQL Connections too. Database mirroring works by shipping transaction log records between the principal and mirror databases, NOT by shipping database pages - so if a page gets corrupted on the principal, the corruption will not be transferred to the mirror. This means that DBCC CHECKDB on a snapshot of the mirror does not reflect the consistency state of the principal database at all. Saying that, it is however, possible for corruption in the principal to affect the mirror. Imagine a column value is corrupted on disk by the principal's I/O subsystem. If that value is then read and used to calculate another value, which is then persisted - that calculated value is also 'corrupt' and will be reflected in the mirror database. Kind of an insidious, second-order corruption effect.
  • "Use BACKUP WITH CHECKSUM to validate page checksums, no DBCCs". Another option that sounds feasible, but in fact isn't for a couple of reasons. Firstly, BACKUP WITH CHECKSUM will stop when it finds a bad checksum, whereas DBCC CHECKDB will continue reading the rest of the database and tell you everything that's wrong with it. Secondly, BACKUP WITH CHECKSUM will only check those pages that HAVE page checkums, whereas DBCC CHECKDB will consistency check everything, regardless of whether a page has a checksum or not. For a database upgraded from an earlier version, where not every page will have a page checksum since you enabled them after upgrading, this is a critical point.

To summarize, this week's results were great, with almost 90% of respondents running some kind of consistency check. If you're not running any, for whatever reason, rethink your decision - there's always a way to run some kind of consistency checks and give yourself more peace of mind. See CHECKDB From Every Angle: Consistency Checking Options for a VLDB for more info.

Next post - this week's survey!

Jack Li, one of the Senior Escalation Engineers in Product Support, just posted details of an interesting case over on the CSS blog - his article is here. It talks about index builds and rebuilds, but the issue is the same for both, so I'll just talk about rebuilds.

The jist of the problem is that index rebuilds can parallelize, but sometimes they don't parallelize vey well. Each thread gets a certain range of the index to rebuild, using the existing index statistics to divide the ranges equally between the threads. If there's massive data skew, then one thread can end up doing the majority of the work, leading to a long run-time. The case in Jack's post involved a 250 million row index where 150 million rows had the same (NULL) key value. This range has to be processed by a single thread - a single value can't be divided between two+ threads.

Now, this is understandable behavior by the database engine, but it relies on the statistics being up-to-date. That's a bit of a catch-22 - rebuilding an index updates the statistics, but if the statistics aren't up-to-date then the index rebuild might parallelize badly! I guess the solution is that if you know that you have massive data skew in your large indexes, update statistics BEFORE doing an index rebuild. And given what I've been hearing this week at SQL Connections about how badly statistics keep biting people, I'm leaning towards a different recommendation for those people who have lots of perf trouble caused by statistics and the potential for skewed data - rebuild all your statistics regularly, and only rebuild/reorganize fragmented indexes. Statistics just cause so many problems it seems.

Thanks

PS Kimberly has a lot more info about statistics over on her blog - I'm just starting to venture into that mine-field

While I was teaching the MCM-Database class last week, we were discussing fragmentation and the effect of a high-order GUID key on an index. Without going into too many details, having a random GUID - as generated from the NEWID() - function is bad, but having one generated by NEWSEQUENTIALID() isn't anyway near so bad (I'll discuss the details more in the fragmentation series I'm starting). As part of the demo, we wanted to change the column default for the leading key of a table from NEWID() to NEWSEQUENTIALID() - problem was that none of us could remember the exact syntax, so we worked it out together. I thought it would make an interesting post, so here it is.

First off, here's my table with a poor clustered index key:

CREATE TABLE BadKeyTable (
    c1 UNIQUEIDENTIFIER DEFAULT NEWID () ROWGUIDCOL,
    c2 SMALLDATETIME DEFAULT GETDATE (),
    c3 CHAR (400) DEFAULT 'a',
    c4 VARCHAR(MAX) DEFAULT 'b');
GO
CREATE CLUSTERED INDEX BadKeyTable_CL ON BadKeyTable (c1);
GO

INSERT INTO BadKeyTable DEFAULT VALUES;
GO

(And you'll notice that I've given up doing nice colors in the T-SQL I post - it's too time consuming). The default we're interested in is in bold above. To change the default, we first need to find the constraint name so we can drop it. There are two queries you can use:

SELECT [name] FROM sys.objects
WHERE [parent_object_id] = OBJECT_ID ('BadKeyTable');
GO

or

SELECT [name] FROM sys.default_constraints
WHERE [parent_object_id] = OBJECT_ID ('BadKeyTable');
GO

The second is obviously the more supported way, as the first will return the names of all sub-objects of this table - all constraints, and all internal tables, such as XML indexes. The second query returns:

name
-------------------------------
DF__BadKeyTable__c1__7C8480AE
DF__BadKeyTable__c2__7D78A4E7
DF__BadKeyTable__c3__7E6CC920
DF__BadKeyTable__c4__7F60ED59

The constraint we're interested in is the one for the first column - DF__BadKeyTable__c1__7C8480AE. Now we need to drop the constraint and then add the new one as there's no way to simply alter the constraint in-place. We do that using:

ALTER TABLE BadKeyTable DROP CONSTRAINT DF__BadKeyTable__c1__7C8480AE;
GO

ALTER TABLE BadKeyTable ADD CONSTRAINT DF__BadKeyTable__c1
DEFAULT NEWSEQUENTIALID() FOR c1;
GO

And we're done.

In this week's survey, I'd like to know *how* you run consistency checks, not how often. I'll report on the results next week (around 3/27/09)

Thanks!

Last week’s survey was on what kind of regular index maintenance you perform (see here for the survey) as a way of kicking off a new series I’m writing around index maintenance. Here are the results as of 3/21/09 – I find them very encouraging:

As you can see, about 2/5 of respondents are performing some form of analysis-based fragmentation removal (answers 5+6), which I consider the best way to perform index maintenance, if you’re willing to invest the time involved to set it up. It allows the least amount of work to be performed, for the most targeted performance gains – and so is especially appropriate for 24x7 systems where there’s a minimal or non-existent maintenance window.

The next best option is to do all rebuilds or all defrags based on a fragmentation threshold (answers 3+4), which about 1/5 of respondents do. This also allows work to be limited, but by choosing only a single method of removing fragmentation, there are pros and cons. Now, the survey was limited to a number of questions so I couldn’t explore what the threshold is that people are using (e.g. logical fragmentation, page density, extent fragmentation, or something else). Some measures are good to use and some not so good, and I’ll be exploring the various counters and ways of determining fragmentation as the series progresses.

Either of the options to operate on all indexes regardless of fragmentation (options 1+2) can lead to lots of wasted resources (disk space, transaction log space, I/Os, CPU) by operating on indexes that are not fragmented in the first place, or for which fragmentation removal has no benefit for workload performance. About 1/3 of respondents do this. This isn’t surprising to me as rebuild-all-the-indexes-every-night/week is a very common index maintenance plan for “involuntary DBAs” who know that index maintenance is important, but don’t have the knowledge or training to implement a more sophisticated maintenance plan. This growing size of this group of people is one of the main reasons I’m going to write this series.

Doing absolutely nothing for index maintenance, which about 1/10 do, is usually not a good idea, as indexes in a database that’s not read-only commonly become fragmented over time. However, these people may know they don’t suffer from fragmentation issues, or that removing fragmentation has no effect on workload performance. However, my suspicion (based on what I see in the field) is that some of these respondents don’t realize the benefits of performing index maintenance.

You may be surprised to hear that I don’t consider doing nothing to be the worst choice. That dubious honor goes to answer 7 – doing any kind of index maintenance followed by a database shrink operation – as 3 respondents do. A post-maintenance shrink operation may well undo some of the benefits of the maintenance by introducing massive amounts of index fragmentation – see my blog post Auto-shrink - turn it OFF! for details of how bad this can get.

I’m actually pretty encouraged by these results though. Compared to how things were around 1999 when I wrote DBCC INDEXDEFRAG and DBCC SHOWCONTIG for SQL Server 2000, these results show that knowledge in the field (or at least in the group that reads my blog and responded) has vastly improved. My aim for the forthcoming series about index fragmentation and maintenance is to increase knowledge a lot more broadly.

Next up - this week's survey. Thanks for reading!

I'm about to start a new series of post about index fragmentation and removing it. For this week's survey, I'd like to know what index maintenance you do to address fragmentation (in any of its forms) - I'll report on it in a week.

Thanks

Last week I kicked off the first weekly survey - on whether you validate your backups or not (see here for the survey). The results are very interesting (as of 3/13/09):

As you can see, almost 25% of respondents never validate their backups! And a further 25% only validate them occasionally, with 30% doing some kind of regular checks, and only a handful checking all the time.

While these results may seem shocking to you, based on what I've heard when teaching, they're pretty normal. There are lots of reasons why DBAs may choose not to validate backups as often as they should, including:

  • Not enough time to restore the backups to check them
  • Not enough disk space
  • Not part of the day-to-day operations guide
  • Don't see why it's important

Kimberly and I have a saying (well, to be fair, Kimberly coined it): you don't have a backup until you've restored it. You don't know whether the backup you just took was corrupt or not and will actually work in a disaster recovery situation.

Can you ever get a guarantee? No. Here's an analogy, taken from a very old post of mine. Consider Paul, who works for the Seattle Police Department in traffic control. Paul's in a control room somewhere in the city with a large bank of monitors connected to various traffic cameras. Paul's job is to cycle through the cameras every 1/2 hour, looking for traffic accidents. At the end of the 1/2 hour cycle, if Paul han;t seen any accidents then he knows that there are no accidents in the city.

Ah - but hold on. Does Paul really know that? No. All Paul knows is that at the point he looked at a particular camera, there was not an accident at that spot in the city. The very instant he switches to another camera feed, an accident could happen at a spot covered by the previous camera.

The same is true for validating backups. As soon as you've validated a backup, it could then be corrupted by the I/O subsystem, but at least you know that it was valid at some point. But what if that happens, I hear you ask? Well, then you need to have multiple copies of your backups, and you should not rely on backups as the only method of disaster recovery. A good high-availability solution includes as many technologies as you need to mitigate all risks - and backups are just one of those technologies. You're going to have to have some kind of redundant system too that you can fall back on (or mayb even immediately failover to, depending on your particular disaster recovery plan). But, saying that, you can't rely on the redundant server either - if it goes wrong, you'll need your backups.

So - whichever way you look at it, validating backups is a really good practice to get into so you don't get bitten when it comes to the crunch. When I teach, I've got many stories of customers losing data, business, time, and money (and DBAs losing their jobs) because the backups didn't work or were destroyed along with the data. Here's one for you (simplified, and no I won't divulge names etc). Major US investment firm decides to provision new hardware, so takes a backup of the database storing all the 401k accounts for all their customers (private and corporate), flattens the hardware, and goes to restore the backup. The backup is corrupt - on SQL 2000, where there's no RESTORE ... WITH CONTINUE_AFTER_ERROR. What happened? Well, the SQL team and Product Support had to get involved to help get the data back, but people in the firm lost their jobs and it cost a lot of time and money to recover the data. If only they'd had multiple copies of the backup, and tested their backup before removing the database (or better yet, restored the database on the new hardware before flattening the old hardware). They learned a costly lesson, but they did change their practices after that.

Unfortunately this is so often the way - people don't realize they need to validate backups or have an HA plan UNTIL they have a disaster. Then suddenly its the top priority at the company. Being proactive can save a lot of grief, and make you look good when disaster strikes.

Backups can be unusable for a number of reasons, including:

  • The full backup is corrupt, because something in the I/O subsystem corrupted it.
  • A backup in the log backup chain is corrupt, meaning restore cannot continue past that point in the chain.
  • All backups following a full backup are written to the same backup set, but the WITH INIT clause is used accidentally on all backups, meaning the only backup present in the backup set is the last one taken.
  • An out-of-band backup was taken without using the WITH COPY_ONLY clause and the log backup chain was broken (see BACKUP WITH COPY_ONLY - how to avoid breaking the backup chain).
  • The backups worked but the database contained corruption before it was backed up (kind of a separate issue).

The only ones that are out of your control are the first two, but they can be mitigated by having multiples copies of backups. All of these though, can be avoided at disaster recovery time by reguarly restoring your backups as a test of what you'd do if there was a real disaster. You might be surprised what you'd find out...

This is more of an editorial style post than a deep technical or example script post - I'm going to start doing more of these around the weekly surveys. Next post - this week's survey.

Thanks!

Ola Hallengren, who we meet every so often at SQL Connections, has a great script that helps automate consistency checking, backups, and index maintenance. He's constantly updating it and it's good quality code. Check out the latest version for  SQL Server 2005 and 2008:

Enjoy! 

The April edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Disappearing errors with DBCC CHECKDB
  • Provisioning tempdb when moving from 2000 to 2008
  • Does fillfactor prevent fragmentation and should it be set instance-wide
  • Avoiding FILESTREAM performance problems

Check it out at http://technet.microsoft.com/en-us/magazine/2009.04.sqlqa.aspx

As you may know, DBCC CHECKDB (and some of the other DBCC CHECK*) commands use an internal database snapshot to get a transactionally consistent view of the database (if you didn't, see my blog post CHECKDB From Every Angle: Complete description of all CHECKDB stages for an explanation). It's entirely possible that the database snapshot may grow and run out of space, causing the DBCC command to fail, when the database has a lot of updates running concurrently with the DBCC command (see the blog post Database snapshots - when things go wrong for some examples of this happening). There have also been some nasty bugs in SQL Server and in Windows that can cause the database snapshot to fail, and possibly cause SQL Server to hang when DBCC CHECKDB is running multi-threaded.

These bugs have been fixed and are available in a variety of ways. Suresh Kandoth, an old friend and Senior Escalation Engineer in PSS, has published a nice post that summarizes the various bugs, along with associated KB articles and release vehicle details. Check it out at Sparse File Errors: 1450 or 665 due to file fragmentation: Fixes and Workarounds.

This is an interesting case that cropped up yesterday - the transaction log is damaged so a log backup doesn't work (with the error below):

Backup detected log corruption in database FakeDBName. Context is FirstSector. LogFile: 2 'F:\SQLLOGS\XYZ\FakeDBName_Log.ldf' VLF SeqNo: x502e VLFBase: x2ce40000 LogBlockOffset: x2d0a9000 SectorStatus: 2 LogBlock.StartLsn.SeqNo: x4ee3 LogBlock.StartLsn.Blk: x1348 Size: x200 PrevSize: x400
2009-03-06 10:00:02.61 Backup      Error: 3041, Severity: 16, State: 1.
2009-03-06 10:00:02.61 Backup      BACKUP failed to complete the command BACKUP LOG FakeDBName Check the backup application log for detailed messages.
2009-03-06 10:00:03.61 Backup      Error: 3041, Severity: 16, State: 1.

However a full backup succeeds, as does a DBCC CHECKDB. What's going on?

The answer comes with understanding what portions of the transaction log are required for these operations. (For a good understanding of the transaction log itself, along with logging and recovery, see my article in the February TechNet Magazine.)

A transaction log backup, by its very nature, has to backup *all* transaction log generated since the last log backup - so it will try to backup the corrupt portion and fail.

A full database backup only has to backup enough transaction log to allow the database to be restored and recovered to a transactionally consistent point. In other words, it only requires the transaction log back to the beginning of the oldest active transaction at the point that the data-reading section of the full backup completes. This is a source of immense confusion - many people don't believe that a full (or differential) backup needs to also backup some transaction log. For a more in-depth study of this, see my previous blog posts Debunking a couple of myths around full database backups and More on how much transaction log a full backup includes.

A DBCC CHECKDB operation uses a database snapshot to get a transactionally consistent view of the database on which to run consistency checks. When the database snapshot is created, crash recovery is run on it to make it transactionally consistent. That requires the same amount of log as if a full backup was taken - back to the beginning of the oldest active transaction at the time the database snapshot is created. See CHECKDB From Every Angle: Complete description of all CHECKDB stages for more info.

So - it's entirely possible for the situation reported above to exist. The question then becomes, how to recover from it?

Assuming that the database files are intact, there is a simple solution. This solution will break the log backup chain, but given that the log is corrupt so a log backup cannot be taken, the log backup chain is *already* broken. Here's what to do:

  1. Stop all user activity in the database
  2. Switch to the SIMPLE recovery model (breaking the log backup chain and removing the requirement that the damaged portion of log must be backed up)
  3. Switch to the FULL recovery model
  4. Take a full database backup (thus starting a new log backup chain)
  5. Start taking log backups

You might want to manually shrink and grow the log file in between steps 2 and 3 too - in case the log file is on a damaged portion of disk - or maybe even shrink it right down and add another log file on an undamaged disk. You also will need to do some root-cause analysis to determine why the corruption occured in the first place, and to take preventative measures to stop it happening again.

Hope this helps

PS In my previous post, Testing a new survey method: backup validation, the answer with the largest number of responses so far is that people never verify their backups - very disturbing!

I'm toying with the idea of having a weekly survey that'll highlight an interesting facet of database management. I've signed up with SurveyPopups.com, which is free and allows you to see the results as you vote on them.

If you think this is a cool idea, vote in the survey using the options below. If I get more than 100 people voting then I'll start doing one weekly or so and using the results to seed a blog post.

Thanks

We're doing some research into database settings that people use and the best way to get a good variety of systems is to ask you to send us some data. We're interested in whether people are using the defaults or changing them in any way. If your're interested, run this query (in the context of any database, doesn't matter which) with text output:

SELECT d.*, mf.* FROM sys.databases d
CROSS APPLY sys.master_files mf
WHERE mf.database_id = d.database_id;

And send me the results in email by clicking here: mailto:paul@sqlskills.com?Subject=Feedback. Dumping into a .xls works and my email won't filter it out.

I'll aggregate the results and post some articles. As payment for your info, if I see anything in your settings that I recommend changing, I'll let you know.

Thanks in advance for any data you send!

[Edit: PS - of course everything will be totally anonymous and only aggregated data will be discussed on the blog - by all means remove database names if you want]

Back in 2005 Kimberly produced two very popular webcast series - an 11-part webcast series for TechNet called SQL Server 2005 for the IT Professional and a 10-part webcast series for MSDN called A Primer for Proper SQL Server Development. The webcast links and blog posts were broken for quite a while but now they're all fixed up and working again. I've created some web pages that link to all the webcasts and blog posts, along with abstracts. I've also included some more recent ones too and will be adding to the list over the next few weeks.

There's over 30 hours of good stuff to watch - check them out at http://www.sqlskills.com/webcasts.asp

In my previous post today I talked about error 825 and how you should have an Agent alert set up to catch it when it happens. There are two ways you can set up Agent alerts - using SSMS or using the stored-proc sp_add_alert. In a post from April 2008 I showed how to create an Operator and alerts using the SSMS wizards (see that post here). To use the SP to create the alert for message 825, the code would be as follows:

USE msdb;
GO

EXEC msdb.dbo.sp_add_alert @name = N'825 - Read-Retry Required',
    @message_id = 825,
    @severity = 0,
    @enabled = 1,
    @delay_between_responses = 0,
    @include_event_description_in = 1,
    @category_name = N'IO Subsystem Error';
GO

Note the @include_event_description parameter. This is a tinyint that says which of the notification mechanisms should include the description. The list of possibilities is:

  • 0 - None
  • 1 - Email
  • 2 - Pager
  • 4 - Net send

So my value of 1 only includes the description in email, even if I've set up the Operator to receive email and a net send. You can get more info about this SP from the following link: http://msdn.microsoft.com/en-us/library/ms189531.aspx.

As to which alerts you should have defined, my advice would be to have one for message 825 and a separate alert for each severity 19, 20, 21, 22, 23, 24, and 25. These are the high-severity errors and are also written to the error log. You can get a complete description of all error severities at http://msdn.microsoft.com/en-us/library/ms164086.aspx. Note that if you specify @severity, the @message_id must be 0 (and vice-versa).

Happy alerting!

PS And of course you'll need to use the SP sp_add_notification (see http://technet.microsoft.com/en-us/library/ms173843.aspx) to hook the alert to an Operator. Thanks Tibor (who just blogged about how to determine which errors to alert on) - got distracted by email while writing this and forgot to include it!

There are two pretty well-known I/O errors - 823, and 824 - but there's also one called 825 which most DBAs do*not* know about, and definitely should.

From SQL Server 2005 onwards, if you ever see an 823 or 824, SQL Server has actually tried that I/O a total of 4 times before it finally declares a lost cause and surfaces the high-severity I/O error to the connection's console, killing the connection into the bargain. The idea behind this read-retry logic came from Exchange, where adding the logic reduced the amount of immediate downtime that customers experienced. While in concept this was something I agreed with at the time, I didn't agree with the way it was implemented.

If the I/O continues to fail, then the 823/824 is surfaced - that's fine. But what if the I/O succeeds on one of the retries? No high-severity error is raised, and the query completes, blissfully unaware that anything untoward happened. However, something *did* go badly wrong - the I/O subsystem failed to read 8KB of data correctly until the read was attempted again. Basically, the I/O subsystem had a problem, which luckily wasn't fatal *this time*. And that's what I don't like - the I/O subsystem went wrong but there are no flashing lights and alarm bells that fire for the DBA, as with an 823 or 824. If read-retry is required to get a read to complete, the only notification of this is a severity-10 informational message in the error log - error 825. It looks like this:

Msg 825, Level 10, State 2, Line 1.
A read of the file ‘D:\SQLskills\TestReadRetry.mdf’ at offset 0×0000017653C000 succeeded after failing 2 time(s) with error: incorrect checksum (expected: 0×4a224f20; actual: 0×2216ee12). Additional messages in the SQL Server error log and system event log may provide more detail. This error condition threatens database integrity and must be corrected. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Now, what this message is really saying is that your I/O subsystem is going wrong and you must do something about it. And unless you're regularly scanning the error log looking for these, you'll be none-the-wiser.

So - my recommendation is that you add a specific Agent alert for error 825, along with your other alerts (see following blog post).

In SQL Server 2000 and before, the symptoms of database corruption would occasionally manifest themselves as asserts, such as:

SQL Server Assertion: File: <recbase.cpp>, line=1378 Failed Assertion = 'm_offBeginVar < m_sizeRec'.

To reduce the number of assertions being fired by the SQL Engine, my team changed these asserts into real error messages for SQL Server 2005 onwards - either 5242 or 5243 - such as:

Msg 5242, Level 22, State 1, Line 1

An inconsistency was detected during an internal operation in database 'MyCorruptDatabase'(ID:12) on page (1:34923). Please contact technical support. Reference number 4.

Error 5243 is exactly the same except that the database couldn't be determined for some reason. Both of the errors above say that the offset of the variable length column offset array is beyond the end of the record.

I've noticed increasing confusion from these errors being reported - sometimes the worry is that maintenance jobs are causing them, or DBCC CHECKDB is causing them. This may appear to be so but is really just an artifact of the fact that DBCC CHECKDB reads all allocated pages in the database and may read a corrupt page that your normal queries just don't happen to cause to be read. The messages may disappear because other maintenance jobs cause indexes to be rebuilt and so corrupt pages may be deallocated from the database - meaning they won't be read by DBCC CHECKDB.

If you have either torn-page detection or page checksums enabled, you may not ever see these 5242 or 5243 errors as you'll likely see an 824 error instead. The 824 error is raised by the buffer pool when the page is read - before the page can be processed by the record-cracking code that would raise the 5242 or 5243 errors. Bottom line is that 5242 and 5243 says you've got corruption in the structure of a record - the reference numbers at the end of error say exactly what kind of corruption there is - for instance that the record size is invalid or the record type is wrong for the type of page it resides. If you see these errors, you need to go through the motions of recovering from corruption and figuring out what's wrong with your I/O subsystem.

Hope this helps.

One of the things I mentioned in my recent TechNet Magazine article on Understanding Logging and Recovery was the need to manage the transaction log so the number of VLFs (virtual log files) does not get too large (with too large being more than, say, 100 VLFs). Linchi Shea (a fellow MVP) has just posted some performance numbers comparing insert/update/delete performance between a database with 16 VLFs and one with 20000 VLFs, representing a poorly managed transaction log. You can see his blog post here.

For those wishing to take charge of unruly transaction logs, checkout Kimberly's blog posts - starting with this one. And if you want to quickly know how many VLFs your transaction log has, use the undocumented DBCC LOGINFO command - the number of lines of output is the number of VLFs you have.

Next posts coming up will be photos - we're in Bangkok right now and I've got 3 Where In The World Are Paul and Kimberly posts queued up - St. Lucia, Hyderabad, and Bangkok.

Wow - today is all about new content. As if I haven't already blogged about enough stuff to keep you reading through next week, the February issue of TechNet Magazine is now available and contains a feature article I wrote about understanding how logging and recovery work inside SQL Server.

The article covers:

  • What is logging?
  • What is recovery?
  • The transaction log (include logical and physical architecture)
  • Recovery models and how they affect the behavior of the transaction log

There's also a ten-minute screencast video where I demonstrate a runaway transaction log.

Check it out at http://technet.microsoft.com/en-us/magazine/2009.02.logging.aspx.

Over the last few weeks Sunil Agarwal (from the SQL Storage Engine team) has posted a great series of blog articles about tempdb and the version store, over on my old stomping ground - the Storage Engine blog. The articles are well worth reading - the links are:

Enjoy!

Well, we're back from vacation finally (only for 10 days and then off to India and Thailand for 3 weeks... phew) and I've got a bunch of blogs posts to catch up on. First up - I did an interview with TechNet Radio in mid-December where I talked about database corruption and things to do to recover from it - similar to the conference sessions I've done but a little higher level.

The links for the interview are:

Enjoy!

This blog post describes the demo "2 - NC Indexes" from my Corruption Survival Techniques conference session from various conferences in 2008. The links to the scripts and databases to use are in this blog post.

The aim of this demo is to show that sometimes its just redundant data (i.e. nonclustered indexes) that get corrupted, and so you don't have to do anything that takes the actual data offline - like restoring from a full backup or running one of the repair options (both of which require the database to be in single-user mode).

Let's look at an example. Extract and restore the DemoNCIndex database, and the NCIndexCorruption.sql script. What do we get from running DBCC CHECKDB on the DemoNCIndex database (lines 39-42)?

DBCC CHECKDB (DemoNCIndex) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

Msg 8951, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Data row does not have a matching index row in the index 'CustomerName' (ID 2). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:45:28) identified by (CustomerID = 29) with index values 'LastName = 'Adams' and CustomerID = 29'.
Msg 8951, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Data row does not have a matching index row in the index 'CustomerName' (ID 2). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:180:164) identified by (CustomerID = 2118) with index values 'LastName = 'Adams' and CustomerID = 2118'.

<snip - removed for brevity>

Msg 8952, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Index row in index 'CustomerName' (ID 2) does not match any data row. Possible extra or invalid keys for:
Msg 8956, Level 16, State 1, Line 1
Index row (1:24482:16) with values (LastName = 'Andersen' and CustomerID = 18718) pointing to the data row identified by (CustomerID = 18718).
Msg 8952, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Index row in index 'CustomerName' (ID 2) does not match any data row. Possible extra or invalid keys for:
Msg 8956, Level 16, State 1, Line 1
Index row (1:24482:127) with values (LastName = 'Arthur' and CustomerID = 9758) pointing to the data row identified by (CustomerID = 9758).
CHECKDB found 0 allocation errors and 26 consistency errors in table 'Customers' (object ID 453576654).
CHECKDB found 0 allocation errors and 26 consistency errors in database 'DemoNCIndex'
repair_rebuild is the minimum repair level for the errors found by DBCC CHECKDB (DemoNCIndex).
 

Lots of errors. Now, in this example there are only 26 errors, but in cases where there are hundreds of errors it can be really hard to tell whether all the corruptions are in nonclustered indexes (i.e. indexes with IDs > 1). Luckily, there's an undocumented option to all the DBCC CHECK* commands - WITH TABLERESULTS. The option is undocumented because the output can change from release to release, but basically this converts the DBCC output into tabular form. Try running lines 48-50 in the script and you'll see what I mean. One of the columns in the output is IndexId - so you can easily see whether all the errors are in nonclustered indexes. In this case, they are, and all in one index of the Customers table.

You could run lines 55-57 of the script to prove to yourself that repairs can't be run online, and then realize that we can address the problem without having to run repair or restore. First off we need to figure out the name of the index to fix - index ID 2 of the Customers table. Lines 77-80 run sp_HelpIndex on the table (although I should really be using Kimberly's sp_HelpIndex2):

USE DemoNCIndex
GO
EXEC sp_HelpIndex 'Customers';
GO

index_name     index_description                                   index_keys
-------------- --------------------------------------------------- ------------
CustomerName   nonclustered located on PRIMARY                     LastName
CustomerPK     clustered, unique, primary key located on PRIMARY   CustomerID

The nonclustered index is called CustomerName. Plug in the index name to line 82, then try fixing the index by doing an online index rebuild, and run DBCC CHECKDB afterwards (lines 82-89). The corruption hasn't been fixed! Online index rebuild reads the old index to build the new one so the new index has the same missing rows as the old one. We need to do an offline index rebuild - with lines 110-115. After the last DBCC CHECKDB, the index is fixed up. Now, on SQL Server 2008, you may or may not get a query plan for the index rebuild that doesn't use the old index, because the query optimizer has some more plan choices available to it - so on SQL Server 2008 you may need to do an actual drop and create of the broken index (carefully, if its enforcing a constraint).

So - just because DBCC CHECKDB reports a ton of errors, that doesn't necessarily mean that the database needs to be taken (essentially) offline to repair it - check through the errors to see if it's just nonclustered indexes that are affected.

For those of you who couldn't make it to a conference this year where I presented my Corruption Survival Techniques session, the folks at TechEd EMEA have just posted an 80 minute long video of the presentation I did in Barcelona in early November. It walks through I/O errors, what CHECKDB does, how it works, how to run it, CHECKDB FAQ, how to interpret the output, choosing between repair and restore and has a bunch of demos of recovering from corruptions. Lots of fun stuff!

The video is available at http://www.microsoft.com/emea/teched2008/itpro/tv/default.aspx?vid=78. The accompanying scripts and corrupt databases are all posted on our website - see this blog post for details.

Enjoy!

Here's a question I got from someone who attended our database maintenance workshop at PASS last week (paraphrased):

I attended your pre-conference session on database maintenance and found it to be very informative.  From what you told use though, I think I need to change my nightly backup procedure.  I like to get my databases back to as small of a size as possible before backing them up, so I run the following commands to do this before taking the full database backup.  Could you help me with a better way of doing this? We're on SQL Server 2005.

BACKUP LOG <mydbname> WITH NO_LOG

DBCC SHRINKDATABASE (<mydbname>)

And here's the answer I sent back:

How large is the database? And how long must you keep the backups around? If the cumulative size of the backups takes up a large proportion of your available storage space (and we're talking more than just a single direct-attached 100+GB drive), then it may be worth compressing the backups - otherwise you're likely causing yourself more trouble than its worth.
 
By doing BACKUP LOG WITH NO_LOG you're effectively throwing away log records and removing the possibility of doing any kind of point-in-time, or up-to-the-second recovery (see BACKUP LOG WITH NO_LOG - use, abuse, and undocumented trace flags to stop it). If you're running in the FULL recovery model, and you don't care about either of these features, then you should switch to the SIMPLE recovery model. If you really want to be in FULL, don't ever use WITH NO_LOG.

The amount of transaction log that a full backup requires cannot be changed by you truncating the log. The full backup will backup any log it requires to enable the restored database to be a transctionally consistent copy of the database. See Debunking a couple of myths around full database backups and More on how much transaction log a full backup includes.

Doing a DBCC SHRINKDATABASE (the same exact operation as a database auto-shrink) will cause massive index fragmentation, and cause file-system fragmentation of the data files, as they will likely need to grow again after you've squeezed all the space out of them. See Auto-shrink - turn it OFF! for more details on the effects.

If you're really concerned about backup sizes and space is at a premium, I recommend using a 3rd-party backup compression tool such as LiteSpeed or HyperBac so you're not affecting the actual database. Remember also that SQL Server 2008 has native backup compression too - see my blog post here for more details.

Hope this helps

In the previous blog post I mentioned our partner company in Australia, so I'd better explain...

We have a new partner company - SQLskills.com.au - run by our good friend and fellow MVP Greg Linwood, along-side his other company MyDBA, which provides DBA support, consulting and staffing services to customers world-wide. By extending SQLskills.com into Australia, we can provide world-class training to the burgeoning SQL Server base in and around Australia, without customers having to travel to the US. As well as providing their own custom courses, the SQLskills.com.au team will be teaching courses developed by me, Kimberly, and Bob Beauchemin - using only the best instructors that we've personally taught and approved.

Update 12/10/08: the Australia classes have been pushed out to June 2009 for various reasons - watch the blog for more details.

The courses have already been running the last two months to great success and I'm very excited to announce that Kimberly and I are coming to Australia in February 2009 to teach four classes in the SQLskills.com.au training facility in Melbourne. The classes we have planned are:

Click the links for more details and registration info. The internals course (or equivalent knowledge) is really a pre-requisite for the perf tuning and maintenance courses that immediately follow it, as these courses will be really in-depth.

Checkout the SQLskills.com.au site for other classes they offer for developers and BI specialists.

We hope to see you in Melbourne next year!

This blog post explains the demo scripts and databases I've posted to cover all the Corruption Survival Techniques and DBCC CHECKDB sessions I've presented at conferences this year. There are two zip files you need to download: the example corrupt databases (36-MB zip) and the demo scripts. These are also both available through our past events pages.

The databases zip contains the following databases:

  • DemoDataPurity
    • 192-MB SQL Server 2005 database with a single 2570 (data purity) error
  • DemoFatalCorruption1
    • 1-MB SQL Server 2005 database with a corrupt system table (that allows CHECKDB to complete)
  • DemoFatalCorruption2
    • 1-MB SQL Server 2005 database with a corrupt system table (that terminates CHECKDB)
  • DemoNCIndex
    • 192-MB SQL Server 2005 database with a bunch of nonclustered index corruptions
  • DemoRestoreOrRepair
    • 1-MB SQL Server 2005 database with a page checksum failure (in fact a zero'd out page)
  • DemoCorruptMetadata
    • 1-MB SQL Server *2000* database with corrupt syscolumns table

The scripts zip contains the following directories:

  • 1 - Fatal Errors
    • This makes use of the DemoFatalCorruption1 and DemoFatalCorruption2 databases. The FatalErrors.sql script has the steps to follow and see this blog post for a complete walk-through.
  • 2 - NC Indexes
    • This makes use of the DemoNCIndex database. The NCIndexCorruption.sql script has the steps to follow and see this blog post for a complete walk-through.
  • 3 - Data Purity
    • This makes use of the DemoDataPurity database. The DataPurityCorruption.sql script has the steps to follow. I'll blog more details on this script sometime in the next week (and then update this post).
  • 4 - Metadata
    • This makes use of the DemoCorruptMetadata database. The CorruptMetadata.sql script has the steps to follow and see this blog post for a complete walk-through.
  • 5 - Restore or Repair
    • This makes use of the DemoRestoreOrRepair database. There's a setup script (01CreateRestoreOrRepair.sql) and two demo scripts - FixUsingPageRestore.sql and FixUsingRepair.sql. I'll blog more details on these scripts sometime in the next week (and then update this post).
  • 6 - Suspect Database
    • This demo doesn't come with a corrupt database - you create your own one. The SuspectDatabase.sql script has the steps to follow and see this blog post for a complete walk-through.

Enjoy!

It's been almost two weeks since my last post as we've been offline in Scotland (another photo post to follow - I owe you two now...) but now we're back for the crazy Fall conference and teaching season.

The latest installment of my regular Q&A column in TechNet Magazine is available at http://technet.microsoft.com/en-us/magazine/cc895648.aspx. This month I cover the following topics:

  • How backups and restores work and why the times for each may differ
  • The difference between log shipping and database mirroring around BULK_LOGGED operations
  • How other factors apart from log backups can contribute to excessively large transaction logs
  • Why database repair exists and why it shouldn't be used
  • Tracking index usage with sys.dm_db_index_usage_stats

Enjoy!

I got bored on the first leg of the journey from Seattle to London so thought I’d bang out a quick blog post.

After my previous post on checking whether a SQL instance is able to use instant initialization (see here), I had a discussion with Scott R., who regularly comments on blog articles. He proposed an alternative method of checking whether the SQL service account has the Perform volume maintenance tasks privilege (AKA SeManageVolumePrivilege) (or other useful privileges like Lock pages in memorySeLockMemoryPrivilege) and outlined the pros and cons of the various methods. With his permission, I’ve turned the information he provided into this blog post.

To use whoami /priv to find the SQL service account privileges you need to enable xp_cmdshell to do it from within SQL, or be logged into the Windows box as the SQL service account. Scott suggested using an alternative tool called AccessChk, written by my friend Mark Russinovich (formerly of Sysinternals, now a Technical Fellow at Microsoft). Using it you can find the privileges assigned to other users, services, or processes. The example syntax to do this for a SQL instance would be:

AccessChk –p sqlservr.exe –f –q -v

This allows you to find the privileges of the SQL service account without having to enable xp_cmdshell or login as the service account itself.

Let’s compare the whoami + xp_cmdshell combination with the AccessChk + command window combination.

Whoami + xp_cmdshell

Advantages:

  • You don’t need access to a command window.
  • You don’t need an administrator-capable user account.
  • Whoami may already be installed on a given Windosw server.

Disadvantages:

  • You need to enable xp_cmdshell (for a brief period – and have the authority to do so) to run whoami, which is often a separate security issue in many IT organizations.
  • Whoami can’t be run from a separate command window, because it can’t get the privileges from any user account other than the one which is running the command.  Since the goal is to get the privileges of the SQL service account, and that service account is often locked down from unnecessary privileges (such as interactive logon, using command windows, etc.), whoami can’t easily be used to derive this information outside the context of running from SQL Server via xp_cmdshell.
  • Whoami may not already be installed on a given Windows server.  This could be a barrier to using it for this purpose in IT organizations with rigid program install guidelines.

AccessChk + command window

Advantages:

  • You don’t need to enable xp_cmdshell, bypassing this separate security issue.
  • You don’t need to use the SQL service account as AccessChk can report on privileges of other running processes (and their implied user account) or other explicit user accounts, without requiring the security context of the desired user account to run AccessChk.
  • This approach may be used to automate capture of such information for configuration management and reporting purposes.

Disadvantages:

  • You need access to a command window on the Windows server.
  • You may need an administrator-capable user account (which you may not have) – I am not sure of this requirement, but it may be the case.
  • AccessChk is most likely not already installed on a given Windows server.  This could be a barrier to using it for this purpose in IT organizations with rigid program install guidelines, just like for whoami.

Summary

So is there a recommendation here? No, just a quick examination of the trade-offs with the two methods. Once again, I recommend you grant this privilege to the SQL service account for the massive performance boost it gives with data file creation or growth operations.

On Tuesday I recorded a couple of sessions with Richard and Greg on RunAs Radio. It's been a while since either Kimberly or I have been on the show - last November at TechEd in Barcelona. In the first session we discussed what it is to be an "involuntary DBA" - someone who's thrown into the DBA role with no training. This goes along nicely with the TechNet Magazine article I wrote for the August 2008 issue on Effective Database Maintenance (for the involuntary DBA). Kimberly also pops in from time to time when she's not coughing, we make fun of her, and I describe one of my favorite Calvin and Hobbes cartoons around collecting spit in a jar - all the usual silliness.

The show is 35 minutes long and you can download it at http://www.runasradio.com/default.aspx?showNum=72.

Enjoy!

Now this one's sure to spark some controversy...

I was checking my RSS feeds of the SQL blogs that I follow and noticed an interesting post on Kalen's blog (the post is here). She'd been lambasted for suggesting in the weekly SQL Server Magazine newsletter that *Microsoft* shouldn't provide a log reader tool and asked for comments. It's worth reading her original newsletter too, as it provides some interesting history of the development of log readers.

Donning my flame-proof suit, I wholeheartedly agree. I don't think Microsoft should spend engineering resources on a log reader tool that tries to show the SQL that generated the log records. So why shouldn't Microsoft build one? Well, IMHO, in a perfect world with no mistakes and perfect applications, there really isn't a need for a log reader tool. In an imperfect world, there is a need - but should Microsoft be the one to provide it? I think there are way more important tools and features that Microsoft should spend engineering resources on.

So why do people want a log reader tool, apart from curiosity? The three uses I see are change tracking, auditing, and the ability to rollback mistakes.

The first use, change tracking, is viable, and in fact the change data capture feature in 2008 is built on top of the transactional replication log reader Agent job (I'll post more on this, and I've just written an article on tracking changes in 2008 for the November issue of TechNet Magazine).

For auditing, how can a log reader tell whether the SQL statement was being run under a different security context, such as after an EXECUTE AS statement? How can it tell the difference between a single statement UPDATE with a multi-part WHERE clause, and multiple UPDATE statements of single rows? And on top of that, it needs to read through all the transaction log, causing contention on the log drive. In 2008 there's an in-built, synchronous auditing solution (SQL Server Audit), although it has issues with parameterized queries. In 2005, you could roll your own auditing by having all DML done through stored-procs that log what they did, for instance, or using DML triggers.

For the ability to rollback mistakes... don't get me started! Recovering from user mistakes is not a situation you want to be in - you can setup your system to avoid mistakes altogether. DDL triggers, DML triggers, schema separation, no direct access to tables for DML, and so on. Any of these are better to do than having to figure out what someone did so you can undo a mistake. Ok - so people make mistakes and you'd like to rollback one statement. How about using your backups? Oh, you don't have a good backup strategy. Well, that's where a log reader can help, if it works. But should Microsoft have to provide it?

Yes, I know the alternatives I mentioned above are sometimes easier said than done, especially with 3rd-party applications, but that's for the application writers to fix. Microsoft shouldn't have to provide a tool because of broken applications, or 3rd-party license agreements that preclude adding triggers, or DBAs that haven't implemented safe-guards. It already provides features that can stop mistakes happening, and allow auditing to happen. Why provide a log reader tool that allows other companies to then produce tools on top of that which do what SQL Server already provides out-of-the-box? And why provide a log reader at all when other companies do it already?

If anything, I'd like to see the existing tools be made to work in all circumstances. AFAIK there isn't a log reader tool on the market today that copes with absolutely everything 2005 can put into the log. Although the log internals are supposedly proprietary, there's an internals document that Microsoft licenses for free (at least when I was on the team until last summer) to companies wishing to build such tools, and they're not *secret*. There's plenty of info about the log internals on the web (some provided by me) and you can poke about to your heart's content using the undocumented tools (that's what DBCC LOG and fn_dblog are for).

However, what I'd *really* like to see is the need for a log reader tool to slowly die away as more devs and DBAs are educated and implement techniques for preventing the problems that log reader tools help to rectify (sometimes). One thing I haven't mentioned above is to have a log-shipping secondary with a load-delay - that way you have a redundant copy of the data that's always several hours behind your primary system. Or even using regular database snapshots.

Ok - that was a bit of a rant, and this is the same view when I worked for Microsoft too. To summarize, I don't think Microsoft should provide a log-reader tool. I see the need for them, when a system isn't setup to prevent mistakes happening, and there's no good backup or redundancy strategy, but I think that need can be filled by 3rd-party vendors.

Happy to hear well-thought-out arguments on this either way, either privately or as comments.

I woke up this morning and someone had replaced my wife with someone who likes to blog :-). Kimberly's turned over a new leaf and is going to blog much more often - in fact she's blogged 4 times today already. Check out her blog here.

Continuing on the transaction log theme of the last few Search Engine Q&A posts, this one addresses a question I've heard a few times, most recently on an MVP discussion group. Let me paraphrase:

If I have a transaction that inserts a huge amount of data, the transaction log grows to 50-GB. I then rollback the transaction. When I take the next log backup, it's way smaller than 50-GB. What's going on?

Let's see if we can repro the scenario. I've created a database with a 500-MB data file and a 1-MB log file, with 100-MB and 1-MB auto-growth intervals. I want the log to be as small as possible and to grow in small chunks so I can see just how much it *has* to grow by, rather than having a large growth size. Then I set the recovery mode to full and took a database backup to make sure the log won't truncate until it's backed up.

CREATE DATABASE LogSizeTest ON
   
(NAME = N'LogSizeTest',
   
FILENAME = N'C:\SQLskills\LogSizeTest.mdf',
   
SIZE = 512MB,
   
FILEGROWTH = 100MB)
LOG ON 
   
(NAME = N'LogSizeTest_log',
   
FILENAME = N'C:\SQLskills\LogSizeTest_log.ldf',
   SIZE = 1MB,
   
FILEGROWTH = 1MB);
GO

ALTER DATABASE LogSizeTest SET RECOVERY FULL;
GO

BACKUP DATABASE LogSizeTest TO DISK = 'C:\SQLskills\LogSizeTest.bak';
GO

Let's check the size of the log:

DBCC SQLPERF (LOGSPACE);
GO

Database Name  Log Size (MB) Log Space Used (%) Status
-------------- ------------- ------------------ -----------
LogSizeTest    0.9921875     36.66339           0

This gives back info for all databases, I've trimmed down the output just for the LogSizeTest database.

Now I'm going to create a table, start an explicit transaction and add about 500-MB of info to the table.

USE LogSizeTest;
GO
SET NOCOUNT ON;
GO
CREATE
TABLE Test (c1 INT IDENTITY, C2 CHAR (8000) DEFAULT (REPLICATE ('a', 8000)));
GO

BEGIN TRAN;
GO

DECLARE @count INT;
SELECT @count = 0;
WHILE (@count < 64000)
BEGIN
   
INSERT INTO Test DEFAULT VALUES;
   
SELECT @count = @count + 1;
END;
GO

Checking the log file size again gives:

DBCC SQLPERF (LOGSPACE);
GO

Database Name  Log Size (MB) Log Space Used (%) Status
-------------- ------------- ------------------ -----------
LogSizeTest    703.9922      99.98737           0

The log size has grown to about 700-MB, way more than the size of the data I was inserting, and it's completely full. Now let's rollback the transaction and check the log size again.

ROLLBACK TRAN;
GO

DBCC SQLPERF (LOGSPACE);
GO

Database Name  Log Size (MB) Log Space Used (%) Status
-------------- ------------- ------------------ -----------
LogSizeTest    703.9922      85.21268           0

The size of the log file is the same, but the percentage used has actually gone down! How can that happen? Let's take a backup and checkout it's size:

BACKUP LOG LogSizeTest TO DISK = 'C:\SQLskills\LogSizeTest_log.bck';
GO
RESTORE HEADERONLY FROM DISK = 'C:\SQLskills\LogSizeTest_log.bck';
GO

The BackupSize in the output from the RESTORE HEADERONLY is 631454208, which is 602.2-MB. Taking the numbers from the DBCC SQLPERF output above, 85.21268% of 703.9922-MB is 599.89-MB - so the backup is roughly the same size as the used transaction log. That's what I'd expect, but why is it smaller than the total size of the transaction log?

So what's going on? Why did the transaction log need to grow so much larger than it needed to, and why did the percentage used actually *drop* after the transaction rolled back?

The answer is in the way the transaction log works. Whenever a logged operation occurs in a transaction, there is some transaction log space reserved in case the transaction rolls back. The idea is that there's always enough space available in the transaction log for a transaction to roll back, without having to grow the transaction log and potentially have that fail. If a transaction could not roll back successfully because the log didn't have enough space, the database would become transactionally inconsistent, would be taken offline and the state changed to SUSPECT.

The behavior we saw was the Storage Engine reserving transaction log space for a potential roll back. When the roll back occured, the transaction log records necessary to undo the effects of the transaction (called compensation log records) are created and written to the log. The issue is that they usually don't take up as much space as the Storage Engine reserved, as it tends to be very conservative in its estimates of how much log space to reserve, to avoid the potential for SUSPECT databases. This explains the difference between the various sizes and percentages we saw above.

The Storage Engine code to do the reservations is quite interesting - I remember fixing a couple of bugs in it during SQL Server 2000 development in 1999 while I was getting to know the internals of the logging and recovery system before tackling some of the (since removed) log-reading code in DBCC CHECKDB in SQL Server 2000.

Anyway, there you have it. Log space reservation is the answer, and is also one of the reasons why it can be tricky to estimate how large a transaction log should be when a database is created.

Moving databases around is pretty common, as is moving databases between servers running different Editions of SQL Server, especially during a disaster recovery situation. You may not know this, but in SQL Server 2005, if you had partitioning anywhere in a database, you could only attach/restore that database using an Enterprise or Developer instance. I say 'you may not know this' because partitioning isn't really widely used (from what I can gather from talking to customers). The upshot of this is that SQL Server 2005 will refuse to attach/restore a database with partitioning in - even if you're in a disaster recovery situation and the only server you have available has Standard Edition.

In SQL Server 2008, a lot more people are going to bump into this issue. The list of features that are Enterprise only, and will prevent attach/restore using a lower edition has expanded to include 3 new features that WILL be much more commonly used than partitioning. The four features that are in this category are:

  • Data compression
  • Partitioning
  • Transparent data encryption
  • Change data capture

All of these require elevated permissions to enable EXCEPT data compression, which only requires ALTER permission on a table. This means someone with table-owner privileges could enable compression without the DBA knowing, and suddenly the database can't be attached/restored to, say, Standard Edition.

If you're a DBA and have just taken over a database, there's now an easy way to tell whether the database contains these features. A new DMV has been added - sys.dm_db_persisted_sku_features - that will report which of these four features are present in a database. Let's check it out.

Using a 2008 Enterprise instance:

CREATE DATABASE EnterpriseOnly;
GO
USE EnterpriseOnly;
GO

CREATE TABLE compressed (c1 INT) WITH (DATA_COMPRESSION = ROW);
GO

SELECT * FROM sys.dm_db_persisted_sku_features;
GO

feature_name    feature_id
--------------  -----------
Compression     100

Now let's try backing up and restoring the database on a different edition:

BACKUP DATABASE EnterpriseOnly TO DISK = 'C:\SQLskills\EnterpriseOnly.bck';
GO

And on a 2008 Express instance:

RESTORE DATABASE EnterpriseOnly FROM DISK = 'C:\SQLskills\EnterpriseOnly.bck'
   
WITH MOVE 'EnterpriseOnly' TO 'C:\SQLskills\EnterpriseOnly.mdf',
   
MOVE 'EnterpriseOnly_log' TO 'C:\SQLskills\EnterpriseOnly_log.ldf'
GO

Processed 160 pages for database 'EnterpriseOnly', file 'EnterpriseOnly' on file 1.
Processed 5 pages for database 'EnterpriseOnly', file 'EnterpriseOnly_log' on file 1.
Msg 3167, Level 16, State 1, Line 1
RESTORE could not start database 'EnterpriseOnly'.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.
Msg 909, Level 21, State 1, Line 1
Database 'EnterpriseOnly' cannot be started in this edition of SQL Server because part or all of object 'compressed' is enabled with data compression or vardecimal storage format. Data compression and vardecimal storage format are only supported on SQL Server Enterprise Edition.
Msg 933, Level 21, State 1, Line 1
Database 'EnterpriseOnly' cannot be started because some of the database functionality is not available in the current edition of SQL Server.

Now, it's cool that it tells you exactly why the database couldn't be restored, but did you notice the first two lines of output? The database is fully restored BEFORE the operation fails! This makes perfect sense, as the database needs to be fully recovered before the server can tell whether any of the four features are still enabled or not. However, in a disaster recovery situation, waiting many hours for a database to restore only to be told that it can't be restored on this instance would be even more disastrous.

To summarize, you should always know what's happening in databases you manage. If portability of your databases to a lower Edition is important, make sure that none of these features can be enabled without you knowing about it.

This is a quickie in response to a blog comment from my previous post on instant initialization - How to tell if you have instant initialization enabled?. The comment was:

I must say, I love instant initialization. It's awesome. But I always wondered why it's not available for the log file. I assume there's a technical reason... but what is it? Does it depend on having the rest of the file be zeroed out? Doesn't it already know where it's start and stop points are anyways, since the log is circular?

I couldn't remember the exact answer so I discussed with Peter Byrne on the Storage Engine dev team and now I have the answer to share. There is a lot of metadata kicking around in the Storage Engine about the transaction log (mostly in the boot page - see my post Search Engine Q&A #20: Boot pages, and boot page corruption), including where to start reading the log during crash recovery. However, there's nothing that can be used after a crash occurs to determine where the active transaction log ends (i.e. where should crash recovery stop processing log records).

The way this is done is to have each log sector have parity bits stamped on it. When the log is first created, it is zero-initialized (with zero being an illegal log sector parity value). As the log is written, each sector has parity bits in it. When the end of the log is reached, and it wraps around to the start of the log file, the parity bits are flipped, so that overwritten log sectors have the opposite parity from when they were last written. When a crash occurs, log sectors are read and processed until a log sector with an out-of-sequence parity is found.

This entire process will not work if there's already random data in the space used by the log file - some of the random data could just look like a valid set of parity bits and cause the recovery system to try to process a log sector full of garbage, leading to a suspect database, at best.

So - it's not just a "there wasn't time" - there really is a good, architectural reason why instant initialization cannot be done with the transaction log.

At the user group meeting on Monday I spent some time explaining how GUIDs can cause fragmentation in clustered indexes AND in non-clustered indexes, even if the GUID isn't specifically included in the non-clustered index key. GUIDs are essentially random values (pseudo-random in ranges if generated using NEWSEQUENTIALID) that are also unique. Their uniqueness is what makes them attractive to many developers as a key value, without understanding the havoc they can cause in production in terms of fragmentation and poor query performance.

A GUID key causes fragmentation because of its randomness. The insertion point of a new record in an index is dictated by the value of the index key, so if the key value is random, so is the insertion point. This means that if an index page is full, a random insert that happens to have to go onto that page will cause a page split to make room for the new record. A page-split is where a new page is allocated and (as near as possible to) half the rows from the splitting page are moved to the new page. The new row is then inserted into one of the two pages, determined by the key value. Usually the newly allocated page is not physically contiguous to the splitting page, and so fragmentation has been caused. In this case *two* kinds of fragmentation have been caused - logical fragmentation (where the next logical page as determined by the index order is not the next physical page in the data file) and physical (or internal) fragmentation (where space is being wasted on index pages). These can both affect query performance (topic for a later post), as well as the expense of having to do the page split in the first place.

It's fairly well known that GUIDs can cause fragmentation in the index where the GUID is the key (e.g. a clustered index), but not about the knock-on effects in non-clustered indexes. Here's an example - I'll create two clustered indexes with GUID keys (one generated from NEWID and one from NEWSEQUENTIALID), plus a non-clustered index on each. Let's see what happens when we insert 100000 rows:

-- Create a table with a GUID key
CREATE TABLE BadKeyTable (
   
c1 UNIQUEIDENTIFIER DEFAULT NEWID () ROWGUIDCOL,
   
c2 DATETIME DEFAULT GETDATE (),
   
c3 CHAR (400) DEFAULT 'a');
CREATE CLUSTERED INDEX BadKeyTable_CL ON BadKeyTable (c1);
CREATE NONCLUSTERED INDEX BadKeyTable_NCL ON BadKeyTable (c2);
GO

-- Create another one, but using NEWSEQUENTIALID instead
CREATE TABLE BadKeyTable2 (
   
c1 UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID () ROWGUIDCOL,
   
c2 DATETIME DEFAULT GETDATE (),
   
c3 CHAR (400) DEFAULT 'a');
CREATE CLUSTERED INDEX BadKeyTable2_CL ON BadKeyTable2 (c1);
CREATE NONCLUSTERED INDEX BadKeyTable2_NCL ON BadKeyTable2 (c2);
GO

DECLARE @a INT;
SELECT @a = 1;
WHILE (@a < 10000)
BEGIN
   
INSERT INTO BadKeyTable DEFAULT VALUES;
   
INSERT INTO BadKeyTable2 DEFAULT VALUES;
   
SELECT @a = @a + 1;
END;
GO

-- And now check for fragmentation
SELECT
   
OBJECT_NAME (ips.[object_id]) AS 'Object Name',
   
si.name AS 'Index Name',
   ROUND (ips.avg_fragmentation_in_percent, 2) AS 'Fragmentation',
   
ips.page_count AS 'Pages',
   
ROUND (ips.avg_page_space_used_in_percent, 2) AS 'Page Density'
FROM sys.dm_db_index_physical_stats (DB_ID ('DBMaint2008'), NULL, NULL, NULL, 'DETAILED') ips
CROSS APPLY sys.indexes si
WHERE
   
si.object_id = ips.object_id
   
AND si.index_id = ips.index_id
   AND ips.index_level = 0;
GO

Object Name    Index Name        Fragmentation  Pages  Page Density
-------------  ----------------  -------------  -----  ------------
BadKeyTable    BadKeyTable_CL    99.13          8092   66.08
BadKeyTable    BadKeyTable_NCL   30.97          78     64.1
BadKeyTable2   BadKeyTable2_CL   0.83           5556   96.26
BadKeyTable2   BadKeyTable2_NCL  1.88           372    99.61

The BadKeyTable_CL clustered index with the GUID key generated from NEWID is almost perfectly fragmented, with 34% space being wasted on each page. Conversely, the BadKeyTable2_CL clustered index with the GUID key generated from NEWSEQUENTIALID is hardly fragmented and only 4% of free space is wasted (and this is just because of the row size chosen). These numbers are entirely expected given the nature of the cluster keys.

Now look at the non-clustered indexes. BadKeyTable_NCL is 31% fragmented with 36% space wasted on each page! BadKeyTable2_NCL is harldy fragmented with no free space wasted on each page. So what's going on? The non-clustered index key in both cases is a datetime column, which has a minimum granularity of 3 milliseconds. The code above runs in a tight loop inserting records and so can insert more than one record per 3ms time interval - and I allow this because I didn't create make the non-clustered indexes unique. For all the records inserted in one 3ms time interval, there *has* to be something that makes the non-clustered index key unique internally (as even though an index can be defined as non-unique, the Storage Engine requires that each record really is unique in an index and will add whatever it needs to so that happens).

In this case, the clustered index key (which must be present in the non-clustered index anyway) is used to differentiate between all the non-clustered index records with the same datetime value. For BadKeyTable_NCL, the cluster key is a random GUID, so the non-clustered index record insertion points ALSO become random within each 3ms time interval - leading to the fragmentation above. The BadKeyTable2_NCL non-clustered index has the same time interval issue, but it's cluster key is a sequential GUID, so the non-clustered index doesn't get fragmented. If the non-clustered index key was a time datatype with a larger minimum granularity (like smalldatetime, or the new date), the fragmentation of BadKeyTable_NCL would be even worse - try it for yourself and you'll see.

So the answer to the question in the blog post title is really - it depends! Under the right conditions, a GUID cluster key can also seriously fragment a non-clustered index as well.

There's been a spate of problems on the forums and in my inbox with people's transaction logs filling up. With the increase in "involuntary DBAs", there are more people being thrust into the role of database administration without being given the chance to learn the skills, tips, and tricks necessary. One of the most common problems I see is the tranaction log filling up and the database grinding to a halt, and the most common cause of this is the lack of log backups when in the FULL recovery mode.

The very first post in my Search Engine Q&A series last September dealt with this problem - and showed two examples of filling up a transaction log plus the command to see why a transaction log is full (examining the log_reuse_wait_desc column in master.sys.databases). See here for more details. One thing I don't go into too much detail about in that post is the way that transactional replication or database mirroring can cause the log to grow. Both of these technologies rely on reading the transaction log for an unbroken stream of changes to be sent to the secondary systems (but with slightly different mechanisms) - but both will cause the log to grow if there is transaction log that hasn't yet been read (usually because of a performance issue). The same thing applies to the new Change Data Capture technology in SQL Server 2008 - it uses the transactional replication log reader to harvest changes from the log. The addition of any of these technologies could cause your log to grow excessively.

So it's grown too big - now what? Let's deal with the case where you've taken a full database backup in the FULL recovery mode but haven't taken any log backups. There are a number of options:

  1. Take a log backup! This might be easier said than done if you're like the customer I once had who had a 10GB database with an (I kid you not) 987GB log file.
  2. Switch to the SIMPLE recovery model (either permanantly or temporarily). If you're not interested in point-in-time recovery, there's no need for you to be in the FULL recovery mode (unless you're using a feature that requires it, like database mirroring, in which case you have no choice but to start taking log backups)
  3. In SQL Server 2005 and before, you can manually throw away the log contents - see my blog post here. This is not advisable.
  4. Grow the log file. This really just addresses the symptom but not the cause - the log will fill up again eventually.

Note that options #1-3 will just remove the restriction that the log contents are still needed (called "clearing" or "truncating" the log). None of these will cause the log file to shrink. The ONLY way to change the size of the log file is to use DBCC SHRINKFILE. After doing one of #1-3, you must manually shrink the log file using DBCC SHRINKFILE. If you do #2, and then go back into the FULL recovery mode, you should immediately take a full database backup. After shrinking the log file, you should manually grow it again using ALTER DATABASE so that it doesn't suffer repeated auto-grows and lead to performance issues with the log (see Kimberly's post here).

This topic is really a two-hour lecture in the database maintenance class I teach, but at least here (and with the blog post links) I've given you a bit of insight into why it can happen and how to solve the problem.

I've had a few follow-ups on my two posts about boot page and file header page corruption - asking if its possible to do single-page restore operations for these pages. Let's try:

CREATE DATABASE BootPageTest;
GO

-- Single page restore is only possible using the FULL recovery model
ALTER DATABASE BootPageTest SET RECOVERY FULL;
GO

BACKUP DATABASE BootPageTest TO DISK = 'C:\sqlskills\BootPageTest.bck';
GO
BACKUP LOG BootPageTest TO DISK = 'C:\sqlskills\BootPageTest.trn';
GO

RESTORE DATABASE BootPageTest PAGE = '1:9' FROM DISK = 'C:\sqlskills\BootPageTest.bck';
GO

Msg 3111, Level 16, State 1, Line 2
Page (1:9) is a control page which cannot be restored in isolation. To repair this page, the entire file must be restored.
Msg 3013, Level 16, State 1, Line 2
RESTORE DATABASE is terminating abnormally.

The answer is no. The following page types cannot be restored using single-page restore:

  • File header pages (see here)
  • Boot page (see here)
  • GAM, SGAM, DIFF map, ML map pages (see here)

As you may already know, instant initialization is a way to prevent data (not log) file create and grow operations having to zero-initialize the new space before allowing it to be used. This can vastly speed up these operations as zero-initialization can take a *long* time for large files. This is especially useful in disaster recovery operations as the first phase of a restore is always to create the requisite files, if they don't already exist. Cutting minutes or even hours from this phase can significantly reduce downtime. It's available on XP SP2 and Windows Server 2003 and above. You can get more details from a blog post of Kimberly's from March 2007.

The way to enable it is to give the SQL Server service account the 'Perform volume maintenance tasks' privilege and then restart the service. There's no way to enable or disable it from within SQL Server, and until now, no way I've known of to tell whether it's enabled from within SQL Server. I was teaching a Microsoft-internal class on Database Maintenance last week and one of the students came up with a neat way to tell - using xp_cmdshell to execute the whoami /priv command, which lists all the privileges that SQL Server service account has.

The whoami command is available on Windows Server machines but for XP you need to download the support tools from Microsoft to get it to work, as I did on my laptop. You can get them here (5MB download) and you need to stop/start SQL Server on XP after installing them so it picks up the new tools path. Now, most people will be running with xp_cmdshell turned off, because of the security risks involved with enabling it, so here's a script that turns it on, checks the privileges, and turns it off again.

EXEC sp_configure 'xp_cmdshell', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO

CREATE TABLE #xp_cmdshell_output (Output VARCHAR (8000));
GO

INSERT INTO #xp_cmdshell_output EXEC ('xp_cmdshell ''whoami /priv''');
GO

IF EXISTS (SELECT * FROM #xp_cmdshell_output WHERE Output LIKE '%SeManageVolumePrivilege%')
   
PRINT 'Instant Initialization enabled'
ELSE
   
PRINT 'Instant Initialization disabled';
GO

DROP TABLE #xp_cmdshell_output;
GO

EXEC sp_configure 'xp_cmdshell', 0;
GO
RECONFIGURE WITH OVERRIDE;
GO

Enjoy!

Well, we're just back from vacation (photo blog post to follow) and I've heard that the feature article on Effective Database Maintenance I wrote for the August issue of TechNet Magazine is live on the web. It also includes a 5 minute long screencast I recorded where I demo the effect of database shrink on index fragmentation.

You can get to the article at http://technet.microsoft.com/en-us/magazine/cc671165.aspx. The topics covered are:

  • Managing data and transaction log files
  • Eliminating index fragmentation
  • Ensuring accurate, up-to-date statistics
  • Detected corrupted database pages
  • Establishing an effective backup strategy

It's written around 2-300 level and presents a good overview (well, at least I think so :-)) of the concepts involved.

Also, the August SQL Q&A column is available at http://technet.microsoft.com/en-us/magazine/cc671180(TechNet.10).aspx. This month's topics on the web (more in the print magazine) are:

  • Database version changes with upgrades
  • Benefits of partitioning
  • Consistency checking options for VLDBs

Enjoy!

Over the last few weeks I've seen (and helped correct) quite a few myths and misconceptions about index rebuild operations. There's enough now to make it worthwhile doing a blog post (and it's too hot here in Orlando for us to go sit by the pool so we're both sitting here blogging)...

Myth 1:  index rebuild pre-allocates the necessary space

This myth has two variations:

  1. The space for the new copy of the index is pre-allocated
  2. The space for the sort portion of the rebuild is pre-allocated

Neither of these are true. Index rebuild (whether online or offline, and at least as far back as 7.0) will create a new copy of the index before dropping the old copy. The pages and extents required to do this will always be allocated as needed, as with any other operation in SQL Server. The sort phase of an index rebuild, if required (in certain cases it is skipped in 2005), will adhere to the same allocation behavior.

Myth 2: indexes are rebuilt within a single file in a multi-file filegroup

This is a new one that I just heard yesterday - (paraphrasing) "In a two-file filegroup, an index in file 1 will be rebuilt into file 2. The next time it is rebuilt, it will be built in file 1. And so on".

This is untrue. Any time any allocations are done in a multi-file filegroup, the allocations are spread amongst all the files using the allocation system's proportional fill algorithm. In a nutshell, this says that space will be allocated more frequently from larger files with more free space than from smaller files with less free space. There is no concept in SQL Server of limiting allocations to a particular file in a multi-file filegroup.

Myth 3: non-clustered indexes are always rebuilt when a clustered index is rebuilt

This is untrue. The rules are a little complex here but can be summed up as follows:

  • In 2005+, rebuilding a unique or non-unique clustered index (without changing its definition) will NOT rebuild the non-clustered indexes
  • In 2000:
    • Rebuilding a non-unique clustered index WILL rebuild the non-clustered indexes
    • Rebuilding a unique clustered index will NOT rebuild the non-clustered indexes

The first few service packs of 2000 had bugs that changed the behavior of rebuilding unique clustered indexes back and forth - this is the source of much of the confusion around this myth.

For a much more detailed discussion of this, see my blog post from last Fall - Indexes From Every Angle: What happens to non-clustered indexes when the table structure is changed?.

Myth 4: BULK_LOGGED recovery mode decreases the size of the transaction log and log backups for an index rebuild

This myth is partly true.

Switching to the BULK_LOGGED recovery mode while doing an index rebuild operation WILL reduce the amount of transaction log generated, which is very useful for limiting the size of the transaction log file (note I say 'file', not 'files' - you only need one log file).

Switching to the BULK_LOGGED recovery mode while doing an index rebuild will NOT reduce the size of the transaction log BACKUP. Although the operation will be minimally-logged, the next transaction log backup will read all the transaction log since the last backup plus all the extents that were changed by the minimally-logged index rebuild. This will result in a log backup that's almost exactly the same size as for a fully-logged index rebuild. The ONLY time a log backup will contain data extents is when a minimally-logged operation has taken place since the last log backup - see here on MSDN for more info.

If you're considering using the BULK_LOGGED recovery mode, beware that you lose the ability to do point-in-time recovery to ANY point covered by a transaction log backup that contains even a single minimally-logged operation. Make sure that there's nothing else happening in the database that you may need to effectively roll-back with P.I.T. recovery. The operations you should perform if you're going to do this are:

  • In FULL recovery mode, take log backup immediately before switching to BULK_LOGGED
  • Switch to BULK_LOGGED and do the index rebuild
  • Switch back to FULL and immediately take a log backup

This limits the time period in which you can't do P.I.T. recovery.

Myth 5: online index rebuild doesn't take any locks

This myth is untrue. The 'online' in 'online index operations' is a bit of a misnomer.  Online index operations need to take two very short-term table locks. An S (Shared) table lock at the start of the operation to force all write plans that could touch the index to recompile, and a SCH-M (Schema-Modification - think of it as an Exclusive) table lock at the end of operation to force all read and write plans that could touch the index to recompile.

The most recent time this came up on the forums was someone noticing insert queries timing out after an online index rebuild operation had just started. The problem is that the  table lock that online index rebuild needs has to be entered into the grant queue in the lock manager until it can be acquired - and it will stay there until existing transactions that are holding conflicting locks either commit or roll-back. Any transaction that requires a conflicting lock AFTER the index rebuild lock has been queued but not acquired (and then released) will wait behind it in the lock grant queue. If the query timeout is reached before the transaction can get it's lock, it will timeout.

This is still much better than the table lock being held for the entire duration of the index rebuild operation. For more info, checkout this whitepaper on Online Index Operations in SQL Server 2005.

My first magazine article is in print! I've taken over the bi-monthly SQL Q&A column for TechNet Magazine and I just received the June magazine in the mail today with my first column in it. Topics covered are:

  • Creating corruption and using page checksums
  • The shrink-grow-shrink-grow trap
  • How many databases can be mirrored per instance
  • A tip on changing the default server port, from Jens Suessmeyer

I've also just completed a feature article for either the July or August issue dealing with database maintenance for the 'involuntary' DBA - more details when it gets published.

If you don't get the print version of TechNet Magazine, you can get to this month's SQL Q&A column at http://technet.microsoft.com/en-us/magazine/cc510328.aspx. There may not be anything new if you've been following my blog for a while, but if you've just started, it's worth a quick look.

Enjoy!

PS Let me know if you've got any good questions - I've already completed the August column but I'd like to hear of any questions you may have for later columns.

Many times I've been asked to do a blog post about creating Agent alerts, and given that today I demo'd it as part of our Accidental DBA workshop at Connections, it seemed a good time to do the blog post too!

I demo this in the context of alerting a DBA when an 823 or 824 IO error occurs. One of my early blog posts (see here) explains what these are, as well as providing a corrupt database that you can use to see these errors happening.

The idea is that I want to know as soon as an IO error occurs so I can start recovering and take preventative action to stop it happening again. I don't want to rely on users telling me when a query hits an IO error, and I don't want to have to scan the SQL error logs to find them. So I'm going to create an alert.

The first step is to fire up Management Studio and make sure SQL Server Agent is running. Next we need to make there's actually an Operator defined - so the new alert has someone to actually alert! - so we'll use the New Operator wizard (see below for how to get there).

In the New Operator Wizard that appears, I've created an operator named 'SysAdmin'. There are three Notification Options you can use - email, net send, and pager. I've setup SysAdmin to use net send to my local machine. You need to make sure the Messenger service is enabled otherwise net send will not work. Also, be aware the net sends will NOT work unless the machine has a network connection - even if the net send source and destination are the same machine! Given the various issues with net send, it's better to use email or pager alerts - but for the purposes of this blog post its the easiest option.

Now let's create the new alert - using the New Alert wizard (see the below for how to get there).

In the New Alert Wizard that appears, I've created an alert named 'IO Errors' for all severity 24 errors on all databases. Below is a portion of the General tab of the wizard showing these settings:

I also need to specify what happens. In the Response tab of the wizard I've set the SysAdmin operator to be notified using net send. Again, see below.

In the Options tab I've checked the box to include the error text in the net send.

Now let's test it. Using the database called 'broken' that I provide as an example (see here), I'll force an IO error to occur. In my query window I get:

SELECT * from broken..brokentable;
GO

Msg 824, Level 24, State 2, Line 1

SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x7232c940; actual: 0x720e4940). It occurred during a read of page (1:143) in database ID 10 at offset 0x0000000011e000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\broken.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

And a few seconds later I get the net send:

Pretty cool!

You can also use the WMI Provider to do this - see Creating a SQL Server Agent Alert by Using the WMI Provider for Server Events.

Over the weekend there was a question on one of the internal aliases at MS: how can I tell what percentage of a database has changed since the last full backup, so I can choose between a differential or full backup?

No such code exists as far as I know - until now! I happened to read the thread while sitting in the airport in Washington D.C. on the way back from Iceland so I started playing around and this morning I completed the code.

The code below creates a function and a stored procedure. The basic idea behind the code is as follows:

For each online data file in the database
   For each GAM interval in the file
      Crack the DIFF map page using DBCC PAGE
      Interpret the DIFF bitmap to aggregate the changed extents
      Add the sum to the total changed extents for the database
   End
End
Report results

There's a function that I create in msdb call SQLskillsConvertToExtents that cracks some of the DBCC PAGE output, and the main procedure is called sp_SQLskillsDIFForFULL and it created as a system object in master. I tried making it a table-valued function but you can't do things like INSERT-EXEC in a function, and that's required for processing the DBCC PAGE output. So - create your own wrapper function or whatever to use it. The interface/output is:

EXEC sp_SQLskillsDIFForFULL 'msdb';
GO

Total Extents Changed Extents Percentage Changed
------------- --------------- ----------------------
102           56              54.9

I've tested it with databases with multiple files and up to around 700GB for a single file size. There's the potential for an issue with file sizes of 4TB and beyond (where PFS intervals and GAM intervals map to the same extent every 4TB or so, but I think it'll be ok and the position of the DIFF map in the extent won't change - if someone could test it with a 4+TB file I'd be grateful). It's been tested on SQL Server 2005 and 2008. It will not work on SQL Server 2000 - I'll do a 2000 version sometime soon.

Note that after doing a full backup you will never see Changed Extents equal to zero. It will always be 4 + (number of online data files - 1), and around 20 or so for msdb. This is because the extent containing the file header in each file is always marked as changed, as are three extents in the primary file containing the roots of some critical system tables.

Anyway - here it is. You can download it in a zip file from SQLskillsDIFForFULL.zip (2.65KB). Enjoy!

/*============================================================================
   
File: SQLskillsDIFForFULL.sql

   Summary: This script creates a system-wide SP SQLskillsDIFForFILL that
   
works out what percentage of a database has changed since the
   
previous full database backup.

   Date: April 2008

   SQL Server Versions:
         
10.0.1300.13 (SS2008 February CTP - CTP-6)
         
9.00.3054.00 (SS2005 SP2)
------------------------------------------------------------------------------
   
Copyright (C) 2008 Paul S. Randal, SQLskills.com
   
All rights reserved.

   For more scripts and sample code, check out 
      
http://www.sqlskills.com/

   You may alter this code for your own *non-commercial* purposes. You may
   
republish altered code as long as you give due credit.

   THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF 
   
ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED 
   
TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
   
PARTICULAR PURPOSE.

============================================================================*/

-- Create the function in MSDB
--
USE msdb;
GO

IF EXISTS (SELECT * FROM sys.objects WHERE NAME = 'SQLskillsConvertToExtents')
   
DROP FUNCTION SQLskillsConvertToExtents;
GO

-- This function cracks the output from a DBCC PAGE dump
-- of an allocation bitmap. It takes a string in the form
-- "(1:8) - (1:16)" or "(1:8) -" and returns the number
-- of extents represented by the string. Both the examples
-- above equal 1 extent.
--

CREATE FUNCTION SQLskillsConvertToExtents (
   
@extents VARCHAR (100))
RETURNS INTEGER
AS
BEGIN
   
DECLARE @extentTotal   INT;
   
DECLARE @colon         INT;
   
DECLARE @firstExtent   INT;
   
DECLARE @secondExtent  INT;

   SET @extentTotal = 0;
   
SET @colon = CHARINDEX (':', @extents);

   -- Check for the single extent case
   --
   IF (CHARINDEX (':', @extents, @colon + 1) = 0)
      
SET @extentTotal = 1;
   
ELSE
      
-- We're in the multi-extent case
      --
      BEGIN
      
SET @firstExtent = CONVERT (INT,
         
SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
      
SET @colon = CHARINDEX (':', @extents, @colon + 1);
      
SET @secondExtent = CONVERT (INT,
         
SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
      
SET @extentTotal = (@secondExtent - @firstExtent) / 8 + 1;
   
END

   RETURN @extentTotal;
END;
GO

USE master;
GO

IF OBJECT_ID ('sp_SQLskillsDIFForFULL') IS NOT NULL
   
DROP PROCEDURE sp_SQLskillsDIFForFULL;
GO

-- This SP cracks all differential bitmap pages for all online
-- data files in a database. It creates a sum of changed extents
-- and reports it as follows (example small msdb):
--
-- EXEC sp_SQLskillsDIFForFULL 'msdb';
-- GO
--
-- Total Extents Changed Extents Percentage Changed
-- ------------- --------------- ----------------------
-- 102           56              54.9
--
-- Note that after a full backup you will always see some extents
-- marked as changed. The number will be 4 + (number of data files - 1).
-- These extents contain the file headers of each file plus the
-- roots of some of the critical system tables in file 1.
-- The number for msdb may be round 20.
--
CREATE PROCEDURE sp_SQLskillsDIFForFULL (
   
@dbName VARCHAR (128))
AS
BEGIN
   
SET NOCOUNT ON;

   -- Create the temp table
   
--
   
IF EXISTS (SELECT * FROM msdb.sys.objects WHERE NAME = 'SQLskillsDBCCPage')
   
DROP TABLE msdb.dbo.SQLskillsDBCCPage;

   CREATE TABLE msdb.dbo.SQLskillsDBCCPage (
      
[ParentObject] VARCHAR (100),
      
[Object]       VARCHAR (100),
      
[Field]        VARCHAR (100),
      
[VALUE]        VARCHAR (100));

   DECLARE @fileID         INT;
   
DECLARE @fileSizePages  INT;
   
DECLARE @extentID       INT;
   
DECLARE @pageID         INT;
   
DECLARE @DIFFTotal      INT;
   
DECLARE @sizeTotal      INT;
   
DECLARE @total          INT;
   
DECLARE @dbccPageString VARCHAR (200);

   SELECT @DIFFTotal = 0;
   
SELECT @sizeTotal = 0;

   -- Setup a cursor for all online data files in the database
   
--
   
DECLARE files CURSOR FOR
      
SELECT [file_id], [size] FROM master.sys.master_files
      
WHERE [type_desc] = 'ROWS'
      
AND [state_desc] = 'ONLINE'
      
AND [database_id] = DB_ID (@dbName);

   OPEN files;

   FETCH NEXT FROM files INTO @fileID, @fileSizePages;

   WHILE @@FETCH_STATUS = 0
   
BEGIN
      
SELECT @extentID = 0;

      -- The size returned from master.sys.master_files is in
      
-- pages - we need to convert to extents
      
--
      
SELECT @sizeTotal = @sizeTotal + @fileSizePages / 8;

      WHILE (@extentID < @fileSizePages)
      
BEGIN
         
-- There may be an issue with the DIFF map page position
         
-- on the four extents where PFS pages and GAM pages live
         
-- (at page IDs 516855552, 1033711104, 1550566656, 2067422208)
         
-- but I think we'll be ok.
         
-- PFS pages are every 8088 pages (page 1, 8088, 16176, etc)
         
-- GAM extents are every 511232 pages
         
--
         
SELECT @pageID = @extentID + 6;

         -- Build the dynamic SQL
         
--
         
SELECT @dbccPageString = 'DBCC PAGE ('
            
+ @dbName + ', '
            
+ CAST (@fileID AS VARCHAR) + ', '
            
+ CAST (@pageID AS VARCHAR) + ', 3) WITH TABLERESULTS, NO_INFOMSGS';

         -- Empty out the temp table and insert into it again
         
--
         
DELETE FROM msdb.dbo.SQLskillsDBCCPage;
         
INSERT INTO msdb.dbo.SQLskillsDBCCPage EXEC (@dbccPageString);

         -- Aggregate all the changed extents using the function
         
--
         
SELECT @total = SUM ([msdb].[dbo].[SQLskillsConvertToExtents] ([Field]))
         
FROM msdb.dbo.SQLskillsDBCCPage
            
WHERE [VALUE] = '    CHANGED'
            
AND [ParentObject] LIKE 'DIFF_MAP%';

         SET @DIFFTotal = @DIFFTotal + @total;

         -- Move to the next GAM extent
         
SET @extentID = @extentID + 511232;
      
END

      FETCH NEXT FROM files INTO @fileID, @fileSizePages;
   
END;

   -- Clean up
   
--
   
DROP TABLE msdb.dbo.SQLskillsDBCCPage;
   
CLOSE files;
   
DEALLOCATE files;

   -- Output the results
   
--
   
SELECT
      
@sizeTotal AS [Total Extents],
      
@DIFFTotal AS [Changed Extents],
      
ROUND (
         
(CONVERT (FLOAT, @DIFFTotal) /
         
CONVERT (FLOAT, @sizeTotal)) * 100, 2) AS [Percentage Changed];
END;
GO

-- Mark the SP as a system object
--
EXEC sys.sp_MS_marksystemobject sp_SQLskillsDIFForFULL;
GO

-- Test to make sure everything was setup correctly
--
EXEC sp_SQLskillsDIFForFULL 'msdb';
GO

This came up several times during the week so I thought it was about time to blog about it. One of the new features we put into SQL Server 2005 was storing the last time that DBCC CHECKDB completed successfully (called the last-known good time). What does successfully mean? This is the catch - if DBCC CHECKDB runs to completion then it considers that a successful run - EVEN if it found some corruptions. However, the last-known good time is ONLY updated if there were NO corruptions found. Slightly confusing I know.

Cool - but how can you see it? Well, the only time it's ever reported is when the database starts up. Not too useful if the database has been running for months. So how to see it??? The trick is to use DBCC PAGE. The last-known good time is stored in the boot page of the database - page 9. The following code will dump page 9 for you:

-- you need this to get the DBCC PAGE output to your console
DBCC TRACEON (3604);
GO

-- page 9 is the boot page
DBCC PAGE (dbname, 1, 9, 3);
GO

You need to look for the dbi_dbccLastKnownGood field. That was the last time that DBCC CHECKDB ran without finding any corruptions.

Now - what about if you're trusting your SQL Agent jobs to run DBCC CHECKDB for you every so often (how often is a whole other discussion...) and relying on the Agent job failing if DBCC CHECKDB finds corruptions. Well, if all you do is run the DBCC command, you're never going to know about corruption unless DBCC CHECKDB itself fails for some reason - remember, it returns successfully even it it found corruptions.The key is to add another statement after you run DBCC that checks the value of @@ERROR (the last error code reported by SQL Server). Although DBCC CHECKDB doesn't stop with an error when it finds corruption, it will set @@ERROR if there are any error messages in its output.

Edit: The stuff I've struck-through above is not true. I thought I'd remembered it from testing previously and I've heard plenty of anecdotal evidence from customers too BUT an Agent job WILL fail if a DBCC CHECKDB within it fails. I went back and forth with Tara Kizer on SQLteam.com and we tested on 2005 and 2000 - and it worked. Now here is a real issue - the Job History that's captured will not contain all the output from DBCC CHECKDB - especially if you didn't use the WITH NO_INFOMSGS option. You should make sure you capture the output to a file to avoid having to go back and run DBCC CHECKDB all over again.

Summary - make sure you really know when DBCC CHECKDB runs successfully, without finding any corruptions!

Here's an interesting question that came in to our questions line (questions@SQLskills.com - no guarantee of an answer - I check it every so often):

I have seen demonstrations where a large database being broken down into smaller ones using synonym names. I think it was used on a data warehouse and allowed smaller database backups instead of doing one large one and simulating file group backups. Is there somebody who has worked with this variation and can identify when this would be an advantage over file group backups if there is even an advantage.

My answer will always be to keep the VLDB (Very Large DataBase) as a single unit and go with filegroups if you need to. Breaking the VLDB into smaller databases has some serious issues:

  • Queries become more complicated as they're now potentially cross-database. This means you need to keep all the security settings in all the databases synchronized.
  • Referential integrity becomes a big problem as you can't create foreign key constraints across databases
  • You have multiple transaction logs to manage instead of one. This means you need to be doing log backups of ALL the databases, vastly increasing the number of backup files to manage.
  • Point-in-time recovery becomes very hard as you have to restore ALL the databases to a single point-in-time. Now, this may not be too much of a problem if the data in the VLDB is essentially read-only, and gets updated en-masse every so often from your OLTP system - but for changing data it's a nightmare.
  • Implementing a high-availability solution becomes very challenging. As soon as you start to think of multiple databases that need to be in sync, you can pretty much forget about log shipping and database mirroring. You're going to need whole-instance failure protection - which means failover clustering. Then if you want to mitigate the single-point-of-failure in a failover cluster (the shared disks), you're going to need SAN replication to a remote failover cluster too - expensive!!!

These are just the ones that spring to mind in 5 minutes - I'm sure there are more if I sat and thought about it longer (e.g. how to create a database snapshot, run a consistency check, ...)

So - IMHO it's always going to be easier to backup and restore a single VLDB split into filegroups than a VLDB split into multiple databases.

PS If there's something you'd like to see me do a blog post on, shoot me an email here.

A couple of weeks ago I blogged about the three tracks of the SQL Server 2008 JumpStart course that SQLskills.com taught internally for Microsoft and some MVPs - see here for details. Well, the content is now available to download! Note that this was based on CTP-5 (November 2007 CTP) and there have been *lots* of behavioral changes since then (with more planned for CTP-6 Refresh and RTM), but if you want a high-level overview of a bunch of the features (albeit in bullet-point summaries on slides) then this is a good place to start.

For me, what's *REALLY* cool is that the site also has a downloadable VPC plus lab manuals for all of the AlwaysOn High-Availability hands-on labs that SQLskills.com wrote. We originally wrote these labs for SQL Server 2005 and I updated them all for CTP-5. The VPC has a long lab on each of the following:

  • Database Snapshots
  • Data Recovery and Preventative Techniques
  • Instant Initialization
  • Peer-to-Peer Replication (including the new Topology Wizard I blogged about here)
  • Table and Index Partitioning
  • Snapshot Isolation
  • Online Operations
  • Database Mirroring (including a demo I wrote of Automatic Page Repair, described here)
  • Service Oriented Database Architecture

There is some great depth in each of these - Kimberly blogged more info about the exercises in each lab here. There's also another VPC image with some higher-level labs on a variety of 2008 features and written by a number of different people- including some labs on Policy-Based Management and Performance Data Collection that Kimberly wrote.

So - where can you get these from? Go to http://sqlserver2008jumpstart.microsofttraining.com/ and hit the Download link on the right-hand side. Register and then you can get to the materials. The AlwaysOn VPC image is Collection 2 at the bottom of the page, and you'll see all the slide decks as you scroll down the page.

Enjoy!

Almost a year ago to the day I asked a question on my old blog - how long does *your* CHECKDB take to run? Well, I had a bunch of replies but I never got around to posting the results. I was reminded of this recently so threw together some graphs of the results. I've plotted database size against run-time of DBCC CHECKDB, for a number of different numbers of CPUs, and mostly on SQL Server 2000. I've also added a linear trend line too.

Now - bear in mind this doesn't take into account the multitude of other factors that can affect how long DBCC CHECKDB takes to run (see my previous post here for the list).

Hopefully these are interesting to some of you!

It seems like all I've been talking about on the forums the last couple of days is the correct order of operations in a maintenance plan. The biggest confusion seems to be about the effect of rebuilding an index on statistics, with some incorrect advice being given out on what to do.

Rebuilding an index will update statistics with the equivalent of a full scan - doesn't matter whether you use DBCC DBREINDEX or ALTER INDEX ... REBUILD, the effect is the same. It can do this because it sees a complete picture of the index while its doing the rebuild.

Reorganizing an index (using the old DBCC INDEXDEFRAG I wrote, or the new ALTER INDEX ... REORGANIZE) will NOT update statistics at all, because it only sees a few pages of the index at a time.

The problem I've been seeing is people rebuilding indexes and then updating statistics. So why is this a problem? Well, it depends :-)

  • If your default for updating statistics is to do a sampled scan, then having rebuild update the statistics with a full scan (as a side effect), and then proceeding to manually update them again with the default sampled scan, means that after both operations you're left with sampled statistics. You've wasted resources doing the sampled scan AND you've lost the 'free' full-scan statistics that the index rebuild did for you.
  • If your default is to do a full scan, then you don't lose out on having the best statistics, but you do waste resources by unnecessarily updating statistics a second time.

So what's the solution?

The simple answer is not to update statistics on indexes that have just been rebuilt.

The more complicated answer is to:

  1. Have a list of indexes (F) that you know will cause workload performance problems if they get fragmented
  2. Have a list of indexes (S) that you know will cause workload performance problems if the statistics don't get regularly updated
  3. Evaluate fragmentation for list F, and choose to reorganize, rebuild, or do nothing
  4. For all indexes in list S that were not rebuilt in step 3, update statistics

Hope this helps.

While we were in Barcelona we sat down with Richard Campbell and Greg Hughes from RunAs Radio to record a 1/2 hour interview on SQL Server 2008. We touch on a ton of different features (look at the number of Categories I've tagged this with!) and have a bunch of laughs along the way - check it out here.

PS There's been a ton of interest in the slide deck idea I had so we'll be going ahead with that. Look for an announcement sometime in the first few months of next year about how to get them. Thanks to everyone that replied!

This is a question that comes up a lot - how to run consistency checks on a VLDB?

We're talking hundreds of GBs or 1 TB or more. These databases are now common on SQL Server 2000 and 2005 - at TechEd IT Forum this week (and at SQL Connections the week before) there are many customers with multi-TB databases. Any experienced DBA knows the value of running consistency checks, even when the system is behaving perfectly and the hardware is rock-solid. The two problems that people have with running a full CHECKDB on their VLDB are:

  • It takes a long time to run (based on many factors – see my previous post here for details).
  • It uses lots of resources – memory, CPU, IO bandwidth, tempdb space.

So it uses lots of resources for a long time. Even with a decent sized maintenance window, the CHECKDB may run over into normal operations. There's also the case of a system that's already pegged in more or more resource dimensions. Whatever the case, there are a number of options:

  • Don't run consistency checks
  • Run CHECKDB using the WITH PHYSICAL_ONLY option
  • Use SQL Server 2005's partitioning feature and devise a consistency checking plan around that
  • Figure out your own scheme to divide up the consistency checking work over several days
  • Offload the consistency checks to a separate system

Let's look at each in turn.

Don't run consistency checks

A lot of people end up doing this as they can’t figure out how to run consistency checks on their DB. Don't even think about using this option – there’s always a way to do it. If you absolutely cannot figure out a way to get consistency checks on your system, send me email and I'll help you – I’ve helped many customers with all the schemes below, both in and out of Microsoft.

Use WITH PHYSICAL_ONLY

A full CHECKDB does a lot of stuff - see previous posts in this series for more details. You can vastly reduce the run-time and resource usage of CHECKDB by using the WITH PHYSICAL_ONLY option. With this option, CHECKDB will:

  • Run the equivalent of DBCC CHECKALLOC (i.e. check all the allocation structures)
  • Read and audit every allocated page in the database

So it skips all the logical checks, inter-page checks, and things like DBCC CHECKCATALOG. The fact that all allocated pages are read means that:

  • Any pages that cannot be read at all (i.e. 823 errors) will be discovered
  • If page checksums are enabled in SQL Server 2005, any corruptions caused by the IO subsystem will be discovered as the page checksum will be checked as part of reading the page into the buffer pool

So there's a trade-off of consistency checking depth against run-time and resource usage - but this option will pick up problems caused by the IO subsystem as long as page checksums are enabled and present.

Use the SQL Server 2005 partitioning feature

One of the obvious ways to reduce the time/resources issue is to partition the load. If you're using the partitioning feature in SQL Server 2005 then you're already setup for this. Given that you've hopefully got your partitions stored on separate filegroups, you can use the DBCC CHECKFILEGROUP command.

Consider this example - you have the database partitioned by date such that the current month is on a read-write filegroup and the past 11 months are on 11 read-only filegroups (data from more than a year ago is on some offline storage medium). The prior months also have multiple backups on various media so are considered much 'safer' than the current month. It makes sense that you don't need to check the read-only filegroups as often as the current month's filegroup so an example consistency checking scheme could be:

  • Run a DBCC CHECKFILEGROUP on each read-only filegroup every week or two
  • Run a DBCC CHECKFILEGROUP on the read-write filegroup every day or two (depending on the stability of the hardware, the criticality of the data, and the frequency and comprehensiveness of your backup strategy).

I know of several companies who've made the decision to move to SQL Server 2005 in part because of this capability to easily divide up the consistency checking.

Beware that until SP2 of SQL Server 2005, DBCC CHECKFILEGROUP would not check a table at all if it was split over multiple filegroups. This is now fixed and DBCC CHECKFILEGROUP will check partitions on the specified filegroup even if the table is now completely contained on the filegroup.

Figure out your own way to partition the checks

If you're on SQL Server 2000, or you just haven't partitioned your database on SQL Server 2005, then there are ways you can split up the consistency checking workload so that it fits within a maintenance window. You basically need to simulate what CHECKDB does in a staggered manner. Here's one scheme that I've recommended to several customers:

  • Figure out your largest tables (by number of pages) and split the total number into 7 buckets, such that there are a roughly equal number of database pages in each bucket.
  • Take all the remaining tables in the database and divide them equally between the 7 buckets (using number of pages again)
  • On Sunday:
    • Run a DBCC CHECKALLOC
    • Run a DBCC CHECKCATALOG
    • Run a DBCC CHECKTABLE on each table in the first bucket
  • On Monday, Tuesday, Wednesday:
    • Run a DBCC CHECKTABLE on each table in the 2nd, 3rd, 4th buckets, respectively
  • On Thursday:
    • Run a DBCC CHECKALLOC
    • Run a DBCC CHECKTABLE on each table in the 5th bucket
  • On Friday and Saturday:
    • Run a DBCC CHECKTABLE on each table in the 6th and 7th buckets, respectively

In pre-RTM builds of SQL Server 2005, DBCC CHECKTABLE could not bind to the critical system tables, just like with T-SQL - but that's fixed so you can cover all system tables in SQL Server 2000 and 2005 using the method above.

There's one drawback to this method - a new internal database snapshot is created each time you start a new DBCC command, even for a DBCC CHECKTABLE. If the update workload on the database is significant, then there could be a lot of transaction log to recover each time the database snapshot is created - leading to a long total run-time. In this case, you may need to alter the number of buckets you use to make the total operation fit within your available window.

Use a separate system

This alternative is relatively simple - restore your backup (you are taking regular backups, right?) on another system and run a full CHECKDB on the restored database. This offloads the consistency checking burden from the production system and also allows you to check that your backups are valid. There are some drawbacks to this however:

  • You need to have sufficient disk space on the spare system to be able to restore the backup onto. If the production database is several TB, you need the same several TB on the spare box. This equates to a non-trivial amount of money - initial capital investment plus ongoing storage management costs. (Hopefully a future release will alleviate this – while at Microsoft I invented and patented a mechanism for consistency checking a database in a backup without restoring it.)
  • If the consistency checks find an error, you don't know for sure that the database is corrupt on the production system. It could be a problem with the spare box that's caused the corruption. The only way to know for sure is to run a consistency check on the production system. This is a small price to pay though, because most of the time the consistency checks on the spare system will be ok, so you know the production database was clean at the time the backup was taken.

Summary

You've got a bunch of choices to allow you to run consistency checks, so there's really no excuse for not knowing (within a reasonable timeframe) that something's gone wrong with your database.

This is a subject I posted about last year on my old blog but it came up at SQL Connections last week several times so I want to repost it for those who’ve just started following my blog.

There's only one time when you should be trying to work out how long a CHECKDB is going to take - when you're planning your regular database maintenance. If you're faced with a corrupt (or suspected corrupt) database and you're only just starting to think about how long a CHECKDB is going to take - you've made a mistake while planning your disaster recovery strategy. You always need to know how long CHECKDB takes (on average) to run for your database so:

  • You can tell whether a particular run of CHECKDB is taking longer than usual - a sign that it's found some corruption
  • You know how long it will take to get results in a disaster recovery situation

At every conference I go to, someone asks me how long CHECKDB will take to run on their database. There are several ways I could answer this:

  • The unhelpful answer - I've got no idea.
  • The almost-helpful answer - how long did it take to run last time and are the conditions exactly the same?
  • The answer I usually give - it depends.

Now, many people would see the third answer as being somewhat equivalent to the first answer - unhelpful. The problem is that there are many factors which influence how long CHECKDB will take to run. Let me explain the ten most important factors so you get an idea why this is actually a helpful answer. These aren't in any particular order of importance.

1) The size of the database

Pretty obvious... CHECKDB has to read every allocated page in the database so the bigger it is, the longer it will take to read all the pages.

2) Concurrent IO load on the server

At the simplest level, what is CHECKDB going to do? It reads every allocated page in the database. That's a lot of IO. CHECKDB takes great pains to do the most efficient IO it can and read the database pages in their physical order with plenty of readahead so that the disk heads move smoothly across the disks (rather than jumping around randomly and incurring disk head seek delays). If there's no concurrent IO load on the server, then the IOs will be as efficient as CHECKDB can make them. However, introducing any additional IO from SQL Server means that the disk heads will be jumping around - slowing down the CHECKDB IOs. If the IO subsystem is at capacity already from CHECKDB's IO demands, any additional IO is going to reduce the IO bandwidth available to CHECKDB - slowing it down.

3) Concurrent CPU activity on the server

At the next level of simplicity, CHECKDB is going to process every page it reads in some way. Depending on the various options you've specified and the database schema (details below), that's going to use a lot of CPU - it's possible that the server may be pegged at 100% CPU when CHECKDB is running. If there's any additional workload on the server, that's going to take CPU cycles away from CHECKDB and it going to slow it down.

Basically what points #2 and #3 are saying is that CHECKDB is very resource intensive! It’s probably one of the most resource intensive things you can ask SQL Server to do and so it's usually a good idea to not run it during peak workload times, as you'll not only cause CHECKDB to take longer to run, you will slowdown the concurrent workload, possibly unacceptably.

4) Concurrent update activity on the database

This is relevant for both SQL 2000 and SQL 2005, but for different reasons.

In SQL 2000, CHECKDB gets its consistent view of the database from transaction log analysis of concurrent DML transactions (see here for details). The more concurrent DML there is while CHECKDB is running, the more transaction log will be generated - and so the longer it will take for CHECKDB to analyze that transaction log. It's possible that on a large multi-CPU box with a ton of concurrent DML and CHECKDB limited to a single CPU that this phase of CHECKDB could take several times longer than the reading and processing of the database pages! (I've seen this in real-life several times.)

In SQL 2005, CHECKDB gets its consistent view of the database from a database snapshot, which is stored on the same disk volumes as the database itself. If there are a lot of changes in the database while CHECKDB is running, the changed pages are pushed to the snapshot so that it remains consistent. As the snapshot files are stored in the same location as the database files, every time a page is pushed to the snapshot, the disk head has to move, which interrupts the efficient IO described in #2. Also, whenever CHECKDB goes to read a page and it needs to read the page from the snapshot files instead of the database files, that's another disk head move, and another efficient IO interruption. The more concurrent changes to the database, the more interruptions to efficient IO and the slower that CHECKDB runs.

5) Throughput capabilities of the IO subsystem

This one's simple. CHECKDB is going to do a boat-load of IOs and it could even end up being IO-bound (meaning that the CPUs are idle periodically waiting for IOs to complete) depending on the options specified and the database schema. This means that the throughput of the IO subsystem is going to have a direct effect on the run-time of CHECKDB. so, if you have a 1TB database and the IO subsystem can only manage 100MB/sec, it's going to take almost 3 hours just to read the database (1TB / 100MB / 3600 secs) and there's nothing you can do to speed that up except upgrade the IO subsystem.

I've lost count of the number of times I've heard customers complain that CHECKDB (or index rebuilds or other IO-heavy operations) are running sloooowly only to find that the disk queue lengths are enormous and the IO subsystem it entirely unmatched to the server and workload.

6) The number of CPUs (processing cores) on the box

This also really encompasses the Edition of SQL Server that's being run. In Enterprise Edition, CHECKDB can run in parallel across all the CPUs in the box (or as many as the query processor decides to parallelize over when the CHECKDB internal queries are compiled). Running in parallel can give a significant performance boost to CHECKDB and lower run times, as long as the database is also spread over multiple files too (so the IOs can be parallelized). There's a nifty algorithm that’s used that allows CHECKDB to run in parallel which I'll explain in detail in a future post.

On the other hand, the fact that CHECKDB can run in parallel in Enterprise Edition can be bad for some scenarios, and so some DBAs chose to force CHECKDB to be single-threaded. SAP typically recommends this to help with user query predictability. The way to do this is to turn on the documented trace flag 2528.

7) The speed of the disks where tempdb is placed

Running CHECKDB against a VLDB uses lots of memory for internal state and for VLDBs the memory requirement usually exceeds the amount of memory available to SQL Server. In this case, the state is spooled out to tempdb and so the performance of tempdb can be a critical factor in CHECKDB performance. See this post for more details of this and how CHECKDB can run out of disk space if tempdb is too small.

8) The complexity of the database schema

This can have a really high impact on the run-time of CHECKDB because it impacts the amount of CPU that CHECKDB requires. For example, the most expensive checks that CHECKDB does are for non-clustered indexes. It needs to check that each row in a non-clustered index maps to exactly one row in the heap or clustered index for the table, and that every heap/clustered index row has exactly one matching row in each non-clustered index. Although there's a highly efficient algorithm for doing this, it still takes around 30% of the total CPU that CHECKDB uses!

There are a bunch of other checks that are only done if the features have been used in the database - e.g. computed column evaluation, links between off-row LOB values, Service Broker, XML indexes, indexed views - so you can see that empirical factors along aren't enough to determine the run-time.

9) Which options are specified

This is almost the same as #7 in that by specifying various options you're limiting what checks CHECKDB actually performs. For instance, using the WITH NOINDEX option will turn off the non-clustered index checks that I described in #7 and using the WITH PHYSICAL_ONLY option will turn off all logical checks, vastly decreasing the run-time of CHECKDB and making it nearly always IO-bound rather than CPU-bound (in fact this is the most common option that DBAs of VLDBs use to make the run-time of CHECKDB manageable).

One thing to be aware of - if you specify any repair options, CHECKDB always runs single-threaded, even on a multi-proc box on Enterprise Edition.

10) The number and type of corruptions that exist in the database

Again, this is similar to #7 and #8. If there are any corruptions present, there may be extra checks triggered to try to figure out more details of the corruptions. For instance, for the non-clustered index checks, the algorithm is tuned very heavily for the case when there are no corruptions present (the overwhelming majority of cases considering the millions of times CHECKDB is run every day around the world). When a non-clustered index corruption is detected, a more in-depth algorithm has to be used to figure out exactly where the corruption is, which involves re-scanning a bunch of data and so taking a bunch more time. There are a few other algorithms like this too.

Summary

So you can see that there's no simple answer.

One of the sessions we did yesterday at IT Forum was an Instructor-Led Lab on database snapshots, an Enterprise Edition feature of SQL Server 2005. Database snapshots are not particularly well known in the DBA community and there are many misconceptions about them. The session generated a bunch of questions, some of which I'll answer here.

Q1) Can you change the data in table in a database snapshot?

A1) No.

Q2) Can you change the permissions in a database snapshot?

A2) Nope, sorry.

Q3) Can you backup a database snapshot?

A3) Afraid not.

A4) Can you detach a database snapshot?

A4) Err, nope.

You may have noticed a pattern here :-) Basically, the only things you can do with a database snapshot are select from it, and revert to it - i.e. rollback the entire database on which it is based to the point-in-time at which the database snapshot was created. Database snapshots are not updateable in any way, and at the time I left MS, there were no plans to change that in the future. Regardless of these limitations, database snapshots are very useful. I searched both our blogs for some examples to link to and couldn't find any so I'll put together a post on using them sometime over the next week or so.

The final question I *was* able to answer successfully for a very happy conference attendee:

Q5) Can you create a partial snapshot on a database, say a single filegroup, for reporting?

A5) No - HOWEVER, if you're on Enterprise Edition, you can make use of partial database availability. Hopefully you have your database structured so the primary filegroup has nothing but system tables in, and you have the table you're interested in isolated in a seperate filegroup - let's call it filegroup X. (He answered 'yes' to all three!). If that is that case, backup the primary filegroup and filegroup X. Then restore the primary filegroup followed by filegroup X in a separate location. As long as the primary filegroup of a database is online in SQL Server 2005 Enterprise Edition, then the database is online and any other online filegroups are available - partial database availability. This means you've effectively created a point-in-time snapshot of a single filegroup, albeit at the expense of having to have a full copy of the filegroup.

Last week at SQL Connections someone said that CHECKDB’s output is ‘useless’.  Given that I wrote CHECKDB for SQL Server 2005 I was only mildly offended :-) But there’s a lot of truth in that statement – the error messages from CHECKDB do tell you exactly what’s corrupt in the database but they’re not exactly easy to read and understand unless you’re intimately familiar with the on-disk structures used to store a SQL Server database. It’s pretty easy to understand the output if there’s only one or two error messages, but once you get past 10 or so it becomes extremely difficult to figure out what the actual problem is for normal people (i.e. not CHECKDB geeks like me, Ryan or a handful of senior people in Product Support)

So, how are you supposed to figure out what’s wrong? In this post I want to run through a few tips and tricks you can use.

  • If the recommended repair level (at the very bottom of the output) is REPAIR_REBUILD then only non-clustered indexes are damaged. This meanscyou don’t need to run repair (which needs the database in single-user mode) or restore from backups. You can just manually rebuild the damaged indexes to fix the problem, possibly even online if you're running Enterprise Edition and the index doesn't have any LOB columns.
  • There are a couple of cases where even if it’s only non-clustered indexes that are damaged, REPAIR_ALLOW_DATA_LOSS may still be the recommended repair option. In this case, check the index ID in all the error messages – if all index IDs are 2 or higher, then it’s only non-clustered indexes that are damaged and you can rebuild them yourself instead of running repair/restore.
  • Figure out a list of tables that are affected by looking at the summary messages about how many corruptions were found per table. If each of the tables is isolated on a separate filegroup then these filegroups can be taken offline and restored individually, possibly without interrupting the application workload. If there's no backup, then you can run DBCC CHECKTABLE with repair, for a faster repair operation than running a full CHECKDB.
  • Figure out the list of damaged pages. There are a number of error messages that could point to a damaged page – usually 8928 or 2537 – and then you can choose to do single-page restores from your backups to minimize downtime, possibly even online in Enterprise Edition.
  • Are there any errors that CHECKDB can’t repair?  If so, you have no choice but to restore from backups or to extract as much info as possible into a new database. Things to look for that mean repair won’t be able to fix everything are:
    • CHECKDB stops early and complains about system table pre-checks failing (errors 7984 – 7988 inclusive)
    • CHECKDB reports any metadata corruption (8992, 8995 errors)
    • CHECKDB reports any errors on PFS page headers (8939, 8946 errors with a possible 8998 error as well)

While I was at Microsoft I wrote two 200-page+ documents detailing all the CHECKDB error messages for SQL Server 2000 and 2005 – the team says they should all be translated into Books Online entries by next summer – a bunch of them are there already. In the meantime, I hope the hints above will take a little of the mystery out of CHECKDB’s output.

I’m in the middle of a flight from Washington D.C. to Zurich on the way to Barcelona for TechEd IT Forum and I can’t sleep – Kimberly’s out like a light so what else is there to do except write another blog post? :-) OK - actually posting this from Barcelona on Tuesday before our first of 12 sessions here!

In the Database Maintenance workshop we did at SQL Connections last week I promised to blog a script I used to show how data file shrink operations cause massive fragmentation of indexes. The reason is that data file shrink starts at the end of the data file and moves a single page at a time to a free space below the shrink threshold. In the process of doing so, it perfectly reverses the physical order of the pages comprising the leaf level of an index – thus perfectly fragmenting it!!

Let’s try out my simple script that demonstrates this. First thing I’m going to do is create a new database and create a 10MB ‘filler’ table, which I’m going to then drop later to create a space that shrink can use.

USE MASTER;

GO

 

IF DATABASEPROPERTYEX ('shrinktest', 'Version') > 0

      DROP DATABASE shrinktest;

 

CREATE DATABASE shrinktest;

GO

USE shrinktest;

GO

 

SET NOCOUNT ON;

GO

 

-- Create and fill the filler table

CREATE TABLE filler (c1 INT IDENTITY, c2 VARCHAR(8000))

GO

DECLARE @a INT;

SELECT @a = 1;

WHILE (@a < 1280) -- insert 10MB

BEGIN

      INSERT INTO filler VALUES (REPLICATE ('a', 5000));

      SELECT @a = @a + 1;

END;

GO

Next I’ll create the ‘production’ table that we’d really like to keep in optimal shape for performance.

-- Create and fill the production table

CREATE TABLE production (c1 INT IDENTITY, c2 VARCHAR (8000));

CREATE CLUSTERED INDEX prod_cl ON production (c1);

GO

DECLARE @a INT;

SELECT @a = 1;

WHILE (@a < 1280) -- insert 10MB

BEGIN

      INSERT INTO production VALUES (REPLICATE ('a', 5000));

      SELECT @a = @a + 1;

END;

GO 

Now I’ll use the sys.dm_db_index_physical_stats DMV to check the fragmentation of the production table’s clustered index – it should be almost zero:

-- check the fragmentation of the production table

SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (

      DB_ID ('shrinktest'), OBJECT_ID ('production'), 1, NULL, 'LIMITED');

GO

avg_fragmentation_in_percent fragment_count

---------------------------- --------------------

0.390930414386239            6

This is what I expected. Now I’m going to drop the filler table, run a shrink operation and then check the fragmentation again:

-- drop the filler table and shrink the database

DROP TABLE filler;

GO

 

-- shrink the database

DBCC SHRINKDATABASE (shrinktest);

GO

 

-- check the index fragmentation again

SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (

      DB_ID ('shrinktest'), OBJECT_ID ('production'), 1, NULL, 'LIMITED');

GO

avg_fragmentation_in_percent fragment_count

---------------------------- --------------------

99.7654417513683             1277

Wow! The index went from almost 0% fragmented to almost 100% fragmented – the shrink operation totally reversed the physical ordering of the leaf level of the clustered index – nasty.

One of the common maintenance operations I see at customer sites is to run a database shrink at some interval, and I always advise against it – now you can see why. Running a regular shrink operation can cause horrible fragmentation problems. The worst problems I see are those customers with maintenance plans that rebuild all indexes and then run a shrink to remove the extra space necessary for the index rebuilds – completely undoing the effects of the index rebuild!

One other common thing I see is to have auto-shrink set on for one or databases. This is bad for several reasons:

  • Shrink causes index fragmentation, as I’ve just demonstrated above.
  • You can't control when it kicks in. Although it doesn't have any effect like long-term blocking, it does take up a lot of resources, both IO and CPU. It also moves a lot of data through the buffer pool and so can cause hot pages to be pushed out to disk, slowing things down further. If the server is already pushing the limits of the IO subsystem, running shrink may push it over, causing long disk queue lengths and possibly IO timeouts.
  • You're likely to get into a death-spiral of auto-grow then auto-shrink then auto-grow then auto-shrink... (in my experience, if someone is using auto-shrink, they're most likely using and relying on auto-grow too). An active database usually requires free space for normal operations - so if you take that free space away then the database just has to grow again. This is bad for several reasons:
    • Repeatedly shrinking and growing the data files will cause file-system level fragmentation, which can slow down performance
    • It wastes a huge amount of resources, basically running the shrink algorithm for no reason
    • Auto-grow itself can be bad, especially if you're using SQL Server 2000 (or don't have Instant File Initialization turned on - see this post from Kimberly's blog) where all allocations to the file being grown are blocked while the new portion of the file is being zero-initialized.

Bottom-line: auto-shrink should *NEVER* be turned on…

As promised, here's the first of the grab-bag of questions we were asked during conferences. I'm blogging a selection of the stuff I noted down - Kimberly also has a bunch of stuff too that she's blogging about as I type - see here. These are some of the questions we were asked during our pre-con at SQL Connections on Database Maintenance: From Planning to Practice to Post-Mortem. It was cool that people came prepared with so many great questions - most of which we answered during the session and some I need to follow-up with the Product Team about.

Q1) I have a filegroup with two files. I add a third file, use some of the space, and then do a rebuild of the index that takes up most of the space in the original two files. Why doesn't the newly rebuilt index get spread evenly across the three files? I.e. why doesn't SQL Server rebalance the data across the files?

A1) There are two things to consider here. The first is the way that SQL Server allocates space from multiple files in a filegroup. It uses a mechanism called proportional-fill that will allocates space from files in round-robin fashion, but weights the allocations towards files that are larger and have more free space. In the example above, the space in the newly-added third file will be used before the first and second files are grown to add more space. The second thing to consider is that the process of rebuilding an index requires building a new copy of the index before dropping the old on - so in the example above, the existing allocated space can't be reused until after the index rebuild operation completes.

The concept of adding a file and having SQL Server rebalance the data across the files doesn't exist. It was something I proposed during SQL Server 2005 development but we (seriously) didn't have time to do it. The solution I recommend is to create a new filegroup with as many files as you need, rebuild the index into the new filegroup using the CREATE INDEX WITH DROP_EXISTING command, and then drop the old filegroup.

Q2) Multiple questions about whether a non-clustered index gets rebuilt on SQL Server 2005 under different circumstances

A2) See the blog post I wrote here which goes into all the different combinations. A lot of the confusion comes from the fact that on SQL Server 2000, for non-unique clustered indexes where SQL Server has to generate a uniquifying value (called a uniquifier), when it gets rebuilt all the non-clustered indexes have to be rebuilt too as the uniquifier values are regenerated. On SQL Server 2005 this is not the case - a BIG improvement.

Q3) A lot of the features we discussed (e.g. partitioning and online operations) are in Enterprise Edition only. What's the complete list of features that are in Enterprise vs Standard Editions for SQL Server 2005?

A3) The best list we know of is in MSDN - http://msdn2.microsoft.com/en-us/library/ms143761.aspx

Q4) In a maintenance plan, is it a good idea to do an index rebuild followed by an update statistics?

A4) No! An index rebuild will do the equivalent of an update stats with a full scan. A manual update stats will use whichever sampling rate was set for that particular set of statistics. So - not only does doing an update stats after an index rebuild waste resources, you may actually end of with a worse set of stats if the manualy update stats only does a sampled scan.

Q5) How can you tell whether DBCC CHECKDB is doing a deep-dive that's going to take much longer than usual?

A5) There are several algorithms in DBCC CHECKDB that are designed to quickly tell whether a corruption exists or not, but at the expense of not being able to tell exactly where the corruption is. The justification for this is that corruptions are not very common (considering the millions of times per day that DBCC CHECKDB is run) and so it makes sense to engineer for the success case and take the hit of a longer run-time in the error case. When a corruption is discovered there hasn't been any to tell that DBCC CHECKDB is going to run logner than usual until SQL Server 2005 SP2. In SP2 a new error, 5268 was added that will be output to the errorlog when one of the deep-dive algorithms is triggered. I advise you to add an alert on this error.

Q6) What are the various forums where Paul answers questions on corruptions?

A6) There are 4 forums that I (and others with good advice also) frequent:

I'm also happy for you to send me email! Just beware that on forum posts and emails you may not get a reply from anyone for a day or more - so if you have a critical server-down issue then you should contact SQL Product Support.

Ok - that's enough for tonight. More tomorrow from the Disaster Recovery workshop. Thanks to everyone who came to the workshop today!

(Been a few days since I posted - had some real work to do :-) Today I'll post a few things from the queue that's been building up)

This is part Q&A and part follow-on from my last post about running index maintenance when a database is mirrored.

A customer has a maintenance plan that involves running regular ALTER INDEX ... REORGANIZE on a 100GB clustered index to remove fragmentation. Three weeks ago they added database mirroring, with the database setup for synchronous mirroring. Every so often, they see the state of the mirror change from SYNCHRONIZED to SYNCHRONIZING and then a bit later back to SYNCHRONIZED. What's going on? Once a synchronously-mirrored database is synchronized, it should ever get out of sync, right?

Well not quite - if the communication link between the principal and the mirror is broken, then the mirror becomes unsynchronized. The exact behavior in this situation depends on how mirroring is setup and what's failed:

  1. If there's no witness instance, then transactions will continue on the principal database but the transaction log starts to grow, because the transactions can't be cleared from the principal's log (even after a log backup) until they've been sent to the mirror. The database is running 'exposed'. Once the link is reestablished, the mirror while synchronize again.
  2. If there's a witness, and the witness can still talk to the principal, then everything continues as in #1
  3. If there's a witness, and the communication link between it and principal is also broken, the the principal will stop serving the database - transactions will stop. In this case, if the mirror and the witness can still see each other, then a failover will occur.

There are some great Books Online entries that describe all of this - see http://msdn2.microsoft.com/en-us/library/ms179344.aspx to start with.

The customer had situation #1. Every so often the mirror would change state and it seemed to coincide with the defrag job. Looking in the error log shows messages like:

2007-10-24 11:43:36.21 spid23s     Error: 1474, Severity: 16, State: 1.

2007-10-24 11:43:36.21 spid23s     Database mirroring connection error 2 'Connection attempt failed with error: '10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)'.' for 'TCP://roadrunnerpr.sqlskills.com:5022'.

So the network link was dying sometimes when the defrag was running - that explains the switch between SYNCHRONIZED and SYNCHRONIZING. Why the network link was dying is still under investigation but it seems like the additional transaction log generated by the defrag job was causing the network to become overloaded and some component of it wasn't behaving correctly under load.

There are a few things to learn from this:

  1. Not only do you need to make sure that your IO subsystem can handle the load on it correctly, you also need to make sure your network can handle the load on it. There are a bunch of tools available to stress-test network paths - one simple one is TrafficEmulator.
  2. When you're running on your test system before going into production, make sure you test *everything* as if you were running in production - including maintenance jobs because they can add significant load to a production system.
  3. When you implement an HA solution such as mirroring, consider all the ways that transaction log will be generated when figuring out the required network bandwidth to support your HA configuration - something like a defrag or rebuild can cause an enormous spike in log generation

Two of the cool features in SQL Server 2005 are CROSS APPLY and DMVs (Dynamic Management Views). Now, far be it for me to get my hands dirty explaining developer stuff like CROSS APPLY :-) but I was having a discussion with Colin Leversuch-Roberts in the UK about the composability limitations of the sys.dm_db_index_physical_stats DMV. (Btw - you should check out Colin's blog post series on Analysing Indexes - lots of useful stuff).

So CROSS APPLY lets you do join-like functionality with table-valued functions that take parameters - which you can't do using JOIN. This works for most of the DMVs, but some of them are written to an older internal implementation that doesn't support CROSS APPLY, and sys.dm_db_index_physical_stats is one of them. If you try it you'll get an error like:

Msg 413, Level 16, State 1, Line 26

Correlated parameters or sub-queries are not supported by the inline function "master.sys.dm_db_index_physical_stats".

Fortunately there's a way around this - wrap the DMV in an artificial TVF of your own, and then CROSS APPLY to that. Here's an example:

 

CREATE FUNCTION my_index_physical_stats (

@database_id INT,

@object_id INT,

@index_id INT,

@partition_number INT,

@mode INT)

RETURNS @result TABLE (

database_id SMALLINT NULL, object_id INT NULL, index_id INT NULL, partition_number INT NULL,

index_type_desc NVARCHAR(60) NULL, alloc_unit_type_desc NVARCHAR (60) NULL, index_depth TINYINT NULL,

index_level TINYINT NULL, avg_fragmentation_in_percent FLOAT NULL, fragment_count BIGINT NULL,

avg_fragment_size_in_pages FLOAT NULL, page_count BIGINT NULL, avg_page_space_used_in_percent FLOAT NULL,

record_count BIGINT NULL, ghost_record_count BIGINT NULL, version_ghost_record_count BIGINT NULL,

min_record_size_in_bytes INT NULL, max_record_size_in_bytes INT NULL, avg_record_size_in_bytes FLOAT NULL,

forwarded_record_count BIGINT NULL)

BEGIN

INSERT INTO @result SELECT * FROM

sys.dm_db_index_physical_stats (@database_id, @object_id, @index_id, @partition_number, @mode)

RETURN

END;

GO

There's been some discussion over on the SQL Server Central forums about database mirroring performance, and one question in particular on how to do index rebuilds on mirrored VLDBs.

Remember that in database mirroring, the database has to be in the full recovery mode, so all index rebuilds are fully logged. It all comes down to the amount of transaction log generated and whether this causes a problem.

  • In synchronous mirroring, the additional log being generated could overload the network link between the principal and mirror. This will slow down the hardening of regular transactions in the mirror's transaction log, and thus led to a decrease in transaction throughput on the principal.
  • In asynchronous mirroring, the additional log being generated could again overload the network link - but this time there's no requirement for transactions to harden in the mirror before they can commit on the principal, so instead the SEND queue on the principal gets really large. Now, in asynchronous mirroring the SEND queue represents the amount of work that will be lost if a failover occurs, as its all the transaction log that hasn't yet been sent to the mirror. So, in asynchronous mode, a large index rebuild operation could lead to increase exposure to data loss in the event of a failover.

In SS2008, the log stream compression I blogged about here should go a long way to alleviating this problem. There are three ways I can think of to reduce the amount of transaction log generated by and index maintenance plan in SS2005:

  1. Use a potentially less expensive (in terms of logging) solution for removing index fragmentation. Doing an index rebuild is guaranteed to generate an equivalent amount of transaction log to the size of the index being rebuilt, no matter how much fragmentation there is - because an index rebuild always rebuilds the entire index. The alternative is to do an index reorganize, either using my old DBCC INDEXDEFRAG or the new ALTER INDEX ... REORGANIZE. These will only generate transaction log when index pages are compacted and reorganized - so for less heavily fragmented indexes. There's no hard and fast rule here but I generally say where Logical Scan Fragmentation/Average Fragmentation in Percent from DBCC SHOWCONTIG/sys.dm_db_index_physical_stats, respectively, is less than 30%. You also need to consider page density too - but really this is a topic for a whole other post.
  2. Be very selective on which indexes you choose to rebuild/reorganize. A lot of people have a maintenance plan that does this for every index every week, without checking whether the index is even fragmented or whether removing fragmentation for an index improves workload performance. Again, this a whole other topic but there is an old whitepaper for SS2000 I helped with that describes some of this - Microsoft SQL Server 2000 Index Defragmentation Best Practices.
  3. Partition the tables/indexes so that the changing portion of the data is the only portion that's affected by index maintenance. If most of your data is read-only, there's no point in having it included in reindex/reorganize operations, right? Kimberly recently wrote a blog post about such an architecture here.

For more info on database mirroring performance considerations, checkout the whitepaper Database Mirroring: Best Practices and Performance Considerations. There's also a slide deck presentation based on this whitepaper that's been presented at various TechEds (I did it in China and Hong Kong last year and Kimberly did it in South Africa last year) - you can download it from the Hong Kong website here.

The bottom line is when mirroring is in the mix, you need to be more intentional with your database maintenance.

This has come up several times over the last few days, and is something that Kimberly and I discuss (and deplore) when we lecture - the use of BACKUP LOG WITH NO_LOG (or TRUNCATE_ONLY - they're synonymous) to allow log truncation.

How is it used?

The common use is when the transaction log grows to be inordinately large on a database that's in FULL (or BULK_LOGGED) recovery mode. This happens after a database backup has been taken, which switches the log into a mode where it won't truncate until its been backed up. In these circumstances, if you don't take a transaction log backup, the log will continue to grow until it runs out of disk space.

The alternative to taking a real log backup is to issue a BACKUP LOG dbname WITH NO_LOG command. Let's see this in action - note that I'm not advocating its use but I want to show you what it does. First off I'll create some transaction log after taking a full database backup:

USE nologtest;
GO

BACKUP DATABASE nologtest TO DISK = 'c:\sqlskills\nologtest.bck' WITH INIT;
GO

CREATE TABLE t1 (c1 INT, c2 VARCHAR (8000))
GO

SET NOCOUNT ON
GO

DECLARE @a INT;
SELECT @a = 1;
WHILE (@a < 10000)
BEGIN
   
INSERT INTO t1 VALUES (@a, REPLICATE ('a', 8000));
   
SELECT @a = @a + 1;
END;
GO

How large is the transaction log now?

SELECT name, size FROM sys.database_files;
GO

name            size
--------------- --------
nologtest       90264
nologtest_log   104128

A little bit larger than the data file (which is what I'd expect after the operation I just performed) and they're both around 100MB. I'll pretend that I haven't been paying attention to the size of the database and log and now I don't have any space to perform a backup. Can I just shrink the log?

DBCC SHRINKFILE (nologtest_log, 2);
GO

Cannot shrink log file 2 (nologtest_log) because all logical log files are in use.
DbId   FileId      CurrentSize MinimumSize UsedPages   EstimatedPages
------ ----------- ----------- ----------- ----------- --------------
11     2           102944      63          102944      56

No. What's stopping me (well DBCC SHRINKFILE just told me, but let's double-check)?

SELECT log_reuse_wait_desc FROM sys.databases WHERE NAME = 'nologtest';
GO

LOG_BACKUP

Ok - so I can't perform a backup so I'll use BACKUP LOG WITH NO_LOG.

BACKUP LOG nologtest WITH NO_LOG;
GO

SELECT log_reuse_wait_desc FROM sys.databases WHERE NAME = 'nologtest';
GO

NOTHING

Now it looks like I can do the shrink:

DBCC SHRINKFILE (nologtest_log, 2);
GO

DbId   FileId      CurrentSize MinimumSize UsedPages   EstimatedPages
------ ----------- ----------- ----------- ----------- --------------
11     2           256         63          256         56

Success!

Hold on, are you sure?

Why is it bad?

Do you realize what just happened? We discarded the contents of the transaction log since the last full/differential/log backup. That means that until the next full or differential backup completes, all work since the last backup completed will be lost if the database is destroyed. So if your business relies on not losing data/work in the event of a disaster - then what I just did is sacrilege!

The whole point of FULL (and BULK_LOGGED) recovery modes is to preserve the transaction log so that the database can be recovered right up to the point of a failure, or to do point-in-time restores. Using BACKUP LOG WITH NO_LOG negates all of that. (For completeness, note that you can't do a point-in-time restore to any time in a transaction log backup containing a bulk-logged transaction).

What are the alternatives?

If you don't want the FULL recovery mode behavior, then don't use FULL recovery mode - switch to SIMPLE mode. But be aware that you lose the ability to restore to any point in between your full or differential backups.

If you want the FULL recovery mode behavior, but don't want to run out of log space - then the answer is again simple (pun intended :-), take log backups! If your log is growing too quickly, take more frequent log backups! If you don't have enough disk space to store the log backups, talk to your management about the options - buying more disks and continuing to use the FULL recovery options, or being forced to move to SIMPLE recovery mode.

One thing to bear in mind - you actually have to monitor the size of your log to tell whether its growing. That's what gets people into trouble in the first place - a combination of:

  • Not knowing that the database is in FULL recovery mode
  • Not tracking the size of the log

The problem comes for those people who just use an app and have no idea that the developer put the database (maybe for a web application log) into FULL recovery mode, or even that they have SQL Server on their system. For these people there is no good answer unfortunately apart from educating developers.

How to prevent it being used

If you're a sysadmin who wants to stop your database owners and backup operators from using the NO_LOG option, there is way to do it. Trace flag 3231 in SS2000 and SS2005 will turn the NO_LOG and TRUNCATE_ONLY options into no-ops in FULL/BULK_LOGGED recovery mode, and will clear the log in SIMPLE recovery mode. Trace flag 3031 in SS2005 turns them in checkpoints in all recovery modes. These trace flags are undocumented but they're allowed to be publicized.

This way no-one can truncate the log unless you allow them to, and when they come to you to complain, you can educate them!

(This is an updated repost from earlier this year on my old blog)

Whenever I’m discussing index maintenance, and specifically fragmentation, I always make a point of saying ‘Make sure the index is being used before doing anything about fragmentation’. If an index isn’t being used very much, but has very low page density (lots of free space in the index pages), then it will be occupying a lot more disk space than it could do and it may be worth compacting (with a rebuild or a defrag) to get that disk space back. However, usually there’s not much point spending resources to remove any kind of fragmentation when an index isn’t being used. This is especially true of those people who rebuild all indexes every night or every week.

 

You could even go so far as to say if a non-clustered index isn’t being used, why is it there at all? Extra non-clustered indexes drag down performance in a number of ways. Consider a non-clustered index called IX_MyNCIndex on the table MyTable:

  • Any time a record is inserted into MyTable, a matching record is inserted into IX_MyNCIndex. This is a bunch of extra IOs, extra log records, plus maybe even a page-split.
  • Any time a record is deleted from MyTable, the matching record in IX_MyNCIndex must be deleted. Extra IOs again and log records again.
  • Any time a record in MyTable is updated:
    • If MyTable has a clustered index, and the clustered index key value changes, then the matching record in IX_MyNCIndex must be updated. Extra IOs and log records again.
    • If any of the non-clustered index key values changes, or any of the INCLUDEd column values changes, then the matching record in IX_MyNCIndex must be updated. Extra IOs and log records again.
  • If a clustered index is created on MyTable, then IX_MyNCIndex has to be rebuilt to include the logical RIDs rather than the physical heap RIDs (see this post for an explanation). Lot of extra IOs and log records again.

That’s a significant amount of extra IOs and log records to maintain each extraneous non-clustered index.

 

So, how can you tell if an index is being used?

  • In SQL Server 2000 there is no way to do it
  • In SQL Server 2005 there are a few different ways in SQL Server 2005 – the one I want to discuss in this post is the sys.dm_db_index_usage_stats DMV.

This DMV exposes the information that is tracked about index usage (as the name suggests). It does not generate any information itself; it just returns info from a cache inside SQL Server. This cache is empty when the server instance starts, and is not persisted across instance restarts. All cache entries for indexes in a database are removed when that database is closed. So, the cache tracks usage information about which indexes have been used since the database they are part of was last opened (either manually or as part of instance start-up).

 

This continues to confuse people so I'll call it out: if the output from the DMV does not have an entry for the index you're interested in, it has not been used since the last database startup.

 

The cache tracks the following info for each index (for user queries and system queries):

  • The number of times it was used in a seek operation (either looking up a single row, or doing a range scan) along with the time of the last seek.
  • The number of times it was used in a scan operation (e.g. a select * operation) along with the time of the last scan
  • The number of times it was used in a lookup operation (this means a bookmark lookup – where a non-clustered index does not fully cover a query and additional columns must be retrieved from the base table row) along with the time of the last lookup.
  • The number of times it was used in an update operation (this counts inserts, updates, and deletes) along with the time of the last update.

Let’s have a look at its use.

SELECT * FROM sys.dm_db_index_usage_stats;

GO

The output is too wide for a single image so I've split it in two (I won't post any more output from the DMV - I'll just talk about it):

indexusage11.jpg

indexusage21.jpg

Unless you've just re-started your instance, you'll see a bunch of output from this, representing all index activity since the instance/databases started. If you're interested in whether an index is being used, you can filter the output. Let's focus in on a particular table  - AdventureWorks.Person.Address.

SELECT * FROM sys.dm_db_index_usage_stats

WHERE database_id = DB_ID('AdventureWorks')

AND object_id = OBJECT_ID('AdventureWorks.Person.Address');

GO

You'll probably see nothing in the output, unless you've been playing around with that table. Let's force the clustered index on that table to be used, and look at the DMV output again.

SELECT * FROM AdventureWorks.Person.Address;

GO

SELECT * FROM sys.dm_db_index_usage_stats

WHERE database_id = DB_ID('AdventureWorks')

AND object_id = OBJECT_ID('AdventureWorks.Person.Address');

GO

Now there's a single row, showing a scan on the clustered index. Let's do something else.

SELECT StateProvinceID FROM AdventureWorks.Person.Address

WHERE StateProvinceID > 4 AND StateProvinceId < 15;

GO

SELECT * FROM sys.dm_db_index_usage_stats

WHERE database_id = DB_ID('AdventureWorks')

AND object_id = OBJECT_ID('AdventureWorks.Person.Address');

GO

And there's another row, showing a seek in one of the table's non-clustered indexes.

So, its easy to look at the index usage for particular tables and indexes. But how can you monitor this over time? This is easy too - let's see how. First we need to create our own table to store snapshots of the DMV output.

IF OBJECTPROPERTY (object_id (N'master.dbo.MyIndexUsageStats'), 'IsUserTable') = 1 DROP TABLE dbo.MyIndexUsageStats;

GO

SELECT GETDATE () AS ExecutionTime, * INTO master.dbo.MyIndexUsageStats

FROM sys.dm_db_index_usage_stats WHERE database_id=0;

GO

Next we need to take a baseline snapshot of the DMV output.

INSERT master.dbo.MyIndexUsageStats

SELECT getdate (), * FROM sys.dm_db_index_usage_stats;

GO

And now simulate a few operations and take another snapshot of the DMV:

SELECT * FROM AdventureWorks.Person.Address;

GO

SELECT * FROM AdventureWorks.Person.Address;

GO

SELECT StateProvinceID FROM AdventureWorks.Person.Address

WHERE StateProvinceID > 4 AND StateProvinceId < 15;

GO

INSERT master.dbo.MyIndexUsageStats

SELECT getdate (), * FROM sys.dm_db_index_usage_stats;

GO

And look at the filtered contents of our snapshot table:

SELECT * FROM master.dbo.MyIndexUsageStats

WHERE database_id = DB_ID('AdventureWorks')

AND object_id = OBJECT_ID('AdventureWorks.Person.Address');

GO

You should see four rows - two from the baseline snapshot and two from the final snapshot. If you ran just the statements above, you'll see that the user_scans count for the clustered index has increased by two, and the user_seeks count for the non-clustered index has increased by one.

So this is a pretty simple example of how you can track index usage. By putting something like this into a regularly run script you can tell which indexes aren't being used and could be candidates for less-regular index maintenance or removal altogether.

Let me know how you get on.

This was a question from the MSDN Disaster Recovery forum I started while I was at Microsoft.

I have a 600 gig database that has a mirror.  I need to move the databases from local drives to a SAN.  Can anyone recommend a document that lists the steps to go through to move both the principle and mirror to the SAN with no down time? or minimal down time?

As far as I know, there isn't any such document so I had a crack at coming up with a list of operations. Here's what I had:

  1. Take a full backup of the principal on node A
  2. Restore it on the SAN on node B using WITH NORECOVERY, remembering to use WITH MOVE to place the files correctly, and with a different database name than the current mirror
  3. Take the required log backup on the principal and restore on the database copy on the SAN on node B
  4. Break the mirroring partnership
  5. Drop the current mirror database on node B
  6. Rename the database on the SAN on node B to be the mirror database -- THIS DOESN"T WORK!
  7. Setup the mirroring partnership to point to the newly restored database on the SAN on node B
  8. Start mirroring and the new mirror will catch-up
  9. Failover to the mirror on node B, which becomes the new principal
  10. Follow the same procedure to move the new mirror on node A onto its SAN
  11. Failback if you want to

And I promised to try it out to make sure I had it right so in this blog post I'm going to walk through the steps of doing this. It turns out that the steps above are slightly incorrect. Step 6 above doesn't work because the database is in recovery (so is inaccessible) and there's a short-cut when moving the database on the first node to avoid having to take and copy more backups. Let's see how it works and I'll post the corrected sequence at the end.

As I did in yesterday's mirroring post, I'm going to use the TicketSalesDB database from our Always-On DVDs. It's only a few hundred MB instead of 600GB but the principal is the same (no pun intended :-)). I've got mirroring running between two nodes, SQLDEV01 (the principal) and SQLDEV02 (the mirror), both of which are running 2005 SP2 and I've got a simulated workload inserting rows into the database. I don't actually have a SAN laying around so I'm cheating and I have directories called C:\SQLDEV01SAN and C:\SQLDEV02SAN instead. It's the location change that's the interesting part, not where the actual location is.

Step 1

On SQLDEV01, take a full backup and a log backup:

BACKUP DATABASE TicketSalesDB TO DISK = 'C:\SQLskills\TicketSalesDB.BAK' WITH INIT;

GO

BACKUP LOG TicketSalesDB TO DISK = 'C:\SQLskills\TicketSalesDB_Log.bak' WITH INIT;

GO

Step 2

On SQLDEV01, break the mirroring partnership:

ALTER DATABASE TicketSalesDB SET PARTNER OFF;

GO

And just check that it's gone:

SELECT mirroring_state_desc FROM sys.database_mirroring WHERE database_id = DB_ID ('TicketSalesDB');

GO

which returns:

NULL

Step 3

On SQLDEV02, drop the mirror database - this wouldn't work unless mirroring was no longer running:

DROP DATABASE TicketSalesDB;

GO

Step 4

Copy the backups to SQLDEV02 and restore them on the SAN and remembering to use WITH NORECOVERY:

RESTORE DATABASE TicketSalesDB FROM DISK='C:\SQLskills\TicketSalesDB.bak'

WITH

MOVE 'TicketSalesDBData' TO 'C:\SQLDEV02SAN\TicketSalesDBData.MDF',

MOVE 'TicketSalesFG2005Q1' TO 'C:\SQLDEV02SAN\TicketSalesFG2005Q1.NDF',

MOVE 'TicketSalesFG2005Q2' TO 'C:\SQLDEV02SAN\TicketSalesFG2005Q2.NDF',

MOVE 'TicketSalesFG2005Q3' TO 'C:\SQLDEV02SAN\TicketSalesFG2005Q3.NDF',

MOVE 'TicketSalesFG2005Q4' TO 'C:\SQLDEV02SAN\TicketSalesFG2005Q4.NDF',

MOVE 'TicketSalesDBLog' TO 'C:\SQLDEV02SAN\TicketSalesDBLog.LDF',

NORECOVERY;

GO

RESTORE LOG TicketSalesDB FROM DISK = 'C:\SQLskills\TicketSalesDB_Log.bak' WITH NORECOVERY;

GO

Step 5

On SQLDEV02, set the mirroring partner to be SQLDEV01:

ALTER DATABASE TicketSalesDB SET PARTNER = 'TCP://SQLDEV01:5091';

GO

Step 6

On SQLDEV01, start mirroring:

ALTER DATABASE TicketSalesDB SET PARTNER = 'TCP://SQLDEV02:5092';

GO

And check that it's running:

SELECT mirroring_state_desc FROM sys.database_mirroring WHERE database_id = DB_ID ('TicketSalesDB');

GO

This time it returns:

SYNCHRONIZED

Step 7

Now we need to failover so that we can move the database on SQLDEV01 onto its SAN. Before we do that, let's make sure that SQLDEV01 is the principal:

SELECT mirroring_role_desc FROM sys.database_mirroring WHERE database_id = DB_ID ('TicketSalesDB');

GO

which returns:

PRINCIPAL

Now force the failover:

ALTER DATABASE TicketSalesDB SET PARTNER FAILOVER;

GO

And query the DMV again to make sure. This time the mirroring_state_desc returned is:

MIRROR

Excellent!

Now, I did all of this while my workload was running and it automatically failed over to SQLDEV02, with the database now hosted on the SAN. To do the same move on SQLDEV01, we don't need to go through the backup and copy process again - we can just use the original backups we took in step 1.

Step 8

We need to break the mirroring partnership again, this time executing on SQLDEV02, the new principal:

ALTER DATABASE TicketSalesDB SET PARTNER OFF;

GO

On SQLDEV01, we can now drop the database and restore the original backups onto the SAN:

DROP DATABASE TicketSalesDB;

GO

RESTORE DATABASE TicketSalesDB FROM DISK='C:\SQLskills\TicketSalesDB.bak'

WITH

MOVE 'TicketSalesDBData' TO 'C:\SQLDEV01SAN\TicketSalesDBData.MDF',

MOVE 'TicketSalesFG2005Q1' TO 'C:\SQLDEV01SAN\TicketSalesFG2005Q1.NDF',

MOVE 'TicketSalesFG2005Q2' TO 'C:\SQLDEV01SAN\TicketSalesFG2005Q2.NDF',

MOVE 'TicketSalesFG2005Q3' TO 'C:\SQLDEV01SAN\TicketSalesFG2005Q3.NDF',

MOVE 'TicketSalesFG2005Q4' TO 'C:\SQLDEV01SAN\TicketSalesFG2005Q4.NDF',

MOVE 'TicketSalesDBLog' TO 'C:\SQLDEV01SAN\TicketSalesDBLog.LDF',

NORECOVERY;

GO

RESTORE LOG TicketSalesDB FROM DISK = 'C:\SQLskills\TicketSalesDB_Log.bak' WITH NORECOVERY;

GO

And setup mirroring again. On SQLDEV01:

ALTER DATABASE TicketSalesDB SET PARTNER = 'TCP://SQLDEV02:5092';

GO

And on SQLDEV02:

ALTER DATABASE TicketSalesDB SET PARTNER = 'TCP://SQLDEV01:5091';

GO

And we're running again.

Step 9

Now all we need to do is fail the workload back to SQLDEV01 by executing this on SQLDEV02:

ALTER DATABASE TicketSalesDB SET PARTNER FAILOVER;

GO

Summary

So - the corrected sequence for moving a database while mirroring is running is the following:

  1. Take a full backup of the principal database on node A, and the required log backup
  2. Break the mirroring partnership
  3. Drop the current mirror database on node B
  4. Copy the backups to node B and restore it on the SAN on node B using WITH NORECOVERY, remembering to use WITH MOVE to place the files correctly
  5. Setup the mirroring partnership to point to the newly restored database on the SAN on node B
  6. Start mirroring and the new mirror will catch-up
  7. Failover to the mirror on node B, which becomes the new principal
  8. Follow the same procedure to move the new mirror on node A onto its SAN, but using the original backups from step 1
  9. Failback

Hope this helps.

Theme design by Nukeation based on Jelle Druyts