The March edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Distributed transactions and database mirroring - why they don't work together
  • Background processes that can cause I/Os even with no connections to the server
  • How to restore backups from a file containing multiple appended backups
  • The perennial problem of production databases being too large to restore in development

Check it out at http://technet.microsoft.com/en-us/magazine/ff458345.aspx.

Happy New Year! 

The January edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Using backups for corruption recovery
  • Why snapshot isolation is required when using change tracking, and its performance implications
  • Is DBCC CHECKDB a really comprehensive integrity check?
  • The difference between data file and log file shrinking

Check it out at http://technet.microsoft.com/en-us/magazine/ee914612.aspx.

Earlier today there was a question on SQL Server Central where someone wanted to know what could be causing so many reads on their transaction log. I was asked to chime in by fellow MVP Jonathan Kehayias (who also sent me some questions that I've answered in this post - thanks Jon!), so I did, with a list of everything I could think of. I thought it would make for a good post, so here it is, with a few more things I remembered while writing the post.

Before I start, if you're not comfortable talking log records and transaction log architecture, see my TechNet Magazine article on Understanding Logging and Recovery, which explains everything clearly, including how having too many VLFs can affect operations on the log that have to scan VLFs.

Each of these things can cause reads of the log:

  • Transaction rollback: when a transaction has to roll back (either because you say ROLLBACK TRAN or something goes wrong and SQL Server aborts the transaction), the log records describing what happened in the transaction have to be read so that their effects can be removed from the database. This is explained in the TechNet Magazine article. Note that it doesn't matter if you're using explicit transactions or not (i.e. BEGIN TRAN), SQL Server always starts a transaction for you (called an implicit transaction) so that it can put a boundary on what needs to be rolled back in case of a failure.
  • Crash recovery: crash recovery must read the transaction log to figure out what to do with all the log records in the active portion of the log (all the way back to the earlier of the most recent checkpoint or the start of the oldest active transaction). The log is read twice - once going forward from that oldest point (called the REDO phase) and then going backwards (called the UNDO phase). Again, this is explained in great depth in the article.
  • Creating a database snapshot: a database snapshot is a point-in-time view of a database. What's more, it's a transactionally consistent point-in-time view of a database - which means that, essentially, crash recovery must be run on the real database to create the transactionally consistent view. The crash recovery is run into the database snapshot, the real database isn't affected - apart from having all the active transaction log read so that crash recovery can run.
  • Running DBCC CHECKDB: creates a database snapshot by default on 2005 onwards, and runs the consistency checks on the snapshot. See above. There's a much more detailed description, including how this worked in 2000, in the first part of the 10-page blog post CHECKDB From Every Angle: Complete description of all CHECKDB stages.
  • Transaction log backups: this one's kind of obvious. A transaction log backup contains all the transaction log records generated since the last log backup finished (or since the log backup chain was established). To back up the log it has to read it. What's not so obvious is that a log backup will also scan through all the VLFs in the log to see if any active ones can be made inactive (called clearing or truncating the log - both misnomers as nothing is cleared and nothing is truncated). See my TechNet Magazine article on Understanding SQL Server Backups and the blog post Importance of proper transaction log size management.
  • Any kind of data backup: (full/differential backup of a file/filegroup/database). Yup - data backups always include transaction log - so the backup can be restored and give you a transactionally consistent view of the database. See Debunking a couple of myths around full database backups and More on how much transaction log a full backup includes for details if you don't believe me.
  • Transactional replication: transactional replication works by harvesting committed transactions from the transaction log of the publication database (and then sending them to the subscriber(s) via the distribution database - beyond the scope of this post). This is done by the Log Reader Agent job, running from the Distributor. It needs to read all the log records generated in the publication database, even if they're nothing to do with the publications. More log equals more reads. My whitepaper on combining database mirroring and transactional replication in 2008 has more details on this stuff, as does Books Online.
  • Change data capture (in 2008): CDC uses the transactional replication log reader agent to harvest changes from the transaction log. See above. This means CDC can cause the log to not be able to clear properly, just like transactional replication or database mirroring - see my blog post Search Engine Q&A #1: Running out of transaction log space for more details. Note that I didn't say Change Tracking - it uses a totally different mechanism - see my TechNet Magazine article on Tracking Changes in Your Enterprise Database for more details.
  • Database mirroring: DBM works by sending physical log records from the principal to the mirror database. If the mirroring session drops out of the SYNCHRONIZED state, then the log records won't be able to be read from memory and the mirroring subsystem will have to get them from disk - causing log reads. This can happen if you're running asynchronous mirroring (where you're specifically allowing for this), or if something went wrong while running synchronous mirroring (e.g. the network link between the principal and mirror dropped out, and a witness wasn't configured or the principal could still see the witness - again, beyond the scope of this post). Regardless, this is called having a SEND queue on the principal.
  • Restoring a backup: whenever backups are restored, even if you've said WITH NORECOVERY, the REDO portion of recovery is run for each restore, which reads the log.
  • Restoring a log backup using WITH STANDBY: in this case, you've essentially said you'd like recovery to run, but not to affect the transaction log itself. Running recovery has to read the log. For more info on using WITH RECOVERY, NORECOVERY, or STANDBY, see my latest TechNet Magazine article on Recovering from Disasters Using Backups, which explains how restores work.
  • A checkpoint, in the SIMPLE recovery mode only: see my blog post How do checkpoints work and what gets logged for a description of what checkpoints are and what they do. In the SIMPLE recovery mode, checkpoints are responsible for clearing the log (described with links above) so must read through all the VLFs to see which can be marked inactive.
  • When processing a DML trigger (on 2000): (thanks to Clay Lenhart for the comment that reminded me of this). In SQL Server 2000, the before and after tables that you can process in a DML trigger body are actually found from looking at the log records generated by the operation that caused the trigger to fire. My dev team changed this in 2005 to store the before and after tables using the version store, giving a big perf boost to DML trigger processing. 
  • Manually looking in the log (with DBCC LOG or the table-valued function fn_dblog): this one's pretty obvious.

Phew - a lot of things can cause log reads, the trick is knowing which one it is!
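As a rough first pass at narrowing down which of these is in play, a query along the following lines lists the per-database settings that involve something reading the log. This is only a sketch and not part of the list above: it assumes SQL Server 2008 (the is_cdc_enabled column doesn't exist in 2005), and it only covers features that show up in the catalog views, so it says nothing about rollbacks, backups, or DBCC CHECKDB.

```sql
-- Sketch: per-database features that involve log readers (assumes SQL Server 2008)
SELECT
   d.[name],
   d.[recovery_model_desc],
   d.[log_reuse_wait_desc],       -- why the log currently can't clear, if anything
   d.[is_published],              -- transactional/snapshot replication publication
   d.[is_cdc_enabled],            -- change data capture
   dm.[mirroring_state_desc],     -- NULL when the database isn't mirrored
   (SELECT COUNT (*)
    FROM sys.databases AS s
    WHERE s.[source_database_id] = d.[database_id]) AS [snapshot_count]
FROM sys.databases AS d
LEFT JOIN sys.database_mirroring AS dm
   ON dm.[database_id] = d.[database_id]
WHERE d.[source_database_id] IS NULL   -- skip the database snapshots themselves
ORDER BY d.[name];
```

A non-zero snapshot count, a published or CDC-enabled database, or a mirroring state other than SYNCHRONIZED are all hints about where to look next.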

As you can see, there could be a lot of activity reading from your log as well as writing to it, which could cause an IO bottleneck. Make sure that the IO subsystem on which you place the log file (note: you don't get ANY performance benefit from having multiple log files) can handle the read and write workload the log demands. As a starting point, that means RAID 1 or RAID 10 with a bunch of spindles to spread the IOs out (note/warning/achtung: that's a big generalization - don't reply with a comment saying it's wrong because you've seen something different - different scenarios have different demands), and a proper RAID configuration (64k multiple for a stripe size, NTFS allocation unit size, volume partition alignment).
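If you want to see how much read and write activity your log files are actually getting, something like the following should give a starting point. It's a sketch using sys.dm_io_virtual_file_stats (2005 onwards); the numbers are cumulative since the instance last started, so to see current activity you'd need to take two samples and compare them.

```sql
-- Sketch: cumulative read/write activity on every transaction log file
SELECT
   DB_NAME (vfs.[database_id]) AS [database_name],
   mf.[name] AS [logical_file_name],
   vfs.[num_of_reads],
   vfs.[num_of_bytes_read],
   vfs.[io_stall_read_ms],
   vfs.[num_of_writes],
   vfs.[num_of_bytes_written],
   vfs.[io_stall_write_ms]
FROM sys.dm_io_virtual_file_stats (NULL, NULL) AS vfs
JOIN sys.master_files AS mf
   ON mf.[database_id] = vfs.[database_id]
   AND mf.[file_id] = vfs.[file_id]
WHERE mf.[type_desc] = N'LOG'
ORDER BY vfs.[num_of_bytes_read] DESC;
```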

The November 2009 edition of TechNet Magazine is up on the web and contains my latest feature article, the second in a 3-part series on backups/restores/repairs.

In this article I explain all about using the RESTORE command. Not much point having backups if you don't know how to use them! Topics include:

  • The four phases of a restore operation - how they work and how you can speed them up
  • Figuring out what you need to restore when a disaster happens
  • Figuring out what you're actually able to restore
  • Using WITH RECOVERY / NORECOVERY / STANDBY
  • Point-in-time restore operations
  • Considerations when restoring to a different location

There seem to be a few formatting issues when the magazine's web folks put the article up - I've notified them and hopefully they'll fix them soon.

You can get to the article at: http://technet.microsoft.com/en-us/magazine/ee677581.aspx

Enjoy!

One of the perennial problems facing both experienced and involuntary DBAs is how to tell whether a database is really in the FULL recovery model or not. This is complicated by the fact that when you switch a database into the FULL recovery mode, it actually behaves as if it's in the SIMPLE recovery mode until the log backup chain is established (this is commonly called being in 'pseudo-SIMPLE').

It's a problem for several reasons:

1) if the database is really in the FULL recovery model then log backups must be taken so the log can clear/truncate properly and it doesn't grow out of control

2) if the database is in the FULL recovery model but the log backup chain has been broken (or not established at all since the database was created) then log backups are not possible (except for the yuckiness in SQL 2000 when log backups would succeed without complaint but be totally useless during disaster recovery)

I don't know of any script to easily determine whether a database is really in the FULL recovery mode, so I knocked one together - and I present it here for you to use.

The trick to the script is finding the last LSN that's been backed up for the database. If this is non-NULL, then a log backup chain exists and the database is really in the FULL recovery mode. This is stored in the dbi_dbbackupLSN field in the database boot page (see Search Engine Q&A #20: Boot pages, and boot page corruption) but also nicely available in the DMV sys.database_recovery_status.

I've tested this on 2005 SP3 and 2008 SP1.

Note that this doesn't work on SQL 2000 - I've poked around and can't find a way to get at the LSN without reading the boot page directly, which can't be done gracefully inside a function - I'll leave that as an exercise for you. You'd expect the IsTruncLog property returned by DATABASEPROPERTY to be correct when the database is in pseudo-SIMPLE, but unfortunately it's not.
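If you just want a quick look across every database without creating anything, the same check can be sketched as a single ad hoc query. This isn't the script itself, just the same idea inlined, and the ReallyInFULL column name is only chosen to match the test output below.

```sql
-- Sketch: the pseudo-SIMPLE check as a single query (2005 onwards)
SELECT
   d.[name],
   d.[recovery_model_desc],
   drs.[last_log_backup_lsn],
   CASE
      WHEN d.[recovery_model] = 1              -- 1 = FULL
         AND drs.[last_log_backup_lsn] IS NOT NULL
      THEN 1
      ELSE 0
   END AS [ReallyInFULL]
FROM sys.databases AS d
JOIN sys.database_recovery_status AS drs
   ON drs.[database_id] = d.[database_id]
ORDER BY d.[name];
```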

Here are some test cases for the script:

CREATE DATABASE SimpleModeDB;
CREATE DATABASE BulkLoggedModeDB;
CREATE DATABASE FullModeDB;
GO

ALTER DATABASE SimpleModeDB SET RECOVERY SIMPLE;
ALTER DATABASE BulkLoggedModeDB SET RECOVERY BULK_LOGGED;
ALTER DATABASE FullModeDB SET RECOVERY FULL;
GO

SELECT [Name], msdb.dbo.SQLskillsIsReallyInFullRecovery ([Name]) AS 'ReallyInFULL'
FROM sys.databases
WHERE [Name] LIKE '%ModeDB';
GO

Name              ReallyInFULL
----------------- -------------
SimpleModeDB      0
BulkLoggedModeDB  0
FullModeDB        0

This makes sense - the new FullModeDB database is still in pseudo-SIMPLE. Now what if we take a full database backup?

BACKUP DATABASE FullModeDB TO DISK='C:\SQLskills\FullModeDB.bck' WITH INIT;
GO
SELECT msdb.dbo.SQLskillsIsReallyInFullRecovery ('FullModeDB') AS 'ReallyInFULL';
GO

Processed 152 pages for database 'FullModeDB', file 'FullModeDB' on file 1.
Processed 1 pages for database 'FullModeDB', file 'FullModeDB_log' on file 1.
BACKUP DATABASE successfully processed 153 pages in 0.230 seconds (5.449 MB/sec).

ReallyInFULL
------------
1

Perfect. Now what about switching it back to SIMPLE and back to FULL again?

ALTER DATABASE FullModeDB SET RECOVERY SIMPLE;
ALTER DATABASE FullModeDB SET RECOVERY FULL;
GO
SELECT msdb.dbo.SQLskillsIsReallyInFullRecovery ('FullModeDB') AS 'ReallyInFULL';
GO

ReallyInFULL
------------
0

Just as we expect - the log backup chain has been broken and the database is back to pseudo-SIMPLE again.

Now what if we restart the log backup chain using a full database backup?

BACKUP DATABASE FullModeDB TO DISK='C:\SQLskills\FullModeDB.bck' WITH INIT;
GO
SELECT msdb.dbo.SQLskillsIsReallyInFullRecovery ('FullModeDB') AS 'ReallyInFULL';
GO

Processed 152 pages for database 'FullModeDB', file 'FullModeDB' on file 1.
Processed 1 pages for database 'FullModeDB', file 'FullModeDB_log' on file 1.
BACKUP DATABASE successfully processed 153 pages in 0.095 seconds (13.193 MB/sec).

ReallyInFULL
------------
1

Perfect. Now what about if we break the chain and try to restart it using a differential database backup?

ALTER DATABASE FullModeDB SET RECOVERY SIMPLE;
ALTER DATABASE FullModeDB SET RECOVERY FULL;
GO
BACKUP DATABASE FullModeDB TO DISK='C:\SQLskills\FullModeDB_diff.bck' WITH INIT, DIFFERENTIAL;
GO
SELECT msdb.dbo.SQLskillsIsReallyInFullRecovery ('FullModeDB') AS 'ReallyInFULL';
GO

Processed 40 pages for database 'FullModeDB', file 'FullModeDB' on file 1.
Processed 1 pages for database 'FullModeDB', file 'FullModeDB_log' on file 1.
BACKUP DATABASE WITH DIFFERENTIAL successfully processed 41 pages in 0.080 seconds (4.192 MB/sec).

ReallyInFULL
------------
1

Perfect - that works too, as I'd expect. You may wonder why a differential backup works - either a full or differential backup will work as they bridge the LSN gap since the last full or differential backup before the log backup chain was broken - both of these backups include transaction log - see More on how much transaction log a full backup includes.
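If you'd like to see the LSNs involved for yourself, the backup history in msdb records them for every backup. The query below is a sketch that assumes the FullModeDB test database from above and that backup history hasn't been purged from msdb.

```sql
-- Sketch: LSN ranges of the backups taken of the test database
SELECT
   bs.[backup_finish_date],
   CASE bs.[type]
      WHEN 'D' THEN 'Full'
      WHEN 'I' THEN 'Differential'
      WHEN 'L' THEN 'Log'
      ELSE bs.[type]
   END AS [backup_type],
   bs.[first_lsn],
   bs.[last_lsn],
   bs.[database_backup_lsn]   -- LSN of the full backup this backup is based on
FROM msdb.dbo.backupset AS bs
WHERE bs.[database_name] = N'FullModeDB'
ORDER BY bs.[backup_finish_date];
```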

And here's the script itself - enjoy!

/*============================================================================
   File: SQLskillsIsReallyInFullRecovery.sql

   Summary: This script creates a function in msdb that returns a BIT value of
   1 if the supplied database is really in the FULL recovery mode and not still
   in pseudo-SIMPLE.

   Date: October 2009

   SQL Server Versions:
      10.0.2531.00 (SS2008 SP1)
      9.00.4035.00 (SS2005 SP3)
------------------------------------------------------------------------------
   Copyright (C) 2009 Paul S. Randal, SQLskills.com
   All rights reserved.

   For more scripts and sample code, check out
      http://www.sqlskills.com/

   You may alter this code for your own *non-commercial* purposes. You may
   republish altered code as long as you give due credit.

   THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF
   ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED
   TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
   PARTICULAR PURPOSE.
============================================================================*/

USE [msdb];
GO

IF EXISTS (SELECT * FROM sys.objects WHERE NAME = 'SQLskillsIsReallyInFullRecovery')
   DROP FUNCTION SQLskillsIsReallyInFullRecovery;
GO
 
CREATE FUNCTION SQLskillsIsReallyInFullRecovery (
   @DBName sysname)
RETURNS BIT
AS
BEGIN
  
   DECLARE @IsReallyFull  BIT;
   DECLARE @LastLogBackupLSN NUMERIC (25,0);
   DECLARE @RecoveryModel  TINYINT;

   SELECT @LastLogBackupLSN = [last_log_backup_lsn]
   FROM sys.database_recovery_status
   WHERE [database_id] = DB_ID (@DBName);

   SELECT @RecoveryModel = [recovery_model]
   FROM sys.databases
   WHERE [database_id] = DB_ID (@DBName);

   IF (@RecoveryModel = 1 AND @LastLogBackupLSN IS NOT NULL)
      SELECT @IsReallyFull = 1
   ELSE
      SELECT @IsReallyFull = 0;

   RETURN (@IsReallyFull);
END;
GO
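One way you might use the function once it's created: list the databases that claim to be in FULL but are still in pseudo-SIMPLE, i.e. the ones that need a full or differential backup before log backups will work. This is just a usage sketch built on the function above.

```sql
-- Sketch: databases set to FULL that have no log backup chain yet
SELECT [name]
FROM sys.databases
WHERE [recovery_model_desc] = N'FULL'
   AND msdb.dbo.SQLskillsIsReallyInFullRecovery ([name]) = 0
ORDER BY [name];
```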

The 35-page whitepaper on high availability I wrote for the SQL team over the summer has been published on MSDN. It's a 2-300 level whitepaper that describes the various high-availability technologies in SQL Server 2008 and how they can be used to mitigate disasters. It's chock-full of links to other whitepapers, technical articles and Books Online sections and also presents my methodology for planning a high-availability strategy.

You can get it at http://msdn.microsoft.com/en-us/library/ee523927.aspx.

Enjoy!

Here's the table of contents:

  • Introduction
  • Causes of Downtime and Data Loss
    • Planned Downtime
    • Unplanned Downtime and Data Loss
  • Planning a High-Availability Strategy
    • Requirements
    • Limitations
    • Technology Evaluation
  • SQL Server 2008 High-Availability Technologies
    • Logging and Recovery
    • Backup, Restore, and Related Technologies
      • Partial Database Availability and Online Piecemeal Restore
      • Instant File Initialization
      • Mirrored Backups
      • Backup Checksums
      • Backup Compression
    • Other Single-Instance Technologies
      • Online Operations
      • Database Snapshots
      • Hot-Add Memory and CPU
      • Resource Governor
    • Multi-Instance Technologies
      • Log Shipping
      • Transactional Replication
      • Database Mirroring
      • Failover Clustering
      • Combining Multi-Instance Technologies
      • Virtualization
  • Mitigating the Causes of Downtime and Data Loss
  • High-Availability Features Supported by SQL Server 2008 Editions
  • Conclusion

The July 2009 edition of TechNet Magazine is available on the web and includes a feature article I wrote explaining backups. It's the first in a 3-part series, with parts 2 and 3 being on recovering from disasters using backups and recovering from disasters without backups.

The backup article covers:

  • Full backups
  • Differential backups
  • Log backups
  • Putting together a backup strategy
  • Backup integrity

It debunks a lot of myths about backups, explains in detail how each backup works, and explains the pros and cons of some different backup strategies.

Check it out at Understanding SQL Server Backups - enjoy!

PS I'll also be presenting a session on backup strategies at SQL Connections in November and covering them in my post-con workshop at PASS in November the week before.

PPS I've just signed a contract with TechNet Magazine so that there will be something on SQL Server in every issue - either my SQL Q&A column or a feature article. Very cool!

Chad Miller over on SQLServerCentral has put together a really great collection of scripts that allow you to easily monitor and report on what backups are happening on your system. These should be especially useful for involuntary DBAs, rather than having to dig into the backup history tables themselves. I'll be including a link to his post and downloadable scripts in all my DBA classes.

Check out Chad's article and scripts at Backup Monitoring and Reporting.

There's still a widely held misconception that when a database is properly in the FULL or BULK_LOGGED recovery model, full or differential backups can truncate the log. No. It *NEVER* happens. This is one of the reasons why I'm doing a whole spotlight session on this at PASS this year - the transaction log and its behavior are IMHO one of the most misunderstood parts of SQL Server.

Notice that I said 'when properly in the FULL or BULK_LOGGED recovery models'. If you switch recovery models to FULL or BULK_LOGGED, until you take the first full backup, you are still essentially in the SIMPLE recovery model, and so the log will truncate on checkpoint. Once you take that first full backup, you are then in I-will-manage-the-size-of-the-log-through-log-backups mode. After that, the ONLY thing that will allow the log to clear/truncate is a log backup, as long as nothing else requires those transaction log records.
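A simple way to watch this happening is to look at how full the log is, and which VLFs are active, before and after a backup. The commands below are a sketch: DBCC LOGINFO is undocumented, runs against whatever database you're currently in, and its output format may differ between versions.

```sql
-- Sketch: how full is each database's log right now?
DBCC SQLPERF (LOGSPACE);
GO

-- One row per VLF in the current database's log.
-- A Status of 2 means the VLF is active; 0 means it can be reused.
DBCC LOGINFO;
GO
```

Run them, take a full backup, and run them again: in a database that's properly in FULL or BULK_LOGGED, the active VLFs should only change status after a log backup.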

If you're not familiar with the term 'log clearing' or 'log truncating', they mean exactly the same thing - part of the transaction log is marked as no longer needed and can be overwritten. Nothing is zeroed out, the log file size is not altered. Some background reading on this:

Earlier this week I was involved in a discussion about log backup size management and how to prevent a log backup that follows a maintenance operation from containing details of the maintenance operation.

There's a very simple answer: you can't.

If you do an operation in the FULL or BULK_LOGGED recovery models, the next log backup will contain all information required to replay that operation. In the FULL recovery model, everything is fully logged, so the log backup will contain all the log records generated by the operation. In the BULK_LOGGED recovery model, you may perform a minimally-logged operation, which generates hardly any transaction log, but the next log *backup* will be about the same size as if the operation was fully logged - because the log backup will pick up all the data extents modified by the minimally-logged operation.

One point in the discussion was that if you're running in the FULL or BULK_LOGGED recovery models, and you do a full backup after the maintenance operation, and before the log backup, the full backup will contain all the changes made by the maintenance operation, yes, and will clear the log.

No. Never.

A log backup is *ALL* the log generated since the last log backup. If this were not the case, how would log shipping work? You could take a full backup on the log shipping primary and suddenly you've broken the log backup chain and log shipping breaks. No, this is not how things work. A full backup contains only enough transaction log necessary to be able to restore that database to a transactionally consistent time - the time at which the data reading portion of the full backup completed. I blogged about this extensively previously:

But you don't have to believe me - it's very simple to convince yourself. The following script will show you that a full backup has no effect on the transaction log. It does the following:

  • Create a database and put it into the FULL recovery model, with a full backup.
  • Create and populate an index.
  • Take log backup 1 (just to clear things out)
  • Rebuild the index.
  • Take log backup 2.
  • Rebuild the index.
  • Take a full backup.
  • Take log backup 3.

And we will see that log backup #3 is the same size as log backup #2. The full backup will make no difference whatsoever.

Here's the script:

USE master;
GO
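-- Only drop the database if it's left over from a previous run
IF DB_ID (N'LogBackupTest') IS NOT NULL
   DROP DATABASE LogBackupTest;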
GO
CREATE DATABASE LogBackupTest;
GO
USE LogBackupTest;
GO

ALTER DATABASE LogBackupTest SET RECOVERY FULL;
GO
BACKUP DATABASE LogBackupTest TO
DISK = 'C:\SQLskills\LogBackupTest_Full1.bak' WITH INIT;
GO

CREATE TABLE t1 (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
GO
CREATE CLUSTERED INDEX t1c1 ON t1 (c1);
GO
SET NOCOUNT ON;
GO
INSERT INTO t1 DEFAULT VALUES;
GO 1000

BACKUP LOG LogBackupTest TO
DISK = 'C:\SQLskills\LogBackupTest_Log1.bak' WITH INIT;
GO

-- Rebuild the index to generate some log and get a baseline
ALTER INDEX t1c1 ON t1 REBUILD;
GO
BACKUP LOG LogBackupTest TO
DISK = 'C:\SQLskills\LogBackupTest_Log2.bak' WITH INIT;
GO

-- Now do it again, but take a full backup before the log backup
ALTER INDEX t1c1 ON t1 REBUILD;
GO
BACKUP DATABASE LogBackupTest TO
DISK = 'C:\SQLskills\LogBackupTest_Full2.bak' WITH INIT;
GO
BACKUP LOG LogBackupTest TO
DISK = 'C:\SQLskills\LogBackupTest_Log3.bak' WITH INIT;
GO

And here's the result:

 

Log backup #3 is the same size as log backup #2. It contains all the log generated since log backup #2 was taken. The full backup had no effect whatsoever, because that would break the log backup chain.

If you don't believe me, run the script yourself and you'll see. A full backup does not and cannot affect the transaction log.
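You can extend the same test to check the earlier point about the BULK_LOGGED recovery model, where a minimally-logged rebuild should still produce a log backup of about the same size. The following is a sketch that assumes the script above has just been run; the LogBackupTest_Log4.bak file name is my own addition, and the final query relies on the backup history still being in msdb.

```sql
USE LogBackupTest;
GO

-- Switching from FULL to BULK_LOGGED does not break the log backup chain
ALTER DATABASE LogBackupTest SET RECOVERY BULK_LOGGED;
GO

-- The rebuild can now be minimally logged
ALTER INDEX t1c1 ON t1 REBUILD;
GO

-- Hypothetical file name, following the naming used in the script above
BACKUP LOG LogBackupTest TO
DISK = 'C:\SQLskills\LogBackupTest_Log4.bak' WITH INIT;
GO

ALTER DATABASE LogBackupTest SET RECOVERY FULL;
GO

-- Compare the sizes (in bytes) of all the backups taken during the test
SELECT
   bmf.[physical_device_name],
   bs.[type],                 -- D = full, L = log
   bs.[backup_size],
   bs.[backup_finish_date]
FROM msdb.dbo.backupset AS bs
JOIN msdb.dbo.backupmediafamily AS bmf
   ON bmf.[media_set_id] = bs.[media_set_id]
WHERE bs.[database_name] = N'LogBackupTest'
ORDER BY bs.[backup_finish_date];
GO
```

If you've run the test more than once, the history will contain rows from the earlier runs too, so go by the most recent finish dates.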

About a year ago, I blogged a cool script that would work out how much of a database has changed since the last full backup - i.e. how big will the next differential backup be. You can find that script at New script: How much of the database has changed since the last full backup?. I'm in the middle of writing a script that will tell you how big your next log backup will be, and as part of it, I need a way to determine how many extents from minimally logged operations will be included in the next log backup. I thought it might be useful to someone so I tidied it up and here it is for you to play with. It's almost the same as the previous script I published.

The code below creates a function and a stored procedure. The basic idea behind the code is as follows:

For each online data file in the database
   For each GAM interval in the file
      Crack the ML map page using DBCC PAGE
      Interpret the ML bitmap to aggregate the changed extents
      Add the sum to the total changed extents for the database
   End
End
Report results
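To get a feel for what the "crack the ML map page" step is working with, you can dump one of these pages by hand. This sketch looks at page 7 of file 1, which is the ML map page for the first GAM interval, using msdb as an example database. DBCC PAGE is undocumented and needs sysadmin rights.

```sql
-- Sketch: dump the first ML map page (file 1, page 7) of msdb as a result set.
-- The columns returned are ParentObject, Object, Field, and VALUE.
DBCC PAGE ('msdb', 1, 7, 3) WITH TABLERESULTS, NO_INFOMSGS;
GO
```

The rows of interest are the ones where ParentObject starts with ML_MAP: each Field holds an extent or extent range and VALUE says whether it's been changed by a minimally-logged operation.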

There's a function that I create in msdb called SQLskillsConvertToExtents that cracks some of the DBCC PAGE output, and the main procedure is called sp_SQLskillsMLChangedData and is created as a system object in master. I tried making it a table-valued function but you can't do things like INSERT-EXEC in a function, and that's required for processing the DBCC PAGE output. So - create your own wrapper function or whatever to use it. The interface/output is:

EXEC sp_SQLskillsMLChangedData 'testdb';
GO

Total Extents Changed Extents Percentage Changed
------------- --------------- ----------------------
355           153             43.1

Anyway - here it is. You can download it in a zip file from SQLskillsMLChangedData.zip (2.50KB). Enjoy!

/*============================================================================
   File: SQLskillsMLChangedData.sql

   Summary: This script creates a system-wide SP sp_SQLskillsMLChangedData that
   works out what percentage of a database has been changed by minimally
   logged operations since the last log backup.

   Date: May 2009

   SQL Server Versions:
      10.0.2531.00 (SS2008 SP1)
      9.00.4035.00 (SS2005 SP3)
------------------------------------------------------------------------------
   Copyright (C) 2009 Paul S. Randal, SQLskills.com
   All rights reserved.

   For more scripts and sample code, check out
      http://www.sqlskills.com/

   You may alter this code for your own *non-commercial* purposes. You may
   republish altered code as long as you give due credit.

   THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF
   ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED
   TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
   PARTICULAR PURPOSE.
============================================================================*/

-- Create the function in MSDB
--
USE msdb;
GO

IF EXISTS (SELECT * FROM sys.objects WHERE NAME = 'SQLskillsConvertToExtents')
   DROP FUNCTION SQLskillsConvertToExtents;
GO

-- This function cracks the output from a DBCC PAGE dump
-- of an allocation bitmap. It takes a string in the form
-- "(1:8) - (1:16)" or "(1:8) -" and returns the number
-- of extents represented by the string. The range form is
-- inclusive of both ends, so the first example returns 2
-- extents and the second returns 1.
--
CREATE FUNCTION SQLskillsConvertToExtents (
   @extents VARCHAR (100))
RETURNS INTEGER
AS
BEGIN
   DECLARE @extentTotal INT;
   DECLARE @colon INT;
   DECLARE @firstExtent INT;
   DECLARE @secondExtent INT;
   SET @extentTotal = 0;
   SET @colon = CHARINDEX (':', @extents);

   -- Check for the single extent case
   --
   IF (CHARINDEX (':', @extents, @colon + 1) = 0)
      SET @extentTotal = 1;
   ELSE
      -- We're in the multi-extent case
      --
      BEGIN
      SET @firstExtent = CONVERT (INT,
         SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
      SET @colon = CHARINDEX (':', @extents, @colon + 1);
      SET @secondExtent = CONVERT (INT,
         SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
      SET @extentTotal = (@secondExtent - @firstExtent) / 8 + 1;
   END

RETURN @extentTotal;
END;
GO
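A quick way to sanity-check the function after creating it is to feed it a few strings by hand. The inputs below are made-up examples in the DBCC PAGE format rather than real page output; the range form counts both the first and last extent, so the arithmetic is (last - first) / 8 + 1.

```sql
-- Sketch: hand-built inputs in the DBCC PAGE extent format
SELECT
   msdb.dbo.SQLskillsConvertToExtents ('(1:8) -') AS [SingleExtent],            -- 1
   msdb.dbo.SQLskillsConvertToExtents ('(1:8) - (1:16)') AS [AdjacentExtents],  -- (16 - 8) / 8 + 1 = 2
   msdb.dbo.SQLskillsConvertToExtents ('(1:0) - (1:24)') AS [FourExtents];      -- (24 - 0) / 8 + 1 = 4
GO
```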

USE master;
GO

IF OBJECT_ID ('sp_SQLskillsMLChangedData') IS NOT NULL
   DROP PROCEDURE sp_SQLskillsMLChangedData;
GO

-- This SP cracks all minimally-logged bitmap pages for all online
-- data files in a database. It creates a sum of changed extents
-- and reports it as follows (example small msdb):
--
-- EXEC sp_SQLskillsMLChangedData 'msdb';
-- GO
--
-- Total Extents Changed Extents Percentage Changed
-- ------------- --------------- ----------------------
-- 355           153             43.1
--
CREATE PROCEDURE sp_SQLskillsMLChangedData (
   @dbName VARCHAR (128))
AS
BEGIN
   SET NOCOUNT ON;

   -- Create the temp table
   --
   IF EXISTS (SELECT * FROM msdb.sys.objects WHERE NAME = 'SQLskillsDBCCPage')
      DROP TABLE msdb.dbo.SQLskillsDBCCPage;

   CREATE TABLE msdb.dbo.SQLskillsDBCCPage (
      [ParentObject] VARCHAR (100),
      [Object] VARCHAR (100),
      [Field] VARCHAR (100),
      [VALUE] VARCHAR (100));

   DECLARE @fileID INT;
   DECLARE @fileSizePages INT;
   DECLARE @extentID INT;
   DECLARE @pageID INT;
   DECLARE @MLTotal INT;
   DECLARE @sizeTotal INT;
   DECLARE @total INT;
   DECLARE @dbccPageString VARCHAR (200);

   SELECT @MLTotal = 0;
   SELECT @sizeTotal = 0;

   -- Setup a cursor for all online data files in the database
   --
   DECLARE files CURSOR FOR
      SELECT [file_id], [size] FROM master.sys.master_files
      WHERE [type_desc] = 'ROWS'
      AND [state_desc] = 'ONLINE'
      AND [database_id] = DB_ID (@dbName);

   OPEN files;

   FETCH NEXT FROM files INTO @fileID, @fileSizePages;

   WHILE @@FETCH_STATUS = 0
   BEGIN
      SELECT @extentID = 0;

      -- The size returned from master.sys.master_files is in
      -- pages - we need to convert to extents
      --
      SELECT @sizeTotal = @sizeTotal + @fileSizePages / 8;

      WHILE (@extentID < @fileSizePages)
      BEGIN
         -- There may be an issue with the ML map page position
         -- on the four extents where PFS pages and GAM pages live
         -- (at page IDs 516855552, 1033711104, 1550566656, 2067422208)
         -- but I think we'll be ok.
         -- PFS pages are every 8088 pages (page 1, 8088, 16176, etc)
         -- GAM extents are every 511232 pages
         --
         SELECT @pageID = @extentID + 7;

         -- Build the dynamic SQL
         --
         SELECT @dbccPageString = 'DBCC PAGE ('
            + @dbName + ', '
            + CAST (@fileID AS VARCHAR) + ', '
            + CAST (@pageID AS VARCHAR) + ', 3) WITH TABLERESULTS, NO_INFOMSGS';

         -- Empty out the temp table and insert into it again
         --
         DELETE FROM msdb.dbo.SQLskillsDBCCPage;
         INSERT INTO msdb.dbo.SQLskillsDBCCPage EXEC (@dbccPageString);

         -- Aggregate all the changed extents using the function
         --
         SELECT @total = SUM ([msdb].[dbo].[SQLskillsConvertToExtents] ([Field]))
         FROM msdb.dbo.SQLskillsDBCCPage
            WHERE [VALUE] = ' MIN_LOGGED'
            AND [ParentObject] LIKE 'ML_MAP%';

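         -- SUM returns NULL when no extents are marked MIN_LOGGED in this
         -- GAM interval, so treat NULL as zero to avoid nulling the total
         SET @MLTotal = @MLTotal + ISNULL (@total, 0);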

         -- Move to the next GAM extent
         SET @extentID = @extentID + 511232;
      END

      FETCH NEXT FROM files INTO @fileID, @fileSizePages;
   END;

   -- Clean up
   --
   DROP TABLE msdb.dbo.SQLskillsDBCCPage;
   CLOSE files;
   DEALLOCATE files;

   -- Output the results
   --
   SELECT
      @sizeTotal AS [Total Extents],
      @MLTotal AS [Changed Extents],
      ROUND (
         (CONVERT (FLOAT, @MLTotal) /
         CONVERT (FLOAT, @sizeTotal)) * 100, 2) AS [Percentage Changed];
END;
GO

-- Mark the SP as a system object
--
EXEC sys.sp_MS_marksystemobject sp_SQLskillsMLChangedData;
GO

-- Test to make sure everything was setup correctly
--
EXEC sp_SQLskillsMLChangedData 'msdb';
GO

This week's survey is inspired by many stories I saw on the forums and Twitter this week - mostly bad, one good (someone I'm following is spending the weekend testing their disaster recovery plan - cool!). I'd like to know whether you've ever tested your disaster recovery plan, and if so, what happened? I'll report on the results sometime over the weekend of 5/24/09.

And what do I mean by disaster recovery? The definition varies from just restoring a backup to conducting a full failover of a datacenter - but I'd like you to decide for yourself. I'll editorialize more about this next week.

As always, a big Thanks! for contributing to the blog by responding. Please shoot me an email (Contact button, bottom left of the blog) if you have an idea for a good survey.

PS No comments on this post please (I'll delete them straight away) - please wait for the survey results post to avoid skewing the answers.

PPS Probably a lot less blogging/Twitter (@PaulRandal) this coming week - I'm teaching all 5 days.

Last week's survey was on what kind of backups you take, along with the recovery model used (see here for the survey). Here are the results as of 5/2/2009.

 

The 'other' responses were combinations of the other answers.

This survey is a bit of a precursor to my article on Understanding SQL Server Backups that will be in the July TechNet Magazine (available start of June). In that article I explain how the three major backup types work, and then how to combine them into an effective backup strategy, so I don't want to steal all my own thunder. In the spirit of my weekly surveys, this is a 20-minute, brain-dump editorial, rather than a very carefully planned out article.

When I'm teaching classes on Database Maintenance or High-Availability, I always tell people not to plan a backup strategy; plan a restore strategy. You don't want to come up against a disaster recovery situation and find that the only backup you have of your multi-TB, 24x7, busy OLTP database is a full from several weeks ago. Kiss your job goodbye on that one. You have to make sure that you have backups that allow you to recover as quickly as possible and with the minimum of data loss.

So, you're probably taking backups so that you can restore when disaster strikes. Some of you may also be taking backups because:

  1. You periodically restore the database onto a reporting server, onto a data warehouse server, or onto test/dev servers
  2. You've got log shipping implemented (backup, copy, restore; backup, copy, restore;...)
  3. You've got database mirroring implemented (and so you must use the FULL recovery model, and thus must take log backups to manage the transaction log)
  4. You're in the FULL recovery model, even though you're not interested in up-to-the-minute or point-in-time recovery, and must take backups to manage the size of the transaction log

The survey included what recovery model you're running in too - basically SIMPLE or FULL. I don't know anyone that runs all the time in the BULK_LOGGED recovery model; and most 24x7 systems cannot switch into BULK_LOGGED because of the possibility of crashing and not being able to take a tail-of-the-log backup, if a minimally-logged operation has occurred since the last log backup.

In the SIMPLE recovery model, you cannot take log backups - so you're basically saying that you don't need to be able to do up-to-the-minute recovery, point-in-time recovery, or single-page restores. That's cool - it totally depends on your situation, and if you're running in SIMPLE it means you understand that there's no point running in FULL and having to take log backups if you're not interested in using them. If you're in the group covered by #4 above, switch to SIMPLE!
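
If you're not sure which recovery model your databases are in, here's a minimal sketch of how to check and how to switch. The database name SalesDB is made up for illustration - substitute your own.

```sql
-- Which recovery model is each database using?
SELECT [name], [recovery_model_desc]
FROM sys.databases
ORDER BY [name];
GO

-- Switch a (hypothetical) database to SIMPLE so that
-- checkpoints can clear the log without log backups
ALTER DATABASE [SalesDB] SET RECOVERY SIMPLE;
GO
```

Remember that switching out of FULL breaks the log backup chain, so only do it if you really don't want log backups.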

In the FULL recovery model, you have to take log backups - plain and simple - otherwise the log will grow out of control and eventually your database will grind to a halt when the log runs out of space. I've done a few blog posts about that so I won't labor the point (see Search Engine Q&A #1: Running out of transaction log space, Search Engine Q&A #23: My transaction log is full - now what?, and Importance of proper transaction log size management).

In either recovery model, there's the question of just how often the various kinds of (what I like to think of as) 'mandatory' backups should be taken, and whether to use the 'optional' differential backups. Again, not stealing my own thunder from TechNet Magazine: a full backup is a copy of all the data in the database; a log backup is all the transaction log generated since the last log backup (or first full/differential following a break in the log backup chain); a differential backup is all data that has changed since the last *full* backup.
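
As a rough sketch, the three backup types look like this in T-SQL. The database name SalesDB and the D:\Backups paths are placeholders I've made up, and the WITH options are just one reasonable choice rather than a recommendation for every system.

```sql
-- Full backup: a copy of all the data in the database
BACKUP DATABASE [SalesDB]
   TO DISK = N'D:\Backups\SalesDB_Full.bak'
   WITH INIT, CHECKSUM;
GO

-- Differential backup: all data changed since the last full backup
BACKUP DATABASE [SalesDB]
   TO DISK = N'D:\Backups\SalesDB_Diff.bak'
   WITH DIFFERENTIAL, INIT, CHECKSUM;
GO

-- Log backup: all log generated since the last log backup
-- (not possible in the SIMPLE recovery model)
BACKUP LOG [SalesDB]
   TO DISK = N'D:\Backups\SalesDB_Log_1.trn'
   WITH INIT, CHECKSUM;
GO
```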

You need to take regular full backups - but just how regular depends. If the database is very large, and you need to keep your backups around for regulatory purposes, you might choose to take a full backup every couple of weeks or a month, with compression. Commonly I see people taking a full backup once a week.

Almost 20% of respondents are in SIMPLE and only take full backups - I wonder how many realize that a full backup only gives you a single point-in-time to which you can recover - and you lose all work since the last full backup. In the SIMPLE recovery model, you can't take log backups, so if you want to reduce the amount of data loss when a disaster occurs, you'll need to take differentials too, which only a handful of respondents do. Although this still means you'll have some data loss, it's vastly reduced. The amount of potential data loss will be the amount of work since your last (e.g. daily) differential backup, rather than since your last (e.g. weekly) full backup.

In the FULL recovery model, only a few % of respondents are NOT taking log backups - which means they shouldn't be in the FULL recovery model, or are being forced to (e.g. from using mirroring) and don't realize that having a redundant copy of the database isn't a good enough HA strategy - you have to have backups too in case of a secondary failure.

The vast majority of respondents use the FULL recovery model: 45% take full and log backups (which is what I expected) and about 25% also take differentials. This is a more advanced strategy and can seriously REDUCE your downtime in the event of a disaster. A differential backup basically short-circuits the need to restore all the log backups that were taken in the time between the last full backup and that differential backup. You'll find out more on this in the article. For the ultimate in flexibility and fast recovery, this is the way to go - but at the cost of a little more complicated backup strategy, and extra storage space for the differential backups.
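
To make the short-circuit concrete, here's a sketch of a restore sequence that uses a differential. The database name, file names, and paths are all hypothetical, and I'm assuming SalesDB_Log_9.trn is the first log backup taken after the differential.

```sql
-- 1. Restore the most recent full backup, but don't recover yet
RESTORE DATABASE [SalesDB]
   FROM DISK = N'D:\Backups\SalesDB_Full.bak'
   WITH NORECOVERY;

-- 2. Restore the most recent differential - this replaces restoring
--    every log backup taken between the full and the differential
RESTORE DATABASE [SalesDB]
   FROM DISK = N'D:\Backups\SalesDB_Diff.bak'
   WITH NORECOVERY;

-- 3. Restore only the log backups taken after the differential
RESTORE LOG [SalesDB]
   FROM DISK = N'D:\Backups\SalesDB_Log_9.trn'
   WITH NORECOVERY;

-- 4. Run recovery and bring the database online
RESTORE DATABASE [SalesDB] WITH RECOVERY;
```

Without step 2, you'd have to restore every log backup since the full backup, in order.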

Now, what about the exotic answers at the end of the survey?

  • OS-level backups: this isn't a popular solution *at all* because of the complexity of getting the SQL data back out from the OS-level backup and recovered. I'm not surprised that no-one selected this.
  • SAN-level backups: I've seen a few customers do this, with mixed results. You must make sure that the SAN admin knows what they're doing - I heard of one customer who had data and log files on different LUNs. The SAN admin snapped the data LUN, and *then* the log LUN. Every so often, corruption would occur...
  • Shutdown SQL and copy the files: Just don't even think about this. Taking downtime to take a backup is daft, and what if the database is corrupt and won't re-attach?
  • Don't take backups: No need to discuss this one.

Ok - that's a quick blast that should give you some idea of why you'd want to make sure you've got the *right* backups, not just any old backup strategy.

Next post - this week's survey!

(PS I'm really enjoying being on Twitter - lots of interesting stuff. See me at http://twitter.com/PaulRandal)

In this week's survey I'm interested in what kind of backups you take, and also what recovery model you mostly use. If you have multiple databases with different strategies, by all means respond multiple times. The more responses the better! I'll report on the survey results some time over the weekend of 5/2/2009.

A couple of notes on the survey:

  • If you're using BULK_LOGGED at all, just answer as FULL.
  • When I say full backup, I mean either full database, full filegroup, or full file backup. Same for differential.
  • I don't care whether you use a tool outside SQL Server to take your SQL backups (e.g. LiteSpeed) - it's the type of backup that's interesting.

Thanks!

PS No comments on this post please (I'll delete them straight away) - please wait for the survey results post to avoid skewing the answers.

A short post to start the day (in India) and then at lunchtime I'll do how it works: FILESTREAM garbage collection.

This is a question that came up recently on an MVP discussion list: why does a 200MB log backup take 40 minutes to restore on my log-shipping secondary?

First answer I thought of was that the I/O subsystem on the log-shipping secondary was very slow, or broken in some way, so I/Os were taking ages to complete. Nope - I/O subsystem was performing well.

Next thing I suggested was that the log-shipping secondary was restoring the log backups using WITH STANDBY and there was something like an index rebuild that was being rolled back. As an aside, when you restore log backups on the log-shipping secondary, you have a choice of how they recover: either WITH NORECOVERY or WITH STANDBY. The first option doesn't allow any access to the database, as the database is still "in recovery". The second option runs the REDO part of recovery, then runs the UNDO part of recovery, but saves the details of what it had to do for UNDO into a file (whose name and location you specify). It then allows read-only access to the database, for queries/reporting/whatever. Obviously if there's a lot of transaction log that has to be undone (i.e. rolling back transactions that weren't committed at the time the log backup completed), then this could take some time. But 40 minutes? No. That wasn't it.

I was partly right on my second guess. The *previous* log backup that was restored WITH STANDBY contained a long-running index operation, and so the undo file that the restore created was *huge*. The next log backup that's restored after a RESTORE ... WITH STANDBY, must first undo everything in the undo file (i.e. put the database back to the exact state it was in as if the WITH STANDBY part of the restore never happened) before it can restore more transaction log. In this case, no-one had noticed that the previous log restore *also* took a lot of time and created the huge undo file. It just looked like the 200MB log backup was causing the problem.
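
For reference, here's a sketch of the two restore options on a log-shipping secondary. The database name, backup file names, and undo file location are made up for illustration.

```sql
-- Option 1: no access to the database between restores
RESTORE LOG [SalesDB]
   FROM DISK = N'D:\LogShip\SalesDB_Log_1.trn'
   WITH NORECOVERY;

-- Option 2: read-only access between restores. The UNDO work is
-- saved in the undo file, and the NEXT restore has to reverse
-- everything in that file before it can restore any more log
RESTORE LOG [SalesDB]
   FROM DISK = N'D:\LogShip\SalesDB_Log_2.trn'
   WITH STANDBY = N'D:\LogShip\SalesDB_Undo.dat';
```

If a restore suddenly takes a long time, checking the size of the undo file left behind by the previous restore is a cheap first step.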

My advice: in any kind of there's-suddenly-a-corruption-or-performance-problem situation, don't just look at what's happening now. Always look at what happened leading up to the problem, as the answer is often there.

Last week's survey was on how *you* manage the size of your transaction log (see here for the survey). Here are the results as of 4/10/2009 - the most popular survey yet:

In my opinion (guarantee that I'm going to say stuff that some of you won't agree with!), there are only two valid answers in this survey: #1 - run in the FULL recovery model and take regular log backups, and #4 - run in the SIMPLE recovery model all the time. The last answer is applicable if you run out of log space even though you're in either of these situations but isn't a general strategy like #1 or #4. IMHO, you should be in one of these two situations and in the rest of this editorial I'll explain why. I'm not going to touch on *why* your transaction log might start to fill up, instead here are some links:

Now for the survey options:

  1. Take regular log backups. I'm very pleased to see the vast majority of respondents doing this, as it's the only valid log size management strategy when in the FULL recovery model (same thing applies to BULK_LOGGED of course, but with a few twists). Once you take that first database backup in the FULL recovery model, SQL Server assumes you're going to manage the transaction log size from that point on by taking log backups. Unfortunately that isn't documented in big, red flashing letters in Books Online - so people can get themselves into trouble inadvertently. Also, the FULL recovery model is the default, and is required for database mirroring - which further adds to the potential for people (such as involuntary DBAs) to accidentally switch into I-will-take-log-backups mode and then not take them. If you don't want to take log backups for recovery purposes, or you don't want to use database mirroring, don't use the FULL recovery model - it's as SIMPLE as that (ha ha). You might argue and say that you're only using FULL because of database mirroring, and don't want to take log backups. I'd argue back and say that if you care enough to have a hot standby of your database, you must also take backups - as you can't rely solely on a redundant copy of your database on a different server.
  2. Use BACKUP LOG WITH NO_LOG/TRUNCATE_ONLY regularly. These two options do basically the same thing - allow the log to be cleared without taking a log backup. What's the point if you're not taking log backups? - just switch to SIMPLE and let the checkpoints clear the log. In fact, in SQL Server 2008 these two options have been removed. See my blog post BACKUP LOG WITH NO_LOG - use, abuse, and undocumented trace flags to stop it.
  3. Use BACKUP LOG WITH NO_LOG/TRUNCATE_ONLY when it fills up. Same as above. You might argue that you're only keeping the log around in case there's a disaster, so that you can take a log backup at that point and use it to recover up to the point of the disaster. I'd argue that's broken on two counts: 1) what if the log file is damaged and you can't back it up? 2) that's *all* the transaction log since the last full database backup you took (if you break the log backup chain and then take a full database backup, that backup becomes the base of subsequent log backups) so that may take a long time to restore and replay...
  4. Run in the SIMPLE recovery model all the time. If you don't need to use FULL, don't. Running in SIMPLE is perfectly acceptable, as long as you don't mind losing work in the event of a disaster.
  5. Switch to SIMPLE when it fills up, then back to FULL. This is like #s 2 and 3 - what's the point?
  6. Switch to SIMPLE when it fills up, shrink the log, then switch back to FULL. This is worse than 2, 3, or 5. If you shrink the log, then it's going to grow again - possibly causing VLF fragmentation (see Transaction Log VLFs - too many or too few?), and definitely causing your workload to pause while the log grows, as the log can't use instant initialization (see Search Engine Q&A #24: Why can't the transaction log use instant initialization?).
  7. Shutdown SQL Server and delete the transaction log file(s). Just don't get me started on this one - I'm glad no-one 'fessed up to doing it. There are many reasons why this is daft, including: 1) you have to shutdown to do it, so your workload is off-line 2) if the database wasn't cleanly shut down, it won't be able to be started again without using EMERGENCY mode repair, and your data will be transactionally inconsistent 3) as the log can't be instant initialized, the database won't come online until the log has been created and zero'd. Just don't do this. Ever.
  8. Add more transaction log files. As a general strategy for managing the size of the log, this is not good. Extra log files have zero effect on performance (the myth that SQL Server writes in parallel to the log files is just that - a myth) and make management more tricky. However, if your log fills up for some out-of-the-ordinary reason, then adding another log file may be the only way to allow the database to keep running. If you have to do this though, make sure that you remove the extra log file(s) when you can to keep log management uncomplicated.

To summarize, if you want to be able to take log backups to aid in point-in-time or up-to-the-second recovery, use the FULL recovery model. If not, use SIMPLE and you won't need to mess around with the log when it fills up because you're not taking log backups.
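
If your log is growing and you're not sure why, here's a quick sketch of where I'd start looking. It just queries sys.databases and runs DBCC SQLPERF, so there's nothing to set up.

```sql
-- What's the recovery model, and what is preventing log clearing?
-- A log_reuse_wait_desc of LOG_BACKUP means the database is waiting
-- for a log backup that may never be coming
SELECT [name], [recovery_model_desc], [log_reuse_wait_desc]
FROM sys.databases
ORDER BY [name];
GO

-- How big is each log, and how much of it is in use?
DBCC SQLPERF (LOGSPACE);
GO
```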

Next post - this week's survey! (And thanks to all those who are responding to them!)

PS For those of you who sent me details about your databases from the survey back at the start of March (see here) - I haven't forgotten. I was waiting to get a decent sample size and now I'm going to go through the data. If you want to send me any more data, you've got until Sunday.

Just saw this on a forum - running REPAIR_ALLOW_DATA_LOSS on msdb to cope with corruption. Yes, this will work but it's certainly not safe and something I'd only recommend as a last resort.

Here's why: msdb stores your backup history and your Agent jobs. REPAIR_ALLOW_DATA_LOSS usually works by "delete what's broken and fix up all the links" (quoting myself). If you run repair on msdb, you need to pay close attention to what was deleted by repair, as it could lead to big problems later on. Some examples:

  1. What if repair deletes a record which just happened to store the Agent job that runs the transaction log backups for your main production database? Suddenly your log isn't being backed up and you don't know about it. The log starts to grow and eventually runs out of space. The database stops and your application is down until you figure out what's wrong.
  2. What if repair deletes a record which just happened to store the details of a log backup of the production database? You have a disaster and run the script that looks through the backup history tables and auto-generates RESTORE statements to get you up and running again with up-to-the-minute recovery. Because of the missing record, there's a missing RESTORE LOG statement in the middle of the restores of the log-backup-chain for the production database. So the restores fail, and you're down until you figure out what's wrong.

Bottom line, it's not safe to run repair on msdb - proceed with extreme caution. Not that repair won't work, or will cause corruption, but that it's far better to have backups of msdb that you can restore from.
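
Here's a minimal sketch of backing up msdb, plus two queries you could run before and after a repair to see whether any Agent jobs or backup history went missing. The backup path and the SalesDB database name are placeholders.

```sql
-- Take regular backups of msdb so you never need to repair it
BACKUP DATABASE [msdb]
   TO DISK = N'D:\Backups\msdb_Full.bak'
   WITH INIT, CHECKSUM;
GO

-- Agent jobs: compare this list before and after
SELECT [name], [enabled], [date_modified]
FROM msdb.dbo.sysjobs
ORDER BY [name];
GO

-- Recent backup history for a (hypothetical) production database.
-- type: D = full, I = differential, L = log. For consecutive log
-- backups, each first_lsn should match the previous last_lsn
SELECT TOP (20)
   [type], [backup_start_date], [backup_finish_date],
   [first_lsn], [last_lsn]
FROM msdb.dbo.backupset
WHERE [database_name] = N'SalesDB'
ORDER BY [backup_finish_date] DESC;
GO
```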

(Yes, Kimberly's lecturing again...)

Here's something that I've seen crop up a lot recently on corruption forums:

Server: Msg 602, Level 21, State 50, Line 1
Could not find row in sysindexes for database ID 10, object ID 1, index ID 1. Run DBCC CHECKTABLE on sysindexes.

This isn't corruption - it comes from trying to attach a 2005 database to a 2000 server. If you try to restore a 2005 database on a 2000 server, you'll see:

Server: Msg 3169, Level 16, State 1, Line 1
The backed-up database has on-disk structure version 611. The server supports version 539 and cannot restore or upgrade this database.
Server: Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.

This gives a more useful error. Basically, SQL Server is not up-level compatible in terms of the database physical structures. A SQL 2000 server cannot understand the new structures that are in a SQL 2005 database. There's a lot of confusion about this, and about why the database compatibility level is not the same as the physical version of the database. My blog post Search Engine Q&A #13: Difference between database version and database compatibility level has more details.

Trying to do the same thing with a 2008 database on a 2005 server is a bit better. For the attach we get:

Msg 1813, Level 16, State 2, Line 1
Could not open new database 'Dbmaint2008'. CREATE DATABASE is aborted.
Msg 948, Level 20, State 1, Line 1
The database 'Dbmaint2008' cannot be opened because it is version 655. This server supports version 612 and earlier. A downgrade path is not supported.

(Note that 611 is the physical version for all SQL Server 2005 builds, but if VARDECIMAL is enabled, the version gets bumped by one to 612 - long story...)

But the restore error is still a little cryptic:

Msg 3241, Level 16, State 7, Line 1
The media family on device 'c:\sqlskills\dbmaint2008.bck' is incorrectly formed. SQL Server cannot process this media family.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.

Bottom line - you can't attach a higher-version database to a lower-version server.
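
If you want to check versions before moving a database, here's a sketch using the database and backup file names from the errors above. Run it on the server where the database (or backup) came from - a lower-version server won't be able to read the backup header either.

```sql
-- Physical (on-disk) version of an attached database
SELECT DATABASEPROPERTYEX (N'Dbmaint2008', 'Version') AS [Physical Version];
GO

-- Compatibility level is a separate, changeable setting
SELECT [name], [compatibility_level]
FROM sys.databases
WHERE [name] = N'Dbmaint2008';
GO

-- Physical version recorded in a backup: see the DatabaseVersion
-- column in the output
RESTORE HEADERONLY
   FROM DISK = N'c:\sqlskills\dbmaint2008.bck';
GO
```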

After I taught some of the MCM-SharePoint class last week, one of the attendees pointed me at a blog post about measuring churn in SharePoint databases. The poster gave code to measure how large full backups are, which really only measures how much data there is in the database, not whether existing data has changed. If two successive full database backups are the same size, there's no way to tell how much changed - and of course, you need to take a full backup to be able to tell whether the size changed.

A while ago I wrote a script that lets DBAs tell how big the next differential backup will be. A differential backup contains everything that's changed since the last full database backup, so everything new and everything changed. Although this still doesn't show whether a single piece of existing content changed multiple times, it can still show whether existing content changed at least once. Better still, you don't need to take any kind of backup to run this script.

So, to get an idea of the churn rate of your content databases, check out this script - see New script: How much of the database has changed since the last full backup?.

Enjoy!

PS If you're using SharePoint and find this useful, please let me know, and tell me if there are any other scripts that would be useful - Kimberly and I are both starting to get more into SharePoint admin from a SQL perspective.

Last week I kicked off the first weekly survey - on whether you validate your backups or not (see here for the survey). The results are very interesting (as of 3/13/09):

As you can see, almost 25% of respondents never validate their backups! And a further 25% only validate them occasionally, with 30% doing some kind of regular checks, and only a handful checking all the time.

While these results may seem shocking to you, based on what I've heard when teaching, they're pretty normal. There are lots of reasons why DBAs may choose not to validate backups as often as they should, including:

  • Not enough time to restore the backups to check them
  • Not enough disk space
  • Not part of the day-to-day operations guide
  • Don't see why it's important

Kimberly and I have a saying (well, to be fair, Kimberly coined it): you don't have a backup until you've restored it. You don't know whether the backup you just took was corrupt or not and will actually work in a disaster recovery situation.

Can you ever get a guarantee? No. Here's an analogy, taken from a very old post of mine. Consider Paul, who works for the Seattle Police Department in traffic control. Paul's in a control room somewhere in the city with a large bank of monitors connected to various traffic cameras. Paul's job is to cycle through the cameras every 1/2 hour, looking for traffic accidents. At the end of the 1/2 hour cycle, if Paul hasn't seen any accidents then he knows that there are no accidents in the city.

Ah - but hold on. Does Paul really know that? No. All Paul knows is that at the point he looked at a particular camera, there was not an accident at that spot in the city. The very instant he switches to another camera feed, an accident could happen at a spot covered by the previous camera.

The same is true for validating backups. As soon as you've validated a backup, it could then be corrupted by the I/O subsystem, but at least you know that it was valid at some point. But what if that happens, I hear you ask? Well, then you need to have multiple copies of your backups, and you should not rely on backups as the only method of disaster recovery. A good high-availability solution includes as many technologies as you need to mitigate all risks - and backups are just one of those technologies. You're going to have to have some kind of redundant system too that you can fall back on (or maybe even immediately failover to, depending on your particular disaster recovery plan). But, saying that, you can't rely on the redundant server either - if it goes wrong, you'll need your backups.

So - whichever way you look at it, validating backups is a really good practice to get into so you don't get bitten when it comes to the crunch. When I teach, I've got many stories of customers losing data, business, time, and money (and DBAs losing their jobs) because the backups didn't work or were destroyed along with the data. Here's one for you (simplified, and no I won't divulge names etc). Major US investment firm decides to provision new hardware, so takes a backup of the database storing all the 401k accounts for all their customers (private and corporate), flattens the hardware, and goes to restore the backup. The backup is corrupt - on SQL 2000, where there's no RESTORE ... WITH CONTINUE_AFTER_ERROR. What happened? Well, the SQL team and Product Support had to get involved to help get the data back, but people in the firm lost their jobs and it cost a lot of time and money to recover the data. If only they'd had multiple copies of the backup, and tested their backup before removing the database (or better yet, restored the database on the new hardware before flattening the old hardware). They learned a costly lesson, but they did change their practices after that.

Unfortunately this is so often the way - people don't realize they need to validate backups or have an HA plan UNTIL they have a disaster. Then suddenly it's the top priority at the company. Being proactive can save a lot of grief, and make you look good when disaster strikes.

Backups can be unusable for a number of reasons, including:

  • The full backup is corrupt, because something in the I/O subsystem corrupted it.
  • A backup in the log backup chain is corrupt, meaning restore cannot continue past that point in the chain.
  • All backups following a full backup are written to the same backup set, but the WITH INIT clause is used accidentally on all backups, meaning the only backup present in the backup set is the last one taken.
  • An out-of-band backup was taken without using the WITH COPY_ONLY clause and the log backup chain was broken (see BACKUP WITH COPY_ONLY - how to avoid breaking the backup chain).
  • The backups worked but the database contained corruption before it was backed up (kind of a separate issue).

The only ones that are out of your control are the first two, but they can be mitigated by having multiple copies of backups. All of these, though, can be caught before disaster recovery time by regularly restoring your backups as a test of what you'd do if there was a real disaster. You might be surprised what you'd find out...
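
Here's a minimal sketch of what a validation pass could look like. Everything named here is an assumption for illustration: the SalesDB database, the paths, and the logical file names in the MOVE clauses (use RESTORE FILELISTONLY to find the real ones).

```sql
-- Take the backup with page checksums validated and a backup checksum added
BACKUP DATABASE [SalesDB]
   TO DISK = N'D:\Backups\SalesDB_Full.bak'
   WITH INIT, CHECKSUM;
GO

-- Quick check: re-reads the backup and verifies the checksums.
-- Better than nothing, but it's not a real restore
RESTORE VERIFYONLY
   FROM DISK = N'D:\Backups\SalesDB_Full.bak'
   WITH CHECKSUM;
GO

-- The real test: restore a copy under a different name...
RESTORE DATABASE [SalesDB_RestoreTest]
   FROM DISK = N'D:\Backups\SalesDB_Full.bak'
   WITH MOVE N'SalesDB' TO N'E:\RestoreTest\SalesDB_RestoreTest.mdf',
        MOVE N'SalesDB_log' TO N'E:\RestoreTest\SalesDB_RestoreTest.ldf',
        CHECKSUM, RECOVERY;
GO

-- ...and check the restored copy for corruption
DBCC CHECKDB (N'SalesDB_RestoreTest') WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO
```

Running the consistency checks on the restored copy also covers the last bullet above, without putting the load on the production server.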

This is more of an editorial style post than a deep technical or example script post - I'm going to start doing more of these around the weekly surveys. Next post - this week's survey.

Thanks!

This is an interesting case that cropped up yesterday - the transaction log is damaged so a log backup doesn't work (with the error below):

Backup detected log corruption in database FakeDBName. Context is FirstSector. LogFile: 2 'F:\SQLLOGS\XYZ\FakeDBName_Log.ldf' VLF SeqNo: x502e VLFBase: x2ce40000 LogBlockOffset: x2d0a9000 SectorStatus: 2 LogBlock.StartLsn.SeqNo: x4ee3 LogBlock.StartLsn.Blk: x1348 Size: x200 PrevSize: x400
2009-03-06 10:00:02.61 Backup      Error: 3041, Severity: 16, State: 1.
2009-03-06 10:00:02.61 Backup      BACKUP failed to complete the command BACKUP LOG FakeDBName Check the backup application log for detailed messages.
2009-03-06 10:00:03.61 Backup      Error: 3041, Severity: 16, State: 1.

However a full backup succeeds, as does a DBCC CHECKDB. What's going on?

The answer comes with understanding what portions of the transaction log are required for these operations. (For a good understanding of the transaction log itself, along with logging and recovery, see my article in the February TechNet Magazine.)

A transaction log backup, by its very nature, has to back up *all* transaction log generated since the last log backup - so it will try to back up the corrupt portion and fail.

A full database backup only has to backup enough transaction log to allow the database to be restored and recovered to a transactionally consistent point. In other words, it only requires the transaction log back to the beginning of the oldest active transaction at the point that the data-reading section of the full backup completes. This is a source of immense confusion - many people don't believe that a full (or differential) backup needs to also backup some transaction log. For a more in-depth study of this, see my previous blog posts Debunking a couple of myths around full database backups and More on how much transaction log a full backup includes.

A DBCC CHECKDB operation uses a database snapshot to get a transactionally consistent view of the database on which to run consistency checks. When the database snapshot is created, crash recovery is run on it to make it transactionally consistent. That requires the same amount of log as if a full backup was taken - back to the beginning of the oldest active transaction at the time the database snapshot is created. See CHECKDB From Every Angle: Complete description of all CHECKDB stages for more info.

So - it's entirely possible for the situation reported above to exist. The question then becomes, how to recover from it?

Assuming that the database files are intact, there is a simple solution. This solution will break the log backup chain, but given that the log is corrupt so a log backup cannot be taken, the log backup chain is *already* broken. Here's what to do:

  1. Stop all user activity in the database
  2. Switch to the SIMPLE recovery model (breaking the log backup chain and removing the requirement that the damaged portion of log must be backed up)
  3. Switch to the FULL recovery model
  4. Take a full database backup (thus starting a new log backup chain)
  5. Start taking log backups

You might want to manually shrink and grow the log file in between steps 2 and 3 too - in case the log file is on a damaged portion of disk - or maybe even shrink it right down and add another log file on an undamaged disk. You also will need to do some root-cause analysis to determine why the corruption occurred in the first place, and to take preventative measures to stop it happening again.
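
Steps 2 through 5 above can be sketched in T-SQL as follows, using the database name from the error message. The backup paths are placeholders, and step 1 (stopping user activity) is something I'm assuming you've already done.

```sql
-- Step 2: switch to SIMPLE, breaking the log backup chain and
-- removing the need to back up the damaged portion of the log
ALTER DATABASE [FakeDBName] SET RECOVERY SIMPLE;
GO

-- (Optionally shrink and regrow the log file here, as described above)

-- Step 3: switch back to FULL
ALTER DATABASE [FakeDBName] SET RECOVERY FULL;
GO

-- Step 4: take a full backup to start a new log backup chain
BACKUP DATABASE [FakeDBName]
   TO DISK = N'D:\Backups\FakeDBName_Full.bak'
   WITH INIT, CHECKSUM;
GO

-- Step 5: start taking log backups again
BACKUP LOG [FakeDBName]
   TO DISK = N'D:\Backups\FakeDBName_Log_1.trn'
   WITH INIT, CHECKSUM;
GO
```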

Hope this helps

PS In my previous post, Testing a new survey method: backup validation, the answer with the largest number of responses so far is that people never verify their backups - very disturbing!

I'm toying with the idea of having a weekly survey that'll highlight an interesting facet of database management. I've signed up with SurveyPopups.com, which is free and allows you to see the results as you vote on them.

If you think this is a cool idea, vote in the survey using the options below. If I get more than 100 people voting then I'll start doing one weekly or so and using the results to seed a blog post.

Thanks

Back in 2005 Kimberly produced two very popular webcast series - an 11-part webcast series for TechNet called SQL Server 2005 for the IT Professional and a 10-part webcast series for MSDN called A Primer for Proper SQL Server Development. The webcast links and blog posts were broken for quite a while but now they're all fixed up and working again. I've created some web pages that link to all the webcasts and blog posts, along with abstracts. I've also included some more recent ones too and will be adding to the list over the next few weeks.

There's over 30 hours of good stuff to watch - check them out at http://www.sqlskills.com/webcasts.asp

OK - last content post today. I forgot that the February TechNet Magazine also has the latest edition of my regular SQL Q&A column. This month's column covers:

  • Should backup compression be enabled at the instance level?
  • Client redirection during database mirroring failovers
  • Partition-level lock escalation in SQL Server 2008
  • Is it ever safe to rebuild a transaction log?

Check out the column at http://technet.microsoft.com/en-us/magazine/2009.02.sqlqa.aspx

Here's a question I got from someone who attended our database maintenance workshop at PASS last week (paraphrased):

I attended your pre-conference session on database maintenance and found it to be very informative. From what you told us though, I think I need to change my nightly backup procedure. I like to get my databases back to as small a size as possible before backing them up, so I run the following commands to do this before taking the full database backup. Could you help me with a better way of doing this? We're on SQL Server 2005.

BACKUP LOG <mydbname> WITH NO_LOG

DBCC SHRINKDATABASE (<mydbname>)

And here's the answer I sent back:

How large is the database? And how long must you keep the backups around? If the cumulative size of the backups takes up a large proportion of your available storage space (and we're talking more than just a single direct-attached 100+GB drive), then it may be worth compressing the backups - otherwise you're likely causing yourself more trouble than it's worth.
 
By doing BACKUP LOG WITH NO_LOG you're effectively throwing away log records and removing the possibility of doing any kind of point-in-time, or up-to-the-second recovery (see BACKUP LOG WITH NO_LOG - use, abuse, and undocumented trace flags to stop it). If you're running in the FULL recovery model, and you don't care about either of these features, then you should switch to the SIMPLE recovery model. If you really want to be in FULL, don't ever use WITH NO_LOG.

The amount of transaction log that a full backup requires cannot be changed by you truncating the log. The full backup will back up any log it requires to enable the restored database to be a transactionally consistent copy of the database. See Debunking a couple of myths around full database backups and More on how much transaction log a full backup includes.

Doing a DBCC SHRINKDATABASE (the same exact operation as a database auto-shrink) will cause massive index fragmentation, and cause file-system fragmentation of the data files, as they will likely need to grow again after you've squeezed all the space out of them. See Auto-shrink - turn it OFF! for more details on the effects.

If you're really concerned about backup sizes and space is at a premium, I recommend using a 3rd-party backup compression tool such as LiteSpeed or HyperBac so you're not affecting the actual database. Remember also that SQL Server 2008 has native backup compression too - see my blog post here for more details.
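For the native SQL Server 2008 option, a minimal sketch might look like this. It assumes an edition that supports backup compression, and the database name and path are hypothetical.

```sql
-- Hypothetical database name and path.
-- COMPRESSION requests native backup compression (SQL Server 2008 onwards);
-- INIT overwrites any existing backup sets in the target file.
BACKUP DATABASE MyDatabase
TO DISK = N'D:\Backups\MyDatabase_full.bak'
WITH COMPRESSION, INIT;
GO
```

The database itself is untouched, so there's no shrink-induced fragmentation to clean up afterwards.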

Hope this helps

This blog post explains the demo scripts and databases I've posted to cover all the Corruption Survival Techniques and DBCC CHECKDB sessions I've presented at conferences this year. There are two zip files you need to download: the example corrupt databases (36-MB zip) and the demo scripts. These are also both available through our past events pages.

The databases zip contains the following databases:

  • DemoDataPurity
    • 192-MB SQL Server 2005 database with a single 2570 (data purity) error
  • DemoFatalCorruption1
    • 1-MB SQL Server 2005 database with a corrupt system table (that allows CHECKDB to complete)
  • DemoFatalCorruption2
    • 1-MB SQL Server 2005 database with a corrupt system table (that terminates CHECKDB)
  • DemoNCIndex
    • 192-MB SQL Server 2005 database with a bunch of nonclustered index corruptions
  • DemoRestoreOrRepair
    • 1-MB SQL Server 2005 database with a page checksum failure (in fact a zero'd out page)
  • DemoCorruptMetadata
    • 1-MB SQL Server *2000* database with corrupt syscolumns table

The scripts zip contains the following directories:

  • 1 - Fatal Errors
    • This makes use of the DemoFatalCorruption1 and DemoFatalCorruption2 databases. The FatalErrors.sql script has the steps to follow and see this blog post for a complete walk-through.
  • 2 - NC Indexes
    • This makes use of the DemoNCIndex database. The NCIndexCorruption.sql script has the steps to follow and see this blog post for a complete walk-through.
  • 3 - Data Purity
    • This makes use of the DemoDataPurity database. The DataPurityCorruption.sql script has the steps to follow. I'll blog more details on this script sometime in the next week (and then update this post).
  • 4 - Metadata
    • This makes use of the DemoCorruptMetadata database. The CorruptMetadata.sql script has the steps to follow and see this blog post for a complete walk-through.
  • 5 - Restore or Repair
    • This makes use of the DemoRestoreOrRepair database. There's a setup script (01CreateRestoreOrRepair.sql) and two demo scripts - FixUsingPageRestore.sql and FixUsingRepair.sql. I'll blog more details on these scripts sometime in the next week (and then update this post).
  • 6 - Suspect Database
    • This demo doesn't come with a corrupt database - you create your own one. The SuspectDatabase.sql script has the steps to follow and see this blog post for a complete walk-through.

Enjoy!

It's really scary how quickly time flies - seems like it was just last week when I last blogged about TechNet Magazine (actually it was a month ago when I blogged about my 2008 Change Tracking article - see here). Anyway, a new issue of TechNet Magazine has just come out and this one has the latest installment of my bi-monthly SQL Q&A column.

This month's topics are:

  • How row-overflow columns can lead to poor range scan performance, even on completely defragmented indexes.
  • Combining database mirroring and failover clustering without undesired failover behavior.
  • Adding differential backups to a full+log backup strategy to lower recovery time.
  • Memory settings for multi-instance failover clustering

You can get to the column online at http://technet.microsoft.com/en-us/magazine/dd228989.aspx.

Enjoy!

PS If you have any ideas for what would make a good SQL Q&A topic, please drop me a line - paul@sqlskills.com

Today's post is based on a bunch of questions I've had around the various compression features in SQL Server 2008.

Does turning on data compression or backup compression compress the transaction log files on disk?

No. The transaction log is not compressed in any way under any circumstances. Rows from tables and indexes that have compression enabled will be logged in their row compressed forms, even if page compression is enabled.

Does data compression compress LOB columns?

No. There is no native compression support for any LOB columns (n/text, image, n/varchar(max), varbinary(max), XML), whether stored in-row or out-of-row. There's also no native compression support for FILESTREAM data.

Does log shipping use compression to compress the logs being shipped?

Log shipping does not ship transaction logs - it ships log *backups*. If backup compression is enabled for the instance hosting the log shipping primary database, or the log shipping job is changed to enable backup compression, then the log backups will be compressed and less data will be sent over the wire to the log shipping secondary(s).

Is backup compression the same as log stream compression with database mirroring?

No. Backup compression compresses backups (see my previous blog post here). Log stream compression with database mirroring compresses transaction log records before sending them between the principal and the mirror (see my previous blog post here).

Should I just turn on backup compression at the instance level?

Not necessarily. It depends whether the majority of databases on the instance will benefit from backup compression. Backup compression (like any compression algorithm) uses CPU whether a decent compression ratio is achieved or not. Check what compression ratio is achieved first and then enable backup compression if it's worth it. Otherwise, just enable it for individual databases.
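One way to check the ratio is to look at the backup history in msdb once you've taken a few compressed backups. This is only a sketch: the TOP (20) cut-off is arbitrary, and whether a given ratio is "worth it" is your call.

```sql
-- Compare uncompressed and compressed sizes for the most recent backups.
-- For an uncompressed backup the two sizes are equal, so the ratio is 1.
SELECT TOP (20)
    database_name,
    backup_finish_date,
    backup_size / 1048576.0 AS backup_size_mb,
    compressed_backup_size / 1048576.0 AS compressed_size_mb,
    backup_size / NULLIF (compressed_backup_size, 0) AS compression_ratio
FROM msdb.dbo.backupset
ORDER BY backup_finish_date DESC;
GO

-- If most databases compress well, make compression the instance default.
EXEC sp_configure 'backup compression default', 1;
GO
RECONFIGURE;
GO
```

Individual backups can still override the default using WITH COMPRESSION or WITH NO_COMPRESSION.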

Does data compression use the same algorithm as backup compression?

No. Backup compression uses a proprietary block-based compression algorithm that is part of Windows. Data compression uses up to 3 algorithms, depending on the level of compression configured. Row compression just makes all non-LOB columns into variable-length columns. Page compression does row compression, then common-prefix compression for each column, then common-value dictionary compression for each page. Details can be found at the following BOL sections: Row Compression Implementation and Page Compression Implementation.
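If you want to see what each level of data compression would buy you before enabling it, something along these lines should work on SQL Server 2008 Enterprise. The schema and table names are hypothetical.

```sql
-- Estimate the space saved by page compression for a hypothetical table.
-- NULL for @index_id and @partition_number means all indexes and partitions.
EXEC sp_estimate_data_compression_savings
    @schema_name = N'dbo',
    @object_name = N'MyTable',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = N'PAGE';
GO

-- Enable page compression by rebuilding the table (heap or clustered index).
ALTER TABLE dbo.MyTable REBUILD WITH (DATA_COMPRESSION = PAGE);
GO
```

Running the estimate with N'ROW' as well gives a feel for how much of the saving comes from row compression alone.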

Hope this helps!

On Tuesday we had a look around our customer's data centers here in Austria - hidden away in the labyrinthine bowels of a very large building in Vienna. Typical data centre with a halon fire extinguishing system but exceptional in its neatness and organization. The star of the show was their new HP Superdome - 32 dual-core Itaniums with 1/2 a terabyte of memory. Very nice - providing a lot of headroom for their workload to grow (currently at four hundred thousand SQL statements per *second* - spelled out to show there's no accidentally added zeros).

One of the systems they showed us is responsible for doing backups. They stripe the backups across 12 devices using multiple network cards and can manage to back up 2 terabytes in two hours after tweaking the BLOCKSIZE, BUFFERCOUNT, and MAXTRANSFERSIZE! Now comes the cool(er) part - with backup compression on SQL Server 2008 they've benchmarked backing up 2 terabytes in 36 minutes! That's a pretty awesome number and makes for some excellent disaster recovery times.
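For anyone who hasn't seen these options used together, here's an illustrative sketch of a striped, compressed backup. The database name, paths, and option values are my own placeholders (and only three stripes rather than twelve), not the customer's actual settings - you'd need to test to find what works on your hardware.

```sql
-- Hypothetical names, paths, and values; tune by testing on your own system.
BACKUP DATABASE MyDatabase
TO DISK = N'E:\Backups\MyDatabase_1.bak',
   DISK = N'F:\Backups\MyDatabase_2.bak',
   DISK = N'G:\Backups\MyDatabase_3.bak'
WITH COMPRESSION,
   BLOCKSIZE = 65536,          -- physical block size in bytes (maximum allowed)
   BUFFERCOUNT = 64,           -- number of I/O buffers used by the backup
   MAXTRANSFERSIZE = 4194304,  -- largest unit of transfer in bytes (4MB maximum)
   STATS = 10;                 -- progress message every 10 percent
GO
```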

(Details printed with permission)

It's been almost exactly a week since the last post - an unusually long time for me. Kimberly and I were teaching the first week of the inaugural Microsoft Certified Masters SQL course last week in Redmond (a little intense at 8 hours of *teaching* per 10-hour day - so no energy for blogging afterwards). Now we're in Vienna, where we're on-site with one of our favorite customers, then we head to Barcelona next week for TechEd EMEA, back to Vegas for SQL Connections the week after, and finally back to Seattle for PASS the following week. Phew! Look for a photo-stuffed Where In The World Are Paul and Kimberly post from Vienna next week.

I'll try to post some interesting stuff that comes up while we're at the conferences - always a good crop of questions.

In this quick post I want to touch on something that's becoming more important to explain as more and more people start managing systems who haven't got a lot of DBA experience (what I like to call involuntary DBAs). How do you plan a backup strategy? As you'll see if you look at my Backup/Restore category, I've got lots of info about specific types of backups but nothing on putting a plan together in the first place.

It's very simple to decide on using the SIMPLE recovery model and regular database backups - the backup schedule is easy. But, what happens when you come to recovering from a disaster? Which backups do you need to restore and how long does it take you? If you take weekly full backups, say, then you stand the chance of losing a lot of data if the disaster occurs just before your next full backup. So, if you switch to the FULL recovery model and add log backups, you can recover right up to the point of the crash.

But again, what backups do you have to restore and how long does it take you? Are you able to restore within the maximum allowable downtime for your business? If you have a 300-GB database, and the downtime allowance is 15 minutes (as with one DBA I've known), the answer is no. Do you need to move to a partitioned schema that makes use of multiple filegroups so that you can use partial database availability to bring your application online faster with only the critical filegroups?

The key point when planning a backup strategy is not to think about what backups you want to take - think about what restores you have to be able to perform, then work backwards from that.
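To make "work backwards from the restores" concrete, here's a sketch of the restore sequence for a simple full-plus-log strategy. All names and paths are hypothetical, and it assumes the log file is still accessible after the disaster so that the tail of the log can be backed up.

```sql
-- Hypothetical database name and backup files.

-- Back up the tail of the log so no committed work is lost
BACKUP LOG MyDatabase TO DISK = N'D:\Backups\MyDatabase_tail.trn'
WITH NO_TRUNCATE;
GO

-- Restore the most recent full backup, leaving the database able to accept more restores
RESTORE DATABASE MyDatabase FROM DISK = N'D:\Backups\MyDatabase_full.bak'
WITH NORECOVERY, REPLACE;
GO

-- Restore every log backup taken since that full backup, in order
RESTORE LOG MyDatabase FROM DISK = N'D:\Backups\MyDatabase_log_1.trn' WITH NORECOVERY;
RESTORE LOG MyDatabase FROM DISK = N'D:\Backups\MyDatabase_log_2.trn' WITH NORECOVERY;
GO

-- Restore the tail-log backup, then bring the database online
RESTORE LOG MyDatabase FROM DISK = N'D:\Backups\MyDatabase_tail.trn' WITH NORECOVERY;
RESTORE DATABASE MyDatabase WITH RECOVERY;
GO
```

Time each step of a sequence like this on realistic hardware - if the total is longer than your downtime allowance, the backup strategy needs to change.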

Cheers

It's been almost two weeks since my last post as we've been offline in Scotland (another photo post to follow - I owe you two now...) but now we're back for the crazy Fall conference and teaching season.

The latest installment of my regular Q&A column in TechNet Magazine is available at http://technet.microsoft.com/en-us/magazine/cc895648.aspx. This month I cover the following topics:

  • How backups and restores work and why the times for each may differ
  • The difference between log shipping and database mirroring around BULK_LOGGED operations
  • How other factors apart from log backups can contribute to excessively large transaction logs
  • Why database repair exists and why it shouldn't be used
  • Tracking index usage with sys.dm_db_index_usage_stats

Enjoy!

I woke up this morning and someone had replaced my wife with someone who likes to blog :-). Kimberly's turned over a new leaf and is going to blog much more often - in fact she's blogged 4 times today already. Check out her blog here.

Continuing on the transaction log theme of the last few Search Engine Q&A posts, this one addresses a question I've heard a few times, most recently on an MVP discussion group. Let me paraphrase:

If I have a transaction that inserts a huge amount of data, the transaction log grows to 50-GB. I then rollback the transaction. When I take the next log backup, it's way smaller than 50-GB. What's going on?

Let's see if we can repro the scenario. I've created a database with a 500-MB data file and a 1-MB log file, with 100-MB and 1-MB auto-growth intervals. I want the log to be as small as possible and to grow in small chunks so I can see just how much it *has* to grow by, rather than having a large growth size. Then I set the recovery mode to full and took a database backup to make sure the log won't truncate until it's backed up.

CREATE DATABASE LogSizeTest ON
   (NAME = N'LogSizeTest',
   FILENAME = N'C:\SQLskills\LogSizeTest.mdf',
   SIZE = 512MB,
   FILEGROWTH = 100MB)
LOG ON
   (NAME = N'LogSizeTest_log',
   FILENAME = N'C:\SQLskills\LogSizeTest_log.ldf',
   SIZE = 1MB,
   FILEGROWTH = 1MB);
GO

ALTER DATABASE LogSizeTest SET RECOVERY FULL;
GO

BACKUP DATABASE LogSizeTest TO DISK = 'C:\SQLskills\LogSizeTest.bak';
GO

Let's check the size of the log:

DBCC SQLPERF (LOGSPACE);
GO

Database Name  Log Size (MB) Log Space Used (%) Status
-------------- ------------- ------------------ -----------
LogSizeTest    0.9921875     36.66339           0

This returns info for all databases - I've trimmed the output down to just the LogSizeTest database.

Now I'm going to create a table, start an explicit transaction and add about 500-MB of info to the table.

USE LogSizeTest;
GO
SET NOCOUNT ON;
GO
CREATE TABLE Test (c1 INT IDENTITY, C2 CHAR (8000) DEFAULT (REPLICATE ('a', 8000)));
GO

BEGIN TRAN;
GO

DECLARE @count INT;
SELECT @count = 0;
WHILE (@count < 64000)
BEGIN
   INSERT INTO Test DEFAULT VALUES;
   SELECT @count = @count + 1;
END;
GO

Checking the log file size again gives:

DBCC SQLPERF (LOGSPACE);
GO

Database Name  Log Size (MB) Log Space Used (%) Status
-------------- ------------- ------------------ -----------
LogSizeTest    703.9922      99.98737           0

The log size has grown to about 700-MB, way more than the size of the data I was inserting, and it's completely full. Now let's roll back the transaction and check the log size again.

ROLLBACK TRAN;
GO

DBCC SQLPERF (LOGSPACE);
GO

Database Name  Log Size (MB) Log Space Used (%) Status
-------------- ------------- ------------------ -----------
LogSizeTest    703.9922      85.21268           0

The size of the log file is the same, but the percentage used has actually gone down! How can that happen? Let's take a log backup and check out its size:

BACKUP LOG LogSizeTest TO DISK = 'C:\SQLskills\LogSizeTest_log.bck';
GO
RESTORE HEADERONLY FROM DISK = 'C:\SQLskills\LogSizeTest_log.bck';
GO

The BackupSize in the output from the RESTORE HEADERONLY is 631454208, which is 602.2-MB. Taking the numbers from the DBCC SQLPERF output above, 85.21268% of 703.9922-MB is 599.89-MB - so the backup is roughly the same size as the used transaction log. That's what I'd expect, but why is it smaller than the total size of the transaction log?

So what's going on? Why did the transaction log grow so much larger than the amount of data being inserted, and why did the percentage used actually *drop* after the transaction rolled back?

The answer is in the way the transaction log works. Whenever a logged operation occurs in a transaction, there is some transaction log space reserved in case the transaction rolls back. The idea is that there's always enough space available in the transaction log for a transaction to roll back, without having to grow the transaction log and potentially have that fail. If a transaction could not roll back successfully because the log didn't have enough space, the database would become transactionally inconsistent, would be taken offline and the state changed to SUSPECT.

The behavior we saw was the Storage Engine reserving transaction log space for a potential rollback. When the rollback occurred, the transaction log records necessary to undo the effects of the transaction (called compensation log records) were created and written to the log. The issue is that they usually don't take up as much space as the Storage Engine reserved, as it tends to be very conservative in its estimates of how much log space to reserve, to avoid the potential for SUSPECT databases. This explains the difference between the various sizes and percentages we saw above.

The Storage Engine code to do the reservations is quite interesting - I remember fixing a couple of bugs in it during SQL Server 2000 development in 1999 while I was getting to know the internals of the logging and recovery system before tackling some of the (since removed) log-reading code in DBCC CHECKDB in SQL Server 2000.

Anyway, there you have it. Log space reservation is the answer, and is also one of the reasons why it can be tricky to estimate how large a transaction log should be when a database is created.
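If you want to watch the reservation happening, one option (a sketch, assuming SQL Server 2005 or later and VIEW SERVER STATE permission) is to query the transaction DMVs from another connection while the big transaction is still open:

```sql
-- Log space used and reserved by each active transaction in LogSizeTest
SELECT
    st.session_id,
    dt.database_transaction_log_record_count AS log_records,
    dt.database_transaction_log_bytes_used AS log_bytes_used,
    dt.database_transaction_log_bytes_reserved AS log_bytes_reserved
FROM sys.dm_tran_database_transactions AS dt
JOIN sys.dm_tran_session_transactions AS st
    ON st.transaction_id = dt.transaction_id
WHERE dt.database_id = DB_ID (N'LogSizeTest');
GO
```

The reserved bytes are counted on top of the bytes used, which is how the log can appear completely full even though far less than that ends up in the log backup.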

Moving databases around is pretty common, as is moving databases between servers running different Editions of SQL Server, especially during a disaster recovery situation. You may not know this, but in SQL Server 2005, if you had partitioning anywhere in a database, you could only attach/restore that database using an Enterprise or Developer instance. I say 'you may not know this' because partitioning isn't really widely used (from what I can gather from talking to customers). The upshot of this is that SQL Server 2005 will refuse to attach/restore a database with partitioning in - even if you're in a disaster recovery situation and the only server you have available has Standard Edition.

In SQL Server 2008, a lot more people are going to bump into this issue. The list of features that are Enterprise-only, and that will prevent attach/restore using a lower edition, has expanded to include 3 new features that WILL be much more commonly used than partitioning. The four features that are in this category are:

  • Data compression
  • Partitioning
  • Transparent data encryption
  • Change data capture

All of these require elevated permissions to enable EXCEPT data compression, which only requires ALTER permission on a table. This means someone with table-owner privileges could enable compression without the DBA knowing, and suddenly the database can't be attached/restored to, say, Standard Edition.

If you're a DBA and have just taken over a database, there's now an easy way to tell whether the database contains these features. A new DMV has been added - sys.dm_db_persisted_sku_features - that will report which of these four features are present in a database. Let's check it out.

Using a 2008 Enterprise instance:

CREATE DATABASE EnterpriseOnly;
GO
USE EnterpriseOnly;
GO

CREATE TABLE compressed (c1 INT) WITH (DATA_COMPRESSION = ROW);
GO

SELECT * FROM sys.dm_db_persisted_sku_features;
GO

feature_name    feature_id
--------------  -----------
Compression     100

Now let's try backing up and restoring the database on a different edition:

BACKUP DATABASE EnterpriseOnly TO DISK = 'C:\SQLskills\EnterpriseOnly.bck';
GO

And on a 2008 Express instance:

RESTORE DATABASE EnterpriseOnly FROM DISK = 'C:\SQLskills\EnterpriseOnly.bck'
   WITH MOVE 'EnterpriseOnly' TO 'C:\SQLskills\EnterpriseOnly.mdf',
   MOVE 'EnterpriseOnly_log' TO 'C:\SQLskills\EnterpriseOnly_log.ldf';
GO

Processed 160 pages for database 'EnterpriseOnly', file 'EnterpriseOnly' on file 1.
Processed 5 pages for database 'EnterpriseOnly', file 'EnterpriseOnly_log' on file 1.
Msg 3167, Level 16, State 1, Line 1
RESTORE could not start database 'EnterpriseOnly'.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.
Msg 909, Level 21, State 1, Line 1
Database 'EnterpriseOnly' cannot be started in this edition of SQL Server because part or all of object 'compressed' is enabled with data compression or vardecimal storage format. Data compression and vardecimal storage format are only supported on SQL Server Enterprise Edition.
Msg 933, Level 21, State 1, Line 1
Database 'EnterpriseOnly' cannot be started because some of the database functionality is not available in the current edition of SQL Server.

Now, it's cool that it tells you exactly why the database couldn't be restored, but did you notice the first two lines of output? The database is fully restored BEFORE the operation fails! This makes perfect sense, as the database needs to be fully recovered before the server can tell whether any of the four features are still enabled or not. However, in a disaster recovery situation, waiting many hours for a database to restore only to be told that it can't be restored on this instance would be even more disastrous.

To summarize, you should always know what's happening in databases you manage. If portability of your databases to a lower Edition is important, make sure that none of these features can be enabled without you knowing about it.
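The DMV tells you which features are in use but not where. For data compression, a query along these lines (a sketch against the SQL Server 2008 catalog views, run in the database you're checking) should list the objects involved:

```sql
USE EnterpriseOnly;
GO

-- List every index/heap partition that has row or page compression enabled
SELECT
    OBJECT_SCHEMA_NAME (p.object_id) AS SchemaName,
    OBJECT_NAME (p.object_id) AS ObjectName,
    p.index_id,
    p.partition_number,
    p.data_compression_desc
FROM sys.partitions AS p
WHERE p.data_compression <> 0
ORDER BY SchemaName, ObjectName, p.index_id, p.partition_number;
GO
```

For the example above this should return a single row for the 'compressed' table with ROW compression.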

There's been a spate of problems on the forums and in my inbox with people's transaction logs filling up. With the increase in "involuntary DBAs", there are more people being thrust into the role of database administration without being given the chance to learn the skills, tips, and tricks necessary. One of the most common problems I see is the transaction log filling up and the database grinding to a halt, and the most common cause of this is the lack of log backups when in the FULL recovery mode.

The very first post in my Search Engine Q&A series last September dealt with this problem - and showed two examples of filling up a transaction log plus the command to see why a transaction log is full (examining the log_reuse_wait_desc column in master.sys.databases). See here for more details. One thing I don't go into too much detail about in that post is the way that transactional replication or database mirroring can cause the log to grow. Both of these technologies rely on reading the transaction log for an unbroken stream of changes to be sent to the secondary systems (but with slightly different mechanisms) - but both will cause the log to grow if there is transaction log that hasn't yet been read (usually because of a performance issue). The same thing applies to the new Change Data Capture technology in SQL Server 2008 - it uses the transactional replication log reader to harvest changes from the log. The addition of any of these technologies could cause your log to grow excessively.
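As a quick reminder, the check mentioned above looks something like this (the extra recovery_model_desc column is just my addition for context):

```sql
-- Why can't each database's log be cleared right now?
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM master.sys.databases
ORDER BY name;
GO
```

A value of LOG_BACKUP means a log backup is needed, while REPLICATION or DATABASE_MIRRORING points at the technologies described above.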

So it's grown too big - now what? Let's deal with the case where you've taken a full database backup in the FULL recovery mode but haven't taken any log backups. There are a number of options:

  1. Take a log backup! This might be easier said than done if you're like the customer I once had who had a 10GB database with an (I kid you not) 987GB log file.
  2. Switch to the SIMPLE recovery model (either permanently or temporarily). If you're not interested in point-in-time recovery, there's no need for you to be in the FULL recovery mode (unless you're using a feature that requires it, like database mirroring, in which case you have no choice but to start taking log backups)
  3. In SQL Server 2005 and before, you can manually throw away the log contents - see my blog post here. This is not advisable.
  4. Grow the log file. This really just addresses the symptom but not the cause - the log will fill up again eventually.

Note that options #1-3 will just remove the restriction that the log contents are still needed (called "clearing" or "truncating" the log). None of these will cause the log file to shrink. The ONLY way to reduce the size of the log file is to use DBCC SHRINKFILE. After doing one of #1-3, you must manually shrink the log file using DBCC SHRINKFILE. If you do #2, and then go back into the FULL recovery mode, you should immediately take a full database backup. After shrinking the log file, you should manually grow it again using ALTER DATABASE so that it doesn't suffer repeated auto-grows and lead to performance issues with the log (see Kimberly's post here).
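Putting option #1 together with the shrink and regrow, a sketch might look like this. The database name, logical log file name, backup path, and sizes are all hypothetical - pick a final size that suits your workload.

```sql
-- Option #1: take a log backup so the log can clear
BACKUP LOG MyDatabase TO DISK = N'D:\Backups\MyDatabase_log.trn';
GO

USE MyDatabase;
GO

-- Find the logical name and current size of the log file (size is in 8KB pages)
SELECT name, type_desc, size / 128 AS size_mb
FROM sys.database_files;
GO

-- Shrink the log file as far as it will go (the target size is in MB)
DBCC SHRINKFILE (N'MyDatabase_log', 1);
GO

-- Grow it back to a sensible size in one step to avoid repeated auto-grows
ALTER DATABASE MyDatabase
MODIFY FILE (NAME = N'MyDatabase_log', SIZE = 8GB);
GO
```

If the shrink doesn't get the file as small as you expect, the active portion of the log is probably at the end of the file - take another log backup and try the shrink again.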

This topic is really a two-hour lecture in the database maintenance class I teach, but at least here (and with the blog post links) I've given you a bit of insight into why it can happen and how to solve the problem.

I've had a few follow-ups on my two posts about boot page and file header page corruption - asking if it's possible to do single-page restore operations for these pages. Let's try:

CREATE DATABASE BootPageTest;
GO

-- Single page restore is only possible using the FULL recovery model
ALTER DATABASE BootPageTest SET RECOVERY FULL;
GO

BACKUP DATABASE BootPageTest TO DISK = 'C:\sqlskills\BootPageTest.bck';
GO
BACKUP LOG BootPageTest TO DISK = 'C:\sqlskills\BootPageTest.trn';
GO

RESTORE DATABASE BootPageTest PAGE = '1:9' FROM DISK = 'C:\sqlskills\BootPageTest.bck';
GO

Msg 3111, Level 16, State 1, Line 2
Page (1:9) is a control page which cannot be restored in isolation. To repair this page, the entire file must be restored.
Msg 3013, Level 16, State 1, Line 2
RESTORE DATABASE is terminating abnormally.

The answer is no. The following page types cannot be restored using single-page restore:

  • File header pages (see here)
  • Boot page (see here)
  • GAM, SGAM, DIFF map, ML map pages (see here)
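For comparison, here's a sketch of what a single-page restore looks like for an ordinary data page, reusing the backups from the test above. The page ID 1:143 is a made-up example of a damaged page, and the sequence assumes an online page restore on Enterprise Edition.

```sql
-- Restore just the damaged page (hypothetical page ID) from the full backup
RESTORE DATABASE BootPageTest PAGE = '1:143'
   FROM DISK = 'C:\sqlskills\BootPageTest.bck'
   WITH NORECOVERY;
GO

-- Apply the existing log backup(s)
RESTORE LOG BootPageTest FROM DISK = 'C:\sqlskills\BootPageTest.trn' WITH NORECOVERY;
GO

-- Back up the current tail of the log and apply that too,
-- so the restored page is brought right up to date
BACKUP LOG BootPageTest TO DISK = 'C:\sqlskills\BootPageTest_tail.trn';
GO
RESTORE LOG BootPageTest FROM DISK = 'C:\sqlskills\BootPageTest_tail.trn' WITH NORECOVERY;
GO

-- Finish the restore sequence
RESTORE DATABASE BootPageTest WITH RECOVERY;
GO
```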

As you may already know, instant initialization is a way to prevent data (not log) file create and grow operations having to zero-initialize the new space before allowing it to be used. This can vastly speed up these operations as zero-initialization can take a *long* time for large files. This is especially useful in disaster recovery operations as the first phase of a restore is always to create the requisite files, if they don't already exist. Cutting minutes or even hours from this phase can significantly reduce downtime. It's available on XP SP2 and Windows Server 2003 and above. You can get more details from a blog post of Kimberly's from March 2007.

The way to enable it is to give the SQL Server service account the 'Perform volume maintenance tasks' privilege and then restart the service. There's no way to enable or disable it from within SQL Server, and until now, no way I've known of to tell whether it's enabled from within SQL Server. I was teaching a Microsoft-internal class on Database Maintenance last week and one of the students came up with a neat way to tell - using xp_cmdshell to execute the whoami /priv command, which lists all the privileges that the SQL Server service account has.

The whoami command is available on Windows Server machines but for XP you need to download the support tools from Microsoft to get it to work, as I did on my laptop. You can get them here (5MB download) and you need to stop/start SQL Server on XP after installing them so it picks up the new tools path. Now, most people will be running with xp_cmdshell turned off, because of the security risks involved with enabling it, so here's a script that turns it on, checks the privileges, and turns it off again.

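-- xp_cmdshell is an advanced option, so advanced options must be visible first
EXEC sp_configure 'show advanced options', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO

EXEC sp_configure 'xp_cmdshell', 1;
GO
RECONFIGURE WITH OVERRIDE;
GO

-- Capture the output of whoami /priv, one row per line of output
CREATE TABLE #xp_cmdshell_output (Output VARCHAR (8000));
GO

INSERT INTO #xp_cmdshell_output EXEC ('xp_cmdshell ''whoami /priv''');
GO

-- SeManageVolumePrivilege is 'Perform volume maintenance tasks'
IF EXISTS (SELECT * FROM #xp_cmdshell_output WHERE Output LIKE '%SeManageVolumePrivilege%')
   PRINT 'Instant Initialization enabled'
ELSE
   PRINT 'Instant Initialization disabled';
GO

DROP TABLE #xp_cmdshell_output;
GO

EXEC sp_configure 'xp_cmdshell', 0;
GO
RECONFIGURE WITH OVERRIDE;
GO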

Enjoy!

Well, we're just back from vacation (photo blog post to follow) and I've heard that the feature article on Effective Database Maintenance I wrote for the August issue of TechNet Magazine is live on the web. It also includes a 5 minute long screencast I recorded where I demo the effect of database shrink on index fragmentation.

You can get to the article at http://technet.microsoft.com/en-us/magazine/cc671165.aspx. The topics covered are:

  • Managing data and transaction log files
  • Eliminating index fragmentation
  • Ensuring accurate, up-to-date statistics
  • Detecting corrupted database pages
  • Establishing an effective backup strategy

It's written at around the 200-300 level and presents a good overview (well, at least I think so :-)) of the concepts involved.

Also, the August SQL Q&A column is available at http://technet.microsoft.com/en-us/magazine/cc671180(TechNet.10).aspx. This month's topics on the web (more in the print magazine) are:

  • Database version changes with upgrades
  • Benefits of partitioning
  • Consistency checking options for VLDBs

Enjoy!

Over the last few weeks I've seen (and helped correct) quite a few myths and misconceptions about index rebuild operations. There's enough now to make it worthwhile doing a blog post (and it's too hot here in Orlando for us to go sit by the pool so we're both sitting here blogging)...

Myth 1:  index rebuild pre-allocates the necessary space

This myth has two variations:

  1. The space for the new copy of the index is pre-allocated
  2. The space for the sort portion of the rebuild is pre-allocated

Neither of these is true. Index rebuild (whether online or offline, and at least as far back as 7.0) will create a new copy of the index before dropping the old copy. The pages and extents required to do this will always be allocated as needed, as with any other operation in SQL Server. The sort phase of an index rebuild, if required (in certain cases it is skipped in 2005), will adhere to the same allocation behavior.

Myth 2: indexes are rebuilt within a single file in a multi-file filegroup

This is a new one that I just heard yesterday - (paraphrasing) "In a two-file filegroup, an index in file 1 will be rebuilt into file 2. The next time it is rebuilt, it will be built in file 1. And so on".

This is untrue. Any time any allocations are done in a multi-file filegroup, the allocations are spread amongst all the files using the allocation system's proportional fill algorithm. In a nutshell, this says that space will be allocated more frequently from larger files with more free space than from smaller files with less free space. There is no concept in SQL Server of limiting allocations to a particular file in a multi-file filegroup.

Myth 3: non-clustered indexes are always rebuilt when a clustered index is rebuilt

This is untrue. The rules are a little complex here but can be summed up as follows:

  • In 2005+, rebuilding a unique or non-unique clustered index (without changing its definition) will NOT rebuild the non-clustered indexes
  • In 2000:
    • Rebuilding a non-unique clustered index WILL rebuild the non-clustered indexes
    • Rebuilding a unique clustered index will NOT rebuild the non-clustered indexes

The first few service packs of 2000 had bugs that changed the behavior of rebuilding unique clustered indexes back and forth - this is the source of much of the confusion around this myth.

For a much more detailed discussion of this, see my blog post from last Fall - Indexes From Every Angle: What happens to non-clustered indexes when the table structure is changed?.

Myth 4: BULK_LOGGED recovery mode decreases the size of the transaction log and log backups for an index rebuild

This myth is partly true.

Switching to the BULK_LOGGED recovery mode while doing an index rebuild operation WILL reduce the amount of transaction log generated, which is very useful for limiting the size of the transaction log file (note I say 'file', not 'files' - you only need one log file).

Switching to the BULK_LOGGED recovery mode while doing an index rebuild will NOT reduce the size of the transaction log BACKUP. Although the operation will be minimally-logged, the next transaction log backup will read all the transaction log since the last backup plus all the extents that were changed by the minimally-logged index rebuild. This will result in a log backup that's almost exactly the same size as for a fully-logged index rebuild. The ONLY time a log backup will contain data extents is when a minimally-logged operation has taken place since the last log backup - see here on MSDN for more info.

If you're considering using the BULK_LOGGED recovery mode, beware that you lose the ability to do point-in-time recovery to ANY point covered by a transaction log backup that contains even a single minimally-logged operation. Make sure that there's nothing else happening in the database that you may need to effectively roll-back with P.I.T. recovery. The operations you should perform if you're going to do this are:

  • In FULL recovery mode, take a log backup immediately before switching to BULK_LOGGED
  • Switch to BULK_LOGGED and do the index rebuild
  • Switch back to FULL and immediately take a log backup

This limits the time period in which you can't do P.I.T. recovery.
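The three steps above can be sketched roughly as follows. The database name (SalesDB), the table name (dbo.Orders) and the backup paths are all hypothetical, so substitute your own.

```sql
-- Hypothetical names: database SalesDB, table dbo.Orders, and the backup paths.

-- 1) In FULL, take a log backup immediately before switching
BACKUP LOG SalesDB TO DISK = N'C:\SQLskills\SalesDB_log_before.trn' WITH INIT;
GO

-- 2) Switch to BULK_LOGGED and do the index rebuild
ALTER DATABASE SalesDB SET RECOVERY BULK_LOGGED;
GO
USE SalesDB;
GO
ALTER INDEX ALL ON dbo.Orders REBUILD;
GO

-- 3) Switch back to FULL and immediately take a log backup
ALTER DATABASE SalesDB SET RECOVERY FULL;
GO
BACKUP LOG SalesDB TO DISK = N'C:\SQLskills\SalesDB_log_after.trn' WITH INIT;
GO
```

Only the second log backup contains the minimally-logged operation, so that's the only one you can't do P.I.T. recovery into.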

Myth 5: online index rebuild doesn't take any locks

This myth is untrue. The 'online' in 'online index operations' is a bit of a misnomer. Online index operations need to take two very short-term table locks: an S (Shared) table lock at the start of the operation to force all write plans that could touch the index to recompile, and a SCH-M (Schema-Modification - think of it as an Exclusive) table lock at the end of the operation to force all read and write plans that could touch the index to recompile.

The most recent time this came up on the forums was someone noticing insert queries timing out after an online index rebuild operation had just started. The problem is that the table lock that online index rebuild needs has to be entered into the grant queue in the lock manager until it can be acquired - and it will stay there until existing transactions that are holding conflicting locks either commit or roll back. Any transaction that requires a conflicting lock AFTER the index rebuild lock has been queued but not acquired (and then released) will wait behind it in the lock grant queue. If the query timeout is reached before the transaction can get its lock, the query will time out.

This is still much better than the table lock being held for the entire duration of the index rebuild operation. For more info, check out this whitepaper on Online Index Operations in SQL Server 2005.
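If you want to see the queued table lock for yourself, something like the following sketch should do it. It assumes a hypothetical database called SalesDB and that you run it from a separate connection while the online rebuild is starting or finishing.

```sql
-- Hypothetical database name SalesDB. Shows object-level lock requests,
-- including any that are waiting in the lock grant queue.
SELECT
   [request_session_id],
   [resource_associated_entity_id] AS [object_id],
   [request_mode],      -- e.g. S or Sch-M for the rebuild's table locks
   [request_status]     -- GRANT, WAIT or CONVERT
FROM sys.dm_tran_locks
WHERE [resource_type] = N'OBJECT'
   AND [resource_database_id] = DB_ID (N'SalesDB')
ORDER BY [request_session_id];
GO
```

A row with a request_status of WAIT for the rebuild's session, with other sessions' WAIT rows behind it, is the situation described above.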

Over the weekend there was a question on one of the internal aliases at MS: how can I tell what percentage of a database has changed since the last full backup, so I can choose between a differential or full backup?

No such code exists as far as I know - until now! I happened to read the thread while sitting in the airport in Washington D.C. on the way back from Iceland so I started playing around and this morning I completed the code.

The code below creates a function and a stored procedure. The basic idea behind the code is as follows:

For each online data file in the database
   For each GAM interval in the file
      Crack the DIFF map page using DBCC PAGE
      Interpret the DIFF bitmap to aggregate the changed extents
      Add the sum to the total changed extents for the database
   End
End
Report results
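To get a feel for the "crack the DIFF map page" step, you can run the same DBCC PAGE command by hand that the procedure builds dynamically. This is just an illustration using msdb and the first GAM interval of file 1. DBCC PAGE is undocumented, so treat the output format as something that could change.

```sql
-- Page 6 of file 1 is the DIFF map page for the first GAM interval.
-- WITH TABLERESULTS returns the output as a rowset; the rows of interest
-- have a ParentObject starting with DIFF_MAP and a VALUE of CHANGED.
DBCC PAGE (msdb, 1, 6, 3) WITH TABLERESULTS, NO_INFOMSGS;
GO
```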

There's a function that I create in msdb called SQLskillsConvertToExtents that cracks some of the DBCC PAGE output, and the main procedure is called sp_SQLskillsDIFForFULL and is created as a system object in master. I tried making it a table-valued function but you can't do things like INSERT-EXEC in a function, and that's required for processing the DBCC PAGE output. So - create your own wrapper function or whatever to use it. The interface/output is:

EXEC sp_SQLskillsDIFForFULL 'msdb';
GO

Total Extents Changed Extents Percentage Changed
------------- --------------- ----------------------
102           56              54.9

I've tested it with databases with multiple files and with single files up to around 700GB. There's the potential for an issue with file sizes of 4TB and beyond, where PFS intervals and GAM intervals map to the same extent every 4TB or so, but I think it'll be ok and the position of the DIFF map in the extent won't change (if someone could test it with a 4+TB file I'd be grateful). It's been tested on SQL Server 2005 and 2008. It will not work on SQL Server 2000 - I'll do a 2000 version sometime soon.

Note that after doing a full backup you will never see Changed Extents equal to zero. It will always be 4 + (number of online data files - 1), and around 20 or so for msdb. This is because the extent containing the file header in each file is always marked as changed, as are three extents in the primary file containing the roots of some critical system tables.
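As a quick sanity check, the expected minimum for a user database can be worked out from the number of online data files. This is a small sketch using a hypothetical database name (SalesDB); it simply applies the 4 + (number of online data files - 1) rule from above.

```sql
-- Hypothetical database name SalesDB.
SELECT 4 + (COUNT (*) - 1) AS [Expected Minimum Changed Extents]
FROM master.sys.master_files
WHERE [type_desc] = 'ROWS'
   AND [state_desc] = 'ONLINE'
   AND [database_id] = DB_ID (N'SalesDB');
GO
```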

Anyway - here it is. You can download it in a zip file from SQLskillsDIFForFULL.zip (2.65KB). Enjoy!

/*============================================================================
   File: SQLskillsDIFForFULL.sql

   Summary: This script creates a system-wide SP sp_SQLskillsDIFForFULL that
            works out what percentage of a database has changed since the
            previous full database backup.

   Date: April 2008

   SQL Server Versions:
         10.0.1300.13 (SS2008 February CTP - CTP-6)
         9.00.3054.00 (SS2005 SP2)
------------------------------------------------------------------------------
   Copyright (C) 2008 Paul S. Randal, SQLskills.com
   All rights reserved.

   For more scripts and sample code, check out
      http://www.sqlskills.com/

   You may alter this code for your own *non-commercial* purposes. You may
   republish altered code as long as you give due credit.

   THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF
   ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED
   TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
   PARTICULAR PURPOSE.

============================================================================*/

-- Create the function in MSDB
--
USE msdb;
GO

IF EXISTS (SELECT * FROM sys.objects WHERE NAME = 'SQLskillsConvertToExtents')
   DROP FUNCTION SQLskillsConvertToExtents;
GO

-- This function cracks the output from a DBCC PAGE dump
-- of an allocation bitmap. It takes a string in the form
-- "(1:8) - (1:16)" or "(1:8) -" and returns the number
-- of extents represented by the string. The first example
-- is 2 extents (those starting at pages 8 and 16) and the
-- second example is 1 extent.
--

CREATE FUNCTION SQLskillsConvertToExtents (
   @extents VARCHAR (100))
RETURNS INTEGER
AS
BEGIN
   DECLARE @extentTotal   INT;
   DECLARE @colon         INT;
   DECLARE @firstExtent   INT;
   DECLARE @secondExtent  INT;

   SET @extentTotal = 0;
   SET @colon = CHARINDEX (':', @extents);

   -- Check for the single extent case
   --
   IF (CHARINDEX (':', @extents, @colon + 1) = 0)
      SET @extentTotal = 1;
   ELSE
      -- We're in the multi-extent case
      --
      BEGIN
      SET @firstExtent = CONVERT (INT,
         SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
      SET @colon = CHARINDEX (':', @extents, @colon + 1);
      SET @secondExtent = CONVERT (INT,
         SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
      SET @extentTotal = (@secondExtent - @firstExtent) / 8 + 1;
   END

   RETURN @extentTotal;
END;
GO

USE master;
GO

IF OBJECT_ID ('sp_SQLskillsDIFForFULL') IS NOT NULL
   DROP PROCEDURE sp_SQLskillsDIFForFULL;
GO

-- This SP cracks all differential bitmap pages for all online
-- data files in a database. It creates a sum of changed extents
-- and reports it as follows (example small msdb):
--
-- EXEC sp_SQLskillsDIFForFULL 'msdb';
-- GO
--
-- Total Extents Changed Extents Percentage Changed
-- ------------- --------------- ----------------------
-- 102           56              54.9
--
-- Note that after a full backup you will always see some extents
-- marked as changed. The number will be 4 + (number of data files - 1).
-- These extents contain the file headers of each file plus the
-- roots of some of the critical system tables in file 1.
-- The number for msdb may be around 20.
--
CREATE PROCEDURE sp_SQLskillsDIFForFULL (
   @dbName VARCHAR (128))
AS
BEGIN
   SET NOCOUNT ON;

   -- Create the temp table
   --
   IF EXISTS (SELECT * FROM msdb.sys.objects WHERE NAME = 'SQLskillsDBCCPage')
      DROP TABLE msdb.dbo.SQLskillsDBCCPage;

   CREATE TABLE msdb.dbo.SQLskillsDBCCPage (
      [ParentObject] VARCHAR (100),
      [Object]       VARCHAR (100),
      [Field]        VARCHAR (100),
      [VALUE]        VARCHAR (100));

   DECLARE @fileID         INT;
   DECLARE @fileSizePages  INT;
   DECLARE @extentID       INT;
   DECLARE @pageID         INT;
   DECLARE @DIFFTotal      INT;
   DECLARE @sizeTotal      INT;
   DECLARE @total          INT;
   DECLARE @dbccPageString VARCHAR (200);

   SELECT @DIFFTotal = 0;
   SELECT @sizeTotal = 0;

   -- Setup a cursor for all online data files in the database
   --
   DECLARE files CURSOR FOR
      SELECT [file_id], [size] FROM master.sys.master_files
      WHERE [type_desc] = 'ROWS'
      AND [state_desc] = 'ONLINE'
      AND [database_id] = DB_ID (@dbName);

   OPEN files;

   FETCH NEXT FROM files INTO @fileID, @fileSizePages;

   WHILE @@FETCH_STATUS = 0
   BEGIN
      SELECT @extentID = 0;

      -- The size returned from master.sys.master_files is in
      -- pages - we need to convert to extents
      --
      SELECT @sizeTotal = @sizeTotal + @fileSizePages / 8;

      WHILE (@extentID < @fileSizePages)
      BEGIN
         -- There may be an issue with the DIFF map page position
         -- on the four extents where PFS pages and GAM pages live
         -- (at page IDs 516855552, 1033711104, 1550566656, 2067422208)
         -- but I think we'll be ok.
         -- PFS pages are every 8088 pages (page 1, 8088, 16176, etc)
         -- GAM extents are every 511232 pages
         --
         SELECT @pageID = @extentID + 6;

         -- Build the dynamic SQL
         --
         SELECT @dbccPageString = 'DBCC PAGE ('
            + @dbName + ', '
            + CAST (@fileID AS VARCHAR) + ', '
            + CAST (@pageID AS VARCHAR) + ', 3) WITH TABLERESULTS, NO_INFOMSGS';

         -- Empty out the temp table and insert into it again
         --
         DELETE FROM msdb.dbo.SQLskillsDBCCPage;
         INSERT INTO msdb.dbo.SQLskillsDBCCPage EXEC (@dbccPageString);

         -- Aggregate all the changed extents using the function
         --
         SELECT @total = SUM ([msdb].[dbo].[SQLskillsConvertToExtents] ([Field]))
         FROM msdb.dbo.SQLskillsDBCCPage
            WHERE [VALUE] = '    CHANGED'
            AND [ParentObject] LIKE 'DIFF_MAP%';

         -- SUM returns NULL if no extents in this GAM interval are
         -- marked as changed, so treat that case as zero
         --
         SET @DIFFTotal = @DIFFTotal + ISNULL (@total, 0);

         -- Move to the next GAM extent
         SET @extentID = @extentID + 511232;
      END

      FETCH NEXT FROM files INTO @fileID, @fileSizePages;
   END;

   -- Clean up
   --
   DROP TABLE msdb.dbo.SQLskillsDBCCPage;
   CLOSE files;
   DEALLOCATE files;

   -- Output the results
   --
   SELECT
      @sizeTotal AS [Total Extents],
      @DIFFTotal AS [Changed Extents],
      ROUND (
         (CONVERT (FLOAT, @DIFFTotal) /
         CONVERT (FLOAT, @sizeTotal)) * 100, 2) AS [Percentage Changed];
END;
GO

-- Mark the SP as a system object
--
EXEC sys.sp_MS_marksystemobject sp_SQLskillsDIFForFULL;
GO

-- Test to make sure everything was setup correctly
--
EXEC sp_SQLskillsDIFForFULL 'msdb';
GO

Here's an interesting question that came in to our questions line (questions@SQLskills.com - no guarantee of an answer - I check it every so often):

I have seen demonstrations where a large database being broken down into smaller ones using synonym names. I think it was used on a data warehouse and allowed smaller database backups instead of doing one large one and simulating file group backups. Is there somebody who has worked with this variation and can identify when this would be an advantage over file group backups if there is even an advantage.

My answer will always be to keep the VLDB (Very Large DataBase) as a single unit and go with filegroups if you need to. Breaking the VLDB into smaller databases has some serious issues:

  • Queries become more complicated as they're now potentially cross-database. This means you need to keep all the security settings in all the databases synchronized.
  • Referential integrity becomes a big problem as you can't create foreign key constraints across databases
  • You have multiple transaction logs to manage instead of one. This means you need to be doing log backups of ALL the databases, vastly increasing the number of backup files to manage.
  • Point-in-time recovery becomes very hard as you have to restore ALL the databases to a single point-in-time. Now, this may not be too much of a problem if the data in the VLDB is essentially read-only, and gets updated en-masse every so often from your OLTP system - but for changing data it's a nightmare.
  • Implementing a high-availability solution becomes very challenging. As soon as you start to think of multiple databases that need to be in sync, you can pretty much forget about log shipping and database mirroring. You're going to need whole-instance failure protection - which means failover clustering. Then if you want to mitigate the single-point-of-failure in a failover cluster (the shared disks), you're going to need SAN replication to a remote failover cluster too - expensive!!!

These are just the ones that spring to mind in 5 minutes - I'm sure there are more if I sat and thought about it longer (e.g. how to create a database snapshot, run a consistency check, ...)

So - IMHO it's always going to be easier to back up and restore a single VLDB split into filegroups than a VLDB split into multiple databases.
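For comparison, here's roughly what the filegroup approach looks like. This is a minimal sketch with hypothetical names (database SalesDW, filegroup FG2008, and the backup paths). The point is simply that there's still one database and one transaction log to manage.

```sql
-- Hypothetical names: database SalesDW, filegroup FG2008, and the backup paths.

-- Back up just one filegroup of the VLDB
BACKUP DATABASE SalesDW
   FILEGROUP = N'FG2008'
TO DISK = N'C:\SQLskills\SalesDW_FG2008.bck'
WITH INIT, STATS;
GO

-- A single log backup chain covers every filegroup in the database
BACKUP LOG SalesDW TO DISK = N'C:\SQLskills\SalesDW_log.trn' WITH INIT;
GO
```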

PS If there's something you'd like to see me do a blog post on, shoot me an email here.

Here's a question that came in - what changed in SQL Server 2005 that allows concurrent log and full backups?

First a little background, in case you didn't know about the change in behavior. In SQL Server 2000, a concurrent log backup with either a full or diff backup (I'll just say 'full' from now on but take it to mean 'full or diff') was not permitted. The reason is that a log backup would clear the inactive portion of the log once it's been backed up, but a full backup may still need some of that log so it can't be cleared (see this post and this post for an explanation). The simple route was taken of disallowing concurrent log backups with fulls.

In SQL Server 2005, the restriction was lifted, but there's a subtle twist. You can do concurrent log backups with fulls BUT the log is not cleared when the log backup ends. The clearing of the inactive portion of the log is delayed until the full backup completes. This could cause you to have disk space problems if your log generation rate is huge and you're relying on very frequent backups to manage the log size.
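If you suspect this is happening, one way to check is to look at what's holding up log clearing while the full backup is still running. The sketch below uses a hypothetical database name (SalesDB). I'd expect the wait description to show ACTIVE_BACKUP_OR_RESTORE in this situation, but check it on your own system.

```sql
-- Hypothetical database name SalesDB. Run while the full backup is in progress.
SELECT [name], [recovery_model_desc], [log_reuse_wait_desc]
FROM sys.databases
WHERE [name] = N'SalesDB';
GO

-- Shows the log size and percentage used for every database on the instance
DBCC SQLPERF (LOGSPACE);
GO
```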

So - what changed that allowed the SS2000 restriction to be lifted? Nothing - just the code was changed to delay the log clearing and allow the concurrent backups.

Pretty cool change - but watch out for the twist.

Phew - last week Kimberly and I spent 3 days teaching the ins-and-outs of SQL Server 2008 for DBAs/IT-Pros to about 130 Microsoft SQL Server experts and MVPs (like Kalen Delaney, Adam Machanic and Ron Talmage). This was the (95% complete) Beta delivery of a course we've been developing for the last six months for Microsoft that they'll use to train their SQL experts around the world on the new release. It's been very interesting watching the features develop through the CTPs (especially since I left the fold last August) - and making demos work on pre-release builds of the CTPs.

Teaching the course was a *blast* - the thing I love about teaching a really geeky crowd is the plethora of great questions and opportunities for going deep with explanations. Our team actually wrote and delivered the concurrently presented Developer and BI tracks as well. As you can see from the list below (and this is just the features a DBA needs to use/know about), SQL Server 2008 isn't a dot release of Yukon at all, as some people have suggested. Over the three days we covered:

  • Database Mirroring (D)
  • Backup Compression
  • Peer-to-Peer Replication (D)
  • Transparent Data Encryption (D)
  • Extensible (Off-Box) Key Management
  • All Actions Audited (D)
  • Policy-Based Management
  • Resource Governor (D)
  • Extended Events (D)
  • Spatial Indexes
  • Integrated Full-Text Search
  • Sparse Columns (D)
  • Filtered Indexes
  • Change Tracking
  • Change Data Capture (D)
  • FILESTREAM (D)
  • Performance Data Collection
  • Query Optimizer Enhancements
  • Data Compression (D)
  • Service Broker
  • Partition-Level Lock Escalation (D)

The features marked with a (D) are ones I demo'd during the course (Kimberly demo'd a bunch of the others - especially the tools features). Some of the demos were challenging to make work in time as we only got a pre-CTP6 build mid-January just before we headed off to China.

So why am I posting this? Well, a bunch of these features are in CTP-6, which should be just around the corner, and I have some easy-to-understand demos of them that I'll be posting here over the next month or so. Also, if this course sounds interesting, Kimberly and I will be teaching it in various configurations over the next year - starting with SQL Connections in April, a soon-to-be-announced class in Iceland in March, and the ITPro portion of TechEd in June.

Watch this space starting next week (today's the last day of six straight weeks of teaching for us so this weekend's a break :-))

This post is based on one from my old MSDN blog but the topic has come up a few times in recent days so I want to revamp it and re-post.

There are two things that confuse people about mirrored backups - can you mix-n-match backup devices from the mirrors, and what exactly do the various sizes mean?

1) Single-device backup, no mirror

The code below creates a single-device backup with no mirror, and then examines it.

BACKUP DATABASE AdventureWorks TO
DISK = N'C:\SQLskills\mediaset1device1.bck'
WITH FORMAT, STATS;
GO

RESTORE HEADERONLY FROM DISK = N'C:\SQLskills\mediaset1device1.bck';
GO

The BackupSize in the HEADERONLY output is 168,899,072 bytes and the on-disk size of the file mediaset1device1.bck is 161MB.

2) Single-device backup, mirrored

The code below creates a single-device backup with a mirror, and then examines it.

BACKUP DATABASE AdventureWorks TO
DISK = N'C:\SQLskills\mediaset1device1.bck'
MIRROR TO DISK = N'C:\SQLskills\mediaset2device1.bck'
WITH FORMAT, STATS;
GO

RESTORE HEADERONLY FROM DISK = N'C:\SQLskills\mediaset1device1.bck';
RESTORE HEADERONLY FROM DISK = N'C:\SQLskills\mediaset2device1.bck';
GO

The BackupSize in the HEADERONLY output of both files is 337,798,144 bytes. This is double the size of the backup in case #1 above - and it's because there are now two copies of the backup. The on-disk size of both files is 161MB, which is what we'd expect as mediaset2device1.bck is a copy of mediaset1device1.bck.

3) Two-device backup, no mirror

The code below creates a two-device backup with no mirror, and then examines it.

BACKUP DATABASE AdventureWorks TO
DISK = N'C:\SQLskills\mediaset1device1.bck',
DISK = N'C:\SQLskills\mediaset1device2.bck'
WITH FORMAT, STATS;
GO

RESTORE HEADERONLY FROM DISK = N'C:\SQLskills\mediaset1device1.bck';
GO

The BackupSize in the HEADERONLY output is 169,959,424 bytes. This is nearly exactly the same as for the single-device backup in case #1, but includes a bit more to account for the extra metadata in the second device. This time, the on-disk size of the file mediaset1device1.bck is 81MB. This is half of the on-disk size from the single-device case #1 as the backup is now split between the two files.

4) Two-device backup, mirrored

The code below creates a two-device backup with a mirror, and then examines it.

BACKUP DATABASE AdventureWorks TO
DISK = N'C:\SQLskills\mediaset1device1.bck',
DISK = N'C:\SQLskills\mediaset1device2.bck'
MIRROR TO DISK = N'C:\SQLskills\mediaset2device1.bck',
DISK = N'C:\SQLskills\mediaset2device2.bck'
WITH FORMAT, STATS;
GO

RESTORE HEADERONLY FROM DISK = N'C:\SQLskills\mediaset1device1.bck';
RESTORE HEADERONLY FROM DISK = N'C:\SQLskills\mediaset2device1.bck';
GO

The BackupSize in the HEADERONLY output of both files is 339,918,848 bytes - again, double the size of the non-mirrored backup in case #3. The on-disk size of each file is 81MB, as each file is one half of a copy of the backup.
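If you want to see how a given file fits into the media set, RESTORE LABELONLY reports the media header for a device. This is just a sketch against the files from case #4. As far as I know the output includes the family count, the family sequence number of the device, and a mirror count, but check the column names on your build.

```sql
-- Examine the media header of the first device in each mirror
RESTORE LABELONLY FROM DISK = N'C:\SQLskills\mediaset1device1.bck';
RESTORE LABELONLY FROM DISK = N'C:\SQLskills\mediaset2device1.bck';
GO
```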

Restoring

Now let's try to mix devices from the two backup media sets and see if it's possible:

RESTORE DATABASE AdventureWorks
FROM DISK = N'C:\SQLskills\mediaset1device1.bck',
DISK = N'C:\SQLskills\mediaset2device2.bck'
WITH REPLACE, STATS;
GO

And it works fine - excellent! That's the whole point of having mirrored backups.

One other question is whether backup device types can differ between media sets in the same backup. The answer to this is no - as documented in Books Online. All the backup devices involved in a single backup, regardless of whether they're part of a mirror media set or not, must be of the same type and have similar characteristics.

Hope this is useful.

In a previous post (see here) I debunked a myth about how much transaction log a full backup would include. I had a question in the blog post comments that asked (paraphrasing):

The full backup has to include all the transaction log from the begin LSN of the oldest active transaction at the time the data read portion of the backup ends, until the LSN at which the data read portion ends. If that begin LSN is later in time than the LSN of the checkpoint that backup does initially, why does the full backup need to include all the transaction log between the checkpoint and the begin LSN? What is it used for?

I replied in the comments with a quip that it would be easier to reply with a whiteboard and a timeline - so I got all enthusiastic and created a picture in PowerPoint to help explain better.

Consider the timeline in the picture above for a full backup (the red numbers match the list below):

  1. The backup operation takes a checkpoint to force all dirty pages in the buffer pool to disk - both those containing changes from transactions that have committed and those containing changes from transactions that are still in-flight. The backup operation then starts reading the allocated pages in the database.
  2. The read operation reads page X
  3. Transaction A starts
  4. Transaction A makes a change to page X. The copy in the backup is now out-of-date. Note that the backup will not read page X again - it's already passed that point in the database.
  5. Transaction B starts. It won't complete before the data read operation completes so its begin LSN is the oldest active transaction begin LSN.
  6. Transaction A commits. This commits the changes to page X.
  7. The backup data read operation completes and transaction log reading starts.

Now, the reason that the transaction log is read is so that the restore operation can recover the database so it is transactionally consistent as of the point in time when the read data operation completed.

If the transaction log was only included from the oldest active transaction begin LSN (point 5), then the copy of page X that was restored from the backup (read at point 2) would not be updated with the changes from transaction A (that happened at point 4). This means that it would not be transactionally consistent with the rest of the database as of the time the read data operation completed (point 7).

So, (ignoring replication) the minimum LSN of the transaction log that's included in the full backup is MIN (LSN of last checkpoint, LSN of oldest active transaction). This ensures that recovery can REDO log records to bring pages up-to-date and UNDO log records for transactions that had not committed.
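If you want to see the "oldest active transaction" half of that MIN on a live system, DBCC OPENTRAN will report it. This is a small sketch with a hypothetical database name (SalesDB). It returns nothing if there are no open transactions in the database.

```sql
-- Hypothetical database name SalesDB. Reports the oldest active transaction
-- in the database, including the LSN at which it began.
DBCC OPENTRAN (N'SalesDB') WITH TABLERESULTS, NO_INFOMSGS;
GO
```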

Much easier to explain with aid of a picture than without! :-)

Back in September last year I blogged about the native Backup Compression that is in SQL Server 2008 (see here) and promised to blog more when CTP-5 came out and I ran some tests. Well, it's here and I have, so here's some data for you.

I expanded the AdventureWorks database to be 322MB (random size, but big enough to get a decent-sized run-time on my server). I used System Monitor to capture %user-mode CPU time, plus backup/restore throughput, for a compressed and an uncompressed backup operation, and then for the restores.

1) For the uncompressed backup the average CPU was 5% (the green line at the bottom), the run-time was 39.5s, and, of course, it took 322MB to store the backup.

2) For the compressed backup the average CPU was way higher at 25%, BUT the run-time was 21.6s (a 45% improvement), and the backup was stored in 76.7MB (a 4.2x compression ratio). Very cool.

3) For the restore of the uncompressed backup the average CPU was 8%, and the run-time was 71.0s.

4) For the restore of the compressed backup the average CPU was 14.5%, and the run-time was 36s (a 50% improvement).

So - to summarize, turning on compression means more CPU and shorter run-times - just what was expected. Note that if you try this on your database you will see different results - the compression ratio and CPU usage are entirely dependent on the data being compressed.
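If you want to try it yourself, here's a minimal sketch for SQL Server 2008. The backup path is hypothetical. It takes a compressed backup and then reads the uncompressed and compressed sizes back from the backup history in msdb to work out the compression ratio.

```sql
-- Hypothetical backup path. WITH COMPRESSION requires SQL Server 2008.
BACKUP DATABASE AdventureWorks
TO DISK = N'C:\SQLskills\AdventureWorks_compressed.bck'
WITH COMPRESSION, INIT, STATS;
GO

-- Compare the sizes recorded for the most recent backup of the database
SELECT TOP (1)
   [backup_size],
   [compressed_backup_size],
   CONVERT (DECIMAL (10, 2),
      [backup_size] / NULLIF ([compressed_backup_size], 0)) AS [Compression Ratio]
FROM msdb.dbo.backupset
WHERE [database_name] = N'AdventureWorks'
ORDER BY [backup_finish_date] DESC;
GO
```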

In the mail today I received notice that my first software patent has been granted by the US Patent Office :-) (after being filed 3.5 years ago while I was still writing DBCC CHECKDB code!) It's basically a way to run DBCC CHECKDB on a database stored in a backup without actually having to restore the whole backup. This is really cool for people with VVVVLDBs as it means you don't need to restore the whole backup to verify that the database stored within it is valid. Anyway - I'm pretty pleased!! I hope the SQL team gets around to implementing it at some point in the future.

If you're interested, you can read it here (warning: some of the legalese is pretty dry...)

Next - some people have been 'complaining' that we haven't been posting recently - we took a break last week to grab some winter sun and some total downtime but we'll be back into blogging in between parties over the holidays.

Finally - to all of you who've followed my blog (the old MS one and my new one here), attended any of our conference sessions or workshops, sent us interesting questions, and just generally been part of the SQL community I love - THANKS!! I hope you and your families have a great Festive Season (whatever you celebrate) and a prosperous New Year!

Best wishes - Paul.

While we were in Barcelona we sat down with Richard Campbell and Greg Hughes from RunAs Radio to record a 1/2 hour interview on SQL Server 2008. We touch on a ton of different features (look at the number of Categories I've tagged this with!) and have a bunch of laughs along the way - check it out here.

PS There's been a ton of interest in the slide deck idea I had so we'll be going ahead with that. Look for an announcement sometime in the first few months of next year about how to get them. Thanks to everyone that replied!

There's a well known problem that every time a backup operation succeeds a message is written into the error log and Windows event log. If you're taking very frequent log backups (say every 5 minutes) of multiple databases, that's a significant amount of clutter in the logs. Well - now there's a fix!

Kevin Farlee, the Storage Engine PM responsible for (among many other things) BACKUP and RESTORE has just blogged about a trace flag - 3226 - that's been in the product since SQL Server 2000 that will suppress the success messages. He's planning to document this (and other) trace flags in this area starting in SQL Server 2008. Excellent!
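A minimal sketch of turning it on for a running instance is below. Bear in mind that a trace flag enabled this way doesn't survive a restart, so to make it permanent you'd add -T3226 as a startup parameter instead.

```sql
-- Enable trace flag 3226 globally (-1) to suppress backup success messages
DBCC TRACEON (3226, -1);
GO

-- Confirm that it is enabled
DBCC TRACESTATUS (3226, -1);
GO
```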

I've been involved in a few conversations today that have highlighted some big misconceptions about how backups work. I'd like to use this blog post to debunk them. I checked everything I say here with my friend Steve Schmidt, the developer on the Storage Engine team responsible for BACKUP/RESTORE for the last ten years.

Myth 1: A full database backup only contains the transaction log from the start of the backup to the end of the backup

When you restore a full database backup, you get a transactionally consistent database. Consider the case where there's an active transaction that doesn't commit until after the backup completes. If the backup only contained the log that occurred while the database was being backed up, how would it roll back the active transaction? It *has* to include enough transaction log to roll back the active transaction. The start LSN of the log included in a database backup is the minimum of:

  • LSN of the last checkpoint
  • LSN of the start of the oldest active transaction
  • LSN of the last replicated transaction

Let me prove it to you. I'm going to create a database, put it into FULL recovery mode, start a transaction, checkpoint, and then take a backup. The checkpoint ensures the page I've altered is flushed to disk.

CREATE DATABASE stopattest;
GO
ALTER DATABASE stopattest SET RECOVERY FULL;
GO
BACKUP DATABASE stopattest TO DISK = 'c:\sqlskills\stopattest.bck' WITH INIT;
GO
USE stopattest;
GO
CREATE TABLE t1 (c1 INT);
GO
BEGIN TRAN;
INSERT INTO t1 VALUES (1);
GO

Now in another connection I'll take another full database backup.

BACKUP DATABASE stopattest TO DISK = 'c:\sqlskills\stopattest.bck' WITH INIT;
GO

The msdb.dbo.backupset table will tell us the relevant LSNs in the backup (I added some spaces to delineate the parts of the LSN for clarity):

SELECT last_lsn, checkpoint_lsn, database_backup_lsn FROM msdb.dbo.backupset
WHERE database_name = 'stopattest';
GO

last_lsn              checkpoint_lsn        database_backup_lsn
--------------------- --------------------- ---------------------
21 0000000190 00001   21 0000000174 00037    21 0000000058 00037

[Edit: After swapping some email with Kalen Delaney, I realized that when I originally put this together I had more log records in the post and when I removed them I messed up the description of the (21:174:37) LSN - it's now corrected below.]

So you can see the checkpoint that begins the backup was at (21:174:37). The LSN of the first log record that the backup contains is (21:58:37), which is before the start of the backup. And the backup contains all the log from then until (21:190:1). Now let's look at the actual transaction log to see what these LSNs correspond to.

SELECT [Current LSN], Operation, [Transaction Name] FROM fn_dblog (null,null);

GO

Here's some selected output:

Current LSN              Operation        Transaction Name
------------------------ ---------------- ------------------
00000015:0000003a:0025   LOP_BEGIN_CKPT   NULL
                (this is the calculated minimum LSN the backup must contain (21:58:37) - which is (15:3a:25) in hex)
.
.
00000015:00000061:0001   LOP_BEGIN_XACT   user_transaction
                (here's my transaction starting - before the backup started but within the LSN range contained in the backup)
.
.
00000015:000000ab:0004   LOP_BEGIN_XACT   Backup:InvalidateDiffMaps
                (this is the backup clearing the differential bitmaps)
.
.
00000015:000000ae:0025  LOP_BEGIN_CKPT    NULL
                (this is the checkpoint that BACKUP does - matching the checkpoint LSN above)
.
.

So - this clearly shows that the backup contains more than just the log from the time the backup was running.
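If you want to pick the interesting records out of a busy log yourself, a filter on the Operation column narrows things down. Treat this as a sketch only: fn_dblog is undocumented, and the two operation names are simply the ones that appear in the selected output above.

```sql
USE stopattest;
GO

-- Show only the checkpoint-begin and transaction-begin log records.
SELECT [Current LSN], Operation, [Transaction Name]
FROM fn_dblog (NULL, NULL)
WHERE Operation IN ('LOP_BEGIN_CKPT', 'LOP_BEGIN_XACT');
GO
```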

Myth 2: It's possible to do a STOPAT with only a full database backup

This myth is that it's possible to use STOPAT with a full database backup to stop during the time the backup was being taken. The argument FOR this myth is that the backup contains the log for all the changes that happened while the backup was being taken, so it must be possible to stop at any point in time. Technically, that's correct, but in practice it's wrong - you cannot stop at a point while the backup was running, using only the database backup.

This one's more complicated to explain. Doing a STOPAT operation means getting the database to a state where operations later than the time or LSN specified in the STOPAT clause haven't affected the database yet. A database backup reads pages that may or may not have been changed while the backup was running. If they are changed, it could be at any point while the backup is running.

Consider the case where page X is changed at LSN (10:45:12), *just* before the backup completes, and is read by the backup at the time equivalent to LSN (10:45:13). The backup will contain the changed page image, plus the log record for the change. What if I want to stop at a point while the backup was running but *before* the change to page X, say at LSN (10:44:00)? The backup only contains the image of page X at LSN (10:45:12) - how can it be put back to the image at the time we want to stop at? The argument is that we have the log record for the change - can't SQL Server just undo it?

No. It won't even see it. STOPAT works by recovering the database up to the point that the STOPAT specified. If we ask to stop at LSN (10:44:00), then the log will only be read and recovered up to that point. However, because the database backup didn't read page X until LSN (10:45:13), it only has the image of it from when it was altered at (10:45:12). This clearly won't give a database image as of (10:44:00).

The only way to stop at a particular time/LSN is to have images of *all* database pages from before that time/LSN (i.e. from the *previous* database backup) and then restore all the transaction logs up to and including the time/LSN to stop at.

Hopefully that makes sense.
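To make that concrete, here is a rough sketch of what such a restore sequence could look like. The backup file names and the stop time are hypothetical, and it assumes log backups covering the stop point exist.

```sql
-- Hypothetical file names and stop time. The full backup must be one that
-- completed BEFORE the point in time you want to stop at.
RESTORE DATABASE stopattest FROM DISK = 'c:\sqlskills\stopattest_full.bck'
    WITH REPLACE, NORECOVERY;
GO

-- Apply the log backups in sequence...
RESTORE LOG stopattest FROM DISK = 'c:\sqlskills\stopattest_log1.bck'
    WITH NORECOVERY;
GO

-- ...and stop at the chosen time within the log backup that covers it.
RESTORE LOG stopattest FROM DISK = 'c:\sqlskills\stopattest_log2.bck'
    WITH STOPAT = '2007-10-14 10:44:00', RECOVERY;
GO
```

The key design point is that the page images come from the earlier full backup and everything after that is replayed from the log, so no page is ever "newer" than the stop point.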

(I've heard from many of you that the Comments feature of my blog isn't working. I know - there's an issue with our blog engine that we're fixing. My apologies - I'll post a quick note when it's fixed.)

After posting last week about a BACKUP feature that I don't like (WITH NO_LOG - see here), I thought I'd do a quick post this week about a feature that was introduced in SS2005 for BACKUP that I DO like - the COPY_ONLY option to BACKUP DATABASE and BACKUP LOG.

Here's a situation I've seen several times that really screws people up. A savvy DBA of a busy web-fronted sales business has a rigorous backup schedule set up - daily full backups at midnight and differential backups every 4 hours. Everything's working perfectly. One day a disaster strikes at 7pm and the storage for the database is destroyed. The DBA starts restoring the backups using WITH NORECOVERY, gets to the noon differential backup and gets the following message:

RESTORE DATABASE production FROM DISK = 'c:\sqlskills\production-diff12pm.bck' WITH NORECOVERY;

GO

Msg 3136, Level 16, State 1, Line 1
This differential backup cannot be restored because the database has not been restored to the correct earlier state.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.

Uh-oh. That's not good. That says that the 12pm differential backup does not have the same differential base as the 4am and 8am ones. How can that have happened? The DBA takes a look in the backup history tables in msdb:

SELECT name, backup_start_date, type, first_lsn, database_backup_lsn

FROM msdb.dbo.backupset WHERE database_name = 'production';

GO

name                           backup_start_date       type first_lsn            database_backup_lsn
------------------------------ ----------------------- ---- -------------------- --------------------
production Full 10/14/07       2007-10-14 00:00:00.000 D    88000000025300001    0
production Diff 4am 10/14/07   2007-10-14 04:00:00.000 I    118000000003000160   88000000025300001
production Diff 8am 10/14/07   2007-10-14 08:00:00.000 I    144000000070500160   88000000025300001
NULL                           2007-10-14 10:29:50.000 D    161000000056100147   88000000025300001
production Diff 12pm 10/14/07  2007-10-14 12:00:00.000 I    161000000062800034   161000000056100147
production Diff 4pm 10/14/07   2007-10-14 16:00:00.000 I    173000000054100144   161000000056100147

Aha! Look at the fourth row in the output, the one with a NULL name - someone took a full database backup of the database at 10:29am. The DBA checks and finds that one of the developers wanted a copy of the production database to play with, so took a database backup. He restored the backup and then deleted both it and the database. Looking at the database_backup_lsn field, we can see that all the backups up till the accidental backup have the differential base equal to the first_lsn of the full backup from midnight (88000000025300001). The two backups after that have the differential base equal to the first_lsn of the accidental full backup (161000000056100147).
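Rather than matching LSNs by eye, the backup history can be asked directly which full backup each differential is based on. This is a sketch under the assumption that the differential_base_guid and backup_set_uuid columns of msdb.dbo.backupset (present from SQL Server 2005) are populated for these backups; the column aliases are my own.

```sql
-- For each differential backup, find the full backup it is based on.
-- A NULL base_start_date means the base backup is not in msdb's history.
SELECT d.name AS diff_name,
       d.backup_start_date AS diff_start_date,
       f.name AS base_name,
       f.backup_start_date AS base_start_date
FROM msdb.dbo.backupset AS d
LEFT JOIN msdb.dbo.backupset AS f
    ON f.backup_set_uuid = d.differential_base_guid
WHERE d.database_name = 'production' AND d.type = 'I'
ORDER BY d.backup_start_date;
GO
```

Run routinely, a query like this would flag a changed differential base long before anyone has to find out during a restore.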

Oops! That means that the production database cannot be rolled forward any further than the 8am differential backup, the last one taken before the accidental full backup - losing 11 hours of data completely. Even though all the subsequent backups are intact, the full backup they're based on no longer exists so they're useless!

So how can a developer get a copy of the database without screwing up a recovery from a potential disaster? Using the new COPY_ONLY option. Taking a full backup with this option does not make the new backup a differential base - it does not clear any of the differential bitmaps and basically doesn't interfere with the regularly scheduled backups. Apart from that, it's a regular full backup of the database. One thing to bear in mind is that it's a one-off - you can't use one of these backups as a differential base, so you can't take COPY_ONLY differential backups. If you specify COPY_ONLY with DIFFERENTIAL, the COPY_ONLY option is ignored.

One other cool thing is that you can specify this option for a BACKUP LOG command too. This behaves the same way - it takes a log backup, but does not change the transaction log at all (i.e. it doesn't make any portion of the log inactive and permit log truncation). This is useful for doing online file restores without having the necessary backup of the tail of the log affect the log backup chain. More on that in a later post...
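As a sketch of both forms (the file names are hypothetical, and the BACKUP LOG variant assumes the database is in FULL or BULK_LOGGED recovery mode with a full backup already taken):

```sql
-- Full backup that does not become a differential base.
BACKUP DATABASE production TO DISK = 'c:\sqlskills\production-copy.bck'
    WITH COPY_ONLY, INIT;
GO

-- Log backup that leaves the log backup chain untouched.
BACKUP LOG production TO DISK = 'c:\sqlskills\production-log-copy.bck'
    WITH COPY_ONLY, INIT;
GO

-- Check which backups in the history were taken as copy-only.
SELECT name, backup_start_date, type, is_copy_only
FROM msdb.dbo.backupset WHERE database_name = 'production';
GO
```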

[Edit: In the initial version of this post, the DBA's backup strategy included log backups. I went through a couple of versions of this post before settling on full + diffs, but I forgot to remove the reference to log backups. In the first comment, Mark House correctly points out that an accidental full backup doesn't prevent a DBA with a complete log backup chain from recovering to any point in time. Apologies for the confusion!]

All the razzamatazz about new releases goes on about what's new but hardly ever is there discussion of what's been removed. So that's the topic of this short post.

In the Books Online that comes with the July CTP of SS2008 (here's a link to the download page), the topic Discontinued Database Engine Functionality in SQL Server 2008 lists the following:

Category            Discontinued feature                Replacement
------------------- ----------------------------------- -----------
Aliases             sp_addalias                         Replace aliases with a combination of user accounts and database roles. For more information, see CREATE USER (Transact-SQL) and CREATE ROLE (Transact-SQL). Remove aliases in upgraded databases by using sp_dropalias (Transact-SQL).
Backup and restore  DUMP statement                      BACKUP
Backup and restore  LOAD statement                      RESTORE
Backup and restore  BACKUP LOG WITH NO_LOG              None. The transaction log is automatically truncated when the database is using the simple recovery model. If you must remove the log backup chain from a database, switch to the simple recovery model.
Backup and restore  BACKUP LOG WITH TRUNCATE_ONLY       None. The transaction log is automatically truncated when the database is using the simple recovery model. If you must remove the log backup chain from a database, switch to the simple recovery model.
Backup and restore  BACKUP TRANSACTION                  BACKUP LOG
Backup and restore  sp_helpdevice                       Query the sys.backup_devices catalog view.
Compatibility level 60, 65, and 70 compatibility levels Databases must be set to at least compatibility level 80.
DBCC                DBCC CONCURRENCYVIOLATION           None
Groups              sp_addgroup                         Use roles.
Groups              sp_changegroup                      Use roles.
Groups              sp_dropgroup                        Use roles.
Groups              sp_helpgroup                        Use roles.
Sample databases    Northwind and pubs                  Use AdventureWorks. Northwind and pubs are available as downloads, or can be copied from a previous installation of SQL Server. For more information, see AdventureWorks Sample Databases.

Most of these aren't going to bother people much, I would guess, but removing BACKUP LOG WITH NO_LOG is going to cause problems. Don't get me wrong, I think it's great that it's being removed (see my previous post for why I think it's evil), but it's going to break a lot of people who rely on it now for managing their transaction logs instead of taking log backups or switching to SIMPLE recovery mode.

This has come up several times over the last few days, and is something that Kimberly and I discuss (and deplore) when we lecture - the use of BACKUP LOG WITH NO_LOG (or TRUNCATE_ONLY - they're synonymous) to allow log truncation.

How is it used?

The common use is when the transaction log grows to be inordinately large on a database that's in FULL (or BULK_LOGGED) recovery mode. This happens after a database backup has been taken, which switches the log into a mode where it won't truncate until it's been backed up. In these circumstances, if you don't take a transaction log backup, the log will continue to grow until it runs out of disk space.

The alternative to taking a real log backup is to issue a BACKUP LOG dbname WITH NO_LOG command. Let's see this in action - note that I'm not advocating its use but I want to show you what it does. First off I'll create some transaction log after taking a full database backup:

USE nologtest;
GO

BACKUP DATABASE nologtest TO DISK = 'c:\sqlskills\nologtest.bck' WITH INIT;
GO

CREATE TABLE t1 (c1 INT, c2 VARCHAR (8000))
GO

SET NOCOUNT ON
GO

DECLARE @a INT;
SELECT @a = 1;
WHILE (@a < 10000)
BEGIN
    INSERT INTO t1 VALUES (@a, REPLICATE ('a', 8000));
    SELECT @a = @a + 1;
END;
GO

How large is the transaction log now?

SELECT name, size FROM sys.database_files;
GO

name            size
--------------- --------
nologtest       90264
nologtest_log   104128

A little bit larger than the data file (which is what I'd expect after the operation I just performed) and they're both around 100MB. I'll pretend that I haven't been paying attention to the size of the database and log and now I don't have any space to perform a backup. Can I just shrink the log?

DBCC SHRINKFILE (nologtest_log, 2);
GO

Cannot shrink log file 2 (nologtest_log) because all logical log files are in use.
DbId   FileId      CurrentSize MinimumSize UsedPages   EstimatedPages
------ ----------- ----------- ----------- ----------- --------------
11     2           102944      63          102944      56

No. What's stopping me (well DBCC SHRINKFILE just told me, but let's double-check)?

SELECT log_reuse_wait_desc FROM sys.databases WHERE NAME = 'nologtest';
GO

LOG_BACKUP

Ok - so I can't perform a backup so I'll use BACKUP LOG WITH NO_LOG.

BACKUP LOG nologtest WITH NO_LOG;
GO

SELECT log_reuse_wait_desc FROM sys.databases WHERE NAME = 'nologtest';
GO

NOTHING

Now it looks like I can do the shrink:

DBCC SHRINKFILE (nologtest_log, 2);
GO

DbId   FileId      CurrentSize MinimumSize UsedPages   EstimatedPages
------ ----------- ----------- ----------- ----------- --------------
11     2           256         63          256         56

Success!

Hold on, are you sure?

Why is it bad?

Do you realize what just happened? We discarded the contents of the transaction log since the last full/differential/log backup. That means that until the next full or differential backup completes, all work since the last backup completed will be lost if the database is destroyed. So if your business relies on not losing data/work in the event of a disaster - then what I just did is sacrilege!

The whole point of FULL (and BULK_LOGGED) recovery modes is to preserve the transaction log so that the database can be recovered right up to the point of a failure, or to do point-in-time restores. Using BACKUP LOG WITH NO_LOG negates all of that. (For completeness, note that you can't do a point-in-time restore to any time in a transaction log backup containing a bulk-logged transaction).

What are the alternatives?

If you don't want the FULL recovery mode behavior, then don't use FULL recovery mode - switch to SIMPLE mode. But be aware that you lose the ability to restore to any point in between your full or differential backups.

If you want the FULL recovery mode behavior, but don't want to run out of log space - then the answer is again simple (pun intended :-), take log backups! If your log is growing too quickly, take more frequent log backups! If you don't have enough disk space to store the log backups, talk to your management about the options - buying more disks and continuing to use the FULL recovery options, or being forced to move to SIMPLE recovery mode.
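The two alternatives boil down to one statement each. This is a sketch using the nologtest database from earlier; the log backup file name is hypothetical, and you would pick one option or the other, not run both.

```sql
-- Option 1: stay in FULL recovery mode and back up the log regularly.
-- Without INIT, each log backup is appended to the same file.
BACKUP LOG nologtest TO DISK = 'c:\sqlskills\nologtest_log.bck';
GO

-- Option 2: give up point-in-time recovery and let the log manage itself.
ALTER DATABASE nologtest SET RECOVERY SIMPLE;
GO
```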

One thing to bear in mind - you actually have to monitor the size of your log to tell whether it's growing. That's what gets people into trouble in the first place - a combination of:

  • Not knowing that the database is in FULL recovery mode
  • Not tracking the size of the log

The problem comes for those people who just use an app and have no idea that the developer put the database (maybe for a web application log) into FULL recovery mode, or even that they have SQL Server on their system. For these people there is no good answer unfortunately apart from educating developers.
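Both of those blind spots can be checked with a couple of quick queries. This is a minimal sketch; how often you run it, and what thresholds you alert on, are up to you.

```sql
-- Which databases are NOT in SIMPLE recovery mode, and what (if anything)
-- is currently preventing their logs from being truncated?
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE recovery_model_desc <> 'SIMPLE';
GO

-- Log size and percentage of log space in use, for every database.
DBCC SQLPERF (LOGSPACE);
GO
```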

How to prevent it being used

If you're a sysadmin who wants to stop your database owners and backup operators from using the NO_LOG option, there is a way to do it. Trace flag 3231 in SS2000 and SS2005 will turn the NO_LOG and TRUNCATE_ONLY options into no-ops in FULL/BULK_LOGGED recovery mode, and will clear the log in SIMPLE recovery mode. Trace flag 3031 in SS2005 turns them into checkpoints in all recovery modes. These trace flags are undocumented but they're allowed to be publicized.

This way no-one can truncate the log unless you allow them to, and when they come to you to complain, you can educate them!
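As a sketch of how the trace flag could be turned on for a running instance (the flag number is the one given above; try it on a test system first, since the flag is undocumented):

```sql
-- Enable trace flag 3231 globally (-1) for the running instance.
DBCC TRACEON (3231, -1);
GO

-- List the trace flags that are currently enabled globally.
DBCC TRACESTATUS (-1);
GO
```

A flag enabled this way does not survive a restart of SQL Server; to make it permanent it would need to be added as a -T startup parameter.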

This is another question that came up on the Disaster Recovery forum on MSDN. Paraphrasing - 'I have a backup file containing full backups for 45 databases. How can I restore them all using a script?'

The answer is pretty straightforward. Let's create the situation described, using 3 databases for clarity rather than 45:

BACKUP DATABASE tinylogtest TO DISK = 'c:\sqlskills\mixedbackups.bck' WITH NAME = 'tinylogtest 10/09/07', INIT;

BACKUP DATABASE pagesplittest TO DISK = 'c:\sqlskills\mixedbackups.bck' WITH NAME = 'pagesplittest 10/09/07';

BACKUP DATABASE ghostrecordtest TO DISK = 'c:\sqlskills\mixedbackups.bck' WITH NAME = 'ghostrecordtest 10/09/07';

GO

You now can't do a regular restore from that backup file for any database except the first one in the file:

RESTORE DATABASE pagesplittest FROM DISK = 'c:\sqlskills\mixedbackups.bck' WITH REPLACE;

GO

Msg 1834, Level 16, State 1, Line 1

The file 'c:\tinylogtest\tinylogtest.mdf' cannot be overwritten. It is being used by database 'tinylogtest'.

Msg 3156, Level 16, State 4, Line 1

File 'fgt_mdf' cannot be restored to 'c:\tinylogtest\tinylogtest.mdf'. Use WITH MOVE to identify a valid location for the file.

Msg 1834, Level 16, State 1, Line 1

The file 'c:\tinylogtest\tinylogtest.ldf' cannot be overwritten. It is being used by database 'tinylogtest'.

Msg 3156, Level 16, State 4, Line 1

File 'fgt_log' cannot be restored to 'c:\tinylogtest\tinylogtest.ldf'. Use WITH MOVE to identify a valid location for the file.

Msg 3119, Level 16, State 1, Line 1

Problems were identified while planning for the RESTORE statement. Previous messages provide details.

Msg 3013, Level 16, State 1, Line 1

RESTORE DATABASE is terminating abnormally.

So how to restore the other databases apart from the first one? The answer is that you need to specify which of the backups within the file you're after. You can find the positions using the RESTORE HEADERONLY command:

RESTORE HEADERONLY FROM DISK = 'c:\sqlskills\mixedbackups.bck';

GO

This returns lots of information: one row for each backup in the file, with several dozen columns per row.

For our purposes, the interesting things to note are the Position, DatabaseName, and BackupTypeDescription. In our simple example we have 3 full backups. To restore the pagesplittest database as we tried to do above, we have to specify the position within the file of the backup we want to restore - just the database name isn't enough.

RESTORE DATABASE pagesplittest FROM DISK = 'c:\sqlskills\mixedbackups.bck' WITH REPLACE, FILE = 2;

GO

And it works. So the question asked how to do this, plus how to do it in a script. Below is a script I've adapted from the example I wrote for the Books Online for DBCC SHOWCONTIG back in 1999 when I rewrote DBCC SHOWCONTIG for SQL Server 2000.

Enjoy!

-- Create a temporary table to hold the output from RESTORE HEADERONLY

CREATE TABLE master.dbo.restoreheaderonly (

BackupName NVARCHAR (128), BackupDescription NVARCHAR (255), BackupType SMALLINT, ExpirationDate DATETIME,

Compressed TINYINT, Position SMALLINT, DeviceType TINYINT, UserName NVARCHAR (128), ServerName NVARCHAR (128),

DatabaseName NVARCHAR (128), DatabaseVersion INT, DatabaseCreationDate DATETIME, BackupSize NUMERIC (20, 0),

FirstLSN NUMERIC (25, 0), LastLSN NUMERIC (25,0), CheckpointLSN NUMERIC (25,0), DatabaseBackupLSN NUMERIC (25, 0),

BackupStartDate DATETIME, BackupFinishDate DATETIME, SortOrder SMALLINT, CodePage SMALLINT, UnicodeLocaleId INT,

UnicodeComparisonStyle INT, CompatibilityLevel TINYINT, SoftwareVendorId INT, SoftwareVersionMajor INT,

SoftwareVersionMinor INT, SoftwareVersionBuild INT, MachineName NVARCHAR (128), Flags INT, BindingID UNIQUEIDENTIFIER,

RecoveryForkID UNIQUEIDENTIFIER, Collation NVARCHAR (128), FamilyGUID UNIQUEIDENTIFIER, HasBulkLoggedData BIT,

IsSnapshot BIT, IsReadOnly BIT, IsSingleUser BIT, HasBackupChecksums BIT, IsDamaged BIT, BeginsLogChain BIT,

HasIncompleteMetaData BIT, IsForceOffline BIT, IsCopyOnly BIT, FirstRecoveryForkID UNIQUEIDENTIFIER,

ForkPointLSN NUMERIC (25, 0) NULL, RecoveryModel NVARCHAR (60), DifferentialBaseLSN NUMERIC (25, 0) NULL,

DifferentialBaseGUID UNIQUEIDENTIFIER, BackupTypeDescription NVARCHAR (60), BackupSetGUID UNIQUEIDENTIFIER NULL);

GO

 

-- Populate the table

INSERT INTO master.dbo.restoreheaderonly EXEC ('RESTORE HEADERONLY FROM DISK = ''C:\sqlskills\mixedbackups.bck''') ;

GO

 

DECLARE @Position SMALLINT;

DECLARE @DatabaseName NVARCHAR (128);

DECLARE @ExecString NVARCHAR (255);

 

-- Declare a cursor to iterate over the results

DECLARE databases CURSOR FOR

SELECT Position, DatabaseName FROM master.dbo.restoreheaderonly WHERE BackupTypeDescription = 'Database';

 

-- Open the cursor.

OPEN databases;

 

-- Loop through the databases.

FETCH NEXT FROM databases INTO @Position, @DatabaseName;

WHILE @@FETCH_STATUS = 0

BEGIN

SELECT @ExecString = 'RESTORE DATABASE ' + QUOTENAME (@DatabaseName) +

' FROM DISK = ''C:\sqlskills\mixedbackups.bck''' +

' WITH REPLACE, FILE = ' + RTRIM (CONVERT (VARCHAR (10), @Position));

SELECT 'Restoring database ' + RTRIM (@DatabaseName);

EXEC (@ExecString);

FETCH NEXT FROM databases INTO @Position, @DatabaseName;

END;

 

-- Close and deallocate the cursor.

CLOSE databases;

DEALLOCATE databases;

 

-- Delete the temporary table.

DROP TABLE master.dbo.restoreheaderonly;

GO
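The "Use WITH MOVE" hint in the error messages earlier covers a different case: restoring a backup to file locations other than the ones recorded in it. A rough sketch follows, assuming (from those error messages) that the first backup in the file has the logical file names fgt_mdf and fgt_log; the tinylogtest_copy database name and the target paths are hypothetical.

```sql
-- List the logical and physical file names recorded in the first backup
-- in the file.
RESTORE FILELISTONLY FROM DISK = 'c:\sqlskills\mixedbackups.bck' WITH FILE = 1;
GO

-- Restore that backup as a separate database, relocating both files.
RESTORE DATABASE tinylogtest_copy FROM DISK = 'c:\sqlskills\mixedbackups.bck'
    WITH FILE = 1,
    MOVE 'fgt_mdf' TO 'c:\sqlskills\tinylogtest_copy.mdf',
    MOVE 'fgt_log' TO 'c:\sqlskills\tinylogtest_copy.ldf';
GO
```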

It's common knowledge that SQL Server copes with daylight savings time (DST) correctly so why should you care?

Well, it's not so common knowledge that at the end of DST when the clocks go back an hour (always at 02:00 in the U.S.), SQL Agent essentially pauses for an hour (in at least SS2000 onwards). This means that if you have a job that's doing something every 15 minutes, there will be a gap of 75 minutes between the job execution at 01:45 and the job execution at 02:00. This happens because at 02:00, the time is set back to 01:00 but the next run time of all the jobs remains the same - so your job cannot execute until its next scheduled time of 02:00. So, in the northern hemisphere every Fall, and in the southern hemisphere every Spring, you lose an hour's worth of SQL Agent jobs. Still, why should you care?

Well, it depends what the jobs are that get delayed by an hour. If you have a job that takes a log backup every 15 mins then on the day DST ends, there's actually a gap of 75 minutes between log backups. If you have a Service Level Agreement (SLA) that limits the maximum amount of lost work to 15 minutes in the event of a disaster, then for those 75 minutes you're exposed to potentially not being able to meet that SLA!

That could be a pretty big deal, especially if something goes wrong during that hour (no more or less likely than something going wrong at any other time, but still possible). In that case, you need to come up with an alternative solution. A couple of ways to get around the problem I can think of:

  • Have someone stay up late during that hour and take manual log backups.
  • Switch over to database mirroring, which continually streams the log to the redundant server and so isn't affected by the DST issue.

Both of these are viable solutions but I think the best one is to create a SQL Agent job that runs at 01:59 and creates extra backup jobs to run at 01:00, 01:15, 01:30, and 01:45. I don't see why this shouldn't be possible. At 10:36 this morning I created a simple agent job to print the date to a file and set it to execute at 09:40 - in the past. I then set my system time back one hour and the job executed perfectly. The only downside of this solution is that you need to create and schedule the extra jobs using the T-SQL Agent SPs embedded in job steps for your 01:59 job - tedious but not hard. Maybe someone could send me a script and I'll blog it as a follow-on?
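To give an idea of what one of those extra jobs might look like, here is an untested sketch using the SQL Agent stored procedures in msdb. Every name in it (job, schedule, database, backup path) is hypothetical, and I have not verified how Agent treats a one-time schedule whose start time is already in the past when it is created, beyond the simple experiment described above.

```sql
USE msdb;
GO

-- Creates ONE extra log backup job, scheduled to run once at 01:00 on
-- 4 November 2007. Repeat with 011500, 013000, and 014500 for the other slots.
EXEC dbo.sp_add_job
    @job_name = N'DST extra log backup 0100',
    @delete_level = 1;                 -- delete the job once it succeeds

EXEC dbo.sp_add_jobstep
    @job_name = N'DST extra log backup 0100',
    @step_name = N'Back up log',
    @subsystem = N'TSQL',
    @database_name = N'master',
    @command = N'BACKUP LOG production TO DISK = ''c:\sqlskills\production_log_dst.bck'';';

EXEC dbo.sp_add_schedule
    @schedule_name = N'DST once 0100',
    @freq_type = 1,                    -- run once
    @active_start_date = 20071104,     -- YYYYMMDD
    @active_start_time = 010000;       -- HHMMSS

EXEC dbo.sp_attach_schedule
    @job_name = N'DST extra log backup 0100',
    @schedule_name = N'DST once 0100';

EXEC dbo.sp_add_jobserver
    @job_name = N'DST extra log backup 0100',
    @server_name = N'(local)';
GO
```

In the approach described above, calls like these would be embedded in the job steps of the 01:59 job rather than run by hand.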

So with DST coming to an end on November 4th this is definitely something for you to be aware of even if you don't want to go to the trouble of coping with the extra hour's exposure. As an aside - the dates when DST starts and ends changed this year. KB article 931975 discusses which parts of SQL Server aren't aware of the changed dates and what you can do about it.

Here's a really interesting question that was in my search engine logs yesterday - if I have a transaction that runs and completes while a backup is running, will the complete transaction be in the backup? The answer is.... it depends!

In terms of what gets backed up, the way a full backup works is:

  1. Note the transaction log's LSN (Log Sequence Number)
  2. Read all allocated extents in the various data files
  3. Note the LSN again
  4. Read all the transaction log between the starting LSN and the ending LSN

Any transaction that commits before or on the LSN read in step 3 will be fully reflected when the database is restored. If not, the transaction will be undone. So you can't just go by the completion time of the backup and the completion time of the transaction. The transaction may well commit before the backup operation completes, but it may complete during step 4, and so it will get rolled back during a restore. In this case, it's necessary to take a log backup as well and restore that too to make the transaction be fully reflected after a restore.

As you know I always like to prove things - so here's my proof of what I just said. I'm going to use the AdventureWorks database to do this. First thing is to set it to full recovery mode (and take the first full backup to start full recovery mode logging):

ALTER DATABASE AdventureWorks SET RECOVERY FULL;

BACKUP DATABASE AdventureWorks TO DISK='C:\SQLskills\AdventureWorks.bck' WITH INIT;

GO

Now I'm going to flush out the backup history tables in MSDB:

USE msdb;

GO

EXEC sp_delete_backuphistory '10/6/07';

GO

I've got a really contrived example that I'm going to use to show whether my transaction is wholly contained in the full backup. Using the HumanResources.Employee table, there's a column VacationHours which I'm going to set to 0 and then force all the changed pages to disk. This is my base state:

UPDATE AdventureWorks.HumanResources.Employee SET VacationHours = 0;

GO

CHECKPOINT;

GO

My contrived transaction is going to be in a tight loop updating all the rows in the table, which will generate lots of transaction log, and timed to complete just before the backup completes (i.e. in step 4). In one connection I start the backup:

BACKUP DATABASE AdventureWorks TO DISK='C:\SQLskills\AdventureWorks.bck' WITH INIT;

SELECT GETDATE ();

GO

and in another I start my contrived transaction, after starting the backup:

BEGIN TRAN

DECLARE @a INT

DECLARE @b INT

SELECT @a = 1

WHILE (@a < 6)

BEGIN

SELECT @b = 1

WHILE (@b < 201)

BEGIN

UPDATE AdventureWorks.HumanResources.Employee SET VacationHours = @b

SELECT @b=@b+1

END

SELECT @a=@a+1

END

COMMIT TRAN;

SELECT GETDATE ();

GO

It's not pretty but it does the job. The backup finishes at 2007-10-05 17:42:38.983 and the transaction finishes at 2007-10-05 17:42:38.107 - before the backup finishes. Remember I set the VacationHours all to zero before running my transaction - let's check the transaction did actually change them:

SELECT MAX (VacationHours) FROM AdventureWorks.HumanResources.Employee;

GO

This returns 200 - which is what I'd expect. But is the transaction wholly contained in the backup? Let's look at the backup history to find out the last LSN that was captured in the full backup:

SELECT Backup_Start_Date, Backup_Finish_Date, First_LSN, Last_LSN

FROM msdb.dbo.backupset WHERE database_name = 'AdventureWorks';

GO

And the output we get is:

Backup_Start_Date       Backup_Finish_Date      First_LSN           Last_LSN
----------------------- ----------------------- ------------------- -------------------
2007-10-05 17:42:22.000 2007-10-05 17:42:38.000 86000000001600029   91000000625600001

The LSNs are in decimal, so we need to split Last_LSN into its three parts (91, 6256, and 1) and convert each to hex so we can compare them to what's in the log - giving: 5B:1870:1. This is the LSN of the last log record that was backed up in the full backup. Now let's take a look at the transaction log for AdventureWorks using the fn_dblog function. This is undocumented but very well known. It's a fully composable alternative to using the old DBCC LOG command.

USE AdventureWorks;

GO

SELECT [Current LSN], Operation, [Transaction ID], AllocUnitName FROM fn_dblog (NULL, NULL);

GO

The log record at that LSN is:

0000005b:00001870:0001  LOP_MODIFY_ROW     0000:00001338  HumanResources.Employee.PK_Employee_EmployeeID

This is obviously in the middle of my contrived transaction - showing that it isn't all in the full backup. The end of the transaction isn't until way later in the log:

0000005e:00000628:01b1  LOP_MODIFY_ROW     0000:00001338  HumanResources.Employee.PK_Employee_EmployeeID
0000005e:00000628:01b2  LOP_COMMIT_XACT    0000:00001338  NULL
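As an aside, the decimal-to-hex conversion of Last_LSN doesn't have to be done by hand. This is a small sketch, assuming the decimal form is laid out as the VLF number followed by a 10-digit block number and a 5-digit slot number, which is how the values shown in this post break down.

```sql
-- Split a decimal LSN from msdb.dbo.backupset into its three parts and
-- show each part in hex.
DECLARE @lsn VARCHAR (25);
SET @lsn = '91000000625600001';   -- Last_LSN from the output above

SELECT
    CONVERT (VARBINARY (4), CONVERT (INT, LEFT (@lsn, LEN (@lsn) - 15))) AS vlf_hex,
    CONVERT (VARBINARY (8), CONVERT (BIGINT, SUBSTRING (@lsn, LEN (@lsn) - 14, 10))) AS block_hex,
    CONVERT (VARBINARY (4), CONVERT (INT, RIGHT (@lsn, 5))) AS slot_hex;
GO
```

The three columns should correspond to the 5B, 1870, and 1 given above, padded with leading zeros.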

Before I do anything else, I want to take a log backup to preserve my transaction:

BACKUP LOG AdventureWorks TO DISK='C:\SQLskills\AdventureWorks_Log.bck' WITH INIT;

GO

Now I want to restore the full backup and really show that my transaction isn't all in there:

USE master;

GO

RESTORE DATABASE AdventureWorks FROM DISK='C:\SQLskills\AdventureWorks.bck' WITH REPLACE, RECOVERY;

GO

SELECT MAX (VacationHours) FROM AdventureWorks.HumanResources.Employee;

GO

This returns 0. Clearly my transaction isn't all in there, and the parts that are were rolled back during the restore. Now let's do the same thing but using NORECOVERY for the restore of the full backup and also applying the log backup I took:

RESTORE DATABASE AdventureWorks FROM DISK='C:\SQLskills\AdventureWorks.bck' WITH REPLACE, NORECOVERY;

GO

RESTORE LOG AdventureWorks FROM DISK='C:\SQLskills\AdventureWorks_Log.bck' WITH RECOVERY;

GO

SELECT MAX (VacationHours) FROM AdventureWorks.HumanResources.Employee;

GO

This time the SELECT returns 200. And now you can start playing around with fn_dblog if you didn't know about it before. I'll be posting more about Storage Engine internals that you can figure out from the transaction log in future.

This one's a quickie.

In the previous post I explained about database pages - their structure and some page types. Now I'd like to explain how pages are grouped into units called extents. An extent is a group of eight physically consecutive pages in a data file. Extents are always aligned on 64KB boundaries (i.e. 8-page boundaries), starting at the beginning of the file. Extents, and all their properties, are exactly the same in SQL Server 2000 and 2005.
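The alignment rule means you can work out which extent any page belongs to with simple integer arithmetic. This is a quick sketch; the page number is made up, and the 8KB page size follows from eight pages making up a 64KB extent.

```sql
-- Hypothetical page number within a data file.
DECLARE @page_id INT;
SET @page_id = 157;

SELECT @page_id / 8                      AS extent_number,        -- integer division
       (@page_id / 8) * 8                AS first_page_in_extent,
       CONVERT (BIGINT, @page_id) * 8192 AS page_byte_offset;     -- 8KB pages
GO
```

For page 157 this gives extent 19, which starts at page 152.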

There are two types of extents: mixed extents and dedicated (or uniform) extents.

Mixed extents

The first 8 pages that are allocated to any IAM chain (either for an index in SQL Server 2000 or allocation unit in SQL Server 2005) are single-page allocations, which are called mixed pages. This means that only a single page is allocated to the IAM chain at a time, rather than a whole extent. The rationale behind this is to allow very small tables to take up the minimum amount of space. Once an IAM chain has passed the 8-page threshold and switched to allocating dedicated extents, it will never go back to allocating mixed pages again.

These mixed pages are allocated from mixed extents that are not allocated to any particular IAM chain. The extent is tracked as being allocated globally (by a GAM page), so no IAM chain can allocate it. If the mixed extent has any unallocated pages in it, it will also be tracked by an SGAM page. When a mixed page is required to be allocated, the SGAM pages are checked to see if any such extents are available. If not, a new mixed extent is allocated, a page is allocated from it and then the extent is tracked by the relevant SGAM page until all its pages are allocated.

As the mixed extent is not allocated to any particular IAM chain, this means that it may hold pages allocated to possibly 8 separate IAM chains. IAM pages themselves are always mixed pages, regardless of how many pages have been allocated to an IAM chain. This means a mixed extent may hold a variety of page types too, including IAM, data, index, or text pages.

Dedicated/Uniform extents

Once the 8-page threshold is passed, all further allocations to an IAM chain are from dedicated extents. This means that an extent at a time is allocated to an IAM chain and marked as such in one of the IAM pages in the IAM chain - whichever one maps the GAM interval that the extent is in. The extent is also tracked as being allocated by the relevant GAM page so no other IAM chain can allocate it.

All pages from a dedicated extent must be allocated to the same IAM chain. They do not all need to be the same type, however. For instance, a clustered index has a mixture of data and index pages. Also, when an extent is allocated to an IAM chain the pages in the extent are not all allocated at once (except for certain bulk operations). The pages are usually allocated as needed. The allocation state of each page is tracked using a PFS page.

When all the pages from a dedicated extent are deallocated, the extent itself is deallocated from its owning IAM chain and is available again for allocation to any IAM chain, or to become a mixed extent.

Tracking changes to extents for backup

Changes to extents are tracked in two places:

  • Any extent that has been changed since the last full backup will be tracked in the relevant differential bitmap page. This is how a differential backup knows which extents to back up instead of the whole database. All diff map pages are reset when the next full backup is taken.
  • Any extent that has been changed by a bulk-logged operation since the last log backup will be tracked in the relevant minimally-logged bitmap page. Any log backup that's taken after a bulk-logged operation will also include all extents tracked this way. All ML map pages are reset when the next log backup is taken.
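As an illustration of where the differential bitmaps get used, here is a minimal sketch of a full backup followed by a differential backup. The database name and file paths are hypothetical.

```sql
-- Hypothetical database name and paths.
-- The full backup is the base that a later differential backup builds on.
BACKUP DATABASE SalesDB TO DISK = 'c:\backups\SalesDB_full.bck';
GO

-- Backs up only the extents marked as changed in the differential bitmap pages.
BACKUP DATABASE SalesDB TO DISK = 'c:\backups\SalesDB_diff.bck'
WITH DIFFERENTIAL;
GO
```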

More on PFS, GAM, SGAM, ML, and DIFF pages in a not-too-distant future post.

Over the next few months I'll be blogging a lot about new features that are coming in SQL Server 2008 for DBAs and ITPros.

First up is Backup Compression. This has been one of the most heavily requested features since before I joined the SQL team back in 1999 - for me it's really cool that it's in this coming release.

It's widely recognized that database sizes are growing significantly, and that the growth is being forced by many factors, such as companies requiring more data to be available online for longer (e.g. to comply with government regulations) or an increasing amount of data being digitized for storage. Although the per-GB cost of storage is falling, the actual cost of keeping a database available is increasing. Here's why:

  • Prior to SQL Server 2008, a backup of a VLDB (or any sized database really) takes, on average, the same amount of disk space as the data in the database itself (remember that database files can be sized much larger than the amount of data they hold). A prudent backup strategy is to have multiple backups online, and multiple redundant copies of each backup. Thus, the amount of disk space required increases rather quickly - more than offsetting the drop in storage costs.
  • Keeping safe copies of all database backups means they have to be physically copied to a separate location. Increasing database sizes translates into more bandwidth required to copy larger database backups (or lower bandwidth links get saturated for longer and longer).
  • If you have a well designed storage strategy, then you're probably using RAID - so the more backups you need to store, and the higher level of RAID you use, the more drives you need to store all these backup copies.

In addition to costs, the elapsed times of backup and restore operations need to be considered. Backup and restore operations are essentially IO-bound. For a backup in SQL Server 2005 and before, the whole database has to be read and then written to a new location - with the total IO size of the writes to the backup equalling that of the reads from the database. The same is true for a restore, with reads from the backup equalling writes to the restored database. (As an aside, a restore operation also has the added CPU cost of having to examine each page as it's read from the backup to figure out where it should be written to in the restored database - this usually makes a restore operation take 10-20% more elapsed time than a backup.)

The perfect solution to these problems, at the expense of sometimes-considerable CPU time, is to compress the database as it's written into the backup. This reduces storage space per backup, required network bandwidth per backup copy, and elapsed time for backup and restore operations. The last point - reducing elapsed time for restore operations - is especially important in disaster recovery situations, where anything that can help reduce database downtime is a good thing.

[Edit: I've added a little here based on comments I received - thanks Brent and Marc!]

A less ideal solution I've seen is to take a regular backup and then compress it after-the-fact before copying to other locations. While this is a reasonable solution, it requires more disk space than compressing the database as it's backed up, and it's a more complicated procedure. It also increases the time to take the backup, as the compression is done in a separate step.

An alternative to compressing the backup at all is to make the backup location a compressed NTFS directory. While this achieves the compression goal, it doesn't permanently compress the backup so doesn't reduce the network bandwidth required to copy the backup or the space needed to archive the backup to tape.

Up until SQL Server 2008, the only solutions for compression-during-backup have come from third-party software vendors. Although these solutions do the job of alleviating the problems I've described above, there are two major roadblocks to their adoption that I've heard from SQL Server customers:

  1. You need to buy another software license as well as SQL Server - this can be pretty expensive for a large number of SQL Server installations. Management is also a headache, to ensure that all sites that may need to decompress the backup have the correct software installed.
  2. Some IT shops are Microsoft-only, which precludes the use of any software not supplied by Microsoft.

In SQL Server 2008, Microsoft will include a long-awaited and much-asked-for backup compression solution - eliminating the two roadblocks above. This is a fantastic first step in improving backup/restore functionality - hopefully in the release after SQL Server 2008 we'll see further innovations that will allow encrypted backups, table-level restore, and easier validation of the database stored in a backup.

Some points to note:

  • Adhering to the principle of least surprise, backup compression will be off by default, with very simple syntax to turn it on - directly with T-SQL or through the tools.
  • The compression algorithm used is proprietary to Microsoft and has yielded similar compression ratios to well-known third-party products.
  • During Microsoft's in-house testing on real-world customer databases, average compression ratios of 5:1 have been observed.
  • A backup set will not be able to contain both compressed and uncompressed backups.
  • None of the existing functionality will be altered by compression - e.g. the operation of WITH CHECKSUM or WITH CONTINUE_AFTER_ERROR (see here for more info on those options).
  • A restore operation will be able to tell automatically whether a given backup is compressed or not and just do the right thing.
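For reference, this is the form the option takes in the released version of SQL Server 2008. Treat it as a sketch: the database name and path are hypothetical, and the server-wide default shown second is optional.

```sql
-- Compress a single backup explicitly (SQL Server 2008 and later).
BACKUP DATABASE SalesDB TO DISK = 'c:\backups\SalesDB_compressed.bck'
WITH COMPRESSION;
GO

-- Optionally make compression the server-wide default for backups.
EXEC sp_configure 'backup compression default', 1;
RECONFIGURE;
GO
```

No extra option is needed on RESTORE, in line with the last point above.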

Once backup compression is available in a public CTP, I'll blog some sample scripts and report on things like:

  • the compression ratios, elapsed time differences, and CPU usages for a few sample databases
  • the various configuration options available
  • any differences in the MSDB backup history tables or the output from RESTORE HEADERONLY/LABELONLY
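As a rough idea of how compression ratios can be measured, the released version of SQL Server 2008 records both the uncompressed and compressed size of each backup in the msdb backup history. The query below is a minimal sketch against that history table; the compression_ratio column alias is illustrative.

```sql
-- Compression ratio of the ten most recent backups (SQL Server 2008 and later).
SELECT TOP 10
    database_name,
    backup_finish_date,
    backup_size,
    compressed_backup_size,
    CAST(backup_size AS float) / NULLIF(compressed_backup_size, 0) AS compression_ratio
FROM msdb.dbo.backupset
ORDER BY backup_finish_date DESC;
```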

In summary, backup compression is a very exciting feature and should enable many customers to save money, either on additional software licenses or on storage/network costs, and time, especially in all-important disaster recovery situations.

Thanks

This was originally posted as two posts on the SQL Server Storage Engine site. It was very popular so I've combined the two posts together and added a bunch more commentary - especially on page checksums and IO errors. You may also notice some color differences between the scripts from earlier in the year and today's post - more keywords are recognized in Management Studio in SP2 than before (but still not CHECKDB though...)

It's almost inevitable that at some point every DBA will face dealing with corruption - so it's very important that you know how the server will behave when corruption happens. You also need to make sure that whatever logic you've created to catch corruptions (either through error log parsing, alerts, or Agent jobs - topic for a future post) will actually work.

To do all this you need a corrupt database to play with. Earlier this year I created two corrupt databases - one for 2000 (attached in broken2000.zip) and one for 2005 (attached in broken2005.zip). The two attached files can be restored by unzipping them and then using the following syntax (substituting the correct backup name):

RESTORE DATABASE broken FROM DISK='c:\sqlskills\brokenXXX.bck'
WITH MOVE 'broken' TO 'c:\sqlskills\broken.mdf',
MOVE 'broken_log' TO 'c:\sqlskills\broken_log.ldf';
GO

The databases have the same schema - a table called 'brokentable' (c1 int, c2 varchar(7000)) with one row in it. The table has a single data-page which I've corrupted differently in 2000 and 2005:

  • 2000: The corrupt page has page ID (1:75) and the page header is corrupt so that selecting from the table will result in a 605 error which will kill the connection.
  • 2005: The corrupt page has page ID (1:143) and the page header is corrupt such that the page checksum is bad.

A cautionary note on page checksums - if you upgrade a database from 2000 to 2005 and turn on page checksums, nothing happens! It's not until a database page is read into the buffer pool, changed in some way and then written back out to disk that it will have a page checksum stamped on it. This means that once you turn them on, you need to trigger a page checksum being written to each page in some way - e.g. rebuilding all indexes or forcing an in-place update of all table rows. Neither of these is very palatable and there's no tool to force page checksums in SQL Server 2005. Unfortunately there are no plans to include such a tool in SQL Server 2008 either.
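The steps described above might look something like the following. This is a sketch under assumptions: UpgradedDB and dbo.SomeTable are hypothetical names, and the index rebuild only rewrites pages that belong to indexes.

```sql
-- Turn on page checksums (this alone does not stamp a checksum on existing pages).
ALTER DATABASE UpgradedDB SET PAGE_VERIFY CHECKSUM;
GO

-- Confirm the setting.
SELECT name, page_verify_option_desc
FROM sys.databases
WHERE name = 'UpgradedDB';
GO

-- One way to force pages to be rewritten, and so checksummed: rebuild the indexes on a table.
USE UpgradedDB;
GO
ALTER INDEX ALL ON dbo.SomeTable REBUILD;
GO
```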

Below I've listed a few things you can try out to see what would happen on your database if a checksum failed. These are all using the 2005 corrupt database.

Query errors

Any query that touches that page is going to fail with an 824 error. The IO errors in 2005 are different from 2000 - they've been split into 3:

  • 823 - a hard IO error. This is where SQL Server has asked the OS to read the page but it just can't.
  • 824 - a soft IO error. This is where the OS could read the page but SQL Server decided that the page was corrupt - for example with a page checksum failure.
  • 825 - a read-retry error. This is where either an 823 or 824 occurred, SQL Server retried the IO automatically and it succeeded. This error is written to the errorlog only - you need to be aware of these as they're a sign of your IO subsystem going awry. There's no way to turn off read-retry and force SQL Server to 'fail-fast' - whether this behavior is a good or bad thing can be argued both ways - personally I don't like it.

SELECT * FROM broken..brokentable;
GO

Msg 824, Level 24, State 2, Line 1

SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0x7232c940; actual: 0x720e4940). It occurred during a read of page (1:143) in database ID 8 at offset 0x0000000011e000 in file 'c:\sqlskills\broken.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.
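SQL Server 2005 also records pages that hit 823 or 824 errors in the suspect_pages table in msdb, which gives you something to query or build monitoring on. A minimal sketch:

```sql
-- Pages that have hit I/O errors, most recent first.
-- event_type: 1 = 823 error or non-checksum 824 error, 2 = bad checksum, 3 = torn page
-- (4, 5, and 7 mark pages that were later restored, repaired, or deallocated).
SELECT DB_NAME(database_id) AS database_name,
       file_id,
       page_id,
       event_type,
       error_count,
       last_update_date
FROM msdb.dbo.suspect_pages
ORDER BY last_update_date DESC;
```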

DBCC CHECKDB

CHECKDB throws us some nice errors. Note that it doesn't actually mention a page checksum failure. CHECKDB is the only thing in SQL Server that can 'eat' IO errors and convert them into non-fatal corruption errors. Note in the CHECKDB output below that the repair level needed to repair this error is 'repair_allow_data_loss' - this is because the repair for a page with any kind of IO error on it is to delete the page, fix-up all relevant linkages, and rebuild any referencing indexes.

DBCC CHECKDB ('broken') WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

Msg 8928, Level 16, State 1, Line 1

Object ID 2073058421, index ID 0, partition ID 72057594038321152, alloc unit ID 72057594042318848 (type In-row data): Page (1:143) could not be processed. See other errors for details.

Msg 8939, Level 16, State 98, Line 1

Table error: Object ID 2073058421, index ID 0, partition ID 72057594038321152, alloc unit ID 72057594042318848 (type In-row data), page (1:143). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 12716041 and -4.

CHECKDB found 0 allocation errors and 2 consistency errors in table 'brokentable' (object ID 2073058421).

CHECKDB found 0 allocation errors and 2 consistency errors in database 'broken'.

repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (broken).
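If there is no good backup to restore from, the repair level that CHECKDB names can be run along these lines. This is a sketch only: the repair deletes the corrupt page, so whatever was on it is lost, and it is worth backing up the database first, even in its corrupt state.

```sql
-- Repair requires the database to be in single-user mode.
ALTER DATABASE broken SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
GO

-- Deletes the damaged page and fixes up the structures that referenced it.
DBCC CHECKDB ('broken', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

-- Allow other connections again.
ALTER DATABASE broken SET MULTI_USER;
GO
```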

Backup with CHECKSUM

If you have page checksums turned on, you should always use the WITH CHECKSUM option when taking backups. This will cause the page checksums to be checked as they're read into the backup. If a bad page checksum is found, the backup will stop and print a message identifying the bad page. Using the WITH CHECKSUM option on a backup will also generate a checksum over the entire backup stream and store it in the backup. This means we can detect a damaged backup by recalculating the checksum and comparing it against that stored in the backup - in much the same way that page checksums work.

BACKUP DATABASE broken TO DISK='c:\sqlskills\broken2.bck'
WITH CHECKSUM;
GO

Msg 3043, Level 16, State 1, Line 1

BACKUP 'broken' detected an error on page (1:143) in file 'c:\sqlskills\broken.mdf'.

Msg 3013, Level 16, State 1, Line 1

BACKUP DATABASE is terminating abnormally.

The backup has failed because of a bad page checksum. However, we can force it to back up. If this is the only copy of the database we have, and we're being forced to run repair to fix a corruption, for instance, then we want to make sure we have a backup to restore from in case something goes wrong with the repair. Even a backup that contains a corrupt database is better than no backup at all. In this case, we can use the CONTINUE_AFTER_ERROR option which will force the backup to continue when it finds a bad page.

BACKUP DATABASE broken TO DISK='c:\sqlskills\broken2.bck'
WITH CHECKSUM, CONTINUE_AFTER_ERROR;
GO

Processed 160 pages for database 'broken', file 'broken' on file 1.

Processed 1 pages for database 'broken', file 'broken_log' on file 1.

BACKUP WITH CONTINUE_AFTER_ERROR successfully generated a backup of the damaged database. Refer to the SQL Server error log for information about the errors that were encountered.

BACKUP DATABASE successfully processed 161 pages in 2.025 seconds (0.651 MB/sec).

RESTORE VERIFYONLY

You can check the high-level validity of any backup set using the RESTORE VERIFYONLY command.

RESTORE VERIFYONLY FROM DISK='c:\sqlskills\broken2005.bck';
GO

The backup set on file 1 is valid.

What about on the backup that we forced using CONTINUE_AFTER_ERROR?

RESTORE VERIFYONLY FROM DISK='c:\sqlskills\broken2.bck';
GO

The backup set was written with damaged data by a BACKUP WITH CONTINUE_AFTER_ERROR.

Isn't that cool? It tells us that the backup was already corrupt when it was written. Ok - let's ask it to specifically check the checksums in the backup. This will look through all the pages in the backup that have page checksums, check them, and recalculate the backup stream checksum.

RESTORE VERIFYONLY FROM DISK='c:\sqlskills\broken2.bck'
WITH CHECKSUM;
GO

The backup set was written with damaged data by a BACKUP WITH CONTINUE_AFTER_ERROR.

In our case, we get the same as above because the database was known to be corrupt when the backup was taken, so none of the checksum checking is done. What about if we try to check the checksums on the initial backup?

RESTORE VERIFYONLY FROM DISK='c:\sqlskills\broken2005.bck'
WITH CHECKSUM;
GO

Msg 3187, Level 16, State 1, Line 1

RESTORE WITH CHECKSUM cannot be specified because the backup set does not contain checksum information.

Msg 3013, Level 16, State 1, Line 1

VERIFY DATABASE is terminating abnormally.

We can't do that as the backup wasn't taken using the WITH CHECKSUM option in the first place, even though some of the database pages may have page checksums on them.
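To find out up front whether a backup was taken WITH CHECKSUM, or was forced through with CONTINUE_AFTER_ERROR, you can look at the backup header. A minimal sketch using the backup file from this post (the msdb query only returns rows on the server where the backup was taken):

```sql
-- The HasBackupChecksums and IsDamaged columns show how each backup set was written.
RESTORE HEADERONLY FROM DISK='c:\sqlskills\broken2.bck';
GO

-- The same flags are kept in the backup history in msdb.
SELECT database_name, backup_finish_date, has_backup_checksums, is_damaged
FROM msdb.dbo.backupset
WHERE database_name = 'broken'
ORDER BY backup_finish_date;
GO
```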

RESTORE

How about we try to overwrite the existing 'broken' database with the one from the second backup we took?

RESTORE DATABASE broken FROM DISK='c:\sqlskills\broken2.bck'
WITH REPLACE;
GO

Msg 3183, Level 16, State 1, Line 1

RESTORE detected an error on page (1:143) in database "broken" as read from the backup set.

Msg 3013, Level 16, State 1, Line 1

RESTORE DATABASE is terminating abnormally.

It won't let us because the backup contains corrupt data (and it knows that because we forced the backup to complete using the CONTINUE_AFTER_ERROR option). However, there may be cases where you've lost your database and all you have is a corrupt backup. In this case it may be better to restore what data you do have rather than lose everything. You can do it using the CONTINUE_AFTER_ERROR option on the RESTORE command this time:

RESTORE DATABASE broken FROM DISK='c:\sqlskills\broken2.bck'
WITH REPLACE, CONTINUE_AFTER_ERROR;
GO

Processed 160 pages for database 'broken', file 'broken' on file 1.

Processed 1 pages for database 'broken', file 'broken_log' on file 1.

The backup set was written with damaged data by a BACKUP WITH CONTINUE_AFTER_ERROR.

RESTORE WITH CONTINUE_AFTER_ERROR was successful but some damage was encountered. Inconsistencies in the database are possible.

RESTORE DATABASE successfully processed 161 pages in 0.392 seconds (3.364 MB/sec).

Isn't that cool? It works, BUT it tells us that the backup set contained corrupt data and that the database was restored but could have corrupt data in it.

Summary

Have a play about with these databases to familiarize yourself with the kind of responses you'll get from the various tools when a corruption exists, and how to work around it if need be.

Let me know if you want to see any particular kinds of corruptions explored, or want a database with something specific corrupted in.

broken2000.zip (41 KB)
broken2005.zip (149.9 KB)
