The March edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Distributed transactions and database mirroring - why they don't work together
  • Background processes that can cause I/Os even with no connections to the server
  • How to restore backups from a file containing multiple appended backups
  • The perennial problem of production databases being too large to restore in development

Check it out at http://technet.microsoft.com/en-us/magazine/ff458345.aspx.

Happy New Year! 

The January edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Using backups for corruption recovery
  • Why snapshot isolation is required when using change tracking, and its performance implications
  • Is DBCC CHECKDB a really comprehensive integrity check?
  • The difference between data file and log file shrinking

Check it out at http://technet.microsoft.com/en-us/magazine/ee914612.aspx.

In last week's survey I asked how you manage the size of your database *data* files - see here for the survey. Here are the results as of 6/24/09.

The 'other' values are as follows:

  1. 5 x 'manual file growths and a custom mom alter to when the datafiles are 98% full. autogrow set to fixed amount in case we miss the mom e-mail'
  2. 1 x 'auto-grow with a procedure to keep the log file size at 20% relative to the total data file size'
  3. 1 x 'Create with very large initial file size, set auto-growth to %'
  4. 1 x 'Lots of white space in files. Auto-grow to a fixed size (in case of emergency).'
  5. 1 x 'set initial size for 1 yr usage, monitor size, manual grow, autogrow percentage - send alert if it does grow'
  6. 1 x 'set initial size for 2 year growth, capture growth stats daily, monitor physical disk space daily'
  7. 1 x 'Set to autogrow by fixed size to cater for emergencies, otherwise maintain 80-90% free space by daily reporting then manual off-peak size increase if necessary.'

As I mentioned in the survey itself, this is just about database *data* files. I covered log file size management in a previous survey - see Importance of proper transaction log size management.

There are really four parts to data file size management, which I'll discuss below.

The first thing I'll say is that if you're able to, enable instant file initialization on 2005+ - as it will vastly reduce the time required to create data files, grow data files, and restore backups (if the restore has to create data files). We're talking minutes/hours down to single-digit seconds. See Misconceptions around instant file initialization. If you're not one of the miniscule fraction of a percent of customers who have volumes shared between secure file servers and SQL Server instances, turn this on ASAP. Most DBAs don't know about this feature, but everyone I teach it to or show it to are amazed and then go turn it on. No brainer.

Initial data file sizing. This can be tricky. Without getting into the how-to-calculate-the-database-size quagmire, I'll simply say that you should provision as much space as you can, based upon your sizing estimates. Don't just provision for the here-and-now - if you're able to, provision space for the next year's worth of anticipated growth - to prevent auto-growth having to kick-in at all. I rewrote all the Books Online pages for 2005 (links are to the 2008 BOL) on Estimating the Size of Heaps, Clustered Indexes, and Nonclustered Indexes and in the blog post Tool for estimating the size of a database I link to a tool someone put together that codified all my formulas. You can also get sizing tools from hardware vendors too.

Data file growth. If you're able to, auto-grow should ALWAYS be turned on, as an emergency measure in case your monitoring fails - you don't want the database to have to grow but it's unable to and then it stops and the application is offline. However, you shouldn't *rely* on auto-grow - it's just for emergencies. The auto-growth default for data files used to be 10% for 2000 and before, but then changed to 1MB from 2005 onwards (log file default auto-growth remained at 10%). Neither of these are good choices. A percentage-based auto-growth means that as your files get bigger, so does the auto-growth, and potentially the time it takes if you don't have instant file initialization enabled. A 1MB autogrowth is just nonsensical. Your auto-growth should be set to a fixed size - but what that should be is a great big 'it depends'. You need to decide whether you want the auto-growth to be a quick stop-gap, or whether the auto-growth will replace manual growth after monitoring. You also need to consider how long the auto-growth will take, especially without instant file initialization. I can't give any guidance here as to what a good number is, but I'd probably settle on something around 10% (fixed), with the percentage steadily falling as the database size grows. It's very important that you have alerts setup to you can tell when auto-growth does occur, so you can then take any necessary action to grow it even more or tweak your settings.

'Other' response #2 is interesting. There's been a 'best-practice' around for a while that the log file should be sized to be a percentage of the data file size. It's totally unfounded and in most cases bears no relation to reality. The vast majority of the time, the size of the log is *NOT* dependent on the data file sizes in any way. Imagine a 10TB database - would you provision a 2TB log? Of course not. Now, I can see special cases where the operations performed on the tables in the database might affect a fixed portion of the largest table in a single batch, and that could generate enough log (with reserved space too) to equal 20% of the data file size - but that's a pretty nonsensical special case, to be honest. You shouldn't use 'set the log as a percentage of the data file' as a guideline.

Data file size/usage monitoring. There's a growing movement towards monitoring the data file usage and manually growing the files as they approach 100% full - avoiding auto-growth altogether, but still having it enabled for emergencies. In my book, this is the best way to go as you have all the control over what happens and more importantly, when it happens - especially without instant file initialization. There are some quirks here though. SCOM, for instance, has logic that disables file size and usage monitoring if you enable auto-grow. It assumes that if you enable auto-grow then you're not interested in monitoring. I happened to have one of the SCOM devs in my last maintenance class I taught on the Redmond MS campus and he's going to try to get that logic fixed.

Data file shrinking. Just this morning I wrote a long blog post about this - see Why you should not shrink your data files. Running data file shrink causes index fragmentation, uses lots of resources, and the vast majority of the time when people use it, is unnecessary and wasteful. It should NEVER be part of a regular maintenance plan - as you get into the shrink-grow-shrink-grow cycle which becomes a zero-sum game with a ton of transaction log being generated. Think about this - if you run a shrink, which is fully logged, then all the log has to be backed up, log-shipped, database mirrored, scanned by the replication log reader agent, and so on. And then the database will probably auto-grow again through normal operation, or some part of the maintenance job that rebuilds indexes. And then you shrink it again. And the cycle continues...

Bottom line - make sure you size the data files with some free space, have auto-growth set appropriately, have instant file initialization enabled if you can, monitor file sizes and usage, alert on auto-grows, and don't use shrink. And don't use shrink. Really.

Next up - this week's survey!

(Been a while longer than usual since blog posts - I've been really busy flying around the country doing stuff onsite with clients. Normal service will be resumed at the start of July :-)

One of my biggest hot-buttons is around shrinking data files. Although I used to own the shrink code while I was at Microsoft, I never had a chance to rewrite it so that data file shrink is a more palatable operation. I really don't like shrink.

Now, don't confuse shrinking the transaction log with shrinking data files. Shrinking the log is necessary if your log has grown out of control, or as part of a process to remove excessive VLF fragmentation (see Kimberly's excellent posts on this here and here). However, shrinking the log should be a rare operation and should not be part of any regular maintenance you perform.

Shrinking of data files should be performed even more rarely, if at all. Here's why - data file shrink causes *massive* index fragmentation. Let me demonstrate with a simple script you can run. The script below will create a data file, create a 10MB 'filler' table at the start of the data file, create a 10MB 'production' clustered index, drop the 'filler' table and then run a shrink to reclaim the space.

USE MASTER;
GO

IF DATABASEPROPERTYEX ('DBMaint2008', 'Version') > 0
 DROP DATABASE DBMaint2008;

CREATE DATABASE DBMaint2008;
GO
USE DBMaint2008;
GO

SET NOCOUNT ON;
GO

-- Create the 10MB filler table at the 'front' of the data file
CREATE TABLE FillerTable (c1 INT IDENTITY,  c2 CHAR (8000) DEFAULT 'filler');
GO

-- Fill up the filler table
INSERT INTO FillerTable DEFAULT VALUES;
GO 1280

-- Create the production table, which will be 'after' the filler table in the data file
CREATE TABLE ProdTable (c1 INT IDENTITY,  c2 CHAR (8000) DEFAULT 'production');
CREATE CLUSTERED INDEX prod_cl ON ProdTable (c1);
GO

INSERT INTO ProdTable DEFAULT VALUES;
GO 1280

-- check the fragmentation of the production table
SELECT [avg_fragmentation_in_percent] FROM sys.dm_db_index_physical_stats (
    DB_ID ('DBMaint2008'), OBJECT_ID ('ProdTable'), 1, NULL, 'LIMITED');
GO

-- drop the filler table, creating 10MB of free space at the 'front' of the data file
DROP TABLE FillerTable;
GO

-- shrink the database
DBCC SHRINKDATABASE (DBMaint2008);
GO

-- check the index fragmentation again
SELECT [avg_fragmentation_in_percent] FROM sys.dm_db_index_physical_stats (
    DB_ID ('DBMaint2008'), OBJECT_ID ('ProdTable'), 1, NULL, 'LIMITED');
GO

avg_fragmentation_in_percent
----------------------------
0.390625

DbId   FileId      CurrentSize MinimumSize UsedPages   EstimatedPages
------ ----------- ----------- ----------- ----------- --------------
6      1           1456        152         1448        1440
6      2           63          63          56          56

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

avg_fragmentation_in_percent
----------------------------
99.296875

Look at the output from the script! The logical fragmentation of the clustered index before the shrink is a near-perfect 0.4%. After the shrink, it's almost 100%. The shrink operation *completely* fragmented the index, removing any chance of efficient range scans on it by ensuring the all range-scan readahead I/Os will be single-page I/Os.

Why does this happen? A data file shrink operation works on a single file at a time, and uses the GAM bitmaps (see Inside The Storage Engine: GAM, SGAM, PFS and other allocation maps) to find the highest page allocated in the file. It then moves it as far towards the front of the file as it can, and so on, and so on. In the case above, it completely reversed the order of the clustered index, taking it from perfectly defragmented to perfectly fragmented.

The same code is used for DBCC SHRINKFILE, DBCC SHRINKDATABASE, and auto-shrink - they're equally as bad. As well as introducing index fragmentation, data file shrink also generates a lot of I/O, uses a lot of CPU, and generates *loads* of transaction log - as everything it does is fully logged.

Data file shrink should never be  part of regular maintenance, and you should NEVER, NEVER have auto-shrink enabled. I tried to have it removed from the product for SQL 2005 and SQL 2008 when I was in a position to do so - the only reason it's still there is for backwards compatibility. Don't fall into the trap of having a maintenance plan that rebuilds all indexes and then tries to reclaim the space required to rebuild the indexes by running a shrink - that's a zero-sum game where all you do is generate a log of transaction log for no actual gain in performance.

So what if you *do* need to run a shrink? For instance, if you've deleted a large proportion of a very large database and the database isn't likely to grow, or you need to empty a file before removing it?

The method I like to recommend is as follows:

  • Create a new filegroup
  • Move all affected tables and indexes into the new filegroup using the CREATE INDEX ... WITH (DROP_EXISTING) ON <filegroup> syntax, to move the tables and remove fragmentation from them at the same time
  • Drop the old filegroup that you were going to shrink anyway (or shrink it way down if its the primary filegroup)

Basically you need to provision some more space before you can shrink the old files, but it's a much cleaner mechanism.

If you absolutely have no choice and have to run a data file shrink operation, be aware that you're going to cause index fragmentation and you should take steps to remove it afterwards if it's going to cause performance problems. The only way to remove index fragmentation without causing data file growth again is to use DBCC INDEXDEFRAG or ALTER INDEX ... REORGANIZE. These commands only require a single 8KB page of extra space, instead of needing to build a whole new index in the case of an index rebuild operation.

Bottom line - try to avoid running data file shrink at all costs!

Here's a question I got from someone who attended our database maintenance workshop at PASS last week (paraphrased):

I attended your pre-conference session on database maintenance and found it to be very informative.  From what you told use though, I think I need to change my nightly backup procedure.  I like to get my databases back to as small of a size as possible before backing them up, so I run the following commands to do this before taking the full database backup.  Could you help me with a better way of doing this? We're on SQL Server 2005.

BACKUP LOG <mydbname> WITH NO_LOG

DBCC SHRINKDATABASE (<mydbname>)

And here's the answer I sent back:

How large is the database? And how long must you keep the backups around? If the cumulative size of the backups takes up a large proportion of your available storage space (and we're talking more than just a single direct-attached 100+GB drive), then it may be worth compressing the backups - otherwise you're likely causing yourself more trouble than its worth.
 
By doing BACKUP LOG WITH NO_LOG you're effectively throwing away log records and removing the possibility of doing any kind of point-in-time, or up-to-the-second recovery (see BACKUP LOG WITH NO_LOG - use, abuse, and undocumented trace flags to stop it). If you're running in the FULL recovery model, and you don't care about either of these features, then you should switch to the SIMPLE recovery model. If you really want to be in FULL, don't ever use WITH NO_LOG.

The amount of transaction log that a full backup requires cannot be changed by you truncating the log. The full backup will backup any log it requires to enable the restored database to be a transctionally consistent copy of the database. See Debunking a couple of myths around full database backups and More on how much transaction log a full backup includes.

Doing a DBCC SHRINKDATABASE (the same exact operation as a database auto-shrink) will cause massive index fragmentation, and cause file-system fragmentation of the data files, as they will likely need to grow again after you've squeezed all the space out of them. See Auto-shrink - turn it OFF! for more details on the effects.

If you're really concerned about backup sizes and space is at a premium, I recommend using a 3rd-party backup compression tool such as LiteSpeed or HyperBac so you're not affecting the actual database. Remember also that SQL Server 2008 has native backup compression too - see my blog post here for more details.

Hope this helps

Every so often I'll see posts on the various data corruption forums discussing causes of corruption. In this post I want to debunk some of the myths around what can cause corruption. There are really two types of corruption to deal with, physical corruption and logical corruption.

Physical corruption

This is where something has altered the contents of a data or log file sector with no regard for what is being stored there. Possible causes of physical corruption are:

  • Problem with the I/O subsystem (99.8% of all cases I've ever seen - only 3 nines as I'd estimate I've seen around about a thousand corruption cases). Remember the I/O subsystem is everything underneath SQL Server in the I/O stack - including the OS, 3rd-party file system filter drivers, device drivers, RAID controllers, SAN controllers, network hardware, drives themselves, and so on. Millions of lines of code and lots of moving parts spinning very fast, very close to very fragile pieces of metal oxide (I once heard Jim Gray liken a disk drive head to a 747 jumbo jet flying at 500 mph at a height of 1/4 inch from the ground...)
  • Problem with the host machine hardware (0.1% of cases). Most of the time this is a memory error.
  • SQL Server bugs (0.1% of cases). Yes, there have been corruption bugs. Every piece of software has bugs. There are KB articles describing bugs.
  • Deliberate introduction of corruption using a hex editor or other means.

Physical corruption is what DBCC CHECKDB usually reports and the majority of cases are caused by a physical failures of some kind, with the minority caused by humans - software bugs.

Logical corruption

This is where something has altered some data so that a data relationship is broken. Possible causes of logical corruption are:

  • Humans

:-) Okay...

  • Application bug. The application deletes one part of an inherent data relationship but not the other. Or the application designer doesn't implement a constraint properly. Or the application designer doesn't cope with a transaction roll-back properly. You get the idea.
  • Accidental update/delete. Someone deletes or updates some data incorrectly.
  • SQL Server bug. See above.
  • DBCC CHECKDB when using the REPAIR_ALLOW_DATA_LOSS option. As is documented in Books Online, and I've blogged about and mentioned when lecturing, if you run repair, it doesn't take into account any inherent or explicit constraints on the data.

The point here is that a physical failure of a component does not cause logical corruption, it causes physical corruption. Conversely, application errors cause logical corruption, not physical corruption. DBCC CHECKDB errors are about physical corruption (okay, with the inclusion of DBCC CHECKCATALOG code in 2005, it will find cases where the DBA has manually altered the system tables, causing logical corruption) and applications cannot cause physical corruption as they can only manipulate data through SQL Server. If an application hits a SQL Server bug which causes physical corruption, that's still not the application causing physical corruption, it's SQL Server.

So - on to the myths.

  • Can an application cause physical corruption? No.
  • Can stopping a shrink operation cause corruption of any kind? No.
  • Can stopping an index rebuild cause corruption of any kind? No.
  • Can running DBCC CHECKDB without repair cause corruption of any kind? No.
  • Can creating a database snapshot cause corruption of any kind? No.

Hope this helps.

Well, we're just back from vacation (photo blog post to follow) and I've heard that the feature article on Effective Database Maintenance I wrote for the August issue of TechNet Magazine is live on the web. It also includes a 5 minute long screencast I recorded where I demo the effect of database shrink on index fragmentation.

You can get to the article at http://technet.microsoft.com/en-us/magazine/cc671165.aspx. The topics covered are:

  • Managing data and transaction log files
  • Eliminating index fragmentation
  • Ensuring accurate, up-to-date statistics
  • Detected corrupted database pages
  • Establishing an effective backup strategy

It's written around 2-300 level and presents a good overview (well, at least I think so :-)) of the concepts involved.

Also, the August SQL Q&A column is available at http://technet.microsoft.com/en-us/magazine/cc671180(TechNet.10).aspx. This month's topics on the web (more in the print magazine) are:

  • Database version changes with upgrades
  • Benefits of partitioning
  • Consistency checking options for VLDBs

Enjoy!

My first magazine article is in print! I've taken over the bi-monthly SQL Q&A column for TechNet Magazine and I just received the June magazine in the mail today with my first column in it. Topics covered are:

  • Creating corruption and using page checksums
  • The shrink-grow-shrink-grow trap
  • How many databases can be mirrored per instance
  • A tip on changing the default server port, from Jens Suessmeyer

I've also just completed a feature article for either the July or August issue dealing with database maintenance for the 'involuntary' DBA - more details when it gets published.

If you don't get the print version of TechNet Magazine, you can get to this month's SQL Q&A column at http://technet.microsoft.com/en-us/magazine/cc510328.aspx. There may not be anything new if you've been following my blog for a while, but if you've just started, it's worth a quick look.

Enjoy!

PS Let me know if you've got any good questions - I've already completed the August column but I'd like to hear of any questions you may have for later columns.

I’m in the middle of a flight from Washington D.C. to Zurich on the way to Barcelona for TechEd IT Forum and I can’t sleep – Kimberly’s out like a light so what else is there to do except write another blog post? :-) OK - actually posting this from Barcelona on Tuesday before our first of 12 sessions here!

In the Database Maintenance workshop we did at SQL Connections last week I promised to blog a script I used to show how data file shrink operations cause massive fragmentation of indexes. The reason is that data file shrink starts at the end of the data file and moves a single page at a time to a free space below the shrink threshold. In the process of doing so, it perfectly reverses the physical order of the pages comprising the leaf level of an index – thus perfectly fragmenting it!!

Let’s try out my simple script that demonstrates this. First thing I’m going to do is create a new database and create a 10MB ‘filler’ table, which I’m going to then drop later to create a space that shrink can use.

USE MASTER;

GO

 

IF DATABASEPROPERTYEX ('shrinktest', 'Version') > 0

      DROP DATABASE shrinktest;

 

CREATE DATABASE shrinktest;

GO

USE shrinktest;

GO

 

SET NOCOUNT ON;

GO

 

-- Create and fill the filler table

CREATE TABLE filler (c1 INT IDENTITY, c2 VARCHAR(8000))

GO

DECLARE @a INT;

SELECT @a = 1;

WHILE (@a < 1280) -- insert 10MB

BEGIN

      INSERT INTO filler VALUES (REPLICATE ('a', 5000));

      SELECT @a = @a + 1;

END;

GO

Next I’ll create the ‘production’ table that we’d really like to keep in optimal shape for performance.

-- Create and fill the production table

CREATE TABLE production (c1 INT IDENTITY, c2 VARCHAR (8000));

CREATE CLUSTERED INDEX prod_cl ON production (c1);

GO

DECLARE @a INT;

SELECT @a = 1;

WHILE (@a < 1280) -- insert 10MB

BEGIN

      INSERT INTO production VALUES (REPLICATE ('a', 5000));

      SELECT @a = @a + 1;

END;

GO 

Now I’ll use the sys.dm_db_index_physical_stats DMV to check the fragmentation of the production table’s clustered index – it should be almost zero:

-- check the fragmentation of the production table

SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (

      DB_ID ('shrinktest'), OBJECT_ID ('production'), 1, NULL, 'LIMITED');

GO

avg_fragmentation_in_percent fragment_count

---------------------------- --------------------

0.390930414386239            6

This is what I expected. Now I’m going to drop the filler table, run a shrink operation and then check the fragmentation again:

-- drop the filler table and shrink the database

DROP TABLE filler;

GO

 

-- shrink the database

DBCC SHRINKDATABASE (shrinktest);

GO

 

-- check the index fragmentation again

SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (

      DB_ID ('shrinktest'), OBJECT_ID ('production'), 1, NULL, 'LIMITED');

GO

avg_fragmentation_in_percent fragment_count

---------------------------- --------------------

99.7654417513683             1277

Wow! The index went from almost 0% fragmented to almost 100% fragmented – the shrink operation totally reversed the physical ordering of the leaf level of the clustered index – nasty.

One of the common maintenance operations I see at customer sites is to run a database shrink at some interval, and I always advise against it – now you can see why. Running a regular shrink operation can cause horrible fragmentation problems. The worst problems I see are those customers with maintenance plans that rebuild all indexes and then run a shrink to remove the extra space necessary for the index rebuilds – completely undoing the effects of the index rebuild!

One other common thing I see is to have auto-shrink set on for one or databases. This is bad for several reasons:

  • Shrink causes index fragmentation, as I’ve just demonstrated above.
  • You can't control when it kicks in. Although it doesn't have any effect like long-term blocking, it does take up a lot of resources, both IO and CPU. It also moves a lot of data through the buffer pool and so can cause hot pages to be pushed out to disk, slowing things down further. If the server is already pushing the limits of the IO subsystem, running shrink may push it over, causing long disk queue lengths and possibly IO timeouts.
  • You're likely to get into a death-spiral of auto-grow then auto-shrink then auto-grow then auto-shrink... (in my experience, if someone is using auto-shrink, they're most likely using and relying on auto-grow too). An active database usually requires free space for normal operations - so if you take that free space away then the database just has to grow again. This is bad for several reasons:
    • Repeatedly shrinking and growing the data files will cause file-system level fragmentation, which can slow down performance
    • It wastes a huge amount of resources, basically running the shrink algorithm for no reason
    • Auto-grow itself can be bad, especially if you're using SQL Server 2000 (or don't have Instant File Initialization turned on - see this post from Kimberly's blog) where all allocations to the file being grown are blocked while the new portion of the file is being zero-initialized.

Bottom-line: auto-shrink should *NEVER* be turned on…

Theme design by Nukeation based on Jelle Druyts