[Edit 1/1/09] Well, actually it's so nice here (St. Lucia) that we're staying on vacation another week - maybe I'll post some stuff next week - maybe... Smile

Well, it's the end of another year, and my first complete year outside of Microsoft - and what a busy year it's been. Between the two of us, Kimberly and I flew about 220 thousand miles, visited 8 countries, and presented at 5 conferences. With all the classes, books, articles, whitepapers, user groups, forums, and blogging, it seems like this year's been a blur!

Next year we've got some cool stuff coming up that we're looking forward to (books, broadcasting, and the return of our Immersion events), but for now we're flying out to the sun for a long-awaited and well-deserved vacation (and delayed honeymoon). We'll be totally offline from now until the first week of January, so don't expect any blog posts until then.

Thanks to everyone who reads my blog/articles/whitepapers, attended any of our classes/workshops/conference sessions, or just generally helped me stay enthusiastic about the SQL Server community.

Happy holidays, and I hope that 2009 is successful and corruption-free for you all!

Cheers! Laughing

Categories:
Personal

This blog post describes the demo "2 - NC Indexes" from the Corruption Survival Techniques session I presented at various conferences in 2008. The links to the scripts and databases to use are in this blog post.

The aim of this demo is to show that sometimes it's just redundant data (i.e. nonclustered indexes) that gets corrupted, so you don't have to do anything that takes the actual data offline - like restoring from a full backup or running one of the repair options (both of which require the database to be in single-user mode).

Let's look at an example. Extract and restore the DemoNCIndex database, and open the NCIndexCorruption.sql script. What do we get from running DBCC CHECKDB on the DemoNCIndex database (lines 39-42)?

DBCC CHECKDB (DemoNCIndex) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

Msg 8951, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Data row does not have a matching index row in the index 'CustomerName' (ID 2). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:45:28) identified by (CustomerID = 29) with index values 'LastName = 'Adams' and CustomerID = 29'.
Msg 8951, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Data row does not have a matching index row in the index 'CustomerName' (ID 2). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:180:164) identified by (CustomerID = 2118) with index values 'LastName = 'Adams' and CustomerID = 2118'.

<snip - removed for brevity>

Msg 8952, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Index row in index 'CustomerName' (ID 2) does not match any data row. Possible extra or invalid keys for:
Msg 8956, Level 16, State 1, Line 1
Index row (1:24482:16) with values (LastName = 'Andersen' and CustomerID = 18718) pointing to the data row identified by (CustomerID = 18718).
Msg 8952, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Index row in index 'CustomerName' (ID 2) does not match any data row. Possible extra or invalid keys for:
Msg 8956, Level 16, State 1, Line 1
Index row (1:24482:127) with values (LastName = 'Arthur' and CustomerID = 9758) pointing to the data row identified by (CustomerID = 9758).
CHECKDB found 0 allocation errors and 26 consistency errors in table 'Customers' (object ID 453576654).
CHECKDB found 0 allocation errors and 26 consistency errors in database 'DemoNCIndex'
repair_rebuild is the minimum repair level for the errors found by DBCC CHECKDB (DemoNCIndex).
 

Lots of errors. Now, in this example there are only 26 errors, but in cases where there are hundreds of errors it can be really hard to tell whether all the corruptions are in nonclustered indexes (i.e. indexes with IDs > 1). Luckily, there's an undocumented option to all the DBCC CHECK* commands - WITH TABLERESULTS. The option is undocumented because the output can change from release to release, but basically this converts the DBCC output into tabular form. Try running lines 48-50 in the script and you'll see what I mean. One of the columns in the output is IndexId - so you can easily see whether all the errors are in nonclustered indexes. In this case, they are, and all in one index of the Customers table.
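A minimal sketch of the TABLERESULTS form, assuming it mirrors the earlier command (the script's own lines 48-50 may differ slightly). Because the option is undocumented, treat the exact column list of the result set as version-dependent.

```sql
-- Same consistency check as before, but the errors come back as a result set
-- instead of as messages. TABLERESULTS is undocumented, so the columns
-- returned may change between releases.
DBCC CHECKDB (DemoNCIndex) WITH NO_INFOMSGS, ALL_ERRORMSGS, TABLERESULTS;
GO
```

Scan the IndexId column of the result set: if every error row shows an index ID greater than 1, only nonclustered indexes are affected and the base table data (heap = 0, clustered index = 1) is intact.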

You could run lines 55-57 of the script to prove to yourself that repairs can't be run online, and then realize that we can address the problem without having to run repair or restore. First off we need to figure out the name of the index to fix - index ID 2 of the Customers table. Lines 77-80 run sp_HelpIndex on the table (although I should really be using Kimberly's sp_HelpIndex2):

USE DemoNCIndex
GO
EXEC sp_HelpIndex 'Customers';
GO

index_name     index_description                                   index_keys
-------------- --------------------------------------------------- ------------
CustomerName   nonclustered located on PRIMARY                     LastName
CustomerPK     clustered, unique, primary key located on PRIMARY   CustomerID
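If you'd rather go straight to the catalog views (SQL Server 2005 and later), a query along these lines maps the index ID from the CHECKDB output to its name. It assumes the table is in the dbo schema; the is_primary_key and is_unique_constraint columns tell you whether the index is enforcing a constraint, which matters if you end up having to drop it.

```sql
USE DemoNCIndex;
GO
-- Look up index ID 2 (the ID reported by DBCC CHECKDB) on the Customers table.
SELECT i.index_id,
       i.name,
       i.type_desc,
       i.is_unique,
       i.is_primary_key,
       i.is_unique_constraint
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.Customers')
  AND i.index_id = 2;
GO
```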

The nonclustered index is called CustomerName. Plug the index name into line 82, then try fixing the index with an online index rebuild, and run DBCC CHECKDB afterwards (lines 82-89). The corruption hasn't been fixed! An online index rebuild reads the old index to build the new one, so the new index has the same missing rows as the old one. We need to do an offline index rebuild instead - lines 110-115. After the last DBCC CHECKDB, the index is fixed. Now, on SQL Server 2008 the query optimizer has more plan choices available to it, so an offline rebuild may or may not get a plan that avoids reading the old index. If it does read the old index, you may need to actually drop and re-create the broken index (carefully, if it's enforcing a constraint).
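The three approaches above might look something like this. This is a sketch, not the contents of the demo script: it assumes the table is in the dbo schema, that your edition supports online index operations (Enterprise Edition in SQL Server 2005/2008), and that CustomerName has no options beyond the LastName key column shown by sp_HelpIndex.

```sql
USE DemoNCIndex;
GO

-- 1. Online rebuild: builds the new index by reading the old (corrupt) one,
--    so the missing index rows stay missing.
ALTER INDEX CustomerName ON dbo.Customers REBUILD WITH (ONLINE = ON);
GO

-- 2. Offline rebuild: expected to read the clustered index instead,
--    which puts the missing index rows back (plan choice may vary on 2008).
ALTER INDEX CustomerName ON dbo.Customers REBUILD WITH (ONLINE = OFF);
GO

-- Re-check after each attempt.
DBCC CHECKDB (DemoNCIndex) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

-- 3. Fallback if the rebuild still reads the broken index: drop and re-create.
--    Key column taken from the sp_HelpIndex output; script out any other
--    index options before dropping it.
DROP INDEX CustomerName ON dbo.Customers;
GO
CREATE NONCLUSTERED INDEX CustomerName ON dbo.Customers (LastName);
GO
```

If the broken index were backing a PRIMARY KEY or UNIQUE constraint, DROP INDEX would fail, and you'd have to use ALTER TABLE ... DROP CONSTRAINT and ADD CONSTRAINT instead (checking for any foreign keys that reference it first).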

So - just because DBCC CHECKDB reports a ton of errors, that doesn't necessarily mean that the database needs to be taken (essentially) offline to repair it - check through the errors to see if it's just nonclustered indexes that are affected.

After putting up with years of me occasionally breaking into song with 'Manamana!', Kimberly finally caved in this afternoon and asked to see the classic Muppets Manamana song video. And so for all of you who haven't been exposed to those cultural icons of the 70s and 80s, the Muppets, check out the video at http://www.youtube.com/watch?v=KC9FtLQJoGM.

Manamana!

Categories:
Personal

The January 2009 issue of TechNet Magazine is now available on the web and has a new article I wrote on Advanced Troubleshooting with Extended Events in SQL Server 2008. The article covers:

  • An overview of troubleshooting in SQL Server, along with links to a bunch of tools like DMVStats and RML
  • An overview of extended events, with descriptions of all the parts of the feature and how to investigate them with the new DMVs
  • Performance considerations of how you set up an extended events session, especially on multi-core servers
  • An example where I build an I/O chargeback mechanism to tie into a system controlled by the SQL Server 2008 resource governor
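As a small taste of exploring the feature through the new DMVs, a query like this (a sketch, not taken from the article) lists the events available in SQL Server 2008 along with the package each one belongs to:

```sql
-- List Extended Events events and their owning packages (SQL Server 2008+).
SELECT p.name AS package_name,
       o.name AS event_name,
       o.description
FROM sys.dm_xe_objects AS o
JOIN sys.dm_xe_packages AS p
    ON o.package_guid = p.guid
WHERE o.object_type = 'event'
ORDER BY p.name, o.name;
```

Changing the object_type filter to 'action' or 'target' shows the other building blocks of a session.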

You can get to the article here, which also has a link to a screencast of me doing a demo. And you can bet that this is an area I'll be blogging about going forward too.

Enjoy!

PS This issue also has the editorial where I get made a Contributing Editor Smile

For those of you who couldn't make it to a conference this year where I presented my Corruption Survival Techniques session, the folks at TechEd EMEA have just posted an 80-minute video of the presentation I did in Barcelona in early November. It walks through I/O errors, what CHECKDB does and how it works, how to run it, a CHECKDB FAQ, how to interpret the output, and choosing between repair and restore, and it has a bunch of demos of recovering from corruptions. Lots of fun stuff!

The video is available at http://www.microsoft.com/emea/teched2008/itpro/tv/default.aspx?vid=78. The accompanying scripts and corrupt databases are all posted on our website - see this blog post for details.

Enjoy!

I was perusing the latest release of the SQL Server 2008 Books Online on MSDN (look, Kimberly's in Dublin this week - what else am I supposed to do to amuse myself in the evenings? Smile) and found a cool new section on change data capture in the SSIS section. It's called Improving Incremental Loads with Change Data Capture and shows how to create an SSIS package that will pull incremental change data for a single table, and for multiple tables. If you read my article on CDC in the October TechNet Magazine (see my blog post here) and have been playing around, then this BOL topic could save you a bunch of time.

Check it out at http://msdn.microsoft.com/en-us/library/bb895315.aspx.

This blog post describes the demo "1 - Fatal Errors" from the Corruption Survival Techniques session I presented at various conferences in 2008. The links to the scripts and databases to use are in this blog post.

The aim of this demo is to show that sometimes a database is so corrupt that DBCC CHECKDB just cannot run on it. In that case, there's no way to force DBCC CHECKDB to get past the fatal corruption and so there's no way to run a repair either - you're looking at restoring from a backup or at worst, extracting as much data as possible into a new database.

Let's look at a couple of examples. Extract and restore the DemoFatalCorruption1 and DemoFatalCorruption2 databases, and open the FatalErrors.sql script. What do we get from running DBCC CHECKDB on DemoFatalCorruption1 (lines 47-49 in the script)?

DBCC CHECKDB (DemoFatalCorruption1)
WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

Msg 8928, Level 16, State 6, Line 1
Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown): Page (1:71) could not be processed. See other errors for details.
CHECKDB found 1 allocation errors and 0 consistency errors not associated with any single object.
Msg 8906, Level 16, State 1, Line 1
Page (1:19) in database ID 8 is allocated in the SGAM (1:3) and PFS (1:1), but was not allocated in any IAM. PFS flags 'MIXED_EXT ALLOCATED 0_PCT_FULL'.
Msg 2575, Level 16, State 1, Line 1
The Index Allocation Map (IAM) page (1:71) is pointed to by the next pointer of IAM page (0:0) in object ID 15, index ID 1, partition ID 983040, alloc unit ID 983040 (type In-row data), but it was not detected in the scan.
Msg 7965, Level 16, State 2, Line 1
Table error: Could not check object ID 15, index ID 1, partition ID 983040, alloc unit ID 983040 (type In-row data) due to invalid allocation (IAM) page(s).
Msg 8906, Level 16, State 1, Line 1
Page (1:71) in database ID 8 is allocated in the SGAM (1:3) and PFS (1:1), but was not allocated in any IAM. PFS flags 'IAM_PG MIXED_EXT ALLOCATED 0_PCT_FULL'.
Msg 8939, Level 16, State 5, Line 1
Table error: Object ID 15, index ID 1, partition ID 983040, alloc unit ID 983040 (type In-row data), page (1:71). Test (m_headerVersion == HEADER_7_0) failed. Values are 0 and 1.
Msg 8939, Level 16, State 6, Line 1
Table error: Object ID 15, index ID 1, partition ID 983040, alloc unit ID 983040 (type In-row data), page (1:71). Test ((m_type >= DATA_PAGE && m_type <= UNDOFILE_HEADER_PAGE) || (m_type == UNKNOWN_PAGE && level == BASIC_HEADER)) failed. Values are 0 and 0.
Msg 8939, Level 16, State 5, Line 1
Table error: Object ID 15, index ID 1, partition ID 983040, alloc unit ID 983040 (type In-row data), page (1:71). Test (m_headerVersion == HEADER_7_0) failed. Values are 0 and 1.
Msg 8939, Level 16, State 6, Line 1
Table error: Object ID 15, index ID 1, partition ID 983040, alloc unit ID 983040 (type In-row data), page (1:71). Test ((m_type >= DATA_PAGE && m_type <= UNDOFILE_HEADER_PAGE) || (m_type == UNKNOWN_PAGE && level == BASIC_HEADER)) failed. Values are 0 and 0.
CHECKDB found 5 allocation errors and 3 consistency errors in table 'sys.syshobts' (object ID 15).
Msg 7995, Level 16, State 1, Line 1
Database 'DemoFatalCorruption1': consistency errors in system catalogs prevent further DBCC checkdb processing.
CHECKDB found 0 allocation errors and 1 consistency errors in table 'ALLOCATION' (object ID 99).
CHECKDB found 6 allocation errors and 4 consistency errors in database 'DemoFatalCorruption1'.

A bunch of errors that look like regular DBCC CHECKDB output - but if you look carefully near the end of the output you'll see error 7995 stating that the system catalogs are so corrupt that DBCC CHECKDB can't continue. Notice also that there's nothing at the end of the output stating what the minimum repair level is to fix the errors - because repair cannot be run on this database.

The second example is even worse (running lines 53-55 in the script):

DBCC CHECKDB (DemoFatalCorruption2)
WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

Msg 211, Level 23, State 51, Line 1
Possible schema corruption. Run DBCC CHECKCATALOG.

In this case, the corruption is so bad that DBCC CHECKDB didn't even get a chance to terminate gracefully - the metadata subsystem in the Query Processor just blew away the whole command. Running DBCC CHECKCATALOG as the error message states doesn't do any better - it just prints the same error! (I didn't write that error message Wink)

So - just because DBCC CHECKDB completes doesn't always mean it completed successfully. Make sure you always check the output.

Here's a question I got from someone who attended our database maintenance workshop at PASS last week (paraphrased):

I attended your pre-conference session on database maintenance and found it to be very informative. From what you told us though, I think I need to change my nightly backup procedure. I like to get my databases back to as small a size as possible before backing them up, so I run the following commands to do this before taking the full database backup. Could you help me with a better way of doing this? We're on SQL Server 2005.

BACKUP LOG <mydbname> WITH NO_LOG

DBCC SHRINKDATABASE (<mydbname>)

And here's the answer I sent back:

How large is the database? And how long must you keep the backups around? If the cumulative size of the backups takes up a large proportion of your available storage space (and we're talking more than just a single direct-attached 100+GB drive), then it may be worth compressing the backups - otherwise you're likely causing yourself more trouble than it's worth.
By doing BACKUP LOG WITH NO_LOG you're effectively throwing away log records and removing the possibility of doing any kind of point-in-time, or up-to-the-second recovery (see BACKUP LOG WITH NO_LOG - use, abuse, and undocumented trace flags to stop it). If you're running in the FULL recovery model, and you don't care about either of these features, then you should switch to the SIMPLE recovery model. If you really want to be in FULL, don't ever use WITH NO_LOG.
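In T-SQL, the two alternatives look roughly like this. MyDatabase and the backup path are hypothetical placeholders:

```sql
-- Option 1: you don't need point-in-time recovery, so use SIMPLE
-- and let checkpoints clear the log instead of throwing it away by hand.
ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;
GO

-- Option 2: stay in FULL and control log growth with regular log backups,
-- which keeps the log backup chain intact (hypothetical path).
BACKUP LOG MyDatabase TO DISK = N'D:\SQLBackups\MyDatabase_log.trn';
GO
```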

Truncating the log doesn't change the amount of transaction log that a full backup requires. The full backup will back up whatever log it needs so that the restored database is a transactionally consistent copy of the database. See Debunking a couple of myths around full database backups and More on how much transaction log a full backup includes.

Doing a DBCC SHRINKDATABASE (the same exact operation as a database auto-shrink) will cause massive index fragmentation, and cause file-system fragmentation of the data files, as they will likely need to grow again after you've squeezed all the space out of them. See Auto-shrink - turn it OFF! for more details on the effects.
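To see the index fragmentation effect for yourself, you could compare fragmentation before and after a shrink using sys.dm_db_index_physical_stats (SQL Server 2005 and later). The database name is a hypothetical placeholder:

```sql
USE MyDatabase;  -- hypothetical database name
GO
-- LIMITED is the cheapest scan mode; avg_fragmentation_in_percent is the
-- logical fragmentation that a shrink tends to drive up.
SELECT OBJECT_NAME(ps.object_id) AS table_name,
       ps.index_id,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
ORDER BY ps.avg_fragmentation_in_percent DESC;
GO
```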

If you're really concerned about backup sizes and space is at a premium, I recommend using a 3rd-party backup compression tool such as LiteSpeed or HyperBac so you're not affecting the actual database. Remember also that SQL Server 2008 has native backup compression too - see my blog post here for more details.
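For reference, a native compressed backup on SQL Server 2008 looks something like this. In SQL Server 2008, creating compressed backups needs Enterprise Edition; the database name and path are hypothetical:

```sql
-- SQL Server 2008+: compress the backup as it's written, and verify
-- page checksums while reading the data (hypothetical name and path).
BACKUP DATABASE MyDatabase
TO DISK = N'D:\SQLBackups\MyDatabase_full.bak'
WITH COMPRESSION, CHECKSUM;
GO
```

This shrinks the backup file without touching the database files themselves, which is the whole point compared with shrinking first.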

Hope this helps

Wow - that was tough but *very* fulfilling. As you may know, Kimberly and I are each writing chapters for Kalen's next book - SQL Server 2008 Internals. Well, I *just* finished the DBCC CHECKDB chapter - it's 26,000 words and 69 pages, describing all the algorithms in depth and all the corruption errors that can be reported in SQL Server 2008. It was really fun to write but I'm glad all that stuff's down on paper now - I can make room in my head for a bunch of other stuff Smile

I can't wait to see it in print next Spring!

(Ok - with 5 blog posts today, I think I broke my record. Time to retire for the night before I'm tempted to break it even more...)

Categories:
Books

I came across an interesting blog post from someone who attended PASS, describing my Corruption Survival Techniques session as really interesting and fun, but basically useless. The advice was that only a handful of people in the world can run things like single-page restore and emergency mode repair, and that as soon as corruption is suspected, the DBA should just call Product Support for help.

Now, I'm very thick-skinned, and I know there are always some people in a conference session who don't agree with everything I say (that's human nature, and I'm totally cool with that), but I just couldn't pass up mentioning this one here on the blog: I *utterly* disagree with the advice in that post, and I suspect that the poster didn't "get" what I was trying to explain in the session.

The point of my session is to explain two things - that you should proactively be looking for corruption, and that you should know what to do when corruption occurs. Both of these mean your business experiences less down-time and data-loss when corruption does occur. Turning on page checksums and running DBCC CHECKDB regularly are easy. So is planning a decent backup strategy (based on what you want to be able to restore - see my previous post on this - Planning a backup strategy - where to start?).
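Here's roughly what the easy part looks like, as a sketch with a hypothetical database name. PAGE_VERIFY CHECKSUM is the default for databases created on SQL Server 2005 and later, but databases upgraded from earlier versions keep their old setting:

```sql
-- Find databases that aren't using page checksums yet.
SELECT name, page_verify_option_desc
FROM sys.databases
WHERE page_verify_option_desc <> 'CHECKSUM';
GO

-- Turn them on (hypothetical name). A page only gets a checksum
-- the next time it's written to disk.
ALTER DATABASE MyDatabase SET PAGE_VERIFY CHECKSUM;
GO

-- Run regularly, for example as a scheduled SQL Agent job step.
DBCC CHECKDB (MyDatabase) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO
```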

The trickier part is knowing what to do when corruption does occur. That's why I discuss some of the output of DBCC CHECKDB, in terms of high-level tips and tricks rather than what each and every error means (see my previous post on this - Tips and tricks for interpreting CHECKDB output). I also recommend backups as the best way to limit data-loss, but not necessarily down-time - that depends on the backups you have available. The last part of the session shows some tricks for getting around worst-case scenarios, like someone detaching a suspect database or needing to run emergency mode repair. I don't expect everyone to run off and start hacking the 2005 system tables with a single-user booted server and the DAC (but if you do, see this post Wink), but having some of this knowledge can make DBAs more confident to tackle problems themselves and increase their skills.
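For anyone who hasn't seen it, emergency mode repair on SQL Server 2005 and later follows roughly this sequence. It's a last resort when there's no usable backup, it can lose data, and the database name here is hypothetical; take a copy of the database files first if you possibly can.

```sql
-- LAST RESORT: no usable backup exists. REPAIR_ALLOW_DATA_LOSS can lose data.
ALTER DATABASE MyDatabase SET EMERGENCY;
GO
ALTER DATABASE MyDatabase SET SINGLE_USER;
GO
DBCC CHECKDB (MyDatabase, REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO
-- If the repair succeeded, let users back in.
ALTER DATABASE MyDatabase SET MULTI_USER;
GO
```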

Since I've been blogging about this stuff and presenting it at conferences, I've heard from *countless* people who've used these techniques themselves to recover from disasters, and learned a ton of information and good practices in the process. Any production DBA with half a brain (a great Scottish expression Smile) should be able to use restore, single-page restore, or run a repair - otherwise, with all due respect, they shouldn't be running a production system. Now, for "involuntary" DBAs, who (through no fault of their own) may not know anything about backups, restores, or repairs - it's a totally different story, and help should be sought through Product Support or forums.
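As an illustration of how approachable single-page restore is, here's a sketch of the usual sequence. It assumes the FULL (or BULK_LOGGED) recovery model and an unbroken chain of log backups; the database name, page ID, and paths are all hypothetical. Take the real page IDs from the CHECKDB output or from msdb.dbo.suspect_pages. Online page restore needs Enterprise Edition; other editions do it offline.

```sql
-- Restore just the damaged page from the last full backup (hypothetical page ID).
RESTORE DATABASE MyDatabase PAGE = '1:153'
FROM DISK = N'D:\SQLBackups\MyDatabase_full.bak'
WITH NORECOVERY;
GO
-- Restore any differential and subsequent log backups WITH NORECOVERY (not shown),
-- then back up the tail of the log and restore it to bring the page up to date.
BACKUP LOG MyDatabase TO DISK = N'D:\SQLBackups\MyDatabase_tail.trn';
GO
RESTORE LOG MyDatabase FROM DISK = N'D:\SQLBackups\MyDatabase_tail.trn'
WITH RECOVERY;
GO
```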

But to come out with a blanket statement that knowing how to run restores, repairs and do first-level interpretation of DBCC CHECKDB output is useless? And that potentially wasting time and money with front-line Product Support is the best course of action when corruption occurs, when you can work out most of it for yourself? That's *bad advice* as far as I'm concerned.

Maybe I'm just cranky as I'm sitting here with a very sore mouth after getting a filling at the dentist this morning Cry

What do you think? Comments please!

(PS I'm not fishing for praise - I want to know what you think of the argument)
