Way back in the mists of time, at the end of the last century, I wrote DBCC SHOWCONTIG for SQL Server 2000, to complement my new invention DBCC INDEXDEFRAG.

I also used to wear shorts all the time, with luminous orange, yellow, or green socks.

Many things change - I now have (some) dress sense, for one. One other thing that changed was that DMVs came onto the scene with SQL Server 2005. DBCC SHOWCONTIG was replaced by sys.dm_db_index_physical_stats. Under the covers though, they both use the same code - and the I/O characteristics haven't changed.

This is a blog post I've been meaning to do for a while now, and I finally had the impetus to do it when I heard about today's T-SQL Tuesday on I/O in general being run by Mike Walsh (Twitter|blog). It's a neat idea so I decided to join in this time. In retrospect, reading this over before hitting 'publish', I got a bit carried away (spending two hours on this) - but it's one of my babies, so I'm entitled to! :-)

This isn't a post about how to use DMVs in general, how to use this DMV in particular, or anything about index fragmentation. This is a blog post about how the DMV works.

DMV is a catch-all phrase that most people (myself included) use to describe all the various utility views in SQL Server 2005 and 2008. DMV = Dynamic Management View. There's a catch with the catch-all though - some of the DMVs aren't views at all, they're functions. A pure DMV gets info from SQL Server's memory (or system tables) and displays it in some form. A DMF, on the other hand, has to go and do some work before it can give you some results. The sys.dm_db_index_physical_stats DMV (which I'm going to call 'the DMV' from now on) is by far the most expensive of these - but only in terms of I/O.

The idea of the DMV is to display physical attributes of indexes (and the special case of a heap) - to do this it has to scan the pages comprising the index, calculating statistics as it goes. Many DMVs support what's called predicate pushdown, which means if you specify a WHERE clause, the DMV takes that into account as it prepares the information. This DMV doesn't. If you ask it for only the indexes in the database that have logical fragmentation > 30%, it will scan all the indexes, and then just tell you about those meeting your criteria. It has to do this because it has no way of knowing which ones meet your criteria until it analyzes them - so it can't support predicate pushdown.
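To make that concrete, here's a minimal sketch of the difference between filtering the output and limiting the scan. The table name dbo.MyBigTable is hypothetical, and passing NULL as the last parameter simply asks for the default scan mode.

```sql
-- Filtering the output: every index in the current database is still scanned,
-- and the WHERE clause is only applied to the rows that come back.
SELECT [object_id], [index_id], [avg_fragmentation_in_percent], [page_count]
FROM sys.dm_db_index_physical_stats (DB_ID (), NULL, NULL, NULL, NULL)
WHERE [avg_fragmentation_in_percent] > 30;
GO

-- Limiting the scan: only the indexes of one (hypothetical) table are read,
-- because the object ID is passed as a parameter rather than as a predicate.
SELECT [index_id], [avg_fragmentation_in_percent], [page_count]
FROM sys.dm_db_index_physical_stats (
    DB_ID (), OBJECT_ID (N'dbo.MyBigTable'), NULL, NULL, NULL)
WHERE [avg_fragmentation_in_percent] > 30;
GO
```

The second form still reads every index on that table before the WHERE clause is applied, but the parameters are the only way to reduce the amount of I/O the DMV does.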

This is where understanding what it's doing under the covers comes in - the meat of this post.

LIMITED

The default operating mode of the DMV is called LIMITED. Kimberly always makes fun of the equivalent option for DBCC SHOWCONTIG, which I named as a young and foolish developer - calling it WITH FAST. Hey - it's descriptive!

The LIMITED mode can only return the logical fragmentation of the leaf level plus the page count. It doesn't actually read the leaf level. It makes use of the fact that the next level up in the index contains a key-ordered list of page IDs of the pages at the leaf level - so it's trivial to examine the key-ordered list and see if the page IDs are also in allocation order or not, thus calculating logical fragmentation.

The idea behind this option is to allow you to find the fragmentation of an index by reading the minimum number of pages, i.e. in the smallest amount of time. This option can be orders of magnitude faster than using the DETAILED mode scan; how much faster depends on how big the index's fanout is. Without getting too much into the guts of indexes, the fanout is based on the index key size, and determines the number of child-page pointers an index page can hold (e.g. the number of leaf-level pages that a page in the next level up has information about).

Consider an index with a char(800) key. Each entry in a page in the level above the leaf has to include a key value (the lowest key that can possibly appear on the page being referred to), plus a page ID, plus record overhead, plus slot array entry - so 812 bytes. So a page can only hold 8096/812 = 9 such entries. The fanout is at most 9.

Consider an index with a bigint key. Each entry is 13 bytes, so a page can hold 8096/13 = 622 entries. The fanout is at most 622, but will likely be smaller, depending on operations on the index causing fragmentation at the non-leaf levels.

For a table with 1 million pages at the leaf level, the first index will have 1 million/9 = 111112 pages at least at the level above the leaf. The second index will have at least 1608 pages. The savings in I/O from using the LIMITED mode scan will clearly differ based on the fanout.
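The arithmetic in the last few paragraphs can be reproduced with a quick query. This is just a sketch using the numbers quoted above (8096 usable bytes per page, 812-byte and 13-byte entries, 1 million leaf-level pages); the variable and column names are mine.

```sql
DECLARE @page_bytes INT;
DECLARE @leaf_pages INT;
SET @page_bytes = 8096;      -- usable bytes on a page
SET @leaf_pages = 1000000;   -- leaf-level pages in the index

SELECT
    [entry_bytes],
    -- Integer division rounds down: a partial entry doesn't fit on the page.
    @page_bytes / [entry_bytes] AS [max_fanout],
    -- Round up: a partly-filled page is still a page that has to be read.
    CEILING (@leaf_pages * 1.0 / (@page_bytes / [entry_bytes]))
        AS [min_pages_above_leaf]
FROM (SELECT 812 AS [entry_bytes]    -- char(800) key
      UNION ALL
      SELECT 13) AS [k];             -- bigint key
GO
```

This gives a fanout of 9 and 111112 pages for the char(800) key, and a fanout of 622 and 1608 pages for the bigint key, matching the figures above.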

I've created a 100GB clustered index (on the same hardware as I'm using for the benchmarking series) with 13421760 leaf-level pages and a maximum fanout of 540. In reality, I populated the index using 16 concurrent threads, so there's some fragmentation. The level above the leaf has 63012 pages, an effective fanout of 213. Still, the LIMITED mode scan will read 213x less than a DETAILED scan, but will it be 213x faster?

Here's a perfmon capture of the LIMITED mode scan on my index:

 

There's nothing special going on under the covers in a LIMITED mode scan - the chain of pages at the level above the leaf is read in page-linkage order, with no readahead. The perfmon capture shows:

  • Avg. Disk Read Queue Length (light blue) is a steady 1.
  • Avg. disk sec/Read (pink) is a steady 4ms.
  • Disk Read Bytes/sec (green) is roughly 14.5 million.
  • Page reads/sec (dark blue) is roughly 1800.

DETAILED 

The DETAILED mode does two things:

  • Calculate fragmentation by doing a LIMITED mode scan
  • Calculate all other statistics by reading all pages at every level of the index

And so it's obviously the slowest. It has to do the LIMITED mode scan first to be able to calculate the logical fragmentation, because it reads the leaf level pages in the fastest possible way - in allocation order. DBCC has a customized read-ahead mechanism for allocation order scans that it uses for this DMV and for DBCC CHECK* commands. It's *incredibly* aggressive and will hit the disks as hard as it possibly can, especially with DBCC CHECK* running in parallel.

Here's a perfmon capture of the DETAILED mode scan on my index:

 

Not quite as pretty as the LIMITED mode scan, but I like it :-) Here's what it's showing:

  • Avg. Disk Read Queue Length (black) is in the multiple hundreds. Clearly its appetite for data is outstripping what my RAID array can do. It basically tries to saturate the I/O subsystem to get as much data as possible flowing into SQL Server.
  • Avg. disk sec/Read (pink line at the bottom) is actually measuring in whole seconds, rather than ms. Given the disk queue length, I'd expect that.
  • DBCC Logical Scan Bytes/sec (red) varies substantially as the readahead mechanism throttles up and down, but it's driving anywhere up to 80MB/sec. You can see around 9:49:20 AM when it drops to zero for a few seconds.
  • Readahead pages/sec (green) is tracking the DBCC scan. This is a buffer pool counter, the DBCC one is an Access Methods counter (the dev team I used to run during 2005 development). If I had Disk Read Bytes/sec and Page reads/sec showing, they'd track the other two perfectly - I turned them off for clarity.

So the DETAILED mode not only reads more data, but it does it a heck of a lot more aggressively so has a much more detrimental effect on the overall I/O capabilities of the system while it's running.
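For reference, here's a sketch of what asking for a DETAILED scan looks like. The table name dbo.MyBigTable is hypothetical; the columns are standard output columns of the DMV.

```sql
-- One row per level of each index; index_level 0 is the leaf level.
SELECT [index_id], [index_level], [page_count], [record_count],
    [avg_fragmentation_in_percent], [avg_page_space_used_in_percent]
FROM sys.dm_db_index_physical_stats (
    DB_ID (), OBJECT_ID (N'dbo.MyBigTable'), NULL, NULL, 'DETAILED')
ORDER BY [index_id], [index_level] DESC;
GO
```

Swapping 'DETAILED' for 'LIMITED' returns only the leaf-level row for each index, with columns such as record_count and avg_page_space_used_in_percent left NULL, because the leaf pages themselves are never read.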

SAMPLED

There is a third mode that was introduced just for the DMV. The idea is that if you have a very large table and you want an idea of some of the leaf level statistics, but you don't want to take the perf hit of running a DETAILED scan, you can use this mode. It does:

  • LIMITED mode scan
  • If the number of leaf level pages is < 10000, read all the pages, otherwise read every 100th page (i.e. a 1% sample)

Summary

There's no progress reporting from the DMV (or DBCC SHOWCONTIG) but if you look at the reads column in sys.dm_exec_sessions you can see how far through the operation it is. This method works best for DETAILED scans, where you can compare that number against the in_row_data_page_count for the index in sys.dm_db_partition_stats (yes, you'll need to mess around a bit if the index is actually partitioned).
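A rough sketch of that progress check follows. The session ID (53) and the table name dbo.MyBigTable are placeholders, and it assumes the scan is the only thing the session has done, since the reads counter is cumulative for the session. Treat the percentage as approximate: in_row_data_page_count doesn't include the non-leaf pages that a DETAILED scan also reads.

```sql
-- Placeholder: the session that is running the DMV.
DECLARE @scan_session_id INT;
SET @scan_session_id = 53;

SELECT
    [s].[reads] AS [pages_read_so_far],
    [p].[total_pages],
    CAST (100.0 * [s].[reads] / NULLIF ([p].[total_pages], 0)
        AS DECIMAL (9, 1)) AS [approx_pct_done]
FROM sys.dm_exec_sessions AS [s]
CROSS JOIN (
    -- SUM copes with a partitioned index by adding up all its partitions.
    SELECT SUM ([in_row_data_page_count]) AS [total_pages]
    FROM sys.dm_db_partition_stats
    WHERE [object_id] = OBJECT_ID (N'dbo.MyBigTable')
        AND [index_id] = 1) AS [p]
WHERE [s].[session_id] = @scan_session_id;
GO
```

Run it from a different connection, in the database that contains the table, while the scan is in progress.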

In terms of timing, I ran all three scan modes to completion. The results:

  • LIMITED mode: 282 seconds
  • SAMPLED mode: 414 seconds
  • DETAILED mode: 3700 seconds

Although the LIMITED mode scan read roughly 200x less than the DETAILED scan, it was only 13 times faster, because the readahead mechanism for the DETAILED scan is way more efficient than the (necessary) follow-the-page-linkages scan of the LIMITED mode.

Just for kicks, I ran a SELECT COUNT(*) on the index to see how the regular Access Methods readahead mechanism would fare - it completed in 3870 seconds - 5% slower, and it had less processing to do than the DMV. Clearly DBCC rules! :-)

Although the DETAILED mode gives the most comprehensive output, it has to do the most work. For very large indexes, this could mean that your buffer pool effectively gets flushed out by the lazy writer making space available for the DMV to read and process the pages. This is one of the reasons I advise people to only run the DMV on indexes they know they're interested in - and better yet, run it on a restored backup of the database.

Hope this is helpful!

PS Oh, also beware of using the SSMS fragmentation wizard. It uses a SAMPLED mode scan, but I found it impossible to cancel!

I made them up. Yup.

I'm talking about the guidance which is:

  • if an index has less than 1000 pages and is in memory, don't bother removing fragmentation
  • if the index has:
    • less than 10% logical fragmentation, don't do anything
    • between 10% and 30% logical fragmentation, reorganize it (using DBCC INDEXDEFRAG or ALTER INDEX ... REORGANIZE)
    • more than 30% logical fragmentation, rebuild it (using DBCC DBREINDEX or ALTER INDEX ... REBUILD)

These numbers are made up. They can and will vary for you, but they're a good starting point to work from.
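As a sketch only, here's how those starting-point thresholds might be turned into a query. The table name dbo.MyBigTable is hypothetical, the treatment of exactly 30% as 'reorganize' is my own choice, and the query has no way of knowing whether a small index is in memory, so it just uses the page count.

```sql
SELECT [index_id], [page_count], [avg_fragmentation_in_percent],
    CASE
        WHEN [page_count] < 1000 THEN 'leave alone (small index)'
        WHEN [avg_fragmentation_in_percent] < 10 THEN 'leave alone'
        WHEN [avg_fragmentation_in_percent] <= 30 THEN 'ALTER INDEX ... REORGANIZE'
        ELSE 'ALTER INDEX ... REBUILD'
    END AS [suggested_action]
FROM sys.dm_db_index_physical_stats (
    DB_ID (), OBJECT_ID (N'dbo.MyBigTable'), NULL, NULL, 'LIMITED')
WHERE [index_id] > 0                             -- skip the heap, if there is one
    AND [alloc_unit_type_desc] = 'IN_ROW_DATA';  -- ignore LOB and row-overflow data
GO
```

Change the three numbers to whatever suits your workload.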

There's been some discussion since PASS, when I confessed publicly on Twitter (during Grant Fritchey's session) to making them up, about whether I really said that etc etc. Yes - I really did make them up.

Back in 1999/2000 when I wrote DBCC INDEXDEFRAG and DBCC SHOWCONTIG for SQL Server 2000, customers wanted *some* guidance on what the thresholds should be where they should care about fragmentation or not, and how to remove it. We had to put *something* into Books Online (my favorite "it depends!" wouldn't have been too helpful), so I talked to some customers, inside and outside Microsoft, and chose these numbers as most appropriate at the time.

They're not set in stone - they're a big generalization, and there are a ton of other factors that may affect your choice of threshold and fragmentation removal method (e.g. recovery model, high-availability technologies in use, log backup schedule, query workload, disk space, buffer pool memory, and so on). I wish Microsoft would update the old whitepaper on fragmentation - they keep promising me they'll get around to it.

In the meantime, take those numbers with a pinch of salt and don't treat them as absolute.

The August edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Proactive page checksum failure detection
  • Why GUIDs make for bad clustered index keys 
  • Possible problems using a log shipping secondary for reporting 
  • Recovery model choice for log size management

Check it out at http://technet.microsoft.com/en-us/magazine/dd776512.aspx.

I'm teaching a class this week on database maintenance, for DBAs inside Microsoft. One of the things we're discussing today is index fragmentation and how poor cluster key choice can lead to page splits, poor performance, index fragmentation, and so on - not just in the clustered index, but also in nonclustered indexes.

One of the students looked in a database underpinning an application and found a unique cluster key, which is the worst I've ever seen (although not the worst that Kimberly's ever seen apparently - the mind boggles!).

The cluster key is defined as a combination of the following column types:

  • 16-byte GUID
  • varbinary (16)
  • nvarchar (512)
  • nvarchar (256)
  • tinyint

Now, the wide cluster key isn't a big deal UNLESS there are nonclustered indexes, but there are in this case - so the cluster key is included in all nonclustered index rows. And the random GUID high-order key is always a bad idea, as it means the clustered index will be heavily fragmented as records are inserted. This is all simplification and generalization (and I open this can of worms happily) - but you get the idea.
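To put a number on 'wide', here's a back-of-the-envelope sum of the maximum size of that key, assuming every variable-length column is full and ignoring per-row overhead:

```sql
SELECT 16           -- GUID
     + 16           -- varbinary (16), when full
     + 512 * 2      -- nvarchar (512), 2 bytes per character
     + 256 * 2      -- nvarchar (256), 2 bytes per character
     + 1            -- tinyint
     AS [max_cluster_key_bytes];
GO
```

That's up to 1569 bytes of cluster key carried in every row of every nonclustered index on the table.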

Good design up-front, with an understanding of how key choice affects the behavior of SQL Server and how indexes are stored and indexed, can lead to vastly reduced performance problems and maintenance issues.

Quickie this morning to start the day off. I saw a question on a forum: if I *have* to use a GUID and *must* have a primary key, should I make the primary key clustered or nonclustered?

Now, I'm not getting into the whole GUID vs. bigint identifier, or random GUID vs. GUID generated by NEWSEQUENTIALID(), so please don't comment on those issues, they're not relevant here. I just want to address the question - what kind of index should it be?

From a Storage Engine perspective, my answer is nonclustered. Here are three reasons why:

  • If the index is clustered, then the cluster key is immediately at least 16 bytes (the size of a GUID). This doesn't change the size of the clustered index records (as the GUID column has to be stored in the table anyway, and a clustered index IS the table), but it does change the size of the nonclustered indexes. All nonclustered indexes on the table must include the cluster keys, even if they are not explicitly part of the nonclustered index keys (I'll do a post on this later). This means the GUID is present in every nonclustered index record too. From this perspective, it would be better to use a smaller clustered index key and have the GUID primary key be nonclustered so it's only present in that one nonclustered index.
  • Random GUIDs used as the high-order key cause index fragmentation. Their random nature means the insertion point into the index is also random. This causes page splits, which cause fragmentation and are *expensive*. (I touched on this a bit a few days ago in my post How expensive are page splits in terms of transaction log?) With a random key value, it's hard to avoid page splits and fragmentation, although you can delay them somewhat using FILLFACTOR, but at the expense of using extra space. By making the GUID index nonclustered, you can delay page splits even further. The clustered index is the table, so the records are (almost always) larger than nonclustered index records. This means you can get fewer clustered index records on an 8KB page than nonclustered index records. With fewer records per page, you can do fewer random insertions on the page before a page split occurs. So using a nonclustered index for the GUID key means you incur fewer expensive page splits.
  • Given that whatever kind of index you create for the GUID key is going to experience index fragmentation, you're probably going to want to periodically remove the fragmentation as part of your database maintenance plan. It makes sense to try to limit the amount of resources used by the fragmentation removal operation (e.g. cpu, IO, disk space, transaction log space), and so the smaller the fragmented index, the better. A nonclustered index for the GUID key will be smaller than a clustered index, so if you choose a non-fragmentation-causing clustered index key, and confine the fragmentation to the nonclustered index, you can use fewer resources during database maintenance.
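Here's a minimal sketch of what that looks like in a table definition. The table, column, and constraint names are all hypothetical, and the choice of a bigint identity as the narrow clustered key is just an assumption to have something to cluster on.

```sql
CREATE TABLE dbo.Orders (
    -- The GUID is still the primary key, but enforced by a nonclustered index.
    OrderGuid UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_Orders_OrderGuid DEFAULT NEWID ()
        CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED,
    -- A narrow, ever-increasing cluster key, so inserts go at the end of the table.
    OrderId BIGINT IDENTITY (1, 1) NOT NULL
        CONSTRAINT UQ_Orders_OrderId UNIQUE CLUSTERED,
    OrderDate DATETIME NOT NULL);
GO
```

With this layout every nonclustered index carries the 8-byte OrderId rather than the 16-byte GUID, and the random insertion pattern is confined to the PK_Orders index.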

And there you have it. I'm sure some of you have seen pathological cases that disprove one of the above points, but my arguments are generalizations. Maybe this is a can of worms I've opened, in which case I look forward to the comments!

PS Brent did a great post about humor when blogging, the cartoon links he includes are great. Check it out here.

Page splits are always thought of as expensive, but just how bad are they? In this post I want to create an example to show how much more transaction log is created when a page in an index has to split. I'm going to use the sys.dm_tran_database_transactions DMV to show how much more transaction log is generated when a page has to split. You can find the list of columns and a small amount of explanation of each column in Books Online here - I was reminded of its existence by someone on Twitter (sorry, don't remember who it was and I couldn't find it in search).

In the example, I'm going to create a table with approximately 1000-byte long rows:

CREATE DATABASE PageSplitTest;
GO
USE PageSplitTest;
GO

CREATE TABLE BigRows (c1 INT, c2 CHAR (1000));
CREATE CLUSTERED INDEX BigRows_CL ON BigRows (c1);
GO

INSERT INTO BigRows VALUES (1, 'a');
INSERT INTO BigRows VALUES (2, 'a');
INSERT INTO BigRows VALUES (3, 'a');
INSERT INTO BigRows VALUES (4, 'a');
INSERT INTO BigRows VALUES (6, 'a');
INSERT INTO BigRows VALUES (7, 'a');
GO

I've engineered the case where the clustered index data page has space for one more row, and I've left a 'gap' at c1=5. Let's fill that last free space by inserting c1=8 as part of an explicit transaction and see how much transaction log is generated:

BEGIN TRAN
INSERT INTO BigRows VALUES (8, 'a');
GO

SELECT [database_transaction_log_bytes_used] FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('PageSplitTest');
GO

database_transaction_log_bytes_used
-----------------------------------
1228

That's about what I'd expect for that row. Now what about when I cause a page split by inserting the 'missing' c1=5 row into the full page?

-- commit previous transaction
COMMIT TRAN
GO

BEGIN TRAN
INSERT INTO BigRows VALUES (5, 'a');
GO

SELECT [database_transaction_log_bytes_used] FROM sys.dm_tran_database_transactions
WHERE [database_id] = DB_ID ('PageSplitTest');
GO

database_transaction_log_bytes_used
-----------------------------------
6724

Wow. 5.5x more bytes are written to the transaction log as part of the system transaction that does the split.
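If you want to confirm that the page really did split, one option (a sketch, not part of the original test) is to look at the page counts for the clustered index, from the same session and with PageSplitTest as the current database:

```sql
-- index_level 0 is the leaf level of the clustered index (index ID 1).
SELECT [index_level], [page_count], [record_count]
FROM sys.dm_db_index_physical_stats (
    DB_ID (N'PageSplitTest'), OBJECT_ID (N'BigRows'), 1, NULL, 'DETAILED');
GO
```

After the split the leaf level should show more than one page. Remember to COMMIT TRAN when you're done, as the second transaction is still open at this point.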

The ratio gets worse as the row size gets smaller. For a table with approximately 100-byte long rows (use the same code as above, but change to a CHAR (100), insert 67 rows with a 'gap' somewhere then insert the 68th to cause the split), the two numbers are 328 and 5924 - the split caused 18 times more log to be generated! For a table with approximately 10-byte long rows, I got numbers of 240 and 10436, because I created skewed data (about 256 rows with the key value 8) and then inserted key value 5 which forced a (rare) non-middle page split. That's a ratio of more than 43 times more log generated! You can try this yourself if you want: I changed the code to have a CHAR (10), inserted values 1, 2, 3, 4, 6, 7, then inserted 256 key values of 8 and then 2 of 5. The resulting page had only 6 rows - it split after the key value 5 - the Storage Engine doesn't always do a 50/50 page split. And that's not even causing nasty cascading page-splits, or splits that have to split a page multiple times to fit a new (variable-sized) row in.

Bottom line: page splits don't just cause extra IOs and index fragmentation, they generate a *lot* more transaction log. And all that log has to be (potentially) backed up, log shipped, mirrored....

Jack Li, one of the Senior Escalation Engineers in Product Support, just posted details of an interesting case over on the CSS blog - his article is here. It talks about index builds and rebuilds, but the issue is the same for both, so I'll just talk about rebuilds.

The gist of the problem is that index rebuilds can parallelize, but sometimes they don't parallelize very well. Each thread gets a certain range of the index to rebuild, using the existing index statistics to divide the ranges equally between the threads. If there's massive data skew, then one thread can end up doing the majority of the work, leading to a long run-time. The case in Jack's post involved a 250 million row index where 150 million rows had the same (NULL) key value. This range has to be processed by a single thread - a single value can't be divided between two+ threads.

Now, this is understandable behavior by the database engine, but it relies on the statistics being up-to-date. That's a bit of a catch-22 - rebuilding an index updates the statistics, but if the statistics aren't up-to-date then the index rebuild might parallelize badly! I guess the solution is that if you know that you have massive data skew in your large indexes, update statistics BEFORE doing an index rebuild. And given what I've been hearing this week at SQL Connections about how badly statistics keep biting people, I'm leaning towards a different recommendation for those people who have lots of perf trouble caused by statistics and the potential for skewed data - rebuild all your statistics regularly, and only rebuild/reorganize fragmented indexes. Statistics just cause so many problems it seems.
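In code, the 'update statistics first' suggestion is just two statements. The table and index names are hypothetical, and using FULLSCAN rather than a sampled update is my assumption, not something from Jack's post.

```sql
-- Refresh the statistics on the index so the rebuild divides up the key
-- ranges based on the current data distribution.
UPDATE STATISTICS dbo.BigSkewedTable IX_BigSkewedTable_Key WITH FULLSCAN;
GO

ALTER INDEX IX_BigSkewedTable_Key ON dbo.BigSkewedTable REBUILD;
GO
```

Bear in mind this can't help with the case above where a single key value accounts for most of the rows - that range still has to be processed by one thread.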

Thanks

PS Kimberly has a lot more info about statistics over on her blog - I'm just starting to venture into that mine-field

Last week’s survey was on what kind of regular index maintenance you perform (see here for the survey) as a way of kicking off a new series I’m writing around index maintenance. Here are the results as of 3/21/09 – I find them very encouraging:

As you can see, about 2/5 of respondents are performing some form of analysis-based fragmentation removal (answers 5+6), which I consider the best way to perform index maintenance, if you’re willing to invest the time involved to set it up. It allows the least amount of work to be performed, for the most targeted performance gains – and so is especially appropriate for 24x7 systems where there’s a minimal or non-existent maintenance window.

The next best option is to do all rebuilds or all defrags based on a fragmentation threshold (answers 3+4), which about 1/5 of respondents do. This also allows work to be limited, but by choosing only a single method of removing fragmentation, there are pros and cons. Now, the survey was limited to a number of questions so I couldn’t explore what the threshold is that people are using (e.g. logical fragmentation, page density, extent fragmentation, or something else). Some measures are good to use and some not so good, and I’ll be exploring the various counters and ways of determining fragmentation as the series progresses.

Either of the options to operate on all indexes regardless of fragmentation (options 1+2) can lead to lots of wasted resources (disk space, transaction log space, I/Os, CPU) by operating on indexes that are not fragmented in the first place, or for which fragmentation removal has no benefit for workload performance. About 1/3 of respondents do this. This isn’t surprising to me as rebuild-all-the-indexes-every-night/week is a very common index maintenance plan for “involuntary DBAs” who know that index maintenance is important, but don’t have the knowledge or training to implement a more sophisticated maintenance plan. The growing size of this group of people is one of the main reasons I’m going to write this series.

Doing absolutely nothing for index maintenance, which about 1/10 do, is usually not a good idea, as indexes in a database that’s not read-only commonly become fragmented over time. These people may know they don’t suffer from fragmentation issues, or that removing fragmentation has no effect on workload performance. However, my suspicion (based on what I see in the field) is that some of these respondents don’t realize the benefits of performing index maintenance.

You may be surprised to hear that I don’t consider doing nothing to be the worst choice. That dubious honor goes to answer 7 – doing any kind of index maintenance followed by a database shrink operation – as 3 respondents do. A post-maintenance shrink operation may well undo some of the benefits of the maintenance by introducing massive amounts of index fragmentation – see my blog post Auto-shrink - turn it OFF! for details of how bad this can get.

I’m actually pretty encouraged by these results though. Compared to how things were around 1999 when I wrote DBCC INDEXDEFRAG and DBCC SHOWCONTIG for SQL Server 2000, these results show that knowledge in the field (or at least in the group that reads my blog and responded) has vastly improved. My aim for the forthcoming series about index fragmentation and maintenance is to increase knowledge a lot more broadly.

Next up - this week's survey. Thanks for reading!

I'm about to start a new series of posts about index fragmentation and removing it. For this week's survey, I'd like to know what index maintenance you do to address fragmentation (in any of its forms) - I'll report on it in a week.

Thanks

The April edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Disappearing errors with DBCC CHECKDB
  • Provisioning tempdb when moving from 2000 to 2008
  • Does fillfactor prevent fragmentation and should it be set instance-wide
  • Avoiding FILESTREAM performance problems

Check it out at http://technet.microsoft.com/en-us/magazine/2009.04.sqlqa.aspx

Back in 2005 Kimberly produced two very popular webcast series - an 11-part webcast series for TechNet called SQL Server 2005 for the IT Professional and a 10-part webcast series for MSDN called A Primer for Proper SQL Server Development. The webcast links and blog posts were broken for quite a while but now they're all fixed up and working again. I've created some web pages that link to all the webcasts and blog posts, along with abstracts. I've also included some more recent ones too and will be adding to the list over the next few weeks.

There's over 30 hours of good stuff to watch - check them out at http://www.sqlskills.com/webcasts.asp

This blog post describes the demo "2 - NC Indexes" from my Corruption Survival Techniques conference session from various conferences in 2008. The links to the scripts and databases to use are in this blog post.

The aim of this demo is to show that sometimes it's just redundant data (i.e. nonclustered indexes) that gets corrupted, and so you don't have to do anything that takes the actual data offline - like restoring from a full backup or running one of the repair options (both of which require the database to be in single-user mode).

Let's look at an example. Extract and restore the DemoNCIndex database, and the NCIndexCorruption.sql script. What do we get from running DBCC CHECKDB on the DemoNCIndex database (lines 39-42)?

DBCC CHECKDB (DemoNCIndex) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

Msg 8951, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Data row does not have a matching index row in the index 'CustomerName' (ID 2). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:45:28) identified by (CustomerID = 29) with index values 'LastName = 'Adams' and CustomerID = 29'.
Msg 8951, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Data row does not have a matching index row in the index 'CustomerName' (ID 2). Possible missing or invalid keys for the index row matching:
Msg 8955, Level 16, State 1, Line 1
Data row (1:180:164) identified by (CustomerID = 2118) with index values 'LastName = 'Adams' and CustomerID = 2118'.

<snip - removed for brevity>

Msg 8952, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Index row in index 'CustomerName' (ID 2) does not match any data row. Possible extra or invalid keys for:
Msg 8956, Level 16, State 1, Line 1
Index row (1:24482:16) with values (LastName = 'Andersen' and CustomerID = 18718) pointing to the data row identified by (CustomerID = 18718).
Msg 8952, Level 16, State 1, Line 1
Table error: table 'Customers' (ID 453576654). Index row in index 'CustomerName' (ID 2) does not match any data row. Possible extra or invalid keys for:
Msg 8956, Level 16, State 1, Line 1
Index row (1:24482:127) with values (LastName = 'Arthur' and CustomerID = 9758) pointing to the data row identified by (CustomerID = 9758).
CHECKDB found 0 allocation errors and 26 consistency errors in table 'Customers' (object ID 453576654).
CHECKDB found 0 allocation errors and 26 consistency errors in database 'DemoNCIndex'
repair_rebuild is the minimum repair level for the errors found by DBCC CHECKDB (DemoNCIndex).
 

Lots of errors. Now, in this example there are only 26 errors, but in cases where there are hundreds of errors it can be really hard to tell whether all the corruptions are in nonclustered indexes (i.e. indexes with IDs > 1). Luckily, there's an undocumented option to all the DBCC CHECK* commands - WITH TABLERESULTS. The option is undocumented because the output can change from release to release, but basically this converts the DBCC output into tabular form. Try running lines 48-50 in the script and you'll see what I mean. One of the columns in the output is IndexId - so you can easily see whether all the errors are in nonclustered indexes. In this case, they are, and all in one index of the Customers table.
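If you don't have the script to hand, the command is presumably something like the following (an assumption on my part - I'm not reproducing the script's actual lines here):

```sql
-- Same check as before, but with the output returned as a result set.
DBCC CHECKDB (DemoNCIndex) WITH TABLERESULTS, NO_INFOMSGS, ALL_ERRORMSGS;
GO
```

Remember that the option is undocumented, so the set of columns returned may differ between versions.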

You could run lines 55-57 of the script to prove to yourself that repairs can't be run online, and then realize that we can address the problem without having to run repair or restore. First off we need to figure out the name of the index to fix - index ID 2 of the Customers table. Lines 77-80 run sp_HelpIndex on the table (although I should really be using Kimberly's sp_HelpIndex2):

USE DemoNCIndex
GO
EXEC sp_HelpIndex 'Customers';
GO

index_name     index_description                                   index_keys
-------------- --------------------------------------------------- ------------
CustomerName   nonclustered located on PRIMARY                     LastName
CustomerPK     clustered, unique, primary key located on PRIMARY   CustomerID

The nonclustered index is called CustomerName. Plug in the index name to line 82, then try fixing the index by doing an online index rebuild, and run DBCC CHECKDB afterwards (lines 82-89). The corruption hasn't been fixed! Online index rebuild reads the old index to build the new one so the new index has the same missing rows as the old one. We need to do an offline index rebuild - with lines 110-115. After the last DBCC CHECKDB, the index is fixed up. Now, on SQL Server 2008, you may or may not get a query plan for the index rebuild that doesn't use the old index, because the query optimizer has some more plan choices available to it - so on SQL Server 2008 you may need to do an actual drop and create of the broken index (carefully, if it's enforcing a constraint).
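For those following along without the script, the steps above look roughly like this. It's a sketch based on the index and table names in the sp_HelpIndex output, not a copy of the script's lines, and the online rebuild needs an edition of SQL Server that supports online index operations.

```sql
USE DemoNCIndex;
GO

-- Online rebuild: reads the existing (corrupt) index, so the errors remain.
ALTER INDEX CustomerName ON Customers REBUILD WITH (ONLINE = ON);
GO
DBCC CHECKDB (DemoNCIndex) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

-- Offline rebuild, followed by another check.
ALTER INDEX CustomerName ON Customers REBUILD;
GO
DBCC CHECKDB (DemoNCIndex) WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

-- Fallback if the offline rebuild still uses the old index: drop and create.
-- This index isn't enforcing a constraint, so a plain drop is fine here.
DROP INDEX CustomerName ON Customers;
GO
CREATE NONCLUSTERED INDEX CustomerName ON Customers (LastName);
GO
```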

So - just because DBCC CHECKDB reports a ton of errors, that doesn't necessarily mean that the database needs to be taken (essentially) offline to repair it - check through the errors to see if it's just nonclustered indexes that are affected.

Here's another corruption bug that people are hitting on 2005 SP2 - something I didn't know about until today.

The situation is this: a table with a non-unique clustered index (i.e. so a hidden uniquifier column is created) is rebuilt using DBCC DBREINDEX. Sometimes an incorrect query plan is chosen so the rows in the index get sorted incorrectly. A subsequent DBCC CHECKDB (or DBCC CHECKTABLE) will return errors like:

Msg 2511, Level 16, State 1, Line 1
Table error: Object ID 2073058421, index ID 1, partition ID 72057594044678144, alloc unit ID 72057594048348160 (type In-row data). Keys out of order on page (1:16042), slots 47 and 48.

The workaround is to use the new ALTER INDEX command in 2005 to do the rebuild (remember that DBCC DBREINDEX is deprecated). There's also a hotfix available in CU9 (and the upcoming SP3 I guess). Check out KB 954734 here for details.
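As a rough illustration of the workaround, here's what swapping the deprecated command for ALTER INDEX might look like. The table and index names (dbo.MyTable, MyTable_CL) are hypothetical - substitute your own non-unique clustered index.

```sql
-- Assumption: dbo.MyTable and MyTable_CL are hypothetical names.

-- Deprecated form that can hit the bug on 2005 SP2:
-- DBCC DBREINDEX ('dbo.MyTable', 'MyTable_CL');

-- Workaround: rebuild with ALTER INDEX instead.
ALTER INDEX MyTable_CL ON dbo.MyTable REBUILD;
GO

-- Then confirm the table is clean.
DBCC CHECKTABLE ('dbo.MyTable') WITH NO_INFOMSGS;
GO
```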

Hopefully you won't hit this, but if you do, knowing this may save you a lot of head-scratching.

PS Last post from our 22-day road-trip. Tonight's a date-night in Las Vegas (dinner at Le Cirque, then tickets to Cirque du Soleil's "O") before heading back to our (vacation!) home in Redmond and PASS next week.

It's been a long time since the last Conference Questions Pot-Pourri - in fact it was at the last SQL Connections in Orlando in April. Now we're in Las Vegas doing SQL Connections Fall - Kimberly's lecturing for an hour on partitioning so I can get out a quick post.

This is a question that came up yesterday - can CREATE INDEX ... WITH DROP_EXISTING be used to move indexes that enforce constraints? Let's check it out.

First up I'm going to create a couple of tables. The UniqueConstraint table has a unique constraint backed by a nonclustered index. The PrimaryKeyConstraint table has a primary key constraint backed by a clustered index.

CREATE DATABASE ConstraintTest;
GO
USE ConstraintTest;
GO

CREATE TABLE UniqueConstraint (c1 INT UNIQUE NONCLUSTERED);
GO
INSERT INTO UniqueConstraint VALUES (1);
GO

CREATE TABLE PrimaryKeyConstraint (c2 INT PRIMARY KEY CLUSTERED);
GO
INSERT INTO PrimaryKeyConstraint VALUES (1);
GO

(Oops - in the original post I c&p'd the wrong code and had the second table as a unique nonclustered constraint too - sorry for the mixup) 

Now I'll add another filegroup that we'll try to move the indexes into.

ALTER DATABASE ConstraintTest ADD FILEGROUP ExtraFilegroup;
GO

ALTER DATABASE ConstraintTest ADD FILE (
   NAME = ExtraFile1,
   FILENAME = 'C:\SQLskills\ExtraFile1.ndf',
   SIZE = 5MB)
TO FILEGROUP ExtraFilegroup;
GO

Now I'll try moving the nonclustered index enforcing the unique constraint.

SELECT [name], [index_id] FROM sys.indexes
WHERE [object_id] = OBJECT_ID ('UniqueConstraint')
GO

name                           index_id
------------------------------ --------
NULL                           0
UQ__UniqueConstraint__7C8480AE 2

The index we want is index ID=2, so we should be able to move it as follows:

CREATE UNIQUE NONCLUSTERED INDEX UQ__UniqueConstraint__7C8480AE
ON UniqueConstraint (c1) WITH (DROP_EXISTING = ON)
ON ExtraFilegroup;
GO

No problem! Now let's try the same thing for the clustered primary key constraint.

SELECT [name], [index_id] FROM sys.indexes
WHERE [object_id] = OBJECT_ID ('PrimaryKeyConstraint')
GO

name                           index_id
------------------------------ --------
PK__PrimaryKeyConstr__7E6CC920 1

We only have one choice, so we should be able to rebuild it using:

CREATE CLUSTERED INDEX PK__PrimaryKeyConstr__7E6CC920
ON PrimaryKeyConstraint (c2) WITH (DROP_EXISTING = ON)
ON ExtraFilegroup;
GO

Msg 1907, Level 16, State 1, Line 1
Cannot recreate index 'PK__PrimaryKeyConstr__7E6CC920'. The new index definition does not match the constraint being enforced by the existing index.

Hmm - what am I missing? I'll check sys.indexes again and not filter the columns this time:

SELECT * FROM sys.indexes
WHERE [object_id] = OBJECT_ID ('PrimaryKeyConstraint')
GO

object_id  name                           index_id type type_desc is_unique
---------- ------------------------------ -------- ---- --------- ---------
2105058535 PK__PrimaryKeyConstr__7E6CC920 1        1    CLUSTERED 1

data_space_id ignore_dup_key is_primary_key is_unique_constraint fill_factor
------------- -------------- -------------- -------------------- -----------
1             0              1              0                    0

is_padded is_disabled is_hypothetical allow_row_locks allow_page_locks
--------- ----------- --------------- --------------- ----------------
0         0           0               1               1

Ah - I was missing the unique attribute on the index. What I need to do is the following:

CREATE UNIQUE CLUSTERED INDEX PK__PrimaryKeyConstr__7E6CC920
ON PrimaryKeyConstraint (c2) WITH (DROP_EXISTING = ON)
ON ExtraFilegroup;
GO

And that works fine. So - the answer is yes, you can use CREATE INDEX ... WITH DROP_EXISTING to move indexes that enforce constraints. This is really good, as it allows you to move these indexes without having to do DROP INDEX then CREATE INDEX. That method is bad, as it creates a window of opportunity for someone to enter data that violates the constraint while the index is dropped.
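To double-check that the moves really happened, you could join sys.indexes to sys.data_spaces and look at where each index now lives. This is just a quick verification sketch using the two tables created above; the column aliases are my own.

```sql
-- List each index on the two test tables along with its filegroup.
SELECT
   OBJECT_NAME (i.[object_id]) AS table_name,
   i.[name] AS index_name,
   i.[index_id],
   ds.[name] AS filegroup_name
FROM sys.indexes AS i
JOIN sys.data_spaces AS ds
   ON ds.[data_space_id] = i.[data_space_id]
WHERE i.[object_id] IN (
   OBJECT_ID ('UniqueConstraint'),
   OBJECT_ID ('PrimaryKeyConstraint'));
GO
```

If both rebuilds succeeded, the two constraint-enforcing indexes should be reported in ExtraFilegroup. The heap for UniqueConstraint (index ID 0) stays where it was, as only the nonclustered index was moved.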

For the first time in ages, Kimberly sneaked in an interview without me! I was teaching a class on Database Maintenance for some Microsoft DBAs and Kimberly recorded a RunAs Radio interview on her favorite subject - indexes. I love listening to her talk about indexes - or maybe I just like the sound of her voice :-)

Check it out at http://www.runasradio.com/default.aspx?showNum=76.

Enjoy!

PS She also drops some hints about an upcoming project of ours...

At the last Connections conference in April, the conference organizers tapped a bunch of speakers for interviews, articles, and other content to put into a free "newsletter" called MyDevConnections, and now it's finally available. It covers all the Connections conferences, so isn't just limited to SQL Server. As far as SQL is concerned, Kimberly and I wrote an article about our favorite SQL Server 2008 features (data compression and filtered indexes, respectively), and Ross Mistry wrote an article on hardening a SQL Server 2005 installation. There's also an extract from a Women in Technology interview that Kimberly took part in. Overall there are 84 pages of content in the PDF, with some adverts for the Connections shows (obviously).

Check it out at http://www.devconnections.com/mydevconnections/S08_DevOnlineMag_Web.pdf

Every so often I'll see posts on the various data corruption forums discussing causes of corruption. In this post I want to debunk some of the myths around what can cause corruption. There are really two types of corruption to deal with, physical corruption and logical corruption.

Physical corruption

This is where something has altered the contents of a data or log file sector with no regard for what is being stored there. Possible causes of physical corruption are:

  • Problem with the I/O subsystem (99.8% of all cases I've ever seen - only 3 nines as I'd estimate I've seen around about a thousand corruption cases). Remember the I/O subsystem is everything underneath SQL Server in the I/O stack - including the OS, 3rd-party file system filter drivers, device drivers, RAID controllers, SAN controllers, network hardware, drives themselves, and so on. Millions of lines of code and lots of moving parts spinning very fast, very close to very fragile pieces of metal oxide (I once heard Jim Gray liken a disk drive head to a 747 jumbo jet flying at 500 mph at a height of 1/4 inch from the ground...)
  • Problem with the host machine hardware (0.1% of cases). Most of the time this is a memory error.
  • SQL Server bugs (0.1% of cases). Yes, there have been corruption bugs. Every piece of software has bugs. There are KB articles describing bugs.
  • Deliberate introduction of corruption using a hex editor or other means.

Physical corruption is what DBCC CHECKDB usually reports and the majority of cases are caused by physical failures of some kind, with the minority caused by humans - software bugs.

Logical corruption

This is where something has altered some data so that a data relationship is broken. Possible causes of logical corruption are:

  • Humans

:-) Okay...

  • Application bug. The application deletes one part of an inherent data relationship but not the other. Or the application designer doesn't implement a constraint properly. Or the application designer doesn't cope with a transaction roll-back properly. You get the idea.
  • Accidental update/delete. Someone deletes or updates some data incorrectly.
  • SQL Server bug. See above.
  • DBCC CHECKDB when using the REPAIR_ALLOW_DATA_LOSS option. As is documented in Books Online, and I've blogged about and mentioned when lecturing, if you run repair, it doesn't take into account any inherent or explicit constraints on the data.

The point here is that a physical failure of a component does not cause logical corruption, it causes physical corruption. Conversely, application errors cause logical corruption, not physical corruption. DBCC CHECKDB errors are about physical corruption (okay, with the inclusion of DBCC CHECKCATALOG code in 2005, it will find cases where the DBA has manually altered the system tables, causing logical corruption) and applications cannot cause physical corruption as they can only manipulate data through SQL Server. If an application hits a SQL Server bug which causes physical corruption, that's still not the application causing physical corruption, it's SQL Server.

So - on to the myths.

  • Can an application cause physical corruption? No.
  • Can stopping a shrink operation cause corruption of any kind? No.
  • Can stopping an index rebuild cause corruption of any kind? No.
  • Can running DBCC CHECKDB without repair cause corruption of any kind? No.
  • Can creating a database snapshot cause corruption of any kind? No.

Hope this helps.

At the user group meeting on Monday I spent some time explaining how GUIDs can cause fragmentation in clustered indexes AND in non-clustered indexes, even if the GUID isn't specifically included in the non-clustered index key. GUIDs are essentially random values (pseudo-random in ranges if generated using NEWSEQUENTIALID) that are also unique. Their uniqueness is what makes them attractive to many developers as a key value, without understanding the havoc they can cause in production in terms of fragmentation and poor query performance.

A GUID key causes fragmentation because of its randomness. The insertion point of a new record in an index is dictated by the value of the index key, so if the key value is random, so is the insertion point. This means that if an index page is full, a random insert that happens to have to go onto that page will cause a page split to make room for the new record. A page-split is where a new page is allocated and (as near as possible to) half the rows from the splitting page are moved to the new page. The new row is then inserted into one of the two pages, determined by the key value. Usually the newly allocated page is not physically contiguous to the splitting page, and so fragmentation has been caused. In this case *two* kinds of fragmentation have been caused - logical fragmentation (where the next logical page as determined by the index order is not the next physical page in the data file) and physical (or internal) fragmentation (where space is being wasted on index pages). These can both affect query performance (topic for a later post), as well as the expense of having to do the page split in the first place.

It's fairly well known that GUIDs can cause fragmentation in the index where the GUID is the key (e.g. a clustered index), but the knock-on effects in non-clustered indexes are less well known. Here's an example - I'll create two tables, each with a clustered index on a GUID key (one generated from NEWID and one from NEWSEQUENTIALID), plus a non-clustered index on each. Let's see what happens when we insert 100000 rows:

-- Create a table with a GUID key
CREATE TABLE BadKeyTable (
   c1 UNIQUEIDENTIFIER DEFAULT NEWID () ROWGUIDCOL,
   c2 DATETIME DEFAULT GETDATE (),
   c3 CHAR (400) DEFAULT 'a');
CREATE CLUSTERED INDEX BadKeyTable_CL ON BadKeyTable (c1);
CREATE NONCLUSTERED INDEX BadKeyTable_NCL ON BadKeyTable (c2);
GO

-- Create another one, but using NEWSEQUENTIALID instead
CREATE TABLE BadKeyTable2 (
   c1 UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID () ROWGUIDCOL,
   c2 DATETIME DEFAULT GETDATE (),
   c3 CHAR (400) DEFAULT 'a');
CREATE CLUSTERED INDEX BadKeyTable2_CL ON BadKeyTable2 (c1);
CREATE NONCLUSTERED INDEX BadKeyTable2_NCL ON BadKeyTable2 (c2);
GO

-- Insert 100000 rows into each table
DECLARE @a INT;
SELECT @a = 1;
WHILE (@a <= 100000)
BEGIN
   INSERT INTO BadKeyTable DEFAULT VALUES;
   INSERT INTO BadKeyTable2 DEFAULT VALUES;
   SELECT @a = @a + 1;
END;
GO

-- And now check for fragmentation
SELECT
   OBJECT_NAME (ips.[object_id]) AS 'Object Name',
   si.name AS 'Index Name',
   ROUND (ips.avg_fragmentation_in_percent, 2) AS 'Fragmentation',
   ips.page_count AS 'Pages',
   ROUND (ips.avg_page_space_used_in_percent, 2) AS 'Page Density'
FROM sys.dm_db_index_physical_stats (DB_ID ('DBMaint2008'), NULL, NULL, NULL, 'DETAILED') ips
CROSS APPLY sys.indexes si
WHERE
   si.object_id = ips.object_id
   AND si.index_id = ips.index_id
   AND ips.index_level = 0;
GO

Object Name    Index Name        Fragmentation  Pages  Page Density
-------------  ----------------  -------------  -----  ------------
BadKeyTable    BadKeyTable_CL    99.13          8092   66.08
BadKeyTable    BadKeyTable_NCL   30.97          78     64.1
BadKeyTable2   BadKeyTable2_CL   0.83           5556   96.26
BadKeyTable2   BadKeyTable2_NCL  1.88           372    99.61

The BadKeyTable_CL clustered index with the GUID key generated from NEWID is almost perfectly fragmented, with 34% space being wasted on each page. Conversely, the BadKeyTable2_CL clustered index with the GUID key generated from NEWSEQUENTIALID is hardly fragmented and only 4% of free space is wasted (and this is just because of the row size chosen). These numbers are entirely expected given the nature of the cluster keys.

Now look at the non-clustered indexes. BadKeyTable_NCL is 31% fragmented with 36% space wasted on each page! BadKeyTable2_NCL is hardly fragmented with no free space wasted on each page. So what's going on? The non-clustered index key in both cases is a datetime column, which has a minimum granularity of 3 milliseconds. The code above runs in a tight loop inserting records and so can insert more than one record per 3ms time interval - and I allow this because I didn't make the non-clustered indexes unique. For all the records inserted in one 3ms time interval, there *has* to be something that makes the non-clustered index key unique internally (as even though an index can be defined as non-unique, the Storage Engine requires that each record really is unique in an index and will add whatever it needs to so that happens).

In this case, the clustered index key (which must be present in the non-clustered index anyway) is used to differentiate between all the non-clustered index records with the same datetime value. For BadKeyTable_NCL, the cluster key is a random GUID, so the non-clustered index record insertion points ALSO become random within each 3ms time interval - leading to the fragmentation above. The BadKeyTable2_NCL non-clustered index has the same time interval issue, but its cluster key is a sequential GUID, so the non-clustered index doesn't get fragmented. If the non-clustered index key was a time datatype with a larger minimum granularity (like smalldatetime, or the new date), the fragmentation of BadKeyTable_NCL would be even worse - try it for yourself and you'll see.
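If you want to try the coarser-granularity experiment, something along these lines should do it. This is a sketch, not part of the original test: BadKeyTable3 is a hypothetical third table that mirrors BadKeyTable except that c2 is a SMALLDATETIME (one-minute granularity), and it assumes you run it in the same database as the other two tables.

```sql
-- Assumption: BadKeyTable3 is a hypothetical variant of BadKeyTable
-- with a coarser-grained non-clustered index key.
CREATE TABLE BadKeyTable3 (
   c1 UNIQUEIDENTIFIER DEFAULT NEWID () ROWGUIDCOL,
   c2 SMALLDATETIME DEFAULT GETDATE (),
   c3 CHAR (400) DEFAULT 'a');
CREATE CLUSTERED INDEX BadKeyTable3_CL ON BadKeyTable3 (c1);
CREATE NONCLUSTERED INDEX BadKeyTable3_NCL ON BadKeyTable3 (c2);
GO

SET NOCOUNT ON;
DECLARE @a INT;
SELECT @a = 1;
WHILE (@a <= 100000)
BEGIN
   INSERT INTO BadKeyTable3 DEFAULT VALUES;
   SELECT @a = @a + 1;
END;
GO

-- Check the leaf level of just the new table's indexes
SELECT
   si.[name] AS 'Index Name',
   ROUND (ips.avg_fragmentation_in_percent, 2) AS 'Fragmentation',
   ips.page_count AS 'Pages',
   ROUND (ips.avg_page_space_used_in_percent, 2) AS 'Page Density'
FROM sys.dm_db_index_physical_stats (
   DB_ID (), OBJECT_ID ('BadKeyTable3'), NULL, NULL, 'DETAILED') ips
JOIN sys.indexes si
   ON si.[object_id] = ips.[object_id]
   AND si.[index_id] = ips.[index_id]
WHERE ips.index_level = 0;
GO
```

With a one-minute granularity, far more rows share each c2 value, so the random cluster key decides the insertion point across a much bigger range of the non-clustered index.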

So the answer to the question in the blog post title is really - it depends! Under the right conditions, a GUID cluster key can seriously fragment a non-clustered index as well.

Fresh off a week of teaching classes on the Microsoft campus, we've finalized some user group dates. Here's what we have coming up:

  • Monday August 18th: user group meeting in Redmond
  • Monday September 1st to 3rd: public class in England
    • In conjunction with our UK partners, SQL Know How at Hatfield, England
    • Topic: Best Practices in Performance and Availability in SQL Server 2005/2008 
  • Thursday September 4th: user group meeting in Ireland
  • Friday September 5th: SQL Server launch event in Ireland
    • In conjunction with Microsoft Ireland at the Dublin Microsoft office
    • Topic: SQL Server 2008 Overview for DBAs
  • Monday September 8th to 9th: public class in Scotland
    • In conjunction with our UK partners, SQL Know How at Edinburgh, Scotland
    • Topic: Indexing for Performance in SQL Server 2000/2005/2008

It's going to be a busy few weeks - hope to see you at one of these events!

Well, almost... Kimberly and I have agreed to co-author the upcoming SQL Server 2008 Internals book with our good friend Kalen Delaney. This is the 2008 evolution of Kalen's Inside SQL Server series of books and will be published by MS Press around February 2009. The other authors are (also our good friends) Adam Machanic and Conor Cunningham.

Kimberly's going to write the chapter on Index Internals and I'm going to write the chapter on DBCC Internals. This is very exciting as these are our respective favorite subjects, as you probably already know - these won't be short chapters :-)

It's going to be a busy rest of the year - phew!

PS Next year we have plans to write a book ourselves - watch this space...

Over the last few weeks I've seen (and helped correct) quite a few myths and misconceptions about index rebuild operations. There's enough now to make it worthwhile doing a blog post (and it's too hot here in Orlando for us to go sit by the pool so we're both sitting here blogging)...

Myth 1:  index rebuild pre-allocates the necessary space

This myth has two variations:

  1. The space for the new copy of the index is pre-allocated
  2. The space for the sort portion of the rebuild is pre-allocated

Neither of these is true. Index rebuild (whether online or offline, and at least as far back as 7.0) will create a new copy of the index before dropping the old copy. The pages and extents required to do this will always be allocated as needed, as with any other operation in SQL Server. The sort phase of an index rebuild, if required (in certain cases it is skipped in 2005), will adhere to the same allocation behavior.

Myth 2: indexes are rebuilt within a single file in a multi-file filegroup

This is a new one that I just heard yesterday - (paraphrasing) "In a two-file filegroup, an index in file 1 will be rebuilt into file 2. The next time it is rebuilt, it will be built in file 1. And so on".

This is untrue. Any time any allocations are done in a multi-file filegroup, the allocations are spread amongst all the files using the allocation system's proportional fill algorithm. In a nutshell, this says that space will be allocated more frequently from larger files with more free space than from smaller files with less free space. There is no concept in SQL Server of limiting allocations to a particular file in a multi-file filegroup.
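Since proportional fill is driven by how much free space each file has, it can help to look at the size and free space of every data file in each filegroup. The query below is a rough sketch of how you might do that in the current database; the column aliases are mine.

```sql
-- Size and free space per data file, grouped by filegroup.
-- [size] and FILEPROPERTY (..., 'SpaceUsed') are both in 8KB pages,
-- so dividing by 128 converts to (whole) megabytes.
SELECT
   fg.[name] AS filegroup_name,
   df.[name] AS logical_file_name,
   df.[size] / 128 AS size_mb,
   (df.[size] - FILEPROPERTY (df.[name], 'SpaceUsed')) / 128 AS free_mb
FROM sys.database_files AS df
JOIN sys.filegroups AS fg
   ON fg.[data_space_id] = df.[data_space_id]
ORDER BY fg.[name], df.[file_id];
GO
```

Files in the same filegroup with more free space should receive proportionally more of the allocations made by a rebuild, which is why an index can't be pinned to just one of them.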

Myth 3: non-clustered indexes are always rebuilt when a clustered index is rebuilt

This is untrue. The rules are a little complex here but can be summed up as follows:

  • In 2005+, rebuilding a unique or non-unique clustered index (without changing its definition) will NOT rebuild the non-clustered indexes
  • In 2000:
    • Rebuilding a non-unique clustered index WILL rebuild the non-clustered indexes
    • Rebuilding a unique clustered index will NOT rebuild the non-clustered indexes

The first few service packs of 2000 had bugs that changed the behavior of rebuilding unique clustered indexes back and forth - this is the source of much of the confusion around this myth.

For a much more detailed discussion of this, see my blog post from last Fall - Indexes From Every Angle: What happens to non-clustered indexes when the table structure is changed?

Myth 4: BULK_LOGGED recovery model decreases the size of the transaction log and log backups for an index rebuild

This myth is partly true.

Switching to the BULK_LOGGED recovery model while doing an index rebuild operation WILL reduce the amount of transaction log generated, which is very useful for limiting the size of the transaction log file (note I say 'file', not 'files' - you only need one log file).

Switching to the BULK_LOGGED recovery model while doing an index rebuild will NOT reduce the size of the transaction log BACKUP. Although the operation will be minimally-logged, the next transaction log backup will read all the transaction log since the last backup plus all the extents that were changed by the minimally-logged index rebuild. This will result in a log backup that's almost exactly the same size as for a fully-logged index rebuild. The ONLY time a log backup will contain data extents is when a minimally-logged operation has taken place since the last log backup - see here on MSDN for more info.

If you're considering using the BULK_LOGGED recovery model, beware that you lose the ability to do point-in-time recovery to ANY point covered by a transaction log backup that contains even a single minimally-logged operation. Make sure that there's nothing else happening in the database that you may need to effectively roll-back with P.I.T. recovery. The operations you should perform if you're going to do this are:

  • In the FULL recovery model, take a log backup immediately before switching to BULK_LOGGED
  • Switch to BULK_LOGGED and do the index rebuild
  • Switch back to FULL and immediately take a log backup

This limits the time period in which you can't do P.I.T. recovery.
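Those three steps translate into T-SQL roughly as follows. Everything named here is hypothetical - the MyDB database, the backup file paths, and the dbo.MyTable / MyTable_CL index - so treat it as a template rather than something to run as-is.

```sql
-- Assumptions: MyDB, the backup paths, dbo.MyTable and MyTable_CL
-- are all hypothetical placeholders.

-- 1. Log backup immediately before switching
BACKUP LOG MyDB TO DISK = 'C:\SQLskills\MyDB_Log_Before.trn';
GO

-- 2. Switch to BULK_LOGGED and do the rebuild
ALTER DATABASE MyDB SET RECOVERY BULK_LOGGED;
GO
USE MyDB;
GO
ALTER INDEX MyTable_CL ON dbo.MyTable REBUILD;
GO

-- 3. Switch back to FULL and immediately take another log backup
ALTER DATABASE MyDB SET RECOVERY FULL;
GO
BACKUP LOG MyDB TO DISK = 'C:\SQLskills\MyDB_Log_After.trn';
GO
```

The second log backup is the one that contains the minimally-logged rebuild, so it's the only one you can't stop part-way through during a restore.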

Myth 5: online index rebuild doesn't take any locks

This myth is untrue. The 'online' in 'online index operations' is a bit of a misnomer.  Online index operations need to take two very short-term table locks. An S (Shared) table lock at the start of the operation to force all write plans that could touch the index to recompile, and a SCH-M (Schema-Modification - think of it as an Exclusive) table lock at the end of operation to force all read and write plans that could touch the index to recompile.

The most recent time this came up on the forums was someone noticing insert queries timing out after an online index rebuild operation had just started. The problem is that the table lock that online index rebuild needs has to be entered into the grant queue in the lock manager until it can be acquired - and it will stay there until existing transactions that are holding conflicting locks either commit or roll-back. Any transaction that requires a conflicting lock AFTER the index rebuild lock has been queued but not acquired (and then released) will wait behind it in the lock grant queue. If the query timeout is reached before the transaction can get its lock, it will time out.

This is still much better than the table lock being held for the entire duration of the index rebuild operation. For more info, check out this whitepaper on Online Index Operations in SQL Server 2005.
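If you suspect you're seeing this queuing behavior, one way to look at it is to query sys.dm_tran_locks from a second connection while the rebuild is starting (or finishing). This is a sketch only, and dbo.MyTable is a hypothetical table name.

```sql
-- Assumption: dbo.MyTable is a hypothetical table being rebuilt online.
-- Run from a separate connection, in the same database.
SELECT
   tl.request_session_id,
   tl.resource_type,
   tl.request_mode,     -- e.g. S, IX, Sch-M
   tl.request_status    -- GRANT, CONVERT, or WAIT
FROM sys.dm_tran_locks AS tl
WHERE tl.resource_type = 'OBJECT'
   AND tl.resource_database_id = DB_ID ()
   AND tl.resource_associated_entity_id = OBJECT_ID ('dbo.MyTable')
ORDER BY tl.request_status, tl.request_session_id;
GO
```

A table-level S or Sch-M request sitting in WAIT status, with other sessions' requests also waiting on the same table, is the pattern described above.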

Kimberly and I were presenting at our local (Redmond) .Net Developers Association on Monday and the following question came up while Kimberly was talking about missing and extra indexes (paraphrasing):

What's the best non-clustered index to use for the query with a predicate WHERE lastname = 'Randal' AND firstname = 'Paul' AND middleinitial = 'S'?

Kimberly said that the order of the keys (e.g. lastname, firstname, middleinitial; or middleinitial, lastname, firstname; etc) doesn't matter for this case. I thought about it for a second and then argued, saying that the most selective column should come first. We agreed to discuss with the group at the end, but I thought about it some more and realized (and admitted to the group) that she's right - I should know better than to question Kimberly's knowledge of indexing... :-)

She's right because for a pure equality query using AND for multiple predicates, the Storage Engine will seek straight to the first exactly matching record in the index (and then scan for more matches if it's a non-unique index). It doesn't matter what order the index keys are defined because the Storage Engine is looking for an exact match.

When I started arguing, I was thinking about a phone book, which is ordered by lastname, firstname, middleinitial. You may think that a phone book is ordered that way because lastname is the most selective. Wrong. It's because the lastname is what most people know - it just happens to be the most selective of the three choices. Most SQL geeks should be able to find Kimberly in a phone book by looking for Tripp, Kimberly. But what if it was ordered by middleinitial? I'd have no problem finding Kimberly, but how many of you would remember that her middleinitial is L? Probably a few as we both use our middle initials in our public names. What about if it was ordered by middleNAME? Again, no problem for me but how many other people know her middle name is Lynn?

Then I started thinking about other queries and how they would play into the index choice for the question above. If I also wanted to support a query with the predicate WHERE lastname = 'Randal', then having the left-most index key be anything other than lastname won't work so well. If the key order was firstname, middleinitial, lastname then all the distinct lastname values would be spread through the index rather than being together. The index might still be used to satisfy the query if it's the lowest cost index to use. However, having lastname be the leading key probably wouldn't work very well for a query with a predicate of WHERE firstname = 'Paul' - that argues for having firstname be the left-most index key.

Which should I choose? I probably can't have both in the same index, so maybe I'd have TWO non-clustered indexes, to support both queries. The answer depends on how often the various queries are used and the trade-off between how much of a performance gain the non-clustered index would provide against the performance drop of having to maintain it during DML operations.
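To make the two-index option concrete, here's a sketch of what it might look like. The dbo.People table and the index names are hypothetical; only the column names come from the predicates discussed above.

```sql
-- Assumption: dbo.People is a hypothetical table with lastname,
-- firstname, and middleinitial columns.

-- Supports the three-column equality query, and also
-- WHERE lastname = 'Randal' because lastname is the leading key.
CREATE NONCLUSTERED INDEX People_LastFirstMiddle
ON dbo.People (lastname, firstname, middleinitial);

-- Supports WHERE firstname = 'Paul', which the index above
-- can't seek on because firstname isn't its leading key.
CREATE NONCLUSTERED INDEX People_First
ON dbo.People (firstname);
GO
```

Whether the second index is worth having is exactly the trade-off described above: how often the firstname-only query runs versus the cost of maintaining another index on every insert, update, and delete.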

I hear time and again about people adding a non-clustered index for every column in the table, thinking that this will help - and my thinking is that this is wrong because these indexes can only satisfy a query where the only predicate is the column being indexed. I ran this argument past Kimberly and she added that these indexes could also be used if the column is chosen as the most selective in a multi-predicate query, and no other index has a lower cost than that one (a slim chance usually). Even what I thought of as a simple case has caveats!

So what's the point of this post? Well, I wanted to show how indexing for one very simple query is pretty straightforward, but as soon as the number of different queries grows, and the query predicates get more complicated, indexing becomes more complex. You really have to know your workload and your data to know which columns are used, in what combinations, and how often - and then it helps to know how indexes are costed and used so that you can make intelligent choices about which indexes to define.

This thought-exercise has really shown me that I didn't know how much I don't know about indexes - I know precisely how they work at the Storage Engine level but not too much about how they're used by the Query Processor. I have new-found respect for Kimberly's indexing expertise. Luckily she's teaching a class at Microsoft called Indexing For Performance next week - I think I'll attend :-)

It seems like all I've been talking about on the forums the last couple of days is the correct order of operations in a maintenance plan. The biggest confusion seems to be about the effect of rebuilding an index on statistics, with some incorrect advice being given out on what to do.

Rebuilding an index will update statistics with the equivalent of a full scan - it doesn't matter whether you use DBCC DBREINDEX or ALTER INDEX ... REBUILD, the effect is the same. It can do this because it sees a complete picture of the index while it's doing the rebuild.

Reorganizing an index (using the old DBCC INDEXDEFRAG I wrote, or the new ALTER INDEX ... REORGANIZE) will NOT update statistics at all, because it only sees a few pages of the index at a time.

The problem I've been seeing is people rebuilding indexes and then updating statistics. So why is this a problem? Well, it depends :-)

  • If your default for updating statistics is to do a sampled scan, then having rebuild update the statistics with a full scan (as a side effect), and then proceeding to manually update them again with the default sampled scan, means that after both operations you're left with sampled statistics. You've wasted resources doing the sampled scan AND you've lost the 'free' full-scan statistics that the index rebuild did for you.
  • If your default is to do a full scan, then you don't lose out on having the best statistics, but you do waste resources by unnecessarily updating statistics a second time.

So what's the solution?

The simple answer is not to update statistics on indexes that have just been rebuilt.
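If you want to see this for yourself, STATS_DATE will tell you when the statistics for each index were last updated. Run something like the following before and after a rebuild, and again after a reorganize - dbo.MyTable is a hypothetical table name.

```sql
-- Assumption: dbo.MyTable is a hypothetical table.
-- Shows when the statistics behind each index were last updated.
SELECT
   si.[name] AS index_name,
   STATS_DATE (si.[object_id], si.[index_id]) AS stats_last_updated
FROM sys.indexes AS si
WHERE si.[object_id] = OBJECT_ID ('dbo.MyTable')
   AND si.[index_id] > 0;
GO
```

The date should move forward after a rebuild but not after a reorganize, which is the behavior described above.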

The more complicated answer is to:

  1. Have a list of indexes (F) that you know will cause workload performance problems if they get fragmented
  2. Have a list of indexes (S) that you know will cause workload performance problems if the statistics don't get regularly updated
  3. Evaluate fragmentation for list F, and choose to reorganize, rebuild, or do nothing
  4. For all indexes in list S that were not rebuilt in step 3, update statistics
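Steps 3 and 4 could be sketched with a query like the one below. Two loud assumptions: dbo.MyTable stands in for a table whose indexes are in your lists F and S, and the 10% and 30% thresholds are placeholders I've picked for illustration - pick values that make sense for your own workload.

```sql
-- Assumptions: dbo.MyTable is hypothetical; the 10/30 thresholds
-- are illustrative placeholders, not recommendations.
SELECT
   si.[name] AS index_name,
   ROUND (ips.avg_fragmentation_in_percent, 2) AS fragmentation,
   CASE
      WHEN ips.avg_fragmentation_in_percent >= 30
         THEN 'Rebuild - statistics get updated as a side effect'
      WHEN ips.avg_fragmentation_in_percent >= 10
         THEN 'Reorganize - then update statistics if in list S'
      ELSE 'Do nothing - but update statistics if in list S'
   END AS suggested_action
FROM sys.dm_db_index_physical_stats (
   DB_ID (), OBJECT_ID ('dbo.MyTable'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS si
   ON si.[object_id] = ips.[object_id]
   AND si.[index_id] = ips.[index_id]
WHERE ips.[index_id] > 0;
GO

-- For an index in list S that was NOT rebuilt, e.g.:
-- UPDATE STATISTICS dbo.MyTable MyTable_NCL WITH FULLSCAN;
```

Passing a specific object ID keeps the DMV from scanning every index in the database just to evaluate the few you care about.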

Hope this helps.

One of the cool features of SQL Server 2008 for developers is spatial data support. There have been some great posts recently about using spatial (see Simon Sabin's blog here), which is all developer stuff - but what I'm interested in are the implications of spatial support for DBAs, and they are focused on spatial indexes.

Note: all images in this post are taken from November CTP Books Online

There are two kinds of spatial data that 2008 supports - planar (i.e. points, lines, polygons on a single 2-D plane) and geodetic (i.e. points, lines, polygons on a geodetic ellipsoid - for example, the Earth). These are presented in SQL as geometry and geography data respectively. A common operation that's performed on spatial data is comparing two spatial values to see if they intersect at all. Now, this is a complicated calculation, which gets more computationally expensive as the complexity of the spatial values increases. Given a problem of 'which spatial values in this table does this spatial value X intersect with', it would be great to have some way of quickly pruning out spatial values in the table that cannot possibly intersect with X, and so avoid doing the expensive calculation for them. Enter spatial indexes.

Here's the basic idea behind a spatial index:

  • A plane is broken up into a grid of cells.
  • Each spatial value is evaluated to see which cells in the grid it intersects with
  • The list of cells is stored along with the primary key of the table row that the spatial value is part of
  • Comparing two spatial values for intersection is a matter of comparing the list of grid cells - if there are no matching cells, the spatial values do not intersect, and there's no need to do the expensive intersection calculation

In practice it's a bit more complicated. For planar data (i.e. the geometry data type), you need to define a bounding box (i.e. 4 corner points that define a rectangle of space in which you're interested on the 2-D plane). That bounding box on the plane will be broken down into a grid of cells. The top-level grid can be up to 16x16, giving 256 cells. The next level of granularity breaks each of those top-level grid cells into a further grid, again up to 16x16. So now there could be (16x16) x (16x16) cells in the grid - or 65536 cells. This obviously allows a more exact description of a spatial value in the list of cells. And so on and so on. There are actually 4 levels of grid that the bounding box is broken up into - and each can be 16x16, for a possible total of 16^8 (i.e. 256 x 256 x 256 x 256), or roughly 4.3 billion cells. The picture below illustrates this with a grid size of 4x4 at each level.

The bounding box and the size of the grid at each level are specified when the spatial index is created, as well as the maximum number of matching grid cells to store in the spatial index per spatial value - to a max of 8192. Once the bounding box has been decomposed into the various levels of grid, each value in the spatial column is evaluated against the grid. The value is first decomposed against the first level grid. If the number of cells it matches is less than the max per spatial value, the decomposition then moves to the second level grid. This decomposition continues until the maximum number of matching grid cells is reached. If the max is reached while processing a deeper level for a cell, (e.g. in the middle of processing the 2nd level grid of 4x4 for cell #13 in the 1st level), the deeper level matches are thrown away and only the coarser granularity matching cell is stored (e.g. continuing that example, the 2nd level grid matches are discarded and only cell #13 in the 1st level will be stored). The picture below helps to illustrate this.
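Here's a sketch of what specifying those options looks like when creating a spatial index on a geometry column. The syntax shown is how I understand it for SQL Server 2008 and may differ in the CTP this post was written against. The table (dbo.Shapes, which would need a clustered primary key), its geometry column (shape), the index name, the bounding box coordinates, and the choice of 64 cells per object are all hypothetical.

```sql
-- Assumptions: dbo.Shapes, its geometry column [shape], the index
-- name, the bounding box, and all option values are hypothetical.
CREATE SPATIAL INDEX Shapes_Spatial
ON dbo.Shapes (shape)
USING GEOMETRY_GRID
WITH (
   -- The rectangle of the 2-D plane to decompose
   BOUNDING_BOX = (xmin = 0, ymin = 0, xmax = 1000, ymax = 1000),
   -- Grid size at each of the 4 levels:
   -- LOW = 4x4, MEDIUM = 8x8, HIGH = 16x16
   GRIDS = (LEVEL_1 = HIGH, LEVEL_2 = HIGH, LEVEL_3 = MEDIUM, LEVEL_4 = LOW),
   -- Max matching grid cells stored per spatial value (up to 8192)
   CELLS_PER_OBJECT = 64);
GO
```

Raising the grid density or CELLS_PER_OBJECT lets the index describe each value more exactly, at the cost of storing more cells per row.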

So, each geometric spatial value is approximated in the spatial index by a list of matching cells in the defined bounding box. As there is a limit to the number of matching cells that can be stored in the approximation, it is an optimistic representation. This means that if two values are compared using the approximations, there will be no false negatives, as the approximations map a larger space than the actual spatial values. There can, however, be false positives. A false (or real) positive means the spatial values need to then be compared using the complex, computationally expensive intersection algorithm using the actual spatial values. So again, the spatial index serves as a way of pruning out the need to run the expensive algorithm.

The algorithm is very similar for geodetic data (i.e. the geography data type), however there's no bounding box. Instead, the entire geodetic ellipsoid is projected onto a 2-D plane and then the grid decomposition algorithm is applied to that plane in exactly the same way as for planar data. The picture below describes how the projection is done.

You may have already realized that the effectiveness of the spatial index in pruning is directly proportional to how exactly the approximations in the index actually describe the spatial values. In other words, the higher the number of grid cells at each level, and the higher the number of grid cell matches that are stored per spatial value in the index, the better the index is at pruning. More exact approximations require storing more matching grid cells at deeper granularities - i.e. taking MORE SPACE. Creating a more exact spatial index takes more space.

With all that in mind, the interesting thing for DBAs here is that there's a trade-off between CPU use to do the real intersection algorithm and spatial index size to use in pruning calls to the algorithm. It's too early to know what best practices there are - but I'll blog them as I hear about them.
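To make those knobs concrete, here is a minimal sketch of creating a spatial index on a geometry column, using the CREATE SPATIAL INDEX syntax as it shipped in SQL Server 2008. The table, column, and index names and the bounding-box coordinates are made up for illustration, and the grid densities and cell limit are arbitrary values, not recommendations.

```sql
-- Hypothetical table; a spatial index requires a clustered primary key on the table
CREATE TABLE dbo.Parcels (
      ParcelID INT NOT NULL PRIMARY KEY CLUSTERED,
      Shape    GEOMETRY NOT NULL);
GO

-- LOW = 4x4, MEDIUM = 8x8, HIGH = 16x16 cells at each of the four grid levels
CREATE SPATIAL INDEX SIdx_Parcels_Shape
ON dbo.Parcels (Shape)
USING GEOMETRY_GRID
WITH (
      BOUNDING_BOX = (xmin = 0, ymin = 0, xmax = 1000, ymax = 1000),
      GRIDS = (LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM, LEVEL_3 = HIGH, LEVEL_4 = HIGH),
      CELLS_PER_OBJECT = 64); -- max matching cells stored per spatial value (up to 8192)
GO
```

Raising the grid densities or CELLS_PER_OBJECT should give tighter approximations (better pruning) at the cost of a larger index, which is exactly the trade-off described above.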

While we were in Barcelona we sat down with Richard Campbell and Greg Hughes from RunAs Radio to record a 1/2 hour interview on SQL Server 2008. We touch on a ton of different features (look at the number of Categories I've tagged this with!) and have a bunch of laughs along the way - check it out here.

PS There's been a ton of interest in the slide deck idea I had so we'll be going ahead with that. Look for an announcement sometime in the first few months of next year about how to get them. Thanks to everyone that replied!

I’m in the middle of a flight from Washington D.C. to Zurich on the way to Barcelona for TechEd IT Forum and I can’t sleep – Kimberly’s out like a light so what else is there to do except write another blog post? :-) OK - actually posting this from Barcelona on Tuesday before our first of 12 sessions here!

In the Database Maintenance workshop we did at SQL Connections last week I promised to blog a script I used to show how data file shrink operations cause massive fragmentation of indexes. The reason is that data file shrink starts at the end of the data file and moves a single page at a time to a free space below the shrink threshold. In the process of doing so, it perfectly reverses the physical order of the pages comprising the leaf level of an index – thus perfectly fragmenting it!!

Let’s try out my simple script that demonstrates this. First thing I’m going to do is create a new database and create a 10MB ‘filler’ table, which I’m going to then drop later to create a space that shrink can use.

USE master;
GO

IF DATABASEPROPERTYEX ('shrinktest', 'Version') > 0
      DROP DATABASE shrinktest;

CREATE DATABASE shrinktest;
GO
USE shrinktest;
GO

SET NOCOUNT ON;
GO

-- Create and fill the filler table
CREATE TABLE filler (c1 INT IDENTITY, c2 VARCHAR(8000));
GO
DECLARE @a INT;
SELECT @a = 1;
WHILE (@a < 1280) -- insert 10MB
BEGIN
      INSERT INTO filler VALUES (REPLICATE ('a', 5000));
      SELECT @a = @a + 1;
END;
GO

Next I’ll create the ‘production’ table that we’d really like to keep in optimal shape for performance.

-- Create and fill the production table
CREATE TABLE production (c1 INT IDENTITY, c2 VARCHAR (8000));
CREATE CLUSTERED INDEX prod_cl ON production (c1);
GO
DECLARE @a INT;
SELECT @a = 1;
WHILE (@a < 1280) -- insert 10MB
BEGIN
      INSERT INTO production VALUES (REPLICATE ('a', 5000));
      SELECT @a = @a + 1;
END;
GO

Now I’ll use the sys.dm_db_index_physical_stats DMV to check the fragmentation of the production table’s clustered index – it should be almost zero:

-- check the fragmentation of the production table
SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (
      DB_ID ('shrinktest'), OBJECT_ID ('production'), 1, NULL, 'LIMITED');
GO

avg_fragmentation_in_percent fragment_count
---------------------------- --------------------
0.390930414386239            6

This is what I expected. Now I’m going to drop the filler table, run a shrink operation and then check the fragmentation again:

-- drop the filler table and shrink the database
DROP TABLE filler;
GO

-- shrink the database
DBCC SHRINKDATABASE (shrinktest);
GO

-- check the index fragmentation again
SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (
      DB_ID ('shrinktest'), OBJECT_ID ('production'), 1, NULL, 'LIMITED');
GO

avg_fragmentation_in_percent fragment_count
---------------------------- --------------------
99.7654417513683             1277

Wow! The index went from almost 0% fragmented to almost 100% fragmented – the shrink operation totally reversed the physical ordering of the leaf level of the clustered index – nasty.

One of the common maintenance operations I see at customer sites is to run a database shrink at some interval, and I always advise against it – now you can see why. Running a regular shrink operation can cause horrible fragmentation problems. The worst problems I see are those customers with maintenance plans that rebuild all indexes and then run a shrink to remove the extra space necessary for the index rebuilds – completely undoing the effects of the index rebuild!
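To see the other half of that rebuild-then-shrink cycle, here is a small sketch that continues the shrinktest demo: it rebuilds the clustered index and then looks at fragmentation and file size again. I haven't included expected output, and the size query against sys.database_files is my addition rather than part of the original script.

```sql
USE shrinktest;
GO

-- Rebuild the clustered index to put the leaf level back in order
ALTER INDEX prod_cl ON production REBUILD;
GO

-- Fragmentation should drop back toward zero...
SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (
      DB_ID ('shrinktest'), OBJECT_ID ('production'), 1, NULL, 'LIMITED');
GO

-- ...but the rebuild needs room for a new copy of the index, so the data file
-- is likely to have grown again (size is reported in 8KB pages)
SELECT name, size * 8 / 1024 AS size_mb FROM sys.database_files;
GO
```

Shrinking again at this point would just reverse the page order once more, which is why the two operations fight each other.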

One other common thing I see is to have auto-shrink set on for one or more databases. This is bad for several reasons:

  • Shrink causes index fragmentation, as I’ve just demonstrated above.
  • You can't control when it kicks in. Although it doesn't have any effect like long-term blocking, it does take up a lot of resources, both IO and CPU. It also moves a lot of data through the buffer pool and so can cause hot pages to be pushed out to disk, slowing things down further. If the server is already pushing the limits of the IO subsystem, running shrink may push it over, causing long disk queue lengths and possibly IO timeouts.
  • You're likely to get into a death-spiral of auto-grow then auto-shrink then auto-grow then auto-shrink... (in my experience, if someone is using auto-shrink, they're most likely using and relying on auto-grow too). An active database usually requires free space for normal operations - so if you take that free space away then the database just has to grow again. This is bad for several reasons:
    • Repeatedly shrinking and growing the data files will cause file-system level fragmentation, which can slow down performance
    • It wastes a huge amount of resources, basically running the shrink algorithm for no reason
    • Auto-grow itself can be bad, especially if you're using SQL Server 2000 (or don't have Instant File Initialization turned on - see this post from Kimberly's blog) where all allocations to the file being grown are blocked while the new portion of the file is being zero-initialized.

Bottom-line: auto-shrink should *NEVER* be turned on…
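As a quick sketch of how to act on that, the following lists the databases on an instance that have auto-shrink enabled and then turns it off for one of them. The database name is just the demo database from above; substitute your own.

```sql
-- Which databases have auto-shrink turned on?
SELECT name FROM sys.databases WHERE is_auto_shrink_on = 1;
GO

-- Turn it off for a given database (shrinktest is only an example name)
ALTER DATABASE shrinktest SET AUTO_SHRINK OFF;
GO
```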

A couple more questions from the last couple of classes.

Q1) Why doesn't performing an index rebuild alter the fragmentation?

A1) Here are the possibilities - all of which I've seen happen:

  • There isn't an index - either DBCC DBREINDEX or ALTER INDEX ... REBUILD is being run on a table that only has a heap, and so the (extent) fragmentation of the heap isn't changing because there's no way to rebuild a heap (except by the heavily NOT recommended method of creating and then dropping a clustered index).
  • The index is too small. An index with only a handful of pages may not show any changes in fragmentation because all the pages are single, mixed pages (see my previous post on extent types for more info) and so rebuilding the index does nothing.
  • The workload and schema are such that by the time the rebuild has finished and the fragmentation calculation has been done again, the index is already getting fragmented again.
  • The Extent Scan Fragmentation result from DBCC SHOWCONTIG is being used to gauge fragmentation for an index stored in a filegroup with multiple files. The Extent Scan Fragmentation in DBCC SHOWCONTIG does not cope with multiple files (as is documented in Books Online) and so the value may even go UP in some cases!
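A rough way to check for the first two possibilities is to look at the index type and page count alongside the fragmentation number. This is only a sketch: dbo.MyTable is a hypothetical table name, and it should be run in the database that contains the table.

```sql
-- A HEAP in index_type_desc, or a very small page_count, would explain
-- why a rebuild appears to have no effect on fragmentation
SELECT i.name AS index_name, ips.index_type_desc, ips.alloc_unit_type_desc,
      ips.page_count, ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats (
      DB_ID (), OBJECT_ID (N'dbo.MyTable'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
      ON i.object_id = ips.object_id AND i.index_id = ips.index_id;
GO
```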

Q2) What operations take advantage of minimal-logging when the recovery model is BULK_LOGGED?

A2) The list is very small - four classes of operations:

  • Index builds, rebuilds, or drop of a clustered index (NOT index defrags with DBCC INDEXDEFRAG or ALTER INDEX ... REORGANIZE - this is a common misconception).
  • Bulk load operations (i.e. BCP, INSERT ... SELECT * FROM OPENROWSET (BULK...), and BULK INSERT).
  • Insert or appends of LOB data (either using WRITETEXT/UPDATETEXT for TEXT/NTEXT/IMAGE data types, or UPDATE with a .WRITE clause).
  • SELECT INTO operations on permanent tables.

For these operations, only the allocations are logged in the transaction log. Any extents that are allocated and changed through a minimally-logged operation are marked in the ML bitmaps (one for every 4GB of each file) and then the next transaction log backup will also read all those extents and include them in the backup.
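Here is a hedged sketch of what that looks like around an index rebuild. The database name, table name, and backup path are all hypothetical.

```sql
-- Switch to BULK_LOGGED just for the minimally-logged operation
ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;
GO

-- Index rebuilds are one of the operations that can be minimally logged
ALTER INDEX ALL ON dbo.MyTable REBUILD;
GO

-- Switch back and take a log backup
ALTER DATABASE MyDatabase SET RECOVERY FULL;
GO
BACKUP LOG MyDatabase TO DISK = N'D:\Backups\MyDatabase_log.trn';
GO
```

Note that the log backup has to read all the extents marked in the ML bitmaps, so it can be much larger than the transaction log itself was during the rebuild.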

This is a quick answer to a question I was sent today by someone who'd read Kimberly's partitioning whitepaper - Partitioned Tables and Indexes in SQL Server 2005 - and is implementing a "sliding-window" scenario. (This is a mechanism to allow fast insertion and deletion of significant portions of data into/from a partitioned production table. Insertion is done by taking a table and making it a new partition of the production table - called switching-in. Deletion is done by removing a partition from the production table and making it into a stand-alone table - called switching-out.)

The question is - what indexes are required on the staging table to prevent the ALTER TABLE ... SWITCH PARTITION statement from failing with a message like that below:

Msg 4947, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. There is no identical index in source table 'PartitionTest.dbo.StagingTable' for the index 'NC_Birthday' in target table 'PartitionTest.dbo.ProductionTable'.

The answer is that the staging table has to have the exact same indexes - clustered and non-clustered - as the production table. I asked Kimberly if it has to have the same constraints too - the answer is yes, plus the staging table has to have a trusted constraint on it such that SQL Server can tell (without checking all the data in the staging table) that all the data satisfies the partitioning function for the partition that you're switching-in (i.e. the partition that the staging table will become in the production table). If it doesn't, the switching-in will fail with the following error:

Msg 4982, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. Check constraints of source table 'PartitionTest.dbo.StagingTable' allow values that are not allowed by range defined by partition 4 on target table 'PartitionTest.dbo.ProductionTable'.

One thing that confuses people is that SQL Server does not create the target table for you when doing a switch-out of a partition. The target table has to exist and have the exact same schema as the production table. Also, it has to be completely empty - otherwise you'll get an error like:

Msg 4905, Level 16, State 1, Line 1
ALTER TABLE SWITCH statement failed. The target table 'PartitionTest.dbo.StagingTable' must be empty.

The must-be-empty requirement also holds for switching-in operations - the partition that will be created has to be empty otherwise a similar 4904 error results.
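To tie those requirements together, here is a minimal sketch of a switch-in. The table names and NC_Birthday come from the error messages above, but the columns, partition function, partition scheme, boundary values, and the other index and constraint names are all made up for illustration.

```sql
-- Hypothetical partitioning: partition 4 holds Birthday >= 1990-01-01
CREATE PARTITION FUNCTION pfBirthday (DATETIME)
AS RANGE RIGHT FOR VALUES ('19700101', '19800101', '19900101');
CREATE PARTITION SCHEME psBirthday
AS PARTITION pfBirthday ALL TO ([PRIMARY]);
GO

CREATE TABLE dbo.ProductionTable (
      PersonID INT NOT NULL,
      Birthday DATETIME NOT NULL) ON psBirthday (Birthday);
CREATE CLUSTERED INDEX CL_Birthday ON dbo.ProductionTable (Birthday, PersonID)
      ON psBirthday (Birthday);
CREATE NONCLUSTERED INDEX NC_Birthday ON dbo.ProductionTable (Birthday)
      ON psBirthday (Birthday);
GO

-- Staging table: same columns, same indexes, same filegroup as the target
-- partition, plus a trusted CHECK constraint matching partition 4's range
CREATE TABLE dbo.StagingTable (
      PersonID INT NOT NULL,
      Birthday DATETIME NOT NULL,
      CONSTRAINT CK_Staging_Birthday CHECK (Birthday >= '19900101')) ON [PRIMARY];
CREATE CLUSTERED INDEX CL_Birthday ON dbo.StagingTable (Birthday, PersonID);
CREATE NONCLUSTERED INDEX NC_Birthday ON dbo.StagingTable (Birthday);
GO

-- ... load dbo.StagingTable here ...

-- Switch-in: partition 4 of the production table must be empty
ALTER TABLE dbo.StagingTable SWITCH TO dbo.ProductionTable PARTITION 4;
GO
```

Leaving off the NC_Birthday index on the staging table should produce the 4947 error, and leaving off the CHECK constraint should produce the 4982 error.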

Hope this helps!

Two of the cool features in SQL Server 2005 are CROSS APPLY and DMVs (Dynamic Management Views). Now, far be it from me to get my hands dirty explaining developer stuff like CROSS APPLY :-) but I was having a discussion with Colin Leversuch-Roberts in the UK about the composability limitations of the sys.dm_db_index_physical_stats DMV. (Btw - you should check out Colin's blog post series on Analysing Indexes - lots of useful stuff).

So CROSS APPLY lets you do join-like functionality with table-valued functions that take parameters - which you can't do using JOIN. This works for most of the DMVs, but some of them are written to an older internal implementation that doesn't support CROSS APPLY, and sys.dm_db_index_physical_stats is one of them. If you try it you'll get an error like:

Msg 413, Level 16, State 1, Line 26
Correlated parameters or sub-queries are not supported by the inline function "master.sys.dm_db_index_physical_stats".

Fortunately there's a way around this - wrap the DMV in an artificial TVF of your own, and then CROSS APPLY to that. Here's an example:

 

CREATE FUNCTION my_index_physical_stats (
      @database_id INT,
      @object_id INT,
      @index_id INT,
      @partition_number INT,
      @mode SYSNAME) -- the DMV's mode is a name ('LIMITED', 'SAMPLED', 'DETAILED') or NULL, not a number
RETURNS @result TABLE (
      -- column list matches the DMV's output in SQL Server 2005
      database_id SMALLINT NULL, object_id INT NULL, index_id INT NULL, partition_number INT NULL,
      index_type_desc NVARCHAR(60) NULL, alloc_unit_type_desc NVARCHAR (60) NULL, index_depth TINYINT NULL,
      index_level TINYINT NULL, avg_fragmentation_in_percent FLOAT NULL, fragment_count BIGINT NULL,
      avg_fragment_size_in_pages FLOAT NULL, page_count BIGINT NULL, avg_page_space_used_in_percent FLOAT NULL,
      record_count BIGINT NULL, ghost_record_count BIGINT NULL, version_ghost_record_count BIGINT NULL,
      min_record_size_in_bytes INT NULL, max_record_size_in_bytes INT NULL, avg_record_size_in_bytes FLOAT NULL,
      forwarded_record_count BIGINT NULL)
BEGIN
      INSERT INTO @result SELECT * FROM
            sys.dm_db_index_physical_stats (@database_id, @object_id, @index_id, @partition_number, @mode);
      RETURN;
END;
GO
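With the wrapper in place, a CROSS APPLY might look like the sketch below. It assumes the function ended up in the dbo schema of the current database, and it passes NULL for the index, partition, and mode parameters so the DMV uses its defaults.

```sql
-- Fragmentation info for every user table in the current database
SELECT OBJECT_NAME (t.object_id) AS table_name, ips.index_id,
      ips.index_type_desc, ips.avg_fragmentation_in_percent, ips.page_count
FROM sys.tables AS t
CROSS APPLY dbo.my_index_physical_stats (DB_ID (), t.object_id, NULL, NULL, NULL) AS ips;
GO
```

Bear in mind that the wrapper doesn't change how the DMV works: each call still reads the pages of the indexes it's asked about.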

(This is an updated repost from earlier this year on my old blog)

Whenever I’m discussing index maintenance, and specifically fragmentation, I always make a point of saying ‘Make sure the index is being used before doing anything about fragmentation’. If an index isn’t being used very much, but has very low page density (lots of free space in the index pages), then it will be occupying a lot more disk space than it could do and it may be worth compacting (with a rebuild or a defrag) to get that disk space back. However, usually there’s not much point spending resources to remove any kind of fragmentation when an index isn’t being used. This is especially true of those people who rebuild all indexes every night or every week.

 

You could even go so far as to say if a non-clustered index isn’t being used, why is it there at all? Extra non-clustered indexes drag down performance in a number of ways. Consider a non-clustered index called IX_MyNCIndex on the table MyTable:

  • Any time a record is inserted into MyTable, a matching record is inserted into IX_MyNCIndex. This is a bunch of extra IOs, extra log records, plus maybe even a page-split.
  • Any time a record is deleted from MyTable, the matching record in IX_MyNCIndex must be deleted. Extra IOs again and log records again.
  • Any time a record in MyTable is updated:
    • If MyTable has a clustered index, and the clustered index key value changes, then the matching record in IX_MyNCIndex must be updated. Extra IOs and log records again.
    • If any of the non-clustered index key values changes, or any of the INCLUDEd column values changes, then the matching record in IX_MyNCIndex must be updated. Extra IOs and log records again.
  • If a clustered index is created on MyTable, then IX_MyNCIndex has to be rebuilt to include the logical RIDs rather than the physical heap RIDs (see this post for an explanation). Lots of extra IOs and log records again.

That’s a significant amount of extra IOs and log records to maintain each extraneous non-clustered index.

 

So, how can you tell if an index is being used?

  • In SQL Server 2000 there is no way to do it.
  • In SQL Server 2005 there are a few different ways – the one I want to discuss in this post is the sys.dm_db_index_usage_stats DMV.

This DMV exposes the information that is tracked about index usage (as the name suggests). It does not generate any information itself; it just returns info from a cache inside SQL Server. This cache is empty when the server instance starts, and is not persisted across instance restarts. All cache entries for indexes in a database are removed when that database is closed. So, the cache tracks usage information about which indexes have been used since the database they are part of was last opened (either manually or as part of instance start-up).

 

This continues to confuse people so I'll call it out: if the output from the DMV does not have an entry for the index you're interested in, it has not been used since the last database startup.
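One way to turn that around is to list the indexes that have no entry in the DMV at all. This is a sketch to be run in the database you care about, and the join to sys.indexes is my own addition rather than anything the DMV does for you.

```sql
-- Indexes (and heaps) on user tables with no usage recorded
-- since the database was last opened
SELECT OBJECT_NAME (i.object_id) AS table_name, i.name AS index_name, i.index_id
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
      ON us.database_id = DB_ID ()
      AND us.object_id = i.object_id
      AND us.index_id = i.index_id
WHERE OBJECTPROPERTY (i.object_id, 'IsUserTable') = 1
      AND us.index_id IS NULL
ORDER BY table_name, i.index_id;
GO
```

An index showing up here only means it hasn't been touched since the database was last opened, so check how long that has been before drawing conclusions.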

 

The cache tracks the following info for each index (for user queries and system queries):

  • The number of times it was used in a seek operation (either looking up a single row, or doing a range scan) along with the time of the last seek.
  • The number of times it was used in a scan operation (e.g. a select * operation) along with the time of the last scan.
  • The number of times it was used in a lookup operation (this means a bookmark lookup – where a non-clustered index does not fully cover a query and additional columns must be retrieved from the base table row) along with the time of the last lookup.
  • The number of times it was used in an update operation (this counts inserts, updates, and deletes) along with the time of the last update.

Let’s have a look at its use.

SELECT * FROM sys.dm_db_index_usage_stats;
GO

The output is too wide for a single image so I've split it in two (I won't post any more output from the DMV - I'll just talk about it):

indexusage11.jpg

indexusage21.jpg

Unless you've just re-started your instance, you'll see a bunch of output from this, representing all index activity since the instance/databases started. If you're interested in whether an index is being used, you can filter the output. Let's focus in on a particular table - AdventureWorks.Person.Address.

SELECT * FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID('AdventureWorks')
AND object_id = OBJECT_ID('AdventureWorks.Person.Address');
GO

You'll probably see nothing in the output, unless you've been playing around with that table. Let's force the clustered index on that table to be used, and look at the DMV output again.

SELECT * FROM AdventureWorks.Person.Address;
GO

SELECT * FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID('AdventureWorks')
AND object_id = OBJECT_ID('AdventureWorks.Person.Address');
GO

Now there's a single row, showing a scan on the clustered index. Let's do something else.

SELECT StateProvinceID FROM AdventureWorks.Person.Address
WHERE StateProvinceID > 4 AND StateProvinceID < 15;
GO

SELECT * FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID('AdventureWorks')
AND object_id = OBJECT_ID('AdventureWorks.Person.Address');
GO

And there's another row, showing a seek in one of the table's non-clustered indexes.

So, it's easy to look at the index usage for particular tables and indexes. But how can you monitor this over time? This is easy too - let's see how. First we need to create our own table to store snapshots of the DMV output.

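-- Drop any previous snapshot table (fully qualified so this works from any database)
IF OBJECT_ID (N'master.dbo.MyIndexUsageStats', N'U') IS NOT NULL
      DROP TABLE master.dbo.MyIndexUsageStats;
GO

-- Create an empty table with the DMV's columns plus a snapshot time
SELECT GETDATE () AS ExecutionTime, * INTO master.dbo.MyIndexUsageStats
FROM sys.dm_db_index_usage_stats WHERE database_id=0;
GO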

Next we need to take a baseline snapshot of the DMV output.

INSERT master.dbo.MyIndexUsageStats
SELECT getdate (), * FROM sys.dm_db_index_usage_stats;
GO

And now simulate a few operations and take another snapshot of the DMV:

SELECT * FROM AdventureWorks.Person.Address;
GO
SELECT * FROM AdventureWorks.Person.Address;
GO
SELECT StateProvinceID FROM AdventureWorks.Person.Address
WHERE StateProvinceID > 4 AND StateProvinceID < 15;
GO

INSERT master.dbo.MyIndexUsageStats
SELECT getdate (), * FROM sys.dm_db_index_usage_stats;
GO

And look at the filtered contents of our snapshot table:

SELECT * FROM master.dbo.MyIndexUsageStats
WHERE database_id = DB_ID('AdventureWorks')
AND object_id = OBJECT_ID('AdventureWorks.Person.Address');
GO

You should see four rows - two from the baseline snapshot and two from the final snapshot. If you ran just the statements above, you'll see that the user_scans count for the clustered index has increased by two, and the user_seeks count for the non-clustered index has increased by one.

So this is a pretty simple example of how you can track index usage. By putting something like this into a regularly run script you can tell which indexes aren't being used and could be candidates for less-regular index maintenance or removal altogether.
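As a sketch of what such a script might do with the snapshots, the query below compares the earliest and latest snapshots in the table and reports how much each counter moved. It assumes all rows in one snapshot share the same ExecutionTime (each snapshot is a single INSERT ... SELECT), and the column aliases are my own.

```sql
WITH Bounds AS (
      SELECT MIN (ExecutionTime) AS FirstTime, MAX (ExecutionTime) AS LastTime
      FROM master.dbo.MyIndexUsageStats)
SELECT
      l.database_id, l.object_id, l.index_id,
      l.user_seeks   - ISNULL (f.user_seeks, 0)   AS user_seeks_delta,
      l.user_scans   - ISNULL (f.user_scans, 0)   AS user_scans_delta,
      l.user_lookups - ISNULL (f.user_lookups, 0) AS user_lookups_delta,
      l.user_updates - ISNULL (f.user_updates, 0) AS user_updates_delta
FROM master.dbo.MyIndexUsageStats AS l
CROSS JOIN Bounds AS b
LEFT JOIN master.dbo.MyIndexUsageStats AS f
      ON f.ExecutionTime = b.FirstTime
      AND f.database_id = l.database_id
      AND f.object_id = l.object_id
      AND f.index_id = l.index_id
WHERE l.ExecutionTime = b.LastTime;
GO
```

Because the underlying cache is emptied when a database is closed or the instance restarts, a negative delta most likely means the counters were reset between the two snapshots.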

Let me know how you get on.

I'd like to kick off the Indexes From Every Angle series this evening by re-posting some articles from my old blog. Both of these topics continue to crop up on newsgroups and forums.

What happens to non-clustered indexes when changes are made to the underlying table? Are they always rebuilt or not? Well - it depends! (my favorite answer :-)).

Before we get into that discussion, I’ll give you a little background.

 

One way to describe a non-clustered index is whether it is a covering index or not. A covering index is one which has all the table columns necessary to satisfy a query, and so there is no need to go back to the underlying table (which I’ll call the base table) to fetch additional columns. Most non-clustered indexes are not covering indexes and so it is necessary for each non-clustered index record (even in covering indexes) to include a link back to the corresponding base table record. This link takes various forms, depending on the structure of the base table:

  • Heap: A heap does not impose any logical ordering on the records within it, so the only way to find a particular record within a heap is to do a full table scan or to know the physical location of the record (i.e. which record slot on which data page) – called the physical record ID or physical RID. Clearly doing a full table scan every time a query needs to go from a non-clustered index record back to a heap is not feasible, so each non-clustered index record stores the physical RID of the corresponding heap record.
  • Clustered index: A clustered index does impose a logical ordering on the records within it and the ordering exactly matches the definition of the clustered index keys – known as the cluster key. The fastest way to find a particular record in a clustered index is to use the cluster key to navigate through the clustered index b-tree to find the exact record. So, non-clustered indexes over a clustered index include the cluster key of the base table record in each non-clustered index record. This is known as the logical RID. There is a twist though, depending on whether the clustered index is unique or not.
    • Non-unique clustered index: Every record in a clustered index HAS to be unique, otherwise there would be no way to deterministically navigate to a particular record using the index b-tree. In a non-unique clustered index, SQL Server has to add a special column, called the uniquifier, to each record, so that if multiple records have the same cluster key values, the uniquifier column will be the tie-breaker. This uniquifier column is added as part of the cluster key, and so it is also present in all non-clustered index records as part of the logical RID.
    • Unique clustered index: Every record in a unique clustered index is already unique and so no extra column is required in the cluster key.

Ok - background out of the way. Now let's discuss what happens to non-clustered indexes when various actions are performed to the base table.

 

SQL Server 2000

  • Going from a heap to a clustered index: Changing the fundamental structure of a table from a heap to a clustered index means you're changing the way that records can be looked up in the table, from physical to logical. This means that all the physical RIDs that are stored in the non-clustered index records need to be changed to logical RIDs. So - in this case, all non-clustered indexes are rebuilt after the clustered index has been built.
  • Going from a clustered index to a heap: This is the opposite of the case above - you're changing the way records are looked up from logical to physical. This means that all the logical RIDs that are stored in the non-clustered index records need to be changed to physical RIDs. So - in this case, all non-clustered indexes are rebuilt after the clustered index has been dropped.
  • Rebuilding a unique clustered index: This operation has no effect on the cluster key, so there is no need for the non-clustered indexes to be rebuilt. Various early Service Packs of SQL Server 2000 had bugs that caused this behavior to be broken - and this is the cause of much of the confusion around this behavior.
  • Rebuilding a non-unique clustered index: In SQL Server 2000, when a non-unique clustered index (which contains uniquifier columns) is rebuilt, all the uniquifier values are regenerated. This essentially means that the cluster keys have changed and so all the non-clustered indexes are rebuilt after the clustered index rebuild has completed.
  • Changing the clustered index schema: This is simple - any change to the clustered index schema that changes the cluster keys will cause all the non-clustered indexes to be rebuilt.

SQL Server 2005

  • Going from a heap to a clustered index: This is the same as SQL Server 2000 - all non-clustered indexes are rebuilt, regardless of whether the clustered index was created online or offline.
  • Going from a clustered index to a heap: Again, all non-clustered indexes are rebuilt, regardless of whether the clustered index is dropped online or offline.
  • Rebuilding a unique clustered index: This is the same as SQL Server 2000 - the cluster keys aren't changing and so the non-clustered indexes are not rebuilt.
  • Rebuilding a non-unique clustered index: Aha! This is different from SQL Server 2000. SQL Server 2005 will RE-USE the old uniquifier values so the cluster keys don't change. This means that non-clustered indexes are NOT rebuilt in this case - that's very cool!
  • Changing the clustered index schema:
    • Changing the cluster key: This behavior has to be the same as SQL Server 2000 - any change to the cluster keys forces all the non-clustered indexes to be rebuilt after the new clustered index has been created.
    • Changing anything else (e.g. partitioning it or moving it): This is one of the cases that confuses people in SQL Server 2005. Applying a partitioning scheme to a clustered index, or moving it to a different filegroup, doesn't change the cluster key at all - so there's no need to rebuild the non-clustered indexes.

So the rule of thumb to remember is: if the physical or logical RID contained in the non-clustered index records needs to change then all the non-clustered indexes need to be rebuilt. It's as simple as that.
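As an illustration of the "changing anything else" case in SQL Server 2005, here is a sketch of moving a clustered index to a different filegroup without touching its key. The table, index, column, and filegroup names are all hypothetical, and the index CL_MyTable on c1 is assumed to exist already.

```sql
-- Same index name, same key column, new filegroup: the cluster key is unchanged,
-- so per the rule above the non-clustered indexes should not need to be rebuilt
CREATE UNIQUE CLUSTERED INDEX CL_MyTable ON dbo.MyTable (c1)
WITH (DROP_EXISTING = ON)
ON [SecondaryFG];
GO
```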

 

Hopefully this will help to clear up some confusion - let me know of any questions!
