OK, for years I've been saying that SQL Server doesn't care about the order in which you define the columns of your table because internally SQL Server will re-arrange your columns to store all of the fixed width columns first and the variable columns last. In both the fixed-width portion of the row as well as the variable-width portion of the row, the columns are defined in the order in which they are declared. So, what does matter?

It's all in the cost of the variable array's offset values. If the large majority of your NULLable, variable-width columns are at the end, then SQL Server doesn't need to completely populate the variable block array (which saves 2 bytes for every trailing column that's NULL). If you have a table where 36 columns are NULLable and generally they are NULL, then defining those columns at the end of the row can save you space.

The following script will show you how the maximum length of the row changes based on whether or not a later column in the variable block holds a value (is not NULL) - even when most/all of the prior columns are NULL!

CREATE TABLE RowSizeVariableBlock
(
    ID  int          NOT NULL identity,
    c01 char(10)     NOT NULL default 'test',
    c02 datetime2(7) NOT NULL default sysdatetime(),
    c03 char(80)     NOT NULL default 'junk',
    c04 varchar(100) NULL,
    c05 varchar(100) NULL,
    c06 varchar(100) NULL,
    c07 varchar(100) NULL,
    c08 varchar(100) NULL,
    c09 varchar(100) NULL,
    c10 varchar(100) NULL,
    c11 varchar(100) NULL,
    c12 varchar(100) NULL,
    c13 varchar(100) NULL,
    c14 varchar(100) NULL,
    c15 varchar(100) NULL,
    c16 varchar(100) NULL,
    c17 varchar(100) NULL,
    c18 varchar(100) NULL,
    c19 varchar(100) NULL,
    c20 varchar(100) NULL,
    c21 varchar(100) NULL,
    c22 varchar(100) NULL,
    c23 varchar(100) NULL,
    c24 varchar(100) NULL,
    c25 varchar(100) NULL,
    c26 varchar(100) NULL,
    c27 varchar(100) NULL,
    c28 varchar(100) NULL,
    c29 varchar(100) NULL,
    c30 varchar(100) NULL,
    c31 varchar(100) NULL,
    c32 varchar(100) NULL,
    c33 varchar(100) NULL,
    c34 varchar(100) NULL,
    c35 varchar(100) NULL,
    c36 varchar(100) NULL,
    c37 varchar(100) NULL,
    c38 varchar(100) NULL,
    c39 varchar(100) NULL,
    c40 varchar(100) NULL
)
go

insert RowSizeVariableBlock DEFAULT VALUES
go

select * from RowSizeVariableBlock
go

select * from sys.dm_db_index_physical_stats
    (db_id(), object_id('RowSizeVariableBlock'), null, null, 'detailed')
-- review "max" record size = 114
go

insert RowSizeVariableBlock (c01, c03, c20)
values ('med row', 'up to c20', 'test')
go

select * from RowSizeVariableBlock
go

select * from sys.dm_db_index_physical_stats
    (db_id(), object_id('RowSizeVariableBlock'), null, null, 'detailed')
-- review "max" record size = 154
go

insert RowSizeVariableBlock (c01, c03, c30)
values ('med+ row', 'up to c30', 'test')
go

select * from RowSizeVariableBlock
go

select * from sys.dm_db_index_physical_stats
    (db_id(), object_id('RowSizeVariableBlock'), null, null, 'detailed')
-- review "max" record size = 174
go

insert RowSizeVariableBlock (c01, c03, c40)
values ('large row', 'up to c40', 'test')
go

select * from RowSizeVariableBlock
go

select * from sys.dm_db_index_physical_stats
    (db_id(), object_id('RowSizeVariableBlock'), null, null, 'detailed')
-- review "max" record size = 194
go
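
If you want to see where those four sizes come from, here's a rough sketch of the arithmetic. It assumes the usual data record layout (a 4-byte header, the fixed-width data, a 2-byte column count, a NULL bitmap with 1 bit per column and then, only when a variable-width column holds data, a 2-byte variable column count plus one 2-byte offset for each variable-width column up to and including the last one that's not NULL). The variable names and the derived table are just for illustration:

```sql
-- Sketch only: the layout constants below are assumptions about the record format
DECLARE @fixed_bytes int = 4 + 10 + 8 + 80;  -- int + char(10) + datetime2(7) + char(80)
DECLARE @total_cols  int = 41;               -- ID plus c01..c40
DECLARE @base_bytes  int = 4                       -- record header
                         + @fixed_bytes            -- fixed-width data
                         + 2                       -- column count
                         + (@total_cols + 7) / 8;  -- NULL bitmap, 1 bit per column

SELECT v.last_var_col,
       @base_bytes
       + CASE WHEN v.var_position = 0 THEN 0       -- no variable block at all
              ELSE 2                               -- variable column count
                 + 2 * v.var_position              -- one offset per column up to the last non-NULL one
                 + v.data_bytes                    -- the data itself ('test' = 4 bytes)
         END AS expected_record_bytes
FROM (VALUES ('none', 0, 0),
             ('c20', 17, 4),   -- c20 is the 17th variable-width column
             ('c30', 27, 4),
             ('c40', 37, 4)) AS v (last_var_col, var_position, data_bytes);
```

Under those assumptions the four rows should come out to 114, 154, 174 and 194 bytes, which lines up with the "max" record sizes noted in the script.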

While there are some other optimizations at this level, most tables cannot benefit from this as the data populations aren't as predictable nor are most tables filled with so many variable-width and NULLable columns. However, if you do have this data pattern, defining these columns at the end of your table's definition - MIGHT save a tremendous amount of space, especially when this table is very large!

Paul's blogged more on these structures as well as the NULL bitmap here: http://www.sqlskills.com/BLOGS/PAUL/post/Misconceptions-around-null-bitmap-size.aspx.

Enjoy! And, thanks for reading,
kt

I've always been concerned with security and I've always stressed the importance of auditing the REAL user context not just the current user (see this post on EXECUTE AS and auditing). So, I generally try to avoid using dynamic string execution and if necessary create well tested/protected parameters (fyi - using QUOTENAME can be a fantastic solution to protecting identifiers as input parameters but it can't protect more complex strings).
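
As a minimal sketch of the QUOTENAME idea (the two variables stand in for hypothetical input parameters, and sys.objects is used only because it exists in every database):

```sql
-- Hypothetical inputs; in real code these would be parameters of your procedure
DECLARE @SchemaName sysname = N'sys';
DECLARE @ObjectName sysname = N'objects';
DECLARE @sql nvarchar(max);

-- QUOTENAME wraps each identifier in [ ] and doubles any embedded ],
-- so the input can only ever be treated as a single identifier
SET @sql = N'SELECT COUNT(*) AS ObjectCount FROM '
         + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@ObjectName) + N';';

PRINT @sql;                   -- review the string before trusting it
EXEC sys.sp_executesql @sql;
```

This only protects identifiers. Anything more complex (a WHERE clause fragment, a list of values) needs real parameters or a different design.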

Having said that, what if I'm looking at a database for the first time... just poking around trying to see if there's anything that needs further attention? I've come up with a quick query... And, while it's not going to "solve" your problem (as that's going to take some re-writing of code) or even truly verify if you're vulnerable, it gives you a "quick list" of where you should look first! If your code uses dynamic strings AND it's elevated - then start there! 

SELECT OBJECT_NAME(sm.object_id) AS [Procedure Name],
  CASE
      WHEN sm.definition LIKE '%EXEC (%' OR sm.definition LIKE '%EXEC(%' THEN 'WARNING: code contains EXEC'
      WHEN sm.definition LIKE '%EXECUTE (%' OR sm.definition LIKE '%EXECUTE(%' THEN 'WARNING: code contains EXECUTE'
  END AS [Dynamic Strings],
  CASE
      -- -2 means EXECUTE AS OWNER; USER_NAME(-2) returns NULL so handle it separately
      WHEN sm.execute_as_principal_id = -2 THEN N'WARNING: EXECUTE AS OWNER'
      WHEN sm.execute_as_principal_id IS NOT NULL THEN N'WARNING: EXECUTE AS ' + USER_NAME(sm.execute_as_principal_id)
      ELSE 'Code to run as caller - check connection context'
  END AS [Execution Context Status]
FROM sys.sql_modules AS sm
ORDER BY [Procedure Name]

Is this enough? Anything else you'd check? What do you think?

THANKS!
kt

OK, I'll definitely take a beating from all of you for having gone so long between my survey posts and now. I won't even go into the details but between some crazy work schedules, multiple sinus problems and even migraines... well, I've been a bit behind. Let's just say that April/May were rough at best. I'm feeling better and well, now I'm trying to catch up. I had really gotten the blogging bug in March but I completely lost it in April. But, this tipping point series is in dire need of lots of explaining so I'm really hoping to get a few posts done in this area for sure!

First, I started the discussion around this in a few surveys:

Survey/Question 1

Q1 was described as this: if a table has 1 million rows at 20 rows per page (50,000 pages), at what percentage (roughly) of the data would a nonclustered index no longer be used. Blogged here. Here's what the survey said as of today:

And, for Q1 the correct result (Between 0-2% of the rows) is actually the best result (but, by no means the overwhelming majority at only 28%). However, often people just "think" the answer is very small. So... I did a few more questions/surveys. 

Survey/Question 2

Q2 was described as this: if a table has 1 million rows at 100 rows per page (10,000 pages), at what percentage (roughly) of the data would a nonclustered index no longer be used. Blogged here. Here's what the survey said as of today:

And, for Q2 the correct result (Less than .5% of the rows) is actually tied for the most popular answer (but, again, that's still a small share at only 22%). Again, often people just "think" the answer is very small. So... I did one more question/survey.

Survey/Question 3

Q3 was described as this: if a table has 1 million rows at 2 rows per page (500,000 pages), at what percentage (roughly) of the data would a nonclustered index no longer be used. Blogged here. Here's what the survey said as of today:

And, for Q3 the correct result (Between 10-20% of the rows) is actually NOT the highest answer. And, this is even more convincing that there's confusion around what's going on and why.

The Tipping Point

What is the tipping point?

It's the point where the number of rows returned is "no longer selective enough". SQL Server chooses NOT to use the nonclustered index to look up the corresponding data rows and instead performs a table scan.

When does the tipping point occur?

It depends... it's MOSTLY tied to the number of pages in the table. Generally, around 30% of the number of PAGES in the table is about where the tipping point occurs. However, parallelism, some server settings (processor affinity and I/O affinity), memory and table size - all can have an impact. And, since it can vary - I typically estimate somewhere between 25% and 33% as a rough tipping point (and, you'll see from a bunch of my examples, that number is not EXACT). Then, I translate that into rows (each row looked up through a non-covering nonclustered index costs roughly one page read, so a number of pages becomes roughly the same number of rows).

Math for Tipping Point Query 3: If a table has 500,000 pages then 25% = 125,000 and 33% = 166,000. So, somewhere between 125,000 and 166,000 ROWS the query will tip. Turning that into a percentage 125,000/1million = 12.5% and 166,000/1million = 16.6%. So, if a table has 500,000 pages (and 1 million rows) then queries that return less than 12.5% of the data are likely to USE the nonclustered index to lookup the data and queries over 16.6% of the data are LIKELY to use a table scan. For this table, that percentage seems "reasonable". But, most of us say that the tipping point happens at a much lower percentage... why? Because row size - which determines table size (and therefore pages) is really what has the greatest impact. So, let's look at Tipping Point Query 2... 

Math for Tipping Point Query 2: If a table has 10,000 pages then 25% = 2,500 and 33% = 3,333. So, somewhere between 2,500 and 3,333 ROWS the query will tip. Turning that into a percentage 2,500/1million = .25% and 3,333/1million = .33% (not even 1%). So, if a table has only 10,000 pages (and 1 million rows) then queries that return less than a quarter of 1% of the data are likely to USE the nonclustered index to lookup the data and queries over one third of one percent are LIKELY to use a table scan. For this table, that percentage seems really low BUT, at the same time it makes sense (to a point) that a small table would be scanned... but, at less than 1%? For this table, even 1% is NOT selective enough. For small tables, it might not matter all that much (they're small, they fit in cache, etc.) but for bigger tables - it might be a big performance problem.

Math for Tipping Point Query 1: If a table has 50,000 pages then 25% = 12,500 and 33% = 16,666. So, somewhere between 12,500 and 16,666 ROWS the query will tip. Turning that into a percentage 12,500/1million = 1.25% and 16,666/1million = 1.66% (under 2%). So, if a table has 50,000 pages (and 1 million rows) then queries that return less than 1.25% of the data are likely to USE the nonclustered index to lookup the data and queries over 1.66% are LIKELY to use a table scan. Again, this seems like a low number. Again, for small tables, it might not matter all that much (they're small, they fit in cache, etc.) but as tables get larger and larger - it CAN be a big performance problem. 
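
If you'd rather let SQL Server do the math for the three survey tables, here's a quick sketch. It uses the rough 25%-33% of PAGES estimate from above (not an exact rule) and the derived table and column names are made up for the example:

```sql
-- Sketch: rough tipping point range (in rows and as a % of rows) per survey question
SELECT t.survey_question,
       t.total_pages,
       t.total_pages / 4 AS tip_low_rows,    -- ~25% of the pages
       t.total_pages / 3 AS tip_high_rows,   -- ~33% of the pages
       CAST(100.0 * (t.total_pages / 4) / t.total_rows AS decimal(6, 3)) AS tip_low_pct,
       CAST(100.0 * (t.total_pages / 3) / t.total_rows AS decimal(6, 3)) AS tip_high_pct
FROM (VALUES ('Q1', 1000000,  50000),
             ('Q2', 1000000,  10000),
             ('Q3', 1000000, 500000)) AS t (survey_question, total_rows, total_pages);
```

The row counts are the same in all three cases. Only the page counts (which come from row size) change, and that's what moves the percentage.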

Why is the tipping point interesting?

  • It shows that narrow (non-covering) nonclustered indexes have fewer uses than often expected (just because a query has a column in the WHERE clause doesn't mean that SQL Server's going to use that index)
  • It happens at a point that's typically MUCH earlier than expected... and, in fact, sometimes this is a VERY bad thing!
  • Only nonclustered indexes that do not cover a query have a tipping point. Covering indexes don't have this same issue (which further proves why they're so important for performance tuning)
  • You might find larger tables/queries performing table scans when in fact, it might be better to use a nonclustered index. How do you know, how do you test, how do you hint and/or force... and, is that a good thing?

Real example of an interesting tipping point

Earlier today, I went on facebook and twitter and gave the following information - very vaguely - and I asked "why" is Q2 so much slower than Q1 if Q2 returns only 10 more rows. Same table and no hints (other than MAXDOP)...

Q1: SELECT * FROM table WHERE colx < 597420 OPTION (MAXDOP 1)

  • returns 197,419 rows
  • takes 116,031 ms (1 minute, 56 seconds)
  • 1,197,700 logical reads, 5 physical reads, 137,861 read-ahead reads
  • 7,562 ms CPU time

Q2: SELECT * FROM table WHERE colx < 597430 OPTION (MAXDOP 1)

  • returns 197,429 rows
  • takes 244,094 ms (4 minutes, 4 seconds)
  • 801,685 logical reads, 1410 physical reads, 801,678 read-ahead reads
  • 9,188 ms CPU time

There were lots of great guesses... but, it's the tipping point. SQL Server chose to "tip" the second query because it was "over the line". But, it's important to realize that there are cases when that's NOT a good idea. And, what are your options?

In SQL Server 2005 - the only option is to force the nonclustered index to be used:

Q2: SELECT * FROM table WITH (INDEX (NCInd)) WHERE colx < 597430 OPTION (MAXDOP 1)

But, this can be TERRIBLY bad on some machines where the IOs could be a lot faster (and where data might already be in cache). These specific numbers are exactly that - specific to this HARDWARE (and, I chose not-so-optimal HW in this case to highlight this problem). And, depending on what number you use (what if this is a parameter in sps?) you might force SQL Server to do WAY more IOs by forcing the index than allowing the tipping point to do its job. But, depending on your hardware (and/or what you know to be in cache at the time of execution), it might be better to force an index instead of letting SQL Server choose. So, should I force the index? Be careful, if you're wrong - it could take more time and actually be slower.

In SQL Server 2008 - there's a new hint - FORCESEEK:

Q2: SELECT * FROM table WITH (FORCESEEK) WHERE colx < 597430 OPTION (MAXDOP 1)

FORCESEEK is better because it doesn't tie you to a particular index directly but it also doesn't let SQL Server tip to a table scan. However, just like forcing an index - you can be wrong!

So, what should you do? It depends. If you know your data well and you do some extensive testing you might consider using a hint (there are some clever things you can do programmatically in sps, I'll try and dedicate a post to this soon). However, a much better choice (if at all possible) is to consider covering (that's really my main point :). In my queries, covering is unrealistic because my queries want all columns (the evil SELECT *) but, if your queries are narrower AND they are high-priority, you are better off with a covering index (in many cases) over a hint because an index which covers a query, never tips.

That's the answer to the puzzle for now but there's definitely a lot more to dive into. The Tipping Point can be a very good thing - and it usually works well. But, if you're finding that you can force an index and get better performance you might want to do some investigating and see if it's this. Then consider how likely a hint is to help and now you know where you can focus.

Thanks for reading,
kt

A couple of weeks ago I wrote a blog post titled Whose job is it anyway? It's an interesting debate and something I've been hearing more and more - that SQL Server is a "set it and forget it" technology - a black box where you just don't need to know how it works to do well with it. In fact, I've even had a few folks comment that they think it would be better to "roll their own" database rather than have to learn how to work in a "general purpose" database. And, while there are certainly lots of different angles to this debate - one fact remains... if you don't know anything about the database on which you're developing (whether it's SQL Server, MySQL, Oracle, whatever), I *PROMISE* you won't have a truly scalable, optimal solution. Why do you think there are so many knobs? It's because there are so many different ways to work with data. There is more than one way to query, more than one way to design. This is also why every answer to a "how should I do this" question starts with "It depends". And, while that seems like a scary response it's actually a good one. It means that you have lots of options - options that can offer many different pros/cons. And, as a result of knowing these pros/cons, you can make better decisions - decisions that will ultimately determine how well you can scale.

So..... while I don't think this debate will EVER be finished (as to WHOSE job it is to know these things), I do think a lot of folks are seeing the effects of not knowing more about their store (and, again, this is NOT limited to SQL Server in any way, shape or form).

At a minimum, hear the discussion on RunAsRadio with Richard, Greg and me and let us know what you think!

Kim Tripp on the Roles of Developers and DBAs with the Database!

Cheers,
kt

I started the series here: http://www.sqlskills.com/BLOGS/KIMBERLY/post/Spring-cleaning-your-indexes-Part-I.aspx and I want to continue with Part II today by clarifying some great questions/comments that have come up on the series. In Part III, I'll give you a few more ways to get rid of (or consolidate) indexes. And, I think there's still a bit more that Paul and I will investigate further (wrt operational stats) but, I want to address a few comments and a few interesting things that both Paul and I have found.

In the Part I post, I talked about using sys.dm_db_index_usage_stats to see if there are any indexes that just aren't being used at all... A few comments asked why I didn't use operational_stats instead. To address that first, there are a few key differences:

  • dm_db_index_operational_stats is persisted only as long as an object is in cache (however, it's not cleared when objects are forced out of cache with DBCC DROPCLEANBUFFERS). If you want to clear ALL DMVs for a specific database, then a relatively easy way to do this (IN TESTING) is to take the database offline and then immediately bring it online again.
    • ALTER DATABASE <dbname> SET OFFLINE
    • ALTER DATABASE <dbname> SET ONLINE
      • NOTE: If there are any suspect files, you will NOT be able to bring your database back online without FIRST taking all suspect files OFFLINE. And, if you take a FILE offline then it's even more important to know that THERE IS NO WAY TO BRING A FILE ONLINE without restoring it from backups. So, it's VERY important to understand that OFFLINE/ONLINE for a database is really easy IF AND ONLY IF there are no other problems with the DB. You really need to resolve those problems first (or at least know that you're going to need to resolve those problems later through backup/restore) before you take a database offline.
  • dm_db_index_operational_stats is (from BOL) neither persistent nor transactionally consistent. This means you cannot use these counters to determine whether an index has been used or not, or when the index was last used. For information about this, see sys.dm_db_index_usage_stats.

Having said that though, none of these are really any guarantee of perfect information. And, they're not meant to be. I look at these DMVs as being a quick and easy way to get some relatively decent insight into what is or is not happening in my environment. However, even though sys.dm_db_index_operational_stats might give you insight that you have a problem it still doesn't give you good insight into exactly what that problem might be. For example, it *might* be splits that cause some of your wait times to increase (column: page_io_latch_wait_in_ms) but, it could be something else too (some other system issue).

The main point: you can use these to get insight into which tables have the biggest problems (i.e. the biggest waits) and where they might have a lot of splits (column: nonleaf_allocation_count) but, in all honesty, that's not a guarantee. In fact, the reason I said "might" is that pages that are allocated at the end of the leaf level STILL allocate a page and require an entry to be made in the next level up in the index. So, a lot of nonleaf_allocations COULD be for a perfectly unfragmented index. So, it still doesn't tell you how fragmented the objects are or what the REAL problem is (or even if it is a problem yet).
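
For a starting point, something like the following sketch lists the indexes in the current database by I/O latch wait time along with their allocation counts. The column choice and ordering are just one way to look at it (and the object_id > 100 filter is the same rough "skip system objects" filter used elsewhere in this series):

```sql
-- Sketch: where is the most waiting/allocating going on in this database?
SELECT OBJECT_NAME(os.object_id) AS ObjectName,
       i.name AS IndexName,
       os.partition_number,
       os.leaf_allocation_count,
       os.nonleaf_allocation_count,
       os.page_io_latch_wait_count,
       os.page_io_latch_wait_in_ms
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS os
JOIN sys.indexes AS i
  ON i.object_id = os.object_id
 AND i.index_id = os.index_id
WHERE os.object_id > 100
ORDER BY os.page_io_latch_wait_in_ms DESC;
```

Remember that these counters only cover the time the object's metadata has been in cache, so treat the list as "where to look first" and nothing more.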

Basically, these just tell you where you have the most activity and give you a starting point for problem solving. But, none of these (usage or operational) really tell you how to solve the problem. However, sometimes even knowing where to start IS the problem in and of itself. So, I'm not against these DMVs and I really do think you can get some good insight from them. Just use them as a tool to help focus your investigations. Use better tools like sys.dm_db_index_physical_stats to really see if you have fragmentation and where it's the worst.
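
Here's a minimal sketch of that follow-up check with sys.dm_db_index_physical_stats. The 1,000 page cutoff is only an assumed threshold to skip tiny indexes (pick your own) and 'LIMITED' is used to keep the scan cheap:

```sql
-- Sketch: fragmentation for the larger indexes in the current database
SELECT OBJECT_NAME(ps.object_id) AS ObjectName,
       i.name AS IndexName,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id = ps.index_id
WHERE ps.page_count > 1000          -- assumed cutoff, adjust for your environment
ORDER BY ps.avg_fragmentation_in_percent DESC;
```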

hth,
kt

First and foremost, happy spring! I truly hope we're on the path to summer (although who would know it here - we're in Florida for SQL Connections and the weather is a bit chilly and it's been raining off/on today - I hope this is short term (no, I don't want to look at the forecast as I don't want to jinx it :)). But, wherever you are - I hope you're on your way to nicer weather and minimal cold weather (ok, I guess I only have warm wishes for the northern hem... for you southies - I hope your fall is lovely!!).

But, for everyone - now's a good time to start thinking about cleaning out the [non-literal index] closet... and getting rid of some of those dusty indexes?

Why/when should you get rid of some of your indexes?

  1. It's possible that some of them aren't being used at all...
  2. Especially when they're not being used but even when they're "redundant" (or minimally useful) they're still costing you in many ways:
    1. Wasting space on disk
    2. Wasting space in memory (well, if they are being used then they're cluttering up your cache)
    3. Wasting space/time in your maintenance routines (so, here they're cluttering up your cache for sure!)
    4. Wasting space in your backups
  3. You might be able to reduce your overall indexes with index consolidation...

So, for this post, I'm going to target #1 - are there any indexes that just aren't being used at all...

First, how do you know if your indexes are being used?
In SQL Server 2005 and higher, there's a DMV (dynamic management view) called sys.dm_db_index_usage_stats and it's there to track index usage patterns. However, it's not persisted since the beginning of time and as a result, if you look at this and believe that it's telling you ALL of the indexes that have been used in your database - then you might be mistaken. The index usage stats DMV is cleared when SQL Server is restarted as well as when you detach/attach the database or when you backup/restore the database.

Therefore, you don't want to just run the following query and drop all of the indexes that aren't being used. A better way to "trust" this information is to periodically persist the data from the DMV in your own table and then query it after you've completed a business cycle's worth of activity - logging all of the usage stats. Then you can trust this much more. Again, here are a couple of negatives:
1) it's not persisted
2) it only keeps IDs (database_id, object_id and index_id) and not names (and an ID could change over time). You're right in thinking it probably shouldn't change but, a nonclustered index's ID is not permanent so, it's better to track the index name in addition to the index_id. And, when you run your queries to determine what to delete, you can easily verify indexes against the current indexes because your comparison is within the same db (more on this below).

If you want to persist this, then you have two ways to do this:

Store the index usage patterns in a table within the specific database you're tracking:

Pro: it goes with the database when you back it up, etc. and, it's easier to reverse engineer which actual indexes you're referring to (grabbing the names and not just the IDs).

Con: it's a bit more complex of a query to run and you'll need to run it for all of your databases (ok, it's really not all that bad - but, using something like sp_msForEachDB will really help)

Store the index usage patterns from all databases in a table within master or your own "performance database":

Pro: you only need one job to handle all the index usage info AND object_name *does* support TWO parameters (object_id and database_id) so, as long as you trap the name at the time of insert then you'll be good.

Con: it doesn't go with the database (e.g. backup/restore - and if you're restoring to a test system and you want to see what the usage patterns were then you'll need to get this information as well...)

Here's a simple query that you can run that shows all the indexes used right now - and adds the databasename/objectname into the results - in a persisted table you'll also want to add the runtime:

SELECT getdate() AS RunTime
, DB_NAME(i.database_id) as DatabaseName
, OBJECT_NAME(i.object_id, i.database_id) as ObjectName
, *
FROM sys.dm_db_index_usage_stats AS i
WHERE object_id > 100
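
To turn that into something persisted, here's a rough sketch of the first option (a table inside the database you're tracking). The table name dbo.IndexUsageHistory and its column list are made up for this example. You'd schedule the INSERT to run periodically:

```sql
-- Sketch: hypothetical history table, created once per tracked database
IF OBJECT_ID(N'dbo.IndexUsageHistory', N'U') IS NULL
    CREATE TABLE dbo.IndexUsageHistory
    (
        RunTime      datetime NOT NULL,
        DatabaseName sysname  NULL,
        ObjectName   sysname  NULL,
        IndexName    sysname  NULL,      -- NULL for heaps
        database_id  smallint NOT NULL,
        object_id    int      NOT NULL,
        index_id     int      NOT NULL,
        user_seeks   bigint   NOT NULL,
        user_scans   bigint   NOT NULL,
        user_lookups bigint   NOT NULL,
        user_updates bigint   NOT NULL
    );
GO

-- Capture the names along with the IDs at the time of the snapshot
INSERT dbo.IndexUsageHistory
    (RunTime, DatabaseName, ObjectName, IndexName, database_id, object_id, index_id,
     user_seeks, user_scans, user_lookups, user_updates)
SELECT GETDATE(),
       DB_NAME(us.database_id),
       OBJECT_NAME(us.object_id, us.database_id),
       i.name,
       us.database_id, us.object_id, us.index_id,
       us.user_seeks, us.user_scans, us.user_lookups, us.user_updates
FROM sys.dm_db_index_usage_stats AS us
LEFT JOIN sys.indexes AS i
  ON i.object_id = us.object_id
 AND i.index_id = us.index_id
WHERE us.database_id = DB_ID()
  AND us.object_id > 100;
GO
```

Because the counters are cumulative since the last restart, each snapshot overlaps the previous one. Compare the latest values per index (or the deltas between runs) rather than summing the rows.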

And, if you want a few more insights into how to persist this, see Paul's blog here: http://www.sqlskills.com/BLOGS/PAUL/post/Indexes-From-Every-Angle-How-can-you-tell-if-an-index-is-being-used.aspx.

OK, so, you have a few options to think about and I have a few more parts to post!
kt

PS - The Tipping Point is coming soon too. I'm still adding a few things to that one!!! ;-)

This is a tough topic. It's a big topic and more than any other - I think there are a lot of misunderstandings about what the log is for, why it's so critical and ESPECIALLY when/why it gets extremely large. Simply put, it gets large when it's not managed correctly. OK, there are times when it can become large - even if it is well managed. But, more often than not, when a transaction log is wildly out of control (orders of magnitude larger than the data itself) it indicates a management/maintenance problem.

There are a lot of places where you can go to find out the technical details behind the transaction log but I'm going to target this blog post to the relatively straightforward easy (no, really easy!!) facts about transaction log maintenance.

What kind of transaction log management is right for YOUR database?

First and foremost, you MUST decide whether or not you need to do log backups. SQL Server *requires* you to make some form of decision. Well, I take that back. They don't tell you anywhere that you need to make this decision but the transaction log can get wildly out of control if you don't (see the next section for more details on this one :)).

Why? Transaction log backups will allow you better recovery options in the event of a disaster. If you create a good backup strategy, you should be able to recover from a disaster very close (possibly even up-to-the-minute) to the time of the disaster. However, you are not required to do log backups. Instead you can do only database-level backups and recover with those. That's fine. There's really nothing wrong with that strategy. However, it does mean that you have a greater potential for data loss. Basically, if you decide that you're going to do weekly full backups - then you need to be OK with losing everything that's happened since your last full backup. If that's OK, then performing full database backups (and never worrying about the log) is absolutely fine.

However, if you want more granular control and more recovery options (again, possibly even up-to-the-minute recovery - which is transactional recovery up to the time of the disaster), then you MUST add transaction log backups into your disaster recovery strategy.

So, make this decision FIRST:

  1. Am I OK with some data loss? (then you're probably OK with just database-level backups... but, you will need to do something else! be sure to keep reading!!!)
  2. Do I want to minimize data loss to the smallest amount possible? (then you're going to want to AUTOMATE transaction log backups)

But I didn't do anything - why is the log WAY out of control (in terms of size)?

OK, even if you consciously make the decision to ONLY do database-level backups, you are NOT DONE!!! In fact, this is actually what led me to do this post. I found these two (relatively dated but interesting nonetheless) MSDN forum discussions for TFS (Team Foundation Server) databases:

    MSDN Forum discussion "Recommended SQL Maintenance Plan": http://social.msdn.microsoft.com/forums/en-US/tfsadmin/thread/b23f7018-3eaa-4596-96e4-728b02cf6211/
    MSDN Forum discussion "Huge log files": http://social.msdn.microsoft.com/forums/en-US/tfsadmin/thread/605d51f7-23fd-470c-945e-53fa7ed5aa87/

And, I know EXACTLY what happened in ALL of these cases (and MANY more... Paul and I see this ALL the time, in fact). In the "Huge log files" thread, there's a database mentioned (TfsWareHouse) with a 124MB mdf and a transaction log of 61.8GB. It didn't mention whether or not there were other data files but my guess is that there weren't. My guess is that they were completely shocked that the log had grown to a size that's 510 TIMES the size of the data... The reason is actually somewhat simple (no pun intended). If you're not going to do transaction log maintenance (meaning transaction log backups), then you need to tell SQL Server that. (This is the part that's completely unexpected.)

When a database is created, SQL Server runs that database in a "pseudo simple recovery model". (Yes, I know - that didn't help.) What that means is that SQL Server automatically clears inactive records from the transaction log once it knows that it no longer needs them. It no longer needs them to be stored in the log because no one is using the log (i.e. you're not doing ANY backups). However, once you do start to do backups (and, people generally start by doing a full database backup), then SQL Server looks to your recovery model to determine what to do with log records. If the recovery model is set to full (and, yes, this is the default), then SQL Server gives you the "full feature set" with regard to backup/restore. SQL Server is expecting YOU to manage the transaction log by backing it up. Once it's backed up, SQL Server can remove the inactive records from the transaction log (and when you do a transaction log backup, it automatically clears the inactive records by default).

So, there are really two choices - and ONLY two choices here:

  1. Perform transaction log backups as part of your maintenance plan
  2. Change the recovery model to the SIMPLE recovery model so that SQL Server clears inactive transactions from the log automatically
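
As a sketch of what those two choices look like in T-SQL (the database name and backup path are placeholders, so the two alternatives are left commented out - pick ONE of them):

```sql
-- First, see where the current database stands and what (if anything) is holding up log reuse
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = DB_NAME();

-- Choice 1: keep the FULL recovery model and back up the log on a schedule
-- (placeholder database name and path)
-- BACKUP LOG [YourDatabase] TO DISK = N'X:\Backups\YourDatabase_log.trn';

-- Choice 2: switch to SIMPLE so SQL Server clears inactive log records automatically
-- (placeholder database name)
-- ALTER DATABASE [YourDatabase] SET RECOVERY SIMPLE;
```

Neither choice shrinks a log file that has already grown. They only stop it from continuing to grow for this reason.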

Is there anything else to do for the transaction log? 

Yes! If you decide that you want to do transaction log backups then I would recommend a few things. I'd first recommend reading 8 Steps to Better Transaction Log Throughput and when you decide how large your transaction log needs to be, then also read Transaction Log VLFs - too many or too few?. These two posts will help you to create a more appropriately sized log as well as one that won't be prone to performance problems (such as internal VLF fragmentation).

If you want to learn more about the transaction log, I'd suggest a few of Paul's resources (it's probably because he has such a fantastic tech editor... oh, I'm asking for trouble with this comment!! ;-):

  1. Read Paul’s blog post to his TechNet article on Logging & Recovery. It’s a great article that covers a lot of different aspects of logging. He also did a great short video on why the transaction log grows wildly out of control. Here’s a link to the blog post that pulls all of the TechNet resources together: http://www.sqlskills.com/BLOGS/PAUL/post/TechNet-Magazine-feature-article-on-understanding-logging-and-recovery.aspx.
  2. Read Paul’s blog post to his TechNet article on Database Maintenance. It’s a great overview of all of these maintenance tasks and will give you a good overview of what each one does. Here’s a link to the blog post that pulls all of the TechNet resources together: http://www.sqlskills.com/BLOGS/PAUL/post/TechNet-Magazine-Effective-Database-Maintenance-article-and-August-SQL-QA-column.aspx

OK, so, I think that sums up part III. I think that's the last one in the series for now. I'll go through and explain "The Tipping Point" next. However, I was hoping for more results to my brain teasers (in those two posts)!!

Cheers,
kt

OK, it seems as though there's A LOT of confusion about what steps are required for proper database maintenance. And, it seems as though some recommendations are being given as "quick fixes" without any real recommendation for root cause analysis. I'm not saying that the generalizations are horribly wrong but in many cases they're just too broad and/or unspecific to actually be useful (and, well, in all honesty, some of them are just really bad recommendations because they’re so ambiguous). And, in my random internet trolling for the day, I found 4 different references that I want to go through (which is why this is only Part I). For this post, I’ll focus just on SharePoint.

First, what did I see that’s motivating this post?
I found the following KB article, which was referenced by numerous sites as recommended reading. And, without knowing a lot about SQL, it does seem like good reading: Information about the Maintenance Plan Wizard in SQL Server 2005 and about tasks that administrators can perform against SharePoint databases. (That’s NOT meant as a dig at all. Most apps that sit on SQL never even suggest that you need to know SQL, and I can argue certain aspects of that point as well, BUT with regard to maintenance it can really become a problem if you don't know a few things about these tasks.)

Here is the part that over-simplifies picking what maintenance tasks to run vs. what not to run:

DIRECTLY TAKEN FROM THIS KB ARTICLE IS THE FOLLOWING:

We have tested these tasks and the effects that these tasks have on database schema and performance. The following table summarizes the results of these tests.

Task                    Safe to perform this task?
----------------------  ------------------------------------------------------------
Check database          Yes
Reduce a database       Yes
Reorganize an index     Yes
Clean up the history    Yes
Update statistics       Yes. However, this task is unnecessary because the SharePoint
                        Timer service performs this task automatically.
Rebuild an index        No. The task does not restore existing index options before
                        the rebuild operation. However, you can use scripts that
                        restore index options.
                        Note: This problem was corrected in SQL Server 2005 Service
                        Pack 2.

We used the following criteria to determine whether a task was safe to perform:

  • Whether the task modified the database schema from its default state
  • Whether the task decreased performance

Results may vary depending on the environment.

However, if you use the Maintenance Plan Wizard to perform the tasks that are listed in the table as "safe to perform," you are likely to experience increased performance in SQL Server 2005.

The big problem is: this is just too little information about too many VERY important tasks!

Let me break this down task by task and give you a few other places to go for more information.

Check database

The check database task refers to DBCC CHECKDB. This is definitely an important part of any maintenance plan. And, it really is a safe task to run as it’s NOT corrective by default. However, there is nothing mentioned about how this command may completely flush your buffer pool as it reads all of the pages of all of the objects it’s checking. So, this might impact performance but, of all of the tasks, this is the safest to run and it’s definitely a recommended task.
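
As a minimal sketch of what that task boils down to (SalesDB is a hypothetical database name, and the options shown are just one reasonable choice, not what the wizard necessarily generates):

```sql
-- Sketch only: SalesDB is a hypothetical database name.
-- No repair option is specified, so this only REPORTS problems; it does not
-- try to fix anything.
-- NO_INFOMSGS suppresses the informational messages so that real errors stand out.
-- ALL_ERRORMSGS asks for every error per object rather than a truncated list.
DBCC CHECKDB (N'SalesDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```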

If you want to learn more about DBCC CHECKDB, check out these things: 

  1. Read Paul’s blog post to his TechNet article on Database Maintenance. It’s a great overview of all of these maintenance tasks and will give you a good overview of what each one does. Here’s a link to the blog post that pulls all of the TechNet resources together: http://www.sqlskills.com/BLOGS/PAUL/post/TechNet-Magazine-Effective-Database-Maintenance-article-and-August-SQL-QA-column.aspx  
  2. Read Paul’s blog post on Myths around causing corruption – so that you can get better insight into where/why the actual corruptions are occurring. 
  3. Finally, if you’re really interested in the internals of CHECKDB and how it works – Paul’s written a ton about it in his CHECKDB from Every Angle category. FYI, 3 of his 9 years on the SQL Server Development Team were spent writing CHECKDB and repair for SQL Server 2005 (so, he definitely knows how it works :)). Here’s the link to the category: http://www.sqlskills.com/BLOGS/PAUL/category/CHECKDB-From-Every-Angle.aspx

Reduce a database 

OK, I’m sure I’ll get a lot of responses to this one but IMO, a database maintenance plan SHOULD NEVER INCLUDE A SHRINK.

Let me explain… :)
To be honest, I'm not even a fan of manually running database-level shrinks (DBCC SHRINKDATABASE) either. Don't get me wrong - there are ACCEPTABLE times to shrink parts of a database but, in general, I'd recommend only using DBCC SHRINKFILE for individual file-level shrinks. I wouldn't schedule shrinks nor would I EVER turn on [the database option] autoshrink. I don't think shrinks should EVER be automated - either through the database option OR through maintenance plans.

If you need to do regular shrinks - then it's likely that you have some other problem. And, without DIRECTLY addressing this problem, you *might* be making things worse.

This is a bigger discussion and I’ve found a few other references that I want to pull together. I’ll post another post about this within the next day or so – and link to it from here BUT, for right now…Know this – free space is generally GOOD. Excessive free space has happened FOR A REASON. Maybe there’s a pattern to it but often shrinking is worse than just leaving the free space for the next data explosion (a bunch of data comes in, the database grows, the data is archived, the free space remains for the next set of data that comes in).

If you shrink the database you might make things worse by fragmenting everything. Paul’s video that goes with the TechNet article on Database Maintenance shows you the [shocking if you didn’t know this] effect of shrinking a database on indexes.
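
To make the distinction concrete, here's a rough sketch of checking for (and turning off) autoshrink and of a one-off, file-level shrink. The database name SalesDB, the logical file name SalesDB_Data and the 20480 MB target are all made up for illustration, and sys.databases assumes SQL Server 2005 or later.

```sql
-- Which databases have autoshrink turned on?
SELECT name, is_auto_shrink_on
FROM sys.databases
WHERE is_auto_shrink_on = 1;
GO

-- Turn it off for one (hypothetical) database.
ALTER DATABASE SalesDB SET AUTO_SHRINK OFF;
GO

-- A one-off, manual shrink of a single file rather than the whole database.
-- The second argument is the target size in MB.
USE SalesDB;
GO
DBCC SHRINKFILE (N'SalesDB_Data', 20480);
GO
```

Even the last statement is something to run by hand for a specific reason, not something to schedule.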

Reorganize an index, Update statistics and Rebuild an index

These need to be grouped together to start because this KB article does NOT address the impact of running these together. In fact, the problem, if you run these together – is that you MIGHT make things worse. First, let me give you an overview of each:

Reorganizing an index removes fragmentation in the largest part of an index (it’s called the leaf level of the index) and removing fragmentation in this level has the greatest (and positive) effect on range query scans and cache. So, this is really the most important type of fragmentation to remove. However, this is NOT the only way to do it…

Rebuilding an index completely and totally removes ALL forms of fragmentation in all levels of an index; however, this is the most expensive (yet most effective) way to do it. As a result of rebuilding an index, SQL Server also updates the statistics for the indexes that were rebuilt. Therefore you do not need to update statistics OR reorganize an index if it gets rebuilt.

Updating statistics is important for query processing and optimization. The query processor uses statistics on your data to help determine how many rows will be processed by your query/statement. If SQL Server can accurately estimate the rows, then it can choose a more effective plan. However, if it doesn’t have good statistics, then it may not do as good of a job at accurately estimating rows and therefore it might not come up with as optimal of a plan. So, this is an integral part to good database health. However, some of this might be done via the database option: auto update statistics which is ON by default in SQL Server (and, YES, you should leave this on). Check out this post on: Auto update statistics and auto create statistics - should you leave them on and/or turn them on??

However, if you use a maintenance plan then I really see two problems: 

  1. You’ll end up doing maintenance on things that may not need it. The default behavior for these tasks is just to run them on the selected objects. And, since many people will choose all objects (possibly even of all databases) then you’ll probably select objects that won’t really need this as frequently as you run this maintenance plan. 
  2. You might end up running a combination of things that wastes cycles/CPU and a MASSIVE amount of log space (which can translate into all sorts of concerns for DR technologies like database mirroring, which will need to send all log rows to the secondary server). For example, if you run ALL three of these things then they’ll have to be run in a certain order (you can change this in a maintenance plan). However, the default order is: Reorganize Index(es), Rebuild Index(es), Update Statistics. This means that the work that’s done by reorganizing is effectively wasted, as the rebuild would have taken care of it, AND the statistics on every rebuilt index get updated TWICE (during the rebuild AND after). The end result can even be WORSE, because the update statistics command might use a sampling mechanism to generate statistics (which can lead to LESS EFFECTIVE statistics information) - but ONLY if you change the wizard’s default, which is for the update statistics command to do a “full scan”. So, even if the statistics end up being the same, it’s still problematic because for all of the indexes you’ve just rebuilt, you’ve now updated their statistics TWICE.
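
To show why running all three is redundant, here's a minimal sketch of the two sensible alternatives for a single index, using the SQL Server 2005/2008 syntax. The table dbo.Orders and the index IX_Orders_CustomerID are hypothetical names; pick one option per index, not both.

```sql
-- Sketch only: dbo.Orders and IX_Orders_CustomerID are hypothetical names.

-- Option A: reorganize (works on the leaf level only). Reorganizing does not
-- update statistics, so a statistics update afterwards is not wasted work.
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REORGANIZE;
UPDATE STATISTICS dbo.Orders IX_Orders_CustomerID;
GO

-- Option B: rebuild. The rebuild removes fragmentation at all levels and
-- updates this index's statistics as part of the operation, so neither a
-- reorganize before it nor an UPDATE STATISTICS on this index after it is needed.
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REBUILD;
GO
```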

SUMMARY

A database maintenance plan is CRITICAL for best performance, especially for databases that are prone to some of the problems corrected by these maintenance tasks (yes, you can read SHAREPOINT into that statement). SharePoint uses GUIDs as PRIMARY KEYs (read this post to hear more about the side effects of this choice) and, as a result, as clustering keys. This means that many SharePoint tables are prone to [potentially a MASSIVE amount of] fragmentation.

You absolutely need to have a maintenance plan. But, what should it be?

My absolute preference is to NOT use the Database Maintenance Plan Wizard UNLESS you really know what you’re doing. It just doesn’t give enough prescriptive advice. And, if you just select the defaults, you will end up with a suboptimal maintenance plan.

A better approach would be to create your own maintenance plan. If you write the code yourself (or leverage one of the custom ones that are already out on the web) then you can strategically target ONLY the objects that have the warning signs and/or are out of date and you can set when to rebuild vs. when to reorganize (generally people rebuild if a table has more than 30% fragmentation and they reorganize when it's less than 30%). Fragmentation is something that can be detected programmatically using the DMV sys.dm_db_index_physical_stats (in SQL 2005/2008) or by using DBCC SHOWCONTIG (in SQL 7.0/2000). Here are a few places to go to see the more flexible and programmatic way of rebuilding/reorganizing indexes:

Smart Indexing Part II - Conditional Rebuilding - a blog post (with conditional index rebuild code) from SQLMCA Bob Duffy (a good friend who is located in Dublin, Ireland and whose wife (Carmel) just had a baby last week – congrats again Bob!! You guys are seriously outnumbered now!!!) here: http://blogs.msdn.com/boduff/archive/2007/06/08/smart-indexing-part-ii-conditional-rebuilding.aspx

Custom Index Defrag / Rebuild Procedures - a blog post with some posted code as well. http://www.sqlstuff.info/post/2008/03/Custom-Index-Defrag--Rebuild-Procedures.aspx

Rebuild and Reorganize Indexes in SQL 2005 – an article (with conditional index rebuild code) from SQL Server Central here: http://www.sqlservercentral.com/scripts/31857/  (NOTE: You will need to become a subscriber to get to this article.)

Rebuild Only the Indexes that Need Help - an article by Andrew Kelly (SQL MVP) on SQL Server Magazine here: http://www.sqlmag.com/articles/index.cfm?articleid=99019&pg=1 (NOTE: You will need to become a subscriber to get to the full text of the article.)

Or, build your own! Check out the BOL topic for the sys.dm_db_index_physical_stats for SQL 2005 here: http://msdn.microsoft.com/en-us/library/ms188917(SQL.90).aspx, Example D has sample code to help you get started! For SQL 2008 it’s here: http://msdn.microsoft.com/en-us/library/ms188917.aspx. It’s still Example D for the sample code to leverage. :)
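
If you just want a feel for what "build your own" starts from, here's a rough sketch (not the BOL example and not any of the scripts linked above) that lists the indexes in the current database and suggests an action using the 30% rule of thumb mentioned earlier. The lower bound of 10% and the 1000-page minimum are my own assumptions to filter out noise; tune them for your environment.

```sql
-- Sketch only: run in the database you want to inspect (SQL Server 2005/2008).
DECLARE @db_id smallint;
SET @db_id = DB_ID();

SELECT OBJECT_NAME(ps.object_id)          AS table_name,
       i.name                             AS index_name,
       ps.avg_fragmentation_in_percent,
       ps.page_count,
       CASE WHEN ps.avg_fragmentation_in_percent > 30.0
            THEN 'REBUILD'                -- the 30% rule of thumb
            ELSE 'REORGANIZE'
       END                                AS suggested_action
FROM sys.dm_db_index_physical_stats(@db_id, NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.index_id > 0                             -- skip heaps
  AND ps.avg_fragmentation_in_percent > 10.0      -- assumed lower bound: ignore light fragmentation
  AND ps.page_count > 1000                        -- assumed minimum size: ignore tiny indexes
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

The output is only a worklist; the scripts linked above go further and generate (or run) the ALTER INDEX statements for you.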

The most important thing I can tell you is that a SMALL amount of time getting familiar with what’s really happening in SQL as well as WHY it’s happening to you WILL BE A LOT MORE PRODUCTIVE than just slamming in a maintenance plan that solves some problems but probably creates others.

Hope this helps!
kt

OK, so this is interesting. I've got a few answers to my last survey (Tipping Point Query #1) and well, there's a good mix of answers (and, yes, some are correct! ;)). Be sure to go back and review that last post so that you can evaluate it and these two tipping point questions completely. So, now I want to see if people really know the basis of "the tipping point".

Try these two:

Tipping Point Query #2

Table1 (t1) has 1 million rows at 100 rows per page. The table has 10,000 pages. A nonclustered index exists (on name) but it does not cover the query. At what percentage (of the table) is this nonclustered index no longer selective enough to use:

Tipping Point Query #3

Table2 (t2) has 1 million rows at 2 rows per page. The table has 500,000 pages. A nonclustered index exists (on name) but it does not cover the query. At what percentage (of the table) is this nonclustered index no longer selective enough to use:

OK, so I'd really love to see quite a few responses to these *3* "tipping point" questions. I PROMISE to do a nice long (and detailed) post for what is the actual tipping point AND the answers to all three of these questions. I'll explain the math as well as how you can generalize "what is selective enough" so that you can better create your nonclustered indexes!!!

Thanks for reading - and responding to these brain teasers!!

Cheers,
kt

PS - It's snowing here (ah...again)... maybe I'll spend the day creating brain teasers??! Do you guys like this kind of a post? (well, I suppose you won't really know until I post the answer part of it... but, just in general??). I think it's pretty cool. But, don't worry, I won't (nor will Paul) make all of my posts surveys. But, I think this is a really good one. I'm anxious to see if the answers come in correctly for these two as well! Have at it!

Along the same lines of improving database design and getting better performance on SQL Server (which [IMO] DOES take an experienced SQL Server database developer - but, we'll talk more about "whose job this really is" in many more posts and probably even a RunAs - which Richard and I just setup to record on Thursday (Mar 12)), I started thinking about how I could convince people of why they NEED a database developer. So, I thought I'd ask this VERY important question...

What percentage of data IS selective enough to use a nonclustered index which doesn't cover the query... in other words (just in case you're not entirely sure of what I mean :)), think of indexes in the back of a book... if you need to go to the back of the book to reference a bunch of data (this is called a [bookmark] lookup in SQL Server), there's a point where the randomness of the lookups (especially if you think in terms of many rows on a page) becomes too expensive. For example, imagine that the index is customer name and the data (the book) is customer orders - and, each page (of this rather weird book ;)), has 20 orders on it. Doing a query to look up customer number 12's orders might be really easy (if they have only a few orders) BUT, what if the query is "show me all of the orders for people that have an 'e' in their name". First, the number of people that have an 'e' in their name is probably more than 50% (that's TOTALLY a guess) and, if there are 20 orders per page, then a lookup from the index into the book would require SQL Server to touch every page roughly 10 times. If the table has 50,000 pages (therefore 1 million rows - at 20 rows per page), then to find the 500,000 rows (remember, I'm estimating half), SQL Server would have to do 500,000 bookmark lookups. For a table with only 50,000 pages that's terribly expensive.
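
Just to spell out the arithmetic in that example (this is only the math from the paragraph above, not the tipping point formula itself), here's a tiny sketch. The variable names are mine and the 50% figure is the same guess as above.

```sql
-- Sketch only: reproduces the example's numbers, nothing more.
DECLARE @rows int, @rows_per_page int, @fraction_matching decimal(5,2);
SET @rows              = 1000000;  -- rows in the orders table
SET @rows_per_page     = 20;       -- orders per page
SET @fraction_matching = 0.50;     -- the guess: half of the customers have an 'e'

SELECT @rows / @rows_per_page                            AS table_pages,           -- 50,000
       CAST(@rows * @fraction_matching AS int)           AS bookmark_lookups,      -- 500,000
       CAST(@rows_per_page * @fraction_matching AS int)  AS avg_touches_per_page;  -- 10
```

In other words, 500,000 random lookups to read a table that a scan could read in 50,000 page reads.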

So, here's the question - what's the tipping point? When is a nonclustered index on customer name NOT going to be used to lookup rows of sales orders? I'm going to use a survey to see what you think and then within a week, I'll give the specific SQL Server math AND a query you can run within your own DBs to see EVERY one of your table's "tipping points". It's really interesting and I think will really help you to understand why SQL Server might not be using those nonclustered indexes.............

Cheers,
kt

It's an exciting year for us for DevConnections! SQL Server 2008 has now been out for a few months and an SP is coming up soon. This is the sign that some customers wait for to migrate over to the new release feeling that an SP indicates a higher level of stability. But, this is also a time when some companies are shying away from upgrades because of the immediate and very quantifiable costs. And so Paul and I really struggled with what to focus on when we put together our Connections line-up.

What we decided to do is focus on your getting the most from the system that you have now - with best practices that apply to SQL Server 2000, SQL Server 2005 and SQL Server 2008. Personally, Paul and I are going to demo and focus on 2005/2008 but the concepts work on all 3 versions (and even most of the syntax as well - but, for index fragmentation analysis and maintenance the commands changed between 2000 and 2005/2008 so that's one minor difference). For the conference itself, we're focusing on upgrade, new features in 2008 and things to be aware of architecturally in all areas of Administration/Ops, Development and Business Intelligence. And, given that this isn't a "new product year" for SQL, other big conferences are likely to have fewer SQL sessions than usual (and most do...seriously).

So, with SQLConnections you get 46 *SQL* sessions and 3 full-day *SQL* workshops (and workshops on other technologies as well - all of which are spread over 2 pre-con days and 1 post-con day - with none of the SQL ones running concurrently so you could attend all 3). To top it all off, it's a more intimate event than many others, which means more interaction to get your tough questions answered! In fact, to help make sure we see as many people as possible, Paul and I usually schedule our sessions before and after lunch so that we can spend the entire lunch gap inside our session room answering even more questions! We even have a session called "Follow the Rabbit" where YOU drive the session with your questions. It's great fun and we've been doing this for the past few years with a lot of success!

If you want to see a bit of the personality and flair offered at Connections - check out MyConnections - it's our conference magazine (note: it's 9.80 MB to download but, it's 84 pages). It's something you get automatically after attending and it's filled with technical articles and all sorts of additional information that comes from Connections. And, here's a link to a fun and fast-paced video with highlights of the conference itself (nothing technical - just fun shots of the event). For example, did you know that EVERY year a Harley is given away at Connections? Here's the quick video: mms://bcast.sswug.org/sswugtv/DevConnectionsFall08.wmv.

Finally, did you know that EVERY attendee gets a FREE SQL Server 2008 Standard Edition license with one CAL? That can cover your attendance right there and get you started on development and learning with SQL Server 2008.

We really have a great time at Connections and we hope to see you there!! (And, Florida in March is a nice destination from the winter weather for many of us as well. :)

We hope to see you there!!
Kimberly and Paul

Something I learned while the SQL Server 2008 Internals book was in tech edit (thanks to our *awesome* tech editor Ben Nevarez - who, unfortunately, does not have a blog or anything...yet! (well, I'm hopeful)), was that you can use a FOREIGN KEY constraint to reference a UNIQUE index - one without a PRIMARY KEY or UNIQUE key constraint. At first glance this might seem like something relatively insignificant but in terms of reducing indexes and/or consolidating indexes it offers something that constraints do not. When you create a UNIQUE index you can use INCLUDE to reference (and include) non-key columns in the leaf level of an index. This offers more choices for covering and if you want to cover a query using INCLUDE but also have a UNIQUE column(s) as the key - you can do that with a regular index but not with a constraint-based index. So, that got me thinking even more - can I reference a UNIQUE index with INCLUDE and even a filter from a FOREIGN KEY? My guess was that it probably wouldn't work because it would be too costly to have to verify it on every referencing row BUT, I did have hopes that a filter of IS NOT NULL would work. However, it does not. ;-(

So, you CAN reference a UNIQUE index with INCLUDEd columns but not filters. Even that's really cool!
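
Here's a minimal sketch of what that looks like. The tables, columns and names (dbo.Customer, dbo.SalesOrder, and so on) are made up for illustration; the point is only that the referenced table has a UNIQUE index with INCLUDE and no PRIMARY KEY or UNIQUE constraint on the referenced column.

```sql
-- Sketch only: all object names here are hypothetical.
CREATE TABLE dbo.Customer
(
    CustomerID   int          NOT NULL,
    CustomerName varchar(100) NOT NULL
);
GO

-- A plain UNIQUE index (not a constraint) that also carries a non-key column
-- in its leaf level so that it can cover more queries.
CREATE UNIQUE NONCLUSTERED INDEX UX_Customer_CustomerID
ON dbo.Customer (CustomerID)
INCLUDE (CustomerName);
GO

-- The FOREIGN KEY references the column guaranteed unique by the index above.
CREATE TABLE dbo.SalesOrder
(
    OrderID    int NOT NULL PRIMARY KEY,
    CustomerID int NOT NULL,
    CONSTRAINT FK_SalesOrder_Customer
        FOREIGN KEY (CustomerID) REFERENCES dbo.Customer (CustomerID)
);
GO

-- Per the post: if the unique index were a filtered index instead
-- (for example, WHERE CustomerID IS NOT NULL), the FOREIGN KEY above could
-- not reference it.
```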

And, when you start your spring cleaning - try and cleanup and/or consolidate some of those redundant indexes!!

Cheers,
kt 

Given the general state of the economy...many companies are looking to cut back. Going back over what we've done and "optimizing" things -> budgets, expenses, etc. is the norm right now. And, scaling back is not always a bad thing - unless the wrong things are cut. Unless the wrong things are used to motivate you. Prioritizing and/or really assessing what gives you the biggest gains for your dollars is hard. In fact, one of the things that always seems to be first on the cutting block is training. Training is hard to quantify. And, the results of good training are also hard to quantify. Instead of fixing a problem (which you can often see the exact improvement) you might instead avoid a problem. Avoid downtime. Avoid data loss. Process more rows - with the same hardware. But, how do you know the cost of what could have happened. Ugh. To be honest, if I could do that - Paul and I would be on a beach. ;)

But, I do have a reason for this post... what should you be thinking? Where should you focus your attention? What can you cut - safely, temporarily, permanently and what might you help to prioritize?

Should you upgrade software?

  • Is there a feature that makes something easier? Some new features are really powerful "big" features. For example, Policy-Based Management (PBM) might help you to better centralize certain rules (in PBM-speak "policies") and then enforce them on many servers - even 2005 servers... so, you might be able to upgrade a smaller number of servers and still get some of the benefits. Many of the tools work against multiple versions so you might be able to minimize (and/or prioritize) which servers you upgrade and slowly migrate others. Potentially following an every-other-version upgrade strategy... upgrading some servers from 2000 to 2008 and leaving some of your 2005 servers to wait to upgrade until SQL11 (the next version after SQL10 - which is SQL Server 2008).
  • Are you starting a new project - architecting a new database? Wouldn't it be easier to start on the newer version and get better longevity (maybe?!)? For example, sparse columns might make a major difference in your base table's architecture...and be easier than if you were to architect (and write all of the code) for 2005 but then later need to do a major architectural change to move to 2008 (well, to *really* benefit from things like sparse columns). There are some really good features in 2008 and some *might* warrant upgrading... upgrading now. But, if you don't have a direct need then I'd argue that you could probably stay with 2005 (or even 2000) and then push this out a bit until you absolutely need to move forward.

Should you upgrade hardware?

  • Again, are there features that will directly impact: performance, availability, manageability?
  • Can you wait? I can't really answer this and - for everyone - the answer is going to be "it depends". There might be something that significantly reduces costs and/or minimizes downtime and as a result, you'll just have to do cost-benefit analysis. This is a tough one... but, maybe you can do rolling upgrades and let some of the lesser servers take the hand-me-downs. :)
  • Can you do rolling upgrades moving the most critical to a new server and then a less critical server to the one freed up by the last upgrade...

Is there anything you can do to get more out of what you already have??

In my opinion, this is probably even more important than the two above. Upgrading hardware and software is something you will ALWAYS need to consider but if you could get better performance, scalability and availability out of the hardware/software you have now, then you'll benefit *now* without additional funds spent (actual outgoing funds) and you'll still be able to leverage what you do today when you do upgrade. So, what this really translates to (IMO) is tweaking (and tweaking a bit more) what you already have. How? What can you look for? What can you do to help??

  • Upgrade to the latest service packs/hotfixes (at least upgrade to the free stuff) - you might see some gains and in some cases (like SQL Server 2005 SP2+) you might get some new features. (important note: test this on a non-production server FIRST!!)
  • Update your hardware's firmware? You might have missed an update that improves performance (important note: test this on a non-production server FIRST!!)
  • Bottleneck Analysis - Some good resources for this are: Performance Tuning Using Waits and Queues and the SQLCAT team.
  • Workload Analysis - Some good resources for this are: Troubleshooting Performance Problems in SQL Server 2005, Working with Tempdb in SQL Server 2005, Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005...well, there are lots of good whitepapers that are specific to certain types of workloads and/or perf problems...check out our whitepapers page here: http://www.sqlskills.com/whitepapers.asp and the CAT team's whitepapers pages here: http://sqlcat.com/whitepapers/default.aspx and the general SQL Server on microsoft.com pages here: http://www.microsoft.com/sqlserver/2008/en/us/white-papers.aspx and for 2005 here: http://www.microsoft.com/sqlserver/2005/en/us/white-papers.aspx
  • Maintenance - often overlooked and incredibly important. A database that has solid maintenance practices (fragmentation analysis and cleanup, VLF analysis and cleanup, transaction log management, finding corruption in its early stages through automated CHECKDB executions...) performs better, is easier to recover, might naturally stay smaller (more compact) and therefore require less hardware. In fact, analyzing indexes - to get rid of unused indexes and to consolidate redundant indexes can end up saving disk space, backup space, cache, maintenance costs, etc. Both Paul and I have blogged quite a bit about many of these!
  • Other tips and tricks
    • Blogs... which is why you're here and there are so many out there! Here's a link I recently found that lists a bunch of SQL-related blogs: http://technet.microsoft.com/en-us/sqlserver/bb671052.aspx and, of course, Paul's post on "So many blogs" and the PASS list of blogs here: http://www.sqlpass.org/Community/BlogDirectory.aspx.
    • Webcasts... there are lots out there and we now have a page which has most of ours listed on it (thanks to Paul for creating this!!) here: http://www.sqlskills.com/webcasts.asp and there are LOTS more on TechNet, MSDN, etc.
    • Conferences... OK, maybe a shameless plug for conferences like SQLConnections *but* in having put together the agenda (with Paul) where we specifically focused on best practices topics and performance tuning - I can tell you that some of the tips and tricks that we recommend can significantly improve performance, may minimize needed disk space (by creating more optimal and often fewer indexes), may improve availability with better design practices and/or maintenance and much more than that! And, in getting away from the office for a few days and focusing just on learning you might do two things. First, you might learn some tips and tricks that you never would have (or it would have taken *a lot* more time and/or been harder to really understand?). Second, you might come back with a whole new and renewed enthusiasm for doing things - and with an ordered/prioritized list of things to try. And, this might even help to motivate you because it also shows that your company really is committed to you/your job (having spent money specifically on your learning) - and you to them.

So, I do think that there are SMARTER ways to save. A well trained employee is worth a lot more than a cheaper one. And, there are smarter things to cut. I hope this might help you think of things to do and/or places to look to get better performance with what you have! I think blanket "no training" or "no upgrades" statements are never good for anything - even the budget (the longer term effects can be much worse - but also much harder to quantify).

Really, the answer is always different. It depends............

kt

Paul and I started discussing a comment that came up regarding the many issues surrounding logging & recovery. It's one of our favorite topics and in fact was the title to an article that Paul recently wrote for TechNet here: http://technet.microsoft.com/en-us/magazine/2009.02.logging.aspx. And, as a sidenote, depending on how much you already know about the transaction log - you might want to review that article first!

The comment that came up was related to a common misunderstanding on what is and what is not required to make a backup transactionally consistent when restored. And, in my opinion, some of the confusion as to whether or not log backups are "required" is because many changes have occurred release to release. Also, a lot of us say "log backups are required for better recovery" and while restoring log backups is what allows features like up-to-the-minute recovery and point-in-time recovery, not all strategies or recovery procedures actually require additional and/or separate log backups (some backups actually back up part of the log during their backup - and this is actually something that has changed release to release). And so, this is the reason for this post, I want to try and clear up a few of the many misconceptions about what happens with regard to the log during backup and restore. What's really interesting is that some of the best features (seemingly minor) have been around exactly this - the behavior of the transaction log during other backups and the requirement during restore. So, I thought I'd give a play by play from 2000 to 2005/2008 to discuss the differences and what's changed and why those changes were significant. The biggest changes were between 2000 and 2005.

First and foremost, the log portion of a database is required to make that database transactionally consistent. The transaction log is the key to SQL Server's durability (data integrity even after power loss). Transaction log backups are the key to our being able to recover from more catastrophic failures (possibly even point-in-time recovery if the right backup strategy exists). Inside the database, SQL Server doesn't really need all of the transaction details after they've guaranteed a transaction's durability (or, more simply put, once the effect of the change has been reflected in the data portion of the database then the details of that change are no longer needed in the log portion of the database). As a result, you can have SQL Server clear the "inactive" portion of the log by setting the database's recovery model to the SIMPLE recovery model. Loosely translated the SIMPLE recovery model means "when SQL Server no longer requires the transaction information to guarantee durability - then the log information can be removed from the log". Setting the recovery model to SIMPLE limits your backup options and makes administration easier (i.e. simple :)); however, it does not offer any other protection in the event of a more catastrophic disaster (because the log is being regularly cleared then there's no transactional information to backup). For some development/test databases and databases where data loss is not a major concern, then this can be an easy choice because log management (i.e. backups) does not need to be performed. However, if you want to minimize data loss - you can't choose the SIMPLE recovery model; you must choose either the FULL (which is the default) or the BULK_LOGGED recovery model. However, the discussion on when/why to choose BULK_LOGGED is a lengthy one and it does NOT impact the rest of this blog post. 
However, I did write a chapter for a SQL Server 2000 HA book and I described in detail the best uses for the BULK_LOGGED recovery model as well as the benefits and concerns. While this was written for SQL Server 2000, most of it *still* applies (and there are a few timeline based examples as well). You can download a pdf of this chapter here: http://www.sqlskills.com/resources/SQLServerHAChapter9.pdf.
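To make the recovery model discussion a bit more concrete, here's a minimal sketch of checking and changing the recovery model. The database name SalesDB is hypothetical and only there for illustration.

```sql
-- SalesDB is a hypothetical database name used only for illustration.
-- Check the current recovery model (returns FULL, BULK_LOGGED or SIMPLE).
SELECT DATABASEPROPERTYEX('SalesDB', 'Recovery') AS recovery_model;
GO

-- Switch to SIMPLE: the inactive portion of the log is cleared at checkpoint,
-- but log backups (and point-in-time recovery) are no longer possible.
ALTER DATABASE SalesDB SET RECOVERY SIMPLE;
GO

-- Switch back to FULL; take a full (or differential) backup afterwards
-- so that a new log backup chain can start.
ALTER DATABASE SalesDB SET RECOVERY FULL;
GO
```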

As for the main purpose of this post - there are basically a few key questions that I want to answer/clarify by version:

  • Is the log backed up as part of the other backups?
  • Is it cleared?
  • Is there anything else that's affected?
  • What happens to the log during other database, filegroup, file, database-differential, filegroup-differential and file-differential backups? And, since the behaviors and internals seem to be grouped into two groups, I will differentiate between these two different groups of backup strategies with the following types:   
    • Database-level backup strategies are backups that use database and optionally database-differential backups
    • Granular backup strategies are backup strategies that use file and/or filegroup backups and optionally file-differential/filegroup-differential backups

SQL Server 2000
Database-level backups cannot occur simultaneously with log backups. However, granular backups *CAN* occur concurrently with log backups.
If a log backup is attempted while a database-level backup is running, then the log backup is paused. This can have the following effects:

  • the transaction log may require auto-growth and become very large
  • the transaction log for a secondary server (i.e. through log shipping) can fall *very* far behind the primary server. And, this is a HUGE concern for high availability. If a full backup takes 4 hours to run, then logs cannot be shipped for 4 hours.

As a result of this limitation, some chose to use a granular backup strategy. The reason why log backups CAN occur concurrently with granular backups in SQL Server 2000 is because, in implementation, SQL Server does NOT back up the log as part of these more granular backups. As a result, transactional integrity is not guaranteed until the appropriate log chain is rolled forward. This has the following effects:

  • Granular backups only support the BULK_LOGGED or FULL recovery models (somewhat negative but not really)
  • The transaction log backups could run and even clear the inactive portion of the log while these granular backups were running (this is a huge benefit because it limits the need to auto-grow during these backups)
  • Recovery during restore is required (for transactional integrity) which means that all logs need to be restored to cover the time of the granular backup (and then all of those up-to-the-minute or to the desired point-in-time). And, even if a filegroup is set to READ_ONLY - *all* transaction logs need to be restored (this is a big negative but there is a trick: perform periodic file/filegroup differentials (after setting the filegroup to READ_ONLY) so that you can avoid having to perform numerous transaction log restores).
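As a rough sketch of what a granular strategy looks like in practice, here is a filegroup backup alongside a log backup. The database name, filegroup name and backup paths are all hypothetical.

```sql
-- SalesDB, the filegroup name and the backup paths are hypothetical.
-- A filegroup (granular) backup; in SQL Server 2000 this does not include the log.
BACKUP DATABASE SalesDB
    FILEGROUP = 'SalesHistoryFG'
    TO DISK = 'D:\Backup\SalesDB_SalesHistoryFG.bak';
GO

-- Log backups can run while the filegroup backup is in progress and are
-- needed later to roll the restored filegroup forward to a consistent point.
BACKUP LOG SalesDB
    TO DISK = 'D:\Backup\SalesDB_log.trn';
GO
```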

So, to answer the questions for SQL Server 2000:

  • Is the log backed up as part of the other backups?
    • for Database-level backups: YES
    • for Granular backups: NO
  • Is it cleared?
    • When a transaction log backup is performed then the default behavior is to clear the inactive portion of the log
    • When a database-level backup is performed AND there's no corresponding log chain (meaning the first time you backup the database OR the first time you backup the database after the transaction log chain was broken), then the transaction log is cleared. NOTE: Breaking the continuity of the log is relatively easily done in SQL Server 2000 when someone executes a BACKUP LOG with NO_LOG or a BACKUP LOG with TRUNCATE_ONLY command. To prevent these from executing (for the FULL or BULK_LOGGED recovery model), use TRACE FLAG 3231. This is a VERY COOL and *SAFE* trace flag. I blogged about this trace flag in a "MSDN webcast Q&A" here. An important side note here is that in SQL Server 2000, log backups can be performed AFTER the continuity of the log has been broken. So, if someone manually cleared the log (using NO_LOG or TRUNCATE_ONLY) and did NOT follow that with a database-level backup (or appropriate granular backups), then scheduled log backups could continue to run without failure or errors. However, log backups performed AFTER the continuity of the log has been broken CANNOT be restored. So, during recovery you might receive an error that a log backup cannot be applied because it's too "late" to apply. Using Trace Flag 3231 reduces this possibility. However, SQL Server 2005 fixes some of these issues.
  • Is there anything else that's affected?
    • Log backups are paused during database-level backups
    • When restored, database-level backups are transactionally consistent (and can be recovered directly - without restoring additional logs)
    • When restored, granular backups require transaction log backups to guarantee transactional integrity (note: this can be complex to determine the "minimum effective log sequence" and I wrote a series of articles for SQL Server Magazine on how to determine the appropriate log sequence here)
  • What happens to the log during other database, filegroup, file, database-differential, filegroup-differential and file-differential backups?
    • for Database-level backups: log backups cannot occur concurrently 
    • for Granular backups: log backups can occur concurrently and are required for recovery
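For reference, here's a minimal sketch of turning on trace flag 3231 for a running instance. DBCC TRACEON settings do not survive a restart, so to keep the flag on permanently you would typically add it as a -T startup parameter instead.

```sql
-- Enable trace flag 3231 globally (-1) so that BACKUP LOG ... WITH NO_LOG /
-- TRUNCATE_ONLY is blocked for FULL and BULK_LOGGED databases.
DBCC TRACEON (3231, -1);
GO

-- Confirm which trace flags are currently enabled globally.
DBCC TRACESTATUS (-1);
GO
```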

SQL Server 2005
The biggest improvement in SQL Server 2005 was that log backups are no longer paused by database-level backups - they *can* occur simultaneously; however, this change to database-level backups also applied to granular backups. While 2000 did allow log backups at the same time as a granular backup, they did so by NOT maintaining transactional integrity in the backup. In SQL Server 2000, you need to restore logs to make the granular backup transactionally consistent. In SQL Server 2005, they changed ALL backup strategies to follow the same behavior - database-level and granular backup strategies ALL backup the required log information needed to recover the backup to a transactionally consistent point in time which is essentially when the backup completes (this is a lot more complex than it sounds but Paul wrote a comprehensive post on exactly what this means here). Simply put, this requirement means that transaction log backups CAN occur concurrently; however, the log CANNOT be cleared until the backup completes. The primary negative effect is that the transaction log may require auto-growth and become very large. However, the positives are that you can do granular backups in any recovery model (although there are still some limitations to how this works in the SIMPLE recovery model but they added a new option during backup to allow a backup of ALL of the READ_WRITE_FILEGROUPS as a unit - separately from the read-only file groups which could be backed up at any time after they are set to READ_ONLY).
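Here's a small sketch of the new READ_WRITE_FILEGROUPS backup option mentioned above, plus a quick way to see whether a running backup is what's keeping the log from clearing. The database name and path are hypothetical.

```sql
-- SalesDB and the backup path are hypothetical.
-- Partial backup: all read-write filegroups as a unit (SQL Server 2005+).
BACKUP DATABASE SalesDB
    READ_WRITE_FILEGROUPS
    TO DISK = 'D:\Backup\SalesDB_readwrite.bak';
GO

-- While a backup is running, see why the log cannot currently be cleared.
-- ACTIVE_BACKUP_OR_RESTORE indicates a backup (or restore) is holding the log.
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'SalesDB';
GO
```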

So, to answer the questions for SQL Server 2005:

  • Is the log backed up as part of the other backups?
    • for Database-level backups: YES
    • for Granular backups: *YES*
  • Is it cleared?
    • When a transaction log backup is performed then the default behavior is to clear the inactive portion of the log
    • When a database-level backup is performed AND there's no corresponding log chain (meaning the first time you backup the database), then yes, the inactive portion of the log is cleared. As far as breaking the continuity of the transaction log... In SQL Server 2005, they significantly reduced the problems that occur after the log chain is broken by NOT allowing log backups to continue. If a log backup is attempted after the continuity of the log is broken then you will receive error: 
        Msg 4214, Level 16, State 1, Line 1
        BACKUP LOG cannot be performed because there is no current database backup.

      So, this means that you don't necessarily need the trace flags. However, I still recommend using the trace flag because it would be better to not break the continuity of the log to begin with! And, in fact, in SQL Server 2005, there are two trace flags: 3231 and 3031. They are both safe and here's how the two differ:
      • Trace Flag 3231 (same as 2000): When set, BACKUP LOG with TRUNCATE_ONLY and BACKUP LOG with NO_LOG do not allow a log backup to run if the database's recovery model is FULL or BULK_LOGGED.
      • Trace Flag 3031 (new in 2005): When set, BACKUP LOG with TRUNCATE_ONLY and BACKUP LOG with NO_LOG run as a CHECKPOINT - regardless of recovery model.
  • Is there anything else that's affected?
    • Log backups are *NOT* paused during database-level backups
    • When restored, database-level backups are transactionally consistent (and can be recovered directly - without restoring additional logs)
    • When restored, granular backups are transactionally consistent (and can be recovered directly - without restoring additional logs). However, you must always remember that the database cannot be brought online until the entire database is at a single transactionally consistent point in time. All read-write-filegroups must be restored as a unit (if in the SIMPLE recovery model) OR you must use transaction log backups to recover the entire database up to the SAME point in time.
  • What happens to the log during other database, filegroup, file, database-differential, filegroup-differential and file-differential backups?
    • for Database-level backups: log backups *can* occur concurrently (but the log will not be cleared until the backup completes) 
    • for Granular backups: log backups can occur concurrently (but the log will not be cleared until the backup completes) 

SQL Server 2008
Almost everything is the same in SQL Server 2008 as it was in 2005 - they made the largest number of improvements in 2005. However, one thing did change. In SQL Server 2008, the BACKUP LOG with NO_LOG and BACKUP LOG with TRUNCATE_ONLY options are not allowed at all. There is no need for the trace flags (3231/3031) because breaking the continuity of the log is not allowed (well, there is still a way... I'll get to that in a moment :)). In SQL Server 2008, if BACKUP LOG with NO_LOG or BACKUP LOG with TRUNCATE_ONLY are attempted, you will receive this error:
     Msg 3032, Level 16, State 2, Line 1
   One or more of the options (no_log) are not supported for this statement. Review the documentation for supported options.
But, what if you really don't want to backup the log? Why? Take this scenario (from a real customer!)... You have a 10GB database that's been around for quite some time AND you're doing regular full database backups... then, all of a sudden you run out of disk space. In looking around for large files (to investigate why you ran out of space), you find that this 10GB database's log is 987GB... so, you wonder - what happened? A database that is in the FULL recovery model (remember, this is the default) requires transaction log management. The easiest way to manage the log is with regular log backups; however, you're only doing full database backups (which do NOT clear the log). As a result, the transaction log grows and grows and grows and grows - until you're out of disk space (Paul demo'ed this in a TechNet Podcast here). At this point, how do you get rid of this 987GB transaction log? In prior releases, you could "clear" the log by using TRUNCATE_ONLY or NO_LOG but in 2008, what do you do? Switch to the SIMPLE recovery model. And, if you only want to do full database backups, stay there. And, if you want to physically shrink the transaction log file down to a reasonable size - check out these two related blog posts: 8 Steps to Better Transaction Log Throughput and Transaction Log VLFs - Too many or too few?. And, in related news, Linchi Shea wrote a good post on some tests he ran related to too many VLFs here and a second post that shows that some workloads don't see any issues with lots of VLFs here. But, the long story short is that you still want to be proactive about creating a reasonably sized transaction log (see my two previously mentioned posts). Significant auto-growth can cause problems and backup operations (and management in general) can be more difficult with lots of VLFs.
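Here's a minimal sketch of that cleanup for a database where you've decided that full backups alone are enough. The database name, the logical log file name and the 1024MB target are assumptions for illustration only.

```sql
-- SalesDB and its logical log file name SalesDB_log are hypothetical;
-- look up the real logical name in sys.database_files.
USE SalesDB;
GO

-- How large is each log and how much of it is in use?
DBCC SQLPERF (LOGSPACE);
GO

-- Stop requiring log backups; the log can then clear at checkpoint.
ALTER DATABASE SalesDB SET RECOVERY SIMPLE;
GO
CHECKPOINT;
GO

-- Shrink the log file to a target size in MB (1024 MB here, purely as an example).
DBCC SHRINKFILE (SalesDB_log, 1024);
GO
```

If the shrink doesn't get down to the target on the first try, the active portion of the log is probably sitting near the end of the file; another CHECKPOINT followed by another shrink attempt usually helps.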

Wow, that was much longer than I was expecting... and, in writing it all down - pretty complex (I had a hard time trying to section things but I think this works?!). Regardless, all the facts are there so this should help to clarify the what, when, where and why with respect to the transaction log. Let me know if you have more questions!

Thanks for reading,
kt

Way back in June 2005, I blogged about '8 Steps to better transaction log throughput'. I did this blog post after seeing (again and again) overly fragmented transaction logs... Transaction logs can become *VERY* fragmented when they are not preallocated and instead they grow excessively through unmanaged (and probably the default settings for) auto-growth.

While having WAY too many VLFs because of auto-growth is still the most common form of problem within transaction logs, another problem has been creeping up more and more... too few VLFs. If you preallocate a very large transaction log (10s to 100s of GB), SQL Server may only allocate a few VLFs - as a result, log backups will be allowed to run normally but, SQL Server only clears the inactive VLFs when you've moved into a different VLF. If your VLFs are 8GB in size, then you need to accumulate 8GB of log information before the log can be cleared...so, many of your log backups will occur normally but then one (the one that finally hits > 8GB in used size) will take quite a bit more time AND possibly cause you performance problems because it's now clearing 8GB of log information.

First, here's how the log is divided into VLFs. Each "chunk" that is added is divided into VLFs at the time of the log growth (regardless of whether this is a manual or auto-grow addition) and it's all dependent on the size that is ADDED, not the size of the log itself. So, take a 10MB log that is extended to 50MB; here a 40MB chunk is being added. This 40MB chunk will be divided into 4 VLFs. Here's the breakdown for chunksize:

chunks less than 64MB = 4 VLFs

chunks of 64MB and less than 1GB = 8 VLFs

chunks of 1GB and larger = 16 VLFs

And, what this translates into is that a transaction log of 64GB would have 16 VLFs of 4GB each. As a result, the transaction log could only clear at more than 4GB of log information AND that only when it's completely inactive.
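The breakdown above can be sketched as a simple CASE expression. This only mirrors the thresholds listed in this post; the variable name is made up. The second batch uses DBCC LOGINFO (an undocumented command) to show the VLFs a log actually has.

```sql
-- Number of VLFs created for a chunk of a given size (in MB),
-- following the breakdown listed above.
DECLARE @chunk_mb int;
SET @chunk_mb = 40;   -- e.g. growing a 10MB log to 50MB adds a 40MB chunk

SELECT @chunk_mb AS chunk_mb,
       CASE
           WHEN @chunk_mb < 64   THEN 4
           WHEN @chunk_mb < 1024 THEN 8
           ELSE 16
       END AS vlfs_added;
GO

-- To see the VLFs a log actually has, run this in the database of interest;
-- it returns one row per VLF.
DBCC LOGINFO;
GO
```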

To have a more ideally sized VLF, consider creating the transaction log in 8GB chunks (8GB, then extend it to 16GB, then extend it to 24GB and so forth) so that the number (and size) of your VLFs is more reasonable (in this case 512MB).
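Growing the log in steps might look something like the sketch below. The database and logical file names are hypothetical, and it assumes the log is currently smaller than the first target size. I've used 8000MB steps rather than exact multiples of 4096MB purely as a precaution, given the caveat about exact 4096MB sizes in the PS to this post.

```sql
-- SalesDB and the logical file name SalesDB_log are hypothetical.
-- Each MODIFY FILE adds one chunk of roughly 8GB, which is split into 16 VLFs
-- of roughly 500MB each.
ALTER DATABASE SalesDB MODIFY FILE (NAME = SalesDB_log, SIZE = 8000MB);
GO
ALTER DATABASE SalesDB MODIFY FILE (NAME = SalesDB_log, SIZE = 16000MB);
GO
ALTER DATABASE SalesDB MODIFY FILE (NAME = SalesDB_log, SIZE = 24000MB);
GO
```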

Have fun and thanks for reading!!
kt

PS - I've been made aware of a bug when you use an exact size of 4096MB. I'll get more details and post them here but the long story short is to avoid 4096MB as an exact value. I've been told (and I haven't played with this one yet), that 4095 doesn't have the problem. Oh, and the problem is that the 4GB does NOT get divided into equally sized VLFs.

Well... I think I had had too much tea that morning ;-). But, as always, chatting with Richard and Greg was great. Here's the specific show link: http://www.runasradio.com/default.aspx?showNum=76.

Oh, and just for the record, I didn't come up with that title. But, I do hope that all your [high-priority and important] queries are indexed!

Enjoy!
kt

I first posted an update to sp_helpindex here. My version of sp_helpindex was solely to expand what sp_helpindex showed, adding 1 or 2 things based on version: for SQL2005+ it adds included columns and for SQL2008 it also adds the filter predicate. So, there were two versions of sp_helpindex2 depending on which version you're using. A lot of folks like the changes to this sp but, alas, it had a bug (or two :)) and in fact, I found a few others when I went back over this as well. So, thanks to Josh (who commented here) and to a private email (thanks Vasco!), I have an updated version of sp_helpindex2:

For SQL Server 2005, here's your new sp_helpindex2 script: sp_helpindex2_2005.zip (2.89 KB)

And, here's a simple test script for 2005:

DROP TABLE tbl1
GO

CREATE TABLE tbl1( c1 int, c2 int, c3 int, c4 int)
GO
CREATE INDEX ix_1 ON tbl1(c1) INCLUDE (c2)
CREATE INDEX ix_2 ON tbl1(c1)
CREATE INDEX ix_3 ON tbl1(c1) INCLUDE (c2, c3)
CREATE INDEX ix_4 ON tbl1(c1, c3) INCLUDE (c2)
CREATE INDEX ix_5 ON tbl1(c3) INCLUDE (c1, c2, c4)
CREATE INDEX ix_6 ON tbl1(c1, c2) INCLUDE (c3, c4)
go

sp_helpindex2 tbl1
go

index_name  index_description            index_keys  included_columns
ix_1        nonclustered located on fg1  c1          c2
ix_2        nonclustered located on fg1  c1          NULL
ix_3        nonclustered located on fg1  c1          c2, c3
ix_4        nonclustered located on fg1  c1, c3      c2
ix_5        nonclustered located on fg1  c3          c1, c2, c4
ix_6        nonclustered located on fg1  c1, c2      c3, c4

For SQL Server 2008, here's your new sp_helpindex2 script: sp_helpindex2_2008.zip (2.84 KB)

And, here's a simple test script for 2008:

DROP TABLE tbl1
GO

CREATE TABLE tbl1( c1 int, c2 int, c3 int, c4 int)
CREATE INDEX ix_1 ON tbl1(c1) INCLUDE (c2)
CREATE INDEX ix_2 ON tbl1(c1)
CREATE INDEX ix_3 ON tbl1(c1) INCLUDE (c2, c3)
CREATE INDEX ix_4 ON tbl1(c1, c3) INCLUDE (c2)
CREATE INDEX ix_5 ON tbl1(c3) INCLUDE (c1, c2, c4)
CREATE INDEX ix_6 ON tbl1(c1, c2) INCLUDE (c3, c4)

CREATE INDEX ix_1f ON tbl1(c1) INCLUDE (c2)
WHERE c3 IS NOT NULL

CREATE INDEX ix_2f ON tbl1(c1)
WHERE c4 > 2

CREATE INDEX ix_3f ON tbl1(c1) INCLUDE (c2, c3)
WHERE c4 > 2 AND c1 < 50 AND c2 = 12

CREATE INDEX ix_4f ON tbl1(c1, c3) INCLUDE (c2)
WHERE c4 IS NOT NULL AND c1 = 12

CREATE INDEX ix_5f ON tbl1(c3) INCLUDE (c1, c2, c4)
WHERE c1 > 5

CREATE INDEX ix_6f ON tbl1(c1, c2) INCLUDE (c3, c4)
WHERE c4 < 20
go

sp_helpindex2 tbl1
go

index_name  index_description                index_keys  included_columns  filter_definition
ix_1        nonclustered located on PRIMARY  c1          c2                NULL
ix_1f       nonclustered located on PRIMARY  c1          c2                ([c3] IS NOT NULL)
ix_2        nonclustered located on PRIMARY  c1          c2                NULL
ix_2f       nonclustered located on PRIMARY  c1          c2                ([c4]>(2))
ix_3        nonclustered located on PRIMARY  c1          c2, c3            NULL
ix_3f       nonclustered located on PRIMARY  c1          c2, c3            ([c4]>(2) AND [c1]<(50) AND [c2]=(12))
ix_4        nonclustered located on PRIMARY  c1, c3      c2                NULL
ix_4f       nonclustered located on PRIMARY  c1, c3      c2                ([c4] IS NOT NULL AND [c1]=(12))
ix_5        nonclustered located on PRIMARY  c3          c1, c2, c4        NULL
ix_5f       nonclustered located on PRIMARY  c3          c1, c2, c4        ([c1]>(5))
ix_6        nonclustered located on PRIMARY  c1, c2      c3, c4            NULL
ix_6f       nonclustered located on PRIMARY  c1, c2      c3, c4            ([c4]<(20))
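If you're curious where this kind of information comes from, the catalog views expose it directly. The query below is only a sketch against the tbl1 test table (it is not how sp_helpindex2 itself is written) and it needs SQL Server 2008 because of the filter_definition column; on 2005, drop that column.

```sql
-- Catalog-view sketch; one row per index column rather than one row per index.
SELECT i.name              AS index_name,
       c.name              AS column_name,
       ic.key_ordinal,                 -- 0 for included (non-key) columns
       ic.is_included_column,
       i.filter_definition             -- NULL when the index has no filter
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('tbl1')
ORDER BY i.name, ic.is_included_column, ic.key_ordinal;
GO
```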

Have fun!
kt

YES!!!

OK, well, I guess I should be more specific because as in most things in SQL Server, the real answer is "it depends". And for these two options, it depends mostly on your SQL Server version. Since SQL Server 7.0, the way that auto update works has changed (very much for the better!!). So, if you're on SQL Server 2005 or SQL Server 2008, I would say most definitely - leave these ON (or if you turned them off - turn them back on!!!)! If you still have problems with a specific index causing you grief, then turn off auto update at the index level NOT at the database level. To turn off auto update at the index level, use STATISTICS_NORECOMPUTE in the index definition (or NORECOMPUTE in the statistics definition).
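As a sketch of what that looks like (SQL Server 2005+ syntax), with a hypothetical database, table and index:

```sql
-- SalesDB, dbo.Orders and the index/column names are hypothetical.
-- Leave auto create/auto update ON at the database level...
ALTER DATABASE SalesDB SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE SalesDB SET AUTO_UPDATE_STATISTICS ON;
GO

USE SalesDB;
GO

-- ...and switch auto update off only for the one index that causes trouble.
CREATE INDEX ix_Orders_OrderDate
    ON dbo.Orders (OrderDate)
    WITH (STATISTICS_NORECOMPUTE = ON);
GO

-- With auto update off for this index, refresh its statistics yourself
-- (for example from a scheduled job). NORECOMPUTE keeps auto update off.
UPDATE STATISTICS dbo.Orders ix_Orders_OrderDate WITH FULLSCAN, NORECOMPUTE;
GO
```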

Now, as for why (and how!) this has changed over the versions... here we go:

SQL Server 7.0

  • Invalidation: Statistics were *invalidated* when a row modification counter (sysindexes.rowmodctr) threshold was reached. This meant that SQL Server could not tell where the modifications were occurring and, even if modifications were isolated to a specific column, ALL of the statistics for the TABLE would be invalidated (so, statistics could be invalidated earlier than necessary)
  • Updating: Even worse, in SQL Server 7.0, when statistics were invalidated, they were immediately updated. This caused two problems: one, thrashing at the time of invalidation because all of the stats needed to be updated AND two, if the statistics were not used for a while then extra work was done to update them and by the time they were used, they might already be somewhat out of date.

SQL Server 2000

  • Invalidation: Statistics were still invalidated based on a row modification counter.
  • Updating: SQL Server 2000 fixed the "updating-potentially-too-often" problem by only updating statistics when they were needed.
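If you want to look at the row modification counter yourself, a query along these lines shows it. The table name is hypothetical. On SQL Server 2005 and later, sysindexes is only a compatibility view, so treat the rowmodctr value there as an approximation.

```sql
-- dbo.Orders is a hypothetical table name.
-- rowmodctr counts modifications since statistics were last updated.
SELECT name, indid, rowcnt, rowmodctr
FROM sysindexes
WHERE id = OBJECT_ID('dbo.Orders');
GO
```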

SQL Server 2005

  • Invalidation: The biggest changes were introduced in SQL Server 2005 where they decided to NO LONGER use sysindexes.rowmodctr and instead use an internal (and undocumented) column-specific modification counter. Now, statistics invalidation is more isolated to only those columns which are heavily modified. This internal/undoc'ed column is sysrowsetcolumns.rcmodified and can only be seen when connecting to SQL Server using the DAC (Dedicated Admin Connection).
  • Updating: Updating didn't really change but, SQL Server 2005 added "Async Auto Update" for statistics so that when the QO (query optimizer) encounters an out-of-date (i.e. invalidated) statistic, it can "trigger" the update but not wait for the update (meaning that it'll optimize using the out-of-date statistic). This can be both positive (faster) and negative (might not be the best plan if the statistics have changed drastically). It is off by default and IMO, I'd leave it off in most cases but if you find that auto update events (which can be Profiled) are causing you grief, then you can turn this on at the database level.
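Turning the async option on (and checking it) is a one-liner each. SalesDB is a hypothetical database name.

```sql
-- SalesDB is a hypothetical database name.
-- Only has an effect when AUTO_UPDATE_STATISTICS is also ON.
ALTER DATABASE SalesDB SET AUTO_UPDATE_STATISTICS_ASYNC ON;
GO

SELECT name, is_auto_update_stats_on, is_auto_update_stats_async_on
FROM sys.databases
WHERE name = 'SalesDB';
GO
```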

SQL Server 2008

Nothing new except "Filtered Statistics" and these are interesting as the density vector is still relative to the table (not the predicate) but the histogram is just over the predicate (OK, I know I'll have to blog a lot more about this one!). Anyway, I'm still playing/learning a lot more about these and they make the most sense with filtered indexes (as opposed to just a filtered statistic) but, just like statistics on secondary columns you will also potentially want statistics on the secondary columns of your indexes. The next question is should they have a filter or not. I've found that sp_createstats doesn't seem to create statistics with filters and I'm going to need to do some testing here but I think statistics with filters (filters that match the non-clustered index) should help to make the stats better (and even allow better usage of filtered indexes) but, I'm really going to need a bunch of time with this - and another post :). As for auto create/auto update - no changes there!
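For anyone who wants to play along, here's a minimal sketch of creating a filtered statistic (SQL Server 2008+) and looking at its header, density vector and histogram. It borrows the tbl1 test table from the sp_helpindex2 test scripts; the statistic name is made up.

```sql
-- tbl1 is the test table from the sp_helpindex2 scripts; st_c2_filtered is a made-up name.
CREATE STATISTICS st_c2_filtered
    ON tbl1 (c2)
    WHERE c4 > 2;
GO

-- Header, density vector and histogram for the filtered statistic.
DBCC SHOW_STATISTICS ('tbl1', st_c2_filtered);
GO
```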

Long story short, if you're using SQL Server 2005 or SQL Server 2008, you should leave auto create/auto update ON.

Thanks for reading!
kt

PS - A few of you have mailed me about a bug in the sp_helpindex2 script(s). OK, that's my next post!!! Possibly with an sp_helpstats2 script as well!

OK, I first posted on some of the limitations to indexes in SQL Server 2005 and 2008 in part one here. Now, I want to dive into index internals for a post (or two). And, I often get the question “who is the best audience for your blog – or, for this post” and well, that’s a bit hard to answer. At SQL Connections in Orlando, I delivered a session titled: Index Internals & Usage and while we (fyi – Paul and I co-chair the SQL Connections portion of “DevConnections”) put it in the "developer-focused track," it was more of a Dev/DBA "hybrid" session with the emphasis on database development and best practices in creating and managing indexes (rather than management/maintenance/operational tuning - which is more for DBAs). Here at TechEd this week, I'm going to focus more on the management/maintenance/operational tuning side with a session called Are your Indexing Strategies Working? I'll also do a complementary blog post for that as well...

Having said that though, indexes are definitely in a group of topics very much related to performance and scalability (index internals, indexing strategies, log maintenance, general database maintenance) which really need to cross almost all database-related disciplines (dev, admin, ops, etc…). If you work with SQL Server in almost any capacity, you need to get a feel for at least some aspect of indexing for performance.

So, for this post, I’m continuing with some internals. In the first post (in this series), I wrote about limits. Limits/boundaries are interesting to discuss but it's also important to remember that good performance takes a lot more than just staying within the bounds of what’s possible. Creating indexes solely because you can – without reason and only with upper limits in mind – can be even worse than under indexing. So, if you find that you're wanting more about indexes (I have many blog posts that are solely Q&A posts), check out my Indexing category here. Now that you know how many indexes you can create, a better question would be when is it appropriate to create indexes at all?

So, what is “finding the right balance” in indexing? In my opinion, there are three requirements/pre-requisites:

  1. knowing the data
  2. knowing how the users use the data
  3. knowing how the underlying structures and database stores/manipulates and uses indexes

Bringing all of these things together is what I try to do in my workshops, seminars and lectures – in this post, I'll start with a smaller more digestible piece - internals.

Indexes have 2 components: a leaf level and a non-leaf level (or b-tree). The non-leaf level is interesting to understand and discuss (in terms of internals) but simply put, it’s used for navigation to the leaf level (more than anything else). So, we'll start with the leaf level (as does SQL Server - the leaf level is always built first). The leaf level of an index contains something (I’ll explain more coming up) for every row of the table in indexed order (note: I am focusing on traditional indexes in every release from SQL Server 2000 up to and including SQL Server 2008 – with the exception of filtered indexes which I will write about in a later post). Once the leaf level is built, non-leaf level(s) can be built to help navigate to the leaf level but the architecture is rather straightforward. The non-leaf level stores something for every page of the level below – and levels are added (each smaller than the previous because each level only contains the first entry from every page) until the index gets to a root of one page. While it sounds like this could result in a lot of levels (i.e. a tall tree), the limitation on the size of the key (which has a maximum of 900 bytes or 16 columns) helps to keep index trees relatively small. In fact, in the example I’ll show coming up – which has a fairly large (large meaning WIDE) index and has a key definition which is at the maximum size – even the tree size of this example index (at the time the index is created) is only 8 levels high/deep…

To see this tree (and the math used to create it – which is the same thing that SQL Server would go through to create it), we’ll use an example where the leaf level of the index contains 1,000,000 “rows.” I put quotes around “rows” because I don’t want to imply that these have to be data rows – these are really just leaf level rows and I’ll explain more on what leaf level rows can be... The leaf level rows are 4,000 bytes per row (therefore only 2 rows per page) which means 500,000 pages. This is not ideal but at least the pages are almost full and we’re not wasting a lot of space – if we had two 3000 byte rows we’d still only fit 2 per page and then we’d have 2,000 bytes of wasted space. Now, as for why these are just “rows” and not specifically data rows: this leaf level could be the leaf level for a clustered index (therefore data rows) OR these leaf level rows could be rows in a non-clustered index that uses INCLUDE (which was new to SQL Server 2005) to add non-key columns to the leaf level of the index (which therefore creates wider leaf rows - wider than the 900 bytes or 16 column maximum). Again, while this doesn’t currently sound interesting, I’ll explain why this can be beneficial coming up (possibly in another post depending on how long this particular post becomes… :)).

The leaf level of this index would result in a 4 GB structure (and this is only at the time it’s created – if a lot of rows are added and the key is not ever increasing then this structure could become heavily fragmented and therefore much larger/taller). In this case, it’s relatively large (again because of “row” width) and with an index key of 900 bytes you can even see that in this case, the tree would be relatively small and only result in 8 levels – as shown below.

Root page of non-leaf level (Level 7) = 2 rows = 1 page

Intermediate non-leaf level (Level 6) = 15 rows = 2 pages (8 rows per page at 900 bytes)

Intermediate non-leaf level (Level 5) = 122 rows = 15 pages (8 rows per page at 900 bytes)

Intermediate non-leaf level (Level 4) = 977 rows = 122 pages (8 rows per page at 900 bytes)

Intermediate non-leaf level (Level 3) = 7,813 rows = 977 pages (8 rows per page at 900 bytes)

Intermediate non-leaf level (Level 2) = 62,500 rows = 7,813 pages (8 rows per page at 900 bytes)

Intermediate non-leaf level (Level 1) = 500,000 rows = 62,500 pages (8 rows per page at 900 bytes)

Leaf level (Level 0) = 1,000,000 rows = 500,000 pages (2 rows per page)
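The math behind this list can be sketched as a small loop: start with the leaf rows, divide by rows per page to get pages, and then treat each page as one "row" in the level above (at 8 rows per page for the 900 byte key) until you reach a single root page. The variable names are made up for this sketch.

```sql
-- Walk up the tree from the leaf level until a level fits on one page.
DECLARE @rows int, @rows_per_page int, @level int, @pages int;
SET @rows = 1000000;      -- leaf-level rows
SET @rows_per_page = 2;   -- 4,000-byte leaf rows: 2 per page
SET @level = 0;

WHILE 1 = 1
BEGIN
    -- Any partially filled page still counts as a page, so round up.
    SET @pages = CEILING(@rows / (@rows_per_page * 1.0));
    PRINT 'Level ' + CAST(@level AS varchar(10))
        + ': ' + CAST(@rows AS varchar(20)) + ' rows = '
        + CAST(@pages AS varchar(20)) + ' pages';
    IF @pages = 1 BREAK;          -- reached the root
    SET @rows = @pages;           -- one non-leaf row per page of the level below
    SET @rows_per_page = 8;       -- 900-byte keys: 8 rows per non-leaf page
    SET @level = @level + 1;
END
GO
```

Because this sketch rounds every partial page up, two of the intermediate levels come out one page higher (123 and 16) than in the list above; the tree is still 8 levels (0 through 7).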

 

Having said that though, this is NOT a goal. :) In more realistic scenarios [where the key is much smaller and] even when there are more rows, there are fewer levels (3-4 is quite normal). Most importantly, the size of an index (and the number of levels) depends on two things – the width of the key (in terms of the number of bytes) and the number of pages in the leaf level of the index. The number of pages in the leaf level of an index depends on the number of rows and the size (again, in terms of bytes) of the rows in the leaf level.

You can see the size of your index by using one of the following commands:

In SQL Server 2000: DBCC SHOWCONTIG … WITH ALL_LEVELS

In SQL Server 2005/2008: querying the DMV: sys.dm_db_index_physical_stats

To see the syntax of these commands and their output, we’ll use some structures created in the credit sample database. Using credit, you can see exactly how these commands work and how they return the details about every level.

NOTE: you can download a zip of a SQL Server 2000 backup of this database here – and since this is a SQL Server 2000 backup, you can restore this to SQL Server 2000, SQL Server 2005 or SQL Server 2008.

USE credit
go

SELECT *
FROM sys.dm_db_index_physical_stats
    (db_id(), object_id('Charge'), 1, NULL, 'DETAILED')
go

DBCC SHOWCONTIG('charge', 1) WITH ALL_LEVELS, TABLERESULTS
go

Using the DMV or DBCC SHOWCONTIG you can get the same picture of the charge table. Using the detailed (or ALL_LEVELS) parameter, you get the entire structure (all levels) for the clustered index (index_id = 1 is always the clustered index, IF the table is clustered). The reason it returns all levels is that the 'DETAILED' mode has been specified.
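Since SELECT * returns a lot of columns, here's a narrower sketch that pulls just the per-level numbers for the same clustered index, listed from the root down to the leaf.

```sql
USE credit;
GO

-- One row per level of the clustered index on charge; level 0 is the leaf.
SELECT index_level,
       page_count,
       record_count,
       avg_fragmentation_in_percent,
       avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats
    (DB_ID(), OBJECT_ID('charge'), 1, NULL, 'DETAILED')
ORDER BY index_level DESC;
GO
```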

The clustered index in this table has 1,600,000 rows (DMV column: record_count or SHOWCONTIG column: rows) and these are stored on 9303 pages (DMV column: page_count or SHOWCONTIG column: pages). If you read to the next level, which is level 1 because the leaf level is level 0 (remember index levels always start with the leaf level 0 and then go up to the root), you can see that its number of "rows" is equal to the number of pages in the leaf level... and this keeps going until you get to a root of 1 page. In this case, the clustered index (which is the widest structure of the table) has a very narrow clustering key (the key is on charge_no which is an int) and only has a total of 3 levels even though the table has 1,600,000 rows. Ideally, you should run this on a few of your production tables (in a development/test environment) so you can start to get some insight into how big your structures are. However, a BIG factor that you might see in production is fragmentation. If a particular level (or levels for that matter) is heavily fragmented then each level might be wider and less compact (and therefore less performant). Reviewing the DMV columns avg_fragmentation_in_percent and avg_page_space_used_in_percent, you can get a feel for how fragmented each level is and how full each page is. Poor page density means that your pages are not as full as they could be but there are many reasons why this might be the case: bad row size, splits due to inserts, splits due to updates of varchar columns or even a poorly chosen fillfactor that has left too much space on the pages. However, page density is only one piece of the puzzle and if your avg_fragmentation_in_percent is very low (0-5%) then I wouldn't be overly worried about your pages not being entirely full unless you have the time to possibly re-design tables (eg. vertically partition them) and then rewrite your applications to direct your statements at only the appropriate base table.
But, another factor to consider is the rate at which your fragmentation occurs as well as when you can fix that fragmentation. This is a HUGE discussion that requires time... And, I want to get back to index structures for now. However, both Paul and I have blogged quite a bit about rebuilding v. defragging indexes and what those operations do/how, etc. In fact, just today, Paul has blogged a Q&A about myths and misconceptions about index rebuild operations. So, I'll get back to internals for now! :)

You can use LIMITED (which is the default mode), SAMPLED, or DETAILED. All three have excellent uses and all use IS locks (to minimize blocking). LIMITED gives you a quick overview of fragmentation and mostly describes how intact and in order the levels are. LIMITED is quite clever in that it only scans the first non-leaf level above the leaf to determine how much fragmentation there is... since the non-leaf level always tracks the first entry of (and a pointer to) each page below it, SQL Server knows EACH and EVERY page in the leaf level by ONLY reading the non-leaf level (which is [typically] a lot smaller and therefore faster). However, because LIMITED doesn't touch every leaf page, it can't determine page density; it only tracks how out of order the levels are and not how dense/full the pages are (which is also a form of fragmentation). So, if you want a bit more detail, you can use SAMPLED. The SAMPLED mode returns the fragmentation from reading every 100th page of the index (or heap). If the table has less than 80MB used (which is roughly 10,000 pages), every page is read instead (which is a DETAILED scan). The DETAILED mode reads every page of every level to calculate the most accurate picture of your table's fragmentation. This is the best form of analysis but also takes the most time.
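
To compare the modes side by side, something like the following sketch can help. The credit.dbo.charge name is an assumption carried over from the earlier example; the variables are just there to keep the two calls readable:

```sql
-- Sketch: the same table analyzed in LIMITED and then SAMPLED mode.
-- Assumes the table is dbo.charge in the credit database.
DECLARE @db_id int, @object_id int;
SET @db_id     = DB_ID(N'credit');
SET @object_id = OBJECT_ID(N'credit.dbo.charge');

-- LIMITED: fastest; reports logical fragmentation only.
-- Page density (avg_page_space_used_in_percent) comes back NULL in this mode.
SELECT  'LIMITED' AS scan_mode, index_id, index_level, page_count,
        avg_fragmentation_in_percent, avg_page_space_used_in_percent
FROM    sys.dm_db_index_physical_stats(@db_id, @object_id, NULL, NULL, 'LIMITED');

-- SAMPLED: reads a sample of the leaf pages, so page density is populated.
SELECT  'SAMPLED' AS scan_mode, index_id, index_level, page_count,
        avg_fragmentation_in_percent, avg_page_space_used_in_percent
FROM    sys.dm_db_index_physical_stats(@db_id, @object_id, NULL, NULL, 'SAMPLED');
```

Swapping 'SAMPLED' for 'DETAILED' in the second call returns one row per level instead of just the leaf, at the cost of reading every page.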

If you’re interested in learning a few more of the tips/tricks with using this DMV, check out the following script: Using dm_db_index_physical_stats.zip (2.23 KB)

A favorite tip is that the database in which you want to analyze tables does NOT have to be in 9.0 compatibility mode in order to use this DMV. Don’t get me wrong, you will get errors if you try to use this DMV in a database that’s not in 9.0 compat mode; however, if you are in master (which is set appropriately and cannot be changed) and then use the first parameter to target a non-9.0 compat mode database, then this DMV works great. However, a second "gotcha" is for parameter 2... as long as you don’t use 2-part naming for the objectname (2nd) parameter, everything will work as expected. If you specify object_id('tablename') from master for a table that's in credit then object_id will return NULL. The query will still run but against all tables in credit rather than the one you thought you were targeting. If you want to use this DMV across databases, you will need to supply the database name in the first parameter and then make sure that you use 3-part naming for the second parameter.
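
A small sketch of that cross-database gotcha, assuming the table is dbo.charge in the credit database and that master has no object of the same name:

```sql
USE master;
GO

-- Resolved in the CURRENT database (master), so this returns NULL
-- (assuming master has no dbo.charge of its own).
SELECT OBJECT_ID(N'dbo.charge') AS resolved_in_master;

-- A 3-part name is resolved in credit, so this returns the real object id.
SELECT OBJECT_ID(N'credit.dbo.charge') AS resolved_in_credit;

-- Target the credit database from master: database name in parameter 1,
-- 3-part name in parameter 2.
SELECT  index_id, index_level, page_count, avg_fragmentation_in_percent
FROM    sys.dm_db_index_physical_stats(
            DB_ID(N'credit'),
            OBJECT_ID(N'credit.dbo.charge'),
            NULL, NULL, 'LIMITED');
```

If the second parameter had silently evaluated to NULL, the last query would have returned rows for every table in credit, which is exactly the surprise described above.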

Now that you are getting to know some of the structures (in terms of seeing physical structures and internals), where do we go from here? The best route to start “finding the right balance” for performance is to know the data as well as get some general insight into usage patterns. This is probably the hardest component to know, and sometimes you only know exactly what’s going on if you profile what’s actually happening in production. Is that too late? To a certain extent yes and to another extent no… there are still many things for which you can plan and other things you can confirm or test once the application is running (e.g. with Profiler). All of those things together are going to help to “find the right balance”.

Having said that, and having discussed the general internals of a b-tree (and therefore an index structure), what’s the difference between a clustered and non-clustered index? Well… stay tuned, that will be part 3 in this series. And, then (finally), we'll get to appropriate uses for INCLUDE (which was new for SQL Server 2005) and then appropriate uses for Filtered Indexes (a new feature in SQL Server 2008). Also, somewhere in there I'll post a few tips from my TechEd session so that you can start to determine if your indexing strategies are working??

Thanks for reading!
kt

Memorial Day weekend we were in Chicago to celebrate my Father's life. We did a "Celebration of Life" memorial and we had a few drinks (celebratory Meyers, Tonic and lime - which was my Father's favorite drink), we (7 of us) gave a few heartfelt speeches, and a few friends wrote a song (and passed out the words - to which we all sang along) and we grieved... but, in a refreshingly-not-overly-depressing way. I have to admit - it was exactly what I'd want as well. It was a wonderful day filled with memories and friends. After that, we visited with my Mom as well as my Grandmother. Paul blogged a couple of pics (yes, that chair is VERY big!). Then, we were back in Seattle for only one week...back to work...and preparing for TechEd 2008 ITPro week.

As for TechEd being spread over 2 weeks, well... I think it offers some excellent logistical options (smaller size means more possible venues AND/OR it means that they could possibly grow the size for each event). And, for some topics, I think there is a very strong separation between developer and ITPro (admin/ops) but for SQL - I think it's hard to get it perfectly right. I think there's a lot of developers who need to know more about admin/ops just so that they can develop more optimal (and even manageable solutions) and I think that DBAs should have a really good architectural overview of a lot of features to better administer them. So, for SQL, I'd *love* to hear your comments on what you think............

For Paul and me, the decision is relatively simple: we came for this second week for ITPro/Ops. But, we've also spoken at the developer events, and sometimes we even write/present sessions specifically targeted at developers at our SQL Connections shows and/or at User Groups (we just did a local .NET user group in Redmond and the discussion around Indexes became so popular that we're going back in August (for Indexes) and again (tbd) for Disaster Recovery techniques). Basically, developers tend to say...oh, that's why I should x or y or z....... so, maybe next year we'll hit both? Regardless, I'd still like to know what you think. Were you at the Developers event? Do you wish you could be at both? Are you at both?

As for what we're doing - Paul's already blogged it here: http://www.sqlskills.com/blogs/paul/2008/06/05/OffToTechEdUSITProTomorrow.aspx.

But, I thought I'd do a quick recap so that you can get some insight into our week as well as where to find us to come and chat. We'd love to meet you and/or hear your success (or disaster) stories!

Monday

  • Full day pre-con seminar: SQL Server 2008 Overview for DBAs

This is ACTION packed (and a very full day!) and will include giving out the updated SQL Server 2008 HOLs DVD. We weren't really sure we were going to be able to do it... we didn't really burn too many of the CTP6 version of the DVDs, nor were we sure that CTP6 would still be the most relevant. But, it's still excellent to learn on and this time our DVD includes 17 labs:

HOL Lab Filename | Length | Lab Name | Lab Abstract/Description
Using Policy-based Management.doc 75 minutes SQL Server 2008 Policy-based Management Security, best-practices, proper configuration settings - how do you control these things on one or more servers? These hands-on labs show you how to implement and leverage the new policy-based management framework to define and control your business rules and your server's compliance for one or more instances of SQL Server 2008.
Data Recovery Preventative Techniques.doc 75 minutes SQL Server 2008 Data Recovery and Preventative Techniques Hands-on Lab Can you recover from a dropped table? Can you reconcile tables that have become out of sync due to human error? These hands-on exercises show you how to bring a database back online quickly after a table is dropped as well as how to reconcile the differences between a production environment and a recently restored version of your database - so that you can manually merge the recovered data back into your production database. Once all of the recovery techniques are shown, the last exercise shows how DDL triggers can prevent some of these human errors altogether.
Table and Index Partitioning.doc 75 minutes SQL Server 2008 Table Index & Partitioning Hands-On Lab Table and Index Partitioning allows large tables to be managed more granularly. These hands-on labs show you how to implement and leverage these key features: a partition function, a partition scheme and the sliding window scenario.
Database Mirroring Part I.doc 75 minutes SQL Server 2008 Database Mirroring, Part I Database Mirroring allows you to create a secondary (mirror) database to handle requests either automatically or manually, in the event of a disaster at the principal database. These hands-on labs show you how to implement and leverage as well as when and how to use Database Mirroring. You will set up database mirroring in a High Availability configuration (synchronous mirroring with a witness), see the effects of failover, and see how automatic page repair restores damaged pages in the principal or the mirror.
Peer to Peer Replication.doc 75 minutes SQL Server 2008 Peer-to-Peer Replication Hands-On Lab Replication gives you a scale-out configuration where multiple servers participate in bi-directional transaction replication. Setting up and configuring this topology has a few requirements - many of which are minimized by using the Replication Wizards - but all need to be understood to configure a peer topology correctly. These hands-on exercises show you how to implement a peer topology correctly.
Using Performance Data Collection.doc 75 minutes Performance Data Collection in SQL Server 2008 Performance Data Collection brings together many key tuning features into one cohesive toolset. These hands-on labs show you how to create a Management Data Warehouse, setup and control the collection intervals and analyze the results of system data collection sets.
Instant Initialization.doc 45 minutes SQL Server 2008 Instant Initialization Instant Initialization allows data files of any size to be created instantly - eliminating zero-initialization. These hands-on exercises show you how to configure your server's permissions to leverage instant initialization as well as the security vulnerability created by enabling this feature.
Online Operations.doc 75 minutes SQL Server 2008 Online Operations Hands-On Lab Online Operations are critical to the success of any server that needs to be highly available. These hands-on labs show you how to implement and leverage these key features: online index operations, partial database availability and online piecemeal restore.
Database Development Clients Lab.doc 120 minutes SQL Server 2008: Database Development Hands-On Labs The goal of these hands-on lab materials is to get an understanding of when to use one or more of the advanced features of SQL Server 2008 Database Development. After completing these self-paced labs, you will be able to:
* Set up a Database Project using Visual Studio 2008 Team System Database Edition
* Make changes to the database schema and deploy those changes
* Create and edit a project that uses the LINQ to SQL Object Relational mapper
* Use LINQ to SQL to query and maintain a SQL Server database using the managed classes
* Use LINQ to SQL with stored procedures
* Create and edit a project that uses the ADO.NET Entity Data Model mapper.
* Use the ADO.NET Entity Data Model to model a many-to-many relationship in the database
* Use the ADO.NET Entity Framework classes and LINQ to Entities to query and update a database
* Use Visual Studio 2008 to quickly get an ADO.NET Sync Services application up and running.
* Set up SQL Server 2008 Change Tracking
* Use ADO.NET Sync Services with SQL Server 2008 Change Tracking
Snapshot Isolation.doc 75 minutes SQL Server 2008 Snapshot Isolation Hands-On Lab The goal of these hands-on lab materials is to get an understanding of the appropriate uses of transaction isolation levels as well as how snapshot isolation affects conflicting readers and writers.
Database Mirroring Part II.doc 120 minutes SQL Server 2008 Database Mirroring, Part II Part II of the Database Mirroring HOLs allows you to go through setup, implementation and numerous failover scenarios - step-by-step. While Part I offers quicker setup through SQLCMD scripts, Part II works through the setup process more slowly allowing you to see how things work together. This lab is longer but also goes through changing the mirroring configuration as well as forcing failover. Part I should be completed first and Part II should be completed only if time permits.
Service Oriented Database Architecture.doc 120 minutes SQL Server 2008 Service Oriented Database Architecture Hands-On Lab Manual The goal of these hands-on lab materials is to get an understanding of how and when to use Service Broker in deploying a service-oriented database application.
Database Snapshots.doc 75 minutes SQL Server 2008 Database Snapshots Hands-on Lab The goal of these hands-on lab materials is to get an understanding of how to use the Database Snapshot feature of SQL Server 2008. After completing these self-paced exercises, you will be able to:
* Understand how to create a database snapshot
* Understand how to investigate file sizes and sparse file configuration (using both T-SQL queries and Windows Explorer)
* Understand the benefits and challenges with creating multiple snapshots
* Understand how a database snapshot is created when transactions are in flight as well as when they're not
* Understand how to use database snapshots for testing and reverting databases
* Understand the requirements to drop database snapshots and drop databases that have database snapshots
* Understand how to create a database snapshot on a mirror database
Dynamic Management Views.doc 75 minutes Understanding and Using DMVs Hands-on Lab The goal of these hands-on lab materials is to get an understanding of the more advanced new features of SQL Server 2008 that give access to server information that can be used for performance tuning, server health monitoring, and problem diagnosis. After completing these self-paced labs, you will be able to:
* Determine what DMVs exist, what their input parameters are, how and where their data is stored, and be able to persist DMV data to your own tables.
* Access information from the query plan cache, including determining frequently executed queries and their query plans.
* Access physical statistics information about indexes (e.g. fragmentation).
* Access information about tempdb space utilization.
Resource Governor in Action.doc 45 minutes SQL Server 2008 Resource Governor Hands-on Lab The goal of these hands-on lab materials is to get an understanding of when to use one of the more advanced features of SQL Server 2008: Resource Governor.  After completing these self-paced labs, you will be able to:
* Understand appropriate uses for Resource Governor
* Create Resource Pools
* Create Workload Groups
* Monitor Resource Usage
Understanding Spatial Data.doc 120 minutes SQL Server 2008: Understanding Spatial Data Hands-on Lab The goal of these hands-on lab materials is to get an understanding of one of the newer features of SQL Server 2008: Spatial Data Support. After completing these self-paced labs, you will be able to:
* Understand what spatial data is
* Understand the different types of spatial data
* Create instances of spatial data
* Investigate the properties of spatial data
* Query the relationships between different instances of spatial data
* Integrate spatial data into a managed code application
* Move spatial data between managed code and the database
* Create spatial data graphically using WPF 
Using SQLCMD.doc 75 minutes Understanding Command-line Management with SQLCMD in SQL Server Hands-on Lab The goal of these hands-on lab materials is to get an understanding of command-line management with SQLCMD in SQL Server. After completing these self-paced labs, you will be able to:
* Use SQLCMD with an initialization file, system environment variables and parameters to create customized “master” scripts for automation and administration
* Use SQLCMD and SQL Server Management Studio with the Dedicated Admin Connection for troubleshooting
* Use SQL Server Management Studio to modify and execute SQLCMD mode scripts
* Upgrade databases from SQL Server 2000 using a parameterized script running with SQLCMD

This is a GREAT resource for playing with a lot of these new technologies and it's exciting that we have enough copies to give away to our pre-con attendees! A few of these are featured as HOLs at this year's TechEd as well and some of these can also be found with our prior Jumpstart resources.

Tuesday

  • 13.15 - 14.30 (Room N230) DAT354 Are Your Indexing Strategies Working? (featuring me as speaker/presenter)
  • 15.00 - 16.00 (TechEd Online Stage) Panel: Leveraging SQL Server Technologies to Build a Solid High-Availability Strategy
  • 16.00 - 18.00 DAT track booth (green)

Wednesday

  • 10.15 - 11.30 (Room N220D) DAT375 Corruption Survival Techniques: From Detection to Recovery (featuring Paul as speaker/presenter)
  • 11.30 - 14.45 DAT track booth
  • 15.00 - 16.00 Blogger's Lounge

Thursday

  • 10.15 - 11.30 (Room S230E) DAT363 Essential Database Maintenance (we're co-presenting this one)
  • 11.45 - 13.00 Speaker Idol judging (I'll do my best to play Paula and I have hopes that Paul won't play Simon but he does have that British thing going for him)
  • 14.30 - 18.00 DAT track booth

Friday, we fly home... then, we're going to test all of our dive gear and take a little dive vacation at the end of the month. Hopefully, I'll be able to post a couple of nice underwater shots!

Oh, and I've finally tweaked my Indexing post (the one that survived the drive corruption). Oh, but as an update to that post... Even though I got that drive repaired, almost all of the jpgs, some of the pdfs and even a few of the Office files were still corrupt. Office opened a few of them and "repaired" them on open (which was really cool) but I did lose the photos I had taken that weekend (well, all of the ones that I removed from my camera's SD card). Anyway, I plan to (well, hope to) post the Indexing post (part 2 to this one) tomorrow!

Cheers,
kt

PS - It's hot as hell here... and the humidity is NASTY!!! But, it beats the SOLID rain that we've been having in Seattle...

OK, we were in Iceland and then Florida for our Accidental DBA workshops and both went really well. People agree that there are quite a few involuntary/accidental DBAs out there and overall, we helped quite a few to see a lot of options for better performance, availability, recovery, and/or just manageability.

So, this is our "resources post". We waited until after the SQL Connections delivery to post these as we figured we might add a few more to the list (as is typical when you deliver content more than once - it's really never the same twice!).

Also, I used a few "interactive" (or build) slides in my presentation - specifically on transaction log backups and the concepts of "clearing the log" which really only clears the inactive portion of the log. To help you visualize this, I've added these slides here: TrippRandal_ClearingTheLog-BuildSlides.zip (647.2 KB).

Finally, we've taken all of the scripts that we demo'ed and placed them on SQLskills on our Past Events page here: http://www.sqlskills.com/pastConferences.asp.

And, if you were there and you think we missed something, feel free to ping me (or Paul!) with an email and we'll make sure to update this resources post (and/or [at least] help you find what you're looking for!!).

Next stop - Microsoft TechEd ITPro in June (we're back in Orlando again)!
kt

In my blog post on my new sp_helpindex proc (sp_helpindex2), I mentioned that the indexes in my sample were not necessarily a recommended set of indexes - just a test set of indexes. So... in this post, I thought I'd start a series on indexes, limitations and best practices/uses... Especially, why/how to best choose when to use INCLUDE v. having columns in your key. To start, I thought I'd give some background, as well as limitations that exist in various releases from 2005 to 2008 CTP6 (Feb CTP), plus what's expected in the SQL Server 2008 RTM (ah... I did say "expected" so don't come back and yell at me if/when I'm wrong ;-))

First, let's go through a few rules and limitations and background:

SQL Server 2005

  • 250 total indexes per table: 1 clustered index and up to 249 nonclustered indexes (no, this is not a goal!)
  • The index key can be up to 16 columns OR 900 bytes - whichever comes first...
  • The leaf level is limited just as a table is limited to 1024 columns (and, all column types are acceptable in the leaf level of an index - even LOB columns)
  • Statistics are kept for every index (so, up to 250 index-related stats) and there can also be statistics on columns or sets of columns in addition to the index-related stats. In earlier releases, statistics used index ids and as a result, the number of statistics was limited to 250 total minus the statistics used by indexes... in SQL Server 2005, they changed to having statistics kept/managed separately (use sys.stats to see them). As a result of using sys.stats, you can now have 2000 statistics on a table, in addition to the 250 (total) indexes and their statistics. If you want to test this out (and check it on various versions of SQL Server), use this script to set up a test database, a test table and then use dynamic string execution to loop through (until it errors) with creating nonclustered indexes and statistics: testindexmax.zip (.47 KB)
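
To see where a given table stands against these limits, a quick look at sys.indexes and sys.stats is usually enough. This is only a sketch; dbo.member is an example table name, and classifying a statistic with neither flag set as "index statistics" is an assumption that generally holds:

```sql
-- Sketch: how many indexes and statistics exist on one table.
-- dbo.member is a placeholder; substitute your own table.
SELECT  COUNT(*) AS index_count
FROM    sys.indexes AS i
WHERE   i.object_id = OBJECT_ID(N'dbo.member')
  AND   i.index_id > 0;                      -- skip the heap row (index_id = 0)

SELECT  CASE WHEN s.auto_created = 1 THEN 'auto-created column statistics'
             WHEN s.user_created = 1 THEN 'user-created statistics'
             ELSE 'index statistics'
        END      AS stats_kind,
        COUNT(*) AS stats_count
FROM    sys.stats AS s
WHERE   s.object_id = OBJECT_ID(N'dbo.member')
GROUP BY CASE WHEN s.auto_created = 1 THEN 'auto-created column statistics'
              WHEN s.user_created = 1 THEN 'user-created statistics'
              ELSE 'index statistics'
         END;
```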

SQL Server 2008 CTP6

  • So far, it seems as though most of the maximums have not yet been lifted...
  • 250 total indexes per table: 1 clustered index and up to 249 nonclustered indexes (and this number  - for CTP6 - includes filtered indexes AND spatial indexes too!)
  • The index key limit hasn't changed (it can be up to 16 columns OR 900 bytes - whichever comes first)
  • The leaf level is still limited just as a table is limited to 1024 columns (and, all column types are still acceptable in the leaf level of an index)
  • Statistics in CTP6 seem to be limited to only 2000 stats per table...

SQL Server 2008 RTM (expected/target... no guarantees on this one :)

  • 30,000 columns per table (mostly to allow sparse columns)
  • 1,000 total indexes per table: 1 clustered index and up to 999 nonclustered indexes. This is also not a goal BUT, it makes sense because of both sparse columns and filtered indexes. Both Paul and I will try to post some entries about sparse columns and filtered indexes in the coming days...
  • The index key limit won't change
  • The leaf level will be limited just as a table is limited to 30,000 columns (and, all column types are still acceptable in the leaf level of an index)
  • Statistics are also said to be increasing and likely to 30,000... And, for having extra statistics just sitting around and possibly not being used - well, outside of a minimal amount of disk space taken by the stat blob (which does start to get interesting at 1,000s I suppose), even stats that don't get used don't really create much of a problem. So, I'm OK with this one increasing - even significantly - but I have to admit I'm somewhat nervous about the significant increase in indexes.........

So... you can have A LOT more indexes in SQL Server 2008 but just because you can - DOES it mean that you should?!

And on that - I'll leave you hanging for my next post where I start to talk about WHY they're increasing this (hint: sparse columns and filtered indexes = more columns/more indexes)....

Have fun,
kt

OK - so this has been frustrating me for many months... when you create indexes with included columns (which was a new feature of SQL Server 2005), they're not shown by sp_helpindex or by DBCC SHOW_STATISTICS. I understand this not showing for statistics because included columns are not factored into the histogram (that's only the high order element which is the first column in the index) OR the density vector (which only shows the densities (or averages) for the left-based subsets of the key). So, why doesn't sp_helpindex show it? Well... I guess it just didn't get updated for SQL 2005. So, in SQL 2008, I was hoping I'd not only see included columns but also filtered indexes... well, neither is there and sp_helpindex is still the same old proc. Don't get me wrong, you can use SSMS to see all of the index properties for a single index (pane, by pane for each property) OR you can run queries to find the included columns for a given index:

-- Lists the key and included columns for ONE index.
-- MyIndex, MyTable and MySchema are placeholders: replace them with your own names.
SELECT
    (CASE ic.key_ordinal WHEN 0 THEN CAST(1 AS tinyint) ELSE ic.key_ordinal END) AS [ID],
    clmns.name AS [Name],
    CAST(COLUMNPROPERTY(ic.object_id, clmns.name, N'IsComputed') AS bit) AS [IsComputed],
    ic.is_descending_key AS [Descending],
    ic.is_included_column AS [IsIncluded]
FROM sys.tables AS tbl
    INNER JOIN sys.indexes AS i
        ON (i.index_id > 0 AND i.is_hypothetical = 0) AND (i.object_id = tbl.object_id)
    INNER JOIN sys.index_columns AS ic
        ON (ic.column_id > 0 AND (ic.key_ordinal > 0 OR ic.partition_ordinal = 0 OR ic.is_included_column != 0))
            AND (ic.index_id = CAST(i.index_id AS int) AND ic.object_id = i.object_id)
    INNER JOIN sys.columns AS clmns
        ON clmns.object_id = ic.object_id AND clmns.column_id = ic.column_id
WHERE (i.name = N'MyIndex') AND ((tbl.name = N'MyTable' AND SCHEMA_NAME(tbl.schema_id) = N'MySchema'))
ORDER BY IsIncluded, [ID] ASC

but, there isn't a nice clean way to show all of the included columns for all indexes for a particular table... until now :)

A couple of weeks ago I sat down and rewrote sp_helpindex. I was actually on a plane from Hyderabad to Frankfurt or from Frankfurt to San Fran or from San Fran to Seattle (it was a long day :) and I was using (and well, forcing myself to learn how to use :) my new Vista laptop. OK, that's a HUGE story in and of itself and it definitely warrants its own post but I'll sum up the story with the fact that I had to purchase a new laptop while in Hyderabad because BOTH my primary laptop (T61p) AND my backup laptop (T60p) BOTH (yes, BOTH!!!) suffered catastrophic disk failures on their boot drives within 24 hours of each other. In the end, I really cannot believe the "coincidence" of two laptops crashing within 24 hours of each other. Yes, I thought MTBF too (at first) but the laptops were two Lenovos - one Lenovo (the T60p) was purchased in Feb 2007 and the second, a Lenovo T61p, was purchased in Oct 2007. And, it was the T61p that went first. The only thing I can even begin to speculate about and/or think to attribute it to (as I was in India for 17 days from Mar 3 through Mar 20 and this all started on Mar 17) was an overactive metal detector at the hotel at which I was staying (or something related to St. Patrick but I've since ruled that out - and no, I wasn't drinking green beer either...). OK, I really need to do another post to give you all of the details about this trip BUT, I did get a new laptop... and, having bought it only shortly before I flew back, I felt like I really needed to get my money's worth so I just *had* to work on the flights home (ah, security with *3* laptops was NOT fun and I'm *VERY* glad that none of them asked me to "boot" my laptops to prove they were working... that could have been a VERY bad situation... lol).

OK - so back to the story... I was working on the flights and I was preparing to deliver some content on the Friday after I returned (yes, I taught a full day in India on Wednesday then flew back leaving India at 2:15am Thursday morning so that I could arrive back in Redmond at roughly 7pm Thursday night - about 30 hours later - and then teach Friday morning for an 8:30 start time... ah, I was *really* tired on Friday night :). Anyway, in preparing, I decided that I finally needed to re-write sp_helpindex. When I was first writing it, I was only thinking of SQL Server 2005. So, here's the 2005 version that I wrote: sp_helpindex2_2005.zip (2.71 KB).

So, I had wanted to blog that when I got back to Redmond but in preparing for the trip we're on now AND rebuilding my primary and backup laptops, well, it got tabled. So now, today, Paul and I are in Iceland (working with our great friends at Miracle Iceland) and we're teaching "the Accidental DBA" (this past Monday) and SQL Server 2008 New Features in Database Infrastructure and Scalability (Tue through Thursday)... I was giving a lecture on Filtered Indexes in SQL Server 2008 and I, once again, found myself needing a better sp_helpindex. So, when Paul got up to talk about Compression (which is no short lecture for him :), I had time to rewrite sp_helpindex... again. And, here's what I ended up with...

exec sp_helpindex2 'member'

index_name index_description index_keys included_columns filter_definition
member_corporation_link nonclustered located on PRIMARY corp_no NULL NULL
member_ident clustered, unique, primary key located on PRIMARY member_no NULL NULL
member_region_link nonclustered located on PRIMARY region_no NULL NULL
NCIndexCoveringLnFnMiIncludePhone nonclustered located on PRIMARY lastname, firstname, middleinitial phone_no NULL
NCIndexCoversAll4Cols nonclustered located on PRIMARY lastname, firstname, middleinitial, phone_no NULL NULL
NCIndexLNinKeyInclude3OtherCols nonclustered located on PRIMARY lastname firstname, middleinitial, phone_no NULL
NCIndexLNOnly nonclustered located on PRIMARY lastname NULL NULL
QuickFilterTest nonclustered located on PRIMARY lastname phone ([lastname]>'S' AND [lastname]<'T')
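
For reference, indexes shaped like a few of the ones in this output could be created along the following lines. This is a sketch, not the script that produced the output above; it assumes the table is dbo.member and that the phone column is named phone_no:

```sql
-- All four columns in the key.
CREATE NONCLUSTERED INDEX NCIndexCoversAll4Cols
ON dbo.member (lastname, firstname, middleinitial, phone_no);

-- Three key columns, with the phone number carried only in the leaf level.
CREATE NONCLUSTERED INDEX NCIndexCoveringLnFnMiIncludePhone
ON dbo.member (lastname, firstname, middleinitial)
INCLUDE (phone_no);

-- Filtered index (SQL Server 2008 only): just the rows whose lastname falls in the 'S' range.
CREATE NONCLUSTERED INDEX QuickFilterTest
ON dbo.member (lastname)
INCLUDE (phone_no)
WHERE lastname > 'S' AND lastname < 'T';
```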

So, in the end, I can quickly see whether or not my index has included_columns and/or a filter_definition. Don't get me wrong, these indexes above are NOT necessarily a good combination of indexes (or recommendation of ANY kind) to have - these were just created to make sure that my code works. And, as my good friend Gunnar would say - "it's not my best code but it's not my worst code either" <G>. And, so, here it is: sp_helpindex2_2008.zip (2.75 KB).
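
If you just want the general idea without downloading the proc, a query along these lines produces a similar shape of output from the catalog views. To be clear, this is NOT the code inside sp_helpindex2; it's a simplified sketch that assumes a table named dbo.member, and the filter_definition column only exists in SQL Server 2008 (remove it on 2005):

```sql
-- Sketch: one row per index with its key columns, included columns and filter.
SELECT  i.name      AS index_name,
        i.type_desc AS index_type,
        STUFF((SELECT N', ' + c.name
               FROM   sys.index_columns AS ic
                      INNER JOIN sys.columns AS c
                          ON c.object_id = ic.object_id AND c.column_id = ic.column_id
               WHERE  ic.object_id = i.object_id
                 AND  ic.index_id  = i.index_id
                 AND  ic.is_included_column = 0
                 AND  ic.key_ordinal > 0
               ORDER BY ic.key_ordinal          -- keep the key columns in key order
               FOR XML PATH('')), 1, 2, N'') AS index_keys,
        STUFF((SELECT N', ' + c.name
               FROM   sys.index_columns AS ic
                      INNER JOIN sys.columns AS c
                          ON c.object_id = ic.object_id AND c.column_id = ic.column_id
               WHERE  ic.object_id = i.object_id
                 AND  ic.index_id  = i.index_id
                 AND  ic.is_included_column = 1
               ORDER BY ic.index_column_id
               FOR XML PATH('')), 1, 2, N'') AS included_columns,   -- NULL when there are none
        i.filter_definition                                         -- NULL when not filtered
FROM    sys.indexes AS i
WHERE   i.object_id = OBJECT_ID(N'dbo.member')
  AND   i.index_id > 0
  AND   i.is_hypothetical = 0
ORDER BY i.name;
```

The FOR XML PATH('') trick is used for string concatenation because it works on both 2005 and 2008; column names containing characters such as & or < would come back XML-escaped, which this sketch does not handle.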

Pretty darn useful for sure! Oh, and I used the undoc'ed sp_MS_marksystemobject so that I could still create the sp_ in master but then execute it in all other databases. It's frustrating that this behavior (with sp_ named objects) no longer works in 2005/2008 but at least the sp_MS_marksystemobject still sets the behavior so that we can create this one proc in master but use it in all other databases.
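
The pattern looks roughly like this. The procedure below (sp_showindexcount) is a made-up stand-in used purely for illustration, the credit database and dbo.member table are assumed names, and because sp_MS_marksystemobject is undocumented its behavior could change between releases:

```sql
USE master;
GO

-- Hypothetical utility proc: the name and body are illustrative only.
CREATE PROCEDURE dbo.sp_showindexcount
    @objname nvarchar(776)
AS
    SELECT COUNT(*) AS index_count
    FROM   sys.indexes
    WHERE  object_id = OBJECT_ID(@objname)
      AND  index_id > 0;
GO

-- Mark it as a system object so that catalog views referenced inside it
-- resolve in the database it is CALLED from rather than in master.
EXEC sp_MS_marksystemobject N'sp_showindexcount';
GO

-- Now call it from a user database.
USE credit;
GO
EXEC sp_showindexcount N'dbo.member';
```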

Have fun!
kt

OK, I still have a way to go in learning about data compression in SQL Server 2008 but one thing that I do know is that nothing is free. So, the trade-off will be performance (i.e. CPU) v. space. And, that's not really a new trade-off with respect to compression. Sometimes that trade-off has other benefits that still minimize the overall cost (for example, backup compression compresses in-memory and before it goes to disk... this actually makes the overall backup process faster because the actual backup written to disk is smaller). However, if we're talking about data and data access, then we need to think more about how the data is going to be used as well as the impact on performance. I can definitely think of many reasons to compress older (and read-mostly, if not read-only) data (mostly due to volume) but depending on the queries and the impact to uncompress it (based on the volume of data being accessed), I'm going to do a lot of testing before I compress high performance/OLTP data. To help estimate the savings on space, SQL Server 2008 offers a stored-proc: sp_estimate_data_compression_savings.
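
A call to that proc looks something like the sketch below. The dbo.charge table is an assumed example, and the parameter list shown is the one from the released product, so a pre-release build may differ:

```sql
-- Sketch: estimate what PAGE compression would save for every index
-- and partition of one table (dbo.charge is a placeholder name).
EXEC sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'charge',
     @index_id         = NULL,     -- NULL = all indexes
     @partition_number = NULL,     -- NULL = all partitions
     @data_compression = N'PAGE';  -- or N'ROW' / N'NONE'
```

The result compares the current size with the estimated size under the requested setting, which makes it easy to run once with N'ROW' and once with N'PAGE' before committing to either.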

Compression in SQL Server 2005
SQL Server 2005 offers the ability to have read-only data compressed using Windows NTFS file compression. File compression is only supported for secondary (non-primary) data files and only when they're set to read-only. If the entire database is set to read-only then all files (incl. the primary and log) can be on compressed drives. While supported, and it can make sense to do this when you have large amounts of historical data, it's still not very granular.

The other form of compression in SQL Server 2005 was introduced in SP2 as data compression for the decimal/numeric data types, called vardecimal. First, you enable compression at the database level and then you turn it on at the table level. The primary form of compression used by vardecimal is when your actual values are generally much smaller than the defined/declared decimal/numeric column. For example, if you've chosen to define a lot of columns as precision/scale (38,4) then as a decimal column each value (per column, per row) will take 17 bytes whether you use all of it or not. If you only store the value 87.5 (which would normally take only 5 bytes as a decimal(3,1)) then you're wasting 12 bytes. This form of compression will still be supported in SQL Server 2008 so if you're interested in how the vardecimal type works, check out this whitepaper. As for the new forms of compression... row-level compression is similar to vardecimal, but the other forms are quite different, and very interesting (especially the page-level dictionary compression)!
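To make the two-step vardecimal process concrete, here's a hedged sketch. The database SalesArchive and the table dbo.OrderTotals are hypothetical names, and the first query is only there to show the fixed storage sizes discussed above:

```sql
-- Fixed storage size of the same value under the two declarations:
-- precision 38 is stored in 17 bytes, precision 3 in 5 bytes.
SELECT DATALENGTH(CAST(87.5 AS decimal(38,4))) AS BytesAsDecimal38_4,
       DATALENGTH(CAST(87.5 AS decimal(3,1)))  AS BytesAsDecimal3_1;
GO

-- Step 1: enable vardecimal storage at the database level
-- (SQL Server 2005 SP2 or later).
EXEC sp_db_vardecimal_storage_format 'SalesArchive', 'ON';
GO

USE SalesArchive;
GO

-- Optional: estimate the row-size reduction before changing anything.
EXEC sys.sp_estimated_rowsize_reduction_for_vardecimal 'dbo.OrderTotals';
GO

-- Step 2: turn vardecimal storage on for the individual table.
EXEC sp_tableoption 'dbo.OrderTotals', 'vardecimal storage format', 1;
GO
```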

Compression in SQL Server 2008
In addition to offering support for NTFS file compression and vardecimal, SQL Server 2008 offers row-level compression or page-level compression (which includes row-level compression) AND it offers the ability to turn these on at the partition-level or at the table-level for all partitions. While I think the per-partition option is excellent, you might still want to separate your OLTP and read-only data into separate tables for other benefits (like online index operations which I mentioned here) but, the "table-level only" options are certainly starting to decrease! And, more granular options always mean better manageability.
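As a sketch of what per-partition v. table-level compression looks like in the released SQL Server 2008 syntax (dbo.OrderArchive is a hypothetical partitioned table with at least four partitions):

```sql
-- Page-compress a single (older, read-mostly) partition.
ALTER TABLE dbo.OrderArchive
REBUILD PARTITION = 1
WITH (DATA_COMPRESSION = PAGE);
GO

-- Row-compress every partition of the table.
ALTER TABLE dbo.OrderArchive
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = ROW);
GO

-- Mix settings in one rebuild: page compression for the older partitions,
-- row compression for the most recent one.
ALTER TABLE dbo.OrderArchive
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE ON PARTITIONS (1 TO 3),
      DATA_COMPRESSION = ROW  ON PARTITIONS (4));
GO
```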

So, how does compression work in SQL Server 2008:
   Paul wrote about backup compression here.
   Sunil wrote about data compression here and here.
   Chad Boyd wrote about both here.

Paul and I will post more on compression... I really want to get some numbers regarding performance and Paul will dive into all of the internals using DBCC PAGE (go figure! :).

Enjoy!
kt

OK, so I thought I'd do a follow up to the post I did a couple of days ago titled: The perils of case-insensitive data (and our life in tangent-land). The reason I'd like to follow up on it is that I received some excellent comments and I want to make sure that you're all aware of the tips/tricks and recommendations that there were (some of you may not have returned to see all of the comments). Really, I was impressed by the speed at which people responded as well as the great comments (and things I learned!). It just reminds me of the fact that none of us can know everything AND that our SQL community is awesome in its willingness to share and communicate.

As for the tips/tricks and "yes, duh!" realizations I came to... here are the interesting points from the comments:

First - why did my comparison work for a single character (e.g. '%A%') but not when I did a character range (e.g. '%[A-Z]%')? Well, it was because it was unicode! This was a "right! duh!" realization that I think I dreamed after I wrote this BUT, Hugo Kornelis is exactly right in his comment. Thanks Hugo! Here is a direct cut/paste of his comment:

The reason [A-Z] doesn't work, is that a collation doesn't just govern case sensitive vs case insensitive but also (amongst others) the sort order of letters. And most case sensitive collations sort like A - a - B - b - ... - Z - z. So [^A-Z] would include all letters except the lowercase z.

You can use [A-Z] to find uppercase characters in a binary collation (since all uppercase characters are in one range of ASCII, and all lowercase characters in another), but not in any other collation.

And, you can check out more from Hugo on his blog: http://sqlblog.com/blogs/hugo_kornelis/default.aspx
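To illustrate Hugo's point about binary collations, here's a small sketch against the same Person.Contact table used elsewhere in this post. Latin1_General_BIN is my choice of binary collation for the example (it's not from his comment):

```sql
USE AdventureWorks;
GO

-- In a binary collation, [A-Z] is a contiguous range of code points,
-- so this finds names containing at least one uppercase A-Z...
SELECT LastName
FROM Person.Contact
WHERE LastName COLLATE Latin1_General_BIN LIKE N'%[A-Z]%';
GO

-- ...and this finds names with no uppercase A-Z at all.
SELECT LastName
FROM Person.Contact
WHERE LastName COLLATE Latin1_General_BIN NOT LIKE N'%[A-Z]%';
GO
```

Note that the range only covers the unaccented letters A through Z; accented uppercase characters sit elsewhere in the code page and won't match.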

Second - the comparison query that I wrote all together (where I stated each letter individually in the WHERE clause) only took a few seconds to write (thanks to cut and paste :) AND it did work...And, sometimes getting something to work and moving on is all we can do (come on - you've ALL been there, eh? :). However, my main comment was that "it wasn't pretty". A much more elegant and unbelievably simple solution came from David R Buckingham (aren't the great answers always the really simple ones :)). Here is a direct cut/paste of his comment:

The following query will return any fully lower case names in the table:

SELECT LastName
FROM Person.Contact
WHERE LastName COLLATE Latin1_General_CS_AS_KS_WS = LOWER( LastName ) COLLATE Latin1_General_CS_AS_KS_WS

I don't believe that David has a blog... maybe he should :).

Third - a very cool and clever trick that came in from Denis Gobo is related to the performance of repeatedly doing case-sensitive searches on a case-insensitive column. I suggested that creating an additional column (preferably a computed column that uses the case sensitive collation) would be an easy and optimal solution. This is still definitely true when the case-insensitive values are NOT selective enough to warrant using an index and the case-sensitive values are... However, if both the case-sensitive AND the case-insensitive values are reasonably selective then the trick that helps is from his comment. Here is a direct cut/paste of his comment:

Kimberly, the way to force an index seek is to do this

SELECT *
FROM MyTestContacts
WHERE Lastname = N'adams'
AND Lastname COLLATE Latin1_General_CS_AS_KS_WS = N'Adams'

The WHERE might return more than one row but the AND will return only the case sensitive one

I wrote about that a while back here:
http://sqlservercode.blogspot.com/2007/05/make-your-case-sensitive-searches-1000.html 

And, you can check out more from Denis on his blog: http://sqlservercode.blogspot.com/
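And, for the other option I mentioned (the computed column), here's a minimal sketch against the MyTestContacts table. The column and index names are hypothetical, and creating an index on a computed column assumes the usual SET options (the SSMS defaults are fine):

```sql
-- Add a computed column that exposes LastName under a case-sensitive collation.
ALTER TABLE MyTestContacts
ADD LastNameCS AS LastName COLLATE Latin1_General_CS_AS_KS_WS;
GO

-- Index the computed column so case-sensitive searches can seek.
CREATE INDEX IX_MyTestContacts_LastNameCS
ON MyTestContacts (LastNameCS);
GO

-- Case-sensitive search without a COLLATE clause in the query itself.
SELECT LastName, FirstName, MiddleName
FROM MyTestContacts
WHERE LastNameCS = N'Adams';
GO
```

The trade-off is one more index to maintain; Denis's trick avoids that when an index on the case-insensitive column is already selective enough.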

Now, as for the issues related to creating a view in a database that has a different collation from the server's collation... Here, I'm fairly certain that there's still a bug. However, I'm happy to say that I don't think that it's the most likely situation that exists for collations. I think the two most likely situations are:

  1. The server has one collation. The database inherits that collation. The database developer makes column level collation changes throughout the db. This seems to work well. OR
  2. The server has one collation. The database has a different collation. The database developer consistently uses that collation throughout their app. A good example of this is where people have case-sensitive databases on case-insensitive servers. This works fairly well (although there are some issues wrt temp tables, etc. and COLLATE database_default is a good thing to know).

I guess there's even a third one where column level changes are made in a database whose collation is different from the server but where there aren't any views that also change the collation to yet a different collation (and this is where there seems to be a bug).
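For the temp table issue mentioned in the second situation, here's a small sketch. It assumes you're in a database whose collation differs from the server's (tempdb uses the server's collation) and that Person.Contact.LastName uses the database's collation; the temp table name is hypothetical:

```sql
USE AdventureWorks;
GO

-- Without COLLATE database_default, the character column would pick up
-- tempdb's collation, and the join below could fail with a collation conflict.
CREATE TABLE #ContactNames
(
    LastName nvarchar(50) COLLATE database_default NOT NULL
);

INSERT #ContactNames (LastName) VALUES (N'Adams');

SELECT c.LastName, c.FirstName
FROM Person.Contact AS c
JOIN #ContactNames AS t
  ON t.LastName = c.LastName;

DROP TABLE #ContactNames;
GO
```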

So, this was an excellent (and reasonably fun :) :) exercise to go through wrt collations. And, this is how I (we?) learn! I really want to thank everyone for reading - and commenting/sharing! - the things they learned/knew. That's part of why I love the SQL Server community. And, speaking of which, I thought I'd end this entry with a few community links - as a reminder to everything that's out there:

Thanks for reading! Thanks for commenting!
kt

OK, so after SQL Connections in Las Vegas, Paul and I head off to Barcelona for the second week of TechEd's two-week event (week one for developers and week two for IT professionals). November's definitely a busy month. So, if you're in the US - we hope to see you at Connections and if you're in Europe, we hope to see you at Microsoft TechEd ITForum 2007.

Here's what we're doing:

Sessions

  • (DAT205) The Next Release of Microsoft SQL Server: Manageability Overview

The next release of SQL Server will contain exciting new manageability features targeted at reducing total cost of ownership. Come learn more about what's in store in areas like policy-driven administration and performance data collection and analysis. The session focuses on the database engine.

  • (DAT301) SQL Server Indexing - Unravelling the Unknown

Knowing tips and tricks to indexing is extremely helpful and will help you to solve "known" query problems. But what's lurking in the unknown? Is SQL Server using your indexes? Or, do you have a bunch of indexes sitting around wasting space and negatively impacting performance? Finally, SQL Server 2005 has an answer! SQL Server 2005 DMVs (Dynamic Management Views) can provide you with valuable information about your current indexing strategies, what should be removed, and even what's missing. Do you know how to find this information, leverage it, and then programmatically respond to it? Come to this session to learn a few tips and tricks as well as how to figure out the unknown indexing problems!

  • (DAT305) Secrets to Fast Detection and Recovery from Database Corruptions

How can you tell whether your data is corrupt? If you have corruption, how do you work out what’s wrong with the database? How do you ensure you have a valid backup? If you don’t have a valid backup, how and what do you repair? If you do have a backup, how do you work out whether you should restore or repair? And at what granularity? How do you go about determining what went wrong in the first place? It’s all about limiting downtime and data-loss when a corruption occurs - from knowing the tools to understanding the choices to planning a successful strategy. Some of the features discussed:

  • Torn-page detection and page checksums
  • IO read-retry
  • Backup checksums
  • Consistency checks (DBCC CHECKDB and related commands)
  • Database repairs

Facing database corruption is almost inevitable in every DBA's career - make sure you're prepared when it happens to you.

Chalk-talks

  • (DAT01-IS) SQL Server Upgrade Best Practices, Tips, and Tricks

Even though SQL Server 2005 has been out for a while, many companies are only just getting ready to upgrade. Come to this session to learn best practices, tips, and tricks distilled from two years of customer experiences. We'll also touch on some issues you'll face going to SQL Server 2008 when it's released next year. Come to this session to learn and share - bring your questions and experiences!

  • (DAT02-IS) SQL Server 2005 Database Mirroring: Setup to Implementation to Monitoring

Database Mirroring is one of the most exciting technologies in SQL Server 2005. With more and more people including it in their Disaster Recovery Strategies, it's important to know when to implement Database Mirroring as well as the implications of the architecture you choose. In this session there will be no slides, just demos that will explore how Database Mirroring works in its various configurations and how that may affect your performance. Join us and see database mirroring in action and get your questions answered!

  • (DAT07-IS) DBCC Internals

All DBAs should have heard of (and used) DBCC – especially for consistency checking. Get down deeper than 400-level with this session on how some of the most important DBCC commands work. Topics covered include CHECKDB, SHRINKFILE, INDEXDEFRAG, and more.

Instructor-led labs

  • (DAT03-ILL) SQL Server Always On Technologies Instructor-Led Lab: Part 1 - Database Mirroring

See Database Mirroring in action! From implementation to monitoring to failover, database mirroring provides an ideal solution for many disaster recovery scenarios and this session will prepare you to handle them with minimal downtime or data loss.

  • (DAT04-ILL) SQL Server Always On Technologies Instructor-Led Lab: Part 2 - Database Snapshots

Database Snapshots are useful in many situations: database maintenance, data recovery, and point-in-time data access. You can even create a snapshot on a mirror database to get better return-on-investment (ROI) on your high-availability (HA) investments. In this session we will explain how database snapshots work as well as go through several exercises, including working with multiple database snapshots and creating database snapshots on a mirror database.

  • (DAT05-ILL) SQL Server Always On Technologies Instructor-Led Lab: Part 3 - Online Operations

The bane of any DBA's life is to have to take data offline to perform maintenance or recover from a disaster. The various Online Operations in SQL Server 2005 alleviate much of this frustration. This session will show you how to move a table online for better isolation and control, partition a table online, access a database that's partially damaged, and perform online piecemeal restore.

Lunchtime Demos

  • (DAT01-PD) Database Recovery Techniques

In this fast-paced demo session nasty things will be done repeatedly to a database. Then the methods and approaches to recovery will be shown. Not for the faint-hearted!

So, just like SQL Connections the week before, serious amounts of info with tips and tricks for you to take home and implement!

It's going to be a great week. We hope to see you there!!

Kimberly (and Paul)

OK, it's about that time again - the Fall conference season is here! Building on our co-presented Database Maintenance workshop at SQL Connections in Orlando, Paul and I are doing a *ton* of stuff at SQL Connections this Fall. The conference is back at the Mandalay Bay hotel and officially runs from November 5th to 8th, with pre-con workshops on the 5th. But, after Spring, we added so much recovery content to our maintenance content that we decided to run the maintenance content as a pre-pre-conference workshop on the 4th and then, on the 5th, we have all new content on Disaster Recovery and Lessons Learned. AND, after the week of sessions, we decided to add a HANDS-ON (bring your own laptop) post-con workshop on the 9th! Our day off is Tuesday the 6th as it's Microsoft day... with a session line-up that looks great, with lots of juicy details about SQL Server 2008 - as well as some best-practices sessions for those of you who are happy with SQL Server 200x for now.

Here's what we're doing:

Workshops

  • November 4th - Pre-pre-con: SPR301: SQL Server Database Maintenance: From Planning to Practice to Post-Mortem

No matter how much effort you spend on the design of your database, if you don't maintain it in production then it will suffer from performance and manageability problems. The key to continued performance and smooth operations is a well thought-out and automated database maintenance plan. This full-day workshop has three sections: planning, practice, and post-mortem. Planning for database maintenance actually starts with database design, so one of the things covered will be how to avoid design choices that limit database maintenance or contribute to maintenance problems. We'll discuss a laundry-list of maintenance problems and then explore how to tell if you need to mitigate them, strategies and best-practices for doing so, and how to avoid having your mitigation choices cause unforeseen and undesirable side-effects. Topics covered will include database files (shrink, grow, virtual log files, log size/management), consistency checks and corruption detection, fragmentation, statistics, backup/restore (options, granularity, strategies) and recovery models. The workshop will vary between 200-400 level covering ALL the key concepts of maintenance features. Finally, we'll spotlight some real-world examples where people made good and bad choices and discuss how you can repeat or avoid them, respectively. If you're wondering how to bring your database back under control, and keep it there, then this full-day workshop will help you tame maintenance problems whether you're a full-time system administrator or a reluctant DBA.

  • November 5th - Pre-con: SPR303: SQL Server Disaster Recovery: From Planning to Practice to Post-Mortem

Every DBA's nightmare is having down time and data loss and not knowing how to recover. However, designing and implementing a successful disaster recovery strategy is easier said than done. It's about asking all the right questions and figuring out all the best answers for your situation. This full-day workshop has three sections: planning, practice and post-mortem. Planning is a critical part of disaster recovery, but the most-often disregarded. Topics we'll cover here include: How do you choose technologies to fit requirements and effectively use key features of SQL Server 2005? How do technology choices affect workload performance? Putting a well-thought out plan into practice requires even more planning and in this section we'll discuss technology implementation, building step-by-step recovery/operation guides for when disasters happen, and, most importantly, testing your plan by simulating real problems. In the final section, we'll spotlight some real-world examples where people made costly mistakes and show you how they could have been avoided with a little planning and practice. If you've ever had nightmares about disaster recovery (or actually had a disaster!) and been at a loss for what to do, then this full-day workshop will give you the direction and technical details you need for success!

  • November 9th - Post-con: SPS302: SQL Server - Put Your Knowledge Into Action (Bring Your Own Laptop)

After a week of learning and watching demos - spend your last conference day putting your knowledge into action and diving deeper into the implementation details. Bring your own laptop to install our VPC environment setup with hands-on lab exercises to walk you through some of our most important features in Database Maintenance and Disaster Recovery. All labs will be ILLs (instructor-led labs) with supporting hands-on lab content *and* you will walk away with your own copy of the DVD to continue the exercises back at your office. You can attend without a laptop but your experience will be significantly better with one! This is meant as an advanced workshop and you're expected to bring a reasonable laptop configuration in order to participate:

* Virtual Server or Virtual PC - already installed
* At least 1GB of physical memory w/512MB dedicated to the VPC environment (2GB is preferred w/1GB dedicated to VPC)
* 12 GB of physical disk space (20+ GB is preferred)
* DVD Drive

Sessions

  • SDB351: Follow the Rabbit - Interactive Q&A on Availability

In this session, Kimberly Tripp and Paul Randal will have only 5-10 slides. The focus of this session is on mixing availability technologies to create the best overall architecture to minimize downtime and data loss. In general, we're going to focus on best practices and then open up to your questions so that you can drive the discussion! This session might not seem as structured as other sessions, but you'll be surprised at how informative and fun it is! Grab your lunch and come back - we'll probably still be hanging out!

  • SDB350: SQL Server Table Strategies - Designing for Performance and Availability

Often tables are designed based solely on the data that needs to be tracked (here's a column name, here's a data type - done!). Unfortunately, design does not usually take into account how the data is going to be used OR how SQL Server uses the data. Knowing the internals of table structures as well as the optimizations that come with good design will make your database truly scalable. Come to this session to learn some internals as well as various design strategies such as vertical and horizontal partitioning. Additionally, are there any other features that require changes in your design and thinking? For example, online index operations impact design because of the limitations that exist with partitioning and LOB columns. If you want to scale, you need to be here!

  • SDB347: SQL Server Indexing for Performance - Finding the Right Balance

In terms of performance tuning, there are few silver bullets. If I had to choose ONE area that improves performance the most (when designed appropriately!), it's indexing. However, indexing strategies depend on the data and even more so, the usage of the data. Come to this session to see what indexing strategies help the base table the most as well as how to optimize your worst performing queries.

  • SDB348: SQL Server Indexing Strategies - Are You Sure?

Knowing tips and tricks to indexing is extremely helpful and will help you to solve "known" problems. But what's lurking in the unknown? Is SQL Server using your indexes? Or, do you have a bunch of useless indexes? Finally, SQL Server 2005 has an answer! SQL Server 2005 DMVs (Dynamic Management Views) can provide you with valuable information about your current indexing strategies, what should be removed, and even what's missing. Do you know how to find this information, leverage it, and then programmatically respond to it? Come to this session to figure it out!

  • SDB349: Follow the Rabbit - Interactive Q&A on the Storage Engine and the Relational Engine

In this session, Kimberly Tripp and Paul Randal will have only 5-10 slides. Each slide covers topics for discussion as well as the reason(s) for why something might be behaving badly and/or things to try to solve your problems. In general we're going to focus on best practices and then open up to your questions so that you can drive the discussion! Paul will focus on the SE (Storage Engine) and internals and Kimberly will focus on the RE (Relational Engine) and query tuning/performance. This session might not seem as structured as other sessions, but you'll be surprised at how informative and fun it is!
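As a small taste of the DMV material mentioned in the indexing sessions above, here's a hedged sketch (not the session's actual demo code) of two queries: one for indexes that aren't being read and one for indexes SQL Server thinks are missing. Both DMVs are cleared when the instance restarts, so only trust them after a representative workload has run:

```sql
-- Nonclustered indexes in the current database, least-read first.
SELECT OBJECT_NAME(i.object_id) AS TableName,
       i.name                   AS IndexName,
       ISNULL(s.user_seeks + s.user_scans + s.user_lookups, 0) AS Reads,
       ISNULL(s.user_updates, 0)                               AS Writes
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id   = i.object_id
      AND s.index_id    = i.index_id
      AND s.database_id = DB_ID()
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND i.index_id > 1          -- skip heaps and clustered indexes
ORDER BY Reads, Writes DESC;
GO

-- Missing-index suggestions for the current database.
SELECT d.statement AS TableName,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       gs.user_seeks,
       gs.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
  ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS gs
  ON gs.group_handle = g.index_group_handle
WHERE d.database_id = DB_ID()
ORDER BY gs.user_seeks * gs.avg_user_impact DESC;
GO
```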

At this event, you'll be able to get what most conferences offer (and that's breadth - in terms of session choices, etc.) but with the large number of workshops and the detailed planning that went into sequencing the conference sessions, you'll also get depth that no other conference offers. Serious amounts of tips and tricks for you to geek-out on with us and take home to immediately apply.

It's going to be a great week. We hope to see you there!!

Kimberly (and Paul)

OK, let me start by saying that I absolutely love when a feature improves in granularity options. Better granularity in locks means that contention is reduced and concurrency improved. And even though the overhead to manage smaller locks (and typically more of them) is usually higher - the improved concurrency benefits often significantly outweigh the costs. Additionally, design is often simplified as more granular locks typically mean you don't have to work as hard to minimize contention. Let me give you some history...

In the old days (ok, remember, I started working with SQL Server when I was 12 :) :), SQL Server used to have page-level locking (all releases prior to SQL Server 6.5 sp3). In SQL Server 6.5 sp3 they made an internal change to allow "insert row locks" but that was very targeted in what it improved (in terms of locking). However, in SQL Server 7.0 the locking architecture completely changed (as well as the SE and most of the RE) and that's where they introduced true row-level locking. This resulted in a significantly reduced complexity in table design. No longer did we have to choose clustered indexes to remove page-level locking (and therefore insert hotspots). And, in fact, some designs solely improved their performance by upgrading. The nice thing about internal changes like these is that they mean you can get away without knowing all of the internals, not worry as much about design and yet still get gains in performance. All of which is good.

However, if you do know the internals and you leverage this knowledge then you might be able to see even greater gains. With the change in locking from page to row (as well as based on other changes to the internal dependencies of non-clustered indexes on the clustering key), databases whose indexing strategies changed between 6.5 and 7.0 made the greatest gains in performance. How did they change - I've blogged about "the clustered index debate" a few times so I'll stay away from that one here... but, the key point is that while these changes might allow you to do more with less work - a bit more work to truly leverage the new features/changes might result in the best combination!

And so, that's what brings me to partition-level lock escalation. This is an absolutely necessary step to truly allowing SQL Server to treat partitions like mini-tables. Here are a few of the concerns I've had with regard to SQL Server 2005 table and index partitioning:

* lock escalation can still occur between the read-only and read-write portions of your partitioned table if the read-only portion is accessed by large queries that escalate (in SQL Server 2005 escalation is either row to table or page to table)
* indexes must be exactly the same for all partitions (not related to escalation but it does have bearing on my solution)
* index rebuilds are supported at the partition level; however, partition-level rebuilds must be performed OFFLINE. Only table-level index rebuilds can be performed online (again, not directly related to escalation but it's another problem around blocking)

And, this last one is very frustrating to me in general as I'm finding more and more environments moving to "real-time analysis" where they want to do queries on as-close-to-current data as possible. In fact, real-time data warehousing in a relational database is one of the primary areas of improvements for SQL Server 2008 with features such as partition-level lock escalation, improvements in indexed views, grouping sets, and star join optimizations - just to name a few.
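For reference, here's a sketch of how partition-level lock escalation is turned on in the released SQL Server 2008 syntax (dbo.OrderHistory is a hypothetical partitioned table):

```sql
-- AUTO lets escalation go to the partition level on a partitioned table.
-- The default, TABLE, keeps the SQL Server 2005 behavior (escalate to the
-- table), and DISABLE prevents escalation in most cases.
ALTER TABLE dbo.OrderHistory
SET (LOCK_ESCALATION = AUTO);
GO

-- Check the current setting.
SELECT name, lock_escalation_desc
FROM sys.tables
WHERE name = N'OrderHistory';
GO
```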

So, in terms of partition-level lock escalation. Am I happy that it's going to be there - for sure! However, the other two issues mentioned above might not change. Having different indexes at the partition level is likely through a feature called "Filtered Indexes" which has not yet appeared in any CTPs but it has been discussed at conferences/events. So, we might solve 2 out of 3 but what about online index rebuilds at the partition level? At this point, I'm pretty sure that they won't be able to solve that for SQL Server 2008... As a result, I would suggest a slightly different architecture. Instead of using only a single partitioned table for both read-only and read-write data, use at least two tables. One table for read-only data and another for read-write data. If you think this might be defeating the purpose of partitioning... then look at these benefits:

* the read-only portion of the table (which is typically the *much* larger portion of the table) can still be managed with partitioning
* the read-only portion - once separated from the read-write - can have additional indexes for better [range] query performance
* the read-only portion of the table can actually be partitioned into multiple partitioned tables - to give better per-table statistics (statistics are still at the table-level only so even if your partitioning scheme is "monthly" you might want to have tables that represent a year's worth of data...especially if your trends seem to change year to year)
* large range queries against the read-only portion of the data will only escalate to the "table" (which is now separated from the read-write data)
* the read-write portion of the data can have fewer indexes
* the read-write portion of the data can be placed on different disks (MORE fault tolerant disks) due to the importance/volatility of the data
* finally, and most importantly, the read-write portion of the data can be maintained completely separately from the read-only portion with regard to index rebuilds
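To make the index rebuild point concrete, here's a sketch of the options. The table and index names are hypothetical, and online operations assume an edition that supports them:

```sql
-- Single partitioned table: rebuilding just one partition is an OFFLINE operation.
ALTER INDEX IX_OrderHistory_OrderDate
ON dbo.OrderHistory
REBUILD PARTITION = 3;
GO

-- Single partitioned table: ONLINE is only available when rebuilding
-- the whole index (all partitions).
ALTER INDEX IX_OrderHistory_OrderDate
ON dbo.OrderHistory
REBUILD WITH (ONLINE = ON);
GO

-- Separate read-write table: it's small, so an online rebuild of everything
-- is cheap and never touches the read-only data.
ALTER INDEX ALL
ON dbo.OrdersCurrent
REBUILD WITH (ONLINE = ON);
GO
```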

So, then how do you make it appear as one table? Use partitioned views over partitioned tables and consider using a synonym for the hot/insert table. At the end of each month (or whatever your partitioning strategy uses - daily, weekly, monthly, etc.), "switch" the read-write portion of the table into the read-only portion of the table. You should be able to do all of this with no data movement and the synonym used for inserts will mean that your applications don't need to change either.
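Here's a deliberately tiny sketch of that architecture. Every object name is hypothetical, the tables are heaps with two columns to keep it short, and a real design would have keys, indexes and more boundaries:

```sql
-- Monthly boundaries; RANGE RIGHT means each boundary value starts a partition.
CREATE PARTITION FUNCTION pfOrdersByMonth (datetime)
AS RANGE RIGHT FOR VALUES ('20070901', '20071001');
CREATE PARTITION SCHEME psOrdersByMonth
AS PARTITION pfOrdersByMonth ALL TO ([PRIMARY]);
GO

-- Read-only history: partitioned by month.
CREATE TABLE dbo.OrdersReadOnly
(
    OrderID   int      NOT NULL,
    OrderDate datetime NOT NULL
) ON psOrdersByMonth (OrderDate);

-- Read-write "hot" table: holds only the current month (October 2007 here).
-- The CHECK constraint is what allows the later switch.
CREATE TABLE dbo.OrdersCurrent
(
    OrderID   int      NOT NULL,
    OrderDate datetime NOT NULL
        CONSTRAINT CK_OrdersCurrent_Month
        CHECK (OrderDate >= '20071001' AND OrderDate < '20071101')
) ON [PRIMARY];
GO

-- One logical table for queries.
CREATE VIEW dbo.OrdersAll
AS
SELECT OrderID, OrderDate FROM dbo.OrdersReadOnly
UNION ALL
SELECT OrderID, OrderDate FROM dbo.OrdersCurrent;
GO

-- Applications insert through the synonym, so they never name the hot table.
CREATE SYNONYM dbo.OrdersInsert FOR dbo.OrdersCurrent;
GO
INSERT dbo.OrdersInsert (OrderID, OrderDate) VALUES (1, '20071015');
GO

-- Month end: metadata-only switch into the empty, matching partition.
-- With the boundaries above, partition 3 covers OrderDate >= '20071001'.
ALTER TABLE dbo.OrdersCurrent
SWITCH TO dbo.OrdersReadOnly PARTITION 3;
GO
```

After the switch you'd split in the next boundary and change the CHECK constraint for the new month. Also keep in mind that SWITCH requires the source and target to have matching index structures, so any extra indexes that exist on the read-only table need to be built on the hot table just before it's switched in.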

In summary, I do like the partition-level lock escalation feature especially as it doesn't require rearchitecting your solution/design. However, by creating two or more tables where read-only data is isolated from read-write, you can leverage many other features (like online index rebuilds).

If you're interested in hearing more about lock escalation at the partition-level check out Paul's recent blog entry on it: SQL Server 2008: Lock escalation changes.

Have fun and thanks for reading!
kt

OK, have you ever been working on one thing...that led you to another (and another and another) and then you seem to have lost hours? OK, I know. That's our life [in the computer industry and I'm sure others!] - putting out fires and chasing strange behaviors that we eventually call "gremlins" when we really can't figure them out (especially when they don't repro). And, I know that we all want to be incredibly prepared but, sometimes bugs happen. And, sometimes bugs lead to serious problems possibly even data corruption/loss (which I've seen) and NO, I'm not directly relating this to anything about SQL Server. I'm just wanting to stress the necessity of a backup strategy (ah, a *tested* backup strategy) but, the bugs I've run into today are really not all that serious (they do NOT corrupt data). But, they do lead me to believe that far fewer changes are made to collations than I had thought? Or, that many of you change collations at the most granular level (probably at the column level?) and that database collation changes are done but without additional column level changes later.  

And, that's really the point of this blog post... for now, I'm going to recommend that you make changes at the column-level OR you don't make additional changes AFTER you've changed a particular database's collation. In other words, if you have a case insensitive server and a case sensitive database then things will probably work well. You can even leverage things like COLLATE database_default for temp tables. However, if you try to make additional changes to collations in other objects - such as views - it doesn't seem to work. Basically, I've run into problems creating views with different collations only when the database's collation is different than the server's collation. So far, that's the only thing that I've found that's wrong with what I've been doing. And, I didn't even figure this one out on my own - I did a live search on the error and found this: http://cc.msnscache.com/cache.aspx?q=72171562874629&mkt=en-US&lang=en-US&w=286a60c3&FORM=CVRE which seems like the same problem I'm having (and sorry for the cached page, I couldn't seem to get to the live page).

Regardless of this issue (is it a bug?), the real reason for this blog post is that a great discussion came up on the Regional Director tech alias (it's an internal thing we use to leverage each other's skills). The original question led to a few discussions and in the end, I think there are really two questions that I thought I'd discuss here:

#1) do you want a ONE-TIME way of checking a bunch of data to find rows that are lower-case (and shouldn't be)
#2) do you want to REPEATEDLY find rows based on a case-sensitive search (where the data is stored case-insensitive).

In my first response, I answered #2. And, I'm going to start with that one here too. If you want to query a case-insensitive column with a case-sensitive search then changing the collation on the fly (with a where clause) works (although there are some performance issues related to this). So, I took an old example of mine (which was against pubs) and I decided that it needed a refresh (meaning, I wanted to update this to work against AdventureWorks). And, that's where half of my fun today started since this is where I've run into what I think is a bug. Anyway, let's start with what works:

-- First, I'll create a test database. Without a collation specified,
-- it will use the server's default collation.

CREATE DATABASE TestAdventureWorks
go

-- Verify the database collation
SELECT DATABASEPROPERTYEX('TestAdventureWorks', 'Collation')
go

-- database is set to SQL_Latin1_General_CP1_CI_AS as expected
-- this is a case-insensitive database

USE TestAdventureWorks
go

SELECT LastName collate database_default AS LastName
, FirstName collate database_default AS FirstName
, MiddleName collate database_default AS MiddleName
INTO MyTestContacts
FROM Adventureworks.Person.Contact
go

SELECT *
FROM MyTestContacts
WHERE Lastname = N'Adams'
go -- (86 row(s) affected)

SELECT *
FROM MyTestContacts
WHERE Lastname = N'adams'
go -- (86 row(s) affected)

SELECT *
FROM MyTestContacts
WHERE Lastname COLLATE Latin1_General_CS_AS_KS_WS = N'Adams'
go -- (86 row(s) affected)

SELECT *
FROM MyTestContacts
WHERE Lastname COLLATE Latin1_General_CS_AS_KS_WS = N'adams'
go -- (0 row(s) affected)

-- Next, create a view:
CREATE VIEW ContactLastNameCaseSensitive
AS
SELECT
LastName COLLATE Latin1_General_CS_AS_KS_WS AS CSName
FROM MyTestContacts
go

SELECT *
FROM ContactLastNameCaseSensitive
WHERE CSName = N'Adams'
go -- (86 row(s) affected)

SELECT *
FROM ContactLastNameCaseSensitive
WHERE CSName = N'adams'
go -- (0 row(s) affected)

And, everything works... in TestAdventureworks. In the *real* AdventureWorks, I get an error when I try to create the view:
Msg 2791, Level 16, State 5, Procedure ContactLastNameCaseSensitive, Line 3
Could not resolve expression for schemabound object or constraint.

So, this is the first issue. It seems as though you can't create the view if your database has a different collation than the server collation. Well, (again), I haven't spent all that much time on this one but I did repro what the chain on the forum seemed to have found.

Now, as for the second issue... the query can be EXTREMELY painful and slow if you run this against a large set of data. See, changing the collation on the fly will need to perform a row-by-row comparison of the data. So, to minimize that HUGE hit on performance - you have two options.

1) actually consider changing the column's collation so that it matches your queries AND then create an index (note: the actual use of the index will be determined by the selectivity of the data).

SELECT LastName collate database_default AS LastName
, FirstName collate database_default AS FirstName
, MiddleName collate database_default AS MiddleName
INTO MyTestContacts2
FROM Adventureworks.Person.Contact
go

ALTER TABLE MyTestContacts2
ALTER COLUMN LastName nvarchar(100) COLLATE Latin1_General_CS_AS_KS_WS
go

CREATE INDEX CSNameInd on MyTestContacts2 (LastName)
go

-- let's use a query that's highly selective (selective enough to use the index):

SELECT *
FROM MyTestContacts2
WHERE LastName = N'Barlow'
go -- (1 row(s) affected)

SELECT *
FROM MyTestContacts2
WHERE LastName = N'barlow'
go -- (0 row(s) affected)

2) create an index with a different collation... but this is harder than it sounds as the CREATE INDEX statement doesn't directly allow changing collation (however, it should!). But, you can do this by either creating another column (real or computed) with the case-sensitive collation and then indexing it OR you could do this through an indexed view (but that adds a few complexities as well). I think the computed column that's indexed is GREAT if the searches are generally highly selective. If they are not, then it is probably better to create a real column - as a computed copy of the inserted value - that is case sensitive. However, at that point, I'm not entirely sure why you're keeping the case-insensitive version around...unless it's to keep the actual inserted value (maybe for printing and/or display?). Regardless, here's how you can create an indexed computed column.

ALTER TABLE MyTestContacts
ADD
CSName
AS LastName COLLATE Latin1_General_CS_AS_KS_WS
go

SELECT *
FROM MyTestContacts
WHERE CSName = N'Adams'
go -- (86 row(s) affected)

SELECT *
FROM MyTestContacts
WHERE CSName = N'adams'
go -- (0 row(s) affected)

CREATE INDEX CSNameInd ON MyTestContacts (CSName)
go

SELECT *
FROM MyTestContacts
WHERE CSName = 'Barlow'
go -- (1 row(s) affected)

SELECT *
FROM MyTestContacts
WHERE CSName = 'barlow'
go -- (0 row(s) affected)

And, the index will be used if the query is highly selective.
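If you want to see for yourself whether the optimizer picks the index, one option is to ask for the estimated plan as text. This is just a sketch against the MyTestContacts table and CSNameInd index created above; what the plan actually shows will depend on your data and statistics.

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
-- While it is ON, queries are compiled but NOT executed.
SET SHOWPLAN_TEXT ON
go

SELECT *
FROM MyTestContacts
WHERE CSName = N'Barlow'
go

SET SHOWPLAN_TEXT OFF
go
```

An Index Seek on CSNameInd in the output means the index was chosen; a Table Scan means the optimizer decided the predicate wasn't selective enough.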

OK, so that ends the answer to part 2 of the question (see how tangents can take us a bit off track :)....

Now, let's get back to question #1.

What if you want to do a one-time search through your data to find all of the lower case data? Well, there are a few thoughts here.... First, let's modify the ONE Barlow row to be lowercase barlow so that we have something to find:

UPDATE MyTestContacts2
SET LastName = N'barlow'
WHERE LastName = N'Barlow'
go -- (1 row(s) affected)

NOTE: This is an ABSOLUTELY horrible query as I didn't use any key to point to the exact row I wanted to modify. Had there been a lot of Barlows I would have modified them all. This worked here because I knew there was only one row. But, all of your tables should have a primary key, etc. (not even going to begin this tangent :).

Now, having said that... let's see if we can find this row easily? You should be able to do this using Transact-SQL and using some type of wildcard pattern matching such as:

SELECT * FROM MyTestContacts2
WHERE Lastname like N'b%'
go -- (1 row(s) affected)

And, that works without any problems.

So, what about NOT an upper case B.

SELECT * FROM MyTestContacts2
WHERE Lastname NOT LIKE N'B%'
go -- (18768 row(s) affected)

SELECT * FROM MyTestContacts2
WHERE Lastname NOT LIKE N'%B%'
go -- (18765 row(s) affected)

tangent number 87 <g>: if you're wondering what the 3 rows are (as was I :)... they are 1 row of O'Brien and 2 rows of Smith-Bates. Here's that query:

SELECT * FROM (SELECT * FROM MyTestContacts2
WHERE Lastname NOT LIKE N'B%') AS Bs
WHERE Lastname like '%B%'
go

OK, so, I thought we were there... I thought we could go to what I thought was the next logical step.....

SELECT * FROM MyTestContacts2
WHERE Lastname NOT LIKE N'%[A-B]%'
go

And... well, we lose barlow from the result set. For some reason...when you do ranges of characters it seems to lose the case??? I remember that [A-Z] and [a-z] were different in some release? Is this a regression? Someone help me out with this one as I'm without a clue. In the end, the ONLY way I could get this to work is to do this:

SELECT * FROM MyTestContacts2
WHERE Lastname not like N'%A%'
AND Lastname not like N'%B%'
AND Lastname not like N'%C%'
AND Lastname not like N'%D%'
AND Lastname not like N'%E%'
AND Lastname not like N'%F%'
AND Lastname not like N'%G%'
AND Lastname not like N'%H%'
AND Lastname not like N'%I%'
AND Lastname not like N'%J%'
AND Lastname not like N'%K%'
AND Lastname not like N'%L%'
AND Lastname not like N'%M%'
AND Lastname not like N'%N%'
AND Lastname not like N'%O%'
AND Lastname not like N'%P%'
AND Lastname not like N'%Q%'
AND Lastname not like N'%R%'
AND Lastname not like N'%S%'
AND Lastname not like N'%T%'
AND Lastname not like N'%U%'
AND Lastname not like N'%V%'
AND Lastname not like N'%W%'
AND Lastname not like N'%X%'
AND Lastname not like N'%Y%'
AND Lastname not like N'%Z%'
go

And, well, that works. But, it is NOT pretty! The query's going to require a table scan anyway AND it is a one-time query. I'm OK with this as a solution to this problem BUT, am I missing something here? Please tell me there's something more clever here? Is this a bug?
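One possible explanation, offered as a sketch rather than a definitive answer: as far as I know, a range such as [A-B] in a LIKE pattern is evaluated using the collation's SORT order, and in the case-sensitive Windows collations the letters sort as a, A, b, B, c, C... so the range from A to B also contains lowercase b. If that's what is happening here, then either spelling out the characters or forcing a binary collation (where A-Z really is just the 26 uppercase code points) should behave the way the long query does. Latin1_General_BIN2 is my choice for the illustration; it is not something the original example used.

```sql
-- Option 1: list the characters explicitly (no range, so sort order doesn't matter)
SELECT * FROM MyTestContacts2
WHERE Lastname NOT LIKE N'%[ABCDEFGHIJKLMNOPQRSTUVWXYZ]%'
go

-- Option 2: evaluate the range under a binary collation,
-- where [A-Z] covers only the uppercase code points
SELECT * FROM MyTestContacts2
WHERE Lastname COLLATE Latin1_General_BIN2 NOT LIKE N'%[A-Z]%'
go
```

Both are still one-time, scan-the-whole-table queries, and neither looks at accented uppercase characters, so treat them as a starting point.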

I'm definitely interested in feedback on this one!
kt

Categories:
SQL Server 2005

DDL Triggers were a new feature of SQL Server 2005 and while seemingly simple, they are very powerful. DDL Triggers allow you to trap an attempted DDL operation to audit it, prevent it, or do anything you want to validate/verify/"authorize"/etc – you write the code. And, since a trigger fires as part of the transaction, you can roll it back.

In many conference demos/webcasts, etc., I have provided a sample script that prevents ddl within a [production] database. That script has been really helpful/useful but recently I thought about an update to it…

SQL Server 2005 has another new feature "execute as". While I definitely see many benefits, I’m also a bit concerned. To a certain extent, I feel that the potential for SQL Injection is actually higher. If a developer creates a poorly written/tested stored procedure (ok, therein lies the problem, really!) that includes dynamic string execution AND then uses "execute as" to essentially elevate a user with minimal privileges to a higher level (so that they don’t need to give the base object rights to the user), a malicious user could “inject” code in and actually succeed if the “execute as” user has rights to the injected code. In prior releases, and with the default behavior (execute as caller), this is not possible (which is good for security but bad for dynamically executed strings within stored procedures as base object rights are necessary).
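To make the concern concrete, here's a minimal sketch of the pattern I'm worried about, next to a safer variation. Both procedure names are hypothetical, and the table is the MyTestContacts test table from an earlier post (assumed to live in dbo); substitute your own objects.

```sql
-- RISKY: the parameter is concatenated into the string, and the string
-- runs with the OWNER's rights - so injected code does too
CREATE PROCEDURE dbo.FindContactUnsafe
    @LastName nvarchar(100)
WITH EXECUTE AS OWNER
AS
EXEC (N'SELECT LastName FROM dbo.MyTestContacts WHERE LastName = N''' + @LastName + N'''')
go

-- SAFER: the value is passed as a real parameter to sp_executesql,
-- so it is never parsed as part of the statement
CREATE PROCEDURE dbo.FindContactSafer
    @LastName nvarchar(100)
WITH EXECUTE AS OWNER
AS
EXEC sp_executesql
      N'SELECT LastName FROM dbo.MyTestContacts WHERE LastName = @ln'
    , N'@ln nvarchar(100)'
    , @ln = @LastName
go
```

With the first procedure, a value that contains a quote followed by another statement would execute that statement under the elevated context. With the second, the same value is just a string that doesn't match any row.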

Having said that, and since security is always a concern, my DDL Trigger only audited for the login of the user who executed the statement, not for the actual user that’s logged in. In other words, if EXECUTE AS is used (or SETUSER is used), then the context of the user executing is actually different than the logged in user. To see this shift in context, SQL Server 2005 added a new function: ORIGINAL_LOGIN().

(reading between the lines is even more frightening in that prior to SQL Server 2005, the original user could not be tracked from SETUSER. The good news is that SETUSER is ONLY allowed to be used by DBOs so it’s not as widespread as the potential for “execute as”).

OK, so how can we put all of this together? We’ll want to add the ORIGINAL_LOGIN function into our audit table in our DDL Trigger. Even if you choose NOT to rollback, at least you’ll know who performed the operation (even if from a dynamically executed string!).
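Before wiring it into a trigger, it can help to see the context shift on its own. This is a small sketch that assumes you're connected as a sysadmin (or otherwise have IMPERSONATE permission); the AuditDemoLogin name and its password are made up for the demo.

```sql
-- Hypothetical login, created only for this demo
CREATE LOGIN AuditDemoLogin WITH PASSWORD = 'Tmp!Login#2005x'
go

-- Before impersonation both columns should show YOUR login
SELECT ORIGINAL_LOGIN() AS OriginalLogin
     , SYSTEM_USER AS ExecutingLogin
go

EXECUTE AS LOGIN = 'AuditDemoLogin'
go

-- During impersonation SYSTEM_USER changes but ORIGINAL_LOGIN() does not
SELECT ORIGINAL_LOGIN() AS OriginalLogin
     , SYSTEM_USER AS ExecutingLogin
go

REVERT
go

DROP LOGIN AuditDemoLogin
go
```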

USE AdventureWorks;
go

--Create a login/user - just for this exercise
CREATE LOGIN Paul WITH PASSWORD = 'PxKoJ29!07';
go
CREATE USER Paul FOR LOGIN Paul;
go
sp_addrolemember 'db_ddladmin', 'Paul'
go

CREATE SCHEMA SecurityAdministration
go
CREATE TABLE SecurityAdministration.AuditDDLOperations
(
    OpID                int             NOT NULL identity
                                        CONSTRAINT AuditDDLOperationsPK
                                            PRIMARY KEY CLUSTERED,
    OriginalLoginName   sysname         NOT NULL,
    LoginName           sysname         NOT NULL,
    UserName            sysname         NOT NULL,
    PostTime            datetime        NOT NULL,
    EventType           nvarchar(100)   NOT NULL,
    DDLOp               nvarchar(2000)  NOT NULL
);
go
GRANT INSERT ON SecurityAdministration.AuditDDLOperations TO public;
go

CREATE TRIGGER PreventAllDDL
ON DATABASE
WITH ENCRYPTION
FOR DDL_DATABASE_LEVEL_EVENTS
AS
DECLARE @data XML
SET @data = EVENTDATA()
RAISERROR ('DDL Operations are prohibited on this production database. Please contact ITOperations for proper policies and change control procedures.', 16, -1)
ROLLBACK
INSERT SecurityAdministration.AuditDDLOperations
            (OriginalLoginName,
             LoginName,
             UserName,
             PostTime,
             EventType,
             DDLOp)
VALUES   (ORIGINAL_LOGIN(), SYSTEM_USER, CURRENT_USER, GETDATE(),
   @data.value('(/EVENT_INSTANCE/EventType)[1]', 'nvarchar(100)'),
   @data.value('(/EVENT_INSTANCE/TSQLCommand)[1]', 'nvarchar(2000)') )
RETURN;
go

--Test the trigger.
CREATE TABLE TestTable (col1 int);
go
DROP TABLE SecurityAdministration.AuditDDLOperations;
go
EXECUTE AS LOGIN = 'Paul' -- note: Remember, Paul is a DDL_admin
go
DROP TABLE SecurityAdministration.AuditDDLOperations;
go
REVERT;
go

SELECT * FROM SecurityAdministration.AuditDDLOperations;
go
DROP TRIGGER PreventAllDDL ON DATABASE;
go
DROP TABLE SecurityAdministration.AuditDDLOperations;
go
DROP SCHEMA SecurityAdministration;
go
DROP USER Paul;
go
DROP LOGIN Paul;
go

 

So, have fun testing with this one.

 

Thanks for reading!

kt

Categories:
Security | SQL Server 2005 | Tips

OK...SP2, the SP2 refresh and then the parallel/subsequent GDRs have seemingly (and rightly so) confused some of us... However, thanks to the PSS Engineers blog (and specifically Bob Ward - Senior Escalation Engineer, Microsoft PSS), this blog entry clears up a lot of that confusion. The end result is that you should be at 9.00.3054 or 9.00.3159. 3054 is the correct one if you haven't had any special hotfix/GDRs directly from Microsoft PSS and 3159 is for those of you that have. For me, I think the best part was the reiteration of the fact that "Microsoft Update will notify you of this" and the comments made that "Microsoft Update is smart enough to recognize you need this specific version of the GDR2 fix...". The most interesting part of all of this is the reminder that SQL Server IS included in Microsoft Update. What's also interesting is that most people are still using Windows Update and Microsoft Update is DIFFERENT. You need to (essentially) replace Windows Update with Microsoft Update (although it's not that simple - of course...). Basically, you need to install Microsoft Update and then remove Windows Update. So.... if you haven't done this - you should. At least on your main desktop/laptop machine (at first) and then on other machines from there. I can't remember when this originally came into place but a few folks asked me about the difference, etc. and how SQL Server fits in and well... it's all about Microsoft Update now not Windows Update (however/fyi, Microsoft Update looks and feels exactly like Windows Update but it includes Windows, Office, SQL and Exchange). If you want to find out more, check out the Microsoft Update FAQ here.
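If you're not sure which build you're on, a quick way to check (just a sketch, run it on each instance) is:

```sql
-- ProductVersion is the build number to compare against 9.00.3054 / 9.00.3159;
-- ProductLevel is the service pack level (for example, SP2)
SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion
     , SERVERPROPERTY('ProductLevel') AS ProductLevel
     , SERVERPROPERTY('Edition') AS Edition
go
```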

And, along the lines of maintenance... Paul Randal (of the SQL Server Storage Engine blog) would like to know if you have time to fill in a survey on YOUR VLDB maintenance practices. This is pretty important for them to know. He explains what they'll use it for and why it's useful to them. Be sure to check out his blog entry here.

Finally........... lots of final session writing/planning going on for TechEd. Bob Beauchemin and I are delivering a pre-conference workshop titled: Leveraging SQL Server Always-On Technologies to Achieve High Availability and Scalability. It's on the Sunday prior to TechEd and it's a new session for us. Here's the abstract:

PRCN06 Leveraging SQL Server Always-On Technologies to Achieve High Availability and Scalability 
System down time and lack of scalability for mission critical applications can result in loss of revenue and business creditability. Planned downtime is typically caused by hardware upgrade, application or OS upgrade, applying a service pack, or performing routine maintenance task. Examples of unplanned downtime are hardware or software failure, natural disasters, and human error. In fact, human error has been identified as the number one cause of downtime. SQL Server 2005 Always-On Technologies provides a full range of options for achieving and maintaining appropriate levels of availability. Because the product offers so many choices, it is difficult to choose features that provide the best availability solution for a given application. In this session, we provide an in-depth description of these technologies and delve into scenarios and best practices in deployment of the availability technologies. The high availability technologies covered include Database Mirroring, Database Snapshots, Peer-to-Peer Replication, Clustering, Online Indexing, Online Restore, Piecemeal Backup & Restore, Partial Database Availability, Table and Index Partitioning, Snapshot Isolation, DDL Triggers, and others. The second part of this session focuses on scalability and building systems that scale-out to multiple servers. Building a scale-out application with SQL Server 2005 may entail using techniques and features that are unfamiliar, or are new. This session provides in-depth information about the internal implementation of scale-out features such as Service Broker, Query Notifications, Distributed Partitioned Views, Scalable Shared Databases, and Peer-to-Peer Replication. The session also includes troubleshooting techniques using Profiler and the new dynamic management views.

As for content, we'll have our lecture content available to all attendees, we're going to give away AlwaysOn DVDs (more info coming up) AND Bob, Paul and I are going to hang out after the workshop to answer even more questions... So, if you're looking to burn budget for FY'07 AND you want to attend an information packed (and fun ;) pre-con workshop AND a great conference for breadth/futures (a bunch of Katmai sessions at the event too), then you should sign up for TechEd before it sells out......again. Also, there are a bunch of sessions at the conference that might interest you - Paul and I are doing a Chalk/Talk Q&A on VLDB Maintenance, I'm doing a demo fest on AlwaysOn, Paul's doing a session on Corruption Detection and Recovery, Bob's doing a session on Windows PowerShell and SMO Together (oh, and he's listed as Robert Beachemin...not sure why???) ...and that's just to name a few!

Oh, and the AlwaysOn DVDs are cool because:

  1. they have a setup.exe that runs to create vhd/vmc files that allow you to access a predefined VPC image.
  2. Virtual PC is free and Virtual Server is free... you can use EITHER for the Virtual Environment.
  3. the VPC is a Windows 2003 Server setup with SQL Server 2000 and SQL Server 2005 (multiple instances) and allows you to access an environment that's excellent for learning and testing and...self-paced labs
  4. the DVD includes 9 lab manuals for roughly 16 hours of self-paced lab time AND they're really good labs with multiple parts, excellent links and even useful undoc'ed commands too (if I might say so myself as I wrote most of them :)
    1. Database Snapshots - 4 Exercises, 75-90 minutes
      • Exercise 1: Repartition the SalesDB Database
      • Exercise 2: Create and Examine a Database Snapshot
      • Exercise 3: Working with Multiple Snapshots
      • Exercise 4: Creating a Database Snapshot on a Mirror Database
    2. Data Recovery & Preventative Techniques - 4 exercises, 75-90 minutes
      • Exercise 1: Examining Foreign Key Relationships between Tables
      • Exercise 2: Point-In-Time Recovery
      • Exercise 3: Using the tablediff.exe Command-Line Utility to Compare ALL Data Modifications
      • Exercise 4: Using DDL Triggers to Prevent Tables Being Dropped
    3. Instant Initialization - 2 exercises, 30-45 minutes
      • Exercise 1: Enabling Instant Initialization
      • Exercise 2: Security Vulnerabilities Created by Instant Initialization
    4. Peer to Peer Replication - 5 exercises, 75-90 minutes
      • Exercise 1: Implementing a Replication-Ready Schema
      • Exercise 2: Configuring and Implementing Peer-to-Peer Replication Configuration Using the Replication Wizards in SQL Server Management Studio
      • Exercise 3: Using the Dual Database Monitor
      • Exercise 4: Adding a new Peer Server
      • Exercise 5: Monitoring Peer-to-Peer Data Flow after a Fault
    5. Table and Index Partitioning - 4 exercises, 75-90 minutes
      • Exercise 1: Range Partition Function
      • Exercise 2: Partition Scheme
      • Exercise 3: Partitioned Table
      • Exercise 4: The Sliding Window Scenario
    6. Snapshot Isolation - 5 exercises, 75-90 minutes
      • Exercise 1: Pessimistic Locking
      • Exercise 2: Activating Snapshot Isolation & Read Committed with Snapshot Isolation 
      • Exercise 3: Using Snapshot Isolation (SI)
      • Exercise 4: Using Read Committed with Snapshot Isolation (RCSI)
      • Exercise 5: Monitoring Snapshot Isolation & Read Committed with Snapshot Isolation 
    7. Online Operations - 2 Parts, 75-90 minutes
      • Part 1: Online Index Operations
        • Exercise 1: ONLINE Index Move (for better isolation)
        • Exercise 2: Partition an Active Table ONLINE
      • Part 2: Partial Database Availability and Online Piecemeal Restore
    8. Database Mirroring - 2 large sessions with TONS of exercises, 4+ hours
      • Part 1: Database Mirroring in Action
        • Exercise 1: Configuring and Implementing the High Availability Database Mirroring Configuration – using Transact-SQL through a SQLCMD master script
        • Exercise 2: Using the Dual Database Monitor and Transparent Client Redirect
        • Exercise 3: Initiating Failover in the High Availability Configuration
      • Part 2: Understanding and Implementing Database Mirroring
        • Exercise 1: Configuring and Implementing Database Mirroring using the SQL Server Management Studio
        • Exercise 2: Configuring the Database Mirroring Monitor, Mirroring Threshold Alerts and WMI Event Alerts
        • Exercise 3: Converting to the High Protection Configuration and Comparing Performance between Synchronous and Asynchronous forms of Database Mirroring 
        • Exercise 4: Configuring and Implementing the High Availability Database Mirroring Configuration – using Transact-SQL through a SQLCMD master script 
        • Exercise 5: Initiating Failover 
          1. Part I: Manual and Automatic Failover in the Synchronous forms of Database Mirroring Configuration 
          2. Part II: Preventing “split brain” in the High Availability configuration 
        • Exercise 6: Converting to the High Performance Configuration and Forcing Failover with Potential Data Loss 
    9. Service Oriented Database Architecture - 5 exercises, 3+ hours
      • Exercise 1: Setting up simple Service Broker messaging
      • Exercise 2: Setting up Inter-instance Services
      • Exercise 3: Setting up dialog security and encryption
      • Exercise 4: Setting up application-specific functions
      • Exercise 5: Using Query Notifications

And........ if that doesn't motivate you - we might also give away a Manageability DVD that's packed with Tools demos/labs and some SP2 specific stuff such as customized reports (which we'll talk about in the last part of our pre-conference workshop). OK, so I hope to see you at TechEd.......... the pre-conference alone is worth it!

THANKS,
kt

Categories:
Events | Resources | SQL Server 2005

While at SQL Server Connections in Orlando, Stephen Wynkoop of SSWUG stole some morning time for an interview (morning time is not my best but we did get a lovely "I got my mug on SSWUG tv" mug so that made it OK :) :). We (Paul and I) had a great time chatting about Disaster Recovery, Backup/Restore, general best practices and well - games (specifically - the VERY addictive game of Blokus). Here's the interview link: http://www.sswug.org/columnists/editorial.asp?id=1135.

Enjoy!
kt

PS - If any of you pick up (and become completely addicted to) Blokus, let us know! It's great for 2 to 4 players and extremely fun when a 5 year old "wild card" sits in and throws moves that you just can't understand (but later come to really frustrate you :) :).

Categories:
Events | Opinions | SQL Server 2005

In the quest for more (and more and more ;) information, I've been told about a new link - from the SQL Server Books Online team... it's called the "SQL Server 2005 Books Online Scoped Search" and it allows you a "live" search format for accessing content in the SQL Server books online. And - because they're online - they are the most up-to-date. I'm not sure how frequently they update these BOL (vs. the downloads that we get) but my guess is that they update the online version frequently and then do a BOL refresh every now and then which includes that which is online.

Anyway, I always like to have the latest version on my laptop BUT it's nice to have a quick/easy way to find content online. Oh - and please make sure you give these guys feedback as this is a new site, still with the ability to make tweaks where necessary!

Have fun: http://search.live.com/macros/sql_server_user_education/booksonline

Cheers,
kt

PS- Check out Buck Woody's Blog here: http://blogs.msdn.com/buckwoody/ and subscribe here. He's a Technical Content Developer for the SQL Server Documentation Team and he's blogged items about the BOL, how to search, updates, etc.

Categories:
Resources | SQL Server 2005

OK, I've been complaining about finding resources - for a long time... AND, I've been complaining about how I can never tell if a whitepaper is on MSDN or on TechNet or on Microsoft.com or on x, y, or z. Well........ finally, I've done something about it. I've *started* to put together (and verify) a list of what I think are the top whitepapers out there. This is by no means a complete list AND I haven't read every one of the papers I've referenced. However, I've only linked to whitepapers written/published by reputable sources and I've checked every link to make sure it works. Primarily, this is the list of whitepapers that I reference the most - in classes, seminars, workshops, etc. And, it's not a blog entry - it's an actual webpage so it should [hopefully] be easy to find. I plan/hope to do this for blogs and other useful stuff BUT, it takes quite a bit of time verifying each of the links (and of course, searching/finding the darn thing when someone has broken the link :).

So.... after the first 5+ hours of work - here it is: http://www.sqlskills.com/whitepapers.asp.

Thanks for reading (and wow - you have a lot more reading to do now :) :) :),
kt

Categories:
Resources | SQL Server 2005

Last week while Paul and I were in the UK delivering a one day seminar on Crucial Database Maintenance Techniques, we met David McMahon from the Next Generation User Group. They're doing some exciting things in the UK and even for the wider community - for example - podcasts. Paul and I were interviewed for one and it's ready for download here.

Enjoy!
kt

Categories:
Opinions | SQL Server 2005

Another great DotNetRocks interview has been completed. It's Paul Randal's session on Disaster Recovery, DBCC, Index fragmentation (and defrag) and [unfortunately for me] a lot more. All I can say is that I was ambushed...

thanks Richard
   thanks Carl...

Enjoy: http://www.dotnetrocks.com/default.aspx?showNum=217
kt

Categories:
Events | Resources | SQL Server 2005

I had a discussion earlier today (with Paul Randal) about many misconceptions that exist about upgrading databases and more importantly, about "downgrading" databases. Really, the issue is that I've heard people get frustrated when they find that things like backup/restore works FROM SQL 7.0/2000 TO SQL Server 2005 but not the other way around - even if the database is in SQL Server 2000 (80) compatibility mode. First and foremost, compatibility mode only affects parsing, query processing, and general data manipulation; it does not affect physical storage (well, there's more to it than that but that's a general overview). When you upgrade a database to SQL Server 2005, you WILL benefit immediately from changes in the storage engine, etc. regardless of compatibility mode. Compatibility modes are there to give you time in updating/upgrading your code - if/when necessary. Most code will work when upgrading but some code may not be supported because of changes to keywords, syntax changes, etc... The best thing to do is check your application compatibility with the Upgrade Advisor. I did a couple of webcasts on Installation/Upgrade as part of my 11-part series on TechNet. See the blog entry for the entire series here. Part 3 and part 4 are focused on Installation and Upgrade and their associated blog entries have a lot of additional links (including links to the Upgrade Advisor as well as a series of things you might want to do before you upgrade). Also, be sure to check out the upgrade site off of the main Microsoft SQL Server site.
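To see (or change) the compatibility mode on a SQL Server 2005 instance, something like the following works. It's a sketch: TestAdventureWorks is just the test database name from an earlier post, so use your own, and remember that changing the level only changes language/query behavior - it does NOT make the files restorable on a downlevel server.

```sql
-- Show the compatibility level of every database on the instance
-- (80 = SQL Server 2000, 90 = SQL Server 2005)
SELECT name, compatibility_level
FROM sys.databases
go

-- Move a database to the SQL Server 2005 level once your code is ready
-- (sp_dbcmptlevel is the SQL Server 2005-era procedure for this)
EXEC sp_dbcmptlevel @dbname = N'TestAdventureWorks', @new_cmptlevel = 90
go
```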

How to move USER databases around - a quick list of what's supported between versions

Backup/Restore from 7.0 to SQL Server 2000
Detach from 7.0, copy the files, then attach to SQL Server 2000
Backup/Restore from 7.0 to SQL Server 2005
Detach from 7.0, copy the files, then attach to SQL Server 2005
Backup/Restore from 2000 to SQL Server 2005
Detach from 2000, copy the files, then attach to SQL Server 2005

Why use Backup/Restore?

PROs

  1. Because you have a backup! This will allow you to go back to the version from which you came. However, without any changes made on the uplevel version.
  2. Because it doesn't require the database to be taken "offline" when the backup is performed (note: that this is both good and bad - bad because you don't really know the exact point in time to which the database reconciles...which may not matter if you're just testing).
  3. Because the backup will be the size of data only and will not include database free space. Free space is not backed up (e.g. a database with a 100GB data file with only 20GB of data should yield a file that's roughly 20GB in size). I say "roughly 20 GB" because the internals of a backup require that the transaction log records for the activity that occurred during the backup process are also backed up with the full database (or differential) backup. This is actually the basis for why transaction log backups are not supported during a full/differential backup in SQL Server 2000 (they are in SQL Server 2005). However, this is the reason why the transaction log cannot be cleared while a full or differential is ALSO running in SQL Server 2005.

CONs

  1. You don't know the exact point in time to which the database reconciles (it will be the time that the backup completed) AND logs CAN be restored uplevel as well. NOTE: If you're interested in creating an exact point in time version of the database - consider putting the database into "restricted user" mode or "single user" mode (so that user transactions are not allowed during the backup). Again, this may not be a concern.
  2. It takes time to complete the backup and then the restore (there are four phases of a restore: create/initialization, copy, redo, undo). Make the create/initialization *much* faster by enabling Instant Initialization. See my Instant Initialization blog post for more details.
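Putting the backup/restore route together, a minimal sketch looks like the following. Everything named here is an assumption for illustration: the SalesDemo database, the logical file names SalesDemo_Data and SalesDemo_Log, and all of the paths. The ALTER DATABASE ... SET RESTRICTED_USER syntax assumes the source is SQL Server 2000 (or later).

```sql
-- ON THE SOURCE (downlevel) SERVER
-- Optional: keep regular users out so the backup reflects a known point in time
ALTER DATABASE SalesDemo SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE
go
BACKUP DATABASE SalesDemo
TO DISK = N'C:\Backups\SalesDemo.bak'
WITH INIT    -- overwrite any existing backup sets in this file
go
ALTER DATABASE SalesDemo SET MULTI_USER
go

-- ON THE DESTINATION (uplevel) SERVER, after copying the .bak file over
-- First, find the logical file names stored in the backup
RESTORE FILELISTONLY
FROM DISK = N'C:\Backups\SalesDemo.bak'
go
-- Then restore, relocating the files; the database is upgraded during recovery
RESTORE DATABASE SalesDemo
FROM DISK = N'C:\Backups\SalesDemo.bak'
WITH MOVE N'SalesDemo_Data' TO N'D:\Data\SalesDemo.mdf'
   , MOVE N'SalesDemo_Log' TO N'E:\Logs\SalesDemo_log.ldf'
   , RECOVERY
go
```

The original database and the .bak file are untouched by the restore, which is exactly why you can always go back to where you started.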

Why use detach/attach?

PROs

  1. It's simple, it's fast... but once detached then the database is OFFLINE.
  2. You know the exact point in time to which it reconciles because no transactions are allowed into the database once it is offline. Again, this may not be a concern.

CONs

  1. You must copy the entire file - including the free space to the other location and the network copy might be the most expensive (meaning time consuming) part of the entire process. However, once copied, the files do NOT need to be created on the destination because on attach, these files will be used.
  2. The database is offline once detached and during copy.
  3. If you don't COPY the files and instead you attach the detached files, you will have ABSOLUTELY NO WAY of getting back to the version from which you detached. (ah, this is probably the single most important reason for why I prefer backup/restore!)
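For comparison, here's a sketch of the detach/attach route. Again, the SalesDemo database name and the file paths are made up for illustration, and the CREATE DATABASE ... FOR ATTACH syntax assumes the destination is SQL Server 2005.

```sql
-- ON THE SOURCE (downlevel) SERVER
-- The database goes OFFLINE as soon as this completes
EXEC sp_detach_db @dbname = N'SalesDemo'
go

-- COPY (do not move!) the data and log files to the destination server,
-- then re-attach the originals on the source if you still need them there.

-- ON THE DESTINATION (SQL Server 2005) SERVER
-- The copied files are upgraded in place when they are attached
CREATE DATABASE SalesDemo
ON (FILENAME = N'D:\Data\SalesDemo.mdf')
 , (FILENAME = N'E:\Logs\SalesDemo_log.ldf')
FOR ATTACH
go
```

Once files have been attached uplevel they can't be attached downlevel again, so the untouched originals are your only way back.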

Summary for "How to move USER databases around"
Between these versions "upgrades" are supported ONLY to the uplevel version. There is NO single (or simple) feature that can be used to get back to the version from which you started (without exporting/importing all of the data). There is also no undocumented back-door to do this either (no trace flags, no DBCC commands, NA DA!!! as per Paul).

What about System Databases?
This is a whole other can of worms to open and the easiest thing I can say here is that you generally should not move/upgrade system databases across machines. These are upgraded through "in-place" upgrades of SQL Server (on the same machine) or through manual migrations (to different machines) of the users/objects (SQL Agent Jobs, user-defined system procedures in master, logins in master, etc.). This is not an easy process (manual migration) but may prove to be a better choice over an upgrade in place if something were to go horribly wrong (which is unlikely but I'm a "what's the worst case scenario" person when it comes to availability :). The other benefit of NOT upgrading in place - and instead MOVING databases from one version to another on an upgrade - is that you get to complete some basic "spring cleaning". New hardware, freshly formatted, freshly installed/configured OS, clean disks, etc. This can often alleviate some of the strangest, hard-to-determine-problems, that have plagued you for weeks/months. Like I said, this is a whole other can of worms to open!

But - if you're interested in moving system databases around on the SAME machine, here's a great KB that covers the required options, syntax, rules and restrictions: How to move SQL Server databases to a new location by using Detach and Attach functions in SQL Server

And - if you're interested in transferring logins and passwords between instances (for upgrade or for sync'ing two servers used to create a standby partnership - with Database Mirroring and/or Log Shipping), here's a great KB article that includes links to other articles, even ones that cover uplevel transfers of logins (like 2000 to 2005): How to transfer logins and passwords between instances of SQL Server

And - that's it for this week (probably)... two in a row is not likely to become three in a row (just setting expectations :) :) :),
kt

Categories:
Resources | SQL Server 2005 | Tips

Instant Initialization is a new feature of SQL Server 2005 that is based on an NTFS feature that was added to Windows XP (and therefore is also available in Windows 2003 Server). It's a feature that's seemingly simple; it allows file allocation requests to skip zero initialization on creation. As a result, file allocation requests can occur instantly - no matter what the file size. You might wonder why this is interesting or why this makes a difference? Most file allocation requests are small requests, with small incremental changes (like .doc files, .xls files, etc.) but database files can be rather large. In fact, they should be rather large as pre-allocation of a reasonable file size is a best practice to reduce file fragmentation. Additionally, autogrowth causes performance delays (more so in 2000 than 2005) and it's generally something that you want to avoid when possible. As a result, database creation times can take minutes to hours to days, depending on the size of the file allocation request. But - it's not just for database creation. ALL data file requests can leverage this feature: file creation for a new database, adding a file to an existing database, manually or automatically growing a file and (IMO - the best) restoring a database where the file (or files) being restored does not already exist. The reason I think the last one is the best is that it can reduce downtime if a database is damaged and allow you to get back up and running more quickly. This is especially important for databases that cannot leverage partial database availability, which is an Enterprise Engine feature. So, to give you some motivation, here is a test that I performed just to have some interesting and comparable numbers.

Performance Test with Zero Initialization
Hardware: Dell Precision 670 Dual Proc (x64) with Dual Core, 4 GB Memory, RAID 1+0 array w/4-142 GB, 15000rpm disks
   CREATE DATABASE with 20 GB Data file = 14:02 minutes
   ALTER DATABASE BY 10 GB = 7:01 minutes
   RESTORE 30 GB DATABASE (EMPTY Backup) = 21:07 minutes
   RESTORE 30 GB DATABASE (11GB Backup) = 38:28 minutes

Performance Test with Instant Initialization
Hardware: Dell Precision 670 Dual Proc (x64) with Dual Core, 4 GB Memory, RAID 1+0 array w/4-142 GB, 15000rpm disks
   CREATE DATABASE with 20 GB Data file = 1.3 seconds
   ALTER DATABASE BY 10 GB = .4 seconds
   RESTORE 30 GB DATABASE (EMPTY Backup) = 5 seconds
   RESTORE 30 GB DATABASE (11GB Backup) = 19:42 minutes
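
If you want to gather comparable numbers on your own hardware, a simple timing harness along these lines should work. This is only a sketch: the database name, file names and paths are hypothetical, and the log file is deliberately kept small because the log is always zero initialized.

```sql
-- Hypothetical database name, file names and paths; adjust for your environment.
DECLARE @Start datetime
SELECT @Start = getdate()

CREATE DATABASE TestInitTiming
ON PRIMARY
   (NAME = N'TestInitTimingData',
    FILENAME = N'D:\Data\TestInitTimingData.mdf',
    SIZE = 20GB)
LOG ON
   (NAME = N'TestInitTimingLog',
    FILENAME = N'D:\Data\TestInitTimingLog.ldf',
    SIZE = 100MB)

SELECT datediff(ms, @Start, getdate()) AS CreateDatabaseMs
go

DECLARE @Start datetime
SELECT @Start = getdate()

-- SIZE is the NEW total size, so 30GB grows the 20GB file by 10GB.
ALTER DATABASE TestInitTiming
   MODIFY FILE (NAME = N'TestInitTimingData', SIZE = 30GB)

SELECT datediff(ms, @Start, getdate()) AS AlterDatabaseMs
go
```

Run it once without the permission and once with it (restarting SQL Server in between) to see the difference.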

SQL Server can leverage this feature for DATA file requests ONLY; the transaction log must be zero initialized because of its circular nature... which brings me to why this is not "on by default" or more specifically - HOW do you get this feature. First, there's absolutely no syntax change required - SQL Server will use it if it has access to it (so what does that mean?). The SQL Server service must have been granted the Windows permission - "Perform Volume Maintenance Tasks". By default, Windows Administrators have this permission but as yet-another-best-practice, we recommend that your SQL Server run under an account that is a "lower privileged" account (i.e. NOT an administrator). Other ideal options include running as "network service" or running as a dedicated user/domain account that has very few permissions except to SQL Server and its required resources. A lot of folks recommend using network service for its simplicity (it doesn't have a password and it has limited network/local rights) and I agree with this as long as it's truly dedicated to SQL Server. If network service is used by other services on the same machine then you could compromise security of your SQL Server (or the other services) with the elevated permissions that SQL Server grants or vice versa by the permissions that other applications may have granted to network service. Again, I'm not against network service BUT I would check your local permissions to see if there's anything that jumps out at you. If you've installed other applications/services/etc. you may have already compromised the security of the network service account. I would love to know if anyone has a quick/easy way to check Windows permissions to see what may have been granted in addition to the default permissions OR even a link to where the defaults are listed online... I've had trouble doing exactly this when searching, etc... feel free to post links/comments in your comments!
Anyway, with a dedicated user account, you can make sure that it's not compromised because you only use it for SQL Server. But, even these have negative issues - like required passwords that networks invalidate after n days and that you must change on a server/service basis. From a management perspective, this can be difficult.

SIDE NOTE: Managing the service account password is a lot easier in SQL Server 2005 with the SQL Server Configuration Manager (SQL-CM). The SQL-CM allows you to change the password to a service without an active connection (meaning even if the service isn't started) and it invalidates the login token so that password changes don't require a restart of the service. SQL-CM also has a command-line interface and is scriptable with WMI. The WMI Provider allows server settings, client and server network protocols, and aliases to be scripted through the WMI Provider by means of simple VBScript code (or by using the command-line tool). What you could end up doing is creating a script that changes the password of your services on all of your servers (for example, when a password policy is enforced that requires that the passwords of service accounts be changed). I've recently completed a whitepaper that highlights the Management tools (it's really just an overview but even then it turned out to be quite large as we looked at the tools in many different ways). I'll certainly let you know when the whitepaper is published (which should be within the next couple of weeks).

Granting the permission "Perform Volume Maintenance Tasks"
To use instant initialization, your SQL Server service must be running with an account that has the required privilege. If your SQL Server service is running as a local administrator this permission already exists. For a service account which is not a local administrator (again, recommended!), the necessary privilege to grant is Perform Volume Maintenance Tasks. This permission can be granted by an administrator through the Local Security Policy tool (Start, All Programs, Administrative Tools) and once granted, SQL Server automatically uses instant initialization. IMPORTANT NOTE: If this permission is given while SQL Server is running, then SQL Server must be stopped and restarted. However, once the server is restarted, no other syntax or permissions are needed. All data file creation/extension options will use instant initialization for data files created on NTFS volumes when SQL Server 2005 is running on Windows XP or Windows 2003 Server.
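
One way to check whether the instance is actually using instant initialization is to have SQL Server log its zeroing activity. The sketch below relies on trace flags 3004 and 3605 and on xp_readerrorlog, all of which are undocumented, so treat this as an assumption to verify on a test system rather than something to run in production. The database name is hypothetical.

```sql
-- Undocumented trace flags: 3004 reports file zeroing operations,
-- 3605 sends that output to the SQL Server error log. Test systems only.
DBCC TRACEON (3004, 3605, -1)
go

-- Hypothetical throwaway database using the default file locations and sizes.
CREATE DATABASE TestInstantInit
go

-- Undocumented procedure: look for the "Zeroing" messages near the end of the log.
EXEC xp_readerrorlog
go

DBCC TRACEOFF (3004, 3605, -1)
go

DROP DATABASE TestInstantInit
go
```

If zeroing messages appear only for the log file (.ldf), the data file was instantly initialized. If they appear for the data file (.mdf) as well, the service account does not have the permission (or SQL Server has not been restarted since it was granted).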

Why isn't this on by default?
OK, so after all of this... the gains that you see and the lack of changes to syntax, etc. You're probably wondering why this isn't on by default? It's a security issue. The biggest vulnerability is with SQL Server Administrators who are NOT also Windows Administrators. Windows Administrators have access to local files and can easily see all files stored on the local server. For files that are not encrypted (and are not already open to another process), an Administrator can open and/or modify these files using an appropriate editor. For files that are encrypted, an Administrator can at least view the encrypted information using a hex editor. By granting “Perform Volume Maintenance Tasks” to a SQL Server instance, you are giving administrators of the instance the ability to read the encrypted contents of a recently deleted file (ONLY IF the file system decides to use this newly freed space on the creation of a new database - created with instant initialization) with the undocumented DBCC PAGE command.

SIDE NOTE: The format for DBCC PAGE is undocumented in the Books Online but you will find tips and tricks on many “official” Microsoft blogs. The SQL Server Storage Engine blog (http://blogs.msdn.com/sqlserverstorageengine/) has some very good blog posts on internals and often describes undocumented commands. Specifically, check out the blog entry titled: How to use DBCC PAGE. The first three components are fairly straightforward: database id (or name), file id, and page id. The fourth component is the tricky one: printopt (or print option). The print options for DBCC PAGE are as follows (taken almost verbatim from the SQL Server Storage Engine blog - and Paul said I could :):

0 - print just the page header
1 - page header plus per-row hex dumps and a dump of the page slot array (unless it’s a page that doesn't have one, like allocation bitmaps)
2 - page header plus whole page hex dump
3 - page header plus detailed per-row interpretation (in this case, this option is not available: the data you are trying to read is from a previously deleted database, so the metadata is not accessible, only the raw page data)
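
As a quick illustration of the syntax, here is a minimal sketch that uses the TestSecurityExposure database from the lab described later in this post. By default DBCC PAGE output is not returned to your session, so trace flag 3604 is turned on first.

```sql
-- Send DBCC output to the client session instead of the error log.
DBCC TRACEON (3604)
go

-- Arguments: database name (or id), file id, page id, print option.
DBCC PAGE ('TestSecurityExposure', 1, 200, 0)   -- page header only
DBCC PAGE ('TestSecurityExposure', 1, 200, 2)   -- page header plus whole page hex dump
go
```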

Bear in mind, even if you can access this [encrypted] information, making sense out of this data will be challenging if not incredibly difficult.

In production environments, database files should NOT be located on file server drives - especially those where restricted and/or sensitive files are stored. As a result of prudent security measures, the true impact of using instant initialization is low. However, because this vulnerability exists, this feature is off by default.

Want to try this?
I've written a lab on Instant Initialization AND it has an interesting sequence of exercises (using multiple instances):

  • You create a database on instance: SQLDev01
  • Populate a large chunk of pages with very contrived (and easy to find) "junk" data

USE TestWithZeroInitialization
go

CREATE TABLE JunkData
(
   JunkDataID int identity,
   JunkDataValue char(8000)
      DEFAULT REPLICATE('Junk', 2000)
)
go

SET NOCOUNT ON
go

DECLARE @Counter int
SELECT @Counter = 0
WHILE @Counter < 20000
BEGIN
   INSERT JunkData DEFAULT VALUES
   SELECT @Counter = @Counter + 1
END
go

  • Drop the database from instance: SQLDev01

  • Create a new database on a different instance: SQLDev02 (which has been configured to use Instant Initialization) and hope that it uses the freed space by having dropped the first database
  • Start walking various pages (using DBCC PAGE) to see if you can view the "junk" data from the dropped database.

DBCC PAGE ('TestSecurityExposure', 1, 200, 2)
DBCC PAGE ('TestSecurityExposure', 1, 400, 2)
DBCC PAGE ('TestSecurityExposure', 1, 600, 2)
DBCC PAGE ('TestSecurityExposure', 1, 800, 2)
DBCC PAGE ('TestSecurityExposure', 1, 1000, 2)
DBCC PAGE ('TestSecurityExposure', 1, 1500, 2)
DBCC PAGE ('TestSecurityExposure', 1, 2000, 2)
DBCC PAGE ('TestSecurityExposure', 1, 2500, 2)
DBCC PAGE ('TestSecurityExposure', 1, 3000, 2)
DBCC PAGE ('TestSecurityExposure', 1, 3500, 2)
DBCC PAGE ('TestSecurityExposure', 1, 4000, 2)
DBCC PAGE ('TestSecurityExposure', 1, 4500, 2)

 

In most cases the very first output (for page 200) will return Junk data. If this is not the case, simply drop the TestSecurityExposure database and recreate it again. Sometimes it’s a timing issue and sometimes it could be a background process (like Windows Update) that uses the expected pages. Regardless, if you do get the same pages again - our contrived data should be easy to find.
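
If you do need to retry, the drop/recreate step on SQLDev02 might be sketched like this. The file names, paths and sizes are my assumptions for illustration; the only real requirement is that the new data file lands on the same volume that held the dropped database's files.

```sql
-- Run on SQLDev02. File names, paths and sizes are hypothetical.
DROP DATABASE TestSecurityExposure
go

CREATE DATABASE TestSecurityExposure
ON PRIMARY
   (NAME = N'TestSecurityExposureData',
    FILENAME = N'D:\Data\TestSecurityExposureData.mdf',
    SIZE = 200MB)
LOG ON
   (NAME = N'TestSecurityExposureLog',
    FILENAME = N'D:\Data\TestSecurityExposureLog.ldf',
    SIZE = 10MB)
go

-- Return DBCC output to this session, then look at the first page again.
DBCC TRACEON (3604)
go
DBCC PAGE ('TestSecurityExposure', 1, 200, 2)
go
```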

You can certainly create the environment on your own and see if you can get it to work. OR, you can get a copy of our AlwaysOn DVD that has the appropriate lab environment. I tend to give away the AlwaysOn DVD at events I speak at (on Availability/Disaster Recovery) but I'm also happy to send a few out over snail mail (it needs 2GB of memory and 10GB of disk space for the virtual environment AND you need to have Virtual PC or Virtual Server installed - which are freely downloadable from Microsoft). Paul asked for DBCC CHECKDB information here (to get a free DVD sent to you) and I'm going to ask for instant initialization numbers and how this has helped. Post a comment here and then send me an email with your snail mail address information as well. I'm willing to do this for the first 10 responses... go!

Have fun...and thanks for reading!
kt

PS - For those of you in our UK (London) Event tomorrow, we're giving away the AlwaysOn DVD (and an even cooler SQLskills pen... lol). Now there's motivation if the content doesn't (NOT!).

OK - I feel like I know a fair amount about SQL Server but sometimes I also feel like I don't :) :) I'm continuously amazed at how big a product SQL Server is... today was one of those days when I felt "I don't"!

I've been wanting to know more and more about the new features in the tools and the direction in which the tools are headed so... I set up a meeting with Paul Mestemaker (it helps that I live in Redmond and I'm working on some SP2 resources for the team :); it was a great meeting. Some exciting new features and some great new directions in which the tools are headed. I like the way they're thinking and I especially like the options that are now in place to discover, use and customize the "reports" feature within Management Studio (just to name one!). What I learned (that was the highest on my "I didn't know that" list) is that quite a few gems are released as part of the Feature Pack for SQL Server. The Feature Pack "is a collection of standalone install packages that provide additional value for SQL Server 2005. It has been updated for SP2." From that description, it doesn't leap out at me as exciting AND I often know about many of these tools through other channels - but usually it's just "a tool at a time". The thing that's nice about the FP page is that it seems to be a nice and central, single location for ALL of the "add-ons" for SQL Server. It includes things like the Upgrade Advisor (which I typically point people to individually on its main page) and will include (once it ships) things like the BPA (SQL Server 2005 Best Practices Analyzer) BUT it also includes things like a standalone download for SQLCMD so that you don't need to install all of the tools if you just want this lightweight client for automation. Additionally, it includes the SQL Server 2000 DTS Designer Components if you want to edit/modify/maintain DTS packages in 2005 before you rewrite/convert them to SSIS.

So - the point is that there's lots of great stuff out there and sometimes it just takes another person, a blog entry, or a few minutes hitting a company site to see what's new. I'd strongly suggest that you and your team pick a morning - maybe once or twice a month (and round robin who brings the coffee/doughnuts :)) to just browse around and see if there's anything new on your hardware, software and other supporting sites - especially those that don't already have a blog/rss feed (or other form for notifications). No one needs to know everything but knowing where to look can really make all the difference when you do need something (or when you have a concern/problem). And browsing a few sites (occasionally) might make the difference in applying a hotfix/patch before something becomes a big problem. Staying current with hotfixes, service packs, bios updates, firmware updates, etc. is difficult so make it a team effort.

Speaking of service packs, here's the primary page for SQL Server 2005 SP2: http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/servicepacks/sp2.mspx
And don't forget the Books Online Update as it is NOT installed when you update an instance to SP2: http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx

Also, be sure to watch the SQL Server Manageability Blog (aka Paul Mestemaker's blog) moving forward as he'll have the first news about BPA and many other tips/tricks with regard to the tools.

Finally, (and this is great timing too), Paul Randal - prolific author at the SQL Server Storage Engine Blog - blogged about all of the active "SQL Server Product Team"-related blogs here.

And........ if that doesn't keep you busy, I'm not sure what will! :)

Instant Initialization technical details are next and then I'll get back to the Clustered Index Debate. Thanks for reading!
kt

Enterprise only.

OK - I really need to blog more and well - I'm starting today by blogging a "quickie" blog entry on something that I just learned recently and that most of us (who speak/write/whatever on SQL Server) have been saying incorrectly...even marketing :). What's been said is that the new SQL Server 2005 feature "Instant Initialization" is an Enterprise Only feature (remember that Enterprise Only includes ALL SKUs that have the enterprise engine (EE) - the EE is in Enterprise Edition, Enterprise Eval and Developer Edition). Well... that's not the case. And - personally, I never tried it on Standard Edition as most of my customers are enterprise customers OR we're doing development work on the Developer Edition. All I have to say here is COOL! Now - I'll post another entry (shortly) that tells you all about Instant Initialization as I think it's a very simple and important feature to allow (no, it's not necessarily on by default - this is part of why I need another blog entry).

As for upcoming events - there are 3 in March.

March 6 - Reading, UK
   One day workshop on Crucial Database Maintenance Techniques hosted by Tony Rogerson of sqlserverfaq.com.

March 8-10 - Lalandia, Denmark
   SQL Server OpenWorld hosted by Miracle Denmark.

March 25-29 - Orlando, Florida - USA
   SQLConnections hosted by Penton Publishing and SQL Server Magazine.

Many of the above events are focusing on HA/DR and Database Maintenance and are copresented with Paul Randal. He blogged about these events here.

And - I'll be back soon! I promise!! (notice that the time between blogging is decreasing - in general :)
kt

Categories:
Events | SQL Server 2005

Well, I've promised to blog more and I'm really going to try to do so. This morning I got the perfect question/comment (in email) to respond to and after working through a response that was taking me upwards of 3 hours (you'll learn later why I have 3 "spare" hours :)......... I figured that it was time to turn the response into a blog post. ;)

Background: The Clustered Index Debate
In the years since the storage engine was re-architected (SQL Server 7.0+) there's been constant debate on how to appropriately choose the clustered index for your tables. I've generally recommended an ever-increasing key to use as a clustered index and many find that counterintuitive. The primary reason people feel it's counterintuitive is that it creates a hotspot of activity. [If "hotspot" is not a familiar term - a hotspot is solely an active place within your table.] Hotspots were something that we greatly tried to avoid PRIOR to SQL Server 7.0 because of page level locking (and this is where the term hot spot became a negative term). In fact, it doesn't have to be a negative term. Since the storage engine was rearchitected/redesigned (in SQL Server 7.0) and now includes true row level locking, this motivation (to avoid hotspots) is no longer there. In fact (and probably even more counterintuitive), the opposite is true. Hotspots (specifically hot PAGES not hot ROWS) can be very beneficial because they: minimize the number of pages needed in cache, improve the likelihood of the required page already being in cache and, in general, minimize the overall amount of cache required. So, this is why many of us have changed our recommendation on where to create the clustering key in 7.0+. Instead of focusing on range queries we now focus on placing the clustering key on an ever-increasing key. In earlier releases, focusing on range queries for the clustered index reduced hotspots for insert/update and this in fact was the PRIMARY motivation to choose them, NOT range query performance! But - there are even MORE reasons to choose an ever-increasing key and they are based on internals as well. These internals are based on the significant changes made in the storage engine for 7.0+. For a quick start on these, I went through them in the blog entry here.

And, today's email is not uncommon. This is the basis for the title clustered index debate. In general, there are still a lot of questions related to creating clustered indexes to improve "range query" performance. Don't get me wrong, there's definitely a benefit in performance for some range queries but the first thing to remember is that you get only one CL index per table (therefore only one type of range query can benefit). In the real world, it's not likely that you want to see your data exactly in the same way all the time. Therefore it's very challenging to come up with the "right" clustered index if you're using range queries as your strategy. Even worse, the effect of choosing the clustering key to improve range queries is that it causes problems for modifications against that table (INSERTs/DELETEs and UPDATEs). So.............. this is what started my day today. A great email from a reader that brought up these points. The question/comment (modified to hit only the highlights and to protect their identity :) was this:

The most important characteristic for a Clustered Index key is to satisfy range queries. More often than not, if a sufficient range of data will be scanned, the Optimizer will choose the Clustered Index over all others due to the excessive cost of Bookmark Lookup operations. As such, the table KEY is a more suitable clustered index candidate than any surrogate (few ever query a database by range of surrogate keys). [kt note: this second sentence is not entirely true... SQL Server will certainly choose a clustered index over non-clustered indexes that require bookmark lookups but there are A LOT of algorithms that SQL Server can use instead of either of these and my examples later show this... non-clustered covering seekable indexes, non-clustered scanable indexes, index-intersection, etc.]

Now, when the default behavior for SQL Server was designed such that the PRIMARY KEY was chosen as the default clustered index, it was exactly for this reason.  It is the business key.  It would satisfy uniqueness (by definition of logical KEY).  And, it is well suited for a wide variety of range scans.  However, this is when the PRIMARY KEY is defined on the Business Key of the data.

But, when you introduce the usage of surrogate keys (i.e., IDENTITY) as a physical implementation, and thus transfer the PRIMARY KEY definition to it, two things must be considered.  First, the Business Key this IDENTITY will be a proxy for must still exist as it is still a part of the logical design.  As part of the physical design, the logical key needs to be implemented as a physical constraint to maintain logical uniqueness.  Second, just because a proxy has been defined does not make it a natural candidate for the clustered index.  The business key still maintains this distinction.

What is often cited as the “reason” for IDENTITY PRIMARY KEY clustered index definitions is its monotonic nature, thus minimizing page splits.  However, I argue that this is the only “reason” for defining the clustered index as such, and is the poorest reason in the list.  Page Splits are managed by proper FILLFACTOR not increasing INSERTS.  Range Scans are the most important “reason” when evaluating clustered index key definitions and IDENTITies do not solve this problem.

Moreover, although clustering the IDENTITY surrogate key will minimize page splits and logical fragmentation due to its monotonic nature, it will not reduce EXTENT FRAGMENTATION, which can cause just as problematic query performance as page splitting.

In short, the argument runs shallow.

Luckily, this email arrived with perfect timing for me as I'm sitting in a "bootcamp" event on Always On technologies and I'm not speaking this morning (my colleague Bob Beauchemin is doing lectures on Scale Out technologies: Scalable Shared Databases, Service Broker, DPVs, etc.). Anyway, in addition to listening to Bob, I've decided to continue the blog series on "the clustered index debate". The first and most important point to stress is that minimizing page splits is NOT the only reason nor is it the most important. In fact, the most important factors in choosing a clustered index are that it's unique, narrow and static (ever-increasing has other benefits in addition to minimizing splits).
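
To make the "unique, narrow and static" idea concrete, here is a minimal sketch of the kind of table design I'm describing. The table and column names are hypothetical. The surrogate IDENTITY column carries the clustered primary key, while the business key (as the email rightly insists) is still enforced with its own constraint:

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE dbo.Customer
(
   CustomerID   int          NOT NULL IDENTITY(1,1),  -- unique, narrow, static, ever-increasing
   CustomerCode char(10)     NOT NULL,                -- the business key
   CustomerName varchar(100) NOT NULL,
   CONSTRAINT CustomerPK
      PRIMARY KEY CLUSTERED (CustomerID),
   CONSTRAINT CustomerCodeUK
      UNIQUE NONCLUSTERED (CustomerCode)              -- still enforces logical uniqueness
)
go
```

Range queries against the business key are then tuned with non-clustered indexes rather than by moving the clustering key.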

The Clustered Index Debate Continued
First, there are many angles to look at with respect to "the clustered index debate" and it's not until all of the issues are reviewed that this strategy (a monotonically increasing key) becomes obvious. So, I think it will probably take a couple of blog posts to really prove this. I'll start up this debate again here...... When you look at a general purpose table (which is most) where the table has ALL DML (S/I/D/U) then you are best off with an ever-increasing key (again, you have to look at the overall impact of all operations against the table - not just select... because I/D/U will also impact select in the long term). So, I'll break this down into each DML operation here. If you don't look at the overall impact, then large tables can end up having a tremendous number of problems once they're put into production. I've certainly heard this concern/debate before (and most people are skeptical at first glance) but when you look at the situation overall, you'll find that "finding the right balance" includes not just looking at range queries. In fact, here's a quick list of the things/tests/numbers/scenarios that help to prove my strategy:

  • Inserts are faster in a clustered table (but only in the "right" clustered table) than in a heap. The primary problem here is that lookups in the IAM/PFS to determine the insert location in a heap are slower than in a clustered table (where the insert location is known, defined by the clustered key). Inserts are faster when inserted into a table where order is defined (CL) and where that order is ever-increasing. I have some simple numbers but I'm thinking about creating a much larger/complex scenario and publishing those. Simple/quick tests on a laptop are not always as "exciting". But - this is a well documented issue (IAM/PFS lookups) and poor performance on a heap is also referenced in this KB: PRB: Poor Performance on a Heap. Note: this KB is quite dated and I don't actually agree with everything in this article; however, the general concern of poor performance for inserts is still true on SQL Server 2005.
  • Updates are often faster (when the row needs to be relocated) and for the same reason (IAM/PFS lookups) BUT there are many types of updates and not all updates cause records to be relocated. Here are a few things to think about with respect to updates:
    • Updates that are completely in-place (some examples are where the update is updating a fixed-width column OR to variable-width columns where the row size doesn't change, etc.). These types of updates don't really care.
    • Updates that cause record relocation (where the row size changes) are definitely better by having a clustering key because the record relocation (which will be handled by a split) is defined by the clustering key
    • Updates to the clustering key are the WORST (in this case) which is one of the key reasons for having a cl key that is static (so we have to keep this in mind when we choose a clustering key).
  • Deletes aren't nearly as big of a concern BUT deletes in heaps create more gaps and more gaps create more work in PFS/IAM lookups and while this helps to reduce wasted space, it still requires the time to find the space........ hence the slowed performance of Inserts/Updates. I've also written some blog entries that cover very interesting test cases for large scale deletes and why you'd want to consider partitioning to optimize for the "sliding window scenario" in this blog entry: MSDN Webcast Q&A: Index Defrag Best Practices - Fragmentation, Deletes and the “Sliding Window” Scenario (and it's the LAST one!).
  • Selects.............. now this is the hardest one to go through in just a couple of bullets (ah, I guess this will lead to another one or two posts :) BUT I'll start by saying that the best way to tune the vast majority of range queries is through non-clustered [covering] indexes. But, it's also important for me to stress that I do NOT advocate covering every query (it's impossible to do). What's important to realize in terms of covering is that SQL Server 7.0 and up continues to include internal algorithms to improve performance when you don't have the "perfect" non-clustered covering seekable index and instead still gives better performance than going to the base table (or performing bookmark lookups - as mentioned in the mail...and I completely agree that these [bookmark lookups] can be evil!). To start this discussion, I'll give one of my favorite examples of a large-scale aggregate. The absolute best way to improve the performance is through an indexed view but the data can be gathered through many other algorithms - ideally through a non-clustered covering index that is in order by the group by and that includes the column(s) being aggregated. For example, take this query:

SELECT c.member_no AS MemberNo,
 sum(c.charge_amt) AS TotalSales
FROM dbo.charge AS c
GROUP BY c.member_no

On a charge table of 1.6 million rows here are the performance numbers to handle this aggregation:

  • Clustered table scan (CL PK on Charge_no) with a hash aggregate = 2.813 seconds
  • Index scan (non-clustered covering but NOT in order of the group by) with a hash aggregate = 1.436 seconds
  • Index scan (non-clustered covering in order of the group by) with a hash aggregate = .966 seconds
  • Indexed view = .406 seconds
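
For reference, the covering index "in order of the group by" and the indexed view from the list above could be defined roughly like this. It's only a sketch: the index and view names are made up, and it assumes charge_amt is declared NOT NULL (SQL Server won't index a view that takes SUM over a nullable expression) and that the session has the SET options that indexed views require.

```sql
-- Hypothetical object names; assumes dbo.charge(member_no, charge_amt) as in the query above.

-- Non-clustered covering index in the order of the GROUP BY:
-- member_no leads the key and charge_amt is carried in the index,
-- so the aggregate never has to touch the base table.
CREATE NONCLUSTERED INDEX ChargeMemberNoCoveringIX
    ON dbo.charge (member_no, charge_amt);
GO

-- Indexed view: the aggregate is materialized and then maintained
-- on every insert/update/delete against dbo.charge.
CREATE VIEW dbo.ChargeTotalsByMember
WITH SCHEMABINDING
AS
SELECT c.member_no       AS MemberNo,
       SUM(c.charge_amt) AS TotalSales,
       COUNT_BIG(*)      AS NumCharges   -- required when an indexed view uses GROUP BY
FROM dbo.charge AS c
GROUP BY c.member_no;
GO

CREATE UNIQUE CLUSTERED INDEX ChargeTotalsByMemberCL
    ON dbo.ChargeTotalsByMember (MemberNo);
GO
```

The maintenance cost on every modification is the reason the indexed view only makes sense for read-focused workloads.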

Now this was a pretty small table (narrow rows and only 1.6 million rows) AND I didn't have any concurrent activity. Concurrent activity would have made this even slower for hash aggregates, etc. Regardless, it proves the point (at least generally). Now, if I wanted to improve this range query then I'd have to cluster on the member_no column (and this is an ideal example because I often hear people say that clustering on a foreign key column helps to improve range/join queries - which can be true as well)... But - this strategy has a few problems in addition to a few benefits (and we have to look at everything to be sure of our choice/decision). First, member_no is not unique (in the charge table) so SQL Server has to "uniquify" the rows. The process of "uniquification" impacts both time (on insert) and space (the rows will be wider to store each duplicate row's uniquifier). Also, theoretically the key value could change (in this case that's not true). Anyway, the time it takes with the table clustered on member_no is 2.406 seconds, which is better than the table clustered on the PK (of course), but if I were to also start modifying the rows (which creates splits) or even just insert 15% more rows... then my table would become fragmented. At that point, the query performance should get worse in the table clustered by member_no and it will get even worse in the table clustered by charge_no (because of the worktable created in tempdb by the hash aggregate) BUT it won't be all that much worse in the non-clustered index examples (especially the covering index that's in the order of the group by - because this doesn't require a worktable):

  • CL on member_no = 4.906 seconds
  • CL on charge_no = 6.173 seconds
  • Index scan (non-clustered covering but NOT in order of the group by) with a hash aggregate = 3.906 seconds
  • Index scan (non-clustered covering in order of the group by) with a hash aggregate = 1.250 seconds
  • Indexed view = .516 seconds
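
If you want to see how fragmented each structure has become before re-running a test like this, a query along these lines is one way to look (a sketch, assuming SQL Server 2005 or later and the dbo.charge table used above):

```sql
-- Reports logical fragmentation for every index (and the heap, if any) on dbo.charge.
-- 'LIMITED' is the cheapest scan mode; it reads only the upper levels of each index.
SELECT i.name                          AS IndexName,   -- NULL for a heap
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats
        (DB_ID(), OBJECT_ID(N'dbo.charge'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```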

This is a great start to furthering the clustered index debate but I do have to admit that it's a counterintuitive and difficult issue to tackle because isolated tests often lead you to different conclusions. In this case though, the non-clustered indexes are better for this range query and the indexed view is the best (but I wouldn't consider the indexed view unless this were more of a read-focused database rather than read/write). [and - of course, that statement warrants yet another blog post :)]

So, depending on the tests that you do - especially if you focus only on selects and you don't have modifications (i.e. fragmentation) - the results can make "creating the clustered index for range queries" appear to be best. Again, I'm not just saying this to prevent fragmentation, I'm saying this because I wouldn't use the clustered index OR a non-clustered index with bookmark lookups to handle this query. I'd consider a non-clustered covering index that's seekable OR even a non-clustered covering index that's scannable before I'd even choose the clustered index (and that's what the optimizer would prefer as well). In the end it's really a bit of an art and a science to "finding the right balance" of indexing.

Oh - and if you arbitrarily add a column to use for clustering (maybe not as the primary key) that can help but many would prefer to use actual data... which means [potentially] creating your primary key with a new identity [or similar] column and this can impact your business logic (absolutely). I'm certain that certain tests can show that range queries are faster and it's absolutely correct that business application/usage can be a concern but when you look at the big picture (and the impact on I/D/U) then the benefits of the monotonically increasing key significantly outweigh these concerns. Simply put, a small/narrow key can help join performance and an ever increasing key can also help lookups for rows! (yes, definitely more coming)
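
To make that concrete, here's a minimal sketch of the pattern: cluster on a narrow, ever-increasing identity and leave the range query to a non-clustered covering index. The table, constraint and index names are hypothetical; the columns are loosely modeled on the charge table above.

```sql
-- Hypothetical table; only the shape of the keys matters here.
CREATE TABLE dbo.ChargeSketch
(
    charge_no   int IDENTITY(1,1) NOT NULL,   -- narrow, unique, ever-increasing
    member_no   int      NOT NULL,
    charge_dt   datetime NOT NULL,
    charge_amt  money    NOT NULL,
    CONSTRAINT ChargeSketchPK PRIMARY KEY CLUSTERED (charge_no)
);
GO

-- Range/aggregate queries by member are served by a non-clustered covering index
-- instead of clustering on the non-unique member_no column.
CREATE NONCLUSTERED INDEX ChargeSketchMemberIX
    ON dbo.ChargeSketch (member_no, charge_amt);
GO
```

New rows always land at the end of the clustered index, and every non-clustered index carries only the 4-byte charge_no as its lookup value.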

Happy Friday! Have a great weekend. I'll try to continue more threads on this debate shortly!
kt

Categories:
Indexes | Opinions | SQL Server 2005 | Tips

Ok - a strange title indeed but it's been a strange couple of months. It all started with a "much needed" vacation and I thought that would get me through the persistent "cold" that I was having all through my travels. Anyway, vacation didn't help and I came back to find that what I had was a sinus infection. January has been filled with antibiotics, sleep and well... still a lingering cough even though the month is over today. The long story short is that I'm starting to come out of it and I promise to start blogging a lot more frequently starting now. In fact, I have 3 or 4 entries in the queue that I'm plotting for upcoming posts.

To get you started - there are some great and NEW resources that were posted just this week by some of my SQLskills colleagues. If you read their blogs then you've probably already seen this but for completeness, I'm going to post them here:

Bob Beauchemin's Blog Entry: http://www.sqlskills.com/blogs/bobb/2007/01/30/TheFirstOfMyScaleoutWhitepapersIsAvailable.aspx
Bob's Whitepaper: Planning, Implementing, and Administering Scaleout Solutions with SQL Server 2005

Liz Vitt's Blog Entry: http://www.sqlskills.com/blogs/liz/2007/01/30/AnalysisServicesPerformanceTuningWhitepaperHasArrived.aspx  
Liz's Whitepaper: Analysis Services 2005 Performance Guide

Ok - there's my first post of 2007 and NOT my last. More to come. Thanks for reading and HAPPY NEW YEAR (at least I got that in January :),
Kimberly

OK, it's been a heck of a long time since I blogged... and for that I apologize. I'm also WAY overdue in posting my demo scripts from a TON of conferences BUT... now everything has been posted. Check out the past events page on SQLskills and you can find the demo scripts that you're looking for... lots of fun stuff and TONS of scripts to play with and test.

Now - as for the reason(s).... many are business and for that I blame the following (yes, 17 flights [yes, one boarding pass is missing] over ONE 5 week trip with 7 events and 5 continent changes):

The other reason(s) are personal...suffice it to say that the last 6 months have been some life changing times for me and what I'm finding (or trying to find) is that ever important balance between work and life. During this holiday season (and always), I wish you and your loved ones well and I hope that you too can find (and cherish) what's most important to you.

So, you won't see anything else from me for this year but I do hope to be better (and more frequent) with blogging in the New Year and I especially hope to see you again at an upcoming conference. Let me leave you with the most exciting picture I've witnessed this year... it was during my one day of sightseeing in Cape Town - where I went cage diving with Great White Sharks (and got horribly seasick - which is rare for me) but where I was able to witness these amazing and powerful creatures...

Have a happy and safe holiday season!
kt

Hey there everyone - Been a LONG time since I last blogged (sorry!)... key reason (fyi) is that I'm trying to find the ever-challenging work/life balance during the best months of the year (here in Seattle July/Aug are GREAT! months - September is almost always good too). Anyway, it's been a few weeks and I thought I'd catch you up... It all started with some travel (of course) and I was in Chicago for a SQLskills Immersion event and then off to London for another event with SQL Server FAQ (aka Tony Rogerson). I returned during the CRAZY travel restrictions and had to check two laptops (sigh) BUT they both made it back without damage after I purchased kitchen towels, bed sheets, a blanket and a duvet (and two new [cheap] suitcases) in which to pack them. When I got back, I relaxed! I've been up to see the Athabasca Glacier and Johnston Canyon and just this week I took off a day to do the "Lotus Experience" Advanced Driving Course at Pacific Raceways/ProFormance Racing School. I've always LOVED driving so driving 700 miles to Johnston Canyon didn't bother me at all (especially with great company and fantastic tunes - have you heard the latest Snow Patrol?) and the driving course in the Lotus was not only great fun but VERY informational. I've taken the Advanced Driving course before and I think EVERYONE should consider continuing their education in driving - things like Collision Avoidance, Advanced Braking (understanding proper driving under ABS), "high eyes" and so many other things...just make you safer on the road. OK, enough about all that fun/practical stuff.....let's get back to SQL.

Some great new resources are out:

And - it's now time to start enjoying Labor Day weekend, there ain't no labor going on here this weekend. Enjoy - and check out those links next week. ;-)

Cheers everyone,
kt

Been thinking a lot about something that was mentioned in a few of my most recent posts... Especially when I get comments like "that's another item to add to our checklist" or "that's a good trick to add to our arsenal" and well, I thought in this blog entry I'd ask for your tricks that fall under the umbrella of designing for performance.

For example - do you change collations? I had a recommendation here.
For example - do you have a view that you want ordered? I had a recommendation (with caution) here. But - Adam Machanic came back and said that he's used that trick to improve performance... and, I'm sure that's the case as well!
For example - do you have stored procedure parameters that are giving you grief? I had a series of recommendations in my Optimizing Procedural Code category here.

In fact, sometimes the best form of "hint" to SQL Server is NOT an optimizer hint but instead a more subtle change to the join (derived tables for example) or the infamous subquery -> join rewrite or the join -> subquery rewrite. I'm always asked "which is better - a subquery or a join" and I always answer YES. ;-)  OR taking a complex process and breaking it down into temp tables (I'd try to create views instead of temp tables first and see if the optimizer figures it out but there are cases when sometimes they just don't). Remember, it's not the optimizer's job to find the absolutely BEST plan; it's their job to find a good plan fast. And - they typically do. Really, no general "tricks" work ALL of the time and often they don't help at all but there are LOTS of things that I'm sure you've done and you really want to tell someone about it. How about here? I'm going to try to compile these tips/tricks into a best of...
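
As a small illustration of the subquery <-> join rewrite, here are two ways to ask the same question. This is only a sketch and assumes the Northwind sample database; which form wins depends on the data and the indexes, so compare the plans rather than trusting either one blindly.

```sql
-- Assumes Northwind (dbo.Customers, dbo.Orders).

-- Subquery form: customers that have placed at least one order.
SELECT c.CustomerID, c.CompanyName
FROM dbo.Customers AS c
WHERE EXISTS (SELECT *
              FROM dbo.Orders AS o
              WHERE o.CustomerID = c.CustomerID);

-- Join form: same rows, but DISTINCT is needed because the join
-- repeats each customer once per order.
SELECT DISTINCT c.CustomerID, c.CompanyName
FROM dbo.Customers AS c
JOIN dbo.Orders AS o
  ON o.CustomerID = c.CustomerID;
```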

Categories:
Design | SQL Server 2005 | Tips

Hey there everyone - If you're into Analysis Services, Integration Services, Reporting Services and BI in general - you'll want to check out Elizabeth Vitt's new blog. Liz has been specializing in BI since SQL Server started adding BI-centric components. She's got a wealth of information to share and many great insights into performance tuning as she's working on a BI Performance Tuning resource that will probably hit 100 pages (from current guesstimates).

And - no surprise from Liz, she's out the gates running with her first entry on Influencing Aggregation Candidates.

Subscribe now!

And a big welcome to blogging for Liz!
kt

Categories:
Resources | SQL Server 2005 | Tips

Well, it's been a GREAT week here in Switzerland while working with my partner Trivadis. Today, we wrapped up a two-day course on Designing for Performance (in Geneva) while on Monday/Tuesday we did a two-day course on Indexing for Performance in Zurich. The food, the wine, the cheese, the butter, yum! Oh... and the questions/comments/technical focus, etc. has all been great. :) I'm flying home today (Sat) so wish me luck on having internet access at 36,000 feet again (probably not...I'm flying United instead of Lufthansa - and it's only Lufthansa that has FlyNet). Wow - can you imagine where we're going to be in only a couple more years? Internet access everywhere! (hmmm.. how do we escape? well, that's another blog entry for another day :)

Anyway, one of the great things about teaching is that I get to meet all sorts of people and work through all sorts of interesting problems... And - this blog entry is based on a discussion I had with [a very blogless ;-] Meinrad Weiss - a Trivadis employee/consultant AND a fellow RD. (I was bullied into blogging by CV so now I do my part and do the same to others)

I can't remember how it started but somehow a discussion started on Top 100 PERCENT being used in views. I mentioned that while this was a good trick in SQL Server 2000, it has been REMOVED from SQL Server 2005 (meaning that TOP 100 PERCENT does NOT order the data within a view). Theoretically, I agree with this decision - data within a view should NOT be ordered. A view should SOLELY define a tabular set. It is up to the query which is accessing the view to define the presentation of the view. Using TOP within a view should be limited to ONLY when it is used to further define the data set (i.e. TOP 10 PERCENT... ORDER BY TotalSales DESC makes perfect sense).
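
Here's a rough sketch of that distinction, assuming the Northwind sample database (the view names are hypothetical, not the ones from my demo script). The first view only defines a set and leaves ordering to the caller; the second uses TOP with ORDER BY to decide WHICH rows belong to the set.

```sql
-- Assumes Northwind; view names are made up for this sketch.

-- A view that only defines a tabular set.
CREATE VIEW dbo.OrderSubTotalsSketch
AS
SELECT od.OrderID,
       SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)) AS SubTotal
FROM dbo.[Order Details] AS od
GROUP BY od.OrderID;
GO

-- The query against the view decides the presentation order.
SELECT OrderID, SubTotal
FROM dbo.OrderSubTotalsSketch
ORDER BY SubTotal DESC;
GO

-- TOP used to define the data set: the ORDER BY picks the top 10 percent of orders.
-- It still does not promise the order in which a caller sees the rows.
CREATE VIEW dbo.TopTenPercentOrdersSketch
AS
SELECT TOP 10 PERCENT OrderID, SubTotal
FROM dbo.OrderSubTotalsSketch
ORDER BY SubTotal DESC;
GO
```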

Now, having said that - it was a cool trick - but with Pros/Cons. The obvious Pro is simplicity in access. While adding the ORDER BY to the query against the view really isn't all that difficult, it does make it a bit easier for quick/simple query access. BUT - there's a HUGE con here too. If the view starts getting used for other purposes (like in joins to other tables), then having the data ordered before the joins, etc. can cost you an additional step that is NOT necessary. As a result, performance is compromised.

Long story short, I generally recommended against it but it was still cool. So - then Meinrad started playing and came up with - what about 99.9 on a table that has < 100 rows OR 99.99% on a table that has < 1000 rows, etc. And - yes - that DOES work, because SQL Server rounds to 100%. So, you are back to getting 100% of your data, ORDERED within a view. But - you need to set your percentage to an appropriate percentage based on rows - but what if you don't know the row count? How about TOP n where n = the max value for a bigint (9,223,372,036,854,775,807)?? That should always work...and it does.
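
The max-bigint variation would look something like the following. Again, this is just a sketch against Northwind with a made-up view name, and it relies on the behavior we observed on SQL Server 2005 rather than on anything the product promises about ordering inside a view.

```sql
-- Assumes Northwind; the view name is hypothetical.
CREATE VIEW dbo.OrderSubTotalsOrderedSketch
AS
SELECT TOP (9223372036854775807)      -- max bigint, so no row is ever cut off
       od.OrderID,
       SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)) AS SubTotal
FROM dbo.[Order Details] AS od
GROUP BY od.OrderID
ORDER BY SubTotal DESC;
GO

-- No ORDER BY here: any ordering you see comes from the sort inside the view.
SELECT OrderID, SubTotal
FROM dbo.OrderSubTotalsOrderedSketch;
GO
```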

OK - so what's the point? Yes, we DO have a workaround for the removal of TOP 100 PERCENT in SQL Server 2005 - but be CAREFUL - you are potentially shooting yourself in the foot. If this view is NEVER used for anything but SELECT * FROM View, you're OK. If you start adding joins, etc. then you might get into trouble. In the two queries below - the data returned is EXACTLY the same.

SELECT C.ContactName, Sub.*
FROM OrderSubTotalsViewOrdered2 AS Sub
   JOIN Orders AS O ON O.orderid = Sub.Orderid
   JOIN Customers AS c ON c.customerid = o.customerid
WHERE C.City = 'Madrid'
ORDER BY SubTotal DESC
go

SELECT C.ContactName, Sub.*
FROM OrderSubTotalsViewNOTOrdered AS Sub
   JOIN Orders AS O ON O.orderid = Sub.Orderid
   JOIN Customers AS c ON c.customerid = o.customerid
WHERE C.City = 'Madrid'
ORDER BY SubTotal DESC
go

BUT - the first plan of execution queries against the ORDERED set and the second against the un-ordered. Check out the showplans below:

This is a VERY COMPELLING reason to BE CAREFUL ordering data within a view. While the trick does work, please use it sparingly.

If you want to play with the views created above, you'll need a copy of the Northwind Database and you'll need this script: Top 100 Percent in SQL Server 2005.sql (4.46 KB).

Have fun!
kt

PS - I'm adding my blog to Technorati so I need to post their link to start generating my profile... here we go: Technorati Profile

Categories:
SQL Server 2005 | Tips

So, I've now spent the last couple of hours playing with Database Mail and HTML formatted messages being sent to the SQLskills subscribers. It's been a fun learning experience as I think I've found a bug with the email account name length...let's just put it this way - don't be too descriptive with your account names.

Outside of that - it's amazing how well queue based email works. The old mapi based mail would take a LONG time to complete the batch mail processing but now - with queue based mail it's done in seconds.
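
For anyone who hasn't tried it yet, sending an HTML message through Database Mail looks roughly like this. It's a minimal sketch: it assumes Database Mail is already enabled and a profile exists, and the profile name and recipient are placeholders.

```sql
-- Placeholder profile and recipient; replace with your own.
EXEC msdb.dbo.sp_send_dbmail
     @profile_name = N'NewsletterProfile',
     @recipients   = N'subscriber@example.com',
     @subject      = N'Newsletter',
     @body         = N'<html><body><h1>Hello</h1><p>This message is <b>HTML</b> formatted.</p></body></html>',
     @body_format  = 'HTML';

-- The procedure returns as soon as the message is queued;
-- delivery status can be checked afterwards.
SELECT mailitem_id, recipients, subject, sent_status
FROM msdb.dbo.sysmail_allitems
ORDER BY mailitem_id DESC;
```

Because the call only places the message on a queue, a batch of sends finishes quickly and the actual delivery happens outside of your session.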

The best part is that I'm on a flight over the Atlantic right now...on my way to Frankfurt. I think this is the MOST productive flight I've ever had!

Have a great weekend,
kt

Categories:
Events | Opinions | SQL Server 2005

I've posted all of my demo content from TechEd 2006 and wow - it was a lot of fun! I created many new and fun demos as I tried to keep everyone awake through the sessions (cause it seems like there were way too many evening events - didn't it? ;-)). All of the content is posted here.

Finally, can I just say that Paul (and team) has been on a roll. They're blogging machines. If you're not reading this blog... you're DEFINITELY missing out.

Have a great weekend... I'm off to Switzerland today.
kt

Categories:
Events | Resources | SQL Server 2005

Well, if you're wondering why I've been so quiet this week... it's a myriad of events all coming together and/or being finalized right in time for TechEd. In working really hard (especially crazy was today) for some final TechEd content, I realized that a lot of people don't really know what goes on behind the scenes of some of these really huge events. Brian Marble has been blogging about this and you can learn some interesting things by checking out his blog. And for some fun statistics related to TechEd, here is an idea of the quantity of food and drink that will be consumed:

  • 1,250,000 pieces of "Mikes & Ikes" will be consumed over the course of the Tech Ed 2006 week
  • 18,750 pounds of salad will be prepared and offered at meals
  • 83,700 ice cream novelty/fruit and yogurt bars have been ordered for this function
  • The total amount of fruit ordered will fill 3/4 of a full-size tractor-trailer
  • 60,000 eggs will be eaten by attendees at breakfast (this is equal to 4,800 dozen cartons of eggs)
  • It will take 4 semis to transport the 150,000 bottles of water consumed
  • 1.6 million ounces of coffee will be poured and consumed (conservative estimate)
  • More than 50,000 pounds of carbohydrates will be consumed at Tech*Ed (Atkins who?)
  • 1,500 table cloths will be used and re-set on a daily basis (7,500 for the week)
  • A minimum of 2,000 antacid tablets are likely to be consumed at this event

As for the technical content, well that's not too shabby either. There are over 900 breakout sessions, chalk talks, ILLs (Instructor-led labs), HOLs (Hands-on labs) and general/keynote sessions. There's a lot of technology that comes together for a show like this and there's even a DVD that's available after the show with all of the breakout sessions on it. The key point is that there's a lot going on and I'd have a hard time believing that you couldn't find something to do during every timeslot (for me there are multiple time slots where I'm torn between delivering my own session and attending another...but, I have a feeling I know where I'll end up :).

One thing that you can do in almost every timeslot is an HOL (Hands-on Lab) and for SQL Server there are more than 10 of them. Each HOL is focused and technical and each covers a specific technology or topic. For TechEd 2006, I've written two of the HOLs: DAT007 and DAT010. Specifically, DAT007 is Database Mirroring in SQL Server 2005 SP1 and DAT010 is Table and Index Partitioning. The Database Mirroring lab covers everything from design to implementation to failover to monitoring, and the Partitioning lab goes from design to implementation to performance to the sliding window scenario. They were a lot of fun to write and I hope a lot of fun to go through. If you're interested in hearing more about them, Mark Penaroza did a couple of interviews about them. He blogged about it here and mentioned that the interviews are available on Commnet (the Microsoft TechEd attendee website). I've also posted the interviews here (DAT007 Interview (4MB mp3 file) and DAT010 Interview (2.75MB mp3 file)) so that you can get some insight into the things we're doing to help get you started and ready with these new technologies.

Finally, since TechEd is sold out, I know that not all of you will be there. As a result, there's "Virtual TechEd". Virtual TechEd is a site dedicated to getting some of the content and resources out to folks that just couldn't attend. The Virtual TechEd site is here: http://virtualteched.com/default.aspx

So, I think that's it for now. Still enjoying the comments you're making on the last blog entry about the version you're running and why. Seems like we all have the same problem - time and money ;). Keep those comments coming!

Thanks for reading,
kt

Categories:
Events | Resources | SQL Server 2005

Ha... did that get your attention? Well, what I really hope to do is make everyone aware of what's made the Developer Community rounds this week. On Wednesday, Microsoft announced "Data Dude" (aka the Visual Studio sku for Database Developers). This was an announcement that may be glanced over by many DBAs thinking it's just another tool for developers...what can it offer me? And, well, that's where I think there are some VERY cool things to point out. I've been following Data Dude for a couple of months now (ah... a little birdy told me :) and at first I wasn't sure how much it would impact me. However, after starting to get a better feel for their future directions, I've realized that even though I'm not their initial and/or direct target audience that I'll definitely find some great uses for their product. In fact, in getting ready for their announcement and in chatting with a few press folks, I wrote up a small amount of text. Some of this was quoted in the eWeek article here but there's a few more things that I really think you'll (yes, even DBAs) be interested in. This is the second half of the content that was quoted:

For Administration and Operations, I especially like their direction with regard to unit testing and sample data generation. I work with a myriad of customers who do not let development/operations perform testing/tuning on real production data (even a copy) due to data sensitivity requirements/policies. As a result, performance testing can be horribly flawed. With the ability to generate large sample volumes of statistically "real" data, real-world tuning will be possible without compromising data sensitivity. This is the area that I'm most interested in initially but refactoring and schema comparisons are very interesting as well. One of my favorite sayings is "The sooner you start to code, the longer the program will take." (Roy Carlson) as schema changes can be challenging at best and often things can be missed (data types, column names, etc.). Often the changes are made on alternative systems and then they need to be integrated in - often through comparing schemas and with hand-created alteration scripts. With the ability to have intelligent refactoring, application and database logic can be fixed through a straightforward and flexible interface. This will help to minimize later errors or even harder to recognize performance problems caused by implicit conversions.

So, in the end, I'm not their primary target but I think I'll probably get really into it and try to consider a variety of ways to leverage it for Operations/Administrations teams for "after the fact" tuning cases. However, I do think teams will be even more productive if they adopt it earlier in their production lifecycle.
Now, if you're even slightly interested, you can get a lot more information about Data Dude already. Here's a beginning list:

And, if you're going to be at Microsoft TechEd 2006 - in Boston in a couple of weeks, there's a LOT more information coming. In fact, my pre-conference workshop co-presenter Brian Randell has authored some Hands-on Labs for Data Dude and those will be available in the HOL area. He's on DotNetRocks this week (to be released on DNR's site next Tuesday) and he's chatted with them in the past about Virtual Server/Virtual PC (for hours...now you know how I feel ;). Honestly though, we use VPC/VS a lot in our HOLs and Brian is REALLY knowledgeable about how to optimize them, compact drives, etc. Anyway, here's the link to his last show on DNR and here's Brian's blog entry on Data Dude. And, just as a small hint... you should consider making a Sched+ for the "Live from TechEd" show from DNR. All I have to say is that it might have some great guests on it (maybe even related to Data Dude, hint, hint)!

Now, the last thing that's the most exciting for me to announce is that there are some new bloggers as a result of the Data Dude announcement. FINALLY, one of my best friends - Gert Drapers - has started blogging (don't forget his already awesome content site: http://www.SQLDev.Net). If you're at all interested in geeky database development stuff, subscribe now! And - many of his team members are great friends too (Richard and Matt!) and I'm very excited to see them blogging as well (it's just that I've been begging Gert to do it for the past couple of years ;)). Anyway, it will be great fun watching this team grow and watching this product evolve.

Here are the Data Dude team blogs:

Here's the official Visual Studio Team Edition for Database Professionals site.

The times are changing............. for the better!
kt

Categories:
Resources | SQL Server 2005 | Tips

But - it was a lot more laid back this time... Once again, it was fun! Thanks Carl. Thanks Richard.

Here's the link for the show: http://www.dotnetrocks.com/default.aspx?showID=181 and of course, the general link to DNR is http://www.dotnetrocks.com.

Enjoy,
kt

Categories:
Events | SQL Server 2005

In part 11 of the TechNet webcast series for the ITPro, I spoke briefly about the Oracle Migration Assistant and the recent release of the Community Tech Preview versions of both the Access and Sybase Migration Assistants... A few of you asked for their download location and there were even a few replies that folks had found them... However, after looking around, I figured out that what was found were the OLD and very outdated Access Upsizing Wizards (and that's not this!). The new SQL Server Migration Assistant tools are truly Migration tools - tools that can help convert code, change data types, etc. More than anything they're targeted at being more complete and feature rich than just an "upsizing tool" which is excellent for what it is but still requires a lot of additional work.

Having said all of that, I have the details about the TRUE Migration Assistants.

SSMA for Access download instructions:

  1. Open the download page here.
  2. Select ‘Receive File from Microsoft’.
  3. Enter the Password: w$%dIcKP_TZrf
  4. Download and run ssma-for-access-xxx.msi

SSMA for Sybase download instructions:

  1. Open the ftp download site page here with the following username and password:
         username: SSMA4Syb2
         password: i456$Lk
  2. Download the msi for Sybase or the msi for the Sybase Enterprise Portal (ep).

    IMPORTANT: If you access the files from Internet Explorer, please verify IE Browser Settings using Tools, Internet Options, Advanced, under the Browsing section:

CHECK - Enable folder view for FTP sites
UNCHECK - Use Passive FTP (for firewall and DSL modem compatibility)

So... have fun with those downloads and if you run into any issues/concerns - be sure to post in the SQL Server Migration Assistant newsgroup.

Cheers,
kt

OK, so... I don't know how many of you use different collations but if you do then you know that there are two truths:

1) They're very flexible
2) They can cause you a bit of grief (changing collations and tempdb)

Flexibility

As of SQL Server 2000 (or heck, maybe it was 7.0?), database collations could be set at installation OR set/changed later. You can set the collation when a database is created (if not set, the database will use the server's default). You can set the collation when a table is created (if not set, the table will use the database's default). You can set the collation when a query is executed (which doesn't really make sense unless it's in a WHERE clause or ORDER BY clause). And - you can set the collation in a view or stored procedure to do things like case sensitive searching - on the fly. However, none of these will perform well over large result sets (at least not without indexes) so I'd be careful of making any ad hoc changes to collations (even in views/sps - without appropriate indexes)!
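
Here's a quick sketch of setting the collation at query time, assuming the Northwind sample database with its default case-insensitive collation:

```sql
-- Assumes Northwind with a case-insensitive default collation.

-- Uses the column's own collation: 'madrid' matches 'Madrid'.
SELECT ContactName, City
FROM dbo.Customers
WHERE City = 'madrid';

-- Case-sensitive comparison on the fly: 'Madrid' no longer matches 'madrid'.
-- Because COLLATE is applied to the column, an index on City
-- can't be used for a seek on this predicate.
SELECT ContactName, City
FROM dbo.Customers
WHERE City COLLATE Latin1_General_CS_AS = 'madrid';

-- The same idea in an ORDER BY: sort by binary code point for this query only.
SELECT ContactName
FROM dbo.Customers
ORDER BY ContactName COLLATE Latin1_General_BIN;
```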

Anyway, the key point is that they're very flexible. In many international databases/localized databases, column collation differs by table (in order to do efficient sorting, etc.) and different language data may be separated (either with a column that describes which language/country code is used OR in different tables).

Grief in Changing Database Collations

Actually, changing database collation is *very* simple. Literally, it only takes an ALTER DATABASE to do. For example, the following code runs flawlessly:

USE master
go

-- Drop any copy left over from a previous run
IF DB_ID('TestCollation') IS NOT NULL
   DROP DATABASE TestCollation
go

CREATE DATABASE TestCollation
COLLATE SQL_Latin1_General_CP1_CI_AS
go

EXEC sp_helpdb TestCollation
go

ALTER DATABASE TestCollation
COLLATE Latin1_General_CS_AS_KS_WS
go

EXEC sp_helpdb TestCollation
go

BUT... if you go from case sensitive to case insensitive... be careful! It is important to realize that ALL of your tables AND data will need to be checked against the new collation. In fact, changing database collation will not be allowed if the objects/data would no longer adhere to your unique constraints, etc. Check out this more complete script (ChangingDatabaseCollation.sql (2.85 KB)), if you want to see what happens.
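
One case that's easy to reproduce is a pair of object names that differ only by case. The sketch below uses a throwaway database name (an assumption, not part of the linked script); the ALTER DATABASE should be rejected because the two table names would collide under a case-insensitive collation.

```sql
USE master;
GO

-- Throwaway, hypothetical database with a case-sensitive collation.
CREATE DATABASE TestCollationCS
COLLATE Latin1_General_CS_AS;
GO

USE TestCollationCS;
GO

-- Two distinct objects under a case-sensitive collation...
CREATE TABLE dbo.Members (id int NOT NULL);
CREATE TABLE dbo.MEMBERS (id int NOT NULL);
GO

USE master;
GO

-- ...but the same name under a case-insensitive one, so this change is expected to fail.
ALTER DATABASE TestCollationCS
COLLATE Latin1_General_CI_AS;
GO

-- Clean up.
DROP DATABASE TestCollationCS;
GO
```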

Grief with temporary objects

So... the other area (and this seems to be the one where everyone has trouble) is with temporary objects. If you create a temp table and your database's collation differs from TempDB's (which has the same collation as the system - based on installation), then comparisons/lookups/joins may have problems. A simple trick to get around this is to use database_default. Check out this sample and you'll see how it works:

CREATE DATABASE Test
COLLATE Icelandic_BIN
go

USE Test
go

CREATE TABLE #test1
(
   col1 varchar(12)
)
go

CREATE TABLE #test2
(
   col1 varchar(12) COLLATE database_default
)
go

USE tempdb
go

CREATE TABLE #test3
(
   col1 varchar(12) COLLATE database_default
)
go

exec sp_help 'tempdb..#test1' -- Will use TempDB's collation
exec sp_help 'tempdb..#test2' -- Will use Test's collation (Icelandic BIN)
exec sp_help 'tempdb..#test3' -- Will use TempDB's collation
go

So simple, so obvious... and, well - I just found out about that one?! I used to recommend that you explicitly set the collation for every column. Now, that still works - but it doesn't offer you any flexibility. You could get around that with dynamic string execution but that can also get very complicated, very quickly. So... database_default is a VERY simple and clean way of doing this.
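
To see the kind of trouble this avoids, here's a small sketch of a join between a user table and a temp table. It assumes it runs in the Test database created above (so the database collation differs from TempDB's); the table names are made up.

```sql
USE Test;
GO

CREATE TABLE dbo.Names (name varchar(12) NOT NULL);   -- uses Test's collation
CREATE TABLE #lookup   (name varchar(12) NOT NULL);   -- picks up TempDB's collation
GO

-- With mismatched collations this join fails with a
-- "Cannot resolve the collation conflict" error:
-- SELECT n.name FROM dbo.Names AS n JOIN #lookup AS l ON l.name = n.name;

-- Fix 1: resolve the conflict in the predicate.
SELECT n.name
FROM dbo.Names AS n
JOIN #lookup AS l
  ON l.name COLLATE database_default = n.name;
GO

-- Fix 2 (cleaner): declare the temp table column with database_default up front,
-- so the plain join works without any COLLATE in the query.
CREATE TABLE #lookup2 (name varchar(12) COLLATE database_default NOT NULL);
GO

SELECT n.name
FROM dbo.Names AS n
JOIN #lookup2 AS l
  ON l.name = n.name;
GO

DROP TABLE #lookup;
DROP TABLE #lookup2;
DROP TABLE dbo.Names;
GO
```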

Have fun,
kt

Categories:
SQL Server 2005 | Tips | tempdb

The SQL Server team has a few *very* interesting blogs and the Engine Team just started blogging - check it out here: http://blogs.msdn.com/sqlserverstorageengine/ (thanks for the heads up Sunil).

For completeness, here are the bulk of the other SQL team blogs - which I leveraged (aka stole - thanks Euan!) from Euan Garden's EXCELLENT list (his blog roll) of SQL Server Team Blogs.

SQL Server Team Blogs

Excellent CORE/Related SQL Server Team Blogs

Now there's some entertainment for the [holiday] weekend ;). Hope that all of you enjoyed a bit of rest and relaxation this weekend.... now, back to work!

Cheers,
kt


Hey there everyone - The series has completed and I know that many of you struggled to get access to the surveys... Microsoft has asked me to post links to the surveys...so, for completeness, I decided to create this blog entry to have links for every session, every blog link (resources, demo scripts, etc.) and the survey links. I really did have a lot of fun on the series and I hope we can do this again!

TechNet Webcast Series

Session 1: A Fast-Paced Feature Overview and Series Introduction (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 2: Security (Level 200)
   Presenter: Bob Beauchemin, SQLskills.com, 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 3: Understanding Installation Options and Initial Configuration (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 4: Upgrade Considerations and Migration Paths (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 5: Effective Use of the New Management Tools (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 6: New Application Design Patterns for Scalability and Availability and the Operational Implications of Service Broker (Level 200)   
   Presenter: Bob Beauchemin, SQLskills.com, 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 7: Technologies and Features to Improve Availability (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 8: Implementing Database Mirroring, Part 1 of 2 (Level 200)
   Presenter: Mark Wistrom, SQL Server Program Manager - Microsoft Corp., 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 9: Implementing Database Mirroring, Part 2 of 2 (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry is here
   Session's survey is here.

Session 10: Recovering from Isolated Disasters and Human Error (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry here. And a second blog entry here.
   Session's survey is here.

Session 11: Best Practices in Building Robust, Recoverable, and Reliable Systems (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry here.
   Session's survey is here.

And that's about it! I hope you really enjoy the series... and if you like that one, you might want to check out the entire 10-part series on MSDN. The link to the blog entry that has all the links (like this one) is here.

Have fun,
kt

Well... 11 of 11 has completed. Friday was our last chat - until next time ;). It was a summary event where I took a slightly different spin on things focusing on grouping technologies by the amount of effort that's needed to implement them. Simply put, we looked at the technologies in order of what gives you the biggest bang for the buck. We ended the session with a ton of great questions (as always!) and there was even a question on the origin of foo (make sure to also see fubar).

First, there were a few links that I wanted to provide from the session, I'll start with those:

And, we also talked about Migrations:

Finally, capacity planning:

  1. Calculate the amount of space needed for your tables (calculate rows per page, then the required number of pages, and convert that to MB)
  2. Calculate the amount of space needed for your indexes (you can use sp_spaceused to get a current ratio of index to data and then use that OR you can estimate 1-3 times your current data in indexes... yes, if you have 10GB of tables - you should estimate 10-30GB for indexes)
  3. Factor your estimate of future growth into the calculation
  4. Take your single largest table and multiply by 1.5 for free space (use 2.5 IF you're going to use ONLINE index operations). So, if the single largest table is 3GB then I'd add 7-8GB for free space when using ONLINE index operations (about 4.5GB otherwise)
  5. Add a "just in case" extra 10-20%
  6. And, I didn't mention this BUT you should also include alerts to help you monitor space usage and significant changes to your free space!
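
To make the arithmetic in those steps concrete, here is a rough sketch in T-SQL. Every input value is a made-up example (10GB of data, an index ratio of 2, 5GB of growth, a 3GB largest table, a 15% buffer), and the variable names are mine, so treat this as a worksheet to adapt rather than a formula from the session.

```sql
-- All input values below are made-up examples; substitute your own numbers.
DECLARE @data_gb           decimal(9,2),
        @index_ratio       decimal(4,2),
        @growth_gb         decimal(9,2),
        @largest_table_gb  decimal(9,2),
        @free_space_factor decimal(4,2),
        @safety_pct        decimal(4,2);

SET @data_gb           = 10.0;  -- step 1: space for table data
SET @index_ratio       = 2.0;   -- step 2: indexes at 1-3 times the data
SET @growth_gb         = 5.0;   -- step 3: expected growth (assumed figure)
SET @largest_table_gb  = 3.0;   -- step 4: single largest table
SET @free_space_factor = 2.5;   -- 1.5 normally, 2.5 with ONLINE index operations
SET @safety_pct        = 0.15;  -- step 5: "just in case" 10-20%

SELECT  @data_gb * @index_ratio                AS index_gb,
        @largest_table_gb * @free_space_factor AS free_space_gb,
        (@data_gb
           + @data_gb * @index_ratio
           + @growth_gb
           + @largest_table_gb * @free_space_factor) * (1 + @safety_pct)
                                               AS total_estimate_gb;

-- Current data vs. index sizes for the database you're connected to,
-- which you can use to derive a real index ratio for step 2.
EXEC sp_spaceused;
```

Run it in the database you're sizing so that sp_spaceused reports on the right data, then replace the example ratio with what you actually see.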

And that wraps up the series. Wow - I can't believe how many of you joined in for questions as well as stayed on until the end. It's really great that so many of you are still having fun with SQL Server as well. I look forward to another series with you...at some point! In the interim, here are a few places where I'll be:

SQLskills Immersion Events - in the US... will be announced shortly. The BEST place to be when we announce the dates for these events is as a subscriber on SQLskills. Subscribing is FREE and the announcements are going to be later this month. Here's a link to directly subscribe on SQLskills: http://www.sqlskills.com/login.aspx.

Thanks again for attending the series! It was great fun. I'll post a final blog entry with ALL of the links as well as all of the survey links. I know that they're going to send me these so that you can get easier access to them.

See you next time,
kt

In part 9 of our webcast series titled: Implementing Database Mirroring, we covered the steps from setup to failover to monitoring. There were lots of great questions and I think we could easily go back and do a couple more hours on database mirroring, failover combinations - including manual failover and client application questions. Having said that, there were a few interesting scenarios that came up that I thought I'd add a bit more detail about here. For simplicity I created sections...

Where to go for more information on Database Mirroring and SQL Server SP1

Database Mirroring between Editions

Database Mirroring is supported in both the Standard Edition (SE) and the Enterprise Engine (EE) Edition(s): Enterprise, Enterprise Eval and Developer. In the EE Editions all configurations (synchronous and asynchronous) are supported: High Availability (sync), High Protection (sync) and High Performance (async). In the SE, only the synchronous forms of Database Mirroring are supported: High Availability and High Protection. One thing that is true however (and I learned this as well - during the webcast in Part 9 - thanks to the question submitted and Mark being present...thanks Mark!), is that even while synchronous mirroring is supported in both SE and EE, you can only create a mirroring partnership between servers of the same edition.
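
Since both partners need to be the same edition, it's worth checking before you start the setup. A quick sketch: run this on each server you plan to use as a partner and compare the results.

```sql
-- Run on each prospective mirroring partner and compare the output.
SELECT  SERVERPROPERTY('ServerName')     AS server_name,
        SERVERPROPERTY('Edition')        AS edition,
        SERVERPROPERTY('ProductVersion') AS product_version,
        SERVERPROPERTY('ProductLevel')   AS product_level;  -- e.g. RTM or SP1
```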

Database Mirroring between Platforms

Database Mirroring is supported in both the Standard Edition (SE) and the Enterprise Engine (EE) Edition(s): Enterprise, Enterprise Eval and Developer. In the EE Editions all configurations are supported: High Availability, High Protection and High Performance. In the SE, only the synchronous forms of Database Mirroring are supported: High Availability and High Protection but not the asynchronous High Performance configuration. One thing that is true however, (and I learned this as well in Part 9 - thanks Mark!), is that even while synchronous mirroring is supported in both SE and EE, you can only create a mirroring partnership between servers of the same edition.

Combining Database Mirroring with Other Technologies

The Books Online has a section targeting exactly this discussion. Review this section in the SQL Server 2005 Books Online (April Update): Database Mirroring and Other Features and Components. Additionally, I've provided a few comments for you to review as well as links to some of the specific BOL topics that exist on these combinations.

Database Mirroring with Failover Clustering

These two technologies CAN be combined but there are multiple things of which you should be aware. First, a failover of a cluster is SLOWER than a failover of a Mirror pair... as a result, it is likely that your secondary server will come online as the new principal in the time that it takes your principal (which is on a cluster) to recover. In a lot of cases, this is good because this keeps you online longer and results in less downtime but it may also be undesirable when your primary is now running at your alternate operations site - which is unstaffed. So, in some cases you may want to prevent automatic failover and instead only use the secondary mirror when you absolutely have to (i.e. NOT just when the cluster fails). If this is the case then you might prefer running with the High Protection configuration of Database Mirroring instead of the High Availability configuration.

This will allow you to manually failover when desired.
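
As a rough sketch of what that looks like (assuming a mirroring session already exists and using AdventureWorks purely as a placeholder database name), High Protection is synchronous mirroring without a witness, and the manual failover is a single statement run on the principal:

```sql
-- Run on the principal. AdventureWorks is a placeholder database name.

-- Synchronous (transaction safety FULL) mirroring...
ALTER DATABASE AdventureWorks SET PARTNER SAFETY FULL;

-- ...with no witness, so there is no automatic failover (High Protection).
ALTER DATABASE AdventureWorks SET WITNESS OFF;

-- Later, when YOU decide to fail over (the session must be synchronized):
ALTER DATABASE AdventureWorks SET PARTNER FAILOVER;
```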

As another option - you can increase the timeout for Database Mirroring failover, for example to 90 seconds. With the timeout set to x seconds, the automatic detection/failover of the High Availability configuration will occur only if the cluster does not come back online (as the principal) within x seconds. You can configure the Database Mirroring failover timeout by using ALTER DATABASE.

ALTER DATABASE dbname SET PARTNER TIMEOUT x

Please note, this is only one timeout of many. There are many different types of timeouts in the system that can cause a failover. However, a hard error code generally starts the failure procedure sooner.  Mark pointed this out in his failure detection slides in our TechNet webcast series, Part 8.
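
For example, here is a sketch of setting the 90 second timeout mentioned above and then confirming it. AdventureWorks is just a placeholder database name; the timeout is set on the principal and the value is in seconds.

```sql
-- Run on the principal. AdventureWorks is a placeholder database name.
ALTER DATABASE AdventureWorks SET PARTNER TIMEOUT 90;

-- Confirm the setting (and the current role/safety level) for mirrored databases.
SELECT  DB_NAME(database_id)         AS database_name,
        mirroring_role_desc,
        mirroring_safety_level_desc,
        mirroring_connection_timeout AS timeout_seconds
FROM    sys.database_mirroring
WHERE   mirroring_guid IS NOT NULL;
```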

Review this section in the SQL Server 2005 Books Online (April Update): Database Mirroring and Failover Clustering.

Database Mirroring with Replication

These two technologies CAN be combined together but not all configurations are supported and where supported, there are specific setup requirements. From the BOL: Replication supports mirroring the publication database for merge replication and for transactional replication with read-only Subscribers or queued updating Subscribers. Immediate updating Subscribers, Oracle Publishers, Publishers in a peer-to-peer topology, and republishing are not supported.

 
Review this section in the SQL Server 2005 Books Online (April Update): Replication and Database Mirroring

Database Mirroring with Log Shipping

These two technologies CAN be combined together but it will require a bit of manual configuration to continue log shipping when a mirror becomes the new principal.

Review this section in the SQL Server 2005 Books Online (April Update): Database Mirroring and Log Shipping.

And - there are others in the BOL. Please reference the sections listed above for more details.

And - with that - we're caught up with our resources and references for this series. Part 11 - the LAST one - is this Friday, May 19. I look forward to your being there LIVE. Register here and come ready with your questions, this one is going to be VERY focused on best practices, ideas/architectures and your questions. Those of you that are there LIVE will help to direct the session.

Thanks!
kt

In the last few minutes of the webcast (part 10), I goofed up one line of code and didn't realize it until today. As my very last demo (and there were at least 10 different scenarios/concepts/demos yesterday) in my webcast, I decided to show a Database Snapshot on a Mirror database. It was the second database snapshot that I had created so my first database snapshot demo was just fine. However, when I went to create the database snapshot on the mirror, I inadvertently left off the most important part "AS SNAPSHOT OF AdventureWorks". The irony is that I tried to query some tables and just ended up (because we were right at the end of the webcast ;)) saying that I probably wasn't getting the table names right. Ha - there were no tables... I hadn't created a database snapshot, I had created just another database - so the only tables I was seeing were the catalog views.

Anyway, just for clarity, I corrected the "Demo Scripts" zip that's associated with Part 10 BUT if you've already downloaded it then you'll have the old (and incorrect) version of this script (SnapshotOnMirror.sql). And, for completeness, I'll put the code that I executed during the webcast here:

USE AdventureWorks
go

USE master
go

CREATE DATABASE AdventureWorksSnap
ON
( NAME = N'AdventureWorks_Data',
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\AdventureWorksSnap_Data.mdfss')
-- , SIZE = 167872KB , MAXSIZE = UNLIMITED, FILEGROWTH = 16384KB )
go

and the code that I should have executed here:

USE AdventureWorks
go

USE master
go

CREATE DATABASE AdventureWorksSnap
ON
( NAME = N'AdventureWorks_Data',
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\AdventureWorksSnap_Data.mdfss')
-- , SIZE = 167872KB , MAXSIZE = UNLIMITED, FILEGROWTH = 16384KB )
AS SNAPSHOT OF AdventureWorks  -- <<<< the line I left out
go

USE AdventureWorksSnap
go

SELECT * FROM person.contact
go
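
And a quick sanity check I could have used to catch this on the spot (a small sketch using the snapshot name from the script above): a real database snapshot has a non-NULL source_database_id in sys.databases, while a plain database does not.

```sql
-- source_database_id is NULL for an ordinary database
-- and points at the source database for a database snapshot.
SELECT  name,
        source_database_id,
        DB_NAME(source_database_id) AS source_database_name
FROM    sys.databases
WHERE   name = N'AdventureWorksSnap';
```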

So, quick demos right at the end of the webcast might not have been my best idea ;). But - I'm surprised none of you called me on it?! I'll blame it on this for now.

Have a great weekend,
kt

OK - today's session was quite fun... lots of demos and quite a few "tie-ins" where I tried to bring together many things that we've touched on in our series. And - that's really the point of the series - creating a reliable, robust, scalable and available environment takes MANY different features. You really need to architect a complete solution in order to handle the many potential problems that may occur. And, unfortunately, it's a never ending process; you're never done and you're never going to get everything (sorry!). You will need to re-evaluate, monitor, and manage your system as long as it runs to keep it reliable, available and fast. Something will come up...someday...that you didn't think about, evaluate and/or prevent. But, then you'll know and then you'll put something into place to keep it from happening again.

So - to tie back into some of the other sessions and resources, here is a list of everything to date in the series as well as a few specific references I made during the session.

Demo Scripts are here: 20060512_TechNetWebcast-Part10.zip (25.46 KB) (updated on Sat, May 13 at 2:55 PDT)
Credit Database zip is here. NOTE: This is a 48MB zip which expands to a 175MB backup and restores to a 700 MB database (with a lot of free space for testing, etc.).

TechNet Webcast Series

Session 1: A Fast-Paced Feature Overview and Series Introduction (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 2: Security (Level 200)
   Presenter: Bob Beauchemin, SQLskills.com, 
   Session's corresponding blog entry, here

Session 3: Understanding Installation Options and Initial Configuration (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 4: Upgrade Considerations and Migration Paths (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 5: Effective Use of the New Management Tools (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 6: New Application Design Patterns for Scalability and Availability and the Operational Implications of Service Broker (Level 200)   
   Presenter: Bob Beauchemin, SQLskills.com, 
   Session's corresponding blog entry, here

Session 7: Technologies and Features to Improve Availability (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 8: Implementing Database Mirroring, Part 1 of 2 (Level 200)
   Presenter: Mark Wistrom, SQL Server Program Manager - Microsoft Corp., 
   Session's corresponding blog entry, here

Session 9: Implementing Database Mirroring, Part 2 of 2 (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here.

Session 10: Recovering from Isolated Disasters and Human Error (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   You're reading it! :-)

Recovery Models and Backup/Restore

  • MSDN Webcast Parts I and II cover Recovery Models and some issues/best practices related to changing recovery models. Check out the blog entry here which has links to the sessions and their associated blog entries.
  • MSPress Title: SQL Server 2000 High Availability, Chapter 9: Database Environment Basics for Recovery is here. The MSPress page for this title is here.
  • SQL Server Magazine Article on Isolated Disasters and Recovery (using RESTORE with STANDBY/STOPAT to investigate when a database became damaged) is here. Check out a consolidated list of all of my SQL Server Magazine Articles here and SQL Server Magazine here.
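
As a rough illustration of the STANDBY/STOPAT idea from that last bullet, the general shape is below. The database name, file paths and times are all hypothetical, and you would normally do this on a separate investigation server rather than over your production database.

```sql
-- Hypothetical names, paths and times throughout; adapt before use.

-- Restore the full backup, leaving the database read-only and able to accept more log.
RESTORE DATABASE Credit
FROM DISK = N'C:\Backups\Credit_Full.bak'
WITH STANDBY = N'C:\Backups\Credit_Undo.dat';

-- Roll forward to a point in time, again staying in STANDBY so you can look around.
RESTORE LOG Credit
FROM DISK = N'C:\Backups\Credit_Log1.trn'
WITH STANDBY = N'C:\Backups\Credit_Undo.dat',
     STOPAT  = '2006-05-12T09:45:00';

-- Query the read-only database here. If the damage hasn't happened yet,
-- repeat the RESTORE LOG with a later STOPAT time and check again.
```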

Table and Index Partitioning

RAID 0+1 and RAID 1+0

There was a question that came up on this and the question basically asked - which is better. Well, this is a hard question to answer because they both have pros and cons BUT before I get to the pros/cons there's also another [more important] issue; these two get confused and swapped all the time. In fact, many vendors USED to refer to these interchangeably and even just lumped them together as RAID 10. Today, most people don't do this and most people also try to refer to the underlying technology instead of the numbers. Having said all of that, RAID 1+0 is Striped Mirrors and is my general recommendation because it tends to be more reliable than 0+1 and can tolerate more drive failures than 0+1. RAID 0+1 is Mirrored Stripes - which generally outperforms RAID 1+0 but cannot tolerate the loss of more than one drive and because of that it's more vulnerable. In the end, I'd suggest a simple "educational" site here (it's on a commercial site but it has a nice - and short - description of the different types of RAID arrays).

See you next week - for our LAST part in this series - Part 11: Best Practices in Building Robust, Recoverable, and Reliable Systems (Level 200).

Thanks for reading, listening and continuing to ask great questions!
kt

I completely spaced in blogging about a recent interview I did...poolside, in Orlando, FL when I was at SQL Connections back in April. I had the pleasure of meeting Chuck Boyce (a DBA from Philly, PA) who feverishly works in his spare time to spread the word about technology and specifically about all things SQL. His blog is here and he does a great job of summarizing good links and useful resources - almost daily (just so you don't have to!) on his "WHERE Clause" resource blog posts. You should check that out while you have your morning coffee. A great way to quickly find some useful stuff.

Additionally, Chuck has a radio program (What's Happening in SQL Server) that he does for SSWUG (Steven Wynkoop's excellent SQL Server Worldwide User Group). The entire list of SSWUG Broadcasts is here and specifically, the chat that we did poolside is here.

So - sorry that took me so long to remember... I wish I could blame it on too much sun (and/or drinks) poolside but......... sadly, I can't.

See you Friday on our 10th part of our TechNet Webcast series. Wow, we're on the home stretch!

Thanks,
kt

Last week Mark Wistrom (Program Manager in the SQL Server Team at Microsoft), delivered part 8 of our TechNet webcast series. Most of the resources needed to prepare for this session - as well as learn more about Database Mirroring - have already been posted in the blog entry for part 7 (as homework!). However, there were two things that we wanted to post from Mark's session:

(1) The case study that was presented during the session is here.
(2) The Q&A that was created by a few of Mark's team who were answering during the session (and then Mark did a scrub of it as well to clean it up- THANKS Mark) is here (29.1 KB).

Enjoy!
kt

Well, Part 7 has completed and we're on the home stretch... focusing on part of the new Always On technologies of SQL Server 2005. We've made our way through quite a few discussions and my main point for the sequence - as defined - was to make clear that keeping a system available takes a myriad of choices, features, configurations - and more. In fact, even once you think you've done it you still need to monitor, manage and re-evaluate your configuration if unexpected events occur and bring your system offline and/or unavailable in any way. And - well, that's also a big part of my focus... what does "availability" mean to you? Do you believe that only unplanned downtime counts or that *any* impact to the system's availability counts as "downtime"? (btw - I'd really like to know!)

Regardless, that's been our primary focus for the series... I believe that the Enterprise Edition of SQL Server 2005 can keep your system available through a very wide number of system hiccups, damage and even more catastrophic disasters. In the previous sessions we looked at migration and installation (ensuring a proper configuration - right from the start), we covered creating a secure environment (which also impacts availability), we looked at "finding the right tool for the job" and then we started looking into alternative designs that may help to improve availability by scaling out our design. If you missed any of the sessions you might want to go back and see what's what! Here's the list of sessions at a quick glance:

Session 1: A Fast-Paced Feature Overview and Series Introduction (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 2: Security (Level 200)
   Presenter: Bob Beauchemin, SQLskills.com, 
   Session's corresponding blog entry, here

Session 3: Understanding Installation Options and Initial Configuration (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 4: Upgrade Considerations and Migration Paths (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 5: Effective Use of the New Management Tools (Level 200)
   Presenter: Kimberly L. Tripp, SQLskills.com, 
   Session's corresponding blog entry, here

Session 6: New Application Design Patterns for Scalability and Availability and the Operational Implications of Service Broker (Level 200)   
   Presenter: Bob Beauchemin, SQLskills.com, 
   Session's corresponding blog entry, here

Session 7: Technologies and Features to Improve Availability (Level 200)

Session 7 was a great deal of fun - we covered 11 different technologies (12 if you count partitioning) and discussed the architectural reasons to choose each technology - as well as the barriers each one protects against. We talked about a lot of technologies and a lot of resources:

  • Remote Mirroring - Always consult your hardware vendor and make sure they support block size preservation and write-order preservation. Ideally, RM should be combined with Failover Clustering - when that's the case you have a "Geographically dispersed failover cluster" which removes the single point of failure in Failover Clustering. See the Windows Server Catalog, specifically for the Geographically Dispersed Cluster Solution category.
  • Failover Clustering - A combination of hardware and software to provide protection against server failure. Only use solutions listed in the Windows Server Catalog, specifically in the Cluster Solution category, which lists the supported solutions for failover clustering.
  • Database Mirroring - See the homework references below as well as come back for the next two upcoming sessions where we cover DBM for two weeks.
  • Log Shipping - While this is still supported *and* while there are still some excellent uses for Log Shipping, this is not a "favorite" solely for failover. If you're looking for a "warm" failover solution (warm = no automatic detection, no automatic failover) with less potential for data loss - you should consider the "High Performance" configuration of Database Mirroring. If you would like to continue using Log Shipping for a more latent secondary (a log load delay) for managing disasters (either investigation or recovery) of data from an older "version" of the database then LS is an option but Database Snapshots can also help in *some* cases. This technology is well documented as well as written about.
  • Peer to Peer Replication - I demo'ed and discussed this in session 1 as well as referenced a few helpful links for TechNet sessions, etc. See the session and "blog" links as listed above.
  • RAID - Redundant Array of Independent Disks
  • Partial Database Availability, Online Piecemeal Restore and Database Snapshots - come back for Session 10 where I'll cover these and demo these!
  • Raid.edu - a short - but interesting overview of all the different raid types.
  • MSPress title: Microsoft SQL Server 2000 High Availability, Chapter 9: Database Environment Basics for Recovery
  • SQL Server 2000 and SQL Server 2005 support for mounted volumes
  • MSDN "Developer/Design" Webcast Series: Blog entry with all of the links
    • Online Index Operations, Part 5
    • Snapshot Isolation, Part 6
    • Partitioning, Part 8
  • Scalable shared databases are supported by SQL Server 2005
  • Oracle Real Application Clusters and Industry Trends in Cluster Parallelism and Availability

Finally, be ready to watch Mark's session on Friday, April 28. Here's your homework for Session 7:

  1. Review: Release notes and information for SQL Server 2005 Service Pack 1 
  2. Watch: TechNet Webcast: How to Increase Availability Using Database Mirroring in SQL Server 2005 (Level 200) 
  3. Read: Database Mirroring in SQL Server 2005 

And the details for Session 8:

TechNet Webcast: SQL Server 2005 for the IT Professional (Part 8 of 11): Implementing Database Mirroring in SQL Server 2005 (Part 1 of 2) (Level 200)
Presenter: Mark Wistrom, Program Manager, Microsoft Corporation

Database mirroring was released for testing when Microsoft SQL Server 2005 shipped in November. As the first service pack has shipped, it's time to get prepared for database mirroring in production! In this session, understand the barriers of what database mirroring will protect against, what constitutes a "failover", what the performance criteria are and how the monitoring has been brought together for release. Attend this first part of two - as the eighth webcast in the SQL Server 2005 for the IT Professional series to obtain better insight for when database mirroring should be implemented as well as what to expect moving forward in service pack 1 (SP1). Part 9 will cover implementation from start to finish - as an end to end demo.

Start Time:   Friday, April 28, 2006 9:30 AM (GMT-08:00) Pacific Time (US & Canada) 
End Time:   Friday, April 28, 2006 11:00 AM (GMT-08:00) Pacific Time (US & Canada) 

See you in Part 9: TechNet Webcast: SQL Server 2005 for the IT Professional (Part 9 of 11): Implementing Database Mirroring in SQL Server 2005 (Part 2 of 2) (Level 200) on May 5th.
kt

In doing my final preparations for part 7 of my TechNet webcast series on Building Robust, Reliable and Recoverable Systems, I decided to (once again) review my abstract. I do this as a last step to make sure I cover everything I said I would cover. Here's the abstract:

TechNet Webcast: SQL Server 2005 for the IT Professional (Part 7 of 11): Technologies and Features to Improve Availability

Find the right technology for the job in this seventh webcast of the SQL Server 2005 for the IT Professional series. Join us to learn which technologies provide the right solution for a specific problem, as well as the pros and cons of each technology. Designing a system to protect you against the faults most likely to occur is the first and most important strategy, but finding the right combination to minimize both downtime and data loss is critical. This webcast covers many of the “AlwaysOn” technologies at a glance: remote mirroring, failover clustering, database mirroring, log shipping, [peer to peer] replication, RAID, partial database availability, piecemeal online restore, database snapshots, snapshot isolation, and online index operations.

Start Time: Friday, April 21, 2006 9:30 AM (GMT-08:00) Pacific Time (US & Canada) 
End Time: Friday, April 21, 2006 11:00 AM (GMT-08:00) Pacific Time (US & Canada)  

So, in re-reading this it certainly sounds like a lot to cover. But - rest assured, this session is what we're going to use to lead into the rest of the series. Parts 8-11 go into more detail on some of the new and more complex topics covered in that list. For example, parts 8 and 9 cover Database Mirroring and part 10 covers Partial Database Availability, Online Piecemeal Restore and Database Snapshots. Also, for a few topics, I'll point you to some great resources to keep you going in learning these other technologies. In the end, my goal for Friday is to make sure you understand the best use case for each of these technologies. Once you know when it's best to use them, you can really begin to architect the *right* solution for your system! Parts 8-11 will focus more on implementation and demos!

If you're wondering what your options are and how to get better direction on the architecture to implement, join us on Friday: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?EventID=1032290562&EventCategory=4&culture=en-US&CountryCode=US

Oh, and in the actual abstract, there's a typo...not sure if we'll have time to cover log hipping. ;) ;)

Talk to you on Friday!
kt

Well, Friday brought another flood of great questions from everyone as we moved our way through many of the new 2005 tools. The one thing that I really wanted to stress was that *many* SQL Server 2005 tools (SQLCMD, SSMS and SQL Profiler) offer important features that can be leveraged today, even if your primary production servers are still SQL Server 2000. I did move through the tools quickly and showed quite a few new features; there are a lot of excellent resources to help you dive in deeper now that you're interested, ready and know some of the rewards of starting now. Here are a few of those resources:

For deleting old database backup history, there are a few stored procedures in msdb that can be used:

  • sp_delete_backup_and_restore_history
  • sp_delete_backuphistory
  • sp_delete_database_backuphistory

For cycling errorlogs, use: sp_cycle_errorlog.
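
A quick sketch of how these might be called from a scheduled job. The 90 day retention window and the OldDatabase name are assumptions of mine for illustration, not recommendations from the session.

```sql
-- Remove backup/restore history older than a cutoff date.
-- The 90 day window is only an example.
DECLARE @oldest_date datetime;
SET @oldest_date = DATEADD(day, -90, GETDATE());
EXEC msdb.dbo.sp_delete_backuphistory @oldest_date;

-- Remove ALL backup history for one database (hypothetical name),
-- e.g. for a database that has been dropped.
EXEC msdb.dbo.sp_delete_database_backuphistory N'OldDatabase';

-- Close the current error log and start a new one without restarting SQL Server.
EXEC sp_cycle_errorlog;
```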

And - lots of other questions that I primarily answered online in the last 40+ minutes. We had a great group and I hope everyone had fun. For the second half+ of the series we're going to focus on architectures and solutions - mostly related to disaster recovery and avoidance. However, the next part of the series is going to branch into a new (and *very* interesting) area of SQL Server 2005 - Service Broker. There are many impacts of Service Broker on the SQL Server system AND you might find a few applications of the technology within your own application as well. Have a great time with Bob for part 6 and I'll be back for part 7 next Friday.

See you soon!
kt

And another one bites the dust! Wow - what a great group today... soooooo many questions! For those of you that weren't there - the lecture was 80 minutes and the additional Q&A went on for another 45 minutes. So - as a result, there were *a lot* of additional resources needed. Let me get started with all of those right away.

To prepare for moving to SQL Server 2005 there are a few EXCELLENT resources with which you should start:

Phase 1 - Prepare to Upgrade/Migrate

Phase 2 - Database-level Testing

  • Copy Database Wizard or
  • Backup/Restore or
  • Detach/Attach

Phase 3 - Server-level Testing

  • Consider upgrade in-place or
  • Make sure that you manually migrate all EXTERNAL objects, logins, jobs, error messages, etc.

Phase 4 - Testing/Updating after the upgrade/migration

  • Update statistics immediately
  • Test application code, database compatibility modes, session settings
  • Check for "broken code" in terms of system table changes
  • MOST of this should have already been done and assessed in Phase 1 but better to be safe!
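A minimal sketch of what the first couple of Phase 4 checks might look like in a query window, assuming a placeholder database named `Credit`:

```sql
USE Credit;  -- placeholder database name
GO
-- Refresh statistics on all user tables in the current database.
EXEC sp_updatestats;
GO
-- With only a database name, this reports the current compatibility level
-- (90 = SQL Server 2005, 80 = SQL Server 2000).
EXEC sp_dbcmptlevel @dbname = N'Credit';
GO
-- Show the session settings in effect for this connection.
DBCC USEROPTIONS;
GO
```

This only covers the quick checks; testing the application code itself still has to happen against your real workload.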

And - finally - the other things we talked about and the rest of the links are here:

And - that's it for this week. See you next Friday when we chat about the new Management Tools and how to effectively use them!

Thanks for listening/watching and asking GREAT questions,
kt

OK - so Bob Beauchemin delivered Part 2 and I was back for Part 3. We had lots of folks on board with this session (more than 400) and as a result, I had a lot of questions. More than anything it seems like a lot of you wanted to know which versions came with what and which could go with what (in terms of OS)... so, just getting started - and probably installing at home to play around ;). I was expecting tons of questions on the technical tidbits of installation options so you sure kept me on my toes!! Here are probably the two most useful MSDN links to SQL Server BOL topics:

Hardware and Software Requirements for Installing SQL Server 2005 - In fact, this has a GREAT matrix of all the different platforms and which versions can be installed where!

Features Supported by the Editions of SQL Server 2005 - The BOL topic is very detailed.

And - in addition to those, there were quite a few more topics discussed during the webcast. The rest of the blog entry focuses on those questions! I hope this helps... enjoy!

Resource Links for all On-demand TechNet Sessions in our series for the ITPro

Part 1 - A Fast-Paced Feature Overview and Series Introduction
   On-demand link
   My blog entry for the session

Part 2 - Security
   On-demand link
   Bob's Blog Entry for the session

Part 3 - Understanding Installation Options and Initial Configuration (Level 200)
   On-demand link
   Blog entry link (well, you're already here ;)

Session 3 Resource Links as discussed during the session:

Submitting Product Feedback

  • MSDN Product Feedback Center: http://lab.msdn.microsoft.com/productfeedback/
  • Tips for Submitting Feedback on the Feedback Center (tips were related to Visual Studio but there are some great general tips about how to file useful feedback!): http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=138235&SiteID=1
  • Add the Help Toolbar and connect to Product Feedback from within the SQL Server 2005 Tools. In SQL Server Management Studio, select View, Toolbars, add Help. Once the Help toolbar is visible, select the Send Feedback button which should be in the "Ask a Question" section at the end of the toolbar.

If you're thinking about downloading - check out the newly bundled SP1 downloads:

SQL Server 2005 RTM Enterprise Eval Edition
SQL Server 2005 RTM Express Edition

SQL Server Express Edition with SP1 (SQLEXPR.EXE)
SQL Server 2005 Express Edition with SP1

SQL Server Express Edition with Advanced Services (SQLEXPR_ADV.EXE)
SQL Server 2005 Express Edition with SP1 + Advanced Services includes SQL Server Management Studio Express (SSMSE), support for full-text catalogs, and support for viewing reports via report server.

SQL Server Express Edition Toolkit (SQLEXPR_TOOLKIT.EXE)
SQL Server 2005 Express Edition Toolkit (SQL Server Express Toolkit) provides tools and resources to manage SQL Server Express and SQL Server Express Edition with Advanced Services. It also allows creating reports by using SQL Server 2005 Reporting Services (SSRS).

SQL Server Management Studio Express (SQLServer2005_SSMSEE.msi)
SQL Server Management Studio Express (SSMSE) provides a graphical management tool for managing SQL Server 2005 Express Edition and SQL Server 2005 Express Edition with Advanced Services instances. SSMSE can also manage relational engine instances created by any edition of SQL Server 2005. SSMSE cannot manage Analysis Services, Integration Services, SQL Server 2005 Mobile Edition, Notification Services, Reporting Services, or SQL Server Agent.

And - that's it for this week. See you on Friday, March 31 when we'll chat more about Upgrade and Migration. Here's the link to register for this upcoming session: http://msevents.microsoft.com/cui/WebCastEventDetails.aspx?EventID=1032290477&EventCategory=4&culture=en-US&CountryCode=US

See you soon,
kt

Hey there everyone - Well there was lots of excitement around our first session...so much so that apparently a Live Meeting server went down and caused MANY of you to get booted-out or even blocked-from attending (figures, right!).... Ugh (talk about the irony here - a series on high availability that isn't available because a server crashes...hhmmm, I think I know where to go for my next potential customers ;) ;). Regardless, I'm glad that at least a couple hundred of you did get in. For the more than 1000 others that were registered but unable to get in - I truly want to apologize!

The good news is that we now have the on-demand link available and for all of you who registered, it should have been sent to you via email. Also, as promised, I've attached the resources and demo scripts we talked about today.

Partial Database Availability Demo Scripts: PartialDBAvail-DemoScripts.zip (4.19 KB)
Database Mirroring Demo Scripts: DatabaseMirroring-DemoScripts.zip (3.74 KB)
Replication Demo Scripts - Since this demo was completed through the UI, here are some useful references on Replication:

Other Resources:

SQLCMD Resources: My blog entry after Michiel Wories' Webcast (includes links to webcast, etc.)
TechNet Resource Center: SQL Server 2005 Mission Critical High Availability
Demo: Windows Server System Reference Architecture Design Considerations for SQL Server 2005 High Availability
Whitepaper: Choosing a Database for High Availability: An Analysis of SQL Server and Oracle

Also, to get you ready for SQL Server 2005 - check out the Upgrade/Migration Resource Center: Upgrading to SQL Server 2005

And... that should keep you busy between now and next week!

Have fun,
kt

Hey there everyone - Sorry for the delay in blogging. Lots of great stuff to chat about but right now I'm in the throes of a lot of event planning! I hope that some of you will be able to attend one or more of these GREAT upcoming events:

Webcasts

A TechNet, 11-part Series starts on March 10. Read more about it here.

Workshops

  • Tuesday, 14 March in Reading, UK - SQL Server: Indexes from Every Angle. Read more about it here.
  • Thursday, 16 March in Edinburgh, Scotland - SQL Server 2005: Practical Guide to Recovery and Availability. Read more about it here.
  • Thursday, 6 April in Orlando, FL - SQL Server 2005 Availability Strategies: Building a Reliable VLDB in Depth. Read more about the SQL Connections conference here.
  • Sunday, 11 June in Boston, MA - Making the Most of SQL Server 2005: Developing World Class Database Applications. Presenting with Brian Randell. Read more about the Microsoft TechEd conference here.

And - we're (SQLskills) planning other events too. The webcast series is a great place to start and hopefully, I'll see you at one of the other events!


First - for what is logging needed?

This seems like an easy question with possibly an easy answer: logging aids transaction durability and helps in recovery when the system loses power. Simply put, the transaction log is a way for SQL Server to ensure that a transaction "survives" a power failure. While a transaction is processing, information about that transaction exists within memory. Once that transaction is complete, log rows are written to the log portion of the database on disk. In the event of a power failure, when SQL Server restarts, it performs restart recovery (two phases - REDO and then UNDO). Restart recovery happens every time SQL Server starts; this ensures that completed transactions are persisted into the data portion and that no incomplete transactions end up within the database. For this discussion the specifics about log rows are not important - just that they are enough to "redo" the operations from *just* log information.

The information available to recover the database after a power failure is really just what's on disk. The data portion is probably out of date and the information kept in the log is used to bring the data up to date. A good question at this point is: how out of date is my data? The answer depends on a background process that runs almost solely to minimize the restart recovery process; it is called CHECKPOINT. A checkpoint occurs to make the data and log more current (but not necessarily transactionally consistent). What this means is that periodically what is in memory is "synchronized" to disk. Since users do NOT directly read from disk, the data portion of the database (on disk) does not need to be up to the minute. Users accessing data ONLY read from cache - which is current - so only the data in memory needs to be accurate. It is VERY possible that at any given time your disk is not only out of date but not even transactionally consistent. This is NOT a problem.

If memory were to be lost (i.e. a power failure) then SQL Server would perform recovery on restart. In fact, if you think ONLY about restart recovery needing to bring a database "forward" after a power failure then you could argue that SQL Server would not need information to stay within the log after it's been "synchronized" with the data portion of the database - as long as the transaction(s) had been completed. And - YES - that's true. You can choose to clear the information from the log by changing your recovery model. Where you might have a problem is when you have a more significant failure - such as the loss of a hard drive (and even more interesting - which hard drive: a data drive or a log drive).
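To make the recovery model and checkpoint discussion a bit more concrete, here's a small sketch. The database name `Credit` is just a placeholder, and switching to SIMPLE is shown only to illustrate the option - it gives up log backups, so it's not a recommendation:

```sql
-- Check which recovery model a database is using ('Credit' is a placeholder).
SELECT DATABASEPROPERTYEX(N'Credit', 'Recovery') AS RecoveryModel;
GO
-- SIMPLE allows inactive log records to be cleared at checkpoint.
-- Illustration only: this means log backups are no longer possible.
ALTER DATABASE Credit SET RECOVERY SIMPLE;
GO
USE Credit;
GO
-- Manually request a checkpoint (SQL Server also issues these on its own).
CHECKPOINT;
GO
-- Log size and percent of log space used, per database.
DBCC SQLPERF(LOGSPACE);
GO
```

Running DBCC SQLPERF(LOGSPACE) before and after the checkpoint is an easy way to see whether the log is actually being cleared.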

Key points:

  • The Log is a "write-ahead" log
  • The data on disk is NOT guaranteed to be accurate without the transaction information in the log
  • The Transaction Log (on disk) ensures transaction durability
  • Restart Recovery happens every time SQL Server starts

OK - so that's it for now... In the next blog entry, I'll tackle "what affects logging."

Thanks for reading,
kt


If you're interested in scale-out improvements for reporting and read-only scenarios...check this out: KB 910378. This KB is actually a feature release KB and describes a new feature of SQL Server 2005 which allows multiple servers to simultaneously share the same database files on a SAN. This is NOT possible for read/write databases, only read-only databases; however, it does allow you to leverage multiple servers' hardware to perform complex reporting locally - using each node's memory, tempdb, etc.

If you implement this - report back (no pun intended) as I'd love to hear your good/bad experiences!

Have fun,
kt


Many of you have probably already downloaded the refreshed Books Online but if not - you should! LOTS AND LOTS of updates/good stuff in there.

Check it out: http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx

That should keep you busy for a bit! ;-)

Happy New Year!
kt


If you're reading *my* blog then you're probably just as interested in the upcoming SQL Server 2005 launch as I am. There are many launch events scheduled around the world and I hope you'll find time to attend one - even if remotely. IT's ShowTime - from TechNet - will be broadcasting Steve Ballmer's Launch Keynote on Monday - LIVE - at 5pm GMT. Get all of the details about the launch broadcast by clicking the picture above.

To complement the LIVE broadcast, IT's Showtime has also dedicated a whole SQL Server section of presentations here: http://www.microsoft.com/emea/itsshowtime/sqlserver.aspx

Enjoy!


Hey there everyone! I know I still owe you a few Q&A entries (for sessions 7, 8 and 9) but I wanted to get this blog entry out there so that you can play a bit with some of the resources. This series was targeted at developers but really helps to "bridge the gap" between development and administration by always remembering the impacts of what you implement (and techniques to help you to see if you do). More specifically, everything you do and/or design, has the potential for a negative impact to something else - there's no free lunch, eh?

So, this series focused more on Scalability but always remembered the impact to availability and/or reliability. This last session brought together the three primary areas to remember while developing a scalable and reliable architecture:

  1. Know your data
    • Design for Performance - Sessions 1, 2, 3, 6, and 8
  2. Know your users
    • Indexing for Performance - Sessions 4, 5 and 9
    • Optimizing Procedural Code - Session 7
    • Controlling Mixed Workloads and Concurrency - Session 6
  3. Users lie
    • Profile - to make sure that you're tuning what's really happening as opposed to what you think was going to happen! - Session 9

This last session had some great questions and as a result, a few new resources were used. Here are a few of the things we talked about:

Event Notifications and DDL Triggers

DMVs

Webcast links for the entire series!

Part 1: Creating a Recoverable Database
For the MSDN Download for Part 1, click here.
For the SQLskills Blog Entry for Part 1, click here.

Part 2: Creating a Reliable and Automated Backup Strategy
For the MSDN Download for Part 2, click here.
For the SQLskills Blog Entry for Part 2, click here.

Part 3: Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures
For the MSDN Download for Part 3, click here.
For the SQLskills Blog Entry for Part 3, click here.

Part 4: SQL Server Indexing Best Practices
For the MSDN Download for Part 4, click here.
For the SQLskills Blog Entries for Part 4
Resource links blog entry, click here.
Q&A blog entry, click here.

Part 5: SQL Server Index Defrag Best Practices
For the MSDN Download for Part 5, click here.
For the SQLskills Blog entry, click here.

Part 6: Mixed Workloads, Secondary Databases, Locking and Isolation
For the MSDN Download for Part 6, click here.
For the SQLskills Blog Entry for Part 6, click here.

Part 7: Understanding Plan Caching and Optimizing Procedure Performance
For the MSDN Download for Part 7, click here.

Part 8: Data Loading and Aging Strategies - Table and Index Partitioning
For the MSDN Download for Part 8, click here.

Part 9: Profiling for Better Performance
For the MSDN Download for Part 9, click here.

Part 10: Session Summary - Common Roadblocks to Scalability
For the MSDN Download for Part 10, click here.
Transcript can be found here.

So, the series comes to an end (even though I still have more work to do). I have to say that it was a lot of fun and I enjoyed everyone's questions. And /start shameless plug here/ starting in March, SQLskills will begin a 10-12 part series on TechNet. The series will include sessions from my colleague Bob Beauchemin as well as me. This will definitely be more Operations and DBA focused but for all of you developers - it may help you better understand the system, High Availability and a myriad of New Features in SQL Server 2005.

I hope to see you there - or at least your DBA... ;-)

Thanks again everyone,

Kimberly

Effectively Designing a Scalable and Reliable Database

A Primer to Proper SQL Server Development

SQL Server Mixed Workloads, Secondary Databases, Locking and Isolation, Part 6 of 10

Presented by Kimberly L. Tripp, SQLskills.com

Q: Can I view a recording of this webcast? Part 6 can be replayed by clicking here.

Q: Where can we get the demo scripts AND the sample database: Credit? The demo scripts are in this zip (20050916MSDNDemoScripts.zip (6.11 KB)); here in this blog entry. However, at the series completion, I will also create an entry under Past Event Resources for the entire webcast series.  To download the ZIP of the Credit Database Backup click here. Once unzipped, restore this backup to SQL Server 2000 or SQL Server 2005. The backup is a SQL Server 2000 backup and can be restored to either version! If restoring to SQL Server 2005, you might want to change the destination for the data and log file as the path will probably be different.

Q: Where are the links to all prior Webcast Q&As from this series?

Part 1: Creating a Recoverable Database
For the MSDN Download for Part 1, click here.
For the SQLskills Blog Entry for Part 1, click here.

Part 2: Creating a Reliable and Automated Backup Strategy
For the MSDN Download for Part 2, click here.
For the SQLskills Blog Entry for Part 2, click here.

Part 3: Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures
For the MSDN Download for Part 3, click here.
For the SQLskills Blog Entry for Part 3, click here.

Part 4: SQL Server Indexing Best Practices
For the MSDN Download for Part 4, click here.
For the SQLskills Blog Entries for Part 4
Resource links blog entry, click here.
Q&A blog entry, click here.

Part 5: SQL Server Index Defrag Best Practices
For the MSDN Download for Part 5, click here.
For the SQLskills Blog entry, click here.

Q: How can I replay previous sessions? I thought we were going to get emails for replaying -- but I haven't received any replay emails. You will receive replay emails ONLY when you register for these sessions through MSDN. We’ve come to find out that there are other ways to register but it’s only through MSDN that we know for sure you will receive the replay information.

Technical Questions

Q: I know you have covered indexes and backups in other webcasts, but here is my question: I use temporary tables heavily and my TempDB grows up to 5 GB. Should I back up or truncate the transaction log in order to bring it back to its normal size? No, there is no need to specifically maintain the transaction log of the TempDB database. If the transaction log (and subsequently, the database) grows large – there could be multiple reasons for that and instead of thinking in terms of trying to manage the log, I’d look at long running transactions and/or large transactions. You can use Profiler to help you see long running and/or large transactions.
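Besides Profiler, a few quick commands can give a first look at what's going on in TempDB. This is only a sketch of where I'd start looking, not a full diagnosis:

```sql
USE tempdb;
GO
-- Report the oldest active transaction in the current database, if there is one.
DBCC OPENTRAN;
GO
-- Log size and percent of log space used for every database, including tempdb.
DBCC SQLPERF(LOGSPACE);
GO
-- Current file sizes and growth settings for tempdb.
EXEC sp_helpfile;
GO
```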

Q: Can I perform a database snapshot to another server? No, database snapshots must be created on the same server as the database on which the snapshot is being based.

Q: Can I snapshot by filegroup? No, however if what you want to do is create a snapshot which does NOT include certain files – you can take those filegroups offline and then create the snapshot. In the snapshot the only file/filegroups available will be those which were online when the snapshot was created…even if those files/filegroups are brought online after the snapshot was created.

Q: Could a reader be blocked on the snapshot DB while SQL updates the changed page? No. The copy on write mechanism is really a copy before write mechanism and the pages will be copied before the write and essentially before the locks, etc. The only possible “blocking” could be caused by the excess I/Os that need to be performed. However, the I/Os are performed only on the FIRST change to the page after the snapshot is created – so it’s minimal!!
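For reference, creating a snapshot looks roughly like this. The logical file name `CreditData` and the file path are assumptions for illustration - you need one ON entry per data file in the source database, using that database's real logical file names:

```sql
-- Create a snapshot of the Credit database on the SAME server.
-- 'CreditData' and the path are hypothetical; check sp_helpfile in the
-- source database for the actual logical file names.
CREATE DATABASE Credit_Snapshot
ON ( NAME = CreditData, FILENAME = N'C:\SQLData\Credit_Snapshot.ss' )
AS SNAPSHOT OF Credit;
GO
-- List existing snapshots and the databases they are based on.
SELECT name, source_database_id
FROM sys.databases
WHERE source_database_id IS NOT NULL;
GO
-- A snapshot is dropped like any other database.
DROP DATABASE Credit_Snapshot;
GO
```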

Q: Are DMVs in SQL Server 2005 only? Yes, DMVs = Dynamic Management Views and these are a feature of SQL Server 2005.
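As a small taste of what a DMV query looks like, here's a sketch that lists the requests currently executing for user sessions (the column list is just a selection I find useful):

```sql
-- Currently executing requests for user sessions, with wait and blocking info.
SELECT r.session_id,
       r.status,
       r.command,
       r.wait_type,
       r.blocking_session_id,
       r.total_elapsed_time   -- milliseconds
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s
  ON s.session_id = r.session_id
WHERE s.is_user_process = 1;
```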

Q: Is read uncommitted the lowest/least in terms of data consistency? Yes, read uncommitted is also known as “dirty read.” A dirty read is a read against an “in-flight” transaction; this transaction could be rolled back. As a result, the query that read that data would be inaccurate.
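There are two common ways to ask for dirty reads, sketched below; `dbo.Member` is a hypothetical table name used only for illustration:

```sql
-- Session level: every read in this session can now see uncommitted data.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM dbo.Member;   -- hypothetical table

-- Put the session back to the default isolation level.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Table level: the NOLOCK hint applies to just this one table reference.
SELECT COUNT(*) FROM dbo.Member WITH (NOLOCK);
```

Either way, the count you get back could include rows from a transaction that later rolls back.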

Q: What is the effect of versioning on fragmentation and performance (I'm assuming it creates a split if the page is full)? Actually, I’m not sure I’m following this one… But – I think I can answer it by just giving you some insight into how things work. Versioning – in terms of the data overhead added to the data row – does add a 14-byte value (the transaction sequence number plus a pointer to the row's version in the version store). This overhead is added ONLY once, to each row, after one of the snapshot isolation options is turned on (either or both – READ_COMMITTED_SNAPSHOT or ALLOW_SNAPSHOT_ISOLATION). When this 14-byte value is added to each of the rows, the additional 14 bytes might cause the page to split. Again, this is only a one-time addition. The trick to optimizing this structural change is to change the database option and then rebuild your indexes. This will make the data contiguous and versioning will have no additional effects on the data row. Now, if what you were thinking is that the versions were stored in the data row – then this is NOT the case. The version store comes from the TempDB and as a result, there is no additional overhead (over the 14 bytes) needed within the data row.
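The "change the option, then rebuild" sequence might look something like this sketch. The database `Credit` and table `dbo.Member` are placeholders; you'd repeat the rebuild for each table that matters:

```sql
-- Turn on one or both of the row versioning options.
ALTER DATABASE Credit SET ALLOW_SNAPSHOT_ISOLATION ON;
-- Note: READ_COMMITTED_SNAPSHOT needs the database to have no other
-- active connections while the option is being changed.
ALTER DATABASE Credit SET READ_COMMITTED_SNAPSHOT ON;
GO
USE Credit;
GO
-- Rebuild all indexes on a table so the rows are contiguous again.
-- dbo.Member is a hypothetical table name.
ALTER INDEX ALL ON dbo.Member REBUILD;
GO
```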

Q: What if we're not using transactions? Will repeatable read still lock the table during the read (particularly if the select is long)? Repeatable read locks – and holds – the resources as they are read. So, YES, in the case of a select statement, you will acquire and hold the read locks for the life of the transaction. If you haven't started an explicit transaction, the statement runs as its own transaction, so the locks are held until that statement completes.
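If you want to see this for yourself, a sketch along these lines shows the shared locks still being held after the SELECT finishes. The table `dbo.Member` and column `member_no` are hypothetical, and sys.dm_tran_locks is SQL Server 2005 only:

```sql
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRANSACTION;

-- Hypothetical table and column; any range read will do.
SELECT member_no
FROM dbo.Member
WHERE member_no BETWEEN 1 AND 100;

-- The shared (S) locks from the SELECT above are still held here.
SELECT resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID;

-- The locks are released when the transaction ends.
COMMIT TRANSACTION;

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```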

Q: How much additional overhead does versioning require from the SQL Server engine? Most of the overhead comes from TempDB but there’s also a bit of overhead in looking up the version. There are no direct numbers associated with the overhead but in a lot of cases you should think in terms of roughly 10% additional costs for your transaction… So, I guess the best point is that you will have slower overall performance when implementing row versioning; however, you might solve a lot of your blocking problems. Slower but not blocked is better than not running at all, even if it would run quickly once it's unblocked. :) In all seriousness though, if blocking is NOT your primary problem, you will add overhead without a possible benefit.

Q: So, is it the new transaction data or the old transaction data held in the snapshot store (seems like it might be different for statement vs. transaction level snapshotting)? It’s always the BEFORE image. The general process of the write is called “copy on write” but I think of it better as copy before write.

Q: Can we optimize the snapshot store (different physical device, file group(s), etc.)? No. However, you should look at optimizing TempDB. There are multiple things that you might want to consider. I discuss those things in this blog entry here.

Q: If I don’t need locking why shouldn’t I use read-uncommitted? Hmm, you can… you just need to be aware of the fact that the data is “dirty” and is not guaranteed to persist.

Q: Where does SQL store all the row versions (with snapshot isolation turned on)? The version store is in TempDB.
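If you're curious how much of TempDB the version store is using, a couple of SQL Server 2005 DMVs can give a rough idea. This is a sketch only; on a busy system the second query can return a lot of rows, so I'd just count them:

```sql
-- Space reserved in tempdb for the version store (pages are 8 KB each).
SELECT SUM(version_store_reserved_page_count) * 8 AS version_store_kb
FROM tempdb.sys.dm_db_file_space_usage;

-- Number of row versions currently in the version store.
SELECT COUNT(*) AS version_rows
FROM sys.dm_tran_version_store;
```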

Q: How do I view all of the Report options from the summary page? I am looking at Adventureworks (compatibility level = 90), but all that I see is the General report. Ah, ha! The new summary windows were added after the beta II April CTP. So, what this tells me is that you’re running a build lower than 9.00.1187.07. At this point, I’d go for the September CTP which is build 9.00.1314.06.

Q: This question is from previous webcast... Is there anything new with SQL 2005 that does datetime support data types? Time datatype or Date datatype only? No, SQL Server 2005 only includes the datetime datatype for date/time data. However, by using “custom types” you can create your own types which are date only or time only (just for one example). There were separate SQLCLR types of date only and time only in SQL Server 2005; however these were non-native types and subsequently removed. Instead, they will be shipped as examples in a resource kit which ships after RTM.

Q: Kimberly, the downloadable zip file from your blog for at least the first session will only unzip to a "C:" drive (which my system doesn't have ;^) Could you please re-zip it to allow election of the drive to which it should unzip? This one still perplexes me. I didn’t set any options that would restrict this…  

For the next session, we’re going to cover how SQL Server keeps plans, where you can look to see what's in cache AND how you can know better if the stored procedure's plan should be kept...or not? If you’re interested in hearing more - here’s the registration link:

MSDN Webcast: A Primer to Proper SQL Server Development (Part 7 of 10): Understanding Plan Caching and Optimizing Procedure Performance

 See you on Friday!

kt

Brian A. Randell's Blog: http://www.mcwtech.com/CS/blogs/brianr/default.aspx
MCWTechnologies Website: http://www.mcwtech.com/

Kimberly L. Tripp's Blog: http://www.SQLskills.com/blogs/Kimberly
SQLskills Website: http://www.SQLskills.com

Presentation Resources
Presentation in PDF form
Kimberly's Demo Scripts
Brian's Demo Scripts and Code

Running SQL Server 2000 tools and SQL Server 2005 tools side-by-side
We talked about re-registering all of your COM components and I didn't have a slide for this. So, if SQL Server 2000 Enterprise Manager crashes when you try to access database properties OR SQL Server 2000 Query Analyzer doesn't seem to do the color coding correctly, then you need to re-register your COM components in BOTH of the following directories:
   c:\program files\microsoft sql server\80\tools\binn 
   c:\program files\microsoft sql server\mssql\binn

To re-register the components, execute: FOR %i IN (*.dll) DO regsvr32 /s %i

Resources and Presentations on Indexing Best Practices
First, start by reviewing the blog entries listed in the Indexes category here.
As for the Webcasts - there are 6 from which to choose! Each webcast has an associated Q&A posted to my blog - make sure to look for the Q&As. Usually they are posted within 1 week (give or take :) from the actual webcast.
   MSDN Webcast: Indexing for Performance - Finding the Right Balance (SQL Server 2000), recorded 11 June 2004
   MSDN Webcast: Indexing for Performance - Index Maintenance Best Practices (SQL Server 2000), recorded 19 July 2004
   TechNet IT's Showtime Webcast: Index Creation Best Practices with SQL Server 2005, recorded at Tech Ed Amsterdam, July 2005
   TechNet IT's Showtime Webcast: Index Defragmentation Best Practices with SQL Server 2005, recorded at Tech Ed Amsterdam, July 2005
   MSDN Webcast Series: Part 4 of 10, Best Practices in Indexing, recorded 26 August 2005
   MSDN Webcast Series: Part 5 of 10, New Features in Indexing and Index Maintenance Best Practices, recorded 2 September 2005

MSDN Webcast Series: Building Highly Reliable and Available Systems with SQL Server 2005
Watch one on-demand and/or sign up to attend one of the remaining!

Part 1: Creating a Recoverable Database
For the MSDN Download for Part 1, click here.
For the SQLskills Blog Entry for Part 1, click here.

Part 2: Creating a Reliable and Automated Backup Strategy
For the MSDN Download for Part 2, click here.
For the SQLskills Blog Entry for Part 2, click here.

Part 3: Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures
For the MSDN Download for Part 3, click here.
For the SQLskills Blog Entry for Part 3, click here.

Part 4: SQL Server Indexing Best Practices
For the MSDN Download for Part 4, click here.
For the SQLskills Blog Entries for Part 4
Resource links blog entry, click here.
Q&A blog entry, click here.

Part 5: SQL Server Index Defrag Best Practices
For the MSDN Download for Part 5, click here.
For the SQLskills Blog entry, click here.

Part 6: SQL Server Mixed Workloads, Secondary Databases, Locking and Isolation
For the MSDN Download for Part 6, click here.

Part 7: Understanding Plan Caching and Optimizing Procedure Performance
To register to attend, click here.

Part 8: Data Loading and Aging Strategies
For the MSDN Download for Part 8, click here.

Part 9: Profiling for Better Performance
For the MSDN Download for Part 9, click here.

Part 10: Most Common Roadblocks to Scalability and Reliability
For the MSDN Download for Part 10, click here.

Profiling SQL Server and Creating a Server-side Trace
INF: How to Create a SQL Server 2000 Trace (283790)
HOW TO: Programmatically Load Trace Files into Tables (270599)
How To: Stop a Server-Side Trace in SQL Server 2000 (822853)
INF: How to Monitor SQL Server 2000 Traces (283786)
INF: Stored Procedure to Create a SQL Server 2000 Blackbox Trace (281671)
BUG: BOL Incorrectly States That Users Do Not Need to Be Sysadmin to Use Profiler or SQL Profiler SPs (310175) 
   NOTE: This is ONLY a SQL Server 2000 limitation.
INF: Job to Monitor SQL Server 2000 Performance and Activity (283696)
Support WebCast: SQL Server 2000 Profiler: What's New and How to Effectively Use It
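To give a feel for what the KB articles above walk through, here's a bare-bones server-side trace sketch. The file path and the 50 MB size are assumptions for illustration, and a real trace would normally capture more events and columns and add filters:

```sql
DECLARE @traceid int, @maxfilesize bigint, @on bit;
SET @maxfilesize = 50;   -- maximum trace file size in MB (arbitrary choice)
SET @on = 1;

-- Create the trace definition. The path is hypothetical; SQL Server
-- appends the .trc extension itself.
EXEC sp_trace_create @traceid OUTPUT, 0, N'C:\Traces\MyTrace', @maxfilesize, NULL;

-- Event 12 = SQL:BatchCompleted.
-- Columns: 1 = TextData, 12 = SPID, 13 = Duration.
EXEC sp_trace_setevent @traceid, 12, 1, @on;
EXEC sp_trace_setevent @traceid, 12, 12, @on;
EXEC sp_trace_setevent @traceid, 12, 13, @on;

-- Start the trace.
EXEC sp_trace_setstatus @traceid, 1;

-- ...let the workload run, then stop (0) and close (2) the trace.
EXEC sp_trace_setstatus @traceid, 0;
EXEC sp_trace_setstatus @traceid, 2;

-- Read the trace file back as a rowset.
SELECT TextData, SPID, Duration
FROM ::fn_trace_gettable(N'C:\Traces\MyTrace.trc', DEFAULT);
```

In practice the stop/close steps would be run later from a separate batch, using the trace ID reported by fn_trace_getinfo.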

Great KB to Start with for Troubleshooting
HOW TO: Troubleshoot Application Performance with SQL Server

What about Whitepapers - we referenced quite a few!
Get a Lean, Mean Dev Machine with the Express Editions of Visual Basic and SQL Server 2005 by Brian A. Randell 
SQL Server 2005 Beta 2 Transact-SQL Enhancements by Itzik Ben-Gan 
SQL Server 2005 Partitioned Tables and Indexes by Kimberly L. Tripp
SQL Server 2005 Snapshot Isolation by Kimberly L. Tripp
SQL Server 2005: the Database Administrator’s Guide to the SQL Server Database Engine .NET Common Language Runtime Environment by Kimberly L. Tripp

Other Whitepapers, Websites, and Webcasts
Blog Entry: 8 Steps to Better Transaction Log Throughput
MSDN Whitepaper: An Overview of SQL Server 2005 for the Database Developer
MSDN Whitepaper: Processing XML Showplans Using SQLCLR in SQL Server 2005
MSDN Whitepaper: Using CLR Integration in SQL Server 2005
MSDN Whitepaper: XML Support in Microsoft SQL Server 2005
MSDN Whitepaper: XML Options in Microsoft SQL Server 2005
MSDN Whitepaper: What's New in FOR XML in Microsoft SQL Server 2005
MSDN Whitepaper: XML Best Practices for Microsoft SQL Server 2005
MSDN Whitepaper: Usage Scenarios for SQL Server 2005 Native Web Services
MSDN Whitepaper: Managed Data Access Inside SQL Server with ADO.NET and SQLCLR
MSDN On-demand Webcasts 
MSDN Live Webcasts 
SQL Server 2005 Hands-On Labs
         
SQLCLR Hands-On Lab Manual
Microsoft SQL Server TechCenter on TechNet
Sample Book Chapters for SQL Server 2005 is a list of chapters posted from a variety of authors for books related to SQL Server 2005.
Hosting the .NET Runtime in Microsoft SQL Server on the Association for Computing Machinery (www.ACM.org). To access this article you need membership in SIGMOD, the ACM, or you can purchase just this article for download.
Service Oriented Database Architecture by David Campbell, also on the Association for Computing Machinery (www.ACM.org). To access this article you need membership in SIGMOD, the ACM, or you can purchase just this article for download.

Gert E.R. Drapers' website
Microsoft SQL Server Development Customer Advisory Team
PDC Information Site

Well, if that doesn't keep you busy, I don't know what will!

Enjoy!
Kimberly


This is a much needed and much overdue blog entry... In 8 Steps to Better Transaction Log throughput, I mentioned a customer that was helped by TWO typical optimization problems I see. In that blog entry, I said I would write two blog entries - that one on transaction log optimization and another on common tempdb optimizations. Well, I forgot...until I was reminded with an email this morning (thanks Marcus!).

First - a bit of understanding of TempDB - what goes there?

  • Internal temporary objects needed by SQL Server in the midst of other complex operations. For example, worktables created by a hash aggregate will be stored in TempDB, as will interim tables used in hash joins (almost anything that shows as "hash" something in your query plan output is likely to go to tempdb).
  • User objects created with either # (for local temporary objects), ## (global temporary objects) or @ (table variables)
    • # = Local temporary object
      Local temp objects are objects accessible ONLY in the session that created them. These objects are also removed automatically when the session that created them ends (unless manually dropped).
    • ## = Global temporary object
      Global temporary objects are objects that are accessible to ANYONE who can log in to your SQL Server. They persist only as long as the session that created them lasts (unless manually dropped), but anyone who logs in during that time can directly query, modify or drop them. These objects are removed automatically when the session that created them ends (unless manually dropped) OR, if another session is using the object at that moment, when that session finishes using it (which is only as long as any locks are held). If other sessions need more permanent use of a temporary object, you should consider creating a permanent object and dropping it manually.
    • @ = User-defined Table Variable
      User-defined Table Variables were introduced in SQL Server 2000 (or, wow - was it 7.0?) and provide an alternative to temporary tables by allowing you to create a variable defined as type TABLE and then you can populate and use it in a variety of ways. There has been A LOT of debate over whether or not you should always use table variables or always use temp tables. My response is that I ALWAYS avoid the word always! My point is that table variables are NOT always better nor are temp tables always better. There are key uses to each. I tend to like temp tables in scenarios where the object is used over a longer period of time - I can create non-key indexes on it and it's more flexible to create to begin with (SELECT INTO can be used to create the temp table). I also have the ability to use the temporary table in nested subprocedures because it's not local to the procedure in which it was created. However, if you don't need any of those things then a table variable might be better. When it is likely to be better - when you have smaller objects that don't need to be accessed outside of the procedure in which it was created and when you only need KEY indexes (a table variable ONLY supports the indexes created by a create table statement - meaning PRIMARY KEY and UNIQUE KEY).
  • Objects created by client applications - this is possibly a large part of your problem... Profiling can help you to determine if there's a lot of TempDB usage from your client applications.
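
To make the three kinds of user objects above concrete, here is a minimal sketch. The object and column names (#OrdersLocal, ##OrdersGlobal, @OrdersVar, OrderID, Amount) are hypothetical and exist only for illustration.

```sql
-- Local temporary table: visible only to the session that creates it
CREATE TABLE #OrdersLocal
(
      OrderID     int         NOT NULL PRIMARY KEY,
      Amount      money       NOT NULL
)
-- A non-key index can be added to a temp table after it is created
CREATE INDEX OrdersLocalAmount ON #OrdersLocal(Amount)
go

-- Global temporary table: visible to every session while it exists
CREATE TABLE ##OrdersGlobal
(
      OrderID     int         NOT NULL,
      Amount      money       NOT NULL
)
go

-- Table variable: scoped to the batch or procedure that declares it.
-- In the versions discussed here, its only indexes are the ones that
-- PRIMARY KEY and UNIQUE constraints create.
DECLARE @OrdersVar TABLE
(
      OrderID     int         NOT NULL PRIMARY KEY,
      Amount      money       NOT NULL
)
INSERT @OrdersVar (OrderID, Amount) VALUES (1, 10.00)
SELECT OrderID, Amount FROM @OrdersVar
go

-- Clean up the temp tables explicitly rather than waiting for session end
DROP TABLE #OrdersLocal
DROP TABLE ##OrdersGlobal
go
```

Note that the table variable is gone as soon as its batch ends, which is why the DECLARE, INSERT and SELECT sit in the same batch.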

OK, so now that you know what goes there - how do you make it optimal?

First and foremost, TempDB is in cache just as any other database is in cache. TempDB does not spill to disk unless you are low on cache and/or if you have a lot of inflight transactions in TempDB. Although TempDB is not persisted from shutdown to restart - it still needs to do some logging and therefore you should consider its optimization a lot like other databases.

Things you should do for TempDB (that are a lot like what you should do for every database):

  1. Isolate the data and log portion of TempDB.
  2. Place them on clean, defragmented disks.
  3. Pre-allocate them so they don't need to do a lot of autogrowth.
  4. Make sure you have sufficient memory to support active objects (check for disk activity to the disks that contain TempDB files).
  5. Make sure that transactions are written efficiently so that there are no unusually long running transactions that are unnecessarily holding resources (and therefore locks and therefore log activity).
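
As a rough sketch of step 3 (pre-allocation), something like the following could be used. It assumes the default logical file names tempdev and templog; the sizes are placeholders, not recommendations, and each new SIZE must be larger than the file's current size.

```sql
-- Review the current tempdb files, their locations and their sizes
USE tempdb
go
EXEC sp_helpfile
go

-- Pre-size the data and log files so that they rarely need to autogrow.
-- tempdev/templog are the default logical names; sizes are placeholders.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 2048MB, FILEGROWTH = 256MB)
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, SIZE = 512MB, FILEGROWTH = 128MB)
go
```

A sensible starting size is whatever tempdb tends to grow to under your normal workload, so check the file sizes after the server has been running for a while.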

And - if you need to move TempDB, you should review this KB Article: Moving SQL Server databases to a new location with Detach/Attach

Things you should do SPECIFICALLY for TempDB (especially if you're running on a multiproc machine):

Before I say what... let me tell you why. TempDB has a large number of objects being created all the time. For an object to be created, space must be allocated to it. Space allocation is determined by looking at some of the internal system pages (the GAM and SGAM). In the end, it is these pages that start to have significant contention (with just one file) in a VERY active TempDB. To minimize that contention you can create multiple files.

  1. Consider creating multiple files for TempDB (even if on the same physical disks) so that there is less of a bottleneck when objects are being allocated. Make sure to read associated KB.
  2. Consider setting a trace flag to have object allocation grab extents rather than pages. Make sure to read associated KB.

BOTH of these last two are described in detail by a KB article: FIX: Concurrency enhancements for the tempdb database.
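
A minimal sketch of both steps follows. The logical name, path and sizes of the added file are hypothetical placeholders. The trace flag number (1118) is my understanding of the flag that the KB article covers, so please confirm it against the KB before using it.

```sql
-- Step 1: add another data file to tempdb, sized the same as the first.
-- The logical name, path and sizes are hypothetical placeholders.
ALTER DATABASE tempdb
ADD FILE
(
      NAME = tempdev2,
      FILENAME = 'T:\TempDB\tempdev2.ndf',
      SIZE = 2048MB,
      FILEGROWTH = 256MB
)
go

-- Step 2: allocate full extents instead of single pages from mixed extents.
-- Assumption: 1118 is the trace flag the KB describes; verify before use.
-- The -1 turns it on globally, but only until the next restart.
DBCC TRACEON (1118, -1)
go

-- Confirm which trace flags are currently enabled globally
DBCC TRACESTATUS (-1)
go
```

To keep the trace flag on across restarts it would need to be added as a startup parameter (-T1118) instead of being set with DBCC TRACEON.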

OK - so that should really help! Moving forward (meaning SQL Server 2005), having multiple files can still help for TempDB.

Effectively Designing a Scalable and Reliable Database

A Primer to Proper SQL Server Development

New Features in Indexing and Index Maintenance Best Practices, Part 5 of 10

Presented by Kimberly L. Tripp, SQLskills.com

Q: Can I view a recording of this webcast? Part 5 can be replayed by clicking here.

Q: Where can we get the demo scripts AND the sample database: Credit? The demo scripts are in this zip (20050902MSDNDemoScripts.zip (8.52 KB)); here in this blog entry. However, at the series completion, I will also create an entry under Past Event Resources for the entire webcast series.  To download the ZIP of the Credit Database Backup click here. Once unzipped, restore this backup to SQL Server 2000 or SQL Server 2005. The backup is a SQL Server 2000 backup and can be restored to either version! If restoring to SQL Server 2005, you might want to change the destination for the data and log file as the path will probably be different.

Q: Where are the links to all prior Webcast Q&As from this series?

Part 1: Creating a Recoverable Database
For the MSDN Download for Part 1, click here.
For the SQLskills Blog Entry for Part 1, click here.

Part 2: Creating a Reliable and Automated Backup Strategy
For the MSDN Download for Part 2, click here.
For the SQLskills Blog Entry for Part 2, click here.

Part 3: Designing Tables that Scale, Best Practices in Data Types and Initial Table Structures
For the MSDN Download for Part 3, click here.
For the SQLskills Blog Entry for Part 3, click here.

Part 4: SQL Server Indexing Best Practices
For the MSDN Download for Part 4, click here.
For the SQLskills Blog Entries for Part 4
Resource links blog entry, click here.
Q&A blog entry, click here.

Q: How can I replay previous sessions? I thought we were going to get emails for replaying -- but I haven't received any replay emails. You will receive replay emails ONLY when you register for these sessions through MSDN. We’ve come to find out that there are other ways to register but it’s only through MSDN that we know for sure you will receive the replay information. Regardless, you can always find the “on-demand” version of the sessions here.

Related Resources

MSDN Webcast: Indexing for Performance – Proper Index Maintenance
MSDN Whitepaper: Microsoft SQL Server 2000 Index Defragmentation Best Practices
TechNet It’s ShOwtime Webcast: Index Defragmentation with SQL Server 2005

Technical Questions

Q: In your script, what is "HA Requirements"? HA = High Availability. This is the requirement that your table stay online and available. Some companies are trying to achieve 99.999% uptime, which is especially challenging when even maintenance operations take a table offline.

Q: If you create extra indexes, is there an easy-to-configure utility that you can run against an application after it has run for a few months to list keys that were never or infrequently used? Use one of the new DMVs: sys.dm_db_index_usage_stats. To see the complete list of DMV objects, use the following query:
SELECT * FROM sys.system_objects WHERE [name] LIKE 'dm[_]%'
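
As a rough sketch of how sys.dm_db_index_usage_stats might be used for this, the query below lists every index on the user tables in the current database alongside its usage counters. Indexes with no row in the DMV (NULL counters) have not been touched since the counters were last reset. The counters are cleared when SQL Server restarts, so only trust the output after a representative period of uptime.

```sql
-- List indexes in the current database with their usage counters.
-- NULL counters mean the index has not been used since the last restart.
SELECT  OBJECT_NAME(i.[object_id])  AS TableName,
        i.[name]                    AS IndexName,
        i.index_id,
        us.user_seeks,
        us.user_scans,
        us.user_lookups,
        us.user_updates             -- maintenance cost paid on modifications
FROM    sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
        ON  us.[object_id]  = i.[object_id]
        AND us.index_id     = i.index_id
        AND us.database_id  = DB_ID()
WHERE   OBJECTPROPERTY(i.[object_id], 'IsUserTable') = 1
  AND   i.index_id > 0              -- skip heaps
ORDER BY TableName, i.index_id
```

An index with a high user_updates value and no seeks, scans or lookups is a candidate for review, not an automatic drop; it may support a constraint or a job that runs rarely.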

Q: What is DMV again? Dynamic Management View. These are new objects which give information about in-memory objects and state information.

Q: What are the parameters and their usage for sys.dm_db_index_physical_stats? The function takes five parameters: (DatabaseID, ObjectID, IndexID, PartitionNumber, Mode)

DatabaseID = [ NULL | DatabaseID ]
NULL: returns information for ALL databases. If NULL is used, ObjectID, IndexID and PartitionNumber must also be NULL. This returns ALL indexes for all objects in all databases. Easy but possibly slow.
DatabaseID: smallint type. Refers to the ID for a specific database. DB_ID() or DB_ID('DatabaseName') can be used. The latter allows you to run this from ANY database as long as you have access. However, 3-part naming must then be used for the object.
ObjectID = [ DEFAULT | NULL | ObjectID ]
DEFAULT/NULL: return ALL base data: CL, Heap, LOB for the specified database.
ObjectID: int type. Refers to the ID for a specific object. OBJECT_ID('TableName') can be used. When using OBJECT_ID, you can use 1/2/3-part naming. Be sure to use 3-part naming when executing outside of the database.
IndexID = [ DEFAULT | NULL | IndexID ]
DEFAULT/NULL: All indexes
IndexID: int type. Refers to the ID of a specific index.
PartitionNumber = [ DEFAULT | NULL | # ]
DEFAULT/NULL/0: return ALL partitions
#: returns only the details about a specific partition. When a PartitionNumber is specified then an IndexID must also be specified.
Mode = [ DEFAULT | NULL | 'SpecificMode' ]
DEFAULT/NULL/LIMITED: return FAST scan and use only an IS (Intent Shared) Table-level lock. This lock blocks ONLY eXclusive TABLE-level locks and schema changes. Excellent, relatively unobtrusive way to get fragmentation details.
LIMITED: IS Lock. Same as SQL 2000 WITH FAST, only page counts and EXTERNAL fragmentation displayed. Does not detail INTERNAL fragmentation and page density.
SAMPLED: IS Lock. For tables less than 10,000 pages (~80MB), all details are produced. For tables of more than 80MB, two samples are done (1% and 2%) at every nth page. The samples are compared and if close, 2% sampling output returned. If not close, then up to 10% will be sampled.
DETAILED: IS Lock. Entire table analyzed for both internal and external fragmentation. Returns one row for each level of the index from the leaf level (level 0) all the way up to the root level. This can help you determine fragmentation in the non-leaf levels but at the expense of reading every page of the index.

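
To show how the five parameters fit together, here is a minimal sketch of a call in SAMPLED mode. The table name dbo.charge is an assumption used for illustration; substitute one of your own. Be careful: if OBJECT_ID returns NULL (for example, because the name is misspelled), the function treats it as "all objects" and scans the whole database.

```sql
DECLARE @db smallint, @obj int
SET @db  = DB_ID()                      -- current database
SET @obj = OBJECT_ID(N'dbo.charge')     -- hypothetical table name

SELECT  OBJECT_NAME(ps.[object_id])         AS TableName,
        ps.index_id,
        ps.index_type_desc,
        ps.index_level,
        ps.page_count,
        ps.avg_fragmentation_in_percent,    -- external (logical) fragmentation
        ps.avg_page_space_used_in_percent   -- page density; NULL in LIMITED mode
FROM    sys.dm_db_index_physical_stats(@db, @obj, NULL, NULL, 'SAMPLED') AS ps
ORDER BY ps.index_id, ps.index_level
```

The IDs are resolved into variables first so that the call also works in databases running at the SQL Server 2000 compatibility level, where function calls cannot be passed directly as arguments.
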
Q: How often should you run DEFRAG on your SQL Server box? Should this be part of a regular schedule? Since it means taking down SQL, are there any other considerations? First, the only thing that’s not available is the table being REBUILT. Defragging an index does not take that table/index offline. So, more than anything, it depends on what you’re trying to achieve. If you want to achieve better availability on SQL Server 2000 then you might choose to defrag rather than rebuild – to keep your tables available.
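
For reference, the defrag and rebuild choices map to the commands sketched below. The table name (dbo.charge) and index name (ChargePK) are hypothetical, as is the fill factor of 90.

```sql
-- SQL Server 2000 syntax
-- Defrag: the table and index stay available while it runs
DBCC INDEXDEFRAG (0, 'dbo.charge', 'ChargePK')      -- 0 = current database
go
-- Rebuild: the table is unavailable for the duration
DBCC DBREINDEX ('dbo.charge', 'ChargePK', 90)       -- 90 = fill factor
go

-- SQL Server 2005 syntax for the same two operations
ALTER INDEX ChargePK ON dbo.charge REORGANIZE       -- defrag
go
ALTER INDEX ChargePK ON dbo.charge REBUILD          -- rebuild
go
```

The DBCC commands still run on SQL Server 2005, so existing maintenance scripts do not have to be rewritten immediately.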

Q: How often do you get such perfect tables in practice? A table is always completely clean and contiguous after a rebuild. To periodically fix a table, you should use consistent and automated rebuild strategies.

Q: Do you have suggestions for developers using MSDE when customers’ demands can vary, from a few transactions to a large customer with many transactions? The general best practices in database and table design apply from the low end all the way up to the high end and, in the end, help your database scale!

Q: Can you touch on rules of thumb for "pad index"? If fragmentation in the leaf level is minimized through proper index maintenance and fillfactor – then fragmentation in the non-leaf levels should be low as well. You rarely need to specify padindex unless you have widely varying distribution of data and really want to leave larger gaps because of strange densities of data.

Q: Do most of these "Index Rules" apply to Indexed-Views? Yes! All indexes can become fragmented after data modifications... Your scripts should always look for fragmentation across all indexes.

Q: Can you discuss fragmentation WRT horizontal partitioning, especially range partitioning on the primary key? SQL Server 2005 offers more granular rebuild options – but not necessarily online. In many cases, you might want to design a read-only partitioned table and keep the volatile portion of the table (especially if only one partition) in its own separate table – possibly using a partitioned view (or an inline table-valued function) over these two tables.

Q: If I'm selecting from a table with a where FirstName = ... and LastName = ... and I have 2 indexes, one on LastName and another on FirstName. Are they both used? With an AND – maybe. The optimizer will look at the index statistics to determine if either of them is selective enough to use only one index. If neither is selective alone and a better index does not exist (a better index for AND would be one that includes BOTH of the columns in the SAME index – as a composite index), then SQL Server may choose to join the indexes (index intersection).
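
The composite index mentioned above could be sketched as follows. NamesTable and its FirstName/LastName columns are assumed for illustration, as are the index name and the varchar(50) types.

```sql
-- One composite index covering both columns of the AND predicate.
-- Table, column and index names are hypothetical.
CREATE INDEX NamesTableLastFirst ON NamesTable(LastName, FirstName)
go

DECLARE @First varchar(50), @Last varchar(50)
SET @First = 'Kimberly'
SET @Last  = 'Tripp'

-- Both predicates can be resolved with a single seek on the composite index
SELECT  LastName, FirstName
FROM    NamesTable
WHERE   FirstName = @First
  AND   LastName = @Last
go
```

The column order matters for other queries: with LastName first, this index can also be seeked for lookups on LastName alone, but not for lookups on FirstName alone.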

Q: URLs on the Resources slide can't be read. Could you type them into the Q&A, please? When the session is available for download (which is what happens when MSDN puts this online), then you can access the URLs there as well. Typically, I place all of the links at the beginning of the Q&A – resources section. I’ll make sure to do this consistently!

Q: How does a uniqueidentifier used as a clustered primary key affect performance? This is best answered by session 4. In short, a non-sequential GUID can cause a lot of fragmentation.

Q: What is ExtentFragmentation as reported by DBCC SHOWCONTIG and is it less important than Logical Fragmentation? Extent Fragmentation refers to how many extents are next to each other. This is a bit more important than Logical Fragmentation as logical fragmentation shows whether or not the pages are next to each other.

Q: How much danger is there in the defrag processes? What kind of backup procedures do you suggest when you defrag? More frequent transaction log backups. A defrag generates a lot of log information. However, it does so in mini transactions. As a result, transaction log backups can occur concurrently with the defrag process and even though the defrag is not complete, the transaction log can still be cleared because the defrag process runs as small transactions instead of one long running transaction. This also improves concurrency because the locks are released throughout the process.

Q: Defragging a large index can cause the log file to grow quite large. Is there a way to minimize this other than frequent log backups? Yes, you’re correct – defragging a large index WILL grow the log file quite large! As for minimizing this activity in the log – no way to do that. But – you’re correct in increasing the frequency of log backups!

Q: With very large tables, how much available disk space (both transaction logs and data drive) do you need to have to rebuild? Does it take less space to defrag than to rebuild? Well, this is really a multipart question… First, log space for rebuilds is mostly dependent on the recovery model. If you are running the FULL recovery model then creating and/or rebuilding indexes will take enough log space for the entire rebuild to complete. If you are running in the BULK_LOGGED or SIMPLE recovery models then this operation will run as a bulk operation and will be minimally logged. While this will take less time and significantly less log space, you are giving up some features when switching recovery models. I would strongly suggest reviewing the second session to see if this is appropriate.
Now, as for data space – a rebuild will always require at least the table size in free space and possibly as much as double (if an online rebuild is being performed). Typically, when space estimates are being done (when capacity planning the database) I always recommend taking the largest table size and multiplying it by 2 or 3 – in order to make sure you have enough space for rebuilds. There is space needed for sorting as well – this can come from the database OR from tempdb (using the SORT_IN_TEMPDB option).
Defragging doesn’t move an object so it doesn’t take additional data space BUT it does require more overall log space because it runs as mini transactions instead of just one.
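
A rough way to check these space requirements on a test system is sketched below, using SQL Server 2005 syntax. The table name dbo.charge is hypothetical.

```sql
-- How big is the table (data plus indexes) that is about to be rebuilt?
EXEC sp_spaceused 'dbo.charge'
go

-- Log size and percentage used, per database, before the rebuild
DBCC SQLPERF (LOGSPACE)
go

-- Rebuild every index on the table, taking the sort space from tempdb
-- rather than from the user database
ALTER INDEX ALL ON dbo.charge REBUILD WITH (SORT_IN_TEMPDB = ON)
go

-- Compare with the "before" numbers to see what the rebuild cost in log space
DBCC SQLPERF (LOGSPACE)
go
```

Running this once under FULL and once under BULK_LOGGED on a restored copy of the database gives a concrete measure of the difference in log usage described above.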

Q: Should we look at different fragmentation stats if there are multiple files in the same filegroup? No, you still want to review average fragmentation. However, you may have more “fragments” in a table that spans files; this does not necessarily mean that your table is fragmented.

Q: Are there any good third party tools for checking fragmentation and performing maintenance? Unfortunately no revolutionary ones (that I know of and/or can recommend)...but I still have high hopes :)

Q: How do you determine the appropriate fill factor? Unfortunately, there isn't a magic number... but, you can test your guesstimate by seeing how fragmented the table becomes between your regularly scheduled defragmentation routines.
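
One way to run that test is sketched below, using SQL Server 2005 syntax. The table name (dbo.charge), index name (ChargePK) and the starting fill factor of 90 are all assumptions for illustration.

```sql
-- Rebuild with a first guess at the fill factor
ALTER INDEX ChargePK ON dbo.charge REBUILD WITH (FILLFACTOR = 90)
go

-- Just before the next scheduled maintenance window, see how the guess held up
DECLARE @db smallint, @obj int
SET @db  = DB_ID()
SET @obj = OBJECT_ID(N'dbo.charge')     -- hypothetical table name

SELECT  ps.index_id,
        ps.page_count,
        ps.avg_fragmentation_in_percent,
        ps.avg_page_space_used_in_percent
FROM    sys.dm_db_index_physical_stats(@db, @obj, NULL, NULL, 'SAMPLED') AS ps
WHERE   ps.index_level = 0              -- leaf level only
go
```

If fragmentation is already high by the time maintenance comes around, try a lower fill factor; if the pages are still mostly empty and barely fragmented, the fill factor can probably be raised.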

Q: Does it matter if I build the clustered index before/after rebuilding the non-clustered indexes? You should always create the clustered index before creating non-clustered indexes but as for rebuilding - you can rebuild your indexes independently as a rebuild of the clustered does not (generally) cause a rebuild of the non-clustered. There is one exception to this rule in SQL Server 2000 – in 2000 ONLY, SQL Server will automatically rebuild the non-clustered indexes when a non-unique clustered index is rebuilt. Why? Because the uniqueifiers are rebuilt.

Q: Will doing a defrag followed later by a rebuild decrease the work of the rebuild? Not really. A defrag doesn’t move the object – only a rebuild does. However, you might minimize the cost of the sort…

Q: How does cache map to table pages, i.e., does free space in table pages have a 1:1 correspondence to wasted cache? SQL Server reads the 8K page from disk into memory. The number of bytes that are wasted on disk are also wasted in memory. This is often the motivation for vertical partitioning! You might refer back to session three for more details on row/page structures!

Q: If switching a varchar cluster to a bigint and vice-versa in 2000, what would the best order of drop/create index? Actually, this is the reason that CREATE with DROP_EXISTING was created… so that you could “change” the definition of the clustered index in a single step:

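-- Drop the demo table only if it already exists, so the script can be re-run
IF OBJECT_ID('test') IS NOT NULL
      DROP TABLE test
go
CREATE TABLE test
(
      testid      int               not null,
      col1        varchar(100)      not null
)
go
-- Start with the clustered index on the varchar column
CREATE CLUSTERED INDEX testind ON test(col1)
go
-- Re-create the clustered index on the int column in one step (same index name)
CREATE CLUSTERED INDEX testind ON test(testid) WITH DROP_EXISTING
go
-- Optionally rename the index to reflect its new definition
EXEC sp_rename 'test.testind', 'NewIndexName', 'INDEX'
go
-- Verify the result
EXEC sp_helpindex test
go
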
Q: What about instances of one name only? (like Madonna, Cher, etc. ;^) Well, this is a good question and this is something that you might need to plan for in design. In these cases, you might allow NULLs in the lastname column and then make sure to search both columns when a lookup is performed. To be honest, I probably wouldn’t do all that much to find these special first names – unless you wanted to do searches across both columns without knowing whether or not this is a first or last name. You might do something like this in a lookup:
SELECT * FROM NamesTable
WHERE LastName = @variable
      OR (FirstName = @variable AND LastName IS NULL)
Comment: Just wanted to say I appreciate the blog you have put together.

Thanks for the thanks! It's a lot of work but I think it's great as a reference!! Even for me! To be honest, I can't always remember where to find things either! :)

Thanks! So – we’re halfway there – 5 more to go! And, lots more questions coming I’m sure :) For the next session, we’re going to cover Isolation and options in Isolation in SQL Server 2005. If you’re interested in hearing more about isolation, locking/blocking – here’s the registration link: MSDN Webcast: A Primer to Proper SQL Server Development (Part 6 of 10): Mixed Workloads, Secondary Databases, Wait States, Locking and Isolation. See you on Friday!