When you decide to rebuild or reorganzie an index, you have an option to set something called FILLFACTOR. This is probably the MOST IMPORTANT thing to understand about index maintenance and reducing fragmentation (especially in databases that are prone to it). Unfortunately, we need to dive into some internals to really understand why this is so helpful...

What is FILLFACTOR?

FILLFACTOR defines how full the leaf level pages of the index are filled when rebuilding or reorganizing an index. The leaf level of an index is the largest structure of an index and it's the most prone to fragmentation. A good analogy would be to use the phone book (ah, loosely for this one) and imagine that new people come to the area. The *entire* phone book needs to get reprinted. Obviously this is terribly expensive and therefore it's not done often (well, we all use the internet anyway...). But, imagine if EACH page of the phone book were only filled to 90%. When new people come to the area we could just add their information - on the specific page where it should reside - and the phone book stays relatively intact and we don't have to kill a bunch of trees (OK, you could argue that you would have had to kill more trees to make a phone book that has 10% whitespace...but, let's not go there right now. :). OK, I know this analogy is weak at best but think back to "a page in the phone book" and think of it representing a "page" within SQL Server (quite literally here - SQL Server actually stores data on 8KB pages and indexes are doubly-linked lists of these 8KB pages).

So, if the pages are full and there's nowhere to put a particular row (let's say someone with a lastname of 'Tripp'), how does SQL Server "insert" this new row? They can't possibly shift all of the data down and rewrite all of the pages AFTER 'Tripp' could they? I suppose they could but it would get REALLY expensive - especially if someone with the last name of 'Anderson' moves to town...

How does SQL Server make room for new rows? They do something called a page split. Now, this can be A LOT more technical than I'm going to explain here but simply put, they split the page (on which the row is to be located) into half. The first 50% of the rows STAYS on the current page and the 50% MOVES to a new page (and this new page is probably nowhere near the page being split). However, they keep this list LOGICALLY in order by maintaining a doubly-linked list. So, SQL Server knows where to go when it's reading the data - even if the pages are not physically/sequentially in order. If this happens once or twice, it's not a huge problem. But as it continues to happen, it makes scans of your data more and more expensive and the process of the split can be expensive in and of itself. The biggest problems are:

  1. The data is not physically in order (and can create lot random IOs when scanning)
  2. The pages are no longer very dense (they've gone from 100% to 50%). Splitting does has a positive side in that it's going to take awhile before the page splits again BUT, splitting is meant as a temporary fix and ideally, it shouldn't be happening to every page. But, this also negatively impacts cache - you're only using 50% of the page that's split so now you're wasting half of that page in memory (and, on disk). So, while making a table 10% wider from the start seems bad; it reduces the need for splitting and therefore your table will stay more intact over time. 

What indexes have fragmentation?

Some indexes are prone to more fragmentation than others. For example, indexes with a high-order element (the first column of the index) that is NOT in ever-increasing order. This does NOT mean that ALL indexes should be created such that their first column is ever-increasing (that's NOT a good way to create indexes!). But, it does mean that you should pay special attention to some of your larger tables and therefore larger indexes (meaning ESPECIALLY the clustered index since the clustered index is over ALL of the columns). 

In terms of index management, here are the main things to think about for each index type: 

Rebuilding the clustered index is the most expensive index to rebuild. It's also the one that you'd rather avoid if you can. It has the most limitations... A clustered index CANNOT be rebuilt as an online operation IF the table has ANY LOB columns in it at all. What this means is that the clustered index - if fragmented - must be rebuilt using an offline operation. This translates into downtime for that table - while it's being maintained. So, if you can create a good clustering key EARLY, then you might be able to minimize clustered index maintenance.

Rebuilding a nonclustered index is much easier and likely to be an ONLINE operation. Because nonclustered indexes contain a much smaller number of columns - and rarely ever include a LOB column - they can be managed/maintained through ONLINE index operations. So, even if these are prone to more fragmentation (and, in general, they are) then it's not as big of a deal.

How do I solve this problem?

So, the BEST thing to do is to visually picture the ORDER of the data in the index (just think of it like a phone book - a LIST of records on PAGES - ordered by the KEY of the index). If you can visualize that structure then you can visualize what happens on an INSERT, UPDATE and a DELETE. Once you have a good visual you should be able to define a FILLFACTOR based on these two things:

  1. How much fragmentation is happening (is it becoming massively fragmented quickly) - then set a lower fillfactor (like 70 or 80 - depending on how big the table is)
  2. How frequently are you going to be running maintenance (if you can do this daily or weekly - for some objects) then you might not need much free space. If you can run maintenance against the table frequently then you won't need as much free space.

And, of course, you can directly check the level of fragmentation your table has by using the DMV: dm_db_index_physical_stats (which I referenced in Part I) and ideally, you'll create a procedure that runs through your objects determining how much fragmentation they have and the appropriate FILLFACTOR. Sometimes the best way to do this is to create your own "management table" which lists all of your tables and then have definitions for when to rebuild, when to reorg and what fillfactor to use. There are a few folks that have done things like this. Check out Ola's site here. He's put a lot of effort into a more robust and flexible combination of options for database maintenance. It might be more than you need but it's a GREAT EXAMPLE of all of the things that you can evaluate and consider wrt to maintenance.

Solving this using Database Maintenance Plans

Finally, if you CHOOSE to use a maintenance plan you ONLY have two options:

Reorganize pages with the default amount of free space
Does NOT sound like what it actually does (IMO). It sounds like it uses some form of SQL Server default value. However, it actually uses the "OrigFillFactor" value that's set for the index. This is actually a good thing IF you've set the FILLFACTOR on the CREATE INDEX statement OR on a rebuild. Reorganizing does NOT allow you to set this ONLY a rebuild does. So, if you have set this then this choice is probably best. If you run a query against sysindexes (yes, the backward compatibility view NOT the new sys.indexes catalog view), then you can see the OrigFillFactor setting for ALL of your indexes. (And, it's probably 0 - which sounds strange... 0 actually means 100... I could explain this but just trust me on this one.)

Change free space per page percentage to _____ %
Well, this is certainly an option you can consider. If you've never set the FILLFACTOR (or, even if you have...), this would overwrite it AND change the OrigFillFactor value to the percent you use here. If you decide to take this "sledgehammer" approach then I'd suggest something quite high - like 90-95% so that you don't end up wasting a lot of space. Some indexes (ones that have a high-order element that IS ever-increasing) don't really need free space and this might be completely wasted. So, I think it's better to set this indivually using stored procedures and/or a more custom maintenance procedure.

NOTE/TIP FROM COMMENTS POSTED: While I explain this concept (above) using the index option/setting (called FILLFACTOR) the UI is wanting the FREE SPACE percentage. So, if you want to use 90-95% FULL then be sure to set this value to 5-10%. Using the syntax, you will set page fullness.

OK, I still have at least 2 more in this series and EVEN THOUGH I've been completely sucked into facebook (as Paul blogged here), I'm still trying to manage to work and save SQLskills. Paul, on the other hand, just uploaded more photos. ;-)

Cheers,
kt

The new Seattle Public Library opened on May 23, 2004 and it's an interesting structure that people seem to love or hate. I've always loved architecture (and I really wanted to be an architect when I was growing up :)) so more than anything I just wanted to learn more about the structure. I didn't expect to hear all about it last night; I thought we were just going to wander...but in fact, last night's event was great! A Principal from the LMN Architects firm discussed the ideas and principals behind the library's design and I was SOLD! The concepts behind the design make a ton of sense. They created “platforms,” each with a function and each with room to grow. And that's what surprised me the most; they started talking about “fillfactor.” Ok, they didn't use that term but there it was....

One of the biggest problems facing libraries is their ever expanding collection of books... and as we know they don't just place books at the “end” of the shelves. They have to put the books into the proper section, etc. And - as sections EXPLODE (think of the size of the computer section in 1960 and think of it now) and others shrink they need to “shift” books to make room. When a library is broken into defined floors it makes it very challenging to expand/shrink. Instead, libraries tend to create “forwarding pointers” (no, they didn't use this word either)... These forward references get you to where the “overflow” books are - on another floor, in another building or in the basement. This becomes confusing and eventually a ton of movement needs to be done and/or the library just doesn't work any longer and needs to expand. 

So - when they were designing the SPL (Seattle Public Library) they created a “books spiral” which cleverly creates a rotating set of floors as if they were one really long floor just spiraling down (about the equivalent of seven city blocks)... OK, I'm doing a horrible job explaining it but the bottom line is that they currently house somewhere around 900,000 books and with this new spiral they can expand selections fairly easily and still have a uniform feel to the library (they expect to be about to hold roughly 1.45 million books). They also (and this is where the fillfactor component comes in) kept free space on the top shelves as well as the bottom shelves so that they had room to grow without a lot of restructuring.

There are so many other interesting design techniques used I've love to spend hours learning more... In fact what they did with the glass to prevent heat is so simple yet effective (the building's entire exterior “net” is made of glass). They inserted thin slices of metal at a parallel to the ground (and so that you could still see through the glass) but at an angle to the glass (because the glass pieces are angled - you'll have to look at the pictures to really see what I mean) and what happens is that the sun reflects off the metal and the library doesn't get overly heated and they still allow in light without having to go with UV film, etc.

Check out the Seattle Times for their slide gallery here and the SPL's slide gallery here. If you're in Seattle you must make a stop at the SPL. There's a lot of info about the library on the web but the best place for details and lots of photos is the main site for SPL, specifically here. Here are a few others of interest:

Seattle Public Library - Main Site
Seattle Public Library: Design is fun on a grand scale
Projects of Rem Koolhaas

Here are a few pics I took:

NeonEscalator.JPG (24.16 KB) - the colors are used to help guide people where to go... I found some of the colors a bit overwhelming (especially the neon green elevators - neon green on the inside)
ReadingArea.JPG (30.67 KB) - lots of defined spaces for reading, relaxing and having a latte (yes, there are coffee stands within the library... of course!)
ReadingAreaTopFloor.JPG (35.13 KB) - this was my favorite area. Probably the equivalent of 12 stories high... you could see Eliott Bay and the colors were just stunning!

Enjoy.

Categories:
Opinions | FILLFACTOR

Theme design by Nukeation based on Jelle Druyts