Last week I posted in-depth about how checkpoints work and what exactly goes on (see How do checkpoints work and what gets logged). About a year ago I posted about why the buffer pool on a busy system may seem to have an inordinate amount of dirty tempdb pages in it, and now I want to clarify a bit more why this is the case and how checkpoints work for tempdb. To see the buffer pool contents, see my post Inside the Storage Engine: What's in the buffer pool?.

A checkpoint is only done for tempdb when the tempdb log file reaches 70% full - this is to prevent the tempdb log from growing if at all possible (note that a long-running transaction can still essentially hold the log hostage and prevent it from clearing, just like in a user database).

If you read the first post I reference above, you'll see that when a checkpoint occurs for a user database, all dirty pages for that database are flushed to disk (as well as other operations). This does not happen for tempdb. Tempdb is not recovered in the event of a crash, and so there is no need to force dirty tempdb pages to disk, except in the case where the lazywriter process (part of the buffer pool) has to make space for pages from other databases.

The other operation that occurs during a checkpoint of databases in the SIMPLE recovery model is that the VLFs in the log are examined to see if they can be made inactive (see TechNet Magazine: feature article on understanding logging and recovery and Inside the Storage Engine: More on the circular nature of the log). This process allows the log to wrap-around and overwrite itself without having to grow - and this process *IS* required by tempdb, for proper management of it's log size.

This is all that happens for a checkpoint of tempdb - no data pages are flushed to disk.

There's been a recent flurry of confusion and misconceptions about trace flag 1118 in SQL Server 2008. This trace flag switches allocations in tempdb from single-page at a time for the first 8 pages, to immediately allocate an extent (8 pages). It's used to help alleviate allocation bitmap contention in tempdb under a heavy load of small temp table creation and deletion.

There are multiple points of confusion, which I'll address in turn. Then I'll prove that the trace flag still works in SQL Server 2008.

1) Why is the trace flag usually required in 2000? In SQL 2000, whenever a temp table is created in tempdb and a row inserted, an IAM page must be allocated and a single data page must be allocated. These two pages are both 'single-page' allocations, from a mixed extent (see Inside The Storage Engine: Anatomy of an extent for more info). This means that an SGAM allocation bitmap page must be accessed, and a PFS page must be accessed (see Inside The Storage Engine: GAM, SGAM, PFS and other allocation maps for more info).

With lots of very small temp tables being created, this means the very first SGAM page and the very first PFS page in the data file are accessed/changed by all the threads, leading to latch contention problems on these two pages. When the temp tables are deleted again, the various pages are deallocated, which again needs to access and change the PFS page, and potentially the SGAM page.

There are two ways to alleviate this problem. Firstly, create multiple data files in tempdb - which splits the latch contention over multiple allocation bitmaps (from having allocations come from multiple files) and thus reduces the contention. The general rule of thumb was one tempdb data file for each processor core. Secondly, turn on TF1118, which makes the first 8 data pages in the temp table come from a dedicated extent. This means one extent is allocated from the GAM page, rather than 8 single pages (and potentially 8 accesses to the SGAM page). The pages within the extent are reserved and allocated singly from this extent, as needed. This also cuts down on contention and is documented in KB 328551.

2) What does reserved vs. allocated mean? When an extent is allocated to a table, the 8 pages in the extent are not immediately allocated as well. Allocating an extent means those 8 pages are reserved exclusively for subsequent allocation to that table. The pages are allocated individually as needed, but no other table can allocate them. This is why such extents are called 'dedicated' extents (see my blog post link above for more details). You can see the counters of reserved pages vs. allocated pages in the output from sp_spaceused.

3) Why is the trace flag not required so much in 2005 and 2008? In SQL Server 2005, my team changed the allocation system for tempdb to reduce the possibility of contention. There is now a cache of temp tables. When a new temp table is created on a cold system (just after startup) it uses the same mechanism as for SQL 2000. When it is dropped though, instead of all the pages being deallocated completely, one IAM page and one data page are left allocated, and the temp table is put into a special cache. Subsequent temp table creations will look in the cache to see if they can just grab a pre-created temp table 'off the shelf'. If so, this avoids accessing the allocation bitmaps completely. The temp table cache isn't huge (I think it's 32 tables), but this can still lead to a *big* drop in latch contention in tempdb.

4) Does the trace flag still exist in 2005 and 2008? Yes it does - KB 328551 clearly states:

Note Trace flag -T1118 is also available and supported in Microsoft SQL Server 2005 and SQL Server 2008. However, if you are running SQL Server 2005 or SQL Server 2008, you do not have to apply any hotfix.

Just to make extra-sure (as I'm always paranoid about saying absolutes), I checked with my good friend Ryan Stonecipher, who's the dev lead for the team that owns allocation (and a bunch of other stuff, including DBCC). He confirmed the code is exactly the same in 2008 as it was in 2005. And I prove it to you below too.

5) And why is it sill there in 2005 and 2008? It does the same thing in 2005/2008 as it did in 2000. If the temp table creation/deletion workload is high enough, you can still see latch contention, as the temp table cache won't be enough to completely alleviate the need for creating actual new temp tables, rather than just being able to grab one 'off the shelf'. In that case, using the trace flag to change to extent-based allocation (in *exactly* the same way as for 2000) can help, as can creating more tempdb data files.

As far as data files go though, the number has changed. Instead of a 1-1 mapping between processor cores and tempdb data files (*IF* there's latch contention), now you don't need so many - so the recommendation from the SQL team is the number of data files should be 1/4 to 1/2 the number of processor cores (again, only *IF* you have latch contention). The SQL CAT team has also found that in 2005 and 2008, there's usually no gain from having more than 8 tempdb data files, even for systems with larger numbers of processor cores. Warning: generalization - your mileage may vary - don't post a comment saying this is wrong because your system benefits from 12 data files. It's a generalization, to which there are always exceptions.

6) Why does DBCC IND still show two pages, even with the trace flag on? I've heard of some people being confused by the output of DBCC IND in SQL 2008 when the trace flag is turned on. Creating a single row temp table will only show two pages allocated in the DBCC output - one IAM page and one data page. Yes, that's completely correct - as only two pages are allocated, but the data page comes from a dedicated extent, not a mixed extent. (IAM pages are *always* single-page allocations from mixed-extents).

And now the proof, on SQL 2008.

SELECT @@VERSION;
GO

Microsoft SQL Server 2008 (RTM) - 10.0.1600.22 (Intel X86)

First off, I'll create a temp table without the trace flag enabled, and see what pages the table has allocated, by looking at the first IAM. I'll use a temp table with an 8000+ byte row size, and insert two rows - so we have two data pages for clarity.

DBCC TRACEOFF (1118, -1);
GO

USE tempdb;
GO

CREATE TABLE #temp (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
GO
INSERT INTO #temp DEFAULT VALUES;
GO 2

Now I'll figure out what is the first IAM page, using my sp_AllocationMetadata script (see here for the script and details), and dump it with DBCC PAGE to see the single-page allocations it's tracking, and which dedicated extents are allocated to the table:

EXEC sp_AllocationMetadata '#temp';
GO

Object Name    Index ID  Alloc Unit ID        Alloc Unit Type  First Page  Root Page  First IAM Page
-------------- --------- -------------------- ---------------- ----------- ---------- ---------------
#temp__<snip>  0         1152921505223016448  IN_ROW_DATA      (1:158)     (0:0)      (1:199)

DBCC TRACEON (3604);
GO
DBCC PAGE ('tempdb', 1, 199, 3);
GO

<snip>

IAM: Single Page Allocations @0x4A35C08E

Slot 0 = (1:158)                     Slot 1 = (1:200)                     Slot 2 = (0:0)
Slot 3 = (0:0)                       Slot 4 = (0:0)                       Slot 5 = (0:0)
Slot 6 = (0:0)                       Slot 7 = (0:0)                      


IAM: Extent Alloc Status Slot 1 @0x4A35C0C2

(1:0)        - (1:1016)     = NOT ALLOCATED   

As you can clearly see from the partial output of the dump of the IAM page, there are two single-page allocations and no extents allocated to the temp table. This is what should happen when the trace flag is not enabled.

Now I'll do the same thing with the trace flag 1118 enabled.

USE tempdb;
GO

DROP TABLE #temp;
GO

DBCC TRACEON (1118, -1);
GO

CREATE TABLE #temp (c1 INT IDENTITY, c2 CHAR (8000) DEFAULT 'a');
GO
INSERT INTO #temp DEFAULT VALUES;
GO 2

EXEC sp_AllocationMetadata '#temp';
GO

Object Name    Index ID  Alloc Unit ID        Alloc Unit Type  First Page  Root Page  First IAM Page
-------------- --------- -------------------- ---------------- ----------- ---------- ---------------
#temp__<snip>  0         1224979099301904384  IN_ROW_DATA      (1:208)     (0:0)      (1:158)

DBCC TRACEON (3604);
GO
DBCC PAGE ('tempdb', 1, 158, 3);
GO

<snip>

IAM: Single Page Allocations @0x4A8FC08E

Slot 0 = (0:0)                       Slot 1 = (0:0)                       Slot 2 = (0:0)
Slot 3 = (0:0)                       Slot 4 = (0:0)                       Slot 5 = (0:0)
Slot 6 = (0:0)                       Slot 7 = (0:0)                      


IAM: Extent Alloc Status Slot 1 @0x4A8FC0C2

(1:0)        - (1:200)      = NOT ALLOCATED                              
(1:208)      -              =     ALLOCATED                              
(1:216)      - (1:1016)     = NOT ALLOCATED
                              

Now as you can clearly see, there are no single-page allocations, and there's a single extent allocated to the table. Proof that trace flag 1118 still does exactly what it should in SQL Server 2008.

Now for a DBCC IND on the table: 

DBCC IND ('tempdb', '#temp', -1);
GO

PageFID PagePID     IAMFID IAMPID      ObjectID    IndexID    
------- ----------- ------ ----------- ----------- -----------
1       158         NULL   NULL        293576084   0
1       208         1      158         293576084   0
1       209         1      158         293576084   0

(I've removed some of the trailing columns for clarity.) We see that it still only lists the two data pages (1:208, 1:209) and the IAM page (1:158) - although an entire extent was allocated to the temp table, only two pages from the extent were actually allocated and used - the rest are reserved for use by that table, but remain unallocated. 

Hopefully this post has cleared up a lot of the confusion around this trace flag and what it does.

The April edition of TechNet Magazine is available on the web now and has the latest installment of my regular SQL Q&A column.

This month's topics are:

  • Disappearing errors with DBCC CHECKDB
  • Provisioning tempdb when moving from 2000 to 2008
  • Does fillfactor prevent fragmentation and should it be set instance-wide
  • Avoiding FILESTREAM performance problems

Check it out at http://technet.microsoft.com/en-us/magazine/2009.04.sqlqa.aspx

Back in 2005 Kimberly produced two very popular webcast series - an 11-part webcast series for TechNet called SQL Server 2005 for the IT Professional and a 10-part webcast series for MSDN called A Primer for Proper SQL Server Development. The webcast links and blog posts were broken for quite a while but now they're all fixed up and working again. I've created some web pages that link to all the webcasts and blog posts, along with abstracts. I've also included some more recent ones too and will be adding to the list over the next few weeks.

There's over 30 hours of good stuff to watch - check them out at http://www.sqlskills.com/webcasts.asp

Over the last few weeks Sunil Agarwal (from the SQL Storage Engine team) has posted a great series of blog articles about tempdb and the version store, over on my old stomping ground - the Storage Engine blog. The articles are well worth reading - the links are:

Enjoy!

(Quickie post #2 while it's Kimberly's turn to lecture this morning...)

Greg asked a question regarding the script I posted to examine buffer pool contents (paraphrased) - why does the buffer pool seem to contain such a high proportion of dirty tempdb pages on busy production systems?

The answer is to do with the recoverability of the tempdb database. One of the reasons that checkpoint exists is to limit the duration of the "redo" phase of crash recovery - where log records are replayed on disk pages where the updated page image hadn't been written to disk after the transactions committed. Automatic checkpoints are done in databases to do this.

Tempdb, however, isn't recovered after a crash - it's recreated. This means that the time for recovery of tempdb isn't an issue so there's no need for automatic checkpoints. This means that the trigger of an automatic checkpoint for tempdb when it's log file reaches 70% full. For this reason, on busy systems, it's likely that the user databases are being checkpointed way more often than tempdb, and on a system where tempdb is used heavily too, there will likely be way more dirty pages from tempdb in the buffer pool at any time.

Theme design by Nukeation based on Jelle Druyts