Here's a question that came up recently: if I've upgraded a database from SQL 2000 or before, how can I tell if the data purity checks will be run or not?

As you may know, DBCC CHECKDB in 2005 onwards includes 'data purity' checks. These look for column values where the value is outside the valid range of values for the column's data type. For databases created on SQL 2005 onwards, these checks are always performed by DBCC CHECKDB and cannot be turned off. For databases created on earlier versions, it's a little more complicated.

In SQL Server versions prior to 2005, it was possible to import invalid data values into a database. These invalid values could cause query execution problems, or possibly even wrong results. In 2005, when the import 'holes' were closed, the data purity checks were added to DBCC CHECKDB, but not by default for upgraded databases. Because the possibility existed of upgraded database containing invalid values, the decision was made not to enable the data purity checks by default as this could lead people to suspect that the upgrade had caused corruptions that weren't there on 2000 or before.

So, if you have a database that was upgraded and you want to run the data purity checks, you need to use the WITH DATA_PURITY option for DBCC CHECKDB. This is all documented in Books Online and I put it into the Release Notes for SQL 2005 as well. Back to the question though - at what point do you NOT need to specify the option? Well, for an upgraded database, if the WITH DATA_PURITY option is used and no problems are found, a bit is irrevocably flipped in the boot page (see Search Engine Q&A #20: Boot pages, and boot page corruption) and from that point onwards the data purity checks will be performed on the database whenever DBCC CHECKDB runs.

The problem is though, how can you tell whether the bit has been flipped? You need to look at the boot page. The easiest way to do that is to use the DBCC DBINFO command (undocumented, but perfectly safe). It's the equivalent of using DBCC PAGE ('dbname', 1, 9, 3) to look at the boot page contents, as I explained in the blog post referenced in the previous paragraph.

There are two things you need to look for: what version of SQL Server created the database, and it the 'create version' is 2000 or lower, whether the special 'data purity' flag is set or not.

For a database created on SQL Server 2005:

DBCC TRACEON (3604);
GO
DBCC DBINFO ('master');
GO

DBINFO STRUCTURE:

DBINFO @0x66C8EF64

dbi_dbid = 1                         dbi_status = 65544                   dbi_nextid = 1984726123
dbi_dbname = master                  dbi_maxDbTimestamp = 16000           dbi_version = 611
dbi_createVersion = 611              dbi_ESVersion = 0                   
dbi_nextseqnum = 1900-01-01 00:00:00.000                                  dbi_crdate = 1900-01-01 00:00:00.000
dbi_filegeneration = 0              
dbi_checkptLSN

m_fSeqNo = 7612                      m_blockOffset = 224                  m_slotId = 1
dbi_RebuildLogs = 0                  dbi_dbccFlags = 0                   
dbi_dbccLastKnownGood = 2009-05-12 16:07:15.647                          
dbi_dbbackupLSN

<snip>

If the dbi_createVersion is 611 or higher, the database was created on SQL Server 2005+ and will always have data purity checks performed.

For an upgraded database (this is one of my pre-corrupted databases, you can get it from Conference corruption demo scripts and example corrupt databases):

DBCC TRACEON (3604);
GO
DBCC DBINFO ('DemoCorruptMetadata');
GO

DBINFO STRUCTURE:

DBINFO @0x6855EF64

dbi_dbid = 7                         dbi_status = 16                      dbi_nextid = 2089058478
dbi_dbname = DemoCorruptMetadata     dbi_maxDbTimestamp = 100             dbi_version = 611
dbi_createVersion = 539              dbi_ESVersion = 0                   
dbi_nextseqnum = 1900-01-01 00:00:00.000                                  dbi_crdate = 2009-06-17 15:14:49.490
dbi_filegeneration = 0              
dbi_checkptLSN

m_fSeqNo = 10                        m_blockOffset = 303                  m_slotId = 1
dbi_RebuildLogs = 0                  dbi_dbccFlags = 0                   
dbi_dbccLastKnownGood = 1900-01-01 00:00:00.000                          
dbi_dbbackupLSN

<snip>

This database has a dbi_createVersion lower than 611, so we need to look at the dbi_dbccFlags field. A value of 0 means that the data purity checks are not enabled by default. A value of 2 means they are enabled by default. You can easily check this out for your own databases.

Have fun!

I came across a question on ServerFault this afternoon that inflamed my desire to be ultra-geeky (it was really already inflamed after teaching backup and restore internals all afternoon). Basically the question boiled down to how to find out who dropped a table if there's no other way except the transaction log (e.g. no tracing is available, even the default trace). So I hacked around and figured out at least how to find out *when* a particular table was dropped plus the UID and SPID of who dropped it.

Everything hinges on using undocumented commands to look into the transaction log. I've played with this before on the blog: fn_dblog.

First off I created a script to create a database, populate a table and then drop it.

USE master;
GO
CREATE DATABASE FnDbLogTest;
GO
USE FnDbLogTest;
GO

CREATE TABLE TestTable (
    c1 INT IDENTITY,
    c2 CHAR (100) DEFAULT 'a');
GO
CREATE CLUSTERED INDEX TT_CL ON TestTable (c1);
GO
INSERT INTO TestTable DEFAULT VALUES;
GO

SELECT OBJECT_ID ('TestTable');
GO

DROP TABLE TestTable;
GO

First we need to find the transactions that drop tables in the log. The following code works for 2000, 2005, and 2008:

SELECT [Transaction Id], [Begin Time], [UID], [SPID]
FROM ::fn_dblog (NULL, NULL)
WHERE [Transaction Name] = 'DROPOBJ'
GO

The (NULL, NULL) is the starting LSN and ending LSN to process - NULL means process everything available. 

Results on SQL Server 2000 will look as follows for me logging in as a Windows login in the SA role:

Transaction Id Begin Time              UID         SPID       
-------------- ----------------------- ----------- -----------
0000:000000e0  2009/06/16 18:23:03:320 1           51

And for the same circumstances on 2005 and 2008, the results look like:

Transaction Id Begin Time               UID         SPID
-------------- ------------------------ ----------- -----------
0000:00000587  2009/06/16 17:49:56:927  -1          51

If the user who dropped the table logged in as a role member, and they're not connected as that SPID any more, you may not be able to tell who it was unless you're also tracking successful logins into your server - but you'll at least know what role it was.

Now, this only shows us that a table was dropped, not which table it was. There's no way to get the name of the table that was dropped, only the object ID - so you'll need to have some other way to determine what the table ID is if there are multiple table drops and only one of them is malignant.

For SQL Server 2000, the code to find which object ID we're talking about is as follows (dropping the Transacation Id into the WHERE clause):

SELECT DISTINCT [Object Name] FROM ::fn_dblog (NULL, NULL)
WHERE [Transaction Id] = '0000:000000e0'
AND [Context] = 'LCX_IAM';
GO

Object Name     
--------------------
(2009058193)

The object ID in parentheses is the ID of the table that was dropped. 

For SQL Server 2005 and 2008, the code is as follows (with the same Transaction Id substitution):

SELECT TOP (1) [Lock Information] FROM ::fn_dblog (NULL, NULL)
WHERE [Transaction Id] = '0000:00000587'
AND [Lock Information] LIKE '%SCH_M OBJECT%';
GO

Lock Information
--------------------------------------------
ACQUIRE_LOCK_SCH_M OBJECT: 8:2073058421:0

The 8:2073058421 is the database ID and object ID of the table that was dropped.

Now you can go find whoever it was and take whatever action you deem appropriate Wink

Hope this helps!

PS If you find the you don't get enough info from ::fn_dblog, try turning on trace flag 2537. It allows the function to look at *all* possible log, not just the active log.

I've just read a very good, very deep, and very interesting blog post by James Rowland-Jones. In the post, James investigates some locking issues using a variety of means and explains about the undocumented %%lockres%% function with you can use to figure out what the wait resource will be for individual table rows (basically the lock-hash value). He also shows something I've known about but never seen before - how the lock hash algorithm isn't perfect and can actually cause lock collisions where you wouldn't expect them - and how to mitigate the problem.

Excellent post and well worth reading. Check it out at The Curious Case of the Dubious Deadlock and the Not So Logical Lock.

Way back at the start of me blogging here I wrote a comprehensive description of ghost records and the ghost cleanup process - see Inside the Storage Engine: Ghost cleanup in depth. A question came up in the class I'm teaching this week that's worth answering in a blog post - do ghost records occur in heaps? The answer is no, not during normal processing.

When snapshot isolation is enabled, deletes from a heap are ghosted, as part of the overall versioning process, which can lead to some interesting side-effects. A versioned record has an extra 14-bytes tagged on the end, so a heap record that suddenly becomes versioned is 14-bytes longer - which may mean it doesn't fit on the page any longer. This could lead to it being moved, resulting in a forwarding/forwarded record pair - just because the record was deleted! Now, the page has to be full for this time happen, and the Storage Engine will take steps to avoid this happening for rows less than 32 bytes long - but that's getting a little too deep.

Anyway, I digress. I want to show you the difference between deleting from a clustered index and from a heap. I'm going to create two such tables, then delete row from each and roll it back.

CREATE TABLE t1 (c1 CHAR (10));
CREATE CLUSTERED INDEX t1c1 on t1 (c1);
GO

CREATE TABLE t2 (c1 CHAR (10));
GO

INSERT INTO t1 VALUES ('PAUL');
INSERT INTO t1 VALUES ('KIMBERLY');

INSERT INTO t2 VALUES ('PAUL');
INSERT INTO t2 VALUES ('KIMBERLY');
GO

-- prevent random background transactions
ALTER DATABASE GhostTest SET AUTO_CREATE_STATISTICS OFF;
GO

BEGIN TRAN DelFromClust;
DELETE FROM t1 WHERE c1='KIMBERLY';
ROLLBACK TRAN;
GO

BEGIN TRAN DelFromHeap;
DELETE FROM t2 WHERE c1='KIMBERLY';
ROLLBACK TRAN;
GO

SELECT * FROM ::fn_dblog (null, null);
GO

Here's a portion of the results from looking in the transaction log. The line of code where I turn off auto-update stats is just to prevent the auto-create transactions from cluttering up my view of the transaction log.

The first (highlighted) transaction is for the delete/rollback in the clustered index. You can clearly see that the third column shows a log context of ghosting for the LOP_DELETE_ROWS log record, plus the setting of the 'this page has at least one ghost record' in the PFS byte for that page.

The second (unhighlighted) transaction is for the delete/rollback in the heap. Here you can see that it just does a straight delete.

If you look at the data page contents before the rollback in both cases, for the clustered index you'll still be able to see the deleted (ghosted) record, and for the heap you'll see the deleted record really is deleted.

Hope this helps.

I'm teaching the Microsoft Certified Masters - Database qualification this week here in Redmond, and in part of day one I discuss the FILESTREAM directory structure. I was asked the question where do the directory name GUIDs come from? so I started digging around in the system tables while Kimberly was lecturing. Take a look at my previous blog post (FILESTREAM directory structure) from last week to see the database schema I'm working with. I recreated it again and wrote some queries to find where the GUIDs are stored, as they have to be stored in the database somewhere.

Here's the query to find all the FILESTREAM directory names, for both levels of directory (and you have to run this through the DAC as it's accessing undocumented, hidden system tables):

SELECT o.name AS [Table],
    cp.name AS [Column],
    p.partition_number AS [Partition],
    r.rsguid AS [Rowset GUID],
    rs.colguid AS [Column GUID]
FROM sys.sysrowsets r
    CROSS APPLY sys.sysrscols rs
    JOIN sys.partitions p ON rs.rsid = p.partition_id
    JOIN sys.objects o ON o.object_id = p.object_id
    JOIN sys.syscolpars cp ON cp.colid = rs.rscolid
WHERE rs.colguid IS NOT NULL AND o.object_id = cp.id
AND r.rsguid IS NOT NULL AND r.rowsetid = rs.rsid;
GO

See below for a screenshot of this using my scenario.

You can see that I'm connected through the admin connection in SSMS and that the top-level directory name is derived from the rowset GUID, with the column-level directory name is derived from the column GUID. Note that some parts are byte-reversed, but they're definitely the right GUIDs.

Enjoy!

For those of you who couldn't make it to a conference this year where I presented my Corruption Survival Techniques session, the folks at TechEd EMEA have just posted an 80 minute long video of the presentation I did in Barcelona in early November. It walks through I/O errors, what CHECKDB does, how it works, how to run it, CHECKDB FAQ, how to interpret the output, choosing between repair and restore and has a bunch of demos of recovering from corruptions. Lots of fun stuff!

The video is available at http://www.microsoft.com/emea/teched2008/itpro/tv/default.aspx?vid=78. The accompanying scripts and corrupt databases are all posted on our website - see this blog post for details.

Enjoy!

This blog post explains the demo scripts and databases I've posted to cover all the Corruption Survival Techniques and DBCC CHECKDB sessions I've presented at conferences this year. There are two zip files you need to download: the example corrupt databases (36-MB zip) and the demo scripts. These are also both available through our past events pages.

The databases zip contains the following databases:

  • DemoDataPurity
    • 192-MB SQL Server 2005 database with a single 2570 (data purity) error
  • DemoFatalCorruption1
    • 1-MB SQL Server 2005 database with a corrupt system table (that allows CHECKDB to complete)
  • DemoFatalCorruption2
    • 1-MB SQL Server 2005 database with a corrupt system table (that terminates CHECKDB)
  • DemoNCIndex
    • 192-MB SQL Server 2005 database with a bunch of nonclustered index corruptions
  • DemoRestoreOrRepair
    • 1-MB SQL Server 2005 database with a page checksum failure (in fact a zero'd out page)
  • DemoCorruptMetadata
    • 1-MB SQL Server *2000* database with corrupt syscolumns table

The scripts zip contains the following directories:

  • 1 - Fatal Errors
    • This makes use of the DemoFatalCorruption1 and DemoFatalCorruption2 databases. The FatalErrors.sql script has the steps to follow and see this blog post for a complete walk-through.
  • 2 - NC Indexes
    • This makes use of the DemoNCIndex database. The NCIndexCorruption.sql script has the steps to follow and see this blog post for a complete walk-through.
  • 3 - Data Purity
    • This makes use of the DemoDataPurity database. The DataPurityCorruption.sql script has the steps to follow. I'll blog more details on this script sometime in the next week (and then update this post).
  • 4 - Metadata
    • This makes use of the DemoCorruptMetadata database. The CorruptMetadata.sql script has the steps to follow and see this blog post for a complete walk-through.
  • 5 - Restore or Repair
    • This makes use of the DemoRestoreOrRepair database. There's a setup script (01CreateRestoreOrRepair.sql) and two demo scripts - FixUsingPageRestore.sql and FixUsingRepair.sql. I'll blog more details on these scripts sometime in the next week (and then update this post).
  • 6 - Suspect Database
    • This demo doesn't come with a corrupt database - you create your own one. The SuspectDatabase.sql script has the steps to follow and see this blog post for a complete walk-through.

Enjoy!

Following on from my previous post on boot pages and boot page corruption, I've been asked about file header pages - and I was already planning this post as the next in the series.

So what's a file header page? Every data file in a database has the very first 8kb page (i.e. page 0 in the file) set aside as the place to store all the metadata info about the file. As with the boot page, you can look at the contents with DBCC PAGE and it will interpret all the fields for you, or you can use the DBCC FILEHEADER command, which does a better job. This is undocumented and unsupported, just like DBCC DBINFO for looking at the database boot page, but it's been discussed and posted about on the Internet before so it's existence is no secret.

The command take a database name or database ID plus the file ID to dump. Here I've created a database called FileHeaderTest and used SSMS with results-to-text, plus a bunch of editing of the results to make it blog-able:

DBCC FILEHEADER ('FileHeaderTest', 1);
GO

FileId                : 1
LogicalName           : FileHeaderTest
BindingId             : D30AE3EF-14A6-47D5-B267-96F38238D882
FileGroup             : 1
Size                  : 152
MaxSize               : -1
MinSize               : 152
UserShrinkSize        : -1
Growth                : 128
BackupLSN             : 0
RedoStartLSN          : 0
FirstLSN              : 0
MaxLSN                : 0
FirstUpdateLSN        : 0
CreateLSN             : 0
SectorSize            : 512
RecoveryForkGUID      : 00000000-0000-0000-0000-000000000000
RecoveryForkLSN       : 0
DifferentialBaseLsn   : 19000000048800037
DifferentialBaseGuid  : 279A8EF4-4431-4CA5-8939-F613E5BC3033
Status                : 2
RestoreStatus         : 0
ReadOnlyLsn           : 0
ReadWriteLsn          : 0
MaxLsnBranchId        : 00000000-0000-0000-0000-000000000000
RedoTargetPointLsn    : 0
RedoTargetPointGuid   : 00000000-0000-0000-0000-000000000000
RestoreDiffBaseLsn    : 0
RestoreDiffBaseGuid   : 00000000-0000-0000-0000-000000000000
RestorePathOriginLsn  : 0
RestorePathOriginGuid : 00000000-0000-0000-0000-000000000000
OldestRestoredLsn     : 0

Lots of interesting stuff in here, such as:

  • BindingId: used to make sure a file is really part of this database
  • SectorSize: the disk sector size
  • Status: what kind of file and what state is it in (e.g. 2 = regular disk file)
  • Various sizes in number-of-8kb-pages (e.g. MaxSize of -1 means file growth is unlimited)
  • Growth: the number of pages to grow the file by if the 0x100000 bit is NOT set in the Status field. If it is set, the Growth is in percent.

And you can watch things change. For instance, if I change the file growth to 10%:

ALTER DATABASE FileHeaderTest MODIFY FILE (NAME = FileHeaderTest, FILEGROWTH = 10%);
GO

And then dump the file header page contents again, the Status and Growth fields have changed to:

.
.
Growth                : 10
.
.
Status                : 1048578
.
.

So what if a file header page is corrupt? I corrupted the file header page of my database and then started up SQL Server.

USE FileHeaderTest;
GO

Msg 945, Level 14, State 2, Line 1
Database 'FileHeaderTest' cannot be opened due to inaccessible files or insufficient memory or disk space. See the SQL Server errorlog for details.

Let's try EMERGENCY mode:

ALTER DATABASE FileHeaderTest SET EMERGENCY;
GO

Msg 5172, Level 16, State 15, Line 1
The header for file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\FileHeaderTest.mdf' is not a valid database file header. The PageAudit property is incorrect.

In this case, there's nothing to do except restore from backups, and the database is inaccessible because the PRIMARY filegroup could not be brought online. If the corruption were in a file in a secondary filegroup, things are a little bit different. Now I've added a secondary filegroup with a single file (called CorruptFile) and corrupted it's file header page. After starting up SQL Server we get the same behavior - but this time we can set the file to be offline and access the rest of the database. This is called partial database availability and works in Enterprise (and Developer) Edition only.

ALTER DATABASE FileHeaderTest MODIFY FILE (NAME = CorruptFile, OFFLINE);
GO

Note that the only way to bring an offline file back online is to restore it from a backup - see this post from my old Storage Engine blog for more details. Another corruption that can only be repaired using backups...

 

Now that I've done all the business-related blog posts, back to the good stuff to stop people complaining!

Something that's cropped up a few times over the summer so far is people trying to repair boot page corruptions.

First off, what's a boot page? Every database has a single page that stores critical information about the database itself. It's always page 9 in file 1 (the first file in the PRIMARY filegroup). You can examine the page using DBCC PAGE and it will interpret all the fields for you, but there's another command, DBCC DBINFO, that also dumps all this info (in fact the DBCC PAGE code calls the same underlying dumping code). This command is undocumented and unsupported but widely known and 'documented' in lots of places on the web - given that it uses the same code as DBCC PAGE, it's just as safe to use IMHO.

So what's on the boot page?

DBCC DBINFO ('BootPageTest');
GO

DBINFO STRUCTURE:


DBINFO @0x5BF6EF84

dbi_dbid = 19                        dbi_status = 65536                   dbi_nextid = 2073058421
dbi_dbname = BootPageTest            dbi_maxDbTimestamp = 2000            dbi_version = 611
dbi_createVersion = 611              dbi_ESVersion = 0                   
dbi_nextseqnum = 1900-01-01 00:00:00.000                                  dbi_crdate = 2008-07-10 15:53:18.843
dbi_filegeneration = 0              
dbi_checkptLSN

m_fSeqNo = 41                        m_blockOffset = 29                   m_slotId = 55
dbi_RebuildLogs = 0                  dbi_dbccFlags = 2                   
dbi_dbccLastKnownGood = 1900-01-01 00:00:00.000                          
dbi_dbbackupLSN

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0

dbi_oldestBackupXactLSN

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0
dbi_LastLogBackupTime = 1900-01-01 00:00:00.000                          
dbi_differentialBaseLSN

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0

dbi_createIndexLSN

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0

dbi_versionChangeLSN

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0
dbi_familyGUID = a4e88c13-b4cf-4320-834e-92b237244d4b                    
dbi_recoveryForkNameStack


entry 0

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0
m_guid = a4e88c13-b4cf-4320-834e-92b237244d4b                            

entry 1

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0
m_guid = 00000000-0000-0000-0000-000000000000                            
dbi_differentialBaseGuid = 00000000-0000-0000-0000-000000000000           dbi_firstSysIndexes = 0001:00000014
dbi_collation = 872468488            dbi_category = 0                     dbi_maxLogSpaceUsed = 231936
dbi_localState = 0                   dbi_roleSequence = 0                
dbi_failoverLsn

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0

dbi_dbmRedoLsn

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0

dbi_dbmOldestXactLsn

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0
dbi_dbMirrorId = 00000000-0000-0000-0000-000000000000                    
dbi_pageUndoLsn

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0
dbi_disabledSequence = 0            
dbi_dvSplitPoint

m_fSeqNo = 0                         m_blockOffset = 0                    m_slotId = 0
dbi_CloneCpuCount = 0                dbi_CloneMemorySize = 0             
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

There's all kinds on interesting things in there, for instance:

  • dbi_version and dbi_createversion: the physical version number of the database (and when it was created). See question 1 in the August 2008 SQL Q&A column in TechNet Magazine for an explanation (see here).
  • dbi_RebuildLogs: a count of the number of times the transaction log has been rebuilt for the database. PSS can use this to tell whether corruption problems could have been caused by DBAs rebuilding the log
  • dbi_dbccLastKnownGood: the completion time of the last 'clean' run of DBCC CHECKDB
  • a bunch of different LSNs related to checkpoint, backups, database mirroring
  • dbi_LastLogBackupTime: self-explanatory
  • dbi_differentialBaseGuid: the GUID generated by the last full database backup. Differential backups can only be restored on top of a matching full backup - so an out-of-band full backup could screw-up your disaster recovery - see this blog post for more info.

Now, what about if this page is corrupt in some way? I corrupted the BootPageTest database to have a corrupt boot page. Let's see what happens:

USE BootPagetest;
GO

Msg 913, Level 16, State 4, Line 1
Could not find database ID 19. Database may not be activated yet or may be in transition. Reissue the query once the database is available. If you do not think this error is due to a database that is transitioning its state and this error continues to occur, contact your primary support provider. Please have available for review the Microsoft SQL Server error log and any additional information relevant to the circumstances when the error occurred.

Okay - let's try setting the database into EMERGENCY mode:

ALTER DATABASE BootPageTest SET EMERGENCY;
GO

Msg 824, Level 24, State 2, Line 1
SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0xcdee22fa; actual: 0xcb6ea2fa). It occurred during a read of page (1:9) in database ID 19 at offset 0x00000000012000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\BootPageTest.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Hmm. What about running DBCC CHECKDB?

DBCC CHECKDB ('BootPageTest') WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO

Msg 922, Level 14, State 1, Line 1
Database 'BootPageTest' is being recovered. Waiting until recovery is finished.

It's not looking good. Obviously the change to EMERGENCY mode couldn't complete properly. What's the database status?

SELECT [state_desc] FROM sys.databases WHERE [name] = 'BootPageTest';
GO

state_desc
------------------------------------------------------------
RECOVERY_PENDING

The boot page is inaccessible so in effect the database is inaccessible too, this is what the database state means in this case.

So what does this mean? If the boot page is corrupt, you can't run DBCC CHECKDB so you can't possibly run repair, and you can't put the database into EMERGENCY mode so you can't extract data into a new database. It means that there's NO WAY to recover from a corrupt boot page EXCEPT to restore from backups. One more reason to have backups...

(I'm actually on-stage here at TechEd doing the  DAT track pre-con with Kimberly - she's on now until lunch so I'm catching up on forum problems...)

Here's a question that came up on of the SQLServerCentral.com corruption forums I monitor that I think is worth blogging about. To paraphrase:

I have a bunch of corruptions in a database, that look like they've been there for a while. Repair is my only option - it works but I'd like to know what data is being deleted. How can I do that? Here are some of the errors:

Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168576) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168577) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168578) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168579) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168580) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168581) could not be processed. See other errors for details.
Server: Msg 8928, Level 16, State 1, Line 2
Object ID 645577338, index ID 0: Page (1:168582) could not be processed. See other errors for details.
Server: Msg 8976, Level 16, State 1, Line 2
Table error: Object ID 645577338, index ID 1. Page (1:168576) was not seen in the scan although its parent (1:165809) and previous (1:168575) refer to it. Check any previous errors.
Server: Msg 8978, Level 16, State 1, Line 2
Table error: Object ID 645577338, index ID 1. Page (1:168583) is missing a reference from previous page (1:168582). Possible chain linkage problem.

This is a clustered index that CHECKDB  will repair by deleting pages at the leaf-level - essentially deleting a bunch of records. The pages look to be trashed (there were a bunch more errors that I didn't include here that said the page headers were all corrupted - looked like the IO subsystem trashde a whole 64KB chunk of the disk) so there's nothing much else you can do. As the table has a clustered index, you can use the error messages to find the pages on either 'logical' side of the pages being deleted - and hence figure out the range of records that have been deleted.

The errors show that pages 168576 through 168582 in file 1 are corrupt. There are also errors that say the previous page of 168576 is 168575, and the next page of 168582 is 168583. If you do a DBCC PAGE of these two pages, you can find the lower and upper bound of the clustered index key values that have been lost. Think of three ranges:

  • the lower range of records that are intact, logically before the corrupt pages in the index
  • the range of records that will be deleted by repair
  • the upper range of records that are intact, logically after the corrupt pages in the index

To find the upper bound of the lower range:

DBCC TRACEON (3604); -- allows the output to come to the console
DBCC PAGE ('dbname', 1, 168575, 3);
GO

The key value in the slot at the end of output is the upper bound of the bottom range that's intact.

Then do:

DBCC PAGE ('dbname', 1, 168583, 3);
GO

The key value in the slot at the beginning of the output is the lower bound of the upper range that's intact.

Everything in the middle will be deleted. You could also try a DBCC PAGE on the corrupt pages themselves too - you might be able to see some data in them.

I'll be blogging a bunch more about repair after my corruption session this week at TechEd - watch this space!

Over the weekend there was a question on one of the internal aliases at MS: how can I tell what percentage of a database has changed since the last full backup, so I can choose between a differential or full backup?

No such code exists as far as I know - until now! I happened to read the thread while sitting in the airport in Washington D.C. on the way back from Iceland so I started playing around and this morning I completed the code.

The code below creates a function and a stored procedure. The basic idea behind the code is as follows:

For each online data file in the database
   For each GAM interval in the file
      Crack the DIFF map page using DBCC PAGE
      Interpret the DIFF bitmap to aggregate the changed extents
      Add the sum to the total changed extents for the database
   End
End
Report results

There's a function that I create in msdb call SQLskillsConvertToExtents that cracks some of the DBCC PAGE output, and the main procedure is called sp_SQLskillsDIFForFULL and it created as a system object in master. I tried making it a table-valued function but you can't do things like INSERT-EXEC in a function, and that's required for processing the DBCC PAGE output. So - create your own wrapper function or whatever to use it. The interface/output is:

EXEC sp_SQLskillsDIFForFULL 'msdb';
GO

Total Extents Changed Extents Percentage Changed
------------- --------------- ----------------------
102           56              54.9

I've tested it with databases with multiple files and up to around 700GB for a single file size. There's the potential for an issue with file sizes of 4TB and beyond (where PFS intervals and GAM intervals map to the same extent every 4TB or so, but I think it'll be ok and the position of the DIFF map in the extent won't change - if someone could test it with a 4+TB file I'd be grateful). It's been tested on SQL Server 2005 and 2008. It will not work on SQL Server 2000 - I'll do a 2000 version sometime soon.

Note that after doing a full backup you will never see Changed Extents equal to zero. It will always be 4 + (number of online data files - 1), and around 20 or so for msdb. This is because the extent containing the file header in each file is always marked as changed, as are three extents in the primary file containing the roots of some critical system tables.

Anyway - here it is. You can download it in a zip file from SQLskillsDIFForFULL.zip (2.65KB). Enjoy!

/*============================================================================
   
File: SQLskillsDIFForFULL.sql

   Summary: This script creates a system-wide SP SQLskillsDIFForFILL that
   
works out what percentage of a database has changed since the
   
previous full database backup.

   Date: April 2008

   SQL Server Versions:
         
10.0.1300.13 (SS2008 February CTP - CTP-6)
         
9.00.3054.00 (SS2005 SP2)
------------------------------------------------------------------------------
   
Copyright (C) 2008 Paul S. Randal, SQLskills.com
   
All rights reserved.

   For more scripts and sample code, check out 
      
http://www.sqlskills.com/

   You may alter this code for your own *non-commercial* purposes. You may
   
republish altered code as long as you give due credit.

   THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF 
   
ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED 
   
TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A
   
PARTICULAR PURPOSE.

============================================================================*/

-- Create the function in MSDB
--
USE msdb;
GO

IF EXISTS (SELECT * FROM sys.objects WHERE NAME = 'SQLskillsConvertToExtents')
   
DROP FUNCTION SQLskillsConvertToExtents;
GO

-- This function cracks the output from a DBCC PAGE dump
-- of an allocation bitmap. It takes a string in the form
-- "(1:8) - (1:16)" or "(1:8) -" and returns the number
-- of extents represented by the string. Both the examples
-- above equal 1 extent.
--

CREATE FUNCTION SQLskillsConvertToExtents (
   
@extents VARCHAR (100))
RETURNS INTEGER
AS
BEGIN
   
DECLARE @extentTotal   INT;
   
DECLARE @colon         INT;
   
DECLARE @firstExtent   INT;
   
DECLARE @secondExtent  INT;

   SET @extentTotal = 0;
   
SET @colon = CHARINDEX (':', @extents);

   -- Check for the single extent case
   --
   IF (CHARINDEX (':', @extents, @colon + 1) = 0)
      
SET @extentTotal = 1;
   
ELSE
      
-- We're in the multi-extent case
      --
      BEGIN
      
SET @firstExtent = CONVERT (INT,
         
SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
      
SET @colon = CHARINDEX (':', @extents, @colon + 1);
      
SET @secondExtent = CONVERT (INT,
         
SUBSTRING (@extents, @colon + 1, CHARINDEX (')', @extents, @colon) - @colon - 1));
      
SET @extentTotal = (@secondExtent - @firstExtent) / 8 + 1;
   
END

   RETURN @extentTotal;
END;
GO

USE master;
GO

IF OBJECT_ID ('sp_SQLskillsDIFForFULL') IS NOT NULL
   
DROP PROCEDURE sp_SQLskillsDIFForFULL;
GO

-- This SP cracks all differential bitmap pages for all online
-- data files in a database. It creates a sum of changed extents
-- and reports it as follows (example small msdb):
--
-- EXEC sp_SQLskillsDIFForFULL 'msdb';
-- GO
--
-- Total Extents Changed Extents Percentage Changed
-- ------------- --------------- ----------------------
-- 102           56              54.9
--
-- Note that after a full backup you will always see some extents
-- marked as changed. The number will be 4 + (number of data files - 1).
-- These extents contain the file headers of each file plus the
-- roots of some of the critical system tables in file 1.
-- The number for msdb may be round 20.
--
CREATE PROCEDURE sp_SQLskillsDIFForFULL (
   
@dbName VARCHAR (128))
AS
BEGIN
   
SET NOCOUNT ON;

   -- Create the temp table
   
--
   
IF EXISTS (SELECT * FROM msdb.sys.objects WHERE NAME = 'SQLskillsDBCCPage')
   
DROP TABLE msdb.dbo.SQLskillsDBCCPage;

   CREATE TABLE msdb.dbo.SQLskillsDBCCPage (
      
[ParentObject] VARCHAR (100),
      
[Object]       VARCHAR (100),
      
[Field]        VARCHAR (100),
      
[VALUE]        VARCHAR (100));

   DECLARE @fileID         INT;
   
DECLARE @fileSizePages  INT;
   
DECLARE @extentID       INT;
   
DECLARE @pageID         INT;
   
DECLARE @DIFFTotal      INT;
   
DECLARE @sizeTotal      INT;
   
DECLARE @total          INT;
   
DECLARE @dbccPageString VARCHAR (200);

   SELECT @DIFFTotal = 0;
   
SELECT @sizeTotal = 0;

   -- Setup a cursor for all online data files in the database
   
--
   
DECLARE files CURSOR FOR
      
SELECT [file_id], [size] FROM master.sys.master_files
      
WHERE [type_desc] = 'ROWS'
      
AND [state_desc] = 'ONLINE'
      
AND [database_id] = DB_ID (@dbName);

   OPEN files;

   FETCH NEXT FROM files INTO @fileID, @fileSizePages;

   WHILE @@FETCH_STATUS = 0
   
BEGIN
      
SELECT @extentID = 0;

      -- The size returned from master.sys.master_files is in
      
-- pages - we need to convert to extents
      
--
      
SELECT @sizeTotal = @sizeTotal + @fileSizePages / 8;

      WHILE (@extentID < @fileSizePages)
      
BEGIN
         
-- There may be an issue with the DIFF map page position
         
-- on the four extents where PFS pages and GAM pages live
         
-- (at page IDs 516855552, 1033711104, 1550566656, 2067422208)
         
-- but I think we'll be ok.
         
-- PFS pages are every 8088 pages (page 1, 8088, 16176, etc)
         
-- GAM extents are every 511232 pages
         
--
         
SELECT @pageID = @extentID + 6;

         -- Build the dynamic SQL
         
--
         
SELECT @dbccPageString = 'DBCC PAGE ('
            
+ @dbName + ', '
            
+ CAST (@fileID AS VARCHAR) + ', '
            
+ CAST (@pageID AS VARCHAR) + ', 3) WITH TABLERESULTS, NO_INFOMSGS';

         -- Empty out the temp table and insert into it again
         
--
         
DELETE FROM msdb.dbo.SQLskillsDBCCPage;
         
INSERT INTO msdb.dbo.SQLskillsDBCCPage EXEC (@dbccPageString);

         -- Aggregate all the changed extents using the function
         
--
         
SELECT @total = SUM ([msdb].[dbo].[SQLskillsConvertToExtents] ([Field]))
         
FROM msdb.dbo.SQLskillsDBCCPage
            
WHERE [VALUE] = '    CHANGED'
            
AND [ParentObject] LIKE 'DIFF_MAP%';

         SET @DIFFTotal = @DIFFTotal + @total;

         -- Move to the next GAM extent
         
SET @extentID = @extentID + 511232;
      
END

      FETCH NEXT FROM files INTO @fileID, @fileSizePages;
   
END;

   -- Clean up
   
--
   
DROP TABLE msdb.dbo.SQLskillsDBCCPage;
   
CLOSE files;
   
DEALLOCATE files;

   -- Output the results
   
--
   
SELECT
      
@sizeTotal AS [Total Extents],
      
@DIFFTotal AS [Changed Extents],
      
ROUND (
         
(CONVERT (FLOAT, @DIFFTotal) /
         
CONVERT (FLOAT, @sizeTotal)) * 100, 2) AS [Percentage Changed];
END;
GO

-- Mark the SP as a system object
--
EXEC sys.sp_MS_marksystemobject sp_SQLskillsDIFForFULL;
GO

-- Test to make sure everything was setup correctly
--
EXEC sp_SQLskillsDIFForFULL 'msdb';
GO

One of the drawbacks of not being in the SQL team at Microsoft any longer is that I don't know about all the undocumented features in the next release - I have to hunt around for them like everyone else :-(

So I was poking about in SSMS in 2008 CTP-6 and noticed a function called sys.fn_PhysLocCracker that I'd never heard of. Doing an sp_helptext on it gets the following output:

-- Name: sys.fn_PhysLocCracker
--
-- Description:
-- Cracks the output of %%physloc%% virtual column
--
-- Notes:
-------------------------------------------------------------------------------
create function sys.fn_PhysLocCracker (@physical_locator binary (8))
returns @dumploc_table table
(
 [file_id] int not null,
 [page_id] int not null,
 [slot_id] int not null
)
as
begin

 declare @page_id binary (4)
 declare @file_id binary (2)
 declare @slot_id binary (2)

 -- Page ID is the first four bytes, then 2 bytes of page ID, then 2 bytes of slot
 --
 select @page_id = convert (binary (4), reverse (substring (@physical_locator, 1, 4)))
 select @file_id = convert (binary (2), reverse (substring (@physical_locator, 5, 2)))
 select @slot_id = convert (binary (2), reverse (substring (@physical_locator, 7, 2)))
 
 insert into @dumploc_table values (@file_id, @page_id, @slot_id)
 return
end

Cool - but something else I've never heard of %%physloc%% - what's that? After playing around for a while, I figured out how to make it work.  Just to be confusing, there's another identical version of the function called sys.fn_PhysLocFormatter - and that's the only one I could get to work. Here's an example:

CREATE TABLE TEST (c1 INT IDENTITY, c2 CHAR (4000) DEFAULT 'a');
GO
INSERT INTO TEST DEFAULT VALUES
;
INSERT INTO TEST DEFAULT VALUES
;
INSERT INTO TEST DEFAULT VALUES;
GO

SELECT sys.fn_PhysLocFormatter (%%physloc%%) AS [Physical RID], * FROM TEST;
GO

Physical RID       c1
-----------------  -----------
(1:411:0)          1
(1:411:1)          2
(1:413:0)          3

It's a physical-record locator function! Undocumented and unsupported (obviously), but hey, some of the best features are :-) It gives the database file, page within the file, and slot number on the page in the format (file:page:slot). I can think of a *bunch* of uses for this which I'll be exploring over the next few months.

How cool is that?!?!

It's been a long few days of building slide decks and other content and I just had to stop for a bit and take care of my internals-hacking withdrawal symptoms!

While I was at Microsoft, I wrote some code in the Storage Engine to very easily return all the IAM chains/allocation units (see this post for more details of the internals of those), what type they are, and the relevant page IDs (first, root, first-IAM) so I could go spelunking with DBCC PAGE. Since I left six months ago, it's one of the things I've been missing using when poking around on customer sites, so this afternoon I sat down and wrote the equivalent in T-SQL, using the undocumented sys.system_internals_allocation_units DMV. The output is easy to match up to sys.partitions but the page IDs are formatted in byte-reversed hex so a little tweaking was needed to extract the fields and make them human readable - I've put them into the same format that all SQL Server error messages use when giving a page number.

So - I present to you sp_AllocationMetadata. I was having all kinds of trouble using it in other databases (trying to figure out a way to change database contexts in the SP) until I remembered that you can create an SP in master and mark it as a system object using the undocumented sys.sp_MS_marksystemobject SP. This makes any SP execute in the context of the database from where it is called - extremely useful when you're querying against a database's system catalog views.

[Edit: Kalen pointed out that the DMV *is* documented, but just not in the BOL index. Even better - that means the SP below isn't doing anything dodgy :-) - thanks Kalen!]

The SP can be called with an optional object name parameter, in which case it will only give you back the allocation metadata for that object. If you don't specify a parameter, it gives you back the allocation metadata for all objects in the database. Here's an example of the output:

USE AdventureWorks;
GO

EXEC sp_AllocationMetadata 'HumanResources.Employee';
GO

Object Name   Index ID   Alloc Unit ID       Alloc Unit Type   First Page   Root Page   First IAM Page
------------- ---------- ------------------- ----------------- ------------ ----------- ----------------
Employee      1          72057594050379776   IN_ROW_DATA       (1:588)      (1:594)     (1:593)
Employee      2          72057594055491584   IN_ROW_DATA       (1:2141)     (1:2144)    (1:2142)
Employee      3          72057594055557120   IN_ROW_DATA       (1:2146)     (1:2149)    (1:2147)
Employee      4          72057594055622656   IN_ROW_DATA       (1:2150)     (1:2150)    (1:2151)
Employee      5          72057594055688192   IN_ROW_DATA       (1:2153)     (1:2153)    (1:2154)

You'll notice there are only IN_ROW_DATA allocation units - that's because this table doesn't have any LOB data or an variable-length columns that have been pushed off-row (producing LOB_DATA and ROW_OVERFLOW_DATA allocation units, respectively). So - it only shows what actually exists (rather than creating NULL values, for instance).

Below is the script that creates the SP, and I've included it as an attachment too.

Ah - that feel's better :-) Happy spelunking!

USE master;
GO

IF OBJECT_ID ('sp_AllocationMetadata') IS NOT NULL
   DROP PROCEDURE sp_AllocationMetadata;
GO

CREATE PROCEDURE sp_AllocationMetadata
(
   
@object VARCHAR (128) = NULL
)
AS
SELECT
   
OBJECT_NAME (sp.object_id) AS [Object Name],
   
sp.index_id AS [Index ID],
   
sa.allocation_unit_id AS [Alloc Unit ID],
   
sa.type_desc AS [Alloc Unit Type],
   
'(' CONVERT (VARCHAR (6),
      
CONVERT (INT, SUBSTRING (sa.first_page, 6, 1) +
      
   SUBSTRING (sa.first_page, 5, 1))) +
   
':' + CONVERT (VARCHAR (20),
      
CONVERT (INT, SUBSTRING (sa.first_page, 4, 1) +
      
   SUBSTRING (sa.first_page, 3, 1) +
         
SUBSTRING (sa.first_page, 2, 1) +
         
SUBSTRING (sa.first_page, 1, 1))) +
   
')' AS [First Page],
   '(' + CONVERT (VARCHAR (6),
      
CONVERT (INT,
         
SUBSTRING (sa.root_page, 6, 1) +
         
SUBSTRING (sa.root_page, 5, 1))) +
   
':' + CONVERT (VARCHAR (20),
      
CONVERT (INT,
         
SUBSTRING (sa.root_page, 4, 1) +
         
SUBSTRING (sa.root_page, 3, 1) +
         
SUBSTRING (sa.root_page, 2, 1) +
         
SUBSTRING (sa.root_page, 1, 1))) +
   
')' AS [Root Page],
   
'(' + CONVERT (VARCHAR (6),
      
CONVERT (INT,
         
SUBSTRING (sa.first_iam_page, 6, 1) +
         
SUBSTRING (sa.first_iam_page, 5, 1))) +
   
':' + CONVERT (VARCHAR (20),
      
CONVERT (INT,
         
SUBSTRING (sa.first_iam_page, 4, 1) +
         
SUBSTRING (sa.first_iam_page, 3, 1) +
         
SUBSTRING (sa.first_iam_page, 2, 1) +
         
SUBSTRING (sa.first_iam_page, 1, 1))) +
   
')' AS [First IAM Page]
FROM
   
sys.system_internals_allocation_units AS sa,
   
sys.partitions AS sp
WHERE
   
sa.container_id = sp.partition_id
   AND sp.object_id =
      
(CASE WHEN (@object IS NULL)
         THEN sp.object_id
         
ELSE OBJECT_ID (@object)
      
END);
GO

EXEC sys.sp_MS_marksystemobject sp_AllocationMetadata;
GO

sp_AllocationMetadata.zip (.69 KB)

Over the years I was in the Storage Engine team I saw a lot of concern on the various forums about the ghost cleanup task. There have been a few bugs with it in previous versions  (see these KB articles - 932115 and 815594) and there's very little info available on it. For some reason I didn't get around to posting about it on my old blog but today I want to go into some depth on it.

So what is ghost cleanup? It's a background process that cleans up ghost records - usually referred to as the ghost cleanup task. What's a ghost record? As I described briefly in the Anatomy of a record post last week, a ghost record is one that's just been deleted in an index on a table (well, actually it gets more complicated if snapshot isolation of some form is enabled but for now, a record in an index is a good start). Such a delete operation never physically removes records from pages - it only marks them as having been deleted, or ghosted. This is a performance optimization that allows delete operations to complete more quickly. It also allows delete operations to rollback more quickly because all that needs to happen is to unmark the records as being deleted/ghosted, instead of having to reinsert the deleted records. The deleted record will be physically removed (well, its slot will be removed - the record data isn't actually overwritten) later by the background ghost cleanup task. The ghost cleanup task will leave a single record on the page to avoid having to deallocate empty data or index pages.

The ghost cleanup task can't physically delete the ghost records until after the delete transaction commits because the deleted records are locked and the locks aren't released until the transaction commits. As an aside, when ghost records exist on a page, even a NOLOCK or READ UNCOMMITTED scan won't return them because they are marked as ghost records.

When a record is deleted, apart from it being marked as a ghost record, the page that the record is on is also marked as having ghost records in one of the allocation maps - the PFS page (post coming soon!) - and in its page header. Marking a page as having ghost records in a PFS page also changes the database state to indicate that there are some ghost records to cleanup - somewhere. Nothing tells the ghost cleanup task to clean the specific page that the delete happened on - yet. That only happens when the next scan operation reads the page and notices that the page has ghost records.

The ghost cleanup task doesn't just start up when it's told to - it starts up in the background every 5 seconds and looks for ghost records to cleanup. Remember that it won't be told to go cleanup a specific page by a delete operation - it's a subsequent scan that does it, if a scan happens. When the ghost cleanup task starts up it checks to see if its been told to cleanup a page - if so it goes and does it. If not, it picks the next database that is marked as having some ghost records and looks through the PFS allocation map pages to see if there are any ghost records to cleanup. It will check through or cleanup a limited number of pages each time it wakes up - I remember the limit is 10 pages - to ensure it doesn't swamp the system. So - the ghost records will eventually be removed - either by the ghost cleanup task processing a database for ghost records or by it specifically being told to remove them from a page. If it processes a database and doesn't find any ghost records, it marks the database as not having any ghost records so it will be skipped next time.

How can you tell its running? On SQL Server 2005, you can use the following code to see the ghost cleanup task in sys.dm_exec_requests:

SELECT * INTO myexecrequests FROM sys.dm_exec_requests WHERE 1 = 0;

GO

SET NOCOUNT ON;

GO

DECLARE @a INT

SELECT @a = 0;

WHILE (@a < 1)

BEGIN

INSERT INTO myexecrequests SELECT * FROM sys.dm_exec_requests WHERE command LIKE '%ghost%'

SELECT @a = COUNT (*) FROM myexecrequests

END;

GO

SELECT * FROM myexecrequests;

GO

And on SQL Server 2000 you need to use sysprocesses (well, on SQL Server 2005 this works as well but its fake view derived from the DMVs):

SELECT * INTO mysysprocesses FROM master.dbo.sysprocesses WHERE 1 = 0;

GO

SET NOCOUNT ON;

GO

DECLARE @a INT

SELECT @a = 0;

WHILE (@a < 1)

BEGIN

INSERT INTO mysysprocesses SELECT * FROM master.dbo.sysprocesses WHERE cmd LIKE '%ghost%'

SELECT @a = COUNT (*) FROM mysysprocesses

END;

GO

SELECT * FROM mysysprocesses;

GO

The output from sys.dm_exec_requests is (with most unused and uninteresting columns stripped off):

session_id request_id  start_time              status       command
---------- ----------- ----------------------- ------------ ----------------
15         0           2007-10-05 16:34:49.653 background   GHOST CLEANUP

So how can you tell if a record is ghosted? Let's engineer some and look at it with DBCC PAGE - I've stripped out the uninteresting bits of the output and highlighted the interesting ghost parts:

CREATE TABLE t1 (c1 CHAR(10))

CREATE CLUSTERED INDEX t1c1 on t1 (c1)

GO

BEGIN TRAN

INSERT INTO t1 VALUES ('PAUL')

INSERT INTO t1 VALUES ('KIMBERLY')

DELETE FROM t1 WHERE c1='KIMBERLY';

GO

DBCC IND ('ghostrecordtest', 't1', 1);

GO

DBCC TRACEON (3604);

GO

DBCC PAGE ('ghostrecordtest', 1, 143, 3);

GO

<snip>

m_freeData = 130    m_reservedCnt = 0   m_lsn = (20:88:20)
m_xactReserved = 0  m_xdesId = (0:518)  m_ghostRecCnt = 1
m_tornBits = 0

<snip>

Slot 0 Offset 0x71 Length 17

Record Type = GHOST_DATA_RECORD      Record Attributes =  NULL_BITMAP
Memory Dump @0x6256C071

00000000:   1c000e00 4b494d42 45524c59 20200200 †....KIMBERLY  ..
00000010:   fc†††††††††††††††††††††††††††††††††††.
UNIQUIFIER = [NULL]

Slot 0 Column 1 Offset 0x4 Length 10

c1 = KIMBERLY

Slot 1 Offset 0x60 Length 17

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP
Memory Dump @0x6256C060

00000000:   10000e00 5041554c 20202020 20200200 †....PAUL      ..
00000010:   fc†††††††††††††††††††††††††††††††††††.
UNIQUIFIER = [NULL]

Slot 1 Column 1 Offset 0x4 Length 10

c1 = PAUL

Let's see what goes on the transaction log during this process (remember this is undocumented and unsupported - do it on a test database) - I've stripped off a bunch of the columns in the output:

DECLARE @a CHAR (20)

SELECT @a = [Transaction ID] FROM fn_dblog (null, null) WHERE [Transaction Name]='PaulsTran'

SELECT * FROM fn_dblog (null, null) WHERE [Transaction ID] = @a;

GO

Current LSN              Operation         Context             Transaction ID
------------------------ ----------------- ------------------- --------------
00000014:00000054:0011   LOP_BEGIN_XACT    LCX_NULL            0000:00000206
00000014:0000005a:0012   LOP_INSERT_ROWS   LCX_CLUSTERED       0000:00000206
00000014:0000005a:0013   LOP_INSERT_ROWS   LCX_CLUSTERED       0000:00000206
00000014:0000005a:0014   LOP_DELETE_ROWS   LCX_MARK_AS_GHOST   0000:00000206
00000014:0000005a:0016   LOP_DELETE_ROWS   LCX_MARK_AS_GHOST   0000:00000206

So there are the two inserts followed by the two deletes - with the rows being marked as ghost records. But where's the update to the PFS page? Well, changing the ghost bit in a PFS page is not done as part of a transaction. We'll need to look for it another way (apart from just dumping everything in the transaction log and searching manually):

SELECT Description, * FROM fn_dblog (null, null) WHERE Context like '%PFS%' AND AllocUnitName like '%t1%';

GO

Description               Current LSN              Operation        Context   Transaction ID
------------------------- ------------------------ ---------------- --------- ----------------
Allocated 0001:0000008f   00000014:00000054:0014   LOP_MODIFY_ROW   LCX_PFS   0000:00000208
                          00000014:0000005a:0015   LOP_SET_BITS     LCX_PFS   0000:00000000

The first one is just allocating a page but the second one is the one we're looking for - it's changed the bit for the page to say it has ghost records on. Let's commit the transaction and see what happens, filtering out all the previous transaction log:

SELECT MAX ([Current LSN]) FROM fn_dblog (null, null);

GO

-- 00000014:0000005e:0001

COMMIT TRAN

GO

SELECT [Page ID], * FROM fn_dblog (null, null) WHERE [Current LSN] > '00000014:0000005e:0001';

GO

Page ID         Current LSN              Operation          Context         Transaction ID
--------------- ------------------------ ------------------ --------------- --------------
NULL            00000014:0000005f:0001   LOP_COMMIT_XACT    LCX_NULL        0000:00000206
0001:0000008f   00000014:00000060:0001   LOP_EXPUNGE_ROWS   LCX_CLUSTERED   0000:00000000

We see that almost as soon as the transaction has commited, the ghost cleanup task goes in and process the page. Let's check a page dump to make sure the record is gone, and show that the contents of the record are still on the page (again, with non-relevant bits snipped out):

DBCC PAGE ('ghostrecordtest', 1, 143, 3);

GO

<snip>

m_freeData = 130         m_reservedCnt = 0        m_lsn = (20:94:1)
m_xactReserved = 0       m_xdesId = (0:518)       m_ghostRecCnt = 0
m_tornBits = 0

<snip>

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP
Memory Dump @0x6212C060

00000000:   10000e00 5041554c 20202020 20200200 †....PAUL      ..
00000010:   fc†††††††††††††††††††††††††††††††††††.
UNIQUIFIER = [NULL]

Slot 0 Column 1 Offset 0x4 Length 10

c1 = PAUL

DBCC PAGE ('ghostrecordtest', 1, 143, 2);

GO

<snip>

6212C040:   01000000 00000000 00000000 00000000 †................
6212C050:   00000000 00000000 00000000 00000000 †................
6212C060:   10000e00 5041554c 20202020 20200200 †....PAUL      ..
6212C070:   fc1c000e 004b494d 4245524c 59202002 †.....KIMBERLY  .
6212C080:   00fc0000 00000000 00000000 01000000 †................
6212C090:   00000000 13000000 01000000 00000000 †................

<snip>

So even though the record no longer exists, all that happened was that the slot was removed from the slot array at the end of the page - the record contents will remain on the page until the space is reused.

In the next post I'll go into details of the PFS and other allocation maps. Btw - please let me know if this stuff is interesting - I'd like to know where to spend blogging time. Thanks!

Here's a really interesting question that was in my search engine logs yesterday - if I have a transaction that runs and completes while a backup is running, will the complete transaction be in the backup? The answer is.... it depends!

In terms of what gets backed up, the way a full backup works is:

  1. Note the transaction log's LSN (Log Sequence Number)
  2. Read all allocated extents in the various data files
  3. Note the LSN again
  4. Read all the transaction log between the starting LSN and the ending LSN

Any transaction that commits before or on the LSN read in step 3 will be fully reflected when the database is restored. If not, the transaction will be undone. So you can't just go by the completion time of the backup and the completion time of the transaction. The transaction may well commit before the backup operation completes, but it may complete during step 4, and so it will get rolled back during a restore. In this case, it's necessary to take a log backup as well and restore that too to make the transaction be fully reflected after a restore.

As you know I always like to prove things  - so here's my proof of what I just said. I'm going to use the AdventureWorks database to do this. First thing is to set it to full recovery mode (and take the first full backup to start full recovery mode logging):

ALTER DATABASE AdventureWorks SET RECOVERY FULL;

BACKUP DATABASE AdventureWorks TO DISK='C:\SQLskills\AdventureWorks.bck' WITH INIT;

GO

Now I'm going to flush out the backup history tables in MSDB:

USE msdb;

GO

EXEC sp_delete_backuphistory '10/6/07';

GO

I've got a really contrived example that I'm going to use to show whether my transaction is wholely contained in the full backup. Using the HumanResources.Employee table, there's a column VacationHours which I'm going to set to 0 and then force all the changes pages to disk. This is my base state:

UPDATE AdventureWorks.HumanResources.Employee SET VacationHours = 0;

GO

CHECKPOINT;

GO

My contrived transaction is going to be in a tight loop updating all the rows in the table, which will generate lots of transaction log, and timed to complete just before the backup completes (i.e. in stage 4). In one connection I start the backup:

BACKUP DATABASE AdventureWorks TO DISK='C:\SQLskills\AdventureWorks.bck' WITH INIT;

SELECT GETDATE ();

GO

and in another I start my contrived transaction, after starting the backup:

BEGIN TRAN

DECLARE @a INT

DECLARE @b INT

SELECT @a = 1

WHILE (@a < 6)

BEGIN

SELECT @b = 1

WHILE (@b < 201)

BEGIN

UPDATE AdventureWorks.HumanResources.Employee SET VacationHours = @b

SELECT @b=@b+1

END

SELECT @a=@a+1

END

COMMIT TRAN;

SELECT GETDATE ();

GO

It's not pretty but it does the job. The backup finishes at 2007-10-05 17:42:38.983 and the transaction finishes at 2007-10-05 17:42:38.107 - before the backup finishes. Remember I set the VacationHours all to zero before running my transaction - let's check the transaction did actually change them:

SELECT MAX (VacationHours) FROM AdventureWorks.HumanResources.Employee;

GO

This returns 200 - which is what I'd expect. But is the transaction wholely contained in the backup? Let's look at the backup history to find out the last LSN that was captured in the full backup:

SELECT Backup_Start_Date, Backup_Finish_Date, First_LSN, Last_LSN

FROM msdb.dbo.backupset WHERE database_name = 'AdventureWorks';

GO

And the output we get is:

Backup_Start_Date       Backup_Finish_Date      First_LSN           Last_LSN
----------------------- ----------------------- ------------------- -------------------
2007-10-05 17:42:22.000 2007-10-05 17:42:38.000 86000000001600029   91000000625600001

The LSNs are in decimal, so we need to convert the three numbers to hex so we can compare them to what's in the log - giving: 5B:1870:1. This is the LSN of the last log record that was backed up in the full backup. Now let's take a look at the transaction log for AdventureWorks using the undocumented fn_dblog function. This is undocumented but very well known. It's a fully composable alternative to using the old DBCC LOG command.

USE AdventureWorks;

GO

SELECT [Current LSN], Operation, [Transaction ID], AllocUnitName FROM fn_dblog (NULL, NULL);

GO

The log record at that LSN is:

00000058:00001870:0001  LOP_MODIFY_ROW     0000:00001338  HumanResources.Employee.PK_Employee_EmployeeID

This is obviously in the middle of my contrived transaction - showing that it isn't all in the full backup. The end of the transaction isn't until way later in the log:

0000005e:00000628:01b1  LOP_MODIFY_ROW     0000:00001338  HumanResources.Employee.PK_Employee_EmployeeID
0000005e:00000628:01b2  LOP_COMMIT_XACT    0000:00001338  NULL

Before I do anything else, I want to take a log backup to preserve my transaction:

BACKUP LOG AdventureWorks TO DISK='C:\SQLskills\AdventureWorks_Log.bck' WITH INIT;

GO

Now I want to restore the full backup and really show that my transaction isn't all in there:

USE master;

GO

RESTORE DATABASE AdventureWorks FROM DISK='C:\SQLskills\AdventureWorks.bck' WITH REPLACE, RECOVERY;

GO

SELECT MAX (VacationHours) FROM AdventureWorks.HumanResources.Employee;

GO

This returns 0. Clearly my transaction isn't all in there, and the parts that are were rolled back during the restore. Now let's do the same thing but using NORECOVERY for the restore of the full backup and also applying the log backup I took:

RESTORE DATABASE AdventureWorks FROM DISK='C:\SQLskills\AdventureWorks.bck' WITH REPLACE, NORECOVERY;

GO

RESTORE LOG AdventureWorks FROM DISK='C:\SQLskills\AdventureWorks_Log.bck' WITH RECOVERY;

GO

SELECT MAX (VacationHours) FROM AdventureWorks.HumanResources.Employee;

GO

This time the SELECT returns 200. And now you can start playng around with fn_dblog if you didn't know about it before. I'll be posting more about Storage Engine internals that you can figure out from the transaction log in future.

I mentioned this in my Anatomy of a page post - its a common misconception that records in an index are ALWAYS stored in the same physical order as the logical order defined by the index key. Here's proof for you that this is incorrect (as well as introducing you to the other dump styles for DBCC PAGE).

I'm going to create a table with a clustered index on an integer column and keep the table to a single page for simplicity:

USE MASTER;

GO

IF DATABASEPROPERTY (N'rowordertest', 'Version') > 0 DROP DATABASE rowordertest;

GO

CREATE DATABASE rowordertest;

GO

USE rowordertest;

GO

CREATE TABLE t1 (c1 INT, c2 VARCHAR (10));

CREATE CLUSTERED INDEX t1c1 ON t1 (c1);

GO

Now I'm going to insert a few rows into the table, with c1 from 2 to 5 - conspicuously not inserting c1 = 1:

INSERT INTO t1 VALUES (2, REPLICATE ('b', 10));

INSERT INTO t1 VALUES (3, REPLICATE ('c', 10));

INSERT INTO t1 VALUES (4, REPLICATE ('d', 10));

INSERT INTO t1 VALUES (5, REPLICATE ('e', 10));

GO

Now, using DBCC IND  we see that the data page is (1:143) and dumping that with DBCC PAGE gives the following (skipping the header output):

DBCC IND ('rowordertest', 't1', 1);

GO

DBCC TRACEON (3604);

GO

DBCC PAGE ('rowordertest', 1, 143, 3);

GO

Slot 0 Offset 0x60 Length 27

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS

Memory Dump @0x5BA3C060

00000000:   30000800 02000000 0300f802 0011001b †0...............
00000010:   00626262 62626262 626262†††††††††††††.bbbbbbbbbb
UNIQUIFIER = [NULL]  

Slot 0 Column 1 Offset 0x4 Length 4

c1 = 2

Slot 0 Column 2 Offset 0x11 Length 10

c2 = bbbbbbbbbb

<snip slots 1 and 2>                   

Slot 3 Offset 0xb1 Length 27

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS

Memory Dump @0x5BA3C0B1

00000000:   30000800 05000000 0300f802 0011001b †0...............
00000010:   00656565 65656565 656565†††††††††††††.eeeeeeeeee
UNIQUIFIER = [NULL]

Slot 3 Column 1 Offset 0x4 Length 4

c1 = 5

Slot 3 Column 2 Offset 0x11 Length 10

c2 = eeeeeeeeee

DBCC PAGE with dump-style 3 always outputs the records on a page in their logical order (because that's how the slot array is ordered). Notice that the record with c1 = 2 is stored at offset 0x60 in the page and the last record on the page with c1 = 5 is stored at offset 0xb1. Now we'll insert a record with c1 =1. This will become the first logical record in the index, but will it cause the page to be shuffled so the records can all be stored in logical order?

INSERT INTO t1 VALUES (1, REPLICATE ('a', 10));

GO

DBCC PAGE ('rowordertest', 1, 143, 3);

GO

Slot 0 Offset 0xcc Length 27

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS

Memory Dump @0x61FCC0CC

00000000:   30000800 01000000 0300f802 0011001b †0...............
00000010:   00616161 61616161 616161†††††††††††††.aaaaaaaaaa
UNIQUIFIER = [NULL]

Slot 0 Column 1 Offset 0x4 Length 4

c1 = 1

Slot 0 Column 2 Offset 0x11 Length 10

c2 = aaaaaaaaaa

Slot 1 Offset 0x60 Length 27

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS

<snip>

The answer is no. Even though the record with c1 =1 is output by DBCC PAGE first, look at its offset within the page - 0xCC - clearly the last record on the page and stored in a different physical order than the logical order defined by the index key. Further proof can be obtained by looking at a raw hex dump of the page using dump style 2 with DBCC PAGE:

DBCC PAGE ('rowordertest', 1, 143, 2);

GO

<snip>

6204C000:   01010400 00400001 00000000 00000800 †.....@..........
6204C010:   00000000 00000500 44000000 0f1fe700 †........D.......
6204C020:   8f000000 01000000 13000000 60000000 †............`...
6204C030:   16000000 00000000 00000000 00000000 †................
6204C040:   01000000 00000000 00000000 00000000 †................
6204C050:   00000000 00000000 00000000 00000000 †................
6204C060:   30000800 02000000 0300f802 0011001b †0...............
6204C070:   00626262 62626262 62626230 00080003 †.bbbbbbbbbb0....
6204C080:   00000003 00f80200 11001b00 63636363 †............cccc
6204C090:   63636363 63633000 08000400 00000300 †cccccc0.........
6204C0A0:   f8020011 001b0064 64646464 64646464 †.......ddddddddd
6204C0B0:   64300008 00050000 000300f8 02001100 †d0..............
6204C0C0:   1b006565 65656565 65656565 30000800 †..eeeeeeeeee0...
6204C0D0:   01000000 0300f802 0011001b 00616161 †.............aaa
6204C0E0:   61616161 61616100 00000000 00000000 †aaaaaaa.........
6204C0F0:   00000000 00000000 00000000 00000000 †................

<snip>

You can clearly see that the last row I inserted, with c1 = 1 and the replicated 'a's is stored after the other records on the page, even though its key is logically before the others.

And just to nail the point home, doing a dump style 1 with DBCC PAGE will dump out the slot array for us:

DBCC PAGE ('rowordertest', 1, 143, 1);

GO

<snip>

OFFSET TABLE:

Row - Offset
4 (0x4) - 177 (0xb1)
3 (0x3) - 150 (0x96)
2 (0x2) - 123 (0x7b)
1 (0x1) - 96 (0x60)
0 (0x0) - 204 (0xcc)

<snip>

The slot array grows backwards, which is why its dumped in what looks like reverse logical order. You can see that slot 0, which represents the first logical record on the page, is stored at an offset greater than the others.

Time for the first post in the Inside the Storage Engine series. I'm going to focus on SQL Server 2005 in this series and I'll point out major differences between 2005 and previous versions. Please drop me a line if there's something you'd like to see explained and demo'd.

Before jumping into how things work, I'd like to go over two commands I'll be using a lot - DBCC PAGE and DBCC IND. These are both undocumented and unsupported commands, but are very safe to use as they're used extensively inside and outside Microsoft when troubleshooting. Nevertheless, use at your own risk. They're quite well known in the SQL community and I and others have publicized them before (I even demo'd them last year at ITForum in Spain).

To illustrate their use, I'm going to use a simple script I wrote to prove that page splits never roll back. I was having a discussion with someone a while ago about this question and the answer is always no. A page split occurs when an insert or update has to happen at a certain point in an index page, and there's no room on the page to accomomodate the new or updated record. Page splits are done internally as separate 'system' transactions. Once a system transaction commits, it cannot be rolled back - even if the user transaction it was part of rolls back.

So, let's run through the script. First thing to do is create a database containing a table with an index (as page splits only happen in indexes).

USE MASTER;

GO

IF DATABASEPROPERTY (N'pagesplittest', 'Version') > 0 DROP DATABASE pagesplittest;

GO

CREATE DATABASE pagesplittest;

GO

USE pagesplittest;

GO

CREATE TABLE t1 (c1 INT, c2 VARCHAR (1000));

CREATE CLUSTERED INDEX t1c1 ON t1 (c1);

GO

Now I'm going to fill up a page in the index, but leave a gap in the c1 values so that I can force a page split by inserting the missing key value.

INSERT INTO t1 VALUES (1, REPLICATE ('a', 900));

INSERT INTO t1 VALUES (2, REPLICATE ('b', 900));

INSERT INTO t1 VALUES (3, REPLICATE ('c', 900));

INSERT INTO t1 VALUES (4, REPLICATE ('d', 900));

-- leave a gap at 5

INSERT INTO t1 VALUES (6, REPLICATE ('f', 900));

INSERT INTO t1 VALUES (7, REPLICATE ('g', 900));

INSERT INTO t1 VALUES (8, REPLICATE ('h', 900));

INSERT INTO t1 VALUES (9, REPLICATE ('i', 900));

GO

I can find out what the first index page is using the DBCC IND command:

DBCC IND ('pagesplittest', 't1', 1);

GO

This command list all the pages that are allocated to an index. Here's the output in this case:

SEQA3.jpg

The columns mean:

  • PageFID - the file ID of the page
  • PagePID - the page number in the file
  • IAMFID - the file ID of the IAM page that maps this page (this will be NULL for IAM pages themselves as they're not self-referential)
  • IAMPID - the page number in the file of the IAM page that maps this page
  • ObjectID - the ID of the object this page is part of
  • IndexID - the ID of the index this page is part of
  • PartitionNumber - the partition number (as defined by the partitioning scheme for the index) of the partition this page is part of
  • PartitionID - the internal ID of the partition this page is part of
  • PageType - the page type. Some common ones are:
    • 1 - data page
    • 2 - index page
    • 3 and 4 - text pages
    • 8 - GAM page
    • 9 - SGAM page
    • 10 - IAM page
    • 11 - PFS page
  • IndexLevel - what level the page is at in the index (if at all). Remember that index levels go from 0 at the leaf to N at the root page (except in clustered indexes in SQL Server 2000 and 7.0 - where there's a 0 at the leaf level (data pages) and a 0 at the next level up (first level of index pages))
  • NextPageFID and NextPagePID - the page ID of the next page in the doubly-linked list of pages at this level of the index
  • PrevPageFID and PrevPagePID - the page ID of the previous page in the doubly-linked list of pages at this level of the index

So you can see we've got a single page clustered index with an IAM page. Note that the page IDs returned may differ on your server. Let's look at the data page:

DBCC TRACEON (3604);

GO

DBCC PAGE (pagesplittest, 1, 143, 3);

GO

The traceflag is to make the output of DBCC PAGE go to the console, rather than to the error log. The syntax for DBCC PAGE is:

dbcc page ( {'dbname' | dbid}, filenum, pagenum [, printopt={0|1|2|3} ])

The filenum and pagenum parameters are taken from the page IDs that come from various system tables and appear in DBCC or other system error messages. A page ID of, say, (1:143) has filenum = 1 and pagenum = 143.

The printopt parameter has the following meanings:

  • 0 - print just the page header
  • 1 - page header plus per-row hex dumps and a dump of the page slot array (unless its a page that doesn't have one, like allocation bitmaps)
  • 2 - page header plus whole page hex dump
  • 3 - page header plus detailed per-row interpretation

The per-row interpretation works for all page types, including the allocation bitmaps. In our case, we asked for a detailed output. I'll explain the various parts of the output in a post about the anatomy of a page. Here's the output from DBCC PAGE, with a bunch of the repeated per-row info removed for brevity:

PAGE: (1:143)


BUFFER:


BUF @0x02C49720

bpage = 0x05400000                   bhash = 0x00000000                   bpageno = (1:143)
bdbid = 8                            breferences = 0                      bUse1 = 22163
bstat = 0xc0010b                     blog = 0x32159bb                     bnext = 0x00000000

PAGE HEADER:


Page @0x05400000

m_pageId = (1:143)                   m_headerVersion = 1                  m_type = 1
m_typeFlagBits = 0x4                 m_level = 0                          m_flagBits = 0x8000
m_objId (AllocUnitId.idObj) = 68     m_indexId (AllocUnitId.idInd) = 256 
Metadata: AllocUnitId = 72057594042384384                                
Metadata: PartitionId = 72057594038386688                                 Metadata: IndexId = 1
Metadata: ObjectId = 2073058421      m_prevPage = (0:0)                   m_nextPage = (0:0)
pminlen = 8                          m_slotCnt = 8                        m_freeCnt = 744
m_freeData = 7432                    m_reservedCnt = 0                    m_lsn = (18:113:2)
m_xactReserved = 0                   m_xdesId = (0:0)                     m_ghostRecCnt = 0
m_tornBits = 0                      

Allocation Status

GAM (1:2) = ALLOCATED                SGAM (1:3) = ALLOCATED              
PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL                         DIFF (1:6) = CHANGED
ML (1:7) = NOT MIN_LOGGED           

Slot 0 Offset 0x60 Length 917

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS

Memory Dump @0x5C24C060

00000000:   30000800 01000000 0300f802 00110095 †0...............        
00000010:   03616161 61616161 61616161 61616161 †.aaaaaaaaaaaaaaa        

<snip> I've removed this section to save space

00000380:   61616161 61616161 61616161 61616161 †aaaaaaaaaaaaaaaa        
00000390:   61616161 61††††††††††††††††††††††††††aaaaa                   
UNIQUIFIER = [NULL]                 

Slot 0 Column 1 Offset 0x4 Length 4

c1 = 1                              

Slot 0 Column 2 Offset 0x11 Length 900

c2 = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
a                                   

Slot 1 Offset 0x3f5 Length 917

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS

Memory Dump @0x5C24C3F5

<snip> And again...

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

As you can see from the output, each row is 917 bytes long and there's only 744 bytes free (look at the m_freecnt value in the PAGE HEADER section). This means that we can't insert another row on that page of the same length - there just isn't space - but that's what we're going to do! Remember that this page currently has nine rows on it. Let's force a page split:

BEGIN TRAN;

GO

INSERT INTO t1 VALUES (5, REPLICATE ('a', 900));

GO

Now we know there wasn't enough room so the page must have split. Let's check DBCC IND again to see if another page was allocated to the index - here's the output:

SEQA41.jpg

Two pages have been added - an index page and another data page. Before we added the extra row and caused the page split, the index only needed one page. Now that there are two data pages, there needs to be an index page to allow searches through the index b-tree. Let's take a look at the two data pages to see which rows are stored on which page. Doing DBCC PAGE on them shows that page (1:143) has 5 rows, with c1 values 1 through 5, and page (1:154) has the 4 rows with c1 values 6 through 9 (I'm not going to post all the DBCC PAGE output - that would make the post way too long and it gives you an incentive to try the commands out). This is what we'd expect, as the page split occurs at the insertion point, and the row being inserted is put onto the page that split.

Now let's rollback the user transaction and see what happens:

ROLLBACK TRAN;

GO

Running the DBCC PAGE commands again shows that the index structure remains the same as after the split. Page (1:154) has the 4 rows on it from the split and page (1:143) has the other rows but not the one we inserted in the explicit transaction.

So, proof that a page split is never rolled back. I'll be making much more use of these two DBCC commands in future posts and I'll do the page anatomy one later this week. Let me know if there's anything in particular you'd like to see described in this series.

Theme design by Nukeation based on Jelle Druyts