There was an interesting discussion on Twitter this morning (started by my good friend Erin Stellato (blog|twitter)) about in-place updates for index records where the key value changes and the record remains on the same page. Various sources including the SQL Server 2008 Internals book (pg 361 – I didn't write or review that bit :-) describe the process, but they describe it incorrectly unfortunately. They say that the record will remain in the exact same location on the same page and only the key storage bytes need to change.
This is not true. True in-place updates of index key values do not happen in 2005 onwards. Let's go down the rabbit hole…
Here's a simple example:
CREATE DATABASE KeyUpdateTest;
GO
USE KeyUpdateTest;
GO
ALTER DATABASE KeyUpdateTest SET RECOVERY SIMPLE;
ALTER DATABASE KeyUpdateTest SET AUTO_CREATE_STATISTICS OFF;
ALTER DATABASE KeyUpdateTest SET AUTO_UPDATE_STATISTICS OFF;
GO
CREATE TABLE test (c1 INT, c2 VARCHAR (2000));
GO
CREATE CLUSTERED INDEX test_cl ON test (c1);
GO
CHECKPOINT;
GO
INSERT INTO test VALUES (1, REPLICATE ('Paul', 500));
GO
CHECKPOINT;
GO
Now we look at the page itself (and you can get the stored-proc I use below from my old blog post Inside The Storage Engine: sp_AllocationMetadata – putting undocumented system catalog views to work):
EXEC sp_allocationMetadata 'test';
GOObject Name Index ID Alloc Unit ID Alloc Unit Type First Page Root Page First IAM Page
———– ——– —————– ————— ———- ——— ————–
test 1 72057594039762944 IN_ROW_DATA (1:144) (1:144) (1:145)DBCC TRACEON (3604);
GO
DBCC PAGE (keyupdatetest, 1, 144, 2);
GOMemory Dump @0x0000000016A6A000
0000000016A6A000: 01010400 00c20001 00000000 00000800 †…..Â……….
0000000016A6A010: 00000000 00000100 1c000000 bd174108 †…………½.A.
0000000016A6A020: 90000000 01000000 1d000000 8a000000 †……………
0000000016A6A030: 12000000 00000000 00000000 239dbd4a †…………#.½J
0000000016A6A040: 00000000 00000000 00000000 00000000 †…………….
0000000016A6A050: 00000000 00000000 00000000 00000000 †…………….
0000000016A6A060: 30000800 01000000 03000002 001100e1 †0…………..á
0000000016A6A070: 07506175 6c506175 6c506175 6c506175 †.PaulPaulPaulPau
0000000016A6A080: 6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000016A6A090: 6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000016A6A0A0: 6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000016A6A0B0: 6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
<snip>
0000000016A6A820: 6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000016A6A830: 6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000016A6A840: 6c000021 21212121 21212121 21212121 †l..!!!!!!!!!!!!!
0000000016A6A850: 21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
<snip>
0000000016A6BFE0: 21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
0000000016A6BFF0: 21212121 21212121 21212121 21216000 †!!!!!!!!!!!!!!`.OFFSET TABLE:
Row – Offset
0 (0x0) – 96 (0x60)
Now I'll update the record:
UPDATE test SET c1 = 2 WHERE c1 =1;
GO
And let's look at the log records generated (in the output below I've removed all the extraneous stuff from the log that occurs before my transaction):
SELECT [Current LSN], [Operation], [Context],
[Log Record Length], [Page ID], [Slot ID] FROM fn_dblog (NULL, NULL);
GO
Current LSN Operation Context Log Record Length Page ID Slot ID
———————– ——————- ——————— —————– ————– ———–
0000001d:00000096:0011 LOP_BEGIN_XACT LCX_NULL 112 NULL NULL
0000001d:00000096:0012 LOP_DELETE_ROWS LCX_MARK_AS_GHOST 2116 0001:00000090 0
0000001d:00000096:0013 LOP_MODIFY_HEADER LCX_PFS 76 0001:00000001 0
0000001d:00000096:0014 LOP_SET_BITS LCX_PFS 56 0001:00000001 0
0000001d:00000096:0015 LOP_INSERT_ROWS LCX_CLUSTERED 2120 0001:00000090 1
0000001d:00000096:0016 LOP_COMMIT_XACT LCX_NULL 52 NULL NULL
0000001d:000000a2:0001 LOP_EXPUNGE_ROWS LCX_CLUSTERED 64 0001:00000090 0
0000001d:000000a2:0002 LOP_SET_BITS LCX_PFS 56 0001:00000001 0
0000001d:000000a2:0003 LOP_MODIFY_HEADER LCX_PFS 76 0001:00000001 0
The row is deleted and then inserted. Look at the log record lengths! And look at the slot numbers. The new row is in slot 1 (the LOP_INSERT_ROWS record) and slot 0 is ghosted (the LOP_DELETE_ROWS record), and then we see ghost cleanup running (the LOP_EXPUNGE_ROWS record), which removes the old slot 0, and the slot array shifts down so the new record is slot 0 again. See my blog post Inside the Storage Engine: Ghost cleanup in depth for info on ghost cleanup internals.
But is it in the same physical place on the page? No – a whole new record was created.
DBCC PAGE (keyupdatetest, 1, 144, 2);
GOMemory Dump @0x0000000013CAC000
0000000013CAC000: 01010400 00800001 00000000 00000800 †…………….
0000000013CAC010: 00000000 00000100 1c000000 bd172210 †…………½.".
0000000013CAC020: 90000000 01000000 1d000000 a2000000 †…………¢…
0000000013CAC030: 01000000 76020000 00000000 239dbd4a †….v…….#.½J
0000000013CAC040: 01000000 00000000 00000000 00000000 †…………….
0000000013CAC050: 00000000 00000000 00000000 00000000 †…………….
0000000013CAC060: 3c000800 01000000 03000002 001100e1 †<…………..á
0000000013CAC070: 07506175 6c506175 6c506175 6c506175 †.PaulPaulPaulPau
0000000013CAC080: 6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000013CAC090: 6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
<snip>
0000000013CAC820: 6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000013CAC830: 6c506175 6c506175 6c506175 6c506175 †lPaulPaulPaulPau
0000000013CAC840: 6c300008 00020000 00030000 02001100 †l0…………..
0000000013CAC850: e1075061 756c5061 756c5061 756c5061 †á.PaulPaulPaulPa
0000000013CAC860: 756c5061 756c5061 756c5061 756c5061 †ulPaulPaulPaulPa
<snip>
0000000013CADFE0: 21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
0000000013CADFF0: 21212121 21212121 21212121 41084108 †!!!!!!!!!!!!A.A.OFFSET TABLE:
Row – Offset
0 (0x0) – 2113 (0x841)
Notice that where the original record ended (offset 0x830 into the page). In the first page dump, everything after that is empty space. In the second page dump, the new record starts there. And look at the slot array (called the offset table by DBCC PAGE). In the first page dump the original record started at offset 0x60, and in the second dump it starts at offset 0x841. Even though ghost cleanup ran, the record did NOT migrate to it's original location.
It's very clear that an in-place update did not happen, and that the entire record is recreated in it's entirety.
In this case, the record ended up being physical slot 0 again, but that's a pathological case because there's only one slot on the page. If you repeat the experiment with two records (e.g. c1=1 and c1=2) and then update c1 of the first record to 3, you'll see the new record being physical slot 2, then after ghost cleanup becoming physical slot 1. But again, it's deleted and reinserted in it's entirety.
Now I'll show you a case where the row has to move to a new page – definitely not an in-place update!
CREATE DATABASE KeyUpdateTest;
GO
USE KeyUpdateTest;
GO
ALTER DATABASE KeyUpdateTest SET RECOVERY SIMPLE;
ALTER DATABASE KeyUpdateTest SET AUTO_CREATE_STATISTICS OFF;
ALTER DATABASE KeyUpdateTest SET AUTO_UPDATE_STATISTICS OFF;
GO
CREATE TABLE test (c1 INT, c2 VARCHAR (4000));
GO
CREATE CLUSTERED INDEX test_cl ON test (c1);
GO
CHECKPOINT;
GO
INSERT INTO test VALUES (1, REPLICATE ('Paul', 1000));
GO
INSERT INTO test VALUES (2, REPLICATE ('Erin', 1000));
GO
CHECKPOINT;
GO
I've filled up the page with two rows.
DBCC PAGE (keyupdatetest, 1, 144, 2);
GO
<snip>
Metadata: ObjectId = 2105058535 m_prevPage = (0:0) m_nextPage = (0:0)
pminlen = 8 m_slotCnt = 2 m_freeCnt = 58
m_freeData = 8130 m_reservedCnt = 0 m_lsn = (29:150:2)
<snip>
0000000016A6BFA0: 696e4572 696e4572 696e4572 696e4572 †inErinErinErinEr
0000000016A6BFB0: 696e4572 696e4572 696e4572 696e4572 †inErinErinErinEr
0000000016A6BFC0: 696e0000 21212121 21212121 21212121 †in..!!!!!!!!!!!!
0000000016A6BFD0: 21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
0000000016A6BFE0: 21212121 21212121 21212121 21212121 †!!!!!!!!!!!!!!!!
0000000016A6BFF0: 21212121 21212121 21212121 11106000 †!!!!!!!!!!!!..`.OFFSET TABLE:
Row – Offset
1 (0x1) – 4113 (0x1011)
0 (0x0) – 96 (0x60)
Now I'll update the first row to have c1 = 3 so my row moves away from Erin's :-)
UPDATE test SET c1 = 3 WHERE c1 =1;
GO
And look at the log:
SELECT [Current LSN], [Operation], [Context],
[Log Record Length], [Page ID], [Slot ID] FROM fn_dblog (NULL, NULL);
GOCurrent LSN Operation Context Log Record Length Page ID Slot ID
———————– ——————- ———————- —————– ————– ———–
0000001d:000000a3:0015 LOP_BEGIN_XACT LCX_NULL 136 NULL NULL
0000001d:000000a3:0016 LOP_MODIFY_ROW LCX_PFS 80 0001:00000001 0
0000001d:000000a3:0017 LOP_MODIFY_ROW LCX_IAM 88 0001:00000091 0
0000001d:000000a3:0018 LOP_HOBT_DELTA LCX_NULL 64 NULL NULL
0000001d:000000a3:0019 LOP_FORMAT_PAGE LCX_INDEX_INTERIOR 84 0001:00000092 -1
0000001d:000000a3:001a LOP_INSERT_ROWS LCX_INDEX_INTERIOR 84 0001:00000092 0
0000001d:000000a3:001b LOP_ROOT_CHANGE LCX_CLUSTERED 96 0001:00000082 52
0000001d:000000a3:001c LOP_COMMIT_XACT LCX_NULL 52 NULL NULL
0000001d:000000a3:001d LOP_BEGIN_XACT LCX_NULL 120 NULL NULL
0000001d:000000a3:001e LOP_MODIFY_ROW LCX_PFS 80 0001:00000001 0
0000001d:000000a3:001f LOP_MODIFY_ROW LCX_IAM 88 0001:00000091 0
0000001d:000000a3:0020 LOP_HOBT_DELTA LCX_NULL 64 NULL NULL
0000001d:000000a3:0021 LOP_FORMAT_PAGE LCX_HEAP 84 0001:00000093 -1
0000001d:000000a3:0022 LOP_INSERT_ROWS LCX_CLUSTERED 4092 0001:00000093 0
0000001d:000000a3:0023 LOP_DELETE_SPLIT LCX_CLUSTERED 60 0001:00000090 1
0000001d:000000a3:0024 LOP_MODIFY_HEADER LCX_HEAP 84 0001:00000090 0
0000001d:000000a3:0025 LOP_INSERT_ROWS LCX_INDEX_INTERIOR 88 0001:00000092 1
0000001d:000000a3:0026 LOP_COMMIT_XACT LCX_NULL 52 NULL NULL
0000001d:000000a3:0027 LOP_INSERT_ROWS LCX_CLUSTERED 4120 0001:00000093 1
0000001d:000000a3:0028 LOP_COMMIT_XACT LCX_NULL 52 NULL NULL
This is really interesting. Here's what happens (missing a few things for clarity):
-
0000001d:000000a3:0016: page (1:146) is allocated to be the new root page for the clustered index
-
0000001d:000000a3:0019: it gets minimally formatted (just the page header – although the page header is 96 bytes, there is 12 bytes of empty space at the end)
-
0000001d:000000a3:001a: the index record pointing at (1:144) is inserted in the root page
-
0000001d:000000a3:001b: the index root os changedd from (1:144) to (1:146) in metadata
-
0000001d:000000a3:001e: page (1:147) is allocated to be the second data page at the leaf of the clustered index
-
0000001d:000000a3:0022: a page split of page (1:144) occurs, moving the Erin row to slot 0 of page (1:147)
-
0000001d:000000a3:0023: the Erin row is removed from page (1:144) – not ghosted
-
0000001d:000000a3:0025: the index record pointing at (1:147) is inserted in the root page
-
0000001d:000000a3:0027: the updated Paul row is inserted as slot 1 of page (1:147)
As you can see, this is as far as you can get from doing an in-place update. Page (1:144) is left with a single ghost-record on (the Paul record – the Erin record isn't ghosted because it was moved because of a split). The next access to the page will cause ghost cleanup to remove it and deallocate the page.
So no, in-place updates of key values do not happen, even though they appear to at first glance. You can experiment to show the same thing happening in nonclustered indexes too.
Another misconception busted!
PS Some of you may remember when such in-place updates really *did* happen. That was before 2005 when the Storage Engine (specifically my old Access Methods team) did the index maintenance (changing clustered and nonclustered index structures based on key value changes). A whole bunch of features in 2005 meant that we switched it over to the Query Processor to drive the index maintenance, and nifty short-cuts like we used to do aren't possible any more.