During the years I was on the Storage Engine team I saw a lot of concern on the various forums about the ghost cleanup task. There have been a few bugs with it in previous versions (see these KB articles – 932115 and 815594) and there's very little information available on it. For some reason I never got around to posting about it on my old blog, but today I want to go into some depth on it.
So what is ghost cleanup? It's a background process that cleans up ghost records – usually referred to as the ghost cleanup task. What's a ghost record? As I described briefly in the Anatomy of a record post last week, a ghost record is one that's just been deleted in an index on a table (well, it actually gets more complicated if some form of snapshot isolation is enabled, but for now "a deleted record in an index" is a good start). Such a delete operation never physically removes records from pages – it only marks them as having been deleted, or ghosted. This is a performance optimization that allows delete operations to complete more quickly. It also allows delete operations to roll back more quickly, because all that needs to happen is to unmark the records as being deleted/ghosted, instead of having to reinsert the deleted records. The deleted record will be physically removed (well, its slot will be removed – the record data isn't actually overwritten) later by the background ghost cleanup task. The ghost cleanup task will leave a single record on the page to avoid having to deallocate empty data or index pages.
The ghost cleanup task can't physically delete the ghost records until after the delete transaction commits because the deleted records are locked and the locks aren't released until the transaction commits. As an aside, when ghost records exist on a page, even a NOLOCK or READ UNCOMMITTED scan won't return them because they are marked as ghost records.
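If you want to see ghost records without dumping pages, a quick check on SQL Server 2005 is the ghost_record_count column in sys.dm_db_index_physical_stats. Here's a minimal sketch – the table name ghosttest is made up for illustration, and the transaction is deliberately left open so the ghost record can't have been cleaned up yet:

CREATE TABLE ghosttest (c1 INT);
CREATE CLUSTERED INDEX ghosttest_cl ON ghosttest (c1);
INSERT INTO ghosttest VALUES (1);
INSERT INTO ghosttest VALUES (2);
GO
BEGIN TRAN;
DELETE FROM ghosttest WHERE c1 = 2;
-- The delete hasn't committed, so the deleted record must still be ghosted (it's locked)
-- Expect ghost_record_count = 1 at the leaf level (index_level = 0)
SELECT index_level, ghost_record_count, version_ghost_record_count
FROM sys.dm_db_index_physical_stats (DB_ID (), OBJECT_ID (N'ghosttest'), NULL, NULL, 'DETAILED');
COMMIT TRAN;
GO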
When a record is deleted, apart from it being marked as a ghost record, the page that the record is on is also marked as having ghost records in one of the allocation maps – the PFS page (post coming soon!) – and in its page header. Marking a page as having ghost records in a PFS page also changes the database state to indicate that there are some ghost records to cleanup – somewhere. Nothing tells the ghost cleanup task to clean the specific page that the delete happened on – yet. That only happens when the next scan operation reads the page and notices that the page has ghost records.
The ghost cleanup task doesn't just start up when it's told to – it starts up in the background every 5 seconds and looks for ghost records to clean up. Remember that it won't be told to go clean up a specific page by a delete operation – it's a subsequent scan that does it, if a scan happens. When the ghost cleanup task starts up it checks to see whether it's been told to clean up a page – if so it goes and does it. If not, it picks the next database that is marked as having some ghost records and looks through the PFS allocation map pages to see if there are any ghost records to clean up. It will check through or clean up a limited number of pages each time it wakes up – I remember the limit being 10 pages – to ensure it doesn't swamp the system. So the ghost records will eventually be removed – either by the ghost cleanup task processing a database for ghost records or by it specifically being told to remove them from a page. If it processes a database and doesn't find any ghost records, it marks the database as not having any ghost records so it will be skipped next time.
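Continuing the sketch from above (after the COMMIT), you can watch this happen: scan the table so the page with the ghost record gets noticed and queued, wait a few seconds for the background task to wake up, and check the count again. The 10-second delay is arbitrary – on a busy system it may take longer:

-- Scan the index so the page with ghost records is noticed and queued for cleanup
SELECT COUNT (*) FROM ghosttest;
GO
-- Give the background task a chance to wake up and process the page
WAITFOR DELAY '00:00:10';
GO
-- Expect ghost_record_count to be back down to 0
SELECT index_level, ghost_record_count
FROM sys.dm_db_index_physical_stats (DB_ID (), OBJECT_ID (N'ghosttest'), NULL, NULL, 'DETAILED');
GO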
How can you tell it's running? On SQL Server 2005, you can use the following code to see the ghost cleanup task in sys.dm_exec_requests:
-- Create an empty copy of sys.dm_exec_requests to capture the task when it appears
SELECT * INTO myexecrequests FROM sys.dm_exec_requests WHERE 1 = 0;
GO
SET NOCOUNT ON;
GO
-- Poll sys.dm_exec_requests until the ghost cleanup task shows up
DECLARE @a INT;
SELECT @a = 0;
WHILE (@a < 1)
BEGIN
    INSERT INTO myexecrequests SELECT * FROM sys.dm_exec_requests WHERE command LIKE '%ghost%';
    SELECT @a = COUNT (*) FROM myexecrequests;
END;
GO
SELECT * FROM myexecrequests;
GO
And on SQL Server 2000 you need to use sysprocesses (this works on SQL Server 2005 as well, but there it's a fake view derived from the DMVs):
SELECT * INTO mysysprocesses FROM master.dbo.sysprocesses WHERE 1 = 0;
GO
SET NOCOUNT ON;
GO
DECLARE @a INT
SELECT @a = 0;
WHILE (@a < 1)
BEGIN
INSERT INTO mysysprocesses SELECT * FROM master.dbo.sysprocesses WHERE cmd LIKE '%ghost%'
SELECT @a = COUNT (*) FROM mysysprocesses
END;
GO
SELECT * FROM mysysprocesses;
GO
The output from sys.dm_exec_requests is (with most unused and uninteresting columns stripped off):
session_id request_id start_time status command
---------- ----------- ----------------------- ------------ ----------------
15 0 2007-10-05 16:34:49.653 background GHOST CLEANUP
So how can you tell if a record is ghosted? Let's engineer one and look at it with DBCC PAGE – I've stripped out the uninteresting bits of the output and highlighted the interesting ghost parts:
-- Run this in a test database called ghostrecordtest
CREATE TABLE t1 (c1 CHAR (10));
CREATE CLUSTERED INDEX t1c1 ON t1 (c1);
GO
-- Name the transaction so we can find it in the log later
BEGIN TRAN PaulsTran
INSERT INTO t1 VALUES ('PAUL');
INSERT INTO t1 VALUES ('KIMBERLY');
DELETE FROM t1 WHERE c1 = 'KIMBERLY';
GO
-- Find the data page for the table, then dump it
DBCC IND ('ghostrecordtest', 't1', 1);
GO
DBCC TRACEON (3604);
GO
-- 143 is the data page ID that DBCC IND returned on my system – substitute yours
DBCC PAGE ('ghostrecordtest', 1, 143, 3);
GO
<snip>
m_freeData = 130 m_reservedCnt = 0 m_lsn = (20:88:20)
m_xactReserved = 0 m_xdesId = (0:518) m_ghostRecCnt = 1
m_tornBits = 0
<snip>
Slot 0 Offset 0x71 Length 17
Record Type = GHOST_DATA_RECORD Record Attributes = NULL_BITMAP
Memory Dump @0x6256C07100000000: 1c000e00 4b494d42 45524c59 20200200 †….KIMBERLY ..
00000010: fc†††††††††††††††††††††††††††††††††††.
UNIQUIFIER = [NULL]
Slot 0 Column 1 Offset 0x4 Length 10
c1 = KIMBERLY
Slot 1 Offset 0x60 Length 17
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP
Memory Dump @0x6256C06000000000: 10000e00 5041554c 20202020 20200200 †….PAUL ..
00000010: fc†††††††††††††††††††††††††††††††††††.
UNIQUIFIER = [NULL]
Slot 1 Column 1 Offset 0x4 Length 10
c1 = PAUL
Let's see what goes into the transaction log during this process using fn_dblog (remember this is undocumented and unsupported – do it on a test database) – I've stripped off a bunch of the columns in the output:
-- Find all the log records belonging to our named transaction
DECLARE @a CHAR (20);
SELECT @a = [Transaction ID] FROM fn_dblog (null, null) WHERE [Transaction Name] = 'PaulsTran';
SELECT * FROM fn_dblog (null, null) WHERE [Transaction ID] = @a;
GO
Current LSN Operation Context Transaction ID
------------------------ ----------------- ------------------- --------------
00000014:00000054:0011 LOP_BEGIN_XACT LCX_NULL 0000:00000206
00000014:0000005a:0012 LOP_INSERT_ROWS LCX_CLUSTERED 0000:00000206
00000014:0000005a:0013 LOP_INSERT_ROWS LCX_CLUSTERED 0000:00000206
00000014:0000005a:0014 LOP_DELETE_ROWS LCX_MARK_AS_GHOST 0000:00000206
00000014:0000005a:0016 LOP_DELETE_ROWS LCX_MARK_AS_GHOST 0000:00000206
So there are the two inserts followed by the two deletes – with the rows being marked as ghost records. But where's the update to the PFS page? Well, changing the ghost bit in a PFS page is not done as part of a transaction. We'll need to look for it another way (apart from just dumping everything in the transaction log and searching manually):
SELECT Description, * FROM fn_dblog (null, null) WHERE Context like '%PFS%' AND AllocUnitName like '%t1%';
GO
Description               Current LSN              Operation        Context   Transaction ID
------------------------- ------------------------ ---------------- --------- --------------
Allocated 0001:0000008f   00000014:00000054:0014   LOP_MODIFY_ROW   LCX_PFS   0000:00000208
                          00000014:0000005a:0015   LOP_SET_BITS     LCX_PFS   0000:00000000
The first one is just allocating a page, but the second one is the one we're looking for – it's changed the bit for the page to say it has ghost records on it. Let's commit the transaction and see what happens, filtering out all the previous transaction log:
-- Note the current maximum LSN so we can filter for log records generated after it
SELECT MAX ([Current LSN]) FROM fn_dblog (null, null);
GO
-- 00000014:0000005e:0001
COMMIT TRAN
GO
SELECT [Page ID], * FROM fn_dblog (null, null) WHERE [Current LSN] > '00000014:0000005e:0001';
GO
Page ID         Current LSN              Operation          Context         Transaction ID
--------------- ------------------------ ------------------ --------------- --------------
NULL            00000014:0000005f:0001   LOP_COMMIT_XACT    LCX_NULL        0000:00000206
0001:0000008f   00000014:00000060:0001   LOP_EXPUNGE_ROWS   LCX_CLUSTERED   0000:00000000
We can see that almost as soon as the transaction has committed, the ghost cleanup task goes in and processes the page. Let's check a page dump to make sure the record is gone, and show that the contents of the record are still on the page (again, with non-relevant bits snipped out):
DBCC PAGE ('ghostrecordtest', 1, 143, 3);
GO
<snip>
m_freeData = 130 m_reservedCnt = 0 m_lsn = (20:94:1)
m_xactReserved = 0 m_xdesId = (0:518) m_ghostRecCnt = 0
m_tornBits = 0
<snip>
Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP
Memory Dump @0x6212C06000000000: 10000e00 5041554c 20202020 20200200 †….PAUL ..
00000010: fc†††††††††††††††††††††††††††††††††††.
UNIQUIFIER = [NULL]
Slot 0 Column 1 Offset 0x4 Length 10
c1 = PAUL
DBCC PAGE ('ghostrecordtest', 1, 143, 2);
GO
<snip>
6212C040: 01000000 00000000 00000000 00000000 †…………….
6212C050: 00000000 00000000 00000000 00000000 †…………….
6212C060: 10000e00 5041554c 20202020 20200200 †….PAUL ..
6212C070: fc1c000e 004b494d 4245524c 59202002 †…..KIMBERLY .
6212C080: 00fc0000 00000000 00000000 01000000 †…………….
6212C090: 00000000 13000000 01000000 00000000 †…………….
<snip>
So even though the record no longer exists, all that happened was that the slot was removed from the slot array at the end of the page – the record contents will remain on the page until the space is reused.
In the next post I'll go into details of the PFS and other allocation maps. Btw – please let me know if this stuff is interesting – I'd like to know where to spend blogging time. Thanks!
48 thoughts on “Inside the Storage Engine: Ghost cleanup in depth”
Yes, this stuff is interesting. Please keep posting!!
Absolutely Awesome explanation. Thank you very much.
Hi all,
The above article is extremely excellent – it cleared up all my doubts about the ghost cleanup process without my having to ask a single question. It looks at things from different angles to address the various questions people raise about it.
Thanks a lot
Vaibhav Bapat
"The ghost cleanup task will leave a single record on the page to avoid having to deallocate empty data or index pages." – this is not true in SQL Server 2005 or later version. Unless this is the first page maintained in system table, a page could be de-allocated by ghost cleanup.
Yes, I think you’re right. That was certainly the case in 2000. But why post the comment with the obvious fake name?
Very well articulated!!! (I know you must get that a lot )
But thoroughly enjoyed reading this
Paul, may I know why heaps don't support ghost records?
It doesn’t need them as there’s no way that a duplicate record being inserted before the deleting transaction rolls back could cause the rollback to fail a uniqueness test, or cause a page to split.
Thank you Paul, and sorry for my late reply.
But I can't quite follow your explanation – could you explain it in a bit more detail or give an example?
We are in a situation where we have a very busy OLTP system which is doing a lot of deletes and then inserts;
the ghost cleanup task cannot keep up, it's always lagging behind, and we are facing a lot of performance problems.
Is there a way to solve the performance problem? We don't want to rebuild or reindex the table.
Also, if the ghost cleanup task is executed every 10 seconds,
what is the possibility that the previous ghost cleanup run is still going when a new one starts, with them overlapping each other?
Any help is appreciated.
Thanks.
No possibility that the other one is still running as it only runs on one thread.
No way to solve your perf issue except to manually force ghost record removal by doing something like DBCC FORCEGHOSTCLEANUP or changing your schema so the new inserts are not going into the same place in the index as the deleted records that haven’t been ‘removed’ yet by ghost cleanup.
Beautifully and elaborately explained.
Thank you
Mithalesh
Hi Paul,
First of all, thanks for this wonderful post that helps us do our jobs better.
I have a question which is not exactly on this topic but is very much related. If I reduce table A from 1 million rows to, say, 200 rows, any new record inserted into table A will reuse the space – but how can another table reuse the space released from table A?
When the ghosted records are removed, if a page becomes empty it is deallocated from the table. Any deallocated pages in extents allocated to the table can then be reused by that table. If the deallocated page is the only one allocated in an extent, the whole extent is deallocated and can then be allocated to a different table to be used.
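As a rough illustration of what that looks like (purely a sketch – the table name is made up and the timings are arbitrary), you can watch the space come back with sp_spaceused after a big delete:

CREATE TABLE spacetest (c1 INT IDENTITY, c2 CHAR (4000) DEFAULT 'a');
CREATE CLUSTERED INDEX spacetest_cl ON spacetest (c1);
GO
SET NOCOUNT ON;
DECLARE @i INT;
SELECT @i = 0;
WHILE (@i < 2000)
BEGIN
    INSERT INTO spacetest DEFAULT VALUES;
    SELECT @i = @i + 1;
END;
GO
EXEC sp_spaceused 'spacetest';        -- roughly 1,000 data pages at 2 rows per page
GO
DELETE FROM spacetest WHERE c1 > 100; -- delete most of the rows (they're just ghosted for now)
GO
SELECT COUNT (*) FROM spacetest;      -- scan the pages so they're queued for ghost cleanup
GO
WAITFOR DELAY '00:00:30';             -- give the background task time to process them
GO
EXEC sp_spaceused 'spacetest';        -- the data figure should have dropped as emptied pages were deallocated
GO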
Hi Paul is LOP_EXPUNGE_ROWS the ghost clean-up taking place ( after the LOP_COMMIT_XACT ) ?
Yes
ok thanks
Thank you Paul
Hi Paul
Great article as always. I have a question. We're seeing an issue with ghost cleanup on our 2014 Enterprise server. It seems to be on one table where a lot of deletes are occurring. Using fn_dblog I can see millions of entries for ghost cleanup against the two indexes on this table, and the transaction log keeps growing to hundreds of gigabytes in a short period. The main issue seems to be on the clustered index (where we're seeing most of the ghost deletes), which is a wide clustered index with 9 columns and data types of bigint, bigint, nvarchar(3), datetime, nvarchar(20), dec(32,16), dec(32,16), nvarchar(20), datetime.
There is a separate nonclustered primary key index on a bigint column.
I’ve been working with SQL for 20 years now and this is the first time I’ve come across this issue. Could this simply be down to the clustered index being so wide?
This is a third party table written by an outside company used for data extracts from our AX system.
Cheers
Dave
Is it the deletes themselves or the ghost cleanup that’s causing the log growth? If the latter, it could be that the ghost cleanup is being triggered in one big batch (various reasons for this) in which case you might want to disable it with trace flag 661 (https://support.microsoft.com/en-us/help/920093/tuning-options-for-sql-server-when-running-in-high-performance-workloa) and rely on regular index maintenance to reclaim the space from the deleted records. If the former, the deletes could be done in small batches to allow the log to not grow so much. Also, make sure that nothing is preventing the log from being cleared and that log backups are being done at a frequency that prevents the log growing so large.
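For reference, the trace flag can be toggled like this (a minimal sketch – while 661 is enabled, ghost records are only removed by your index maintenance):

DBCC TRACEON (661, -1);   -- disable the ghost cleanup task instance-wide
-- rely on regular index rebuilds/reorganizes to remove ghosted records
DBCC TRACEOFF (661, -1);  -- re-enable the ghost cleanup task

You can also add -T661 as a startup parameter if you want it to persist across restarts.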
Hi Paul:
Thank you very much for your article. I am bothered by this problem.
Over about 10 minutes the SQL Server log flush rate became much higher – writes/sec were close to 30,000 and write bytes/sec were over 100 MB/sec – and the PCIe disk I/O was saturated.
I'm sure the query workload on the server was normal and there was no abnormal data file growth at that moment, but the log file grew to double its usual size. So I used the log dump functions to gather statistics on the log, and I found lots of 'ShrinkD', 'GhostCleanupTask' and 'BTree Split/Shrink' transactions in that window.
the results:
select distinct [Transaction Name], count(*) from [t1].[dbo].split0426 where [Begin Time] > '2018/04/26 09:36:00' and [Begin Time] < '2018/04/26 09:55:00' group by [Transaction Name] order by count(*) desc
ShrinkD 957691
INSERT 161702
BTree Split/Shrink 110077
UPDATE 107921
SplitPage 18663
implicit_transaction 16592
GhostCleanupTask 14072
user_transaction 245
MERGE 219
AllocHeapPageSysXactDML 143
DELETE 43
AllocPages 22
select convert(varchar(16),[Begin Time]) as min, count(*) as counts from [t1].[dbo].split0426 where [Transaction Name]='GhostCleanupTask'
group by convert(varchar(16),[Begin Time]) order by convert(varchar(16),[Begin Time])
2018/04/26 09:31 4
2018/04/26 09:32 4
2018/04/26 09:33 4
2018/04/26 09:34 4
2018/04/26 09:35 4
2018/04/26 09:36 946
2018/04/26 09:37 525
2018/04/26 09:38 576
2018/04/26 09:39 534
2018/04/26 09:40 654
2018/04/26 09:41 717
2018/04/26 09:42 1028
2018/04/26 09:43 539
2018/04/26 09:44 1317
2018/04/26 09:45 1229
2018/04/26 09:46 1088
2018/04/26 09:47 1837
2018/04/26 09:48 1212
2018/04/26 09:49 329
2018/04/26 09:50 454
2018/04/26 09:51 592
2018/04/26 09:52 480
2018/04/26 09:53 7
2018/04/26 09:54 8
2018/04/26 09:55 7
select convert(varchar(16),[Begin Time]) as min, count(*) as counts from [t1].[dbo].split0426 where [Transaction Name]='ShrinkD'
group by convert(varchar(16),[Begin Time]) order by convert(varchar(16),[Begin Time])
2018/04/26 09:36 75247
2018/04/26 09:37 22963
2018/04/26 09:38 20069
2018/04/26 09:39 17273
2018/04/26 09:40 37594
2018/04/26 09:41 29922
2018/04/26 09:42 33271
2018/04/26 09:43 17801
2018/04/26 09:44 34105
2018/04/26 09:45 13176
2018/04/26 09:46 173689
2018/04/26 09:47 113353
2018/04/26 09:48 241987
2018/04/26 09:49 5350
2018/04/26 09:50 5058
2018/04/26 09:51 55132
2018/04/26 09:52 61694
2018/04/26 09:53 3
2018/04/26 09:54 4
2018/04/26 09:55 3
2018/04/26 09:56 5
Some configuration details:
Windows Server 2008 SP2
SQL Server 2014
database target_recovery_time = 60 seconds
database auto shrink and index auto shrink set to false
log backup every hour
physical disk: PCIe SSD, 3.0 TB, 70% used
Other recent operations:
recently cleaned up unused indexes and reorganized index fragmentation;
added an automatic data archiving operation – about 5 million rows backed up and deleted daily.
My questions:
Is GhostCleanupTask the cause of the problem?
What's the ShrinkD in the logs? A shrink of the database?
How was it triggered?
Why did the database engine use so much I/O without any throttling?
Thanks very much, all the best.
Looks like a) you’ve got auto-shrink enabled for the database, or there’s an Agent job doing shrink b) you’ve had a lot of deletes, and the shrink ‘touched’ a lot of the pages with deleted records on so they got queued up for the ghost cleanup task to take care of. You need to fix a).
Hello, Paul.
I checked the database. The database and index AUTO_SHRINK settings are OFF, and there was no shrink job running at that time.
Why there would suddenly be so many ghost cleanup transactions is what I do not understand. I will continue to observe.
Thank you very much.
Well someone or something started a shrink, as shrinkd transactions only come from a manual or automatic shrink.
Thank you for your article and reply, which taught us a lot that the official documentation does not cover. It's a great thing. I'll check it again. Thank you again.
Hello, Paul.
I seem to have found the answer. When it happened again, I enabled DBCC TRACEON (661, -1) on the server, and it recovered immediately.
As for why there would suddenly be so many ghost cleanup transactions without any shrink, it seems that if the number of 'hot' ghost pages rises above a specified (hard-coded) limit, the task runs non-stop until the count comes back down below the threshold: https://zh.scribd.com/document/323838917/Ghost-Records-in-SQL-Server
Now I have re-enabled the ghost cleanup task's automatic run, and I will add a DBCC FORCEGHOSTCLEANUP job after the delete job and keep an eye on it.
Thanks very much.
Very informative blog, thank you.
Quick question: does every deleted page have to go through the "Ghost Cleanup" process before it's re-allocated?
The following have to go through the ghost cleanup process when they’re deleted: all index records, all data records in a clustered index, all data records in a heap when snapshot isolation is enabled.
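To illustrate the difference (just a sketch – the table names are made up, and it assumes neither form of snapshot isolation is enabled in the test database):

CREATE TABLE heaptest (c1 INT);                    -- heap: deletes are not ghosted
CREATE TABLE cltest (c1 INT);
CREATE CLUSTERED INDEX cltest_cl ON cltest (c1);   -- clustered index: deletes are ghosted
INSERT INTO heaptest VALUES (1);
INSERT INTO cltest VALUES (1);
GO
BEGIN TRAN;
DELETE FROM heaptest;
DELETE FROM cltest;
-- Expect ghost_record_count = 0 for the heap and 1 for the clustered index
SELECT OBJECT_NAME (object_id) AS table_name, index_type_desc, ghost_record_count
FROM sys.dm_db_index_physical_stats (DB_ID (), NULL, NULL, NULL, 'DETAILED')
WHERE object_id IN (OBJECT_ID (N'heaptest'), OBJECT_ID (N'cltest'));
ROLLBACK TRAN;
GO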
What happens when the ghost clean-up can’t complete (for example if the records are locked on a readable secondary and are being queried). Does it exit immediately and try again after 5s when it gets spun up again or does it sit there waiting for the locks to be released?
That scenario isn’t possible. Read activity on a readable-secondary doesn’t cause any locking/blocking on the primary (or have any effect at all on the primary). But if the ghost cleanup task can’t get the page lock it needs, it’ll skip cleaning that page.
Tong – something is reading the pages that have had a large number of deletes on them, and that is triggering ghost cleanup. There isn't a threshold; there's a queue of pages with known ghost records, which is added to when a page with ghost records is encountered. If there are a lot of such pages, the ghost cleanup task will clean up everything on its queue next time it wakes up, rather than just searching for ghost records to clean up.
If so, when the ghost cleanup task cleans up everything on its queue, it seems that the limit of 10 pages doesn't apply?…
Correct
OK, but why not? If there were a limit, the system would behave better…
Because that’s just the way it was written.
OK, I see. Really appreciate all your help :)
Paul,
I've got a table with one row that is updated frequently by a vendor application; the table is called QRTZ_FIRED_TRIGGERS and is used by the Java Quartz module as part of a vendor-supported application. I am seeing a high version_ghost_record_count. If I don't rebuild the index on this table often, our transactions per second and CPU eventually climb and the SQL instance's performance becomes poor. Read committed snapshot is set to on. Have you written about ghost records and snapshot isolation since this article? I see the ghost record cleanup running in this database and have also tried to force a full scan, but have had no luck getting this table to play nice.
Thanks,
Sam
Hi Sam,
Several options here:
1) disable snapshot isolation
2) rebuild frequently
3) identify the long-running versioning query that's preventing version store cleanup and ghost removal (see the sketch below)
My guess is you’re going to be stuck with #2.
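If you do want to chase #3, a starting point (just a sketch) is to look for the longest-running snapshot transactions, since they're what hold up version store cleanup:

SELECT session_id, transaction_id, elapsed_time_seconds
FROM sys.dm_tran_active_snapshot_database_transactions
ORDER BY elapsed_time_seconds DESC;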
Thanks
Paul,
Thanks for your advice. I’m unable to identify any query running. My tempdb version store is also massive (at 155GB). Need to figure out how to get that issue under control as well. Hoping an app restart will let the cleanup happen.
One thing I don’t understand is why having those version ghost records causes transactions (tps perfmon counter) to ramp up over time. Is the engine having to scan all pages and check to see if they are marked as deleted? If I stop rebuilds, the TPS gets up to 30-40k/s and we start to see system-wide impact.
Thanks Again,
Sam
Hi Sam – I’m afraid it’s impossible to say without digging in deeper to see what’s going on on the system.
I've also encountered some version_ghost_record_count issues before, but they were all caused by very long-running transactions (for hours and even days sometimes), and everything would come back in order as soon as those transactions were gone. But today I faced something new and I'm at a loss: sys.dm_tran_active_snapshot_database_transactions is empty, and there are no long-lived (more than a minute or so) transactions active on the server at all, but I've got millions of version ghost records in a frequently rewritten table (a single user process routinely inserts about 1000 short rows, processes them, and then deletes them all), and I could deduce that this behavior has lasted for hours already. What else, besides long-running queries, could be the reason for this?
It could be that nothing else has read those pages (so they’re not entered into the to-do list) and the ghost cleanup task is working through another database or portion of this one.
Does a checkpoint need to occur before ghost cleanup runs? Specifically, If I delete rows for a clustered index, then run DBCC CLEANPAGE against pages in the index, is it *required* to run a checkpoint before the DBCC CLEANPAGE in order for it to be effective?
No – checkpoints have nothing to do with the ghost cleanup task’s ability to remove deleted records.
Hello Paul,
nice explanation, I really appreciate it, thank you very much.
Is it possible the ghost cleanup process would be triggered by change tracking auto cleanup?
At this moment, we have an issue that seems corresponding to all issues described above.
Thanks a lot.
Bertrand.
Only on the change tracking tables themselves.
Does an update that changes indexed columns cause ghost records?
Yes, because an update to an indexed column means a delete+insert instead of an update.
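To see that in action, here's a small sketch (the table and index names are made up): update a nonclustered index key inside a transaction and look at the ghost counts – the old key value shows up as a ghost record in the nonclustered index.

CREATE TABLE updtest (id INT IDENTITY PRIMARY KEY, c1 INT);
CREATE NONCLUSTERED INDEX updtest_c1 ON updtest (c1);
INSERT INTO updtest (c1) VALUES (1);
GO
BEGIN TRAN;
UPDATE updtest SET c1 = 2 WHERE id = 1;
-- Expect a ghost record at the leaf level of the nonclustered index (the deleted old key),
-- while the clustered index row was updated in place
SELECT index_id, index_level, ghost_record_count
FROM sys.dm_db_index_physical_stats (DB_ID (), OBJECT_ID (N'updtest'), NULL, NULL, 'DETAILED');
ROLLBACK TRAN;
GO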