In Monday’s Insider newsletter I discussed an email question I’d been sent about how to identify the transactions that had rolled back because of a crash, and I said I’d blog some code to do it.
First of all you need to know the time of the crash. We can’t get this exactly (from SQL Server) unless SQL Server decides to shut itself down for some reason (like tempdb corruption) but we can easily get the time that SQL Server restarted, which is good enough, as we just need to know a time that’s after the transactions started before the crash, and before those transactions finished rolling back after a crash. We can get the startup time from the sqlserver_start_time column in the output from sys.dm_os_sys_info.
Then we can search in the transaction log, using the fn_dblog function, for LOP_BEGIN_XACT log records from before the crash point that have a matching LOP_ABORT_XACT log record after the crash point, and with the same transaction ID. This is easy because for LOP_BEGIN_XACT log records, there’s a Begin Time column, and for LOP_ABORT_XACT log records (and, incidentally, for LOP_COMMIT_XACT log records), there’s an End Time column in the TVF output.
And there’s a trick you need to use: to get the fn_dblog function to read log records from before the log clears (by the checkpoints that crash recovery does, in the simple recovery model, or by log backups, in other recovery models), you need to enable trace flag 2537. Now, if do all this too long after crash recovery runs, the log may have overwritten itself and so you won’t be able to get the info you need, but if you’re taking log backups, you could restore a copy of the database to the point just after crash recovery has finished, and then do the investigation.
After that, the tricky part is matching what those transactions were doing back to business operations that your applications were performing. If you don’t name your transactions, that’s going to be pretty hard, as all you’ve got are the generic names that SQL Server gives transactions (like INSERT, DELETE, DROPOBJ). Whatever the reason you might want this information, your applications should be written so they gracefully handle transaction failures and leave the database in a consistent state (as far as your business rules are concerned – of course SQL Server leaves the database in a transactionally-consistent state after a crash).
I’ve written some code and encapsulated it in a proc, sp_SQLskillsAbortedTransactions, which is shown in full at the end of the post. To use it, you go into the context of the database you’re interested in, and just run the proc. It takes care of enabling and disabling the trace flag.
Here’s an example of a crash situation and using the proc.
First I’ll create a table and start a transaction:
USE [master]; GO IF DATABASEPROPERTYEX (N'Company', N'Version') > 0 BEGIN ALTER DATABASE [Company] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; DROP DATABASE [Company]; END GO CREATE DATABASE [Company]; GO USE [Company]; GO CREATE TABLE [test] ([c1] INT, [c2] INT, [c3] INT); GO INSERT INTO [test] VALUES (0, 0, 0); GO BEGIN TRAN FirstTransaction; INSERT INTO [Test] VALUES (1, 1, 1); GO
Now in a second window, I’ll start another transaction, and force the log to flush to disk (as I haven’t generated enough log to have the current log block automatically flush to disk):
USE [Company]; GO BEGIN TRAN SecondTransaction; INSERT INTO [Test] VALUES (2, 2, 2); GO EXEC sp_flush_log; GO
And in a third window, I’ll force a crash:
SHUTDOWN WITH NOWAIT; GO
After restarting the instance, I can use this code to run my proc:
USE [Company]; GO EXEC sp_SQLskillsAbortedTransactions; GO
Begin Time Transaction Name Started By Transaction ID ------------------------ ------------------ ---------------- -------------- 2017/01/18 17:09:36:190 FirstTransaction APPLECROSS\Paul 0000:00000374 2017/01/18 17:09:40:600 SecondTransaction APPLECROSS\Paul 0000:00000375
Cool eh?
Here’s the code – enjoy!
/*============================================================================ File: sp_SQLskillsAbortedTransactions.sql Summary: This script cracks the transaction log and shows which transactions were rolled back after a crash SQL Server Versions: 2012 onwards ------------------------------------------------------------------------------ Written by Paul S. Randal, SQLskills.com (c) 2017, SQLskills.com. All rights reserved. For more scripts and sample code, check out http://www.SQLskills.com You may alter this code for your own *non-commercial* purposes. You may republish altered code as long as you include this copyright and give due credit, but you must obtain prior permission before blogging this code. THIS CODE AND INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE. ============================================================================*/ USE [master]; GO IF OBJECT_ID (N'sp_SQLskillsAbortedTransactions') IS NOT NULL DROP PROCEDURE [sp_SQLskillsAbortedTransactions]; GO CREATE PROCEDURE sp_SQLskillsAbortedTransactions AS BEGIN SET NOCOUNT ON; DBCC TRACEON (2537); DECLARE @BootTime DATETIME; DECLARE @XactID CHAR (13); SELECT @BootTime = [sqlserver_start_time] FROM sys.dm_os_sys_info; IF EXISTS (SELECT * FROM [tempdb].[sys].[objects] WHERE [name] = N'##SQLskills_Log_Analysis') DROP TABLE [##SQLskills_Log_Analysis]; -- Get the list of started and rolled back transactions from the log SELECT [Begin Time], [Transaction Name], SUSER_SNAME ([Transaction SID]) AS [Started By], [Transaction ID], [End Time], 0 AS [RolledBackAfterCrash], [Operation] INTO ##SQLskills_Log_Analysis FROM fn_dblog (NULL, NULL) WHERE ([Operation] = 'LOP_BEGIN_XACT' AND [Begin Time] < @BootTime) OR ([Operation] = 'LOP_ABORT_XACT' AND [End Time] > @BootTime); DECLARE [LogAnalysis] CURSOR FAST_FORWARD FOR SELECT [Transaction ID] FROM ##SQLskills_Log_Analysis; OPEN [LogAnalysis]; FETCH NEXT FROM [LogAnalysis] INTO @XactID; WHILE @@FETCH_STATUS = 0 BEGIN IF EXISTS ( SELECT [End Time] FROM ##SQLskills_Log_Analysis WHERE [Operation] = 'LOP_ABORT_XACT' AND [Transaction ID] = @XactID) UPDATE ##SQLskills_Log_Analysis SET [RolledBackAfterCrash] = 1 WHERE [Transaction ID] = @XactID AND [Operation] = 'LOP_BEGIN_XACT'; FETCH NEXT FROM [LogAnalysis] INTO @XactID; END; CLOSE [LogAnalysis]; DEALLOCATE [LogAnalysis]; SELECT [Begin Time], [Transaction Name], [Started By], [Transaction ID] FROM ##SQLskills_Log_Analysis WHERE [RolledBackAfterCrash] = 1; DBCC TRACEOFF (2537); DROP TABLE ##SQLskills_Log_Analysis; END GO EXEC sys.sp_MS_marksystemobject [sp_SQLskillsAbortedTransactions]; GO -- USE [Company]; EXEC sp_SQLskillsAbortedTransactions;
10 thoughts on “Code to show rolled back transactions after a crash”
Very interesting, thank you. I’ve never seen anyone name their transactions, is this feature really used in some shops?
Yup. Nowhere near as much as it should be though.
Paul, Thank you very much for the information. Sincerely, Tom
Paul,
Thanks for posting this. I am working on a tran log presentation now and the reminder about named transaction is timely.
Frank
Very interesting and, as always, very well presented. Thank you !!
It’s a very interesting article, Paul. Thanks for the sharing.
I’m working with a CDC tool from Informatica called Mass Ingestion. Logs are read by fn_dblog and sent to target. But the source database has a transaction log backup routine which clear the log every 30 minutes. This might be a problem if we need to get older logs. We want to be able to read transaction logs for at least 3 days without messing with backup. I already tested flag 2537 and I believe it can solve my problem if I set log to be long enough for 3 days, but as it is a undocumented trace flag my team it’s not comfortable to use it in production. Do you think using this flag can be too risky? I know you can’t guarantee but I want your opinion. Do you believe Microsoft can change or remove this flag in the next years? Thanks.
I never recommend building a production solution using undocumented features/trace flags. I’d look at a supported solution – like using copy_only log backups until you *really* want the log to clear, or have a replicated copy – something like that.
Thanks for your answer, I thought about a copy_only log backup, the problem is wouldn’t the normal backup delete all data. For instance if I have a backup routine every 30 minutes, 10:00, 10:30, 11:00, etc, and I create a copy_only backup routine for 10:15, 10:45… If I understand correctly using this configuration will make my copy only backups having only 15 minutes of data. How could I create copy only backups without losing data.
No – I meant if you’re happy with the log not clearing for three days, use copy_only log backups until you have a no-activity maintenance window and then do a normal log backup. As far as restoring is concerned, doesn’t matter if the log backups are copy_only or not.