SQL Server Pro Disaster Recovery Poster – Explanatory Notes
Instructions
This web page contains explanatory notes for some of the steps on the Disaster Recovery flow-chart poster I designed for SQL Server Pro. Please match the two-digit code and text from the step on the poster with the list below. I will update this page as needed.
Books Online links are for the most recent version where possible. Where the syntax or procedure differs for other versions, I’ve provided the relevant link.
If you have any questions, feel free to send me an email. We have a full range of disaster recovery and high availability planning service, and can assist with disaster recovery, including corruption analysis.
I hope the poster and instructions are helpful to you!
Paul S. Randal, May 10 2012
01: Perform regular tail-of-the-log backup if possible and required
The “tail-of-the-log” contains all transaction log records generated since the most recent transaction log backup. If you want to recover right up to the point of the disaster, you must back up this portion of the transaction log and restore it as the final operation in the restore sequence (see note 10 below).
This is also necessary when performing an online piecemeal restore operation. The tail-of-the-log backup is taken after the page/file/filegroup restore operation. This captures the transaction log spanning the time after which no changes can possibly have affected the database portion being restored.
Note that for an offline restore operation, the tail-of-the-log backup must always be taken before the first restore operation.
Full details are in my blog post Disaster recovery 101: backing up the tail of the log.
02: Perform a hack-attach tail-of-the-log backup if only log files exist
The “tail-of-the-log” contains all transaction log records generated since the most recent transaction log backup. If you want to recover right up to the point of the disaster, you must back up this portion of the transaction log and restore it as the final operation in the restore sequence (see note 10 below).
You are at this point because the database cannot be attached to a SQL Server instance because some of the data files are damaged or missing. To be able to back up the tail-of-the-log you must fool SQL Server into attaching the remaining portion of the database. This method involves creating a dummy database and then swapping in the files from the database you wish to attach.
Full details are in my blog post Disaster recovery 101: backing up the tail of the log.
03: Restore most recent full backups of all portions of the database, starting with primary filegroup
If you are restoring the entire database from scratch, you must restore a full file, filegroup, or database backup to start the restore sequence. If your database is split into multiple filegroups, you must restore all the filegroups, starting with the PRIMARY filegroup and then all other filegroups. If you do not want to restore the entire database, see note 02 below. Don’t forget to use the WITH NORECOVERY option on all restore operations in the restore sequence.
The Books Online section Implementing Restore Scenarios for SQL Server Databases has more details.
04: Restore primary filegroup from most recent full file, filegroup, or database backup using WITH PARTIAL
If you have your database split into multiple filegroups, and you’re restoring from scratch, on Enterprise Edition you can make use of partial database availability to only restore a subset of the filegroups before bringing the database online. This allows a faster restore than having to restore the entire database. You must always start the restore sequence by restoring the PRIMARY filegroup, plus zero or more secondary filegroups. Don’t forget to use the WITH NORECOVERY option on all restore operations in the restore sequence.
An example of the T-SQL syntax involved is in my SQL Server Magazine blog post Using Partial Database Availability for Targeted Restores.
The Books Online section Performing Piecemeal Restores also has more details.
05: Restored desired secondary filegroups from most recent full file and/or filegroup and/or database backups
When you are performing a partial restore of a database, you can choose how many filegroups to restore initially before bringing the database online. The PRIMARY filegroup must be restored first, as explained in note 04 above. Don’t forget to use the WITH NORECOVERY option on all restore operations in the restore sequence.
An example of the T-SQL syntax involved is in my SQL Server Magazine blog post Using Partial Database Availability for Targeted Restores.
The Books Online section Performing Piecemeal Restores also has more details.
06: Restore most recent differential file and/or filegroup and/or database backups
In any restore sequence where you want to restore the database to a more recent point in time than the initial full backup, you will either be restoring differential backups, transaction log backups, or a combination of both. Always try to restore the most recent differential backup(s) covering all portions of the database being restored, as these will restore faster than all the transaction log backups covering the same period of time as the differential backup. Don’t forget to use the WITH NORECOVERY option on all restore operations in the restore sequence.
The Books Online section Implementing Restore Scenarios for SQL Server Databases has more details.
07: Set damaged portion of database offline to allow online restore
Prior to restoring a file or filegroup into an existing database, you must set the file, or one of the files in the filegroup, offline. Example syntax is ALTER DATABASE dbname MODIFY FILE (name=myfile, OFFLINE). This is necessary if there are any other connections to the database so that SQL Server knows that no-one is using the file or filegroup being restored. This step is not necessary when performing page restores.
The Books Online section Implementing Restore Scenarios for SQL Server Databases has more details.
08: Restore page(s) and/or file(s) and/or filegroup(s) from most recent full file, and/or filegroup and/or database backups
First make sure to prepare for the piecemeal restore operation by taking a tail-of-the-log backup for offline restores or setting the file (or a file in the filegroup) offline for an online restore. Then start the restore sequence by restoring the desired portion(s) of the database from the most recent full backup(s). Don’t forget to use the WITH NORECOVERY option on all restore operations in the restore sequence.
The Books Online section Implementing Restore Scenarios for SQL Server Databases has more details.
09: Restore all necessary transaction log backups to bring all in-restore portions of the database to the same desired point in time
In any restore sequence where you want to restore the database to a more recent point in time than the initial full backup, you will either be restoring differential backups, transaction log backups, or a combination of both. Always try to restore the most recent differential backup(s) covering all portions of the database being restored, as these will restore faster than all the transaction log backups covering the same period of time as the differential backup. Then restore all necessary transaction log backups to bring the portions of the database to as recent as time as possible. Don’t forget to use the WITH NORECOVERY option on all restore operations in the restore sequence.
The Books Online section Implementing Restore Scenarios for SQL Server Databases has more details.
10: Restore tail-of-the-log backup if required
If you performed a tail-of-the-log backup (see notes 01 and 02 above) then you should restore it as the final operation in the restore sequence. This will bring the database (or portion of the database) being restored to the most recent time possible. This step is required when performing a piecemeal restore of part of an existing database. Don’t forget to use the WITH NORECOVERY option on all restore operations in the restore sequence.
The Books Online section Implementing Restore Scenarios for SQL Server Databases has more details.
11: Try to set database ONLINE
Use the syntax ALTER DATABASE dbname SET ONLINE to bring the database online. If there is a problem you will see an error message.
12: Try to perform regular attach
To attach a database you use the CREATE DATABASE … FOR ATTACH syntax, specifying all data and log file paths. The files must be in their correct locations and all data and log files must be present. You can also use the depracated sp_attach_db command.
The Books Online section Detaching and Attaching Databases has more details.
13: Try to perform attach with log rebuild
If the log file(s) are unavailable, and the database was cleanly shut down (i.e. there were no active transactions when it was shut down), then SQL Server can automatically re-create the log file(s) for you. Use the CREATE DATABASE … FOR ATTACH_REBUILD_LOG syntax and list all data file paths. The data files must be in their correct locations and must all be present.
The Books Online section Detaching and Attaching Databases has more details.
14: Try to perform a hack attach
If a regular attach method will not work, there is no choice but to fool SQL Server into attaching the database. This method involves creating a dummy database and then swapping in the files from the database you wish to attach.
Full details are explained in my blog post Disaster recovery 101: hack-attach a damaged database.
15: Switch to EMERGENCY mode if possible
EMERGENCY mode has been available since at least SQL Server 7.0 but it was only in SQL Server 2005 that it became a documented database state. It means that crash recovery has not been performed on the database and so the database is transactionally inconsistent. To put the database into EMERGENCY mode, use the syntax ALTER DATABASE dbname SET EMERGENCY.
Full details are explained in my blog post Using EMERGENCY mode to access a RECOVERY PENDING or SUSPECT database.
16: Try to perform EMERGENCY mode repair
EMERGENCY mode repair is the last resort when trying to recover a damaged database. You perform an EMERGENCY mode repair by placing the database into single-user mode, and EMERGENCY mode and then using the syntax DBCC CHECKDB (dbname, REPAIR_ALLOW_DATA_LOSS) WITH ALL_ERRORMSGS, NO_INFOMSGS. If the repair is successful, it will repair all problems (with some likely data loss) and bring the database online. Be aware that the data is likely to be transactionally inconsistenct and some data may have been deleted to effect the repairs.
For more information, see my blog post CHECKDB From Every Angle: EMERGENCY mode repair – the very, very last resort, the Books Online entry DBCC CHECKDB (Transact-SQL), and my feature article on Using Database Repair for Disaster Recovery in the September 2010 SQL Server Magazine.
17: Try to perform data extraction
If EMERGENCY mode repair does not work, or you do not want to attempt a repair operation, you can try to extract data and schema into a new database, using whatever scripting option or tool you choose. Be aware that the data is transactionally inconsistent and some data may be unavailable.
18: Rebuild master database
If the master database is damaged, you must rebuild the master database to allow SQL Server to start. If you have a backup of the master database, restore it and follow the instructions in note 19 below. Otherwise, you must reattach all user databases and recreate all users, logins, and other data stored in master before continuing.
See these SQL Server 2012 Books Online links for instructions and more information: Back Up and Restore of System Databases, Rebuild System Databases, Move System Databases
See these SQL Server 2005 Books Online links for instructions and more information: Considerations for Backing Up and Restoring System Databases,How to: Install SQL Server 2005 from the Command Prompt (explains how to rebuild system databases in 2005), Moving System Databases
19: Restore master database
After rebuilding the master database all data about the instance is lost. If you have a backup of the master database, you should restore it, following the instructions in the first link below (for your SQL Server version). You should then reattach any databases, and recreate any users/logins, and other operations that affect master that were performed after the most recent backup of master was performed.
See these SQL Server 2012 Books Online links for instructions and more information: Back Up and Restore of System Databases, Rebuild System Databases, Move System Databases
See these SQL Server 2005 Books Online links for instructions and more information: Considerations for Backing Up and Restoring System Databases,How to: Install SQL Server 2005 from the Command Prompt (explains how to rebuild system databases in 2005), Moving System Databases
20: Copy in correct version of resource database
The resource database is a read-only database that started to ship with SQL Server 2005, and is effectively the same as a DLL. It cannot be backed up or restored. To restore it, you must copy the correct version from install media, or a file-system backup.
See this SQL Server 2012 Books Online entry for more information: Back Up and Restore of System Databases.
See this SQL Server 2005 Books Online entry for more information: Considerations for Backing Up and Restoring System Databases.
21: Resolve file system issue or change tempdb location
If the tempdb database cannot be created, the SQL Server instance will not start, so you will have to correct whatever problem is preventing it. This may entail moving the tempdb database.
See this SQL Server 2012 Books Online entry for more information: Move System Databases.
See this SQL Server 2005 Books Online entry for more information: Moving System Databases.
22: Extract data from snapshot, or revert to snapshot
A database snapshot is a point-in-time view of your database so as long as it was created before the user deleted the data, it will still contain the deleted data. Be aware that any changes made to the data after the database snapshot was created will not be in the database snapshot and so will be effectively lost. To copy the data from the database snapshot, use whatever command you prefer, treating the database snapshot as if it is a regular database. If the user dropped a table, you will be able to recreate it again by first scripting out the CREATE TABLE statement using SQL Server Management Studio against the database snapshot and then running the script in the production database.
Alternatively, you can revert the entire database back to the point in time at which the database snapshot was created. Be aware that all work performed since the database snapshot was created will be lost, and the transaction log backup chain will be broken (but can easily be reestablished with a full or differential database backup). To revert from a database snapshot you use the syntax RESTORE DATABASE dbname FROM DATABASE_SNAPSHOT = ‘dbsnapshotname’.
Note that if you have any FILESTREAM filegroups, you will not be able to perform a revert to snapshot.
For more information see the Books Online topic Revert a Database to a Database Snapshot.
23: Extract data from log shipping secondary
A log shipping secondary is a copy of the database that transaction log backups are applied to on a schedule. If you have a log shipping secondary that has not yet had transaction log backups applied that cover the time when the user deleted the data, that database will still contain the deleted data. You can copy the data back using a similar method to that described in the first paragraph of note 22 above.
24: DBCC CHECKDB (dbname) WITH ALL_ERRORMSGS, NO_INFOMSGS
DBCC CHECKDB is the only operation that will tell you what corruptions exist in the database. Always let it complete before deciding on a course of action, unless your disaster recovery plan is to immediately fail over when corruption is suspected. I have a lot of blog posts about DBCC CHECKDB (as I wrote all its code), which you may want to read *after* you’ve finished performing your disaster recovery 🙂
For more information, see the Books Online entry DBCC CHECKDB (Transact-SQL), and my blog post category CHECKDB From Every Angle.
25: Restore deleted table, or deleted rows from restored copy of table
If you have a third-party backup tool that allows single-table restores, you may be able to take advantage of it to perform a targeted restore of just the deleted data. The instructions for this are entirely vendor-specific and nothing to do with SQL Server per se.
26: Point-in-time restore as close to data deletion as possible and extract as much data as possible
If you have a complete set of backups then you can restore the database to another location (NOT overwriting the production database) to the point in time just before the data was deleted. To do this, you use the WITH STOPAT=’date-time-value’ syntax.
If you do not know the point in time at which the data was deleted, you must slowly move forward through the transaction log backups until you find the time at which the data is deleted (we call this ‘inching through the log’). It’s a very tedious operation to perform but will allow you to find the desired time. You may be able to find the time from the default trace, or by using a third-party log analysis tool (see note 27).
For more information, see the Books Online entry Restore a SQL Server Database to a Point in Time (Full Recovery Model).
27: Use tool to reverse user error that deleted data
There are some third-party tools that allow you to investigate the contents of the transaction log to determine when an operation took place, and potentially to generate the T-SQL to reverse the operation. The instructions for how to do this are entirely vendor-specific and nothing to do with SQL Server per se.
28: Offline rebuild affected indexes
If you can determine that the only corruptions are in nonclustered indexes (i.e. all error messages from DBCC CHECKDB list an index ID of 2 or higher) then you can correct these corruptions without having to run database repair or perform a restore – both of which require application downtime. For SQL Server 2005, an offline rebuild of the nonclustered index(es) will remove corruption. For SQL Server 2008 onwards, you will most likely need to drop the nonclustered index and then recreate it again to ensure that SQL Server does not use an index rebuild query plan that uses the old (corrupt) index to build the new one. If you go the drop/create route, and the nonclustered index is enforcing a constraint, be sure that no-one can perform an operation that violates the constraint (and hence prevent the nonclustered index being recreated) during the operation. You can easily do this by using an explicit BEGIN TRAN/COMMIT TRAN around the index drop/create.
29: Try to repair database using DBCC CHECKDB
Repair is usually a last resort when trying to recover a damaged database, although you may chose to repair if downtime is more detrimental to your business than data loss, and the only restore options you have would take far longer than a repair operation.. You perform repair by placing the database into single-user mode, and then using the syntax DBCC CHECKDB (dbname, REPAIR_ALLOW_DATA_LOSS) WITH ALL_ERRORMSGS, NO_INFOMSGS. If the repair is successful, it will repair all problems (with some likely data loss) and then you bring the database out of single-user mode. Be aware that some data may have been deleted to effect the repairs.
For more information, see the Books Online entry DBCC CHECKDB (Transact-SQL), and my feature article Using Database Repair for Disaster Recovery in the September 2010 SQL Server Magazine.
30: Back up repaired database. Potentially restore most recent backups and extract as much data as possible
After running any kind of repair operation it is always prudent to take a full backup, just in case a further disaster occurs – you’ve then got an easy starting point for recovery. You may also be able to determine what data, if any, was deleted by the repair operation and then use older backups to salvage some of it.
For more information, see my feature article Using Database Repair for Disaster Recovery in the September 2010 SQL Server Magazine.
31: Try to repair 8992/2570 errors
Error 8992 means there is a mismatch between data in the system catalogs. The common cause of this is someone manually altering the system tables in SQL Server 2000. There is no automatic repair for these errors so you must repair them manually. The instructions are in my blog post Using the Dedicated Admin Connection to fix Msg 8992: corrupt system tables.
Error 2570 means there is a corrupt column value in a column that is a data type with a defined valid range of values (e.g. DATETIME or FLOAT). There is no automatic repair for these error so you must repair them manually. The instructions are in KB 923247: Troubleshooting DBCC error 2570 in SQL Server 2005. The instructions also work on SQL Server 2008.
Note: there was a bug in SQL Server 2005 that could cause error 2570. It was fixed in SQL Server 2005 SP2 CU4. See KB 942904: FIX: Error message when you run the DBCC CHECKTABLE statement or the DBCC CHECKDB statement in Microsoft SQL Server 2005: “Column ” value is out of range for data type ‘float'”.
32: Bring the database online
After completing the restore sequence, using WITH NORECOVERY on each restore operation, the database is not yet online. Issue the command RESTORE DATABASE dbname WITH RECOVERY to complete the restore sequence and bring the database online.