In my previous post on interpreting DBCC CHECKDB output, and in my DBCC Internals session at TechEd IT Forum in Barcelona yesterday, I mentioned that there are some things DBCC CHECKDB can't repair. In this post I want to go into a bit more detail, based on a post from my old Storage Engine blog.
Before anyone takes this the wrong way – what do I mean by "can't be repaired"? Remember that the purpose of repair is to make the database structurally consistent, and that doing this usually means deleting the corrupt data/structure (that's why the option to do this was aptly named REPAIR_ALLOW_DATA_LOSS – see this post for more explanation of why repair can be bad). A corruption is deemed unrepairable when it doesn't make sense to repair it given the damage the repair would cause, or when the corruption is so rare and so complicated to repair correctly that it's not worth the engineering effort to provide a repair. Remember also that recovery from corruption should be based on a sound backup strategy, not on running repair, so making this trade-off in functionality makes sense.
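For reference, here's a minimal sketch of what running repair looks like (the database name is a placeholder – any repair option requires the database to be in single-user mode):

ALTER DATABASE [MyDatabase] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DBCC CHECKDB (N'MyDatabase', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS;
ALTER DATABASE [MyDatabase] SET MULTI_USER;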
Here are a few of the more common unrepairable corruptions that people run into, along with the reasons they can't be repaired by DBCC CHECKDB.
PFS page header corruption
An example of this is:
Msg 8946, Level 16, State 12, Line 1
Table error: Allocation page (1:13280496) has invalid PFS_PAGE page header values. Type is 0. Check type, alloc unit ID and page ID on the page.
DBCC CHECKDB uses the PFS pages to determine which pages are allocated – and so which pages to read to drive the various consistency checks. The only repair for a PFS page is to reconstruct it – they can’t simply be deleted as they’re a fixed part of the fabric of the database. PFS pages cannot be rebuilt because there is no infallible way to determine which pages are allocated or not. There are various algorithms I’ve experimented with to rebuild them, with optimistic or pessimistic setting of page allocation statuses and then re-running the various consistency checks to try to sort out the incorrect choices, but they all require very long run-times and won’t work 100% of the time. Given the frequency with which these corruptions are seen, and the engineering effort required to come up with an (imperfect) solution, I made the choice to leave this as unrepairable, and I don’t think that will change in future. I wrote another article about this here.
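PFS pages are easy to identify: they're page 1 in each data file and then every 8,088 pages after that, which is why (1:13280496) in the example above is a PFS page (13280496 is a multiple of 8088). If you want to look at the damage yourself, here's a quick sketch using the undocumented DBCC PAGE command, with a placeholder database name:

DBCC TRACEON (3604);                        -- send DBCC PAGE output to the client
DBCC PAGE (N'MyDatabase', 1, 13280496, 3);  -- file ID 1, the suspect PFS page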
Critical system table clustered-index leaf-page corruption
An example of this is:
Msg 7985, Level 16, State 2, Server SUNART, Line 1
System table pre-checks: Object ID 4. Could not read and latch page (1:51) with latch type SH. Check statement terminated due to unrepairable error.
In a previous post in the series I described how and why there are special checks of the clustered indexes of the critical system tables. If any of the pages at the leaf level of these indexes are corrupt, they cannot be repaired. Repairing would mean deallocating the page, wiping out the most important metadata for potentially hundreds of user tables, and so effectively deleting all of those tables. That's obviously an unpalatable repair for anyone to allow, and so DBCC CHECKDB doesn't do it.
Column value corruption
Here’s an example of this:
Msg 2570, Level 16, State 3, Line 1
Page (1:152), slot 0 in object ID 2073058421, index ID 0, partition ID 72057594038321152, alloc unit ID 72057594042318848 (type "In-row data"). Column "c1" value is out of range for data type "datetime". Update column to a legal value.
This is where a column has a stored value that is outside the valid range for the column type. There are a couple of repairs that could be done for this:
1. Delete the entire record
2. Insert a dummy value
Option #1 isn't very palatable because data is lost, and a bad column value isn't a structural problem in the database, so it doesn't have to be repaired. Option #2 is dangerous – what value should be chosen as the dummy value? Any value put in may adversely affect business logic, or fire a trigger, or have some unwelcome meaning in the context of the table – even a NULL. Given these problems, I chose to allow people to fix the corrupt values themselves.
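As a sketch of what that manual fix might look like: assuming the row can still be read, the undocumented sys.fn_PhysLocCracker function can locate the row using the page and slot from the error message, and then the corrupt column can be updated to a legal value (the table name, key column, and replacement value below are all placeholders):

-- Find the row on page (1:152), slot 0, from the Msg 2570 output above
SELECT t.*, ploc.file_id, ploc.page_id, ploc.slot_id
FROM dbo.MyTable AS t
CROSS APPLY sys.fn_PhysLocCracker (%%physloc%%) AS ploc
WHERE ploc.file_id = 1 AND ploc.page_id = 152 AND ploc.slot_id = 0;

-- Then overwrite the out-of-range value with a legal one, using the row's key
UPDATE dbo.MyTable SET c1 = '19000101' WHERE MyKeyColumn = 42;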
Metadata corruption
Here’s an example of this:
Msg 3854, Level 16, State 1, Line 2
Attribute (referenced_major_id=2089058478) of row (class=0,object_id=2105058535,column_id=0,referenced_major_id=2089058478,referenced_minor_id=0) in sys.sql_dependencies has a matching row (object_id=2089058478) in sys.objects (type=SN) that is invalid.
This example is relatively benign. There are other examples that will cause DBCC CHECKDB to terminate – not as bad as the critical system table corruption example above, but enough that DBCC CHECKDB doesn’t trust the metadata enough to use it to drive consistency checks. Repairing metadata corruption has the same problems as repairing critical system table corruption – any repair means deleting metadata about one or more tables, and hence deleting the tables themselves. It’s far better to leave the corruption unrepaired so that as much data as possible can be extracted from the remaining tables.
Summary
Repair can’t fix everything. You may end up having to perform manual and time-consuming data extraction from the corrupt database and losing lots of data because of, say, a critical system table corruption. Bottom line (as usual) – make sure you have valid backups so you don’t get into this state!
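As a starting point, here's a minimal sketch of taking a backup that validates page checksums and then verifying it (paths and names are placeholders):

BACKUP DATABASE [MyDatabase]
TO DISK = N'D:\Backups\MyDatabase.bak'
WITH CHECKSUM, INIT;   -- test existing page checksums as the backup is taken

RESTORE VERIFYONLY
FROM DISK = N'D:\Backups\MyDatabase.bak'
WITH CHECKSUM;         -- check the backup is readable and its checksums are valid

Of course, the only complete test of a backup is actually restoring it somewhere and running DBCC CHECKDB on the restored copy.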
27 thoughts on “CHECKDB From Every Angle: Can CHECKDB repair everything?”
Hi Paul, I was reading your post and I have this problem in SQL Server 2008 right now:
Msg 8966, Level 16, State 2, Line 3
Unable to read and latch page (1:926966) with latch type SH. (failed to retrieve text for this error. Reason: 15105) failed.
Some posts say this is a hardware (HDD) problem and the only way to recover is restoring the database, but my last backup is 12 days old, because this problem stopped the backups!
Any idea how to repair this?
You’ll need to go back to the most recent backup if repair won’t fix it.
If m_pageId = (0:0) in the page header, can repair fix this issue from the value in bpageno? That value is accurate (1:412168). We have 8 pages in this state. This is the summary for the affected table.
There are 4657925 rows in 2838409 pages for object “ePrice_WQBill”.
CHECKTABLE found 0 allocation errors and 1672 consistency errors in table ‘ePrice_WQBill’ (object ID 1605580758).
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKTABLE (WorkQueue_Prod.dbo.ePrice_WQBill).
Thanks.
No – those pages will be deallocated.
DBCC CHECKDB on my database (in the simple recovery model) failed. Please find the error messages below. I copied the backup, restored it on a test server, and ran DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS. The bold lines below are the ones that got repaired. Did this cause any loss of user data? Kindly suggest.
Repair: The Clustered index successfully rebuilt for the object “sys.sysobjvalues” in database “****”.
Repair: The page (1:629823) has been deallocated from object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 71776119065149440 (type LOB data).
Executing the query "DBCC CHECKDB(N'****') WITH NO_INFOMSGS" failed with the following error: "Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID -3212797581117423616 (type Unknown), page (40634:1181196049). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 12716041 and -4.
Incorrect PFS free space information for page (1:160) in object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 71776119065149440 (type LOB data). Expected value 95_PCT_FULL, actual value 50_PCT_FULL.
Incorrect PFS free space information for page (1:3427) in object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 71776119065149440 (type LOB data). Expected value 80_PCT_FULL, actual value 95_PCT_FULL.
Incorrect PFS free space information for page (1:22426) in object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 71776119065149440 (type LOB data). Expected value 100_PCT_FULL, actual value 50_PCT_FULL.
Incorrect PFS free space information for page (1:26439) in object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 71776119065149440 (type LOB data). Expected value 50_PCT_FULL, actual value 100_PCT_FULL.
Incorrect PFS free space information for page (1:27700) in object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 71776119065149440 (type LOB data). Expected value 95_PCT_FULL, actual value 100_PCT_FULL.
Incorrect PFS free space information for page (1:27701) in object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 71776119065149440 (type LOB data). Expected value 80_PCT_FULL, actual value 95_PCT_FULL.
Object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 71776119065149440 (type LOB data): Page (1:629823) could not be processed. See other errors for details.
CHECKDB found 0 allocation errors and 1 consistency errors not associated with any single object.
CHECKDB found 0 allocation errors and 7 consistency errors in table ‘sys.sysobjvalues’ (object ID 60).
CHECKDB found 0 allocation errors and 8 consistency errors in database ‘****’.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (****).
Thanks Lakshmana Y
Unknown – it deleted a page from a non-critical system table (sysobjvalues) so at some point in the future, an operation around a random table will fail. You didn’t lose any user data.
Paul Randal – So what is the solution if this table has been removed by the repair?
Well, unless you have a backup or another copy of the database, there is no solution – that’s why it’s called repair_allow_data_loss.
Thank God, Thank you so much Paul.
We are getting the error below. Can we rebuild the clustered index to fix the corruption?
Msg 2511, Level 16, State 2, Line 1
Table error: Object ID 27826728, index ID 1, partition ID 7823872782634, alloc unit ID 8738838383836 (type In-row data). Keys out of order on page (1:78362343), slots 163 and 164
It’s likely that you’re on an older build where there’s a bug that creates index key order corruption (there have been several).
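As a quick sketch, you can identify the affected table from the Object ID in the error and check which build you're on (run in the context of the corrupt database):

SELECT OBJECT_NAME(27826728) AS affected_table;  -- table named in the error
SELECT @@VERSION;  -- compare against the fix lists for your version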
Paul
I am getting the following DBCC CHECKDB errors on the master database. I checked regarding error Msg 8981 in chapter 11 of your "SQL Server 2012 Internals" book, where I didn't find a solution.
My question is: is there any possible way to fix this Msg 8981 error by doing some sort of repair, or is it better to simply restore from backup?
Msg 8981, Level 16, State 1, Line 1
Table error: Object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 281474980642816 (type In-row data). The next pointer of (1:113) refers to page (1:115). Neither (1:115) nor its parent were encountered. Possible bad chain linkage.
Msg 8935, Level 16, State 1, Line 1
Table error: Object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 281474980642816 (type In-row data). The previous link (1:120) on page (1:124) does not match the previous page (1:113) that the parent (1:275), slot 7 expects for this page.
Msg 8934, Level 16, State 2, Line 1
Table error: Object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 281474980642816 (type In-row data). The high key value on page (1:120) (level 0) is not less than the low key value in the parent (0:1), slot 0 of the next page (1:124).
Msg 8934, Level 16, State 3, Line 1
Table error: Object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 281474980642816 (type In-row data). The high key value on page (1:120) (level 0) is not less than the low key value in the parent (0:1), slot 0 of the next page (1:124).
Msg 8935, Level 16, State 1, Line 1
Table error: Object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 281474980642816 (type In-row data). The previous link (1:115) on page (1:264) does not match the previous page (1:120) that the parent (1:275), slot 14 expects for this page.
Msg 8978, Level 16, State 1, Line 1
Table error: Object ID 60, index ID 1, partition ID 281474980642816, alloc unit ID 281474980642816 (type In-row data). Page (1:264) is missing a reference from previous page (1:115). Possible chain linkage problem.
If you have a backup, you should always choose restoring over repairing for master.
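For anyone who hasn't done it before, restoring master is a special case – as a sketch (the backup path is a placeholder):

-- 1) Stop the instance, then start it in single-user mode, e.g. run
--    sqlservr.exe -m from the instance's Binn directory
-- 2) Connect with sqlcmd and run:
RESTORE DATABASE master FROM DISK = N'D:\Backups\master.bak' WITH REPLACE;
-- The instance shuts itself down automatically once master is restored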
Yes, I got the errors below; after DBCC CHECKDB the errors still exist:
Msg 8935, Level 16, State 1, Line 1
Table error: Object ID 325576198, index ID 2, partition ID 72057594438418432, alloc unit ID 72057594439794688 (type In-row data). The previous link (1:7128580) on page (1:6687422) does not match the previous page (1:7112104) that the parent (1:5287451), slot 115 expects for this page.
Msg 8935, Level 16, State 1, Line 1
Table error: Object ID 325576198, index ID 2, partition ID 72057594438418432, alloc unit ID 72057594439794688 (type In-row data). The previous link (1:5678739) on page (1:6691197) does not match the previous page (1:7111939) that the parent (1:6973298), slot 233 expects for this page.
Msg 8935, Level 16, State 1, Line 1
How do I figure this out?
Best Regards,
Sue.
Do you mean after running repair the problem still exists?
Hi Paul,
We have a customer whose SQL Server runs on a VM. The customer has several databases and some of them (randomly) get corrupted very often. We try to repair with REPAIR_ALLOW_DATA_LOSS every time. After a while the same databases, and sometimes others, get corrupted again and again. We had the customer change their disks from SSD to classic HDD, and we also had them upgrade SQL Server from 2012 to 2014. However, the problem persists, and we cannot explain the issue to the customer. Is there any possibility that repairing the database leaves something behind that may cause corruption again in the future? Or, where should we look for the real cause of the corruptions? As far as we know, the server is in a good data center where no unexpected power-off can happen to cause disk failure, etc. Thanks in advance for your comments.
Nope. Their problem is environmental (bad memory, driver problem, hypervisor, hardware, cables…)
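One way to track recurring corruption while chasing the root cause is the suspect_pages table in msdb, which records pages that have hit 823/824 errors or been flagged by DBCC – a simple sketch:

SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages
ORDER BY last_update_date DESC;
-- event_type 1-3 are error states; 4, 5, and 7 mean restored, repaired, or deallocated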
Hi Paul,
I am getting the following DBCC CHECKDB errors for one of our SharePoint user databases (size 1 TB) on SQL Server 2008 R2.
We already resolved it by identifying the 11 corrupted rows out of 300,000 rows and manually inserting the rows into a new table, but after 3 days the database got corrupted again with the errors below.
Msg 8928, Level 16, State 1, Line 1
Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594486784000 (type LOB data): Page (25:7050120) could not be processed. See other errors for details.
Msg 8965, Level 16, State 1, Line 1
Table error: Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594486784000 (type LOB data). The off-row data node at page (25:7050120), slot 0, text ID 686882816 is referenced by page (25:7047653), slot 0, but was not seen in the scan.
Msg 8929, Level 16, State 1, Line 1
Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594488946688 (type In-row data): Errors found in off-row data with ID 686882816 owned by data record identified by RID = (18:10474436:22)
Msg 8929, Level 16, State 1, Line 1
Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594488946688 (type In-row data): Errors found in off-row data with ID 686882816 owned by data record identified by RID = (18:10474937:8)
Msg 8929, Level 16, State 1, Line 1
Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594488946688 (type In-row data): Errors found in off-row data with ID 686882816 owned by data record identified by RID = (18:10475411:87)
Msg 8929, Level 16, State 1, Line 1
Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594488946688 (type In-row data): Errors found in off-row data with ID 686882816 owned by data record identified by RID = (18:10475796:38)
Msg 8929, Level 16, State 1, Line 1
Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594488946688 (type In-row data): Errors found in off-row data with ID 686882816 owned by data record identified by RID = (18:10476808:54)
Msg 8929, Level 16, State 1, Line 1
Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594488946688 (type In-row data): Errors found in off-row data with ID 686882816 owned by data record identified by RID = (18:10477312:4)
Msg 8929, Level 16, State 1, Line 1
Object ID 373576369, index ID 1, partition ID 72057594401259520, alloc unit ID 72057594488946688 (type In-row data): Errors found in off-row data with ID 686882816 owned by data record identified by RID = (18:10477466:38)
There are 311517 rows in 3621 pages for object “AllDocStreams”.
CHECKDB found 0 allocation errors and 9 consistency errors in table ‘AllDocStreams’ (object ID 373576369).
Kindly suggest the reason for it.
Thank you in advance.
I/O subsystem problem, bad memory chip, possible SQL Server bug (unlikely).
Hi Paul:
I am getting errors like those below (just a few key lines):
DBCC results for ‘DWH_LANDING’.
Msg 8905, Level 16, State 1, Line 3
Extent (1:6648336) in database ID 19 is marked allocated in the GAM, but no SGAM or IAM has allocated it.
Msg 8905, Level 16, State 1, Line 3
Extent (1:6656424) in database ID 19 is marked allocated in the GAM, but no SGAM or IAM has allocated it.
Msg 8905, Level 16, State 1, Line 3
Extent (1:6664512) in database ID 19 is marked allocated in the GAM, but no SGAM or IAM has allocated it.
CHECKDB found 127 allocation errors and 0 consistency errors in database ‘DWH_LANDING’.
repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (DWH_LANDING).
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Would DBCC CHECKDB ('xyz', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS fix this error?
Second question: are allocation errors less severe than consistency errors?
Thanks
Hui
Yes, repair should fix those. And no, allocation errors can be very severe as well – all depends on what the error is and where the error is.
Thanks Paul. I did end up using DBCC CHECKDB ('xyz', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS, and the result is below:
CHECKDB found 127 allocation errors and 0 consistency errors in database ‘DWH_LANDING’.
CHECKDB fixed 127 allocation errors and 0 consistency errors in database ‘DWH_LANDING’.
Can I say there is no data loss in this case?
thanks
If all the errors were like the ones you posted, then likely no data loss. I can’t say 100% without examining the database before and after the repair.
What resides in extent (1:6648336)? Some say no object in the database has been assigned that extent (1 allocation error … not associated with any single object). Is this true?
thanks
Likely nothing. The error means that the database thinks someone is using that extent, but nobody is.
Thanks Paul. I am really curious about those numbers, for example extent (1:6648336); I want to know if page ID 6648336 contains any user data, so I managed to put together the script below.
Is so.object_id referring to the page ID for user tables? My rationale is that if I could list all the page IDs for objects, I could easily tell whether those errors are related to database objects or not. This would give me much more confidence when reporting to my manager. Can you check if the query below is able to give me the info I am looking for?
Thanks
select so.name, so.object_id, sp.index_id, sp.partition_id, sp.hobt_id,
    sa.container_id, internals.total_pages, internals.used_pages,
    internals.data_pages, first_page, root_page, first_iam_page
from sys.objects so
inner join sys.partitions sp on so.object_id = sp.object_id
inner join sys.allocation_units sa on sa.container_id = sp.hobt_id
inner join sys.system_internals_allocation_units internals on internals.container_id = sa.container_id
where so.name not like 'sys%'
It doesn’t have any user data on it otherwise it would have been marked as allocated, which it wasn’t.
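A more direct way to answer that question is to dump the page header with the undocumented DBCC PAGE command – a sketch, using the database name from the errors above:

DBCC TRACEON (3604);                        -- route DBCC PAGE output to the client
DBCC PAGE (N'DWH_LANDING', 1, 6648336, 0);  -- header-only dump of the extent's first page
-- In the output, the Metadata: ObjectId field shows the owning object
-- (0 means the page isn't allocated to any object)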