I was thinking back to a conversation I had some time ago regarding cases when sys.dm_db_index_usage_stats does not get updated after an index is used for just the statistics associated with the indexed columns. What makes this profession so interesting is that almost every tool we use has a "yes, but…" associated with it. What a DMV gives with one hand it might hide with another.
While I recalled the conversation, I needed to double check who I had actually discussed it with. Admittedly, it’s a short list. After checking with him over email, I realized that conversation had been with MVP, Microsoft RD, MCM Greg Low. He also mentioned that Rob Farley once demonstrated the effect of creating a unique non-clustered index on a column which impacts the query plan but doesn’t actually register that use in sys.dm_db_index_usage_stats.
While the conversation stuck with me, I had never taken time to test it out. In this post, I’ll be testing out a different scenario, but essentially a similar situation where index column stats are used for a query (to provide more accurate estimates) – but its use results in no update to sys.dm_db_index_usage_stats.
Before getting to the demo, I do want to say that I still think sys.dm_db_index_usage_stats is incredibly useful for evaluating usage patterns and helping to identify indexes that aren’t pulling their weight (high cost – low benefit). I also see the following scenario as an edge case – but something to most certainly be mindful of as one reason why you could all-of-the-sudden see cardinality estimate issues after dropping an index that wasn’t showing as being used at all.
In this demo, I’m using SQL Server 2008 R2 (10.50.1617) and the AdventureWorksDW database. I’ll start off by disabling auto-creation of statistics (you’ll see why shortly). While I see auto-creation enabled more often than not, I have seen cases where it has mindfully been disabled and cases where it was disabled for no good reason at all – a topic for another day:
— Disable creation of statistics
USE [master]
GO
ALTER DATABASE [AdventureWorksDW] SET AUTO_CREATE_STATISTICS OFF WITH NO_WAIT
GO
Next I’ll execute the following query against the dbo.FactInternetSales table – and I’ve enabled the “Include Actual Execution Plan” in SSMS so I can see the actual plan:
USE [AdventureWorksDW]
GO
— Estimated rows (no stats on TaxAmt) = 3,853
— Actual rows (562)
SELECT RevisionNumber
FROM dbo.FactInternetSales
WHERE TaxAmt = 5.08
We can check the estimated versus actual rows in SSMS, but I’ll actually show the results in SQL Sentry Plan Explorer because I like the tabular format (and I don’t have to “hover” to see it). The following screen shot is from the Plan Tree tab:
As you can see – the estimated rows were 3,853 versus 562 rows. Now recall that automatic creation of statistics are disabled, so I’m going to manually create the following index:
— Create an index on TaxAmt (which means we also get stats)
CREATE INDEX IX_FactInternetSales_TaxAmt ON
dbo.FactInternetSales (TaxAmt)
If I re-execute the previous query against dbo.FactInternetSales, I see the following:
Now our estimates match the actual results. We have a better estimate, although notice that a table scan was still chosen (although imagine the disparity had this table had significantly more rows – and associated impact). But in essence, we did use the statistics associated with that index, so what about index usage stats?
If I execute the following query, I get no results at all:
— Was the index used? Not according to sys.dm_db_index_usage_stats
SELECT u.user_seeks, u.user_lookups, u.user_scans
FROM sys.dm_db_index_usage_stats u
INNER JOIN sys.indexes i ON
u.object_id = i.object_id AND
u.index_id = i.index_id
WHERE u.object_id=object_id('dbo.FactInternetSales') AND
i.name = 'IX_FactInternetSales_TaxAmt'
So let’s drop the index we just created and see what happens:
USE [AdventureWorksDW]
GO
DROP INDEX [IX_FactInternetSales_TaxAmt] ON [dbo].[FactInternetSales] WITH ( ONLINE = OFF )
GO
— Estimated rows with index = 3,853
— Actual rows (562)
SELECT RevisionNumber
FROM dbo.FactInternetSales
WHERE TaxAmt = 5.08
As you may expect, we’re back to the estimation issue:
As a final test, let’s enable auto-creation of statistics and re-execute the query (again, with no supporting index statistics):
USE [master]
GO
ALTER DATABASE [AdventureWorksDW] SET AUTO_CREATE_STATISTICS ON WITH NO_WAIT
GO
USE [AdventureWorksDW]
GO
SELECT RevisionNumber
FROM dbo.FactInternetSales
WHERE TaxAmt = 5.08
The plan shows that actual versus estimated matches again (without the index) because auto-creation of statistics occurred in the background:
And I can validate if auto-statistics were created as follows:
SELECT s.name, STATS_DATE(s.object_id, s.stats_id) auto_stats_date
FROM sys.stats s
INNER JOIN sys.stats_columns c ON
s.stats_id = c.stats_id AND
s.object_id = c.object_id
WHERE s.object_id = object_id('dbo.FactInternetSales') AND
s.auto_created = 1 AND
c.column_id = 20 – TaxAmt
And indeed – this was the case:
Will this stop me from using sys.dm_db_index_usage_stats to identify high cost/low benefit indexes?
Absolutely not. The potential benefit of identifying and eliminating wasteful indexing is too great and this is a fantastic (but not perfect) method to use in assessing an indexes’ value.
However – I will also be mindful of such scenarios. If someone tells me that plans have turned for the worse after an index cleanup operation, I’ll validate this very scenario. And even if I found this as the root cause, my bias would revolve around creating the needed statistics, rather than creating an index that is not used for actual data access.
Wish list time… One “nice to have” for a future version of sys.dm_db_index_usage_stats would be to add a stats_lookup bigint column and a last_stats_lookup datetime column. I would see it as a great way to ensure we address indexes that are used exclusively for the index column statistics associated with it.
6 thoughts on “Exceptions – what sys.dm_db_index_usage_stats doesn’t tell you”
Thanks Robert! Yes, after I heard about this from Greg I’ve been meaning to explore further. Took a while to get to it. Sneaky situation – and tough to identify on a busy SQL Server instance.
Very nice! I was thinking of these two as separate object after seeing Kimberly’s MCM video about statistics but never thought that it is implemented separately inside as well. It is amazing that it creates statistic object along with the Index creation to reduce the DBA’s burdon. It clearly shows the strength of Optimizer to use useful information to generate plan.
It also shows that after dropping index, if someone reports performance problem than we need to understand that index is not required but statistic is required for the query.
Most of the environment has AUTO CREATE and AUTO UPDATE statistics on, so we are not seeing this effect after dropping unused index.
Very interesting. This dmo could also be used to explain how stats without an index are used. Nicely done.
Thanks for your post, it was interesting.
I actually have another issue with sys.dm_db_index_usage stats. I can see that a non clustered index is used in the last_user_seek and a matching datetime in the clustered the last_user_lookup. The column user_lookups is updated for the clustered index, but the column in user_seeks for the non clustered index is not updated. As a matter of fact the column user_seeks is zero since the last restart. Do you think this issue is related to your post?
Regards
/Håkan Winther
MCITP: Database developer
Hi /Håkan
Actually, no, I think this is a different situation. When you have a user_lookup that references the number of bookmark lookups by user queries. A bookmark lookup occurs for clustered indexes. So you’re seeing the expected behavior.
Hope this helps,
Joe
Comments are closed.