SQL Server 2008 R2 SP1 and SQL Server 2012 RTM+ includes actual row count statistics in the sys.dm_exec_query_stats dynamic management view. This includes the total_rows, last_rows, min_rows and max_rows columns, which are intuitively named.
One potential use of the row statistics data is to detect cardinality estimate issues where the actual count of rows is significantly different from the query plan’s estimated rows. While I can see using this information as a valid approach, there are limits, and I’ll walk through them in this post.
To illustrate the use cases, I’m going to use the Credit database to execute three different queries. I’ll show you the plan tree tab from SQL Sentry Plan Explorer for each query.
Query 1: No Cardinality Estimate Issue
SELECT region.region_name, member.lastname, member.firstname, member.member_no FROM dbo.member INNER JOIN dbo.region ON region.region_no = member.region_no WHERE region.region_no = 9; GO
The plan:
For each operation in this query, the estimated rows match the actual rows.
Query 2: Cardinality Estimate Issue, Leaf-Level + Final Operator
DECLARE @Column INT = 2, @Value INT = 10; SELECT [member].[member_no], [member].[street], [member].[city], [charge].[charge_no], [charge].[provider_no], [charge].[category_no], [charge].[charge_dt], [charge].[charge_amt], [charge].[charge_code] FROM [dbo].[charge] INNER JOIN [dbo].[member] ON [member].[member_no] = [charge].[member_no] WHERE CHOOSE(@Column, [charge].[provider_no], [charge].[category_no]) = @Value; GO
The plan:
Unlike the previous query, for this query we have leaf-level estimated vs. actual row count skews and skews that flow up to the final operation.
Query 3: Cardinality Estimate Leaf-Level Skew and No Skew for Root Operator
SELECT TOP (1000) [member].[member_no], [member].[lastname], [member].[firstname], [region].[region_no], [region].[region_name], [provider].[provider_name], [category].[category_desc], [charge].[charge_no], [charge].[provider_no], [charge].[category_no], [charge].[charge_dt], [charge].[charge_amt], [charge].[charge_code] FROM [dbo].[provider] INNER JOIN [dbo].[charge] ON [provider].[provider_no] = [charge].[provider_no] INNER JOIN [dbo].[member] ON [member].[member_no] = [charge].[member_no] INNER JOIN [dbo].[region] ON [region].[region_no] = [member].[region_no] INNER JOIN [dbo].[category] ON [category].[category_no] = [charge].[category_no]; GO
The plan:
For this third query, we see leaf-level skews (and intermediate-level for the Hash Match), but then the root of the plan does NOT have a skew (estimated 1,000 vs. actual 1,000).
Detecting Issues
So in my next query, I pull the estimated rows out from the query execution plan of the three queries and compare it to the last actual row count values from sys.dm_exec_query_stats (and I’m keeping this example query as simple as possible):
SELECT t.text, p.[query_plan], s.[last_execution_time], p.[query_plan].value('(//@EstimateRows)[1]', 'varchar(128)') AS [estimated_rows], s.[last_rows] FROM sys.[dm_exec_query_stats] AS [s] CROSS APPLY sys.[dm_exec_sql_text](sql_handle) AS [t] CROSS APPLY sys.[dm_exec_query_plan](plan_handle) AS [p] WHERE DATEDIFF(mi, s.[last_execution_time], GETDATE()) < 1 GO
The results are as follows:
So what do we see here? We show a final skew for just one query (Query #2). The other two queries show no final skew. For Query #1 – that is an accurate assessment, but for Query #3 the underlying skew is hidden since we’re only looking at the final operator estimated rows vs. actual. So our actual row count stats are still useful – but for only one specific skew scenario.
Why Does this Matter?
Even if the query optimizer estimates final row counts accurately, the leaf-level skews can drive performance issues. For example – for leaf-level or intermediate skews, you may be under-estimating memory grants sizes (risk of spills) or actually over-estimating memory grant sizes (potential concurrency issues for larger grant queries). Another example may be the seek vs scan decision, with unnecessary I/O being driven by leaf-level skews.
So to summarize, the additional sys.dm_exec_query_stats data is useful for situations where we can detect the skew at the root of the query plan, but it doesn’t help you identify leaf-level and intermediate-level skews.