This weekend I set up a Windows Azure virtual machine running SQL Server 2014 CTP2 (12.0.1524). I had a local copy of SQL Server 2014 CTP2 in VMware as well, but thought it would be more fun to explore it on Windows Azure (kill two explorations with one stone). On a side note, I’m really digging having the azure credits to play with each month. If you have an MSDN subscription, be sure to take advantage of this benefit.
In this post I’m just sharing my initial exploration steps regarding the query_optimizer_estimate_cardinality XE event. I’m not entirely sure how well this event will be documented, but I’m definitely interested in learning more about it.
For my test scenario, I attached a version of AdventureWorksLT2012, set it to compatibility level 120 and then created the following session:
CREATE EVENT SESSION [XE_Card_Calculator] ON SERVER ADD EVENT sqlserver.query_optimizer_estimate_cardinality WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS, MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF) GO
If you’re looking for the query_optimizer_estimate_cardinality in the GUI, keep in mind that it is in the Debug channel (so you’ll need to select this in order to see it). This also implies that there is likely nontrivial overhead to enabling this event, so while there isn’t an explicit warning for this event like other more invasive ones, I would still use it with caution.
As for the description of this event in the GUI, it is as follows:
“Occurs when the query optimizer estimates cardinality on a relational expression.”
Okay – no big deal, right? Why care?
So in keeping things simple, I executed the following query against a single table (using the RECOMPILE so I can get the event each time I tested it out):
SELECT AddressLine1 FROM [SalesLT].[Address] WHERE AddressID = 9 OPTION (RECOMPILE);
The actual query execution plan had a Clustered Index Seek with an estimate of 1 row. And gathering the query_optimizer_estimate_cardinality event I saw two events surfaced.
The first event had the following information:
calculator | <CalculatorList> <FilterCalculator CalculatorName=”CSelCalcUniqueKeyFilter” /> </CalculatorList> |
creation_time | 2013-11-16 16:56:35.6666666 |
input_relation | <Operator Name=”LogOp_Select” ClassNo=”32″> <StatsCollection Name=”CStCollBaseTable” Id=”1″ Card=”450.00″ TableName=”SalesLT.Address” /> <Operator Name=”ScaOp_Comp ” ClassNo=”100″> <CompInfo CompareOp=”EQ” /> <Operator Name=”ScaOp_Identifier ” ClassNo=”99″> <IdentifierInfo TableName=”[AdventureWorksLT2012].[SalesLT].[Address]” ColumnName=”AddressID” /> </Operator> <Operator Name=”ScaOp_Const ” ClassNo=”98″> <ConstInfo Type=”int” Value=”(9)” /> </Operator> </Operator> </Operator> |
query_hash | 13158512245962950952 |
stats_collection | <StatsCollection Name=”CStCollFilter” Id=”2″ Card=”1.00″ /> |
stats_collection_id | 2 |
The second event had the following information:
calculator | <CalculatorList /> |
creation_time | 2013-11-16 16:56:35.6666666 |
input_relation | <Operator Name=”LogOp_SelectIdx” ClassNo=”43″> <StatsCollection Name=”CStCollBaseTable” Id=”1″ Card=”450.00″ TableName=”SalesLT.Address” /> <Operator Name=”ScaOp_Comp ” ClassNo=”100″> <CompInfo CompareOp=”EQ” /> <Operator Name=”ScaOp_Identifier ” ClassNo=”99″> <IdentifierInfo TableName=”[AdventureWorksLT2012].[SalesLT].[Address]” ColumnName=”AddressID” /> </Operator> <Operator Name=”ScaOp_Const ” ClassNo=”98″> <ConstInfo Type=”int” Value=”(9)” /> </Operator> </Operator> </Operator> |
query_hash | 13158512245962950952 |
stats_collection | <StatsCollection Name=”CStCollFilter” Id=”2″ Card=”1.00″ /> |
stats_collection_id | 2 |
So there is a lot here to dig through, but I highlighted a couple of values that stood out. And I know that AddressID happens to be my clustered, unique, primary key column for this table.
What happens if I reference a non-unique key value that is covered by an index (such as StateProvince)?
SELECT AddressID FROM [SalesLT].[Address] WHERE StateProvince = 'Arizona' OPTION (RECOMPILE);
This query uses in Index Seek as I expected, and this time for query_optimizer_estimate_cardinality I saw a new calculator value:
<CalculatorList>
<FilterCalculator CalculatorName=”CSelCalcColumnInInterval” Selectivity=”0.029″ TableName=”[AdventureWorksLT2012].[SalesLT].[Address]” ColumnName=”StateProvince” StatId=”5″ />
</CalculatorList>
The stats_collection value was as follows:
<StatsCollection Name=”CStCollFilter” Id=”2″ Card=”13.00″>
<LoadedStats>
<StatsInfo DbId=”5″ ObjectId=”69575286″ StatsId=”5″ />
</LoadedStats>
</StatsCollection>
Also – for a scenario where I didn’t have stats – and disabled them from being auto-created (to simulate a wild guess scenario), I saw the following calculator list:
<CalculatorList>
<FilterCalculator CalculatorName=”CSelCalcPointPredsFreqBased“>
<SubCalculator Role=”DistinctCountPlan”>
<DistinctCountCalculator CalculatorName=”CDVCPlanLeaf“ Guesses=”1″ CoveringStatId=”4″ CoveringStatDensity=”450.000″ />
</SubCalculator>
</FilterCalculator>
</CalculatorList>
The “Guesses” part looks promising (thinking magic numbers/selectivity guesses/heuristics, whatever you like to call it).
When executing a query that kicked off auto-stats operations, I saw the following operator information:
<Operator Name=”LogOp_GbAgg” ClassNo=”31″>
<StatsCollection Name=”CStCollBaseTable” Id=”1″ Card=”847.00″ TableName=”SalesLT.Customer” />
<Operator Name=”AncOp_PrjList ” ClassNo=”137″>
<Operator Name=”AncOp_PrjEl ” ClassNo=”138″>
<Operator Name=”ScaOp_AggFunc ” ClassNo=”90″>
<AggFuncInfo AggType=”STATMAN” />
<Operator Name=”ScaOp_Identifier ” ClassNo=”99″>
<IdentifierInfo TableName=”[AdventureWorksLT2012].[SalesLT].[Customer]” ColumnName=”LastName” />
</Operator>
</Operator>
</Operator>
</Operator>
</Operator>
And I saw the following calculator information (for the auto-stats operations):
<CalculatorList>
<DistinctCountCalculator CalculatorName=”CDVCPlanTrivial” />
</CalculatorList>
And lastly, I tried a query with a bad-practice (fiddling with the column reference via concatenation) to see what steps would be taken:
SELECT [CustomerID] FROM [SalesLT].[Customer] WHERE LastName + ' ' = 'Gates' OPTION (RECOMPILE);
This query plan just had a Clustered Index Scan, but spawned five query_optimizer_estimate_cardinality events associated with it (and I tested this a few times to see if the 5-event output was consistent):
calculator | input_relation |
<CalculatorList> <FilterCalculator CalculatorName=”CSelCalcHistogramComparison” Selectivity=”0.002″ ComparisonType=”Interval” /> </CalculatorList> |
<Operator Name=”LogOp_Select” ClassNo=”32″> <StatsCollection Name=”CStCollBaseTable” Id=”1″ Card=”847.00″ TableName=”SalesLT.Customer” /> <Operator Name=”ScaOp_Comp ” ClassNo=”100″> <CompInfo CompareOp=”EQ” /> <Operator Name=”ScaOp_Arithmetic ” ClassNo=”87″> <ArithmeticInfo Operation=”ADD” /> <Operator Name=”ScaOp_Identifier ” ClassNo=”99″> <IdentifierInfo TableName=”[AdventureWorksLT2012].[SalesLT].[Customer]” ColumnName=”LastName” /> </Operator> <Operator Name=”ScaOp_Const ” ClassNo=”98″> <ConstInfo Type=”nvarchar(1)” Value=”N’ ‘” /> </Operator> </Operator> <Operator Name=”ScaOp_Const ” ClassNo=”98″> <ConstInfo Type=”nvarchar(5)” Value=”N’Gates'” /> </Operator> </Operator> </Operator> |
<CalculatorList /> | <Operator Name=”LogOp_Project” ClassNo=”29″> <OpProjectInfo /> <StatsCollection Name=”CStCollBaseTable” Id=”1″ Card=”847.00″ TableName=”SalesLT.Customer” /> <Operator Name=”AncOp_PrjList ” ClassNo=”137″> <Operator Name=”AncOp_PrjEl ” ClassNo=”138″> <Operator Name=”ScaOp_Arithmetic ” ClassNo=”87″> <ArithmeticInfo Operation=”ADD” /> <Operator Name=”ScaOp_Identifier ” ClassNo=”99″> <IdentifierInfo TableName=”[AdventureWorksLT2012].[SalesLT].[Customer]” ColumnName=”LastName” /> </Operator> <Operator Name=”ScaOp_Const ” ClassNo=”98″> <ConstInfo Type=”nvarchar(1)” Value=”N’ ‘” /> </Operator> </Operator> </Operator> </Operator> </Operator> |
<CalculatorList> <FilterCalculator CalculatorName=”CSelCalcHistogramComparison” Selectivity=”0.002″ ComparisonType=”Interval” /> </CalculatorList> |
<Operator Name=”LogOp_Select” ClassNo=”32″> <StatsCollection Name=”CStCollProject” Id=”3″ Card=”847.00″ /> <Operator Name=”ScaOp_Comp ” ClassNo=”100″> <CompInfo CompareOp=”EQ” /> <Operator Name=”ScaOp_Identifier ” ClassNo=”99″> <IdentifierInfo ColumnName=”Expr1002″ /> </Operator> <Operator Name=”ScaOp_Const ” ClassNo=”98″> <ConstInfo Type=”nvarchar(5)” Value=”N’Gates'” /> </Operator> </Operator> </Operator> |
<CalculatorList /> | <Operator Name=”LogOp_Project” ClassNo=”29″> <OpProjectInfo /> <StatsCollection Name=”CStCollBaseTable” Id=”1″ Card=”847.00″ TableName=”SalesLT.Customer” /> <Operator Name=”AncOp_PrjList ” ClassNo=”137″> <Operator Name=”AncOp_PrjEl ” ClassNo=”138″> <Operator Name=”ScaOp_Arithmetic ” ClassNo=”87″> <ArithmeticInfo Operation=”ADD” /> <Operator Name=”ScaOp_Identifier ” ClassNo=”99″> <IdentifierInfo TableName=”[AdventureWorksLT2012].[SalesLT].[Customer]” ColumnName=”LastName” /> </Operator> <Operator Name=”ScaOp_Const ” ClassNo=”98″> <ConstInfo Type=”nvarchar(1)” Value=”N’ ‘” /> </Operator> </Operator> </Operator> </Operator> </Operator> |
<CalculatorList> <FilterCalculator CalculatorName=”CSelCalcHistogramComparison” Selectivity=”0.002″ ComparisonType=”Interval” /> </CalculatorList> |
<Operator Name=”LogOp_Select” ClassNo=”32″> <StatsCollection Name=”CStCollProject” Id=”5″ Card=”847.00″ /> <Operator Name=”ScaOp_Comp ” ClassNo=”100″> <CompInfo CompareOp=”EQ” /> <Operator Name=”ScaOp_Identifier ” ClassNo=”99″> <IdentifierInfo ColumnName=”Expr1002″ /> </Operator> <Operator Name=”ScaOp_Const ” ClassNo=”98″> <ConstInfo Type=”nvarchar(5)” Value=”N’Gates'” /> </Operator> </Operator> </Operator> |
Lots of scenarios to mull over and dig through as time permits.
Why care?
Many query performance issues (and associated query plan quality issues) are due to cardinality estimate skews. It would be great to have a way to more efficiently point to how the various estimates are being calculated and why the estimates are off.
I’m not sure how in-depth this event and associated calculators will be documented by Microsoft, and my assumption is that we’ll need to figure it out via collective reverse-engineering. But in the meantime this new XE event might prove to be quite useful for troubleshooting the more mysterious cardinality estimates.