Let’s say you have a heap table with 1,000,000 rows in it. Let’s also say that your automatic creation of statistics are disabled, as well as updates to the statistics (and in this scenario, there are NO existing indexes or statistics).
What kind of selectivity guess would the optimizer make for a query like the following? (I copied over the FactInternetSales table with no indexes and put 1,000,000 rows in it – tested on SQL Server 2012, 11.0.2316)
SELECT SalesOrderNumber FROM dbo.FactInternetSales
WHERE ProductKey = 537;
First of all, the actual plan on my system was as follows:
If I look at the properties of the table scan, I see the actual number of rows was 23,042 versus the estimated of 31,622.8. (As an aside – execute SELECT POWER(1000000.0,.75) and you’ll see how the estimated number of rows was derived). I also see that the table cardinality is known (and correct) at 1,000,000.
What if I have a predicate referencing another column, like OrderDateKey? This too results in an estimate of 31,622.8 rows.
The following table shows a sampling of various queries and associated search conditions, along with their associated row estimates:
Query | Estimated Rows | Percentage (based on 1,000,000 rows) |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE ProductKey = 537; |
31,622.8 | 3.16228% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE OrderDateKey = 20040224; |
31,622.8 | 3.16228% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE ProductKey BETWEEN 537 AND 600; |
90,000 | 9% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE ProductKey BETWEEN 537 AND 537; |
90,000 | 9% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE TotalProductCost < 100.00; |
300,000 | 30% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE TotalProductCost > 100.00; |
300,000 | 30% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE TotalProductCost > 100.00 AND TotalProductCost < 200.00; |
90,000 | 9% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE TotalProductCost >= 100.00 AND TotalProductCost <= 200.00; |
90,000 | 9% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE TotalProductCost LIKE 100.00; |
100,000 | 10% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE SalesOrderNumber LIKE ‘%A%’; |
539,232 | 53.9232% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE SalesOrderNumber LIKE ‘A%’; |
269,616 | 26.9616% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE SalesOrderNumber LIKE ‘%A’; |
269,616 | 26.9616% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE TotalProductCost > (ProductStandardCost + 10.00); |
300,000 | 30% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE ProductKey IN (1); |
31,622.8 | 3.16228% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE ProductKey IN (1,2); |
62,245.6 | 6.22456% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE ProductKey = 1 OR ProductKey = 2; |
62,245.6 | 6.22456% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE ProductKey IN (1,2,3); |
91,900 | 9.19% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE ProductKey = 1 OR ProductKey = 2 OR ProductKey = 3; |
91,900 | 9.19% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE ProductKey = 1 AND ProductKey = 2 AND ProductKey = 3; |
10,746.1 | 1.07461% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE ProductKey = 1 AND ProductKey = 2; |
13,335.2 | 1.33352% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE (TaxAmt * UnitPrice) > 10.00; |
300,000 | 30% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE ProductKey NOT IN (1); |
968,377 | 96.8377% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE ProductKey <> 1; |
968,377 | 96.8377% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE ProductKey NOT IN (1,2); |
13,335.2 | 1.33352% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE ProductKey NOT IN (1,2,3); |
10746.1 | 1.07461% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE ProductKey <> 1 AND ProductKey <> 2 AND ProductKey <> 3; |
10746.1 | 1.07461% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE EXISTS (SELECT 1); |
1,000,000 | 100% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE NOT EXISTS (SELECT 1); |
0 | 0% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE ProductKey IS NOT NULL; |
1,000,000 | 100% |
SELECT SalesOrderNumber
FROM dbo.FactInternetSales WHERE ProductKey IS NULL; |
0 | 0% |
ALTER TABLE dbo.FactInternetSales
ALTER COLUMN ProductKey int NULL; SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey IS NOT NULL; |
968,377 | 96.8377% |
ALTER TABLE dbo.FactInternetSales
ALTER COLUMN ProductKey int NULL; SELECT SalesOrderNumber FROM dbo.FactInternetSales WHERE ProductKey IS NULL; |
31,622.8 | 3.16228% |
There are numerous other scenarios I could have put in this table, but you probably get the point by now that the search conditions help define the which “guess” calculation is used in absence of statistics. When there are additional details beyond table cardinality, such as the nullability of a column or constraints, SQL Server can often leverage it accordingly.
4 thoughts on “Selectivity Guesses in absence of Statistics”
This table could be used to construct "cardinality hints" by introducing useless predicates which alter misestimated cardinality. A nasty hack, but it seems nicer that join hints which cause side-effects like forcing join order.
Thanks for writing this Joe. Does it mean that in a table which is a heap and with no -nonclustered index, the actual row-counts guessed by the optimizer while compiling a plan will always be wrong?
What about a heap with some non-clustered index on some columns?
Thanks
Chandan
Hi Chandan,
If auto-create / auto-update statistics are enabled (which most often they should be) – the guesses don’t come in to play (for this scenario at least). Basically the guesses come into play when there are no stats involved (whether stats objects or stats associated with an index).
Cheers
Comments are closed.