I gave a presentation on columnstore indexing last week and one question I received was regarding the performance of a (hypothetical) narrow, supporting nonclustered index versus a columnstore index. We discussed how nonclustered indexes were still going to be more effective for specific types of queries (for example – singleton operations, smaller range scans, etc.). But what about a situation where we’re aggregating data across the entire fact table? Was there any way that a narrow nonclustered index could measure up against a columnstore index?
The ability to contrast the performance is of course very much dependent on the query construction, table design, data distribution and other factors. With that said, I thought I would test out a scenario as follows…
First of all, I used FactInternetSales – pumped up to 123,695,104 rows. The query I used for comparing performance was as follows (with a few hint-modifications to force index and parallelism caps that I used later):
SELECT p.ProductLine,
SUM(f.SalesAmount) TotalSalesAmount
FROM [dbo].[FactInternetSales] f
INNER JOIN [dbo].[DimProduct] p ON
f.ProductKey = p.ProductKey
GROUP BY p.ProductLine
ORDER BY p.ProductLine;
I created the following nonclustered indexes (index on joining keys – and then an INCLUDE on the SalesAmount which is being aggregated):
CREATE NONCLUSTERED INDEX [NCI_FactInternetSales_ProductKey]
ON [dbo].[FactInternetSales]
([ProductKey] ASC)
INCLUDE ( [SalesAmount]) ON [PRIMARY];
GO
CREATE NONCLUSTERED INDEX [NCI_DimProduct_ProductKey_ProductLine] ON [dbo].[DimProduct]
( [ProductKey] ASC,
[ProductLine] ASC
)ON [PRIMARY];
GO
I also created a columnstore index that covered all columns in the fact table. To force my query to ignore that index for the comparison test, I added the following line:
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);
So how did the performance compare between a supporting nonclustered index versus the columnstore index? (By the way, I used a warm cache for both tests – executing twice):
Index Support |
Elapsed Time |
CPU Time |
Columnstore |
256 ms. |
1045 ms. |
Nonclustered index |
34189 ms. |
33322 ms |
The columnstore index plan was as follows:
The nonclustered index plan was as follows:
One obvious difference is that the columnstore index execution leveraged parallelism. The query also benefited from batch execution mode:
If I hobble the columnstore index by capping max degree of parallelism, the results (in comparison) are as follows (third row represents the test):
Intended Index Support |
Elapsed Time |
CPU Time |
Mode |
Columnstore |
256 ms. |
1045 ms. |
Batch |
Nonclustered index |
34189 ms. |
33322 ms. |
Row |
Columnstore index (MAXDOP 1) – ending up using NCI |
33540 ms. |
34560 ms. |
Row |
Looking at the plan, we see the columnstore index wasn’t used at all:
If I force use of the columnstore index with MAXDOP 1, I see:
Index Support |
Elapsed Time |
CPU Time |
Mode |
Columnstore |
256 ms. |
1045 ms. |
Batch |
Nonclustered index |
34189 ms. |
33322 ms. |
Row |
Columnstore index (MAXDOP 1) – ending up using NCI |
33540 ms. |
34560 ms. |
Row |
Columnstore index (MAXDOP 1) – forced columnstore index |
29859 ms. |
30107 ms. |
Row |
The plan shows the index is used, but it is row mode (due to capping parallelism):
Looping back to the inciting discussion – even with a supporting nonclustered index, this specific query which scanned all 123 million rows and returned a smaller result set with aggregated values performed significantly better when using a columnstore index in conjunction with batch execution mode.
3 thoughts on “Comparing Query Performance when using an “ideal” Nonclustered Index versus Columnstore Index”
Very interesting post, Joe! I am just curious, did you do the normal DBCC DROPCLEANBUFFERS between query runs for your testing? Did you try using PAGE compression on the non-clustered indexes? Finally, what kind of hardware were you running these tests on? Probably a laptop, but you know me, I would like to know what CPU, how much RAM, what kind of storage it has.
Thanks Glenn! Here is some additional information (and I’ll be trying to get a parallel plan for the NCI scenario so we can compare that performance as well):
-) No exec of DBCC DROPCLEANBUFFERS (each test run I did execute twice, and then I used the second exec for the post)
-) I didn’t try page compression – BUT – I think that is a good idea to test (thanks)
-) Hardware is my laptop, Intel Core i7 2820 (4 cores)
-) 16 GB RAM
-) Samsung SSD PM810 2.5" 7
More questions spawning from this post – so I’ll update here…
Posting parallel NCI query (per Adam Twitter suggestion):
SELECT ProductLine,
SUM(SalesAmt)
FROM (SELECT DISTINCT ProductLine, 0%ProductKey +
ProductKey AS Pkey
FROM [dbo].[DimProduct]) AS p
OUTER APPLY
(SELECT SUM(y.SalesAmount) SalesAmt
FROM [dbo].[FactInternetSales] AS y
WITH(INDEX ([NCI_FactInternetSales_ProductKey]))
WHERE p.Pkey = y.ProductKey) AS x
GROUP BY ProductLine
OPTION (QUERYTRACEON 8649);
Same results (had NULL row) but matched data set. 16 seconds duration using NCI and parallelism.
Comments are closed.