Since SQL Server 2012 is relatively easy to install and get running, using just the default configuration settings, I thought it would be interesting to explore the negative performance effects of some of these default configuration settings, using a simple, repeatable test.
My test machine is a desktop machine with an Intel Core i7-3770K processor with 32GB of RAM, and two fast, consumer-level SSDs. The C: drive is a 512GB OCZ Vertex 4, and the L: drive is a faster, 256GB Samsung 840 Pro. Both SSDs are plugged into 6Gbps SATA III ports, and they are performing as I would expect.
For my test database, I downloaded a copy of the 2008 Credit database and restored it to a SQL Server 2012 SP1 CU5 instance. This database has a table called charge, that has 1.6 million rows of data. The database starts out with a 1000MB data file, and a 400MB log file with 15 VLFs. The database is in the SIMPLE recovery model, and both the data and log files have their autogrowth set to 10% (which is not the best choice). I used the MOVE command to place the SQL Server log file for the database on my separate, faster L: drive, so I would get the fastest possible sequential write performance for the log file.
To make this experiment more interesting, I changed the recovery model to FULL, and I changed the autogrowth for both the data and log files to 1MB (which is a really bad idea). Finally, I took a compressed, full database backup to have a SQL Server 2012 baseline for the experiments, as shown in Listing 1:
Initial Preparation of the Credit Database
USE [master]; -- Start of initial preparation -- Restore from the original SQL Server 2008 backup with MOVE if necessary -- This will upgrade it to SQL Server 2012 format RESTORE DATABASE [Credit] FROM DISK = N'C:\SQLBackups\CreditBackup100.bak' WITH FILE = 1, MOVE N'CreditData' TO N'C:\SQLData\CreditData.mdf', MOVE N'CreditLog' TO N'L:\SQLLogs\CreditLog.ldf', NOUNLOAD, STATS = 1; GO -- Change compatibility level to 110 (SQL Server 2012) -- This is different from the database format ALTER DATABASE [Credit] SET COMPATIBILITY_LEVEL = 110; GO -- Change recovery model to FULL ALTER DATABASE [Credit] SET RECOVERY FULL WITH NO_WAIT; GO -- Change file growth for data file to 1 MB (This is a bad thing to do!) ALTER DATABASE [Credit] MODIFY FILE (NAME = N'CreditData', FILEGROWTH = 1MB); GO -- Change file growth for data file to 1 MB (This is a bad thing to do!) ALTER DATABASE [Credit] MODIFY FILE (NAME = N'CreditLog', FILEGROWTH = 1MB) GO -- Take a compressed full backup for a baseline BACKUP DATABASE [Credit] TO DISK = N'C:\SQLBackups\CreditBackup110.bak' WITH NOFORMAT, INIT, NAME = N'Credit-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, COMPRESSION, STATS = 1; GO -- End of initial preparation
Listing 1: Initial Preparation and Backup of Baseline Database
Standard Test Details
After the initial preparation, I ran a test, where I doubled the number of rows in the charge table every time I went through a loop, with a simple INSERT…SELECT pattern, also incrementing the charge_dt to a later date for each new set of charges. I added some timing information, and gathered some information about cumulative waits, VLFs, and file sizes for each test run. I dropped and then restored the baseline SQL Server 2012 Credit database before each test run. I also restarted the SQL Server Service before each test run after I made my configuration changes. Listing 2 shows the standard test loop that I ran for each configuration. Listing 3 shows the cleanup I did after each test was done.
-- Start Standard Test USE Credit; GO -- Supress done messages SET NOCOUNT ON; -- Get starting row count and size of Charge table EXEC sp_spaceused N'dbo.charge'; -- Get intial VLF count DBCC LogInfo; -- Individual File Sizes and space available for current database SELECT f.name AS [File Name] , f.physical_name AS [Physical Name], CAST((f.size/128.0) AS decimal(15,2)) AS [Total Size in MB], CAST(f.size/128.0 - CAST(FILEPROPERTY(f.name, 'SpaceUsed') AS int)/128.0 AS decimal(15,2)) AS [Available Space In MB], [file_id], fg.name AS [Filegroup Name] FROM sys.database_files AS f WITH (NOLOCK) LEFT OUTER JOIN sys.data_spaces AS fg WITH (NOLOCK) ON f.data_space_id = fg.data_space_id OPTION (RECOMPILE); -- Clear Wait Stats DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR); -- Declare and initialize some variables DECLARE @LoopCount int = 1; DECLARE @DateDiff int = 30; DECLARE @Start datetime = GETDATE(); DECLARE @LoopDone datetime; DECLARE @End datetime; -- Double the size of the charge table each time, changing charge_dt to a higher value each loop WHILE @LoopCount < 6 BEGIN INSERT INTO dbo.charge (member_no, provider_no, category_no, charge_dt, charge_amt, statement_no, charge_code) SELECT member_no, provider_no, category_no, charge_dt + @DateDiff, charge_amt, statement_no, charge_code FROM dbo.charge; -- Do some timing and housekeeping SET @LoopDone = GETDATE(); PRINT N'Loop ' + CONVERT(NVARCHAR(5), @LoopCount) + N' done in ' + CONVERT(NVARCHAR(5), DATEDIFF(second, @Start, @LoopDone)) + N' seconds'; SET @DateDiff += 30; SET @LoopCount += 1; END SET @End = GETDATE(); SELECT DATEDIFF(second, @Start, @End) AS [Total Elapsed Time (sec)]; -- Isolate top waits for server instance since last restart or statistics clear -- SQL Server 2012 and newer specific version WITH Waits AS (SELECT wait_type, CAST(wait_time_ms / 1000. AS DECIMAL(12, 2)) AS [wait_time_s], CAST(100. * wait_time_ms / SUM(wait_time_ms) OVER () AS decimal(12,2)) AS [pct], ROW_NUMBER() OVER (ORDER BY wait_time_ms DESC) AS rn FROM sys.dm_os_wait_stats WITH (NOLOCK) WHERE wait_type NOT IN (N'CLR_SEMAPHORE', N'LAZYWRITER_SLEEP', N'RESOURCE_QUEUE',N'SLEEP_TASK', N'SLEEP_SYSTEMTASK', N'SQLTRACE_BUFFER_FLUSH', N'WAITFOR', N'LOGMGR_QUEUE', N'CHECKPOINT_QUEUE', N'REQUEST_FOR_DEADLOCK_SEARCH', N'XE_TIMER_EVENT', N'BROKER_TO_FLUSH', N'BROKER_TASK_STOP', N'CLR_MANUAL_EVENT', N'CLR_AUTO_EVENT', N'DISPATCHER_QUEUE_SEMAPHORE' ,N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'XE_DISPATCHER_WAIT', N'XE_DISPATCHER_JOIN', N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', N'ONDEMAND_TASK_QUEUE', N'BROKER_EVENTHANDLER', N'SLEEP_BPOOL_FLUSH', N'SLEEP_DBSTARTUP', N'DIRTY_PAGE_POLL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',N'SP_SERVER_DIAGNOSTICS_SLEEP')), Running_Waits AS (SELECT W1.wait_type, wait_time_s, pct, SUM(pct) OVER(ORDER BY pct DESC ROWS UNBOUNDED PRECEDING) AS [running_pct] FROM Waits AS W1) SELECT wait_type, wait_time_s, pct, running_pct FROM Running_Waits WHERE running_pct - pct <= 99 ORDER BY running_pct OPTION (RECOMPILE); -- Get ending VLF count of current database DBCC LogInfo; -- Individual File Sizes and space available for current database SELECT f.name AS [File Name] , f.physical_name AS [Physical Name], CAST((f.size/128.0) AS decimal(15,2)) AS [Total Size in MB], CAST(f.size/128.0 - CAST(FILEPROPERTY(f.name, 'SpaceUsed') AS int)/128.0 AS decimal(15,2)) AS [Available Space In MB], [file_id], fg.name AS [Filegroup Name] FROM sys.database_files AS f WITH (NOLOCK) LEFT OUTER JOIN sys.data_spaces AS fg WITH (NOLOCK) ON f.data_space_id = fg.data_space_id OPTION (RECOMPILE); -- Get ending row count and size of Charge table EXEC sp_spaceused N'dbo.charge'; GO -- End Standard Test
Listing 2: Standard Test
-- Cleanup after each test run -- Drop Credit database USE [master] GO ALTER DATABASE [Credit] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; GO DROP DATABASE [Credit]; GO -- Restore from our initial baseline SQL Server 2012 full backup RESTORE DATABASE [Credit] FROM DISK = N'C:\SQLBackups\CreditBackup110.bak' WITH FILE = 1, MOVE N'CreditData' TO N'C:\SQLData\CreditData.mdf', MOVE N'CreditLog' TO N'L:\SQLLogs\CreditLog.ldf', NOUNLOAD, STATS = 1; GO -- Restart SQL Server Service -- End of Cleanup
Listing 3: Cleanup After Each Test Run
After all of this preparation, what did I actually want to test? Each test would measure the elapsed time and ending Virtual Log File (VLF) count after inserting 49.4 million rows into the credit table, in ever larger batches of INSERTs. I also wanted to see what the top cumulative wait statistics were for each test run.
Test Configurations and Results
Test 1
Disable Windows Instant File Initialization (IFI), which is the default condition when you install SQL Server, unless you choose to enable it. I used the very bad autogrowth value of 1MB for both the data file and the log file of the database. This is a very bad combination that you should not emulate! I expected this configuration to perform relatively poorly, and I was not surprised by the results. This test took 1074 seconds to complete, and the VLF count went up to 126,399. Table 1 shows the top cumulative wait types during this test run. The PREEMPTIVE_OS_xxx waits are related to the fact that we have such a small file autogrowth increment and that Windows Instant File Initialization is not enabled.
Wait Type | Percent | Running Percent |
CXPACKET | 72.22 | 72.22 |
PREEMPTIVE_OS_FLUSHFILEBUFFERS | 13.01 | 85.23 |
PREEMPTIVE_OS_WRITEFILEGATHER | 6.52 | 91.75 |
WRITELOG | 2.13 | 93.88 |
PREEMPTIVE_OS_FILEOPS | 1.39 | 95.27 |
PAGEIOLATCH_EX | 0.97 | 96.24 |
ASYNC_IO_COMPLETION | 0.82 | 97.06 |
WRITE_COMPLETION | 0.71 | 97.77 |
IO_COMPLETION | 0.67 | 98.44 |
LATCH_SH | 0.62 | 99.06 |
Table 1: Test 1 Cumulative Wait Types
Test 2
Enable Windows Instant File Initialization (making sure to restart the SQL Server Service). I used the same bad autogrowth value of 1MB for both the data file and the log file of the database. I expected this configuration to still perform relatively poorly. This test took 1069 seconds to complete, and the VLF count went up to 126,399. The lower elapsed time was really within the margin of error, so just enabling Windows Instant File Initialization was not really helping in a meaningful way (by itself). Keep in mind that IFI only affects the data file, not the log file. Table 2 shows the top cumulative wait types during this test run. The cumulative wait types during Test 2 did not really change in a meaningful way.
Wait Type | Percent | Running Percent |
CXPACKET | 72.88 | 72.88 |
PREEMPTIVE_OS_FLUSHFILEBUFFERS | 13.34 | 86.22 |
PREEMPTIVE_OS_WRITEFILEGATHER | 6.4 | 92.62 |
PREEMPTIVE_OS_FILEOPS | 1.39 | 94.01 |
WRITELOG | 1.12 | 95.13 |
WRITE_COMPLETION | 1.01 | 96.14 |
PAGEIOLATCH_EX | 0.87 | 97.01 |
ASYNCH_IO_COMPLETION | 0.83 | 97.84 |
LATCH_SH | 0.67 | 98.51 |
IO_COMPLETION | 0.65 | 99.16 |
Table 2: Test 2 Cumulative Wait Types
Test 3
Enable Windows Instant File Initialization (making sure to restart the SQL Server Service). I changed the autogrowth value to a more reasonable 1024MB for both the data file and the log file of the database. I expected this configuration to perform much better, with a reduced VLF count in the log file. This test took 771 seconds to complete, and the VLF count only went up to 246, which was much better. Table 3 shows the top cumulative wait types during this test run. We can see that PREEMPTIVE_OS_FLUSHFILEBUFFERS drops completely off the list when we have much fewer file growths during the test.
Wait Type | Percent | Running Percent |
CXPACKET | 88.95 | 88.95 |
PREEMPTIVE_OS_WRITEFILEGATHER | 7.48 | 96.43 |
ASYNCH_IO_COMPLETION | 1.11 | 97.54 |
PAGEIOLATCH_EX | 0.89 | 98.43 |
LATCH_SH | 0.88 | 99.31 |
Table 3: Test 3 Cumulative Wait Types
Test 4
Enable Windows Instant File Initialization (making sure to restart the SQL Server Service). I changed the autogrowth value to a more reasonable 1024MB for both the data file and the log file of the database. I also pre-grew the log file to 32000MB, in 4000MB increments (to reduce the VLF count and eliminate any log file autogrowths during the test). I expected this configuration to perform a little better than Test 3, with a reduced VLF count in the log file. This test took 711 seconds to complete, and the VLF count only went up to 127, which was even better. Table 4 shows the top cumulative wait types during this test run. Notice that the PREEMPTIVE_OS_WRITEFILEGATHER wait type is greatly reduced by this latest configuration, since we are only seeing a few file growths for the data file and none for the log file.
Wait Type | Percent | Running Percent |
CXPACKET | 94.74 | 94.74 |
PREEMPTIVE_OS_WRITEFILEGATHER | 1.57 | 96.31 |
ASYNCH_IO_COMPLETION | 1.13 | 97.44 |
PAGEIOLATCH_EX | 1.12 | 98.56 |
LATCH_SH | 0.94 | 99.50 |
Table 4: Test 4 Cumulative Wait Types
Test 5
Test 5 used the same configuration steps as Test 4, except that I changed the instance-level MAXDOP setting to 4 (since I had a single, quad-core processor with hyper-threading enabled). This change had no measurable effect, taking 716 seconds to complete, and the VLF count stayed at 127. Since there is only one NUMA node on my desktop machine, I did not expect any improvement here. Table 5 shows the top cumulative wait types during this test run.
Wait Type | Percent | Running Percent |
CXPACKET | 92.69 | 92.69 |
PREEMPTIVE_OS_WRITEFILEGATHER | 2.71 | 95.40 |
ASYNCH_IO_COMPLETION | 1.89 | 97.29 |
PAGEIOLATCH_EX | 1.68 | 98.97 |
LATCH_SH | 0.63 | 99.60 |
Table 5: Test 5 Cumulative Wait Types
Test 6
Test 6 used the same configuration steps as Test 5, except that I changed the tempdb configuration by adding three additional tempdb data files and I made all of the tempdb data files the same initial size (2048MB) with an autogrowth increment of 1024MB. I also made the tempdb log file a larger 1024MB initial size, with an autogrowth increment of 1024MB. This change also had no measurable effect, taking 723 seconds to complete, with the VLF count staying at 127. This was somewhat of a surprise, since I could see the original tempdb data file being hit pretty hard during the test runs. Table 6 shows the top cumulative wait types during this test run.
Wait Type | Percent | Running Percent |
CXPACKET | 94.07 | 94.07 |
ASYNCH_IO_COMPLETION | 1.96 | 96.03 |
PREEMPTIVE_OS_WRITEFILEGATHER | 1.96 | 97.99 |
PAGEIOLATCH_EX | 1.83 | 99.82 |
Table 6: Test 6 Cumulative Wait Types
Part One Conclusions
We can see from these tests that it is very beneficial to make sure that you use a reasonable autogrowth increment size for both your data file(s) and log file rather than an extremely small autogrowth increment size. It is also very beneficial to manually pre-grow your log file to an appropriate size, in 4000MB increments so that it does not have to autogrow during normal and maintenance operations. This will also keep your VLF count under control.
It is also beneficial to enable Windows Instant File Initialization (IFI), even though it only affects your SQL Server data file(s). More tests in Part 2 will show some of the other advantages of IFI for SQL Server. You can get the complete script and some more detailed results here. Table 7 shows the overall test results for Part 1. Keep in mind that this testing is using relatively high performance, consumer-level SSDs, so the negative effects of lots of small file growths in the data and log files are not as severe as they might otherwise be with magnetic disk storage.
Test Run | Elapsed Time | VLF Count |
Test 1 | 1074 seconds | 126,399 |
Test 2 | 1069 seconds | 126,399 |
Test 3 | 771 seconds | 246 |
Test 4 | 711 seconds | 127 |
Test 5 | 716 seconds | 127 |
Test 6 | 723 seconds | 127 |
Table 7: Part One Test Results
11 thoughts on “Experiments with SQL Server VLFs, Part 1”
Very interesting.
I was discussing with the other DBA at the shop about the autogrowth settings and he didn’t see why the 10% for the log is not a good idea. Could you rerun your tests with a 10% autogrowth for the log ? Also with 10% for the data and the log, the results should be *interesting*…
With an autogrow set to 10%, if the initial size is 100MB, you’re growing by 10MB… your next growth is from 110MB with an 11MB growth to 121MB. If you have a system that has a 380GB log file and it needs another 200MB, instead of growing at a nice 1024MB set, it’ll grow by 38GB.
Lots of VLF’s at first, and later on massive growth for a request.
Static growth is better, though I’m sure someone has a more detailed answer as well.
Not only lots of small growths early on, but much larger growths (with large VLF sizes) as the transaction log file gets larger and larger. These large growths will take a long time to complete, because Windows Instant File Initialization does not work on transaction log files.
I will do a test run with 10% log growth.
Great review Glenn, thanks!
Didier, the 10% default, or really any percent, is a bad idea. The tlog is zero initialized, so with a percent the autogrowths will get larger and larger, and each will cause SQL Server to freeze activity in the db. You will also lose control of the number of VLFs and the individual VLF sizes. For example, if you start with a 1GB log file, by the time it hits 100GB you’ll have 600 VLFs and it will be growing in >10GB chunks, likely suspending activity for at least a few seconds with every autogrowth, depending on the speed of your disk system.
Read this post and the comments for more details: https://www.sqlskills.com/blogs/kimberly/transaction-log-vlfs-too-many-or-too-few/
That said, I agree it would be interesting to see how the 10% default performs in comparison to the other tests, since it’s still far and away the most common setting I see in the wild.
Don’t worry, all the 10% or 1MB autogrowth settings are nearly gone, I changed nearly all the DBs I have access to.
I am a contractor and while talking with the DBA who works for the company who hired me, he didn’t see why 10% was not a good idea. And the contractor before me was thinking 10% autogrowth doesn’t hurt.
Last week, I saw a test DB, loading GBs of data, with a 10% autogrowth and each autogrowth was taking longer and longer, painful to watch in full realtime, but can’t change the setting for the test.
Can’t wait to see the results and your thoughts, Glenn.
I recently learned about how SQL zeros non-log files during a restore. Where can I gain insight about it’s progress? I found some flags to dump restore information to the SQL error log, but it only tells me after zeroing starts and finishes. Is the zeroing progress reported anywhere?
Thanks!
I know these are SSDs, but just wondering a couple of things: 1. if it would be good to defrag the physical drive and 2. if the tests could be run again with stored procedures instead of T-SQL and if that affected any of the results?
good tests!
you are using 4 tempdb fiels and don’t see any change, which is normal, you have only 1 connection opened and the system will use 1 tempdb file only.
what about the trace flags? you can play here to improve the tempdb usage.
I know the test was made in 2013, too late to add some tests in your batch.
but its very interesting to see the impact of small options…
I doubt that tempdb is really a bottleneck for this test. The main point in having multiple tempdb data files is to reduce allocation contention in tempdb, which is a workaround for an issue in SQL Server where multiple threads are trying to access the same tempdb data file. It is not directly I/O related in that context. The number of connections that are open also has nothing to do with it.