Today Joe Webb (Blog|Twitter) blogged about The Power of Regex in Powershell, and in his post he shows how to parse the SQL Server Error Log for events of interest. At the end of his blog post Joe asked about other places where Regular Expressions have been useful in PowerShell so I thought I’d blog my script for parsing SQLIO output using Regex in PowerShell, to populate an Excel worksheet and build charts based on the results automatically.
If you’ve never used SQLIO, Brent Ozar (Blog|Twitter) has a article on SQLServerPedia titled SAN Performance Tuning with SQLIO that includes a lot of information as well as a video tutorial on its use. I have been using SQLIO quite a bit this year, mainly to gather performance information for systems before and after reconfiguring their storage to show percent impact information to the business for making changes in storage like migrating from RAID 5 to 10, and aligning disk partitions used for database and log file storage. I have also used it in benchmarking new SAN configurations and the SAN configuration of our Virtual Machine environment to show team members why I have recommended that a number of our physical servers be migrated onto Virtual Machines.
I generally use a param.txt file similar to:
G:\MSSQL\testfile1.dat 4 0x0 32768
which creates a 32GB test file and then have a DOS batch file that runs the actual SQLIO tests using a script like:
sqlio -kW -t16 -s10 -o8 -fsequential -b8 -BH -LS -Fparam.txt
sqlio -kW -t8 -s360 -o8 -frandom -b8 -BH -LS G:\MSSQL\testfile1.dat > post.txt
sqlio -kW -t8 -s360 -o8 -frandom -b32 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -frandom -b64 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -frandom -b128 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -frandom -b256 -BH -LS G:\MSSQL\testfile1.dat >> post.txtsqlio -kR -t8 -s360 -o8 -frandom -b8 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -frandom -b32 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -frandom -b64 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -frandom -b128 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -frandom -b256 -BH -LS G:\MSSQL\testfile1.dat >> post.txtsqlio -kW -t8 -s360 -o8 -fsequential -b8 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -fsequential -b32 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -fsequential -b64 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -fsequential -b128 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kW -t8 -s360 -o8 -fsequential -b256 -BH -LS G:\MSSQL\testfile1.dat >> post.txtsqlio -kR -t8 -s360 -o8 -fsequential -b8 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -fsequential -b32 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -fsequential -b64 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -fsequential -b128 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
sqlio -kR -t8 -s360 -o8 -fsequential -b256 -BH -LS G:\MSSQL\testfile1.dat >> post.txt
Which runs the gamit of tests using 8, 32, 64, 128, and 256K block sizes, with 16 threads, and 8 pending IO. You can certainly expand the test set much further than this and do a very exhaustive testing regimen, I am by no means saying that my tests above are the best fit, but they have sufficiently met my needs for the tuning I have been doing. Once the tests have been run the post.txt file will contain an output entry similar the following for every test that was run.
sqlio v1.5.SG
using system counter for latency timings, 14318180 counts per second
16 threads writing for 60 secs to file d:\MSSQL\testfile1.dat
using 8KB random IOs
enabling multiple I/Os per thread with 8 outstanding
buffering set to use hardware disk cache (but not file cache)
using current size: 2048 MB for file: d:\MSSQL\testfile1.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 18527.91
MBs/sec: 144.74
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 6
Max_Latency(ms): 4900
histogram:
ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 78 6 8 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2
Linchi Shea (Blog) wrote a blog post three years ago that showed how to Parse the sqlio.exe Output using Perl. I found this in my search for a smarter way of parsing these long output files I was generating other than manually, but I don’t have Perl installed anywhere and I really felt that I should be able to do this natively using the tools readily available to me on my Windows 7 laptop, so I looked to PowerShell for a solution.
Reading the file data in is accomplished using Get-Content which returns a collection of strings. At first I tried casting this to a [string] object directly, but soon found out that when I did this the resulting string was missing the carriage returns and line feeds that existed in the original file. This posed a couple of problems for me in parsing the outputs initially, but doing a [string]::Join operation using [Environment]::NewLine as the concatenater allowed an appropriately CRLF delimited string to be returned.
$filedata = [string]::Join([Environment]::NewLine,(Get-Content $FileName))
Now that I had my file data loaded into a the $filedate object, the next step was to split this string based on the “sqlio v1.5.SG” header that is output at the beginning of each test run. Since PowerShell is .NET based, this is easily accomplished by using the Split() method of the System.String object which is System.Type for the $filedata object.
$Results = $filedata.Split( [String[]]"sqlio v1.5.SG", [StringSplitOptions]::RemoveEmptyEntries )
At first I started to write a foreach loop with this object, lets face it old habits die hard, but instead I made use of piping which allows a collection like the one returned by the Split() method to be worked on in a single operation. Piping in PowerShell is similar to a manufacturing process where an object gets transformed and then passed on until the ultimate result is achieved. Once an object is piped, it becomes accessible through the use of the $_ variable name, allowing the object to be transformed along the pipeline. By piping the result of the Split() I was able to write a SELECT statement that utilized Regex to get the information of interest out of each test run.
select @{Name="Threads"; Expression={[int]([regex]::Match($_, "(\d+)?\sthreads\s(reading|writing)").Groups[1].Value)}},`
@{Name="Operation"; Expression={switch ([regex]::Match($_, "(\d+)?\sthreads\s(reading|writing)").Groups[2].Value)
{
"reading" {"Read"}
"writing" {"Write"}
} }},`
@{Name="Duration"; Expression={[int]([regex]::Match($_, "for\s(\d+)?\ssecs").Groups[1].Value)}},`
@{Name="IOSize"; Expression={[int]([regex]::Match($_, "\tusing\s(\d+)?KB\s(sequential|random)").Groups[1].Value)}},`
@{Name="IOType"; Expression={switch ([regex]::Match($_, "\tusing\s(\d+)?KB\s(sequential|random)").Groups[2].Value)
{
"random" {"Random"}
"sequential" {"Sequential"}
} }},`
@{Name="PendingIO"; Expression={[int]([regex]::Match($_, "with\s(\d+)?\soutstanding").Groups[1].Value)}},`
@{Name="FileSize"; Expression={[int]([regex]::Match($_, "\s(\d+)?\sMB\sfor\sfile").Groups[1].Value)}},`
@{Name="IOPS"; Expression={[decimal]([regex]::Match($_, "IOs\/sec\:\s+(\d+\.\d+)?").Groups[1].Value)}},`
@{Name="MBs_Sec"; Expression={[decimal]([regex]::Match($_, "MBs\/sec\:\s+(\d+\.\d+)?").Groups[1].Value)}},`
@{Name="MinLat_ms"; Expression={[int]([regex]::Match($_, "Min.{0,}?\:\s(\d+)?").Groups[1].Value)}},`
@{Name="AvgLat_ms"; Expression={[int]([regex]::Match($_, "Avg.{0,}?\:\s(\d+)?").Groups[1].Value)}},`
@{Name="MaxLat_ms"; Expression={[int]([regex]::Match($_, "Max.{0,}?\:\s(\d+)?").Groups[1].Value)}}`
The select returns a new type of object that has Properties named Threads, Operation, Duration, IOSize, IOType, PendingIO, FileSize, IOPS, MBs_sec, MinLat_ms, AvgLat_ms, and MaxLat_ms. This alone is sufficient to proceed to creating an Excel file for the information, but I wanted the information sorted in the same manner every time I ran this, primarily because I change how I run my tests sometimes, and I like consistency in the output. Since I am so anal retentive thorough, I fixed this problem by piping the output from the select to the Sort-Object commandlet and sorted by the IOSize, IOType, Operation, and Threads.
Getting the data into Excel was actually fairly simple to do, thanks to Buck Woody(Blog|Twitter) and SQL Rockstar Tom LaRock(Blog|Twitter). Buck wrote a blog post titled Create an Excel Graph of your Big Tables – With Powershell, and Tom wrote a Simple Talk Article, Monitoring SQL Server Virtual Log File Fragmentation, that use the Interop Excel objects to create an Excel workbook from PowerShell, populate the workbook with data, and build charts using the data. My code for the Excel integration is based entirely on their examples.
$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $true
$WorkBook = $Excel.WorkBooks.Add()
$WorkBook.WorkSheets.Item(1).Name = "RawData"
$WorkBook.WorkSheets.Item(3).Delete()
$WorkBook.WorkSheets.Item(2).Delete()
$WorkSheet = $WorkBook.WorkSheets.Item("RawData")
$x = 2
$WorkSheet.Cells.Item(1,1) = "Threads"
$WorkSheet.Cells.Item(1,2) = "Operation"
$WorkSheet.Cells.Item(1,3) = "Duration"
$WorkSheet.Cells.Item(1,4) = "IOSize"
$WorkSheet.Cells.Item(1,5) = "IOType"
$WorkSheet.Cells.Item(1,6) = "PendingIO"
$WorkSheet.Cells.Item(1,7) = "FileSize"
$WorkSheet.Cells.Item(1,8) = "IOPS"
$WorkSheet.Cells.Item(1,9) = "MBs/Sec"
$WorkSheet.Cells.Item(1,10) = "Min_Lat(ms)"
$WorkSheet.Cells.Item(1,11) = "Avg_Lat(ms)"
$WorkSheet.Cells.Item(1,12) = "Max_Lat(ms)"
$WorkSheet.Cells.Item(1,13) = "Caption"$Results | % {
$WorkSheet.Cells.Item($x,1) = $_.Threads
$WorkSheet.Cells.Item($x,2) = $_.Operation
$WorkSheet.Cells.Item($x,3) = $_.Duration
$WorkSheet.Cells.Item($x,4) = $_.IOSize
$WorkSheet.Cells.Item($x,5) = $_.IOType
$WorkSheet.Cells.Item($x,6) = $_.PendingIO
$WorkSheet.Cells.Item($x,7) = $_.FileSize
$WorkSheet.Cells.Item($x,8) = $_.IOPS
$WorkSheet.Cells.Item($x,9) = $_.MBs_Sec
$WorkSheet.Cells.Item($x,10) = $_.MinLat_ms
$WorkSheet.Cells.Item($x,11) = $_.AvgLat_ms
$WorkSheet.Cells.Item($x,12) = $_.MaxLat_ms
$WorkSheet.Cells.Item($x,13) = [string]$_.IOSize + "KB " + [string]$_.IOType + " " + `
[string]$_.Operation + " " + [string]$_.Threads + `
" Threads " + [string]$_.PendingIO + " pending"
$x++}
$WorkBook.Charts.Add() | Out-Null
$Chart = $WorkBook.ActiveChart
$Chart.SetSourceData($WorkSheet.Range("H1:H$x"))
$Chart.SeriesCollection(1).xValues = $WorkSheet.Range("M2:M$x")
$Chart.SetSourceData($WorkSheet.Range("H1:H$x"))
$Chart.SeriesCollection(1).xValues = $WorkSheet.Range("M2:M$x")
$Chart.Name = "IOPS"$WorkBook.Charts.Add() | Out-Null
$WorkBook.ActiveChart.SetSourceData($WorkSheet.Range("I1:I$x"))
$Chart = $WorkBook.ActiveChart
$Chart.SeriesCollection(1).xValues = $WorkSheet.Range("M2:M$x")
$Chart.Name = "MBs Sec"
I don’t know why, but the first Chart doesn’t refresh properly unless I call SetSourceData() and set the SeriesCollection xValues properties twice. Since the code block that loads the data into the worksheet tracks the rows with the $x variable, the Range for the chart is always row 1 to $x. The H column happens to be the column that holds the IOPS values and the I column holds the MBs Sec values in the workbook. The only thing missing at this point a parameter declaration at the head of the script file to require an input parameter $FileName which is passed to Get-Content to read the information.
param( [Parameter(Mandatory=$TRUE)]
[ValidateNotNullOrEmpty()]
[string]
$FileName)
The complete script is attached to this blog post. If you use it let me know your thoughts.
14 thoughts on “Parsing SQLIO Output to Excel Charts using Regex in PowerShell”
I ran SQLIO on a hyper-v VM today and was surprised to see approx 50,000 IOps per second. the same test on the host machine gave me around 1000 IOps. does it really mean physical to VM migration can boost io performance significantly for a given sql installation? looks like i should be doing another sql test both in vm as well as the host machine…
No, I’m going to go with there is a mistake or flaw in the testing here. A VM can’t achieve more IOPS than the host can for the same physical disk/LUN.
Nice script John. Life safer in analyzing my SQLIO output. Thanks!
Great script.
disovered bug when testing with 1 thread
\d+)?\sthreads\s(rea …
should be
\d+)?\sthreads?\s(re
(question mark to handle 1 or no s, since for one thread the output is “1 thread reading for 300 secs… “)
Jonathan,
Great idea !
But I have problem when using this script on Excel 2013,on
$WorkBook = $Excel.WorkBooks.Add()
line,exception is thrown
Exception calling “Add” with “0” argument(s): “Old format or invalid type library. (Exception from HRESULT: 0x80028018 (TYPE_E_INVDATAREAD))”
Hi Jonathan,
The same for me using the script with Excel 2013.
I have used it a lot in the past and like it a lot!
Is there a fix or is there a way to be able to store the output to SQL server database table instead of putting it to Excel?
I don’t have Excel 2013 installed on any machine to test or debug the problem with so I don’t know why it fails there. You can easily change the script to use out-csv to write the contents of the object into a CSV file instead of the Excel processing that it does.
Hey Mattias and Dino,
I installed Office 2013 Professional Plus 64-bit on my laptop after wiping it and installing Windows 8.1, and I don’t get the error listed above when I test the script. It doesn’t like the $WorkBook.WorkSheets.Item(3).Delete() and $WorkBook.WorkSheets.Item(2).Delete() lines because 2013 doesn’t start with 3 sheets like previous versions, but the .Add() execution works every time I run it without error. What version specifically are you seeing this on and do you have any updates applied/not applied?
Hi Jonathan!
Thanks for your response.
I have Excel Professional Plus 2013 (15.0.4532.1507) 64-bit
Br,
Mattias
I am getting the following:
”
You cannot call a method on a null-valued expression.
At C:\TEMP\SQLIOResults.ps1:xx char:”
+ $Worksheet.Cells.Item($x,x) = $_.Threads
+
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvokemethodOnNull
“
First error I get is the same as Dino, the the above starts:
Exception calling “Add” with “0” argument(s): “Old format or invalid type library. (Exception from HRESULT: 0x80028018 (YPE_E_INVDATAREAD)
Sorry guys, but on three different installs I can’t reproduce this and I can’t put more time into trying to debug a problem I can’t reproduce. You can try debugging it yourself using Power GUI (www.powergui.org) and if you find out what the issue was, please let me know and I’ll update the script for those odd cases.
Hi again,
I was able to track down the problem.
It has to do with which regional settings you have on your computer running the PS script. I’m from Sweden so naturally I have Formats = Swedish.
Changing Formats = English (United States) solves my problem.
Hope that helps someone.
Br,
Mattias Lundell
For non-English regional settings
[Threading.Thread]::CurrentThread.CurrentCulture = ‘en-US’
$WorkBook = $Excel.WorkBooks.Add()
T.