Blog posts in this series:

  • For the hardware setup I'm using, see this post.
  • For an explanation of log growth and its effect on perf, see this post.
  • For the baseline performance measurements for this benchmark, see this post.
  • For the increasing performance through log file IO optimization, see this post.
  • For the increasing performance through separation of data and log files, see this post.

In the previous post in the series, I examined the effects of separating the data and log files (one file each) to different RAID arrays. It was very obvious that separation gave a performance boost, and that having the portion of the database with the highest I/O write throughput requirements on the faster array (the 8-drive 15k SCSI RAID10) produced the biggest gain.

Now - a confession.  In the last post, when I posted it I found that moving the data an 8-drive 7.2k SATA RAID10 array was the best approach. *But* during the testing for this post, I found that one of my tests had screwed up and only half the client threads had run. You'll notice in that post I went back in and edited it to explain that and update the graph and results. I've now augmented my test harness with a way to check that all client threads are running - to make sure the tests are apples to apples, rather than apples to pomegranates :-)

So - the best I've been able to do so far with the tests is creating 1TB of data using 128 connections (each creating 1/128th TB using inserts with default values) with the single data file on an 8-drive 15k SCSI RAID array (pre-created to 1TB) and the log file on an 8-drive 7.2k SATA RAID10 array (pre-created to 256MB with 50MB autogrowth) in 20842 seconds.

Lots of people have been asking how my network is setup in these tests. Here's what I've been running with (all 1GB ethernet):

  • 1 NIC from a Broadcom BCM5708C NetXtreme II GigE card on the 10.x.x.x network
  • 1 NIC from a Broadcom BCM5708C NetXtreme II GigE card on the 192.168.x.x network
  • 2 x PowerConnect 5424 24-port iSCSI optimized switches , with no separation of traffic
  • The 10.x.x.x server NIC connected to all iSCSI arrays

Over the last couple of weeks I've been playing around with the network setup to make sure things are optimized, and this post will describe what I did and what effect it had. In all the tests below, I kept the dat aon the faster SCSI array and the log on the slower SATA array.

I'm very grateful to the help I received from Wes Brown (twitter|blog) and Denny Cherry (twitter|blog) to the technical questions and WTF?s I sent (and to anyone else on twitter I may have forgotten!).

1) Separation of network traffic

I decided to make one of the 5424 switches dedicated to iSCSI traffic on the 10.x.x.x network and the other for general network activity, including connecting to the management ports on the MD3000s. Turns out that I didn't really need to, as each 5424 can handle 48GB of throughput, way more than I'm generating. But hey ho, at least the wiring in the back of the 42U rack is a little tidier now :-)

Running the 128-way test with the new configuration gave a test time of 21252 seconds, slightly slower than the best time without separation! This was the first of the WTF?s. Until I realized that I hadn't actually removed any network bottleneck at all. I can't explain why things are slightly slower here, so I decided to take the switches out of the equation. My suspicion is that if I ran the test ten times, I'd get ten different results, but within a standard deviation of the median. So - no cause for concern. (In fact, I'm going to try this as part of the next set of tests.)

2) Direct connections to the iSCSI arrays

I configured another NIC (one from an Intel PRO/1000 PT Dual Port GigE card) and then had one NIC directly connected to one of the RAID controllers on the SCSI MD3000 (only one configured volume, so no concerns about having multiple volumes suddenly switching over to a non-optimal RAID controller) and the other NIC directly connected to the SATA MD3000.

Running the 128-way test with the new configuration gave a test time of 21859 seconds, slower than test #1. Very surprising - I expected to get some *gain* so I looked at the peak throughput of the arrays:

  • For test 1, peak SATA was 50500 bytes/sec and peak SCSI was 106012 bytes/sec.
  • For test 2, peak SATA was 46923 bytes/sec and peak SCSI was 107708 bytes/sec.

Things are slower with the network bottleneck removed.

3) Upgrading 5424 switch firmware and reconfiguring

Although the 5424 switches are supposed to come iSCSI optimized, I thought I'd flatten them and reconfigure the. I got the latest version of the switch firmware and re-flashed both switches. I think configured the 10.x.x.x one specifically for iSCSI using this excellent Dell whitepaper.

Running the 128-way test with the new configuration gave a test time of 20745 seconds. Finally an improvement, but nothing major, and still possibly just a statistical variation.

4) Upgrading the NIC drivers

Next I figured I'd bring the NICs up to the latest driver versions so upgraded all the NICs on all the servers.

Running the 128-way test with the new configuration gave a test time of 21743 seconds. Hmmm.

5) Homogenizing the network paths

At this point I started wondering if the Broadcom and Intel NICs had different characteristics so I decided to use the two Intel NICs for the iSCSI traffic. I also enabled jumbo frames. The Intel NICs have three setting for jumbo frames - off, 4088 bytes or 9014 bytes. The MD3000s can only go up to 9000 bytes, so I chose 4088 bytes and configured the MD3000 iSCSI ports to use the same.

Running the 128-way test with the new configuration gave a test time of 21526 seconds - nothing to write home about.

None of the network configuration changes I made had much effect on performance, apart from removing the network bottleneck, which made performance slightly worse overall. I checked other stuff like TCP offloading, but that wasn't enabled. My suspicion was that by removing the network bottleneck, I unmasked a SQL Server contention issue with my 128-connection test. I decided to try fewer client connections.

Here are the results:

There's clearly a SQL Server bottleneck that's being alleviated by reducing the number of connections and allowing the throughput to each array to increase slightly. With 8 connections, SQL Server isn't being driven hard enough and the elapsed time increases again, and this is reflected in the array throughput measurements too (a 10-15% drop compared to the 16-way test). One thing I forgot to do was examine the distribution of wait types while these tests were running, but my guess would be the bottleneck was in the transaction log manager.

Summary

By separating the network traffic and moving to two iSCSI NICs, I removed the network bottleneck I had (see the image at the bottom of the last post) and replaced it with a SQL Server bottleneck. Here's a snapshot of network utilization with the new setup:

 

In the next set of tests, I'm going to look at the effect of altering the transaction log auto-growth size, and pre-allocation size. In all the tests so far, the log has grown from the pre-allocated 256MB to somewhere between 6.5-8GB.

Should be interesting - stay tuned.

PS Yes, I'll be doing a bunch of stuff with the Fusion-io drives too - be patient! :-)

Christmas comes but once a year... really? Then mine just came early on this afternoon's UPS truck.

The very nice folks at Fusion-io just sent me two of their fully-loaded top-of-the-line ioDrive Duos with 640GB of solid-state flash memory in each. This is really extra-nice of them because on Dell's Small Business website they're currently retailing for $12800 *each*. Expensive? Yes. Worth it? That's what I'm hoping to prove.

There's nothing like expensive, pretty hardware to get me excited... here's what they look like:

Now, above I said 'expensive', and these are, but they pack some pretty amazing specs in terms of read/write bandwidth so you're getting a lot of bangs for you bucks. But where does it really make sense to drop the bucks for the biggest bangs? To answer that I'm planning to do a whole series of blog posts as part of my benchmarking efforts to investigate which operations can benefit the most from these drives. With 1.2TB of SSD storage I'll be able to plug these into one of my test systems here and run comparisons against 15k SCSI and 7.2k SATA drives.

Anyway, there's a lot of hype about the speed of SSDs, and also a lot of angst about SSDs not being Enterprise-ready. I don't agree with them not being Enterprise-ready - in fact, fellow-MVP Greg Linwood, who runs (among other things) our partner company SQLskills Australia, already has a bunch of customers with Fusion-io drives deployed in their enterprises successfully. As with any critical hardware infrastructure (especially cutting-edge stuff like this), the key to success is having everything setup correctly so I'll be blogging about all my experiences with them.

To summarize, I'm very excited! I've been wanting to get my hands on some serious SSD hardware for a couple of years now so I can do some *real* testing - it doesn't get better than this!

Shoot me an email or leave a comment if there's something you're interested in seeing tested.

PS Full disclosure: yes, of course Fusion-io sent me these because they're getting publicity from me blogging about them, but we don't have any editorial/veto agreement. I want to be able to recommend these to our enterprise clients and the only way to honestly do that is to play with them myself - so it's a win-win for both of us. And you guys get to test them vicariously through me, so it's a win-win for you too :-)

Stay tuned...

I've been spending some time reorganizing my electronic components and came across a serious blast from the past...

 

(Click the image for a larger version - 354Kb)

Left to right, from top down:

  • Zilog Z80 (popular in the Sinclair ZX computers)
  • Intel 8085 from 1980
  • AMD 8080 from 1977
  • Intel 8086 from 1978
  • Intel 8080 from 1979
  • Intel 8088 from 1981
  • Motorola 68000 (from the Atari ST and Commodore Amiga)

I guess I've kept these for nostalgia - I've got a bunch of 6502s but they're still in the computers. Checkout the Online Computer Museum (kind of slow) for a wonderful archive of nearly all old computers.

Enjoy!

Categories:
Hardware

I was reading a thread on SQL Server Central today where someone replied to a question and confused 823 with 832. Now, 823 is a nasty error to get - it says that an I/O operation failed at the OS level and the I/O subsystem is causing corruption - SQL Server didn't even get a chance to check the page checksum (and potentially raise an 824 error).

Error 832 is way worse. When you get one, you'll see an error like the following:

A page that should have been constant has changed (expected checksum: 1dcb28a7, actual checksum: 68c626bb, database 13, file 'E:\Program Files\microsoft sql server\MSSQL\data\BlahBlah.mdf', page (1:112644)). This usually indicates a memory failure or other hardware or OS corruption. 

This error occurs *AFTER* a page has been read into memory. Once the page passes all checks as it's read in from disk, it's a known clean page sitting in the buffer pool. When the time comes to update the page and mark it as a dirty page, the checksum is checked again, juuuust to make sure. If the checksum is no longer valid, error 832 is raised. This is saying that something apart from SQL Server stomped on the page while it was in SQL Server's memory - either hardware memory corruption (bad RAM), an OS memory-management bug, or a rogue process writing into SQL Server's memory space.

It's really bad. If memory diagnostics don't show up any issues, there's a way to start to track down what's going on, but only by enabling an expensive trace-flag at server startup that prevents untoward accesses to SQL Server's memory unless an exclusive page latch is held. And this should only be enabled under Product Support's supervision.

So - hopefully you'll never see this - they're very rare - I've only seen a handful in my time. If you do, run memory diagnostics, and if nothing shows up, call PSS to help you out.

Hope this helps.

(and yes, open invitation to send me any electronics/boards/drives to try if you work for a hardware manufacturer, I'll blog about it - address on our Contact page :-)

Part of being a Microsoft Regional Director is getting early access to information and private releases to play with. They also hand out free stuff occasionally. Yesterday Kimberly and I reach received a Windows 7 Sensor development board kit (and as Kimberly doesn't tinker with electronics, I effectively got two!) - these are the same ones that were being given out at PDC apparently.

The board looks pretty neat:

It has an ambient light sensor, a 3D accelerometer, and two touch-sensitive strips on. If you're interested you can buy these from FreeScale for $50 - see here (I have no affiliation with them at all). There's a slide deck from WinHEC that Gavin Gear presented that explains the new Sensor and Location Platform in Win7 - see here.

I'm looking forward to playing with mine and I'll blog about what I do with it.

PS You can get the latest Windows API Code Pack here.

Categories:
Electronics | Hardware

Last week's survey was another two-fold one - when you buy new servers, what architecture to you predominantly buy, and why?; when you buy new servers, which Edition of SQL Server do you predominantly buy, and why? Here are the results as of 5/17/2009.

 

For the first survey, the 'other' values were basically that 64-bit is purchased only if required. For the second survey, nine of the 'other' values were basically that Enterprise Edition is purchased only if required, one person uses Web Edition, and one person uses Developer Edition as they're just a developer.

I don't have a huge amount to say about these two surveys, I really just wanted to confirm my gut feel (and give you all a view of what's happening out there in the field).

For the architecture survey, I was entirely unsurprised to see that almost 80% of respondents are using 64-bit. It's widely known that 64-bit can give you better performance through the availability of more memory for SQL Server to use. Although you can use more than 4GB using AWE on 32-bit, that extra memory is only available for the buffer pool to use - not for general query processing - and AWE access does incur a little overhead. Much has been written about this and I'm not going to duplicate it here. A selection of articles can easily be found at http://www.google.com/search?hl=en&q=sql+server+64+bit+vs+32+bit&aq=1&oq=sql+server+64 Smile More interesting is the small number of respondents who are not able to use 64-bit, either because their corporate policy is 32-bit (maybe because of cost?), they have software that doesn't run on 64-bit systems, or specifically because 64-bit is too expensive.

A few people only use 64-bit if required, and (I imagine) prefer to save money if it not by sticking with 32-bit. I wonder how long it will be until 32-bit servers are not available at all? Certainly future versions of Microsoft server software are starting to become 64-bit only - for instance Exchange 2010 (see here) and SharePoint Server 2010 (see here), will the next version of SQL Server be 64-bit only?

For the Edition survey, again, unsurprising results. 55% use Enterprise Edition for one or a combination of the various "-abilities" (if you make performancability a new word), with almost 10% more having the option to use it if required, and 16% having to stay with Standard because of budgetary constraints. What you may be surprised to see (but I wasn't from my time working side-by-side with the SQL marketing team) is that 20% of respondents don't need Enterprise Edition. There were a lot of improvements to the database engine in SQL Server 2000 and SQL Server 2005 that meant that for intermediate workloads (hmm - just made that up, but you know what I'm trying to say), Enterprise Edition isn't needed. And with synchronous database mirroring available in Standard Edition, you can implement a great, low-cost high-availability plan without paying for Enterprise. As you can see, I don't get any kickbacks from the SQL team for selling Enterprise Edition Smile Saying that, however, there are a lot of very cool features in Enterprise Edition in the various versions - but if you don't *need* them, why pay for the higher Edition just for the sake of it?

One thing I will say to summarize, it's important to look at your requirements in terms of performance, availability, etc to make the right choice of server architecture and SQL Server Edition. If you make the wrong choice, do you think your company will pay to rectify your mistake before the next round of capital-expenditure? Probably not - but you'll definitely pay for the wrong choice with several years worth of hassle trying to make the system be more performant and easily recoverable than your choices allow.

As always, thanks for participating in the surveys - I've had a bunch of mail from people who like to see what other people are doing.

Next up - this week's survey!

Categories:
General | Hardware | Surveys

I've just been setting up some of our new hardware, and wanted to do some background reading to ensure I use the correct disk partition offset, RAID stripe size, and NTFS allocation unit size to enable the best possible performance for the volumes I'm creating.

You may not of heard about this (or your disk admins may not have heard about this) but on Windows Server 2003 and before, the default partition offset typically causes worse-than-optimal performance - and correcting it can get gains of maybe as high as 30% in terms of IO latency and duration. The SQLCAT team have just published a *fantastic* whitepaper (written by Jimmy May and Denny Lee) which explains the issue simply and clearly and shows you how to correct it. You should checkout the whitepaper at Disk Partition Alignment Best Practices for SQL Server.

The summary is that on Windows Server 2003 and before, the default partition offset is 31.5KB (63 x 512byte disk sectors), which does not align nicely with the common RAID stripe sizes of 64K or 128K, or the optimal NTFS allocation unit size of 64KB. This can lead to having to read/write multiple stripes every so often and a big perf drop. It can be fixed, as detailed in the whitepaper. For volumes *created* on Windows Server 2008, the problem does not exist as it creates a default partition offset of 1024KB.

In fact Jimmy just published a blog post to help you make the case to your disk admins/customers a few days ago: Disk Partition Alignment (Sector Alignment): Make the Case: Save Hundreds of Thousands of Dollars.

Luckily I'm using Windows Server 2008, which correctly sets the disk partition for the vast majority of cases.

Next thing I considered was RAID stripe size and NTFS allocation unit size (previously known as 'cluster size'). Kendal Van Dyke just published an *excellent* blog post series that provides a lot of empirical evidence as to what the best numbers are for the RAID level you're using. This saved me a lot of time. Check out his series at Disk Performance Hands On Series.

The Dell MD3000i units I'm using don't go any lower than 128KB for a RAID stripe size, so the default is fine. Unfortunately, I forgot to set the NTFS allocation unit size to 64KB when creating the partitions in Windows, so I need to recreate the partitions.

A massive thank-you to these guys for saving me a lot of time and hassle. You should go read this stuff too.

At the last few conferences I've presented at, there have been questions about using SSDs (Solid-State Drives) for enterprise storage and whether that will change some of the database maintenance practices. My answer to that is "I don't know" (ha - bet you thought I was going to say "It depends!") because adoption of SSDs is very low. I haven't been able to find much info about using them, but the Microsoft Research group in Cambridge just published a research paper Migrating Sever Storage to SSDs: Analysis of Tradeoffs, which does a nice job of walking through the issues involved and concludes that for the majority of workloads, it makes more economic sense to host them on HDDs. The exception is for top-end OLTP databases. I'll warn you that this isn't a whitepaper - it's a research paper, and gets a bit deep into algorithms and mathematical analyses, but if you're up to the challenge it's a great read.

You can download the paper from http://research.microsoft.com/en-us/um/people/antr/ms/ssd.pdf. Enjoy!

PS I found it on James Hamilton's blog.

Theme design by Nukeation based on Jelle Druyts