The Depth of the MCM Videos

I got a great question via email today.  Paraphrased: "The two MCMs I know, know SQL Server to a depth much deeper than the videos suggest you need to know.  What's the real depth that you need to know to pass the MCM?"

I'll illustrate it with my SQLOS video.  Back when the Microsoft Certified Master class consisted of a 3-week in-person event in Redmond, most of a day was dedicated to SQLOS.  My video, on the other hand, is only 24 minutes long.  Think of SQLOS as a pool.  I dove off the diving board, went straight down CPU scheduling, and touched the bottom of what you need to know on that one particular area of the pool.  I came back up and got out of the pool – but I didn't swim a lap, and I didn't dive all the way down in all of the parts of the pool.  I didn't touch memory clerks, lock management, or the buffer pool, for example.  That's where the recommended reading list and SQLskills training courses come in – there's much more you need to know, but we don't have videos to cover all of it.  Think about it this way – there's only 40 hours of videos, but SQLskills offers four weeks of MCM training.  If you could learn everything you needed to know in 40 hours of videos, we wouldn't sell a single seat of training – but we do.

Now, about the depth of MCMs you've interacted with – I can go on for days on CPU scheduling.  I think it's one of the sexiest topics out there.  Whenever Anandtech publishes their analysis of an upcoming processor, I pore through it repeatedly.  That's one of my strong points – but I can't cover every topic to that depth.  There are plenty of MCM subject topics that I understand well enough to be a Master, but I don't understand well enough to teach at the Master level.  Frankly, there are even a few topics that I don't understand well enough to be a Master <cough>CLR</cough> – but thankfully you don't need a 100% score in order to pass.  When you see a Master presenting, and you see their level of knowledge on that topic, remember that it's one of their specialties.

Finally, not all the areas of the pool are the same depth.  On relatively-new features with minimal adoption like Policy-Based Management or Resource Governor, you may not need to know the internals of how it works inside the engine, but you have to know the externals well – how you access it and manage it.  On time-tested topics like how a table is stored on disk, you gotta know internals, even the stuff that isn't exposed to end users, because that hasn't changed dramatically in a decade and it impacts everything you do.

I hear what you're asking, for: you're asking for depth markers on each area of the pool, and a clear definition of what that depth means.  I wish it was as easy as a single integer, like "The Clustering module is 2 years of experience deep," but we both know that 2 years of experience means very different things to different people.  If I had to boil it down to a single integer per module, I would probably try to use the number of projects you've done with that particular feature, but even that doesn't work with things like storage internals.  Over time, as the certification becomes more mature, I'm sure we'll see skills checklists and study guides just like we see for the MCITP.  Until then, the best thing I can point you to is the SQLskills Immersions training agendas.  Take each week of training, look at the modules, and look at the time allocated.  Think about the depth we go to in a given amount of time on the free MCM videos, and we go to that same depth across each module in person.  For example, in the Performance Tuning week, one of the modules covers SQLOS, and each module is roughly 2-4 hours long.  You saw 24 minutes on CPU scheduling – now take that same depth across the other topics in that module.

DBAs, Here’s Your Task for the Holidays

The holidays can be a slow time in IT. Users aren't asking for as many things and projects are going a little slower.  Now is your time to strike.  Open your email client right now, start an email to your SAN admin, and copy/paste the below in. Hey – it's not often that I actually tell you to plagiarize my work and sign your name on the bottom. Do it while you can.

Dear SAN Administrator,

Wow, the SAN's running good today! No performance issues at all. Has the SAN been losing weight? Great work.

While I've got your attention, can you do me a favor? The next time you get some spare drives, can you put them in a RAID 5, attach them to my development server, and shoot me an email? No advance notice required whatsoever. LUN size or configuration doesn't matter, and I'll take whatever you give me. No, I don't want to keep them – you can pull them back off 24 hours later without even asking me. Promise.

I want to test restoring my production database over there as if I had to do a disaster recovery. I've already got everything scripted out, but practicing it keeps me on my feet. If I can't do it in 24 hours, I fail anyway, and more time isn't going to help, so you can pull the drives after a day without even telling me. If I finish in under 24 hours, good for me – I'll run some stuff like DBCC to make sure my data's okay, but that's just a bonus.

I really appreciate it. You're a lifesaver. And you look great, by the way – is that a new haircut or something?


Stop Tuning Your Queries.

Most of my performance tuning is done in small groups – a developer is frustrated because his query isn't fast enough, so the developer and the DBA get together to show me the code in question. If they hand me a stored procedure with a SELECT statement to tune, the first thing I do is read through the comments. That usually only takes two seconds because so few people write comments, so then we spend the next few minutes trying to reverse engineer what data they're after. I talk through it aloud:

"It looks like we're trying to retrieve the widget sales by salesperson for the last 90 days?"

I want to express it in plain English because sometimes the query isn't actually getting the right data. For example, in one of my recent sessions, the project manager immediately turned to the developer and said, "No, that's not right – it needs to be by calendar month for the last 3 months." Turning the query into human-understandable English lets you bridge that gap between the coders and the businesspeople. At that point, I can avoid optimizing the query, because they're going to go rewrite it anyway. If the query is indeed right, I follow up with another question:

"Are these the right tables to get widget sales grouped by salesperson and by month?"

That makes everyone in the room stop and think. Maybe there's already a reporting structure for this. Maybe there's a more narrower set of tables I can use to achieve that goal. If these are indeed the right tables, then I start by examining every table involved in the query to see if I can rip it out altogether. I don't change the output fields – I just rip out tables. For example, take this query:

SELECT c.CustomerID, SUM(s.SalesAmount) AS TotalSalesAmount
FROM dbo.Customers c
INNER JOIN dbo.Sales s ON c.CustomerID = s.CustomerID
INNER JOIN dbo.SalesRegions sr ON s.SalesRegionID = sr.SalesRegionID
LEFT OUTER JOIN dbo.SalesPeople sp ON s.SalesPersonID = sp.SalesPersonID
GROUP BY c.CustomerID

The query isn't doing anything with dbo.SalesPeople, and because it's a left outer join, it isn't required in the query. Removing it won't change the results. On the other hand, I can't simply rip out dbo.SalesRegions even though nothing's being selected or grouped by any of its fields. It's part of an INNER JOIN, so that means this query will only return results from dbo.Sales that have a legit SalesRegionID. (Bonus question for the comments: what properties of Sales.SalesRegionID would tell me that I could remove the join to SalesRegions without breaking the query?) Next, I can remove the dbo.Customers table too. If I'm grouping by CustomerID, I don't need to touch the dbo.Customers table at all. Since CustomerID is part of the join and it's on both tables, I can use Sales.CustomerID instead. So here's the rewritten query:

SELECT s.CustomerID, SUM(s.SalesAmount) AS TotalSalesAmount
FROM dbo.Sales s
INNER JOIN dbo.SalesRegions sr ON s.SalesRegionID = sr.SalesRegionID
GROUP BY s.CustomerID
If the query had included fields that were only available on the Customers table, like c.CustomerName, then I wouldn't have been able to use this technique. You'd be surprised at how often people copy/paste queries and then modify them to serve a different purpose, but they forget to strip all the extraneous tables out of the joins. (I'm certainly surprised.) This technique reminds me of the Sex Panther Cologne in the movie Anchorman: The Legend of Ron Burgundy, "They've done studies, you know. 60% of the time, it works every time."

I check the query costs after removing extra tables, and if it's not significantly lower, then I could start really tuning the query – but nope. Not gonna do it. Wouldn't be prudent. I ask:

"Are there similar tables that already aggregate the data differently? Can we deliver that data to the users instead?"

I want the business users, project managers, and pointy-headed bosses to stop for a minute and think about what the report really needs. In one recent tuning session, the developer said, "Well, now that you mention it, I'm just showing five other users who bookmarked this same report. It doesn't matter which five they are." Bingo! That tells me I've got the opportunity to rewrite their query in a way that takes advantage of existing indexes. Before, they might have been sorting for the top 5 users ordered by login date, but I could sort them by some other field where I've created an index.

If the results of the query can't be changed, I'm still not out of options. My next question is:

"How old can this data be? Can it be from 5 minutes ago? From last night?"

The fastest query is the one you never run. The second fastest is the one you were able to run ahead of time. The ability to calculate (or pre-cook) the data after-hours gives us tremendous flexibility. Heck, sometimes my users even get excited at this concept, because they've always wanted to see a static set of results as of this morning's numbers so that they can consistently refer to them throughout the day. I can then introduce the option of aggregated reporting tables calculated overnight via SSIS, T-SQL jobs, or a monkey with a keyboard.

At the end of this discussion, I may still need to tune the query, but asking these four simple questions only takes a couple of minutes and may open up a world of architecture tuning options. The faster you want to scale your application, the tougher you have to be about asking these questions. In some environments, I have to put my foot down and say, "I'm sorry, but you can't run this query in production no matter how fast it is. We have to explore the other options from these four questions."

What I’m Telling My Clients About SQL Server Denali

My consulting clients don’t have the time to read all the latest blog posts with news about SQL Server, so after a big event, I like to boil down the announcements to a few key paragraphs. I sum up things they care about and try to put it in a business-friendly perspective. Here’s what I just sent my clients about Denali, the next version of SQL Server:

This week at the PASS Summit in Seattle, Microsoft took the wraps off the next version of SQL Server – code name Denali. They gave the community an early preview build and showed us some new features that will have a big impact on how we scale SQL Server, plan high availability and disaster recovery, build BI solutions, and more. There’s a heck of a lot to cover here, and I’m going to try to boil it down to the main things I want you to be aware of when you’re thinking about your 2011 architecture plans and budgets. Microsoft isn’t talking release dates yet, but don’t plan on this coming in the next 6 months. Forgive me for typos in here – I wrote this on my iPad on the plane ride home, so it’s a miracle that any words are spelled correctly.

If You Need To Handle More Load

The new AlwaysOn features will let us scale SQL Server in a wild new way. One server is designated as the primary production server, and all writes happen on that server. Up to four additional servers can be added as secondaries, and we get read-only access to them, including the ability to run our backups on the read-only servers instead of the production box. If the primary production server fails, one of the secondary boxes takes over as the new primary production instance.

This is by far my favorite feature added to SQL Server in a very long time. It helps me solve a lot of your scaling and performance challenges with no disruption to your applications. I’ve been steering developers toward using multiple connection strings: one for writes, one for reads, and one for delayed reads (like historical reporting). expect to hear this from me a lot more over the coming months, because implementing this in your code today will put you in an amazing position when the new SQL Server ships. You’re going to be able to solve your very toughest performance challenges by writing checks instead of writing code – and sometimes that’s a better answer.

If You Need Easier High Availability & Disaster Recovery

The AlwaysOn features I mentioned above also have a lot of neat tricks in store for HA/DR. In the past, I’ve had to architect solutions requiring multiple pieces like clustering, database mirroring, log shipping, and replication. AlwaysOn brings the best of these different technologies into one feature and can replace all of them in certain environments.

A typical AlwaysOn HA/DR environment will probably look like this:

  • Server A – the production server
  • Server B – nearby server in the same datacenter with synchronous updates
  • Server C – nearby server with asynchronous updates, so it may be a few minutes behind, and users connect here to run reports
  • Server D – disaster recovery server in another datacenter with asynchronous updates. If the primary datacenter completely fails, this is our backup plan.

You might be doing something like this already using a variety of technologies, but now we’re going to be able to fail over multiple databases in groups, fail over the associated logins and jobs, and manage it all under one pane of glass in a way that won’t intimidate non-DBAs.

If You’re Considering Different Data Warehouse Vendors

If you’ve been tempted to switch to one of the in-memory or columnar-based databases that promise blazing speed for huge data warehouses, Microsoft is taking a big step to keep you around. They’ve added columnar indexes, which is the technology that enables multi-billion-row table scans to happen in seconds instead of minutes. This new index type is aimed at data warehouses, not OLTP systems, and it’s best suited for systems that do batch loading and partitioning. If you’ve got a database with nightly loads, this index type is a good reason to consider implementing partition switching for ETL loads due to the way the columnar index is created and updated.

If You Need To Build BI Cubes

Microsoft showed their roadmap, and there’s some vibrant discussion in the BI community about what it means. At first glance, it appears that Microsoft is gradually abandoning SQL Server Analysis Server cubes in favor of the technology behind PowerPivot, their Excel-based tool with in-memory databases stored differently than either traditional databases or SSAS cubes. Blogger Chris Webb summed it up well, and Microsoft’s Amir Netz posted his spin on it in the comments at

What this means for you is that if you’re using SSAS today, you might want to take a breather to examine the new technologies. We’re at an awkward time – the new stuff isn’t out yet and will take some time to be able to handle big analysis tasks. The old stuff still works great, but the writing is on the wall, and there doesn’t appear to be an easy way to migrate your BI infrastructure to this new storage methodology. It reminds me of Microsoft’s transition from DTS to SSIS in 2005 – they didn’t give us a good upgrade path, and as a result companies had to pour resources into converting their DTS packages to SSIS by hand. Many of my clients just left their ETL processes in DTS rather than reinvest, and they’re still working fine. This may be a good approach for your SSAS cubes for the next 12-24 months too – wait it out rather than panicking just yet. It also reminds me of Microsoft’s recent bugaboo with Silverlight. They said they were going to migrate mobile apps to HTML5 because Apple’s devices (with their huge tablet market share) won’t support Silverlight anytime soon. Some people read that as, “Silverlight is dead,” at which point Microsoft had to do a ton of backtracking. It’s still not clear what the real answer is.

If You Need to Store Files in SQL Server

If your apps need to upload documents or get transactional consistency for those documents, your options have been pretty crappy in past versions of SQL Server. I’ve historically told folks to avoid storing binaries like Word docs, images, and XML files inside the database – rather, store pointers to files on a Windows DFS file share instead. Well, now we’ve got another option – FileTable – but I’m not convinced that this isn’t crappy yet either. In theory, you can simply enable this feature, set up a file share on the SQL Server, and any files written to that share will be stored in the database. Imagine running an UPDATE statement in SQL and changing attributes of files – pretty powerful stuff.

Readers with gray hair like me might remember the WinFS and Cairo stuff from about a decade ago when Microsoft announced that SQL Server would be the storage engine for Windows. That never came to fruition, but the code and the staff stuck around, and today they’re rising from the dead like a zombie. There’s a chance this feature will work, but the DBA in me wants to hit it in the head with a shovel. I know some of you are going to want this, though, so I’m going to investigate it as they build it out, and I’ll give it an open mind. (Braaaaains…)

If You Want to Be a Microsoft Certified Master

Big news here! For the last several months, my blogging frequency has gone way down because I’ve been working with Microsoft, my cohorts at SQLskills, and a handful of industry experts to help reinvent the MCM program. This week I’m finally able to share the fruits of our labor.

To get my MCM, I had to spend 3 weeks onsite at Microsoft in Seattle in order to take several exams and a lab. Now, the training and the tests have been split out. Candidates take a multiple-choice exam at Prometric, and if they pass, they can take the 6-hour hands-on qualifying lab to become a Master! It’s just that easy! Okay, well, the test and the lab aren’t exactly easy, but now I expect many more highly qualified and experienced SQL Server pros to give it a shot. You could try to take the tests cold, but at $500 per test and $2,000 for the lab, it’s not a wise bet. Plus, each retake costs the same, and there’s a 90-day waiting period between retakes.

To increase your chances of passing, you’re going to want to fill in your gaps with training, and that’s where SQLskills comes in. Paul and Kim’s one-week Immersions training has always been roughly the equivalent of the first week of MCM training – if you’ve already taken that, you’re one third of the way there. We’re announcing new Immersions training events in 2011 that will cover the remainder of the MCM training like storage, tuning, clustering, CLR, XML, security, and more. Pricing for these events will be roughly the same as our existing Immersions events – figure around $3,000 per week plus travel & expenses.

If you’re already an expert in a particular subject area, maybe you want to try self-studying by watching the videos that Bob, Kim, Paul, and I produced for Microsoft. You can watch them for free here to get an idea of the kind of depth you need to have on each topic:

If you’re considering a run at the MCM, let me know and I can help you assess what studying route you should take.

If You Want More Information

It’s way too early to talk training – in fact, I scheduled the entire month of December off so I could write MCM and SQL training for next year – but it’s the right time to talk long term plans. I’m doing 2-4 hour architectural briefings to talk about what the next version means to you, your development plans, and your hardware budgets for next year. It’s probably too late to make any dramatic budget changes for FY2011, but if you’ve got any wiggle room, you might want to get a quick heads-up. To schedule one of these, drop me an email.

SQL Server Virtualization: Mo Cores, Mo Problems

Non-Uniform Memory Access (NUMA) is a type of computer design popping up more and more these days. In simple terms, it means that when a processor core accesses different areas of the server’s memory, not all memory will be accessed the same way. Some memory may be faster or slower depending on where it’s wired up.

Let’s get started by looking at a truly different kind of server: the IBM 3950. It’s a rack-mount server with four CPU sockets, which means we can plug in four Intel Xeon processors. It has 32 DIMM slots for memory, so we can stuff it with up to a terabyte of memory. When we plug in a keyboard, mouse, and monitor, and we push the power button, our operating system boots up. Simple enough so far.

Things start to get complicated with the special scalability ports. You can wire two or more these together, populate them both with CPUs and memory, and they become a larger server. The picture below shows two IBM x445’s on top of each other for visualization purposes – the 445 is an older relative of the 3950, and it had the same capabilities.

IBM 445s

When you hit the power button on one, both 3950’s light up simultaneously, and the BIOS boot screen shows 8 CPUs and the total amount of memory in both servers. You can install Windows (or VMware or whatever) and the whole server acts as one pool of resources. Well, not exactly the same pool – more like two pools, to be more specific, because when a CPU in one of the 3950’s needs to access something stored in the other 3950’s memory, it has to travel through that scalability cable that connects the two boxes together. There’s a delay – small, but measurable – involved with this cross-server access. We’re accessing memory that isn’t on our motherboard – it’s over in another server next door.

NUMA-aware applications like SQL Server recognize that not all memory is created equal. SQL Server knows which memory lives closest to each CPU and tries to make scheduling decisions based on the server’s architecture.

Unfortunately, all this goes out the window with virtualization. Depending on load and the sysadmin’s decisions, your virtual SQL Server can move around not just from core to core, but from host to host live in real time. In the case of our IBM 3950’s, our SQL Server could be running on the top server one minute, and on the bottom server the next! If our SQL Server has 32GB of memory living in the top box, but we’re scheduling threads on the CPUs in the bottom box, we’re going to pay a penalty every time we try to grab things off memory from the “wrong” box.

Newer hypervisors try to patch things up by being aware of where the threads are getting scheduled versus where the memory lives. When VMware vSphere moves a virtual machine to a different NUMA node, it copies the virtual machine’s memory over to the new NUMA node in the background to gradually eliminate the overhead associated with accessing memory across nodes. (This is yet another reason why I push clients to upgrade their hypervisors as fast as possible.) Does your hypervisor support that, and to be more specific, does it support that feature on your physical hardware? Only your vendor and your hypervisor know for sure, and you have to constantly read the release notes of each hypervisor version to figure things out. Call me crazy, but I love doing stuff like that, and that’s how I learn the most about what WASN’T supported in the last version of the hypervisor. When a new version brags about doing something to get more performance from NUMA configurations, that tells me the previous version didn’t have that capability – and maybe its competitors don’t either.

The easiest answer today is to simply avoid big servers as hypervisors – there’s no real need to use a single 8-socket host instead of a 2-socket pizza box or blade. The bigger hosts are more expensive to buy and offer less flexibility. When you toss in the performance risks of hypervisors and big virtual machines, it’s no contest. Smaller is better.

Here’s a few links to learn more about NUMA, CPU scheduling, and virtualization on big servers:

SQL Server Virtualization: Get Your Hands Off That MAXDOP

Virtualization makes a sysadmin's life easier by abstracting away the hardware. The hypervisor (like VMware ESX, Microsoft Hyper-V, or Citrix Xen) sits as a layer between the physical server and the virtual server, offering a set of general hardware services like network cards, video cards, SCSI adapters, and so on. In Device Manager, these drivers look like the real thing, but like a child watching a magician, it's hard for us to tell the real thing from the fakes:

Watch Me Saw Your Hardware In Half

Note that I said this makes a sysadmin's life easier.
I didn't say this makes a database administrator's life easier.
In fact, it's quite the opposite. For the last decade, each new version of SQL Server has gotten closer to the metal, integrating closer with the underlying hardware in order to gain faster performance. No other Microsoft product offers settings like affinity masking, MAXDOP, and its own CPU scheduling, and DBAs have grown accustomed to tweaking these settings in order to wring every last bit of performance out of their servers.

To tune performance for virtual SQL Servers, we have to know more about this mysterious hypervisor layer that purports to abstract our hardware away. We need to learn about this gauzy fabric, understand what it's doing right, see what it struggles with, and adapt our tuning techniques to work with the hypervisor instead of against it. In the next few posts, I'll show you some of the most important concepts to help you get started, and I'll point you to resources to help learn more. Today's post starts with how to configure various SQL Server CPU options under virtual environments.

How to Configure Max Degrees of Parallelism (Maxdop)

When a query executes, the SQL Server engine's Query Optimizer decides whether any tasks in that plan would run faster if they were broken out into multiple tasks. It can break those tasks out into several parallel tasks that run simultaneously to take advantage of multiple idle cores. The sp_configure setting for "Max Degree of Parallelism" (BOL) controls the maximum number of cores that will get used by default. This isn't just a server-level setting either – ambitious developers have been known to code their T-SQL statements with a MAXDOP hint. Parallelism is discussed in chapter 5, CPU and Query Processing, of our book, Professional SQL Server 2008 Internals and Troubleshooting.

The Internet is chock full of advice on how to set MAXDOP. Got an OLTP server? Set MAXDOP to 1 to avoid problems with runaway queries. Got a NUMA-equipped server? Set it to the max number of cores in a particular NUMA node. Got hyperthreading? Well, you should turn it off, don'tcha know, but –

See, stop right there. Even though the Internet is just a jumbled mass of electrons, it still respects Newton's First Law of Motion: an object at rest tends to stay at rest. Once a piece of advice is written on the Internet, it just sits there. It doesn't take into account the blazing fast rate of change of hardware and software. Most of the advice out there was written long before virtualization even caught on in the datacenter, before Intel's Nehalem processors unveiled a new generation of hyperthreading, and before NUMA servers were common. This advice is just wrong, but since advice doesn't come with an expiration date, it's up to you to ask questions about what you're reading – including this very post – is still relevant right now.

The Problem with CPU Settings in Virtualization

First, before you change any configuration setting, you need to know with certainty that you're solving a problem. Blindly changing configuration options is a surefire recipe for unpredictable and unreliable performance. You also need to know that the best way of solving that problem lies in changing SQL Server – and in virtual environments, that's often not the case.
Before tweaking any SQL Server settings, start by asking whether the virtual server itself has the right number of virtual CPU cores. The sysadmin can build virtual servers with any number of virtual cores, and those numbers don't need to have any relation to the number of physical cores in the underlying hardware.

Even when you know the exact host hardware configuration, your knowledge is only a snapshot in time. With tools like VMware vMotion, Microsoft Live Migration, and Citrix XenMotion, your sysadmin can move your virtual server from one host to another in real time without you knowing. If you haven't used one of these technologies yourself, it can sound like smoke and mirrors, but it really does work when set up correctly. You don't have to stop the SQL Server services or get everyone out of the application. Some TCP/IP packets can get dropped in the process, and it's not wise to perform a vMotion during extremely heavy loads, but I'll address that concept in a later post.

The important thing to know is that the new host hardware may have:

  • Faster or slower CPUs
  • More or less CPUs or cores
  • Hyperthreading turned on or off
  • More or less virtual servers sharing those same cores

This move might even happen when no one is looking, because technologies like VMware Distributed Resource Scheduling will automatically move guests around to different hosts based on performance needs. Whether or not these tools should be used with SQL Server is a separate discussion, but the fact that they can be used transparently is enough for us to stop and ask if we're solving the right problem with the right tools when we set MAXDOP or affinity masking. We DBAs like to make our CPU configuration decisions by monitoring CPU % metrics with Perfmon or Task Manager, but there's our next problem.

Task Manager is a dirty, filthy liar – CPU % is absolutely meaningless on virtual servers. If the server is running at 100% CPU, that doesn't mean the server is actually doing any work. It just means the server is using 100% of the CPU cycles available to it, which is determined by other running guests, throttled limits set by the virtualization sysadmin, or hypervisor CPU scheduling issues.

The Real Question: How Many vCores Do We Need?

Rather than making fine-grained CPU settings at the SQL Server or query level, we need to take a step back and look at what the hypervisor is doing to our SQL Server CPU cycles. In my next post, I'll talk about CPU scheduling under the different hypervisors and why you need to start learning about NUMA.