Earlier today there was a question on SQL Server Central where someone wanted to know what could be causing so many reads on their transaction log. I was asked to chime in by fellow MVP Jonathan Kehayias (who also sent me some questions that I've answered in this post - thanks Jon!), so I did, with a list of everything I could think of. I thought it would make for a good post, so here it is, with a few more things I remembered while writing the post.

Before I start, if you're not comfortable talking log records and transaction log architecture, see my TechNet Magazine article on Understanding Logging and Recovery, which explains everything clearly, including how having too many VLFs can affect operations on the log that have to scan VLFs.

Each of these things can cause reads of the log:

  • Transaction rollback: when a transaction has to roll back (either because you say ROLLBACK TRAN or something goes wrong and SQL Server aborts the transaction), the log records describing what happened in the transaction have to be read so that their effects can be removed from the database. This is explained in the TechNet Magazine article. Note that it doesn't matter if you're using explicit transactions or not (i.e. BEGIN TRAN), SQL Server always starts a transaction for you (called an implicit transaction) so that it can put a boundary on what needs to be rolled back in case of a failure.
  • Crash recovery: crash recovery must read the transaction log to figure out what to do with all the log records in the active portion of the log (all the way back to the earlier of the most recent checkpoint or the start of the oldest active transaction). The log is read twice - once going forward from that oldest point (called the REDO phase) and then going backwards (called the UNDO phase). Again, this is explained in great depth in the article.
  • Creating a database snapshot: a database snapshot is a point-in-time view of a database. What's more, it's a transactionally consistent point-in-time view of a database - which means that, essentially, crash recovery must be run on the real database to create the transactionally consistent view. The crash recovery is run into the database snapshot, the real database isn't affected - apart from having all the active transaction log read so that crash recovery can run.
  • Running DBCC CHECKDB: creates a database snapshot by default on 2005 onwards, and runs the consistency checks on the snapshot. See above. There's a much more detailed description, including how this worked in 2000, in the first part of the 10-page blog post CHECKDB From Every Angle: Complete description of all CHECKDB stages.
  • Transaction log backups: this one's kind of obvious. A transaction log backup contains all the transaction log records generated since the last log backup finished (or since the log backup chain was established). To back up the log it has to read it. What's not so obvious is that a log backup will also scan through all the VLFs in the log to see if any active ones can be made inactive (called clearing or truncating the log - both misnomers as nothing is cleared and nothing is truncated). See my TechNet Magazine article on Understanding SQL Server Backups and in the blog post Importance of proper transaction log size management.
  • Any kind of data backup: (full/differential backup of a file/filegroup/database). Yup - data backups always include transaction log - so the backup can be restored and give you a transactionally consistent view of the database. See Debunking a couple of myths around full database backups and More on how much transaction log a full backup includes for details if you don't believe me.
  • Transactional replication: transactional replication works by harvesting committed transactions from the transaction log of the publication database (and then sending them to the subscriber(s) via the distribution database - beyond the scope of this post). This is done by the Log Reader Agent job, running from the Distributor. It needs to read all the log records generated in the publication database, even if they're nothing to do with the publications. More log equals more reads. My whitepaper on combining database mirroring and transactional replication in 2008 has more details on this stuff, as does Books Online.
  • Change data capture (in 2008): CDC uses the transactional replication log reader agent to harvest changes from the transaction log. See above. This means the CDC can cause the log to not be able to clear properly, just like transactional replication or database mirroring - see my blog post Search Engine Q&A #1: Running out of transaction log space for more details. Note the I didn't say Change Tracking - it uses a totally different mechanism - see my TechNet Magazine article on Tracking Changes in Your Enterprise Database for more details.
  • Database mirroring: DBM works by sending physical log records from the principal to the mirror database. If the mirroring sessions drops out of the SYNCHRONIZED state, then the log records won't be able to be read from memory and the mirroring subsystem will have to get them from disk - causing log reads. This can happen if you're running asynchronous mirroring (where you're specifically allowing for this), or if something went wrong while running synchronous mirroring (e.g. the network link between the principal and mirror dropped out, and a witness wasn't configured or the principal could still see the witness - again, beyond the scope of this post). Regardless, this is called having a SEND queue on the principal.
  • Restoring a backup: whenever backups are restored, even is you've said WITH NORECOVERY, the REDO portion of recovery is run for each restore, which reads the log.
  • Restoring a log backup using WITH STANDBY: in this case, you've essentially said you'd like recovery to run, but not to affect the transaction log itself. Running recovery has to read the log. For more info on using WITH RECOVERY, NORECOVERY, or STANDBY, see my latest TechNet Magazine article on Recovering from Disasters Using Backups, which explains how restores work.
  • A checkpoint, in the SIMPLE recovery mode only: see my blog post How do checkpoints work and what gets logged for a description of what checkpoints are and what they do. In the SIMPLE recovery mode, checkpoints are responsible for clearing the log (described with links above) so must read through all the VLFs to see which can be marked inactive.
  • When processing a DML trigger (on 2000): (thanks to Clay Lenhart for the comment that reminded me of this). In SQL Server 2000, the before and after tables that you can process in a DML trigger body are actually found from looking at the log records generated by the operation that caused the trigger to fire. My dev team changed this in 2005 to store the before and after tables using the version store, giving a big perf boost to DML trigger processing. 
  • Manually looking in the log (with DBCC LOG or the table-valued function fn_dblog): this one's pretty obvious.

Phew - a lot of things can cause log reads, the trick is knowing which one it is!

As you can see, there could be a lot of activity reading from your log as well as writing to it, which could cause an IO bottleneck. Make sure that the IO subsystem on which you place the log file (note: you don't get ANY performance benefit from having multiple log files) can handle the read and write workload the log demands. RAID 1 or RAID 10 with a bunch of spindles to spread the IOs out (note/warning/achtung: that's a big generalization - don't reply with a comment saying it's wrong because you've seen something different - different scenarios have different demands), and a proper RAID configuration (64k multiple for a stripe size, NTFS allocation unit size, volume partition alignment).

The 35-page whitepaper on high availability I wrote for the SQL team over the summer has been published on MSDN. It's a 2-300 level whitepaper that describes the various high-availability technologies in SQL Server 2008 and how they can be used to mitigate disasters. It's chock-full of links to other whitepapers, technical articles and Books Online sections and also presents my methodology for planning a high-availability strategy.

You can get it at http://msdn.microsoft.com/en-us/library/ee523927.aspx.

Enjoy!

Here's the table of contents:

  • Introduction
  • Causes of Downtime and Data Loss
    • Planned Downtime
    • Unplanned Downtime and Data Loss
  • Planning a High-Availability Strategy
    • Requirements
    • Limitations
    • Technology Evaluation
  • SQL Server 2008 High-Availability Technologies
    • Logging and Recovery
    • Backup, Restore, and Related Technologies
      • Partial Database Availability and Online Piecemeal Restore
      • Instant File Initialization
      • Mirrored Backups
      • Backup Checksums
      • Backup Compression
    • Other Single-Instance Technologies
      • Online Operations
      • Database Snapshots
      • Hot-Add Memory and CPU
      • Resource Governor
    • Multi-Instance Technologies
      • Log Shipping
      • Transactional Replication
      • Database Mirroring
      • Failover Clustering
      • Combining Multi-Instance Technologies
      • Virtualization
  • Mitigating the Causes of Downtime and Data Loss
  • High-Availability Features Supported by SQL Server 2008 Editions
  • Conclusion

A few weeks ago I kicked off a survey on how you add geo-redundancy to a failover cluster (see here for the survey). The results as of 8/26/09 are as follows:

 

So why is this interesting? Well, many people will suggest failover clustering as the best way to provide high-availability for a database (or group of databases). And it is a great technology to protect against server failure, but there's only a single copy of the database, which is the Achilles' heel of failover clustering. If that copy of the database is damaged, the application is down unless there's another copy of the database available. This is where providing geo-redundancy comes in. With that in mind, I'm surprised at the percentage of respondents that don't provide any geo-redundancy at all.

There are a bunch of options for providing a redundant copy of a database that is hosted on a failover cluster, with pros and cons to each, and that's what I'm going to spend the rest of this post on.

SAN replication: This is where the SAN hardware itself mirrors all write I/Os to a remote SAN, thus maintaining a remote copy of the database. The hardware has to provide disk-block size and write-order preservation; otherwise the database on the remote SAN could become corrupt. Imagine if write-ordering was not preserved and some data pages write I/Os were completed on the remote SAN before log records write I/Os (thereby breaking the write-ahead logging protocol) - recovery wouldn't be able to work properly! This mechanism requires a remote SAN, a second failover cluster connected to the remote SAN, a network including both clusters, and a big, fat network pipe between the two SANs. The bigness and fatness of the pipe depends, of course, on how many write I/Os are performed on the local SAN, and whether the SAN replication is synchronous or asynchronous.

Synchronous replication requires that the I/O is completed on the remote SAN and acknowledged back to the local SAN before the local I/O can be acknowledged to the local server. If the network bandwidth and latency can't support the volume of write I/Os trying to be replicated to the remote SAN, the I/Os will start to queue up and delays will be incurred on the local server. This will lead to the workload slowing down as SQL Server has to wait longer and longer for I/Os to complete. Now, with synchronous replication you have the guarantee that the remote copy of the database is completely in-sync with the local copy, so if a failure occurs, no committed data will be lost. If the network can't keep up though, you may have to switch to asynchronous replication. This means the local I/Os don't have to wait for the remote I/Os to complete, and so no performance penalty is incurred. BUT as the replication is now asynchronous, committed data may be lost if the local copy of the SAN is damaged.

Apart from the potential for performance problems with SAN replication, it's also very expensive - as another SAN, another cluster, and some beefy network hardware/bandwidth is required. This isn't a technology I'd expect a small company to be using or considering. Finally, the portion of the remote SAN that's being replicated to cannot be accessed at all. On the MAJOR plus side, all databases on the SAN are replicated at once, without having to setup a technology to provide a redundant copy of each. For application ecosystems that include multiple databases, this is what I like to recommend.

Log shipping: This is the simplest way to maintain a redundant copy of the database - it's just backup log, copy, restore log; repeat. It works seamlessly with failover clustering and is really easy to setup and maintain. The only problem with this is that you open yourself up to data loss, as a log shipping secondary is usually not right up-to-date with the primary. You can use the secondary for reporting/querying by restoring the log backup WITH STANDBY (which requires a little more configuration, but not much), and you can protect against accidental data damage by having a secondary with a load-delay configured, so the database is, say, 8 hours behind the primary. In my experience, this is the most common technology that's used in conjunction with failover clustering as it's the cheapest and easiest. On the downside, it's a single database solution so its not suitable for complicated application ecosystems.

Transactional replication: This isn't very commonly used at all, although again, it works seamlessly with failover clustering after a failover. The reason this isn't used very often for geo-redundancy is that transactional replication doesn't provide database-level redundancy, only table-level. It's also much more complicated to setup and troubleshoot when things go wrong, plus there's varying latency between a transaction committing in the publication database and it being applied to the subscription database(s).

Database mirroring: Database mirroring is the only technology apart from SAN replication that can provide a zero data-loss solution when configured for synchronous operation. It works by shipping the log records from a database rather than the raw I/Os, so doesn't require anywhere near as much capital expenditure, but the network has to be able to cope with sending the log generated on the principal, otherwise performance on the principal can be affected. Mirroring is relatively easy to setup and maintain, and the mirror database can only be accessed, but only through a database snapshot. When combined with failover clustering, you need to be careful about setting the mirroring partner timeout, so that the local failover cluster gets a chance to fail over before mirroring does. Checkout my blog post on this: Search Engine Q&A #3: Database mirroring failover types and partner timeouts. You can configure database mirroring for synchronous or asychronous operation, with the same performance and data-loss exposure caveats as SAN replication. SQL Server 2008 provides log stream compression and automatic page repair, which make this more attractive (see SQL Server 2008: Performance boost for Database Mirroring and SQL Server 2008: Automatic Page Repair with Database Mirroring, respectively), but only supports a single database. I'm seeing this combination start to be used more, but again, it's a single database solution so isn't suitable for complicated application ecosystems.

Backups/homebrew: Good old backups can easily be used to provide a very low cost way of maintaining a redundant copy of a database, and if you think about it, this is really do-it-yourself log shipping. At the very least, databases should *always* be included in a backup strategy, no matter what other high-availability technology(s) you may have implemented.

3rd-party solution: There are a few non-Microsoft solutions for providing redundancy with failover clustering which don't involve traditional SAN replication. I'm not an expert in any of them, but I've heard of anecdotal issues with the two I mentioned in the survey and worked with customers who've had real issues with PolyServe (one of which I blogged about).

Summary 

When you're planning a high-availability strategy, you always need to consider the limitations of technologies while evaluating them. The big limitation of failover clustering is that there's no redundant copy of the database so you need to add another technology to provide that. I've just finished writing a 35-page whitepaper for Microsoft on the high-availability technologies in SQL Server 2008, as well as how to go about planning a strategy. It will be published before PASS in November, but in the meantime, this should have given you lots of food for thought.

Next post - the next survey!

I just heard today that the first whitepaper I've written for Microsoft has been published!

The abstract is:

SQL Server Replication: Providing High-Availability using Database Mirroring

This white paper describes how to use database mirroring to increase the availability of the replication stream in a transactional environment. It covers setting up replication in a mirrored environment, the effect of mirroring partnership state changes, and the effect of mirroring failovers on replication. In addition, it describes how to use LSN-based initialization to recover from the failover of a mirrored subscriber database.

Although brief overviews are given of both replication and database mirroring, it is easier to understand this white paper if the reader has some experience with one or both of these technologies, and has at least a rudimentary knowledge of database concepts such as transactions.

You can download it from http://download.microsoft.com/download/d/9/4/d948f981-926e-40fa-a026-5bfcf076d9b9/ReplicationAndDBM.docx and I'll put a link in our whitepapers page.

Enjoy!

A couple of weeks ago I blogged about the three tracks of the SQL Server 2008 JumpStart course that SQLskills.com taught internally for Microsoft and some MVPs - see here for details. Well, the content is now available to download! Note that this was based on CTP-5 (November 2007 CTP) and there have been *lots* of behavioral changes since then (with more planned for CTP-6 Refresh and RTM), but if you want a high-level overview of a bunch of the features (albeit in bullet-point summaries on slides) then this is a good place to start.

For me, what's *REALLY* cool is that the site also has a downloadable VPC plus lab manuals for all of the AlwaysOn High-Availability hands-on labs that SQLskills.com wrote. We originally wrote these labs for SQL Server 2005 and I updated them all for CTP-5. The VPC has a long lab on each of the following:

  • Database Snapshots
  • Data Recovery and Preventative Techniques
  • Instant Initialization
  • Peer-to-Peer Replication (including the new Topology Wizard I blogged about here)
  • Table and Index Partitioning
  • Snapshot Isolation
  • Online Operations
  • Database Mirroring (including a demo I wrote of Automatic Page Repair, described here)
  • Service Oriented Database Architecture

There is some great depth in each of these - Kimberly blogged more info about the exercises in each lab here. There's also another VPC image with some higher-level labs on a variety of 2008 features and written by a number of different people- including some labs on Policy-Based Management and Performance Data Collection that Kimberly wrote.

So - where can you get these from? Go to http://sqlserver2008jumpstart.microsofttraining.com/ and hit the Download link on the right-hand side. Register and then you can get to the materials. The AlwaysOn VPC image is Collection 2 at the bottom of the page, and you'll see all the slide decks as you scroll down the page.

Enjoy!

Wow - almost 10 days without a blog post - that must be a record for me! :-) Never fear - I'll be posting more over the next couple of weeks. Kimberly flew off to India yesterday to teach some Microsoft classes and unfortunately I couldn't join her this time as I'm teaching 3 classes myself:

  • an internal Microsoft class on Designing for High Availability
  • another internal Microsoft class on SQL Server 2008 for DBAs (similar to the JumpStart class I posted about here)
  • 3 days of content for the new Microsoft Certified Architect: Database qualification - see the Microsoft Learning site here for details

Anyway - the subject of this post is to let you know that last week, Kimberly and I did two interviews for TechNet Radio on SQL Server 2008 technologies. Part 1 has just been released where we discuss security and availability features. You can get to it by going to the March 4th 2008 show here. Tune in and find out how I lull myself to sleep when Kimberly's out of town...

Enjoy!

Phew - last week Kimberly and I spent 3 days teaching the ins-and-outs of SQL Server 2008 for DBAs/IT-Pros to about 130 Microsoft SQL Server experts and MVPs (like Kalen Delaney, Adam Machanic and Ron Talmage). This was the (95% complete) Beta delivery of a course we've been developing for the last six months for Microsoft that they'll use to train their SQL experts around the world on the new release. It's been very interesting watching the features develop through the CTPs (especially since I left the fold last August) - and making demos work on pre-release builds of the CTPs.

Teaching the course was a *blast* - the thing I love about teaching a really geeky crowd is the plethora of great questions and opportunities for going deep with explanations. Our team actually wrote and delivered the concurrently presented Developer and BI tracks as well. As you can see from the list below (and this is just the features a DBA needs to use/know about), SQL Server 2008 isn't a dot release of Yukon at all, as some people have suggested. Over the three days we covered:

  • Database Mirroring (D)
  • Backup Compression
  • Peer-to-Peer Replication (D)
  • Transparent Data Encryption (D)
  • Extensible (Off-Box) Key Management
  • All Actions Audited (D)
  • Policy-Based Management
  • Resource Governor (D)
  • Extended Events (D)
  • Spatial Indexes
  • Integrated Full-Text Search
  • Sparse Columns (D)
  • Filtered Indexes
  • Change Tracking
  • Change Data Capture (D)
  • FILESTREAM (D)
  • Performance Data Collection
  • Query Optimizer Enhancements
  • Data Compression (D)
  • Service Broker
  • Partition-Level Lock Escalation (D)

The features marked with a (D) are ones I demo'd during the course (Kimberly demo'd a bunch of the others - especially the tools features). Some of the demos were challenging to make work in time as we only got a pre-CTP6 build mid-January just before we headed off to China.

So why am I posting this? Well, a bunch of these features are in CTP-6, which should be just around the corner, and I have some easy-to-understand demos of them that I'll be posting here over the next month or so. Also, if this course sounds interesting, Kimberly and I will be teaching it in various configurations over the next year - starting with SQL Connections in April, a soon-to-be-announced class in Iceland in March, and the ITPro portion of TechEd in June.

Watch this space starting next week (today's the last day of six straight weeks of teaching for us so this weekend's a break :-))

SQLskills.com has released its first ever utility! :-)

After several years of being asked to make available the Dual Database Monitor application (that's present inside the popular AlwaysOn Hands-On Labs we give out), I've spent a bunch of time making it configurable, work outside the VPC environment, and available in an easy kit form for people to use. It comes with easy-to-follow instructions and example SQL scripts to get you going. It supports SQL Server 2005 and SQL Server 2008.

This is an invaluable tool to use when setting up a Database Mirroring partnership to check that mirroring is working ok, without having to hack up your own application to do it. It can also be used to monitor two nodes of a peer-to-peer replication topology (or more using multiple instances of the monitor). It comes in two versions - single-user or unlimited use within a single company.

Check out the DDM webpage here for more details and ordering info.

While we were in Barcelona we sat down with Richard Campbell and Greg Hughes from RunAs Radio to record a 1/2 hour interview on SQL Server 2008. We touch on a ton of different features (look at the number of Categories I've tagged this with!) and have a bunch of laughs along the way - check it out here.

PS There's been a ton of interest in the slide deck idea I had so we'll be going ahead with that. Look for an announcement sometime in the first few months of next year about how to get them. Thanks to everyone that replied!

One of the cool features added in SQL Server 2005 for scaling-out a workload was peer-to-peer replication. The major drawback was that to change an existing peer-to-peer topology, the entire topology had to be quiesced. In SQL Servr 2008, the Configure Peer-To-Peer Topology Wizard in Management Studio has undergone a major face-lift and a peer-to-peer topology can be altered ONLINE - very cool!

To get to the wizard, you still need to go through the clunkiness of enabling peer-to-peer subscriptions in the Subscription Options pane of the Publication Properties of a new publication (by right-clicking the publication under the Replication->Local Publications folder in Object Explorer):

Once that's set to True, you can right-click the publication and you'll see a Configure Peer-To-Peer Topology option which will bring up the new wizard. Here's what you'll see with only a single node configured - I've hovered the mouse over the node to get the tool-tip to show up:

By right-clicking anywhere on the design surface you get a menu, from which you can select to Add a New Peer Node. Of course the node you select has to already have been setup for replication otherwise you'll get an error. As long as replication is already setup, you'll get the Add a New Peer Node wizard:

You need to set the Peer Originator ID to a number that isn't already being used by another node in the topology - it would be nice if the wizard would default to an ID it knows isn't being used instead of 1.  You then decide whether to have peer-to-peer connections with all the other nodes in the topology automatically setup. If you don't check that option, the node will appear on the topology viewer, but with no connections, like below (again I've brought up the tool-tip so you can see it's a different node than the first one - in this case a different instance inside a VPC):

If you don't check the option, you can create connections manually by simply right-clicking either node and selecting Add a New Peer Connection. You'll see a rubber-banding arrow that you pull to the node you want to connect to. I tested the automatic method of creating the connections and it works nicely too. Here's a three node topology in the viewer:

The rest of the wizard is as before - setting the Log Reader Agent and Distributor Agent security settings and how to initialize the new peer. At any point you can go back into the wizard and add or remove connections or peers. I'm pretty impressed!

Theme design by Nukeation based on Jelle Druyts