My latest MSDN column is available in the August issue and called "How Data Access Affects Database Performance". It was inspired by a problem that one of my DBA friends noticed in the database that could only be fixed by changing the data access code.

I noticed that there were two articles in the issue on EDM and, although I'm using native ADO.NET code for examples, I mention how EDM-generated code fits into one of the common cases.

I hope the information is useful.

Categories:
Performance

During a talk about spatial indexes and performance last week, I was surprised by a question about using ANSI-92 JOIN - ON syntax in spatial queries. I'd coded:

SELECT * FROM geonames.dbo.geonames g
JOIN Sample_USA.dbo.Counties c
  ON g.GEOG.STIntersects(c.geog) = 1
WHERE c.ID = 1569

Later I learned that the question was prompted by the following verbiage from the SQL Server Books Online: "To be supported by a spatial index, these methods must be used within the WHERE clause of a query, and they must occur within a predicate of the following general form geometry1 . method_name ( geometry2 ) comparison_operator valid_number".

There phrase "within the WHERE clause" is a bit too restrictive. The query above using JOIN has the exact same plan as this equivalent query using WHERE does:

SELECT * FROM geonames.dbo.geonames g, Sample_USA.dbo.Counties c
WHERE c.ID = 1569
AND g.GEOG.STIntersects(c.geog) = 1

Works just fine, either form uses the spatial index. The reason why folks would be concerned about the EXACT WORDING rather than the spirit of the advice, is that the second part of the BOL sentence IS true; the spatial method must be BEFORE the comparison predicate. That is:

-- This uses the spatial index
SELECT * FROM geonames.dbo.geonames g, Sample_USA.dbo.Counties c
WHERE c.ID = 1569
AND g.GEOG.STIntersects(c.geog) = 1

-- This doesn't use the spatial index
SELECT * FROM geonames.dbo.geonames g, Sample_USA.dbo.Counties c
WHERE c.ID = 1569
AND 1 = g.GEOG.STIntersects(c.geog)

But go ahead and use the JOIN verb, it uses the spatial index just fine. As a matter of fact, if you want parameter sniffing and query plan reuse it should be:

CREATE PROCEDURE GetNamesForCounty (@county_id int)
AS
SELECT * FROM geonames.dbo.geonames g
JOIN Sample_USA.dbo.Counties c
  ON g.GEOG.STIntersects(c.geog) = 1
WHERE c.ID = @county_id

Or call the parameterized query from a database API directly (ie use query parameters rather than variables with parameterized queries). And OF COURSE you usually don't need SELECT * either...

I've answered quite a few questions lately about how to make sure that your spatial index is being used. Use of a spatial index with a large or complex spatial dataset can mean the difference between a query that takes minutes to execute and subsecond execution. You also want to use an index hint as a last resort, the spatial index should not need a hint. Here's a few things to try, in order of importance.

1. Apply SQL Server 2008 SP1!!! I can't stress this one too much. There was a change to query costing that affected spatial index use.
2. Make the query as uncomplicated as possible. Don't try to combine STIntersects with a call to STBuffer, MakeValid or other nested spatial method calls or subqueries. Use multiple statements if needed.
3. If you're running the code in SSMS, use sp_executesql around the spatial query (or use your own stored procedure with the spatial value as a parameter) to ensure the query coster "knows" the parameter value at the time its creating the query plan, that is, at beginning of the batch or on entry to a stored procedure or sp_executesql.Here's an example:

-- does not use the spatial index without a hint
declare @latlonPoint geometry = geometry::Parse('POINT (45.518066 -122.767464)')
select a.id, a.shape.STAsText()
from zipcodes a
where a.shape.STIntersects(@latlonPoint)=1
go

-- this does use the spatial index without using a hint
declare @latlonPoint geometry = geometry::Parse('POINT (45.518066 -122.767464)')
exec sp_executesql
N'select a.id, a.shape.STAsText()
from zipcodes a
where a.shape.STIntersects(@latlonPoint)=1', N'@latlonPoint geometry', @latlonPoint
go

-- so does this
create procedure find_zipcode (@g geometry)
as
select a.id, a.shape.STAsText()
from zipcodes a
where a.shape.STIntersects(@latlonPoint)=1
go

declare @latlonPoint geometry = geometry::Parse('POINT (45.518066 -122.767464)')
execute find_zipcode(@latlonPoint)

4. If you're using client code, make sure you use a parameterized query and that it passes the value in the SqlParametersCollection. ADO.NET will change this into a call that uses sp_executesql.

5. Don't depend of passing in a string literal to give the query coster the right info, because the code "creates" the point inline, after the query plan has been created.:

-- use a parameterized query with sp_executesql or stored procedure instead
select a.id, a.shape.STAsText()
from zipcodes a
where a.shape.STIntersects('POINT (45.518066 -122.767464)')=1

6. Check the query plan (actual or estimated plan will work) to ensure the index is being used. The Spatial Index step in the query plan is easy to locate.

7. Make sure you have an appropriate spatial index with your data AND for your query sample. You can check this with the spatial index diagnostic stored procedures. To see how to use these procedures, start with the blog series here.

8. Use a hint as a last resort and see if it makes a difference in the query speed. If using a spatial index hint causes an error "Could not create plan", it may mean that your query is too complex. See step 2.

I hope this helps your query response be as fast as possible.

In the past year or so, a few installations had begun to experience problems with the SQL Server 2005 security cache (aka TokenAndPermUserStore) growing too large over time. Some manifestations are connection and query timeouts and queries that take a long time.

The folks at PSS published the canonical blog entry about this problem, including knowledge base articles and suggestions for SQL Server 2005, and also mention of the name of the SQL Server 2008 parameters to configure the size of this cache at the end of the article.

In SQL Server 2008 the size of TokenAndPermUserStore is configurable as mentioned in the SQL Server 2008 Books Online. I happened across it yesterday trying in researching a user inquiry. The configuration options, access check cache quota and access check cache bucket count, are mentioned in the books online without much fanfare or much description. But a knowledge base article, (KB955644) published the day that SQL Server 2008 shipped, describes the defaults and suggest some guidelines for configuring these parameters. Excellent.

There's a newer version of SQL Server 2008 (Release Candidate 0) available on MSDN and Technet.

Another performance-related feature (actually its additional information) allows you to have better visibility into performance-affecting queries. This information is available as columns query_hash and query_plan_hash in the sys.dm_exec_query_stats and sys.dm_exec_requests DMV. There's a really nice illustrative example in Books Online, so I won't repeat it here. You can use the queries in my last blog post to experiment with this as well.

The queries:

SELECT title_id, sum(qty) from sales
group by title_id
having sum(qty) = 30

and

SELECT title_id, sum(qty) from sales
group by title_id
having sum(qty) = 40

are similar enough that they could be parameterized (although they aren't autoparameterized). Parameterization would save query plan entry storage, rather than allocating a separate plan for each incantation of the similar query. You can also use this feature, along with the additional information in sys.dm_exec_query_stats to produce cumulative statistics (such as avg IOs, elasped time, etc. BOL has a nice query to accumulate similar plans, the relavent part is simply "GROUP BY query_hash".

But because you have similar queries, should you always have a parameterized query? What if "...having  sum(qty) = 30" has a completely different plan than "...having sum(qty) = 130"? You can find this information by looking at query_plan_hash. If two plans have similar structure but a different query plan, the query_hash value will be identical but query_plan_hash will be different. That's an indication that parameterization might not be the way to go; the first query to executed will cache it's plan and the other query will use the same plan. This is because of a SQL Server behavior known as parameter sniffing.

Because parameterizing queries is usually one of the first things programmers can do to affect performance, it helps to know when to parameterize. In addition, many folks like to start query tuning by tuning the queries (including similar queries with different parameter values) that are executed most frequently. Query_plan_hash and query_hash gives you visibility into this important information.

There's a newer version of SQL Server 2008 (Release Candidate 0) available on MSDN and Technet.

There's a couple of features that show up in RC0 that are performance related. One is a server setting 'optimize for ad hoc workloads'. This one tells SQL Server to save a "compiled plan stub" in the query cache for adhoc queries, rather than the query plan that's usually saved.

You can see these stubs as cacheobjtype 'Compiled Plan Stub' in sys.dm_exec_cached_plans. And you can see them because trying to resolve a plan_handle to a query plan (using sys.dm_exec_cached_plans or sys.dm_exec_query_stats) doesn't work for stubs, like it does for "real" plan handles. Saving a stub rather than the entire plan saves precious plan cache memory.

Here's a little script to try it out.

sp_configure 'show advanced options',1
reconfigure
go
sp_configure 'optimize for ad hoc workloads',1
reconfigure
go
-- don't use on a production system, clears whole query cache
DBCC FREEPROCCACHE
go
use pubs
go
-- background plans to produce this DMV, note there is a stub for one of these plans
select * from sys.dm_exec_cached_plans
go
-- not autoparameterized, stub produced
SELECT title_id, sum(qty) from sales
group by title_id
having sum(qty) = 30
go
-- you can see the stub
select * from sys.dm_exec_cached_plans
go
-- autoparameterized, whole query plan for parameterized version
SELECT *
FROM titles
WHERE price = 19.99
go

-- stub for the non-parameterized version, plan for the parameterized version
SELECT usecounts, cacheobjtype, objtype, [text]
FROM sys.dm_exec_cached_plans P
   CROSS APPLY sys.dm_exec_sql_text(plan_handle)
   WHERE [text] NOT LIKE '%dm_exec%'
ORDER BY p.usecounts DESC
-- query plan for stub query handle is not saved returns NULL
SELECT sql.text, p.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(sql_handle) sql
CROSS APPLY sys.dm_exec_query_plan(plan_handle) p
WHERE text NOT LIKE '%sys_dm_exec%' AND text NOT LIKE '%msparam_0%'
ORDER BY qs.EXECUTION_COUNT DESC
go

-- set it back, same experiment produces plans not stubs
sp_configure 'show advanced options',1
reconfigure
go
sp_configure 'optimize for ad hoc workloads',0
reconfigure
go

It's a good idea when talking to a database to save on network roundtrips. The table-valued parameter in SQL Server 2008 is an example of a feature that can reduce them in the "1 order, 1-n detail items" use case.

So its always been mildly irritating that in order to insert a row with a filestream column, you'd need to make 2 database roundtrips. One roundtrip is to execute the INSERT statement, inserting an empty value in the filestream column. This causes the file to be created, a NULL value won't cause file creation, and you need the PathName to open a file handle or use the new SqlFileStream .NET class. The second roundtrip is used to get the FILESTREAM_TRANSACTION_CONTEXT and PathName. Then you have the info you need for the handle/SqlFileStream.

However, with the OUTPUT clause introduced in SQL Server 2005 you can do it all in one roundtrip. Here's the SQL statement, based on the BOL filestream example. You can run this code using an ADO.NET DataReader to retrieve the values, and construct a SqlFileStream instance to write to.

INSERT INTO dbo.student
OUTPUT Inserted.Resume.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT()
VALUES (newid (), 'Mary', CAST ('' as varbinary(max)))";

Of course, if you're writing a large value into a file anyway, the savings of one database roundtrip in the overall scheme of things is questionable, but using minimum roundtrips is a good habit to get into in any case.

This post covers LINQ to SQL and EF worry #2. That is:

LINQ to SQL and EF will encourage "SELECT * FROM..." style coding because you get back a nice, clean object this way instead of the less-useful anonymous type you receive by doing projection. This will also make most/all covering indexes useless.

LINQ to SQL and EF can return something other than a whole object instance. Here's an example:

// This returns a collection of authors
var query =  from a in ctx.authors
             select a;

// this returns an anonymous type
var query =  from a in ctx.authors
             join ta in ctx.titleauthors on a.au_id equals ta.au_id
             join t in ctx.titles on ta.title_id equals t.title_id
             select new { a.au_id, t.title_id };

The collection of authors returned by the first query is updatable and a reference to it can be used outside the function in which its created. The anonymous type is not updatable and cannot be used outside the function in which its created (there is a nice workaround for this, but even the author of the workaround describes it as a bit of a hack). Although I can see a use for anonymous types in data-binding the good ol' dropdown list. But because they're be can't insert a new row unless you include all the columns you need, perhaps not dropdown comboboxes.

You don't necessarily need to return an anonymous type. You can define (by hand) a class that represents the projection of authorid and titleid. Or use a view. But, in order to do this on a large-scale project, you'd need to define a class for each projection in the entire project. Just for fun, I ask my students "do you know every projection (rowset) that every query in your project returns"? Can you even enumerate them? No one's answered "yes" to that question yet. So anonymous type (or much extra work). Maybe someone will write a tool to do this some day. Or "whole objects"...aka SELECT * FROM.

Counter to this is perhaps you SHOULD know every projection your project returns. Would certain help in tuning to know them all.

BTW, this is the same issue you'd run into using stored procs that return rowsets with LINQ to SQL; they return anonymous types. Using rowset-returning procs with EF *forces* (in beta3) you to define a class to contain the rowset produced. That's good. But EF can't make use of procs that return more than 1 rowset (SqlDataReader.NextResult() in ADO.NET). LINQ to SQL can use these, you get multiple anonymous types.

So simple 1-rowset sprocs with EF, or custom classes for every projection (and sproc in LINQ to SQL) it is. Or nice, clean, full objects (using SELECT * FROM). And what about covering indexes? An overly-simplistic definition would be "non-clusted index defined over the set of columns used by one table in a projection". Makes for some nice query plans (and sometimes even helps with concurrency issues in SELECTs). But if we're always doing SELECT * FROM, flush those covering indexes. The only index that matters is the (possibly) clustered index on the base table.

A counter-argument to this is that you really shouldn't define a covering index for every projection just because you can. Any index takes space and affects insert/updates/deletes, there's a tradeoff here. A second counter-argument is that if you need many, many covering indexes, perhaps your database isn't as well normalized as it could be. I'm not really sure I buy this argument.

MHO: This is a for the mostpart a valid worry.

I'd been meaning to write this entry for a while, about my opinion on LINQ to SQL and the ADO.NET Entity Framework and performance. I've just finished reading the 2-part blog series "Exploring the Performance of the ADO.NET Entity Framework", and was surprised (I guess) that the database performance aspect was barely mentioned.

One way to look at performance is to examine and profile the ADO.NET code, but because both EF and LINQ to SQL are T-SQL code generators (EF is not database-specific, but I'm only talking about SQL Server here, so far), another way to talk about performance is to examine the generated T-SQL code. I want to look at the generated T-SQL code side.

In this posting I'm going to list my top worries WRT the code generated by these APIs. And try and argue for and against the worries at the same time. Here's the list.

1. LINQ to SQL and EF will proliferate dynamic SQL, and will almost surely produce suboptimal dynamic SQL, causing database performance problems and plan cache pollution. And dynamic SQL is evil to start with.

2. LINQ to SQL and EF will encourage "SELECT * FROM..." style coding because you get back a nice, clean object this way instead of the less-useful anonymous type you receive by doing projection. This will also make most/all covering indexes useless.

3. LINQ to SQL and EF will encourage "SELECT to middle-tier, then update/delete" rather than issuing direct SQL UPDATE/DELETE statements that are set-based.

4. LINQ to SQL and EF will write code that gets too much or too little data at a time. This is a variation of #2. Too much data in one query is a waste. Too little data is also bad because it means extra database round-trips.

5. LINQ to SQL and EF will discourage the use of stored procedures to return rowsets; returning rowsets in stored procedures is *thought to be* superior to returning them in dynamic SQL. In addition, the code generators will use a subset of T-SQL query constructs, that is, only the constructs that the LINQ or ESQL language supports, rather than the full power of the T-SQL query language.

6. LINQ to SQL and EF queries will be "untunable" because, even if you discover a performance problem, you can't change the underlying API code to produce the exact SQL query that you want. There's too many layers of abstraction to change it.

7. Other arguments? I'll accept other points to argue/worry about...

Here the argument about worry-point #1. Other worry-points in upcoming posts.

It's interesting to see the number of folks who do query tuning for a living salivating over the prospect of tuning the bad, bad, bad queries that will assuredly result from these two data access stacks. And the number of DBA-types who'd like to "ban LINQ/EF and databases in their companies". It's also interesting to note that most people who profess a dislike for the generated code, have never seen (or seen very little of) the generated code. Usually, when someone sites a particularly bad instance of generated code they're never able to tell me if the code came from LINQ to SQL or EF. So I'd like to open a clearinghouse for LINQ to SQL and EF queries that generate really poor SQL. Send them to me at my SQLskills email address, together with enough info/data to reproduce it. Conor Cunningham's blog has begun addressing the question of LINQ's IN operator and SQL cache pollution, by writing about IN and SQL plans.

So far, its been my experience that LINQ to SQL, being more relation-centric, will in general generate code that's closer to what a good T-SQL programmer would generate. EF is more "object-centric" and sometimes generates SQL thats meant to construct object graphs. But neither one of them (that I can deduce) can generate a full outer join.

About dynamic SQL. It's almost dogma amongst database programmers that "static SQL" in stored procedures is better for security than dynamic SQL constructed via string concatenation. Besides the obvious association between dynamic SQL and SQL injection, using dynamic SQL means that all users must be given access to the underlying tables, unless you strictly use LINQ to SQL/EF with views and sprocs. LINQ to SQL and EF make every attempt to use parameterized SQL, and LINQ to SQL claims to have minimized/eraticated the potential for SQL injection when using their code. And remember, many programmers use dynamic SQL currently, LINQ to SQL would be an improvement for them.

More about plan cache pollution. I'll start by recommending Sangeetha Shekar's blog series on the plan cache. The starting entry of the 17-part series is here.  Except for the concept of "many different projections when one stored procedure will suffice" there's no cachability difference between parameterized SQL and a stored procedure. Non-parameterized SQL suffers a slight cachability difference until its reused. Because LINQ to SQL and EF are code generators, its likely that they may generate more homogenous SQL than programmers who use SqlCommand.CommandText. And programmers that use dynamic SQL (the ones most likely to use only LINQ to SQL/EF) are likely causing plan cache pollution right now in any case.

MHO.

To round out the new plan guide-related features in SQL Server 2008, there is a new way to express a table hint that increases the plan guide's reach.

The sp_create_plan_guide procedure requires a hint as the last parameter. This can be in a form OPTION (hint), just an XML query plan (in SQL Server 2008 "OPTION (USE PLAN)" isn't required), or NULL. Specifying NULL can be used to "subtract" a hint from an existing query where the hint is hardcoded in source code you don't have the ability to change.

This requirement limited plan guides to hints that could be used in an OPTION clause, and table hints didn't qualify because they are inline. In SQL Server 2008 this limitation is removed. You can specify a table hint by using the syntax OPTION (TABLE HINT (table_alias, hint)). This makes table hints usable in plan guides. You can even specify multiple hints on a per-table basis.

OPTION (TABLE HINT (table_alias1, hint1), (table_alias2, hint2)...)

BOL indicates in a few places that this syntax exists specifically to make table hints accessible to plan guides. Cool.

Another use for an analogous syntax that's not in SQL Server 2008 would be to be able to specific per-table JOIN hints the same way. You can currently specify only one JOIN hint in the option clause. An analogous way to specify multiple, per-join JOIN hints in an OPTION clause would be nice. But, if you're controlling the plan that closely, perhaps its best to resort to plan forcing.

BTW, manditory disclaimer: Hints are (usually) evil and should be used only as a last resort. Using a USE PLAN hint seems, at least to me, similar to writing custom navigation code for each query in IMS/DB, as I did in the '80s. You're throwing the query algebrizer/optimizer away...or at least pinning its hands.

One of the nifty new SQL Server 2005 features was called "plan forcing". You could acheive plan forcing by using the USE PLAN query hint but this made such a hideous query that the better way to do this is to use a plan_guide. A plan_guide is a named database object (like table or view) that associates a query hint with a particular query.

In SQL Server 2008 there have been some enhancements made for plan_guides and plan forcing. The biggest change is that you can create a plan from the plan cache directly, rather than using the rather long set of steps that you needed in SQL Server 2005. You only need a few lines of boilerplate code, setting up and invoking sp_create_plan_guide_from_cache, and the plan guide is automatically created. See SQL Server 2008 Nov 2007 for details and copious examples. The concept of creating a plan guide from a query plan already in cache is known as "plan freezing".

You can also validate that a plan guide will still be used (after dropping an index, for example) by using the new sys.fn_validate_plan_guide function. In addition, its easy to see if a Plan Guide is being used or not with the profiler trace events "Plan_guide_successful" and "Plan_guide_unsuccessful", located under the "Performance" category. These can also be monitored over time with two new perfmon counters "Guided Plan Executions/sec" and, what has to be the most humorous perfmon counter "Misguided Plan Executions/sec".

One other plan guide enhancement that doesn't seem to be mentioned in BOL is that you can use plan guides for INSERT, UPDATE, DELETE, or even MERGE statements in SQL Server 2008. This feature works in the SQL Server 2008 Nov CTP (CTP5).

As always the standard disclaimer is "query hints are (usually) evil" but if you're desperate enough to need plan forcing or plan freezing, its good to know that its there.

One reason why I've seen people use plan freezing is as a hedge against plan instability that can result from what's known as "parameter sniffing". Parameter sniffing is usually a good thing; the query processor sniffs the parameter values the first time you execute a parameterized stored procedure and uses the statistics based on those parameters accordingly. However, if your "first time execution" was based on uncommon parameter values, you're now stuck with a suboptimal plan. Plan freezing can be use to freeze a plan that's most optimal and "override" parameter sniffing. You just have to catch such a plan in the cache.

The drawback is, of course, that you are stuck with the same plan even if the statistics change. I'm sometimes accused of liking plan guides and plan forcing too much, but perhaps it's because I'd oftentimes get a call when parameter sniffing (or a service pack: yes, it does happen every now and again) changed to a query plan that is suboptimal. For those trouble calls (ie, "my query just starting running 4 times slower today and the plan is different from yesterday") plan_guides/forcing let you change the plan without changing the underlying code. When the fix arrives, you just disable the plan guide to let the optimizer go on its merry way.

Thought I'd answer one last question or two from the last time I did the talk on SQL Server 2005 plan guides. If you're asking at this point "what is a plan guide", reference the SQL Server 2005 BOL or this blog entry.

Q. Can you use a plan guide on an encrypted procedure? I want to put a hint on encrypted vendor-supplied code.
A. No, per books online you cannot create an OBJECT plan guide on any object that is itself encrypted or references an encrypted object. You get error message 10512 if you try.

Q. Can you create a plan guide on the SQL statement (rather than an OBJECT plan guide) on a SQL statement that I know exists as part of an encrypted procedure and have the guide be used?
A. You can create a SQL-type plan guide on a statement in a procedure, but it won't be used in any case; you need to create an OBJECT-type plan guide for it to be usable inside a procedure. You can confirm its "non-use" by looking at the showplan output from an un-encrypted procedure with a SQL-type plan guide; you can't see an XML showplan for statements in an encrypted procedure.

Q. Can I have more than one plan guide? For example to ENABLE/DISABLE different guides at different times of the day? Can I have two guides enabled at the same time?
A. You can't create more than one plan guide on a single statement/module-or-object combination, whether they are DISABLEd or not. You get error message 10502 if you try. You'd have to DROP the first plan guide before you can create the second one.

When I discuss separation of users and schemas in SQL Server 2005, I usually think of ownership chains (that follow the object owner which is always the schema owner unless you change it) or object resolution (which now goes: special name, then default_schema/procedure schema, then DBO). Another item that now follows schemas rather than users is query plan reuse with 1-part object names. Create two users:

-- logins already created
CREATE USER bob FOR LOGIN bob
CREATE USER mary for LOGIN mary
GRANT SELECT ON authors TO bob, mary

The query plan for the following query (executed by bob or mary)

USE pubs
GO
SELECT * FROM authors -- note the 1-part object name

would not normally be reused for both bob and mary. In SQL Server 2005, it can be reused if bob and mary have the same default schema. BTW, this is adhoc SQL, rules differ in procedural objects like stored procs. To see the reason for the query reuse, you can obtain the plan handle (reference this blog entry), but leave out querying the plan itself because you don't need it here) and pass it in to the sys.dm_exec_plan_attributes dynamic management function. This function shows the attributes of the plan and includes a second column, is_cache_key. If the cache_keys match, the plan is reused.

One of the rows produced by sys.dm_exec_plan_attributes is named user_id. Interestingly, for the query plan for bob or mary against SELECT * FROM authors, the user_id is 1. DBO. Hmmm... User_id is a misleading name for this column. It's not the user_id of the user who executed the query (and caused the plan to be created) but that user's DEFAULT_SCHEMA's owner's user_id. Looking at the CREATE USER DDL statements above, because I didn't specify a DEFAULT_SCHEMA, both users' DEFAULT_SCHEMA is DBO. So, for adhoc SQL, all other things being equal, the plan will be resued for two users as long as they have the same DEFAULT_SCHEMA. The rules for query plan reuse follow the rules for object resolution, which makes sense, once you think about it.

Of course, now that you know this, ALWAYS use 2-part names. User_id with 2-part names' plans has a special value of -2 which ensures the plan is resued regardless of DEFAULT_SCHEMA of the user who executes it.

SQL Server 2005 implements quite a few technology pieces to help database developers and administrators acheive scale-out while acheiving the appearance of (or actually accomplishing) application or server federation. The information is scattered throughout the books online, but you can read a nice whitepaper by Roger Wolter on how to choose among the features here. I'll be doing some events with SQLskills about scale-out applications in Chicago and NYC, and partners in Reading UK, and I didn't want to miss anything. One technology piece that was a big hit in SQL Server 2000 stayed mostly the same in SQL Server 2005, but I wanted to qualify the word "mostly". That feature is distributed partitioned views and distributed queries.

Distributed queries and distributed partitioned views use OLE DB as a communication mechanism. OLE DB access is built-into the query engine, data can be retrieved by the same basic mechanism from the SQL Server storage engine or from a "remote storage engine". When you're talking to another SQL Server, the OLE DB provider used in SQL Server 2000 is SQLOLEDB. In SQL Server 2005 the OLE DB provider changed to SQLNCLI (or SQL Native Client). There are some subtle differences (remote errors, ability to run out of process) between providers, and, if you upgraded to SQL Server 2005, all your Linked Servers now use SQLNCLI.

SQL Server 2005 does, however provide nice tracing information in SQLProfiler regarding which OLE DB calls the engine makes to the linked server. Because OLE DB is interface based, you can even see the query interface calls, if you want to go to that level of abstraction. You can trace these on either or both sides of a SQL Server linked server; outgoing trace works regardless of the destination provider. You do have to grok the OLE DB interfaces to understand the events, but it's like the query plan, there are various levels of understanding.

Distributed partitioned views remote part of the query tree to the remote database. In SQL Server 2000 the portions of the query would never execute in parallel. In SQL Server 2005, the startup commands can execute in parallel and accomodations are made for caching the remote connections and using overlapped I/O to optimize things. And SQL Server's OLE DB providers have always been able to remote "statistics rowsets" to the other side to help the query processor out.

For distributed query, there's a version of the SQL EXECUTE statement that implements passthrough queries using an AT keyword. You could execute passthrough queries in previous versions by using OPENQUERY, but this required the results to be treated as a rowset (SELECT * FROM OPENQUERY(...)). I've gotten EXECUTE AT to support remote queries that use the "MAX" data types and CLR data types (the assembly must be catalogued to both remote and local database), but the XML data type isn't yet supported for remote queries even though the BOL says so. So cast it to a MAX type and cast back when you get it back on the local server.

That's enhancing one piece of the puzzle, real federated servers. Although this won a TPC benchmark (in the SQL Server 2000 release timeframe, with 32 federated servers) there's LOTS more. As always, the devil is in the details, that's where I usually come in.

While teaching SQL Server 2005 to developers, I'd always point out the irony in the fact that the new EVENTDATA() function, used to return information in a DDL trigger or EVENT NOTIFICATION, returns XML and would be used almost exclusively by DBAs. This seems almost like a nefarious plot to force DBAs, who would be the most XML-phobic, to learn XML and XQuery/XPath in order to deal with these in their daily jobs. In fact, I once wrote a SQL statement using XQuery to coerce a common variant of EVENTDATA() output into tabular format.

After spending the last few months wallowing in query plans, deadlock graphs, and the like, I've actually become convinced that XML and XQuery in SQL Server is *primarily a DBA tool*. I knew that would catch your attention. Besides event and query notifications, blocked process events, query plans, deadlock graphs, and EVENTDATA(), Database Tuning Advisor and new bulkload format files also use XML format. XML query plans can be studied by using XQuery/XPath and can then be modified (slightly) for use in "plan forcing" where the XML plan is used as a "query hint", USE PLAN. And there's probably some uses that I missed. SSIS files and Reporting Services RDLs are XML format. I'm becoming convinced that T-SQL error messages should be in XML format too.

So DBAs: if you haven't yet learned XQuery/XPath and the SQL Server XML methods, its not too late. It's there just to make you more productive (well, maybe "just" is an overstatement, but you get the point).

Just to whet your appitite, here's a little stored procedure that uses dynamic management views, the new CROSS APPLY operator and XQuery to look for physical operations in query plans and correlate it with the SQL query. Could be useful, perhaps sorted by execution count. Cheers.

CREATE PROCEDURE LookForPhysicalOps (@op VARCHAR(30))
AS
SELECT sql.text, qs.EXECUTION_COUNT, qs.*, p.*
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(sql_handle) sql
CROSS APPLY sys.dm_exec_query_plan(plan_handle) p
WHERE query_plan.exist('
declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
/ShowPlanXML/BatchSequence/Batch/Statements//RelOp/@PhysicalOp[. = sql:variable("@op")]
') = 1
GO

EXECUTE LookForPhysicalOps 'Clustered Index Scan'
EXECUTE LookForPhysicalOps 'Hash Match'
EXECUTE LookForPhysicalOps 'Table Scan'

Categories:
Performance | SQL Server XML

I don't know anyone that likes to wait for an application to respond. This is especially a concern with web applications; you start to wonder if the web infrastructure has a problem. Should you hit by "Order" button again?

Because waiting is subjective it's nice to be able to know, when a user calls, how long they have actually been waiting. Or at least how long the database has been waiting. SQL Server 2005 will tell you, at least when locking is involved.

There is a configuration parameter "blocked process threshold" that controls if and when to send notifications whenever a process is blocked for N seconds. You set this through sp_configure; its an advanced option so show advanced options must be on too. It's set to 0 (never notify) by default. You can catch these notifications through WMI/SQLAgent alerts or the new event notifications. Very cool.

So now when a user reports "I've been waiting for this database query for 10 minutes" but the query response is immediate when you check it (the "works for me" answer that users hate), you can see (after the fact) if its lock contention that caused the waiting. “Well, you weren't waiting for locks” might be a better answer. Maybe. 

Just catching up on my blogging before a little vacation next week.

Browsing through the SQL Server BOL from the December CTP, I came across some information on something called "plan guides". There is info on some stored procedures that create and manage plan guides, a database option (in ALTER DATABASE) and a system view that lists plan guides. However, none of this these are active yet in the actual product. Let's hope this is another example of documentation being ahead of things (a la EXCEPT and INTERSECT support), because these sound interesting. According to BOL...

A plan guide is a database object that associates query hints with certain queries in the database. You can create a plan guide (using sp_createplanguide) for a SQL statement or batch. The statement can be standalone or specified to be part of a certain stored procedure. The plan guide specifies an OPTION clause specifying query hints to be applied whenever the statement is executed.

Plan guides must first be "enabled" on in a database (using ALTER DATABASE) before they can be used. Then you turn them "on and off" by sp_controlplanguide enable/disable. When a matching query is detected the hints are automatically “put in place“.

Sounds VERY cool for query plan afficianados. You can have configurable query hinting without touching your queries in the application code. And turn it on or off at will. Only thing is, NONE of it works in the December CTP. Any of the stored procedures produce "not found" message, as does the ALTER DATABASE keyword and the system view. Maybe the BOL IS a little ahead again.

Theme design by Nukeation based on Jelle Druyts