I've been busy for the last few weeks putting together a "manifesto" whitepaper about the .NET-based data access stacks and also the possibilities for .NET programmers in the SQL Server product itself. The whitepaper's direct link is here, although its also available via both the MSDN Data Developer Center and the SQL Server 2008 Application Development website. I don't see the arbitrary distinction between application developers who use databases and SQL developers that some do, so writing this paper and covering both was a good fit.

This whitepaper is quite a bit more high-level than others that I've written, meant to be an all-embracing introduction to .NET and SQL Server and point out the possibilities for .NET programmer. While writing this I was stunned at how many integration points there are between .NET and SQL Server. Every part of the SQL Server product uses .NET in one form or another. Check it out.

Even though there's a quite a bit of coverage of futures, the platform is so fluid that it doesn't include the announcement on the rename of ADO.NET Data Services to WCF Data Services and also the announcement of the Open Data Protocol (OData). Follow the links to these announcements.

Hope you like the paper, I had fun writing it...Cheers.

I'm always thought that what makes or breaks any ORM layer for use in any but the simplest of applications is the underlying SQL code that's generated. In fact, I'm a proponent of using stored procedures with any ORM for best performance. So it was with great interest that I came across the blog posting "Improvements to the Generated SQL in .NET 4.0 Beta1" by the folks that are working on the Entity Framework SQL Server provider at Microsoft. These improvements, along with some others I wrote about in May, show they're serious about improving TSQL performance in the next release of Entity Framework.

I'd really like to see support in the EDM provider for SQL Server (and LINQ to SQL) for the spatial and hierarchyid data types (and UDTs in general), table-valued parameters, filestream through the streaming APIs, and better support for stored procedures (beyong the ComplexType generation that's in EDM 4.0 beta) and UDFs. This would complete and enhance the SQL Server support in future.

Categories:
Data Access

For those of you who's blog readers don't always get updated (rather than brand new) blog entries, there's a happy ending to my last "fixed-almost" blog entry on EDM 4.0 and parameterization. I've heard that this has been fixed to generate VARCHAR(8000) parameters and also that the unparameterized version:

var x = from a in ents.authors
        where a.au_lname == "Smith"
        select new { a.au_lname, a.au_fname };

will generate ...WHERE au_lname = 'Smith' instead of ...WHERE au_lname = N'Smith'.

Note that this is a different semantic; they generate a parameter type that agrees with the column's data type rather than using the .NET string type's data type equivalent (NVARCHAR). This will be better for performance because the database and parameter types agree.

Fix is coming in next beta version of .NET 4.0. I've updated the original blog entry to note this as well.

Categories:
Data Access

One of the first things that I did after installing VS2010 beta this week was to check on a parameterization problem with string parameters, and Linq To SQL/ADO.NET Entity Framework generated code. In the original version, this L2S query:

var x = from a in ents.authors
        where a.au_lname == "Smith"
        select new { a.au_lname, a.au_fname };

or this EF query:

string name = "Smith";
var x = from a in ents.authors
        where a.au_lname == name
        select new { a.au_lname, a.au_fname };

Would cause plan cache pollution and possibly performance problems. These queries produced a query plan with the parameter specified as NVARCHAR(5). Each query with a different length will produce a new plan in the cache, and the use of NVARCHAR rather than VARCHAR will cause a table scan (rather than a seek) in the plan, regardless of how many rows are in the table, because there must be a conversion of au_lname from VARCHAR to NVARCHAR as part getting the rows, making the query non-sargable. The correct database data type (VARCHAR) needs to be specified as the parameter data type.

In .NET 4.0/VS2010 Beta 1, L2S handles this, producing a VARCHAR(8000) parameter, as SQL Server's autoparameterization does. Entity Framework produces NVARCHAR(4000), fixing the length problem but still using the a different data type than the table specifies.

NOTE: This will be fixed in the next beta version of .NET 4.0 to produce VARCHAR(8000). See complete post here.

In my next installment of my MSDN "Under The Table", I'll be writing about the performance implications of data access method code in detail. I do mention "fixed in the next release" WRT L2S and EF, and hope that it will be resolved before .NET 4.0 ships.

Categories:
Data Access

Sorry to appear after a blog drought with theory meta-type blog entries. Too much time at conferences pondering technologies, I guess.

The "relational database bigots" I hang out with don't like LINQ at all. They hope it would shrivel up in a corner and become part of the fad-technology graveyard. Or they're waiting to make big bucks fixing the performance problems they think will ensue.

For the life of me I can't figure out why. I think they're reacting to LINQ to SQL and Entity Framework, layers on top of relational data access that happen to use LINQ to obtain sets of data. Every layer of abstraction is going to make it easier to write less-performant SQL. Granted. But think of it a different way....

You, my SQL-centric friends, have spent much of your career specializing in this declarative query language. SQL. The best language so far (modulo C.J. Date's third manifesto "D" language) to query the relational data structures  at the center of your universe. LINQ is just "SQL for the rest of us(them)". Disguising iterators, everyone is now using SQL (well, at least the SELECT...FROM...WHERE...GROUP BY...ORDER BY part) to query everything and anything. You were right to invest such effort into this query language. With LINQ, it's become universal.

The impression I get (I could be wrong) is that LINQ might become the next OLE DB. OLE DB was supposed to go beyond relational data sources to provide extensions to the column-row paradigm and provide "data access" over non-relational systems like "file system" (OLE DB provider for WebDav), Active Directory, Search Server (fulltext search worked through OLE DB once), Exchange, etc, etc. (Mostly) similar access to every source of data. You wouldn't believe what data sources folks wrote OLE DB providers for in the good-ol days. Anything that could be cast to an API that could implement (or be cast to an implementation of) the Connection/Session/Command/Rowset co-types (or even just IRowset) was fair game.
 
LINQ is a system that provides a *SQL-like* (high-level) language for anything (including collections of objects/.NET types, XML, etc, etc). Anything that has any API that can implement IQueryable/IUpdateable. Yes, LINQ to SQL (or LINQ to Entities) doesn't always generate exactly the most optimal SQL that generates the best query plans. NO argument. But T-SQL/PL-SQL/etc doesn't always generate the most optimal query plans for the storage engine 100% of the time either. Although they will get closer than LINQ-to-database, because they (T-SQL, etc) are more tightly coupled to the relational model and query optimizer than LINQ to SQL is. Every level of abstraction (translation) pays a performance penalty; the goal is to minimize it, while making the abstraction similar. And all of the same kinds of providers that speak the LINQ language are appearing. Just like the OLE DB providers did.

So perhaps, the next version of SQL Server will support not only OLE DB-based Linked Servers, but also "LINQ-ed" servers (bad pun intended) as non-SQL Server rowset sources. Hmmm...

Categories:
Data Access

I've been following the ADO.NET Entity Framework since its inception. Never did buy the idea about it being "more than an ORM", thought it was just marketing hype. After all, if it looks like an ORM and smells like an ORM, then... But, the more I've been wallowing around in it and thinking about its position in the world in general, the more I'd thought about what could drive someone to the "more than an ORM" conclusion. Here's three things that come to mind.

1. It's built on the ADO.NET provider model. This is actually interesting for two reasons. First, anyone with an ADO.NET data provider can plug in without having to change all the tools. More importantly, it provides the EF an entry to any tool that consumes ADO.NET data providers. Imagine being able to  do reporting against your object model with SSRS or Crystal Reports. Or use your CDS (conceptual data store) as an integration source.  Or integration with Office. Granted that currently these utilities don't know anything about IExtendedDataRecord currently, but Microsoft's Report Builder let's you build a user-friendly object model for reporting...why not use a model that already looks like your business objects?

2. Lack of IExtendedDataRecord support brings us to point two. The ESQL language extensions. You can use ESQL to provide a query over the data source, flattening the hierarchies out when needed. And that's just what most tools need, a connection string and a query string. That produces rowsets (even flat rowsets) to be consumed by tools that don't care a bit about object models. Without waiting for all these tools to add support for an object-oriented conceptual data source.

3. Finally, I came across this one. The writer was writing about PowerShell (specifically, why PowerShell in SQL Server when there already is T-SQL), but here's an interesting quote:

Within Microsoft we have a concept called Common Engineering Criteria (CEC). The goal of CEC is to establish consistency across IT products (Windows Server, Exchange, SQL Server, BizTalk, System Center, etc). [some text deleted] ...One of the latest CEC requirements is support for PowerShell. [more text deleted] ...Transact-SQL is a good scripting language but it’s not model driven. We will continue to invest in and evolve T-SQL. But we’ll also continue to invest in and evolve the models over DDL and DML. The model over DDL is SMO. The model over DML is EDM. (emphasis mine)

While this doesn't say (or even imply) that EDM is part of the Common Engineering Criteria, the idea of it a "model-driven DML" is intruiging.

So perhaps "more than an ORM" isn't just marketing hype. We'll just have to see.

Categories:
Data Access

In a blog posting about a few weeks ago, I'd written about noticing a DataSet being created in an Entity Framework program by using the ADO.NET client trace facility. Entity Framework programs do (indirectly) cause a DataSet to be created, but its only for resolving the ProviderInvariantName of the underlying data provider. NOT for anything related to the functioning of the EntityClient or ObjectContext in any way. I've revised the original blog posting to remove the DataSet reference, but just in case you already read it....

The connection string for the EntityClient data provider contains a parameter that refers to the ProviderInvariantName of the underlying data provider (e.g. System.Data.SqlClient). Entity Framework validates the underlying data provider name by using the ADO.NET provider enumerator class DbProviderFactories. DbProviderFactories creates the DataSet to store the provider information obtained from the machine.config file.

I might have guessed this...as I'd written a whitepaper on the functionality of the ADO.NET provider model, "Generic Coding with the ADO.NET 2.0 Base Classes and Factories" not more than a few years ago. Using the DbProviderFactories class is the only way to enumerate the ADO.NET data providers installed on your machine.

Come to my "End-To-End Tracing with SQL Server 2008 and ADO.NET" talk at TechEd next week and I'll show you the trace.

This item was mentioned in other blogs too, but just to mention it again (so my SQLCE and Sync Services friends won't think I overlooked it)...

Visual Studio 2008 SP1 Beta contains direct support for using SQL Server 2008 Change Tracking. When you use ADO.NET 1.0 Sync Services with a SQL Server 2008 database, the Sync Services designer (that's Add/New Item/Local Database Data Cache) adds a checkbox that allows you simply to "Use SQL Server Change Tracking". No extra triggers, no tombstone tables, change tracking does it all for you. Visual Studio generates some scripts to enable change tracking at a database level and at a table level for the tables you select.

There's some nice generated code that warns you if you need to resync because you haven't synchronized often enough. And you can tweak the SQL scripts for more control. You can specify a CHANGE_RETENTION (the default is 2 days) and whether or not AUTO_CLEANUP is ON (it's on by default) in your ALTER DATABASE statement. And if you want to write your own, more granular synchronization or conflict resolution code, you can specify WITH TRACK_COLUMNS_UPDATED = ON (default is OFF).

Something that puzzled me was, although the SQL Server Books Online states: "Using snapshot isolation is the easiest way for you to help ensure that all change tracking information is consistent. For this reason, we strongly recommend that snapshot isolation be set to ON for the database.", the ALTER DATABASE DDL generated by Visual Studio did not include the option to SET ALLOW_SNAPSHOT_ISOLATION ON. Enabling snapshot isolation in SQL Server does involve a non-trivial amount of overhead, so it looks like they set it to the least overhead option and let you decide if you need snapshot isolation after reading the arguments/scenarios where it's useful in Books Online. Probably a wise choice.

Synchronization Framework is an exciting technology, given that there's so many possible uses of synchronization with "Local Data Caches" and, with the support for Sync Services in mobile devices. The next release of the Synchronization Framework (2.0, there's a beta out already) even contains support for Peer-to-Peer synchronization through the model. Perhaps this will fit some additional "replication via services" in databases. We'll have to wait and see if the Peer-to-Peer sync provider even supports databases, I guess. Always something interesting going on...

Last fall I'd asked the folks on the data access team about the possibility of including the new SQL Server 2008 data types in the new data access object mapping layers. At that time they said "no" but they also asked "why do you want it?" and "are people expecting it?". Someone must have had some compelling arguments.

I'm sure you've read this in other blogs, but in Visual Studio 2008 SP1 Beta, which arrived Monday, there was...drumroll please...support for all four date and time types in not only LINQ to SQL, but in ADO.NET Entity Framework. Thanks! I'm still a bit surprised about the inclusion of DateTimeOffset in EF (its a SQL Server-specific data type and EF is more platform-neutral than LINQ to SQL), but it was a cool surprise to have them there. 

Everyone's blog entry also points out there is some special support for Filestream storage (including a System.Data.Types.SqlFilestream class), but they identify Filestream as a data type. Filestream is a storage format rather than a data type, the actual data type is varbinary(max). Minor terminology nit. You can read and write FILESTREAM based types without resorting to PInvoke. Cool.

Of course, there were some SQL Server 2008 data types (the SQLCLR-based types) that aren't "in there". These are the spatial types (Geometry and Geography) and the HierarchyID data type. Perhaps for EF/LINQ to SQL version 2. These types are correctly identified in Server Explorer in a table but attempting to add a table containing them to an EF diagram fails (it actually leaves the type out of the diagram) and in LINQ to SQL throws an error "the data type is not supported".

That means if you have one of these types in your SQL Server 2008 database, you need to do something funky like treat it as a varbinary(max) and deserialize it (think IBinarySerialize) on the client yourself. Because the SQLCLR-based data types would have a one-to-one mapping to .NET classes (Geometry = Microsoft.SqlServer.Types.SqlGeometry, Geography = SqlGeography, HierarchyID = SqlHierarchyID) these would possibly be straightforward to implement in the next release. They WOULD be a potential model-breaker in that you would have "columns" (entity properties) that have their own properties and methods. Another whole level of indirection.

I've said before that IMHO, spatial support is the "killer developer feature" in SQL Server 2008. Let's hope for more API support in future. But THANKS for the inclusion of the date and time types.

It appears that there are some changes in .NET 3.5 System.Transactions (or System.Transactions.dll and System.Data.dll version 2.0.50727.1433 if you're looking at assembly versions). Florin Lazar blogs about a change to the syntax here. And Alazel Acheson blogs about changes to SqlClient changes to accomodate using lightweight transactions with less promotion to distributed transactions in the ADO.NET Team blog.

It is interesting to read how the latest version of SqlClient can combine automatic transactions with connection pooling to use promotion to a distributed only when necessary. It was reminiscent of how COM+ used the connection pool; the difference is that COM+ always used distibuted transactions. Some of the changes use a transaction-aware connection reset mode that's only supported in SQL Server 2008. So server changes were required as well.

At the end of the blog entry Alazel mentions that you can't use this facility with two open connections at the same time; it requires that there is a free connection with the appropriate transaction scope in the pool. I immediately thought of SQL Server's sp_getbindtoken and sp_bindsession, the manual way to allow two connections to share a single transaction context. But sp_getbindtoken and sp_bindsession are on the deprecation list (to be removed in a future version of SQL Server) in SQL Server 2008. The books online entry for these calls recommends "using MARS or distributed transactions instead". But why MARS?

MARS (multiple active resultsets) made its appearence in SQL Server 2005 and its data access stacks. It permits you to have multiple interleaved sessions while using a single SQL Server connection. These multiple sessions share the same transaction, modulo an interesting but escoteric behavior known as batch-scoped transactions. MARS also does not support named savepoints.

What MARS and sp_getbindtoken/sp_bindsession have in common is that they are both solutions to the same "problem". I prefer calling this a SQL Server "behavior" because its not technicall a problem, just how the underlying network stack works. SQL Server does not allow other activity on a connection (like an UPDATE or a second SELECT) while reading a rowset is in progress. Pre-MARS, if you wanted to have multiple commands in a single transaction scope, you would use two connections and "bind" the transaction scopes together with the transaction token. With the MARS capability this is no longer necessary.

So, the OpenConnction1/DoCommand1/Close then OpenConnection2/DoCommand2/Close pattern is now acommodated by System.Transactions and SQL Server 2008 without transaction promotion. To do two commands with the same transaction scope without closing the connection, use a single connection and multiple sessions with MARS. With either data access pattern, only a single physical connection to SQL Server is needed.

This is the last post in the series, at least for now. I'll update it (or post more on the topics discussed here) as the products involved evolve and mature. This one's about:

LINQ to SQL and EF queries will be "untunable" because, even if you discover a performance problem, you can't change the underlying API code to produce the exact SQL query that you want. There's too many layers of abstraction to change it.

T-SQL is a declarative language, allowing you the ability to rewrite queries for better performance. A whole cottege industry has grown up around this (I teach it myself), and it usually consist of changing the SQL to get the plan you want, based on your intimate knowledge of the (current) data and the (current) use cases. As one of the simplest examples, you can switch between joins, correlated subqueries, and nested subqueries to see which one gives best performance. Or use EXISTS rather than a correlated subquery or IN clause.

Because the queries (LINQ and/or ESQL) are programmatically transformed in SQL queries there is not (that I'm currently aware of) the ability to "rephrase" LINQ/ESQL queries to produce subtlely different SQL queries and thus better performance. If you can produce rephrased SQL by changing a LINQ/ESQL query (not just rewriting a LINQ/EF query to produce different results that are more optimal), I'd be interested in hearing about it. Perhaps another cottege industry awaits...

BTW, although most/many SQL queries can be rewritten (sometimes many different ways) and tested for best generated query plan/best performance, the limitation is that, in future, the query processor can get smarter, thus making your past years' work unncessary. Usually though, you've benefited from rewriting SQL for that extra 6 mos-5 years until the query processor changed anyway.

Besides query rewrites, you can also "hint" queries, in most dialects of SQL I've seen. This helps when the query processor chooses a suboptimal plan (uncommon, but not unheard of) and you have intimate knowledge or data and use cases. Or when you're trying to service different use cases with the same query; SQL queries only have one plan at a time (modulo parallelized plans) and you might have to satisfy different use case by differently hinting the same query. Because the translation to SQL is deeply imbedded in the LINQ/EF source code, if I find a performance problem that can be helped with a hint, I can't hint in the LINQ/ESQL code. This means going back to using stored procedures (they work with hints) and away from the model.

Hinting is usually not preferred over rewriting the SQL because hints "tie the query processor's hands", i.e. if the statistics change so that a different plan would work better, the query processor can't use this information because you've told it how to accomplish the query. You've changed SQL from a declarative language to an imperative language. It's best not to put query hints in code, but separate them to a separate layer. SQL Server 2005 calls this separate layer plan guides. The plan guide is a named database object that relates a hint to a query without changing the underlying code. You can add/drop plan guides or turn them on/off at will. Or re-evalute them when things (the statistics or use cases) change.

Can you use plan guides with LINQ/EF queries? Two things to keep in mind. First, a plan guide for a SQL statement requires an exact match on a batch-by-batch basis. Machine-generated SQL will likely make exact match easier, but you will have to check that the guides are being used each time LINQ/EF libraries changes. Second, plan guides work best if you have a limited number of them in your database. They're meant to be special-case... not to add another level of complexity to an already complex (and getting more so as the layers of abstraction increase) situation. So use these with care.

So, is this an issue worth worrying about? I think we'll need to wait and see. Fix a few "bad (generated) SQL or bad queries" problems before giving up entirely. Or, fix performance problems (in the generated SQL) by going to stored procedures and see how many procs you have after a year. Are the folks who are licking their chops in anticipation of LINQ/EF related perf problems justified? Well, its not me that thinks optimizing declarative languages will always have its place.

MHO.

Hope you enjoyed this series. As implementations of these models take hold, I'll be watching for items that would change my opinions. Or prove them...


This post is part of a series about worries when implementing LINQ to SQL or ADO.NET Entity Framework from a SQL database-centric programmer's perspective. The last two worries are related. It's mostly about either level of abstraction being one level away from the "real SQL code" that's being executed.

First off...
LINQ to SQL and EF will discourage the use of stored procedures to return rowsets; returning rowsets in stored procedures is *thought to be* superior to returning them in dynamic SQL. In addition, the code generators will use a subset of T-SQL query constructs, that is, only the constructs that the LINQ or ESQL language supports, rather than the full power of the T-SQL query language.

A stored procedure is always thought of by stored procedure afficianados as representing a "contract" between consumer and provider. That is, the database metadata tells me exactly what I'm going to get. Although the database metadata does indicate number, type, etc of parameters, this is absolutely not true for rowsets returned by stored procedures. There is NO database metadata that records anything about the returned rowsets, or even how many rowsets a stored procedure will return. Actually the number of rowsets returned is part of the ANSI standard but SQL Server implement it. In addition, errors that might happen in the middle of a stored procedure might result of rowsets being missing. And than there's always the possibility of returning multiple and/or different rowsets by using a stored procedure with conditional code... Not much of a rowset contract at all.

One way to ameliorate this problem (in SQL Server) is to use multistatement table-valued functions to return one rowset with known metadata. The main hassle with this is performance; a multistatement table-valued function is the equivalent of filling a table variable in code and then returning it. There are I/O considerations (the I/O of reading the base tables + I/O of reading the table variable at the consumer) The are also performance considertations as SQL Server table variables have no statistics, if the table-valued function is used as a row source in a larger query (composable queries), there is no way to estimate the number of rows returned by the TVF.

SQL Server's strongly typed table-valued parameters in SQL Server 2008 would be an analogous concept, but currently these are limited in being "input only" in procedures. No strong typed results yet. Oracle is an exception to this "no contract for rowsets" concept. Because Oracle doesn't return rowsets from stored procedures, they introduced a special parameter type called refcursor. Refcursors can appear in database APIs as a parameter (of type Refcursor or more generically "table"). And you can have strongly typed Refcursors, providing the needed contract. We'll have to wait for Oracle's (or DataDirect Technologies') EF provider or LINQ abstraction product to see how they use this.

So now that we've determined that there is no more of a rowset contract for stored procedures than ad-hoc SQL (the difference really is in SQL encapsulation and support of ownership chains, but that's another story), what about extentions that ESQL doesn't support? There are database-specifc extensions like SQL Server's PIVOT operator, or ANSI SQL standards, like ranking and windowing.

LINQ folks are quick to talk about implementation thorugh "extension methods" but the long and short of this is that these are a LINQ-ism, unrelated to LINQ to SQL. That is, the LINQ construct to SQL dialect statement mapping is fixed and embedded in the LINQ product. Using extensions to the *SQL statement mapping* (ie changing what SQL statement is produced) require either going deep into the framework (if this can be done at all) or implementing equivalent concepts on the client side, leaving the generated database code alone.

EF may have a little better story with this because each provider-writer implements the ESQL to query mapping, conceivably you could write a custom provider to encapsulate the supplied provider with extensions. However, the ESQL language itself does not have to capability of ODBC-like "escape clauses", so there'd be no way to express this extended SQL-based functionality in ESQL.

So I'd classify the "subset of SQL" and "stored procedure rowset is an anonymous type" problem as something that might be worked out in future releases of databases and frameworks. Until LINQ to SQL or EF provides escape clauses in the framework, the easiest way out is the ultimate escape clause, using the stored procedure that returns (anonymous) rowsets. And the more stored procedures are used (not CUD procedures, which enhance the model, but rowset-returning procedures), the farther away from the model you get. Which interferes with the usefulness of the model in general.

MHO.

Quite a bit of activity last week in the Visual Studio and ORM spaces.

Visual Studio released a CTP of Visual Studio 2008 support for SQL Server 2008 CTP6. It allows you to connect to CTP6 with Server Explorer, which enables quite a few other features to work. check with Overview document on the download page. SQLCLR projects and T-SQL and SQLCLR debugging work too.

Curiously the Overview document says "using the LINQ to SQL Designer with SQL Server 2008 databases" is specifically not supported. But I've used SQL Server 2008 WITH NO new data types (e.g. DATE, TIME, etc) and it does seems to work. EF beta3 designer too. Maybe they are referring to lack of support for the new data types. Speaking of which...

Faisal Mohamood announced on the ADO.NET team blog that they ARE working on support for the four new date and time types in SQL Server 2008. Hooray. Now let's keep our fingers crossed for SQL Server UDT support.

Julie Lerman wants these types in EF too, but I'm not sure I agree. DATE and TIME would be nice because they're ANSI standard types. DATETIME2 maps favorably to DateTime in EF, actually better coverage of the value space than DATETIME. Maybe what she really means is that she wants the SSDL and MDL and the designer to be aware of them and generate the closest possible mapping. Even if that's "string".

The distinction is that the ADO.NET Entity Framework is meant to be database neutral, not SQL Server specific, and supports a discrete subset of database data types not a 1-1 mapping of EF types to SQL Server types. LINQ to SQL, OTOH, is SQL Server-specific and should support a 1-1 mapping. At this point I'm really thinking they should have called it "LINQ to SQL Server". Not that the *concept* can't be extended to other databases (LINQ to DB2, LINQ to Oracle, etc) but these databases will have to implement their own. The fact is that LINQ to SQLCE does exist and is separate from LINQ to SQL (Server). And other vendors may follow with their own, database-dependent implementation of a light LINQ layer. But speaking of SQLCE...

The SQLCEBLOG announced the beta of SQLCE 3.5 SP1 with ADO.NET Entity Framework Beta 3. There was support for EF in SQLCE a while ago, but it was postponed in the 3.5 release which sports LINQ support. Now it looks like Entity Framework support is back. Hooray. BUT, although a common programming model between SQLCE and a subset of SQL Server functionality is a great idea, IMHO the best and most lightweight programming model for SQLCE uses SqlResultset. Still the winner.

Oh yes, BTW, SQL Sevrer 2008 CTP6 was released last week. But that's been posted on EVERY SQL Server related blog in the world. Congrats for another CTP folks. Sparse columns, filtered indexes, and filtered statistics are in as well as quite a few other features. 

Sometime before SQL Server 2008 RTM, the libraries for LINQ and LINQ to XML (that is System.Core.dll and System.Xml.Linq.dll) but NOT LINQ to SQL will be added to the SQLCLR "approved" assembly list. The approved assembly list is the list of .NET Framework libraries that have been tested in a SQL Server-hosted environment, annotated their code with HostProtectionAttributes where needed, and can be used in user assemblies that can be cataloged with "PERMISSION_SET = SAFE". That means you can use 'em in SQLCLR user-defined functions, procs, and other SQLCLR database objects.

Being curious, I asked "why LINQ and LINQ to XML but not LINQ to SQL"? The answers I got were "LINQ to SQL not tested in this environment" and "not sure there should be yet another way to access the database with .NET". Fair enough. Although, its never seems to be a problem to provide yet another way to mix SQL Server and XML. The number of different ways now exceeds the number of fingers on one hand.

OTOH, SMO doesn't appear to be added to the approved assembly list in SQL Server 2008. Nor has WCF. Oh well.

Enjoy...

I often hear those who identify themselves as "database programmers" (sometimes I think these may be folks who program only inside-database code), say: "I'd give learning LINQ and/or LINQ to SQL/Entities or Entity SQL (eSQL) a go in my sparse time, but I don't have a good book or a tool.

I haven't read a book on either of these yet, spending most of my time in BOL. BOL is a good start (but not comprehensive) for LINQ and LINQ to SQL. eSQL docs are pretty sparse, but that's because its not a shipping product yet. Maybe things will improve. (Note: if anyone wants to send me a copy of their book to read in my spare time, I'll give it a go ;-).

Following LINQ is more difficult than EF (modulo LINQ to Entities) because there's many implementations/dialects. There are operators that LINQ (over objects) can do, but LINQ to SQL or LINQ to Entities cannot.

On to tools:

For LINQ, I like LINQPad by Joseph Albahari. Simple to set up, although the queries are not *exactly* the same as you'd write in a program. And it doesn't support LINQ to Entities yet (or LINQ to DataSet for that matter). But, all in all, an excellent tool.

For eSQL, there's no comparison to Zlatko Michailov's eSqlBlast. This is not only an eSQL practice query processor, but also a visualizer for the results you'd get back. You just type in the eSQL. Even comes with source code. Latest version is on CodePlex. This is straight eSQL, no LINQ to Entities support here either.

Neither one of these has Intellisence or statement completion, I believe.

Enjoy...

This post covers LINQ to SQL and EF worry #2. That is:

LINQ to SQL and EF will encourage "SELECT * FROM..." style coding because you get back a nice, clean object this way instead of the less-useful anonymous type you receive by doing projection. This will also make most/all covering indexes useless.

LINQ to SQL and EF can return something other than a whole object instance. Here's an example:

// This returns a collection of authors
var query =  from a in ctx.authors
             select a;

// this returns an anonymous type
var query =  from a in ctx.authors
             join ta in ctx.titleauthors on a.au_id equals ta.au_id
             join t in ctx.titles on ta.title_id equals t.title_id
             select new { a.au_id, t.title_id };

The collection of authors returned by the first query is updatable and a reference to it can be used outside the function in which its created. The anonymous type is not updatable and cannot be used outside the function in which its created (there is a nice workaround for this, but even the author of the workaround describes it as a bit of a hack). Although I can see a use for anonymous types in data-binding the good ol' dropdown list. But because they're be can't insert a new row unless you include all the columns you need, perhaps not dropdown comboboxes.

You don't necessarily need to return an anonymous type. You can define (by hand) a class that represents the projection of authorid and titleid. Or use a view. But, in order to do this on a large-scale project, you'd need to define a class for each projection in the entire project. Just for fun, I ask my students "do you know every projection (rowset) that every query in your project returns"? Can you even enumerate them? No one's answered "yes" to that question yet. So anonymous type (or much extra work). Maybe someone will write a tool to do this some day. Or "whole objects"...aka SELECT * FROM.

Counter to this is perhaps you SHOULD know every projection your project returns. Would certain help in tuning to know them all.

BTW, this is the same issue you'd run into using stored procs that return rowsets with LINQ to SQL; they return anonymous types. Using rowset-returning procs with EF *forces* (in beta3) you to define a class to contain the rowset produced. That's good. But EF can't make use of procs that return more than 1 rowset (SqlDataReader.NextResult() in ADO.NET). LINQ to SQL can use these, you get multiple anonymous types.

So simple 1-rowset sprocs with EF, or custom classes for every projection (and sproc in LINQ to SQL) it is. Or nice, clean, full objects (using SELECT * FROM). And what about covering indexes? An overly-simplistic definition would be "non-clusted index defined over the set of columns used by one table in a projection". Makes for some nice query plans (and sometimes even helps with concurrency issues in SELECTs). But if we're always doing SELECT * FROM, flush those covering indexes. The only index that matters is the (possibly) clustered index on the base table.

A counter-argument to this is that you really shouldn't define a covering index for every projection just because you can. Any index takes space and affects insert/updates/deletes, there's a tradeoff here. A second counter-argument is that if you need many, many covering indexes, perhaps your database isn't as well normalized as it could be. I'm not really sure I buy this argument.

MHO: This is a for the mostpart a valid worry.

I'd been meaning to write this entry for a while, about my opinion on LINQ to SQL and the ADO.NET Entity Framework and performance. I've just finished reading the 2-part blog series "Exploring the Performance of the ADO.NET Entity Framework", and was surprised (I guess) that the database performance aspect was barely mentioned.

One way to look at performance is to examine and profile the ADO.NET code, but because both EF and LINQ to SQL are T-SQL code generators (EF is not database-specific, but I'm only talking about SQL Server here, so far), another way to talk about performance is to examine the generated T-SQL code. I want to look at the generated T-SQL code side.

In this posting I'm going to list my top worries WRT the code generated by these APIs. And try and argue for and against the worries at the same time. Here's the list.

1. LINQ to SQL and EF will proliferate dynamic SQL, and will almost surely produce suboptimal dynamic SQL, causing database performance problems and plan cache pollution. And dynamic SQL is evil to start with.

2. LINQ to SQL and EF will encourage "SELECT * FROM..." style coding because you get back a nice, clean object this way instead of the less-useful anonymous type you receive by doing projection. This will also make most/all covering indexes useless.

3. LINQ to SQL and EF will encourage "SELECT to middle-tier, then update/delete" rather than issuing direct SQL UPDATE/DELETE statements that are set-based.

4. LINQ to SQL and EF will write code that gets too much or too little data at a time. This is a variation of #2. Too much data in one query is a waste. Too little data is also bad because it means extra database round-trips.

5. LINQ to SQL and EF will discourage the use of stored procedures to return rowsets; returning rowsets in stored procedures is *thought to be* superior to returning them in dynamic SQL. In addition, the code generators will use a subset of T-SQL query constructs, that is, only the constructs that the LINQ or ESQL language supports, rather than the full power of the T-SQL query language.

6. LINQ to SQL and EF queries will be "untunable" because, even if you discover a performance problem, you can't change the underlying API code to produce the exact SQL query that you want. There's too many layers of abstraction to change it.

7. Other arguments? I'll accept other points to argue/worry about...

Here the argument about worry-point #1. Other worry-points in upcoming posts.

It's interesting to see the number of folks who do query tuning for a living salivating over the prospect of tuning the bad, bad, bad queries that will assuredly result from these two data access stacks. And the number of DBA-types who'd like to "ban LINQ/EF and databases in their companies". It's also interesting to note that most people who profess a dislike for the generated code, have never seen (or seen very little of) the generated code. Usually, when someone sites a particularly bad instance of generated code they're never able to tell me if the code came from LINQ to SQL or EF. So I'd like to open a clearinghouse for LINQ to SQL and EF queries that generate really poor SQL. Send them to me at my SQLskills email address, together with enough info/data to reproduce it. Conor Cunningham's blog has begun addressing the question of LINQ's IN operator and SQL cache pollution, by writing about IN and SQL plans.

So far, its been my experience that LINQ to SQL, being more relation-centric, will in general generate code that's closer to what a good T-SQL programmer would generate. EF is more "object-centric" and sometimes generates SQL thats meant to construct object graphs. But neither one of them (that I can deduce) can generate a full outer join.

About dynamic SQL. It's almost dogma amongst database programmers that "static SQL" in stored procedures is better for security than dynamic SQL constructed via string concatenation. Besides the obvious association between dynamic SQL and SQL injection, using dynamic SQL means that all users must be given access to the underlying tables, unless you strictly use LINQ to SQL/EF with views and sprocs. LINQ to SQL and EF make every attempt to use parameterized SQL, and LINQ to SQL claims to have minimized/eraticated the potential for SQL injection when using their code. And remember, many programmers use dynamic SQL currently, LINQ to SQL would be an improvement for them.

More about plan cache pollution. I'll start by recommending Sangeetha Shekar's blog series on the plan cache. The starting entry of the 17-part series is here.  Except for the concept of "many different projections when one stored procedure will suffice" there's no cachability difference between parameterized SQL and a stored procedure. Non-parameterized SQL suffers a slight cachability difference until its reused. Because LINQ to SQL and EF are code generators, its likely that they may generate more homogenous SQL than programmers who use SqlCommand.CommandText. And programmers that use dynamic SQL (the ones most likely to use only LINQ to SQL/EF) are likely causing plan cache pollution right now in any case.

MHO.

In a previous blog entry a while ago, I wrote about writing a single SQL statement that did a SQL MERGE operation, used the OUTPUT clause to put out a rowset and directed the rowset into an INSERT statement. This happens in one statement without the need of explicitly defining a temporary table and using multiple SQL statements.

Turns out that this feature has a name: Composable DML. I've also heard it called (in SQL Server Books Online) "DML table source".

I've also heard the term composable queries, both in references to both Entity Framework functions and LINQ to SQL. They even have an "IsComposable" attribute in the Function definition) In this context, it means that the output of a function that produces a rowset (in this case a SQL Server table-valued function) or code that produces a rowset can be combined with further queries that do more filtering or projection, or even retrieve related rows. When the queries are submitted to the database, they are composed into a single SQL statement.

The point is pretty similar, reduce the resources required to accomplish a set of operations by reducing the number of statements or database round-trips needed.

After getting insert/update/delete stored procedures going with a simple standalone table (no relationships), I started thinking about how I'd do this in a many-many relationship. Let's say that I have students and classes, with a many-many relationship represented by a studentclass table with only a student_id and class_id. EDM maps this to a many-many relationship, leaving the "join table" out of the conceptual model (no studentclass entity). That's what I'd want. But...how to map insert/update/delete on the studentclass table, because there is no entity for it?

Turns out that this is OK, because mapping insert/update/delete to sprocs is not exposed on the entity (CSD layer), but on the mapping (MSD) layer. And, sure enough, there is an ModificationFunctionMapping element exposed under AssociationSetMapping, as well as under EntityTypeMapping. The designer doesn't support this yet, but the raw XML schema does.

In addition, the Entity Framework "Help Overview" file does mention "Mapping Association Sets to stored procedures". It doesn't help matters by showing an example of Categories and Products in the Northwind sample database, where the Categories and Products tables don't have a many-many relationship or a join table. Do I *need* stored procedures on a zero-or-one-many relationship as well as on the "base" tables that comprise the relationship? I can insert and delete a relationship between an existing product and an existing category, so I might need these in addition to the stored procs mapping the base tables, in case these aren't already enforced by key constraints in the database and model? Doesn't seem like I'd need them at all, unless I specifically left the appropriate key constraints out of the database, but wanted them in the mapping and conceptual model.

But the Help Overview specifically mentions "mapped to a join table in a relationship", so many-many sounds like to the relationship type (AssociationSet cardinality) I'd need stored procedure mappings for.

So there IS a way to do it with many-many.

Categories:
Data Access

I've been working with the new ADO.NET Entity Framework designer for a few weeks now, and I've got to like it. If you're used to the LINQ To SQL Object Relational designer, in which all the action takes place in the "diagram pane", it takes some getting used to. The ADO.NET Entity Framework designer actually uses three panes, the Entity Designer pane, the Model Browser pane (which docs itself in the Solution Explorer group at the right in my layout), and the Mapping Details pane (which docs itself in the Error List group at the bottom in my layout) pane.

In general, you can work with Entities (the CSDL) in the diagram pane or Model Browser. You can work with the store (database, the SSDL) in Model Browser, and the mapping (MSL) in Mapping Detalis. The Properties Window displays the property of the currently selected item, as with most things in Visual Studio.

Knowing which pane to work in makes it easier to figure out how to perform operations like mapping INSERT/UPDATE/DELETE to stored procedures. In the download announcement (which lists the features by pane) one the the new features was "View/Edit mappings using stored procedures (insert/update/delete)". The Entity Model Tools document seemed to indicate "not yet supported". I remember doing this "by hand" in earlier betas and it was fairly painful to hand-edit the XML file(s). I used the Model Browser to create a Function Import for my stored procedures and was ready to go...well...

Mapping the stored procedures to the insert/update/delete procedures in the Entity Designer (my LINQ to SQL experience got in the way here) and trying the Model Browser because the Mapping Details didn't seem to know about it, I finally located the little buttons on the extreme left side of the Mapping Details pane. The bottom button, "Map Entity to Functions" brought up an alternate Mapping Details pane, where I could map the stored procedures. Whew.

Maybe I'm "GUI impaired" but after I thought about it a while, where else would this function fit in the designer? Like I said, the EF/EDM is rich enough to make the designer quite complex. But after adding an ordinary "read" stored procedure (ie, not for insert/update/delete) to the model, and the XML hand-editing that this entails, I'm HAPPY to have to designer. Keep adding those new features...

Categories:
Data Access

Since its appearance in SQL Serve 2008 CTP5, folks have been wondering about the reason for SQL Server's new Change Tracking feature, and how it differs from Change Data Capture (CDC). The CTP5 BOL has a nice writeup about how it differs from CDC, so I'd encourage you to start there. But about "why it's there...".

The main reason for change tracking, as far as I can see, is to be used with ADO.NET Sync Services. ADO.NET Sync Services is part of the Microsoft Sync Framework and is a way to do (I'm trying to summarize here) "client directed programmable replication". For database folks ADO.NET Sync Services can be thought of as providing synchronization between SQL Server Compact Edition and any relational datastore, but the model is extensible. ADO.NET Sync Services 1.0 shipped with SQL Server Compact Edition 3.5 in Visual Studio 2008 (there may be other ship vehicles I'm unaware of) and provides 2-tier, 3-tier, or service-oriented synchronization. It provides hub-and-spoke synchronization. ADO.NET Sync Services version 2.0 will add (the CTP is out now) peer-to-peer synchronization. Microsoft Sync Framework (CTP 1 is out now) includes sync support for data stores that aren't necessarily databases. The best place to go to find some highly informative examples that illustrate the synchronization patterns supported by the model is "The Synchronizer's" (Rafik's) blog.

What's this all have to do with SQL Server 2008 change tracking? Although ADO.NET Sync Services 1.0 is a good start (and there's GUI-based designers in VS2008), it usually means adding timestamp or datetime columns and tombstone tables (tables that track the primary key of deleted rows) to existing database tables and triggers to populate the information sync services needs. That's fairly intrusive, especially with packed applications. SQL Server 2008 change tracking takes care of all of this for you.

You turn on change tracking on a database with ALTER DATABASE and on individual tables with ALTER TABLE. Change tracking does the rest. You can access the information you need for Sync Services applications using the CHANGETABLE table-valed function and a few related functions. You usually want to turn on ALLOW_SNAPSHOT_ISOLATION in the database as well, because change tracking works by tracking when a transaction is committed, rather than when its started.

Change tracking provides the information ADO.NET Sync Services needs to answer request such as "what rows have changed since my particular client (change originator is tracked by change tracking as well) last synchronized" and "have I synchronized with the main database so long ago (change tracking info has a DBA-specified retention) that I need to sync the entire table because the incremental info I need is no longer available".

Currently the VS2008 designers (they're accessed by Added a "Local Database Cache" item to a programming project) don't have an option to indicate "just use SQL Server 2008 and I'm using change tracking", so you have to code the synchronization procedures to use change tracking with Sync Services by hand. Perhaps such a feature is in the works for ADO.NET Sync Services 2.0.

And how about SQL Server Compact Edition (currently the only "ClientProvider" that ship with Sync Services 1.0). Well the SQL Server Compact Edition, Sync Services "just works". No special setup is required, although you may (I haven't determined this) need SQL Server Compact Edition 3.5. This version of SQLCE ships with VS2008. One last thing...ADO.NET Sync Services 1.0 isn't available for compact devices yet, so when you sync with SQL Server Compact Edition, it must be deployed on the desktop.

As a continuation of the previous discussion, here's a DELETE of a title row without fetching it from the database:

pubsEntities model = new pubsEntities();
titles deletetitle = new titles();
deletetitle.title_id = "BU9994";
deletetitle.EntityKey = new EntityKey("pubsEntities.titles", "title_id", "BU9994");
model.Attach(deletetitle);
model.DeleteObject(deletetitle);
model.SaveChanges();

In this case, you need to populate both the property (title_id) that corresponds to the primary key and also populate the EntityKey itself. Attach the object to the store (note that all the original values need not be specified), delete the object from the store, and save the changes. This bit of trickery convinces the ObjectStateManager that the object was in the store all the time, that you'd fetched it from the database. Then you deleted it from the store, so they'll delete it from the database.

Of course, if you have associated titleauthor rows, you'll get an error trying to delete that title. As you would in the relational database. Of course, you can put an ON DELETE CASCADE in the database or delete all of the associated rows first. SQL Server 2005 also allows ON DELETE SET NULL and ON DELETE SET DEFAULT, but you relational data model has to allow this; be careful about using these. You can synthesize and delete all the associated rows and the parent in a single round trip if know what the rows keys are in their entirety, but how do you delete a set of rows in a single statement without fetching or knowing all of the primary key pieces (like the associated au_id in titleauthor table)? Better yet...

UPDATE without fetch is even going to be even more interesting, because you need to know the original row values. Or how about updating/deleting a set of rows based on a non-key column, like "UPDATE titles SET royalty = NULL WHERE ytd_sales IS NULL"? Alex James, Metadata PM of the EF team has posted the first in a series of blog entries on doing just these types of things with EF...it will be interesting to see how far you can take this.

Categories:
Data Access

The ADO.NET Entity Framework Beta 3 was released this week. For details, see the ADO.NET team blog.

One of the things that bothered me originally about EDM was that Entity SQL in V1 doesn't contain INSERT, UPDATE, or DELETE statements. Only SELECT. So to update or delete a row, I'd have to fetch it first. Let's start with something simple, like an INSERT with a foreign key constaint?

All of the "add" examples I've seen always insert a new customer and new order and new order details at the same time. But what if your customer already exists?

As an example, the titles table in the pubs database requires a pub_id of an existing publisher. But EDM represents this constraint as a publishers class field in the titles class and an additional  publishersReference instance as a member also. My first attempt to create a new title with an existing publisher (pub_id 1389) went like this.

titles t = new titles();
t.title_id = "BU9994";
// rest of columns elided for clarity
// statement to set the pub_id to 1389
model.AddTotitles(t);
model.SaveChanges();

My first "statement to set the pub_id to 1389" was:

publishers pub = new publishers();
pub.pub_id = "1389";
t.publishers = pub;

This one tries to add the publisher row too and gets a duplicate key violation.

This worked
t.publishers = model.publishers.First(pub => pub.pub_id == "1389");

BUT it causes a SELECT to get the publisher from the database. I wondered at this point if I really needed to do a SELECT to INSERT a row with an existing foreign key row. With some help from Danny Simmons, a much nicer way is to use the publishersReference class and synthesize an EntityKey.

t.publishersReference = new System.Data.Objects.DataClasses.EntityReference<publishers>();
t.publishersReference.EntityKey = new EntityKey("PubsTestEntities.publishers", "pub_id", "1389");

This does the insert correctly with no gratuitous SELECT. Confirmed by SQL Profiler. Good.

Categories:
Data Access

Actually yesterday, but today was the first time I'd had a chance to write about it.

Having done quite a bit with what's now being called "traditional web services", my first impression of REST were, I'll have to admit, the thought that it was web services without schema. I kinda like my metadata; it's always been irritating that stored procedures do not store ANY metadata on the number or shapes of the rowsets returned, only metadata on the parameters are stored. The closest that ANSI comes is allowing specification of the number of rowsets, a piece of the standard the SQL Server doesn't implement. So web services without schema seemed less useful than web services with.

But one of the up-and-coming data access technologies at Microsoft is Astoria, which is described as "a REST-ful set of interfaces to relational database (and other) data". It's causing a lot of excitement.

After the conference I mentioned this to an old friend, Jon Flanders, as he was wearing his "real programmers care about URIs" (or something close) teeshirt. Asked him to explain the zen of REST. To summarize, I got the impression it was all about the location specification and using HTTP verbs like GET/POST/DELETE to effect "state transfer" operations. And, although its not very common, metadata can be specified using WADL (Web Application Description Language). So specifying the location is perhaps like a "connection string" to the service? And to the data the application interacts with?

I'm still a bit skeptical of the (seeming) typelessness and contract-lessness of it all, the "HTTP is the only protocol"-ishness, and IIRC, I can find out the location of a traditional web service using WSDL's soap:address element's "location" attribute in the service portion. But at least I have an somewhat of an understanding of what all the buzz is about.

More on Astoria in future posts.

I'll be at TechEd and ITForum in Barcelona in November, doing sessions on SQL Server 2005 and SQL Server 2008. One of the TechEd sessions is called "To ODBC or to OLE DB?" and is a discussion of using ODBC and OLE DB is applications these days, whether one or the other is "better", if you should convert existing applications, etc. On that note...

Back at TechEd US, I'd spoken with Chris Lee, who's in charge of SQL Native Client. That's the OLE DB provider and ODBC driver that ship with SQL Server. SQL Server 2008 has a new version of the provider and driver, and when I'd asked if support for the new features (like the new DATE/TIME data types and table-valued parameters) Chris had not only replied "yes", but given me a demo of using table-valued parameters from ODBC to show off and post if I wanted to.

Here's the code. To build and run it:

1. Use the Visual C++ compiler that comes with Visual Studio 2005 or Visual Studio 2008 beta. I used VS2008 Beta 2.
2. Convert the project if needed. I ignored the warnings about 1 source file not being converted.
3. Make sure that sqlncli.h and sqlncli10.lib are available to the compiler. They're in C:\Program Files\Microsoft SQL Server\100\SDK\Includes and Lib, respectively.
4. Install SQLNCLI from the SQL Server 2008 CTP distibution on the client machine. Just run the SQLNCLI.msi in Servers\Setup.
5. Setup an ODBC System DSN from Control Panel/ODBC Administrator named TVPDemo. It must use the SQL Server Native Client 10.0 ODBC driver. Use the database of your choice, the demo will run DDL to create the database objects you need.
6. Compile the demo, set breakpoints and walk through the code

Cheers! See you in Barcelona!

 

ODBC TVP Sample code1.zip (1008.79 KB)

Let's talk about clients and SQL Server 2008. First, a little history...

Every new version of SQL Server seems to include a new feature that requires a change to the protocol that SQL Server uses to talk to clients, the TDS protocol. TDS stands for tabular data stream, and is a propriatary protocol used by both SQL Server and Sybase. Since version 4.21, SQL Server's version of TDS and Sybase's version have "grown apart". Because TDS includes version negotiation you can still use old libraries to talk to newer versions, albeit at reduced functionality.

Even if they don't update the TDS protocol, there will always be new features that require changes to the client stack at some level.

SQL Server comes with support for the following client stacks:
ODBC - Open Database Connectivity, related to the ANSI SQL standard, vendor neutral
OLE DB - a COM-based vendor neutral library
ADO.NET - a .NET-based vendor neutral library

ADO (classic) is an IDispatch-friendly library over OLE DB
LINQ to SQL and Entity Data Model use ADO.NET to talk to the database. Remember LINQ to SQL is actually LINQ to SQL Server at present.

In addition, Microsoft ships a JDBC (which does not stand for Java Database Connectivity, the docs say so ;-) driver separately. DBLIB also still works, but not shipped with SQL Server any more. It's functionality is limited to features that existed in SQL Server 6.5, IIRC. Other vendors (e.g. DataDirect) ship SQL Server clients too. DataDirect licenses the TDS stack.

Before SQL Server 2005, providers and drivers were part of MDAC. MDAC (Microsoft Data Access Components) was once a separate install, but now its part of the OS. TDS libraries were separate DLLs (e.g. DBMSSOCN.dll). This stopped in

SQL Server 2005, and there are two main distribution vehicles:
SQL Native Client (SQLNCLI): OLE DB, ODBC, and network libs
ADO.NET System.Data.dll: SqlClient provider and network libs

So how does this happen in SQL Server 2008? For OLE DB and ODBC, there is a new version of SQL Native Client, version 10. Since MDAC is now part of the operating system, if you install SQL Server 2008 over SQL Server 2005, you'll now see three ODBC drivers, on my CTP4 + VS 2008 beta 2 system:

SQL Server          version 2000.86.1830.00
SQL Native Client   version 2005.90.3042.00
SQL Server Native Client 10.0 version 2007.100.1049.14

For OLE DB there are three providers:
Microsoft OLE DB Provider for SQL Server
SQL Native Client
SQL Server Native Client 10.0

For ADO.NET, the situation is a bit more interesting. There's a revision to System.Data.dll to include the new functionality. The new version currently ships with Visual Studio 2008 Beta 2 and its version number currently is 2.0.50727.1378. It simply replaces the version that's installed on the system (and in the GAC) at the time. There's only one ".NET 2.0" version registered in the GAC, as "Version Number 2.0.0.0". Hmmm....seems very "MDAC-like with MDAC as part of the operating system" (that is, the operating system includes .NET 2.0).

If you want to use the new SQL Server 2008 functionality, like date, time, datetime2, dateoffset, table-valued parameters and large UDTs/UDAggs (and perhaps FILESTREAM support when it arrives) with OLE DB or ODBC, you need to use the new driver/provider. This means changing the connection string. AND RETESTING. This also applies to SSIS packages, Reporting Services reports and anywhere else you used OLE DB or ODBC.

If you only install the operating system and SQL Server 2008, you won't have the original (version 9.0) provider and driver. These will be provided as a separate download, for folks that have installed only 2008 but haven't restested.

BTW, once upon a time, there was talk of "multiple versions of SQL Native Client running side-by-side, using the fusion SxS loader". That didn't happen. The new providers are separate and are registered in the registry separately. No SxS magic needed.

I've been trying out LINQ for SQL and the Entity Data Model (EDM) latest betas with some "different" data sources. This post covers using SQLCE 3.5. The latest version of SQLCE 3.5 comes with Visual Studio 2008 Beta 2.

Using SQLCE 3.5 data sources doesn't work with the built-in LINQ for SQL designer in Visual Studio 2008 beta 2 or the CTP EDM designer released shortly after beta 2. You can add a Data Connection for a SQLCE 3.5 database. I used the Northwind.sdf sample database that was supplied. But dragging a SQLCE table on to the LINQ to SQL designer produces "The selected object(s) use an unsupported provider". The EDM Wizard doesn't even list the SQLCE ADO.NET data provider as a choice. So there's no LINQ to SQL or EDM designer support for SQLCE 3.5.

SQLCE works quite well with SQLmetal, although my attempt to generate a .dbml for the Northwind.sdf sample database produces the error message:

error DBML1055: The DeleteOnNull attribute of the Association element 'Order Det
ails_FK01' can only be true for singleton association members mapped to non-null
able foreign key columns.

Removing the "Order Details" table provides a 'hit it with a hammer' solution; I haven't figured out a nicer way to solve this problem yet. Then, the SQLMetal-generated classes worked fine.

Using EDMgen against the Northwind.sdf file produced the following error:

error 7001: Could not load System.Data.SqlServerCe.Entity.dll. Reinstall SQL Server Compact.
Could not load file or assembly 'System.Data.SqlServerCe.Entity, Version
=3.5.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91' or one of its depend
encies. The system cannot find the file specified.

Hmmm...the version of SQLCE that came with Visual Studio 2008 B1 did come with this DLL. And, I had it working with EDM then. But the version in VS2008 B2 doesn't. As a last resort, I installed the SQLCE 3.5 version from VS 2008 B1, it's out on the web as a separate download. This changed the error message to:

error 7001: Method 'CreateDbCommandDefinition' in type 'System.Data.SqlServerCe.
SqlCeProviderServices' from assembly 'System.Data.SqlServerCe.Entity, Version=3.
5.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91' does not have an implementation.

Sure enough, the required implementation of the ProviderServices class required for EDM support has changed between VS Beta1 and VS Beta2. I confirmed this by checking the methods in the SqlProviderServices in System.Data.Entity.dll, they did change recently, adding CreateDbCommandDefinition. Unless I can find an updated System.Data.SqlServerCe.Entity.dll somewhere, it doesn't look like SQLCE is usable with EDM at this point in beta-time.

That's all for now.

I've been trying out LINQ for SQL and the Entity Data Model (EDM) latest betas with some "non-traditional" data sources. Namely SQL Server 2008 (CTP4) and SQL Server Compact Edition 3.5. I'll cover using SQL Server 2008 CTP in this blog entry, and using SQLCE 3.5 in the next one.

Using SQL Server 2008 data sources doesn't work with the built-in designers in Visual Studio 2008 beta 2. Adding new "LINQ to SQL classes" produces an empty design surface onto which you can drag items from Server Explorer. Problem is, you can't make a Server Explorer Data Connection for a SQL Server 2008 database "only versions 2005 and before are currently supported. The EDM Wizard actually includes "Use this selection to connect to Microsoft SQL Server 2000 or 2005..." when you attempt to create a Data Connection. Not SQL Server 2008. So there's no LINQ to SQL or EDM designer support. Or Server Explorer support, for that matter.

But each of the mapping products comes with a command line tool as well. LINQ to SQL uses SQLMetal, while the EDM uses EDMgen.

Using the command line tools, I was able to generate the appropriate artifacts (and use them) for SQL Server 2008. I can even load the artifacts (.dbml or .edmx) files into the Visual Studio designer after they were generated from the command line and edit them. Very cool.

I then did an experiment to see if LINQ to SQL or EDM supports the new data types in SQL Server 2008, namely, the new date/time-related types and the hierarchyID, which is implemented as a SQLCLR UDT.

SQLmetal did not produce an error or warning on the date/time types, but it did create definitions for the new data types as "NVarChar" with appropriate lengths. Using these definitions in LINQ code prints database-correct values, albeit as strings. Haven't tried an insert/update/delete yet. I had less success with hierarchyID, this produces the message:

warning SQM1021: Unable to extract column 'c1' of Table 'dbo.h1' from SqlServer
because the column's DbType is a user-defined type (UDT).

SQLmetal then error'd out and didn't generate a dbml file at all.

Trying date/time-related data types with EDMgen produces errors indicating that the date, time, datetime2, datetimeoffset aren't supported. Neither are SQLCLR UDTs, hierarchyID is reported as date type <unknown>.

Well, these are still beta. That's all for now.

I've been looking at the mapping of the new Katmai date/time data types to .NET types, both from the point of view of SQL Server ADO.NET clients and of SQLCLR procedures/functions/triggers. There are a couple of things that "interesting", if I'm not misunderstanding something obvious.

There are no new System.Data.SqlTypes to correspond to the new SQL Server data types. The beta Visual Studio docs state this as a fact. So, unless SQLCLR supports the generic Nullable types as parameters in Katmai, you'll pass these in and out as strings. Or only use them in UDFs with RETURNS NULL ON NULL INPUT. No word yet (that I'm aware of) on Nullable type support in SQL Server 2008, BUT the new HierarchyID type (which is .NET based) supports INullable, like SqlTypes do. So I can pass in a NULL HierarchyID to SQLCLR, but not a NULL TIME parameter.

SQL Server's TIME data type is mapped to TimeSpan, which is a time interval. data type. This might confuse folks that know SQL Server doesn't support SQL-99's date/time interval data types. But the TimeSpan seems to have appropriate semantics as long as you stay away from the "Days" related properties and methods.

There is a new .NET data type for DATETIMEOFFSET, System.DateTimeOffset. That's good.

To distinguish between SQL Server 2000/2005/2008 usage/mappings, there is a (client-side) connection string parameter: "Type System Version". You have a choice of 2008, 2005, 2000, or "Latest". This was already used in SQL Server 2005 to distinguish between 2005 (XML data type exists) vs 2000 (XML data type is a long string). And other distinctions.

I wonder if LINQ for SQL and Entity Framework will support these... ;-)

I've been trying out table-valued parameters along with ADO.NET support in Orcas and came across an interesting dilemma. It centers around INSERTs using TVP against a table with an identity column. ADO.NET can use DataTable, IDataReader or IList<SqlDataRecord> to represent a TVP parameter. Let's say that I want a TVP and a procedure for insert that looks like this:

CREATE TYPE dbo.JobsTableWithIdentity AS TABLE (
  job_id smallint identity primary key,
  job_desc varchar(50),
  min_lvl tinyint,
  max_lvl tinyint
);

The "insert proc" would look like this:

CREATE PROCEDURE dbo.InsertJobsID (@tvp1 dbo.JobsTableWithIdentity readonly)
as
INSERT INTO dbo.Jobs (job_desc, min_lvl, max_lvl)
  SELECT job_desc, min_lvl, max_lvl from @tvp1;

Using this in ADO.NET (with either DataTable or IDataReader as a parameter) produces the error: "INSERT into an identity column not allowed on table variables. The data for table-valued parameter "@tvp1" doesn't conform to the table type of the parameter." But I didn't DO an insert into an identity column in the proc. And this works just fine in T-SQL:

DECLARE @t dbo.JobsTableWithIdentity;
INSERT @t VALUES('hi', 10, 10);
INSERT @t VALUES('hi2', 10, 10);
EXEC InsertJobsID @t;

So its ADO.NET "deciding" this is an error. The ADO.NET workaround (if I do want to start with a DataTable that contains the identity column, add rows to it, and call Update) is this:

CREATE TYPE dbo.JobsTableWithoutIdentity AS TABLE (
  -- job_id smallint identity primary key,
  job_desc varchar(50),
  min_lvl tinyint,
  max_lvl tinyint
);

-- sproc dbo.InsertJobsNoID changed accordingly

// and then, in ADO.NET code
// DataTable "t" contains a real jobs table, to which I've added rows

DataTable added = t.GetChanges(DataRowState.Added);
added.Columns.Remove("job_id");
da.InsertCommand.CommandText = "dbo.InsertJobsNoID";
da.InsertCommand.CommandType = CommandType.StoredProcedure;
da.InsertCommand.Parameters.AddWithValue("@tvp1", added);

But, should I have to do this? Or modify the T-SQL code, given I've not used the IDENTITY column? But, I will need this column to UPDATE (or MERGE) using the TVP. A dilemma...

The day after Orcas Beta1 appeared for public download reports appeared on the ADO.NET Technology Preview forum that the EDM Wizard (that is, "Add New Item/ADO.NET Entity Data Model) wasn't working. And, sure enough, if you choose "Generate From Database" and carefully choose which tables to use, you get a model with no entities. If you choose an empty model, you get "Specified argument out of range of valid values", and get the three mapping files, but no language file. What to do?

One way around this is to use the files generated from the March CTP wizard. Because the XML schemas for the mapping files haven't changed since March CTP, these work fine. In fact, all of the EDM code that worked with March CTP continues to work unchanged. That's what I used in the SQLCE and EDM blog entry posting.

If you can't keep an old March CTP around just to generate EDM Data Models, you can use the EDMGen.exe command line utility in the C:\WINDOWS\Microsoft.NET\Framework\v3.5.20404 directory. This utility appears to work and generates useable code. It also generates a richer model than the wizard did (with AssociationSets and Associations) but it takes a little more work to put together a project. I'll post a really simple one if there's interest. And EDMGen doesn't generate the same output as the wizard used to (for example, columns of type NTEXT are ignored in the model) and uses different default naming conventions. So at least there are a few workarounds for now.

Categories:
Data Access

One thing that I thought was particularly interesting in the Orcas Mar CTP was support for stored procedures. This support exists in LINQ to SQL and EDM ObjectServices; I thought I'd start with ObjectServices. There almost no documentation on this topic at this point, about half a page with an incomplete mapping schema example. That's to be expected at this point, though.

In Mar CTP, you can specify stored procs for insert/update/delete, but not for EDM queries yet. You need to change the SSDL (store schema definition language) and MDL (mapping definition language) files. With the correct mapping filechanges, AcceptChanges just calls the sprocs automatically. I started with a VS-generated set of mapping files and used XSD-based intellisence and error messages with line numbers to guide me along. Although you may have gotten rid of the XSD/XML errors or compile-time errors, your mapping files can fail to agree with one another. This occurs as a runtime error when you "new-up" an instance of your model class. Try-catch is your friend here.

SSDL changes were easier (just add the Function and Parameter elements for the proc) with one gotcha. My SSDL Namespace attribute was called "people"; if you specify "dbo.myproc" as the store name, EDM looks for [people].[dbo.myproc] at execution time. That's invalid in SQL Server. Specifying "myproc" as the store  name (or letting it default to the value of the "Name"attribute) causes the runtime to look for [people].[myproc]. The name of my sproc was actually [dbo].[myproc]; I had to change to SSDL Namespace attribute to "dbo" to make it work. Doing so made me change the MSL to match, but had no global ill effects.

The MSL was a little more interesting. I needed to put an "EntityTypeMapping" and "TableMappingFragment" in between my "EntitySetMapping" and "ScalarProperties" elements. Then I could add my ModificationFunctionMapping and Insert/Update/DeleteFunction elements as children of EntityTypeMapping. Because the XML schemas require elements be defined in order, I almost thought this feature was masked out in this CTP. Although I could see it in the schemas. It was there, of course, you just have to hit the intellisense in exactly the correct place in the document to show the element you're looking for.

OF COURSE, this is going to sound like complete gibberish without an example. I started with the easiest possible example: one table, few columns, primary key (its required) but not even an identity column. I also have two projects, one with the "vanilla" tool-generated CSDL/SSDL/MSL so I could refer back if I had problems. I did have problems. Coding three files of XML by hand, even with XSD-base intellisense, is right up there with [insert your least favorite chore here].

Three hours later, thanks to perseverance and SQLProfiler, I had a functioning prototype. It's posted here. I'm trying to decide which undoc'd part of this interesting set of mappings, models, and query lanaguages to try next. Or what tool to create to make this less of a chore. Enjoy!

ProcedureEDM.zip (75.96 KB)

Categories:
Data Access

My SSIS whitepaper, which was available in draft form on the SSIS wiki, is live on the Microsoft web site today. For anyone who is confused by all of the whitepaper announcements lately, here is a list of titles and direct links.

This one is about SSIS:

Connectivity and SQL Server 2005 Integration Services
http://download.microsoft.com/download/2/7/c/27cd7357-2649-4035-84af-e9c47df4329c/ConnectivitySSIS.doc

And these two (from last week) are about scaleout technologies and solutions. That is Service Broker, Scalable Shared Database, Peer-to-Peer Replication, Query Notifications, Distributed Partitioned Views, and Data Dependent Routing.

Planning, Implementing, and Administering Scaleout Solutions with SQL Server 2005
http://download.microsoft.com/download/4/7/a/47a548b9-249e-484c-abd7-29f31282b04d/ImplementingScaleOut.doc

Internals, Troubleshooting, and Best Practices for use of Scaleout Technologies in SQL Server 2005
http://download.microsoft.com/download/4/7/a/47a548b9-249e-484c-abd7-29f31282b04d/InternalsTroubleshootingScaleOut.doc

Enjoy!

 

The companion whitepaper to my "Planning, Implementing, and Administering Scaleout Solutions with SQL Server 2005" whitepaper (see yesterday's post) is available. This whitepaper is called "Internals, Troubleshooting, and Best Practices for use of Scaleout Technologies in SQL Server 2005", with as much about internals as I could cram in 50 pages. Again, I don't have the "main" link; the direct link is here. Enjoy.

I've been working on some whitepapers on scaleout technologies in SQL Server 2005. The first whitepaper is now available; I don't have the main link, but here is the direct link to the doc file on the Microsoft download site. The whitepaper is about the implementation steps when using scaleout technologies like Service Broker, Scalable Shared Database, Query Notiifcations, and Peer-to-Peer Replication and how to choose which technology or combination of technologies is the best fit.

It will be followed by a companion whitepaper about internals and troubleshooting of these same scaleout technologies. I'll let you know when that one's available. Hope you find them useful.

Quite a while ago, someone (I forget who) was soliciting opinions on whether or not LINQ and ADO.NET vNext EDM should be able to run "in process", in other words should be useable in SQLCLR procedures. Here's one perspective, and a question.

LINQ and Entity SQL are QUERY languages. Although they might be able to encapsulate resultset output from stored procedures, much of their value is in the query language itself. NOT having them be useable in SQLCLR would encourage folks that want to use these languages to move AWAY from stored procedures and put their data access query code in client or middle-tier programs, rather than in the database. I don't think DBAs and database developers really want to encourage that.

On the other hand, some of the value of stored procedures is being able to restrict table permissions and find errors at compile time. SQLCLR procedures don't have either of those attributes built-in, although you can restrict table permissions if you use EXECUTE AS OWNER. And coding with LINQ/Entity SQL can allow compile-time type metadata checking. Other benefits of stored procedures, such as allowing the code to be shared among database apps, and consolidation of SQL code in database objects, do apply as well to SQLCLR.

And its a given that, with either of these languages (and with SQLCLR in procedures in general) you'll be giving away procedure execution speed for (*arguably* better, meaning you'all can argue about it) code maintainability. Unless you only use LINQ/Entity SQL to process T-SQL stored procedure resultsets.

What do YOU think?

Categories:
Data Access | SQLCLR

I did a talk yesterday at TechEd Hong Kong on ADO.NET vNext. Most of them just wanted to know what ADO.NET vNext was about. Short and sweet. Here goes...

There are two main new features:
1. Entity data model (EDM) - a mapping of objects over relational database to bring data access to a higher level of abstraction. No longer do you need to code to the DBMS schema, only to the entity schema. Think of it as a new data source.
2. Entity SQL - a new query language, SQL plus extensions to query against the entity data model (EDM). A bit richer than SQL, but SQL-like.

And some integration pieces:
1. An ADO.NET provider over the EDM (the MapProvider). Uses ESQL queries.
2. An ObjectServices layer over EDM. Reminiscent of ObjectSpaces (at least to me). Uses Query<T> and an ObjectContext.
3. An integration piece that allows LINQ over the EDM. "LINQ to Entities".

LINQ for Entities joins:
  LINQ to Objects
  LINQ to SQL (was: DLINQ)
  LINQ to XML (was: XLINQ)
  LINQ to DataSets
  PLINQ (parallel LINQ)

That's all folks.

Categories:
Data Access

There was a newsgroup discussion recently about the intricacies of Multiple Active Resultsets in SQL Server 2005, and one of the questions that came up was "Who actually needs/uses MARS anyway"? I've been taking a much closer look at ADO.NET vNext lately, one reason being that I'm doing a talk on it at TechEd Hong Kong the week after next. Guess what? Both the supplied ObjectServices and LINQ for Entities samples have "multipleactiveresultsets=true" in their connection strings. Why? Without taking MARS out and seeing where the samples fail (I'll get around to it...), I'd bet they use both the "multiple resultsets on the same connection" and "issue an UPDATE statement while reading a rowset on the same connection" patterns. That's what MARS is supposed to address (the very precise ODBC error message for this is "Statement in Use").

BTW, MARS has some interesting effects on transactions and session settings, too. I'll be doing a talk on this at TechEd Europe in Barcelona in Novemeber. Don't forget to stop by and say hi if you're at either of these excellent events.

Categories:
Data Access

I came across a new article, "Data Access Tracing In SQL Server 2005" at MSDN and I'm listed as one of the authors. That's interesting, as I didn't remember writing it :-) but upon closer inspection, they give me credit because they leveraged my original article "Tracing Data Access" from the ADO.NET 2.0 series. Thanks Jennifer and Acey, now I remember you asking if that was OK. I did notice they still used an old bio for me though, just ignore that part.

Anyhow, it's an excellent paper. They added one-third more pages and even included MOF files for tracing JDBC/XA, BCP, SQL Browser, and more. Very cool. Be sure to give it a read and practice your tracing BEFORE you need to use it. 

Categories:
Data Access

After having a go at the information on the Entity Framework and other ADO.NET 3.0 papers that were posted about a month ago, I was trying to come up with a good example of relational tables or views that are designed with entities in mind. Or perhaps from entities. Concepts like inheritence, for example. Unlike my good friend Ted Neward, I'm not about to make a blanket statement like "entities good" or "entities bad" (I believe Ted has a Political Science degree of some sort in his background), just looking for examples of an entity based design "in nature".

Then, someone sent me, out of the blue, a request for a diagram of the new system metadata views in SQL Server 2005. There's a PDF of a E/R diagram for them out on the web, and the more I looked at it, the more "interesting" the design looked. I happen to be doing some work with the system views for an upcoming paper, and came across the word "inherited" in SQL Server Books online description of them. Here's some excerpts:

-For a list of columns that this view inherits, see sys.objects (Transact-SQL).

-<inherited columns> - inherits from sys.xml_schema_components

-Some catalog views inherit rows from other catalog views. For example, the sys.tables catalog view inherits from the sys.objects catalog view. The sys.objects catalog view is referred to as the base view, and the sys.tables view is called the derived view. [base view and derived view? really?]  

-This can be summarized as follows:

-The base view contains a subset of columns and a superset of rows. The derived view contains a superset of columns and a subset of rows.

It's almost as though some of these follow one of the traditional patterns for mapping object/entity models to relational. But I'll leave this to you to decide. Entities are never a subject without controversy.

I wonder how I'd query these in EntitySQL. Or DLINQ.

Categories:
Data Access

I've been looking at the DLINQ syntax a bit more (I'm participating in a BOF at TechEd next week), and its always struck me how, at its most basic, its reminiscent of XQuery FWOR. Yes I did leave out the L (Let) on purpose. I guess its the relative placement of the SELECT part of the query:

FWOR:

for $x in /customers/customer/address
where $x/city = "Portland"
order by $x/zipcode
return $x

DLINQ:

from a in db.Address
where a.City == "Portland"
orderby a.Zipcode
select a

I've done the FLWOR/SQL comparison quite a bit. Just replace "for" by "from" and "return" by "select". The ordering of the keywords in DLINQ just makes it a bit more obvious. Or maybe its the "in" keyword in the from/for clause. The superficial similarity kind of ends there, however.

A frequently asked question since SQL Server 2005 shipped is "how do I expose my custom data source to SSIS/SQLRS/your app here". If you have a data server that you must connect to, and it accepts commands and returns resultsets, its a nature fit for an ADO.NET data provider-based mechanism. Having worked with OLE DB provider writers for a few years, I'm always surprised that its quite straightforward to write an ADO.NET data provider. As an example, the ATL OLE DB Provider Wizard uses a skeleton starter provider implementation in atldb.h that weighs in at over 12000 lines of code. A similar skeleton for a ADO.NET data provider is a hundred or so lines.

I originally built an ADO.NET data provider for version 1.0 that simply accepted a command that consisted of a file path and returned file information as a rowset. I'd upgraded the provider for ADO.NET 2.0 without adding much, just restructuring to use base classes instead of interfaces. Today I added enough of the "additional classes" to make it useable in SSIS. Haven't tried much else yet. I had to add two more skeleton classes: a ProviderFactory and ConnectionStringBuilder.

The provider factory connects is the glue between the machine.config info that is used as a "provider enumerator" and the provider itself. You need to:
1. Either register the provider in the GAC or make it available to each client.
2. Add "provider configuration" entry for it, either in machine.config or in each client's application configuration file. The console app has a specific entry in app.config, for SSIS its probably easier to put it in the GAC.

So that SSIS would accept the connection string (which the provider really doesn't use), I added a dummy property, DataSource. SSIS doesn't like empty connection strings. The provider, test console app, and test SSIS project (data connection only) is attached.

Although the provider doesn't do much (many of the methods are simply stubs) it may be a useful "starter" providing the scaffolding. Let me know if this is useful and I might do some work to expand it further. (Apr 17: Updated the provider based on feedback in this blog thread)  Enjoy!

MDirProv2_For_NET2.zip (123.85 KB)

Categories:
Data Access

I think I got my December update too early. There's an update of the SQL Server 2005 samples on the downloads website. I have (thought I had) the December update, this one actually says "Date published 1/4/2006" and its 20mb instead of the old once (just under 16mb).

The site says: Since its original release, new samples have been added for the following technologies: CLR, SMO, Integration Services, Replication, and Reporting Services. See the SQLServerDatabasesAndSamplesOverview.htm file for descriptions of the new and original samples.

Well, the samples folks (thanks Bonnie and everyone) have really outdone themselves this time. Thanks!

One of the samples that particularly intruiged me was a TVF that returned a resultset from Oracle using System.Data.OracleClient. I was surprised because, although System.Data.OracleClient is on the "approved assemblies" list, the DDL cataloged the assembly as UNSAFE. Hmmm...thought I'd only need EXTERNAL_ACCESS.

So I fired up my combined VPC of SQL Server 2005/Oracle 10g and decided to see for myself. When cataloged as EXTERNAL_ACCESS I got:

Msg 10314, Level 16, State 11, Line 1
An error occurred in the Microsoft .NET Framework while trying to load assembly id 65541. The server may be running out of resources, or the assembly may not be trusted with PERMISSION_SET = EXTERNAL_ACCESS or UNSAFE. Run the query again, or check documentation to see how to solve the assembly trust issues. For more information about this error:
System.IO.FileLoadException: Could not load file or assembly 'OracleTVF, Version=1.0.0.0, Culture=neutral, PublicKeyToken=65071e111a9fd000' or one of its dependencies. Failed to grant minimum permission requests. (Exception from HRESULT: 0x80131417) ---> System.Security.Policy.PolicyException: Required permissions cannot be acquired.

When cataloged as UNSAFE, it worked fine. Well, after I changed it to access the SCOTT.DEPT table. I don't have the "titles" table the sample referred to.

The reason for this (UNSAFE) is most likely that System.Data.OracleClient (which is approved) needs to load Oracle's client libraries. These are written in unmanged code, which makes them unsafe to load, even if System.Data.OracleClient isn't unsafe. BTW, I didn't try ODP.NET. This would have required cataloging my DLL as UNSAFE because Oracle.OracleClient.dll provider isn't on the approved list. But I'm running UNSAFE already. Oh.

Categories:
Data Access | SQLCLR

Ahh...enough shameless self-promotion (for a while), let's talk transactions.

I worked in the past with folks who like to compose transactions. SQL Server supports nesting of transactions and named savepoints but not autonomous transactions. So

CREATE PROCEDURE X
AS
BEGIN TRAN
-- work here
COMMIT

calling it standalone means the work is in a transaction. Calling it from procedure Y:

CREATE PROCEDURE Y
AS
BEGIN TRAN
-- other work here
EXECUTE X
COMMIT

doesn't start an autonomous transaction, the BEGIN TRAN in X merely ups @@TRANCOUNT by 1. Interesting things happen when you roll back X while its being called by Y.

I'd like to emulate this behavior in SQLCLR, i.e. have a procedure that acts like X, and can be used standalone or composed. I can do something akin to T-SQL (and get the interesting rollback behavior with a slightly different error number) using the BeginTransaction method on the context SqlConnection. I'd heard awhile ago that System.Transactions used inside of SQLCLR would "always do the right thing". AND because of the way promotable transactions work, it would compose a context SqlConnection in the SAME local transaction. A la T-SQL or SqlConnection.BeginTransaction().

It doesn't do this. If I have a SQLCLR proc that looks like this (condensed version):

public static void X {
using (TransactionScope ts = new TransactionScope())
using (SqlConnection conn = new SqlConnection("Context connection=true"))
{
  conn.Open();
  ts.Complete();
}
}

If SQLCLR X is used standalone, all well and good, local transaction. If SQLCLR X is called from procedure Y (above) then SqlConnection.Open() starts a *distributed* transaction. Apparently it HAS to be this way, at least for now, because of how TransactionScope works.

If you WANT a distributed transaction composed with your outer transaction (your SqlConnection is calling to another instance for example), USE TransactionScope, if you DON'T want one, use SqlConnection.BeginTransaction. It won't act any different from T-SQL (except you do get a different error number) if you roll back inside an inner transaction. But you get a nesting *local* transaction with BeginTransaction.

BTW just is case you wondered if SQLCLR X proc could do this:

using (TransactionScope ts = new TransactionScope(TransactionScopeOption.RequiresNew))

and go for that elusive "autonomous transaction", don't do it. You'll get a message saying "no autonomous transaction. Because SQL Server doesn't support autonomous on a single connection. SQLCLR or not. There is the two connection case, but that's a story for another day.

This was as compressed a blog entry as I could make it, but was still quite long. Any questions?

Categories:
Data Access | SQLCLR

I answered a question on the beta newsgroups last week about OSD and RSD (remember them?) by saying that those features had been pushed out into the future. The person then asked if the material in chapter 14 of our first look book had any practical value. I must admit that you can't cut and paste the examples and run them in the upcoming release. I don't know anything concrete about feature futures yet, so here's the long answer... from last week.

The features in chapter 13 and 14 were postponed, so they may appear in
future implementations. I used to tell folks those chapters are "very
futuristic". ;-) They may not appear line-for-line, class-for-class, the way
I coded them. But let's see where they are today without guessing at future.

ObjectSpaces was moved to the WinFS group. AFAIK (blogs and official notice)
it may not appear in WinFS as exactly the same implementation. In the last
WinFS public beta, OPath was/is the WinFS query language. Can't say what the
status of it all is today. There is no OSD or RSD in .NET 2.0.

Object-relational mapping as a concept has been around since there were
relational databases and object-oriented programming. Whether
codification/generalization of it into a product is a good idea is a subject
of *endless* debate, which I *don't* want to start again here. Some of the
products, past and present, have suffered from performance issues. But the
fact remains that if you are using relational data and object classes on the
client to consume/format/present that data, you are likely doing
object-relational transformation (even if it is very shallow) to some
extent.

System.Xml.Serialization is the preferred codification of XML-object mapping
in the .NET framework today. There is also the implementation in
System.Remoting.

The concept and implementation of a "query-intermediate language" mentioned
in chapter 13 was used in .NET 2.0 in XmlCompiledTransform class.
Client-side XQuery in .NET 2.0 was postponed because the spec is not
finished yet. Many implementations of XML consumers use a single library to
permit XPath/XSLT/XQuery in the same exe, probably don't use the
intermediate language concept, but use something coneptually similar. There
is no XSD/RSD mapping in .NET 2.0.

XML-relational mapping is in SQL Server 2005 in the guise of:
1. SELECT ... FOR XML
2. OpenXml and xml.nodes
3. SQLXML4 (which is part of SQL Server 2005)
4. SQLXML3 (which is still supported)
5. XML Web Services

There is an ISO/ANSI spec SQL2003 part 14, that codifies some/most/all of
these mapping concepts. In addition to SQL Server's implementation (in 2000
and 2005) other databases have similar but different ways of approaching
this problem.

So the class names, product/feature names, and implementation may change,
but the concepts and data models remain the same. As does the use of
multiple data models in the same programming project.

Hope this helps.

Categories:
Data Access

SQL Server 2005 will, by default on Windows Server 2003 systems, enforce password policies for SQL Server logins as well as Windows logins. Nice feature, but this means that your SQL Server login password can expire. So how do you change it? Well certainly the DBA can change it as (s)he always has, but you'd hate to bother your DBA every 42 days. Never mind what the DBA would think of that... And the user interface programs, SSMS and SQLCMD don't yet provide that feature. Neither does Visual Studio 2005 Server Explorer.

The functionality does exist in the supported database APIs. That is ADO.NET, OLE DB, and ODBC. With ADO.NET you have to be using 2.0, with OLE DB and ODBC the new SNAC providers are required. In ODBC, there is a new connection option SQL_COPT_SS_OLDPWD. In OLE DB there's an Old Password connection string parameter. In ADO.NET 2.0 SqlClient it's a static method on SqlConnection called (amazingly enough) ChangePassword. It takes to strings as input and here's how it works.

You change your connection code to use a loop, like while conn.ConnectionState == ConnectionState.Closed. Loop as many times as you like, most folks will probably loop twice. Bracket your calls to Open with a try-catch block. In the catch block, look for the following error codes:

18487 - Password Expired
18488 - Must change password on first login

If you get one of these call ChangePassword. You'd think that the parameters are "old password, new password". They are not. The first parameter must have enough information to connect to the server, including at minimum server name, your userid and your old password. The second parameter is just your new password. This changes your password, now change your connection string and Open again.

There are a couple of repercussions/refinements to this:

1. You obviously shouldn't even think about keeping password in the program, if you ever did this before. Check out my MSDN article for the built-in place to keep connection strings now.
2. This can only be used to change passwords on a SQL Server 2005 server. And only SQL Login passwords, naturally. It requires that the server and client be using the new network libraries.
3. There is no standard "New Password", "Old Password" GUI box. You need to make one yourself. Standard cavaets for passwords in GUIs apply.
4. There is no way to currently tell with standard SQL Server calls, how soon your password will expire. Think "Your password will expire in N days" message we all know and love.

So that's it. I have a "rough and ready" code example (that I wrote on a bet with Larry Chestnut at an Ascend gig a while ago) I'll probably clean up and post on my website eventually but this gives you the basic idea. And BTW, this isn't meant to push SQL Logins on anyone. If you can use only Windows logins in SQL Server (any release) and forgo SQL Logins entirely, PLEASE DO.

Realized that I haven't blogged in a while. I been ...uh...working on stuff and traveling a lot lately. Big surprise, right? This week I'll be on vacation. Traveling. Now I understand what the term “busman's holiday“ means.

Last weekend I had dinner with Rob Steward of DataDirect Technologies at his house. The ribs were great (thanks Rob) and while they were cooking I asked him if there would be support of rich schema metadata, base classes, and the neat stuff I wrote about I my ADO.NET 2.0 series of articles in DataDirect's line of ADO.NET data providers. He said that there would be, of course. They want to show up in that neat new Visual Studio 2005 connection dialog (that lists the ADO.NET data providers rather than OLE DB providers) too. And make a good showing in Server Explorer.

Then I asked about whether their classes derived from System.Data.ProviderBase's. This isn't the base classes in System.Data.Common (they support those), but the base classes to assist provider writers. Surprise! Those classes are now protected sealed in .NET 2.0 beta2. Oh. Guess not, then.

But they will support all the same things you'd have gotten from these classes for free: ConnectionStringBuilder, Connection Pooling (they always did support this), etc. So you'll be good to go.

Also, in looking through DbProviderFactory and friends, the SupportedClasses property is removed. That was the one that told you which classes (like Connection, Command, etc) the provider actually implemented. Gone from machine.config entries too. Wonder why...

Categories:
Data Access

There's been a lot of interest in the new System.Transactions.dll assembly lately. Especially from users of SQL Server 2005. This is based around two functionality points.

The first point of interest is that you will use System.Transactions to use transactions in SQLCLR procedural code in SQL Server 2005. In the beta 2 implementation of the SqlServer data provider, transactional coding had to use two different code paths based on whether a transaction was already started before your procedure was called. There was a section in the "First Look at SQL Server 2005 for Developers" book on this, transaction handling seemed rather complex. Using System.Transactions will make this simpler and more elegant.

The second point is that SQL Server 2005 has a feature known as promotable transactions. When you use a single connection to SQL Server 2005 and a System.Transactions TransactionScope, a local transaction is started. If SQL Server 2000 is used, or more than one database connection is used, the same TransactionScope starts a distributed transaction. Which is a few times slower than a local transaction.

After starting a local transaction with SQL Server 2005, another connection is opened in the same TransactionScope, the original local transaction is promoted to a distributed transaction, because now a distributed transaction is needed. Hence the name promotable transactions.

It is important to remember, however, that the transaction is still scoped to the *connection*. The usual cool TransactionScope demo shows a local transaction on SQL Server 2005 instance #1 being promoted to distributed when you open a second connection to a *different* database instance. It will be also be promoted if you open a second SqlConnection to *the same instance*.  Each connection has a different transaction space (lock space), even if you are using promotable transactions. Therefore, you need a distributed transaction with two connections to the same database. Even if the connection string and other environment is exactly the same.

To "knit" two lock spaces togther you'd need something fairly drastic, a la sp_getbindtoken and sp_bindsession. And they're not doing that.

The reason why this is puzzling (I was recently reminded by a student from a recent class) is that, in MTS/COM+ you could flow transactions by composing method calls, like this:

void DoTransfer(int accta, int acctb, double amt)
{
  DoWithdrawal(accta, amt);
  DoDeposit(acctb, amt);
}

Both DoWithdrawal and DoDeposit would open a connection in MTS/COM+. System.Transactions has some COM+-like transaction composition properties. But if both DoWithdrawal and DoDeposit each open a separate SqlConnection with enlist=true in the connection string (its the default), promotable transactions won't help, they'll be running a *distributed* transaction. If you really want promotable to mean: multiple operations, one database == local transaction, you'll have to pass the SqlConnection object around too. This makes things complex, because SqlConnections aren't "agile". They don't pass from process to process, for example.

Transaction is scoped to the connection (modulo sp_bindsession).

I had an interesting flashback today. Someone wrote with a reference to a mailing list posting I'd written in 2000, about IRowsetFastLoad. They wanted code, because nowadays attachments to public postings have become a thing of the past. Looked in my OLE DB code, from back when I was teaching "Essential OLE DB", and ... voila. (I know there's a code sample in SQL BOL, and told the letter writer about it, but finding the original code was way more fun. Written in ATL and all, too.

The poster wanted to know what I'd recommend instead, if this didn't work out (his inserts are too slow). Choices that came to mind were:

1. BULK INSERT SQL statement
2. BCP API in C - it's ODBC-based IIRC
3. SqlBulkCopy class from ADO.NET 2.0

There's something for choices #1 and #3 in the new SQL Server 2005/ADO.NET 2.0 stuff. Of course SqlBulkCopy (misnamed, it's it actually more like IRowsetFastLoad) is new in ADO.NET 2.0. And BULK INSERT has been enhanaced/rewritten as a "rowset provider", an OLE DB-like provider whose name is "BULK". The enhancements are more functional than speed-based. I haven't done speed tests (it's too early) but someone posted on newsgroups in Beta 1 that "BULK" was actually slower than the old BULK INSERT. It's supposed to be as fast by RTM. The new BULK supported XML inserts also, include multiple XML documents separated by "row separator characters". Think CSV files of XML data type. Whew...

This caused me to wonder if the new SQL Native Client OLE DB provider (SQLNCLI) supports IRowsetFastLoad too. Hmmm....says in BOL that there is not only IRowsetFastLoad but an IBCPSession interface on the new SQLNCLI provider. Maybe something for choice #2 also.

Categories:
Data Access

When you retreive a rowset that contains an XML data type column or use an XML data type column as parameter input, you have two choices. You can use the vanilla .NET type, System.String, or a special types from the System.Data.SqlTypes namespace called SqlXml. ADO.NET 2.0 handles conversion when you use strings (as long as they're valid XML, naturally). SqlXml has a method that produces an XmlReader and a constructor that takes XmlReader as input. Richer datatype.

Recently I wrote code that used an XML data type as an output parameter from a stored procedure. I specified the output parameter as SqlDbType.Xml and went to cast the result to the rich SqlXml type...invalid cast. Output parameters appear to always be returned as System.String or System.Data.SqlTypes.SqlString, NOT SqlXml. After the call, the SqlDbType is still reported as SqlDbType.Xml. *Input* parameters work fine as SqlXml, parameters with Direction InputOutput or Output are SqlString. Hmmm... Let's see if the upcoming Community Technology Preview, due out soon, helps this.

Categories:
Data Access | SQL Server XML

I've known for a couple of weeks now that XQuery and the new XML-SQL client mapping have been dropped from System.Xml in .NET 2.0. The XQuery implementation over the XML data type in SQL Server 2005 is NOT going away, of course. Just the client-side bits. Folks are encouraged to keep using XSLT 1.0 and XPath 1.0 on the client for a while. Hmmm...

After reading (more than twice) through all of the reasons for this decision, the one that makes the most sense to me is the product schedules. The reason that doesn't quite ring true to me is "folks are happy with the XML DOM". I can't help thinking that folks were quite happy doing everything through cursors in the early days of SQL because they didn't quite grok where the power of the relational model was yet. SQL cursors were "comfortable", as the XMLDOM is to XML programmers today. The schedule argument is more reasonable. If XQuery spec won't be "done" until next year, there is hesitation about producing an implementation based on an "in progress" spec that could change at the last minute. If you remember XSL Patterns and XDR schemas in Microsoft APIs you'll understand why. But...

This week at Win-Dev folks "across the hall" were lecturing in earnest about the WSE 2.0 offering. The WSE (web service extension) offerings are supported add-ons, outside of the "core" .NET APIs, and mostly implementations of various WS-* specs-in-progress. They're not guarenteed to be compatible with future offerings or with Indigo, the next generation WS-*++ implementation. In fact, some of the specs that were supported in WSE 1.0 have already completely vanished from the WS-* landscape. There's *way* more churn in this space.

Got me thinking...why not a similar model for XQuery? That is, a supported add-on implementation of the current specification with namespaces that begin with Microsoft.* rather than System.Xml.*. Guarenteed to change, at least subtly, but existing to get folks used to using it. The alternatives, that is, using Saxon.NET or working on a community implementation of XSLT 2.0/XQuery 1.0/XPath 2.0 are already happening. How about it...Microsoft.Xml.Query/Microsoft.Xml.Mapping anyone?

The new SNAC (SQL Native Client) OLE DB provider handles naming of parameters a tad differently than SQLOLEDB. Let's say that I have the parameterized query (using the ODBC-style question mark parameter markers):

SELECT * FROM authors WHERE au_id LIKE ?

And I call OLE DB's ICommandWithParameters::SetParameterInfo to set some parameter information. SetParameterInfo takes a parameter name as one of the parameters. In the case of the question mark parameter, what should I put for the name. Using the SNAC OLE DB provider and using a dummy name (say..."Dummy") produces the error:

DB_E_BADPARAMETERNAME - "Parameter name is unrecognized"

In SQLOLEDB the error is not produced. Difference between providers.

Looking at the OLE DB specification, although it's different behavior, it appears that the SNAC provider is "more correct". Here's the first part of the description of DB_E_BADPARAMETERNAME.

DB_E_BADPARAMETERNAME
In an element of rgParamBindInfo, the pwszName element specified an invalid parameter name. The provider does not check whether the name was correct for the specified parameter, just whether it was a valid parameter name.

SQL Server supports named parameters and positional parameters. Positional parameters have no name. Using a null pointer as the parameter name works, both in SNAC and SQLOLEDB. If you use named parameters in a SQL statement the parameters must start with an at-sign (@) like this:

SELECT * FROM authors WHERE au_id LIKE @auid

So valid values for SQL Server parameters are "null pointer" (no name) and names that start with at-sign. Names that do not start with at-sign SHOULD produce an error. Perhaps SQLOLEDB's forgiving behavior was to accomodate conversion from the OLE DB provider for ODBC (MSDASQL), where anything could be used as a parameter name. Nonetheless, it appears that SNAC provider's behavior is "more correct" they are checking that parameter names begin with at-sign.

Interestingly using either null pointer or @anyname as a parameter name works (ie SetParameterInfo does not produce an error) when you use SNAC. Even if its the wrong name. Note the spec says "The provider does not check whether the name was correct for the specified parameter, just whether it was a valid parameter name." Correct behavior here too. Although if you are using named parameter and use the *wrong* name, SetParameterInfo will work but ICommand::Execute will fail.

So, there are a few provider differences that I've found so far. If you're writing a new program or library, you should start with SNAC provider. To use the new features in SQL Server 2005 like MARS and Query Notifications, you MUST use SNAC (or SqlClient and ADO.NET of course). But be sure to test existing programs before making the switch.

Categories:
Data Access

After having been immersed in .NET since the alpha version, I'd lost touch with how many products use OLE DB and ODBC. After all, when you talk about SQL Server and SQL 92/99/2003 standards, the standard API for a SQL database *is* still SQL CLI (call-language interface). And the canonical implementation of SQL-CLI is ODBC. When I spoke recently about SQL Server 2005 Web Services as a way to acheive interop between unlike client architectures for SQL Server, someone reminded me that ODBC drivers for "unlike architectures" have existed for years. Uh, yes, of...course.

Parts of SQL Server 2005 like linked servers, replication, reporting services, and DTS (renamed SQL Server Integration Services (SSIS) last week) use OLE DB and ODBC as well. Although SSIS supports ADO.NET too.

SQL Server 2005 ships with a new communication layer known as SNI and a new OLE DB provider and new ODBC driver that use it. These are bundled together in a part of the product that runs on both server and client called SQL Native Client (or SNAC for short). Folks that have existing non-.NET applications are *really* interested in:

1. How the SNAC provider/driver supports the new data types
2. Any subtle differences between the new provider/driver and the current ones

Two weeks ago, someone asked about support of the new "MAX" data types VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX) in both SQLOLEDB/SQL Server ODBC and SNAC OLE DB/ODBC. For example, does VARCHAR(MAX) resemble VARCHAR or TEXT? So I took out my handy-dandy OLE DB Rowset Viewer and had a look. I looked at column metadata (DBSCHEMA_COLUMNS) and the  DBCOLUMNINFO structure (after a SELECT) of the following table:

CREATE TABLE testclob (
  avar VARCHAR(8000),
  amax VARCHAR(MAX),
  atext TEXT)

Answer is: VARCHAR(MAX) looks almost exactly like a TEXT data type (as far as the API is concerned of course; they're way different as far as TSQL is concerned), both in SQLOLEDB and the SNAC OLE DB provider. And both providers yield almost the same metadata. The big caveat there is the "almost". Here's the two almosts:

1. Both VARCHAR(MAX) and TEXT have the same capabilities/properties with respect ISLONG and MAYDEFER (ie, supports deferred fetch). But, in the Schema Rowset the character_max_length and octet_length is 2147483647 (2gb) for TEXT and 0 (that zero) for VARCHAR(MAX). Be careful using this to size to allocate buffers.

2. SNAC's DBCOLUMNINFO listed (maximum) ColumnSize as 4294967295 (4 gb?) rather than 2 gb as SQLOLEDB provider did.

I'd thought that SNAC's behavior was going to be identical to SQLOLEDB to ease migration, and it appears as though, except for the one metadata anomaly this is true. And I didn't really expect SNAC and SQLOLEDB to be exactly the same regarding *new* features; after all, that's the point of using the SNAC provider/driver, new feature support. There may be a few other subtle differences, more on this later.

With the introduction of the XML data type (using an XML data model) comes there's a choice of pulling XML out of the database as a stream or column. Using an XML data type as a column type will get you a column of type XML. That's SqlXml to you in .NET. SELECT ... FOR XML gets you a stream. In SQL Server 2005 there a new keyword TYPE on SELECT ... FOR XML that makes the stream into an XML type. Some/most tools make the stream look like a column with up to a size of 4k per "row". SSMS in SQL Server 2005 does better with it than Query Analyzer.

There's different APIs for these too. In ADO.NET there is ExecuteXmlReader vs ExecuteReader/ExecuteScalar. In OLE DB you use a COM IStream implementation, there was a Stream class added to ADO classic. Although you can use the either API for stream or column, it appears to be a stretch to use “other one“ in both cases. As an example, using a 'SELECT xmlcol from xmltab' with ExecuteXmlReader only gets you the XML in the first row. As you'd expect, if you remember ExecuteXmlReader returns a singleton like ExecuteRow in ADO.NET 2.0 does. At this point, I'm recommending using the "right" API for the representation. And you can always translate to a string. And, using the client XML APIs, any of the supported XML data models (such as document or Infoset).

When you load XML into an ADO.NET DataSet using it's ReadXml() method, ADO.NET uses a fixed set of rules to determine how to infer a relational structure for the XML. These rules are doc'd in the .NET framework docs. One of the most frustrating rules was that you couldn't have the same attribute name or subelement name in more than one different element. Doing so cause the ReadXml() method to produce the well-known error “The same table (Table name) cannot be the child table in two nested relations.”

The algorithm is improved so this type of XML can be loaded in .NET 2.0. It can load my test semistructured (also known as “tags and text”) document, as well as more mundane XML documents that caused this error in previous versions. The document:

<book>
<chapter>
<title>Testing your <noun>typewriter</noun></title>
<p>The quick brown <noun>fox</noun> jumps over
the lazy <noun>dog</noun></p>
</chapter>
</book> 

Produces 4 tables, named book, chapter, p, and noun. It even serializes out using WriteXml() the same way it looked when it was loaded, adding only an extra unused namespace declaration for xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance".

This will be a help for developers using this method of XML to relational mapping.

Categories:
Data Access

As Mark Fussell will tell you, one of the coolest things in System.Xml 2.0 is the static creator methods on XmlReader and XmlWriter. It made his top 10 list, albeit at number 10.

I've been looking into how the client side handles SQL Server 2005's XML data type and was bemoaning the fact that System.Data.SqlTypes.SqlXml had a CreateReader method that probably didn't set what I wanted set - ConformanceLevel.Fragment. Sounded like just the ticket for SQL Server 2005. Checked with Reflector, and lo and behold, they *do* use XmlReader.Create with ConformanceLevel.Fragment. Very cool, and the information was helpful to Kent Tegels on the beta newsgroup as well, just this morning. My research timing karma seems pretty good lately.

Now, I want an overload that looks like SqlXml.CreateReader(XmlReaderSettings). And something that handles sequences of atomic types. Maybe in .NET 2.0 beta2. Hey Mark... maybe I'll hit him up for it at WinDev.

I’m a person who always likes to know where things live in the OS, database, or whatever product I’m dealing with. Being able to point to a specific DLL or configuration file or registry entry gives me something tangible to hang my hat on, rather than think that things happen “by magic”.

 

So I was intrigued by the apparent disappearance of the network libraries in SQL Server 2005. Start with these points:

1. SQL Server 2005 B2 does not ship a new version of MDAC
2. SQL Server 2005 B2 does ship with SQL Native Client (codename was SNAC) with new OLE DB provider and ODBC driver. It also appears to cover network connectivity.  In the new client config utility, all (four) supported protocols list netlibs as “SQLNCLI.dll”. That's SQL Native Client.
3. System.Data.SqlClient does not use MDAC any more
4. System.Data.SqlClient did use DBNETLIB.dll to talk to SQL Server in the past

So does .NET SqlClient use SQL Native Client? Dan, Niels, and I did some experiments and it doesn’t appear that SqlClient calls SNAC at all. We even installed .NET 2.0 on a machine without SNAC and did tests that include MARS and async (these require new protocol features). Worked fine, no SQLNCLI or DBNETLIB loaded or in sight. So how’d they do that?

 

Looking in System.Data.dll, it appears that there are some calls that start with the letters SNI. There are no calls to DBNETLIB (as there was in System.Data.dll in .NET 1.1). A quick search of Books Online yielded only references to SNI associated with Service Broker protocol (as in “the SNI connection string for Service Broker...”).

 

So it appears that both System.Data.dll and SQLNCLI.dll implement SNI calls directly rather than calling a specific library. And that the SNI model is used by more than just client-server protocols. It is possibly used for everything that is covered by CREATE ENDPOINT. System.Data.SqlClient doesn’t use MDAC or SNAC, but both System.Data.SqlClient and SNAC are compatible with previous versions of SQL Server, because I could use the OLE DB Native Client provider with SQL Server 2000 just fine. Hmmm.

 

Stay tuned...

 

In recent posts I'd mentioned mapping a few times, it's time to get back and explore it in earnest. There appears to be three major data models in use by programmers today:

  • Relational - that's where the majority of the corporate data is stored. SQL and its product-specific dialects is the main programming language that follows this model.
  • Objects - really defined as state and behavior, the state being the data, the behavior being what you use to manipulate that data. Object-oriented databases peaked and waned in the 90s, but object-oriented programming languages are what's used by the majority of programmers today
  • XML - XML has accomodations for markup and data, a schema language and query languages. XML-specific databases are on the rise, albeit slowly. Instead, most/all mainstream relational database vendors are adding accomodations for XML data in their databases today (ANSI-SQL 2003), just as they added accomodations for objects in the 90s (ANSI-SQL 1999).

If you program using more than one of these models, it becomes necessary to map between these models. For example, object-oriented programmers are never satisfied with my answer when asking about representing data as objects: “the familiar connection-command-rowset paradigm IS an object-oriented implementation of data access“. They don't want rowsets and rowset metadata; they want business objects. They are just as unlikely to be happy with the answer that the XML DOM or streaming apis can be implemented using an object model. So mapping (or dealing with the reality that the "other side" of the protocol pipe may be using mapping) is a necessity if you can't mandate what all of the users of your data will do. When you speak of object-oriented programmers using web services (which uses XML as a marshaling format and for just about everything else) you're even talking mapping for data marshaling. Whether you stick with composition and simple types or go all out to make relational or XML represent complex types, it's just a different refinement of the same mapping. So we're talking two way:

Relational to object mapping
Relational to XML mapping
XML to object mapping

Currently, there are Microsoft implementations of mapping in the .NET libraries in System.Xml.Serialization and the remoting SoapSerializer. The SOAP serializer is based on mapping type systems to XML based on a pre-XML schema attempt in SOAP 1.0 section 6. With the completion of the XML Schema spec, fewer people each day remember that SOAP originally stood for Simple Object Access Protocol. It's acronym status has even been revoked. The schema-centric model (document-literal format in WSDL) is what most modern web services use.

The experimental COmega language made this mapping an implicit part of the programming language, consuming metadata with two utilities SQL2COmega and XML2COmega, and making SELECT a language keyword. This be an interesting approach with a future.

I was intrigued by the inclusion of a three-part mapping schema format that made a brief appearance in .NET 2.0 alpha 1, and seems to have faded off into the future. The format was based on the presumption that you had existed XML schemas, relational databases and object models and you didn't want them to change. Current technologies like SQLXML mapping schemas in SQL Server's SQLXML use annotated schemas, with annotations that mapping XML elements and attributes to table columns or special formats like UpdateGrams and DiffGrams. This meant you had to annotate the schemas. The mapping format left XSD schemas alone, but introduced XML representations of relational schemas (known as RSD format) and object schemas (known as OSD format). Mapping any type to any other was accomplished via a mapping schema format (known as MSD) that could represent constructs outside the boundary of the "original" data model. It was also flexible in that it did not mandate a one-to-one mapping, implemented declarative mapping (for offline code generation) and could expose multiple overlapping mappings over the same data.

This mapping format made its appearance in two guises: System.Data.SqlXml (XML/relational) and ObjectSpaces (object/relational). ObjectSpaces has been postponed until "the next release" of .NET, .NET 2.0 SqlXml mapping has also been pushed off into the future. I couldn't even find a current link to this work on MSDN. System.Xml.Serialization didn't choose to implement this mapping model, keeping .NET attributes as its model to acheive mapping. So it will be intresting to see what (if anything) becomes of this model.

Bottom line is, unless you develop in a cloistered environment where you can safely ignore other data models, you're gonna have to map some data. Embedded SQL-92 development is such an environment, so is using XML web services with the native XML apis, simple types, and query languages only. The rest of everyone will have to use toolkits or roll their own mapping for now. And stay away from complicated domain-specific constructs, because the other side may be using a different domain model.

Categories:
Data Access

After writing DevelopMentor's Essential OLE DB class and teaching it to provider writers (and detail-oriented consumer writers) for a few years, I always thought that a good way to get myself acquainted with the new data model was to write a new database client provider/driver/etc whenever the model changed. I wrote a simple .NET data provider during .NET alpha 1 and published an article in MSDN magazine about it in Dec 2001. Well, the model changed again.

ADO.NET 2.0 enhanced the data provider model in some pretty revolutionary ways. It added something that was always needed, a factory for providers. This class (System.Data.Common.ProviderFactories) gets a list of providers or a provider-specific factory class (e.g. SqlClientFactory). The provider-specific factory can be used to get provider-specific Connection, Command, etc, instances. Providers that want to participate now have a special place in the config file. The 4 Microsoft providers are listed in machine.config.

In addition to this change, the new provider model is based on base classes rather than interfaces (although interfaces are still there for backward compatibility). Provider writers implement classes that derive from DbConnection, DbCommand, etc. This allows adding new functionality without breaking backward compatibility.

So I've updated the provider and posted the update on my website. The updated provider uses base classes and also has a Provider Factory class. There's also a sample program that uses it. I didn't put the provider in machine.config, but have an app.config that adds it.

An overview of the new model is provided in this MSDN online article, and two more are in the works. Look for them in MSDN online.

Enjoy.

Categories:
Data Access

If you've recently installed the SQL Express version of SQL Server 2005, you may or may not have noticed SQL Native Client (SNAC). If you want to use the new features of SQL Server 2005 like multiple active resultsets or snapshot isolation from OLE DB, ADO, or ODBC, you're going to need SNAC.

SNAC is meant to separate SQL Server's OLE DB provider and ODBC driver from MDAC. Currently the SQL Server OLE DB provider (SQLOLEDB) and ODBC driver ship as part of MDAC (Microsoft Data Access). A new version of Microsoft Data Access (MDAC 9.0) was planned in order to support the new SQL Server 2005 functionality. MDAC 9.0 won't happen. Instead, the current components of MDAC (2.8 SP1?) will ship as part of the Windows family of operating systems. MDAC will change infrequently, changes will not be tied to new SQL Server functionality. SNAC will be versioned with new versions of SQL Server and ship separately.

Once you've installed SNAC, you should see a new OLE DB provider (SQL Native Client) and a new ODBC driver (SQL Native Client). You need to use them instead of the older versions to get the SQL Server 2005-specific functionality. Note that they need to be coded into the connection string (OLE DB, ADO, or ODBC) or ODBC DSN. The SQLNCLI provider does not support XML like the SQLOLEDB provider does.

If you're using ADO.NET's SqlClient data provider, it has been rearchitected so it doesn't use MDAC. Or SNAC, as far as I know. If you've used SqlClient with SQL Server 2005 beta 1, you won't need "USE MDAC9" in connection strings as MDAC 9.0 isn't shipping with .NET 2.0 beta 1/SQL Server 2005 beta 2. SNAC is provided for vendors that still use OLE DB, ADO, and ODBC in the products. Nice feature. I have a soft spot in my heart for OLE DB (having taught provider writers for a few year), and having MDAC as a part of the OS is overdue. Not shipping MDAC for SQL Server changes will be a good thing.

Theme design by Nukeation based on Jelle Druyts