I'd been meaning to write this entry for a while, about my opinion on LINQ to SQL and the ADO.NET Entity Framework and performance. I've just finished reading the 2-part blog series "Exploring the Performance of the ADO.NET Entity Framework", and was surprised (I guess) that the database performance aspect was barely mentioned.
One way to look at performance is to examine and profile the ADO.NET code, but because both EF and LINQ to SQL are T-SQL code generators (EF is not database-specific, but I'm only talking about SQL Server here, so far), another way to talk about performance is to examine the generated T-SQL code. I want to look at the generated T-SQL code side.
In this posting I'm going to list my top worries WRT the code generated by these APIs. And try and argue for and against the worries at the same time. Here's the list.
1. LINQ to SQL and EF will proliferate dynamic SQL, and will almost surely produce suboptimal dynamic SQL, causing database performance problems and plan cache pollution. And dynamic SQL is evil to start with.
2. LINQ to SQL and EF will encourage "SELECT * FROM…" style coding because you get back a nice, clean object this way instead of the less-useful anonymous type you receive by doing projection. This will also make most/all covering indexes useless.
3. LINQ to SQL and EF will encourage "SELECT to middle-tier, then update/delete" rather than issuing direct SQL UPDATE/DELETE statements that are set-based.
4. LINQ to SQL and EF will write code that gets too much or too little data at a time. This is a variation of #2. Too much data in one query is a waste. Too little data is also bad because it means extra database round-trips.
5. LINQ to SQL and EF will discourage the use of stored procedures to return rowsets; returning rowsets in stored procedures is *thought to be* superior to returning them in dynamic SQL. In addition, the code generators will use a subset of T-SQL query constructs, that is, only the constructs that the LINQ or ESQL language supports, rather than the full power of the T-SQL query language.
6. LINQ to SQL and EF queries will be "untunable" because, even if you discover a performance problem, you can't change the underlying API code to produce the exact SQL query that you want. There's too many layers of abstraction to change it.
7. Other arguments? I'll accept other points to argue/worry about…
Here the argument about worry-point #1. Other worry-points in upcoming posts.
It's interesting to see the number of folks who do query tuning for a living salivating over the prospect of tuning the bad, bad, bad queries that will assuredly result from these two data access stacks. And the number of DBA-types who'd like to "ban LINQ/EF and databases in their companies". It's also interesting to note that most people who profess a dislike for the generated code, have never seen (or seen very little of) the generated code. Usually, when someone sites a particularly bad instance of generated code they're never able to tell me if the code came from LINQ to SQL or EF. So I'd like to open a clearinghouse for LINQ to SQL and EF queries that generate really poor SQL. Send them to me at my SQLskills email address, together with enough info/data to reproduce it. Conor Cunningham's blog has begun addressing the question of LINQ's IN operator and SQL cache pollution, by writing about IN and SQL plans.
So far, its been my experience that LINQ to SQL, being more relation-centric, will in general generate code that's closer to what a good T-SQL programmer would generate. EF is more "object-centric" and sometimes generates SQL thats meant to construct object graphs. But neither one of them (that I can deduce) can generate a full outer join.
About dynamic SQL. It's almost dogma amongst database programmers that "static SQL" in stored procedures is better for security than dynamic SQL constructed via string concatenation. Besides the obvious association between dynamic SQL and SQL injection, using dynamic SQL means that all users must be given access to the underlying tables, unless you strictly use LINQ to SQL/EF with views and sprocs. LINQ to SQL and EF make every attempt to use parameterized SQL, and LINQ to SQL claims to have minimized/eraticated the potential for SQL injection when using their code. And remember, many programmers use dynamic SQL currently, LINQ to SQL would be an improvement for them.
More about plan cache pollution. I'll start by recommending Sangeetha Shekar's blog series on the plan cache. The starting entry of the 17-part series is here. Except for the concept of "many different projections when one stored procedure will suffice" there's no cachability difference between parameterized SQL and a stored procedure. Non-parameterized SQL suffers a slight cachability difference until its reused. Because LINQ to SQL and EF are code generators, its likely that they may generate more homogenous SQL than programmers who use SqlCommand.CommandText. And programmers that use dynamic SQL (the ones most likely to use only LINQ to SQL/EF) are likely causing plan cache pollution right now in any case.
MHO.
3 thoughts on “MHO: LINQ to SQL and Entity Framework: Panacea or evil incarnate? Part 1”
I agree with you,
and I’m also very interested in the performance problem of LINQ to SQL.
I will often visit here ^^
I’ve looked at the code which is generated by LINQ. I did a few scenarios using the various aggregation and methods available to the LINQ developer. The generated SQL did look acceptable, and it was fully parametrized, and I didn’t see any plan cache pollution. For a developer writing their own SQL, LINQ would probably do a better job than the developer. I’ve seen some horrible developer generated SQL in my day.
But, here is where I think the LINQ debate breaks down: In an enterprise, developers shouldn’t be writing their own SQL… and if they do, it should be in a stored proc so DBAs can tune it later when it causes performance issues. Because LINQ is still generated on the client, DBAs cannot do their job. The queries being executed against the database have no visibility. Because LINQ queries aren’t database objects, the nifty new Management Studio reports (Object Execution Statistics) return empty reports and DBAs are unable to pro-actively monitor DB performance.
Micorosft seems to have done a good job at creating acceptable generated SQL, but in real live scenarios that’s irrelevant because it prevents DBAs from effectively tuning the database.
As a side-note, I’ve recommended to our TA that we allow LINQ to SQL because it’s a solid new technology, but to only allow developers to access data via stored procs. This requires a bit of a different approach, and the dev can’t use the full range of filters and aggregation methods, but I think it’s an acceptable compromise.
I’m one developer whos not completely convinced by LINQ to SQL (or entities, or what ever) yet, mostly because everybody and everything looks way to uncritical at the technology/technique right now, and only seems to praise how "easy" it is, and how "clever" it is to have a uniform way of accessing different data sources (which I’ve rarely found any use for in real life though).
But few mention much about performance at all, and best case scenarios I’ve seen from various technical blogs who do touch upon it is something like 10-15% slower.
So I’ll be following your blog to see what you write, and once I get enough free time, I’m likely to try and dig into it myself as well, simply so I can try it out with my own hands as well.
So I’m very interested in your findings when they get posted 🙂
Comments are closed.