Author: Kimberly L. Tripp, Founder, SQLskills.com
Summary
SQL Server 2005 provides the database application programmer with a rich new development platform by hosting the .NET Framework Common Language Runtime (CLR) environment. With new capabilities come new roles and responsibilities for the Database Administrator (DBA). This whitepaper helps the DBA determine appropriate uses of this new feature and offers guidance on when other alternatives may provide better performance, flexibility, or capabilities. As well as offering guidance on suitable uses for the Database Engine .NET Framework Programming API, this whitepaper suggests code, change, and release management processes that should be tailored to each DBA's individual circumstances to ensure a professional and safe deployment.
About this paper
The features and plans described in this document are the current direction for the next version of SQL Server. They are not specifications for this product and are subject to change. There are no guarantees, implied or otherwise, that these features will be included in the final product release.
This is a preliminary document and may be changed
substantially prior to final commercial release of the software described
herein.
The information contained in this document represents the
current view of Microsoft Corporation on the issues discussed as of the date of
publication. Because Microsoft must
respond to changing market conditions, it should not be interpreted to be a
commitment on the part of Microsoft, and Microsoft cannot guarantee the
accuracy of any information presented after the date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS,
IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the
responsibility of the user. Without
limiting the rights under copyright, no part of this document may be
reproduced, stored in or introduced into a retrieval system, or transmitted in
any form or by any means (electronic, mechanical, photocopying, recording, or
otherwise), or for any purpose, without the express written permission of
Microsoft Corporation.
Microsoft may have patents, patent applications,
trademarks, copyrights, or other intellectual property rights covering subject
matter in this document. Except as
expressly provided in any written license agreement from Microsoft, the
furnishing of this document does not give you any license to these patents,
trademarks, copyrights, or other intellectual property.
Unless otherwise noted, the example companies,
organizations, products, domain names, e-mail addresses, logos, people, places,
and events depicted herein are fictitious, and no association with any real
company, organization, product, domain name, e-mail address, logo, person,
place, or event is intended or should be inferred.
Microsoft is a registered trademark of Microsoft
Corporation in the United States and/or other countries.
The names of actual companies and products mentioned herein
may be the trademarks of their respective owners.
For some features, this document assumes that the reader is familiar with SQL Server 2000 features and services. For background information about SQL Server features and services, see the official product Web site at http://www.microsoft.com/sql/ or the SQL Server 2000 Resource Kit that is available from Microsoft Press.
THIS IS NOT A PRODUCT SPECIFICATION.
©
2005 Microsoft Corporation. All rights
reserved.
Finding the Right Tool for the Job
Introduction to .NET Framework Programming in the
Database Engine
Important Notes on Implementation
Choosing the Right Tool for the Job
Looking beyond SQL Server 2005
Additional SQL Server 2005 Resources
This whitepaper sets out to provide information to the DBA that should help ensure successful, risk-free and hence stress-free adoption of .NET Framework programming in the Database Engine. The Database Administrator is the audience for this paper; for a more developer-focused look at .NET Framework programming for the SQL Server 2005 Database Engine, see the following whitepaper on MSDN: Using CLR Integration in SQL Server 2005.
SQL Server 2005 provides a broad set of programming
interfaces that enable developers to build robust database applications with
greater ease, performance and reliability than ever before. Along with this
breadth of programming options comes the need to consider which set of tools is
appropriate for each task. Although many tasks can be accomplished multiple
ways, each has pros and cons – so finding the best tool for the job is critical
for an application to perform and scale with load and growing business usage.
Some of the questions the DBA needs to ask are:
·
Should the
system handle this data as XML or should it be shredded and stored
relationally?
·
Should this
process – and all of its complex pieces – be handled synchronously or
asynchronously?
·
Should this
business logic, this calculation or this added security option be handled in
the client application, the middle-tier or the back-end database?
·
Should data
analysis be handled in the relational database or through the Business
Intelligence engine?
·
Should the data
transformation occur with the Integration Services ETL engine or in the database
using transforms built with Transact-SQL?
·
Should complex
business logic – traditionally running on middle-tier servers – remain in the
middle-tier or migrate to the SQL Server platform?
·
What mix of
clients and servers are running in the infrastructure – is there a need to
support Windows clients, Unix clients or both?
In most database development projects the role of technology
selection and of structural design of components that interact with the
database falls on the Database Administrator (DBA) – the person with final
responsibility for managing and recovering that business data. Most DBAs adopt
conservative attitudes to new technology. This is a natural instinct because
along with the benefits offered by new functionality, new technology can
introduce new risks to stability and integrity. The professional DBA usually manages
the risk/benefit by ensuring that there is full testing and that they understand
the new technology, taking the time to identify where it adds the most value
and, perhaps more importantly, where it should not be used. As a result of this
natural conservatism the DBA may ask “How
do I turn that feature off until I understand it?” The good news is that unlike
previous releases of SQL Server, in this release many new features are “off by default”.
Instead of leaving all features off permanently a prudent DBA
will tend to learn enough about the technology to determine where its use is
appropriate and where its application makes the most sense. There is no need to
understand every line of code – in every language the developer might use – but
there needs to be enough confidence to be able to provide great operational support,
maintenance and troubleshooting. Across many of these new features, the key to
proper usage is: understanding, impact isolation and strong control.
The SQL Server 2000 database programmer has the following
options when coding against SQL Server database tables and views:
Ø
Use
Transact-SQL to write code that runs within the database. Code can be written
as stored procedures, user-defined functions and/or triggers (that can be
regarded as stored procedures that are invoked on data change).
Ø
Use
Visual C++ to write code (an ‘Extended Stored Procedure’) that runs within the
database. Code that is written as an extended stored procedure appears to users
as a stored procedure and is executed in the same way. Parameters can be passed
to extended stored procedures, and they can participate in transactions and return
both results and return status.
Ø
Use
the sp_OA* (Object Access) system stored procedures to load and interact with
COM objects.
Ø
Use
other languages and middleware, such as data access libraries called from a client or middle-tier application, to write code that runs outside the database and accesses data through the client APIs.
Each of the options above has issues when the solution
demands that data be integrated with functionality supplied by external
libraries (such as those provided with the .NET Framework); or that nontrivial
mathematical operations be applied to the data; or if the requirement is for
something more complex such as a custom aggregation of data or a true
user-defined data type.
Each of the four options has limitations:
Ø
Transact-SQL
is excellent for set-based operations such as comparisons between tables but, due to the interpreted nature of the language, it can struggle to deliver good performance for computationally heavy tasks. Another limitation is that, unlike modern programming languages, Transact-SQL does not support private/public data encapsulation, so it is harder to implement clean interfaces between modules. Finally, although SQL Server 2005 introduces improved error handling within Transact-SQL, it is still susceptible to "un-trappable" errors, caused by missing objects or bad syntax, that are easily handled by .NET Framework languages.
Ø
Extended
Stored procedures are by their nature written in unmanaged code and execute
within the context of the SQL Server process. A greater level of programming
competency is required to create code that does not inadvertently leak memory
or generate unhandled exceptions that can crash the entire SQL Server process.
Extended stored procedures cannot provide in-process access to the .NET
Framework libraries without placing the server in an unsupported state. For
more information, see this knowledge base article: Using
extended stored procedures or SP_OA stored procedures to load the CLR in SQL
Server is not supported.
Ø
The
sp_OA* (Object Access) system stored procedures place limitations on the COM object, requiring that its interface be implemented in a compatible way, and have stability risks similar to extended stored procedures because the COM objects execute within the SQL Server process space.
Ø
External
code can cause performance problems because data must leave the SQL Server
process space and flow to the calling application; this data marshalling can be
expensive for large volumes of data.
Ø
None
of the current options can be used to create first-class custom aggregate
functions or custom data types, where first-class means running within the
database as if it were a SQL Server primitive function or data type.
With these limitations in mind, SQL Server 2005 integrates the .NET Framework Common Language Runtime (the CLR – the execution environment for managed code) and thus enables database developers to place managed application code inside SQL Server in a way that is safe, secure, scalable, and feature rich.
Code can be written as:
Ø
User-defined
functions (scalar or table valued)
Ø
Stored
procedures
Ø
Triggers
Ø
User-defined
aggregates
Ø
User-defined
types
The mapping of user-defined functions, stored procedures and
triggers to objects written in managed code is fairly intuitive – the CLR
programs are accessed and execute the same way as their Transact-SQL equivalents.
However, user-defined aggregates and types are less intuitive and extend the
options of the database programmer in new ways:
Ø
User-defined
Aggregates allow the programmer to build custom aggregate functions (used in
conjunction with the GROUP BY clause). This enables complex statistical and
data analysis in the database engine.
Ø
User-defined
Types provide the programmer with the ability to define new types with custom behaviors. Combined with the power of the .NET Framework and third-party libraries, this new capability allows strongly typed objects to be created instead of forcing a relational representation.
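As a rough illustration of the mapping described above – CLR routines are registered once and then called just like their Transact-SQL equivalents – the following Transact-SQL sketch loads an assembly and exposes one of its methods as a scalar function. The assembly, class, and method names are hypothetical and shown for illustration only.
-- Load the compiled assembly and register one of its methods as a scalar function
CREATE ASSEMBLY [MyCompany.SqlRoutines]
    FROM 'C:\Deployment\MyCompany.SqlRoutines.dll'
    WITH PERMISSION_SET = SAFE
GO
CREATE FUNCTION dbo.fn_DistanceKm
    (@lat1 FLOAT, @lon1 FLOAT, @lat2 FLOAT, @lon2 FLOAT)
RETURNS FLOAT
AS EXTERNAL NAME [MyCompany.SqlRoutines].[GeoFunctions].[DistanceKm]
GO
-- The managed function is then invoked exactly like a Transact-SQL function
SELECT dbo.fn_DistanceKm(47.60, -122.33, 37.77, -122.42)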
SQL Server 2005 delivers high-performance access to managed
code that runs inside the database server process. Unlike other database
technologies that have provided a degree of integration with the .NET
Framework, SQL Server 2005 hosts the runtime environment (CLR) in the database
engine’s process space; this delivers higher performance when transitioning between
the SQL Server query execution environment and the CLR. The integration is
designed to avoid conflicting memory and CPU demands between database queries
and programs. Additionally, the SQL Server and .NET Framework software
engineers worked to make the CLR safe and performant within the SQL Server
process:
Ø
The
CLR requests memory from SQL Server, not directly from Windows.
Ø
CPU-intensive
CLR memory garbage collection is controlled by SQL Server.
Ø
An
“in-process” version of the managed SQL Server client passes SQL requests
straight into the SQL Server query processor (avoiding costly network interaction).
Ø
CLR
Application Domains are created and managed by SQL Server.
All of this engineering is designed to ensure that a runaway
CLR program cannot compromise the stability of the SQL Server.
Microsoft is on a continuous journey to deliver secure products to its customers; the Trustworthy Computing Initiative led to SQL Server 2000 SP3, designed to be the most secure release of SQL Server 2000. This initiative continues to shape SQL Server as Microsoft goes further in this release by enhancing the "off by default" security of its products:
The Database Engine .NET Framework Programming API is
“off by default” and the DBA must make the deliberate decision to activate the
feature.
SQL Server 2005 introduces the “Surface Area Configuration”
tool that empowers the DBA to control which features are enabled. This change
in philosophy ensures that potentially unused features are not enabled and left
in an unprotected state.
The shortcut to the tool is installed in the Start/All Programs menu, within the SQL Server 2005 program group, in the Configuration Tools sub-group.
Select the SQL Server Surface Area Configuration option and, when the dialog is presented, select "Surface Area Configuration for Features". This opens a dialog that enables the selection of a SQL Server instance and then the selection of each locked-down option. Two views of the options are available: the default (by instance) enables control by SQL Server instance; the second enables control across components such as "Database Engine" and "Reporting Services".
Before enabling features that increase the surface area of
the SQL Server it is recommended that the DBA ensure that their systems are:
ü At the latest service pack and
critical hotfix level (obtained from Microsoft
Update)
ü Configured according to the organization's recommendations for secure systems (these may be informed by Microsoft and other third-party vendor advice on server and infrastructure configuration)
The Database Engine .NET Framework Programming API is part
of the Database Engine feature set, and is referred to as “CLR Integration” in
the user interface.
The user interface provides an easy way to view and set SQL
Server 2005 instance level permissions.
The SQL Server 2005 instance level features can also be
controlled programmatically by:
-
Transact-SQL
sp_configure command
-
Managed
code using the Server Management Objects (SMO) Server object Configuration
class.
-- Enable & check the Database Engine .NET Framework Programming API
EXEC sp_configure 'clr enabled', 1
RECONFIGURE
GO
SELECT sc.* FROM sys.configurations AS sc WHERE sc.[name] = 'clr enabled'
Decisions around tool and API selection depend on many other
factors such as your internal staff’s skills; the recommendations of any third
party software vendors that deliver the systems that run your company; the
requirement to deliver a brand-new system; and many others.
Most DBAs will face both a dilemma and a choice. Some decisions are simple, while other decisions are extremely complex, especially where there are many ways to implement the functionality and no absolute direction between the technology options.
In these cases, prototyping becomes more important – a quick implementation
using two or more competing options can make the choice much clearer.
As previously stated, the professional DBA is conservative – new technologies are not implemented lightly; they must first be understood and then carefully tested before they are integrated into production environments. To act otherwise would risk destabilizing the database and lead to user distrust of the integrity of the systems the database hosts (as well as threatening the job security of the DBA). A further assumption is that developers
are more willing to take risks and to adopt new tools that increase their
productivity and extend the functional domain of the solutions they can offer
their users. There is no doubt that the Database Engine .NET Framework
programming model is powerful and that many developers will relish the ability
to write in-database code in first-class programming languages.
So where do these two conflicting character types meet in
order to ensure that their systems remain stable and that new productivity and
functionality gains are realized? The answer is in finding balance, in
understanding the opportunities and more importantly defining choices rationally,
based on strict criteria for technology selection. This section suggests some
guidelines that might be followed to achieve this middle-ground.
The first set of points provides guidance on where it might
be a mistake to use the new functionality:
Ø
Heavy relational data access
Don’t move away
from Transact-SQL for simple query execution. Transact-SQL set-based access
will be faster than paying the transition cost for moving data in/out of the
.NET Framework, especially if the set-based query is replaced by cursor-like
behavior in the program (refer to the section “Warning Signs” for more on this
potentially hidden cost). Note that the opposite applies if complex computations
are taking place within the query, in this case moving the logic into a .NET
Framework program where the computation will be fully compiled can improve
performance. The transition overhead between the Transact-SQL and .NET
Framework execution environments is more noticeable for simple computations and
basic relational data access; in this case it is likely that Transact-SQL will
outperform the CLR.
Ø Long running, external calls
While it is
tempting to use the new functionality to further integrate existing business
systems, it is important to take the time to ensure that the end-user experience
does not get negatively impacted by calls to external APIs and external
systems. These impacts can be especially visible in a user-defined function
that might get called for every row of a table within a query. An external call
that costs one second per-row suddenly becomes unusable in an online system
when applied to a modest 10,000 row table.
Ø Unnecessary use of user-defined types
When the object’s
data can be easily mapped to one or more relational data types, you should stay (or go) relational. Be aware that
user-defined types have:
·
An 8K size
limitation (they must fit on a single SQL Server data page)
·
A requirement that all data within the UDT be read, and rewritten in full, whenever it is updated.
The same size restriction applies to user-defined aggregates, so care should be taken, especially when concatenating large string objects.
Ø
User-defined aggregates and online
reporting
User-defined
aggregates cannot be used in combination with the SQL Server Indexed Views, so
it is not possible to automatically pre-aggregate data for online report
performance. If stale data is acceptable then a separately created and
maintained table that periodically caches the aggregated results could be used
in place of an indexed view.
Ø
Compatibility with prior versions of
SQL Server
If your application
must support previous versions of SQL Server then you cannot use this
functionality.
Ø
Appropriate use of the technology
The Database Engine
.NET Framework programming API introduces a huge range of new possibilities for
the database programmer, however you should avoid racing into use of this new
technology unless you can state a clear rationale for its use.
The above points might seem like a strong case for never
deploying this technology; however there are many compelling scenarios where it
can provide considerable benefits:
Ø Leveraging the power of the .NET Framework and the Visual Studio programming
environment
This is where the
most gains can be made in terms of developer productivity and new
possibilities.
With the 2005 generation of SQL Server and Visual Studio, the Integrated Development Environment and the .NET Framework are now in their third release and second generation. This has allowed a huge amount of developer feedback, and it means that the user interface enables rapid application development, and the class libraries provide a rich set of objects and methods that the developer can employ to avoid having to develop their own code for many generic tasks.
The first-class programming languages of the .NET Framework (C#, Managed C++
and Visual Basic.NET) offer the programmer more control over error handling and
provide for better diagnostics in terms of call stacks and other debug
information.
Access to most of the local functionality is possible without granting high
levels of privilege, meaning that rich XML, string, regular expression,
localization and data manipulation functionality is readily available without
compromising data security.
And because the code is compiled and then converted to machine code at
invocation it is possible to code business logic that executes an order of
magnitude faster than Transact-SQL.
Examples where this might be done are:
- Connecting to a remote web service to access data in tabular format
(realizing the integration promises of web services at the database level)
- Calling a third party vendor’s control to add an order to an ERP system based
upon a change inside the database
- Using a set of code libraries that offer datatypes & functions specific
to your industry segment, be it research, financial, manufacturing, or sales.
Ø Replacing Extended Stored Procedures (XPs)
Prior to the arrival of the Database Engine .NET Framework programming API, the only way to provide access to the external world was through extended stored procedures and the sp_OA* stored procedures; however, as documented earlier, these methods present a high risk to database stability even when used by experienced developers.
It is recommended that the DBA who has systems with extended stored procedures providing extended business logic, or who is using object model manipulation via the sp_OA* stored procedures, consider rapid adoption of this new technology, because the SQL CLR is safer.
Using the SQLCLR has these advantages:
·
There is no
possibility of managed user-code generated access violations making SQL Server
crash.
·
There is no
possibility of managed user-code memory leaks making SQL Server slow down and
hang.
·
There is better
performance and scalability through SQL Server’s memory manager controlling
system resources.
·
There are no
security issues as security is fully integrated with both the SQL Server and
.NET Framework environments.
Note that these advantages may not apply to code running in the UNSAFE permission set: calls out to unmanaged code, use of COM automation, or calls to "process-threatening" libraries (such as dialog-generating code, thread-creation code, or code that interferes with the process context) may still have an impact on the SQL Server instance by causing access violations or by leaking memory.
Examples of extended stored procedures that could be replaced: pretty much all of them (especially considering that Managed C++ is available as a coding option).
Ø
Data validation on Updates
Enforcing a common
set of business rules when changing data across multiple clients is a perennial problem for systems that have complex middle-tier validation logic. The new API allows this logic to move to triggers in the database tier to ensure that all updates are consistent.
An example might require that data is entered in a specific order, some of it
in systems and technologies that are not normally directly accessible to
Transact-SQL triggers – the trigger can check that a new customer is first
entered and accredited into a remote system on an IBM mainframe that manages
credit risk across all of the company’s customers.
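As a brief sketch of how such validation logic is attached to a table (the assembly, class, and trigger names below are hypothetical, and the assembly is assumed to have already been loaded with CREATE ASSEMBLY):
-- Register a managed trigger that validates changed rows against an external system
CREATE TRIGGER trg_Customer_Validate
ON dbo.Customer
FOR INSERT, UPDATE
AS EXTERNAL NAME [MyCompany.SqlRoutines].[Triggers].[ValidateCustomer]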
Ø
Network traffic reduction
Some algorithms require all (or a large percentage) of the data to produce results. Moving large volumes of data between servers can be very expensive in terms of CPU, so placing such an algorithm in the database avoids the CPU and network cost of data marshalling and will likely result (for well-architected applications) in better three-tier performance.
Examples can be seen in many statistical calculations that require all the data
to produce moving averages of data, or where a dataset from SQL is being joined
onto data not held in a relational database (and not accessible to a linked
server) where the volume of data in SQL is much greater than that of the remote
system.
Ø
Writing general purpose functions
A general purpose
function has the following characteristics:
·
Data is passed
in as function arguments
·
There is little
or no additional data access within the function
·
Complex
computation is applied through cursor-like code to the data within a loop that
processes a row at a time
o
This
is an area where the benefits of the compiled nature of the CLR execution
environment can greatly outweigh the costs of transitioning data between the
environments. In performance tests the SQL Server team has seen benefits with as few as three integer operations per call.
Examples
might include closing balance calculations that “number crunch” through the
day’s transactions looking for fraud or any unusual data patterns.
Ø Implementing scalar user-defined types
Although most data
can be mapped to the relational model, there are many examples where a
user-defined type makes considerable sense:
·
The type wraps
external behavior in order to present it within SQL Server (an example might be
a date type that implements UTC functionality)
·
The type uses
encapsulation to protect its contents, and where the data is usually read and
updated together (an example might be an implementation of a spatial data type,
or a complex number).
Ø Using the power of custom user-defined aggregations
Many industries
“crunch data” into custom groupings based on all/subsets of input data where
the native aggregation operators such as SUM, AVG, MIN, etc. are not
sufficient. Examples might be computing a Fourier transform or an actuarial
predictive calculation. The implementation of user-defined aggregates permits
fan out/fan in (parallel execution) across multiple threads and thus should
scale well across multiple processors.
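As an illustrative sketch (the assembly and aggregate names are hypothetical), a custom aggregate is registered once and then used like a built-in aggregate in a GROUP BY query:
-- Register a managed aggregate and use it like a native aggregate function
CREATE AGGREGATE dbo.Median (@value FLOAT)
RETURNS FLOAT
EXTERNAL NAME [MyCompany.SqlRoutines].[Median]
GO
SELECT Region, dbo.Median(OrderTotal) AS MedianOrder
FROM dbo.Orders
GROUP BY Region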
Ø High performance table-valued, user-defined functions
Many algorithms do not require a complete list of all items from an external data source – examples include "get the latest stock price", "get the latest event from the event log", and "get the first item in the queue". The user-defined function infrastructure supports "streaming" of data: rows are requested on demand rather than all at once, which avoids pre-instantiating and reading large result sets into memory.
This is a capability that exceeds that of the Transact-SQL table-valued, user-defined function, which must instantiate all of its data when called; in situations where partial requests are common, a streaming function can considerably outperform the traditional user-defined function.
The above lists of good and bad uses for this technology should serve as a
starting point to the development and systems management communities to
understand where this new functionality fits in their toolkit.
Developers author code within Visual Studio – this is the
optimum tool for productive programming using the Database Engine .NET
Framework programming API. It is possible to use other tools to create programs,
such as the Express toolset or even Windows Notepad, but these lack the Database Engine wizards, integration with MSDN and Visual SourceSafe, and other team-development tools.
deployment, test and debugging tools that a proficient developer can exploit to
deliver high-end, professional applications rapidly.
The paradigm is simple – once the
developer has completed their solution they use the .NET Framework compilers to
build an assembly (a .dll file stored on the target file server) and then either
manually or automatically deploy the binary assembly into the database.
Once the assembly has been loaded into the database it is
now independent of the original file (the .dll). This means that the database
can be backed up, moved, and even restored without worrying about external
object dependencies; they are all pre-loaded into the database.
Note that the original source code must still be maintained, and ideally all
changes tracked. This aspect of system management is covered below.
The following recommendations are made in order to maximize the productivity of
developers:
è Developers should be provided with a
full set of professional tools:
o
Visual
Studio 2005 Team System and MSDN
o
SQL
Server 2005 Express for "occasional" Database Engine .NET Framework API users
and some limited forms of testing and development.
o
SQL
Server 2005 Developer Edition for “Regular” Database Engine .NET Framework API
users
o
Visual
SourceSafe or equivalent source control management system for team access to
source code and release management.
è Developers need access to
production-like data
o
A
scaled down version of the production database – with a reasonable
representation of relative table sizes and with sensitive customer data
obfuscated – should be accessible in order to test and prototype accurately. An
important note is that other tools, like SQL Server 2005 Integration Services,
can be used to create a rich developer version of your production database with
obscured credit card, name, address, and social security numbers.
o
Access
to test web services for testing integration with other systems
There are some restrictions in the .NET Framework infrastructure that argue for "a database on every (developer's) desktop" – when the Visual Studio debugger is used to trace through the execution of new code, the CLR engine single-threads, which can block other users of managed code on that instance. Debugging should never take place against a production system except as a last resort.
The Developer Edition of SQL Server 2005 is recommended for
professional developers as it provides them with the SQL Server Management
Studio which is fully integrated with the source control API; this edition also
includes the SQL Server Profiler for tracing the execution of SQL Server
commands. There are a wealth of other tools that overlap the DBA and Developer
roles that will make developers more productive when creating and scripting
database objects – for example SQLCMD for scripting and automation. For more
information regarding tools, see the resources and links outlined at the end of
this whitepaper.
One of the main duties of the diligent DBA is tracking the
objects that make up a service – which tables are in which databases, what
stored procedures and functions are required, what scripts must run daily,
weekly, monthly and so on. To aid in tracking all of these objects, there are a
number of tools from Microsoft and other software vendors (e.g. Microsoft Visio
for the data model, Microsoft SQL Server Management Studio for the database and
many others).
The integration of SQL Server 2005 with the “outside world”
introduces many new objects that both supplement and extend the relational
system catalog of previous releases. This section documents how to use the
changed catalog to obtain an accurate list of these new database objects, as
well as the additional metadata for each new object type within a database.
-- SYS.ASSEMBLIES: name, assembly ID, security and "is_visible" flag
SELECT * FROM sys.assemblies

-- SYS.ASSEMBLY_FILES: assembly ID, name of each file & assembly contents
SELECT * FROM sys.assembly_files

-- SYS.ASSEMBLY_MODULES: SQL object ID, assembly ID, name & assembly method
SELECT * FROM sys.assembly_modules

-- SYS.ASSEMBLY_REFERENCES: links between assemblies on assembly ID
SELECT * FROM sys.assembly_references

-- SYS.MODULE_ASSEMBLY_USAGES: partial duplicate of SYS.ASSEMBLY_MODULES,
-- links SQL object ID to an assembly ID
SELECT * FROM sys.module_assembly_usages
The views above are defined as system catalog views. These
catalog views provide information about the “containers” of these new
assemblies. These assemblies (or .dll files) are the result of .NET Framework
program compilation and “registration” in the database.
An interesting note: the sys.assemblies “is visible” flag
can be used to hide an assembly from having its public methods registered. This
can be used to conceal “helper” assemblies such as the .XmlSerializers
generated for assemblies that use web services to access data.
In the sys.assembly_modules view a null assembly method
indicates an assembly that will be used as a user-defined aggregate.
Note that there are also two system stored procedures that
provide access to these tables:
EXEC sys.sp_assemblies_rowset N'<AssemblyName>'
EXEC sys.sp_assembly_dependencies_rowset <AssemblyID>
-- Read CLR stored procedure metadata
SELECT schema_name(sp.schema_id) + '.' + sp.[name] AS [Name] ...

-- CLR trigger metadata
SELECT schema_name(so.schema_id) + '.' + tr.[name] AS [Name] ...

-- List CLR scalar functions
SELECT schema_name(so.schema_id) + N'.' + so.[name] AS [Name] ...

-- List CLR table functions
SELECT schema_name(so.schema_id) + N'.' + so.[name] AS [Name] ...

-- List CLR aggregate functions
SELECT schema_name(so.schema_id) + N'.' + so.[name] AS [Name] ...

-- User-defined type metadata
SELECT st.[name] AS [Type Name] ...
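As a sketch, the queries above might be completed along the following lines using the SQL Server 2005 catalog views (the exact column lists are illustrative):
-- CLR stored procedures (type 'PC')
SELECT schema_name(sp.schema_id) + '.' + sp.[name] AS [Name], asm.[name] AS [Assembly]
FROM sys.procedures AS sp
JOIN sys.assembly_modules AS am ON am.[object_id] = sp.[object_id]
JOIN sys.assemblies AS asm ON asm.assembly_id = am.assembly_id
WHERE sp.[type] = 'PC'

-- CLR triggers (type 'TA'); the schema comes from the parent object
SELECT schema_name(so.schema_id) + '.' + tr.[name] AS [Name]
FROM sys.triggers AS tr
JOIN sys.objects AS so ON so.[object_id] = tr.parent_id
WHERE tr.[type] = 'TA'

-- CLR scalar functions ('FS'), table-valued functions ('FT') and aggregates ('AF')
SELECT schema_name(so.schema_id) + N'.' + so.[name] AS [Name], so.type_desc
FROM sys.objects AS so
WHERE so.[type] IN ('FS', 'FT', 'AF')

-- User-defined (CLR) types and the assemblies that implement them
SELECT st.[name] AS [Type Name], asm.[name] AS [Assembly], st.assembly_class
FROM sys.assembly_types AS st
JOIN sys.assemblies AS asm ON asm.assembly_id = st.assembly_id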
One of the potential drawbacks of integrating environments
with distinct security models is that without careful design the aims of one
can be compromised by the other – for example an extended stored procedure must
connect back into the SQL Server to access data, and must enlist the active
transaction to protect the integrity of data it changes. Failure to do either
can result in data corruption and elevation of privilege security holes.
The architects of the Database Engine .NET Framework Programming
API have been careful to ensure that both security models are respected. The
SQL Server model using the GRANT command to provide specific object access is
respected as is the .NET Framework’s Code Access Security which controls the
interaction between modules of higher/lower privilege.
A non-administrative user must be granted execute privileges
on routines before they can be used.
To simplify the code access security model (the permissions
that define what the assembly can do) permissions are grouped into three sets:
SAFE
The assembly’s
methods can do no more than an equivalent Transact-SQL stored procedure or
function. The code runs under the caller’s credentials.
EXTERNAL ACCESS
The assembly’s
method can perform file and network input/output. External code runs with the
SQL Server service account’s privileges so impersonation should be used to
ensure that the caller’s privileges are used when controlling access to
external resources specified by user input.
UNSAFE
This permission
extends the “External” set of permissions and allows the assembly to call
unmanaged (and hence uncontrolled) code.
Note that even with “UNSAFE” mode the integration code prevents a number
of .NET Framework libraries from being used – compilers, window managers,
enterprise service libraries and others are blocked as their use within the SQL
Server does not make sense.
It is recommended that the DBA restrict the use of
External/Unsafe code and that they apply stricter requirements in terms of
code-review and permissions design if their developers request these higher
levels of permission for their code.
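As a brief sketch of how a permission set is requested at load time and how execute rights are then granted (the assembly, procedure, and role names are hypothetical; note also that EXTERNAL_ACCESS and UNSAFE assemblies additionally require either the TRUSTWORTHY database option or assembly signing):
-- Load an assembly that performs file I/O and grant access to a specific role
CREATE ASSEMBLY [MyCompany.FileHelpers]
    FROM 'C:\Deployment\MyCompany.FileHelpers.dll'
    WITH PERMISSION_SET = EXTERNAL_ACCESS
GO
CREATE PROCEDURE dbo.usp_ArchiveToFile @Path NVARCHAR(260)
AS EXTERNAL NAME [MyCompany.FileHelpers].[StoredProcedures].[ArchiveToFile]
GO
GRANT EXECUTE ON dbo.usp_ArchiveToFile TO ReportingUsers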
CREATE ASSEMBLY loads the binary assembly into the SQL Server database; there is no way of loading the source at the same time – it is left to the DBA to manage the source code that was used to create the assemblies. While it is likely that third-party software will be supplied only as binaries (and equally likely that the third party will provide professional technical support in the case of issues with their software), it is important that the DBA establish a firm process for managing source code created by their developers before its absence becomes a problem.
The Programming Paradigm section above made recommendations
around the tools provided to application developers – if these recommendations
are followed it becomes relatively easy for the DBA to enforce solid change
control policies and procedures and truly delineate between development/test
systems and those that run the production system. Using SQL Server 2005, new
security features and DDL triggers can restrict access and commands (or at
least audit changes) to better enforce change control policies of the
production environment. If desired, DDL Triggers can be used to prevent all DDL
(Data Definition Language) statements (or specific statements or groups of
statements) within a database.
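As a minimal sketch of the last point (the trigger name is illustrative), a database-level DDL trigger can block – or simply audit – assembly changes made outside the agreed release process:
-- Block assembly DDL outside the controlled release process
CREATE TRIGGER ddl_BlockAssemblyChanges
ON DATABASE
FOR CREATE_ASSEMBLY, ALTER_ASSEMBLY, DROP_ASSEMBLY
AS
BEGIN
    RAISERROR('Assembly changes must be deployed through the DBA release process.', 16, 1)
    ROLLBACK
END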
Note that code should never pass from development to
production without recompilation to ensure that code reviews with all relevant
stakeholders have reviewed the correct code and all dependencies have been
supplied.
è While it is recommended that Developers
exploit Visual Studio’s auto-deployment features in testing – for maximum
productivity during the development coding cycle – once complete, developers
should pass only source code and compilation command files to their DBAs for
deployment.
Note that the compilers are freely redistributable with the .NET Framework so
there is no requirement for a DBA to install Visual Studio unless they intend
to use its features for code review/debugging purposes.
è DBAs should take a copy of the
source code and place it “in escrow” once code review and recompilation (using
the compilation command file) has taken place – this protects the code even if it
is accidentally deleted from development source systems.
è It is recommended that a source copy
be associated with each database that has loaded the assembly. This means that
a single copy of the source code might exist in the development code tree and
several (one per database) in the DBA tree. Once the assembly has been loaded
into the database with the CREATE ASSEMBLY command any supporting files (such
as source codes and documentation) can also be loaded using the ALTER ASSEMBLY
command, as depicted below:
ALTER ASSEMBLY LeastSquaresSqlFunction
    ADD FILE FROM '<path to LeastSquaresSqlFunction source file>'
Productivity Tip
Note
that the ALTER ASSEMBLY command supports the loading of multiple files with a
single statement.
è Both the DBA/Developer should
annotate their source control systems when checking in code so as to provide an
audit log of changes.
è The WinDiff tool (or Visual
SourceSafe Compare) is excellent for comparing source code to identify changes.
Reviewing only the differences can shorten the code re-review process when
passing patches forward from development to production. For more details on
using WinDiff, refer to the Microsoft Developer Network library download for WinDiff.
Productivity Tip
When Visual Studio is used to build programs that exploit the SQL Server Database Engine .NET Framework Programming API, the build output pane contains the compiler command line that is being used – this can be cut and pasted into a release command file that the DBA can use to recompile the program.
Once source code is protected by professional source control
and developers become used to providing source and compilation command files
then release management becomes much more resilient to problems caused by code
and schema mismatches. (See the final "Futures" section for more information on why this is especially important.)
The SQL Server 2005 toolset makes it easy for DBAs and
Developers to work together within a common source code repository. The SQL
Server 2005 Management Studio application can load/save scripts from source
control applications such as Microsoft Visual SourceSafe and many others.
DBAs can be given read access to developer source code
projects in order to copy releases forward into projects that contain the
scripts and source codes that relate to a specific system/release. Another
option is for Release Managers to extract the appropriate versions of the
source codes and place them in a folder that the DBA can access in order to recompile
and load into the shared SQL Servers.
Controlling code promotion is a key role for DBAs – allowing developers to deploy directly into shared systems will almost inevitably result in assemblies without controlled source code, and in errors when the code is finally deployed into production (either immediately, when the recompile picks up the wrong version of the source code and creates code that was never tested, or eventually, when problems cannot be reproduced in test systems because the production assembly cannot be debugged since the source doesn't match).
The sample below illustrates a C# program compile command using the freely distributable csc.exe (the C# compiler, distributed with the .NET Framework):
Csc.exe /noconfig /nowarn:"1701;1702" /warn:4
/define:DEBUG;TRACE
/reference:C:\WINDOWS\Microsoft.NET\Framework\v2.0.50215\System.Data.dll
/reference:C:\WINDOWS\Microsoft.NET\Framework\v2.0.50215\System.dll
/reference:C:\WINDOWS\Microsoft.NET\Framework\v2.0.50215\System.Web.Services.dll
/reference:C:\WINDOWS\Microsoft.NET\Framework\v2.0.50215\System.XML.dll
/debug+ /optimize- /out:obj\Release\MainframeWebService.dll /target:library
IBMSystemCode.cs AssemblyInfo.cs "Web References\com.Mainframe.webservices\Reference.cs"
In the case of a program that invokes a web service, a further step is required to generate the serialization code; this is illustrated below. The Sgen program (installed with the .NET Framework 2.0 SDK and found in C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin) will create an assembly called "MainframeWebService.XmlSerializers.dll" that can be registered in SQL Server 2005 using the CREATE ASSEMBLY command or the user interface.
Sgen.exe /n /f MainframeWebService.dll
Use the Transact-SQL below to register both of the
assemblies:
CREATE ASSEMBLY [MainframeWebService]
    FROM '<path to MainframeWebService.dll>'
go
CREATE ASSEMBLY [MainframeWebService.XmlSerializers]
    FROM '<path to MainframeWebService.XmlSerializers.dll>'
go
The steps above compile the code supplied by the developer, generate the serialization code, and then load both assemblies into SQL Server 2005 for use in CREATE FUNCTION/PROCEDURE and similar statements.
Productivity Tip
Note that the serialization code is loaded after its
parent assembly. Reverse the order and the parent will be autoloaded in
‘stealth’ mode, requiring an ALTER ASSEMBLY … WITH VISIBILITY = ON to uncloak
it for use in CREATE statements.
By allowing developers to reach maximum productivity with professional tools and database development environments that are populated with a good sample of production data, together with controlled code promotion, the DBA can deploy the CLR technology without risk of compromising system stability.
If the DBA installs SQL Server 2005 and then starts up the Windows System Monitor (PERFMON.EXE), they might be disappointed to see only a single SQL Server CLR counter (SqlServer:CLR "CLR Execution"), giving CPU statistics summed across the server.
However, this is misleading: the majority of counters live in the .NET CLR group of counters at the top of the performance object list, and they can be filtered by the "sqlservr" process name.
Productivity Tip
Note that a server running multiple SQL Server instance processes (each with active CLR code) will not be easily distinguished in this list, so it is suggested that these counters be combined with the instance-specific "CLR Execution" counter to identify the active instance by total CPU activity.
There are multiple sets of counters that aid in
understanding the health and activity of programs running in the SQL Server
hosted runtime:
Ø .NET CLR Exceptions – the Exceptions/Sec counter can provide an early warning of problems in application code. If the value is larger than usual (developers can use exceptions for normal functionality, not just for error handling), then application failures may be happening and further investigation should include reviewing logs for errors.
Ø .NET CLR Loading – SQL Server isolates code between databases by using an AppDomain (a CLR concept that provides a self-contained runtime environment for programs); this set of counters enables monitoring of the number of AppDomains and the number of assemblies loaded in the system. Too great a number might indicate excessive memory use and warrants further investigation.
Ø .NET CLR Memory – provides detailed information about the three types of CLR heap memory as well as garbage collection; these counters can be used to monitor CLR memory usage within the SQL Server process.
Ø .NET Data Provider for SQL Server – provides information on the number of connects and disconnects per second. This is useful for capturing activity at the database level (for code that connects back to the SQL Server rather than returning external or processed parameters), as this counter includes the in-process connections made by CLR code back into the server.
In addition to the System Monitor counters above, it is also possible to get a considerable amount of information from the SQL Server Dynamic Management Views (DMVs) that present SQL Server "operating system" (OS) data; some useful sample queries are captured below:
-- Loaded assemblies (run in each database)
SELECT sa.[name] ...

-- SQL CLR memory usage
SELECT mo.[type] ...

-- SQL CLR wait statistics
SELECT ws.* ...

-- Requests that are currently in SQL CLR
SELECT session_id, request_id, start_time, status, command, database_id,
       wait_type, wait_time, last_wait_type, wait_resource, cpu_time,
       total_elapsed_time, nest_level, executing_managed_code
FROM sys.dm_exec_requests

-- Query performance and time spent in SQL CLR
SELECT ...

-- Obtaining CLR Execution performance counter values
SELECT object_name, counter_name, cntr_value, cntr_type
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE '%CLR%'
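One plausible way to complete the first four queries above, using documented SQL Server 2005 views, is sketched below (treat the exact column lists as illustrative):
-- Loaded assemblies (run in each database)
SELECT sa.[name], la.load_time, la.appdomain_address
FROM sys.dm_clr_loaded_assemblies AS la
JOIN sys.assemblies AS sa ON sa.assembly_id = la.assembly_id

-- SQL CLR memory usage (pages allocated by CLR-related memory objects)
SELECT mo.[type], SUM(mo.pages_allocated_count) AS pages_allocated
FROM sys.dm_os_memory_objects AS mo
WHERE mo.[type] LIKE '%CLR%'
GROUP BY mo.[type]

-- SQL CLR wait statistics
SELECT ws.*
FROM sys.dm_os_wait_stats AS ws
WHERE ws.wait_type LIKE '%CLR%'

-- Query performance and time spent in SQL CLR
SELECT qs.execution_count, qs.total_clr_time, qs.total_worker_time,
       qs.total_elapsed_time, qs.sql_handle
FROM sys.dm_exec_query_stats AS qs
ORDER BY qs.total_clr_time DESC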
SQL Server 2005 and Visual Studio 2005 are tightly
integrated, with the SQL Server tools addressing the needs of the DBA and
Business Intelligence communities and the Visual Studio Integrated Development
Environment targeted at the professional developer whether they be working with
client, web, middle-tier or database software systems.
This functional separation points to the roles that the DBA
and the developer play in the debugging process:
-
The
DBA uses SQL Server Profiler against test SQL Servers to trace database
activity
-
The
developer uses the Visual Studio debugger to trace code execution as well as SQL
Server Profiler on their personal SQL Server (developer edition) to trace
database activity. The debugger enables end-to-end debugging, where execution can
be traced from client, to web server, to business object through to
Transact-SQL and Database Engine .NET Framework Programming API code running in
the SQL Server, opening up the code and its state and thus enabling extremely
productive debugging during development.
This combination of tools enables the DBA to focus on the
operation of the SQL Server, understanding and optimizing the query workload
(as with all previous versions of SQL Server); and provides the developer with
a set of tools that enable rapid application development.
Productivity Tip
The debugging environment for the Database Engine .NET Framework Programming API does not support multiple concurrent debugging sessions, so it is recommended that developers be equipped with their own development SQL Server instances so as to make full use of the Visual Studio debugger.
This section suggests a few guidelines to aid the DBA in
troubleshooting and reviewing code written to the Database Engine .NET
Framework Programming API. This is not a complete list; it is recommended that
development and DBA teams get together to agree on their own standards (current
Transact-SQL standards are a good starting point) to ensure agreement on what
makes a good versus a bad program.
Ø
Lack
of set based operations (loops within loops)
Developers who are unfamiliar with SQL Server syntax have been known to replace JOIN clauses and sub-SELECT clauses with complex nested cursors; there are times when this is justified (where each row of the outer table controls which other tables are required for further processing, itself a potential design problem). This code usually looks like:
getdata()
foreach (datarow in datatable)
    getdata()
    foreach (datarow in datatable2)
        etc
and is usually replaced with a simple set-based JOIN between the tables concerned.
This behavior is easily captured in SQL Server Profiler as the inner loops can
generate large numbers of query bursts against the server.
Ø
CPU-intensive operations in the CLR
Extremely complex logic can be CPU intensive wherever it runs; if it consumes significant processor time on the database server, it may be a better candidate for the middle tier.
Provided the usage suggestions in earlier sections of this document are
observed then this should not be a major issue (and the migration of function
to the middle-tier is a fairly trivial coding task).
Ø
Instantiating
large objects in memory
Transact-SQL in SQL Server 2000 does not offer many ways to create large objects in memory, probably the worst being the ability to pass large IMAGE or N/TEXT parameters into a stored procedure (most spool into TempDB as @TABLE variables or #TEMP tables); this is not so in the SQL Server 2005 .NET Framework runtime environment, where code can easily instantiate very large objects directly in the server's memory.
Solutions to this problem are similar to those that would be observed in the
middle-tier:
1.
Use
paging to get subsets of data into memory
2.
Ensure
that data is passed by reference rather than by value
3.
Move
aggregated data around rather than detail-level data
Ø
Error
Handling
The Database Engine .NET Framework Programming API supports state of the art
error handling from the various supported languages, so there is simply no
excuse for poor error handling – any external operation such as database, file
system or web service access, or operations that risk overflow/underflow
exceptions should be wrapped in well structured error handling code.
Errors can be passed back to SQL Server by executing the RAISERROR command with SqlContext.Pipe.ExecuteAndSend(). Note that Transact-SQL error handling is considerably improved in SQL Server 2005, so developers should be able to handle RAISERROR errors from their code.
Raising an exception is safer than returning an error code, as many developers can forget to check the @@ERROR value after a call to generic code.
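A short Transact-SQL sketch of the consuming side (the procedure name is hypothetical): an error raised from managed code surfaces like any other error and can be caught with the new TRY...CATCH construct.
BEGIN TRY
    EXEC dbo.usp_CallMainframeService @CustomerId = 42
END TRY
BEGIN CATCH
    -- Log or compensate; ERROR_MESSAGE() carries the text raised by the managed routine
    SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage
END CATCH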
Ø
Unsafe
Code Issues
The DBA should be extremely reluctant to register code as UNSAFE, especially in shared instances, as it allows the developer to call unmanaged code, which in turn can perform "unverified" tasks (in the other code security modes SQL Server looks into the assembly that is being registered and ensures that only authorized classes and calls are being made).
Unsafe code and unmanaged code have full access and run at a high privilege level, so they should be fully code reviewed before deployment. Particular attention should be paid to all parameters (are they being validated and verified?) and to any connections back to the SQL Server, to ensure that transaction boundaries are being respected so that the system is not left in a partially updated state. Finally, the code should be checked to ensure that it doesn't display error message dialogs (which obviously make no sense on the server).
Ø
User
Defined Type Issues
User-defined types should be used with caution; in addition to the 8-KB size limitation discussed in earlier sections, they have other limitations:
1.
They
are read/written as an atomic object
2.
The
implementation code is schema bound once in use, so the cost of change to an
existing type is potentially high
SQL Server 2005 hosts the version of the .NET Framework runtime environment that shipped simultaneously with it: the .NET Framework 2.0, which also ships with Visual Studio 2005.
The following queries can be used by application developers to
test the version of the .NET Framework in use by SQL Server (note that this
query will return an empty string if the CLR is not activated) – generally the
major version is all that will need to be checked unless the application
requires features introduced by minor releases:
-- Will return the version if the .NET Framework has been used
SELECT p.[value] ...

-- Will return the version even if the .NET Framework is unused
SELECT lm.product_version ...
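A plausible completion of the two queries above, based on the documented sys.dm_clr_properties and sys.dm_os_loaded_modules views, is sketched below:
-- Will return the version if the .NET Framework has been used (the CLR is initialized)
SELECT p.[value]
FROM sys.dm_clr_properties AS p
WHERE p.[name] = 'version'

-- Will return the version even if the .NET Framework is unused
SELECT lm.product_version
FROM sys.dm_os_loaded_modules AS lm
WHERE lm.[name] LIKE N'%\mscoree.dll'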
At the time of writing this whitepaper Windows processes can
only host a single version of the .NET Framework runtime environment, and while
this is not a certainty for future releases it is possible that a future
version of SQL Server will:
Ø Require code recompilation –
ensuring that the DBA’s copy of source codes are kept in synch with the code
loaded into the SQL Server will permit easier recompilation.
Ø Require code modification – this
will be a certainty to take advantage of new API and added options to existing API,
and a possibility if existing API/options are deprecated. Management of source
and an awareness of API usage will go a long way to allowing accurate impact
analysis and change costing.
Ø Require more than one .NET Framework
runtime to be loaded if “down level” code is detected in the SQL Server – this
could lead to better investment protection at the expense of performance so
ensure that the tradeoff is understood.
The DBA who ensures that their company's source code is managed, and that developer API usage is understood, will be well placed to absorb these future changes.
This whitepaper has looked at the SQL Server 2005 .NET Common Language Runtime integration through the eyes of the conservative database administrator.
The sections above have:
Ø Proposed processes and procedures for ensuring safe and successful deployment of this new capability
Ø Provided a number of catalog and dynamic management view queries aimed at opening up the database for the DBA, with easy access to lists of objects and information about the state of the system
Ø Suggested the toolset required by the DBA and their developers in order to maximize their productivity and drilled into debugging and tracing.
Care has been taken to provide examples of situations where using this new capability might not be appropriate – the saying that a hammer regards every problem as a nail applies here; it is important that the richness of SQL Server 2005 CLR integration be put to its best purpose rather than to every purpose.
The most important lesson to be learned from this paper is that the CLR integration provides a new and powerful toolkit for application development with many scenarios where it can add enormous productivity and unlock new options for serious enterprise systems.
Free Resources on Microsoft.com, MSDN, and TechNet
MSDN SQL
Server Developer Center
MSDN Whitepaper: An Overview of SQL Server 2005 for the Database Developer
MSDN Whitepaper: Processing XML Showplans Using SQLCLR in SQL Server 2005
MSDN Whitepaper: Using CLR Integration in SQL Server 2005
MSDN Whitepaper: XML Support in Microsoft SQL Server 2005
MSDN Whitepaper: XML Options in Microsoft SQL Server 2005
MSDN Whitepaper: What's
New in FOR XML in Microsoft SQL Server 2005
MSDN Whitepaper: XML Best Practices for Microsoft SQL Server 2005
MSDN Whitepaper: Usage Scenarios for SQL Server 2005 Native Web Services
MSDN Whitepaper: Managed Data Access Inside SQL Server with ADO.NET and SQLCLR
Microsoft SQL Server TechCenter on TechNet
Sample Book Chapters for SQL Server 2005 is a list of chapters posted from a variety of authors for books related to SQL Server 2005.
Hosting the .NET Runtime in Microsoft SQL Server on the Association for Computing Machinery (www.ACM.org). To access this article you need membership in SIGMOD, the ACM, or you can purchase just this article for download.
Service Oriented Database Architecture by David Campbell, also on the Association for Computing Machinery (www.ACM.org). To access this article you need membership in SIGMOD, the ACM, or you can purchase just this article for download.