Tuesday, 19 August 2014

Getting the most out of SQL Server: CLR Integration, Part One

So this is the first post in a series I will be doing over the next few months or so that will highlight some aspects of SQL Server I have used in the past few years, why you should consider them and how you can use them. Obviously there have been thousands of articles written about the features I will be talking about, however, I plan to take an in-depth look at these features including scenarios and examples, presenting it in a Q & A format.

The first feature is CLR (common language runtime) integration for CLR programming in SQL Server which was first made available in SQL Server 2005.

What is CLR integration all about?

In a nutshell, CLR integration allows you to create stored procedures, triggers, data types, user-defined scalar, aggregate and table valued functions with any .NET programming language including C#, Visual C++ and Visual Basic.NET. This means you have access to the .NET framework and the power of managed code in SQL Server and your SQL queries.

Why has this feature been made available when we can create those objects using SQL?

SQL Server is an RDBMS which are the kings of data storage and SQL is the king of data querying. In other words, they are designed to do specific tasks optimally although they can perform other tasks, just not as well.

One task that SQL Server can perform but not optimally is string manipulation using functions such as PATINDEX, REPLACE, SUBSTRING, STUFF, and TRIM. Granted, string manipulation functions have been made available, but you are limited as to what you can do compared to the string manipulation functionality available in the .NET framework and .NET programming languages.

Take PATINDEX for example; you can build very simple pattern matching expressions using four wildcard operators, but identifying any complex patterns would involve nesting PATINDEX calls with other string functions available to SQL Server and possibly using loop statements to look at individual characters in a string. Compare that to regular expressions and perhaps you can begin to see the limitations. Also consider that depending on how you use these functions in your SQL statements, your SQL query may not take full advantage of the SQL Server engine since SQL operations work on the basis of set theory, that is to say, it operates on sets of data rather than individual pieces of data.

Why is CLR integration disabled by default, does that mean there are potential security implications?

CLR integration being disabled by default does not stop you or someone with enough privileges from installing CLR objects onto your database, it simply stops them being executed. After reading that you might think that when it is enabled it just opens up your database to malicious use by allowing unrestricted access to CLR objects that have been installed. Hold that thought as there are safeguards in place, two of which are CAS (Code Access Security) and HPA (Host Protection Attribute).

What are CAS and HPA and how do they protect my database when CLR is enabled?

CAS is a security model for managed code (it's not just specific to SQL Server) that determines what resources such as files, networks and the registry on the computer the code can access and what level of access is permitted. There are three places these permissions are defined: the machine, user and host. The first two are quite obvious; the last is the permission set chosen when creating CLR objects on SQL Server. There are three of these sets named SAFE, EXTERNAL_ACCESS and UNSAFE.

SAFE restricts the code from accessing any external resource such as files, the network, environment variables and the registry. In other words, you cannot access anything that is not part of SQL Server. It is the safest or most restrictive permission set as the name suggests.

EXTERNAL_ACCESS has the same permissions as SAFE but with the ability to specify access to resources outside of SQL Server, some of which I named above, but the full list can be found in the CLR Integration Code Access Security article on MSDN. Because you can allow or deny permissions to specific external resources your database is still protected, but there is obviously a little more room for malicious use, especially when allowing access to the internet.

UNSAFE is the least restrictive set as the name suggests and allows unrestricted access to local and external resources as well as the execution of native or unmanaged code. You could write code that performed operations on the SQL Server process if the permissions are set correctly.

Venn diagram depicting Code Access Security policies
Code Access Security Policies (exc. Enterprise)
CAS might seem a little scary if someone with enough privileges could allow CLR objects to execute
unrestricted code with the single keyword UNSAFE, but in reality it is much more secure than that. Even if the CLR objects have UNSAFE permissions, if the machine or user have permissions that deny access to a resource that UNSAFE permissions allows, then those permissions will be used instead. When you run a CLR object, all three sets of permissions are checked, so the final set of permissions can be considered the intersection of the three, that is, machine, user and host as seen in the Venn diagram.

Then you have HPA which is much more explicit than CAS in that it disallows specific bits of code, either types or methods that can be used by the CLR object when it is created using the SAFE or EXTERNAL_ACCESS permissions set. Some examples of disallowed code are the opening of message boxes or accessing peripheral devices. HPA is more than just a security feature as it also assists in the reliability of the system by preventing usage of code that may affect threads and processes which could result in memory leaks for example. A full list of disallowed types and methods can be found in the Host Protection Attributes and CLR Integration Programming article on MSDN.

It is worth noting at this point that as of .NET Framework 4/CLR 4.0 major changes have been made to the CAS model which includes the removal of the policy-based system. However, this particular change does not affect SQL Server users; SQL Server will continue to use the CAS model defined in CLR 2.0 which uses the policy-based system as mentioned in What's New in CLR integration article on MSDN.

In part two I will be providing an example along with files and documentation for enabling regular expressions through CLR objects on your SQL Server.

Stay tuned.