SQL Server Profiler and LINQ to SQL

If you’re using LINQ to SQL, you’re eventually likely to be required to trace SQL statements from the SQL Server Profiler. Either for performance purposes or for debugging purposes. When you do, you’ll find that you’re in a world of hurt!

Below, I’ll demonstrate what I’m using and which may not be perfect, but it’s a huge step in the right direction!

A world of hurt

So, you want to trace a SQL statement back to the LINQ code that generated it? You’re really in it now! The reason is that when you execute LINQ to SQL statements, they can generate any number of SQL statements and it’s near impossible to trace them back to the individual statements that generated them. Especially on a busy production server where lots and lots of stuff is happening in the profiler

Even worse, you’ll typically have DAL (Data Access Layer) methods that are similar to how you’d use stored procedures previously, that execute a number of LINQ to SQL statements. So the offending SQL might be a tiny part of a large set of SQL statements originating from a particular DAL method.

What I’d like

I’d love to have a method for tracing every single LINQ to SQL statement from SQL Server Profiler back to my code in Visual Studio. But I haven’t been able to find a method for doing this.

What I’ve got out of the box

Basically, I have a long long long list of SQL statements in a log that I have no chance of matching up with the actual code.

What I’ve come up with

I’ve been able to add a line in the SQL Server Profiler logs as each DAL method starts and when it ends – that way I’m able to determine which DAL method a particular SQL Statements belongs to.

The naive way

Now, this can be done relatively straight forward, we’ll simply add a comment at the start and at the end of the DAL method – but this makes our code messy and hard to read. Especially if we get fancy and do a try/catch to make our code safer.

This is what the code looks to start out with (we’re using a data context factory);

        private static List<Employee> GetEmployees()
        {
            using (MyDataClassesDataContext context = 
new MyDataClassesDataContext())
            {
                return
                    context
                        .Employees
                        .Where(employee => employee.LastName.StartsWith("a"))
                        .ToList();
            }
        }

 

Adding some logging;

        private static List<Employee> GetEmployees()
        {
            using (MyDataClassesDataContext context = 
new MyDataClassesDataContext())
            {
                context.ExecuteCommand("BeginCall: GetEmployees");
                List<Employee> employees =
                    context
                        .Employees
                        .Where(employee => employee.LastName.StartsWith("a"))
                        .ToList();
                context.ExecuteCommand("EndCall: GetEmployees");
                return employees;
            }
        }

 

That

  • looks pretty bad
  • we had to refactor the code to use a temporary variable – which is ok if you need it but fairly pointless in this case
  • we must add the logging code to every DAL method!
  • the risks of missing one or a few are fairly large
  • we want to be safe against refactoring (the method changes name)
  • if the call fails with an exception, the exception will prevent the “EndCall” text from ever making it to the log.

What to do?

A cleverer way

Instead we add BeginCall every time the context is created and EndCall every time dispose is called, which allows us to structure our methods the same way as before and forces us to add the logging code. Also, instead of sending the actual name of the method, we use reflection and send MethodInfo – that makes it refactoring safe.

This is what the method call looks like – note that the logging part is invisible, the only change from the original method is the call to GetCurrentMethod

        private static List<Employee> GetEmployees()
        {
            using (MyDataClassesDataContext context =
new MyDataClassesDataContext(MethodBase.GetCurrentMethod()))
            {
                return
                    context
                        .Employees
                        .Where(employee => employee.LastName.StartsWith("a"))
                        .ToList();
            }
        }

 

Next we need to log stuff, MyDataClassesDataContext is a class that’s autogenerated by the DBML editor (as used with LINQ to SQL), but it’s partial so adding stuff is fairly straight forward.

The tricky part is that DataContext, which MyDataClassesDataContext  inherits from, implements Dispose and if we try to override the Dispose method of MyDataClassesDataContext, we’ll not be able to log anything, because that method is called once the context is allready disposed!

But not to worry, we simply re-implement Dispose and call the base method once we’ve logged what we need;

using System;
using System.Reflection;

namespace TraceLINQToSQL
{
    partial class MyDataClassesDataContext : IDisposable
    {
        private readonly string _methodName;

        public MyDataClassesDataContext(MethodBase methodBase)
            : this()
        {
            _methodName = methodBase.DeclaringType.FullName + "." + methodBase.Name;
            AddLogEntry("BeginCall Method={0}", _methodName);
        }

        public static bool LoggingEnabled { get; set; }

        public new void Dispose()
        {
            AddLogEntry("EndCall Method={0}", _methodName);
            base.Dispose();
        }

        private void AddLogEntry(string paramString, params object[] args)
        {
            if (LoggingEnabled)
            {
                ExecuteCommand("--" + string.Format(paramString, args));
            }
        }
    }
}

 

With this, every DAL method should start logging every call so that traceability is greatly improved. However, the cost is two extra roundtrips to the database per DAL method, so it might be a good idea to turn off logging for production. But in my tests, the overhead was negligible.

Going forward

From this, things could be greatly improved with a tool; a tool that groups all SQL statements from a single DAL call and sum up roundtrips/reads/writes/duration/CPU usage etc.

A method that counts how many times each DAL method is called. I haven’t had the time to create such a tool, but if you know of one, please let us know!

Advertisements

LINQ to SQL, Lambda, Generics and Readability

In my previous (and perhaps still ongoing) article series about LINQ to SQL Performance Optimization I demonstrated code that builds up a LINQ to SQL hierarchy using parts that had been loaded unconnected.

The code to hook up the hierarchy looked a bit messy, so I decided to try to clean it up. What I came up with was much cleaner. But is it more readable? I’m not so sure, my point with this post is this, be careful with what clever methods you introduce, because in the end, they might not be more readable than doing it the hard way. And let’s face it, readability is more important than brevity… at least sometimes…

The brief version

Note that with the supporting methods, this method is more verbose – but the supporting methods could be re-used over and over again…

The actual method that performs the assembling;

        public static List<Supplier> AssembleHierarchyTight(List<Supplier> suppliers, List<Product> products, List<Category> categories, List<Order_Detail> orderDetails, List<Order> orders, List<Shipper> shippers, List<Customer> customers)
        {
            LoadFrom(suppliers, supplier => supplier.Products, (supplier, product) => supplier.SupplierID == product.ProductID, products);
            LoadFrom(products, supplier => supplier.Category, (supplier, category) => supplier.CategoryID == category.CategoryID, categories);
            LoadFrom(products, supplier => supplier.Order_Details, (supplier, orderDetail) => supplier.ProductID == orderDetail.ProductID, orderDetails);
            LoadFrom(orderDetails, detail => detail.Order, (detail, order) => detail.OrderID == order.OrderID, orders);
            LoadFrom(orders, order => order.Shipper, (order, shipper) => order.ShipVia == shipper.ShipperID, shippers);
            LoadFrom(orders, order => order.Customer, (order, customer) => order.CustomerID == customer.CustomerID, customers);           

            return suppliers;
        }

The support methods;

        public static void LoadFrom<TDestination, TTarget>(
            IEnumerable<TDestination> destinations,
            Expression<Func<TDestination, EntitySet<TTarget>>> expression,
            Func<TDestination, TTarget, bool> comparator,
            IEnumerable<TTarget> sources)
            where TTarget : class
        {
            PropertyInfo property = TypeHelper.FindProperty(expression);
            List<TTarget> sourceList = sources.ToList();
            foreach (TDestination destination in destinations)
            {
                List<TTarget> filtered =
                    sourceList
                        .Where(source => comparator(destination, source))
                        .ToList();

                EntitySet<TTarget> target = (EntitySet<TTarget>)property.GetValue(destination, _emptyIndex);
                target.SetSource(filtered);
            }
        }

        public static void LoadFrom<TDestination, TTarget>(
            IEnumerable<TDestination> destinations,
            Expression<Func<TDestination, TTarget>> expression,
            Func<TDestination, TTarget, bool> comparator,
            IEnumerable<TTarget> sources)
            where TTarget : class
        {
            PropertyInfo property = TypeHelper.FindProperty(expression);
            List<TTarget> sourceList = sources.ToList();
            foreach (TDestination destination in destinations)
            {
                TTarget filtered =
                    sourceList
                        .SingleOrDefault(source => comparator(destination, source));

                property.SetValue(destination, filtered, _emptyIndex);
            }

The verbose version

This method doesn’t require any support methods;

        public static List<Supplier> AssembleHierarchyVerbose(List<Supplier> suppliers, List<Product> products, List<Category> categories, List<Order_Detail> orderDetails, List<Order> orders, List<Shipper> shippers, List<Customer> customers)
        {
            suppliers.ForEach(
                supplier => supplier.Products.SetSource(products.Where(product => product.SupplierID == supplier.SupplierID)));

            products.ForEach(
                product =>
                {
                    product.Category = categories.Single(category => category.CategoryID == product.CategoryID);
                    product.Order_Details.SetSource(
                        orderDetails.Where(detail => detail.ProductID == product.ProductID));
                });

            orderDetails.ForEach(
                detail =>
                {
                    detail.Order = orders.Single(order => order.OrderID == detail.OrderID);
                });

            orders.ForEach(
                order =>
                {
                    order.Shipper = shippers.Single(shipper => shipper.ShipperID == order.ShipVia);
                    order.Customer = customers.Single(customer => customer.CustomerID == order.CustomerID);
                });

            return suppliers;
        }

LINQ to SQL Performance Optimization, Re-Revisited

What Came Before

In my seemingly endless series of LINQ to SQL Performance Optimization series (see this initial post and this revisiting post), I’ve demonstrated some methods for improving LINQ to SQL loading times for hierarchies of data. This is the data we want to load;

And we want to do a deep load, we want the entire hierarchy. If you need really fast performance, you might need to do a bit of hand tweaking. This article series demonstrates some of the tweaks you might want to try.

The Fastest Method

My fastest method is fairly cumbersome and requires a lot of maintenance.

Basically, you must;

  • Create a stored procedure that retrieves all the data your require (I’ve somewhat automated the creation of the stored procedure, but you would still need to execute this by hand as a manual development step, which is bad)
  • A method for executing the query and loading all the data it delivers
  • A method that assembles the data from the bullet above to the appropriate hierarchy

If you add a new field/change an existing field for an object, you need to update the dbml file and the stored procedure, as typically with LINQ to SQL, you’d only need to update the dbml file.

There is another option, that isn’t as fast as my fastest method, but the maintenance requirements are lower and it’s easier to keep up to date. This is probably a better method than the Full Load version in the vast majority of cases.

The Last (But One) Method

So, tell me tell me, what is this clever new method? Well, we can load the required data using only LINQ instead of using a stored procedure. This means that we can do away with maintaining the stored procedure. The downside is that it generates more queries and is slightly slower.

Basically, you must;

  • Create a method that loads all the datasets you need to assemble your hierarchy – this is written in LINQ as opposed to the Stored Procedure in the previous example
  • A method that assembles the data from the bullet above to the appropriate hierarchy

The reason I call this the last but one method is because we can either run the LINQ queries as deferred execution or immediate execution, and the difference is significant. Which one is faster will depend on your particular situation, but in my case, deferred execution was considerably faster. I’m guessing it will most often be faster.

Deferred execution looks like this in C# (this only retrieves part of the hierarchy, suppliers=>products=>order_details);

            IEnumerable<Supplier> suppliers = context.Suppliers;
            IEnumerable<Product> products =
                context
                    .Products
                    .Join(
                    suppliers,
                    product => product.SupplierID,
                    supplier => supplier.SupplierID,
                    (product, supplier) => product)
                    .Distinct();

            IEnumerable<Order_Detail> orderDetails =
                context
                    .Order_Details
                    .Join(
                    products,
                    detail => detail.ProductID,
                    product => product.ProductID,
                    (child, parent) => child)
                    .Distinct();

And the generated SQL looks like this;

SELECT [t0].[SupplierID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax], [t0].[HomePage]
FROM [dbo].[Suppliers] AS [t0]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT DISTINCT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
INNER JOIN [dbo].[Suppliers] AS [t1] ON [t0].[SupplierID] = ([t1].[SupplierID])
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT DISTINCT [t0].[OrderID], [t0].[ProductID], [t0].[UnitPrice], [t0].[Quantity], [t0].[Discount]
FROM [dbo].[Order Details] AS [t0]
INNER JOIN (
    SELECT DISTINCT [t1].[ProductID], [t1].[ProductName], [t1].[SupplierID], [t1].[CategoryID], [t1].[QuantityPerUnit], [t1].[UnitPrice], [t1].[UnitsInStock], [t1].[UnitsOnOrder], [t1].[ReorderLevel], [t1].[Discontinued]
    FROM [dbo].[Products] AS [t1]
    INNER JOIN [dbo].[Suppliers] AS [t2] ON [t1].[SupplierID] = ([t2].[SupplierID])
    ) AS [t3] ON [t0].[ProductID] = [t3].[ProductID]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

Note how we’re not using any data that’s been returned from the SQL Server, LINQ is building more and more complex SQL statements. But these are nicely optimized by the SQL Server, so we don’t need to worry too much about them.

The immediately executed version looks like this (we execute the query by calling ".ToList()”, which is the immediate part);

            List<Supplier> suppliers = context.Suppliers.ToList();
            List<Product> products =
                context
                    .Products
                    .Where(product => suppliers.Select(supplier => (int?)supplier.SupplierID).Contains(product.SupplierID))
                    .Distinct()
                    .ToList();

            List<Order_Detail> orderDetails =
                context
                    .Order_Details
                    .Where(detail => products.Select(product => product.ProductID).Contains(detail.ProductID))
                    .Distinct()
                    .ToList();

This generates these SQL statements;

SELECT [t0].[SupplierID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax], [t0].[HomePage]
FROM [dbo].[Suppliers] AS [t0]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT DISTINCT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
WHERE [t0].[SupplierID] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10, @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18, @p19, @p20, @p21, @p22, @p23, @p24, @p25, @p26, @p27, @p28)
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- @p1: Input Int (Size = 0; Prec = 0; Scale = 0) [2]
-- @p2: Input Int (Size = 0; Prec = 0; Scale = 0) [3]
-- @p3: Input Int (Size = 0; Prec = 0; Scale = 0) [4]
-- @p4: Input Int (Size = 0; Prec = 0; Scale = 0) [5]
-- @p5: Input Int (Size = 0; Prec = 0; Scale = 0) [6]
-- @p6: Input Int (Size = 0; Prec = 0; Scale = 0) [7]
...snip...
-- @p23: Input Int (Size = 0; Prec = 0; Scale = 0) [24]
-- @p24: Input Int (Size = 0; Prec = 0; Scale = 0) [25]
-- @p25: Input Int (Size = 0; Prec = 0; Scale = 0) [26]
-- @p26: Input Int (Size = 0; Prec = 0; Scale = 0) [27]
-- @p27: Input Int (Size = 0; Prec = 0; Scale = 0) [28]
-- @p28: Input Int (Size = 0; Prec = 0; Scale = 0) [29]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT DISTINCT [t0].[OrderID], [t0].[ProductID], [t0].[UnitPrice], [t0].[Quantity], [t0].[Discount]
FROM [dbo].[Order Details] AS [t0]
WHERE [t0].[ProductID] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10, @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18, @p19, @p20, @p21, @p22, @p23, @p24, @p25, @p26, @p27, @p28, @p29, @p30, @p31, @p32, @p33, @p34, @p35, @p36, @p37, @p38, @p39, @p40, @p41, @p42, @p43, @p44, @p45, @p46, @p47, @p48, @p49, @p50, @p51, @p52, @p53, @p54, @p55, @p56, @p57, @p58, @p59, @p60, @p61, @p62, @p63, @p64, @p65, @p66, @p67, @p68, @p69, @p70, @p71, @p72, @p73, @p74, @p75, @p76)
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- @p1: Input Int (Size = 0; Prec = 0; Scale = 0) [2]
-- @p2: Input Int (Size = 0; Prec = 0; Scale = 0) [3]
-- @p3: Input Int (Size = 0; Prec = 0; Scale = 0) [4]
-- @p4: Input Int (Size = 0; Prec = 0; Scale = 0) [5]
-- @p5: Input Int (Size = 0; Prec = 0; Scale = 0) [6]
-- @p6: Input Int (Size = 0; Prec = 0; Scale = 0) [7]
...snip...
-- @p65: Input Int (Size = 0; Prec = 0; Scale = 0) [71]
-- @p66: Input Int (Size = 0; Prec = 0; Scale = 0) [72]
-- @p67: Input Int (Size = 0; Prec = 0; Scale = 0) [73]
-- @p68: Input Int (Size = 0; Prec = 0; Scale = 0) [74]
-- @p69: Input Int (Size = 0; Prec = 0; Scale = 0) [75]
-- @p70: Input Int (Size = 0; Prec = 0; Scale = 0) [76]
-- @p71: Input Int (Size = 0; Prec = 0; Scale = 0) [77]
-- @p72: Input Int (Size = 0; Prec = 0; Scale = 0) [82]
-- @p73: Input Int (Size = 0; Prec = 0; Scale = 0) [83]
-- @p74: Input Int (Size = 0; Prec = 0; Scale = 0) [84]
-- @p75: Input Int (Size = 0; Prec = 0; Scale = 0) [85]
-- @p76: Input Int (Size = 0; Prec = 0; Scale = 0) [86]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

The Last (But One) Method : Deferred Execution

Deferred execution takes about 0.191 seconds, it produces 7 queries in and the actual query code comes out to 6kb of data. This is only slightly slower than the Full Load version, but I’m certain that there are examples were the Full Load version will be considerably faster. You’ll find my full code for this method below.

The Last Method : Immediate Execution

Immediate execution takes about 0.4 seconds, it also produces 7 queries – though since each query gets a very large list of parameters, the amount of data is much larger, it comes out to 261kb of data. That’s 43 times more data!! You’ll find my full code for this method below.

Leader Board

Here’s the leader board as it currently stands;

Method Execution time Queries Query text
Full Load 0.137s 1 query 1kb (actually 195 bytes)
LINQ Full Load – Deferred Execution 0.191s 7 queries 6kb
LINQ Full Load – Immediate Execution 0.4s 7 queries 261kb
Smarter Load Options 0.45s 78 queries 114kb
Default/Lazy Load 1.4s 1031 queries 448kb
Naive Load Options 2.4s 31 queries 77kb

The code – and there’s a lot of it!

Assemble the hierarchy;

public partial class NorthwindDataContext
    {
        public static List<Supplier> AssembleHierarchy(List<Supplier> suppliers, List<Product> products, List<Category> categories, List<Order_Detail> orderDetails, List<Order> orders, List<Shipper> shippers, List<Customer> customers)
        {
            suppliers.ForEach(
                supplier => supplier.Products.SetSource(products.Where(product => product.SupplierID == supplier.SupplierID)));

            products.ForEach(
                product =>
                {
                    product.Category = categories.Single(category => category.CategoryID == product.CategoryID);
                    product.Order_Details.SetSource(
                        orderDetails.Where(detail => detail.ProductID == product.ProductID));
                });

            orderDetails.ForEach(
                detail =>
                {
                    detail.Order = orders.Single(order => order.OrderID == detail.OrderID);
                });

            orders.ForEach(
                order =>
                {
                    order.Shipper = shippers.Single(shipper => shipper.ShipperID == order.ShipVia);
                    order.Customer = customers.Single(customer => customer.CustomerID == order.CustomerID);
                });

            return suppliers;
        }
}

Deferred data fetch and hierarchy assembly;

        private static List<Supplier> ExecuteSimpleFullFetchDeferredExecution(NorthwindDataContext context)
        {
            IEnumerable<Supplier> suppliers = context.Suppliers;
            IEnumerable<Product> products =
                context
                    .Products
                    .Join(
                    suppliers,
                    product => product.SupplierID,
                    supplier => supplier.SupplierID,
                    (product, supplier) => product)
                    .Distinct();

            IEnumerable<Category> categories =
                context
                    .Categories
                    .Join(
                    products,
                    category => category.CategoryID,
                    product => product.CategoryID,
                    (category, product) => category)
                    .Distinct();

            IEnumerable<Order_Detail> orderDetails =
                context
                    .Order_Details
                    .Join(
                    products,
                    detail => detail.ProductID,
                    product => product.ProductID,
                    (child, parent) => child)
                    .Distinct();

            IEnumerable<Order> orders =
                context
                    .Orders
                    .Join(
                    orderDetails,
                    detail => detail.OrderID,
                    order => order.OrderID,
                    (child, parent) => child)
                    .Distinct();

            IEnumerable<Shipper> shippers =
                context
                    .Shippers
                    .Join(
                    orders,
                    shipper => shipper.ShipperID,
                    order => order.ShipVia,
                    (child, parent) => child)
                    .Distinct();

            IEnumerable<Customer> customers =
                context
                    .Customers
                    .Join(
                    orders,
                    customer => customer.CustomerID,
                    order => order.CustomerID,
                    (child, parent) => child)
                    .Distinct();

            return
                NorthwindDataContext.AssembleHierarchy(
                    suppliers.ToList(),
                    products.ToList(),
                    categories.ToList(),
                    orderDetails.ToList(),
                    orders.ToList(),
                    shippers.ToList(),
                    customers.ToList());
        }

Immediate data fetch and hierarchy assembly;

        private static List<Supplier> ExecuteSimpleFullFetchImmediateExecution(NorthwindDataContext context)
        {
            List<Supplier> suppliers = context.Suppliers.ToList();
            List<Product> products =
                context
                    .Products
                    .Where(product => suppliers.Select(supplier => (int?)supplier.SupplierID).Contains(product.SupplierID))
                    .Distinct()
                    .ToList();

            List<Category> categories =
                context
                    .Categories
                    .Where(category => products.Select(product => product.CategoryID).Contains(category.CategoryID))
                    .Distinct()
                    .ToList();

            List<Order_Detail> orderDetails =
                context
                    .Order_Details
                    .Where(detail => products.Select(product => product.ProductID).Contains(detail.ProductID))
                    .Distinct()
                    .ToList();

            List<Order> orders =
                context
                    .Orders
                    .Where(order => orderDetails.Select(detail => detail.OrderID).Contains(order.OrderID))
                    .Distinct()
                    .ToList();

            List<Shipper> shippers =
                context
                    .Shippers
                    .Where(shipper => orders.Select(order => order.ShipVia).Contains(shipper.ShipperID))
                    .Distinct()
                    .ToList();

            List<Customer> customers =
                context
                    .Customers
                    .Where(customer => orders.Select(order => order.CustomerID).Contains(customer.CustomerID))
                    .Distinct()
                    .ToList();

            return
                NorthwindDataContext.AssembleHierarchy(
                    suppliers,
                    products,
                    categories,
                    orderDetails,
                    orders,
                    shippers,
                    customers);
        }

LINQ to SQL Performance Optimizing revisited

In my previous blog post LINQ to SQL Performance Optimizing, I demonstrated a few ways of trying to improve performance for loading hierarchies with LINQ to SQL. I’m assuming that these methods apply to Entity Framework and NHibernate also.

What I showed previously

I demonstrated how the default method for getting data (lazy load) can be extremely detrimental to performance because it can execute a very large number of queries against the database.

I also demonstrated that using LoadOptions can be worse than not using LoadOptions, because the results are joined together forcing a huge amount of redundancy into the result set.

Lastly, I demonstrated that using propertly selected LoadOptions, performance can be improved significantly.

I also hinted at a third options (lazy load and load options being the first two options).

Full Load

The third option is to create a stored procedure that does a full load of all the data that you need for your report/hierarchy. A stored procedure can return any number of result sets, and a LINQ to SQL mapping of a stored procedure can retrieve these result sets. Unfortunately, they’re not correctly “connected” to eachother, so we have to do that “by hand”. But the performance is significantly better, in my demo case, it’s twice as fast as the previous champion!

Method Execution time Queries Query text
Default/Lazy Load 1.4 s 1031 queries 448kb
Naive Load Options 2.4 s 31 queries 77kb
Smarter Load Options 0.45 s 78 queries 114kb
Full Load 0.137s 1 query 1kb (actually 195 bytes)

(for details on the other methods, see here)

Actually, the new method is 3.28 times faster!!

It comes at a steep price

  • You must create a stored procedure that retrieves the data you need, and you need to maintain this stored procedure
  • The data you retrieve might be read only for your purposes, you need to test this (it might not be properly attached to the context that spawned it)
  • You must load the data with a custom made data context method
  • You must assemble your hierarchy “by hand”

I’m hoping to at least partially automate this process, but for now, we’re stuck doing it by hand. Below you’ll find the code used for this test. As you’ll see, it’s messy compared to lazy load or load options.

ONLY USE THIS METHOD IF YOU REALLY REALLY NEED THE RAW PERFORMANCE, PERHAPS NOT EVEN THEN!

The Code

Here’s the stored procedure I’m using, it’s based on a northwind database;

CREATE procedure [dbo].[GetSuppliers]
as
begin
  -- Supplier 
  declare @dbo_Suppliers TABLE (
   [SupplierID] Int NOT NULL,
   [CompanyName] NVarChar(40) NOT NULL,
   [ContactName] NVarChar(30),
   [ContactTitle] NVarChar(30),
   [Address] NVarChar(60),
   [City] NVarChar(15),
   [Region] NVarChar(15),
   [PostalCode] NVarChar(10),
   [Country] NVarChar(15),
   [Phone] NVarChar(24),
   [Fax] NVarChar(24),
   [HomePage] NText)


  insert into @dbo_Suppliers
  select  c.[SupplierID], c.[CompanyName], c.[ContactName], c.[ContactTitle], c.[Address], c.[City], c.[Region], c.[PostalCode], c.[Country], c.[Phone], c.[Fax], c.[HomePage] from dbo.Suppliers as c

  select * from @dbo_Suppliers

  -- Product 
  declare @dbo_Products TABLE (
   [ProductID] Int NOT NULL,
   [ProductName] NVarChar(40) NOT NULL,
   [SupplierID] Int,
   [CategoryID] Int,
   [QuantityPerUnit] NVarChar(20),
   [UnitPrice] Money,
   [UnitsInStock] SmallInt,
   [UnitsOnOrder] SmallInt,
   [ReorderLevel] SmallInt,
   [Discontinued] Bit NOT NULL)


  insert into @dbo_Products
  select  c.[ProductID], c.[ProductName], c.[SupplierID], c.[CategoryID], c.[QuantityPerUnit], c.[UnitPrice], c.[UnitsInStock], c.[UnitsOnOrder], c.[ReorderLevel], c.[Discontinued] from dbo.Products as c
  join (select distinct SupplierId from @dbo_Suppliers) as p on  c.SupplierId = p.SupplierId 

  select * from @dbo_Products

  -- Category 
  declare @dbo_Categories TABLE (
   [CategoryID] Int NOT NULL,
   [CategoryName] NVarChar(15) NOT NULL,
   [Description] NText,
   [Picture] Image)


  insert into @dbo_Categories
  select  c.[CategoryID], c.[CategoryName], c.[Description], c.[Picture] from dbo.Categories as c
  join (select distinct CategoryId from @dbo_Products) as p on  c.CategoryId = p.CategoryId 

  select * from @dbo_Categories

  -- Order_Detail 
  declare @dbo_OrderDetails TABLE (
   [OrderID] Int NOT NULL,
   [ProductID] Int NOT NULL,
   [UnitPrice] Money NOT NULL,
   [Quantity] SmallInt NOT NULL,
   [Discount] Real NOT NULL)


  insert into @dbo_OrderDetails
  select  c.[OrderID], c.[ProductID], c.[UnitPrice], c.[Quantity], c.[Discount] from dbo.[Order Details] as c
  join (select distinct ProductId from @dbo_Products) as p on  c.ProductId = p.ProductId 

  select * from @dbo_OrderDetails

  -- Order 
  declare @dbo_Orders TABLE (
   [OrderID] Int NOT NULL,
   [CustomerID] NChar(5),
   [EmployeeID] Int,
   [OrderDate] DateTime,
   [RequiredDate] DateTime,
   [ShippedDate] DateTime,
   [ShipVia] Int,
   [Freight] Money,
   [ShipName] NVarChar(40),
   [ShipAddress] NVarChar(60),
   [ShipCity] NVarChar(15),
   [ShipRegion] NVarChar(15),
   [ShipPostalCode] NVarChar(10),
   [ShipCountry] NVarChar(15))


  insert into @dbo_Orders
  select  c.[OrderID], c.[CustomerID], c.[EmployeeID], c.[OrderDate], c.[RequiredDate], c.[ShippedDate], c.[ShipVia], c.[Freight], c.[ShipName], c.[ShipAddress], c.[ShipCity], c.[ShipRegion], c.[ShipPostalCode], c.[ShipCountry] from dbo.Orders as c
  join (select distinct OrderId from @dbo_OrderDetails) as p on  c.OrderId = p.OrderId 

  select * from @dbo_Orders

  -- Shipper 
  declare @dbo_Shippers TABLE (
   [ShipperID] Int NOT NULL,
   [CompanyName] NVarChar(40) NOT NULL,
   [Phone] NVarChar(24))


  insert into @dbo_Shippers
  select  c.[ShipperID], c.[CompanyName], c.[Phone] from dbo.Shippers as c
  join (select distinct ShipVia from @dbo_Orders) as p on  c.ShipperId = p.ShipVia 

  select * from @dbo_Shippers

  -- Customer 
  declare @dbo_Customers TABLE (
   [CustomerID] NChar(5) NOT NULL,
   [CompanyName] NVarChar(40) NOT NULL,
   [ContactName] NVarChar(30),
   [ContactTitle] NVarChar(30),
   [Address] NVarChar(60),
   [City] NVarChar(15),
   [Region] NVarChar(15),
   [PostalCode] NVarChar(10),
   [Country] NVarChar(15),
   [Phone] NVarChar(24),
   [Fax] NVarChar(24))


  insert into @dbo_Customers
  select  c.[CustomerID], c.[CompanyName], c.[ContactName], c.[ContactTitle], c.[Address], c.[City], c.[Region], c.[PostalCode], c.[Country], c.[Phone], c.[Fax] from dbo.Customers as c
  join (select distinct CustomerId from @dbo_Orders) as p on  c.CustomerId = p.CustomerId 

  select * from @dbo_Customers
end

And here’s the LINQ to SQL code that downloads the data and connects the hierarchy;

using System.Collections.Generic;
using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Linq;
using System.Reflection;

namespace ConsoleApplication1
{
    public partial class NorthwindDataContext
    {
        public List<Supplier> GetSuppliers()
        {
            return AssembleHierarchy(InternalGetSuppliers());
        }

        private static List<T> ResultOrEmptyList<T>(IMultipleResults multipleResults)
            where T : class
        {
            IEnumerable<T> result = multipleResults.GetResult<T>();
            if (result == null)
            {
                return new List<T>();
            }

            return result.ToList();
        }

        private static List<Supplier> AssembleHierarchy(IMultipleResults multipleResults)
        {

            List<Supplier> suppliers = ResultOrEmptyList<Supplier>(multipleResults);
            List<Product> products = ResultOrEmptyList<Product>(multipleResults);
            List<Category> categories = ResultOrEmptyList<Category>(multipleResults);
            List<Order_Detail> orderDetails = ResultOrEmptyList<Order_Detail>(multipleResults);
            List<Order> orders = ResultOrEmptyList<Order>(multipleResults);
            List<Shipper> shippers = ResultOrEmptyList<Shipper>(multipleResults);
            List<Customer> customers = ResultOrEmptyList<Customer>(multipleResults);

            suppliers.ForEach(
                supplier => supplier.Products.SetSource(products.Where(product => product.SupplierID == supplier.SupplierID)));

            products.ForEach(
                product =>
                {
                    product.Category = categories.Single(category => category.CategoryID == product.CategoryID);
                    product.Order_Details.SetSource(
                        orderDetails.Where(detail => detail.ProductID == product.ProductID));
                });

            orderDetails.ForEach(
                detail =>
                {
                    detail.Order = orders.Single(order => order.OrderID == detail.OrderID);
                });

            orders.ForEach(
                order =>
                {
                    order.Shipper = shippers.Single(shipper => shipper.ShipperID == order.ShipVia);
                    order.Customer = customers.Single(customer => customer.CustomerID == order.CustomerID);
                });

            return suppliers;
        }

        [Function(Name = "dbo.GetSuppliers")]
        [ResultType(typeof(Supplier))]
        [ResultType(typeof(Product))]
        [ResultType(typeof(Category))]
        [ResultType(typeof(Order_Detail))]
        [ResultType(typeof(Order))]
        [ResultType(typeof(Shipper))]
        [ResultType(typeof(Customer))]
        private IMultipleResults InternalGetSuppliers()
        {
            IExecuteResult result = ExecuteMethodCall(this, (MethodInfo)MethodBase.GetCurrentMethod());
            return (IMultipleResults)result.ReturnValue;
        }
    }
}