LINQ to SQL Performance Optimizing

When you load large hierarchies of data with LINQ to SQL (and I’m assuming this goes for NHibernate and Entity Framework also), it’s easy to get a raw deal on performance.

Loading hierarchies of data

Hierarchies are used more often than you might expect, see the Northwind extract below;

 

[Image: Northwind database diagram]

Now say you need to create a report that lists suppliers, their products and orders for those products, and some information about the customer and the shipper. What we need is a hierarchical report – not just a single line per item but a complex report with master-detail relationships. Basically, we need to load a list of suppliers with their entire hierarchies of data.

I’m using the method below to simulate a report. It’s not a very pretty report, nor a very useful one. But it accesses the data that we require, making sure it’s been properly loaded;

private static void SimulateReport(List<Supplier> suppliers)
{
    // Force a load of the data we "need" for our report.
    StringBuilder sb = new StringBuilder();
    suppliers.ForEach(
        supplier =>
        {
            sb.AppendLine(supplier.CompanyName);
            supplier.Products.ToList().ForEach(
                product =>
                {
                    sb.AppendLine(product.ProductName);
                    sb.AppendLine(product.Category.CategoryName);
                    product.Order_Details.ToList().ForEach(
                        detail =>
                        {
                            sb.AppendLine(detail.Quantity.ToString());
                            sb.AppendLine(detail.Order.OrderID.ToString());
                            sb.AppendLine(detail.Order.Customer.ContactName);
                            sb.AppendLine(detail.Order.Shipper.CompanyName);                                    
                        });
                });
        });

    // Turn this on if you wish to look at the bizarre report
    //Console.WriteLine(sb.ToString());
}

LINQ to SQL, Basic (Lazy Load)

LINQ to SQL will handle this out of the box; we don’t need to do anything “special” to make it work. But it’s extremely “talkative”. Initially, it runs a query to find all the Suppliers in the database – it doesn’t know we’ll be needing more information from the child tables, so it only loads the Suppliers. The report uses this trivial method to get the suppliers;

private static List<Supplier> ExecuteDefault(NorthwindDataContext context)
{
    return context.Suppliers.ToList();
}

Next, it loads the products of the supplier – again, it loads only the products, because it doesn’t know that we’ll be requesting more information shortly. Then it loads the order details, and so on.

Each of these requests is a single roundtrip to the database, and roundtrips are a major cost when it comes to database access. Running this report on the entire Northwind database generates 448 KB of SQL, in 1,031 individual select statements! These run in series, one after the other, so even a 10 ms roundtrip would incur a 10 second delay…

Luckily, my roundtrips are short; this method takes 1.4 seconds.
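The back-of-the-envelope arithmetic is worth spelling out. Under the purely illustrative assumption of a fixed 10 ms roundtrip, serialized queries cost:

```csharp
using System;

// Serialized roundtrips add up linearly: even if each individual query is
// cheap, latency alone dominates. The 10 ms figure is illustrative, not
// a measurement from my setup.
int queryCount = 1031;       // select statements in the lazy-load run
double latencyMs = 10.0;     // hypothetical per-roundtrip latency
double totalSeconds = queryCount * latencyMs / 1000.0;
Console.WriteLine(totalSeconds); // 10.31 – over ten seconds of pure latency
```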

The queries go on and on, but they look something like this;

SELECT [t0].[SupplierID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax], [t0].[HomePage]
FROM [dbo].[Suppliers] AS [t0]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
WHERE [t0].[SupplierID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT [t0].[CategoryID], [t0].[CategoryName], [t0].[Description], [t0].[Picture]
FROM [dbo].[Categories] AS [t0]
WHERE [t0].[CategoryID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT [t0].[OrderID], [t0].[ProductID], [t0].[UnitPrice], [t0].[Quantity], [t0].[Discount]
FROM [dbo].[Order Details] AS [t0]
WHERE [t0].[ProductID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], [t0].[OrderDate], [t0].[RequiredDate], [t0].[ShippedDate], [t0].[ShipVia], [t0].[Freight], [t0].[ShipName], [t0].[ShipAddress], [t0].[ShipCity], [t0].[ShipRegion], [t0].[ShipPostalCode], [t0].[ShipCountry]
FROM [dbo].[Orders] AS [t0]
WHERE [t0].[OrderID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [10285]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [dbo].[Customers] AS [t0]
WHERE [t0].[CustomerID] = @p0
-- @p0: Input NVarChar (Size = 5; Prec = 0; Scale = 0) [QUICK]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

LINQ to SQL, With Load Options

When you use LINQ to SQL, you can specify that “whenever I load this object, I will also require that object, so you might as well load it at the same time”. This can be a huge performance boost, because it allows SQL Server to optimize the queries and it saves you a bundle on roundtrips! This blog post outlines how load options can help boost performance.

It all sounds marvelously clever, right? Alas, it’s no panacea; you might just as easily kill performance using this method. This is how I set up my load options before loading the suppliers to serve the report above;

private static List<Supplier> ExecuteLoadOptions(NorthwindDataContext context)
{
    DataLoadOptions loadOptions = new DataLoadOptions();
    loadOptions.LoadWith<Supplier>(supplier => supplier.Products);
    loadOptions.LoadWith<Product>(product => product.Category);
    loadOptions.LoadWith<Product>(product => product.Order_Details);
    loadOptions.LoadWith<Order_Detail>(detail => detail.Order);
    loadOptions.LoadWith<Order>(order => order.Customer);
    loadOptions.LoadWith<Order>(order => order.Shipper);
    context.LoadOptions = loadOptions;
    return context.Suppliers.ToList();
}

This time, only 30 queries are generated! Yay! And that’s 30 queries for 29 suppliers, so that’s one query to find the suppliers and then one query per supplier!! Must be marvelously fast since we’ve avoided a staggering 1000 roundtrips? Right? Wrong. It’s actually slower.

The logged SQL comes in at 77 KB and 31 queries (as opposed to 448 KB and 1,031 queries), but that SQL executes in 2.4 seconds instead of 1.4 seconds.

How can it be slower? The queries look like this;

SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued], [t3].[OrderID], [t3].[ProductID] AS [ProductID2], [t3].[UnitPrice] AS [UnitPrice2], [t3].[Quantity], [t3].[Discount], [t4].[OrderID] AS [OrderID2], [t4].[CustomerID], [t4].[EmployeeID], [t4].[OrderDate], [t4].[RequiredDate], [t4].[ShippedDate], [t4].[ShipVia], [t4].[Freight], [t4].[ShipName], [t4].[ShipAddress], [t4].[ShipCity], [t4].[ShipRegion], [t4].[ShipPostalCode], [t4].[ShipCountry], [t6].[test], [t6].[CustomerID] AS [CustomerID2], [t6].[CompanyName], [t6].[ContactName], [t6].[ContactTitle], [t6].[Address], [t6].[City], [t6].[Region], [t6].[PostalCode], [t6].[Country], [t6].[Phone], [t6].[Fax], [t8].[test] AS [test2], [t8].[ShipperID], [t8].[CompanyName] AS [CompanyName2], [t8].[Phone] AS [Phone2], (
    SELECT COUNT(*)
    FROM [dbo].[Order Details] AS [t9]
    INNER JOIN ([dbo].[Orders] AS [t10]
        LEFT OUTER JOIN [dbo].[Customers] AS [t11] ON [t11].[CustomerID] = [t10].[CustomerID]
        LEFT OUTER JOIN [dbo].[Shippers] AS [t12] ON [t12].[ShipperID] = [t10].[ShipVia]) ON [t10].[OrderID] = [t9].[OrderID]
    WHERE [t9].[ProductID] = [t0].[ProductID]
    ) AS [value], [t2].[test] AS [test3], [t2].[CategoryID] AS [CategoryID2], [t2].[CategoryName], [t2].[Description], [t2].[Picture]
FROM [dbo].[Products] AS [t0]
LEFT OUTER JOIN (
    SELECT 1 AS [test], [t1].[CategoryID], [t1].[CategoryName], [t1].[Description], [t1].[Picture]
    FROM [dbo].[Categories] AS [t1]
    ) AS [t2] ON [t2].[CategoryID] = [t0].[CategoryID]
LEFT OUTER JOIN ([dbo].[Order Details] AS [t3]
    INNER JOIN ([dbo].[Orders] AS [t4]
        LEFT OUTER JOIN (
            SELECT 1 AS [test], [t5].[CustomerID], [t5].[CompanyName], [t5].[ContactName], [t5].[ContactTitle], [t5].[Address], [t5].[City], [t5].[Region], [t5].[PostalCode], [t5].[Country], [t5].[Phone], [t5].[Fax]
            FROM [dbo].[Customers] AS [t5]
            ) AS [t6] ON [t6].[CustomerID] = [t4].[CustomerID]
        LEFT OUTER JOIN (
            SELECT 1 AS [test], [t7].[ShipperID], [t7].[CompanyName], [t7].[Phone]
            FROM [dbo].[Shippers] AS [t7]
            ) AS [t8] ON [t8].[ShipperID] = [t4].[ShipVia]) ON [t4].[OrderID] = [t3].[OrderID]) ON [t3].[ProductID] = [t0].[ProductID]
WHERE [t0].[SupplierID] = @x1
ORDER BY [t0].[ProductID], [t2].[CategoryID], [t3].[OrderID], [t6].[CustomerID], [t8].[ShipperID]
-- @x1: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

Why is that slower? The query above returns 52 columns, and all the queries together generate 2,164 rows. That comes out at 112,528 data items – quite a lot of data. What’s worse, 99% of it is duplicated, over and over again, row after row after row!

For a particular supplier that has, say, 100 rows, the supplier details are repeated for each and every one of those rows. We only need them once, but since they’re part of the join, they’re repeated over and over and over again…

In the basic case, the data isn’t repeated, so even though there are more queries, the amount of data is far smaller.
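To see why a flattening join costs so much, here’s a small LINQ to Objects illustration. The numbers are made up – one supplier, three products, ten order details each – not Northwind data;

```csharp
using System;
using System.Linq;

// One parent with 3 children, each with 10 grandchildren, joined flat:
// the parent's name travels on every one of the 30 resulting rows, just
// like the parent columns in the generated eager-load SQL.
var supplierName = "Exotic Liquids";
var rows = Enumerable.Range(1, 3)                      // 3 products
    .SelectMany(p => Enumerable.Range(1, 10),          // 10 details each
                (p, d) => new { supplierName, Product = p, Detail = d })
    .ToList();

Console.WriteLine(rows.Count);                                      // 30
Console.WriteLine(rows.Count(r => r.supplierName == supplierName)); // 30
```

One copy of the supplier data would have sufficed; the join ships thirty.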

LINQ to SQL, with Smarter Load Options

So, given the problems with load options described above, we ought to be able to speed up execution by connecting some of the tables, but not all. That ought to reduce the duplication at the cost of more roundtrips. And lo and behold, it is faster.

In my first test, I cut the load options chain at Product => Order_Details;

private static List<Supplier> ExecuteSmarterLoadOptions(NorthwindDataContext context)
{
    DataLoadOptions loadOptions = new DataLoadOptions();
    loadOptions.LoadWith<Supplier>(supplier => supplier.Products);
    loadOptions.LoadWith<Product>(product => product.Category);
    // loadOptions.LoadWith<Product>(product => product.Order_Details);
    loadOptions.LoadWith<Order_Detail>(detail => detail.Order);
    loadOptions.LoadWith<Order>(order => order.Customer);
    loadOptions.LoadWith<Order>(order => order.Shipper);
    context.LoadOptions = loadOptions;
    return context.Suppliers.ToList();
}

This executes in 0.45 seconds and generates 114 KB of SQL statements, broken down into 78 separate statements. That’s a major improvement.

I made a number of attempts, cutting the LoadOptions chain at different positions, and execution times varied from 3.4 seconds down to 0.45 seconds, depending on where I cut the chain.
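The timings were taken the simple way, with a stopwatch around each variant. A minimal harness looks something like this – in the real measurements the delegate would be e.g. `() => ExecuteLoadOptions(new NorthwindDataContext())`, with a fresh context per run so entity caching doesn’t skew the numbers; here a `Sleep` stands in so the sketch runs on its own;

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Time a loading strategy with Stopwatch and report wall-clock time.
static long TimeStrategy(string name, Action strategy)
{
    Stopwatch watch = Stopwatch.StartNew();
    strategy();
    watch.Stop();
    Console.WriteLine(name + ": " + watch.ElapsedMilliseconds + " ms");
    return watch.ElapsedMilliseconds;
}

// Stand-in workload; swap in the ExecuteXxx variants to compare them.
TimeStrategy("dummy", () => Thread.Sleep(50));
```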

A Multi Query Solution

If you need even more performance, there is another approach you can use – but it’s extremely cumbersome, and you’ll only want to resort to it in extreme situations.

In this particular case, you’d create a stored procedure that returned all the suppliers, products, order details, orders, customers and shippers needed for your report – in one single call. You wouldn’t use joins, so there would be no duplicates; instead you’d receive six different result sets from the same stored procedure.

These objects wouldn’t be properly connected as LINQ to SQL entities, so we’d need to assemble the hierarchies “by hand”. I’ll get back to this in a later post, demonstrating how to do it.
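The stitching itself is mostly bookkeeping. Below is a sketch of the idea using in-memory lists and simplified stand-in classes – the real code would pull the six result sets from the stored procedure (for instance via `IMultipleResults`) and repeat the same lookup pattern for each relationship;

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Pretend these two flat lists came back as separate result sets.
var suppliers = new List<Supplier> {
    new Supplier { SupplierID = 1 }, new Supplier { SupplierID = 2 } };
var products = new List<Product> {
    new Product { ProductID = 10, SupplierID = 1 },
    new Product { ProductID = 11, SupplierID = 1 },
    new Product { ProductID = 12, SupplierID = 2 } };

// One lookup per relationship connects children to parents without joins,
// so no row carries duplicated parent data.
var productsBySupplier = products.ToLookup(p => p.SupplierID);
foreach (var s in suppliers)
    s.Products.AddRange(productsBySupplier[s.SupplierID]);

Console.WriteLine(suppliers[0].Products.Count); // 2

// Simplified stand-ins for the generated entity classes.
class Supplier { public int SupplierID; public List<Product> Products = new List<Product>(); }
class Product { public int ProductID; public int SupplierID; }
```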

Read this blog post for more details on this method, which turns out to be 3.28 times faster in my example.

Conclusion

Load options are far from a panacea when it comes to LINQ to SQL, but correctly used, they can be a big improvement! So be careful with your load options, time your executions properly, and make sure you’re using the best load options you can for your needs!

About mfagerlund
Writes code in my sleep - and sometimes it even compiles!
