LINQ to SQL, Lambda, Generics and Readability

In my previous (and perhaps still ongoing) article series about LINQ to SQL Performance Optimization I demonstrated code that builds up a LINQ to SQL hierarchy using parts that had been loaded unconnected.

The code to hook up the hierarchy looked a bit messy, so I decided to try to clean it up. What I came up with was much cleaner – but is it more readable? I’m not so sure. My point with this post is this: be careful with the clever methods you introduce, because in the end they might not be more readable than doing it the hard way. And let’s face it, readability is more important than brevity… at least sometimes…

The brief version

Note that if you count the supporting methods, this version is actually more verbose – but the supporting methods can be re-used over and over again…

The actual method that performs the assembling;

        public static List<Supplier> AssembleHierarchyTight(List<Supplier> suppliers, List<Product> products, List<Category> categories, List<Order_Detail> orderDetails, List<Order> orders, List<Shipper> shippers, List<Customer> customers)
        {
            LoadFrom(suppliers, supplier => supplier.Products, (supplier, product) => supplier.SupplierID == product.SupplierID, products);
            LoadFrom(products, product => product.Category, (product, category) => product.CategoryID == category.CategoryID, categories);
            LoadFrom(products, product => product.Order_Details, (product, orderDetail) => product.ProductID == orderDetail.ProductID, orderDetails);
            LoadFrom(orderDetails, detail => detail.Order, (detail, order) => detail.OrderID == order.OrderID, orders);
            LoadFrom(orders, order => order.Shipper, (order, shipper) => order.ShipVia == shipper.ShipperID, shippers);
            LoadFrom(orders, order => order.Customer, (order, customer) => order.CustomerID == customer.CustomerID, customers);           

            return suppliers;
        }

The support methods;

        public static void LoadFrom<TDestination, TTarget>(
            IEnumerable<TDestination> destinations,
            Expression<Func<TDestination, EntitySet<TTarget>>> expression,
            Func<TDestination, TTarget, bool> comparator,
            IEnumerable<TTarget> sources)
            where TTarget : class
        {
            PropertyInfo property = TypeHelper.FindProperty(expression);
            List<TTarget> sourceList = sources.ToList();
            foreach (TDestination destination in destinations)
            {
                List<TTarget> filtered =
                    sourceList
                        .Where(source => comparator(destination, source))
                        .ToList();

                EntitySet<TTarget> target = (EntitySet<TTarget>)property.GetValue(destination, _emptyIndex);
                target.SetSource(filtered);
            }
        }

        public static void LoadFrom<TDestination, TTarget>(
            IEnumerable<TDestination> destinations,
            Expression<Func<TDestination, TTarget>> expression,
            Func<TDestination, TTarget, bool> comparator,
            IEnumerable<TTarget> sources)
            where TTarget : class
        {
            PropertyInfo property = TypeHelper.FindProperty(expression);
            List<TTarget> sourceList = sources.ToList();
            foreach (TDestination destination in destinations)
            {
                TTarget filtered =
                    sourceList
                        .SingleOrDefault(source => comparator(destination, source));

                property.SetValue(destination, filtered, _emptyIndex);
            }
        }
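The support methods rely on two members that aren’t shown above: `TypeHelper.FindProperty` and `_emptyIndex`. For completeness, here’s a minimal sketch of how they could be implemented – this is my reconstruction, not necessarily the original code. The property lookup simply unwraps the member access from the expression tree, and `_emptyIndex` is just the (null) index array passed to `GetValue`/`SetValue` for non-indexed properties;

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class TypeHelper
{
    // Extracts the PropertyInfo from a lambda such as `x => x.SomeProperty`.
    public static PropertyInfo FindProperty<T, TResult>(Expression<Func<T, TResult>> expression)
    {
        // Value-typed properties may be wrapped in a Convert node, so unwrap it first.
        Expression body = expression.Body;
        if (body.NodeType == ExpressionType.Convert)
        {
            body = ((UnaryExpression)body).Operand;
        }

        return (PropertyInfo)((MemberExpression)body).Member;
    }
}

// Non-indexed properties take no index arguments, so null suffices:
// private static readonly object[] _emptyIndex = null;
```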

The verbose version

This method doesn’t require any support methods;

        public static List<Supplier> AssembleHierarchyVerbose(List<Supplier> suppliers, List<Product> products, List<Category> categories, List<Order_Detail> orderDetails, List<Order> orders, List<Shipper> shippers, List<Customer> customers)
        {
            suppliers.ForEach(
                supplier => supplier.Products.SetSource(products.Where(product => product.SupplierID == supplier.SupplierID)));

            products.ForEach(
                product =>
                {
                    product.Category = categories.Single(category => category.CategoryID == product.CategoryID);
                    product.Order_Details.SetSource(
                        orderDetails.Where(detail => detail.ProductID == product.ProductID));
                });

            orderDetails.ForEach(
                detail =>
                {
                    detail.Order = orders.Single(order => order.OrderID == detail.OrderID);
                });

            orders.ForEach(
                order =>
                {
                    order.Shipper = shippers.Single(shipper => shipper.ShipperID == order.ShipVia);
                    order.Customer = customers.Single(customer => customer.CustomerID == order.CustomerID);
                });

            return suppliers;
        }

LINQ to SQL Performance Optimization, Re-Revisited

What Came Before

In my seemingly endless LINQ to SQL Performance Optimization series (see this initial post and this revisiting post), I’ve demonstrated some methods for improving LINQ to SQL loading times for hierarchies of data. This is the data we want to load;

And we want to do a deep load – we want the entire hierarchy. If you need really fast performance, you might need to do a bit of hand-tweaking. This article series demonstrates some of the tweaks you might want to try.

The Fastest Method

My fastest method is fairly cumbersome and requires a lot of maintenance.

Basically, you must;

  • Create a stored procedure that retrieves all the data you require (I’ve somewhat automated the creation of the stored procedure, but you would still need to execute this by hand as a manual development step, which is bad)
  • Create a method that executes the query and loads all the data it delivers
  • Create a method that assembles the data from the bullets above into the appropriate hierarchy

If you add a new field or change an existing field on an object, you need to update both the dbml file and the stored procedure, whereas with typical LINQ to SQL you’d only need to update the dbml file.

There is another option that isn’t as fast as my fastest method, but its maintenance requirements are lower and it’s easier to keep up to date. It’s probably a better choice than the Full Load version in the vast majority of cases.

The Last (But One) Method

So, tell me tell me, what is this clever new method? Well, we can load the required data using only LINQ instead of using a stored procedure. This means that we can do away with maintaining the stored procedure. The downside is that it generates more queries and is slightly slower.

Basically, you must;

  • Create a method that loads all the datasets you need to assemble your hierarchy – this is written in LINQ, as opposed to the stored procedure in the previous example
  • Create a method that assembles the data from the bullet above into the appropriate hierarchy

The reason I call this the last-but-one method is that we can run the LINQ queries with either deferred or immediate execution, and the difference is significant. Which one is faster will depend on your particular situation, but in my case deferred execution was considerably faster, and I’m guessing it will most often be.

Deferred execution looks like this in C# (this only retrieves part of the hierarchy, suppliers=>products=>order_details);

            IEnumerable<Supplier> suppliers = context.Suppliers;
            IEnumerable<Product> products =
                context
                    .Products
                    .Join(
                    suppliers,
                    product => product.SupplierID,
                    supplier => supplier.SupplierID,
                    (product, supplier) => product)
                    .Distinct();

            IEnumerable<Order_Detail> orderDetails =
                context
                    .Order_Details
                    .Join(
                    products,
                    detail => detail.ProductID,
                    product => product.ProductID,
                    (child, parent) => child)
                    .Distinct();

And the generated SQL looks like this;

SELECT [t0].[SupplierID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax], [t0].[HomePage]
FROM [dbo].[Suppliers] AS [t0]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT DISTINCT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
INNER JOIN [dbo].[Suppliers] AS [t1] ON [t0].[SupplierID] = ([t1].[SupplierID])
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT DISTINCT [t0].[OrderID], [t0].[ProductID], [t0].[UnitPrice], [t0].[Quantity], [t0].[Discount]
FROM [dbo].[Order Details] AS [t0]
INNER JOIN (
    SELECT DISTINCT [t1].[ProductID], [t1].[ProductName], [t1].[SupplierID], [t1].[CategoryID], [t1].[QuantityPerUnit], [t1].[UnitPrice], [t1].[UnitsInStock], [t1].[UnitsOnOrder], [t1].[ReorderLevel], [t1].[Discontinued]
    FROM [dbo].[Products] AS [t1]
    INNER JOIN [dbo].[Suppliers] AS [t2] ON [t1].[SupplierID] = ([t2].[SupplierID])
    ) AS [t3] ON [t0].[ProductID] = [t3].[ProductID]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

Note how we’re not using any data that’s been returned from the SQL Server; LINQ is building more and more complex SQL statements. But these are nicely optimized by SQL Server, so we don’t need to worry too much about them.
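Deferred execution isn’t unique to LINQ to SQL – plain LINQ to Objects behaves the same way, and it’s an easy place to see what “deferred” actually means: the query is only a description until something enumerates it. A small self-contained illustration (my example, not from the benchmark code);

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredDemo
{
    static void Main()
    {
        var numbers = new List<int> { 1, 2, 3 };

        // Nothing executes here - `evens` is just a query definition.
        IEnumerable<int> evens = numbers.Where(n => n % 2 == 0);

        // Modifying the source after defining the query...
        numbers.Add(4);

        // ...is visible when the query finally runs.
        Console.WriteLine(evens.Count()); // prints 2 (it sees both 2 and 4)
    }
}
```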

The immediately executed version looks like this (we execute each query by calling “.ToList()”, which is the immediate part);

            List<Supplier> suppliers = context.Suppliers.ToList();
            List<Product> products =
                context
                    .Products
                    .Where(product => suppliers.Select(supplier => (int?)supplier.SupplierID).Contains(product.SupplierID))
                    .Distinct()
                    .ToList();

            List<Order_Detail> orderDetails =
                context
                    .Order_Details
                    .Where(detail => products.Select(product => product.ProductID).Contains(detail.ProductID))
                    .Distinct()
                    .ToList();

This generates these SQL statements;

SELECT [t0].[SupplierID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax], [t0].[HomePage]
FROM [dbo].[Suppliers] AS [t0]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT DISTINCT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
WHERE [t0].[SupplierID] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10, @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18, @p19, @p20, @p21, @p22, @p23, @p24, @p25, @p26, @p27, @p28)
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- @p1: Input Int (Size = 0; Prec = 0; Scale = 0) [2]
-- @p2: Input Int (Size = 0; Prec = 0; Scale = 0) [3]
-- @p3: Input Int (Size = 0; Prec = 0; Scale = 0) [4]
-- @p4: Input Int (Size = 0; Prec = 0; Scale = 0) [5]
-- @p5: Input Int (Size = 0; Prec = 0; Scale = 0) [6]
-- @p6: Input Int (Size = 0; Prec = 0; Scale = 0) [7]
...snip...
-- @p23: Input Int (Size = 0; Prec = 0; Scale = 0) [24]
-- @p24: Input Int (Size = 0; Prec = 0; Scale = 0) [25]
-- @p25: Input Int (Size = 0; Prec = 0; Scale = 0) [26]
-- @p26: Input Int (Size = 0; Prec = 0; Scale = 0) [27]
-- @p27: Input Int (Size = 0; Prec = 0; Scale = 0) [28]
-- @p28: Input Int (Size = 0; Prec = 0; Scale = 0) [29]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT DISTINCT [t0].[OrderID], [t0].[ProductID], [t0].[UnitPrice], [t0].[Quantity], [t0].[Discount]
FROM [dbo].[Order Details] AS [t0]
WHERE [t0].[ProductID] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10, @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18, @p19, @p20, @p21, @p22, @p23, @p24, @p25, @p26, @p27, @p28, @p29, @p30, @p31, @p32, @p33, @p34, @p35, @p36, @p37, @p38, @p39, @p40, @p41, @p42, @p43, @p44, @p45, @p46, @p47, @p48, @p49, @p50, @p51, @p52, @p53, @p54, @p55, @p56, @p57, @p58, @p59, @p60, @p61, @p62, @p63, @p64, @p65, @p66, @p67, @p68, @p69, @p70, @p71, @p72, @p73, @p74, @p75, @p76)
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- @p1: Input Int (Size = 0; Prec = 0; Scale = 0) [2]
-- @p2: Input Int (Size = 0; Prec = 0; Scale = 0) [3]
-- @p3: Input Int (Size = 0; Prec = 0; Scale = 0) [4]
-- @p4: Input Int (Size = 0; Prec = 0; Scale = 0) [5]
-- @p5: Input Int (Size = 0; Prec = 0; Scale = 0) [6]
-- @p6: Input Int (Size = 0; Prec = 0; Scale = 0) [7]
...snip...
-- @p65: Input Int (Size = 0; Prec = 0; Scale = 0) [71]
-- @p66: Input Int (Size = 0; Prec = 0; Scale = 0) [72]
-- @p67: Input Int (Size = 0; Prec = 0; Scale = 0) [73]
-- @p68: Input Int (Size = 0; Prec = 0; Scale = 0) [74]
-- @p69: Input Int (Size = 0; Prec = 0; Scale = 0) [75]
-- @p70: Input Int (Size = 0; Prec = 0; Scale = 0) [76]
-- @p71: Input Int (Size = 0; Prec = 0; Scale = 0) [77]
-- @p72: Input Int (Size = 0; Prec = 0; Scale = 0) [82]
-- @p73: Input Int (Size = 0; Prec = 0; Scale = 0) [83]
-- @p74: Input Int (Size = 0; Prec = 0; Scale = 0) [84]
-- @p75: Input Int (Size = 0; Prec = 0; Scale = 0) [85]
-- @p76: Input Int (Size = 0; Prec = 0; Scale = 0) [86]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

The Last (But One) Method : Deferred Execution

Deferred execution takes about 0.191 seconds, produces 7 queries, and the actual query text comes out to 6kb of data. This is only slightly slower than the Full Load version, but I’m certain that there are examples where the Full Load version will be considerably faster. You’ll find my full code for this method below.

The Last Method : Immediate Execution

Immediate execution takes about 0.4 seconds and also produces 7 queries – though since each query gets a very large list of parameters, the amount of query text is much larger, coming out to 261kb. That’s 43 times more data!! You’ll find my full code for this method below.

Leader Board

Here’s the leader board as it currently stands;

Method                                 Execution time   Queries        Query text
Full Load                              0.137s           1 query        1kb (actually 195 bytes)
LINQ Full Load – Deferred Execution    0.191s           7 queries      6kb
LINQ Full Load – Immediate Execution   0.4s             7 queries      261kb
Smarter Load Options                   0.45s            78 queries     114kb
Default/Lazy Load                      1.4s             1031 queries   448kb
Naive Load Options                     2.4s             31 queries     77kb

The code – and there’s a lot of it!

Assemble the hierarchy;

public partial class NorthwindDataContext
    {
        public static List<Supplier> AssembleHierarchy(List<Supplier> suppliers, List<Product> products, List<Category> categories, List<Order_Detail> orderDetails, List<Order> orders, List<Shipper> shippers, List<Customer> customers)
        {
            suppliers.ForEach(
                supplier => supplier.Products.SetSource(products.Where(product => product.SupplierID == supplier.SupplierID)));

            products.ForEach(
                product =>
                {
                    product.Category = categories.Single(category => category.CategoryID == product.CategoryID);
                    product.Order_Details.SetSource(
                        orderDetails.Where(detail => detail.ProductID == product.ProductID));
                });

            orderDetails.ForEach(
                detail =>
                {
                    detail.Order = orders.Single(order => order.OrderID == detail.OrderID);
                });

            orders.ForEach(
                order =>
                {
                    order.Shipper = shippers.Single(shipper => shipper.ShipperID == order.ShipVia);
                    order.Customer = customers.Single(customer => customer.CustomerID == order.CustomerID);
                });

            return suppliers;
        }
}

Deferred data fetch and hierarchy assembly;

        private static List<Supplier> ExecuteSimpleFullFetchDeferredExecution(NorthwindDataContext context)
        {
            IEnumerable<Supplier> suppliers = context.Suppliers;
            IEnumerable<Product> products =
                context
                    .Products
                    .Join(
                    suppliers,
                    product => product.SupplierID,
                    supplier => supplier.SupplierID,
                    (product, supplier) => product)
                    .Distinct();

            IEnumerable<Category> categories =
                context
                    .Categories
                    .Join(
                    products,
                    category => category.CategoryID,
                    product => product.CategoryID,
                    (category, product) => category)
                    .Distinct();

            IEnumerable<Order_Detail> orderDetails =
                context
                    .Order_Details
                    .Join(
                    products,
                    detail => detail.ProductID,
                    product => product.ProductID,
                    (child, parent) => child)
                    .Distinct();

            IEnumerable<Order> orders =
                context
                    .Orders
                    .Join(
                    orderDetails,
                    order => order.OrderID,
                    detail => detail.OrderID,
                    (order, detail) => order)
                    .Distinct();

            IEnumerable<Shipper> shippers =
                context
                    .Shippers
                    .Join(
                    orders,
                    shipper => shipper.ShipperID,
                    order => order.ShipVia,
                    (child, parent) => child)
                    .Distinct();

            IEnumerable<Customer> customers =
                context
                    .Customers
                    .Join(
                    orders,
                    customer => customer.CustomerID,
                    order => order.CustomerID,
                    (child, parent) => child)
                    .Distinct();

            return
                NorthwindDataContext.AssembleHierarchy(
                    suppliers.ToList(),
                    products.ToList(),
                    categories.ToList(),
                    orderDetails.ToList(),
                    orders.ToList(),
                    shippers.ToList(),
                    customers.ToList());
        }

Immediate data fetch and hierarchy assembly;

        private static List<Supplier> ExecuteSimpleFullFetchImmediateExecution(NorthwindDataContext context)
        {
            List<Supplier> suppliers = context.Suppliers.ToList();
            List<Product> products =
                context
                    .Products
                    .Where(product => suppliers.Select(supplier => (int?)supplier.SupplierID).Contains(product.SupplierID))
                    .Distinct()
                    .ToList();

            List<Category> categories =
                context
                    .Categories
                    .Where(category => products.Select(product => product.CategoryID).Contains(category.CategoryID))
                    .Distinct()
                    .ToList();

            List<Order_Detail> orderDetails =
                context
                    .Order_Details
                    .Where(detail => products.Select(product => product.ProductID).Contains(detail.ProductID))
                    .Distinct()
                    .ToList();

            List<Order> orders =
                context
                    .Orders
                    .Where(order => orderDetails.Select(detail => detail.OrderID).Contains(order.OrderID))
                    .Distinct()
                    .ToList();

            List<Shipper> shippers =
                context
                    .Shippers
                    .Where(shipper => orders.Select(order => order.ShipVia).Contains(shipper.ShipperID))
                    .Distinct()
                    .ToList();

            List<Customer> customers =
                context
                    .Customers
                    .Where(customer => orders.Select(order => order.CustomerID).Contains(customer.CustomerID))
                    .Distinct()
                    .ToList();

            return
                NorthwindDataContext.AssembleHierarchy(
                    suppliers,
                    products,
                    categories,
                    orderDetails,
                    orders,
                    shippers,
                    customers);
        }

LINQ to SQL Performance Optimizing revisited

In my previous blog post LINQ to SQL Performance Optimizing, I demonstrated a few ways of trying to improve performance for loading hierarchies with LINQ to SQL. I’m assuming that these methods apply to Entity Framework and NHibernate also.

What I showed previously

I demonstrated how the default method for getting data (lazy load) can be extremely detrimental to performance because it can execute a very large number of queries against the database.

I also demonstrated that using LoadOptions can be worse than not using LoadOptions, because the results are joined together forcing a huge amount of redundancy into the result set.

Lastly, I demonstrated that with properly selected LoadOptions, performance can be improved significantly.
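For readers who haven’t seen the earlier posts, the LoadOptions approach is configured roughly like this – a sketch against my Northwind dbml classes, where the exact set of `LoadWith` associations is what separates the naive options from the smarter ones. Treat this as a configuration fragment, not runnable as-is;

```csharp
using System.Data.Linq;
using System.Linq;

// Assumes the Northwind entity classes generated from the dbml file.
var options = new DataLoadOptions();
options.LoadWith<Supplier>(supplier => supplier.Products);
options.LoadWith<Product>(product => product.Order_Details);
options.LoadWith<Order_Detail>(detail => detail.Order);

var context = new NorthwindDataContext();
context.LoadOptions = options; // must be assigned before the first query runs
var suppliers = context.Suppliers.ToList();
```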

I also hinted at a third option (lazy load and load options being the first two).

Full Load

The third option is to create a stored procedure that does a full load of all the data that you need for your report/hierarchy. A stored procedure can return any number of result sets, and a LINQ to SQL mapping of a stored procedure can retrieve these result sets. Unfortunately, they’re not correctly “connected” to each other, so we have to do that “by hand”. But the performance is significantly better – in my demo case, it’s twice as fast as the previous champion!

Method                 Execution time   Queries        Query text
Default/Lazy Load      1.4s             1031 queries   448kb
Naive Load Options     2.4s             31 queries     77kb
Smarter Load Options   0.45s            78 queries     114kb
Full Load              0.137s           1 query        1kb (actually 195 bytes)

(for details on the other methods, see here)

Actually, the new method is 3.28 times faster (0.45s / 0.137s ≈ 3.28)!!

It comes at a steep price

  • You must create a stored procedure that retrieves the data you need, and you need to maintain this stored procedure
  • The data you retrieve might be read-only for your purposes – you need to test this (it might not be properly attached to the context that spawned it)
  • You must load the data with a custom made data context method
  • You must assemble your hierarchy “by hand”

I’m hoping to at least partially automate this process, but for now, we’re stuck doing it by hand. Below you’ll find the code used for this test. As you’ll see, it’s messy compared to lazy load or load options.

ONLY USE THIS METHOD IF YOU REALLY REALLY NEED THE RAW PERFORMANCE, PERHAPS NOT EVEN THEN!

The Code

Here’s the stored procedure I’m using, it’s based on a northwind database;

CREATE procedure [dbo].[GetSuppliers]
as
begin
  -- Supplier 
  declare @dbo_Suppliers TABLE (
   [SupplierID] Int NOT NULL,
   [CompanyName] NVarChar(40) NOT NULL,
   [ContactName] NVarChar(30),
   [ContactTitle] NVarChar(30),
   [Address] NVarChar(60),
   [City] NVarChar(15),
   [Region] NVarChar(15),
   [PostalCode] NVarChar(10),
   [Country] NVarChar(15),
   [Phone] NVarChar(24),
   [Fax] NVarChar(24),
   [HomePage] NText)


  insert into @dbo_Suppliers
  select  c.[SupplierID], c.[CompanyName], c.[ContactName], c.[ContactTitle], c.[Address], c.[City], c.[Region], c.[PostalCode], c.[Country], c.[Phone], c.[Fax], c.[HomePage] from dbo.Suppliers as c

  select * from @dbo_Suppliers

  -- Product 
  declare @dbo_Products TABLE (
   [ProductID] Int NOT NULL,
   [ProductName] NVarChar(40) NOT NULL,
   [SupplierID] Int,
   [CategoryID] Int,
   [QuantityPerUnit] NVarChar(20),
   [UnitPrice] Money,
   [UnitsInStock] SmallInt,
   [UnitsOnOrder] SmallInt,
   [ReorderLevel] SmallInt,
   [Discontinued] Bit NOT NULL)


  insert into @dbo_Products
  select  c.[ProductID], c.[ProductName], c.[SupplierID], c.[CategoryID], c.[QuantityPerUnit], c.[UnitPrice], c.[UnitsInStock], c.[UnitsOnOrder], c.[ReorderLevel], c.[Discontinued] from dbo.Products as c
  join (select distinct SupplierId from @dbo_Suppliers) as p on  c.SupplierId = p.SupplierId 

  select * from @dbo_Products

  -- Category 
  declare @dbo_Categories TABLE (
   [CategoryID] Int NOT NULL,
   [CategoryName] NVarChar(15) NOT NULL,
   [Description] NText,
   [Picture] Image)


  insert into @dbo_Categories
  select  c.[CategoryID], c.[CategoryName], c.[Description], c.[Picture] from dbo.Categories as c
  join (select distinct CategoryId from @dbo_Products) as p on  c.CategoryId = p.CategoryId 

  select * from @dbo_Categories

  -- Order_Detail 
  declare @dbo_OrderDetails TABLE (
   [OrderID] Int NOT NULL,
   [ProductID] Int NOT NULL,
   [UnitPrice] Money NOT NULL,
   [Quantity] SmallInt NOT NULL,
   [Discount] Real NOT NULL)


  insert into @dbo_OrderDetails
  select  c.[OrderID], c.[ProductID], c.[UnitPrice], c.[Quantity], c.[Discount] from dbo.[Order Details] as c
  join (select distinct ProductId from @dbo_Products) as p on  c.ProductId = p.ProductId 

  select * from @dbo_OrderDetails

  -- Order 
  declare @dbo_Orders TABLE (
   [OrderID] Int NOT NULL,
   [CustomerID] NChar(5),
   [EmployeeID] Int,
   [OrderDate] DateTime,
   [RequiredDate] DateTime,
   [ShippedDate] DateTime,
   [ShipVia] Int,
   [Freight] Money,
   [ShipName] NVarChar(40),
   [ShipAddress] NVarChar(60),
   [ShipCity] NVarChar(15),
   [ShipRegion] NVarChar(15),
   [ShipPostalCode] NVarChar(10),
   [ShipCountry] NVarChar(15))


  insert into @dbo_Orders
  select  c.[OrderID], c.[CustomerID], c.[EmployeeID], c.[OrderDate], c.[RequiredDate], c.[ShippedDate], c.[ShipVia], c.[Freight], c.[ShipName], c.[ShipAddress], c.[ShipCity], c.[ShipRegion], c.[ShipPostalCode], c.[ShipCountry] from dbo.Orders as c
  join (select distinct OrderId from @dbo_OrderDetails) as p on  c.OrderId = p.OrderId 

  select * from @dbo_Orders

  -- Shipper 
  declare @dbo_Shippers TABLE (
   [ShipperID] Int NOT NULL,
   [CompanyName] NVarChar(40) NOT NULL,
   [Phone] NVarChar(24))


  insert into @dbo_Shippers
  select  c.[ShipperID], c.[CompanyName], c.[Phone] from dbo.Shippers as c
  join (select distinct ShipVia from @dbo_Orders) as p on  c.ShipperId = p.ShipVia 

  select * from @dbo_Shippers

  -- Customer 
  declare @dbo_Customers TABLE (
   [CustomerID] NChar(5) NOT NULL,
   [CompanyName] NVarChar(40) NOT NULL,
   [ContactName] NVarChar(30),
   [ContactTitle] NVarChar(30),
   [Address] NVarChar(60),
   [City] NVarChar(15),
   [Region] NVarChar(15),
   [PostalCode] NVarChar(10),
   [Country] NVarChar(15),
   [Phone] NVarChar(24),
   [Fax] NVarChar(24))


  insert into @dbo_Customers
  select  c.[CustomerID], c.[CompanyName], c.[ContactName], c.[ContactTitle], c.[Address], c.[City], c.[Region], c.[PostalCode], c.[Country], c.[Phone], c.[Fax] from dbo.Customers as c
  join (select distinct CustomerId from @dbo_Orders) as p on  c.CustomerId = p.CustomerId 

  select * from @dbo_Customers
end

And here’s the LINQ to SQL code that downloads the data and connects the hierarchy;

using System.Collections.Generic;
using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Linq;
using System.Reflection;

namespace ConsoleApplication1
{
    public partial class NorthwindDataContext
    {
        public List<Supplier> GetSuppliers()
        {
            return AssembleHierarchy(InternalGetSuppliers());
        }

        private static List<T> ResultOrEmptyList<T>(IMultipleResults multipleResults)
            where T : class
        {
            IEnumerable<T> result = multipleResults.GetResult<T>();
            if (result == null)
            {
                return new List<T>();
            }

            return result.ToList();
        }

        private static List<Supplier> AssembleHierarchy(IMultipleResults multipleResults)
        {

            List<Supplier> suppliers = ResultOrEmptyList<Supplier>(multipleResults);
            List<Product> products = ResultOrEmptyList<Product>(multipleResults);
            List<Category> categories = ResultOrEmptyList<Category>(multipleResults);
            List<Order_Detail> orderDetails = ResultOrEmptyList<Order_Detail>(multipleResults);
            List<Order> orders = ResultOrEmptyList<Order>(multipleResults);
            List<Shipper> shippers = ResultOrEmptyList<Shipper>(multipleResults);
            List<Customer> customers = ResultOrEmptyList<Customer>(multipleResults);

            suppliers.ForEach(
                supplier => supplier.Products.SetSource(products.Where(product => product.SupplierID == supplier.SupplierID)));

            products.ForEach(
                product =>
                {
                    product.Category = categories.Single(category => category.CategoryID == product.CategoryID);
                    product.Order_Details.SetSource(
                        orderDetails.Where(detail => detail.ProductID == product.ProductID));
                });

            orderDetails.ForEach(
                detail =>
                {
                    detail.Order = orders.Single(order => order.OrderID == detail.OrderID);
                });

            orders.ForEach(
                order =>
                {
                    order.Shipper = shippers.Single(shipper => shipper.ShipperID == order.ShipVia);
                    order.Customer = customers.Single(customer => customer.CustomerID == order.CustomerID);
                });

            return suppliers;
        }

        [Function(Name = "dbo.GetSuppliers")]
        [ResultType(typeof(Supplier))]
        [ResultType(typeof(Product))]
        [ResultType(typeof(Category))]
        [ResultType(typeof(Order_Detail))]
        [ResultType(typeof(Order))]
        [ResultType(typeof(Shipper))]
        [ResultType(typeof(Customer))]
        private IMultipleResults InternalGetSuppliers()
        {
            IExecuteResult result = ExecuteMethodCall(this, (MethodInfo)MethodBase.GetCurrentMethod());
            return (IMultipleResults)result.ReturnValue;
        }
    }
}

C# from a Delphi Perspective

or: How I learned to stop worrying and love the bomb

For a very long time, I was a Delphi-only programmer. Between 1996 and 2008 I coded exclusively in Delphi, and I loved it. I made the switch to C# in August 2008 and I haven’t looked back since.

I was recently asked how I felt about the switch, and I thought I’d document my answer here, in case someone else was thinking about making the switch.

It seems to me that Delphi and C# are the only two interesting alternatives for Windows Application development. Java… well… the applications look horrible…

Why move from being a “master” in one field to being a “noob” in another field?


Ok, so maybe I was never a master, but at least I had 12 years of experience, and with C#, I’d be a noob (newbie/new user) again. That did scare me for some time, but in the end, learning new stuff is what being a developer should be all about – it’s the reason why I’ve stayed a consultant instead of taking a steady gig.

Switching was a chance to learn new stuff, to grow a bit. And being a noob isn’t all that bad – the fact that I have been coding since 1986 helped a lot, I’d say I’m pretty proficient at C# by now – though there’s sooo much more to learn.

Let go of the fear, study the arguments for and against instead…

Delphi is Dying

Let’s face it, Delphi has been dying for a very very long time. It might not be dead yet, but it sure smells funny. I doubt Embarcadero will be able to turn the ship around, especially since Microsoft is investing so much in C# / Visual Studio development.

Microsoft doesn’t even need to make money off Visual Studio – the fact that it attracts developers to their environment is a win for them either way; it’s a lock-in factor. Their main business is in the OS/SQL Server/Office sectors. Visual Studio is an enabler, enabling corporations to use their other platforms.

Embarcadero must make money from their products, that’s how they make a living. How can they compete with Microsoft in the long run? I doubt Microsoft even considers them a competitor, they probably view Delphi as irrelevant.

So there will be fewer and fewer new projects developed using Delphi, and maintenance programming is not for me. Any project comes with a certain amount of maintenance, but it’s in new development that my heart lies.

64 bit?

When I left Delphi, there was talk about when, oh when, the 64-bit version would arrive. Would it ever arrive? It seems this debate is still raging. Not a problem with C#/Visual Studio…

C# is ALIVE

C# is very much alive, with “new” stuff like Generics and lambda programming (more on this later). Most Delphi programmers considered Delphi 2005 the first useful version of Delphi since Delphi 6, and Delphi 6 was years and years and years old by that time.

In the Visual Studio community, people seem to be looking forward to the next release, but the Delphi community usually just shrugged about new releases for Delphi.

Performance

So, Delphi performs better than C#, right? I’d assume so, though I’ve never benchmarked the two. Since Delphi compiles code to native binaries, it ought to be faster. And I’ll grant that maybe it is, even though a good JIT compiler can optimize the binaries for the actual CPU you’re running on – rather than for some generic CPU, which is what Delphi targets.

But that’s largely irrelevant, hardware is cheaper than coding time, so developer productivity is far more important than raw performance. If system A is 10% more efficient than system B, but system B is 10% faster to code for – go with system B and buy a faster computer!

I used to worry a lot about this, because I ran lengthy simulations. Now I just don’t care; let the simulation finish an hour later if I can spend an hour less coding it up. My time is precious, my computer’s time isn’t. And to be frank, I’m not sure the execution difference is that significant.

Performance typically comes down to the algorithm you use, anyway. If your program is slow, don’t try to squeeze out 10% more performance out of your current algorithm, try a different algorithm! Try hashing, skip-listing, pre-compiling, caching, lock free structures, spatial partitioning, try anything except re-ordering your operations to squeeze out a liiiitle more.
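To make the point concrete, here’s a small sketch (with hypothetical types and names) contrasting a linear scan with a hash lookup – the win comes from changing the algorithm, not from tuning the loop;

```csharp
using System.Collections.Generic;

public static class LookupDemo
{
    // Linear scan: O(n) per lookup - the "squeeze out 10% more" territory.
    public static bool ContainsLinear(List<int> ids, int id)
    {
        return ids.Contains(id);
    }

    // Hash lookup: O(1) per lookup - a different algorithm entirely.
    public static bool ContainsHashed(HashSet<int> ids, int id)
    {
        return ids.Contains(id);
    }
}
```

With a big enough list, no amount of re-ordering the linear loop will catch up with the hash set.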

But for bare-metal coding, you can’t beat Delphi, there, I said it.

Garbage Collection


Garbage collection is strange – we had it with the Commodore 64, but it wasn’t any good, so people gave up on it. And all of a sudden, now it’s good again? I can collect my own garbage, I can manage my own memory de-allocations, no need for the system to do that for me… except that I can’t trust other tool developers to do it correctly. If I use someone else’s software in my solution, a memory leak in their software might crash mine! Also, time spent making sure my structures are de-allocated is time not spent doing something more productive.

Don’t worry about it, it just works. Spend your time doing other wonderful stuff instead!

For real time applications, clearly, garbage collection is a dangerous thing, because your application might become unresponsive at exactly the wrong time. I don’t do real time applications, so that’s not a problem for me.

Productivity


Productivity is what’s really important to me; my time is precious, my computer’s time isn’t. So if I can spend less time producing the same stuff, then I’m all for that. And with C#, I’m way more productive. I’ll try to explain why.

Environment

The Delphi environment is ok, nothing wrong with it, especially if you use some of the plugins for it. The Visual Studio environment is better, and it has better plugins (ReSharper). Refactoring your code using ReSharper is bliss.

But for me, productivity isn’t a property of the environment – it’s mainly a property of the language.

Language

Having gotten over the differences between the Delphi and C# syntax, I’m far more productive with C# than I ever was with Delphi. Now, when I left, Delphi didn’t have Generics, but I understand that it does now. So in my examples, I’m comparing an older version of Delphi with the current version of C# – not particularly fair, but there you are.

I must have implemented a thousand descendants of Delphi’s TList, containing my own objects. I hope I never have to do that again. With generics, that just works out of the box;

public void FireAll()
{
    List<Employee> employees = GetAllEmployees();
    foreach (Employee employee in employees)
    {
        employee.Fire();
    }
}

And I must have written 10,000 loops in Delphi – for i := 0 to List.Count-1 – I could do that in my sleep. I hope I never have to do it again. For-each is now part of Delphi as well, but even that’s too much work for my taste! Here’s where the wonderful world of lambdas comes into play. I just love it to pieces. I’m naming my next kid Lambda;

public void FireAll()
{
    GetAllEmployees().ForEach(employee => employee.Fire());
}

Or, even more cryptic

public void FireAll()
{
    GetAllEmployees().ForEach(Fire);
}

I must have written 3365 different for loops where I was looking for a particular item in a particular list, an item that matched a particular constraint, a predicate. No more, no more I say;

What used to be (it would look very much the same in current Delphi);

        public Supplier GetSupplier(List<Supplier> suppliers, int supplierId)
        {
            foreach(Supplier supplier in suppliers)
            {
                if (supplier.SupplierID==supplierId)
                {
                    return supplier;
                }
            }
            
            return null;
        }

Would now be;

        public Supplier GetSupplier(IQueryable<Supplier> suppliers, int supplierId)
        {
            return
                suppliers
                    .SingleOrDefault(supplier => supplier.SupplierID == supplierId);
        }

How’s that for productivity?

Need to sum up the sales figures for all suppliers in a specific region? Here’s the code;

        public decimal GetSalesForRegion(IQueryable<Supplier> suppliers, string region)
        {
            return
                suppliers
                    .Where(supplier => supplier.Region == region)
                    .SelectMany(supplier => supplier.Order_Details)
                    .Sum(order_detail => order_detail.Quantity * order_detail.UnitPrice);
        }

I can’t tell you how many times (actually, it’s 37.4) I’ve wished that I could use something like SQL to query my objects – why must it be so much harder in code than it is in SQL? Now all of a sudden, it isn’t!

Oh, and both those code snippets would work against lists of objects. But they would also work against objects in the database, because my Supplier object is a LINQ to SQL object. Both queries would automatically be converted to SQL statements that run against the database!

It’s like magic!

I’ve spent months writing code to bridge between my database and my client/business side code. No more! NO MORE!

Btw, I trust that you’re aware that 74.8% of all statistics are actually made up?

With C#, I save time on;

  • declarations (generics)
  • serialization/deserialization for local storage or transportation (Serializable/DataContract)
  • loops (lambda and generics)
  • database access (LINQ to SQL)
  • mapping from db to code and back again (LINQ to SQL)
  • rendering my objects on the GUI (WPF)
  • communicating between my business layer and my client layer (WCF)
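To illustrate the serialization point in the list above (the Employee type here is hypothetical): mark the type up with attributes and let the framework do the mapping;

```csharp
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class Employee
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; }
}

public static class EmployeeStore
{
    // Serialize to XML without writing a single line of mapping code.
    public static string Save(Employee employee)
    {
        var serializer = new DataContractSerializer(typeof(Employee));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, employee);
            return System.Text.Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

In Delphi, I would have hand-written streaming code for every field of every class.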

Vision

I feel that where Delphi lacks vision – having tried to follow C# into .NET and what not – C# is a powerhouse when it comes to vision. Even though I’m doubtful about .NET 4.0 and all this Dynamics stuff. C# guys, please don’t make the mistake of adding everything you can come up with to the language…

.NET Framework

If you ship C# projects, you must make sure your customers have the appropriate .NET Framework installed. This used to be a huge problem, but most people have most framework versions installed already, so it’s no longer a major problem. It’s still a consideration. With Delphi applications, no gigantic download is required.

Silverlight and the Web

I hate web development, it sucks; whenever you need to do something complex, you have to resort to JavaScript, and JavaScript is just a horror show. But still, I’d like to publish applications on the web, so what to do? Java is out – I have an SCJP certification, but I never used it, and in my experience, Java applications just look ugly…

What to do? Silverlight to the rescue. Silverlight allows you to write business applications in C# that execute on the web (in a browser). It doesn’t currently work on Linux (though that’s planned), but it works on Windows and Mac OS X.

I can write a program in Visual Studio for Windows – which is great for rapid prototyping, then use the exact same code for Silverlight! The same GUI, the same logic, the same…

Well, that’s not totally true, but if you work within certain guidelines, you can re-use your code with some minor rework. For instance, you need a different assembly (project, in Delphi lingo), even though that assembly can contain the same files. That’s because the assembly compiles to a different binary.

Conclusion

For me, C# is my choice going forward. Mainly for the productivity, the vision and the ability to create RAD business applications for web deployment.

SQL Server – ntext cannot be selected as DISTINCT

I needed to make a distinct result set from a table that contained ntext, and I got the error message below;

The ntext data type cannot be selected as DISTINCT because it is not comparable.

The query (this is from Northwind) looks like this (I know, here I don’t need the distinct, but this is an example);

select distinct * from Suppliers

What to do? The solution is to cast the ntext fields to some other type that we can perform distinct against, for instance varchar(50). This means we lose text in the result set – if there are more than 50 characters in the column we’re casting. But that’s the price we’ll have to pay;

select distinct 
  SupplierID, 
  CompanyName, 
  ContactName, 
  ContactTitle, 
  Address, 
  City, 
  Region, 
  PostalCode, 
  Country, 
  Phone, 
  Fax, 
  cast(HomePage as varchar(50)) 
from 
  Suppliers

Now, doing this manually is rather dull, so I created a script to automate it. To find out which of your fields are ntext, you can use these queries against the INFORMATION_SCHEMA.COLUMNS view;

select * from INFORMATION_SCHEMA.COLUMNS where table_name = 'Suppliers'

select * from INFORMATION_SCHEMA.COLUMNS where table_name = 'Suppliers' and DATA_TYPE='ntext'

The SQL Statement below will generate a SQL statement that casts ntext fields;

declare @TableName varchar(90)

-- Replace this with your table name
set @TableName = 'Suppliers'

declare 
  @sql varchar(MAX)

set @sql = 'select distinct ';  
  
select 
  @sql = @sql + 
  case 
    when DATA_TYPE='ntext' then 'cast('+COLUMN_NAME+' as varchar(50)),'
    else COLUMN_NAME + ',' 
   end
from 
  INFORMATION_SCHEMA.COLUMNS 
where 
  TABLE_NAME = @TableName

set @sql = substring(@sql, 1, len(@sql)-1)

set @sql = @sql + ' from ' + @TableName

print @sql

LINQ to SQL Performance Optimizing

When you load large hierarchies of data with LINQ to SQL (and I’m assuming this goes for NHibernate and Entity Framework as well), it’s easy to get a raw deal on performance.

Loading hierarchies of data

Hierarchies are used more often than you might expect, see the Northwind extract below;

 

(Northwind schema diagram)

Now say you need to create a report that lists suppliers, their products and orders for those products, and some information about the customer and the shipper. What we need is a hierarchical report – not just a single line per item, but a complex report with master-detail relationships. Basically, we need to load a list of suppliers with their entire hierarchies of data.

I’m using the method below to simulate a report. It’s not a very pretty, nor a very useful, report, but it accesses the data that we require, making sure it’s been properly loaded;

private static void SimulateReport(List<Supplier> suppliers)
{
    // Force a load the data we "need" for our report.
    StringBuilder sb = new StringBuilder();
    suppliers.ForEach(
        supplier =>
        {
            sb.AppendLine(supplier.CompanyName);
            supplier.Products.ToList().ForEach(
                product =>
                {
                    sb.AppendLine(product.ProductName);
                    sb.AppendLine(product.Category.CategoryName);
                    product.Order_Details.ToList().ForEach(
                        detail =>
                        {
                            sb.AppendLine(detail.Quantity.ToString());
                            sb.AppendLine(detail.Order.OrderID.ToString());
                            sb.AppendLine(detail.Order.Customer.ContactName);
                            sb.AppendLine(detail.Order.Shipper.CompanyName);                                    
                        });
                });
        });

    // Turn this on if you wish to look at the bizarre report
    //Console.WriteLine(sb.ToString());
}

LINQ To SQL, Basic (Lazy Load)

LINQ to SQL will handle this out of the box; we don’t need to do anything “special” to make it work. But it’s extremely “talkative”. Initially, it runs a query to find all the Suppliers in the database – it doesn’t know we’ll be needing more information from the child tables, so it only loads the Suppliers, using this trivial method;

        private static List<Supplier> ExecuteDefault(NorthwindDataContext context)
        {
            return context.Suppliers.ToList();
        }

Next, it loads the products of the supplier – again, only the products, because it doesn’t know that we’ll be requesting more information shortly. Then it loads the order details, and so on.

Each of these requests is a single roundtrip to the database, and roundtrips are a major cost when it comes to database access. Running this report on the entire Northwind database generates 448k of SQL, in 1031 individual select statements! These are run in series, one after the other, so even a 10 ms roundtrip would incur a 10 second delay…
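The arithmetic behind that delay is brutal – here’s the back-of-the-envelope calculation (the 10 ms latency is an illustrative figure);

```csharp
public static class RoundtripCost
{
    // With serial queries, network latency adds up linearly:
    // total waiting time = number of roundtrips * latency per roundtrip.
    public static double SerialDelaySeconds(int roundtrips, double latencyMs)
    {
        return roundtrips * latencyMs / 1000.0;
    }
}
```

With 1031 statements at 10 ms each, that’s roughly 10.3 seconds spent doing nothing but waiting on the wire.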

Luckily, my roundtrips are short; this method takes 1.4 seconds.

The queries go on and on, but they look something like this;

SELECT [t0].[SupplierID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax], [t0].[HomePage]
FROM [dbo].[Suppliers] AS [t0]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
WHERE [t0].[SupplierID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT [t0].[CategoryID], [t0].[CategoryName], [t0].[Description], [t0].[Picture]
FROM [dbo].[Categories] AS [t0]
WHERE [t0].[CategoryID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT [t0].[OrderID], [t0].[ProductID], [t0].[UnitPrice], [t0].[Quantity], [t0].[Discount]
FROM [dbo].[Order Details] AS [t0]
WHERE [t0].[ProductID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], [t0].[OrderDate], [t0].[RequiredDate], [t0].[ShippedDate], [t0].[ShipVia], [t0].[Freight], [t0].[ShipName], [t0].[ShipAddress], [t0].[ShipCity], [t0].[ShipRegion], [t0].[ShipPostalCode], [t0].[ShipCountry]
FROM [dbo].[Orders] AS [t0]
WHERE [t0].[OrderID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [10285]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [dbo].[Customers] AS [t0]
WHERE [t0].[CustomerID] = @p0
-- @p0: Input NVarChar (Size = 5; Prec = 0; Scale = 0) [QUICK]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

LINQ To SQL, With Load Options

When you use LINQ to SQL, you can specify that “whenever I load this object, I will also require that object, so you might as well load it at the same time”. This can be a huge performance boost, because it allows SQL Server to optimize the queries and it saves you a bundle on roundtrips! This blog post outlines how load options can help boost performance.

It all sounds marvelously clever, right? Alas, it’s no panacea; you might just as easily kill performance using this method. This is how I set up my load options before I load the suppliers to serve the report above;

        private static List<Supplier> ExecuteLoadOptions(NorthwindDataContext context)
        {
            DataLoadOptions loadOptions = new DataLoadOptions();
            loadOptions.LoadWith<Supplier>(supplier => supplier.Products);
            loadOptions.LoadWith<Product>(product => product.Category);
            loadOptions.LoadWith<Product>(product => product.Order_Details);
            loadOptions.LoadWith<Order_Detail>(detail => detail.Order);
            loadOptions.LoadWith<Order>(order => order.Customer);
            loadOptions.LoadWith<Order>(order => order.Shipper);
            context.LoadOptions = loadOptions;
            return context.Suppliers.ToList();
        }

This time, only 30 queries are generated! Yay! And that’s 30 queries for 29 suppliers, so that’s one query to find the suppliers and then one query per supplier!! Must be marvelously fast since we’ve avoided a staggering 1000 roundtrips? Right? Wrong. It’s actually slower.

The logged SQL comes in at 77kb and 31 queries (as opposed to 448k and 1031 queries), but that SQL executes in 2.4 seconds instead of 1.4 seconds.

How can it be slower? The queries look like this;

SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued], [t3].[OrderID], [t3].[ProductID] AS [ProductID2], [t3].[UnitPrice] AS [UnitPrice2], [t3].[Quantity], [t3].[Discount], [t4].[OrderID] AS [OrderID2], [t4].[CustomerID], [t4].[EmployeeID], [t4].[OrderDate], [t4].[RequiredDate], [t4].[ShippedDate], [t4].[ShipVia], [t4].[Freight], [t4].[ShipName], [t4].[ShipAddress], [t4].[ShipCity], [t4].[ShipRegion], [t4].[ShipPostalCode], [t4].[ShipCountry], [t6].[test], [t6].[CustomerID] AS [CustomerID2], [t6].[CompanyName], [t6].[ContactName], [t6].[ContactTitle], [t6].[Address], [t6].[City], [t6].[Region], [t6].[PostalCode], [t6].[Country], [t6].[Phone], [t6].[Fax], [t8].[test] AS [test2], [t8].[ShipperID], [t8].[CompanyName] AS [CompanyName2], [t8].[Phone] AS [Phone2], (
    SELECT COUNT(*)
    FROM [dbo].[Order Details] AS [t9]
    INNER JOIN ([dbo].[Orders] AS [t10]
        LEFT OUTER JOIN [dbo].[Customers] AS [t11] ON [t11].[CustomerID] = [t10].[CustomerID]
        LEFT OUTER JOIN [dbo].[Shippers] AS [t12] ON [t12].[ShipperID] = [t10].[ShipVia]) ON [t10].[OrderID] = [t9].[OrderID]
    WHERE [t9].[ProductID] = [t0].[ProductID]
    ) AS [value], [t2].[test] AS [test3], [t2].[CategoryID] AS [CategoryID2], [t2].[CategoryName], [t2].[Description], [t2].[Picture]
FROM [dbo].[Products] AS [t0]
LEFT OUTER JOIN (
    SELECT 1 AS [test], [t1].[CategoryID], [t1].[CategoryName], [t1].[Description], [t1].[Picture]
    FROM [dbo].[Categories] AS [t1]
    ) AS [t2] ON [t2].[CategoryID] = [t0].[CategoryID]
LEFT OUTER JOIN ([dbo].[Order Details] AS [t3]
    INNER JOIN ([dbo].[Orders] AS [t4]
        LEFT OUTER JOIN (
            SELECT 1 AS [test], [t5].[CustomerID], [t5].[CompanyName], [t5].[ContactName], [t5].[ContactTitle], [t5].[Address], [t5].[City], [t5].[Region], [t5].[PostalCode], [t5].[Country], [t5].[Phone], [t5].[Fax]
            FROM [dbo].[Customers] AS [t5]
            ) AS [t6] ON [t6].[CustomerID] = [t4].[CustomerID]
        LEFT OUTER JOIN (
            SELECT 1 AS [test], [t7].[ShipperID], [t7].[CompanyName], [t7].[Phone]
            FROM [dbo].[Shippers] AS [t7]
            ) AS [t8] ON [t8].[ShipperID] = [t4].[ShipVia]) ON [t4].[OrderID] = [t3].[OrderID]) ON [t3].[ProductID] = [t0].[ProductID]
WHERE [t0].[SupplierID] = @x1
ORDER BY [t0].[ProductID], [t2].[CategoryID], [t3].[OrderID], [t6].[CustomerID], [t8].[ShipperID]
-- @x1: Input Int (Size = 0; Prec = 0; Scale = 0) [1]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1

Why is that slower? The query above generates 52 columns, and all the queries together generate 2164 rows. That comes out at 112,528 data items – quite a lot of data. What’s worse, 99% of it is duplicated, over and over again, row after row after row!

For a particular supplier with, say, 100 rows, the supplier’s details are repeated in each and every one of those rows. We only need them once, but since they’re part of the join, they’re repeated over and over and over again…

In the basic case, the data isn’t repeated so even though there are more queries, the amount of data is way less.
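As a sanity check on the numbers above: the data-item count is simply columns times rows, duplicates included;

```csharp
public static class JoinBloat
{
    // A join repeats the parent's columns on every child row, so the
    // total number of data items transferred is columns * rows.
    public static int DataItems(int columns, int rows)
    {
        return columns * rows;
    }
}
```

52 columns times 2164 rows gives exactly the 112,528 data items mentioned above.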

LINQ to SQL, with Smarter Load Options

So, given the problems with load options described above, we ought to be able to speed up the execution by connecting some of the tables, but not all. That ought to reduce the duplication at the cost of more roundtrips. And lo and behold, it is faster.

In my first test, I cut the load options at the Product => Order_Details link;

        private static List<Supplier> ExecuteSmarterLoadOptions(NorthwindDataContext context)
        {
            DataLoadOptions loadOptions = new DataLoadOptions();
            loadOptions.LoadWith<Supplier>(supplier => supplier.Products);
            loadOptions.LoadWith<Product>(product => product.Category);
            // loadOptions.LoadWith<Product>(product => product.Order_Details);
            loadOptions.LoadWith<Order_Detail>(detail => detail.Order);
            loadOptions.LoadWith<Order>(order => order.Customer);
            loadOptions.LoadWith<Order>(order => order.Shipper);
            context.LoadOptions = loadOptions;
            return context.Suppliers.ToList();
        }

This executes in 0.45 seconds and generates 114kb of SQL Statements, broken down into 78 separate statements. That’s a major improvement.

I made a number of attempts where I cut the LoadOptions chain at different positions, and the results varied from execution times of 3.4 seconds to 0.45 seconds, depending on where I cut the chain.

A Multi Query Solution

If you need even more performance, there is a way – but it’s extremely cumbersome and you’ll only want to use it in extreme situations.

In this particular case, you’d create a stored procedure that returned all suppliers, products, order_details, orders, customers and shippers that you needed for your report – in one single query. You wouldn’t use Join, so there would be no duplicates. Instead you’d receive six different result sets from the same stored procedure.

These objects wouldn’t be properly connected as LINQ to SQL entities, so we’d need to assemble the hierarchies “by hand”. I’ll get back to this further on, demonstrating how to do it.

Read this blog for more details on the method, which turns out to be 3.28 times faster in my example.
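As a preview, here’s a hedged sketch of the assembly idea (the helper name and signature are my own, not the actual support methods): match parents to children in memory with a predicate;

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HierarchyAssembler
{
    // Hook each parent up to its matching children using a predicate.
    // Real LINQ to SQL entities use EntitySet/EntityRef; plain lists
    // are used here to keep the sketch self-contained.
    public static void LoadChildren<TParent, TChild>(
        List<TParent> parents,
        Action<TParent, List<TChild>> assign,
        Func<TParent, TChild, bool> match,
        List<TChild> children)
    {
        foreach (TParent parent in parents)
        {
            assign(parent, children.Where(child => match(parent, child)).ToList());
        }
    }
}
```

One call per relationship wires up a whole level of the hierarchy, with no roundtrips at all.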

Conclusion

Load options are far from a panacea when it comes to LINQ to SQL, but correctly used, they can be a big improvement! So be careful with your load options, time your executions properly, and make sure you’re using the best load options you can for your needs!

Silverlight, Minimal Authentication Implementation

I previously blogged about Silverlight and ASP.NET Authentication Woes, but my colleague Robin Dunlop pointed out that there is a simpler way, and sure enough, there is! I only needed to override/implement three methods, two in AuthenticationService (ValidateUser and GetAuthenticatedUser) and one method in UserRegistrationService (AddUser).

Below you’ll find a truly minimal implementation;

  • It will not retain user accounts after a system restart
  • It will expect passwords to match the username (very bad security…)
  • It will come with a user called “Bobby” (password Bobby)

AuthenticationService

using System.Collections.Generic;
using System.Linq;
using System.Security.Principal;
using System.Web.Ria;
using System.Web.Ria.ApplicationServices;

namespace BusinessApplication1.Web
{
    [EnableClientAccess]
    public class AuthenticationService : AuthenticationBase<User>
    {
        private static List<User> _users = new List<User> { new User { Name = "Bobby" } };

        public static List<User> Users { get { return _users; } }

        protected override bool ValidateUser(string userName, string password)
        {
            if (_users.Exists(user => user.Name.Equals(userName)) == false)
            {
                return false;
            }

            return userName == password;
        }

        protected override User GetAuthenticatedUser(IPrincipal principal)
        {
            return _users.Where(user => user.Name.Equals(principal.Identity.Name)).SingleOrDefault();
        }
    }
}

UserRegistrationService

        [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Performance", "CA1822:MarkMembersAsStatic")]
        public void AddUser(RegistrationData user)
        {
            User newUser =
                new User
                    {
                        FriendlyName = user.FriendlyName,
                        Name = user.UserName                        
                    };

            AuthenticationService.Users.Add(newUser);
        }

A real world implementation…

A real world implementation would require you to actually store the users in some kind of persistent storage, but I’m leaving that as an exercise for the reader.