Sunday, April 6, 2008

Playing around with LINQ to SQL: Using DataLoadOptions. Immediate loading and filtering of related data

Linq2Sql use deferred execution of queries, lazy loading to get data from the database. The data is only retrieved from the source when we are ready to use it, say when we iterate through it and do something to it. The are advantages to it, say when we have hierarchical data like Orders->OrderDetails, iterating through the Order will not get you all the child objects ( the OrderDetails) by default. This is a good thing of course, it returns you what you want but it can also lead to unexpected results and unnecessary round trips to the database if not careful.

order_order_details_Products

 

[TestMethod]
public void Lazy_Load_Them()
{
    var context = new NorthwindDataContext();
    context.Log = Console.Out;

    var mydata = from order in context.Orders
                 select order;

    foreach (Order o in mydata)
    {
          Console.WriteLine(o.OrderID);
    }
}

 

The only query generated is below:

SELECT [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], [t0].[OrderDate], [t0].[RequiredDate], [t0].[ShippedDate], [t0].[ShipVia], [t0].[Freight], [t0].[ShipName], [t0].[ShipAddress], [t0].[ShipCity], [t0].[ShipRegion], [t0].[ShipPostalCode], [t0].[ShipCountry]
FROM [dbo].[Orders] AS [t0]

In the above case it all very simple, all we wanted was the Order(s) entities, network bandwidth was minimised, memory consumption minimised etc.  But what if inside the foreach we also want to output the OrderDetails children of Orders and their corresponding Products? All the child records will be requested when we want, in the below case when we output them...

[TestMethod]
public void Lazy_Load_Them()
{
    var context = new NorthwindDataContext();
    context.Log = Console.Out;
    var mydata = from order in context.Orders
                 select order;

    foreach (Order o in mydata)
    {
         Order_Detail od = o.Order_Details.First();//Just the first record for brevity
         Console.WriteLine(od.ProductID + ", " + od.Product.ProductName);
     }
}

Well, the results are gory, as we request details and the Product for each Order, a separate query will be created and this will mean a new round trip to the database for every Order. The queries are below and trust me, the are like that for every row of the Order table.

SELECT [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], [t0].[OrderDate], [t0].[RequiredDate], [t0].[ShippedDate], [t0].[ShipVia], [t0].[Freight], [t0].[ShipName], [t0].[ShipAddress], [t0].[ShipCity], [t0].[ShipRegion], [t0].[ShipPostalCode], [t0].[ShipCountry]
FROM [dbo].[Orders] AS [t0]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8

SELECT [t0].[OrderID], [t0].[ProductID], [t0].[UnitPrice], [t0].[Quantity], [t0].[Discount]
FROM [dbo].[Order Details] AS [t0]
WHERE [t0].[OrderID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [10248]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8

SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
WHERE [t0].[ProductID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [11]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8

11, Queso Cabrales
SELECT [t0].[OrderID], [t0].[ProductID], [t0].[UnitPrice], [t0].[Quantity], [t0].[Discount]
FROM [dbo].[Order Details] AS [t0]
WHERE [t0].[OrderID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [10249]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8

SELECT [t0].[ProductID], [t0].[ProductName], [t0].[SupplierID], [t0].[CategoryID], [t0].[QuantityPerUnit], [t0].[UnitPrice], [t0].[UnitsInStock], [t0].[UnitsOnOrder], [t0].[ReorderLevel], [t0].[Discontinued]
FROM [dbo].[Products] AS [t0]
WHERE [t0].[ProductID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [14]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8

14, Tofu
SELECT [t0].[OrderID], [t0].[ProductID], [t0].[UnitPrice], [t0].[Quantity], [t0].[Discount]
FROM [dbo].[Order Details] AS [t0]
WHERE [t0].[OrderID] = @p0
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [10250]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8

As you can see we have queries for the OrderDetails and another query for the Product data..Three queries for for that! For evey single Order? Hrmm

I have made a total stuff-up, a good thing 'lazy loading' turned into a monster. I have been careless in this case, but how can we improve it? How can minimize the round trips?

DataLoadOptions: LoadWith(LambdaExpression)

LinqToSql provides a method for immediate loading of data related to the main Entity we are working through the DataLoadOptions class. This class provides two methods to get immediate loading of related data, The LoadWith method and AssociateWith method that allow for filtering of related objects.

All you have to do is specify which data that is related to your main target and should be retrieved at the same time with a Lambda expression, for example in our case above our main Entity is the Order, and we would like to get the Order_Details for each Order. This will result in only one trip to the database for all that information, we eliminate all the subqueries that we needed because of the lazy loading. This will improve things significantly for our applications and our users.

In the exaple below I want to get all the Order_Details related to my main interest, Order.

public void Get_Deferred_Data_With_Loading_Options()
{
    var dataContext = newNorthwindDataContext();
    DataLoadOptionsdataLoadOptions = newDataLoadOptions();           
    dataLoadOptions.LoadWith<Order>(order => order.Order_Details);
    dataContext.LoadOptions = dataLoadOptions;
     
    var orders = from order indataContext.Orders
                       select order;

    foreach (Order order in orders)
    {
      List<Order_Detail> ods = order.Order_Details.ToList();
      ods.ForEach(od => Console.WriteLine( order.CustomerID+ " "+ od.OrderID + " , ProdID: "+ od.ProductID + " : "+ od.UnitPrice));
    }          
}

SELECT [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], [t0].[OrderDate], [t0].[RequiredDate], [t0].[ShippedDate], [t0].[ShipVia], [t0].[Freight], [t0].[ShipName], [t0].[ShipAddress], [t0].[ShipCity], [t0].[ShipRegion], [t0].[ShipPostalCode], [t0].[ShipCountry], [t1].[OrderID] AS [OrderID2], [t1].[ProductID], [t1].[UnitPrice], [t1].[Quantity], [t1].[Discount], (
    SELECT COUNT(*)
    FROM [dbo].[Order Details] AS [t2]
    WHERE [t2].[OrderID] = [t0].[OrderID]
    ) AS [value]
FROM [dbo].[Orders] AS [t0]
LEFT OUTER JOIN [dbo].[Order Details] AS [t1] ON [t1].[OrderID] = [t0].[OrderID]
ORDER BY [t0].[OrderID], [t1].[ProductID]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8

VINET 10248 , ProdID: 11 : 14.0000
VINET 10248 , ProdID: 42 : 9.8000
VINET 10248 , ProdID: 72 : 34.8000
TOMSP 10249 , ProdID: 14 : 18.6000
TOMSP 10249 , ProdID: 51 : 42.4000
HANAR 10250 , ProdID: 41 : 7.7000
HANAR 10250 , ProdID: 51 : 42.4000

 

And we can also further load more child records as below.. by adding the LoadWith to get Product associated records from the Order Details entity

 

public void Get_Deferred_Data_With_Loading_Options()
{
      var dataContext = new NorthwindDataContext();

      DataLoadOptions dataLoadOptions = new DataLoadOptions();
      dataLoadOptions.LoadWith<Order>(order => order.Order_Details);
      dataLoadOptions.LoadWith<Order_Detail>(od => od.Product);
        
      dataContext.LoadOptions = dataLoadOptions;
      dataContext.Log = Console.Out;
      
       var orders = from order in dataContext.Orders
                    select order;

       foreach (Order order in orders)
       {
           List<Order_Detail> ods = order.Order_Details.ToList();
           ods.ForEach(od => Console.WriteLine( od.Product.ProductName + ", " + order.CustomerID+ " " + od.OrderID + " , ProdID: " + od.ProductID + " : " + od.UnitPrice ));
       }           
}
 

SELECT [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], [t0].[OrderDate], [t0].[RequiredDate], [t0].[ShippedDate], [t0].[ShipVia], [t0].[Freight], [t0].[ShipName], [t0].[ShipAddress], [t0].[ShipCity], [t0].[ShipRegion], [t0].[ShipPostalCode], [t0].[ShipCountry], [t1].[OrderID] AS [OrderID2], [t1].[ProductID], [t1].[UnitPrice], [t1].[Quantity], [t1].[Discount], [t2].[ProductID] AS [ProductID2], [t2].[ProductName], [t2].[SupplierID], [t2].[CategoryID], [t2].[QuantityPerUnit], [t2].[UnitPrice] AS [UnitPrice2], [t2].[UnitsInStock], [t2].[UnitsOnOrder], [t2].[ReorderLevel], [t2].[Discontinued], (
    SELECT COUNT(*)
    FROM [dbo].[Order Details] AS [t3]
    INNER JOIN [dbo].[Products] AS [t4] ON [t4].[ProductID] = [t3].[ProductID]
    WHERE [t3].[OrderID] = [t0].[OrderID]
    ) AS [value]
FROM [dbo].[Orders] AS [t0]
LEFT OUTER JOIN ([dbo].[Order Details] AS [t1]
    INNER JOIN [dbo].[Products] AS [t2] ON [t2].[ProductID] = [t1].[ProductID]) ON [t1].[OrderID] = [t0].[OrderID]
ORDER BY [t0].[OrderID], [t1].[ProductID]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8

Queso Cabrales, VINET 10248 , ProdID: 11 : 14.0000
Singaporean Hokkien Fried Mee, VINET 10248 , ProdID: 42 : 9.8000
Mozzarella di Giovanni, VINET 10248 , ProdID: 72 : 34.8000

Query is a bit more complex, but thats all! No more round trips to our database. Cool eh?

 

AssociateWith: Filtering at the DataContext Level

This is used to specify in advance, sub-queries to limit the amount of data retrieved for any query against that particular context. Your DataContext instance will filter the data given by the Lambda you specified with AssociateWith all the time.

For example, I want to only process orders where the QuantiyInOrder is less than 10 because of shipping difficulties or special offers.

public void Get_Deferred_Data_With_Loading_Options()
{
      var dataContext = new NorthwindDataContext();
      DataLoadOptions dataLoadOptions = new DataLoadOptions();
      dataLoadOptions.LoadWith<Order>(order => order.Order_Details);
      dataLoadOptions.AssociateWith<Order>(order => order.Order_Details.Where(p => p.Quantity < 10));            

 

SELECT [t0].[OrderID], [t0].[CustomerID], [t0].[EmployeeID], [t0].[OrderDate], [t0].[RequiredDate], [t0].[ShippedDate], [t0].[ShipVia], [t0].[Freight], [t0].[ShipName], [t0].[ShipAddress], [t0].[ShipCity], [t0].[ShipRegion], [t0].[ShipPostalCode], [t0].[ShipCountry], [t1].[OrderID] AS [OrderID2], [t1].[ProductID], [t1].[UnitPrice], [t1].[Quantity], [t1].[Discount], (
    SELECT COUNT(*)
    FROM [dbo].[Order Details] AS [t2]
    WHERE ([t2].[Quantity] < @p0) AND ([t2].[OrderID] = [t0].[OrderID])
    ) AS [value]
FROM [dbo].[Orders] AS [t0]
LEFT OUTER JOIN [dbo].[Order Details] AS [t1] ON ([t1].[Quantity] < @p0) AND ([t1].[OrderID] = [t0].[OrderID])
ORDER BY [t0].[OrderID], [t1].[ProductID]
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [10]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.21022.8

VINET 10248 , ProdID: 72 : 5
TOMSP 10249 , ProdID: 14 : 9
VICTE 10251 , ProdID: 22 : 6
HILAA 10257 , ProdID: 39 : 6
ERNSH 10258 , ProdID: 32 : 6
CENTC 10259 , ProdID: 37 : 1
RATTC 10262 , ProdID: 56 : 2

All our queries within the scope of this DataContext will be filtered in advance rather than you having to specify a Where clause.

 

DataLoadOptions in MSDN can be found below

http://msdn2.microsoft.com/en-us/library/system.data.linq.dataloadoptions.aspx

1 comment:

David Tchepak said...

Thanks for this Juan, very helpful. :)