Series: Batch Insert and Batch Update using Linq To SQL

Linq To SQL does not natively support bulk operations (batch operations).

A new batch operation should operate on the existing Linq To SQL infrastructure, namely

  • the L2S DataContext
  • the class declarations of the data model generated by the .dbml designer

This series sketches an implementation of batch operations using Linq To SQL:

Introduction

Bulk Operations using Linq To SQL

Batch Update

Batch Update using Linq To SQL

Generating T-SQL from Linq queries

Converting Linq Expressions to T-SQL

Batch Insert

Batch Insert using Linq To SQL

Batch Insert with Column Mapping

Batch Insert using Linq To SQL (2)

Batch Insert using Linq To SQL (2)

The BatchInsert command is now capable of generating a DbCommand which INSERTs INTO a table from a SELECT statement defined by a Linq query.

However, the current solution does not allow to assign columns of the inserted table to columns of the SELECT statement – the columns are mapping in order of the table definition and query definition.

An additional List<> and a column assignment method is necessary:

public class BatchInsert<T, S>
{
    List<Tuple<LambdaExpression, LambdaExpression>> setClauses = 
        new List<Tuple<LambdaExpression, LambdaExpression>>();

    public BatchInsert<T, S> Set<P>(Expression<Func<T, P>> columnClause, 
        Expression<Func<S, P>> valueClause)
    {
        setClauses.Add(new Tuple<LambdaExpression, LambdaExpression>(
            columnClause, valueClause));
        return this;
    }

We also need to extend the ToMSSqlString() method to handle the new collection:

    public string ToMSSqlString()
    {
        var attr = typeof(T).GetCustomAttributes(
            typeof(global::System.Data.Linq.Mapping.TableAttribute), true);

        var sb = new StringBuilder();
        sb.AppendLine("INSERT INTO " + 
            ((System.Data.Linq.Mapping.TableAttribute)attr[0]).Name);
        var sep = "    (";

        if (insertClauses.Count > 0)
        {
            foreach (var set in insertClauses)
            {
                sb.Append(sep + set.ToMSSqlString());    
                sep = ", ";
            }
            sb.AppendLine(")");

        }
        else if (setClauses.Count > 0)
        {
            foreach (var set in setClauses)
            {
                sb.Append(sep + set.Item1.ToMSSqlString());    
                sep = ", ";
            }
            sb.AppendLine(")");
        }

        return sb.ToString();
    }

The ToDbCommand() method needs to be extended to SELECT the columns from the query in the order defined by the setClauses collection:

    public DbCommand ToDbCommand(System.Data.Linq.DataContext context)
    {
        var command = context.GetCommand(source);
        //var command = context.Connection.CreateCommand();
        var sb = new StringBuilder();
        sb.Append(this.ToMSSqlString());

        if (insertClauses.Count > 0)
            sb.AppendLine(command.CommandText);
        else if (setClauses.Count > 0)
        {
            var sep = "SELECT ";
            foreach (var set in setClauses)
            {
                sb.Append(sep + set.Item2.ToMSSqlString());
                sep = ", ";
            }
            sb.AppendLine();
            sb.AppendLine("FROM (");
            sb.AppendLine(command.CommandText);
            sb.AppendLine(") AS T");
        }
        else
            throw new ArgumentException("no columns defined");

        command.CommandText = sb.ToString();

        return command;
    }

Using the FooMax table for some examples:

Creating a BatchInsert command without column mapping

var database = new DataContext();
var id = Guid.NewGuid().ToString();

var query = database.FooMaxes.Where(fm => fm.OID == 2)
    .Select(fm => 
        new { newID = fm.ID + " " + id, newText = fm.SomeText + " upd" });

var insert = BatchInsert<FooMax>.From(query)
    .Insert(fm => fm.ID)
    .Insert(fm => fm.SomeText);

var cmd = insert.ToDbCommand(database);

Console.WriteLine(cmd.CommandText);
foreach (System.Data.Common.DbParameter p in cmd.Parameters)
    Console.WriteLine(p.ParameterName + ": " + p.Value.ToString());
Console.WriteLine();

results in:

INSERT INTO dbo.FooMax
    ([ID], [SomeText])
SELECT ([t0].[ID] + @p1) + @p2 AS [newID], [t0].[SomeText] + @p3 AS [newText]
FROM [dbo].[FooMax] AS [t0]
WHERE [t0].[OID] = @p0

@p0: 2
@p1:
@p2: 55125f61-d96a-4ef8-8185-9419ffa2a80b
@p3:  upd

Creating a BatchInsert command with column mapping

var insert = BatchInsert<FooMax>.From(query)
    .Set(fm => fm.SomeText, q => q.newText)
    .Set(fm => fm.ID, q => q.newID);

cmd = insert.ToDbCommand(database);

Console.WriteLine(cmd.CommandText);
foreach (System.Data.Common.DbParameter p in cmd.Parameters)
    Console.WriteLine(p.ParameterName + ": " + p.Value.ToString());
Console.WriteLine();

var inserted = database.Execute(insert);

results in:

INSERT INTO dbo.FooMax
    ([SomeText], [ID])
SELECT [newText], [newID]
FROM (
SELECT ([t0].[ID] + @p1) + @p2 AS [newID], [t0].[SomeText] + @p3 AS [newText]
FROM [dbo].[FooMax] AS [t0]
WHERE [t0].[OID] = @p0
) AS T

@p0: 2
@p1:
@p2: 55125f61-d96a-4ef8-8185-9419ffa2a80b
@p3:  upd

Batch Insert using Linq To SQL

Whereas Batch Update operates on a single table, and therefore on a single C# class, Batch Insert operates on two C# tables and types:

  • the type to be inserted, which is known from the class representing the table definition
  • the type of the result set of a query, which is either an IQueryable<table-type>,
  • or an IQueryable<anonymous-type> in case of select new { }

Since we have to deal with two types, the BatchInsert command class has to be declared as BatchInsert<Table, Query>.

However, since the type of the query may be an anonymous type, we need to use type inference to create a BatchInsert object using a helper class BatchInsert<Table>:

public static class BatchInsert<T>
{
    public static BatchInsert<T, S> From<S>(IQueryable<S> source)
    {
        return new BatchInsert<T, S>(source);
    }
}

The BatchInsert class looks similar to the BatchUpdate class:

public class BatchInsert<T, S>
{
    IQueryable<S> source;
    List<LambdaExpression> insertClauses = new List<LambdaExpression>();

    public BatchInsert(IQueryable<S> source)
    {
        this.source = source;
    }

    public BatchInsert<T, S> Insert<P>(Expression<Func<T, P>> columnClause)
    {
        insertClauses.Add(columnClause);
        return this;
    }

    public string ToMSSqlString()
    {
        var attr = typeof(T).GetCustomAttributes(
            typeof(global::System.Data.Linq.Mapping.TableAttribute), true);

        var sb = new StringBuilder();
        sb.AppendLine("INSERT INTO " + 
            ((System.Data.Linq.Mapping.TableAttribute)attr[0]).Name);
        var sep = "    (";

        if (insertClauses.Count > 0)
        {
            foreach (var set in insertClauses)
            {
                sb.Append(sep + set.ToMSSqlString());    
                sep = ", ";
            }
            sb.AppendLine(")");

        }
        return sb.ToString();
    }

The ToMSSqlString() method generates the INSERT INTO statement with the column declaration, but we still miss the SELECT part of the statement, which can be retrieved using the DataContext.GetCommand() method

    public DbCommand ToDbCommand(System.Data.Linq.DataContext context)
    {
        var command = context.GetCommand(source);
        var sb = new StringBuilder();
        sb.Append(this.ToMSSqlString());
        sb.AppendLine(command.CommandText);
        command.CommandText = sb.ToString();
        return command;
    }
}

Since we now have a DbCommand with a valid Connection property, we can simply execute it:

public static class DataContextExtensions
{
    public static int Execute<T, S>(
        this System.Data.Linq.DataContext context, BatchInsert<T, S> insert)
    {
        var cmd = insert.ToDbCommand(context);
        cmd.Connection.Open();
        int result;
        try
        {
            result = cmd.ExecuteNonQuery();
        }
        finally
        {
            cmd.Connection.Close();
        }
        return result;
    }
}

Converting Linq Expressions to T-SQL

In my post about a Batch Update command for Linq To SQL, the code I presented uses a method ToMSSqlString() which has not yet been defined.

The method converts a Linq Expression to a T-SQL expression by evaluating the current node of the expression tree and evaluation the node’s children:

public static class ExpressionExtensions
{
  public static string ToMSSqlString(this Expression expression)
  {
    switch (expression.NodeType)
    {
      case ExpressionType.Add:
        var add = expression as BinaryExpression;
        return add.Left.ToMSSqlString() + " + " + add.Right.ToMSSqlString();
      case ExpressionType.Constant:
        var constant = expression as ConstantExpression;
        if (constant.Type == typeof(string))
          return "N'" + constant.Value.ToString().Replace("'", "''") + "'";
        return constant.Value.ToString();
      case ExpressionType.Equal:
        var equal = expression as BinaryExpression;
        return equal.Left.ToMSSqlString() + " = " +
               equal.Right.ToMSSqlString();
      case ExpressionType.Lambda:
        var l = expression as LambdaExpression;
        return l.Body.ToMSSqlString();
      case ExpressionType.MemberAccess:
        var memberaccess = expression as MemberExpression;
        // todo: if column aliases are used, look up ColumnAttribute.Name
        return "[" + memberaccess.Member.Name + "]";
    }

    throw new NotImplementedException(
      expression.GetType().ToString() + " " + 
      expression.NodeType.ToString());
  }
}

This only implements selected NodeTypes to sketch the mechanism.

As an alternative, you can also use the ExpressionVisitor class to iterate the expression tree, as illustrated in other blogs.

Whichever method you use, the result of such a method is a T-SQL string representation of the Linq expression which can be used to build the final T-SQL command.

Batch Update using Linq To SQL

Given the definition of a Batch Update command

var update = new BatchUpdate<MyTable>()
        .Set(t => t.SomeText, t => t.SomeText + " updated")
        .Set(t => t.AnotherText, t => t.AnotherText + " modified")
        .Where(t => t.ID < 10);

we can start implementing the class like this:

public class BatchUpdate<T>
{
    List<Tuple<LambdaExpression, LambdaExpression>> setClauses =
        new List<Tuple<LambdaExpression, LambdaExpression>>();
    List<Expression<Func<T, bool>>> whereClauses =
        new List<Expression<Func<T, bool>>>();

    public BatchUpdate<T> Set<P>(
        Expression<Func<T, P>> columnClause,
        Expression<Func<T, P>> valueClause)
    {
        setClauses.Add(
            new Tuple<LambdaExpression, LambdaExpression>(
                columnClause, valueClause));
        return this;
    }

    public BatchUpdate<T> Where(Expression<Func<T, bool>> whereClause)
    {
        whereClauses.Add(whereClause);
        return this;
    }

Note that the lambda expressions to reference the table columns are passed as typed Expression<Func<>>, but stored as LambdaExpressions. This is necessary to store the assignments of columns of different data types.

The methods always return the batch command object to allow for a concatenation of method calls in fluent notation.

From the definition of columns assignments and optional WHERE clause, we can generate the T-SQL UPDATE statement:

    public string ToMSSqlString()
    {
        var attr = typeof(T).GetCustomAttributes(
            typeof(global::System.Data.Linq.Mapping.TableAttribute), true);

        var sb = new StringBuilder();
        sb.AppendLine("UPDATE " +
            ((System.Data.Linq.Mapping.TableAttribute)attr[0]).Name);

        var sep = "SET ";
        foreach (var set in setClauses)
        {
            sb.AppendLine(sep + set.Item1.ToMSSqlString() +
                " = " + set.Item2.ToMSSqlString());
            sep = ",   ";
        }

        sep = "WHERE ";
        foreach (var w in whereClauses)
        {
            sb.AppendLine(sep + "(" + w.ToMSSqlString() + ")");
            sep = "AND   ";
        }

        return sb.ToString();
    }

For example, using the FooMax table of one of my previous blogs

var update = new BatchUpdate<FooMax>()
    .Set(fm => fm.SomeText, fm => fm.SomeText + " upd")
    .Set(fm => fm.ID, fm => fm.ID + " id")
    .Where(fm => fm.OID == 1);

Console.WriteLine(update.ToMSSqlString());

will output the following UPDATE statement:

UPDATE dbo.FooMax
SET [SomeText] = [SomeText] + ' upd'
,   [ID] = [ID] + ' id'
WHERE ([OID] = 1)

This statement can be executed using the DataContext.ExecuteCommand() method:

public static class DataContextExtensions
{
    public static int Execute<T>(
        this System.Data.Linq.DataContext context, BatchUpdate<T> update)
    {
        return context.ExecuteCommand(update.ToMSSqlString());
    }
}

by simply calling

var database = new DataContext();
database.Execute(update);

Bulk Operations using Linq To SQL

I tried to figure out how to implement bulk operations (batch insert, batch update) using Linq To SQL.

The bulk operation should operate on the existing Linq To SQL infrastructure, namely

  • the L2S DataContext
  • the class declarations of the data model generated by the .dbml designer

For example, the batch operation would be written like this:

var database = new DataContext();

var update = new BatchUpdate<MyTable>()
        .Set(t => t.SomeText, t => t.SomeText + " updated")
        .Set(t => t.AnotherText, t => t.AnotherText + " modified")
        .Where(t => t.ID < 10);

var recordsUpdated = database.Execute(update);

Before blogging on this topic, I found that other people also worked on this problem:

The next couple of blogs will sketch the solution I implemented.

Updating a Single Column in Linq to SQL – Summary

The standard way to update a record using Linq to SQL is to load a record from the database, assign the new column values, and submit the changes.

This method is undesirable if your table contains long text or binary columns, since they are loaded into the Linq to SQL object without being used.

Starting with a table containing nvarchar(max) and varbinary(max) columns

CREATE TABLE [dbo].[FooMax](
  [OID] [int] IDENTITY(1,1) NOT NULL,
  [ID] [nvarchar](50) NOT NULL,
  [SomeText] [nvarchar](max) NULL,
  [SomeBinary] [varbinary](max) NULL,
  CONSTRAINT [PK_FooMax] PRIMARY KEY CLUSTERED ([OID] ASC)
)

I listed a couple of methods to reduce database access to the columns actually required in the UPDATE process:

The dbml mappings for these solutions

Updating a Single Column in Linq to SQL using Attach

This answer on Stack Overflow shows another way to update specific columns of a database record in Linq to SQL:

Call the table’s Attach() method passing the new values and a dummy set of old values.

In the .dbml file, the table’s columns (except for the primary key column(s)) need to have their UpdateCheck property set to Never (default is Always).

To update only specific columns, you need to instantiate two table objects: one containing the new values to be set, and one containing values different from the values to be set. The different values need not be the actual values of the record:

using (var database = new DataContext())
{
  database.FooMaxNevers.Attach(
    new FooMaxNever() { OID = OID, ID = newID }, 
    new FooMaxNever() { OID = OID, ID = "some random value" });
  database.SubmitChanges();
}

This code generates the following SQL statement

UPDATE [dbo].[FooMax]
SET   [ID]  = @p1
WHERE [OID] = @p0

Note that this method does not retrieve the current values and instantiate an object based on the actual database record, and therefore does not generate a SELECT statement.

Updating a Single Column in Linq to SQL using an Alias

This answer on Stack Overflow shows another way to update specific columns of a database record in Linq to SQL: Map the table in the .dbml file a second time, but omit the long columns.

using (var database = new DataContext())
{
  var fooID = database.FooMaxIDs
    .Where(foo => foo.OID == OID).FirstOrDefault();
  fooID.ID = newID;
  database.SubmitChanges();
}

This generates the following SQL statements:

SELECT TOP (1) [t0].[OID], [t0].[ID]
FROM [dbo].[FooMax] AS [t0]
WHERE [t0].[OID] = @p0

UPDATE [dbo].[FooMax]
SET [ID] = @p2
WHERE ([OID] = @p0) AND ([ID] = @p1)

Updating a Single Column in Linq to SQL using ExecuteCommand

Another way to update specific columns of a database record in Linq to SQL is use the ExecuteCommand method to execute any SQL statement in the database, bypassing Linq altogether:

using (var database = new DataContext())
{
  database.ExecuteCommand(
    "UPDATE FooMax SET ID = {1} WHERE OID = {0}", OID, newID);
  database.SubmitChanges();
}

The advantage of this solution is that ExecuteCommand allows you to

  • dynamically create SQL statements
  • send the most complex statements to the database which might not easily be accomplished using Linq

The major drawback of the solution is that it shoots a big hole in your OR mapper and layered architecture:

  • Your C# code contains pieces of SQL statements as unverifiable strings
  • Changes in the database model may break existing C# code, noticed only during execution
  • Database abstraction is lost