Monday, February 19, 2007

LINQ to Code

With all of the information on the Internet telling you how much easier LINQ makes data access, you may have missed some of the less obvious implications of the new language features. I'm referring specifically to how much easier it is to generate code using the new LINQ API. I'm going to demonstrate a brief example that uses a domain-specific language (DSL) to serialize a class to (and from) a fixed-length, record-based text file. This is useful if you find yourself having to work with legacy file formats from the dark ages. Rather than reinvent the wheel, let's steal the approach used for XML serialization in the .NET Framework.

In order to convert your C# queries to SQL at run-time, LINQ introduces a new construct called an expression tree. A C# lambda expression can be compiled either into ordinary executable code (a delegate) or, when it is assigned to an Expression<TDelegate>, into an expression tree: a representation of the lambda as data instead of executable code. For a good example of how to use expression trees and compile them, see this post in Don Box's blog. If you choose to represent a function as an expression tree, you can analyze it at run-time and convert it to another form, like a SQL query or executable code. This is known as metaprogramming (short definition: code that manipulates code). When the compiler converts your code into an expression tree, it represents it using objects in the new System.Linq.Expressions namespace. The question I asked is, “Can I use these objects to build a function at run-time?” The answer is yes.
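To make the code-versus-data distinction concrete, here is a small sketch written against the System.Linq.Expressions API as it shipped in .NET Framework 3.5 (the pre-release builds available when this post was written may differ in small details). The same lambda x => x * x is assigned once to a Func<int, int> (compiled code) and once to an Expression<Func<int, int>> (a tree you can inspect). The tree is then rebuilt by hand with the Expression factory methods and compiled back into a delegate. The class name ExpressionTreeDemo is just for illustration.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    // The compiler turns this lambda into executable code (a delegate).
    public static readonly Func<int, int> SquareDelegate = x => x * x;

    // The compiler turns the same lambda into data: an expression tree.
    public static readonly Expression<Func<int, int>> SquareTree = x => x * x;

    // Builds the tree for x => x * x by hand, using the same node types the compiler emits.
    public static Expression<Func<int, int>> BuildSquareTree()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        BinaryExpression body = Expression.Multiply(x, x);
        return Expression.Lambda<Func<int, int>>(body, x);
    }

    public static void Show()
    {
        // Because the tree is data, its structure can be inspected at run-time.
        Console.WriteLine(SquareTree.Body.NodeType);   // Multiply

        // Compile() turns the data back into an executable delegate.
        Func<int, int> square = BuildSquareTree().Compile();
        Console.WriteLine(square(7));                  // 49
    }
}
```

Building the tree by hand is the same trick the rest of this post relies on: nothing forces the tree to come from the compiler, so a program can assemble one at run-time and then call Compile() on it.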

As you probably know, in order to serialize a class as XML you mark it up with attributes. You may not realize it, but this is a form of metaprogramming. When you attach attributes to language constructs, you are using a domain-specific language: a little language designed for a specific task. This is a very powerful technique, because a language geared to a specific task is usually much easier to use and less error-prone than a general-purpose language like C#. Using attributes to specify a DSL is also convenient because it keeps the instructions for processing the data together with the data itself, instead of storing them in an external file where they can get out of sync with changes to the class.

Here is the file I want to read from:

1600 Pennsylvania Avenue           G.W. Bush
1600 Pennsylvania Avenue           C. Rice
1600 Pennsylvania Avenue           D. Cheney

Here is the class I want to deserialize from the file:

[TextRecordSerializable]
public class Customer
{
    private string address;

    [FieldRange(0, 35)]
    public string Address
    {
        get { return address; }
        set { address = value; }
    }

    private string name;

    [FieldRange(35, 10)]
    public string Name
    {
        get { return name; }
        set { name = value; }
    }
}

The FieldRange attributes just store the starting character index and the length of each field in the record. The TextRecordSerializable attribute indicates that the class can be serialized. These two attributes are the only “commands” in our DSL. Simple, huh? A full implementation would be a little more complex, allowing for type conversions and such, but it would still be simple enough to describe with attributes and parameters. What we want to do is convert our DSL into real, honest-to-goodness executable code that deserializes the object from a string (and vice versa). Basically, we want to generate the following function:

Func<string, Customer> customerParser = line => new Customer { Address = line.Substring(0, 35), Name = line.Substring(35, 10) };
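The post doesn't show how the two attributes are declared. Below is one minimal way to declare them, as a hypothetical sketch: the property names StartIndex and Length and the AttributeUsage settings are assumptions. The block also includes the Customer class (written with automatic properties, which behave the same as the explicit fields above) and the hand-written version of the target function. Having the hand-written version is a useful reference point, because it is exactly what the generator has to produce.

```csharp
using System;

// Hypothetical declaration of the "this class can be serialized" command.
[AttributeUsage(AttributeTargets.Class, Inherited = false)]
public sealed class TextRecordSerializableAttribute : Attribute { }

// Hypothetical declaration of the "this property lives at [start, start + length)" command.
[AttributeUsage(AttributeTargets.Property, AllowMultiple = false)]
public sealed class FieldRangeAttribute : Attribute
{
    public FieldRangeAttribute(int startIndex, int length)
    {
        StartIndex = startIndex;
        Length = length;
    }

    public int StartIndex { get; private set; }   // zero-based offset into the record
    public int Length { get; private set; }       // number of characters in the field
}

[TextRecordSerializable]
public class Customer
{
    [FieldRange(0, 35)]
    public string Address { get; set; }

    [FieldRange(35, 10)]
    public string Name { get; set; }
}

public static class HandWrittenParsers
{
    // The function the attributes describe, written out by hand.
    public static readonly Func<string, Customer> CustomerParser =
        line => new Customer { Address = line.Substring(0, 35), Name = line.Substring(35, 10) };
}
```

The FieldRange(35, 10) on Name reads as "start at character 35 and take 10 characters", which is exactly the line.Substring(35, 10) call in CustomerParser. Note that the parsed fields keep their padding; trimming would be one of the "type conversions and such" a fuller implementation could add.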

This function can be built for ANY type at run-time using the following code. It might seem complex at first, but stick with it; it's not as complicated as it seems. The code below aliases System.Linq.Expressions.Expression as "Ex" to keep things short.
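Here is a sketch of what such a generator can look like. It is a reconstruction under stated assumptions, not the author's original code: it declares its own minimal (hypothetical) versions of the two attributes, handles only string properties, and uses the .NET Framework 3.5 expression API. It follows the structure described in the text: a LINQ query pulls the FieldRange attributes off T's properties, filters them, and turns each one into a binding of the form Property = line.Substring(start, length). The bindings go into a MemberInit expression, which is wrapped in a lambda, compiled once in the static constructor, and cached in a static field.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;
using Ex = System.Linq.Expressions.Expression;

// Minimal, hypothetical declarations of the two DSL attributes.
[AttributeUsage(AttributeTargets.Class, Inherited = false)]
public sealed class TextRecordSerializableAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Property)]
public sealed class FieldRangeAttribute : Attribute
{
    public FieldRangeAttribute(int startIndex, int length)
    {
        StartIndex = startIndex;
        Length = length;
    }

    public int StartIndex { get; private set; }
    public int Length { get; private set; }
}

public class TextRecordSerializer<T> where T : new()
{
    // Generated once per closed type T, then shared by every instance.
    private static readonly Func<string, T> parser;

    static TextRecordSerializer()
    {
        if (!typeof(T).IsDefined(typeof(TextRecordSerializableAttribute), false))
            throw new InvalidOperationException(typeof(T).Name + " is not marked [TextRecordSerializable].");

        ParameterExpression line = Ex.Parameter(typeof(string), "line");
        MethodInfo substring = typeof(string).GetMethod("Substring", new[] { typeof(int), typeof(int) });

        // DSL -> filter -> AST: one "Property = line.Substring(start, length)" binding
        // for each writable string property that carries a FieldRange attribute.
        MemberBinding[] bindings =
            (from property in typeof(T).GetProperties()
             let range = (FieldRangeAttribute)Attribute.GetCustomAttribute(property, typeof(FieldRangeAttribute))
             where range != null && property.CanWrite && property.PropertyType == typeof(string)
             select (MemberBinding)Ex.Bind(
                 property,
                 Ex.Call(line, substring, Ex.Constant(range.StartIndex), Ex.Constant(range.Length))))
            .ToArray();

        // line => new T { Prop1 = line.Substring(...), Prop2 = line.Substring(...) }
        parser = Ex.Lambda<Func<string, T>>(Ex.MemberInit(Ex.New(typeof(T)), bindings), line).Compile();
    }

    public T Deserialize(string line)
    {
        return parser(line);
    }
}
```

With these definitions and the Customer class above, the usage shown a little further down works as described. Because the checks and the compilation run in the static constructor, a type without [TextRecordSerializable] fails with a TypeInitializationException (wrapping the InvalidOperationException) the first time TextRecordSerializer<T> is used for it.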




This code is just beautiful. The data flows out of our DSL, is filtered, and is converted to an AST (an expression tree), all in a single expression. We create our function in the static constructor and then cache it in a static field. We then call this function inside a public method called “Deserialize.” The end result is that we can do this:

var serializer = new TextRecordSerializer<Customer>();
var customer = serializer.Deserialize("1600 Pennsylvania Avenue           G.W. Bush ");
Console.WriteLine(customer.Name);

Once our expression is compiled into a function, the code runs essentially as fast as it would if we had written it ourselves in raw C#, because reflection is only used once, while the function is being built. I will leave the method that generates the Serialize function as an exercise for the reader. Next time we'll look at using expressions to create much more advanced DSLs than are possible with simple attributes.
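For readers who want to check their answer to the exercise, here is one possible sketch. The same attribute query can drive the reverse direction, building obj => String.Concat(new[] { ... }) where each element is (obj.Property ?? "").PadRight(length).Substring(0, length). The class name TextRecordWriter is hypothetical, the FieldRange declaration is a minimal stand-in, and the sketch assumes the fields are contiguous and start at index 0 (as they do for Customer). Gaps between fields would need extra padding pieces.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Minimal, hypothetical declaration of the FieldRange attribute.
[AttributeUsage(AttributeTargets.Property)]
public sealed class FieldRangeAttribute : Attribute
{
    public FieldRangeAttribute(int startIndex, int length)
    {
        StartIndex = startIndex;
        Length = length;
    }

    public int StartIndex { get; private set; }
    public int Length { get; private set; }
}

public static class TextRecordWriter<T>
{
    // Generated once per closed type T by the static field initializer.
    public static readonly Func<T, string> Serialize = BuildSerializer();

    private static Func<T, string> BuildSerializer()
    {
        ParameterExpression obj = Expression.Parameter(typeof(T), "obj");
        MethodInfo padRight = typeof(string).GetMethod("PadRight", new[] { typeof(int) });
        MethodInfo substring = typeof(string).GetMethod("Substring", new[] { typeof(int), typeof(int) });
        MethodInfo concat = typeof(string).GetMethod("Concat", new[] { typeof(string[]) });

        // One fixed-width piece per readable string property, in field order:
        // (obj.Prop ?? "").PadRight(length).Substring(0, length)
        // Assumes contiguous fields starting at 0, so no gap padding is emitted.
        Expression[] pieces =
            (from property in typeof(T).GetProperties()
             let range = (FieldRangeAttribute)Attribute.GetCustomAttribute(property, typeof(FieldRangeAttribute))
             where range != null && property.CanRead && property.PropertyType == typeof(string)
             orderby range.StartIndex
             select (Expression)Expression.Call(
                 Expression.Call(
                     Expression.Coalesce(Expression.Property(obj, property), Expression.Constant("")),
                     padRight,
                     Expression.Constant(range.Length)),
                 substring,
                 Expression.Constant(0),
                 Expression.Constant(range.Length)))
            .ToArray();

        // obj => String.Concat(new string[] { piece1, piece2, ... })
        Expression body = Expression.Call(concat, Expression.NewArrayInit(typeof(string), pieces));
        return Expression.Lambda<Func<T, string>>(body, obj).Compile();
    }
}
```

A call looks like TextRecordWriter<Customer>.Serialize(customer). Values longer than their field are truncated, and null properties become blanks, so every record comes out at the full fixed width.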

About Me

I'm a software developer who started programming at age 16 and never saw any reason to stop. I'm working on the Presentation Platform Controls team at Microsoft. My primary interests are functional programming and Rich Internet Applications.