Monday, February 19, 2007

LINQ to Code

With all of the information on the Internet telling you how much easier LINQ makes data access, you may have missed some of the less obvious implications of the new language features. I'm referring specifically to how much easier it is to generate code using the new LINQ API. I'm going to demonstrate a brief example that uses a domain-specific language (DSL) to serialize a class to a fixed-length, record-based text file. This is useful if you find yourself having to work with legacy file formats from the dark ages. Rather than reinvent the wheel, let's steal the approach used for XML serialization in the .NET Framework.

In order to convert your C# queries to SQL at run-time, LINQ introduces a new construct called an expression tree. An expression tree is just a lambda expression that C# represents as data instead of executable code. For a good example of how to use expression trees and compile them, see this post in Don Box's blog. If you choose to represent a function as an expression tree, you can analyze it at run-time and convert it to another form, like a SQL query or executable code. This is known as metaprogramming (short definition: code that manipulates code). When the compiler converts your code into a data tree, it represents it using objects in the new System.Linq.Expressions namespace. The question I asked is “Can I use these objects to build a function at run-time?” The answer is yes.
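To make the code-versus-data distinction concrete, here is a minimal sketch. The same lambda is assigned once to a delegate type and once to Expression<TDelegate>; the class and variable names are made up for illustration.

```csharp
using System;
using System.Linq.Expressions;

class ExpressionTreeDemo
{
    static void Main()
    {
        // Assigned to a delegate type, the lambda is compiled to executable code.
        Func<int, int> squareCode = x => x * x;

        // Assigned to Expression<TDelegate>, the same lambda becomes a data tree.
        Expression<Func<int, int>> squareTree = x => x * x;

        // The tree can be inspected at run-time...
        var body = (BinaryExpression)squareTree.Body;
        Console.WriteLine(body.NodeType);                  // Multiply
        Console.WriteLine(squareTree.Parameters[0].Name);  // x

        // ...and compiled back into an ordinary delegate.
        Func<int, int> compiled = squareTree.Compile();
        Console.WriteLine(compiled(5) == squareCode(5));   // True
    }
}
```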

As you probably know, in order to serialize a class as XML you mark it up with attributes. You may not realize it, but this is a form of metaprogramming. When you attach attributes to language constructs you are using a domain-specific language, which is a little language designed for a specific task. This is a very powerful technique, because a language geared specifically to a task is usually much easier to use and less error-prone than a general-purpose language like C#. Using attributes to specify a DSL is great because it groups the work to be performed on data with the data itself, instead of storing it in an external file where it can get out of sync with changes.

Here is the file I want to read from:

1600 Pennsylvania Avenue           G.W. Bush
1600 Pennsylvania Avenue           C. Rice
1600 Pennsylvania Avenue           D. Cheney

Here is the class I want to deserialize from the file:

[TextRecordSerializable]
public class Customer
{
    private string address;

    [FieldRange(0, 35)]
    public string Address
    {
        get { return address; }
        set { address = value; }
    }

    private string name;

    [FieldRange(35, 10)]
    public string Name
    {
        get { return name; }
        set { name = value; }
    }
}

The FieldRange attributes just store the starting character index and the length of the field in the record. The TextRecordSerializable attribute indicates that the class can be serialized. These two attributes are the only “commands” in our DSL. Simple, huh? Now a full implementation would be a little more complex, allowing for type conversions and such, but it would still be simple enough to describe with attributes and parameters. What we want to do is convert our DSL into real, honest-to-goodness executable code that deserializes the object from a string (and vice versa). Basically we want to generate the following function:

Func<string, Customer> customerParser = line => new Customer { Address = line.Substring(0, 35), Name = line.Substring(35, 10) };

This function can be built for ANY type at run-time using the following code. It might seem complex at first, but stick with it. It's not as complicated as it seems. (The code is posted as an image; I haven't figured out how to keep Blogger from mauling my code, so if anyone knows how, please leave a comment.)

*The code below aliases System.Linq.Expressions.Expression as "Ex" to keep things short.
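Since the listing was only ever posted as an image, here is a minimal text sketch of how such a builder might look. It is a reconstruction under assumptions, not the original code: the attribute definitions, the field name parser, and the restriction to string properties are all guesses that are merely consistent with the usage shown in this post.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;
using Ex = System.Linq.Expressions.Expression;

// Hypothetical attribute definitions; the post only shows how they are used.
[AttributeUsage(AttributeTargets.Class)]
public sealed class TextRecordSerializableAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Property)]
public sealed class FieldRangeAttribute : Attribute
{
    public FieldRangeAttribute(int start, int length) { Start = start; Length = length; }
    public int Start { get; private set; }
    public int Length { get; private set; }
}

public class TextRecordSerializer<T> where T : new()
{
    // Built once per closed generic type in the static constructor, then cached.
    private static readonly Func<string, T> parser;

    static TextRecordSerializer()
    {
        // Note: an exception thrown here surfaces as a TypeInitializationException.
        if (!typeof(T).IsDefined(typeof(TextRecordSerializableAttribute), false))
            throw new InvalidOperationException(typeof(T).Name + " is not [TextRecordSerializable].");

        ParameterExpression line = Ex.Parameter(typeof(string), "line");
        MethodInfo substring = typeof(string).GetMethod("Substring", new[] { typeof(int), typeof(int) });

        // One "Property = line.Substring(start, length)" binding per attributed string property.
        var bindings =
            from property in typeof(T).GetProperties()
            let range = (FieldRangeAttribute)Attribute.GetCustomAttribute(property, typeof(FieldRangeAttribute))
            where range != null && property.PropertyType == typeof(string)
            select (MemberBinding)Ex.Bind(
                property,
                Ex.Call(line, substring, Ex.Constant(range.Start), Ex.Constant(range.Length)));

        // line => new T { Property1 = ..., Property2 = ... }
        var lambda = Ex.Lambda<Func<string, T>>(Ex.MemberInit(Ex.New(typeof(T)), bindings), line);
        parser = lambda.Compile();
    }

    public T Deserialize(string line) { return parser(line); }
}

[TextRecordSerializable]
public class Customer
{
    [FieldRange(0, 35)] public string Address { get; set; }
    [FieldRange(35, 10)] public string Name { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Pad each field to its fixed width so the record is 45 characters long.
        string line = "1600 Pennsylvania Avenue".PadRight(35) + "G.W. Bush".PadRight(10);
        var customer = new TextRecordSerializer<Customer>().Deserialize(line);
        Console.WriteLine(customer.Name.Trim()); // G.W. Bush
    }
}
```

The LINQ query reads the attributes, filters out unmarked properties, and turns each remaining one into a member binding, so the whole object initializer is assembled in one expression.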




This code is just beautiful. The data flows out of our DSL, is filtered, and converted to an AST all in a single expression. We create our method in the static constructor and then cache it in a static variable. We then call this function inside a public class method called “Deserialize.” The end result is that we can do this:

var serializer = new TextRecordSerializer<Customer>();
var customer = serializer.Deserialize("1600 Pennsylvania Avenue           G.W. Bush ");
Console.WriteLine(customer.Name);

Once our expression is compiled into a function, the code runs as fast as it would have if we had written it ourselves in raw C#. I will leave the method that generates the Serialize function as an exercise for the reader. Next time we'll look at using expressions to create much more advanced DSLs than are possible with simple attributes.
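For readers who want a starting point for that exercise, here is one possible sketch of the Serialize direction. Everything in it is an assumption rather than the original design: the TextRecordWriter name, the use of PadRight, and the simplification that fields are contiguous, start at index 0, and hold non-null strings no longer than their declared length.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical attribute definition, repeated so the sketch is self-contained.
[AttributeUsage(AttributeTargets.Property)]
public sealed class FieldRangeAttribute : Attribute
{
    public FieldRangeAttribute(int start, int length) { Start = start; Length = length; }
    public int Start { get; private set; }
    public int Length { get; private set; }
}

public static class TextRecordWriter<T>
{
    private static readonly Func<T, string> writer = Build();

    private static Func<T, string> Build()
    {
        ParameterExpression record = Expression.Parameter(typeof(T), "record");
        MethodInfo padRight = typeof(string).GetMethod("PadRight", new[] { typeof(int) });
        MethodInfo concat = typeof(string).GetMethod("Concat", new[] { typeof(string), typeof(string) });

        // record.Property.PadRight(length) for each attributed string property, in field order.
        var fields =
            from property in typeof(T).GetProperties()
            let range = (FieldRangeAttribute)Attribute.GetCustomAttribute(property, typeof(FieldRangeAttribute))
            where range != null && property.PropertyType == typeof(string)
            orderby range.Start
            select (Expression)Expression.Call(
                Expression.Property(record, property), padRight, Expression.Constant(range.Length));

        // Fold the padded fields together: string.Concat(string.Concat("", f1), f2) ...
        Expression body = fields.Aggregate(
            (Expression)Expression.Constant(""),
            (left, right) => Expression.Call(concat, left, right));

        return Expression.Lambda<Func<T, string>>(body, record).Compile();
    }

    public static string Serialize(T record) { return writer(record); }
}

public class Customer
{
    [FieldRange(0, 35)] public string Address { get; set; }
    [FieldRange(35, 10)] public string Name { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var customer = new Customer { Address = "1600 Pennsylvania Avenue", Name = "G.W. Bush" };
        string line = TextRecordWriter<Customer>.Serialize(customer);
        Console.WriteLine(line.Length); // 45
    }
}
```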

31 comments:

Liviu said...

Interesting. But where did you find documentation? The docs in Orcas for the new features are really a shame. They are a garbage heap of method declarations without any examples...

Jafar Husain said...

Most of the documentation for LINQ can be found at Microsoft's LINQ site; however, there is precious little information about expression trees. I just went spelunking with IntelliSense and found what I needed. The expression trees are very similar in structure to the CodeDOM, but they are built with static factory methods in the System.Linq.Expressions.Expression class.
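As a small illustration of those factory methods, the sketch below builds text => text.Substring(0, 4) by hand instead of letting the compiler do it. The class and variable names are made up for the example.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

class FactoryMethodDemo
{
    static void Main()
    {
        // Each node of the tree comes from a static factory method on Expression.
        ParameterExpression text = Expression.Parameter(typeof(string), "text");
        MethodInfo substring = typeof(string).GetMethod("Substring", new[] { typeof(int), typeof(int) });

        MethodCallExpression call = Expression.Call(
            text, substring, Expression.Constant(0), Expression.Constant(4));

        // Wrap the body in a lambda and compile it to a delegate.
        Expression<Func<string, string>> lambda =
            Expression.Lambda<Func<string, string>>(call, text);

        Func<string, string> firstFour = lambda.Compile();
        Console.WriteLine(firstFour("1600 Pennsylvania Avenue")); // 1600
    }
}
```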

olmocc said...

Your code is really cool, man! It would be great if Anders & Co. extended expression trees to... statement lists :) Then they would work with functions that have more than one statement: ifs, whiles, fors... and so on.

Jafar Husain said...

I agree. This would be ideal, but if they optimize tail recursion in lambda expressions when they are compiled we will be able to do anything we could do otherwise, but perhaps not as efficiently.

mawi said...

Hey Jafar! I really like the stuff you are putting out on this blog, and look forward to your book! :)

BUT - in the interest of being able to follow along, is there anywhere I can get the code in text and not picture format? I am lazy! (maybe stupid as well, if the code is there but I've missed it).

Jafar Husain said...

Flattery will get you everywhere. :-) Alas, Blogger doesn't allow me to upload any files except images. I would send it to you by e-mail but it's terribly disorganized, full of aborted ideas, and stuffed into one project which is basically my playground. I plan to clean up the code, GPL it, and post it for everyone once I find a way to serve it.

Sorry 'bout the wait

Anonymous said...

Very interesting post. LINQ has a really promising future, and this expression tree framework is really the foundation of it all.

To copy/paste code to your blog, you should try CopyAsHtml from Colin Coller (http://www.jtleigh.com/CopySourceAsHtml - the server seems to be down at the time I write this comment). I hope it helps.

Anonymous said...

Do you know about Haskell in WinHugs? I have an assignment about Haskell... I have a lot of problems with it and do not know how to program it.

ChrisB@HPS said...

Thanks for the head start on a project I'm working on to read fixed records from text files. But what if I want to expose a property other than a string from the class to deserialize, like so:

[TextRecordSerializable]
public class Customer
{
    private string address;

    [FieldRange(0, 35)]
    public string Address
    {
        get { return address; }
        set { address = value; }
    }

    private string name;

    [FieldRange(35, 10)]
    public string Name
    {
        get { return name; }
        set { name = value; }
    }

    // INTEGER type property
    private int zipcode;

    [FieldRange(45, 5)]
    public int ZipCode
    {
        get { return zipcode; }
        set { zipcode = value; }
    }
}

Can you point me in the right direction so that I can handle multiple data types in the Lambda expression? Thanks

Jafar Husain said...

Absolutely. It's best to use the TypeConverter framework that .NET exposes. All you need to do is modify your LINQ query to retrieve the type converter for each type that is not a string and then modify the expression to invoke the appropriate method on the type converter.
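A minimal sketch of that idea, for a single field. The helper name BuildFieldParser, the Trim call, and the choice of ConvertFromInvariantString are assumptions made for illustration; a full version would generate one such expression per attributed property inside the binding query.

```csharp
using System;
using System.ComponentModel;
using System.Linq.Expressions;
using System.Reflection;

class TypeConverterBindingDemo
{
    // Builds: line => (TField)converter.ConvertFromInvariantString(line.Substring(start, length).Trim())
    static Func<string, TField> BuildFieldParser<TField>(int start, int length)
    {
        ParameterExpression line = Expression.Parameter(typeof(string), "line");
        MethodInfo substring = typeof(string).GetMethod("Substring", new[] { typeof(int), typeof(int) });
        MethodInfo trim = typeof(string).GetMethod("Trim", Type.EmptyTypes);
        MethodInfo convert = typeof(TypeConverter).GetMethod(
            "ConvertFromInvariantString", new[] { typeof(string) });

        // Look up the converter once, at build time, and embed it as a constant.
        TypeConverter converter = TypeDescriptor.GetConverter(typeof(TField));

        Expression text = Expression.Call(
            Expression.Call(line, substring, Expression.Constant(start), Expression.Constant(length)),
            trim);

        // The converter returns object, so cast (unbox) the result to the field type.
        Expression converted = Expression.Convert(
            Expression.Call(Expression.Constant(converter, typeof(TypeConverter)), convert, text),
            typeof(TField));

        return Expression.Lambda<Func<string, TField>>(converted, line).Compile();
    }

    static void Main()
    {
        string record = "1600 Pennsylvania Avenue".PadRight(35) + "G.W. Bush".PadRight(10) + "20500";
        Func<string, int> zipCode = BuildFieldParser<int>(45, 5);
        Console.WriteLine(zipCode(record)); // 20500
    }
}
```

Inside the serializer, the string case can keep the plain Substring call while every other property type gets an expression shaped like the one above.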

ChrisB@HPS said...

Thanks for the lightning fast response. After reading your suggestion I found myself digging through MSDN and finding very little in the way of useful info. I understand how to separate the query from the binding expression, but the disconnect is how to use the type converter within this query and then use this query to create an expression that invokes the appropriate type converter. Can you expand a bit more? Thanks a lot.

Chris Falter said...

Jafar,

Great post! In your "end result" code, don't you need to pass a type to the generic TextRecordSerializer in order for it to do work, let alone compile? In other words, this:

var serializer = new TextRecordSerializer();

should be this:

var serializer = new TextRecordSerializer[Customer]();

(Sorry I couldn't use angle brackets, blogger disallows them.)

Have you posted the code anywhere since you wrote this?

You might consider cleaning up the comments a bit, too. The spammers are such a shame.

Jafar Husain said...

Hey Chris,

You are correct. In fact the generic type declaration _was_ in there but was interpreted as an HTML tag. Thanks for pointing it out.

Mea culpa, I haven't posted the code anywhere. I wrote this when VS2008 was in CTP and I no longer have it. The most interesting bits are in the code snippet. From there you should be able to fill it out. We'll call it a learning exercise :-).

About Me

I'm a software developer who started programming at age 16 and never saw any reason to stop. I'm working on the Presentation Platform Controls team at Microsoft. My primary interests are functional programming and Rich Internet Applications.