unfold: metaprogramming


Saturday, May 3, 2008

Multimethods in C# (Part 1)

Have you ever needed to perform an action based on the run-time type of one or more objects? Let's take a hypothetical spaceship game for example. There are three types of objects: an Asteroid, an X-Wing and a TIE-Fighter. They are related in the following inheritance hierarchy, with SpaceObject as the common base class:

We have a collection of SpaceObjects that we update in a tight loop. We need to handle collisions between these objects. For example if a TIE-Fighter collides with an asteroid we might want to apply damage to the space ship and replace the asteroid with several smaller asteroids. One way of doing this is to determine the concrete type of each SpaceObject and then call one of several overloads to handle the collision:


bool Collision(SpaceObject leftSpaceObject, SpaceObject rightSpaceObject)
{
  TieFighter tieFighter = leftSpaceObject as TieFighter;
  Asteroid asteroid = rightSpaceObject as Asteroid;
  if (tieFighter != null && asteroid != null)
  {
    return Collision(tieFighter, asteroid);
  }

  XWing xWing = leftSpaceObject as XWing;
  if (xWing != null && asteroid != null)
  {
    return Collision(xWing, asteroid);
  }

  // try again with the left and right parameters reversed

  // and on and on for every combination of every object in every order...

  return false; // no matching overload was found
}

bool Collision(TieFighter tieFighter, Asteroid asteroid)
{
// handle collision
}

bool Collision(XWing xwing, Asteroid asteroid)
{
// handle collision
}

The method that dispatches to the correct overload smells. It is error-prone because you have to be careful to test the casts in order from the most derived to the least derived classes; otherwise a less specific overload can be selected ahead of the most specific one. It is also repetitive because all of the information required to write this code can be inferred from what you've already declared in your overloads and your class definitions. And it's a maintenance nightmare because any change to the class hierarchy means remembering to update the method.

Multimethods

Essentially we want virtual method dispatch, but we need it on multiple types and we need it bound at run-time because the compiler doesn't have enough information to do it at compile-time. This is exactly what multimethods allow us to do. In a language that supports multimethods you simply declare the various overloads and the language ensures the correct overload is called at run-time. Here's what this might look like if it were added to C#:


// when invoked this checks argument types and dynamically dispatches to overloads below
multimethod bool Collision(SpaceObject obj1, SpaceObject obj2);

bool Collision(TieFighter tieFighter, Asteroid asteroid)
{
// handle collision
}

bool Collision(XWing xwing, Asteroid asteroid)
{
// handle collision
}

bool Collision(TieFighter tieFighter1, TieFighter tieFighter2)
{
//handle collision
}

// and so on for all combinations...

Unfortunately there is no "multimethod" modifier in C# and it's quite unlikely there ever will be. Thankfully C# 3.0 does have a feature which allows us to add this capability seamlessly: expression trees.

Adding Multimethods to C#

Let's create a library that examines our class hierarchy and a group of method overloads using reflection and then creates the method that does the dynamic dispatch for us. We'll design it so it has the following API:


var collision = DynamicDispatch.CreateFunc<SpaceObject,SpaceObject,bool>((obj1, obj2) => this.Collision(obj1,obj2));

// dispatches to Collision(XWing xWing, TieFighter tieFighter)
collision(new XWing(), new TieFighter());

What's going on here? The CreateFunc method accepts a lambda in which the most abstract version of the collision method is invoked. It returns an instance of Func<>, a new generic delegate type introduced with .NET Framework 3.5 (alongside C# 3.0) that wraps a method that returns a value. How do I manage to generate the code required to do dynamic dispatch from a lambda function? I'll cover that in the next installment. Stay tuned. :-)
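Until that installment, here is a minimal sketch of one way a CreateFunc-style dispatcher could behave. It is not the author's implementation: instead of generating dispatch code from the expression tree, it only reads the method name and target out of the lambda and then asks reflection for the best-matching overload on every call, which is considerably slower. The names DynamicDispatchSketch and Game, and the string return values, are assumptions made for illustration.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public abstract class SpaceObject { }
public class Asteroid : SpaceObject { }
public class XWing : SpaceObject { }
public class TieFighter : SpaceObject { }

// Hypothetical host class; each overload reports which one ran.
public class Game
{
    public string Collision(SpaceObject a, SpaceObject b) { return "fallback"; }
    public string Collision(TieFighter t, Asteroid a) { return "TieFighter/Asteroid"; }
    public string Collision(XWing x, TieFighter t) { return "XWing/TieFighter"; }
}

// Hypothetical stand-in for the article's DynamicDispatch class.
public static class DynamicDispatchSketch
{
    public static Func<T1, T2, TResult> CreateFunc<T1, T2, TResult>(
        Expression<Func<T1, T2, TResult>> call)
    {
        // The lambda is never executed; it is only inspected as data.
        var methodCall = call.Body as MethodCallExpression;
        if (methodCall == null)
            throw new ArgumentException("The lambda body must be a single method call.", "call");

        MethodInfo seed = methodCall.Method;

        // Evaluate the receiver ("this" or a captured variable) once.
        // This fails if the receiver is one of the lambda's own parameters.
        object target = seed.IsStatic
            ? null
            : Expression.Lambda<Func<object>>(
                  Expression.Convert(methodCall.Object, typeof(object))).Compile()();

        const BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic |
                                   BindingFlags.Instance | BindingFlags.Static;

        return (arg1, arg2) =>
        {
            // The default binder picks the most specific overload for the
            // run-time argument types; it throws AmbiguousMatchException
            // when no single overload is the best match. Null arguments
            // are not handled in this sketch.
            MethodInfo best = seed.DeclaringType.GetMethod(
                seed.Name, flags, null,
                new[] { arg1.GetType(), arg2.GetType() }, null);
            return (TResult)best.Invoke(target, new object[] { arg1, arg2 });
        };
    }
}
```

Calling the returned delegate with an XWing and a TieFighter runs the (XWing, TieFighter) overload, while a pair with no dedicated overload, such as two Asteroids, falls back to the (SpaceObject, SpaceObject) version.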

Thursday, March 1, 2007

Runtime macros in C# 3.0

Macros are perhaps the most powerful tool available to programmers for creating new kinds of abstractions. They allow developers to extend their programming language with new constructs and behaviors. Many languages have this feature, the most well-known of which is LISP. A macro is a function which operates on code, transforming it into different code (usually expanding it). Just before compilation the compiler parses the code into an abstract syntax tree (AST) and passes the applicable pieces to the macro function. The developer need only indicate somehow which portions of code they would like to transform with their macro. This is a useful technique if you notice recurring patterns in your code and your language does not provide you with sufficiently powerful abstractions to factor them out.

C# 3.0 does NOT add compile-time macros to the language, but allows you to do the same type of code manipulation at run-time. In this article I'm going to create a macro function that adds a useful new abstraction to C#.

Roughly a year or so ago Microsoft Research released an extension of the C# language called C-Omega. It was an experimental language written by Erik Meijer (Mr. Haskell) among others and was the precursor to C# 3.0. Many cool features in C-Omega did not make it into C# 3.0. I understand why most of the omitted features were left out, but there was one feature that I really liked that didn't make the cut.

Ask yourself how many times you have written code like this:

int? friendsSonsAge = null;
if (person != null && person.Friend != null && person.Friend.Son != null)
{
friendsSonsAge = person.Friend.Son.Age;
}

OK, OK. I was never really good at short contrived code examples, but the point is that you often want to know the value of a property buried deep in a hierarchy of objects, each of which may or may not be null. In most mainstream languages (Java, C, JavaScript, VB.NET, C#, Ruby) you have no recourse but the ugly boilerplate code shown above.

In C-Omega you can do this:

int? friendsSonsAge = person.Friend.Son.Age;

if (friendsSonsAge != null)
// do something

If any of the intermediate objects in the expression on the right of the assignment are null the whole expression evaluates to null. This is called null propagation and it works very nicely together with nullable types. I have no idea why this feature wasn't added to C# 3.0. No matter, we can create a macro that duplicates this behavior ourselves. :-)

The necessary ingredient for a macro is the ability to turn code into data. C# exposes code as data in an expression tree object. If you assign a lambda function to a variable of type Expression<TDelegate>, C# will build an AST of the function instead of creating an executable delegate.

Expression<Func<int,bool>> oldEnoughToDrink = (age) => age >= 19; // oldEnoughToDrink cannot be executed directly; instead it points to an AST representing the function
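To make the code-versus-data distinction concrete, the following small sketch inspects the tree behind the same lambda and then compiles it into a delegate that can actually be invoked. The ExpressionDemo class and its Describe method are hypothetical names used only for this illustration.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionDemo
{
    // Returns the kind of the root node followed by the result of running
    // the compiled delegate for the given age.
    public static string Describe(int age)
    {
        Expression<Func<int, bool>> oldEnoughToDrink = a => a >= 19;

        // The lambda body is plain data: a binary node comparing the
        // parameter with the constant 19.
        var body = (BinaryExpression)oldEnoughToDrink.Body;

        // Compile() turns the tree into an ordinary executable delegate.
        Func<int, bool> compiled = oldEnoughToDrink.Compile();

        return body.NodeType + " -> " + compiled(age);
    }
}
```

ExpressionDemo.Describe(21) returns "GreaterThanOrEqual -> True": the first half comes from reading the tree, the second from executing the compiled function.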

C# cannot translate arbitrary expressions or code blocks into data, only lambda functions, so we will need to nest our expression inside one. Behold, my null propagation macro function:

Person person = null;
int? personsFriendsSonsAge = Macros.GetValue<int>(() => person.Friend.Son.Age);

In this example personsFriendsSonsAge will be null because the local variable person is null. The above code does not throw a null reference exception because the lambda function is not executed, but converted to an AST, and then passed to the GetValue function. Notice how we've nested the expression inside a lambda function with no arguments:

() => person.Friend.Son.Age

The C# compiler knows to convert this lambda expression into an expression tree because the type of the first parameter of the GetValue function is Expression<Func<T>>:

public static Nullable<T> GetValue<T>(Expression<Func<T>> f) where T : struct

Here is the body of the macro:

The GetValue function takes the expression passed to it and wraps it in a series of AndAlso (short-circuit evaluation "&&" operator) expressions comparing each member expression to null. I build the conditional expression from the inside-out by nesting each previous AndAlso in a new one. I skip a check for null if a member expression is referencing a value type which can never be null. Finally I compile the expression tree into a function, execute it, and return the result.
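The description above can be sketched roughly as follows. This is a reconstruction from the prose rather than the author's original listing, so details such as the loop structure and the exceptions thrown are assumptions, as is the Person class, which is shaped only to match the article's example.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical model class, shaped to match the article's example.
public class Person
{
    public Person Friend { get; set; }
    public Person Son { get; set; }
    public int Age { get; set; }
}

public static class Macros
{
    public static T? GetValue<T>(Expression<Func<T>> f) where T : struct
    {
        var access = f.Body as MemberExpression;
        if (access == null)
            throw new NotSupportedException("Only member access chains are supported.");

        // Walk inward from the outermost member access, nesting each earlier
        // check inside a new AndAlso so the innermost object is tested first.
        Expression guard = null;
        Expression current = access.Expression;
        while (current != null)
        {
            if (!(current is MemberExpression) && !(current is ConstantExpression))
                throw new NotSupportedException("Method calls and indexers are not supported.");

            if (!current.Type.IsValueType) // value types can never be null
            {
                Expression notNull = Expression.NotEqual(
                    current, Expression.Constant(null, current.Type));
                guard = guard == null ? notNull : Expression.AndAlso(notNull, guard);
            }

            var inner = current as MemberExpression;
            current = inner == null ? null : inner.Expression;
        }

        // guard ? (T?)original : (T?)null
        Expression value = Expression.Convert(f.Body, typeof(T?));
        Expression body = guard == null
            ? value
            : Expression.Condition(guard, value, Expression.Constant(null, typeof(T?)));

        // Compile the rewritten tree into a function, run it, return the result.
        return Expression.Lambda<Func<T?>>(body).Compile()();
    }
}
```

One consequence of walking the whole chain is that the compiler-generated closure object holding the captured local is also compared with null; that check always passes and is harmless.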

The end result is that a little function is generated that looks like this:

() => (person != null && person.Friend != null && person.Friend.Son != null) ? (int?)person.Friend.Son.Age : null;

Now there are several things to be aware of:

1. It's slow. Generating code at runtime is comparatively expensive even if it's just a little function. That's what's nice about compile-time macros: the code generation happens ahead of time. That said, execution speed is on par with an eval in IronPython so it's still quite reasonable. Just don't use it in a tight loop.

2. This version of the function only works for value types. This is due to the fact that it returns the generic Nullable<T> struct, which cannot be parameterized with a reference type. You can easily add another function which works with reference types by modifying the original slightly. The need to have one function for value types and another for reference types is a necessary headache if we want to preserve type safety.

3. This is an oversimplified version for the purposes of demonstration. It only handles member access expressions like form.Size.Width, but won't work if there is a method call or indexer in the expression.

int? age = Macros.GetValue<int>(() => myCustomer.Friends[0].GetBrother().Age); // will throw an exception because the expression contains an indexer and a call to the GetBrother method

This should not be perceived as a limitation of the language, but rather a limitation on my free time :-). In fact it is completely possible to handle method calls. The trick is to ensure that you don't call the same method again and again when making null comparisons. This can be accomplished by introducing a new lambda function every time you need to store the result of a computation. I'll leave that as an exercise for the reader. :-)

The important thing to take away from this article is that converting code to data (and vice versa) has many more uses than just generating SQL from C# queries. C# has evolved into a very powerful language for metaprogramming and I recommend learning as much as you can about it so that you can leverage these new capabilities.

Monday, February 19, 2007

LINQ to Code

With all of the information on the Internet telling you how much easier LINQ makes data access you may have missed out on some of the less obvious implications of the new language features. I'm referring specifically to how much easier it is to generate code using the new LINQ API. I'm going to demonstrate a brief example that uses a domain-specific language (DSL) to serialize a class to a fixed-length, record-based text file. This is useful if you find yourself having to work with legacy file formats from the dark ages. Rather than reinvent the wheel let's steal the approach used for XML serialization in the .NET Framework.

In order to convert your C# queries to SQL at run-time LINQ introduces a new construct called an expression tree. Expression trees are just lambda functions that C# represents as data instead of executable code. For a good example of how to use expression trees and compile them see this post in Don Box's blog. If you choose to represent a function as an expression tree you can analyze it at run-time and convert it to another form like a SQL query or executable code. This is known as metaprogramming (short definition: code that manipulates code). When the compiler converts your code into a data tree it represents it using objects in the new System.Linq.Expressions namespace. The question I asked is “Can I use these objects to build a function at run-time?” The answer is yes.

As you probably know, in order to serialize a class as XML you mark it up with attributes. You may not realize this, but this is a form of metaprogramming. When you attach attributes to language constructs you are using a domain-specific language, which is a little language designed for a specific task. This is a very powerful technique because when you use a language specifically geared for a task it is usually much easier and less error-prone than it would have been if you had written it in a general-purpose language like C#. Using attributes to specify a DSL is great because it groups the work to be performed on data with the data itself instead of storing it in an external file where it can get out of sync with changes.

Here is the file I want to read from:

1600 Pennsylvania Avenue           G.W. Bush
1600 Pennsylvania Avenue           C. Rice
1600 Pennsylvania Avenue           D. Cheney

Here is the class I want to deserialize from the file:

[TextRecordSerializable]
public class Customer
{
  private string address;

  [FieldRange(0,35)]
  public string Address
  {
    get { return address; }
    set { address = value; }
  }

  private string name;

  [FieldRange(35,10)]
  public string Name
  {
    get { return name; }
    set { name = value; }
  }
}

The FieldRange attributes just store the starting character index and the length of the field in the record. The TextRecordSerializable attribute indicates that the class can be serialized. These two attributes are the only “commands” in our DSL. Simple, huh? Now a full implementation would be a little more complex, allowing for type conversions and such, but it still would be simple enough to describe with attributes and parameters. What we want to do is convert our DSL into real, honest-to-goodness executable code that deserializes the object from a string (and vice versa). Basically we want to generate the following function:

Func<string, Customer> customerParser = (line) => new Customer { Address = line.Substring(0,35), Name = line.Substring(35,10) };

This function can be built for ANY type at run-time using the following code. It might seem complex at first, but stick with it. It's not as complicated as it seems. The code below aliases System.Linq.Expressions.Expression as "Ex" to keep things short.

This code is just beautiful. The data flows out of our DSL, is filtered, and converted to an AST all in a single expression. We create our method in the static constructor and then cache it in a static variable. We then call this function inside a public class method called “Deserialize.” The end result is that we can do this:

var serializer = new TextRecordSerializer<Customer>();
var customer = serializer.Deserialize("1600 Pennsylvania Avenue           G.W. Bush ");
Console.WriteLine(customer.Name);
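A serializer along the lines described above might look something like the sketch below. It is a reconstruction from the prose, not the author's original listing: the attribute property names Start and Length, the new() constraint, and the exact shape of the query are assumptions. It follows the article in aliasing Expression as Ex, building the parser in the static constructor, and caching it in a static field.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;
using Ex = System.Linq.Expressions.Expression;

[AttributeUsage(AttributeTargets.Class)]
public class TextRecordSerializableAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Property)]
public class FieldRangeAttribute : Attribute
{
    public int Start { get; private set; }   // assumed property name
    public int Length { get; private set; }  // assumed property name
    public FieldRangeAttribute(int start, int length) { Start = start; Length = length; }
}

[TextRecordSerializable]
public class Customer
{
    [FieldRange(0, 35)] public string Address { get; set; }
    [FieldRange(35, 10)] public string Name { get; set; }
}

public class TextRecordSerializer<T> where T : new()
{
    // Built once per closed type T and reused by every instance.
    private static readonly Func<string, T> parser;

    static TextRecordSerializer()
    {
        if (!typeof(T).IsDefined(typeof(TextRecordSerializableAttribute), false))
            throw new InvalidOperationException(
                typeof(T).Name + " is not marked [TextRecordSerializable].");

        ParameterExpression line = Ex.Parameter(typeof(string), "line");
        MethodInfo substring = typeof(string).GetMethod(
            "Substring", new[] { typeof(int), typeof(int) });

        // One "Property = line.Substring(start, length)" binding for each
        // property that carries a [FieldRange] attribute.
        var bindings =
            from property in typeof(T).GetProperties()
            let range = (FieldRangeAttribute)Attribute.GetCustomAttribute(
                property, typeof(FieldRangeAttribute))
            where range != null
            select (MemberBinding)Ex.Bind(
                property,
                Ex.Call(line, substring, Ex.Constant(range.Start), Ex.Constant(range.Length)));

        // line => new T { ...bindings... }
        parser = Ex.Lambda<Func<string, T>>(
            Ex.MemberInit(Ex.New(typeof(T)), bindings), line).Compile();
    }

    public T Deserialize(string line)
    {
        return parser(line);
    }
}
```

As in the generated function shown earlier, fields are copied verbatim with Substring, so padding is not trimmed and a line shorter than the declared ranges causes Substring to throw.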

Once our expression is compiled into a function, the code runs as fast as it would have if we had written it ourselves in raw C#. I will leave the method that generates the Serialize function as an exercise for the reader. Next time we'll look at using expressions to create much more advanced DSLs than are possible with simple attributes.
