Wednesday, July 2, 2008

Multimethods in C# (Part 2)

In part 1 I demonstrated how to use my multimethod library in C#:

var collision = DynamicDispatch.CreateFunc<SpaceObject,SpaceObject,bool>((obj1, obj2) => this.Collision(obj1,obj2));

// dispatches to Collision(XWing xWing, TieFighter tieFighter)
collision(new XWing(), new TieFighter());

Although it appears that CreateFunc accepts a lambda function, in reality the type of its argument is a lambda expression. A lambda expression is the data representation of a lambda function. CreateFunc analyzes the lambda expression and retrieves the overloads of the method invoked in the expression. Finally, it generates the code to do a dynamic dispatch based on the argument types and returns a delegate of type Func<>.
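To make the distinction concrete, here's roughly the shape such a factory takes (a simplified sketch, not the library's actual code):

using System;
using System.Linq.Expressions;

public static class DynamicDispatch
{
    // Because the parameter is an Expression<...> rather than a Func<...>,
    // the compiler hands us a data structure describing the lambda instead
    // of compiled code, so the body can be inspected at run-time.
    public static Func<T1, T2, TResult> CreateFunc<T1, T2, TResult>(
        Expression<Func<T1, T2, TResult>> expr)
    {
        // The body of the lambda is expected to be a single method call,
        // e.g. (obj1, obj2) => this.Collision(obj1, obj2).
        var call = (MethodCallExpression)expr.Body;

        // ...find all overloads of call.Method, sort them, and build the
        // dispatching expression (the steps covered in the rest of this post)...
        throw new NotImplementedException("sketch: dispatch for " + call.Method.Name);
    }
}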

Based on the types of the arguments passed to the collision delegate, it invokes the correct method overload at run-time. In order to generate the code for this delegate I use the objects defined in the System.Linq.Expressions namespace. The code generated for the collision delegate looks something like this:
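In spirit it's equivalent to this hand-written sketch (the class stubs and the single XWing/TieFighter overload are just for illustration; the real generated expression covers every overload the same way):

class SpaceObject { }
class XWing : SpaceObject { }
class TieFighter : SpaceObject { }

class Simulation
{
    // Placeholder overloads standing in for the real collision logic.
    bool Collision(XWing xWing, TieFighter tieFighter) { return true; }
    bool Collision(SpaceObject obj1, SpaceObject obj2) { return false; }

    // Hand-written equivalent of the generated dispatch code.
    bool DispatchCollision(SpaceObject obj1, SpaceObject obj2)
    {
        // Try the most derived overload first.
        var xWing = obj1 as XWing;
        var tieFighter = obj2 as TieFighter;
        if (xWing != null && tieFighter != null)
            return Collision(xWing, tieFighter);

        // ...any other overloads are tried here in the same way...

        // Fall back to the most abstract overload, which always matches.
        return Collision(obj1, obj2);
    }
}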



This may look a little complicated, but really it's not. For each overload I simply generate code that attempts to cast the arguments to the concrete types it expects. If any of the casts come back null, which is what the "as" operator produces when it fails, the code falls through to the next overload, until it finally reaches the most abstract one.

So how is this done? Well first we must get a list of all the overloads and sort them in the appropriate order, from most abstract to most derived.
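By hand, that step looks something like this (a rough sketch: the Iterate helper is spelled out inline, and names like GetDepth and the hard-coded "Collision" filter are illustrative rather than the library's actual API):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

static class FP
{
    // Yields seed, f(seed), f(f(seed)), and so on, forever.
    public static IEnumerable<T> Iterate<T>(T seed, Func<T, T> f)
    {
        var current = seed;
        while (true)
        {
            yield return current;
            current = f(current);
        }
    }
}

static class OverloadSorting
{
    // Depth of a type in its inheritance tree (object == 0).
    static int GetDepth(Type type)
    {
        return FP.Iterate(type, t => t.BaseType)
                 .TakeWhile(t => t != null)
                 .Count() - 1;
    }

    // All two-argument "Collision" overloads on a type, sorted from most
    // abstract to most derived: first by the deepest parameter type, then
    // by the sum of the parameter-type depths.
    public static MethodInfo[] SortedOverloads(Type declaringType)
    {
        return declaringType
            .GetMethods(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
            .Where(m => m.Name == "Collision" && m.GetParameters().Length == 2)
            .OrderBy(m => m.GetParameters().Max(p => GetDepth(p.ParameterType)))
            .ThenBy(m => m.GetParameters().Sum(p => GetDepth(p.ParameterType)))
            .ToArray();
    }
}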



Some of the functions used here warrant explanation. Iterate is a function that accepts an initial value and a function; it generates a stream that yields the initial value, then the result of applying the function to it, then the result of applying the function to that, and so on. In other words, FP.Iterate(0, x => x + 1) yields the stream 0, 1, 2, 3, 4, etc. In this case I use Iterate to walk up the inheritance tree of a type and find out what its depth is. Since the Iterate function is stateless and returns a stream, I can use it in a LINQ query. I sort the overloads by the maximum depth of any parameter type in a given overload, and then by the sum of all the parameter depths. Note that I sort the overloads in ascending order, from most abstract to most derived, instead of vice versa. The reason for this becomes clear when you examine how I build the expression.
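Sketched by hand, the building step looks roughly like this (simplified: it assumes instance methods on a target object and overloads already sorted from most abstract to most derived, and it evaluates each "as" cast twice instead of reproducing the generated code exactly):

using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

static class DispatchBuilder
{
    // Folds the sorted overloads into one nested conditional expression,
    // then compiles it into a delegate.
    public static Func<T1, T2, TResult> Build<T1, T2, TResult>(
        object target, MethodInfo[] sortedOverloads)
    {
        var p1 = Expression.Parameter(typeof(T1), "obj1");
        var p2 = Expression.Parameter(typeof(T2), "obj2");
        var instance = Expression.Constant(target);

        // Seed: the most abstract overload, which always matches.
        Expression seed = Expression.Call(instance, sortedOverloads[0], p1, p2);

        // Each more derived overload wraps the expression built so far: if
        // both "as" casts succeed, call this overload; otherwise fall back
        // to the accumulated expression.
        var body = sortedOverloads.Skip(1).Aggregate(seed, (fallback, method) =>
        {
            var parameters = method.GetParameters();
            var cast1 = Expression.TypeAs(p1, parameters[0].ParameterType);
            var cast2 = Expression.TypeAs(p2, parameters[1].ParameterType);

            return Expression.Condition(
                Expression.AndAlso(
                    Expression.NotEqual(cast1, Expression.Constant(null, parameters[0].ParameterType)),
                    Expression.NotEqual(cast2, Expression.Constant(null, parameters[1].ParameterType))),
                Expression.Call(instance, method, cast1, cast2),
                fallback);
        });

        return Expression.Lambda<Func<T1, T2, TResult>>(body, p1, p2).Compile();
    }
}

Calling something like DispatchBuilder.Build<SpaceObject, SpaceObject, bool>(this, overloads) would then produce a delegate along the lines of the collision example from part 1.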



Because I want to build the expression using functional programming, I have to build it inside out, starting with the most abstract overload and ending with the most derived one. The reason is that the expression for each more derived overload must fall back to the more abstract ones, which means those must already exist before the derived expressions are created. To achieve this I use a very versatile function: Enumerable.Aggregate. Aggregate is called on a stream and takes an initial (seed) value and a function of two arguments. The first argument to the function is the accumulator, which starts out as the seed; the second argument is the current item in the stream. Each time the function runs, its result becomes the new value of the accumulator. For example, the following code yields 10:

var nums = new[]{1,2,3,4};
var output = nums.Aggregate(0, (x,y) => x + y);

Aggregate can work on anything, not just numbers. In this case my stream contains all the overloads, my accumulator variable is the expression I've built so far, and the function creates a new expression that invokes the current overload if the types match or invokes the expression in the accumulator if they don't.

This approach works well and is very elegant, but aren't we forgetting something?

What if instead of this...

collision(new XWing(), new TieFighter());

...the following code is run:

collision(new TieFighter(), new XWing());

Whoops! It won't match our first overload, even though that would probably make the most sense under the circumstances. Within each handler we could write some manual code to try reversing the arguments, but that is exactly the kind of drudgery we want to avoid. Next time I'll show you how to modify the algorithm to generate code that tries all the various combinations of argument orders.

Why are dynamic languages so...static?

<Rant> I'm frustrated by the glacial pace of evolution we see in the dynamic language space, specifically in the area of parallelism. This article on Ars Technica echoes what many of us have long suspected: the future is massively parallel. It was so obvious to me that the way to write parallel programs was not to deliberately assign certain types of tasks to certain cores that I was genuinely surprised at how widespread the practice was. We need to start expressing programs in such a way that they can be scaled to as many cores as are available without developers having to manage the process. When you are dealing with 16, 32, 100+ cores it's no longer about making efficient parallelism easier. It's about making it possible. Nobody's that good, and if they tell you so they're lying.

There is no silver bullet, and there are many tasks that just can't be done in parallel. However, the key to democratizing parallel programming is to give developers a clean way to express algorithms as stateless programs (read: functional programming), and to give them access to their language's parse tree. Languages like LISP, ML, C#, VB.NET, and Perl 6 (if it is ever released) expose the syntax tree to the developer, allowing them to leverage the most important idea in computer science. Libraries can analyze this parse tree and rewrite it to run efficiently in a variety of different hardware scenarios.

Given that the multi-core future is bearing down on us, it's only natural to assume that the various popular programming languages would be scrambling to add the necessary idioms to support parallelism. Not so much. Ruby's got some functional programming constructs, but no way of getting at the AST. Python was just redesigned from the ground up, and there was a grand total of zero language features added to support parallelism. Javascript was just moved to 2.0 and is similarly lacking. What's so criminal about this omission is that it is so easy to rectify. When you call "eval" an AST is created somewhere. It's simply a matter of exposing it before converting it into executable code.

I just wrote a rudimentary chess AI in C# 3.0. I achieved just short of a 4x speed increase on my quad-core simply by adding a single line of code and referencing the new ParallelFX CTP. I challenge anyone to achieve a similar level of code clarity and performance with the dynamic languages en vogue today. </Rant>
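For the curious, the kind of change I mean is on the order of this sketch (the move list and evaluation function are stand-ins, not code from my chess program):

using System;
using System.Threading; // Parallel.For comes from the ParallelFX (Parallel Extensions) CTP

class ParallelSketch
{
    // Stand-in for an expensive, stateless evaluation function
    // (think: scoring one candidate move in a chess AI).
    static int Evaluate(int move)
    {
        var score = 0;
        for (var i = 0; i < 10000000; i++) score += (move * i) % 7;
        return score;
    }

    static void Main()
    {
        var moves = new[] { 1, 2, 3, 4, 5, 6, 7, 8 };
        var scores = new int[moves.Length];

        // Sequential version:
        // for (int i = 0; i < moves.Length; i++) scores[i] = Evaluate(moves[i]);

        // Parallel version: the only change is the loop construct.
        // Each iteration writes to its own slot, so no locking is needed.
        Parallel.For(0, moves.Length, i => { scores[i] = Evaluate(moves[i]); });

        Console.WriteLine(string.Join(", ", Array.ConvertAll(scores, s => s.ToString())));
    }
}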

About Me

I'm a software developer who started programming at age 16 and never saw any reason to stop. I'm working on the Presentation Platform Controls team at Microsoft. My primary interests are functional programming and Rich Internet Applications.