Tuesday, April 22, 2008

Using Operators with Generics

Many .NET users have no doubt run into issues using math functions with generics. Let's say you want to write a Matrix class. You would like to make it a generic class so that you can use it with doubles, ints, and so on. The definition might look like this:
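A minimal sketch of such a definition follows. The field name numbers and the names value1, value2, and output are taken from the line quoted below; the constructor and the dimension handling are assumptions made for illustration. The sketch is intentionally not compilable: the marked line is the one the compiler rejects.

```csharp
// Sketch only: this class is expected to FAIL compilation (see marked line).
public class Matrix<T>
{
    private readonly T[,] numbers;

    public Matrix(T[,] numbers)
    {
        this.numbers = numbers;
    }

    public static Matrix<T> operator +(Matrix<T> value1, Matrix<T> value2)
    {
        // Assumes both matrices have the same dimensions.
        int rows = value1.numbers.GetLength(0);
        int columns = value1.numbers.GetLength(1);
        var output = new T[rows, columns];

        for (int x = 0; x < rows; x++)
        {
            for (int y = 0; y < columns; y++)
            {
                // error CS0019: Operator '+' cannot be applied to
                // operands of type 'T' and 'T'
                output[x, y] = (value1.numbers[x, y] + value2.numbers[x, y]);
            }
        }

        return new Matrix<T>(output);
    }
}
```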

Attempting to compile this will fail. The compiler chokes on the following line:

output[x,y] = (value1.numbers[x,y] + value2.numbers[x,y]);

The compiler can't confirm that type T will have the + operator. Normally you resolve this by using a constraint. However, there are only two kinds of constraints: type-based and constructor-based. Operators aren't part of an interface, and I'm not sure they should be. A better solution would have been for C# to allow operator-based constraints. After all, an exception is made for constructors. They may have decided against this route for CLS compliance. We'll probably never know.

All this conjecture doesn't get us any closer to our Matrix class. There are two ways of getting what we want. The first is reflection. In fact, if you are using late binding, VB.NET will happily compile a modified version of the code above and resolve the operator using reflection at run-time. Reflection is slow, though, and we typically want math operations to run quickly.
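To make the reflection route concrete, here is a rough C# sketch. ReflectionMath and its Add method are hypothetical names, not part of any library. It looks up the operator method that compilers emit as op_Addition and invokes it through reflection on every call.

```csharp
using System;
using System.Reflection;

public static class ReflectionMath
{
    // Hypothetical helper: finds T's user-defined '+' operator by its
    // compiler-generated name and invokes it through reflection.
    public static T Add<T>(T left, T right)
    {
        MethodInfo op = typeof(T).GetMethod(
            "op_Addition",
            BindingFlags.Public | BindingFlags.Static,
            null,
            new[] { typeof(T), typeof(T) },
            null);

        if (op == null)
        {
            throw new InvalidOperationException(
                "No public op_Addition found on " + typeof(T).Name + ".");
        }

        // Both operands and the result are boxed on every call,
        // on top of the cost of the lookup itself.
        return (T)op.Invoke(null, new object[] { left, right });
    }
}
```

This works for types such as decimal and TimeSpan, which define the operator as a public static method. Primitives like int don't expose a public op_Addition method (their addition is an IL instruction), so this helper throws for them and a real late binder has to special-case them.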

That leaves us with code generation. I've been talking a lot about Lambda Expressions in recent posts. Lambda Expressions are a new .NET 3.5 feature that makes it easy to generate methods on the fly quickly. The code below is a refinement of the work done by Rüdiger Klaehn. He uses low-level IL generation APIs available in .NET 2.0. Observe how his code can be simplified dramatically by using lambda expressions instead.
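A minimal sketch of what such a class could look like, assuming the name Num (which the post uses later) and a static Add delegate. The member names and layout here are assumptions rather than the original listing.

```csharp
using System;
using System.Linq.Expressions;

public static class Num<T>
{
    // Compiled once per closed type T, then reused as a plain delegate.
    public static readonly Func<T, T, T> Add = CreateAdd();

    private static Func<T, T, T> CreateAdd()
    {
        ParameterExpression left = Expression.Parameter(typeof(T), "left");
        ParameterExpression right = Expression.Parameter(typeof(T), "right");

        // Expression.Add picks the addition that applies to T: the built-in
        // one for numeric primitives, or a user-defined operator +.
        // It throws InvalidOperationException if T has neither.
        BinaryExpression body = Expression.Add(left, right);

        return Expression.Lambda<Func<T, T, T>>(body, left, right).Compile();
    }
}
```

Calling Num<int>.Add(2, 3) or Num<decimal>.Add(1m, 2m) then costs a delegate invocation rather than a reflection lookup.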

This class creates and compiles expressions for each of the operators (only addition shown above for brevity). The lambda expression worries about exactly which add method to bind to, based on the parameter types, when it is compiled into a delegate. Pretty readable, eh? Now let's rewrite our Matrix class.
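One way the rewritten class might look, as a sketch. It assumes a Num<T> helper exposing a static Add delegate; a compact stand-in is included so the block compiles by itself. The indexer and the dimension check are additions for illustration.

```csharp
using System;
using System.Linq.Expressions;

// Compact stand-in for the Num<T> helper so this block compiles by itself.
public static class Num<T>
{
    public static readonly Func<T, T, T> Add = Build();

    private static Func<T, T, T> Build()
    {
        var left = Expression.Parameter(typeof(T), "left");
        var right = Expression.Parameter(typeof(T), "right");
        return Expression.Lambda<Func<T, T, T>>(
            Expression.Add(left, right), left, right).Compile();
    }
}

public class Matrix<T>
{
    private readonly T[,] numbers;

    public Matrix(T[,] numbers)
    {
        this.numbers = numbers;
    }

    // Read-only indexer (an assumption, added so results can be inspected).
    public T this[int x, int y]
    {
        get { return numbers[x, y]; }
    }

    public static Matrix<T> operator +(Matrix<T> value1, Matrix<T> value2)
    {
        int rows = value1.numbers.GetLength(0);
        int columns = value1.numbers.GetLength(1);
        if (rows != value2.numbers.GetLength(0) ||
            columns != value2.numbers.GetLength(1))
        {
            throw new ArgumentException("Matrix dimensions must match.");
        }

        var output = new T[rows, columns];
        for (int x = 0; x < rows; x++)
        {
            for (int y = 0; y < columns; y++)
            {
                // The compiled delegate stands in for the '+' the compiler rejected.
                output[x, y] = Num<T>.Add(value1.numbers[x, y], value2.numbers[x, y]);
            }
        }

        return new Matrix<T>(output);
    }
}
```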

Notice that the consumer of our API doesn't need to futz about with the Num class at all. They use the Matrix exactly as they would expect to:

var leftMatrix = new Matrix<int>(new[,]{ {2,2,1},{5,2,1} });
var rightMatrix = new Matrix<int>(new[,]{ {1,2,4},{1,9,1} });

var newMatrix = leftMatrix + rightMatrix;

So what's the catch? You lose static typing. If you parameterize the Matrix class with a type that doesn't have the operators defined, it will trigger a run-time error. There is no way around this because there is no way of confirming the operators exist at compile-time. Does this make you uncomfortable? Get used to it.
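A sketch of what that run-time failure can look like, assuming a Num<T> helper built on Expression.Add. Money and Demo are hypothetical names. Because the expression is built in a static constructor here, the failure surfaces as a TypeInitializationException wrapping the InvalidOperationException thrown by Expression.Add.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical value type with no operator + defined.
public struct Money
{
    public decimal Amount;
}

public static class Num<T>
{
    public static readonly Func<T, T, T> Add;

    // Explicit static constructor: initialization runs exactly at first use.
    static Num()
    {
        var left = Expression.Parameter(typeof(T), "left");
        var right = Expression.Parameter(typeof(T), "right");
        Add = Expression.Lambda<Func<T, T, T>>(
            Expression.Add(left, right), left, right).Compile();
    }
}

public static class Demo
{
    // Returns true if adding two T values through Num<T> fails at run time.
    public static bool AddFails<T>()
    {
        try
        {
            Num<T>.Add(default(T), default(T));
            return false;
        }
        catch (TypeInitializationException ex)
        {
            // Num<T>'s static constructor threw while building the expression.
            return ex.InnerException is InvalidOperationException;
        }
    }
}
```

Both Demo.AddFails<int>() and Demo.AddFails<Money>() compile without complaint; only the second one reports a failure, and only once it runs.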

Many statically and dynamically typed languages are gradually moving towards a new model: Static Typing Where Possible, Dynamic Typing When Needed. VB.NET is already there with its optional late binding. A similar feature is being discussed for C#. If you do a lot of work with run-time code generation you will begin to notice two things:

1. You will have to compromise on compile-time type safety more and more.
2. You will find yourself caring less and less.

Static typing is a tool, not a religion. Frankly it was never very useful for assuring program correctness and if you are test-driven it is even less useful. Static typing is most useful as metadata for your development tools. It often makes sense to live without it in specific cases where doing so prevents you from repeating yourself. Using code generation to avoid writing identical Matrix implementations for each numeric base type is an excellent example. Can you think of any others?

About Me

I'm a software developer who started programming at age 16 and never saw any reason to stop. I'm working on the Presentation Platform Controls team at Microsoft. My primary interests are functional programming and Rich Internet Applications.