unfold

Tuesday, May 6, 2008

Complaining about .NET 3.5

I been doing some pretty serious functional programming with the .NET framework since the LINQ preview was released early last year. After rougly a year of use I feel that I now have the experience to confidently make a list of complaints. This is by no means a rant. The .NET framework is excellent but the following omissions are really needling me. 1. No IRandomAccessCollection<T> interface. We need a way of indicating that a collection is randomly accessible. This means it has an O(1) method to access a given element in an array and an O(1) count property. IList<T> and System.Array should both inherit from it. Often I want to perform a stream operation that requires random access to elements in order to do its job efficiently. Either an Array or a List will do. Copying a stream into an array is potentially expensive. If the stream is already an Array or a List it's worthwhile to attempt a run-time cast because it may save considerable time. As a result I end up doing two casts and writing an adapter. This is essentially what's done in ParallelFX today and I overhead that Joe Duffy was looking for feedback as to whether an interface like this one should be added to the BCL. Yes please. 2. No BigInteger. By this I mean an integer that grows to any size (until it uses up available memory). It was announced that this would be introduced in .NET 3.5 but it was made internal at the last second. Blog posts asking for an explanation were ignored. I believe I understand the reasoning. There is always a way of avoiding using a BigInteger and producing a more efficient algorithm. That said, sometimes I want to trade efficiency for elegance, reliability (no overflows), and speed of development. That's a decision that should be left up to me. Occasionally the lack of a BigInteger is inconvenient enough to make me use IronPython instead of C# for certain programs.* 3. Cost of Concat. Concatenating IEnumerable's together is costly. This is unacceptable because streams are a fundamental C# idiom. Wes Dyer explains more fully in this blog post. This problem has really affected the efficiency and elegance of my functional algorithms. To address this problem the following new C# syntax has been proposed: yield return someValue yield foreach someIEnumerable; I hope this is adopted. This is a big issue. 4. Having to implement IEnumerable.MoveNext() every time I implement IEnumerable<T>. C# already has syntactical support for streams (yield). Would it have been so hard for the compiler to generate this function if it didn't already exist? I know this is probably a non-starter and not that big a deal, but frankly it's an embarrassing amount of cruft to have to write. 5. Having to read/write "IEnumerable<IEnumerable<string>>" is awful. In the same vein as the previous complaint, why not provide syntactical support for declaring streams. C-Omega, the precursor to C#, used the asterisk to indicate a type was a stream in a similar way that [] indicates a type is an array. Streams are so pervasive in C# that I believe they are worthy of the same syntactical support as an array. I know the asterisk is out because C# already uses them for pointer arithmetic, but what about this: T[...][...] GetPermutations<T>(this T[...] stream) { // ... } That's just off the top of my head and there may be a better syntax but you see my point. 6. Lack of a non-null modifier. One of the unfortunate but necessary attributes of C# is that it has the concept of nullness. This is necessary because C# wants to play well with unmanaged code. Nevertheless nullness breaks polymorphism. A great way of mitigating this issue is to provide compiler support to prevent NullReferenceExceptions. C-Omega allowed you to add "!" to the end of a type and the compiler would enforce that a null value couldn't be assigned to it. Customer! cust = null; // compiler error I don't understand why this wasn't added to C#. I assume there's a good reason. Perhaps someone could explain it to me? That's all I can think of at the moment. I'll no doubt update this post as I run across further issues. *I like Python. I do think C#'s query comprehensions and its ubiquitous stream monad make it better suited for functional programming though.

Saturday, May 3, 2008

Multimethods in C# (Part 1)

Have you ever needed to perform an action based on the run-time type of one or more objects? Let's take a hypothetical spaceship game for example. There are three types of objects: an Asteroid, an X-Wing and a TIE-Fighter. They are related in the following inheritance hierarchy:

We have a collection of SpaceObjects that we update in a tight loop. We need to handle collisions between these objects. For example if a TIE-Fighter collides with an asteroid we might want to apply damage to the space ship and replace the asteroid with several smaller asteroids. One way of doing this is to determine the concrete type of each SpaceObject and then call one of several overloads to handle the collision:


bool Collison(SpaceObject leftSpaceObject, SpaceObject rightSpaceObject)
{
TieFighter tieFighter = leftSpaceObject as TieFighter;
Asteroid asteroid = rightSpaceObject as Asteroid;
if (tieFighter != null && asteroid != null)
{
  return Collision(tieFighter, asteroid);
}

XWing xWing = leftSpaceObject as XWing;
if xWing != null && asteroid != null)
{
  return Collision(xWing, asteroid);
}

//try again with reversing left and right parameters

// and on and on for every combination of every object in every order...
}

bool Collision(TieFighter tieFighter, Asteroid asteroid)
{
// handle collision
}

bool Collision(XWing xwing, Asteroid asteroid)
{
// handle collision
}

The method that dispatches to the correct overload smells. It is error-prone because you have to be careful to cast in order from the most derived to least derived classes. Failure to do so will not result in the most correct method being selected. It is also repetitive because all of the information required to write this code can be inferred from information you've already declared in your overloads and your class definitions. It's also a maintenance nightmare because if we make any changes to the class hierarchy we will need to remember to update the method.

Multimethods

Essentially we want virtual method dispatch but we need it on multiple types and we need it bound at run-time because the compiler can't know enough information to do it at compile-time. This is exactly what multimethods allow us to do. In a compiler that supports multimethods you simply declare the various overloads and the compiler ensures the correct overload is called. Here's what this might look like if it were added to C#:


// when invoked this checks argument types and dynamically dispatches to overloads below
multimethod bool Collision(SpaceObject obj1, SpaceObject obj2);

bool Collision(TieFighter tieFighter, Asteroid asteroid)
{
// handle collision
}

bool Collision(XWing xwing, Asteroid asteroid)
{
// handle collision
}

bool Collision(TieFighter tieFighter1, TieFighter tieFighter2)
{
//handle collision
}

// and so on for all cominbations...

Unfortunately there is no "multimethod" modifier in C# and it's quite unlikely there ever will be. Thankfully C# 3.0 does have a feature which allows us to seamlessly add this feature: Expressions.

Adding Multimethods to C#

Let's create a library that examines our class hierarchy and a group of method overloads using reflection and then creates the method that does the dynamic dispatch for us. We'll design it so it has the following API:


var collision = DynamicDispatch.CreateFunc<SpaceObject,SpaceObject,bool>((obj1, obj2) => this.Collision(obj1,obj2));

// dispatches to Collision(XWing xWing, TieFighter tieFigher)
collision(new XWing(), new TieFighter());

What's going on here? The CreateFunc method accepts a lambda in which the most abstract version of the collision method is invoked. It returns an instance of Func<>, a new delegate type introduced in C# 3.0 that wraps a method that returns no arguments. How do I manage to generate the code required to do dynamic dispatch from a lambda function? I'll cover that in the next installment. Stay tuned. :-)

Tuesday, April 22, 2008

Using Operators with Generics

Many .NET users have no doubt run into issues using math functions with generics. Let's say you want to write a Matrix class. You would like to make it a generic class so that you can use it with doubles, ints, and so on. The definition might look like this:

Attempt to compile this will fail. The compiler chokes on the following line:

output[x,y] = (value1.numbers[x,y] + value2.numbers[x,y]);

The compiler can't confirm that type T will have the + operator. Normally the way you resolve this is by using a constraint. However there is only two kinds of constraints: type-based and constructor-based. The operators aren't part of an interface and I'm not sure they should be. A better solution would have been for C# to allow operator-based constraints. After all, an exception is made for constructors. They may have decided against this route for CLS compliance. We'll probably never know.

All this conjecture doesn't get us any closer to our Matrix class. There are two ways of getting what we want. The first is reflection. In fact, if you are using late binding VB.NET will compile a modified version of the code above happily and resolve the operator using reflection at run-time. Reflection is slow though and typically we want math operations to run quickly.

That leaves us with code generation. I've been talking alot about Lambda Expressions in recent posts. Lambda Expressions are a new .NET 3.5 feature that makes it easy to generate methods on the fly quickly. The code below is a refinement of the work done by Rüdiger Klaehn. He uses low-level IL generation API's available in .NET 2.0. Observe how his code can be simplified dramatically by using lambda expressions instead.

This class creates and compiles expressions for each of the operators (only addition shown above for brevity). The lambda expression worries about exactly which add method to bind to based on the parameter types when it is compiled into a delegate. Pretty readable eh? Now let's rewrite our Matrix class.

Notice that the consumer of our API doesn't need to futz about with the Num class at all. They use the Matrix exactly as they would expect to:

var leftMatrix = new Matrix<int>(new[,]{ {2,2,1},{5,2,1} });
var rightMatrix = new Matrix<int>(new[,]{ {1,2,4},{1,9,1} });

var newMatrix = leftMatrix + rightMatrix;

So what's the catch? You lose static typing. If you parameterize the Matrix class with a type that doesn't have the operators defined it will trigger a run-time error. There is no way around this because there is no way of confirming the operators exist at compile-time. Does this make you uncomfortable? Get used to it.

Many statically and dynamically typed languages are gradually moving towards a new model: Static Typing Where Possible, Dynamic Typing When Needed. VB.NET is already there with its optional late binding. A similar feature is being discussed for C#. If you do a lot of work with run-time code generation you will begin to notice two things:

1. You will have to compromise on compile-time type safety more and more.
2. You will find yourself caring less and less.

Static typing is a tool, not a religion. Frankly it was never very useful for assuring program correctness and if you are test-driven it is even less useful. Static typing is most useful as metadata for your development tools. It often makes sense to live without it in specific cases where doing so prevents you from repeating yourself. Using code generation to avoid writing identical Matrix implementations for each numeric base type is an excellent example. Can you think of any others?

Sunday, January 6, 2008

Misunderestimating C# 3.0

Dare Obasanjo, who's posts on C-Omega I've enjoyed very much in the past, has written a new article called Does C# 3.0 Beat Dynamic Languages at their Own Game? He does a good job of pointing out the way that C# 3.0 adds features commonly associated with Dynamic Languages (tuples, adding methods to existing classes) without abandoning static typing.

Unfortunately he misses the mark with his criticism of type inference. Take a look at his code sample:

Look at the portion highlighted in red. Dare laments that he cannot write new {Weight = voteFunc(item), Item=item, FeedTitle = feedTitle } and must instead create a Vote class. The reason for this is that he is adding votes to a List and to instantiate this list he must parameterize the generic List with the Vote type.

Now let's rewrite that code snippet using the idioms of C# 3.0:

I wrote the ToProjectedDictionaryOfLists function. It is an extension method on IEnumerable<T> and it accepts two lambda functions. One that accepts an item in the stream and returns a key for the dictionary and another that accepts an item in the stream and converts it into a different representation ("projects" it). Using these functions, a key and value is derived for each item in the stream. This information is used to populate the dictionary with lists of items that share the same key value.

It's important to notice four things about our replacement code:

1. We've specified no type information (type inference in full effect)
2. We've liberated the code that moves the data into a hash table from the code that creates the code.
3. We've abstracted the action of creating a dictionary of lists from a stream of data into a reusable function.
4. We've expressed the problem in such a way that it can be trivially parallelized.

In previous blog posts I mused that Linq's superficial similarity to SQL might keep developers from realizing how widely applicable it is. Dare mentions Linq in the article but fails to recognize that he could use it to improve his example code in every measure.

This is really about functional programming vs. imperative programming. Because of its inherent advantages developers should always try and solve a problem functionally first and fall back to imperative solutions only where the result is more readable. In C# a functional solution was rarely more readable because the language lacked the idioms to support it. C# 3.0 makes huge strides in this area. This should have a profound effect on the way you code. If your C# 3.0 code looks like your C# 2.0 with "var" statements and the occasional Linq query against a database you've got to some remedial reading to do. :-) Trust me, you'll be glad you did.

Here is the code for the ToProjectedDictionaryOfLists function:

Tuesday, December 4, 2007

ParallelFX CTP released

The ParallelFX (PLINQ) CTP was also released yesterday. Not so coincidentally I also purchased a quad core yesterday. Microsoft has created a new site dedicated to parallelism. I will be playing with ParallelFX a lot in the weeks to come and also with Haskell. For a neat example of how to do parallelism in Haskell see this link.

Tuesday, September 18, 2007

Finally some news on PLINQ

It's here (almost)! PLINQ is a technology that allows developers to parallelize their LINQ queries. Check it out at MSDN. Competing platforms are so far behind .NET in this area it's laughable. They're still debating whether to add first-class closures to Java and Ruby still doesn't support native threads. By coaxing developers into writing functional code with LINQ Microsoft has done something truly brilliant. New programs written in .NET 3.5 will be ready to be parallelized just as Moore's law squeezes processing power in its icy grip. Imagine adding a single query operator to your query and seeing speed increases of 30% and more! Also see PTasks, a library for parallelizing procedural code. I can't wait for .NET 3.5!!

Thursday, September 6, 2007

The Luddite Developer

Throughout the short history of commercial software development we see the same dynamic repeat itself again and again: 1. New abstractions are introduced which increase developer productivity. 2. Developers resist these abstractions because they a) legitimately need to maintain low-level control to satisfy requirements OR b) they don't want to learn the new abstractions and tools required to use them 3. The developers that resist for reason b either suffer or eventually capitulate because of competitive pressure. When C was created many assembly developers resisted using it even though the benefits were obvious. Many C++ developers still resist writing client applications in Java and C#. Many Java developers still resist using slower, more powerful languages for writing web applications where performance is far less important. In some cases these decisions are justifiable (see reason a) but they are edge cases.

When a developer chooses, all other things being equal, to use a low-level abstraction instead of a high one it’s worth asking them why. If they attempt to justify their decision on the grounds of “simplicity” chances are that they are motivated by reason b). High-level abstractions are simpler than low-level ones by definition. They reduce code size and increase code clarity. Most of the time what the developer is really saying is “I didn’t want to learn anything new.” So why do some software developers resist learning new things? It’s not laziness. A developer who manually writes the same code with slight variations a hundred times and spend countless hours maintaining it can hardly be accused of being lazy. The real reason is that they are afraid of new technology.

So how can a software developer possibly be afraid of technology? Two words: cognitive dissonance. These individuals spend their lives creating technology to improve the productivity of accountants, managers, and housewives but insist on coding GUI’s by hand. They use REST instead of web services because they don’t want to learn how to generate a proxy. They’ll write reams and reams of repetitive code instead of learning a new domain-specific language and every argument they’ll use to justify it can also be used to argue that they should get rid of their compiler and write bytecode by hand.

It’s easy to understand why software developers can be afraid of new technologies. Software development is extremely challenging for many reasons. Once having successfully developed software using one set of abstractions a developer would be wise to consider carefully the decision to learn new ones. This is why it’s so important to familiarize yourself with a variety of different platforms, technologies, and development approaches before you are put on a deadline. Once you understand the strengths and weaknesses of a variety of development paradigms, platforms, libraries, and tools you can stop wasting time fearing the unknown and start fearing the known.

Do you know any luddite software developers?