Tuesday, March 20, 2007

LINQ to Python

*NOTE: My concerns about Python's chops as a functional programming language have been largely addressed by some helpful posters in the comments for this article.* April 01, 2007

Recently I argued that Python's whitespace-aware structure made it inhospitable to functional programming. It occurred to me that I should try and demonstrate the problem by partially porting Microsoft's LINQ library to Python. After all, code speaks louder than words :-).

Here is the code:

class Enumerable:      

   def __init__(self, func):
      self.func = func

   def __select__(self, func):
      for item in self.func():
         yield func(item)

   def __where__(self, func):
      for item in self.func():
         if func(item):
            yield item

   def __join__(self, stream):
      for left in self.func():
         for right in stream.func():
            yield (left, right)

   def __take__(self, count):
      iterator = self.func()
      for counter in xrange(count):
         yield iterator.next()

   def __skip__(self, count):
      iterator = self.func()
      for counter in xrange(count):
         iterator.next()
      while 1:
         yield iterator.next()
      
   def take(self, count):
      return Enumerable( lambda: self.__take__( count) )

   def skip(self, count):
      return Enumerable( lambda: self.__skip__( count) )

   def join(self,stream):
      return Enumerable( lambda: self.__join__( stream) )

   def __iter__(self):
      return self.func()

   def select(self, func):
      return Enumerable( lambda: self.__select__( func) )

   def where(self, func):
      return Enumerable( lambda: self.__where__( func) )

   def to_list(self):
      list = []
      for item in self.func():
         list.append(item)
      
      return list


def iterate(initialValue, func):
   while 1:
      yield initialValue
      initialValue = func(initialValue)

def zip(enum1, enum2):
   return Enumerable(lambda: __zip__(enum1,enum2))

def __zip__(enum1, enum2):
   iter1 = enum1.__iter__()
   iter2 = enum2.__iter__()

   while 1:
      yield (iter1.next(), iter2.next())


This library essentially adds lazy evaluation to stream operations in Python. This allows me to write the fibonnaci set in the same way that I would write it in Haskell:


def fibs():
   yield 0
   yield 1
   fibsEnum = Enumerable(fibs)
   for value in zip (fibsEnum, fibsEnum.skip(1)).select(lambda item: item[0] + item[1]):
      yield value


fibNums = Enumerable(fibs)
fibNums = fibNums.take(10)
fibNums.to_list()


The code above prints:


[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]


The example above is a terrible implementation of the fibonnacci set but is a good demonstration of laziness. This library accomplishes the following:

  1. Adds lazy evaluation to stream operations

  2. Creates wrapper object around iterator function, allowing new stream operations to be added using metaclasses which...

  3. Allows stream transformations to be written in the order they are executed with the first operation occuring on the left and the last on the right.



As you can see the limitation that lambda functions can only be one line long necessitates two function definitions for each stream transformation function. All these additional identifiers produce a lot of noise in the code that detracts from its readability. Thankfully that noise can be hidden in the class definition.

The ability to write stream transformations in the order of execution (left-to-right) greatly improves readability in my opinion. However the issue of having to put an entire expression on a single line is still an insoluble problem.

I would like nothing better than to hear that I am totally wrong, there is a much better and more Pythonic approach, that I just don't get it, or some such thing.

Any skilled Python programmers who have a better approach that addresses the readability issues I outline?

Friday, March 16, 2007

Finally an open-source Lisp IDE made for humans

I just tried Cusp, an Eclipse plug-in for Lisp. It's phenomenal :-) It's got intellisense and it's oh so pretty. I have always loved Lisp but could not stand emacs. I also thought that the only worthwhile IDEs were commercial. This solution is just what I was looking for. It looks like it was just released a few months ago! I've been waiting for one of the rocket scientists in the Lisp community to take some time off from pushing the envelope just long enough to put some work into the tool set! This IDE goes a long way towards making it as friendly as the more popular languages. I can't wait to play with it again!

Monday, March 12, 2007

Dreaming of PLINQ

PLINQ is project that aims to parralellize your LINQ queries so that they can make use of the extra processing power available on multi-core machines. I'm very much hoping that it will be released in the Orcas timeframe because it's such a killer app. It will give .NET a HUGE advantage over most other mainstream languages including Python, Javascript, Ruby, and Java. People tend to believe that LINQ queries are only good for data access because they look so much like SQL. In fact a huge number of general programming problems that have nothing to do with database access can be expressed as queries. For example here is a function that multiplies two matrices:



I wrote the To2DimensionalArray method and here is the implementation if you are interested:



If you wrote this algorithm using nested for loops it would be a set of instructions to build a machine that would give you the answer. On the other hand the query is the answer. The importance of this distinction cannot be overstated. Due to the fact that the query describes the intent of the algorithm the computer is free to determine how to best execute it. The PLINQ query processor will analyze the query at run-time, giving it access to information about available system resources and allowing it to formulate a strategy that makes the best possible use of them. It could conceivably be implemented differently every time your query is run depending on how many processors are idle, how much memory there is, etc! This is yet another good reason to express every algorithm possible as a LINQ query. The more processors are added to desktops in the future, the faster your code will run.

The key to this process is the Expression Tree. PLINQ converts your query code into an expression tree in order to analyze it. I predict that languages without the ability to morph code into data will either adapt or be left in the dust in the future. There are simply too many powerful uses for this technique.

Thursday, March 8, 2007

The Joy of Stealing

I'll never forget the first time I saw the fibonnacci set written in Haskell:

fibs :: [Int]
fibs = 0 : 1 : [ a + b (a, b) <- zip fibs (tail fibs)]

"What the heck is 'zip'?" I wondered. "That's the problem with all these academic types, they're always picking these totally uninituitive names like....oooooooohhhhh."

Then it clicked. Zip accepts two streams, takes each corresponding item in the two streams, merges them into a struct, and then emits a stream of structs. If you visualize this process you will find it looks awfully like zipping up a zipper. Hence the name.

This is a very useful function as it turns out. I was dissapointed to find that it is missing from C# 3.0 given the fact that most of the other Haskell stream functions are present and accounted for. "No problem" I said aloud to no one in particular. "I'll just write it myself."

Then I found out why it was left out. Even though C# 3.0 can create anonymous structs it can't return them from a function for the simple reason that it's a strongly-typed language and you can't refer a type's name if it doesn't have one. Luckily Wes Dyer explains that there is a solution to this problem known as "mumbling."

Basically if we pass a generic function a prototypical instance of the anonymous type which we want to return C# will be able to infer the type name transitively. Notice that mumbling only solves the problem of how to write the function definition. It doesn't solve the problem of how to actually create and return an anonymous type from within the function. Sure, I could probably instantiate it given that all anonymous structs have a default constructor. However I can't exactly set arbitrary property values on an instance of "T" now can I?

Expression Trees to the rescue. I examine the anonymous type's structure using reflection and generate a function. This function accepts one item from each stream and returns an instance of the anonymous type. Here's the code:






To help link the PropertyInfo objects with the corresponding lambda parameters when creating the expression tree I stole the EachWithIndex function from Ruby. Notice though that it would have been more elegant to merge the streams with Zip instead of using EachWithIndex. Unfortunately I have to bootstrap the language first. :-)

Here's an example of me using the Zip function to find the square root of 2:





The IterationStream is the C# equivalent of the Haskell iterate function. It simply takes a function, an initial value, and each result in the stream is equal to the result of applying the function to the previous result. Don't you love stealing good ideas from other languages?

Just for fun, here's the preceeding program in Haskell.

nextGuess :: Double -> Double
nextGuess guess = ((2.0 / guess) + guess) / 2.0

guesses = iterate nextGuess 1.0

guessAndPreviousPairs = [ (guess, previousGuess) (guess, previousGuess) <- zip (tail guesses) guesses] guessesWherePairsAreEqual = filter (\ (guess, previousGuess) -> guess == previousGuess ) guessAndPreviousPairs

answer = fst (head guessesWherePairsAreEqual)


Despite the unnecessary prototype object passed to the Zip function in the C# version I still find it to be easier to read than the Haskell version, largely due to C#'s query comprehensions. Beauty is in the eye of the beholder though. :-)


*Note: Ideally Zip would cache the generated function, making subsequent calls as fast as any other stream operator. That said I wanted to keep the example simple for the purposes of demonstration.

Wednesday, March 7, 2007

Symbols in C# 3.0 (reloaded)

Someone pointed out in the comments that the GetPropertySymbol function I wrote in a previous post did not handle nested member expressions. For example this.GetPropertySymbol( o => o.Person.Name ) would only return "Name", not "Person.Name". At first I just told him how to do it. Then I decided to do it because I could use the opportunity to demonstrate some cool C# 3.0-style functional programming.





I wrote the algorithm in functional style to challenge you to start thinking this way. The Iterate method takes a function and an initial value and then returns that initial value and then the result of passing the value to the function, the result of which is then passed back to the function for the next value and so on and so on. I stole this function from Haskell. It works this way...

Iterate( x => x + 10, 0)

...returns the following stream...

0,10,20,30...

The initial member expression is passed to iterate and every time the next member expression is required the function takes the member expression and attempts to convert its inner expression to a member expression using the "as" operator. If this conversion fails the "as" operator returns null. Then the resulting data is piped through TakeWhile which ensures that the stream stops when the first null value is passed to it. Finally I aggregate the stream of member expressions into a string by concatenating the member names, separated by a period.

Pretty neat huh?

Friday, March 2, 2007

Symbols in C# 3.0

One of the languages that gets a lot of buzz nowadays is Ruby. Ruby is a dynamic, object-oriented language with a flexible syntax. The language allows you to change an object's structure at run-time by adding new methods. You can also create entirely new code by building a code string and executing it with the eval function.

i = 30
eval("i = 20")


This function alone, which is shared by Python and Javascript, is enough to enable all sorts of advanced metaprogramming. However Ruby has something that the aforementioned languages don't have: symbols. Symbols are just strings that the interpreter checks to ensure they are valid identifier names. This just means that Ruby will throw an error if you attempt to use a symbol that contains spaces, starts with a digit, or is otherwise invalid as an identifier. This is the only method Ruby provides to manipulate code as data. In the words of jazz singer Peggy Lee: "Is that all there is?" Yup.

Despite the simplicity of this feature symbols are still incrementally more useful than strings when combined with metaprogramming techniques like code generation. Typically a method is created which takes an array of symbols and generates code for each of them using eval. The advantage of using a symbol instead of a string is that your generated code is more likely to be syntactically correct.

In the following Ruby example two functions attr_reader and attr_writer are passed an array of symbols and then create accessor and mutator methods for the name, author, and duration fields in the Song class. This is equivalent to creating three properties in .NET.

class Song
attr_reader :name, :artist, :duration
attr_writer :nmae, :artist, :duration
end


Notice the ":" before each argument. That is how you create a symbol in Ruby. The observant reader will notice that in the second function I've made a typo writing ":nmae" instead of ":name". This was intentional because I wanted to point out an important fact that might not be immediately obvious. Specifically that symbols don't necessarily refer to an identifier that exists at compile-time. In fact, there is no compile-time in Ruby. It is an interpreted language. As a result the code above will not trigger an error. Remember: symbols are just strings.

When working in a statically-typed language like C# I find having to put code in a string very annoying because my IDE is powerless to point out typos like the one above. Also, my IDE cannot provide me with intellisense or rename an identifier globally for me. However certain important interfaces in the .NET framework require me to put code into strings. Consider this Customer class that implements the System.ComponentModel.INotifyPropertyChanged interface, enabling it to be data-bound to WinForms controls:

public class Customer : INotifyPropertyChanged {

private string name;
public string Name {


get { return name; }

set {
name = value;
OnPropertyChanged(this, new PropertyChangedEventArgs("Name")); // notice we have to put our property symbol in a string
}

}
protected void OnPropertyChanged(object source, PropertyChangedEventArgs args)

{

if (this.PropertyChanged != null)
this.PropertyChanged(source, args);

}

public event PropertyChangedEventHandler PropertyChanged;

}



In this case it would be really nice to have a symbol-like construct in C#, preferably one checked for correctness by the compiler. Thankfully in C# 3.0 writing a function that will return the string representation of a symbol is trivial. You can replace the applicable line above with:

OnPropertyChanged(this, new PropertyChangedEventArgs (this.GetPropertySymbol(o => o.Name)));

This is just beautiful. Intellisense even kicks in as we type the expression:



Even if we somehow manage to make a typo with intellisense helping us, the program will not even compile. Also we can now do a global rename of the property using refactoring tools.

How does it work? GetPropertySymbol is an extension method that takes an expression tree as its argument. I've blogged about expression trees extensively here and here so I'll assume you know all about them. You do read my blog religously don't you ;-)

Did you notice that we didn't have to specify any type information and yet intellisense still managed to display the list of Customer members? This works because GetPropertySymbol declares that the first argument to the lambda function it is passed is the same type that the extension method is applied to. As a result C# can figure out the type with type inference!! Without further ado, here is the single line function definition:



All I'm doing is grabbing the member expression inside the lambda expression body and returning the name! This function comes in handy when doing data-binding as well:

this.DataBindings.Add (
new Binding(
this.GetPropertySymbol(o => o.Text),
this,
this.GetPropertySymbol(o => o.GameTitle)
)
);

Incidentally I hope no one gets the impression that I'm putting Ruby down. In fact, I really like Ruby. However the most interesting thing to me about the language is not it's oft-touted metaprogramming capabilities but the way it encourages you to write code in continuation-passing style (CPS). This approach can make writing certain types of client-server applications like web applications much more simple conceptually. I advise you to check the language out. Learning programming languages which approach problems in radically different ways than the ones you are used to will definitely make you a better programmer.

Thursday, March 1, 2007

Runtime macros in C# 3.0

Macros are perhaps the most powerful tool available to programmers for creating new kinds of abstractions. They allow developers to extend their programming language with new constructs and behaviors. Many languages have this feature, the most well-known of which is LISP. A macro is a function which operates on code, transforming it into different code (usually expanding it). Just before compilation the compiler parses the code into an abstract syntax tree (AST) and passes the applicable pieces to the macro function. The developer need only indicate somehow which portions of code they would like to transform with their macro. This is a useful technique if you notice recurring patterns in your code and your language does not provide you with sufficiently powerful abstractions to factor them out.

C# 3.0 does NOT add compile-time macros to the language, but allows you to do the same type of code manipulation at run-time. In this article I'm going to create a macro function that adds a useful new abstraction to C#.

Roughly a year or so ago Microsoft Research released an extension of the C# language called C Omega. It was an experimental language written by Erik Meijer (Mr. Haskell) among others and was the precursor to C# 3.0. Many cool features in C-Omega did not make it into C# 3.0. I understand why most of the omitted features were left out, but there was one feature that I really liked that didn't make the cut.

Ask yourself how many times you have written code like this:


int? friendsSonsAge = null;
if (person != null && person.Friend != null && person.Friend.Son != null)
{
friendsSonsAge = person.Friend.Son.Age;
}

OK, OK. I was never really good at short contrived code examples but the point is that often you want to know the value of a property buried deep in a hierarchy of objects, each of which may or may not be null. In most mainstream languages (Java, C, Javascript, VB.NET, C#, Ruby) you have no recourse but the ugly boilerplate code shown above.

In C-Omega you can do this:


int? friendsSonsAge = person.Friend.Son.Age;

if (friendsSonsAge != null)
// do something


If any of the intermediate objects in the expression on the right of the assignment are null the whole expression evaluates to null. This is called null propagation and it works very nicely together with nullable types. I have no idea why this feature wasn't added to C# 3.0. No matter, we can create a macro that duplicates this behavior ourselves. :-)

The necessary ingredient for a macro is the ability to turn code into data. C# exposes code as data in an Expression tree object. If you assign a lambda function to a variable of type Expression C# will build an AST of the function instead of creating an executable delegate.


Expression<Func<int,bool>> oldEnoughToDrink = (age) => age >= 19; // oldEnoughToDrink can not be executed, instead it points to an AST representing the function

C# cannot translate arbitrary expressions or code blocks into data, only lambda functions so we will need to nest our expression inside one. Behold, my null propagation macro function:

Person person = null;
int? personsFriendsSonsAge = Macros.GetValue<int>(() =>
person.Friend.Son.Age);


In this example personsFriendsSonsAge will be null because the local variable person is null. The above code does not throw a null reference exception because the lambda function is not executed, but converted to an AST, and then passed to the GetValue function. Notice how we've nested the expression inside a lambda function with no arguments:


() =>
person.Friend.Son.Age


The C# compiler knows to convert this lambda expression into an expression tree because the type of the first argument to the GetValue function is an Expression:

public static Nullable<T> GetValue<T>(Expression<Func<T>> f) where T : struct


Here is the body of the macro (click to view full-size):



The GetValue function takes the expression passed to it and wraps it in a series of AndAlso (short-circuit evaluation "&&" operator) expressions comparing each member expression to null. I build the conditional expression from the inside-out by nesting each previous AndAlso in a new one. I skip a check for null if a member expression is referencing a value type which can never be null. Finally I compile the expression tree into a function, execute it, and return the result.

The end result is that a little function is generated that looks like this:


() => (
person != null && person.Friend != null && person.Friend.Son != null) ? person.Friend.Son.Age : null;

Now there are several things to be aware of:

1. It's slow. Generating code at runtime is comparatively expensive even if it's just a little function. That's what nice about compile-time macros: the code generation happens ahead of time. That said, execution speed is on par with an eval in IronPython so it's still quite reasonable. Just don't use it in a tight loop.

2. This version of the function only works for value types. This is due to the fact that it returns the generic Nullable class which cannot be parameterized with a reference type. You can easily add another function which works with reference types by modifying the original slightly. The need to have one function for value types and another for reference types is a necessary headache if we want to preserve type safety.

3. This is an oversimplified version for the purposes of demonstration. It only handles member access expressions like form.Size.Width, but won't work if there is a method call or indexer in the expression.

int? age= Macros.GetValue<int>(() => myCustomer.Friends[0].GetBrother().Age); // will throw an exception because a call to the GetBrother method and an indexer function is made

This should not be perceived as limitation of the language, but rather a limitation on my free time :-). In fact it is completely possible to handle method calls. The trick is to ensure that you don't call the same method again and again when making null comparisons. This can be accomplished by introducing a new lambda function every time you need to store the result of a computation. I'll leave that as an exercise for the reader. :-)

The important thing to take away from this article is that converting code to data (and vice versa) has many more uses than just generating SQL from C# queries. C# has evolved into a very powerful language for metaprogramming and I recommend learning as much as you can about it so that you can leverage these new capabilities.

About Me

My photo
I'm a software developer who started programming at age 16 and never saw any reason to stop. I'm working on the Presentation Platform Controls team at Microsoft. My primary interests are functional programming, and Rich Internet Applications.