Tuesday, December 4, 2007

ParallelFX CTP released

The ParallelFX (PLINQ) CTP was also released yesterday. Not so coincidentally, I purchased a quad-core machine yesterday. Microsoft has created a new site dedicated to parallelism. I will be playing with ParallelFX a lot in the weeks to come, and with Haskell as well. For a neat example of how to do parallelism in Haskell see this link.

Tuesday, September 18, 2007

Finally some news on PLINQ

It's here (almost)! PLINQ is a technology that allows developers to parallelize their LINQ queries. Check it out at MSDN. Competing platforms are so far behind .NET in this area it's laughable: they're still debating whether to add first-class closures to Java, and Ruby still doesn't support native threads. By coaxing developers into writing functional code with LINQ, Microsoft has done something truly brilliant. New programs written in .NET 3.5 will be ready to be parallelized just as Moore's law shifts its gains from clock speed to core count. Imagine adding a single query operator to your query and seeing speed increases of 30% and more! Also see PTasks, a library for parallelizing procedural code. I can't wait for .NET 3.5!!
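The pitch, in miniature, is that the query itself stays the same and only the execution strategy changes. Here's a rough sketch of that idea in Python (obviously not PLINQ itself; the names and pool choice are mine):

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch only: the "query" (a mapping over data) is unchanged;
# swapping in a pool changes how it executes, not what it means.
data = list(range(10))

sequential = [x * x for x in data]

with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(lambda x: x * x, data))

print(parallel == sequential)  # True
```

The point of the sketch: because the mapping is declared rather than spelled out as a loop, the runtime is free to pick the execution strategy.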

Thursday, September 6, 2007

The Luddite Developer

Throughout the short history of commercial software development we see the same dynamic repeat itself again and again:

1. New abstractions are introduced which increase developer productivity.
2. Developers resist these abstractions, either because a) they legitimately need to maintain low-level control to satisfy requirements, or b) they don't want to learn the new abstractions and the tools required to use them.
3. The developers who resist for reason b) either suffer or eventually capitulate under competitive pressure.

When C was created, many assembly developers resisted using it even though the benefits were obvious. Many C++ developers still resist writing client applications in Java and C#. Many Java developers still resist using slower, more powerful languages for writing web applications where performance is far less important. In some cases these decisions are justifiable (see reason a), but they are edge cases.

When a developer chooses, all other things being equal, to use a low-level abstraction instead of a high-level one, it’s worth asking them why. If they attempt to justify their decision on the grounds of “simplicity,” chances are they are motivated by reason b). High-level abstractions are simpler than low-level ones by definition: they reduce code size and increase code clarity. Most of the time what the developer is really saying is “I didn’t want to learn anything new.” So why do some software developers resist learning new things? It’s not laziness. A developer who manually writes the same code with slight variations a hundred times, and spends countless hours maintaining it, can hardly be accused of being lazy. The real reason is that they are afraid of new technology.

So how can a software developer possibly be afraid of technology? Two words: cognitive dissonance. These individuals spend their lives creating technology to improve the productivity of accountants, managers, and housewives, yet insist on coding GUIs by hand. They use REST instead of web services because they don’t want to learn how to generate a proxy. They’ll write reams and reams of repetitive code instead of learning a new domain-specific language, and every argument they’ll use to justify it could equally be used to argue that they should get rid of their compiler and write bytecode by hand.

It’s easy to understand why software developers can be afraid of new technologies; software development is extremely challenging for many reasons. Having successfully developed software using one set of abstractions, a developer would be wise to consider carefully the decision to learn new ones. This is why it’s so important to familiarize yourself with a variety of platforms, technologies, and development approaches before you are put on a deadline. Once you understand the strengths and weaknesses of a variety of development paradigms, platforms, libraries, and tools, you can stop wasting time fearing the unknown and start fearing the known.

Do you know any luddite software developers?

Thursday, June 21, 2007

The last configuration handler I'll ever need...really

Scott Weinstein has a fantastic approach to deserializing configuration sections. I would like to propose an improvement though: make the class generic. Here's a simplified version:

Public Class XmlSerializerSectionHandler(Of T)
    Implements IConfigurationSectionHandler

    Public Function Create(ByVal parent As Object, ByVal configContext As Object, ByVal section As System.Xml.XmlNode) As Object Implements IConfigurationSectionHandler.Create
        ' Deserialize the section element directly into an instance of T
        Dim ser As New XmlSerializer(GetType(T))
        Return ser.Deserialize(New XmlNodeReader(section))
    End Function

End Class

This eliminates the need to embed the type attribute in the data tag and moves it into the type declaration in the section tag. This...

<namessection type="TheMechanicalBride.Web.NamesSection, TheMechanicalBride.Web">
<name>Jafar Husain</name>
<name>Johnny</name>
</namessection>

Becomes...

<namessection>
<name>Jafar Husain</name>
<name>Johnny</name>
</namessection>

snip

<section name="namessection" type="TheMechanicalBride.Web.XmlSerializerSectionHandler`1[[TheMechanicalBride.Web.NamesSection, TheMechanicalBride.Web]], TheMechanicalBride.Web"/>


This is a minor modification but is definitely an improvement as the tag that contains the data no longer needs any awareness of the method by which it is serialized and deserialized.
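For readers outside .NET, the same design idea can be sketched in Python (all names here are mine, for illustration): the element holding the data stays type-free, and the registration step supplies the concrete type.

```python
import xml.etree.ElementTree as ET

class NamesSection:
    def __init__(self, names):
        self.names = names

# The registration carries the concrete type, much like the generic
# parameter in the section declaration; the data element stays plain.
def make_handler(section_type):
    def create(section_xml):
        root = ET.fromstring(section_xml)
        return section_type([child.text for child in root.findall("name")])
    return create

handler = make_handler(NamesSection)
section = handler("<namessection><name>Jafar Husain</name></namessection>")
print(section.names)  # ['Jafar Husain']
```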

Slaying CSS bugs with AJAX for ASP.NET

Recently I've been working on an ASP.NET project and am writing a whole host of custom controls. I really can't emphasize enough the power of custom controls. They are code which writes code, allowing you to build DSL's for expressing user-interface elements.

When building custom controls it's nice to make the elements in the control scale with the size of the control itself. This probably means that you will be using tables for any sufficiently complex control. Unfortunately, in IE 5 and 6, if you nest a textbox inside a table cell and set its width to 100%, the end of the textbox gets cut off like so:



Attempts to correct this by specifying a value for margin-right are rudely ignored by IE. You can test this yourself by trying out the following code:


<span id="ctl00_Main_PercentField1" style="display:inline-block;width:50px;">
  <table border="0" style="width:100%;">
    <tr>
      <td><div><input name="ctl00$Main$PercentField1$TextBox1" type="text" value="23" id="txt" style="width:100%"></div></td>
    </tr>
  </table>
</span>

After confirming that this problem was a known bug I did the first thing most developers do: I bitched and moaned. There is not much web developers can do about browser CSS bugs. Or can they?

I've been doing heavy DHTML programming since the days of IE4 and am aware of a little-known property called clientWidth. This property gives you access to the actual width of a UI element in pixels. All I needed to do, I figured, was get the textbox using the DOM methods and set its CSS width to clientWidth minus some absolute value. Sure enough, it worked like a charm. Ten pixels is the magic number, for some reason.



In addition to custom controls I've also been writing control extenders using AJAX for ASP.NET. I've complained about AJAX for ASP.NET in the past but I must admit it's a very elegant way of encapsulating javascript behaviors. It was child's play to create a custom extender that I could point to a textbox that was positioned inside a table. Here is the code:

FixTextBoxWidthBehavior.js
----------------
Type.registerNamespace('TheMechanicalBride');

TheMechanicalBride.FixTextBoxWidthBehavior = function(element)
{
    TheMechanicalBride.FixTextBoxWidthBehavior.initializeBase(this, [element]);
}

TheMechanicalBride.FixTextBoxWidthBehavior.prototype = {

    initialize : function()
    {
        TheMechanicalBride.FixTextBoxWidthBehavior.callBaseMethod(this, 'initialize');

        // Shrink the textbox to its rendered width minus the magic ten pixels
        var e = this.get_element();
        e.style.width = "" + (e.clientWidth - 10) + "px";
    },

    dispose : function()
    {
        TheMechanicalBride.FixTextBoxWidthBehavior.callBaseMethod(this, 'dispose');
    }
}

TheMechanicalBride.FixTextBoxWidthBehavior.registerClass('TheMechanicalBride.FixTextBoxWidthBehavior', AjaxControlToolkit.BehaviorBase);



------------------

FixTextBoxWidthExtender.vb
---------------------




Imports System
Imports System.Web.UI.WebControls
Imports System.Web.UI
Imports System.ComponentModel
Imports AjaxControlToolkit


<Assembly: WebResource("TheMechanicalBride.FixTextBoxWidthBehavior.js", "text/javascript")>

Namespace UI.WebControls

<ClientScriptResource("TheMechanicalBride.FixTextBoxWidthBehavior", "TheMechanicalBride.FixTextBoxWidthBehavior.js")> _
<TargetControlType(GetType(TextBox))> _
<RequiredScript(GetType(CommonToolkitScripts), 0)> _
Public Class FixTextBoxWidthExtender
Inherits ExtenderControlBase

Public Sub New()

End Sub

End Class

End Namespace



And that's it. The lesson: in the age of modern browsers you don't always have to accept CSS bugs. Get hacking.

Thursday, April 19, 2007

LINQ to Validation

I spent the weekend looking at the Enterprise Library 3.0 validation block. It's fantastic and long overdue. Developing this library was a no-brainer; I wrote something similar (though much less ambitious) as soon as I learned to use attributes in C# 1.0. Although it is a great framework as is, I can't help but imagine ways in which LINQ could make it even better.

Validation has annoyed me more than any other problem, with the possible exception of the object-relational mismatch. Validation logic should be applied at every tier: client, server, and database. However, writing and maintaining the same validation rules in Javascript, a server-side language, and then again in SQL is so labour-intensive and thankless that I suspect few people actually do it.

Several attempts have been made to solve this problem in the past with limited success. Some code generators allow you to express business rules in a domain-specific language which is then translated into various target languages. There have also been attribute-based solutions that use XSL as the validation language. The problem with all these solutions is that they force the developer to use yet another language. This is necessary because the language has to be expressible as data so that it can be analysed and converted into other languages. Unfortunately this means that the validation rules have to be stored in config files or strings, putting them out of reach of the type checking and refactoring tools available in most IDEs. Does this problem sound familiar? It should. It's the same problem we have with data access code today.

Now that LINQ is coming we will have two languages that are powerful enough to express any kind of validation rule, will have full IDE support, and can also be converted into data and thus into other languages: VB.NET and C#.

LINQ is the gift that keeps on giving. As I've written in this blog many times before, the applications for turning code into data extend far beyond improving database integration. Let's examine the ASP.NET PropertyProxyValidator control that comes with the validation block in EL3. You associate it with a property on an object; when the user attempts an action it does a server postback, calls the EL3 Validator object, and displays an error if one is encountered. This keeps you from having to create a custom validator control that performs the same validation you are already doing in your business object. However, most validation (regular expression validation, string length validation) can be done in client-side code; the postback is unnecessary. Unfortunately, performing the validation on the client side means writing Javascript as well as server-side code (after all, the user's browser may have Javascript turned off). Now imagine that, post-LINQ, a new validation attribute called ExpressionValidatorAttribute is added to the EL3 validation block:

public class ZipCodeValidatorAttribute : ExpressionValidatorAttribute<string>
{
    public override Expression<Func<string, bool>> GetExpression()
    {
        // Anchored so the entire string must be five digits
        return x => new Regex(@"^[0-9][0-9][0-9][0-9][0-9]$").IsMatch(x);
    }
}

Notice that we are returning the validation function as an Expression, which means that instead of returning a delegate it returns a data representation of the function. An ASP.NET validator control could retrieve the expression using reflection and convert it to the following Javascript function without breaking a sweat:

function (x){
    return /^[0-9][0-9][0-9][0-9][0-9]$/.exec(x) != null;
}

It could then run this function on the client-side, avoiding a postback. On the server-side the validation object would just compile the expression and run it (which could be done once and cached). This would allow us to use the same code for validation on the client and server-side!
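This single-source idea can be sketched outside .NET too. Here's a rough Python illustration (the class and method names are mine, not part of EL3): the rule lives in one place as data, runs on the server, and is rendered to JavaScript for the client.

```python
import re

# Hypothetical sketch: one validation rule, stored as data (a pattern),
# used for the server-side check and translated into client-side JavaScript.
class RegexRule:
    def __init__(self, pattern):
        self.pattern = pattern

    def validate(self, value):
        # Server side: compile and run the rule directly
        return re.fullmatch(self.pattern, value) is not None

    def to_javascript(self):
        # Client side: emit an equivalent JavaScript predicate
        return "function (x) { return /^%s$/.test(x); }" % self.pattern

zip_rule = RegexRule("[0-9]{5}")
print(zip_rule.validate("90210"))  # True
print(zip_rule.to_javascript())
```

The expression-tree version is strictly more powerful (arbitrary lambdas, not just regexes), but the single-source-of-truth benefit is the same.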

Javascript is actually a very powerful functional language. You can even translate LINQ queries to it without much effort. In fact the only validations that could not be converted to Javascript would be those that require access to some server-side resource. In practice this is rare.

Almost all validation logic is functional because it doesn't modify the state of the objects it inspects. It takes some data as an input and returns a single boolean. Therefore most validation can be expressed using lambda functions and can therefore be converted to lambda expressions. Voila! We've solved the problem of duplicated validations without having to rewrite code in different languages, learn a new language, or leave the comfort of Visual Studio.

Wednesday, April 11, 2007

Haskell for C# 3 Programmers

I've been occupying myself as best I can, waiting impatiently for .NET 3.5 to come out. In the meantime I've been playing with Lisp and Haskell, two of the languages with the most cachet among the smart folks who hang out at Lambda the Ultimate. Well it’s happened: I've fallen head over heels for Haskell. It's just so beautiful. It is a pure functional language and is therefore limited in the kinds of operations that can be performed. As a result a little bit of well-thought-out syntax to support these operations goes a long way towards improving readability.

It occurs to me that it is worthwhile to introduce C# users to Haskell because many of the improvements coming in C# 3.0 (and many of those introduced in C# 2.0) are actually inspired by Haskell. Learning Haskell is a great way of training yourself to think functionally so you are ready to take full advantage of C# 3.0 when it comes out. There are a variety of Haskell tutorials available on the internet so I will just focus on some similarities between C# and Haskell with the aim of deepening your understanding of both.

Type Inference

Haskell and C# 3.0 both have type inference. In C# 3.0 you write this:
var number = 3;

In Haskell you write this:

number = 3

Haskell’s type inference is quite a bit smarter than C#'s, but I’ll give you examples of that later.

Lazy evaluation and Streams

In C# I can create a stream of all natural numbers like this:

IEnumerable<int> NaturalNumbers()
{
    var i = 0;
    while (true)
    {
        yield return i;
        i++;
    }
}

The reason this loop does not run forever is that execution pauses as soon as a yield return is reached and resumes where it left off when the next value is requested. In Haskell you can duplicate this behavior by using the colon operator to construct streams.

nat :: [Integer]
nat = 0 : map (\x -> x + 1) nat

This definition says that the first item in the stream is 0, and the rest of the stream is obtained by applying a lambda to each item of the stream itself (the definition chases its own tail). The map function is equivalent to select in C# 3.0. The first item will be 0, the next will be 1 because the previous was 0, the next will be 2 because the previous was 1, and so on. The expression on the left side of the colon is an item and the right side is a list. You can build lists like this in Haskell:

0 : 1 : 2 : 3 : 4 : [] -- [] is an empty list in Haskell

This syntax is useful because it signals that the algorithm can stop as soon as it gets a value that it needs, just like yield. It is also very useful to express lists using this colon separated syntax when you are creating recursive list processing functions but I’ll explain that later. For now just make sure you can recognize this syntax as the construction of a list.
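For comparison, the same lazy stream can be written with a Python generator; as with yield in C#, nothing runs until a value is requested:

```python
from itertools import islice

# A lazy stream of the natural numbers: the infinite loop is harmless
# because execution pauses at each yield, like the C# iterator above.
def nat():
    i = 0
    while True:
        yield i
        i += 1

print(list(islice(nat(), 5)))  # [0, 1, 2, 3, 4]
```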

Functions, Functions, Functions

It probably won’t come as a shock to you that functional programming relies heavily on functions. Consequently Haskell provides all sorts of useful syntax for manipulating them. The thing that caught me off guard when I first tried to learn Haskell was the function signatures. Look at this example of a function that adds two integers:
add :: Integer -> Integer -> Integer
add x y = x + y

Does this seem odd to you? You might have expected to see something like this:
add :: (Integer, Integer) -> Integer
add (x,y) = x + y


In fact the definition above will work, but it's not really the preferred way of defining functions in Haskell. If you read the first function signature left to right it would say: “add is a function that accepts an integer, which returns a function that accepts an integer, which returns an integer.” One of Haskell's elegant features is that you can take a function that accepts n arguments and convert it into a function that accepts n-1 arguments just by specifying one argument. This is known as partial application. For example given the first add definition I can do this:
addone = add 1 -- returns function that accepts 1 argument

addone 5 -- this returns 6

In Haskell every function accepts exactly one argument and returns one value. The value returned will be another function with one less parameter, until all the arguments have been supplied, at which point the final value is returned. As I’ll explain later this behavior is very useful when writing certain types of recursive functions.
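As a point of comparison, Python can approximate partial application with functools.partial, though Haskell gives it to you for free:

```python
from functools import partial

def add(x, y):
    return x + y

# Supplying one argument yields a new function awaiting the rest,
# approximating Haskell's automatic currying.
addone = partial(add, 1)
print(addone(5))  # 6
```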

Recursion

Pure functional languages have no loops. Instead they rely on recursion to repeat operations. Recursive functions have a terminating case and an inductive case. As a result you often see code like this (C#):

public int Add(int x, int y)
{
    if (x == 0)                   // terminating case
        return y;
    else
        return Add(x - 1, y + 1); // inductive case
}

Haskell allows you to specify the terminating case in a different definition than the inductive case. This is called pattern matching and results in a much more declarative style of coding:


{- function signature (takes two Nums, returns a Num) -}

add :: Num a => a -> a -> a
add 0 y = y                   -- terminating case
add x y = add (x - 1) (y + 1) -- inductive case

Take a look at the add function’s signature. add is a generic function. Pretend that "a" is "T" and “Num a =>” is "where T : Num" and it will be a lot clearer. In fact our add function will work on all numeric data types that implement the Num interface (Integer, Double, etc). This doesn’t work in C# because you can’t include operator definitions in an interface. In Haskell, operators are just like any other function and are not bound to a particular class. They just use a symbol instead of a proper identifier. This allows you to include them in interface definitions and thus use them in generic functions.

Now some of you might be asking “Isn’t all this recursion awfully expensive?” No, because Haskell and many other functional languages are equipped to recognize a particular kind of recursion known as tail recursion. When Haskell recognizes this type of recursion it translates it into a simple loop so that the stack doesn’t grow. For instance our C# version of Add above could be translated to the following by a smart compiler:

public int Add(int x, int y)
{
    do
    {
        if (x == 0) // terminating case
            return y;
        else
        {
            x = x - 1;
            y = y + 1;
        }
    } while (true);
}

So how do you tell whether a function is tail recursive? It’s easy: if the recursive call is the very last operation the function performs, the function is tail recursive. The add function defined above is tail recursive because the call to add is the last thing it does. Some functions are not tail recursive, which means they do grow the stack. Take the following flatten function, which takes a list of lists and flattens it into a single list:


flatten :: [[a]] -> [a]
flatten [] = []                  -- if the list is empty, return an empty list
flatten (y:ys) = y ++ flatten ys



Notice that the last operation to execute is the list concatenation (++) function. This means that the concatenation function must wait for a result from the recursive call to flatten before it can finish its computation. As a result flatten will consume both time and stack space because it will have to keep track of where we are in each previous recursive call.

The good news is that we can convert many recursive functions into tail recursive ones using a straightforward technique: introduce a helper function with an extra parameter, an accumulator, that stores the computation done so far.

flattenAcc :: [a] -> [[a]] -> [a]
flattenAcc acc [] = acc
flattenAcc acc (y:ys) = flattenAcc (acc ++ y) ys

flatten = flattenAcc [] -- partial application

I want you to notice that we are using the colon list syntax in the inductive case of flattenAcc. This is a great example of pattern matching. We are effectively saying that y represents the first item of the list passed as the second parameter and ys represents the rest of that list. Haskell lets you use the same syntax that constructs lists to deconstruct them in patterns, resulting in very readable code.

In our revised version the last call made in flattenAcc is to flattenAcc itself. This makes our function tail recursive. To accomplish this, flattenAcc effectively stores the flattened list built so far in the newly introduced "acc" parameter. Remember that when a tail recursive function gets translated into a loop, the function parameters act like variables. Note that we don’t need to specify a type signature for flatten because Haskell infers it.
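The same transformation can be traced in Python, which does not optimize tail calls, so the loop a tail-call-optimizing compiler would produce is written out by hand:

```python
# Recursive flatten: the concatenation (+) has to wait on the recursive
# call, so each pending call holds stack space, like ++ in the Haskell version.
def flatten(lists):
    if not lists:
        return []
    return lists[0] + flatten(lists[1:])

# Accumulator version written directly as the loop a tail-call-optimizing
# compiler would generate: the parameters become mutable variables.
def flatten_acc(lists):
    acc = []
    for ys in lists:
        acc = acc + ys
    return acc

print(flatten([[1, 2], [3], [4, 5]]))      # [1, 2, 3, 4, 5]
print(flatten_acc([[1, 2], [3], [4, 5]]))  # [1, 2, 3, 4, 5]
```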

*Note: Derek Elkins and Cale Gibbard point out in the comments that tail recursive functions are not always preferable in Haskell because of its lazy evaluation. If you can write a non-tail-recursive function that produces its result with the colon syntax, it is probably better than a tail recursive one that doesn't, because lazy evaluation enables you to use it on an infinite stream without getting a stack overflow.
Now you’re familiar with the basic concepts of functional programming. Next time we’ll look at features that inspired C# query comprehensions and nullable types.

Thursday, April 5, 2007

Lowered Expectations: AJAX for ASP.NET

I'm underwhelmed by AJAX for ASP.NET. It is not a very powerful set of tools for application developers. Instead it seems to be most useful as a way of standardizing how ajax controls are written. Don't get me wrong: I'm all for standardizing the way common operations are performed. That's a big part of what has made .NET such a productive platform. However, developers who believe that downloading AJAX for ASP.NET will make writing Web 2.0 applications as easy as ASP.NET made writing Web 1.0 applications will be disappointed.

I have little doubt that the most commonly-used control will be the update panel, because it is the least disruptive way of adding AJAX to ASP.NET. You just surround your controls with it and voila, magically the page doesn't refresh any more. Unfortunately it's little more than a hack. It's not really Web 2.0 in the sense that it doesn't expose a service that is consumable by a variety of clients. It still sends presentation information mingled with data to the client. It basically just sends a smaller portion of the page and its view state to the client which then rewrites the page using the DOM.

I'm going to say something controversial now: page refreshes are not that big a deal. Before you accuse me of heresy let me remind you that ASP.NET has had the option to maintain scroll position after a page refresh since 2.0. After a (hopefully) brief flash the user sees exactly what they would have seen if the update panel had updated the page asynchronously. Often preserving scroll position doesn't even do very much for you, because the fluid nature of web page layouts can move the user's reference point up or down vertically while preserving the exact pixel position of the scroll bar.

The real problem with AJAX for ASP.NET is that it's very difficult to move some operations to the client side while continuing to perform others on the server side. Why? ViewState and Session state. ASP.NET's state maintenance information is not accessible on the client side. This limits the amount of work you can do on the client lest it get out of sync with the server. To make Web 2.0 applications really easy to write you need to raise the level of abstraction. I can see a few different ways of doing this:

1. Provide powerful controls that you can manipulate on the client-side. Where is the AJAX-enabled GridView control? You know, the one that transparently pushes sorting and paging to the clients that can perform them. I suppose I will have to wait for a third-party to provide one but I shouldn't really have to.

2. Make javascript better. Javascript is growing up but you wouldn't know it if you are targeting IE. Firefox 1.5 and Actionscript 3.0 can mingle XML literals and code like this...

var name = "Jafar"
var customer = <Customer>{ name } </Customer>

This is a big deal. Code that dynamically generates HTML/XML through DOM calls is difficult to read and write. Microsoft is well aware of this problem. In fact, they designed a research language called C-Omega to address it, which integrated XML literals and stream processing into the C# language. These improvements have found their way into the next version of VB.NET, but strangely Javascript, which stands to benefit just as much from these features, is basically unchanged in IE 7.

3. Hide javascript. If Microsoft refuses to keep Javascript up to standard I suggest that it hide it, as Google has done. The Google Web Toolkit compiles Java into Javascript, providing static type checking at compile time. The inability to do this in ASP.NET really annoys me given that C# is a much better candidate for the role than Java: it maps to Javascript much more cleanly because it handles events the same way Javascript does, with function pointers. I know that this is an area of research at Microsoft, but considering Google has already developed several well-known applications using GWT, the concept would seem to be proven.

It doesn't look like things will improve substantially in Orcas. Javascript intellisense and type inference will make web development a lot easier, but this is really just putting some (very nice) lipstick on a pig. I hold out hope that developers will demand more from Microsoft so that they can develop the advanced web applications that their users deserve.

Tuesday, March 20, 2007

LINQ to Python

*NOTE: My concerns about Python's chops as a functional programming language have been largely addressed by some helpful posters in the comments for this article.* April 01, 2007

Recently I argued that Python's whitespace-aware structure made it inhospitable to functional programming. It occurred to me that I should try and demonstrate the problem by partially porting Microsoft's LINQ library to Python. After all, code speaks louder than words :-).

Here is the code:

class Enumerable:      

   def __init__(self, func):
      self.func = func

   def __select__(self, func):
      for item in self.func():
         yield func(item)

   def __where__(self, func):
      for item in self.func():
         if func(item):
            yield item

   def __join__(self, stream):
      for left in self.func():
         for right in stream.func():
            yield (left, right)

   def __take__(self, count):
      iterator = self.func()
      for counter in xrange(count):
         yield iterator.next()

   def __skip__(self, count):
      iterator = self.func()
      for counter in xrange(count):
         iterator.next()
      while 1:
         yield iterator.next()
      
   def take(self, count):
      return Enumerable( lambda: self.__take__( count) )

   def skip(self, count):
      return Enumerable( lambda: self.__skip__( count) )

   def join(self,stream):
      return Enumerable( lambda: self.__join__( stream) )

   def __iter__(self):
      return self.func()

   def select(self, func):
      return Enumerable( lambda: self.__select__( func) )

   def where(self, func):
      return Enumerable( lambda: self.__where__( func) )

   def to_list(self):
      list = []
      for item in self.func():
         list.append(item)
      
      return list


def iterate(initialValue, func):
   while 1:
      yield initialValue
      initialValue = func(initialValue)

def zip(enum1, enum2):
   return Enumerable(lambda: __zip__(enum1,enum2))

def __zip__(enum1, enum2):
   iter1 = enum1.__iter__()
   iter2 = enum2.__iter__()

   while 1:
      yield (iter1.next(), iter2.next())


This library essentially adds lazy evaluation to stream operations in Python. This allows me to write the Fibonacci sequence in the same way that I would write it in Haskell:


def fibs():
   yield 0
   yield 1
   fibsEnum = Enumerable(fibs)
   for value in zip (fibsEnum, fibsEnum.skip(1)).select(lambda item: item[0] + item[1]):
      yield value


fibNums = Enumerable(fibs)
fibNums = fibNums.take(10)
print fibNums.to_list()


The code above prints:


[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]


The example above is a terrible implementation of the Fibonacci sequence but is a good demonstration of laziness. This library accomplishes the following:

  1. Adds lazy evaluation to stream operations

  2. Creates a wrapper object around an iterator function, allowing new stream operations to be added by subclassing, which...

  3. Allows stream transformations to be written in the order they are executed, with the first operation occurring on the left and the last on the right.



As you can see, the limitation that lambda functions may contain only a single expression necessitates two function definitions for each stream transformation. All these additional identifiers produce a lot of noise in the code that detracts from its readability. Thankfully that noise can be hidden in the class definition.

The ability to write stream transformations in the order of execution (left-to-right) greatly improves readability in my opinion. However the need to fit each lambda into a single expression remains an insoluble problem.

I would like nothing better than to hear that I am totally wrong, there is a much better and more Pythonic approach, that I just don't get it, or some such thing.

Any skilled Python programmers who have a better approach that addresses the readability issues I outline?

Friday, March 16, 2007

Finally an open-source Lisp IDE made for humans

I just tried Cusp, an Eclipse plug-in for Lisp. It's phenomenal :-) It's got intellisense and it's oh so pretty. I have always loved Lisp but could not stand emacs. I also thought that the only worthwhile IDEs were commercial. This solution is just what I was looking for. It looks like it was just released a few months ago! I've been waiting for one of the rocket scientists in the Lisp community to take some time off from pushing the envelope just long enough to put some work into the tool set! This IDE goes a long way towards making it as friendly as the more popular languages. I can't wait to play with it again!

Monday, March 12, 2007

Dreaming of PLINQ

PLINQ is a project that aims to parallelize your LINQ queries so that they can make use of the extra processing power available on multi-core machines. I'm very much hoping that it will be released in the Orcas timeframe because it's such a killer app. It will give .NET a HUGE advantage over most other mainstream languages including Python, Javascript, Ruby, and Java. People tend to believe that LINQ queries are only good for data access because they look so much like SQL. In fact a huge number of general programming problems that have nothing to do with database access can be expressed as queries. For example, here is a function that multiplies two matrices:



I wrote the To2DimensionalArray method and here is the implementation if you are interested:



If you wrote this algorithm using nested for loops it would be a set of instructions to build a machine that gives you the answer. The query, on the other hand, is the answer. The importance of this distinction cannot be overstated. Because the query describes the intent of the algorithm, the computer is free to determine how best to execute it. The PLINQ query processor will analyze the query at run-time, giving it access to information about available system resources and allowing it to formulate a strategy that makes the best possible use of them. It could conceivably be executed differently every time your query is run depending on how many processors are idle, how much memory is free, and so on! This is yet another good reason to express every algorithm you can as a LINQ query. The more processors are added to desktops in the future, the faster your code will run.
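The original C# code images are missing from this post, but the contrast the paragraph above draws can be sketched in Python, with a comprehension playing the role of the query:

```python
def matrix_multiply_loops(a, b):
    """The 'machine': step-by-step instructions for computing the product."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                result[i][j] += a[i][k] * b[k][j]
    return result

def matrix_multiply_query(a, b):
    """The 'answer': a declarative description of the product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matrix_multiply_query(a, b))  # [[19, 22], [43, 50]]
```

The loop version spells out how to build the machine; the comprehension, like the LINQ query, just states what the answer is.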

The key to this process is the expression tree. PLINQ converts your query code into an expression tree in order to analyze it. I predict that languages without the ability to morph code into data will either adapt or be left in the dust. There are simply too many powerful uses for this technique.

Thursday, March 8, 2007

The Joy of Stealing

I'll never forget the first time I saw the Fibonacci sequence written in Haskell:

fibs :: [Int]
fibs = 0 : 1 : [ a + b | (a, b) <- zip fibs (tail fibs) ]

"What the heck is 'zip'?" I wondered. "That's the problem with all these academic types, they're always picking these totally unintuitive names like....oooooooohhhhh."

Then it clicked. Zip accepts two streams, takes each corresponding item in the two streams, merges each pair into a tuple, and then emits a stream of tuples. If you visualize this process you will find it looks awfully like zipping up a zipper. Hence the name.
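Python's built-in zip does exactly the same thing, and the zipping trick from the Haskell fibs definition above carries over too (as a strict sketch rather than a lazy one):

```python
# Corresponding items are paired up like the teeth of a zipper.
print(list(zip([1, 2, 3], ["a", "b", "c"])))  # [(1, 'a'), (2, 'b'), (3, 'c')]

# Each pass zips the sequence with its own tail and sums the pairs,
# mirroring the Haskell definition (strictly, not lazily).
fibs = [0, 1]
for _ in range(8):
    fibs = [0, 1] + [a + b for a, b in zip(fibs, fibs[1:])]
print(fibs)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```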

This is a very useful function, as it turns out. I was disappointed to find that it is missing from C# 3.0 given that most of the other Haskell stream functions are present and accounted for. "No problem," I said aloud to no one in particular. "I'll just write it myself."

Then I found out why it was left out. Even though C# 3.0 can create anonymous types it can't return them from a function, for the simple reason that in a statically-typed language you can't refer to a type by name if it doesn't have one. Luckily Wes Dyer explains that there is a solution to this problem known as "mumbling."

Basically, if we pass a generic function a prototypical instance of the anonymous type which we want to return, C# will be able to infer the type name transitively. Notice that mumbling only solves the problem of how to write the function definition. It doesn't solve the problem of how to actually create and return an anonymous type from within the function. Even if I could instantiate one, I can't exactly set arbitrary property values on an instance of "T" now can I?

Expression Trees to the rescue. I examine the anonymous type's structure using reflection and generate a function. This function accepts one item from each stream and returns an instance of the anonymous type. Here's the code:






To help link the PropertyInfo objects with the corresponding lambda parameters when creating the expression tree, I stole the EachWithIndex function from Ruby. Notice though that it would have been more elegant to merge the streams with Zip instead of using EachWithIndex. Unfortunately I have to bootstrap the language first. :-)

Here's an example of me using the Zip function to find the square root of 2:





The IterationStream is the C# equivalent of the Haskell iterate function: it takes a function and an initial value, and each element of the stream is the result of applying the function to the previous element. Don't you love stealing good ideas from other languages?

Just for fun, here's the preceding program in Haskell.

nextGuess :: Double -> Double
nextGuess guess = ((2.0 / guess) + guess) / 2.0

guesses = iterate nextGuess 1.0

guessAndPreviousPairs = [ (guess, previousGuess) | (guess, previousGuess) <- zip (tail guesses) guesses ]

guessesWherePairsAreEqual = filter (\(guess, previousGuess) -> guess == previousGuess) guessAndPreviousPairs

answer = fst (head guessesWherePairsAreEqual)


Despite the unnecessary prototype object passed to the Zip function in the C# version, I still find it easier to read than the Haskell version, largely due to C#'s query comprehensions. Beauty is in the eye of the beholder though. :-)
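For what it's worth, the same fixed-point search translates to Python almost line for line once you write iterate as a generator. This is a sketch of my own; it assumes the Newton iteration reaches an exact floating-point fixed point, which it does for this particular function:

```python
def iterate(f, x):
    """Haskell's iterate: yields x, f(x), f(f(x)), ..."""
    while True:
        yield x
        x = f(x)

def next_guess(guess):
    return ((2.0 / guess) + guess) / 2.0

guesses = iterate(next_guess, 1.0)
previous = next(guesses)
for guess in guesses:
    if guess == previous:  # consecutive guesses agree: fixed point reached
        answer = guess
        break
    previous = guess
print(answer)  # converges to the square root of two
```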


*Note: Ideally Zip would cache the generated function, making subsequent calls as fast as any other stream operator. That said, I wanted to keep the example simple for the purposes of demonstration.

Wednesday, March 7, 2007

Symbols in C# 3.0 (reloaded)

Someone pointed out in the comments that the GetPropertySymbol function I wrote in a previous post did not handle nested member expressions. For example, this.GetPropertySymbol( o => o.Person.Name ) would only return "Name", not "Person.Name". At first I just told him how to do it. Then I decided to do it myself, because it was a chance to demonstrate some cool C# 3.0-style functional programming.





I wrote the algorithm in functional style to challenge you to start thinking this way. The Iterate method, which I stole from Haskell, takes a function and an initial value; it returns the initial value, then the result of passing that value to the function, then the result of passing that result back to the function, and so on. It works this way...

Iterate( x => x + 10, 0)

...returns the following stream...

0,10,20,30...

The initial member expression is passed to Iterate, and every time the next member expression is required the function takes the current member expression and attempts to convert its inner expression to a member expression using the "as" operator. If this conversion fails the "as" operator returns null. The resulting data is then piped through TakeWhile, which stops the stream at the first null value. Finally I aggregate the stream of member expressions into a string by concatenating the member names, separated by periods.

Pretty neat huh?
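The same Iterate/TakeWhile/Aggregate pipeline can be sketched in Python with itertools. The MemberExpr class below is a stand-in of my own for C#'s MemberExpression (a member name plus an inner expression):

```python
from itertools import takewhile

class MemberExpr:
    """Stand-in for a MemberExpression: a member name plus an inner expression."""
    def __init__(self, name, inner=None):
        self.name = name
        self.inner = inner

def iterate(f, x):
    while True:
        yield x
        x = f(x)

def get_symbol(expr):
    # Walk outward-in (Name -> Person -> None), stop at None,
    # then reverse and join, mirroring the TakeWhile/Aggregate pipeline.
    chain = list(takewhile(lambda e: e is not None,
                           iterate(lambda e: e.inner, expr)))
    return ".".join(e.name for e in reversed(chain))

# o.Person.Name parses with "Name" as the outermost member access
expr = MemberExpr("Name", MemberExpr("Person"))
print(get_symbol(expr))  # Person.Name
```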

Friday, March 2, 2007

Symbols in C# 3.0

One of the languages that gets a lot of buzz nowadays is Ruby. Ruby is a dynamic, object-oriented language with a flexible syntax. The language allows you to change an object's structure at run-time by adding new methods. You can also create entirely new code by building a code string and executing it with the eval function.

i = 30
eval("i = 20")


This function alone, which is shared by Python and Javascript, is enough to enable all sorts of advanced metaprogramming. However, Ruby has something that the aforementioned languages don't: symbols. Symbols are just strings that the interpreter checks to ensure they are valid identifier names. This means that Ruby will throw an error if you attempt to use a symbol that contains spaces, starts with a digit, or is otherwise invalid as an identifier. This is the only facility Ruby provides for manipulating code as data. In the words of jazz singer Peggy Lee: "Is that all there is?" Yup.

Despite the simplicity of this feature, symbols are still incrementally more useful than strings when combined with metaprogramming techniques like code generation. Typically a method is created which takes an array of symbols and generates code for each of them using eval. The advantage of using a symbol instead of a string is that your generated code is more likely to be syntactically correct.
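The pattern looks roughly like this in Python, where closures make eval unnecessary but the shape is the same: a helper takes field names and generates the accessors (the names below are my own):

```python
def attr_accessor(cls, *names):
    """Generate a getter/setter property on cls for each field name."""
    for name in names:
        private = "_" + name
        def getter(self, private=private):
            return getattr(self, private)
        def setter(self, value, private=private):
            setattr(self, private, value)
        setattr(cls, name, property(getter, setter))

class Song:
    pass

attr_accessor(Song, "name", "artist", "duration")

s = Song()
s.name = "So What"
print(s.name)  # So What
```

As in the Ruby version, nothing checks at definition time that the names refer to anything real; a misspelled name simply generates an accessor for a misspelled field.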

In the following Ruby example, two functions, attr_reader and attr_writer, are passed a list of symbols and create accessor and mutator methods for the name, artist, and duration fields in the Song class. This is equivalent to creating three properties in .NET.

class Song
attr_reader :name, :artist, :duration
attr_writer :nmae, :artist, :duration
end


Notice the ":" before each argument. That is how you create a symbol in Ruby. The observant reader will notice that in the second function I've made a typo, writing ":nmae" instead of ":name". This was intentional, because I wanted to point out an important fact that might not be immediately obvious: symbols don't necessarily refer to an identifier that exists at compile-time. In fact, there is no compile-time in Ruby; it is an interpreted language. As a result the code above will not trigger an error. Remember: symbols are just strings.

When working in a statically-typed language like C# I find having to put code in a string very annoying because my IDE is powerless to point out typos like the one above. Also, my IDE cannot provide me with intellisense or rename an identifier globally for me. However certain important interfaces in the .NET framework require me to put code into strings. Consider this Customer class that implements the System.ComponentModel.INotifyPropertyChanged interface, enabling it to be data-bound to WinForms controls:

public class Customer : INotifyPropertyChanged {

    private string name;
    public string Name {
        get { return name; }
        set {
            name = value;
            OnPropertyChanged(this, new PropertyChangedEventArgs("Name")); // notice we have to put our property symbol in a string
        }
    }

    protected void OnPropertyChanged(object source, PropertyChangedEventArgs args) {
        if (this.PropertyChanged != null)
            this.PropertyChanged(source, args);
    }

    public event PropertyChangedEventHandler PropertyChanged;
}



In this case it would be really nice to have a symbol-like construct in C#, preferably one checked for correctness by the compiler. Thankfully in C# 3.0 writing a function that will return the string representation of a symbol is trivial. You can replace the applicable line above with:

OnPropertyChanged(this, new PropertyChangedEventArgs(this.GetPropertySymbol(o => o.Name)));

This is just beautiful. Intellisense even kicks in as we type the expression:



Even if we somehow manage to make a typo with intellisense helping us, the program will not even compile. Also we can now do a global rename of the property using refactoring tools.

How does it work? GetPropertySymbol is an extension method that takes an expression tree as its argument. I've blogged about expression trees extensively here and here so I'll assume you know all about them. You do read my blog religiously, don't you? ;-)

Did you notice that we didn't have to specify any type information and yet intellisense still managed to display the list of Customer members? This works because GetPropertySymbol declares that the first argument of the lambda function it is passed has the same type as the object the extension method is applied to. As a result C# can figure out the type with type inference! Without further ado, here is the single-line function definition:



All I'm doing is grabbing the member expression inside the lambda expression body and returning the name! This function comes in handy when doing data-binding as well:

this.DataBindings.Add(
    new Binding(
        this.GetPropertySymbol(o => o.Text),
        this,
        this.GetPropertySymbol(o => o.GameTitle)
    )
);
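Python has no expression trees, but a rough dynamic analogue of GetPropertySymbol can be faked with a proxy object that records every attribute access. This is a sketch of my own; unlike the C# version, nothing here is checked at compile time:

```python
class _Recorder:
    """Records each attribute access so the lambda leaves a trail of names."""
    def __init__(self, path=()):
        self._path = path

    def __getattr__(self, name):
        return _Recorder(self._path + (name,))

def get_property_symbol(selector):
    # Run the lambda against a recorder instead of a real object.
    return ".".join(selector(_Recorder())._path)

print(get_property_symbol(lambda o: o.Name))         # Name
print(get_property_symbol(lambda o: o.Person.Name))  # Person.Name
```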

Incidentally, I hope no one gets the impression that I'm putting Ruby down. In fact, I really like Ruby. However, the most interesting thing to me about the language is not its oft-touted metaprogramming capabilities but the way it encourages you to write code in continuation-passing style (CPS). This approach can make writing certain types of client-server applications, like web applications, much simpler conceptually. I advise you to check the language out. Learning programming languages which approach problems in radically different ways than the ones you are used to will definitely make you a better programmer.

Thursday, March 1, 2007

Runtime macros in C# 3.0

Macros are perhaps the most powerful tool available to programmers for creating new kinds of abstractions. They allow developers to extend their programming language with new constructs and behaviors. Many languages have this feature, the best known of which is Lisp. A macro is a function which operates on code, transforming it into different code (usually expanding it). Just before compilation the compiler parses the code into an abstract syntax tree (AST) and passes the applicable pieces to the macro function. The developer need only indicate which portions of code they would like to transform with their macro. This is a useful technique if you notice recurring patterns in your code and your language does not provide abstractions powerful enough to factor them out.

C# 3.0 does NOT add compile-time macros to the language, but allows you to do the same type of code manipulation at run-time. In this article I'm going to create a macro function that adds a useful new abstraction to C#.

Roughly a year ago Microsoft Research released an extension of the C# language called C-Omega. It was an experimental language written by Erik Meijer (Mr. Haskell) among others and was the precursor to C# 3.0. Many cool features in C-Omega did not make it into C# 3.0. I understand why most of the omitted features were left out, but there was one feature that I really liked that didn't make the cut.

Ask yourself how many times you have written code like this:


int? friendsSonsAge = null;
if (person != null && person.Friend != null && person.Friend.Son != null)
{
friendsSonsAge = person.Friend.Son.Age;
}

OK, OK. I was never really good at short contrived code examples, but the point is that you often want the value of a property buried deep in a hierarchy of objects, each of which may or may not be null. In most mainstream languages (Java, C, Javascript, VB.NET, C#, Ruby) you have no recourse but the ugly boilerplate code shown above.

In C-Omega you can do this:


int? friendsSonsAge = person.Friend.Son.Age;

if (friendsSonsAge != null)
// do something


If any of the intermediate objects in the expression on the right of the assignment are null, the whole expression evaluates to null. This is called null propagation, and it works very nicely with nullable types. I have no idea why this feature wasn't added to C# 3.0. No matter, we can create a macro that duplicates this behavior ourselves. :-)

The necessary ingredient for a macro is the ability to turn code into data. C# exposes code as data in an Expression tree object. If you assign a lambda function to a variable of type Expression C# will build an AST of the function instead of creating an executable delegate.


Expression<Func<int,bool>> oldEnoughToDrink = (age) => age >= 19; // oldEnoughToDrink cannot be executed; instead it points to an AST representing the function

C# cannot translate arbitrary expressions or code blocks into data, only lambda functions, so we will need to nest our expression inside one. Behold, my null propagation macro function:

Person person = null;
int? personsFriendsSonsAge = Macros.GetValue<int>(() =>
person.Friend.Son.Age);


In this example personsFriendsSonsAge will be null because the local variable person is null. The above code does not throw a null reference exception because the lambda function is not executed; instead it is converted to an AST and passed to the GetValue function. Notice how we've nested the expression inside a lambda function with no arguments:


() =>
person.Friend.Son.Age


The C# compiler knows to convert this lambda expression into an expression tree because the type of the first argument to the GetValue function is an Expression:

public static Nullable<T> GetValue<T>(Expression<Func<T>> f) where T : struct


Here is the body of the macro (click to view full-size):



The GetValue function takes the expression passed to it and wraps it in a series of AndAlso (the short-circuiting "&&" operator) expressions comparing each member expression to null. I build the conditional expression from the inside out by nesting each previous AndAlso in a new one. I skip the null check when a member expression references a value type, which can never be null. Finally I compile the expression tree into a function, execute it, and return the result.

The end result is that a little function is generated that looks like this:


() => (
person != null && person.Friend != null && person.Friend.Son != null) ? person.Friend.Son.Age : null;
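That generated guard chain has a crude run-time analogue in Python, with None in the role of null. Instead of generating code, this sketch of mine just walks the attribute names and bails at the first None:

```python
def get_value(obj, path):
    """Null propagation: return None if any link in the chain is None."""
    for name in path.split("."):
        if obj is None:
            return None
        obj = getattr(obj, name)
    return obj

class Person:
    def __init__(self, friend=None, son=None, age=None):
        self.Friend = friend
        self.Son = son
        self.Age = age

lonely = Person()  # no friend
popular = Person(friend=Person(son=Person(age=7)))

print(get_value(lonely, "Friend.Son.Age"))   # None
print(get_value(popular, "Friend.Son.Age"))  # 7
```

Like the C# macro, each step in the chain is evaluated exactly once, and the first None short-circuits the rest.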

Now there are several things to be aware of:

1. It's slow. Generating code at runtime is comparatively expensive even if it's just a little function. That's what's nice about compile-time macros: the code generation happens ahead of time. That said, execution speed is on par with an eval in IronPython so it's still quite reasonable. Just don't use it in a tight loop.

2. This version of the function only works for value types. This is because it returns the generic Nullable class, which cannot be parameterized with a reference type. You can easily add another function which works with reference types by modifying the original slightly. Having one function for value types and another for reference types is a necessary headache if we want to preserve type safety.

3. This is an oversimplified version for the purposes of demonstration. It only handles member access expressions like form.Size.Width, but won't work if there is a method call or indexer in the expression.

int? age = Macros.GetValue<int>(() => myCustomer.Friends[0].GetBrother().Age); // throws an exception because the expression contains an indexer and a method call

This should not be perceived as a limitation of the language, but rather as a limitation on my free time :-). In fact it is completely possible to handle method calls. The trick is to ensure that you don't call the same method again and again when making null comparisons. This can be accomplished by introducing a new lambda function every time you need to store the result of a computation. I'll leave that as an exercise for the reader. :-)

The important thing to take away from this article is that converting code to data (and vice versa) has many more uses than just generating SQL from C# queries. C# has evolved into a very powerful language for metaprogramming and I recommend learning as much as you can about it so that you can leverage these new capabilities.

Monday, February 19, 2007

LINQ to Code

With all of the information on the Internet telling you how much easier LINQ makes data access, you may have missed some of the less obvious implications of the new language features. I'm referring specifically to how much easier it is to generate code using the new LINQ API. I'm going to demonstrate a brief example that uses a domain-specific language (DSL) to serialize a class to a fixed-length, record-based text file. This is useful if you find yourself having to work with legacy file formats from the dark ages. Rather than reinvent the wheel, let's steal the approach used for XML serialization in the .NET Framework.

In order to convert your C# queries to SQL at run-time, LINQ introduces a new construct called the lambda expression. Lambda expressions are just lambda functions that C# represents as data instead of executable code. For a good example of how to use lambda expressions and compile them, see this post in Don Box's blog. If you choose to represent a function as a lambda expression tree you can analyze it at run-time and convert it to another form, like a SQL query or executable code. This is known as metaprogramming (short definition: code that manipulates code). When the compiler converts your code into a data tree it represents it using objects in the new System.Linq.Expressions namespace. The question I asked is "Can I use these objects to build a function at run-time?" The answer is yes.
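Python's nearest analogue to an expression tree is the standard ast module, which likewise turns code into a data structure you can walk (a sketch; Python parses source strings rather than lambda expressions):

```python
import ast

# The predicate "age >= 19" as data instead of executable code:
tree = ast.parse("age >= 19", mode="eval")
comparison = tree.body
print(type(comparison).__name__)         # Compare
print(comparison.left.id)                # age
print(type(comparison.ops[0]).__name__)  # GtE
print(comparison.comparators[0].value)   # 19
```

Once code is data like this, analyzing it, rewriting it, or compiling it to another form is just ordinary tree manipulation.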

As you probably know, in order to serialize a class as XML you mark it up with attributes. You may not realize it, but this is a form of metaprogramming. When you attach attributes to language constructs you are using a domain-specific language: a little language designed for a specific task. This is a very powerful technique, because a language specifically geared toward a task is usually much easier and less error-prone to use than a general purpose language like C#. Using attributes to specify a DSL is great because it keeps the work to be performed on data together with the data itself, instead of storing it in an external file where it can get out of sync with changes.

Here is the file I want to read from:

1600 Pennsylvania Avenue G.W. Bush
1600 Pennsylvania Avenue C. Rice
1600 Pennsylvania Avenue D. Cheney

Here is the class I want to deserialize from the file:

[TextRecordSerializable]
public class Customer{
private string address;

[FieldRange(0,35)]
public string Address
{
get { return address; }
set { address = value; }
}
private string name;
[FieldRange(35,10)]
public string Name
{
get { return name; }
set { name = value; }
}
}

The FieldRange attributes just store the starting character index and the length of the field in the record. The TextRecordSerializable attribute indicates that the class can be serialized. These two attributes are the only "commands" in our DSL. Simple, huh? Now a full implementation would be a little more complex, allowing for type conversions and such, but it would still be simple enough to describe with attributes and parameters. What we want to do is convert our DSL into real, honest-to-goodness executable code that deserializes the object from a string (and vice versa). Basically we want to generate the following function:

Func<string, Customer> customerParser = (line) => new Customer { Address = line.Substring(0, 35), Name = line.Substring(35, 10) };

This function can be built for ANY type at run-time using the following code. It might seem complex at first, but stick with it; it's not as complicated as it seems. (Click on the image below to zoom. I haven't figured out how to keep Blogger from mauling my code, so if anyone knows how please leave a comment.) *The code below aliases System.Linq.Expressions.Expression as "Ex" to keep things short.




This code is just beautiful. The data flows out of our DSL, is filtered, and converted to an AST all in a single expression. We create our method in the static constructor and then cache it in a static variable. We then call this function inside a public class method called “Deserialize.” The end result is that we can do this:

var serializer = new TextRecordSerializer<Customer>();
var customer = serializer.Deserialize("1600 Pennsylvania Avenue G.W. Bush ");
Console.WriteLine(customer.Name);

Once our expression is compiled into a function, the code runs as fast as it would have if we had written it ourselves in raw C#. I will leave the method that generates the Serialize function as an exercise for the reader. Next time we'll look at using expressions to create much more advanced DSLs than are possible with simple attributes.
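As a closing sketch, the same attribute-driven idea maps onto Python with a plain field-spec table standing in for the FieldRange attributes (the names below are my own, and this builds a closure rather than an expression tree):

```python
class Customer:
    # (start, length) specs play the role of the FieldRange attributes
    fields = {"Address": (0, 35), "Name": (35, 10)}

def make_deserializer(cls):
    """Build, once, a function that slices a fixed-width line into an object."""
    specs = list(cls.fields.items())
    def deserialize(line):
        obj = cls()
        for name, (start, length) in specs:
            setattr(obj, name, line[start:start + length].rstrip())
        return obj
    return deserialize

parse = make_deserializer(Customer)
line = "1600 Pennsylvania Avenue".ljust(35) + "G.W. Bush".ljust(10)
customer = parse(line)
print(customer.Name)     # G.W. Bush
print(customer.Address)  # 1600 Pennsylvania Avenue
```

As with the compiled expression tree, all the spec-reading work happens once in make_deserializer; the returned function just slices strings.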

About Me

My photo
I'm a software developer who started programming at age 16 and never saw any reason to stop. I'm working on the Presentation Platform Controls team at Microsoft. My primary interests are functional programming and Rich Internet Applications.