Thursday, April 19, 2007

LINQ to Validation

I spent the weekend looking at the Enterprise Library 3.0 validation block. It's fantastic and long overdue. Developing this library was a no-brainer: I wrote something similar (though much less ambitious) as soon as I learned to use attributes in C# 1.0. Although it is a great framework as-is, I can't help but imagine ways in which LINQ could make it even better.

Validation has annoyed me more than any other problem, with the possible exception of the object-relational mismatch. Validation logic should be applied at every tier: client, server, and database. However, writing and maintaining the same validation rules in Javascript, a server-side language, and then again in SQL is so labour-intensive and thankless that I suspect few people actually do it.

Several attempts have been made to solve this problem in the past, with limited success. Some code generators allow you to express business rules in a domain-specific language which is then translated into various target languages. There have also been attribute-based solutions that use XSL as the validation language. The problem with all of these solutions is that they force the developer to use yet another language. This is necessary because the language must be expressible as data so that it can be analysed and converted into other languages. Unfortunately it also means that the validation rules have to be stored in config files or strings, putting them out of reach of the type checking and refactoring tools available in most IDEs. Does this problem sound familiar? It should. It's the same problem we have with data access code today.

Now that LINQ is coming we will have two languages that are powerful enough to express any kind of validation rule, have full IDE support, and can be converted into data and thus into other languages: VB.NET and C#.

LINQ is the gift that keeps on giving. As I've written in this blog many times before, the applications for turning code into data extend far beyond improving database integration. Let's examine the ASP.NET PropertyProxyValidator control that comes with the validation block in EL3. You associate it with a property on an object, and when the user attempts an action it does a server postback, calls the EL3 Validator object, and displays an error if one is encountered. This keeps you from having to create a custom validator control that performs the same validation you are already doing in your business object. However most validation (regular expressions, string lengths) can be done in client-side code, so the postback is unnecessary. Unfortunately, performing the validation on the client side means writing Javascript as well as server-side code (after all, the user's browser may have Javascript turned off). Now imagine that, post-LINQ, a new validation attribute called ExpressionValidatorAttribute were added to the EL3 validation block:

public class ZipCodeValidatorAttribute : ExpressionValidatorAttribute<string>
{
    public override Expression<Func<string, bool>> GetExpression()
    {
        // anchored so the entire string must be exactly five digits
        return x => new Regex(@"^[0-9]{5}$").IsMatch(x);
    }
}

Notice that we are returning the validation function as an Expression, which means that instead of returning a delegate it returns a data representation of the function. An ASP.NET validator control could retrieve the expression using reflection and convert it to the following Javascript function without breaking a sweat:

function (x) {
    return /^[0-9]{5}$/.test(x);
}

It could then run this function on the client side, avoiding a postback. On the server side the validation object would simply compile the expression and run it (the compilation could be done once and cached). This would allow us to use the same code for validation on both the client and the server!

Javascript is actually a very powerful functional language; you can even translate LINQ queries to it without much effort. In fact the only validations that could not be converted to Javascript would be those that require access to some server-side resource, which in practice is rare.

Almost all validation logic is functional: it doesn't modify the state of the objects it inspects, it just takes some data as input and returns a single boolean. Therefore most validation can be written as lambda functions and captured as lambda expressions. Voila! We've solved the problem of duplicated validation without having to rewrite code in different languages, learn a new language, or leave the comfort of Visual Studio.

Wednesday, April 11, 2007

Haskell for C# 3 Programmers

I've been occupying myself as best I can, waiting impatiently for .NET 3.5 to come out. In the meantime I've been playing with Lisp and Haskell, two of the languages with the most cachet among the smart folks who hang out at Lambda the Ultimate. Well, it's happened: I've fallen head over heels for Haskell. It's just so beautiful. It is a pure functional language and is therefore limited in the kinds of operations that can be performed. As a result, a little bit of well-thought-out syntax to support these operations goes a long way towards improving readability.

It occurs to me that it is worthwhile to introduce C# users to Haskell because many of the improvements coming in C# 3.0 (and many of those introduced in C# 2.0) are actually inspired by Haskell. Learning Haskell is a great way of training yourself to think functionally so you are ready to take full advantage of C# 3.0 when it comes out. There are a variety of Haskell tutorials available on the internet so I will just focus on some similarities between C# and Haskell with the aim of deepening your understanding of both.

Type Inference

Haskell and C# 3.0 both have type inference. In C# 3.0 you write this:
var number = 3;

In Haskell you write this:

number = 3

Haskell's type inference is quite a bit smarter than C#'s, but I'll give you examples of that later.
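As a small taste of that difference, here is a sketch (the function name `twice` is invented for illustration): Haskell infers a fully generic type for a function with no annotations at all, whereas C#'s var only infers the types of local variables.

```haskell
-- No signature needed: the compiler infers twice :: (a -> a) -> a -> a
twice f x = f (f x)

main :: IO ()
main = print (twice (+ 3) (1 :: Integer)) -- prints 7
```

The inferred type is polymorphic, so the same twice works on functions over any type, not just integers.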

Lazy evaluation and Streams

In C# I can create a stream of all natural numbers like this:

IEnumerable<int> NaturalNumbers()
{
    var i = 0;
    while (true)
    {
        yield return i;
        i++;
    }
}

The reason this loop does not continue forever is that execution suspends as soon as a yield statement is reached, and resumes where it left off the next time a value is requested. In Haskell you can duplicate this behavior by using the colon operator to construct streams.

nat :: [Integer]
nat = 0 : map (\x -> x + 1) nat

This definition specifies that the first item in the stream is 0, and the rest is a list derived by applying a lambda to each item of nat itself (a recursive reference). The map function is equivalent to Select in C# 3.0. The first item will be 0, the next will be 1 because the previous was 0, the next will be 2 because the previous was 1, and so on. The expression on the left side of the colon is an item and the right side is a list. You can build lists like this in Haskell:

0 : 1 : 2 : 3 : 4 : [] -- [] is an empty list in Haskell

This syntax is useful because it signals that the algorithm can stop as soon as it has all the values it needs, just like yield. It is also very useful to express lists using this colon-separated syntax when you are creating recursive list-processing functions, but I'll explain that later. For now just make sure you can recognize this syntax as the construction of a list.
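To see how the colon syntax and lazy evaluation work together, here is a runnable sketch that takes the first five items of the infinite stream defined above; only those five elements are ever computed:

```haskell
-- an infinite stream of natural numbers, built with the colon operator
nat :: [Integer]
nat = 0 : map (\x -> x + 1) nat

main :: IO ()
main = print (take 5 nat) -- prints [0,1,2,3,4]
```

Because take stops asking for elements after the fifth, the recursion never runs away, exactly as yield return stops the C# loop.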

Functions, Functions, Functions

It probably won’t come as a shock to you that functional programming relies heavily on functions. Consequently Haskell provides all sorts of useful syntax for manipulating them. The thing that caught me off guard when I first tried to learn Haskell was the function signatures. Look at this example of a function that adds two integers:
add :: Integer -> Integer -> Integer
add x y = x + y

Does this seem odd to you? You might have expected to see something like this:
add :: (Integer, Integer) -> Integer
add (x,y) = x + y


In fact the definition above will work, but it's not the preferred way of defining functions in Haskell. If you read the first function signature left to right it says: "add is a function that accepts an integer and returns a function that accepts an integer, which returns an integer." One of Haskell's elegant features is that you can take a function that accepts n arguments and convert it into a function that accepts n-1 arguments just by specifying one argument. This is known as partial application. For example, given the first add definition I can do this:
addone = add 1 -- returns function that accepts 1 argument

addone 5 -- this returns 6

In Haskell all functions accept a maximum of one argument and return one value. The value returned will be another function with one less parameter, until all the arguments have been specified, at which point the final value is returned. As I'll explain later, this behavior is very useful when writing certain kinds of recursive functions.
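Partial application pays off most when combined with higher-order functions. A small sketch: instead of writing a lambda, you can pass a partially applied add straight to map:

```haskell
add :: Integer -> Integer -> Integer
add x y = x + y

main :: IO ()
main = print (map (add 10) [1, 2, 3]) -- prints [11,12,13]
```

Here `add 10` is a function still waiting for its second argument, and map supplies it once per list element.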

Recursion

Pure functional languages have no loops. Instead they rely on recursion to repeat operations. Recursive functions have a terminating case and an inductive case. As a result you often see code like this (C#):

public int Add(int x, int y)
{
    if (x == 0) // terminating case
        return y;
    else
        return Add(x - 1, y + 1); // inductive case
}

Haskell allows you to specify the terminating case in a different definition than the inductive case. This is called pattern matching and results in a much more declarative style of coding:


{- function signature (takes two Nums, returns a Num) -}
add :: Num a => a -> a -> a
add 0 y = y -- terminating case
add x y = add (x - 1) (y + 1) -- inductive case

Take a look at the add function's signature: add is a generic function. Pretend that "a" is "T" and "Num a =>" is "where T : Num" and it will be a lot clearer. In fact our add function will work on any numeric type that is an instance of the Num type class (Integer, Double, etc). This doesn't work in C# because you can't include operator definitions in an interface. In Haskell, operators are just like any other function and are not bound to a particular class; they just use a symbol instead of a proper identifier. This allows you to include them in type class definitions and thus use them in generic functions.
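To make that last point concrete, here is a sketch (the class name `Combinable` and the operator `<+>` are invented for illustration) of declaring an operator inside a type class and implementing it for two different types, something a C# interface cannot express:

```haskell
-- a made-up type class that declares an operator as an ordinary member
class Combinable a where
  (<+>) :: a -> a -> a

-- lists combine by concatenation
instance Combinable [b] where
  (<+>) = (++)

-- booleans combine with logical or
instance Combinable Bool where
  (<+>) = (||)

main :: IO ()
main = do
  print ([1, 2] <+> [3 :: Integer]) -- prints [1,2,3]
  print (True <+> False)            -- prints True
```

Any function written against Combinable now works generically over every instance, operator and all.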

Now some of you might be asking, "Isn't all this recursion awfully expensive?" No, because Haskell and many other functional languages are equipped to recognize a particular kind of recursion known as tail recursion. When the compiler recognizes this kind of recursion it translates it into a simple loop so that the stack doesn't grow. For instance, our C# version of Add above could be translated to the following by a smart compiler:

public int Add(int x, int y)
{
    do
    {
        if (x == 0) // terminating case
            return y;
        else
        {
            x = x - 1;
            y = y + 1;
        }
    } while (true);
}

So how do you tell whether a function is tail recursive? It's easy: if the last thing a function does is call itself, it is tail recursive. The add function defined above is tail recursive because the last operation it performs is another call to add. Some functions are not tail recursive, which means they do grow the stack. Take the following flatten function, which takes a list of lists and flattens it into a single list:


flatten :: [[a]] -> [a]
flatten [] = [] -- if the list is empty, return an empty list
flatten (y:ys) = y ++ flatten ys



Notice that the last operation to execute is the list concatenation (++) function. This means that the concatenation cannot finish its computation until the recursive call to flatten returns. As a result, flatten consumes stack space in proportion to the length of its input, because it must keep track of where it is in each pending recursive call.

The good news is that we can convert many recursive functions into tail recursive ones using a straightforward technique: we introduce an accumulator, a new parameter that acts like a variable storing the computation done so far.

flattenAcc :: [a] -> [[a]] -> [a]
flattenAcc acc [] = acc
flattenAcc acc (y:ys) = flattenAcc (acc ++ y) ys

flatten = flattenAcc [] -- partial application

I want you to notice that we are using the colon list syntax in the inductive case of flattenAcc. This is a great example of pattern matching: we are effectively saying that y represents the first item in the list passed as the second parameter and ys represents the rest of that list. Haskell lets you pattern match using the same syntax you use to construct lists, resulting in very readable code.

In our revised version the last call made in flattenAcc is to flattenAcc itself, which makes the function tail recursive. To accomplish this, flattenAcc stores the flattened list built so far in the newly introduced "acc" parameter. Remember that when a tail recursive function gets translated into a loop, the function parameters act like variables. Note that we don't need to specify a type signature for flatten because Haskell infers it.

*Note: Derek Elkins and Cale Gibbard point out in the comments that tail recursive functions are not always preferable in Haskell because of its lazy evaluation. If you can write a non-tail-recursive function that uses the colon syntax, it is probably better than a tail recursive one that doesn't: lazy evaluation enables you to use the non-tail-recursive version on an infinite stream without getting a stack overflow.
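A sketch of why the lazy version can win: because ++ produces its first elements before the recursive call is forced, the original flatten can consume an infinite list of lists, while the accumulator version would loop forever on the same input:

```haskell
-- the original, non-tail-recursive version of flatten
flatten :: [[a]] -> [a]
flatten [] = []
flatten (y:ys) = y ++ flatten ys

main :: IO ()
main = print (take 5 (flatten (repeat [1, 2 :: Integer]))) -- prints [1,2,1,2,1]
```

Here repeat produces an endless list of [1,2] lists, yet take 5 terminates because flatten yields elements incrementally.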

Now you're familiar with the basic concepts of functional programming. Next time we'll look at the features that inspired C#'s query comprehensions and nullable types.

Thursday, April 5, 2007

Lowered Expectations: AJAX for ASP.NET

I'm underwhelmed by AJAX for ASP.NET. It is not a very powerful set of tools for application developers; instead it seems most useful as a way of standardizing how ajax controls are written. Don't get me wrong, I'm all for standardizing the way common operations are performed. That's a big part of what has made .NET such a productive platform. However, developers who believe that downloading AJAX for ASP.NET will make writing Web 2.0 applications as easy as ASP.NET made writing Web 1.0 applications will be disappointed.

I have little doubt that the most commonly used control will be the update panel, because it is the least disruptive way of adding AJAX to ASP.NET. You just surround your controls with it and voila, the page magically doesn't refresh any more. Unfortunately it's little more than a hack. It's not really Web 2.0 in the sense that it doesn't expose a service consumable by a variety of clients; it still sends presentation information mingled with data. It basically just sends a smaller portion of the page and its view state to the client, which then rewrites the page using the DOM.

I'm going to say something controversial now: page refreshes are not that big a deal. Before you accuse me of heresy, let me remind you that ASP.NET has had the option to maintain scroll position after a page refresh since version 2.0. After a (hopefully) brief flash, the user sees exactly what they would have seen if the update panel had updated the page asynchronously. Admittedly, preserving scroll position often doesn't do much for you, because the fluid nature of web page layouts can move the user's reference point up or down while preserving the exact pixel position of the scroll bar.

The real problem with AJAX for ASP.NET is that it's very difficult to move some operations to the client side while continuing to perform others on the server side. Why? ViewState and Session state. ASP.NET's state maintenance information is not accessible on the client side, which limits the amount of work you can do on the client lest it get out of sync with the server. To make Web 2.0 applications really easy to write you need to raise the level of abstraction. I can see a few different ways of doing this:

1. Provide powerful controls that you can manipulate on the client side. Where is the AJAX-enabled GridView control? You know, the one that transparently pushes sorting and paging to clients that can perform them. I suppose I will have to wait for a third party to provide one, but I shouldn't have to.

2. Make Javascript better. Javascript is growing up, but you wouldn't know it if you are targeting IE. Firefox 1.5 and ActionScript 3.0 can mingle XML literals and code like this...

var name = "Jafar";
var customer = <Customer>{name}</Customer>;

This is a big deal. Dynamically generating HTML/XML using the DOM is difficult to read and write, and Microsoft is well aware of the problem. In fact, they designed a research language called C-Omega to address it, which integrated XML literals and stream processing into the C# language. These improvements have found their way into the next version of VB.NET, but strangely Javascript, which stands to benefit just as much from these features, is basically unchanged in IE 7.

3. Hide Javascript. If Microsoft refuses to keep Javascript up to standard, I suggest that it hide it, as Google has done. The Google Web Toolkit compiles Java into Javascript, providing static type checking at compile time. The inability to do this in ASP.NET really annoys me, given that C# is a much better candidate for the role than Java: it maps to Javascript much more cleanly because it handles events the same way Javascript does, with function pointers. I know this is an area of research at Microsoft, but considering that Google has already shipped several well-known applications built with GWT, the concept has been proven.

It doesn't look like things will improve substantially in Orcas. Javascript IntelliSense and type inference will make web development a lot easier, but this is really just putting some (very nice) lipstick on a pig. I hold out hope that developers will demand more from Microsoft so that they can build the advanced web applications their users deserve.

About Me

I'm a software developer who started programming at age 16 and never saw any reason to stop. I work on the Presentation Platform Controls team at Microsoft. My primary interests are functional programming and Rich Internet Applications.