Thursday, April 16, 2009

An F# Solution to Eric Lippert's Challenge

David Anson recently brought Eric Lippert's challenge to my attention. I can't resist an opportunity to showcase F#'s strengths so I thought I'd throw my hat into the ring with a solution.

I'll briefly summarize the problem for those too lazy to follow the link :-):

Given a sequence of strings, concatenate them in a way similar to natural language.  For example the sequence "ABC", "DEF", "G", and "H" should be output as:

{ABC, DEF, G and H}

"ABC" and "DEF" should be output as...

{ABC and DEF}

...and finally an empty sequence should yield the following output:

{}

Of course this problem is simple enough to solve.  The real challenge is creating a solution which is declarative and efficient.  In order to achieve the desired result we must determine if each item is the first, the last, or is in the center without relying on random access. The typical approach in C# or VB.NET would be to get an enumerator from the sequence, step through it, and create a state machine that keeps track of the last two values.  Unfortunately this approach wouldn't scale and would be full of corner cases to handle. 

Let's see if we can do better with F#.

open System open System.Text let concat_string (list: string seq) = let tripleWise = Seq.append list [null] |> Seq.scan (fun (_, previousPrevious, previous) current -> (previousPrevious, previous, current) ) (null, null, null) let contents = tripleWise |> Seq.map (function | _, null, _ -> String.Empty | null, curr, null -> curr | null, first, _ -> first | _, last, null -> sprintf " and %s" last | prev, curr, next -> sprintf ", %s" curr) let builder = StringBuilder() contents |> Seq.iter (fun item -> (builder.Append(item) |> ignore)) sprintf "{%s}" (builder.ToString()) // prints "{ABC, DEF, G and H}" printf "%s" (concat_string ["ABC"; "DEF"; "G"; "H"]) // prints "{ABC and DEF}" printf "%s" (concat_string ["ABC"; "DEF"]) // prints "{ABC}" printf "%s" (concat_string ["ABC"]) // prints "{}" printf "%s" (concat_string [])

Let's take a look at the program step by step.

Type Inference, Whitespace Awareness, and Type Aliases

let concat_string (list: string seq) =

The first line of code contains the only type declaration in the entire program. Not too shabby.  For those that don't know "string seq" is a type alias for IEnumerable<string>.  Having a type alias for sequences is really nice because sequence processing is so common in both C# and F#.  Notice that like Python, F# is also whitespace aware.  These three features help F# achieve Python-like terseness without sacrificing performance or type safety.

Structural Types

My solution leverages F#'s native support for structural types, that is types that can be used to group arbitrary values together.  The following code creates a sequence of triples, which is a group of three values.  Notice that pattern matching is used to introduce descriptive identifiers for each element in the triple.

let tripleWise = Seq.append list [null] |> Seq.scan (fun (_, previousPrevious, previous) current -> (previousPrevious, previous, current) ) (null, null, null)

The "tripleWise" identifier refers to a sequence of triples containing the previous, the current, and the next item.  For example if the input list is defined as "ABC" and "DEF" then tripleWise will be:

(null,null,null), (null,null,"ABC"), (null,"ABC","DEF"), ("ABC","DEF",null)

The scan function works in a way very similar to System.Linq.Enumerable.Aggregate.  Here's a C# example that uses Aggregate to sum a sequence of numbers. 

new[]{2,4,4,2}.Aggregate((accumulatedValue, current) => accumulatedValue + current, 0)

The difference between scan and Aggregate is that scan returns the intermediary accumulated values as a sequence.  Given the example above scan would return a sequence of the following values:

0, 2, 6, 10, 12

An empty triple is used as the initial value and scan is passed a function which shifts each item in the sequence through the triple from the right to left. Each time it returns the resulting triple. Now we have a sequence of triples containing the previous, current, and next values.  For each triple we can determine if the middle item is the first item or the last item by checking whether the left or right items are null respectively.  Note that by appending null to the end of the original sequence in the first line we can be sure that the rightmost item in the last triple will be null.

Parallelizing the Code

We've successfully avoided state and mutation and we now have a sequence of snapshots of each value and its previous and next values.  This is very important.  Now we can process each triple in the sequence in any order we like: last to first, first to last, or first and last at the same time. 

let contents = tripleWise |> Seq.map (function | _, null, _ -> String.Empty | null, curr, null -> curr | null, first, _ -> first | _, last, null -> sprintf " and %s" last | prev, curr, next -> sprintf ", %s" curr)

The code above introduces a function that uses pattern matching to declaratively transform each possible triple into a string.  It just doesn't get much more readable than this.  This function is applied to each element using map (equivalent to System.Linq.Enumerable.Select) which yields a sequence of strings.

Just for fun let's see how much work it would take to parallelize the code above using Matthew Podwasaki's F# parallel extensions library and the latest CTP of the Parallel Extensions.

let contents = tripleWise.AsParallel() |> PSeq.map (function | _, null, _ -> String.Empty | null, curr, null -> curr | null, first, _ -> first | _, last, null -> sprintf " and %s" last | prev, curr, next -> sprintf ", %s" curr)

Notice that all I had to do was add 14 characters!  Now the code above should increase in speed at a linear rate with the number of cores on the processor.  Just try that with a state machine. :-) 

In the next five years its very possible we will see processors with over 50 cores on a desktop.  This is one of the reasons I get so frustrated when I see developers unnecessarily using state in their computations.  State is the new GOTO.  It should have to be justified.

State When it Makes Sense

Notice that at the end the strings are concatenated using a StringBuilder. 

let builder = StringBuilder() contents |> Seq.iter (fun item -> (builder.Append(item) |> ignore)) sprintf "{%s}" (builder.ToString())

This operation is decidedly impure.  This demonstrates another wonderful F# feature: the ability to perform stateful operations when it makes sense. It's important to understand there's nothing preventing us from writing C#-style imperative code in F#. 

An alternate, pure approach to the problem above would've involved using reduce (equivalent to System.Linq.Enumerable.Aggregate) to concatenate all the strings together using the + operator.  Although reduce, like map, can be parallelized it is unlikely that this would improve performance because concatenating strings creates so much extra work.  Each time two strings are concatenated a new one must be created and the contents of both strings must be copied to a new memory location.  Under the circumstances it makes sense to add a dash of state and use a StringBuilder.

Learn F#

The entire program is 25 lines including whitespace and imports and I'm very comfortable with its trade-offs.  I look forward to seeing what the other developers come up with but I'm confident that no one will do better in C#.  F# really shines when it comes to computation.  Learning it is a must if you consider yourself a sharp Elvis or an Einstein and you make your living with .NET.

12 comments:

Devin said...

How much faster is this then just mystringlist.Aggregate((x,y) => x + ", " + y);

Or with Plinq mystringlist.AsParallel().AsOrdered().Aggregate((x,y) => x + ", " + y);

Jafar Husain said...

You may be aware but your solution is incomplete. Note the restriction that the last two items must be seperated by an "and" and have no comma.

As I explained I believe my solution will be faster than an Aggregate because of the considerable cost of string concatenation. That said, I would love to be proven wrong as the end result would be completely pure.

I haven't benchmarked it.

Devin said...

Yea, sorry guess I wasn’t paying enough attention.

Affordable Luxurious Wedding Dress Blog said...
This comment has been removed by a blog administrator.
Anonymous said...
This comment has been removed by a blog administrator.
Anonymous said...
This comment has been removed by a blog administrator.
Anonymous said...
This comment has been removed by a blog administrator.
asdfsadf said...
This comment has been removed by a blog administrator.
Newark Airport Parking, Airport Shuttle said...
This comment has been removed by a blog administrator.
products said...
This comment has been removed by a blog administrator.
products said...
This comment has been removed by a blog administrator.
Take Out Restaurants, Italian Restaurant said...
This comment has been removed by a blog administrator.

About Me

My Photo
I'm a software developer who started programming at age 16 and never saw any reason to stop. I'm working on the Presentation Platform Controls team at Microsoft. My primary interests are functional programming, and Rich Internet Applications.