Tuesday, March 20, 2007

LINQ to Python

*NOTE: My concerns about Python's chops as a functional programming language have been largely addressed by some helpful posters in the comments for this article.* April 01, 2007

Recently I argued that Python's whitespace-aware structure made it inhospitable to functional programming. It occurred to me that I should try and demonstrate the problem by partially porting Microsoft's LINQ library to Python. After all, code speaks louder than words :-).

Here is the code:

class Enumerable:      

   def __init__(self, func):
      self.func = func

   def __select__(self, func):
      for item in self.func():
         yield func(item)

   def __where__(self, func):
      for item in self.func():
         if func(item):
            yield item

   def __join__(self, stream):
      for left in self.func():
         for right in stream.func():
            yield (left, right)

   def __take__(self, count):
      iterator = self.func()
      for counter in xrange(count):
         yield iterator.next()

   def __skip__(self, count):
      iterator = self.func()
      for counter in xrange(count):
         iterator.next()
      while 1:
         yield iterator.next()
      
   def take(self, count):
      return Enumerable( lambda: self.__take__( count) )

   def skip(self, count):
      return Enumerable( lambda: self.__skip__( count) )

   def join(self,stream):
      return Enumerable( lambda: self.__join__( stream) )

   def __iter__(self):
      return self.func()

   def select(self, func):
      return Enumerable( lambda: self.__select__( func) )

   def where(self, func):
      return Enumerable( lambda: self.__where__( func) )

   def to_list(self):
      list = []
      for item in self.func():
         list.append(item)
      
      return list


def iterate(initialValue, func):
   while 1:
      yield initialValue
      initialValue = func(initialValue)

def zip(enum1, enum2):
   return Enumerable(lambda: __zip__(enum1,enum2))

def __zip__(enum1, enum2):
   iter1 = enum1.__iter__()
   iter2 = enum2.__iter__()

   while 1:
      yield (iter1.next(), iter2.next())


This library essentially adds lazy evaluation to stream operations in Python. This allows me to write the fibonnaci set in the same way that I would write it in Haskell:


def fibs():
   yield 0
   yield 1
   fibsEnum = Enumerable(fibs)
   for value in zip (fibsEnum, fibsEnum.skip(1)).select(lambda item: item[0] + item[1]):
      yield value


fibNums = Enumerable(fibs)
fibNums = fibNums.take(10)
fibNums.to_list()


The code above prints:


[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]


The example above is a terrible implementation of the fibonnacci set but is a good demonstration of laziness. This library accomplishes the following:

  1. Adds lazy evaluation to stream operations

  2. Creates wrapper object around iterator function, allowing new stream operations to be added using metaclasses which...

  3. Allows stream transformations to be written in the order they are executed with the first operation occuring on the left and the last on the right.



As you can see the limitation that lambda functions can only be one line long necessitates two function definitions for each stream transformation function. All these additional identifiers produce a lot of noise in the code that detracts from its readability. Thankfully that noise can be hidden in the class definition.

The ability to write stream transformations in the order of execution (left-to-right) greatly improves readability in my opinion. However the issue of having to put an entire expression on a single line is still an insoluble problem.

I would like nothing better than to hear that I am totally wrong, there is a much better and more Pythonic approach, that I just don't get it, or some such thing.

Any skilled Python programmers who have a better approach that addresses the readability issues I outline?

11 comments:

Anonymous said...

I might be missing something here (not being a functional guru) but with regard to

"...However the issue of having to put an entire expression on a single line is still an insoluble problem."

I dont understand why this cant just be an inner function. I can appreciate that its not quite as clean as you may feel an anon function is, but I would have thought it was cleaner than what you are currently having to do to get it to work.

Steven Kryskalla said...

Anonymous is right, you're separating the function definitions more than necessary, which leads to messy and duplicated code.

Here is how Enumerable looks when using nested functions:

http://lost-theory.org/python/enum2.txt

Much better! But once you program it using inner functions you notice a lot of repetition when wrapping those inner functions (the return Enumerable(lambda: inner(arg)) parts). You can abstract this again using a simple decorator:

http://lost-theory.org/python/enum3.txt

Which looks pretty clean to me.

It does pain me slightly that Python doesn't have true anonymous closures, but in reality I haven't come across a time when I really needed it. Python's lambda is good enough for many cases, and if you need anything longer than a few lines the code should probably go into true function definition... Doesn't bother me too much :)

Jafar Husain said...

Beautiful. I love your use of the decorator (my favorite 2.4 feature). That did not occur to me!

The inner function is definitely an improvement because it's locally scoped but I must still come up with an identifier for every function I use in the scope. I will end up writing inner1, inner2, etc.

I have to say I still find the one-line, one-expression limitation to be troubling. However the right to left readability improves things substantially.

Jordan said...

You can eliminate your iterative zip function by using izip in the itertools module.

PJE said...
This comment has been removed by the author.
PJE said...

Here's an idiomatic rendering of the same functionality, using generator expressions and itertools.

As you can see, lambda more than suffices for this particular application:


from itertools import islice, imap, ifilter, izip as zip

class Enumerable:

    def __init__(self, func):
        self.func = func

    def take(self, count):
        return Enumerable( lambda: islice(self, count) )

    def skip(self, count):
        return Enumerable( lambda: islice(self, count, None) )

    def join(self,stream):
        return Enumerable( lambda:
            ((left, right) for left in self for right in stream)
        )

    def __iter__(self):
        return self.func()

    def select(self, func):
        return Enumerable(lambda: imap(func, self))

    def where(self, func):
        return Enumerable(lambda: ifilter(func, self))

    def to_list(self):
        return list(self)

def iterate(initialValue, func):
    while 1:
        yield initialValue
        initialValue = func(initialValue)

Jafar Husain said...

Thanks Phillip,

This is much better still. Dual definitions will still be required if the function has not been defined in the standard library, but considering that there are simply not that many stream operator functions this doesn't seem to be as big an issue as I thought.

Anonymous said...

"However the issue of having to put an entire expression on a single line is still an insoluble problem."

Why do you have to? You can split it into as man lines as you want...

# Just to have something to split

>>> class ShrinkingList (list):
... def shrink(self): return ShrinkingList(self[1:])

>>> l = ShrinkingList(range(1,10))
>>> l
[1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> l.shrink().shrink().shrink().shrink().shrink()
[6, 7, 8, 9]

>>> l.shrink()\
... .shrink()\
... .shrink()\
... .shrink()\
... .shrink()
[6, 7, 8, 9]

Anonymous said...

no deposit needed - online cd poker rakeback or a unique pokerbonus, Forum sharing Tools, Articles, Internet Poker Bonuses,
free poker money no risk great news. HoldEm Poker, Omaha Poker
trip to the poker island - a great promotion run by pokerroom, $1500 Freeroll tournament to every new poker player on their.
historic mood of this year World Series of Poker, celebrating the 40th. value plus huge monthly freerolls.
poker no quiz need to get bankroll bonuses on the market. We list all the available no ... codes and referral codes to players.
poker without deposit free bonus
Attempted bonus abuse is a regular occurance, highest percentage of your rake back. Many professionals are registered ... so verification is now no quiz.
academic bonus research 200% up to a maximum bonus of $200. Find out more here. code studies, Pacific Poker.
free bankrolls no quiz to pass no quiz.
Tell your friends and earn weekly bonus when they play! New players receive a free bonus upon making their first money spend profits or goddies.
poker money bonus
Thats rigth you do not need to purchase any cash from your wallet to get outstanding poker bankrolls.

Anonymous said...

Women’s nike tn Shox Rivalry est le modèle féminin le plus tendance de baskets pour le sport. tn chaussuresConcernant la semelle :Cheap Brand Jeans ShopMen Jeans - True Religion Jeans nike shoes & Puma Shoes Online- tn nike, le caoutchouc extérieur, l’EVA intermédiaire et le textile intérieur s’associent pour attribuer à la.ed hardy shirts pretty fitCharlestoncheap columbia jackets. turned a pair of double plays to do the trick.Lacoste Polo Shirts, , Burberry Polo Shirts.wholesale Lacoste polo shirts and cheap polo shirtswith great price.Thank you so much!!cheap polo shirts men'ssweate,gillette mach3 razor bladesfor men.As for

Anonymous said...

情趣用品|情趣用品|情趣用品|情趣|情趣用品|情趣

About Me

My photo
I'm a software developer who started programming at age 16 and never saw any reason to stop. I'm working on the Presentation Platform Controls team at Microsoft. My primary interests are functional programming, and Rich Internet Applications.