Pages

05 January 2009

Doing my homework

The mapFilter function

I recently tried to write a function using Scala which would apply a function to a list, then filter all the resulting elements according to a criteria. In my infinite wisdom I named it "mapFilter". Here is an example of what I expected:

mapFilter(List(1, 2), (x:Int) => if (x%2 == 0) Some(x*2) else None)
// returns List(4)


Of course I wanted this function to be as efficient as possible so it had to do the job in one pass. Otherwise, I could just use a combination of "map" and "filter" to do the trick:

def mapFilter[T, U](l: List[T], f: T =>Option[U]): List[U] = {
  l.map(f(_)).filter(_.isDefined).map(_.get)
}

3 passes on the list, we can definitely do better than this!

Using flatMap

Somehow I remembered that the Option class is a Monad, and my subconscious related the bind operation of monads to the flatMap operation on Iterable. I should be able to do something with it,... 

Then I experimented this:

List(1, 2) flatMap { (x:Int) => if (x%2 == 0) Some(x*2) else None }

Bingo, it works!

Now, if you read the signature of the flatMap method (on the Iterable trait), understanding how this works is not so clear:

def flatMap[B](f: A => Iterable[B]): Iterable[B]

Option is not an Iterable, this shouldn't compile! Yes, it does, because any Option can be implicitly converted to an Iterable thanks an implicit conversion:

object Option {
  /** An implicit conversion that converts an option to an iterable value */
  implicit def option2Iterable[A](xo: Option[A]): Iterable[A] = xo.toList
}

Sometimes I wish there was a way to display all the available implicit definitions for a given scope,...

Monads again,...

What about the "Monadic" stuff then? Tony Morris, on the Scala mailing-list, left me a message hinting at a purer solution:
Hi Eric,
You can write mapFilter using Scalaz which has a MonadEmptyPlus data
structure. I suggest you do not use the available subversion of the
List.flatMap type signature from which to learn
So I thought that I should do my homework, have a look at it and understand why MonadEmptyPlus was a good way to write the mapFilter function.

You can find all the code from the scalaz project here.  The package I'm exploring today is the "control" package. 

A bit of Scalaz exploration

Let's jump right to the MonadEmptyPlus:

trait MonadEmptyPlus[M[_]] extends MonadPlus[M] with MonadEmpty[M]

Ohhh. We are high on abstraction here, because MonadPlus is:

trait MonadPlus[M[_]] extends Monad[M] with Plus[M]

And MonadEmpty is, as you can guess:

trait MonadEmpty[M[_]] extends Monad[M] with Empty[M]

Last but not least, Monad is:

trait Monad[M[_]] extends Pure[M] with Bind[M] with Functor[M] with Apply[M]

Honestly, I didn't suspect that my homework would take me so long! Yet, since every concept and notion of what is a Monad seems to be carefully crafted in Scalaz, this is a good idea to spend a bit of time trying to understand each of them.

Pure

The Pure trait is defined by:

/**
 * Project the given value into the environment that is abstracted over. 
 */
def pure[A](a: A): P[A]

It actually says that you can take any value of some type A, and using the "pure" function, put it in a "Container" of type P, called the "environment" here. I guess that the idea is to say that once the value is well protected and controlled in its container, it is "pure". That being said, this is still very abstract. Fortunately we have some implicit values in the the Pure object giving us some examples of what it means to be "pure":

implicit val OptionPure = new Pure[Option] {
  def pure[A](a: A) = Some(a)
}
implicit val ListPure = new Pure[List] {
  def pure[A](a: A) = List(a)
}

A Pure[Option] of the value a is simply the value "a" inside the "Some" container. So if you think of Monads as "Containers" for computation then "pure" is what creates the container with a value inside.

Bind

Bind is the evil twin of Pure and, as far as I'm concerned, the essence of monads because it is the basis of computing with Monads:

/**
 * Binds a function through an environment (known also as sequencing).
 */
trait Bind[BD[_]] {
  /**
   * Binds the given value with the given value through the environment.
   */
  def bind[A, B](f: A => BD[B], ba: BD[A]): BD[B]
}

Every word is important here. You have an "environment" again. This "environment" or "Container" contains values which have been put there using the pure function for example. Now, what you want to do is to apply a function to the value inside the container, without the value leaving it. So you "bind" a function to the environment. Again, concrete examples, given by the implicit values in the Bind object will help us understand what's going on:

implicit val OptionBind = new Bind[Option] {
  def bind[A, B](f: A => Option[B], a: Option[A]) = a flatMap f
}
implicit val ListBind = new Bind[List] {
  def bind[A, B](f: A => List[B], a: List[A]) = a flatMap f
}

When I bind a function to an Option, like Some(4), it is as if I'm doing:

val f = x => Some(x + 2)
bind(f, Some(4))

1. apply the function to the inner element

Some(f(4)) === Some(Some(6))

2. "flatten" the inner result by removing the inner "box"

Some(6)

This brings 2 remarks here:

1. "bind" for a monad is indeed the "flatMap" function of an Iterable in Scala. It maps the value then flatten the results. The big difference is in the function signature. While flatMap is a method on Iterable and accepts a function returning another Iterable, the bind function accepts a "Container" type of any sort and returns the same container type; Binding to an Option will return an Option, binding to a List will return a List ("I suggest you do not use the available subversion of the List.flatMap type signature from which to learn"...)

Functor

2. why is it necessary to have a "bind" operation? If I want to get Some(6) as a result, something like "map" should be enough. True. And this even has a name, this is the "fmap" operation of the Functor trait (also mixed in by Monad):

trait Functor[F[_]] {
 /**
  * Maps the given function across the given environment.
  */
  def fmap[A, B](f: A => B, fa: F[A]): F[B]
}

The "fmap" operation just means transforming the value(s) inside the Container to something else. But the bind operation offers an additional advantage. It allows to compose functions returning values in the Container. I can't compose directly 2 functions returning an Option:

val f = (x:Int) => if (x % 2 == 0) Some(x) else None
val g = (x:Int) => if (x % 3 == 0) Some(x) else None

// this doesn't work since g returns an Option and f expects an Int
val composed = f . g // ??

But I can compose them using "bind":

// this works as expected
val composed: (Int => Option[Int]) = (x:Int) => g(x).bind(f)

Apply

The definition of Apply is this one:

trait Apply[AP[_]] {
  def apply[A, B](f: AP[A => B], fa: AP[A]): AP[B]
}

And a good example is provided for Lists:

implicit val ListApply = new Apply[List] {
  def apply[A, B](f: List[A => B], a: List[A]) = f flatMap (f => a map(f(_)))
}

We take a list of functions, a list of values, and we return a list with the results of all the functions applied to all the values. Actually this is the essence of the "Applicative" model of programming, so I don't really understand why it's been added to the Monad trait. I guess it is because monad computations imply applicative computations as described here.

Empty and Plus

Now that the tour of Monad is over, we can look at the Empty and Plus traits.

Empty is very easy, it just returns an empty "Container" (named "Environment" here):

trait Empty[E[_]] {
  /**
   * Returns the empty environment.
   */
  def empty[A]: E[A]
}

implicit val OptionEmpty = new Empty[Option] {
  def empty[A] = None
}

implicit val ListEmpty = new Empty[List] {
  def empty[A] = Nil
}

Plus defines what it means to "append" or "merge" two environments together. Its result is very specific to the underlying Container type as you can see with the implicit values:

trait Plus[P[_]] {
  /**
   * Appends the two given values.
   */
  def plus[A](a1: => P[A], a2: => P[A]): P[A]
}

implicit val OptionPlus = new Plus[Option] {
  def plus[A](a1: => Option[A], a2: => Option[A]) = a1 orElse a2
}

implicit val ListPlus = new Plus[List] {
  def plus[A](a1: => List[A], a2: => List[A]) = a1 ::: a2
}

Empty and Plus are indeed defining some kind of addition operation on Monads, aren't they?

The real academic mapFilter function ;-)

With all that at hand I should now be able to create my mapFilter function ;-)

import scalaz.control._
def mapFilter[T, A[_], U](f: T => A[U], m: A[T])(implicit m1: MonadEmptyPlus[A])= {
  m1.bind(f, m)
}

Is that all? Yes, let's try it:

// Note that Scala type inferencer doesn't infer the types here
mapFilter[Int, List, Int]((x:Int) => if (x%2 == 0) List(x*2) else Nil,
                          List(1, 2))

However we may wonder: is that really necessary to have a MonadEmptyPlus? No, a simple Monad is also ok:

def mapFilter[T, A[_], U](f: T => A[U], m: A[T])(implicit m1: Monad[A]) = { 
  m1.bind(f, m) 
}

And that's understable with the List example because we're using "flatMap" under the covers! But this is not totally satisfying. Somehow, I would like to enforce that if f returns the empty element for a given MonadEmptyPlus, then that element is filtered from the result. 

Even better with FoldRight

We can actually define this using another abstraction from the control package: FoldRight.

trait FoldRight[F[_]] {
  /**
   * Fold the given function across the given environment.
   */
  def foldRight[A, B](t: F[A], b: B, f: (A, => B) => B): B
}

FoldRight works exactly as the foldRight method on Iterable in Scala and allows us to use the Empty and Plus definition from MonadEmptyPlus, accumulating the mapped values, unless they're empty:

def mapFilter[T, A[_], U](m: A[T], f: T => A[U])
                         (implicit m1: MonadEmptyPlus[A], f1: FoldRight[A]) = {
  f1.foldRight[T, A[U]](m, m1.empty, (t, result) => m1.plus(f(t), result))
}

And that's the end of the assignement for today!

Conclusion

This exploration of Scalaz was very interesting for me because:
  1. The abstractions are nicely separated
  2. Implicit values provide lots of concrete examples
  3. The programming style is very close to the use of type classes in Haskell (which wraps my mind in a new way). The drawback is less type inference by Scala
  4. I had to think again about that "bind" function and why it was so special
  5. It gives me the desire to understand more deeply what are the differences between the so-called "models of programming": functor, monads, applicative, arrows,...
  6. I haven't been blogging for a looooonnnnng time.

7 comments:

how.gauche said...

Of course I wanted this function to be as efficient as possible so it had to do the job in one pass. Otherwise, I could just use a combination of "map" and "filter" to do the trick:

def mapFilter[T, U](l: List[T], f: T =>Option[U]): List[U] = {
l.map(f(_)).filter(_.isDefined).map(_.get)
}

3 passes on the list, we can definitely do better than this!


Change "l" to "l.toStream" above and you should have a one-pass mapFilter. Voila!

Kris Nuttycombe said...

Am I missing something here? The type signature of f in the following example doesn't seem to be correct:

When I bind a function to an Option, like Some(4), it is as if I'm doing:

val f = x => x + 2
bind(f, Some(4))

Shouldn't it be this?

val f = x => Some(x + 2)

Eric said...

@how.gauche

You're right! This changes the type of the return value of the mapFilter function but I should definitely add more laziness to my tool belt.

@Kris

Thanks Kris, I corrected that mistake.

bblfish said...

Can't find MonadPlus or MonadEmptyPlus when searching scalaz. Was that something you made up, or did I not find right? From the text at the top it sounds like it was something in the scalaz library...

Eric said...

I think that this all has been refactored into the Monoid class, where Empty is Zero and Plus is Semigroup. So I guess that Monad would become MonadMonoid (but I can't find anything like that either).

The "old" names come from the Haskell way of doing things: http://en.wikibooks.org/wiki/Haskell/MonadPlus, but the refactoring is better since it reuses the Monoid typeclass.

bblfish said...

yes, I had not realised how old this post was. :-) I came to it while reading your recent post "The Essence of the Iterator Pattern", somehow...

It looks like there have been a lot of changes since 2009. It is quite interesting to see those. I noticed for example that there is no ListApply either anymore.

Eric said...

Yes, Scalaz is still evolving. And ListApply is back as ListApplic in Scalaz 7!