Pages

12 October 2011

Counting words - part 2

In this small post, I'm going to show how I used Functional Reactive Programming (FRP) to improve the GUI part of the small application presented in the previous post.

The best article to read on the subject is Deprecating the Observer Pattern (DOP). This article explains the drawbacks of using a listener-based approach to component communication in a user interface and proposes FRP as an alternative.

Be reactive

What I've been using is a simplified yet powerful version of the code described in DOP, using the Reactive library by Naftoli Gugenheim. In this library you have 2 essential concepts: EventStreams and Signals:

  • EventStream[T] is a collection of values of type T happening at certain moments in time
  • Signal[T] is a value of type T which can change from time to time

They are actually 2 sides of the same coin (more or less) as explained in this video: from an EventStream you can get a Signal by holding the last emitted value and from a Signal you can get an EventStream by getting all the changes.

The great thing about an EventStream (let's focus on that one for now) is that you can manipulate it like a collection of objects:

  • you can filter it to get only the events you're interested in
  • you can map the events to other events
  • you can flatMap it with a function returning another EventStream, and so on
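These combinators can be sketched with a toy, push-based implementation. This is NOT the reactive library's actual code (its real EventStream handles listener lifecycles, thread-safety, etc.); it is just a minimal model of the ideas above, including hold to get a Signal from a stream:

```scala
// A toy EventStream: a sketch of the combinators, not the reactive library's API
class EventStream[T] {
  private var listeners: List[T => Unit] = Nil

  // register a side-effecting sink
  def foreach(f: T => Unit): Unit = listeners = f :: listeners

  // push a new event to all registered listeners
  def fire(t: T): Unit = listeners.foreach(_.apply(t))

  def map[U](f: T => U): EventStream[U] = {
    val out = new EventStream[U]
    foreach(t => out.fire(f(t)))
    out
  }

  def filter(p: T => Boolean): EventStream[T] = {
    val out = new EventStream[T]
    foreach(t => if (p(t)) out.fire(t))
    out
  }

  def flatMap[U](f: T => EventStream[U]): EventStream[U] = {
    val out = new EventStream[U]
    foreach(t => f(t).foreach(u => out.fire(u)))
    out
  }

  // hold turns the stream into a Signal keeping the last emitted value
  def hold(init: T): Signal[T] = new Signal(init, this)
}

class Signal[T](init: T, changes: EventStream[T]) {
  private var current = init
  changes.foreach(t => current = t)
  def now: T = current
}

// usage: only even numbers, doubled, reach the sink
val ints = new EventStream[Int]
val evensDoubled = ints.filter(_ % 2 == 0).map(_ * 2)
var seen: List[Int] = Nil
evensDoubled.foreach(v => seen = v :: seen)
List(1, 2, 3, 4).foreach(ints.fire)
// seen is now List(8, 4)
```

The point of the sketch is that routing logic lives in pure-looking combinator pipelines; the mutation is confined to the stream plumbing itself.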

But how is this helpful for GUI programming?

With GUI components

The hard reality of Swing GUI components is that they are deeply mutable at heart. Once you compose a GUI with components (say a Frame) containing other components (say a TextField), then, whenever anything happens in your application, you mutate the innermost components heavily (by changing the text color for example). When you add publish/subscribe mechanisms on top of that, you add even more developer-managed mutability since you need to add and remove listeners across the whole graph of components. It is also not very easy to understand how events "flow" between components.

The way I see the usage of FRP with GUIs is:

  • components are seen as either creating values to propagate (like a button action) or consuming values (like a text field change with the new value). Hence they play the role of an EventStream source or an EventStream sink (sometimes both)

  • those components explicitly declare the abstract type of streamed events they're expecting. For a given TextField this might be something like NewSelectedFile, whether the selection comes from a FileChooser or from a simple TextField

  • the event streams can be merged, filtered, mapped functionally, with no side-effect so that the logic of events routing is very composable and explicit

Let's see that more precisely in the context of my simple application.

An example

In my WordCount application, the first thing the user does is to select a file to count. Then she is supposed to click on the "Count" button to start the count before the results are displayed. In terms of graphical components I have:

    val openFileMenuItem = OpenFileMenuItem("./src/test/resources")
    val countAction      = CountAction()

What you don't see above is that those 2 custom GUI components have been declared as extending EventStream[T] (simplified code for the explanation):

    OpenFileMenuItem ... with EventStream[File]
    CountAction      ... with EventStream[Unit]

This means that when you open a file using the OpenFileMenuItem you're providing new File events which other components can react to, and when you invoke the CountAction you, well,... you just pressed the button. There's no meaningful value to convey, so the Unit type is appropriate here (the clients just want to know that something happened).

Then we can compose those 2 EventStreams:

  val filePath: Signal[String]  = openFileMenuItem.map(_.getPath).hold("")
  val doCount: EventStream[Any] = filePath.distinct.change | countAction

First we do a bit of mapping, because we just need the file path as a Signal[String] (using hold to transform the stream into a signal, with an empty initial value).
Then we declare that we need to do a count whenever there's a distinct change in the file path value or (|) the user presses the countAction button.

How do we "consume" this doCount stream of events? We flatMap it to another stream of events providing the results of the counting:

 val results: EventStream[Results] =
   doCount flatMap { doIt => WordCounter.count(filePath.now).inBackground }

For each doCount event we flatMap it (actually we ignore the event value,...) and we use the current value of the filePath to count the number of words.

The expression computeValue.inBackground computes a value on a separate thread, so that long computations are not done on the event dispatch thread (which might cause grey screens), and publishes the result back with SwingUtilities.invokeLater. The inBackground method returns an EventStream[T] to signal consumers that the value is ready.

Since the result of counting is an EventStream[Results] I can then "plug" it into the GUI component doing the display:

  val resultsPanel = ResultsPanel(results)

And that's it. The ResultsPanel component doesn't care where the values come from or who created them. It is also interesting to see how the ResultsPanel declares its sub-components:

  object ResultsPanel {
    def apply(results: EventStream[Results]) = {
      new ResultsPanel(TotalWordsNumbers(results),
                       ReferencesTable(results.map(_.references)),
                       ErrorMessageBox(results.map(_.message)))
    }
  }

There are 3 sub-components and they use different parts of the Results event stream, so we use the map function to restrict the stream to exactly what is needed:

  • TotalWordsNumbers uses the full Results object to display the total words count (the one we really want), the references word count and the full text word count (to check that the count is ok)

  • ReferencesTable just needs the references

  • ErrorMessageBox needs the error message so we just map that

Finally, how is the event stream used in the component itself? Let's look at the ErrorMessageBox component:

  case class ErrorMessageBox(message: EventStream[String]) extends TextField with Observing {
    foreground = Color.red
    font = new Font("Courier", Font.PLAIN, 15);
    editable = false

    message.foreach { m => text = m }
  }

The important line is the last one where we declare that for each message event, we change the text attribute of the TextField to the new value m.

Some implementation notes

If you read the actual code, you'll find quite a few differences between what's presented here and the real implementation:

  • the inBackground mechanism is enhanced with an additional actionProgress event stream which fires events before and after the action to execute. This is used to change the main frame cursor to a wait cursor when the computation takes some time

  • I found it useful to introduce a trait called Trigger for EventStream[Unit]

  • I also added an EventStreamSourceProxy[T] extends EventStream[T] trait which can be mixed into any GUI class. This trait uses an internal EventSource[T], which is the implementation of the EventStream[T] and can be used to fire events. For example:

    // When the action is executed (a file is selected) we fire an event and
    // the whole `OpenFileMenuItem` component acts as an `EventStream[File]`.
    case class OpenFileMenuItem(start: String) extends MenuItem("Open") with EventStreamSourceProxy[File] {
      ...
      action = new Action("Open") {
        def apply() {
          ...
          source.fire(fileChooser.selectedFile)
        }
      }
    }
    
  • I had a hard time debugging the Observing trait. If you place it on an object that's going to be garbage collected, your components will not be notified of new events. This happened to me because I placed it on a class used for an implicit conversion, in order to get a new shinyMethod (tm):

    implicit def toComponent(a: A): ImplicitComponent = new ImplicitComponent(a)
    class ImplicitComponent(a: A) extends Observing {
      def shinyMethod = a
    }
    

Features ideas

This is something which amazed me: just having to think about event streams made me rethink the application's functionality. How? When I started thinking about what would trigger a word count I realized that:

  • there is no reason why the user should have to click the "Count" button when the file is selected, we can do the count and display the results right away

  • thinking about event streams as a flow of values made me realize that the file can be polled regularly for changes and the results displayed whenever there's a change

Changing my program to incorporate those 2 ideas was soooo easy:

  1. the first idea is reflected by the doCount event stream definition given above: we just say that we want to count whenever there's a file selection or a count action

    val doCount: EventStream[Any] = filePath.distinct.change | countAction
    
  2. creating a file poller is easy using the Timer class from the reactive library

    class FilePoller(path: Signal[String], delay: Long = 500) extends Trigger {
    
      private var previousLastModified = new File(path.now).lastModified()
    
      val timer = new Timer(0, delay, {t => false}) foreach { tick =>
        def newLastModified = new File(path.now).lastModified()
        if (newLastModified > previousLastModified || newLastModified == 0) {
          previousLastModified = newLastModified
          source.fire(())
        }
      }
    }
    
    The `FilePoller` checks the `path` signal regularly. If the underlying file has been modified, a notification event is fired.
    
  3. then the final version of doCount becomes:

    lazy val filePoller                = new FilePoller(filePath)
    lazy val doCount: EventStream[Any] = filePath.distinct.change | countAction | filePoller
    

I don't know about you but I find it really nice that the abstractions in my implementation give me hints about what the application could do! For me this is a good sign that FRP is really well-suited for the job of GUI programming.

11 October 2011

Counting words

In this 3-part post I want to show:

  1. how standard, out-of-the-box, Scala helped me to code a small application
  2. how Functional Reactive Programming brings a real improvement on how the GUI is built
  3. how to replace Parser Combinators with the Parboiled library to enhance error reporting

You can access the application project via github.

I can do that in 2 hours!

That's more or less what I told my wife as she was explaining one small problem she had. My wife is studying psychology and she has lots of essays to write, week after week. One small burden she's facing is keeping track of the number of words she writes because each essay must fit in a specific number of words, say 4000 +/- 10%. The difficulty is that quotations and references must not be counted. So she cannot check the file properties in Word or Open Office and she has to keep track manually.

For example, she may write: "... as suggested by the Interpretation of Dreams (Freud, 1905, p.145) ...". The reference "(Freud, 1905, p.145)" must not be counted. Or, "Margaret Malher wrote: "if the infant has an optimal experience of the symbiotic union with the mother, then the infant can make a smooth psychological differentiation from the mother to a further psychological expansion beyond the symbiotic state." (Malher cited in St. Clair, 2004, p.92)" (good luck with that :-)). In that case the quotation is not counted either and we must only count 3 words.
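The counting rule can be sketched in a few lines of Scala. This is a hypothetical regex-based approximation (the `countWords` helper is mine, far cruder than the real parser built in this post): drop quotations and parenthesised references, then count what remains.

```scala
// Hypothetical sketch of the counting rule, NOT the application's parser:
// remove quotations ("...") and parenthesised references (...) before counting.
def countWords(text: String): Int = {
  val noQuotes = text.replaceAll("\"[^\"]*\"", " ")   // strip "..." quotations
  val noRefs   = noQuotes.replaceAll("\\([^)]*\\)", " ") // strip (...) references
  noRefs.split("\\s+").count(_.nonEmpty)
}

val sample = """Margaret Malher wrote: "a quotation here" (Malher cited in St. Clair, 2004, p.92)"""
// Only the 3 words outside the quotation and the reference are counted
countWords(sample) // 3
```

A regex pass like this breaks down on nested parentheses or unbalanced quotes, which is precisely why the real application uses parser combinators instead.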

Since this counting is a bit tedious and has to be adjusted each time she does a revision of her essay, I proposed to automate this check. I thought "a few lines of Parser Combinators should be able to do the trick, 2 hours max". It actually took me a bit more (tm) to:

  • write a parser robust enough to accommodate all sorts of variations and irregularities. For example, pages can be written as "p.154" or "pp.154-155", years can also be written "[1905] 1962" where 1905 is the first edition, and so on

  • use scala-swing to display the results: number of words, references table, file selection

  • write readers to extract the text from .docx or .odt files

Let's see now how Scala helped me with those 3 tasks.

Parsing the text

The idea behind parser combinators is very powerful. Instead of building a monolithic parser with lots of sub-routines and error-prone tracking of character indices, you describe the grammar of the text to parse by combining smaller parsers in many different ways.

In Scala, to do this, you need to extend one of the Parsers traits. The one I've chosen is RegexParsers, a parser trait well suited for unstructured text. If you were to parse something more akin to a programming language you might prefer StdTokenParsers, which already defines keywords, numeric/string literals, identifiers,...

I'm now just going to comment on a few points regarding the TextParsing trait which is parsing the essay text. If you want to understand how parser combinators work in detail, please read the excellent blog post by Daniel Spiewak: The Magic behind Parser Combinators.

The main definition for this parser is:

  def referencedText: Parser[Results] =
    rep((references | noRefParenthesised | quotation | words | space) <~ opt(punctuation)) ^^ reduceResults

This means that I expect the text to be:

  • a repetition (the rep combinator) of a parser

  • the repeated parser is an alternation (written |) of references, parenthesised text, quotations, words or spaces. For each of these "blocks" I'm able to count the number of words. For example, a reference will count as 0 and parenthesised text will count as the number of words between the parentheses

  • there can be a following punctuation sign (optional, using the opt combinator), but we don't really care about it, so it can be discarded (hence the <~ combinator, which discards the result of the parser on its right, instead of ~ which sequences 2 parsers and keeps both results)
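To see what rep, |, opt and <~ do under the hood, here is a minimal hand-rolled set of combinators. Every name here (Parser, regex, rep, opt, word, space, punctuation, text) is my own toy construction, not Scala's RegexParsers; a parser is just a function from input to an optional result plus remaining input.

```scala
// Minimal hand-rolled parser combinators illustrating rep, |, opt and <~
// (a sketch of the idea, not the standard RegexParsers trait)
case class Parser[A](run: String => Option[(A, String)]) {
  // alternation: try this parser, fall back to the other
  def |(other: => Parser[A]): Parser[A] =
    Parser(s => run(s).orElse(other.run(s)))
  // sequencing: keep both results
  def ~[B](other: => Parser[B]): Parser[(A, B)] =
    Parser(s => run(s).flatMap { case (a, rest) =>
      other.run(rest).map { case (b, r2) => ((a, b), r2) } })
  // sequencing: discard the result on the right
  def <~[B](other: => Parser[B]): Parser[A] =
    Parser(s => (this ~ other).run(s).map { case ((a, _), rest) => (a, rest) })
  def map[B](f: A => B): Parser[B] =
    Parser(s => run(s).map { case (a, rest) => (f(a), rest) })
}

// build a parser from a regular expression, matching at the start of the input
def regex(r: String): Parser[String] = Parser { s =>
  ("^(?:" + r + ")").r.findFirstIn(s).map(found => (found, s.drop(found.length)))
}
// opt: succeed with None instead of failing
def opt[A](p: Parser[A]): Parser[Option[A]] =
  p.map(Option(_)) | Parser(s => Some((None, s)))
// rep: apply p as many times as possible, collecting the results
def rep[A](p: Parser[A]): Parser[List[A]] = Parser { s =>
  p.run(s) match {
    case Some((a, rest)) => rep(p).run(rest).map { case (as, r2) => (a :: as, r2) }
    case None            => Some((Nil, s))
  }
}

// the same shape as referencedText: repeated blocks with optional punctuation
val word        = regex("[a-zA-Z]+")
val space       = regex("\\s+")
val punctuation = regex("[.,;!?]")
val text        = rep((word | space) <~ opt(punctuation))
```

Running `text.run("hello, world!")` consumes the whole input and returns the list of matched blocks, with the punctuation silently dropped by <~.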

Then I have a function called reduceResults taking the result of the parsing of each repeated parser, to create the final Result, which is a case class providing:

  • the number of counted words
  • the references in the text
  • the quotations in the text

Using the RegexParsers trait is very convenient. For example, if I want to specify how to parse "Pages" in a reference like (Freud, 1905, p.154), I can sequence 2 parsers built from regular expressions:

  val page = "p\\.*\\s*".r ~ "\\d+".r

  • appending .r to a string returns a regular expression (of type Regex)
  • there is an implicit conversion method in the RegexParsers trait, called regex, from a Regex to a Parser[String]
  • I can sequence 2 parsers using the ~ operator

The page parser above can recognize page numbers like p.134 or p.1, but it will also accept p134. You can argue that this is not very well formatted, and my wife will agree with you. However she certainly doesn't want the count of words to be wrong, or the parsing to fail, just because she forgot a dot! The plan here is to display what was parsed so that she can then fix any references not written according to academic standards. We'll see, in part 3 of this series, how we can use another parsing library to manage those errors without breaking the parsing.
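This leniency is easy to check by collapsing the two regexes above into a single plain Regex (a sketch outside the parser; the `accepts` helper is hypothetical):

```scala
// The two sequenced regexes "p\\.*\\s*".r ~ "\\d+".r, collapsed into one pattern:
// 'p', zero or more dots, optional whitespace, then digits
val page = "p\\.*\\s*\\d+".r

def accepts(s: String): Boolean = page.pattern.matcher(s).matches

accepts("p.134")  // true
accepts("p134")   // true: the missing dot is tolerated
accepts("q134")   // false
```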

One more important thing to mention about the use of the RegexParsers trait is the skipWhitespace method. If it returns true (the default), any regex parser will discard space before any string matching a regular expression. This is convenient most of the time but not here where I need to preserve spaces to be able to count words accurately.

To finish with the subject of Parsing you can have a look at the TextParsingSpec specification. This specification features a ParserMatchers trait to help with testing your parsers. It also uses the Auto-Examples feature of specs2 to use the text of the example directly as a description:

  "Pages"                                                                           ^
  { page must succeedOn("p.33") }                                                   ^
  { page must succeedOn("p33") }                                                    ^
  { pages must succeedOn("pp.215-324") }                                            ^
  { pages must succeedOn("pp.215/324") }                                            ^
  { pages("pp. 215/324") must haveSuccessResult(===(Pages("pp. 215/324"))) }        ^

Displaying the results

The next big chunk of this application is a Swing GUI. The Scala standard distribution provides a scala-swing library adding some syntactic sugar on top of regular Swing components. If you want to read more about Scala and Swing you can have a look at this presentation.

The main components of my application are:

  • a menu bar with 2 buttons to: select a file, do the count
  • a field to display the currently selected file
  • a results panel showing: the number of counted words and the document references

[Screenshot: the wordcount application showing a count example. Note that the parsing is not perfect since the word counts do not add up!]

If you have a look at the code you will see that this translates to:

  • an instance of SimpleSwingApplication defining a top method and including all the embedded components: a menu bar, a panel with the selected file and results
  • the subcomponents themselves: the menu items, the count action, the results panel
  • the "reactions" which is a PartialFunction listening to the events created by some components, SelectionChanged for example, and triggering the count or displaying the results

I was pretty happy to see that much of the verbosity of Swing programming is reduced with Scala:

  • you don't need to create xxxListeners for everything
  • there are components providing both a Panel and a LayoutManager with the appropriate syntax to display the components: FlowPanel, BoxPanel, BorderPanel
  • thanks to scala syntax you can write action = new Action(...) instead of setAction(new Action(...))

This is nice but I think that there is some potential for pushing this approach further and creating more reusable out-of-the-box components. For example, I've created an OpenFileMenuItem which is a MenuItem with an Action to open a FileChooser. Also, something like a pervasive LabeledField, with just a label and some text, would be very useful to have in a standard library.

I also added a bit of syntactic sugar to have actions executed on a worker thread instead of the event dispatch thread (to avoid grey screens), with SwingUtilities.invokeLater used to get back to the event dispatch thread. For example: myAction.inBackground will be executed on a separate thread.

Eventually, I was able to code up the GUI of the application pretty fast. The only thing which I didn't really like was the Publish/React pattern. It felt a bit messy. The next part of this series will show how Functional Reactive Programming with the reactive library helped me write cleaner code.

Reading the file

I anticipated this part to be a tad difficult. My first experiments of text parsing were using a simple text file and I knew that having the user (my wife, remember,...) copy and paste her text to another file just for counting would be a deal-breaker. So I tried to read .odt and .docx files directly. This was actually much easier than anything I expected!

Both formats are zipped xml files. Getting the content of those files is just a matter of:

  • reading the ZipFile entries to find the file containing the text

    val rootzip = new ZipFile(doc.path)
    rootzip.entries.find(_.getName.equals("word/document.xml"))
    
  • loading the xml as a NodeSeq

    XML.load(rootzip.getInputStream(f))
    
  • finding the nodes containing the actual text of the document and transforming them to text

    // for a Word document text is under <p><t> tags
    (xml \\ "p") map (p => (p \\ "t").map(_.text) mkString "") mkString "\n"
    

For further details you can read the code here.

Recap

That's it, parser combinators + scala-swing + xml = a quick app solving a real-world problem. In the next posts we'll try to make this even better!

24 June 2011

The Essence of the Iterator Pattern

"The Essence of the Iterator Pattern" (EIP) is the paper I liked the most last year. It gave me a brand new look at something which I had been using for years: the for loop.

In this post I'll try to present some ideas of that paper and show how to implement the solution described by the authors using Scalaz-like code. A minimum previous exposure to functional programming, functors and monads will definitely help!

What's in a for loop?

That was really what hooked me. What do you mean, "what's in a for loop"? Is there anything magic in that construct I've been using for years?

The introduction of EIP shows an example of a for loop to iterate on elements (not the C-like for used with an index). I'm transmogrifying it here to Scala but the idea remains the same:

  val basket: Basket[Fruit] = Basket(orange, apple)
  var count = 0

  val juices = Basket[Juice]()
  for (fruit <- basket) {
    count = count + 1
    juices.add(fruit.press)
  }

We start from a "container" of fruits: Basket. It could actually be anything, a List, a Tree, a Map... Then the for loop actually does 3 things:

  1. it returns a container having the same "shape"; juices is still a Basket
  2. it accumulates some kind of measure. Here, the number of fruits in the count variable
  3. it maps the elements to other elements: pressing the fruits to get some juice
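The three aspects can be teased apart without any mutation. In this sketch I model fruits as plain Strings (Basket, Fruit and press are not defined in runnable form in the post): the shape-preserving mapping becomes a map, and the accumulated measure becomes a fold.

```scala
// The three aspects of the loop, without mutable variables.
// Fruits are modelled as plain Strings for the sketch.
val basket = List("orange", "apple")

def press(fruit: String): String = fruit + " juice"

val juices = basket.map(press)                    // 1 & 3: same shape, mapped elements
val count  = basket.foldLeft(0)((n, _) => n + 1)  // 2: the accumulated measure
```

The question EIP answers is how to do the map and the fold in one traversal, and how to abstract over both with a single operation.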

And this for loop is actually not the most complex:

  • the count variable could influence the mapping of elements: juices.add(fruit.press(harder=count))
  • we could have several variables depending on each other: cumulative = cumulative + count
  • the mapping could also influence a "measure" variable: liquid = liquid + fruit.press.quantity

The purpose of EIP is to show that the "essence" of what happens in the for loop above can be abstracted by an Applicative traversal. And the authors go on to show that, given this Applicative abstraction, we get incredible modularity for programming.

The Applicative typeclass

How can an Applicative traversal be better than a for loop, and what does that even mean?? EIP has a lot of sentences and expressions which can be hard to grasp if you don't have a strong functional programming / Haskell background. Let's try to dissect that slowly and start with the formal definitions anyway.

What is a Functor?

The first thing we need to talk about is the Functor typeclass:

  trait Functor[F[_]] {
    def fmap[A, B](f: A => B): F[A] => F[B]
  }

One way of interpreting a Functor is to describe it as a computation of values of type A. For example List[A] is a computation returning several values of type A (non-deterministic computation), Option[A] is for computations that you may or may not have, Future[A] is a computation of a value of type A that you will get later, and so on. Another way of picturing it is as some kind of "container" for values of type A.

Saying that those computations are Functors essentially means that we have a very useful way of combining them with regular functions: we can apply a function to the value being computed. Given a value F[A] and a function f, we can apply that function to the value with fmap. For example, fmap is a simple map for a List or an Option.
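As a concrete instance, here is the Functor trait above implemented for Option (a self-contained sketch; Scalaz provides this instance for you):

```scala
// The Functor typeclass from above, instantiated for Option
trait Functor[F[_]] {
  def fmap[A, B](f: A => B): F[A] => F[B]
}

object OptionFunctor extends Functor[Option] {
  // fmap is just Option's map
  def fmap[A, B](f: A => B): Option[A] => Option[B] = _.map(f)
}

OptionFunctor.fmap((i: Int) => i + 1).apply(Option(41)) // Some(42)
```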

Pointed Functor

By the way, how do you even create a value of type F[A]? One way to do that is to say that F[_] is Pointed:

  trait Pointed[F[_]] {
    def point[A](a: => A): F[A]
  }

That is, there is a point function taking a (call by name) value of type A and returning a F[A]. For example, a regular List is Pointed just by using the constructor for Lists:

  object PointedList extends Pointed[List] {
    def point[A](a: => A) = List(a)
  }

Then combining the 2 capabilities, pointed and functor, gives you a PointedFunctor:

  trait PointedFunctor[F[_]] {
    val functor: Functor[F]
    val pointed: Pointed[F]

    def point[A](a: => A): F[A] = pointed.point(a)
    def fmap[A, B](f: A => B): F[A] => F[B] = functor.fmap(f)
  }

The PointedFunctor trait is merely the aggregation of a Pointed and a Functor.

What about Applicative then? We're getting to it, the last missing piece is Applic.

Applic

Applic is another way to combine a "container" with a function.

Instead of using fmap to apply the function to the computed value we suppose that the function is itself a computed value inside the container F (F[A => B]) and we provide a method applic to apply that function to a value F[A]:

  trait Applic[F[_]] {
    def applic[A, B](f: F[A => B]): F[A] => F[B]
  }

Let's take an example. Say I have a way to compute the price of a Fruit when the market is open:

  def pricer(market: Market): Option[Fruit => Double]

If the market is closed, pricer returns None, because we don't know what the prices are. Otherwise it returns a pricing function. Now if I have a grow function possibly returning a Fruit:

  def grow: Option[Fruit]

Then, using the Applic instance, you can price the Fruit:

  val price: Option[Double] = applic(pricer(market)).apply(grow)

The price will necessarily be an Option because you may have neither a pricer nor a Fruit to price. And a bit of renaming and pimping reveals why we're using the term "Applicative":

  val pricingFunction = pricer(market)
  val fruit = grow

  val price: Option[Double] = pricingFunction ⊛ fruit

In a way we're just doing a normal function application, but we're just doing it inside the Applicative container. Now we have all the pieces to build the Applicative functor that EIP is talking about.
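Before moving on, the pricer example can be assembled end-to-end for Option. This is a self-contained sketch: I reduce Market to a Boolean flag and hard-code a price, just to exercise the Applic instance.

```scala
// The Applic typeclass and its Option instance
trait Applic[F[_]] {
  def applic[A, B](f: F[A => B]): F[A] => F[B]
}

object OptionApplic extends Applic[Option] {
  // apply the function inside the Option, if both function and value are present
  def applic[A, B](f: Option[A => B]): Option[A] => Option[B] =
    oa => f.flatMap(func => oa.map(func))
}

type Fruit = String

// Market reduced to a Boolean flag for the sketch
def pricer(marketOpen: Boolean): Option[Fruit => Double] =
  if (marketOpen) Some((_: Fruit) => 2.0) else None

def grow: Option[Fruit] = Some("apple")

val price   = OptionApplic.applic(pricer(marketOpen = true))(grow)  // Some(2.0)
val noPrice = OptionApplic.applic(pricer(marketOpen = false))(grow) // None
```

If either side is None (market closed, or no fruit), the whole application is None, with no explicit null-checking anywhere.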

Applicative Functor

An Applicative Functor is the aggregation of an Applic and a PointedFunctor:

  trait Applicative[F[_]] {
    val pointedFunctor: PointedFunctor[F]
    val applic: Applic[F]

    def functor: Functor[F] = new Functor[F] {
      def fmap[A, B](f: A => B) = pointedFunctor fmap f
    }
    def pointed: Pointed[F] = new Pointed[F] {
      def point[A](a: => A) = pointedFunctor point a
    }

    def fmap[A, B](f: A => B): F[A] => F[B] = functor.fmap(f)
    def point[A](a: => A): F[A] = pointed.point(a)
    def apply[A, B](f: F[A => B]): F[A] => F[B] = applic.applic(f)
  }

Let's see how that can be implemented for a List. fmap and point are straightforward:

  def fmap[A, B](f: A => B): F[A] => F[B] = (l: List[A]) => l map f
  def point[A](a: => A): F[A] = List(a)

apply turns out to be more interesting because there are 2 ways to implement it, both of them being useful:

  1. apply a list of functions to each element and gather the results in a List:

    def apply[A, B](f: F[A => B]): F[A] => F[B] = (l: List[A]) =>
      for { a <- l; func <- f } yield func(a)

  2. zip the list of functions to the list of elements to apply each function to each element

    def apply[A, B](f: F[A => B]): F[A] => F[B] = (l: List[A]) =>
      (l zip f) map (p => p._2 apply p._1)
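Putting the two implementations side by side as plain functions (my own standalone versions of the two apply bodies above) makes the difference visible:

```scala
// Two ways to apply a List of functions to a List of values

// 1. cartesian: every function applied to every element
def applyCartesian[A, B](fs: List[A => B])(l: List[A]): List[B] =
  for { a <- l; f <- fs } yield f(a)

// 2. zip: the i-th function applied to the i-th element
def applyZip[A, B](fs: List[A => B])(l: List[A]): List[B] =
  (l zip fs).map { case (a, f) => f(a) }

val fs: List[Int => Int] = List(_ + 1, _ * 10)

applyCartesian(fs)(List(1, 2)) // List(2, 10, 3, 20)
applyZip(fs)(List(1, 2))       // List(2, 20)
```

Both satisfy the Applicative laws, which is why List genuinely has more than one useful Applicative instance.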

There is even a third way to use List as an Applicative by using the fact that List is a Monoid. But more on that later, for now we still have to see how all of this relates to the for loop...

Traversing the structure

When we do a for loop, we take a "structure" containing some elements and we "traverse" it to return:

  • that same structure containing other elements
  • a value computed from the structure elements
  • some combination of above

Gibbons & Oliveira argue that any kind of for loop can be represented as the following traverse operation:

  trait Traversable[T[_]] {
    def traverse[F[_] : Applicative, A, B](f: A => F[B]): T[A] => F[T[B]]
  }

That is, if the container/structure of type T has this traverse function using an Applicative F, then we can do whatever we would do with a for loop on it.

To get a better feel for this traverse function, we're going to implement the Traversable trait for a binary tree and then we'll see how we can loop on that tree.

A Binary Tree

For all the other examples in this post, we're going to use a very simple binary tree:

  sealed trait BinaryTree[A]
  case class Leaf[A](a: A) extends BinaryTree[A]
  case class Bin[A](left: BinaryTree[A], right: BinaryTree[A]) extends BinaryTree[A]

However, the first shot at the Traversable implementation is barely readable!

  def BinaryTreeIsTraversable[A]: Traversable[BinaryTree] = new Traversable[BinaryTree] {

    def createLeaf[B] = (n: B) => (Leaf(n): BinaryTree[B])
    def createBin[B]  = (nl: BinaryTree[B]) =>
                        (nr: BinaryTree[B]) => (Bin(nl, nr): BinaryTree[B])

    def traverse[F[_] : Applicative, A, B](f: A => F[B]):
      BinaryTree[A] => F[BinaryTree[B]] = (t: BinaryTree[A]) => {
        val applicative = implicitly[Applicative[F]]
        t match {
          case Leaf(a)   => applicative.apply(applicative.point(createLeaf[B]))(f(a))
          case Bin(l, r) =>
            applicative.apply(
              applicative.apply(applicative.point(createBin[B]))(traverse[F, A, B](f).apply(l))).
              apply(traverse[F, A, B](f).apply(r))
        }
      }
  }

This is a shame because the corresponding Haskell code is so concise:

  instance Traversable Tree where
    traverse f (Leaf x)  = pure Leaf ⊛ f x
    traverse f (Bin t u) = pure Bin ⊛ traverse f t ⊛ traverse f u

A bit of pimping to the rescue and we can improve the situation:

  def traverse[F[_] : Applicative, A, B](f: A => F[B]): BinaryTree[A] => F[BinaryTree[B]] = (t: BinaryTree[A]) =>
    t match {
      case Leaf(a)   => createLeaf[B] ∘ f(a)
      case Bin(l, r) => createBin[B] ∘ (l traverse f) <*> (r traverse f)
    }

Informally the traverse method applies the function f to each node and "reconstructs" the tree by using the apply method (<*>) of the Applicative functor.
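To make this tangible without any typeclass machinery, here is traverse specialised to the Option applicative (a self-contained sketch; `traverseOption` and `halve` are my own names, not from the post's code). With Option, "reconstructing the tree inside the applicative" means: if f fails on any leaf, the whole traversal fails.

```scala
// traverse specialised to Option: the generic version abstracts this over any Applicative F
sealed trait BinaryTree[A]
case class Leaf[A](a: A) extends BinaryTree[A]
case class Bin[A](left: BinaryTree[A], right: BinaryTree[A]) extends BinaryTree[A]

def traverseOption[A, B](f: A => Option[B])(t: BinaryTree[A]): Option[BinaryTree[B]] =
  t match {
    case Leaf(a)   => f(a).map(Leaf(_))
    case Bin(l, r) =>
      for {
        lt <- traverseOption(f)(l)   // traverse the left subtree
        rt <- traverseOption(f)(r)   // then the right subtree
      } yield Bin(lt, rt)            // rebuild the same shape
  }

val tree  = Bin(Leaf(1), Leaf(2))
val halve = (i: Int) => if (i % 2 == 0) Some(i / 2) else None

traverseOption(halve)(Bin(Leaf(2), Leaf(4))) // Some(Bin(Leaf(1), Leaf(2)))
traverseOption(halve)(tree)                  // None: 1 is odd, the whole traversal fails
```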

That's certainly still some Ancient Chinese to you (as it was for me) so we'd better see the traverse method at work now. But we need to take another detour :-)

Applicative Monoid

One simple thing we might want to do, when iterating on a BinaryTree, is to get the content of that tree in a List. To do that, we're going to use the 3rd way to use List as an Applicative, as mentioned earlier. It turns out indeed that any Monoid (what is it?) gives rise to an Applicative instance but in a way that's a bit surprising.

  /** Const is a container for values of type M, with a "phantom" type A */
  case class Const[M, +A](value: M)

  implicit def ConstIsPointed[M : Monoid] = new Pointed[({type l[A]=Const[M, A]})#l] {
    def point[A](a: => A) = Const[M, A](implicitly[Monoid[M]].z)
  }

  implicit def ConstIsFunctor[M : Monoid] = new Functor[({type l[A]=Const[M, A]})#l] {
    def fmap[A, B](f: A => B) = (c: Const[M, A]) => Const[M, B](c.value)
  }

  implicit def ConstIsApplic[M : Monoid] = new Applic[({type l[A]=Const[M, A]})#l] {
    def applic[A, B](f: Const[M, A => B]) = (c: Const[M, A]) =>
      Const[M, B](implicitly[Monoid[M]].append(f.value, c.value))
  }

  implicit def ConstIsPointedFunctor[M : Monoid] = new PointedFunctor[({type l[A]=Const[M, A]})#l] {
    val functor = Functor.ConstIsFunctor
    val pointed = Pointed.ConstIsPointed
  }

  implicit def ConstIsApplicative[M : Monoid] = new Applicative[({type l[A]=Const[M, A]})#l] {
    val pointedFunctor = PointedFunctor.ConstIsPointedFunctor
    val applic = Applic.ConstIsApplic
  }

In the code above, Const is the Applicative instance for a given Monoid. Const contains values of type M, where M is a Monoid, and we progressively establish the properties that Const must satisfy to be Applicative.

  • it must first be Pointed. Informally, the point method puts the neutral element of the Monoid in a Const instance

  • then it must be a Functor. Here the fmap function doesn't do anything but change the type of Const from Const[M, A] to Const[M, B]

  • finally it must be an Applic where the apply method of Applic uses the append method of the Monoid to "add" 2 values and return the result in a Const instance.

There is unfortunately a lot of type voodoo here:

  • the type declaration for Const is Const[M, +A]. It has a type parameter A which is not represented by any value in the Const class! It is a phantom type. But it is indispensable to match the type declarations of the typeclasses

  • the type F that is supposed to be Applicative is... ({type l[A] = Const[M, A]})#l. Ouch, this deserves some explanation!

What we want is not so hard. The type Const[M, A] has 2 type parameters. We need a way to fix the first one to a given monoid type and get the resulting type, which will have only one type parameter. The expression above is the most concise way to get this desired type:

  • { type l = SomeType } is an anonymous type with a type member called l. We can access that type l in Scala by using #: { type l = SomeType }#l

  • Then, in { type l[A] = SomeType[T, A] }#l, l is a higher-kinded type, having a type variable A (actually SomeType[T, A] where T is fixed)
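To make the trick tangible, here is a small sketch where a named type alias plays the role of the anonymous ({type l[A] = Const[String, A]})#l (the alias form works when you can name it; the anonymous form is what you need inline, in type parameter position):

```scala
// A sketch where a named alias stands in for ({type l[A] = Const[String, A]})#l:
// Const with M fixed to String, leaving a single type parameter for Functor.
case class Const[M, +A](value: M)
trait Functor[F[_]] { def fmap[A, B](f: A => B): F[A] => F[B] }

type ConstString[A] = Const[String, A]

val constFunctor = new Functor[ConstString] {
  def fmap[A, B](f: A => B) = (c: ConstString[A]) => Const[String, B](c.value)
}

// fmap only changes the phantom type; the contained value is untouched
assert(constFunctor.fmap((i: Int) => i + 1)(Const[String, Int]("hello")).value == "hello")
```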

That was a really long detour for a mere for loop, wasn't it? Now... profit!


Contents of a BinaryTree...

We're going to use the Traversable instance for the BinaryTree and the List Monoid Applicative to get the contents of a BinaryTree:

  import Applicative._

val f = (i: Int) => List(i)
val tree = Bin(Leaf(1), Leaf(2))

(tree.traverse[...](f)).value must_== List(1, 2)

Simple: for each element of the tree, we put it in a List, then we let the List Monoid do its magic and aggregate all the results as we traverse the tree. The only difficulty here is the limits of Scala's type inference. The ... stands for type annotations that the compiler requires:

  tree.traverse[Int, ({type l[A]=Const[List[Int], A]})#l](f)

Not pretty :-(

Update: As pointed out by Ittay Dror in the comments, List[Int] is not an applicative by itself and we need to put this list into a Const value to make it usable by the traverse function.

This is actually done by an implicit conversion method, liftConst, provided by the Applicative object:
  implicit def liftConst[A, B, M : Monoid](f: A => M): A => Const[M, B] =
    (a: A) => Const[M, B](f(a))

Profit time

Not everything is lost! We can encapsulate the complexity a bit in this case. We can extract part of the code above and create a contents method which will work on any Traversable instance (assume I'm pimping the following examples so that I can write tree.method instead of method(tree)):

  val tree: BinaryTree[Int] = Bin(Leaf(1), Leaf(2))
tree.contents must_== List(1, 2)

This is based on the following definition:

  def contents[A]: T[A] => List[A] = {
    val f = (a: A) => Const[List[A], Any](List(a))
    (ta: T[A]) => traverse[({type l[U]=Const[List[A], U]})#l, A, Any](f).apply(ta).value
  }

It also turns out that the contents function is a specialized version of something even more generic, the reduce function, working with any Monoid:

  def contents[A]: T[A] => List[A] = reduce((a: A) => List(a))

  def reduce[A, M : Monoid](reducer: A => M): T[A] => M = {
    val f = (a: A) => Const[M, Any](reducer(a))
    (ta: T[A]) => traverse[({type l[A]=Const[M, A]})#l, A, Any](f).apply(ta).value
  }

The reduce function can traverse any Traversable structure with a function mapping each element to a Monoid element. We've used it to get the contents of the tree but can as easily get the number of elements:

  def count[A]: T[A] => Int = reduce((a: A) => 1)

tree.count must_== 2

Can it get simpler than this :-)? Actually in that case it can! Since we don't need (a: A) at all we can use reduceConst:

  def reduceConst[A, M : Monoid](m: M): T[A] => M = reduce((a: A) => m)

def count[A]: T[A] => Int = reduceConst(1)

It's like the standard Scala reduce on steroids: instead of providing a binary operation, you just provide a Monoid instance.
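As an illustration, here is a hedged, hand-rolled version of reduce on a minimal BinaryTree, using explicit recursion instead of traverse and a simplified Monoid, just to show what reduce and a reduceConst-style count compute:

```scala
// A hedged, hand-rolled reduce on a minimal BinaryTree: explicit recursion
// in place of traverse, and a simplified Monoid (all names are illustrative).
trait Monoid[M] { def z: M; def append(a: M, b: M): M }
val intSum = new Monoid[Int] { def z = 0; def append(a: Int, b: Int) = a + b }

sealed trait BinaryTree[+A]
case class Leaf[A](a: A) extends BinaryTree[A]
case class Bin[A](left: BinaryTree[A], right: BinaryTree[A]) extends BinaryTree[A]

def reduce[A, M](t: BinaryTree[A])(reducer: A => M)(m: Monoid[M]): M = t match {
  case Leaf(a)   => reducer(a)
  case Bin(l, r) => m.append(reduce(l)(reducer)(m), reduce(r)(reducer)(m))
}

val tree = Bin(Leaf(1), Bin(Leaf(2), Leaf(3)))
assert(reduce(tree)((a: Int) => a)(intSum) == 6)   // sum the elements
assert(reduce(tree)((_: Int) => 1)(intSum) == 3)   // reduceConst-style count
```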

.... and shape of a BinaryTree

We've addressed the question of doing some kind of accumulation based on the elements in the tree, now we're going to "map" them.

Monads are Applicatives too!

The following map method can be derived from the traverse method (note that no type annotations are necessary in that case, yes!):

  def map[A, B](mapper: A => B) =
    (ta: T[A]) => traverse((a: A) => Ident(mapper(a))).apply(ta).value

Here we're traversing with an Applicative which is very simple, the Ident class:

  case class Ident[A](value: A)

The Ident class is a simple wrapper around a value, nothing more. That simple class is an Applicative. But how?

Easy. Ident is actually a Monad and we can construct an Applicative instance from every Monad. This comes from the fact that a Monad is both a PointedFunctor and an Applic:

  trait Monad[F[_]] {
    val pointed: Pointed[F]
    val bind: Bind[F]

    def functor: Functor[F] = new Functor[F] {
      def fmap[A, B](f: A => B): F[A] => F[B] = (fa: F[A]) =>
        bind.bind((a: A) => pointed.point(f(a))).apply(fa)
    }

    def pointedFunctor: PointedFunctor[F] = new PointedFunctor[F] {
      val functor = Monad.this.functor
      val pointed = Monad.this.pointed
    }

    def applic: Applic[F] = new Applic[F] {
      def applic[A, B](f: F[A => B]) = a =>
        bind.bind[A => B, B](ff => functor.fmap(ff)(a))(f)
    }

    def applicative: Applicative[F] = new Applicative[F] {
      val pointedFunctor = Monad.this.pointedFunctor
      val applic = Monad.this.applic
    }
  }

And the Ident class is trivially a Monad (having a pointed and a bind member):

  implicit def IdentIsMonad = new Monad[Ident] {

    val pointed = new Pointed[Ident] {
      def point[A](a: => A): Ident[A] = Ident(a)
    }
    val bind = new Bind[Ident] {
      def bind[A, B](f: A => Ident[B]): Ident[A] => Ident[B] =
        (i: Ident[A]) => f(i.value)
    }
  }

We can use our brand new map function now:

  tree.map((i: Int) => i.toString) must_== Bin(Leaf("1"), Leaf("2"))

We can even use it to get the "shape" of our container and discard all the elements:

  tree.shape must_== Bin(Leaf(()), Leaf(()))

The shape method just maps each element to ().
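As a sanity check, here is a hand-rolled sketch of map and shape on a minimal BinaryTree, with plain recursion standing in for the traverse-based version:

```scala
// A hand-rolled sketch of map and shape on a minimal BinaryTree,
// with plain recursion standing in for the traverse-based version.
sealed trait BinaryTree[+A]
case class Leaf[A](a: A) extends BinaryTree[A]
case class Bin[A](left: BinaryTree[A], right: BinaryTree[A]) extends BinaryTree[A]

def map[A, B](t: BinaryTree[A])(mapper: A => B): BinaryTree[B] = t match {
  case Leaf(a)   => Leaf(mapper(a))
  case Bin(l, r) => Bin(map(l)(mapper), map(r)(mapper))
}

// shape just maps each element to ()
def shape[A](t: BinaryTree[A]): BinaryTree[Unit] = map(t)(_ => ())

val tree = Bin(Leaf(1), Leaf(2))
assert(map(tree)((i: Int) => i.toString) == Bin(Leaf("1"), Leaf("2")))
assert(shape(tree) == Bin(Leaf(()), Leaf(())))
```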

Decompose / Compose

Let's recap. We implemented a very generic way to iterate over a structure, any kind of structure (as long as it's Traversable), containing elements, any kind of element, with a function which does an "application", any kind of application. Among the possible "applications", we've seen 2 examples: collecting and mapping, which are the essential operations that we usually do in a for loop.

Specifically we were able to get the contents of a tree and its shape. Is there a way to compose those 2 operations into a decompose operation that would get both the content and the shape at once? Our first attempt might be:

  def decompose[A] = (t: T[A]) => (shape(t), contents(t))

tree.decompose must_== (Bin(Leaf(()), Leaf(())), List(1, 2))

This works but it is pretty naive because it requires 2 traversals of the tree. Is it possible to do it in just one?

Applicative products

This is indeed possible by noticing the following: the product of 2 Applicatives is still an Applicative.

Proof, proof. We define Product as:

  case class Product[F1[_], F2[_], A](first: F1[A], second: F2[A]) {
    def tuple = (first, second)
  }

I spare you the full definition of Product as an Applicative to just focus on the Applic instance:

  implicit def ProductIsApplic[F1[_] : Applic, F2[_] : Applic] =
    new Applic[({type l[A]=Product[F1, F2, A]})#l] {
      val f1 = implicitly[Applic[F1]]
      val f2 = implicitly[Applic[F2]]

      def applic[A, B](f: Product[F1, F2, A => B]) = (c: Product[F1, F2, A]) =>
        Product[F1, F2, B](f1.applic(f.first).apply(c.first),
                           f2.applic(f.second).apply(c.second))
    }

That's not too complicated, you just have to follow the types. What's more troubling is the amount of type annotations which are necessary to implement decompose. Ideally we would like to write:

  def decompose[A] = traverse((t: T[A]) => shape(t) ⊗ contents(t))

Where ⊗ is an operation taking 2 Applicatives and returning their product. Again, the lack of partial type application for Const muddies the whole thing (upvote SI-2712 please!):

  val shape   = (a: A) => Ident(())
  val content = (a: A) => Const[List[A], Unit](List(a))

  val product = (a: A) => (shape(a).⊗[({type l[T] = Const[List[A], T]})#l](content(a)))

  implicit val productApplicative =
    ProductIsApplicative[Ident, ({type l1[U] = Const[List[A], U]})#l1]

  (ta: T[A]) => {
    val (Ident(s), Const(c)) =
      traverse[({type l[V] = Product[Ident, ({type l1[U] = Const[List[A], U]})#l1, V]})#l, A, Unit](product).
        apply(ta).tuple
    (s, c)
  }

We can improve the code slightly by moving the implicit definition for productApplicative inside the Applicative companion object:

  object Applicative {
    ...
    implicit def ProductWithListIsApplicative[A[_] : Applicative, B] =
      ProductIsApplicative[A, ({type l1[U] = Const[List[B], U]})#l1]
  }

Then no implicit val productApplicative is necessary and the Applicative imports will be all we need.

Collection and dispersal

There is another way to do things "in parallel" while traversing the structure. The collect method that we're going to build will do 2 things:

  • it will accumulate some kind of state, based on the elements that we meet

  • it will map each element to another kind of element

So, as we're iterating, we can do a regular mapping while computing some kind of measure. But before that, we need to take a little detour (again?? Yes, again) with the State monad.

The State monad

The State Monad is defined by:

  trait State[S, +A] {
    def apply(s: S): (S, A)
  }

It is basically:

  • an object keeping some previous "state", of type S
  • a method to extract a meaningful value from this "state", of type A

  • this method computes a new "state", of type S

For example, a simple counter for the number of elements in a List[Int] can be implemented by:

  val count = state((n: Int) => (n+1, ()))

It takes the previous "count" number n and returns the new state n+1 and the extracted value (() here, because we don't need to extract anything special).

The State type above is a Monad. I encourage you to read "Learn You a Haskell" to get a better understanding on the subject. I will just show here that the flatMap (or bind) method of the Monad typeclass is central in putting that State to work:

  val count = (s: String) => state((n: Int) => (n+1, s + n))

(count("a-") flatMap count flatMap count).apply(0) must_== (3, "a-012")

The count function takes the latest computed string and returns a State where we increment the current "state" by 1 and we have a new String as the result, where the current count is appended. So when we start with the string "a-" and we flatMap count 2 times, we get (3, "a-012") where 3 is the number of times we've applied the n+1 function and "a-012" the result of appending to the current string.

By the way, why do we need to apply(0)?

When we do all the flatMaps, we actually store "stateful computations". And they are executed only once we provide the initial state: 0!
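To see this concretely, here is a self-contained sketch with a simplified State implementation (the run method plays the role of the apply method above):

```scala
// A simplified State implementation (run plays the role of apply above),
// enough to reproduce the counting example.
case class State[S, +A](run: S => (S, A)) {
  def flatMap[B](f: A => State[S, B]): State[S, B] =
    State(s => { val (s1, a) = run(s); f(a).run(s1) })
  def map[B](f: A => B): State[S, B] = flatMap(a => State(s => (s, f(a))))
}
def state[S, A](f: S => (S, A)): State[S, A] = State(f)

val count = (s: String) => state((n: Int) => (n + 1, s + n))

// the stateful computations only run once the initial state 0 is supplied
assert(count("a-").flatMap(count).flatMap(count).run(0) == (3, "a-012"))
```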

Collecting elements

Let's now define a collect operation on Traversable which will help us to count:

  def collect[F[_] : Applicative, A, B](f: A => F[Unit], g: A => B) = {
    val applicative = implicitly[Applicative[F]]
    import applicative._

    val application = (a: A) => point((u: Unit) => g(a)) <*> f(a)
    traverse(application)
  }

This collect operation, defined in EIP, is different from the collect operation on Scala collections which is the equivalent of filter + map. The collect of EIP is using 2 functions:

  • f: A => F[Unit] which collects data from each element "effectfully" (that is, possibly keeping state)

  • g: A => B which maps each element to something else

So we could say that the EIP collect is a bit like fold + map. Knowing this, we can use collect to count elements and do some mapping:

  val count = (i: Int) => state((n: Int) => (n+1, ()))
val map = (i: Int) => i.toString

tree.collect[({type l[A]=State[Int, A]})#l, String](count, map).apply(0) must_==
(2, Bin(Leaf("1"), Leaf("2")))

Here again the type annotations are obscuring the intent a bit and if type inference was perfect we would just read:

  val count = (i: Int) => state((n: Int) => (n+1, ()))
val map = (i: Int) => i.toString

tree.collect(count, map).apply(0) must_== (2, Bin(Leaf("1"), Leaf("2")))

I don't know about you, but I find this a bit magical. With the Applicative and Traversable abstractions, we can assemble our program based on 2 independent functions possibly developed and tested elsewhere.
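For the skeptical, here is a hand-rolled approximation of this collect on a List, with a simplified State in place of the real Applicative machinery and foldRight standing in for traverse:

```scala
// Hand-rolled sketch of EIP's collect on a List: a simplified State threads a
// counter while each element is mapped (foldRight standing in for traverse).
case class State[S, +A](run: S => (S, A)) {
  def flatMap[B](f: A => State[S, B]): State[S, B] =
    State(s => { val (s1, a) = run(s); f(a).run(s1) })
  def map[B](f: A => B): State[S, B] = flatMap(a => State(s => (s, f(a))))
}

def collect[S, A, B](as: List[A])(f: A => State[S, Unit], g: A => B): State[S, List[B]] =
  as.foldRight(State[S, List[B]](s => (s, Nil))) { (a, acc) =>
    f(a).flatMap(_ => acc.map(bs => g(a) :: bs))
  }

val count = (i: Int) => State((n: Int) => (n + 1, ()))
val map = (i: Int) => i.toString

// 2 elements counted, each mapped to its string representation
assert(collect(List(1, 2))(count, map).run(0) == (2, List("1", "2")))
```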

Dispersing elements

The next utility function proposed by EIP is the disperse function. Its signature is:

  def disperse[F[_] : Applicative, A, B, C](f: F[B], g: A => B => C): T[A] => F[T[C]]

What does it do?

  • f is the Applicative context that's going to evolve when we traverse the structure, but regardless of what the elements of type A are
  • g is a function which, for each element of type A says what to do with the current context value, B, and map that element back to the structure

Please, please, a concrete example!

Say I want to mark each element of a BinaryTree with its "number" in the Traversal (the "label"). Moreover I want to use the element name to be able to qualify this label:

  // a BinaryTree of Doubles
val tree: BinaryTree[Double] = Bin(Leaf(1.1), Bin(Leaf(2.2), Leaf(3.3)))

// the "label" state returning integers in sequence
val labelling: State[Int, Int] = state((n: Int) => (n+1, n+1))

// for each element in the tree, and its label,
// produce a String with the name and label
val naming: Double => Int => String = (p1: Double) => (p2: Int) => p1+" node is "+p2

// testing by applying an initial state (label `0`) and
// taking the second element of the pair `(last label, resulting tree)`
tree.disperse[elided for sanity](labelling, naming).apply(0)._2 must_==
Bin(Leaf("1.1 node is 1"), Bin(Leaf("2.2 node is 2"), Leaf("3.3 node is 3")))

Note that the naming function above is curried. A more familiar way to write it would be:

  val naming: (Double, Int) => String = (p1: Double, p2: Int) => p1+" node is "+p2

But then you would have to curry that function to be able to use it with the disperse function:

  tree.disperse[...](labelling, naming.curried)

The implementation of disperse is:

  def disperse[F[_] : Applicative, A, B, C](f: F[B], g: A => B => C) = {
    val applicative = implicitly[Applicative[F]]
    import applicative._

    val application = (a: A) => point(g(a)) <*> f
    traverse(application)
  }

It is using the very capabilities of the applicative functor, the point method and the <*> application.
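The same behavior can be sketched on a List with a simplified State, foldRight standing in for traverse (all names here are illustrative):

```scala
// Sketch of disperse on a List: a simplified State (run in place of apply),
// foldRight in place of traverse; labelling/naming mirror the earlier example.
case class State[S, +A](run: S => (S, A)) {
  def flatMap[B](f: A => State[S, B]): State[S, B] =
    State(s => { val (s1, a) = run(s); f(a).run(s1) })
  def map[B](f: A => B): State[S, B] = flatMap(a => State(s => (s, f(a))))
}

def disperse[S, A, B, C](as: List[A])(f: State[S, B], g: A => B => C): State[S, List[C]] =
  as.foldRight(State[S, List[C]](s => (s, Nil))) { (a, acc) =>
    f.flatMap(b => acc.map(cs => g(a)(b) :: cs))
  }

val labelling = State((n: Int) => (n + 1, n + 1))
val naming = (d: Double) => (label: Int) => s"$d node is $label"

// label 1.1 and 2.2 in traversal order, starting from label 0
assert(disperse(List(1.1, 2.2))(labelling, naming).run(0) ==
  (2, List("1.1 node is 1", "2.2 node is 2")))
```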

An overview of traversals

We've seen in the 2 examples above that we get different, specialized, versions of the traverse function by constraining how mapping and Applicative effects occur. Here's a tentative table for classifying other specialized versions of the traverse function:

  function    | map element | create state | mapped depends on state | state depends on element
  ------------|-------------|--------------|-------------------------|-------------------------
  collect     |      X      |      X       |                         |            X
  disperse    |      X      |      X       |            X            |
  measure     |      X      |      X       |                         |
  traverse    |      X      |      X       |            X            |            X
  reduce      |             |      X       |                         |            X
  reduceConst |             |      X       |                         |
  map         |      X      |              |                         |

The only function we haven't shown before is measure. It is mapping and accumulating state but this accumulation does not depend on the current element. Here's an example:

  val crosses = state((s: String) => (s+"x", ()))
val map = (i: Int) => i.toString

tree.measure(crosses, map).apply("") must_==
("xxx", Bin(Leaf("1"), Bin(Leaf("2"), Leaf("3"))))

Other than not looking very useful, the code above is also lying! It is not possible to have a measure function accepting a State monad without having to provide the usual ugly type annotations. So the actual example is:

  tree.measureState(crosses, map).apply("") must_== 
("xxx", Bin(Leaf("1"), Bin(Leaf("2"), Leaf("3"))))

where measureState is a specialization of the measure method to States. I think that one take-away of this post is that it might be beneficial to specialize a few generic functions in Scala, like traverse, collect,... for Const and State in order to avoid type annotations.

Composing traversals

There's another axis of composition that we haven't exploited yet.

In a for loop, without thinking about it, you may write:

  for (a <- as) {
    val currentSize = a.size
    total += currentSize
    result.add(total)
  }

In the body of that for loop, you have statements with dependency on each other. In an Applicative traversal, this translates to the Sequential composition of Applicatives. From 2 Applicatives, we can create a third one which is their Sequential composition. More precisely, this means that if F1[_] and F2[_] are Applicatives then F1[F2[_]] is an Applicative as well. You want the demonstration? Ok, go.

First, we introduce a utility function on ApplicFunctors:

  def liftA2[A, B, C](function: A => B => C): F[A] => F[B] => F[C] =
    fa => applic.applic(functor.fmap(function)(fa))

liftA2 allows us to lift a regular curried function of 2 arguments to a function working on the Applicative arguments. This uses the fact that an ApplicFunctor is a Functor, so we can apply function: A => B => C to the "a in the box" to get an F[B => C] "in the box". And then, since an ApplicFunctor is an Applic, we can "apply" it to an F[B] to get an F[C].
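An easy way to get an intuition for liftA2 is to instantiate it by hand for a familiar Applicative like Option (this sketch hand-codes the Option case instead of going through the typeclasses):

```scala
// liftA2 instantiated by hand for a familiar Applicative, Option:
// lifting a curried 2-argument function to work on optional arguments.
def liftA2[A, B, C](function: A => B => C): Option[A] => Option[B] => Option[C] =
  fa => fb => fa.flatMap(a => fb.map(function(a)))

val add = (a: Int) => (b: Int) => a + b
assert(liftA2(add)(Some(1))(Some(2)) == Some(3))
assert(liftA2(add)(Option.empty[Int])(Some(2)) == None)
```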

Armed with this function, we can write the applic method for F1[F2[_]]:

  implicit val f1ApplicFunctor = implicitly[ApplicFunctor[F1]]
  implicit val f2ApplicFunctor = implicitly[ApplicFunctor[F2]]

  val applic = new Applic[({type l[A]=F1[F2[A]]})#l] {
    def applic[A, B](f: F1[F2[A => B]]) = (c: F1[F2[A]]) => {
      f1ApplicFunctor.liftA2((ff: F2[A => B]) => f2ApplicFunctor.apply(ff))(f).apply(c)
    }
  }

It's not so easy to get an intuition for what the code above is doing, except to say that we're using the previous definitions to allow an F1[F2[A => B]] to be applied to an F1[F2[A]].

In mere mortal terms, this means that if we do an Applicative computation inside a loop and if we reuse that computation in another Applicative computation, we still get an Applicative computation. The EIP illustration of this principle is a crazy function, the assemble function.

The assemble function

The assemble function takes the shape of a Traversable and a list of elements. If there are enough elements it returns Some[Traversable] filled with all the elements (+ the remainder), otherwise it returns None (and an empty list). Let's see it in action:

        // the "shape" to fill
val shape: BinaryTree[Unit] = Bin(Leaf(()), Leaf(()))

// we assemble the tree with an exact list of elements
shape.assemble(List(1, 2)) must_== (List(), Some(Bin(Leaf(1), Leaf(2))))

// we assemble the tree with more elements
shape.assemble(List(1, 2, 3)) must_== (List(3), Some(Bin(Leaf(1), Leaf(2))))

// we assemble the tree with not enough elements
shape.assemble(List(1)) must_== (List(), None)

What's the implementation of the assemble function? The implementation uses 2 Monads (which are also Applicatives as we know now):

  • the State[List[Int], _] Monad is going to keep track of what we've already consumed
  • the Option[_] Monad is going to provide, or not, an element to put in the structure
  • the composition of those 2 monads is State[List[Int], Option[_]] (our F1[F2[_]] in the ApplicFunctor definitions above)

So we just need to traverse the BinaryTree with one function:

  def takeHead: State[List[B], Option[B]] = state { s: List[B] =>
    s match {
      case Nil     => (Nil, None)
      case x :: xs => (xs, Some(x))
    }
  }

The takeHead function is a State instance where each state application removes the first element of the list of elements if possible, and returns it in an Option.
This is why the result of the assemble function, once we apply it to a list of elements, is of type (List[Int], Option[BinaryTree[Int]]).
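Here is a quick, self-contained check of takeHead's behavior in isolation, with a simplified State (run in place of apply):

```scala
// The behavior of takeHead in isolation, with a simplified State (run in
// place of apply): each application pops the head of the list if possible.
case class State[S, +A](run: S => (S, A))

def takeHead[B]: State[List[B], Option[B]] = State((s: List[B]) => s match {
  case Nil     => (Nil, None)
  case x :: xs => (xs, Some(x))
})

assert(takeHead[Int].run(List(1, 2, 3)) == (List(2, 3), Some(1)))
assert(takeHead[Int].run(Nil) == (Nil, None))
```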

A recursive implementation

Just for the fun of comparison, I'm going to write a recursive version doing the same thing:

  def assemble(es: List[Int], s: BinaryTree[Unit]): (List[Int], Option[BinaryTree[Int]]) = {
    (es, s) match {
      case (Nil, _)              => (es, None)
      case (e :: rest, Leaf(())) => (rest, Some(Leaf(e)))
      case (_, Bin(left, right)) =>
        assemble(es, left) match {
          case (l, None)       => (l, None)
          case (Nil, Some(l))  => (Nil, None)
          case (rest, Some(l)) => assemble(rest, right) match {
            case (r, None)            => (r, None)
            case (finalRest, Some(r)) => (finalRest, Some(Bin(l, r)))
          }
        }
    }
  }
  assemble(List(1, 2, 3), shape) must_== (List(3), Some(Bin(Leaf(1), Leaf(2))))
assemble(List(1, 2, 3), shape) must_== (List(3), Some(Bin(Leaf(1), Leaf(2))))

It works, but it makes my head spin!

A classical for-loop implementation

By the way, what would be the real for loop version of that functionality? That one is not so easy to come up with because AFAIK there's no easy way to iterate on a BinaryTree to get a similar BinaryTree with just a for loop! So, for the sake of the argument, we're going to do something similar with just a List structure:

  def assemble[T](es: List[T], shape: List[Unit]) = {
    var elements = es
    var list: Option[List[T]] = None
    for (u <- shape) {
      if (!elements.isEmpty) {
        list match {
          case None    => list = Some(List(elements.head))
          case Some(l) => list = Some(l :+ elements.head)
        }
        elements = elements.drop(1)
      } else {
        list = None
      }
    }
    (elements, list)
  }
  assemble(List(1, 2, 3), List((), ())) must_== (List(3), Some(List(1, 2)))
assemble(List(1, 2, 3), List((), ())) must_== (List(3), Some(List(1, 2)))

Contrast and compare with:

  List((), ()).assemble(List(1, 2, 3)) must_== (List(3), Some(List(1, 2)))

where you just define List as a Traversable:

  implicit def ListIsTraversable: Traversable[List] = new Traversable[List] {
    def traverse[F[_] : Applicative, A, B](f: A => F[B]): List[A] => F[List[B]] =
      (l: List[A]) => {
        val applicative = implicitly[Applicative[F]]
        l match {
          case Nil       => applicative.point(List[B]())
          case a :: rest => ((_: B) :: (_: List[B])).curried ∘ f(a) <*> (rest traverse f)
        }
      }
  }

The Applicative composition is indeed very powerful, but we're going to see that there are other ways to compose functions and use them with Traversables.

Monadic composition

This paragraph is exploring the fine relationships between applicative composition and monadic composition when doing traversals. We've seen that Applicative instances can be composed and that Monads can be Applicative. But Monads can also be composed using the so-called Kleisli composition. If we have:

  val f: B => M[C]
val g: A => M[B]

Then

  val h: A => M[C] = f ∎ g // is also a function from a value to a Monad

If we have 2 "monadic" functions f and g, we can then compose them, in the Kleisli sense, and use the composed version for a traversal. Indeed we can, but does this traversal have "nice properties"? Specifically, do we have:

  traverse(f ∎ g) == traverse(f) ∎ traverse(g)

The answer is... it depends.

Monad commutativity

EIP shows that, if the Monad is commutative, then this will always be true. What is a commutative Monad you ask?

A Monad is commutative if for all mx: M[X] and my: M[Y] we have:

  val xy = for {
    x <- mx
    y <- my
  } yield (x, y)

  val yx = for {
    y <- my
    x <- mx
  } yield (x, y)

  xy == yx

This is not the case with the State Monad for example:

  val mx = state((n: Int) => (n+1, n+1))
  val my = state((n: Int) => (n+1, n+1))

  xy.apply(0) must_== (2, (1, 2))
  yx.apply(0) must_== (2, (2, 1))
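These two results can be checked with a few lines of self-contained code (again with a simplified State, and run in place of apply):

```scala
// Self-contained check that the State Monad is not commutative
// (simplified State, run in place of apply).
case class State[S, +A](run: S => (S, A)) {
  def flatMap[B](f: A => State[S, B]): State[S, B] =
    State(s => { val (s1, a) = run(s); f(a).run(s1) })
  def map[B](f: A => B): State[S, B] = flatMap(a => State(s => (s, f(a))))
}
def state[S, A](f: S => (S, A)): State[S, A] = State(f)

val mx = state((n: Int) => (n + 1, n + 1))
val my = state((n: Int) => (n + 1, n + 1))

val xy = for { x <- mx; y <- my } yield (x, y)
val yx = for { y <- my; x <- mx } yield (x, y)

// same final state, but the pairs come out in a different order
assert(xy.run(0) == (2, (1, 2)))
assert(yx.run(0) == (2, (2, 1)))
```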

Monadic functions commutativity

Another slightly different situation is when we have a non-commutative Monad but commutative functions:

  val plus1  = (a: A) => state((n: Int) => (n+1, a))
  val plus2  = (a: A) => state((n: Int) => (n+2, a))
  val times2 = (a: A) => state((n: Int) => (n*2, a))

Here plus1 and times2 are not commutative:

  (0 + 1) * 2 != (0 * 2) + 1

However it is obvious that plus1 and plus2 are commutative. What does that mean when we do a traversal?

If we traverse a simple List of elements using monadic composition we get:

  • List(1, 2, 3).traverse(times2 ∎ plus1) === 22
  • List(1, 2, 3).traverse(times2) ∎ List(1, 2, 3).traverse(plus1) === 32

We get different results. However, when f and g commute we get the same result:

  • List(1, 2, 3).traverse(plus2 ∎ plus1) === 10
  • List(1, 2, 3).traverse(plus2) ∎ List(1, 2, 3).traverse(plus1) === 10

Applicative composition vs Monadic composition

Another question we can ask ourselves is: if we consider the monadic functions as applicative functions (because each Monad is Applicative), do we get the nice "distribution" property we're after? The answer is yes, even when the functions are not commutative:

  • List(1, 2, 3).traverse(times2 ⊡ plus1) === 4
  • List(1, 2, 3).traverse(times2) ⊡ List(1, 2, 3).traverse(plus1) === 4

Well... more or less. The real situation is a bit more complex. List(1, 2, 3).traverse(times2 ⊡ plus1) returns a State[Int, State[Int, List[Int]]] while the second expression returns a State[Int, List[State[Int, Int]]] so what I'm hiding here is some more manipulations to be able to query the final result with some kind of join.

Conclusion

You wouldn't believe it but I've only shown here half of the ideas presented in EIP!

To finish off this post, here are 3 take-away points that I've learned while writing it:

  • functional programming is also about mastering some of these higher-level control structures like Applicative. Once you master them, your toolbox expands considerably in power (just consider the assemble example)

  • Scalaz is an incredible library but somewhat obscure to the beginner. For this post I've rewritten all the typeclasses I needed to have, and lots of examples (using specs2 of course). That gave me a much better understanding of the Scalaz functionality. You may consider doing the same to learn Scalaz (my code is available on github)

  • Scala is lagging behind Haskell in terms of type inference and it's a real pain for higher-order, generic programming. This can sometimes be encapsulated away by specializing generic functions to very common types (like traverseState instead of traverse). Again, please upvote SI-2712!

Finally, I want to mention that there are many other Haskell functional pearls waiting to be transliterated to Scala. I mean, it's a shame that we don't have yet any equivalent for "Learn You a Haskell" or "Typeclassopedia" in the Scala world. I hope that my post, like this other one by Debasish Ghosh, will also contribute to bridging the gap.

21 May 2011

specs2 migration guide

Rewrite is rarely the right option

Well, except when that's the only one :-).

Before starting specs2, I did try to refactor specs. It turned out that my first design was clearly not adapted to my goals. So I eventually decided to start from a blank page, as explained here. While I wanted to keep most features I didn't try to go for 100% backward compatibility. I tried to think of features as part of the design space and wanted to be sure I had enough design freedom (a complementary view is that implementation is part of the features space).

That being said, I know that migrating to a new API is not something that people just do for the sheer fun of it. They have to have compelling reasons for doing so.

What are the good reasons why one would like to use specs2 instead of specs?

  • concurrent execution of examples: that's one major thing that is enabled by specs2 new design. Easy and reliable

  • acceptance specifications: something which was really experimental in specs and which is now completely integrated. You can use it to create an executable User Guide for example

  • nifty features such as Auto-Examples (an example here), implicit Matchers creation or Json matchers

  • lots of small fixes and consistency changes so that writing tests/specifications is just a pleasure!

Now that you've decided to take the ride with specs2, what are the steps for a successful migration?

Just replace org.specs._ with org.specs2.mutable._

The first thing to do is to switch the base import from org.specs._ to org.specs2.mutable._. For simple specifications, with no context setup and simple equality matchers, nothing else is required.

However, depending on the specs features you've been using you'll have to change a few more things:

  1. matchers
  2. context setup
  3. ScalaCheck
  4. miscellaneous: arguments, specification title, specification inclusions, tags, ...

Most of those changes have specific motivations which I leave out of this post to keep it short. If you have questions on any given change please ask on the mailing-list.

You may also want to watch this presentation by @prasinous to learn how she did her own migration of the Salat project.

Matchers

Most of the matchers in specs2 have simply been copied over from the same matchers in specs. There are however some differences:

  • mustBe, mustNotBe and other mustXXX variants don't exist anymore. Only must_==, must_!=, mustEqual, mustNotEqual are left. Otherwise you have to write must not be(x) instead of mustNotBe

  • the matchers having not as a prefix have been removed too. You're encouraged to write must not beEmpty instead of must notBeEmpty

  • the verify matcher has been removed, so you should write f(a) must beTrue instead of a must verify(f) (or better, use ScalaCheck properties!)

  • beLike used to take a PartialFunction[T, Boolean] as an argument. It is now PartialFunction[T, MatchResult[_]] which allows better failure messages: a must beLike { case ThisThing(b) => b must be_>(0) }. If you have nothing special to assert, just return ok: a must beLike { case ThisThing(_) => ok }

  • the fail() method has been removed in favor of the simple return of a Failure object: aFailure or failure(message)

  • String matchers: the ignoreCase and ignoreSpace matchers for string equality are now constraints which can be added to the beEqualTo matcher: beEqualTo(b).ignoreSpace.ignoreCase

  • Iterable matchers: only and inOrder are now constraints which can be applied to the contain matcher instead of having dedicated matchers like containInOrder

  • Option matchers: "beSomething" is now just beSome

  • xUnit assertions have been removed

There are certainly other differences; I'll keep the list updated as I see them.

Contexts

That's the tough part! There are 3 ways to manage contexts in specs:

  • "automagic" variables
  • before / after methods
  • system / specification contexts

All of this has actually been reduced to the direct use of Scala natural features: the easy creation of traits and case classes. And a few support traits to avoid duplication of code. You definitely should read the User Guide section on Contexts before starting your migration.

Automagic variables

This was a very cool functionality of specs but also the greatest source of bugs! In specs you can nest examples, declare variables in each scope and have those variables being automatically reset when executing each example:

  "this system" should {
    val var1 = ...
    "example 1" in {
      val var2 = ...
      "subexample 1" in { ... }
      "subexample 2" in { ... }
    }
    "example 2" in { ... }
  }

Handling those variables is a lot less magic in specs2. What's the simplest way to get new variables in Scala? Simply open a new scope by placing some code into a new object! That's it, nothing more to say:

  "this system" should {
    "example 1" in new c1 {
      // do something with var1
    }
    "example 2" in new c1 {
      // do something else with a new var1
    }
  }
  trait c1 {
    val var1 = ...
  }

Well, almost :-). It turns out that the body of an Example has to be something akin to a Result. In order to allow our context, and everything inside, to be a Result, we need to have our c1 trait extend the Scope trait and benefit from an implicit conversion from Scope to Result:

  import org.specs2.specification._

  trait c1 extends Scope {
    val var1 = ...
  }

From there, having nested contexts, like the ones in the first specs example is easy. We use inheritance to create them:

  "this system" should {
    "example 1" in {
      "subexample 1" in new c2 { ... }
      "subexample 2" in new c2 { ... }
    }
    "example 2" in new c1 { ... }
  }
  trait c1 extends Scope { val var1 = ... }
  trait c2 extends c1 { val var2 = ... }

Before / After

In specs, you can run setup code as you would do with any kind of JUnit code. This is done by declaring a doBefore block inside a sus:

  "this system" should {
    doBefore(cleanAll)
    "example 1" { ... }
    "example 2" { ... }
  }

In specs2, there are several ways to do that. The first one is as simple as running that code into the Scope trait:

  "this system" should {
    "example 1" in new c1 { ... }
    "example 2" in new c1 { ... }
  }
  trait c1 extends Scope {
    val var1 = ...
    cleanAll
  }

This works well for "before" setup but we can't easily setup any "after" behavior because we need additional machinery to make sure that the teardown code is executed even if there is a failure. This is where you can use the After trait and define the after method:

  "this system" should {
    "example 1" in new c1 { ... }
    "example 2" in new c1 { ... }
  }
  trait c1 extends Scope with After {
    def after = ... // teardown code goes here
  }
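The guarantee that the teardown runs even on a failure is the familiar try/finally pattern. A simplified plain-Scala sketch of the idea (not specs2's actual implementation):

```scala
object AfterSketch {
  var tornDown = 0

  // Simplified model of what an After context does: run the example
  // body, then always run the teardown, even if the body threw.
  def withAfter[T](body: => T)(after: => Unit): T =
    try body finally after

  def runFailingExample(): Boolean =
    try {
      withAfter[Unit](sys.error("example failed")) { tornDown += 1 }
      false // not reached: the example failed
    } catch {
      case _: RuntimeException => true // the failure still propagates
    }
}
```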

For good measure, even though it's not strictly necessary, there is a corresponding Before trait with a before method for setup code.

Remove duplication

If you don't need any "local" variable in your contexts, but only before/after behavior, you can reduce the amount of code in the example above with the AfterExample trait (or BeforeExample for before behavior):

  class MySpec extends Specification with AfterExample {

    def after = ... // teardown code goes here

    "this system" should {
      "example 1" in { ... }
      "example 2" in { ... }
    }
  }

Yet another alternative is to use an implicit context:

  implicit val context = new Scope with After {
    def after = ... // teardown code goes here
  }

  "this system" should {
    "example 1" in { ... }
    "example 2" in { ... }
  }
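This relies on ordinary implicit resolution: the example body is run with whatever context is implicitly in scope. A simplified plain-Scala sketch of the mechanism (the hypothetical runExample stands in for specs2's `in`):

```scala
object ImplicitContextSketch {
  var cleaned = 0

  trait Context { def after(): Unit }

  // Hypothetical stand-in for specs2's `in`: the context is picked up
  // implicitly instead of being instantiated at each example.
  def runExample[T](body: => T)(implicit ctx: Context): T =
    try body finally ctx.after()

  implicit val context: Context = new Context {
    def after(): Unit = cleaned += 1
  }

  def runTwoExamples(): Int = {
    runExample(1 + 1)
    runExample("ok")
    cleaned // the teardown ran once per example
  }
}
```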
BeforeSus / AfterSus / BeforeSpec / AfterSpec

Some other declarations in specs, like beforeSpec, allow you to specify some setup code to be executed before all the specification examples. In specs2, the Specification is seen as a sequence of Fragments and you have to insert a Step Fragment at the appropriate place:

  step(cleanDB)
  "first example" in { ... }
  "second example" in { ... }  
  step(println("finished!")) 
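The model behind this is straightforward: a specification is a linear sequence of fragments, and a step is simply a fragment executed at its position in that sequence. A rough plain-Scala sketch of that idea (all names illustrative):

```scala
import scala.collection.mutable.ListBuffer

// Toy model: fragments run in order, and steps are just fragments
// with an action instead of an expectation.
object StepSketch {
  sealed trait Fragment
  case class Example(name: String, body: () => Unit) extends Fragment
  case class Step(action: () => Unit) extends Fragment

  val log = ListBuffer[String]()

  val fragments: Seq[Fragment] = Seq(
    Step(() => log += "cleanDB"),
    Example("first example", () => log += "first"),
    Example("second example", () => log += "second"),
    Step(() => log += "finished!")
  )

  def run(): List[String] = {
    fragments.foreach {
      case Example(_, body) => body()
      case Step(action)     => action()
    }
    log.toList
  }
}
```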
Specification / sus contexts

The purpose of Contexts in specs is to be able to define and reuse a given setup/teardown procedure. As seen above, in specs2, traits extending Before or After play exactly the same role.

ScalaCheck

With specs there is a special matcher to check ScalaCheck properties. It comes in 4 forms:

  1. property must pass
  2. generator must pass(function)
  3. function must pass(generator)
  4. generator must validate(partialFunction)

In specs2 we're using the fact that the body of an Example expects anything that can be converted to a Result, so there are implicit conversions transforming ScalaCheck properties and Scala functions to Results and the examples above become:

(we suppose that the function to test has 2 parameters of type T1 and T2, and 2 implicit Arbitrary[T1] and Arbitrary[T2] instances in scope)

  1. "ex" in check { property }
  2. "ex" in check { function }
    or "ex" in check (arbitrary1, arbitrary2) { function } to be explicit about the Arbitrary instances to use
  3. no equivalent
  4. no equivalent

You can also notice that specs2 uses implicit Arbitrary instances instead of Gen instances directly, but creating an Arbitrary from a Gen is easy:

  import org.scalacheck._

  val arbitrary: Arbitrary[T] = Arbitrary(generator)

Miscellaneous

Arguments

This is also a part of your specifications which is likely to require changes. In specs there were several ways to modify the behavior of the execution or the reporting: methods like detailedDiffs(), shareVariables() or setSequential(). All of this has been completely redesigned in specs2 as specification arguments. Just to be specific about what's not going to compile during your specs2 migration:

  • detailedDiffs() needs to be replaced by a diffs(...) or your own Diffs object
  • shareVariables() makes no sense because all variables are shared in specs2 unless you isolate them in Contexts
  • setSequential() is replaced with the addition of a sequential argument
DataTables

DataTables have not really changed, except for their package being org.specs2.matcher instead of org.specs.util. You may however get a few compilation errors because of the ! operator as explained here. Just replace it with !! in that case.

Specification title

In specs the specification title is a member of the Specification class whereas Specifications in specs2 are traits. If you want to specify a Specification title in specs2 you can insert it as a Fragment:

  class MySpec extends Specification { def is =
    "Specification title".title ^
    ...
    ^ end
  }

  class MySpec extends mutable.Specification {
    "Specification title".title
    ...
  }
Include a specification in another one

This used to be done with include or isSpecifiedBy/areSpecifiedBy. In specs2 there are 3 ways to "include" other specifications, with different behaviours which mostly make sense with the HtmlRunner:

If you have a "parent" specification spec1

  1. include(spec2) will include all the Fragments of spec2 into spec1 as if they were part of it

  2. link(spec2) will include all the Fragments of spec2 into spec1. When executing spec1, spec2 will be executed and the html runner will create a link to a separate page for spec2

  3. see(spec2) will include all the Fragments of spec2 into spec1. When executing spec1, spec2 will not be executed but the html runner will create a link to a separate page for spec2.
Tags

The tagging system in specs2 has been completely changed, but not in a drastic way for users of the API. The major difference is that tags are positional, which opens new possibilities for tagging like creating Sections.

Conclusion

There are certainly a million other small differences between your specification written with specs and what it will look like with specs2. I can only apologize in advance for the additional work, offer my best support, and hope that you'll reap more benefits and have more fun writing specifications with specs2!