
21 May 2011

specs2 migration guide

Rewrite is rarely the right option

Well, except when that's the only one :-).

Before starting specs2, I did try to refactor specs. It turned out that my first design was clearly not suited to my goals, so I eventually decided to start from a blank page, as explained here. While I wanted to keep most features, I didn't aim for 100% backward compatibility. I tried to think of features as part of the design space and wanted to be sure I had enough design freedom (a complementary view is that implementation is part of the feature space).

That being said, I know that migrating to a new API is not something that people just do for the sheer fun of it. They have to have compelling reasons for doing so.

What are the good reasons why one would like to use specs2 instead of specs?

  • concurrent execution of examples: that's one major thing enabled by specs2's new design, and it's easy and reliable

  • acceptance specifications: something which was really experimental in specs and which is now completely integrated. You can use it to create an executable User Guide for example

  • nifty features such as Auto-Examples (an example here), implicit Matchers creation or Json matchers

  • lots of small fixes and consistency changes so that writing tests/specifications is just a pleasure!

Now that you've decided to take the ride with specs2, what are the steps for a successful migration?

Just replace org.specs._ with org.specs2.mutable._

The first thing to do is to switch the base import from org.specs._ to org.specs2.mutable._. For simple specifications, with no context setup and simple equality matchers, nothing else is required.
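For instance, a minimal specification after the import switch might look like this (the class and example names here are just illustrative):

```scala
import org.specs2.mutable._

// a small mutable specification using the specs2 base import
// (class and example names are made up for illustration)
class StringSpec extends Specification {
  "a string" should {
    "have the expected length" in {
      "hello".length must_== 5
    }
  }
}
```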

However, depending on the specs features you've been using you'll have to change a few more things:

  1. matchers
  2. context setup
  3. ScalaCheck
  4. miscellaneous: arguments, specification title, specification inclusions, tags, ...

Most of those changes have specific motivations which I leave out of this post to keep it short. If you have questions on any given change please ask on the mailing-list.

You may also want to watch this presentation by @prasinous to learn how she did her own migration of the Salat project.

Matchers

Most of the matchers in specs2 have simply been copied over from the same matchers in specs. There are however some differences:

  • mustBe, mustNotBe and other mustXXX variants don't exist anymore; only must_==, must_!=, mustEqual and mustNotEqual are left. For the rest, you have to write must not be(x) instead of mustNotBe

  • the matchers having not as a prefix have been removed too. You're encouraged to write must not beEmpty instead of must notBeEmpty

  • the verify matcher has been removed, so you should write f(a) must beTrue instead of a must verify(f) (or better, use ScalaCheck properties!)

  • beLike used to take a PartialFunction[T, Boolean] as an argument. It is now PartialFunction[T, MatchResult[_]] which allows better failure messages: a must beLike { case ThisThing(b) => b must be_>(0) }. If you have nothing special to assert, just return ok: a must beLike { case ThisThing(_) => ok }

  • the fail() method has been removed in favor of the simple return of a Failure object: aFailure or failure(message)

  • String matchers: ignoreCase and ignoreSpace matchers for string equality are now constraints which can be added to the beEqualTo matcher: beEqualTo(b).ignoreSpace.ignoreCase

  • Iterable matchers: only and inOrder are now constraints which can be applied to the contain matcher instead of having dedicated matchers like containInOrder

  • Option matchers: "beSomething" is now just beSome

  • xUnit assertions have been removed

There are certainly other differences; I'll keep the list updated as I see them.

Contexts

That's the tough part! There are 3 ways to manage contexts in specs:

  • "automagic" variables
  • before / after methods
  • system / specification contexts

All of this has actually been reduced to the direct use of natural Scala features: the easy creation of traits and case classes, plus a few support traits to avoid code duplication. You definitely should read the User Guide section on Contexts before starting your migration.

Automagic variables

This was a very cool functionality of specs but also the greatest source of bugs! In specs you can nest examples, declare variables in each scope and have those variables being automatically reset when executing each example:

  "this system" should {
    val var1 = ...
    "example 1" in {
      val var2 = ...
      "subexample 1" in { ... }
      "subexample 2" in { ... }
    }
    "example 2" in { ... }
  }

Handling those variables is a lot less magic in specs2. What's the simplest way to get new variables in Scala? Simply open a new scope by placing some code into a new object! That's it, nothing more to say:

  "this system" should {
    "example 1" in new c1 {
      // do something with var1
    }
    "example 2" in new c1 {
      // do something else with a new var1
    }
  }
  trait c1 {
    val var1 = ...
  }

Well, almost :-). It turns out that the body of an Example has to be something akin to a Result. In order to allow our context, and everything inside, to be a Result, we need to have our c1 trait extend the Scope trait and benefit from an implicit conversion from Scope to Result:

  import org.specs2.specification._

  trait c1 extends Scope {
    val var1 = ...
  }

From there, having nested contexts, like those in the first specs example, is easy: we use inheritance to create them.

  "this system" should {
    "example 1" in {
      "subexample 1" in new c2 { ... }
      "subexample 2" in new c2 { ... }
    }
    "example 2" in new c1 { ... }
  }
  trait c1 extends Scope { val var1 = ... }
  trait c2 extends c1 { val var2 = ... }
Before / After

In specs, you can run setup code as you would do with any kind of JUnit code. This is done by declaring a doBefore block inside a sus:

  "this system" should {
    doBefore(cleanAll)
    "example 1" in { ... }
    "example 2" in { ... }
  }

In specs2, there are several ways to do that. The first one is as simple as running that code in the Scope trait:

  "this system" should {
    "example 1" in new c1 { ... }
    "example 2" in new c1 { ... }
  }
  trait c1 extends Scope {
    val var1 = ...
    cleanAll
  }

This works well for "before" setup, but we can't easily set up any "after" behavior because we need additional machinery to make sure that the teardown code is executed even if there is a failure. This is where you can use the After trait and define the after method:

  "this system" should {
    "example 1" in new c1 { ... }
    "example 2" in new c1 { ... }
  }
  trait c1 extends Scope with After {
    def after = // teardown code goes here
  }

For good measure, even if it's not necessary, there is a corresponding Before trait and before method for the setup code.

Remove duplication

If you don't need any "local" variable in your contexts, but only before/after behavior, you can reduce the amount of code in the example above with the AfterExample trait (or BeforeExample for before behavior):

  class MySpec extends Specification with AfterExample {

    def after = // teardown code goes here

    "this system" should {
      "example 1" in { ... }
      "example 2" in { ... }
    }
  }

Yet another alternative is to use an implicit context:

  implicit val context = new Scope with After {
    def after = // teardown code goes here
  }

  "this system" should {
    "example 1" in { ... }
    "example 2" in { ... }
  }
BeforeSus / AfterSus / BeforeSpec / AfterSpec

Some other declarations in specs, like beforeSpec, let you specify setup code to be executed before all the specification examples. In specs2, the Specification is seen as a sequence of Fragments and you have to insert a Step Fragment at the appropriate place:

  step(cleanDB)
  "first example" in { ... }
  "second example" in { ... }  
  step(println("finished!")) 
Specification / sus contexts

The purpose of Contexts in specs is to be able to define and reuse a given setup/teardown procedure. As seen above, in specs2, traits extending Before or After play exactly the same role.

ScalaCheck

With specs there is a special matcher to check ScalaCheck properties. It comes in 4 forms:

  1. property must pass
  2. generator must pass(function)
  3. function must pass(generator)
  4. generator must validate(partialFunction)

In specs2 we're using the fact that the body of an Example expects anything that can be converted to a Result, so there are implicit conversions transforming ScalaCheck properties and Scala functions to Results and the examples above become:

(we suppose that the function to test has 2 parameters of type T1 and T2, and 2 implicit Arbitrary[T1] and Arbitrary[T2] instances in scope)

  1. "ex" in check { property }
  2. "ex" in check { function }
    or "ex" in check (arbitrary1, arbitrary2) { function } to be explicit about the Arbitrary instances to use
  3. no equivalent
  4. no equivalent

You can also notice that specs2 uses implicit Arbitrary instances instead of Gen instances directly but creating an Arbitrary from a Gen is easy:

  import org.scalacheck._

val arbitrary: Arbitrary[T] = Arbitrary(generator)

Miscellaneous

Arguments

This is also a part of your specifications which is likely to require changes. In specs there were several ways to modify the behavior of the execution or the reporting; all of this has been completely redesigned in specs2 around a unified Arguments mechanism.

Just to be specific about what's not going to compile during your specs2 migration:

  • detailedDiffs() needs to be replaced by a diffs(...) or your own Diffs object
  • shareVariables() makes no sense because all variables are shared in specs2 unless you isolate them in Contexts
  • setSequential() is replaced with the addition of a sequential argument
DataTables

DataTables have not really changed, except for their package being org.specs2.matcher instead of org.specs.util. You may however get a few compilation errors because of the ! operator as explained here. Just replace it with !! in that case.
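As a sketch of what a migrated DataTable can look like (assuming the specs2 1.x DataTables syntax; class and column names are illustrative):

```scala
import org.specs2.mutable._
import org.specs2.matcher.DataTables

// a small DataTable: rows use the ! separator
// (or !! when ! conflicts with another implicit)
class AdditionSpec extends Specification with DataTables {
  "adding integers" in {
    "a" | "b" | "sum" |>
     1  !  2  !  3    |
     2  !  2  !  4    | { (a, b, sum) => a + b must_== sum }
  }
}
```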

Specification title

In specs the specification title is a member of the Specification class whereas Specifications in specs2 are traits. If you want to specify a Specification title in specs2 you can insert it as a Fragment:

  class MySpec extends Specification { def is =
    "Specification title".title ^
    ...                         ^
    end
  }

  class MySpec extends mutable.Specification {
    "Specification title".title
    ...
  }
Include a specification in another one

This used to be done with include or isSpecifiedBy/areSpecifiedBy. In specs2 there are 3 ways to "include" other specifications, with different behaviours which mostly make sense with the HtmlRunner:

If you have a "parent" specification spec1

  1. include(spec2) will include all the Fragments of spec2 into spec1 as if they were part of it

  2. link(spec2) will include all the Fragments of spec2 into spec1. When executing spec1, spec2 will be executed and the html runner will create a link to a separate page for spec2

  3. see(spec2) will include all the Fragments of spec2 into spec1. When executing spec1, spec2 will not be executed but the html runner will create a link to a separate page for spec2.
Tags

The tagging system in specs2 has been completely changed, but not in a drastic way for users of the API. The major difference is that tags are positional, which opens new possibilities for tagging, like creating Sections.

Conclusion

There are certainly a million other small differences between your specification written with specs and what it will look like with specs2. I can only apologize in advance for the additional work, offer my best support, and hope that you'll be able to reap more benefits, and have more fun, writing specifications with specs2!

18 February 2011

Scalacheck generators for JSON

More than 6 months without a single post, because I've been focusing on the creation of specs2, which should be out in a few weeks (if you want to access the preview drop me a line).

Among the new features of specs2 there will be JSON matchers to help those of you handling JSON data. When developing those matchers I used ScalaCheck to test some of the utility functions I was using. I want to show here that writing custom data generators with ScalaCheck is really easy and almost follows the grammar for the data.

Here's the code:


import scala.util.parsing.json._
import org.scalacheck._
import Gen._

/**
 * Generator of JSONType objects with a given tree depth
 */
trait JsonGen {

  implicit def arbitraryJsonType: Arbitrary[JSONType] =
    Arbitrary { sized(depth => jsonType(depth)) }

  /** generate either a JSONArray or a JSONObject */
  def jsonType(depth: Int): Gen[JSONType] = oneOf(jsonArray(depth), jsonObject(depth))

  /** generate a JSONArray */
  def jsonArray(depth: Int): Gen[JSONArray] = for {
    n    <- choose(1, 4)
    vals <- values(n, depth)
  } yield JSONArray(vals)

  /** generate a JSONObject */
  def jsonObject(depth: Int): Gen[JSONObject] = for {
    n    <- choose(1, 4)
    ks   <- keys(n)
    vals <- values(n, depth)
  } yield JSONObject(Map((ks zip vals):_*))

  /** generate a list of keys to be used in the map of a JSONObject */
  def keys(n: Int) = listOfN(n, oneOf("a", "b", "c"))

  /**
   * generate a list of values to be used in the map of a JSONObject
   * or in the list of a JSONArray
   */
  def values(n: Int, depth: Int) = listOfN(n, value(depth))

  /**
   * generate a value to be used in the map of a JSONObject
   * or in the list of a JSONArray
   */
  def value(depth: Int) =
    if (depth == 0) terminalType
    else            oneOf(jsonType(depth - 1), terminalType)

  /** generate a terminal value type */
  def terminalType = oneOf(1, 2, "m", "n", "o")
}

/** import the members of that object to use the implicit arbitrary[JSONType] */
object JsonGen extends JsonGen


Two things to notice in the code above:

  • The generators are recursively defined, which makes sense because the JSON data format is recursive. For example a jsonArray contains values which can be a terminalType or a jsonType. But jsonType can itself be a jsonArray


  • The top generator used in the definition of the Arbitrary[JSONType] is a "sized" generator. This means that we can tweak the ScalaCheck parameters to use a specific "size" for our generated data. Here I've chosen to define "size" as the depth of the generated JSON trees. This depth parameter is propagated to all generators down to the value generator. If depth is 0 when using that generator, it means that we have reached the bottom of the tree, so we need a "terminal" value. Otherwise we generate another JSON object with a decremented depth.

You can certainly tweak the code above using other ScalaCheck generators to obtain more random trees (the one above is too balanced), with a broader range of values but this should get you started. It was definitely good enough for me as I spotted a bug in my code on the first run!

06 May 2010

Mini-Parsers to the rescue

An example of curryfication

I'm implementing a reasonably complex algorithm at the moment with different transformation phases. One transformation is a "curryfication" of some terms to be able to transform some expressions like:

f(a, b, c)

to

.(.(.(f, a), b), c)
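The curried terms can be modeled with a small algebraic datatype. The post doesn't show its data structures, so the following is only a plausible sketch, with toString producing the dotted notation:

```scala
// hypothetical ADT for curried expressions (a sketch, not the actual code)
sealed trait Curried
case class Curry(name: String) extends Curried {
  override def toString = name
}
case class Apply(f: Curried, arg: Curried) extends Curried {
  override def toString = ".(" + f + ", " + arg + ")"
}

object CurryDemo extends App {
  // f(a, b, c) after curryfication
  val e = Apply(Apply(Apply(Curry("f"), Curry("a")), Curry("b")), Curry("c"))
  println(e) // prints .(.(.(f, a), b), c)
}
```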

Being test-compulsive, I want to be able to test that my transformation works. Unfortunately the data structures I'm dealing with are pretty verbose in my tests.

One of my specs examples was this:

"A composed expression, when curried" should {
  "be 2 successive applications of one parameter " +
  "when there are 2 parameters to the method expression" in {
    ComposedExp(MethodEx(method), const :: arb :: Nil).curryfy must_==
      Apply(Apply(Curry(method), Curry(const)), Curry(arb))
  }
}

But Apply(Apply(Curry(method), Curry(const)), Curry(arb)) is really less readable than .(.(method, const), arb). And this can only get worse on a more complex example.

With the help of a mini-parser

So I thought that I could just write a parser to recreate a "Curried" expression from a String.

A first look at the Scala 2.8.0 Scaladoc (filtering on "parser") was a bit scary, especially because my last parser exercise was really a long time ago now.

But no, parser combinators and especially small, specific parsers like the one I wrote, are really straightforward:

import scala.util.parsing.combinator.JavaTokenParsers
import scala.util.parsing.input.CharSequenceReader

object CurriedParser extends JavaTokenParsers {
  lazy val parser: Parser[Curried] = application | constant
  val constant = ident ^^ { s => Curry(s) }
  val application = (".(" ~> parser) ~ (", " ~> constant <~ ")") ^^ { case a ~ b =>
    Apply(a, b)
  }
  def fromString(s: String): Curried = parser.apply(new CharSequenceReader(s)).get
}

In the snippet above I declare that:

  • my parser returns a Curried object

  • my parser is either an application or a constant (with the | operator)

  • a constant is a Java identifier (ident is a parser inherited from the JavaTokenParsers trait)

  • a constant should be transformed to a Curry object

  • an application is composed of two parts (separated by the central ~ operator). The first part is: some syntax (".(") and the result of something to parse recursively with the parser (i.e. an application or a constant). The second part is a constant, surrounded by some syntax (", " and ")"). Note that the syntactic elements are being discarded by using ~> and <~ instead of ~.

  • once an application is parsed, part 1 and part 2 are accessible as a matchable object of the form a ~ b, and this object can be used to build an Apply object

  • the fromString method simply passes a String to the parser and gets the result

I have to say that this parser is really rudimentary. It doesn't handle errors in the input text, there absolutely needs to be a space after the comma, and so on.

Yet it really fits my testing purpose for a minimum amount of development time.

Open question

I hope that this post can serve as an example to anyone new to Scala wanting to play with parsers and I leave an open question for senior Scala developers:

Is there a way to extend the case classes representing algebraic datatypes in Scala so that:
  • each case class has a proper toString representation (that's already the case and that's one benefit of case classes), but that representation can be overridden (for example to replace Apply(x, y) with .(x, y))

  • there is an implicit parser that is able to reconstruct the hierarchy of objects from its string representation

09 June 2008

Edit distance in Scala

How handy.

I was just looking for a way to highlight differences between 2 strings for my Behavior-Driven Development library, specs, while reading the excellent Algorithm Design Manual. Intuitively, I was thinking that there was a better way to show string differences than the one used in JUnit:

Expected kit<t>en but was kit<ch>en
Expected <skate> but was <kite>

The first message shows that <t> has been replaced by <ch>, while the second message says that everything is different! There must be a way to find the minimal set of operations necessary to transform one string to another, right?

The Levenshtein distance

There is indeed. The "Edit" distance, also called "Levenshtein" distance, computes exactly this, the minimal number of insertions, deletions and substitutions required to transform "World" into "Peace" (5, they're very far apart,...).

The computation is usually done by:
  1. examining the first 2 (or last 2) letters, one from each string
  2. assuming the eventual transformation which may occur for those 2 letters:
    • they're equal: "h..." and "h..." --> distance = 0
    • the first one has been removed: "ah..." and "h..." --> distance = 1
    • the second one has been inserted: "h..." and "ah..." --> distance = 1
    • they've been substituted: "ha..." and "ga..." --> distance = 1
  3. saying that the final distance is the minimum of the distance when choosing one of the 4 possibilities + the distance resulting from the consequences of that choice

You can notice that many variations are possible: the distance could be different depending on the letters being added or removed, and there could be more allowed operations, such as the swapping of 2 letters (distance 1 instead of the 2 substitutions of distance 2),...

Then the final result can either be constructed incrementally or recursively. Incrementally, for "skate" and "kite", you compute costs from any small prefix to another prefix:
  • "s" to "k" --> distance 1
  • "s" to "ki" --> distance 2
  • "sk" to "ki" --> distance 2
  • "ska" to "kit" --> distance 3

Then you increase the prefix size and reuse the distance numbers found for smaller prefixes. The different results can be stored in the following matrix:

        s  k  a  t  e

    k   1  1  2  3  4
    i   2  2  2  3  4
    t   3  3  3  2  3
    e   4  4  4  3  2

Recursively, you do the inverse: you establish that the distance between 2 strings can be computed from the distance between smaller prefixes, and you travel the matrix to its upper left corner.

This is a very good example of Dynamic Programming, where the optimal quantity (the distance at [i, j] in the matrix) you're looking for only depends on:
  • the optimal quantity for a subset of the data (the distance at [i-1, j-1], [i-1, j], [i, j-1])
  • the resulting state (the substrings defined by a position [i-x, j-x] in the matrix)
  • not how you got to that resulting state
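The recursive reading above can be sketched directly in Scala. This is a memoized version of the recurrence, not the incremental implementation shown below:

```scala
import scala.collection.mutable

// memoized recursive Levenshtein distance: d(i, j) is the distance
// between the prefixes s1[0..i] and s2[0..j]
def levenshtein(s1: String, s2: String): Int = {
  val memo = mutable.Map[(Int, Int), Int]()
  def d(i: Int, j: Int): Int = memo.getOrElseUpdate((i, j),
    if (i == 0) j        // j insertions
    else if (j == 0) i   // i suppressions
    else {
      val subst = if (s1(i - 1) == s2(j - 1)) 0 else 1
      List(d(i - 1, j) + 1,         // suppression
           d(i, j - 1) + 1,         // insertion
           d(i - 1, j - 1) + subst  // substitution (free if the letters match)
      ).min
    })
  d(s1.length, s2.length)
}

// levenshtein("skate", "kite") == 2
```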
The EditDistance trait

For the record, I will write here the code I'm using for the implementation, which constructs the solution incrementally, as opposed to the recursive solution presented on Tony's blog, here.

The few interesting things to notice about this code are:
  • the algorithm itself ("initializing the matrix"), which is very straightforward
  • the minimum function, which is a bit strange because it is not comparing all alternatives. The cost of an insertion is only computed if the cost of a suppression is not better than the cost of a substitution. I would like to work out a formal proof of why this is correct, but my intuition tells me that a substitution is generally a "good" operation: it has the same cost as an insertion or a suppression but processes 2 letters at a time. So if we find a better cost using a suppression, it is going to be the best
  • the matrix traversal to retrieve one of the possible paths showing the letters which need to be transformed in each string: not so fun code with a lot of corner cases
  • the "separators" feature, which allows you to select the separators of your choice to display the string differences

trait EditDistance {
  /**
   * Class encapsulating the functions related to the edit distance of 2 strings
   */
  case class EditMatrix(s1: String, s2: String) {
    /* matrix containing the edit distance for any prefix of s1 and s2:
       matrix(i)(j) = edit distance(s1[0..i], s2[0..j]) */
    val matrix = Array.ofDim[Int](s1.length + 1, s2.length + 1)

    /* initializing the matrix */
    for (i <- 0 to s1.length;
         j <- 0 to s2.length) {
      if (i == 0) matrix(i)(j) = j      // j insertions
      else if (j == 0) matrix(i)(j) = i // i suppressions
      else matrix(i)(j) = min(matrix(i - 1)(j) + 1, // suppression
                              matrix(i - 1)(j - 1) + (if (s1(i - 1) == s2(j - 1)) 0
                                                      else 1), // substitution
                              matrix(i)(j - 1) + 1) // insertion
    }

    /** @return the edit distance between 2 strings */
    def distance = matrix(s1.length)(s2.length)

    /** prints the edit matrix of 2 strings */
    def print = {
      for (i <- 0 to s1.length) {
        def row = for (j <- 0 to s2.length) yield matrix(i)(j)
        println(row.mkString("|"))
      }
      this
    }

    /** @return a (String, String) displaying the differences between each
        input string. The used separators are parentheses: '(' and ')' */
    def showDistance: (String, String) = showDistance("()")

    /**
     * @param sep separators used to highlight differences. If sep is empty,
     *        then no separator is used. If sep contains one character, it is
     *        taken as the unique separator. If sep contains 2 or more characters,
     *        the first 2 characters are taken as opening separator and closing
     *        separator.
     *
     * @return a (String, String) displaying the differences between each
     *         input string. The used separators are specified by the caller.
     */
    def showDistance(sep: String) = {
      val (firstSeparator, secondSeparator) = separators(sep)
      def modify(s: String, c: Char): String = modifyString(s, c.toString)
      def modifyString(s: String, mod: String): String = {
        (firstSeparator + mod + secondSeparator + s).
          replace(secondSeparator + firstSeparator, "")
      }
      def findOperations(dist: Int, i: Int, j: Int, s1mod: String, s2mod: String): (String, String) = {
        if (i == 0 && j == 0) {
          ("", "")
        }
        else if (i == 1 && j == 1) {
          if (dist == 0) (s1(0) + s1mod, s2(0) + s2mod)
          else (modify(s1mod, s1(0)), modify(s2mod, s2(0)))
        }
        else if (j < 1) (modifyString(s1mod, s1.slice(0, i)), s2mod)
        else if (i < 1) (s1mod, modifyString(s2mod, s2.slice(0, j)))
        else {
          val (suppr, subst, ins) = (matrix(i - 1)(j), matrix(i - 1)(j - 1),
                                     matrix(i)(j - 1))
          if (suppr < subst)
            findOperations(suppr, i - 1, j, modify(s1mod, s1(i - 1)), s2mod)
          else if (ins < subst)
            findOperations(ins, i, j - 1, s1mod, modify(s2mod, s2(j - 1)))
          else if (subst < dist)
            findOperations(subst, i - 1, j - 1, modify(s1mod, s1(i - 1)),
                                                modify(s2mod, s2(j - 1)))
          else
            findOperations(subst, i - 1, j - 1, s1(i - 1) + s1mod, s2(j - 1) + s2mod)
        }
      }
      findOperations(distance, s1.length, s2.length, "", "")
    }

    def min(suppr: Int, subst: Int, ins: => Int) = {
      if (suppr < subst) suppr
      else if (ins < subst) ins
      else subst
    }
  }

  def editDistance(s1: String, s2: String): Int = EditMatrix(s1, s2).distance
  def showMatrix(s1: String, s2: String) = EditMatrix(s1, s2).print
  def showDistance(s1: String, s2: String) = EditMatrix(s1, s2).showDistance
  def showDistance(s1: String, s2: String, sep: String) = EditMatrix(s1, s2).showDistance(sep)

  private def separators(s: String) = (firstSeparator(s), secondSeparator(s))
  private def firstSeparator(s: String) = if (s.isEmpty) "" else s(0).toString
  private def secondSeparator(s: String) = {
    if (s.size < 2) firstSeparator(s) else s(1).toString
  }
}


Conclusion

What's the conclusion? Computer science is useful of course, but you only recognize it once you know it!

13 March 2008

PDD: Properties Driven Development

Quick quiz: 1 + 2 = ? and 1 + 2 + 3 = ? and 1 + 2 + 3 + 4 = ?

In other words, what is the result of the sumN function, which sums all integers from 1 to n?

If we had to use a "direct" (some would say "naive") TDD approach, we could come up with the following code:
[Warning!! The rest of the post assumes that you have some basic knowledge of Scala, specs and ScalaCheck,...]

"the sumN function" should {
  def sumN(n: Int) = (1 to n) reduceLeft((a: Int, b: Int) => a + b)
  "return 1 when summing from 1 to 1" in {
    sumN(1) must_== 1
  }
  "return 3 when summing from 1 to 2" in {
    sumN(2) must_== 3
  }
  "return 6 when summing from 1 to 3" in {
    sumN(3) must_== 6
  }
}

But if we browse our mental book of "mathematical recipes" we remember that

sumN(n) = n * (n + 1) / 2

This is actually a much more interesting property to test, and ScalaCheck helps us check that:

"the sumN function" should {
  def sumN(n: Int) = (1 to n) reduceLeft((a: Int, b: Int) => a + b)
  "return n(n+1)/2 when summing from 1 to n" in {
    val sumNInvariant = (n: Int) => sumN(n) == n * (n + 1) / 2
    property(sumNInvariant) must pass
  }
}

Even better, ScalaCheck has no reason to assume that n is strictly positive! So it quickly fails on n == -1 and a better implementation is:

"the sumN function" should {
  def sumN(n: Int) = {
    assume(n >= 0) // will throw an AssertionError if the constraint is violated
    (1 to n) reduceLeft((a: Int, b: Int) => a + b)
  }
  "return n(n+1)/2 when summing from 1 to n" in {
    val sumNInvariant = (n: Int) => n <= 0 || sumN(n) == n * (n + 1) / 2
    property(sumNInvariant) must pass
  }
}


This will be ok and tested for a large number of values for n. Using properties is indeed quite powerful. Recreation time! You can now have a look at this movie, where Simon Peyton Jones shows that QuickCheck (the Haskell ancestor of ScalaCheck) detects interesting defects in a bit-packing algorithm.

Fine, fine, but honestly, all those examples look very academic: graph algorithms, mathematical formulas, bit packing,... Can we apply this kind of approach to our mundane, day-to-day development? Tax accounting, DVD rentals, social websites?

I am going to take 2 small examples from my daily work and see how PDD could be used [yes, YAA, Yet Another Acronym,... PDD stands for Properties Driven Development (and not that PDD)].
  1. From the lift framework (courtesy of Jamie Webb): a 'camelCase' function which transforms underscored names to CamelCase names
  2. From my company's software: some pricer extension code for Swaps where fees values have to be subtracted from the NPV (Net Present Value) under certain conditions

I'll develop the examples first and try to draw conclusions later on.

Example 1: camelCase

So what are the properties we can establish for this first example? Can we describe it informally first?

The camelCase function should CamelCase a name which is under_scored, removing each underscore and capitalizing the next letter.

Try this at home, this may not be so easy! Here is my proposal:

def previousCharIsUnderscore(name: String, i: Int) = i > 1 && name.charAt(i - 1) == '_'
def underscoresNumber(name: String, i: Int) = {
  if (i == 0) 0
  else name.substring(0, i).toList.count(_ == '_')
}
def indexInCamelCased(name: String, i: Int) = i - underscoresNumber(name, i)
def charInCamelCased(n: String, i: Int) = camelCase(n).charAt(indexInCamelCased(n, i))

val doesntContainUnderscores = property((name: String) => !camelCase(name).contains('_'))

val isCamelCased = property((name: String) => {
  name.forall(_ == '_') && camelCase(name).isEmpty ||
  name.toList.zipWithIndex.forall { case (c, i) =>
    c == '_' ||
    indexInCamelCased(name, i) == 0 && charInCamelCased(name, i) == c.toUpperCase ||
    !previousCharIsUnderscore(name, i) && charInCamelCased(name, i) == c ||
    previousCharIsUnderscore(name, i) && charInCamelCased(name, i) == c.toUpperCase
  }
})

doesntContainUnderscores && isCamelCased must pass

This property says that:
  • the CamelCased name must not contain underscores anymore
  • if the name contains only underscores, then the CamelCased name must be empty
  • for each letter in the original name, either:
    • it is an underscore
    • it is the first letter after some underscores, so it becomes the first letter of the CamelCased word and should be uppercased
    • the previous character isn't an underscore, so it should be unchanged
    • the previous character is an underscore, so the letter should be uppercased

Before running ScalaCheck, we also need to create a string generator with some underscores:

implicit def underscoredString: Arbitrary[String] = new Arbitrary[String] {
  def arbitrary = for {
    length <- choose(0, 5)
    string <- vectorOf(length, frequency((4, alphaNumChar), (1, elements('_'))))
  } yield string.mkString
}

This works, and in the process of working on the properties I observed that:
  • the full specification for CamelCasing a name is not so easy!
  • it is not trivial to relate the resulting name to its original. I had to play with indices and the number of underscores to be able to relate characters before and after. However, once that code is in place, the testing code is almost only 1 line per property to check
  • the properties above specify the function unambiguously. I could also have specified weaker properties with less code, by not specifying that some letters should be unchanged, or by checking that the CamelCased name contains an uppercased letter without checking its position

    Example 2: Pricer extension

    The logic for this extension is:
    1. to have the NPV (NetPresentValue) being calculated by the Parent pricer
    2. to collect all fees labeled "UNDERLYING PREMIUM" for that trade
    3. to subtract the fee value from the NPV if the valuation date for the trade is >= the fee settlement date
    4. to apply step 3 only if a pricing parameter named "INCLUDE_FEES" is set to true, while another pricing parameter "NPV_INCLUDE_CASH" is set to false
    This certainly looks like a lot of jargon for most of you, but I guess that it is pretty close to a lot of "Business" requirements. What would a property for those requirements look like (in pseudo Scala code)?

    (originalNPV, fees, valuationDate, pricingParameters) =>

      if (!INCLUDE_FEES || NPV_INCLUDE_CASH)
        newNPV == originalNPV
      else
        newNPV == originalNPV - fees.foldLeft(0) { (result, fee) =>
          result +
            (if (fee.isUnderlyingPremium && fee.settlementDate <= valuationDate)
              fee.getValue
            else
              0)
        }

    The most remarkable thing about this property is that it looks very close to the actual implementation. On the other hand, ScalaCheck will be able to generate a lot of test cases:
    • an empty fee list
    • a list with no underlying premium fee
    • a list with a fee whose settlement date is after the valuation date
    • a list with a fee whose settlement date is before the valuation date
    • the 4 possible combinations for the values of the pricing parameters
    You can also notice that I use the originalNPV as the first parameter; it doesn't directly come from a generator but would be the result of the original pricer with the other generated parameters (fees, valuation date, pricing parameters).
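    To make the pseudo-code concrete, here is a stand-alone sketch of the same rule, with hypothetical Fee and parameter types (dates are modelled as plain Ints for simplicity; none of these names come from the actual pricer code):

```scala
// Hypothetical, simplified model of the pricer extension
case class Fee(isUnderlyingPremium: Boolean, settlementDate: Int, value: Double)

def adjustedNPV(originalNPV: Double, fees: List[Fee], valuationDate: Int,
                includeFees: Boolean, npvIncludeCash: Boolean): Double =
  if (!includeFees || npvIncludeCash)
    originalNPV
  else
    // subtract every underlying premium fee already settled at the valuation date
    originalNPV - fees.foldLeft(0.0) { (sum, fee) =>
      sum + (if (fee.isUnderlyingPremium && fee.settlementDate <= valuationDate)
               fee.value
             else 0.0)
    }
```

    The property then simply asserts that newNPV == adjustedNPV(originalNPV, fees, valuationDate, includeFees, npvIncludeCash) for all generated inputs.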


    Conclusion


    As a conclusion, and in the light of the 2 previous examples, I would like to enumerate the results of my recent experiments with Properties-Driven-Development:
    • First of all, PDD is TDD, on steroids. In PDD, we also have data and assertions, but the data are generated and the assertions are more general.
    • I don't believe that this replaces traditional TDD in all situations. There are situations where generating even 4 or 5 cases manually is easier and faster, especially when we consider that writing an exact oracle (the savant word for the verification of a test expectation) is sometimes tedious, as in the camelCase function. In that situation developing the cases manually using the == method would have been much faster
    • PDD, on the other hand, allows you to specify very clearly what the rule is. This is something that you would have to infer by reading several examples when using TDD
    • On the other hand, having several examples also facilitates the understanding of what's going on. "foo_bar" becomes "FooBar" is easier to grasp than "if a letter is preceded by,..."
    • PDD is very good at generating data you wouldn't think of: an empty list, negative numbers, a string with underscores only,...
    • A tip: sometimes it is useful to include in the generated parameters the result returned by the function you want to test. For example, in the second example, my parameters could be: (originalNPV, newNPV, fees, valuation date, pricing parameters). That way, when ScalaCheck reports an error, it also reports the actual value you got when showing a counter-example
    • Sometimes the properties you want to check will almost mimic the implementation (as in example 2). I think that this may very often be the case with business code if it is written properly, or it may show that your code is missing a key abstraction
    • It really takes some time to wrap your head around finding properties. Soon you'll start thinking things like: "I know that properties A, B and C characterize my function, but are they sufficient?" and you realize that you are coming close to Programming == Theorem Proving
    As a summary, PDD is not the long-awaited Silver Bullet (sorry ;-) ,...) but it is indeed a wonderful tool to have in your toolbox. It will help you test your programs much more thoroughly while seeing them in yet another way.

    25 February 2008

    Better mocks with jMock (and specs)

    Now, I don't know how I managed to do without it. Now, I wonder how everyone can do without it. Now, I think that nothing big can be done without it.

    I'm not advertising a brand new toy, but I'm talking about mock objects. Whether they are real mocks or just stubs, it is almost impossible to unit test Java components without them. Not that you can't test them but if you really want to isolate a piece of code, mocks show up one way or the other.

    One of the best libraries for mock objects in Java is jMock. Yet, Java's verbosity makes it sometimes difficult to understand the intention of the mock expectations. Enter now jMocks with Scala!

    Some statistics about my blog

    Let's say I want to write another front-end to publish my posts to Blogger. I have encapsulated all Blogger functionalities in a Scala trait:

    trait Blogger {
      def allPosts: List[Post]
      def todayPosts: List[Post]
      def post(p: Post, tags: List[Tag]): Unit
      ...
    }


    Now I want to test a "Statistics" component which will compute some stats about my posts:

    object statsSpecification extends Specification with JMocker {
      "a statistics component" should {
        "return the number of posts for today" in {
          val blogger = mock(classOf[Blogger])
          val stats = new Statistics(blogger)
          expect {
            one(blogger).todayPosts will returnValue(List(Post("...")))
          }
          stats.numberOfPostsForToday
        }
      }
    }

    class Statistics(blogger: Blogger) {
      def numberOfPostsForToday: Int = blogger.todayPosts.size
    }

    In that short specification we:
    1. create a mock: blogger = mock(classOf[Blogger]). I would have preferred to write blogger = mock[Blogger] but there is no way in Scala to create an object from its type only

    2. Add an expectation in the expect block. Again here the loan pattern makes things a lot clearer than the corresponding "Double-brace block" in Java (even if it is a clever java trick!).

    3. Specify what the return value should be in the same expression by defining "will" as a Scala infix operator. In the Java equivalent we would have to make a separate method call (which our favorite IDE may insist on putting on the next line!)
      one(blogger).todayPosts; will(returnValue(List(Post("..."))))
    Pushing further with nested expectations

    There is also a situation where using Scala and jMock could be a real win.
    [What follows is extracted from the specs Wiki, talk about reusability!]

    You need to mock an object, like a Connection, which is supposed to give you access to a service, that you also want to mock and so on. For example, testing some code accessing the Eclipse platform can be very difficult for that reason.

    Using specs you can use blocks to specify nested expectations:

    // A workspace gives access to a project and a project to a module
    case class Module(name: String)
    case class Project(module: Module, name: String)
    case class Workspace(project: Project)
    val workspace = mock(classOf[Workspace])

    expect {
      one(workspace).project.willReturn(classOf[Project]) { p: Project =>
        // nested expectations on project
        one(p).name willReturn "hi"
        one(p).module.willReturn(classOf[Module]) { m: Module =>
          // nested expectation on module
          one(m).name willReturn "module"
        }
      }
    }

    or

    // a workspace is a list of projects
    case class Project(name: String)
    case class Workspace(projects: List[Project])
    val workspace = mock(classOf[Workspace])
    expect {
      // the workspace will return project mocks with different expectations
      one(workspace).projects willReturnIterable(classOf[Project],
        { p: Project => one(p).name willReturn "p1" },
        { p: Project => one(p).name willReturn "p2" })
    }

    I haven't yet tested this capability on a real project but I clearly remember having had that kind of requirement.

    I hope that this short post can make you feel that using mocks can be easy and elegant, especially if you use them with Scala! (and specs,....)

    PS: Thanks again to Lalit Pant for showing the way with Scala and jMock

    15 January 2008

    Better unit tests with ScalaCheck (and specs)

    Writing unit tests can seem tedious sometimes.

    Some people tell you: "Hey, don't write unit tests only! Do Test-Driven Development". You write the test first, then the code for it. This way:
    • you end up writing only the code that's necessary to deliver some concrete value for your customer/user
    • you drive the design of your system
    • you add frequent refactorings to the mixture to ensure your code stays clean
    Or even better, "Do Behaviour-Driven Development!". With BDD, you get nice executable specifications for your system, which can almost read like English.

    While I fully adhere to the above principles, I also think that there is a continuum between specifications and tests. And at the end of this continuum, it's all about testing that your software works. Even given silly inputs. Your unit tests should provide that kind of coverage.

    And it's not that easy. A single line of code can go wrong in so many different ways. Try copying a file. Here comes ScalaCheck to the rescue!

    Introducing ScalaCheck

    Using ScalaCheck, you define:
    • properties which should always be true
    • random data to exercise the property
    and ScalaCheck generates the test cases for you. Isn't it great?

    Let's take a concrete example to illustrate this, because I feel I almost lost my only reader here (thanks bro, you're a real brother).

    If you want, you Can

    Last week I started specifying and testing the famous Can class from the lift framework. The Can class is the Option class from the Scala library, on steroids. [To Scala newcomers: there are many good posts on Option, Maybe (in Haskell), Either and all this monad folklore, but I will send you to a concrete example here].

    Basically, a Can is either Empty (it contains nothing) or Full (it contains a value). This is a fairly common situation in software or elsewhere: the user with name "Smith" exists in the database (Full) or not (Empty), I got the power (Full) or I haven't (Empty).

    When a Can is empty, it can be enhanced with an error message explaining why it is empty. In that case, it will be a Failure object.
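    As a simplified model, the three cases can be sketched as follows (the real lift Can's Failure also carries an optional exception and a chain of previous failures, which this sketch leaves out):

```scala
// Simplified sketch of the three Can cases described above
sealed trait Can[+A]
case object Empty extends Can[Nothing]               // contains nothing
case class Full[+A](value: A) extends Can[A]         // contains a value
case class Failure(msg: String) extends Can[Nothing] // empty, with a reason
```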

    Now, if you want to test an "equals" method working for all different cases you have to specify a lot of test cases:
    1. 2 Full objects which are equal
    2. 2 Full objects which are not equal
    3. 2 Empty objects which are equal
    4. 2 Empty objects which are not equal
    5. 2 Failure objects which are equal
    6. 2 Failure objects which are not equal
    7. A Full object and an Empty object (not equal)
    8. A Full object and a Failure object (not equal)
    9. A Failure object and an Empty object (not equal)
    When I said it could be tedious,... And I'm even simplifying the situation since Failures can be chained, optionally contain an Exception, etc,...

    Properties

    Here is the solution, implemented using specs and ScalaCheck, with the support of Rickard Nilsson, author of the ScalaCheck project:

    object CanUnit extends Specification with CanGen {
      "A Can equals method" should {
        "return true when comparing two identical Can messages" in {
          val equality = (c1: Can[Int], c2: Can[Int]) => (c1, c2) match {
            case (Empty, Empty)     => c1 == c2
            case (Full(x), Full(y)) => (c1 == c2) == (x == y)
            case (Failure(m1, e1, l1),
                  Failure(m2, e2, l2)) => (c1 == c2) == ((m1, e1, l1) == (m2, e2, l2))
            case _ => c1 != c2
          }
          property(equality) must pass
        }
      }
    }

    How does it read?

    "equality" is a function taking 2 Cans. Then, depending on the Can type, it says that the result from calling the equals method on the Can class should be equivalent to calling equals on the content of the Can if it is a Full Can for instance.

    Create a "property" with this function and declare that the property must pass. That's all.

    Well, you may want to have a look at what's generated. Add the display parameter:

    import org.specs.matcher.ScalacheckParameters._
    ...
    property(equality) must pass(display)

    Then you should see in the console:

    ....
    Tested: List(Arg(,Failure(cn,Full(net.liftweb.util.CanGen$$anon$0$UserException),List()),0),... Tested: ...
    Tested: ...
    ....
    + OK, passed 100 tests.

    And if one test fails:

    A Can equals method should
    x return true when comparing two identical Can messages
    A counter-example is 'Full(0)' (after 1 try) (CanUnit.scala line 21)

    But you may have, at this point, the following nagging question: "Where does all this test data come from?". Let's have a look below.

    Generating data

    Data generators are defined "implicitly". You define a function which is able to generate random data and you mark it as "implicit". When ScalaCheck tries to generate a given type of object, it looks for any implicit definition providing it. Like:

    implicit def genCan[T](dummy: Arb[Can[T]])
        (implicit a: Arb[T] => Arbitrary[T]) = new Arbitrary[Can[T]] {
      def getArbitrary = frequency(
        (3, value(Empty)),
        (3, arbitrary[T].map(Full[T])),
        (1, genFailureCan)
      )
    }

    This code says that generating a Can, optionally full of an element of type T, which has its own implicit Arbitrary generator, is like choosing between:
    • an Empty object, 3 times out of 7
    • an arbitrary object of type T, put in a Full object, 3 times out of 7
    • a Failure object (which has its own way of being generated via another function), 1 time out of 7
    [The "dummy" parameter is here to help Scala type inferencer, AFAIK. The world is not perfect, I know]
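    The frequency combinator used here is just a weighted random choice. A toy plain-Scala version (my own sketch, not ScalaCheck's actual implementation) shows the idea:

```scala
// Toy version of ScalaCheck's frequency: pick one alternative with
// probability proportional to its weight
def frequency[T](choices: (Int, () => T)*): T = {
  // draw a number in [0, sum of weights) and walk the alternatives
  var n = scala.util.Random.nextInt(choices.map(_._1).sum)
  val it = choices.iterator
  var picked: Option[T] = None
  while (picked.isEmpty) {
    val (weight, gen) = it.next()
    if (n < weight) picked = Some(gen()) else n -= weight
  }
  picked.get
}
```

    With weights (3, 3, 1) as in the Can generator above, the first two alternatives are each chosen 3 times out of 7 and the last one 1 time out of 7.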

    Here is the Failure generator, which makes heavy use of ScalaCheck's predefined generation functions:

    def genFailureCan: Gen[Failure] = for {
      msgLen    <- choose(0, 4)
      msg       <- vectorOf(msgLen, alphaChar)
      exception <- arbitrary[Can[Throwable]]
      chainLen  <- choose(1, 5)
      chain     <- frequency((1, vectorOf(chainLen, genFailureCan)), (3, value(Nil)))
    } yield Failure(msg.mkString, exception, chain.toList)


    In the above method,
    • choose returns a random int number inside a range
    • vectorOf returns a collection of arbitrary objects, with a specified length
    • alphaChar returns an arbitrary alphabetic character
    • arbitrary[Can[Throwable]] returns an arbitrary Can, making all this highly recursive!
    Random thoughts

    I hope this sparked some interest in trying to use ScalaCheck and specs to define real thorough unit tests on your system.

    The added value is similar to BDD, you will see "properties" emerge and this will have a better chance at producing rock-solid software.

    From now on, you too can be a ScalaCheck man! (see lesson 4)