A++ [Eric Torreborre's Blog]: scalacheck

Showing posts with label scalacheck. Show all posts

18 February 2011

Scalacheck generators for JSON

More than 6 months without a single post, because I've been focusing on the creation of specs2, which should be out in a few weeks (if you want to access the preview drop me a line).

Among the new features of specs2 there will be JSON matchers to help those of you handling JSON data. When developing those matchers I used ScalaCheck to test some of the utility functions I was using. I want to show here that writing custom data generators with ScalaCheck is really easy and almost follows the grammar for the data.

Here's the code:


/**
 * Generator of JSONType objects with a given tree depth
 */
import util.parsing.json._
import org.scalacheck._
import Gen._

trait JsonGen {

  implicit def arbitraryJsonType: Arbitrary[JSONType] = 
    Arbitrary { sized(depth => jsonType(depth)) }
   
  /** generate either a JSONArray or a JSONObject */
  def jsonType(depth: Int): Gen[JSONType] = oneOf(jsonArray(depth), jsonObject(depth))

  /** generate a JSONArray */
  def jsonArray(depth: Int): Gen[JSONArray] = for {
    n    <- choose(1, 4)
    vals <- values(n, depth)
  } yield JSONArray(vals)

  /** generate a JSONObject */
  def jsonObject(depth: Int): Gen[JSONObject] = for {
    n    <- choose(1, 4)
    ks   <- keys(n)
    vals <- values(n, depth)
  } yield JSONObject(Map((ks zip vals):_*))

  /** generate a list of keys to be used in the map of a JSONObject */
  def keys(n: Int) = listOfN(n, oneOf("a", "b", "c"))

  /** 
   * generate a list of values to be used in the map of a JSONObject or in the list
   * of a JSONArray.
   */
  def values(n: Int, depth: Int) = listOfN(n, value(depth))

  /** 
   * generate a value to be used in the map of a JSONObject or in the list
   * of a JSONArray.
   */
  def value(depth: Int) = 
    if (depth == 0) 
      terminalType 
    else 
      oneOf(jsonType(depth - 1), terminalType)
  
  /** generate a terminal value type */
  def terminalType = oneOf(1, 2, "m", "n", "o")
}
/** import the members of that object to use the implicit arbitrary[JSONType] */
object JsonGen extends JsonGen

Two things to notice in the code above:

The generators are recursively defined, which makes sense because the JSON data format is recursive. For example a jsonArray contains values which can be a terminalType or a jsonType. But jsonType can itself be a jsonArray

The top generator used in the definition of the Arbitrary[JSONType] is using a "sized" generator. This means that we can tweak the ScalaCheck parameters to use a specific "size" for our generated data. Here I've choosen to define "size" as being the depth of the generated JSON trees. This depth parameter is propagated to all generators until the value generator. If depth is 0 when using that generator, this means that we reached the bottom of the Tree so we need a "terminal" value. Otherwise we generate another JSON object with a decremented depth.

You can certainly tweak the code above using other ScalaCheck generators to obtain more random trees (the one above is too balanced), with a broader range of values but this should get you started. It was definitely good enough for me as I spotted a bug in my code on the first run!

13 March 2008

PDD: Properties Driven Development

Quick quizz: 1 + 2 =? and 1 + 2 + 3 =? and 1 + 2 + 3 + 4 =?

In other words, what is the result of the sumN function, which sums all integers from 1 to n?

If we had to use a "direct" (some would say "naive") TDD approach, we could come with the following code:
[Warning!! The rest of the post assumes that you have some basic knowledge of Scala, specs and Scalacheck,... ]

"the sumN function should" {
  def sumN(n: Int) = (1 to n) reduceLeft((a:Int, b:Int) => a + b)
  "return 1 when summing from 1 to 1" in {
    sumN(1) must_== 1
  }
  "return 2 when summing from 1 to 2" in {
   sumN(2) must_== 3
  }
  "return 6 when summing from 1 to 3" in {
    sumN(3) must_== 6
  }
}

But if we browse our mental book of "mathematical recipes" we remember that

sumN(n) = n * (n + 1) / 2

This is actually a much more interesting property to test and Scalacheck helps us in checking that:



"the sumN function should" {
  def sumN(n: Int) = (1 to n) reduceLeft((a:Int, b:Int) => a + b)
  "return n(n+1)/2 when summing from 1 to n" in {
    val sumNInvariant = (n: Int) => sumN(n) == n * (n + 1) / 2
    property(sumNInvariant) must pass
  }
}

Even better, Scalacheck has no reason to assume that n is strictly positive! So it quickly fails on n == -1 and a better implementation is:



"the sumN function should" {
  def sumN(n: Int) = {
     assume(n >= 0) // will throw an IllegalArgumentException if the constraint is violated
     (1 to n) reduceLeft((a:Int, b:Int) => a + b)
  }
  "return n(n+1)/2 when summing from 1 to n" in {
     val sumNInvariant = (n: Int) => n <= 0 || sumN(n) == n * (n + 1) / 2
    property(sumNInvariant) must pass          
  }   
}

This will be ok and tested for a large number of values for n. Using properties is indeed quite powerful. Recreation time! You can now have a look at this movie, where Simon Peyton-Jones shows that Quickcheck (the Haskell ancestor of Scalacheck) detects interesting defaults in a bit packing algorithm.

Fine, fine, but honestly, all those examples look very academic: graph algorithms, mathematical formulas, bits packing,... Can we apply this kind of approach to our mundane, day-to-day development? Tax accounting, DVD rentals, social websites?

I am going to take 2 small examples from my daily work and see how PDD could be used [yes, YAA, Yet Another Acronym,... PDD stands for Properties Driven Development (and not that PDD)].

From the lift framework (courtesy of Jamie Webb): a 'camelCase' function which transforms underscored names to CamelCase names
From my company software: some pricer extension code for Swaps where fees values have to be subtracted from the NPV (Net Present Value) under certain conditions

I'll develop the examples first and try to draw conclusions latter on.

Example 1: camelCase

So what are the properties we can establish for that first example? Can we describe it informally first?

The camelCase function should CamelCase a name which is under_scored, removing each underscore and capitalizing the next letter.

Try this at home, this may not be so easy! Here is my proposal:



def previousCharIsUnderscore(name: String, i: Int) = i > 1 && name.charAt(i - 1) == '_'
def underscoresNumber(name: String, i: Int) = {
  if (i == 0) 0
  else name.substring(0, i).toList.count(_ == '_')
}
def indexInCamelCased(name: String, i: Int) = i - underscoresNumber(name, i)
def charInCamelCased(n: String, i: Int) = camelCase(n).charAt(indexInCamelCased(n, i))

val doesntContainUnderscores = property((name: String) => !camelCase(name).contains('_'))

val isCamelCased = property ((name: String) => {
  name.forall(_ == '_') && camelCase(name).isEmpty ||
  name.toList.zipWithIndex.forall { case (c, i) =>
    c == '_' ||
    indexInCamelCased(name, i) == 0 && charInCamelCased(name, i) == c.toUpperCase ||
    !previousCharIsUnderscore(name, i) && charInCamelCased(name, i) == c ||
    previousCharIsUnderscore(name, i) && charInCamelCased(name, i) == c.toUpperCase
  }
})
doesntContainUnderscores && isCamelCased must pass

This property says that:

the CamelCased name must not contain underscores anymore
if the name contains only underscores, then the CamelCased name must be empty
for each letter in the original name, either:
- it is an underscore
- it is the first letter after some underscores, then it becomes the first letter of the CamelCased word and should be uppercased
- the previous character isn't an underscore, so it should be unchanged
- the previous character is an underscore, so the letter should be uppercased

Before running Scalacheck, we also need to create a string generator with some underscores:



implicit def underscoredString: Arbitrary[String] = new Arbitrary[String] {
def arbitrary = for { length <- choose(0, 5)                                                  string <-   vectorOf(length, frequency((4, alphaNumChar),                                                                (1, elements('_'))))                       } yield List.toString(string)  }

This works and in the process of working on the properties I observed that:

the full specification for CamelCasing name is not so easy!

it is not trivial to relate the resulting name to its original. I had to play with indices and the number of underscores to be able to relate characters before and after. However, if that code is in place, the testing code is almost only 1 line per property to check

the properties above specify unambiguously the function. I could also have specify weaker properties with less code, by not avoiding to specify that some letters should be unchanged or that the CamelCased name contains an uppercased letter without checking its position.

Example 2: Pricer extension

The logic for this extension is:

to have the NPV (NetPresentValue) being calculated by the Parent pricer
to collect all fees labeled "UNDERLYING PREMIUM" for that trade
to subtract the fee value from the NPV if the valuation date for the trade is >= the fee settlement date
to apply step 3 only if a pricing parameter named "INCLUDE_FEES "is set to true, while another pricing parameter "NPV_INCLUDE_CASH" is set to false

This looks certainly like a lot of jargon for most of you but I guess that it is pretty close to a lot of "Business" requirements. What would a property for those requirements look like (in pseudo Scala code)?



(originalNPV, fees, valuation date, pricing parameters) =>

if (!INCLUDE_FEES ||  NPV_INCLUDE_CASH)
  newNPV == originalNPV
else
  newNPV == originalNPV - fees.reduceLeft(0) { (fee, result) => result +
              if (fee.isUnderlyingPremium && fee.settlementDate <= valuationDate)      
                fee.getValue    
              else      
                0
            }

The most remarkable thing about this property is that it looks very close to the actual implementation. On the other hand, Scalacheck will be able to generate a lot of test cases:

an empty fee list
a list with no underlying premium fee
a list with a fee which settle date is superior to the valuation date
a list with a fees which settle date is inferior to the valuation date
the 4 possible combinations for the values of the pricing parameters

You can also notice that I use as the first parameter the originalNPV, which doesn't directly come from a generator but which would be the result of the original pricer with the other generated parameters (fees, valuation date, pricing parameters).

Conclusion

As a conclusion, and at the light of the 2 previous examples, I would like to enumerate the result of my recent experiments with Properties-Driven-Development:

First of all is that PDD is TDD, on steroids. In PDD, we also have data and assertions but data are generated and assertions are more general.

I don't believe that this replaces traditional TDD in all situations. There are situations where generating even 4 or 5 cases manually is easier and faster. Especially when we consider that making an exact oracle (the savant word for verification of test expectation) is sometimes tedious as in the camelCase function. In that situation developping the cases manually using the == method would have been much faster

PDD on the other hand allow the specify very clearly what is the rule. This is something that you would have to infer reading several examples when using TDD

On the other hand having several examples also facilitate the understanding of what's going on. "foo_bar" becomes "FooBar" is more easy to catch than "if a letter is preceded by,..."

PDD is very good at generating data you wouldn't think of: empty list, negative numbers, a string with underscores only,...

A tip: sometimes, it is useful to include in the generated parameters the result returned by the function you want to test. For example, in the second example, my parameters could be: (originalNPV, newNPV, fees, valuation date, pricing parameters). That way, when Scalacheck reports an error, it also reports the actual value you got when showing a counter-example

Sometimes the properties you want to check will almost mimic the implementation (as in example 2). I think that this is may be very often the case with business code if written properly or that this may show that your code is missing a key abstraction

It really gets some time to get your head wrap around finding properties. And soon you'll start thinking things like: "I know that property A, B and C characterize my function, but are they sufficient?" and you realize that you coming close to Programming == Theorem Proving

As a summary, PDD is not the long awaited Silver Bullet (sorry ;-) ,...) but it is indeed a wonderful tool to have in your toolbox. It will help you test much more thoroughly your programs while seeing them yet another way.

15 January 2008

Better unit tests with ScalaCheck (and specs)

Writing unit tests can seem tedious sometimes.

Some people tell you: "Hey, don't write unit tests only! Do Test-Driven Development". You write the test first, then the code for it. This way:

you end-up writing only the code that's necessary to deliver some concrete value for your customer/user
you drive the design of your system
you add frequent refactorings to the mixture to ensure your code stays clean

Or even better, "Do Behaviour-Driven Development!". With BDD, you get nice executable specifications for your system, which can almost read like English.

While I fully adhere to the above principles, I also think that there is a continuum between specifications and tests. And at the end of this continuum, it's all about testing that your software works. Even given silly inputs. Your unit tests should provide that kind of coverage.

And it's not that easy. A single line of code can go wrong in so many different ways. Try copying a file. Here comes ScalaCheck to the rescue!

Introducing ScalaCheck

Using ScalaCheck, you define:

properties which should always be true
random data to exercise the property

and ScalaCheck generates the test cases for you. Isn't it great?

Let's take a concrete example to illustrate this, because I feel I almost lost my only reader here (thanks bro, you're a real brother).

If you want, you Can

Last week I started specifying and testing the famous Can class from the lift framework. The Can class is the Option class from Scala library, on steroids. [To Scala newcomers: there are many good posts on Option, Maybe (in Haskell), Either and all this monad folkore but I will send you to a concrete example here].

Basically, a Can is either Empty (it contains nothing) or Full (it contains a value). This is a fairly common situation in software or elsewhere: the user with name "Smith" exists in the database (Full) or not (Empty), I got the power (Full) or I haven't (Empty).

When a Can is empty, it can be enhanced with an error message explaining why it is empty. In that case, it will be a Failure object.

Now, if you want to test an "equals" method working for all different cases you have to specify a lot of test cases:

2 Full objects which are equal
2 Full objects which are not equal
2 Empty objects which are equal
2 Empty objects which not equal
2 Failure objects which are equal
2 Failure objects which not equal
A Full object and an Empty object (not equal)
A Full object and an Failure object (not equal)
A Failure object and an Empty object (not equal)

When I said it could be tedious,... And I'm even simplifying the situation since Failures can be chained, optionally contain an Exception, etc,...

Properties

Here is the solution, implemented using specs and ScalaCheck, with the support of Rickard Nillson, author of the ScalaCheck project:


object CanUnit extends Specification with CanGen {
  "A Can equals method" should {
    "return true when comparing two identical Can messages" in {
      val equality = (c1: Can[Int], c2: Can[Int]) => (c1, c2) match {
        case (Empty, Empty) => c1 == c2
        case (Full(x), Full(y)) => (c1 == c2) == (x == y)
        case (Failure(m1, e1, l1), 
              Failure(m2, e2, l2)) => (c1 == c2) == ((m1, e1, l1) == (m2, e2, l2))
        case _ => c1 != c2
      }
      property(equality) must pass
    }
  }
}

How does it read?

"equality" is a function taking 2 Cans. Then, depending on the Can type, it says that the result from calling the equals method on the Can class should be equivalent to calling equals on the content of the Can if it is a Full Can for instance.

Create a "property" with this function and declare that the property must pass. That's all.

Well, you may want to have a look at what's generated. Add the display parameter:


import org.specs.matcher.ScalacheckParameters._
...
property(equality) must pass(display)

Then you should see in the console:


....
Tested: List(Arg(,Failure(cn,Full(net.liftweb.util.CanGen$$anon$0$UserException),List()),0),... Tested: ...
Tested: ...
....
+ OK, passed 100 tests.

And if one test fails:


A Can equals method should
  x return true when comparing two identical Can messages
    A counter-example is 'Full(0)' (after 1 try) (CanUnit.scala line 21)

But you may have, at this point, the following nagging question: "Where does all this test Data come from?". Let's have a look below.

Generating data

Data generators are defined "implicitly". You define a function which is able to generate random data and you mark it as "implicit". When ScalaCheck tries to generate a given of object, it's looking for any implicit definition providing this. Like:

implicit def genCan[T](dummy: Arb[Can[T]])
                      (implicit a: Arb[T] => Arbitrary[T]) = new Arbitrary[Can[T]] {
 def getArbitrary = frequency(
   (3, value(Empty)),
   (3, arbitrary[T].map(Full[T])),
   (1, genFailureCan)
 )
}

This code says that generating a Can, optionally full of an element of type T, which has its own implicit Arbitrary generator, is like choosing between:

an Empty object, 3 times out of 7
an arbitrary object of type T, put in a Full object, 3 times out of 7
a Failure object (which has its own way of being generated via another function), 1 time out of 7

[The "dummy" parameter is here to help Scala type inferencer, AFAIK. The world is not perfect, I know]

Here is the Failure generator, which make heavy use of ScalaCheck predefined generation functions:

def genFailureCan: Gen[Failure] = for {
  msgLen <- choose(0, 4)        
  msg <- vectorOf(msgLen, alphaChar)        
  exception <- arbitrary[Can[Throwable]]        
  chainLen <- choose(1, 5)        
  chain <- frequency((1, vectorOf(chainLen, genFailureCan)), (3, value(Nil)))}          yield Failure(msg.mkString, exception, chain.toList)

In the above method,

choose returns a random int number inside a range
vectorOf returns a collection of arbitrary object, with a specified length
alphaChar returns an arbitrary alphanumeric character
arbitrary[Can[Throwable]] returns an arbitrary Can, making all this highly recursive!

Random thoughts

I hope this sparked some interest in trying to use ScalaCheck and specs to define real thorough unit tests on your system.

The added value is similar to BDD, you will see "properties" emerge and this will have a better chance at producing rock-solid software.

From now on, you too can be a ScalaCheck man! (see lesson 4)

Pages