22 April 2009 

A FIT-like library for Scala

Ever since I started the specs project, I saw the possibility to provide a better FIT-like library for Scala developers.

Having been a Fitnesse user in the past, I was dissatisfied with the following:
  • having interpreted tables with no possibility for automated refactoring
  • being limited to tables and not being able to add simple expectations close to the specification text
  • being limited to tables while my domain data could be too complex to fit in a tabular format, hence supplementary work when trying to map business concepts to FIT tables
A tool for developers

I first started with the following assumption: most "business users", "subject matter experts" (SME) can't write and maintain FIT specifications. This point of view is not shared by everyone, but my experience shows me that the most workable FIT process is:
  1. A SME gives examples of what is expected, on the blackboard, on a paper, on an excel spreadsheet, whatever
  2. A developer walks through the examples with the SME, asking questions, negociating features,... This is the most important part: communicating clearly the specifications through examples.
  3. A developer writes and implements the examples with the FIT library
  4. The developer comes back to the SME to clarify hidden assumptions and preconditions as writing things down precisely often shows under-specified parts
  5. The developer completes the executable specification and refactors it
  6. The SME validates the final result which should be as readable as possible for her. Possibly, she adds some variations on the first examples, iterating on the whole process
The main conclusion of the small process depicted above is that:
  • We can use developer tools because developers are the primary users for the FIT-like specification tools
Of course, this isn't the end of the story. SME are indeed secondary users of those tools and we should take care of them in the future.

But let's focus for now on our primary users. If we create a tool for developers:
  • An IDE can be used to edit, refactor and navigate the specifications (think Eclipse, IntelliJ)
  • A high-level language can be used to produce them (think Scala of course!)
  • The specifications can be stored in a Source configuration and control system (think Subversion, git,...)
Basic specifications

I first started with this vision: a specification should be some free text, explaining what the system is supposed to do. Among that text there are sentences or text fragments which represents examples of what the system should really be doing. I want to be able to "tag" those fragments and "link" them to a piece of code implementing the actions on the system and my expectations.

Free text mixed with executable code = ...? Scala XML literals of course!

Having XML literals in Scala allows to write any kind of text and where necessary, insert some code in curly braces {} to be executed. So my first idea was to allow the developer to:
  • write some specification text in a Scala file, with minimal decorum to create a Specification class:
class HelloWorldSpecification extends HtmlSpecification {

"The greeting application" is <textile>

h3
. Presentation

This new application should say "hello" in different languages.

</textile>
}
  • "tag" part of that text that I consider being an example which should be executed
For example,<ex>by default, saying hello by default should use English</ex>
  • add a piece of code implementing the example and setting expectations
For example,<ex>by default, saying hello by default should use English</ex>
{ greet must_== "hello" }

I also liked the idea of using wiki markup to introduce simple formatting features to the Specification. Hence the use of tags showing that the Specification text should be interpreted as Textile marked-up text (the other available markup language is Markdown).

The result of this specification can be seen here. The first example executes without any issue and is highlighted as green text:

For example,by default, saying hello by default should use English

You can note that this constrast a lot with other Specification libraries approaches, like Cucumber, where:
  • Text is interpreted and mapped to methods. If I write "Given I have entered 50 in the calculator", there should be a method like I_have_entered_x_in_the_calculator
  • The specification flow is imposed through the use of the "Given, when, then" format. My personal feeling is that this is detrimental to writing a good specification where more text, images, tables are needed to explain the expected behavior than just scenarii descriptions
Nesting tables all the way down,...

The first shots at my HtmlSpecifications were encouraging. Simple literate specifications can even be a good way to give examples of an API usage, as in the Mockito specification for specs.

However, I really was missing the table formats I used to work with in Fitnesse.

Then came the next "vision": tables are good, nested tables are better because they can look closer to the data structures we're juggling with all day long. So I should allow tables to be nested and composed of other tables. Hey, even having tabs would be nice, wouldn't it?!

The good news is that nothing else that pure Scala was needed for that.

Let me introduce: Properties, Fields and Forms

Properties

The basic building block of a Form is a Property (of type Prop in specs, Property being reserved for something like this).

A Prop is something which has:
  • a label
  • an expected value
  • an actual value
  • a constraint
The most common type of property is declared like this in a Form:

// "Name" is the property label, "Eric" is its actual value
val name
= prop("Name", "Eric")
and the expected value can be set with:
name("Bob")
Once executed the property will check for the equality of the actual and expected value, using a beEqualTo matcher. Note that it is possible to change the matcher used to check the values:
val name = prop("Name", "Eric").matchesWith(equalToIgnoringCase(_))
name
("eric").execute.isOk // true
Fields

Actually there is an even simpler building block for Forms than a Prop: a Field. A Field is simply a piece of input or output data with a label:

val firstName = field("First name", "Eric") // just a name and value
val tradeId = field("Trade id", trade.getId) // retrieved from the database
Forms

Armed with Props and Fields, building a Form is plain easy:
  1. declare the props and fields of the form
  2. declare how they should be layed out
class Person extends Form {
val firstName
= prop("First name", "Eric")
val lastName
= prop("Last name", "Torreborre")
tr
(firstName)
tr
(lastName)
}

In the Person form above, we decided that the properties should be displayed on 2 different rows but they might as well be displayed on the same row:
   tr(firstName, lastName)
And this can get arbitrarily more complex because you add also forms in a row:

   tr(firstName, lastName, address) // address is a Form representing the address

Tabs are also very easy to use when there is more data to organize:

class ClubMember extends Form {
new tabs() {
new tab("Contact details") {
tr
(field("First name", "Eric"))
tr
(field("Last name", "Torreborre"))
}
new tab("Sports") {
th2
("Sport", "Years of practice")
tr
(field("Squash", 10))
tr
(field("Tennis", 5))
tr
(field("Windsurf", 2))
}
}
}

Special forms

As usual, as few patterns emerge when you start using something more frequently. For example, it is pretty common to display data in straightforward tables with column headers. In that case, each cell should be a property with no label and the properties labels should be used to create the header row. The TableForm serves exactly that purpose:
class PayLine(vDate: String, p: Double, r: Double) extends LineForm {
val valueDate
= field("Value date", vDate)
val pay
= prop("NPV_PAY_NOTIONAL", pricePay(valueDate))(p)
val rec
= prop("NPV_REC_NOTIONAL", priceRec(valueDate))(r)
}
new TableForm {
tr
(PayLine("6/1/2007", -1732.34, 0.0))
tr
(PayLine("4/30/2008", -580332.88, 0.0))
}.report

In a TableForm, each row is a LineForm whose properties and fields will not have their labels displayed inside the table, but which will be used to build the table header. There is an even more cryptic (at least to the eyes of some,...) way of creating such a table using a DataTable, the DataTableForm:
new TradePrice {
"Value date" | "NPV_PAY_NOTIONAL" | "NPV_REC_NOTIONAL" |
"6/1/2007" ! -1732.34 ! 0.0 |
"4/30/2008" ! -580332.88 ! 0.0 | { (vDate: String, pay: Double, rec: Double) =>
tr
(vDate, prop(pricePay(vDate))(pay), prop(priceRec(vDate))(rec))
}
}.report
In that example, the properties are created on the fly, without any label at all, because the table header will be directly taken from the DataTable header.

BagForm

This is certainly the most advanced table featured at the moment. In many situations, each row of the table maps to an entity of the domain (let's say a "Customer").

The BagForm allows to specify that a Seq of actual entities should match the expected one declared on each row:
case class CustomerLine(name: String, age: Int) extends EntityLineForm[Customer] {
// the prop method accepts a function here, taking the proper attribute on the "Entity"
prop
("Name", (_:Customer).getName)(name)
prop
("Age", (_:Customer).getAge)(age)
}
class Customers(actualCustomers: Seq[Customer]) extends BagForm(actualCustomers)

// usage example
new Customers(customersFromTheDatabase) {
tr
(CustomerLine("Eric", 36))
tr
(CustomerLine("Bob", 27))
}
Each Customer EntityLineForm declares some properties whose actual values are "extracted" from an aItaliquectual entity.

When the table is constructed (see "usage example"), each line is tested against the actual entities passed as parameter (customersFromTheDatabase), then only the best matches are kept to create the final table, based on the number of successful properties on each line. And if some expected rows are not matched by any actual entity, they are added at the end of the table as failures. A picture is worth 1000 words, the result is here.

Conclusion

The paint on LiterateSpecification and Form is still fresh! I'm promoting it as "Alpha" for the next specs release (1.5.0), because I expect a few bugs to crop in here and there and lots of additional features to be necessary to support "industrial" specifications.

In any case, I expect that the vision I had and the support of the Scala language will allow everyone to create beautiful specifications with lots of examples to discuss with Subject Matter Experts.

Try it, send me all your feedback, I hope you'll like it, I sure do!

Labels: ,

11 March 2009 

How we decide

For once, let's think about something else than programming. Or will we ;-)?

The bookshop

Two weeks ago, going back home after some consulting, I arrived early at the airport with a few dollars in my pocket, and I decided to stop by the book shop to have a look at some refreshing mental food.

Haha, I wish that was so easy! I had a first round, looking at almost all the books, then I narrowed down my choice to a few books, maybe 2 or 3 and started thinking really hard.

"Do I prefer a short and small book?"
"Should I take the most interesting?"
"Should it rather be entertaining? I should relax"
"Do I have enough cash? If not, is that worth withdrawing more money?"
"Do I rather to read a more political book?"
"Do I need a book at all?"

All this thinking (I'm not kidding, I spent at least 20 minutes) convinced me that this was the book I should buy: How we decide.

Feelings can be a good decision-maker

This book was for me a very good follow-up on Damasio's book Descartes' error where we learn, through different stories, the role of emotions in the decision making process. Like the story of that guy having a brain injury and not feeling any more emotions. 

One morning, coming to Damasio's lab he explained everyone how he's been able to stay very calm in his car, avoiding a deadly car accident due to the iced road that morning. After the session, as he was about to leave, the doctor asked him to select the best date for the next appointment.  That emotionless guy stood there,  weighting for 30 minutes all the possibles options that were open on his schedule. The team just wanted to strangle him for not being able to make up his mind!

This kind of story is one of many showing the power of emotions or, you could say, the power of the unconscious decision circuits. Most of other Jonas Lehrer examples are taken from games, whether decisions must be taken fast, like baseball or american football, or whether decisions are very complex, like chess or backgammon. 

In all those cases, Jonas Lehrer tells us that we have "dopamin learning circuits" which are directly connected to the reward system. Some neurons register when "it's good", when "it works", while others register the surrounding context and the consequences of our actions. The net result is that we progressively get a "feeling", an "intuition" of what works and what doesn't.

For example, Garry Kasparov has a global feeling of what is a good strategic position on the chessboard. Joe Montana had an instinctive feeling of when was the best moment to make a pass on the field. We must also note that both of them have spent a lot of time thinking hard about how they failed, how they made mistakes.

Is there anything we can use out of this for programming? 

We should fail early, fail often and try to learn as much as possible from our failures:
  • by reducing the develop-test cycle so that we have very frequent feedbacks of when changes are breaking stuff
  • by using retrospectives/post mortems to analyse what is right when working as a group of people on a project
  • by scrutinizing the code when it hurts. We have to change 100 lines of code all over the place for a seemingly innocent evolution. That hurts, doesn't it? Let's use this as a reinforcement for extreme refactoring. Next time we have to decide if we should refactor or not, the decision should feel obvious
This is also a good justification for code elegance. Code is elegant when it's easy to read, modify, reuse and getting that feeling helps making all the micro-decisions when writing code:
  • naming
  • formatting
  • refactoring
  • abstracting
  • checking, validating
Dont always trust your feelings, Luke

Our "dopamin circuits" are trained to recognize success or failure patterns and give us a good feeling of what the next decision should be. Unless we totally fool ourselves. 

For example, we're easily fooled by randomness, thinking we understand how probabilities work. In 1913, in a casino at the Roulette wheel, the Black color came out 26 times in a row! Can you imagine the number of people who put all their money on the White after the tenth or fifteenth time,... Even if you're reminded that each draw is independent from the previous one, wouldn't you bet all your money too?

That's not the only situation where we should make better use of our reason. Behavioral science has now plenty of evidence showing how limited our rationality is:
  • we sometimes just don't learn from failing patterns. In some crisis situations, when emotions are high, we keep doing the wrong thing despite any adverse evidence. It's just impossible to think out of the box.
  • we're generally very bad at understanding probabilities.  For example, if there are 10 lottery tickets at 10$ and the reward is 200$, there's no reason not to play. But if we learn that we buy one ticket while another guy/gal buys the remaining 9 tickets we usually find the situation unfair and we don't play!
  • we're subject to "framing issues" and "loss aversion". Will you prefer your doctor to say that you have 20% chances to die or 80% chances to live?
The solution here is to use our frontal precortex and really try to take rational decisions knowing our natural biaises and weaknesses.

Is there anything we can use out of this for programming? 

Programming is a highly rational activity, but not always. Let's not forget to use our frontal precortex when:
  • we have "impossible" bugs. Especially under pressure, we're can sometimes be completely stucked on a bug just thinking: "this is impossible!!!! I'm doing A, I should have B!!!!". In this situation the best is to stop and brainstorm. Try to come out with as many ideas as possible. Then carefully select and experiment each of them. There's no magic in programming, things happen for a reason, so let's try to find it fast!
  • we're tuning for performance. Performance tuning should be as scientific as possible. If I have the feeling that X could be slowing down my system, I shouldn't tweak anything unless I have real evidence that X is the culprit
Too much information

Rationality= good. Too much rationality = bad. 

Yes, this not obvious but for example, too much information is not the necessary ingredient for a good decision: 
  • some clients are tasting and comparing strawberry jams. They do a first ranking of their preferences. Now we ask them to think about why they like them and the rankings are completely changed! I have the feeling that this is related to Condorcet's paradox.
  • numerous studies suggest that "experts" may not be good predictors of what's really going to happen in their field (well, also because they can be biaised by their own opinions)
  • before MRI, doctors were sending patients with back pains to rest for a few days. Now they can see inside the body they suggest surgery and medecine. Even to well-being patients!
Is there anything we can use out of this for programming? 

Focus!
  • when tuning performance, it is very easy to be completely overwhelmed by data. I would suggest taking one "red flag" only (like a display issue) and making a thourough analysis of that only issue until it is completely explained, ignoring what's not relevant
  • lean programming suggests that accumulating huge backlogs of features and issues may not be the best and most way to make decisions. One reason is that we're very sensitive to irrelevant alternatives.
Conclusion: Emotions vs Reason? 

I really learned something with that book. When too many rational criteria come up to my mind on which book I should buy, I'm just losing my time and most probably I am going for a bad choice. I'd rather let my emotions drive me and learn instinctively what a good decision is for me (talk about an obsessional guy,...).

In conclusion, the debate is not such much "emotions or reason" but more why and how we use both of them to make our decisions.

    Labels: ,

    13 February 2009 

    An exercise with arrows in Scala

    I'm still a bit fascinated by the so-called "computing models" that you get when programming with monads or arrows.

    That fascination comes from that fact that the things I'm using everyday can be abstracted to useful entities. Of course, it is pretty well-known to programmers that functions can be described formally with parameters, types, return values and so on. Given the definition of functions you can even compose them and create a useful function out of 2 other ones.

    But I was pretty amazed, when reading the paper from John Hughes on Arrows, to see that there are lots of properties and combinators that you can define for your basic computing blocks. And that these combinators, defined at an abstract level, can then be used to structure the concrete and dirty work of your everyday functions.

    The Challenge

    3 few weeks ago, I was intrigued by the small challenge brought up by Debasish Gosh, about translating one Haskell snippet to Scala. The function clusterBy, in Haskell, is defined by:

    clusterBy :: Ord b => (a -> b) -> [a] -> [[a]]
    clusterBy f = M.elems . M.map reverse . M.fromListWith (++) . map (f &&& return)

    and basically, it takes a list of elements and returns a list of lists, where the elements are grouped according to their result via a function. If 2 elements have the same result with f, then they end up in the same list. For example:

    clusterBy length ["ant", "part", "all"] => [["all", "ant"], ["part"]]

    That example looked interesting to me for 2 reasons:
    • that's not the kind of thing that you program in one line with Java
    • it uses Arrows and I remembered that some code for Arrows in Scala was available
    So I set out to rewrite the clusterBy function in Scala, using Arrows.

    The solution

    Here's my solution (partial here, read more at the end):

    def clusterBy[A, B](l: List[A])(f: A => B) = {
    (listArr(f &&& identity) >>> arr(groupByFirst) >>> arr(second) >>> arr(reverse))(l)
    }

    And the clusterBy function in action as a specs expectation:

    clusterBy(List("ant", "part", "all"))(_.length) must beEqualTo(List(List("all", "ant"), List("part")))

    Let's dissect the clusterBy function to understand what exactly Arrows are bringing to the table. 

    "Lifting" a function to an Arrow (what the h...?)

    First of all, for our understanding, we can remove some noise:

    arr(groupByFirst) is just taking a function, groupByFirst, and "lifting" it to make it an Arrow object which can play nicely with other arrows using the >>> operator.

    There are implicit definitions in the Arrows object transforming functions to Arrows but unfortunately type inference didn't seem to authorize me the removal of arr(...). If I could do it, I would get something like:

    def clusterBy[A, B](l: List[A])(f: A => B) = {
      (listArr(f &&& identity) >>> groupByFirst >>> second >>> reverse)(l)
    }

    >>>, the "sequence" operator

    How should we read that expression now? 

    The >>> operator reads very easily. It says:

    f1 >>> f2 => take the result of f1 and give it to f2.

    It is the equivalent of the composition operator, except that it reads the other way around, from left to right. With composition we would have "f2 . f1". The >>> operator (arguably) follows the natural reading flow (well, in English at least,...).

    &&&, the "branching" operator

    The next important operator is &&&:

    f1 &&& f2 => A function with takes an input and returns a pair with the results of both f1 and f2.

    So (f &&& identity) in our example simply takes an element and creates a pair containing the result of the application of f and the element itself:

    (f(x), x)

    But what we want to do is to apply this function (an Arrow more precisely) to the list of elements which is our input. That's exactly what listArr does! It creates a Arrow taking a list (read "enhanced function") from a function taking a single element.

    Reading the whole expression in detail

    With all that knowledge, let's read again the definition now with our example as an illustration. 

    The input data is:

    List("ant", "part", "all")

    "take the list of elements and return (f(x), x) for each element":
    listArr(f &&& identity)
    => List((3, "ant"), (4, "part"), (3, "all"))

    "then group by the first element", i.e. create a list of pairs where the first element is f(x) and the second element is a list of all y where f(x) == f(y))
    >>> groupByFirst
    => List((4, List("part"), (3, List("ant", "all")))

    "then take the second element"
    >>> second
    => List(List("part"), List("ant", "all"))
    Note that the "second" function defined here just takes the second element of a pair. This means that the >>> operator is smart enough to know that we're operating on lists of pairs and not on a single pair!

    "then reverse the list"
    >>> reverse
    => List(List("ant", "all"), List("part"))

    The hidden stuff under the carpet

    To be able to create this nice one-liner in Scala, I had to create special-purpose functions: groupByFirst, reverse, second.

    First of all, you can argue that the real meat of the clusterBy function is actually the groupByFirst function:

    def groupByFirst[A, B] = new Function1[List[(A, B)], List[(A, List[B])]] {
      def apply(l: List[(A, B)]) = {
        l.foldLeft(new HashMap[A, List[B]]: Map[A, List[B]]) { (res: Map[A, List[B]], cur: Pair[A, B]) =>
          val (key, value) = cur
          res.update(key, value :: res.getOrElse(key, Nil))}.toList}
    }


    And actually if you look at Debasish's blog, there are full Scala solutions that fill the same space as the groupByFirst function. On the other hand, I find that the Arrows notation shows pretty well the process flow between "elementary" functions.

    Then you can wonder why I couldn't basically "detach" the reverse operation from the List class, like this maybe?

    def reverseList[T](l: List[T]) = l.reverse
    def reverse[T] = reserveList _ 

    However, this just doesn't work because the type inference seems to be not constrained enough when it comes to sequence the functions with the >>>  operator. The only way I found to have this working was to define a Function1 object:

    def reverse[T] = new Function1[List[T], List[T]] { 
      def apply(l: List[T]) = l.reverse 
    }

    This last issue is maybe the biggest drawback when using Arrows with Scala. Now I can propose my own challenge to the Scala community. Can you find a better way ;-) ?...

    Labels: ,

    05 January 2009 

    Doing my homework

    The mapFilter function

    I recently tried to write a function using Scala which would apply a function to a list, then filter all the resulting elements according to a criteria. In my infinite wisdom I named it "mapFilter". Here is an example of what I expected:

    mapFilter(List(1, 2), (x:Int) => if (x%2 == 0) Some(x*2) else None)
    // returns List(4)


    Of course I wanted this function to be as efficient as possible so it had to do the job in one pass. Otherwise, I could just use a combination of "map" and "filter" to do the trick:

    def mapFilter[T, U](l: List[T], f: T =>Option[U]): List[U] = {
      l.map(f(_)).filter(_.isDefined).map(_.get)
    }

    3 passes on the list, we can definitely do better than this!

    Using flatMap

    Somehow I remembered that the Option class is a Monad, and my subconscious related the bind operation of monads to the flatMap operation on Iterable. I should be able to do something with it,... 

    Then I experimented this:

    List(1, 2) flatMap { (x:Int) => if (x%2 == 0) Some(x*2) else None }

    Bingo, it works!

    Now, if you read the signature of the flatMap method (on the Iterable trait), understanding how this works is not so clear:

    def flatMap[B](f: A => Iterable[B]): Iterable[B]

    Option is not an Iterable, this shouldn't compile! Yes, it does, because any Option can be implicitly converted to an Iterable thanks an implicit conversion:

    object Option {
      /** An implicit conversion that converts an option to an iterable value */
      implicit def option2Iterable[A](xo: Option[A]): Iterable[A] = xo.toList
    }

    Sometimes I wish there was a way to display all the available implicit definitions for a given scope,...

    Monads again,...

    What about the "Monadic" stuff then? Tony Morris, on the Scala mailing-list, left me a message hinting at a purer solution:
    Hi Eric,
    You can write mapFilter using Scalaz which has a MonadEmptyPlus data
    structure. I suggest you do not use the available subversion of the
    List.flatMap type signature from which to learn
    So I thought that I should do my homework, have a look at it and understand why MonadEmptyPlus was a good way to write the mapFilter function.

    You can find all the code from the scalaz project here.  The package I'm exploring today is the "control" package. 

    A bit of Scalaz exploration

    Let's jump right to the MonadEmptyPlus:

    trait MonadEmptyPlus[M[_]] extends MonadPlus[M] with MonadEmpty[M]

    Ohhh. We are high on abstraction here, because MonadPlus is:

    trait MonadPlus[M[_]] extends Monad[M] with Plus[M]

    And MonadEmpty is, as you can guess:

    trait MonadEmpty[M[_]] extends Monad[M] with Empty[M]

    Last but not least, Monad is:

    trait Monad[M[_]] extends Pure[M] with Bind[M] with Functor[M] with Apply[M]

    Honestly, I didn't suspect that my homework would take me so long! Yet, since every concept and notion of what is a Monad seems to be carefully crafted in Scalaz, this is a good idea to spend a bit of time trying to understand each of them.

    Pure

    The Pure trait is defined by:

    /**
     * Project the given value into the environment that is abstracted over. 
     */
    def pure[A](a: A): P[A]

    It actually says that you can take any value of some type A, and using the "pure" function, put it in a "Container" of type P, called the "environment" here. I guess that the idea is to say that once the value is well protected and controlled in its container, it is "pure". That being said, this is still very abstract. Fortunately we have some implicit values in the the Pure object giving us some examples of what it means to be "pure":

    implicit val OptionPure = new Pure[Option] {
      def pure[A](a: A) = Some(a)
    }
    implicit val ListPure = new Pure[List] {
      def pure[A](a: A) = List(a)
    }

    A Pure[Option] of the value a is simply the value "a" inside the "Some" container. So if you think of Monads as "Containers" for computation then "pure" is what creates the container with a value inside.

    Bind

    Bind is the evil twin of Pure and, as far as I'm concerned, the essence of monads because it is the basis of computing with Monads:

    /**
     * Binds a function through an environment (known also as sequencing).
     */
    trait Bind[BD[_]] {
      /**
       * Binds the given value with the given value through the environment.
       */
      def bind[A, B](f: A => BD[B], ba: BD[A]): BD[B]
    }

    Every word is important here. You have an "environment" again. This "environment" or "Container" contains values which have been put there using the pure function for example. Now, what you want to do is to apply a function to the value inside the container, without the value leaving it. So you "bind" a function to the environment. Again, concrete examples, given by the implicit values in the Bind object will help us understand what's going on:

    implicit val OptionBind = new Bind[Option] {
      def bind[A, B](f: A => Option[B], a: Option[A]) = a flatMap f
    }
    implicit val ListBind = new Bind[List] {
      def bind[A, B](f: A => List[B], a: List[A]) = a flatMap f
    }

    When I bind a function to an Option, like Some(4), it is as if I'm doing:

    val f = x => Some(x + 2)
    bind(f, Some(4))

    1. apply the function to the inner element

    Some(f(4)) === Some(Some(6))

    2. "flatten" the inner result by removing the inner "box"

    Some(6)

    This brings 2 remarks here:

    1. "bind" for a monad is indeed the "flatMap" function of an Iterable in Scala. It maps the value then flatten the results. The big difference is in the function signature. While flatMap is a method on Iterable and accepts a function returning another Iterable, the bind function accepts a "Container" type of any sort and returns the same container type; Binding to an Option will return an Option, binding to a List will return a List ("I suggest you do not use the available subversion of the List.flatMap type signature from which to learn"...)

    Functor

    2. why is it necessary to have a "bind" operation? If I want to get Some(6) as a result, something like "map" should be enough. True. And this even has a name, this is the "fmap" operation of the Functor trait (also mixed in by Monad):

    trait Functor[F[_]] {
     /**
      * Maps the given function across the given environment.
      */
      def fmap[A, B](f: A => B, fa: F[A]): F[B]
    }

    The "fmap" operation just means transforming the value(s) inside the Container to something else. But the bind operation offers an additional advantage. It allows to compose functions returning values in the Container. I can't compose directly 2 functions returning an Option:

    val f = (x:Int) => if (x % 2 == 0) Some(x) else None
    val g = (x:Int) => if (x % 3 == 0) Some(x) else None

    // this doesn't work since g returns an Option and f expects an Int
    val composed = f . g // ??

    But I can compose them using "bind":

    // this works as expected
    val composed: (Int => Option[Int]) = (x:Int) => g(x).bind(f)

    Apply

    The definition of Apply is this one:

    trait Apply[AP[_]] {
      def apply[A, B](f: AP[A => B], fa: AP[A]): AP[B]
    }

    And a good example is provided for Lists:

    implicit val ListApply = new Apply[List] {
      def apply[A, B](f: List[A => B], a: List[A]) = f flatMap (f => a map(f(_)))
    }

    We take a list of functions, a list of values, and we return a list with the results of all the functions applied to all the values. Actually this is the essence of the "Applicative" model of programming, so I don't really understand why it's been added to the Monad trait. I guess it is because monad computations imply applicative computations as described here.

    Empty and Plus

    Now that the tour of Monad is over, we can look at the Empty and Plus traits.

    Empty is very easy, it just returns an empty "Container" (named "Environment" here):

    trait Empty[E[_]] {
      /**
       * Returns the empty environment.
       */
      def empty[A]: E[A]
    }

    implicit val OptionEmpty = new Empty[Option] {
      def empty[A] = None
    }

    implicit val ListEmpty = new Empty[List] {
      def empty[A] = Nil
    }

    Plus defines what it means to "append" or "merge" two environments together. Its result is very specific to the underlying Container type as you can see with the implicit values:

    trait Plus[P[_]] {
      /**
       * Appends the two given values.
       */
      def plus[A](a1: => P[A], a2: => P[A]): P[A]
    }

    implicit val OptionPlus = new Plus[Option] {
      def plus[A](a1: => Option[A], a2: => Option[A]) = a1 orElse a2
    }

    implicit val ListPlus = new Plus[List] {
      def plus[A](a1: => List[A], a2: => List[A]) = a1 ::: a2
    }

    Empty and Plus are indeed defining some kind of addition operation on Monads, aren't they?

    The real academic mapFilter function ;-)

    With all that at hand I should now be able to create my mapFilter function ;-)

    import scalaz.control._
    def mapFilter[T, A[_], U](f: T => A[U], m: A[T])(implicit m1: MonadEmptyPlus[A])= {
      m1.bind(f, m)
    }

    Is that all? Yes, let's try it:

    // Note that Scala type inferencer doesn't infer the types here
    mapFilter[Int, List, Int]((x:Int) => if (x%2 == 0) List(x*2) else Nil,
                              List(1, 2))

    However we may wonder: is that really necessary to have a MonadEmptyPlus? No, a simple Monad is also ok:

    def mapFilter[T, A[_], U](f: T => A[U], m: A[T])(implicit m1: Monad[A]) = { 
      m1.bind(f, m) 
    }

    And that's understable with the List example because we're using "flatMap" under the covers! But this is not totally satisfying. Somehow, I would like to enforce that if f returns the empty element for a given MonadEmptyPlus, then that element is filtered from the result. 

    Even better with FoldRight

    We can actually define this using another abstraction from the control package: FoldRight.

    trait FoldRight[F[_]] {
      /**
       * Fold the given function across the given environment.
       */
      def foldRight[A, B](t: F[A], b: B, f: (A, => B) => B): B
    }

    FoldRight works exactly as the foldRight method on Iterable in Scala and allows us to use the Empty and Plus definition from MonadEmptyPlus, accumulating the mapped values, unless they're empty:

    def mapFilter[T, A[_], U](m: A[T], f: T => A[U])
                             (implicit m1: MonadEmptyPlus[A], f1: FoldRight[A]) = {
      f1.foldRight[T, A[U]](m, m1.empty, (t, result) => m1.plus(f(t), result))
    }

    And that's the end of the assignement for today!

    Conclusion

    This exploration of Scalaz was very interesting for me because:
    1. The abstractions are nicely separated
    2. Implicit values provide lots of concrete examples
    3. The programming style is very close to the use of type classes in Haskell (which wraps my mind in a new way). The drawback is less type inference by Scala
    4. I had to think again about that "bind" function and why it was so special
    5. It gives me the desire to understand more deeply what are the differences between the so-called "models of programming": functor, monads, applicative, arrows,...
    6. I haven't been blogging for a looooonnnnng time.

    Labels: , ,

    15 October 2008 

    Commando programming

    When you program for a trader you don't have time to write tests
    I've been wondering for some time if the quote above, or a variant like: "you don't have time to refactor your code", was really true or not. I generally bug my colleagues when they don't write tests along with their code and I usually prophetize that they will actually spare time by writing tests instead of losing time.

    Unfortunately I don't have strong evidence if this is true or not. The only thing I can say is that almost every time I've tried to take shortcuts in my developments, I've been bitten by very silly bugs and regretted my recklessness!

    Anyway, what would you do if you had to code at the speed of light for a drug-addict trader (and they really need drugs those days,...)? What kind of practices would you adopt to go faster? How would you prepare yourself for those situations?

    With those questions in mind, I was happy to be recently involved in a one week project where the biggest part of my job was to do some "Commando programming" to create the necessary tools to support the project.

    In this post, I'll do my own retrospective of that project and try to highlight the points which I think are decisive in the context of "Commando programming". Actually most of the "Ding!" points (does that ring a bell to you?) below are applicable to any programming situation. The only difference is that there's no time for preparation in a "Commando" situation. You have to be seriously fit for the job.

    This post is too long, that's the Steve Yegge syndrom,... So here's a summary of the take-away points, so you can get a first impression if it's worth reading or not:
    Ding! Know your destination
    Ding! Be a fast code-reader
    Ding! Know at least one scripting language really well
    Ding! Know your platform ecosystem
    Ding! Don't rewrite anything
    Ding! Take careful risks and prototype
    Ding! Have at least one high-level functional test
    Ding! Test the difficult stuff
    Ding! Isolate the file system
    Ding! Use the console
    Ding! Have lots of sample data
    Ding! Time is money, concepts are time
    Ding! PPP: Practice, Practice, Practice!!!
    Ding! Cool down and document/review your code
    Ding! Cool down and take a break


    The project: go see your doctor, now!

    In my company we've setup a methodology and a set of tools to assess the performance of our customers deployments and one of them asked us to come over for one week to deploy this so-called "HealthCheck" process. After the first initial interviews, we determined that one very essential objective of the project was to reduce the time to save a trade from 1.2 seconds to less than 500 ms. Very clear and quantifiable objective, that's a good start!

    Ding! Know your destination

    First point here, this may look totally obvious but still it needs to be said: the overall user objective must be very clear for everyone.

    This point is important for at least 4 reasons:
    1. It's hard to say that you're going fast if you don't know where you're going!
    2. The fastest development in the world is the one you don't do because you don't need it
    3. It helps a lot, as seen later, to have a clear overall objective when negotiating with yourself or with your customer the next micro-task to develop
    4. Developing fast is not the alpha and omega of productivity. Seeing the project from its overall objective can make you realize that saving other people's time may actually more important than saving your own
    Reading the thermometer

    Assessing the performance of our systems involves parsing and analyzing lots of different log files: client requests, server requests, database requests, sql execution times, workflow times, garbage collection times,... The scripts we started working with were shell scripts doing the following:
    • Reading and managing log files, including archiving old versions
    • Parsing the files and extracting the time spent for a given "quantity" (client request, SQL call,...)
    • Computing small statistics: minimum / average / maximum time
    • Creating graphs in order to be able to spot "spikes" that we will be able to label as "Red flags"
    The scripts we had were doing the job, but were pretty slow considering the amount of data we had to process. More than one hour could be spent just running the scripts on a subset of the log files.

    Analyzing the scripts, I thought that I could implement something more effective,(...)

    Ding! Be a fast code-reader

    Commando programming may involve reading a lot of code that's not yours before being able to do anything else. This has been written before: we usually spend far more time reading code than writing it. Reading code faster will necessarily speed you up.

    (...) more effective so I started prototyping something using Ruby. Why Ruby? Because I knew I would be fast to implement my ideas with it. Or was I?

    Ding! Know at least one scripting language really well

    No, I wasn't so fast,... I haven't programmed in Ruby for quite some time and I've forgotten many of the idioms or methods of the core api. My head is so full of Scala nowadays, that I would certainly have been faster using Scala for that task. The morality here is that knowing a "scripting" language is very helpful but you really need to have everything in your mental RAM in order to be effective.

    This goes for api knowledge as well as "environmental" knowledge: you should spend no time figuring out how to develop/build/test/deploy your code. Being stuck for 2 hours because you don't know how to set a class path for example is a huge waste of value.

    Anyway, I was able to show the effectiveness of using a language like Ruby to speed up the analysis time so I decided to officially port the scripts from shell to another language. But now which language should I select for my commando developments?

    Choosing a language is a vast question and the criteria governing that choice may not be entirely related to the intrinsic language features. This choice may also be governed by more "contextual" considerations, like the ability of other developers to learn the language and associated tools.

    In my case, Groovy was the winner for 4 reasons:
    1. It has all sort of features to help write code faster like closures, literals,...
    2. It can access Java libraries and I was planning to integrate some Database analysis tools we had written in Java
    3. That's one of the closest language to Java on the JVM so other developers will be able to pick up the new scripts faster. And it is usually better known than JRuby, Scala or Jython for example
    Ding! Know your platform ecosystem

    I also knew, when choosing Groovy, that I would be able to use all sorts of niceties. One of them was the Ant integration which allowed me to easily reuse the exec task (see below). Knowing my way around Ant was a good thing (I know that's not exceptional for a Java developer, it's just an example ;-) ).

    Rewrite everything?

    That was the most "dangerous" part of this project.

    I would honestly have preferred avoiding it. Since I had to replace one small part of the scripts with some new Groovy logic and also, since the scripts would need to be extended anyway, I decided to rewrite them all in Groovy.

    Ding! Don't rewrite anything

    Haha, I just said I did it! Well, not that you can never rewrite code or some library part, but you have to be damn sure about what you're doing. So,...

    Ding! Take careful risks and prototype

    Doing "Commando programming" will undoubtedly require taking risks and trying out some solutions before going on. More than ever, it needs to be done with lots of care because the last thing you want, is to find yourself in the middle of the road at the end of the week, because you didn't have time to finish implementing your new "Grand Vision", whatever it is.

    In my case I mitigated the risks a lot by "wrapping" the existing shell commands in my Groovy scripts. So I was essentially running the same thing but executed from Groovy! That was really worth it because in the process, I could also reduce a lot the existing amount of code thanks to the usual Groovy tricks.

    Test or not to test

    Now we come to the meat of the discussion! TDD, BDD, unit tests, code-and-see, what should be done? I honestly don't have a definitive answer but here's what I did.

    Ding! Have at least one high-level functional test

    You need to have at least one high-level functional test guiding you in your developments. Something which you can come back to and say: "this case still doesn't pass, I'm not finished yet". You may actually have implemented more code than just what was necessary to pass that test case but at least this primary use case should be ok.

    Ding! Test the difficult stuff

    Complex logic is definitely something you want to isolate and test early. If you don't unit test it thoroughly, it will usually haunt you in the most painful way: hidden inside all the rest. Besides, this complex logic usually means "value" for your customer, more than anything else, so it's worth cherishing it.

    Ding! Isolate the file system

    Actually, you should soon realize that any interaction with the file system is slowing you down.
    Being able to mock the file system, or to write to temporary files with automatic clean up is a huge time-saver for unit testing. In a "Commando" situation you would be prepared with this and have all sorts of functions and ready to use design to help with this. In my case, I had to write all lot of it,...

    Ding! Use the console

    The Read-Eval-Print loop. In my developments I found very useful to test interactively my regular expressions inside the Groovy console. Whether or not I turned those expressions to actual unit tests,... I didn't do it all the time I must say. One reason is that, once I had confidence that this regular expression was encapsulated in a higher-level concept, like "extractClassName", it would be right forever.

    Ding! Have lots of sample data

    If possible try to gather as much sample data as you can and run it through your program. Does it blow-up right away? Well, that's actually good, it may be the sign that you forgot a requirement. Instead of spending time trying to refine your requirements until they're exhaustive, or try to think about all possible cases, run the system with the maximum data available. There are 2 drawbacks to this: one is the time spent re-executing large datasets and analyzing failures, the other is not knowing what to look at! Your program may not break with enough smoke to prove you wrong.

    On my project I had lots of log files I could run my scripts against. And fortunately my scripts broke in very unambiguous ways, showing where I was wrong.

    To design or not to design

    So you're programming like hell, you don't have time to draw all those nifty UML diagrams or devise the best design patterns for the job. "We'll design when we'll have time."

    But is it really buying you time? I don't think so. I'm not saying that you should write diagrams or add tons of interface and clever indirections.

    But you need design in the sense that the problem and solution domains should be very, very clear to you.

    Ding! Time is money, concepts are time

    So it's worth taking a bit of time and think: "What's really this system like?". Clarifying the concepts is very, very important. For us, it was first of all giving proper names to our analysis concepts:
    • An entity can have measurable properties ("A trade server has requests")
    • For each property, you can define several quantities ("each request can take some average time")
    • For a given "Test environment", we can run several "Test campaigns"
    This helped a lot in our discussions, in structuring the scripts and planning our actions.

    The next step was to realize that the type of queries we wanted to execute to analyze logging data looked pretty much like SQL queries: "between 15:00 and 15:03. return all update queries with a time > 300ms and group them by workflow rules". This gave us a good hint that the next step for our scripts would be to put everything in a database!

    Coding

    Consider this:

    "I know where I'm going, I have a clear mind about my system and it's concepts, I know how to write minimal and effective tests, is there anything more I could do now to speed up my coding?"

    Let's reformulate the sentence above:

    "I know I want a gold medal at the Olympics, I know I'll get it by being a table tennis champion and I know the rules of the game, I have a very good racket, is there anything more I could do now to be the best at table tennis?"

    Ding! PPP: Practice, Practice, Practice!!!

    That's so obvious I'm ashamed to have to write it,... but that makes a real difference for "Commando programming":
    • technique: practice your regular-expression-fu for example, or know how to write a parser for a simple language (I planned to do that for the verbose gc logs but we didn't have time nor interest to rewrite this part)
    • libraries: no time should be spent looking at the API docs
    • tools: if you need remote debugging, it should be fast to setup, because you did it thousands of times before
    • computer science: you should have at least a good sense of the complexity of your algorithms
    • integration: how to call Excel from Groovy to create graphics was one of our coding issues
    • system / network tools: that wasn't too necessary on our project, but I can imagine this making a real difference in other settings
    Cooling down for a minute

    Two things of equal importance I've also noticed during that week. First of all, when I tried to go fast I couldn't help but noticing entropy. Yes my code was doing the job, but at the same time it wasn't always consistent, documented, properly named or refactored.

    Ding! Cool down and document/review your code

    This really can look like a pure loss of time. But I really observed that it wasn't. For 2 reasons:
    • Actually it doesn't take that long to review/document/refactor the code (try it at home)! Except maybe for the naming part, it's a bit like playing sudoku, you just try to make things fall into place
    • I observed that I was much less dragged down implementing new feature on clear-cut code with obvious intentions, I had much less mental noise. You know, like: "This computeTotal function is actually also sorting the results, so I know I don't have to do it there". If computing and sorting are better separated or named, that can reduce the "mental tax" while reasoning about something else.
    The second thing is the equivalent of "cooling down the code",... applied to your body!

    Ding! Cool down and take a break

    We're not machines. I was so enthusiastic with that project that I almost coded day and night (and I was away from my family,...). But I also observed one very interesting phenomenon: time was slowing down! Sometimes I was just starring at my screen without realizing that 5 minutes had passed without doing nothing substantial. That should be a clear signal for taking a break. This gives me an interesting idea: "stopwatch" programming. Prepare the list of tasks you want to program and program features by 5, 7 or 10 minutes cycles. And watch out when you have almost empty cycles.

    The doctor says: fit for service

    Conclusion for the project: it was very successful (this doesn't happen all the time so I'm glad to report it). We were able to analyze and spot 3 major bottlenecks and give good recommendations so that our maximum saving time was eventually around 400 ms. The tools served us, but actually ours brains served us more.

    But what's the biggest room in the world? The room for improvement ;-) ! I think we'll be able to transfer more of that wisdom to smarter analysis scripts in the future.

    Back to the original question
    When you program for a trader you don't have time to write tests
    True? False? At the light of my experience of this project, the first thing I'm really convinced of is the first part of the sentence: "You don't have time"!

    And if you don't have time, you can essentially do 2 things:
    • Observe and experiment: whatever makes you spare time is good. In my case, writing tests is a proven way to go faster for example
    • Practice, learn, invest: new tools, new languages, new libraries, contests, coding dojos,...
    Ok. Thanks for listening, I leave you there, I have to go,... and practice my "Commando-fu"!

    Labels: ,

    27 July 2008 

    My first ICFP contest, Part 2: parser combinators are so cool

    This is the sequel to my last post on participating to the ICFP contest 2008.

    The Setup

    As Curt wrote it, we entered the contest doing things that should have been done well before:
    • setting up the wifi access so that I can connect with my laptop (it never worked so Bryan had to get out and buy new network cables)
    • exchanging keys to access the Subversion repository
    • learning how to use the build and test system
    I guess that we need to make some mistakes (at least) once oneself before getting it!

    However, the biggest lesson for me in this contest, was the use of Parser Combinators.

    Do you speak Martian?

    The second programming task in the contest, after setting up the network connection to be able to receive messages and send back commands, was to parse the input messages giving the position of the Martian rover, and its immediate environment.

    As the task specifies, there are 5 kinds of messages:
    • initialization messages -> what's the map like? what are you technical constants?
    • telemetry stream -> where are you, what's your speed, what's around?
    • adverse event -> crashed in a crater!
    • success -> yes, home!
    • end of run -> time and status
    The format of those messages is not particularly complex but the telemetry message can contain an unlimited number of obstacles descriptions for example.

    For many developers, the most obvious way to transforms meaningless strings to meaningful objects like Telemetry or Martian, would be to start traversing the string, check for characters like 'T' for Telemetry, or 'I' for Initialization then pass the rest of the string to another function and continue slicing the string until all data is analyzed.

    Parser combinators

    But Haskell developers know better,... They can just describe the grammar of the messages they're parsing by combining parsers, then applying the resulting parser to the string input.

    Something like that:

    telemetry :: Parser Message
    telemetry = do ts <- int
    state <- accelStateParser
    turning <- turningStateParser
    spaces
    x <- double
    y <- double
    dir <- double
    speed <- double
    obstacles <- obstaclesParser
    messageEnd
    return $ Telemetry ts state turning x y dir speed obstacles

    accelStateParser :: Parser AccelState
    accelStateParser = (accelerating <|> braking <|> rolling)

    accelerating = stateParser 'a' Accelerating
    braking = stateParser 'b' Braking
    rolling = stateParser '-' Rolling

    stateParser :: Char -> a -> Parser a
    stateParser c cons = do _ <- char c
    return cons

    turningStateParser :: Parser TurningState
    turningStateParser = hardleft <|> softleft
    <|> hardright <|> softright
    <|> straight

    hardleft = stateParser 'L' HardLeft
    softleft = stateParser 'l' SoftLeft
    hardright = stateParser 'R' HardRight
    softright = stateParser 'r' SoftRight
    straight = stateParser '-' Straight
    obstaclesParser :: Parser [Object]
    obstaclesParser = do l <- many obstacleParser
    return l

    obstacleParser :: Parser Object
    obstacleParser = do code <- oneOf "bchm"
    spaces
    case code of
    'b' -> object Boulder
    'c' -> object Crater
    'h' -> object Home
    'm' -> martian
    _ -> error "Can't get here in parser!"

    object :: (Circle -> a) -> Parser a
    object cons = do x <- double
    y <- double
    r <- double
    spaces
    return $ cons (Circle (x:.y) r)

    martian :: Parser Object
    martian = do x <- double
    y <- double
    dir <- double
    speed <- double
    spaces
    return $ Martian (Circle (x:.y) 0.4) dir speed

    In the above code, you can read that the parser for telemetry messages is a sequence of different parsers: a timestamp parser, a acceleration state parser, a turning state parser, ..., a parser for lists of obstacles. And a parser for list of obstacles is just "many" parsers for one obstacle. This is indeed as easy as being able to describe a grammar for that mini-language.

    "k" should better be 1

    But I didn't say that describing a grammar was easy! For instance, at first, we made a mistake in the way we managed spaces. We had almost each parser trying to consume spaces before doing its job. This generated unnecessary ambiguity in the parsing.

    For example, let's say that you want to be able to parse this kind of message:

    " O T O O T"

    (honestly, I don't know why you would like to do something like that but you're the boss here)

    You'd better combine those 3 parsers:
    • spaces -> will consume any number of spaces

    • oneParser = char 'O' >>= spaces -> will consume 'O' followed by any number of spaces

    • twoParser = char 'T' >>= spaces -> will consume 'T' followed by any number of spaces

    If you define parsers differently, having "oneParser" and "twoParser" consuming spaces before they parse their character, which parser should be used on the first space of the message?

    Actually, parser combinators in Haskell are also capable of backtracking and "unparsing" results if the chosen parser was not appropriate (trying things with the the "try" parser combinator). But why add more complexity when things can be described more directly?

    Of course, there is a formal theory behind the example above: LL(k) grammars, where LL(1) is a type of grammar only needing to look at the next character to know which parser to use. But there's nothing like a simple "real-life" example to be able to grasp it.

    Compare and contrast

    Now, honestly, most other contest solutions were not using Parser combinators and the aren't soooo ugly (in Dylan for example). I think they're just less readable and more error-prone because of index manipulation.

    For fun and comparison, I also developed a similar parser in Scala:

    def telemetryMessage = for { timestamp <- intNumber
    accState <- accelState
    turnState <- turningState
    vehicleX <- doubleNumber
    vehicleY <- doubleNumber
    vehicleDir <- doubleNumber
    vehicleSpeed <- doubleNumber
    obstacles <- objects }
    yield Telemetry(timestamp, accState(), turnState(),
    vehicleX, vehicleY, vehicleDir,
    vehicleSpeed, obstacles)

    def accelState = (
    "a" ^^^ Accelerating
    | "b" ^^^ Braking
    | "-" ^^^ Rolling
    )

    def turningState = (
    "l" ^^^ SoftLeft
    | "L" ^^^ HardLeft
    | "-" ^^^ Straight
    | "r" ^^^ SoftRight
    | "R" ^^^ HardRight
    )

    def objects = rep(objectParser)
    def objectParser: Parser[Object] = (
    "b" ~ spaces ~> circleParser ^^ (new Boulder(_))
    | "c" ~ spaces ~> circleParser ^^ (new Crater(_))
    | "h" ~ spaces ~> circleParser ^^ (new Crater(_))
    | "m" ~ spaces ~> martianParser
    )

    def circleParser = for { x <- doubleNumber
    y <- doubleNumber
    r <- doubleNumber }
    yield new Circle(x, y, r)

    def martianParser = for { circle <- circleParser
    dir <- doubleNumber
    speed <- doubleNumber }
    yield new Martian(circle, dir, speed)


    The interesting things in the Scala implementation, although it could seem a bit ugly at first sight, are:

    • the use of a sequence operator "~>" getting rid of what has already been parsed. Indeed, when we have parsed "b " we know we need to create a Boulder object. The 'b' character and the space are then not usefull anymore

    • the use of functions to transform the parsed string into a meaningful object, like "l" ^^^ SoftLeft, creating a new SoftLeft object (it's a case class) when parsing "l"

    • implicit definitions. In the example above I don't need to declare my parser for 'l' as letter("l"). Combining "l" with another parser automatically creates a parser for the character 'l'

    Why XML?

    One of the reasons for the ubiquitous adoption (and hate!) of XML is, I think, the general availability of parsers and binding libraries. This is why this often becomes the easiest option for configuration files (is it changing with YAML and JSON?).

    It doesn't have to be that way. I hope that parser combinators can now bring another legitimate option when considering the implementation of an external mini-language for configuration or protocol. They're simple to use and readily available on your Java platform through Scala. Try them at home!

    Labels: ,

    08 July 2008 

    My first ICFP contest, Part 1: ten days to learn Haskell

    I wanted to do that since I read Tom's Moertel post on his participation to the ICFP contest 2001. I Told my wife: "One day, I'll participate to the ICFP contest" (head slightly tilted up with a brilliant and determined look in the eyes).

    One day, I'll participate to the ICFP contest

    Fortunately, that day came quick as I met, last month, other geeks in Tokyo ready to get into the race. I should say über-geeks when I realize the gaps I have in my knowledge of some fields of computing compared to them. System and network programming being only the emerged part of the iceberg.

    I met them for the first time during the monthly meeting of the Tokyo Society for the Application of Currying (try to explain that to your grandma). That night we decided that we would form a team for the next ICFP contest. Then, we spent the next week discussing the team name, that was the difficult decision. The easy one was the language we would use. What is THE language of choice for functional hackers? Haskell of course! [flaming starts here]

    But how do you learn Haskell in 10 days and hope to be able to contribute at least a little?

    Learn Haskell in 10 days!

    Actually, I've reading about Haskell for a while now, starting with the very enjoyable A History of Haskell: Being LazyWith Class. From there and working with Scala for the past year, I had understood the following:
    • the "functional" approach vs the "sequential" approach
    • immutable data structures
    • functions as first-class objects
    • lazy evaluation
    • thinking in types
    • thinking in higher-ranked types
    • Options / Maybe
    • List operations: map, folds,...
    • pattern matching
    • type classes / traits
    This makes an impressive list of language features but I must say that Scala is a really "transitional language" for Java programmers towards Haskell. Riiiight. I "understand" it but how well can I program within 10 days?

    The laundry list

    Using a programming language always involves a wealth of down-to-earth considerations worth getting right as fast as possible. Here is my list and how I organized my environment:
    1. the compiler: I used GHC and the Hugs interpreter for Windows
    2. the editor: I used the Functional plugin for eclipse, providing editing, compilation, creation of an executable
    3. the librairies: I downloaded / compiled and installed QuickCheck2 and otherwise used the bundled libraries with GHC
    4. code coverage: hcp (not really indispensable for a beginning but a nice to have for a non-strict language. I was happy to eventually get it to work)
    This is clearly non-optimal yet (and I wonder if it can be on Windows?) and I'll certainly explore:
    1. a better editor: considering what can be done in Java, it's pretty amazing that there is no high-class editor for Haskell. I'll try Yi of course, but it's still in its infancy. "Rename function" or "Rename data type" should be pretty accessible (provided enough open-source or company support, that is,...)
    2. a build/test/continuous integration system. The guys from Starling Software have their own system for that matter (QAM) and I haven't seen yet such a standard thing for Haskell.
    3. more libraries, libraries management: I still haven't done it on windows but I plan to install Cabal and download the whole Hackage.
    4. Switching to linux forever. I'll certainly cover this in part 2 but Windows seems less and less a decent alternative for hackers (I didn't say developers ;-) ).
    Yes, all set-up! I can start my first tutorial!

    No, I'd rather do a code dojo first

    Why bother using a tutorial when I have a nice little programming exercise at hand? I did it in Java, I did it in Ruby, I did it in Scala, it was time to use Haskell.

    Basically the question is: write a program to transform Roman number back and forth to Integers: unromanize("MCMLXXII") == 1972.

    Here is my solution with Haskell:

    -- SOME VALUES DECLARATION
    -- definition of the roman letters with their numerical value
    romansList = [("I", 1), ("IV",4), ("V", 5), ("IX", 9), ("X", 10), ("XL", 40),
    ("L", 50), ("XC", 100), ("C", 100), ("CD", 400), ("D", 500),
    ("CM", 900), ("M", 1000)]
    -- numerical values only
    values = map snd romansList
    -- letters only
    letters = map fst romansList

    -- return the value corresponding to a letter
    value :: String -> Int
    value c = findWithDefault 1 c (fromList romansList)

    -- return the letter corresponding to a value
    letter :: Int -> String
    letter i = findWithDefault "I" i reversed
    where reversed = fromList(map switch romansList)
    switch = (\(x, y) -> (y, x))

    -- to get the value of a RomanString, start
    -- from the end and accumulate the value of each letter
    -- substract the value of the current letter if its value is
    -- inferior to the previous one
    -- ex: "XIV" -> 5 (for "V") - 1 (for "I") + 10 (for "X")
    unromanize :: String -> Int
    unromanize s = fst(foldr add (0, 0) s)

    add :: Char -> (Int, Int) -> (Int, Int)
    add c (total, lastElem) =
    if (lastElem <= val) then (total + val, val)
    else (total - val, val)
    where val = value [c]
    -- [c] is the Char 'c' as a String

    -- return the RomanString corresponding to an Int value
    -- start with an empty string and the value to convert
    -- use the romanizeString function to compute the RomanString

    romanize :: Int -> String
    romanize i = fst $ romanizeString ("", i)

    -- if the remainder is 0, there is no more conversion to do
    -- otherwise, try to substract the biggest possible RomanChar from the remainder
    -- this RomanChar is then appended to the current result
    romanizeString :: (String, Int) -> (String, Int)
    romanizeString result@(res, remain) =
    if (remain == 0) then result
    else romanizeString(res ++ (letter maxi), remain - maxi)
    where maxi = maxValue remain
    maxValue r = last (takeWhile (<= r) values)


    And here is a simple QuickCheck properties that I wrote to check the conversion:
    prop_unromanize_romanize :: Int -> Property
    prop_unromanize_romanize x = x >= 0 ==> unromanize(romanize x) == x

    First impressions

    Working on that simple example (but also on string generators to check the reverse property romanize(unromanize) -- much more tricky) led me to the following observations:
    • programming in Haskell really "feels" pure and elegant. There is an economy of symbols which is quite remarkable for such an expressivity
    • not having variables feels like programming in a straight jacket when coming from an imperative background! But the feeling disappears quickly
    • the "functional/data" orientation makes a whole different mental model when you come from an object-oriented perspective. It looks like your objects, encapsulating their data and showing only their business logic have been turned inside out and methods carefully separated from data.
    • browsing the libraries feels like there a huge drive towards abstraction in Haskell. Nowhere else I've seen concepts decomposed so that Monoids would emerge and be reused
    • new language, new librairies. It really takes time but Haskell libraries are full of jewels which are really worth knowing inside out (the same can certainly be said from many languages)
    • although the Prelude contains a lot of useful functions, some facilities you may find in scripting languages are missing. We recently discussed the lack of a "split" operation for Strings on the Haskell mailing-list recently. I know that different people have different semantics for a "split" operation but not including any kind in the standard library feels like it's missing something
    • the compiler is your friend! It goes at great length explaining you why and where you failed to properly type your program
    Go, go, go

    After those 2 or 3 sessions of installation and programming I was very far from being able to code on my own for the contest, yet I would be able to pair-up and provide a vigilant pair of eyes on someone else code. Stay tuned for part 2: the contest!

    Labels: ,

    About me

    • I'm Eric
    • I love software, from day-to-day coding, to architecture, consulting, functional analysis, marketing,...
    • And I especially like the challenge of doing that with great people around me, challenging each other to find the best organisation and techniques to boost the project
    • Some years ago I also got addicted with green bars,...
    • My email: etorreborre@yahoo.com
    Powered by Blogger
    and Blogger Templates