A++ [Eric Torreborre's Blog]: 06 November 2011

Another blog post to show that the esoteric type system tricks you read about Haskell or Scala actually have real uses.

Groovy vs Scala for log analysis

These days I'm implementing a non-critical application which deals with:

importing performance log files
parsing them and storing some structured data corresponding to each line
creating some graphs to show the maximum execution times, the cumulated maximum execution times, the events inside a given time range, and so on,...

This, in itself, is a very interesting exercise for me since I had coded a similar application more than 3 years ago in Groovy. After deciding that the Groovy implementation was slow and cumbersome, I decided to give it a go with Scala (and MongoDB for the backend :-)).

I've been really amazed to see that many of the things I learnt for free, on my spare time, were applicable in the case of that application, to yield much better code. This post shows one of these techniques: "Unboxed Tagged Types".

Unboxed what?

Two months ago, I saw this enigmatic tweet by @milessabin. I followed the link, read the code and thought: "oh nice, I see, at least Miles is having fun playing with Scala's type system". But I wasn't really able to see what that thing could be used for.

Then, this week, developing my log analysis application, I became midly annoyed about one specific messy point.

There is time and,... time

The log records I'm getting from the log files are all timestamped with a number of millis which are what Java's Date.getTime returns when you ask for it. That is to say, the number of milliseconds, as a Long, elapsed from January, 1st, 1970, 00:00:00.000 GMT (the so-called EPOCH time by people thinking that the world started in the seventies).

Not very user friendly. Usually you would like to display that as readable date, using the java.text.SimpleDateFormat class for example. So in Scala, you are very tempted to write code like that:

  val hhmmFormat = new SimpleDateFormat("hh:mm")

  implicit def toTimeDisplay(t: Long) = new TimeDisplay(t)

  case class TimeDisplay(time: Long) {
    def hhmm = hhmmFormat.format(new Date(time))
  }

  > val startTime: Long = 12398234093458L
  > startTime.hhmm
  res0: java.lang.String = 12:54

Then you move on, there's so much to do. In particular I wanted to be able to specify a time range to select exactly the events occuring during that time. The most straightforward way to do that is to give a "start time" and an "end time":

   /**
    * DaytimeRange(0, 2*60*60*1000) is 00:00 -> 02:00
    */
   case class DaytimeRange(start: Long, end: Long)

Here starts the ugliness. When I want to check if a startTime given by the log file is included in the DaytimeRange I have to do a conversion to make sure I'm using the proper Longs: the number of milliseconds since the start of the day, not the milliseconds since the EPOCH time!

Similarly, if I blindly try to reuse the hhmm method defined above, I need to make sure I apply that to a number of milliseconds corresponding to an EPOCH time and not just since the beginning of the day.

That's the recipe for disaster,...

Twitter forever

Fortunately the answer was right there, in my Twitter timeline (well in my memory of the timeline to be more precise :-)): use "Unboxed newtypes".

It all fits in a few lines of code but makes everything incredibly clear. First we define "Tagged types":

  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

Then we declare that there are 2 different types of time:

  trait Day
  trait Epoch

And we declare that a given Long will either represent the number of millis since 1970 or since the beginning of the day:

  type Epochtime = Long @@ Epoch
  type Daytime   = Long @@ Day

Daytime simply means that we have a Long value, with an additional Day type.

Finally, we provide 2 functions to create instances of those types from Longs:

  def daytime(i: java.lang.Long): Daytime     = i.asInstanceOf[Daytime]
  def epochtime(i: java.lang.Long): Epochtime = i.asInstanceOf[Epochtime]

with a method which explicitly converts EPOCH millis to "day" millis:

  def epochtimeToDaytime(time: Long): Daytime = {
    val calendar = Calendar.getInstance
    calendar.setTime(new Date(time))
    daytime(((calendar.get(HOUR_OF_DAY)* 60 +
              calendar.get(MINUTE))    * 60 +
              calendar.get(SECOND))    * 1000 +
              calendar.get(MILLISECOND))
  }

Using the new toys

We can use the Daytime type for our DaytimeRange class:

  case class DaytimeRange(start: Daytime, end: Daytime)

There's no risk that we now accidentally create a DaytimeRange instance with Longs which do not represent elapsed millis since the beginning of the day. The compiler reminds us to write code like:

  /** @return the number of millis from a string representing hours and minutes */
  def hhmmToDaytime(s: String): Daytime = ...

  DaytimeRange(hhmmToDaytime("10:00"), hhmmToDaytime("10:20"))

And if we want to create a DaytimeRange instance from 2 startTimes found in the log file:

  DaytimeRange(epochtimeToDaytime(s1), epochtimeToDaytime(s2))

Similarly, we can use the Epochtime for the hhmm display

  implicit def toEpochtimeDisplay(t: Epochtime) = new EpochtimeDisplay(t)

  case class EpochtimeDisplay(time: Epochtime) {
    // here new Date expects a Long, but this is ok because Epochtime *is* a Long
    def hhmm = hhmmFormat.format(new Date(time))
  }

We can safely reuse this code to display a DaytimeRange instance:

  case class DaytimeRange(start: Daytime, end: Daytime) {

    // the developer *has* to think about which kind of time he's handling
    def show = daytimeToEpochtime(start).hhmm + " -> " + daytimeToEpochtime(end).hhmm

  }

Final comments

It's practical

This technique is very pratical because it avoids making silly mistakes with Longs representing different concepts while still keeping the ability to use them as Long objects without having to "Unbox" them. Indeed we could also have created a case class like:

  case class Daytime(time: Long)

But then we would have had to "unbox" the time value everytime we wanted to do an addition or a comparison.

WTF?

I had a compiler puzzler when with my first implementation:

  case class DaytimeRange(start: Daytime, end: Daytime)
             ^
  found   : Double
  required: AnyRef

  Note: an implicit exists from scala.Double => java.lang.Double, but methods inherited from Object are
  rendered ambiguous.  This is to avoid a blanket implicit which would convert any scala.Double to any AnyRef.
  You may wish to use a type ascription: `x: java.lang.Double`.

Go figure what that means in this context,... After much head scratching, I found a workaround:

  type Epochtime = java.lang.Long @@ Epoch
  type Daytime   = java.lang.Long @@ Day

  def daytime(i: java.lang.Long): Daytime     = i.asInstanceOf[Daytime]
  def epochtime(i: java.lang.Long): Epochtime = i.asInstanceOf[Epochtime]

I used java.lang.Long instead of scala.Long because it looks like we need to get AnyRef objects while scala.Long is only AnyVal. But the compiler message is still very obscure in that case.

This is not a unit system!

Because Epochtime and Daytime are still Longs, it is still possible to add them and make a mess!

Kudos to @retronym too

You'll see that Unboxed Tagged Types are also part of the next scalaz7. Jason came up with the @@ type alias and is using the tag types to distinguish "multiplicative" monoids from "additive" monoids. Or conjonctive vs disjonctive. This means that given a Monoid[Boolean], we can specify if it does an AND or if it does an OR. Scalaz is becoming the ultimate toolbox,...

A++ [Eric Torreborre's Blog]

Pages

11 November 2011

Practical uses for Unboxed Tagged Types