Pages

11 November 2011

Practical uses for Unboxed Tagged Types

Another blog post to show that the esoteric type system tricks you read about Haskell or Scala actually have real uses.

Groovy vs Scala for log analysis

These days I'm implementing a non-critical application which deals with:

  1. importing performance log files
  2. parsing them and storing some structured data corresponding to each line
  3. creating some graphs to show the maximum execution times, the cumulated maximum execution times, the events inside a given time range, and so on,...

This, in itself, is a very interesting exercise for me since I had coded a similar application more than 3 years ago in Groovy. After deciding that the Groovy implementation was slow and cumbersome, I decided to give it a go with Scala (and MongoDB for the backend :-)).

I've been really amazed to see that many of the things I learnt for free, on my spare time, were applicable in the case of that application, to yield much better code. This post shows one of these techniques: "Unboxed Tagged Types".

Unboxed what?

Two months ago, I saw this enigmatic tweet by @milessabin. I followed the link, read the code and thought: "oh nice, I see, at least Miles is having fun playing with Scala's type system". But I wasn't really able to see what that thing could be used for.

Then, this week, developing my log analysis application, I became midly annoyed about one specific messy point.

There is time and,... time

The log records I'm getting from the log files are all timestamped with a number of millis which are what Java's Date.getTime returns when you ask for it. That is to say, the number of milliseconds, as a Long, elapsed from January, 1st, 1970, 00:00:00.000 GMT (the so-called EPOCH time by people thinking that the world started in the seventies).

Not very user friendly. Usually you would like to display that as readable date, using the java.text.SimpleDateFormat class for example. So in Scala, you are very tempted to write code like that:

  val hhmmFormat = new SimpleDateFormat("hh:mm")

  implicit def toTimeDisplay(t: Long) = new TimeDisplay(t)

  case class TimeDisplay(time: Long) {
    def hhmm = hhmmFormat.format(new Date(time))
  }

  > val startTime: Long = 12398234093458L
  > startTime.hhmm
  res0: java.lang.String = 12:54

Then you move on, there's so much to do. In particular I wanted to be able to specify a time range to select exactly the events occuring during that time. The most straightforward way to do that is to give a "start time" and an "end time":

   /**
    * DaytimeRange(0, 2*60*60*1000) is 00:00 -> 02:00
    */
   case class DaytimeRange(start: Long, end: Long)

Here starts the ugliness. When I want to check if a startTime given by the log file is included in the DaytimeRange I have to do a conversion to make sure I'm using the proper Longs: the number of milliseconds since the start of the day, not the milliseconds since the EPOCH time!

Similarly, if I blindly try to reuse the hhmm method defined above, I need to make sure I apply that to a number of milliseconds corresponding to an EPOCH time and not just since the beginning of the day.

That's the recipe for disaster,...

Twitter forever

Fortunately the answer was right there, in my Twitter timeline (well in my memory of the timeline to be more precise :-)): use "Unboxed newtypes".

It all fits in a few lines of code but makes everything incredibly clear. First we define "Tagged types":

  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

Then we declare that there are 2 different types of time:

  trait Day
  trait Epoch

And we declare that a given Long will either represent the number of millis since 1970 or since the beginning of the day:

  type Epochtime = Long @@ Epoch
  type Daytime   = Long @@ Day

Daytime simply means that we have a Long value, with an additional Day type.

Finally, we provide 2 functions to create instances of those types from Longs:

  def daytime(i: java.lang.Long): Daytime     = i.asInstanceOf[Daytime]
  def epochtime(i: java.lang.Long): Epochtime = i.asInstanceOf[Epochtime]

with a method which explicitly converts EPOCH millis to "day" millis:

  def epochtimeToDaytime(time: Long): Daytime = {
    val calendar = Calendar.getInstance
    calendar.setTime(new Date(time))
    daytime(((calendar.get(HOUR_OF_DAY)* 60 +
              calendar.get(MINUTE))    * 60 +
              calendar.get(SECOND))    * 1000 +
              calendar.get(MILLISECOND))
  }

Using the new toys

We can use the Daytime type for our DaytimeRange class:

  case class DaytimeRange(start: Daytime, end: Daytime)

There's no risk that we now accidentally create a DaytimeRange instance with Longs which do not represent elapsed millis since the beginning of the day. The compiler reminds us to write code like:

  /** @return the number of millis from a string representing hours and minutes */
  def hhmmToDaytime(s: String): Daytime = ...

  DaytimeRange(hhmmToDaytime("10:00"), hhmmToDaytime("10:20"))

And if we want to create a DaytimeRange instance from 2 startTimes found in the log file:

  DaytimeRange(epochtimeToDaytime(s1), epochtimeToDaytime(s2))

Similarly, we can use the Epochtime for the hhmm display

  implicit def toEpochtimeDisplay(t: Epochtime) = new EpochtimeDisplay(t)

  case class EpochtimeDisplay(time: Epochtime) {
    // here new Date expects a Long, but this is ok because Epochtime *is* a Long
    def hhmm = hhmmFormat.format(new Date(time))
  }

We can safely reuse this code to display a DaytimeRange instance:

  case class DaytimeRange(start: Daytime, end: Daytime) {

    // the developer *has* to think about which kind of time he's handling
    def show = daytimeToEpochtime(start).hhmm + " -> " + daytimeToEpochtime(end).hhmm

  }

Final comments

  • It's practical

This technique is very pratical because it avoids making silly mistakes with Longs representing different concepts while still keeping the ability to use them as Long objects without having to "Unbox" them. Indeed we could also have created a case class like:

  case class Daytime(time: Long)

But then we would have had to "unbox" the time value everytime we wanted to do an addition or a comparison.

  • WTF?

I had a compiler puzzler when with my first implementation:

  case class DaytimeRange(start: Daytime, end: Daytime)
             ^
  found   : Double
  required: AnyRef

  Note: an implicit exists from scala.Double => java.lang.Double, but methods inherited from Object are
  rendered ambiguous.  This is to avoid a blanket implicit which would convert any scala.Double to any AnyRef.
  You may wish to use a type ascription: `x: java.lang.Double`.

Go figure what that means in this context,... After much head scratching, I found a workaround:

  type Epochtime = java.lang.Long @@ Epoch
  type Daytime   = java.lang.Long @@ Day

  def daytime(i: java.lang.Long): Daytime     = i.asInstanceOf[Daytime]
  def epochtime(i: java.lang.Long): Epochtime = i.asInstanceOf[Epochtime]

I used java.lang.Long instead of scala.Long because it looks like we need to get AnyRef objects while scala.Long is only AnyVal. But the compiler message is still very obscure in that case.

  • This is not a unit system!

Because Epochtime and Daytime are still Longs, it is still possible to add them and make a mess!

  • Kudos to @retronym too

You'll see that Unboxed Tagged Types are also part of the next scalaz7. Jason came up with the @@ type alias and is using the tag types to distinguish "multiplicative" monoids from "additive" monoids. Or conjonctive vs disjonctive. This means that given a Monoid[Boolean], we can specify if it does an AND or if it does an OR. Scalaz is becoming the ultimate toolbox,...

14 comments:

gseitz said...

I quite like to hide the nasty cast to the tagged type.

So instead of:
def daytime(i: java.lang.Long): Daytime = i.asInstanceOf[Daytime]
you get:
def daytime(i: java.lang.Long): Daytime = Tag(i)

See https://github.com/retronym/scalaz7-experimental/blob/master/core/src/main/scala/scalaz/Tag.scala#L5

Richard Wallace said...

This is very nice. I've been working on a rewrite of an internal system in Scala and many of the "entities" have Ids. I've created type aliases so it's somewhat clear which ids are being use where, but I really wanted to make it impossible to screw that up. This will let me do that pretty easily. Thanks!

Richard Wallace said...

Damn. Spoke too soon. Just tried converting some code over and for some reason it can't be used in case classes. The scalac 2.9.1 tells me

[error] found : Double
[error] required: AnyRef
[error] Note: an implicit exists from scala.Double => java.lang.Double, but
[error] methods inherited from Object are rendered ambiguous. This is to avoid
[error] a blanket implicit which would convert any scala.Double to any AnyRef.
[error] You may wish to use a type ascription: `x: java.lang.Double`.

But there isn't a Double in site in that case class.

Jeff Olson said...

scala> val epoch = 0L.asInstanceOf[Epochtime]
epoch: Epochtime = 0


scala> List(epoch)
java.lang.ClassCastException: [J cannot be cast to [Ljava.lang.Object;
at ...

For more fun see this thread.

Thomas Lockney said...

One thing that threw me is that I kept looking at 'type @@[T, U] = T with Tagged[U]' and thinking there must be some kind of magic going on here that I hadn't seen before. I actually spent a while refreshing my memory via the Scala Spec to make sure I wasn't missing something. I finally just opened up a Scala prompt and tried it out. Clearly I still have a lot to learn. Thanks for pointing this out and helping me along the way!

afwlehmann@googlemail.com said...

Interesting read, thank you very much indeed.

Ittay Dror said...

Why is Tagged required?

If the U type was a trait with methods, then this would avoid collisions, but as long as "tag" traits are used, what is wrong with T with U?

And I don't think this avoids boxing. When calling daytime or epochtime, any value is boxed in java.lang.Long

Eric said...

@Ittay In Miles original gist, he gives an example using a class as the tag type: https://gist.github.com/89c9b47a91017973a35f. The Tagger trait provides a container for that type in that case.

You're right about the boxing in my example because I'm not really using primitive types. I would make 2 comments about that:

- I had to use Java Longs because using scala's "AnyVal" Longs was creating the compiler error I mention at the end of the post. I created an issue for this: https://issues.scala-lang.org/browse/SI-5183. I also haven't tried to see if @specialized annotations would work

- you can still consider that the Longs are "unboxed" in the sense that you can still use all the Long operations on them. This would not be the case if I enclosed them in different classes to distinguish them

Turp1Twin said...

Is it possible for you to publish a gist with a complete example of this? Thanks much!

Eric said...

Here a gist with my code: https://gist.github.com/1371518

Eric said...

btw @gseitz and @jeff thanks for your comments. Jeff, I didn't know the extent of the issue of tagging AnyVal types in Scala. Your post on the mailing-list is very informative!

Daniel said...

But if you make your type java.lang.Long, then it is boxed!

Eric said...

I know :-). But it is still "unboxed" in the sense that I don't have to create a special wrapper class for it (first point of my final comments).

Vlad Patryshev said...

The right stuff. Thanks! Will go use it in our code... probably everywhere. Time units, tagging "String ids" or "long ids", I wonder what else.