Another blog post to show that the esoteric type system tricks you read about in Haskell or Scala actually have real uses.
Groovy vs Scala for log analysis
These days I'm implementing a non-critical application which deals with:
- importing performance log files
- parsing them and storing some structured data corresponding to each line
- creating some graphs to show the maximum execution times, the cumulated maximum execution times, the events inside a given time range, and so on,...
This, in itself, is a very interesting exercise for me since I had coded a similar application more than 3 years ago in Groovy. After deciding that the Groovy implementation was slow and cumbersome, I decided to give it a go with Scala (and MongoDB for the backend :-)).
I've been really amazed to see that many of the things I learnt for free, in my spare time, were applicable to that application and yielded much better code. This post shows one of these techniques: "Unboxed Tagged Types".
Unboxed what?
Two months ago, I saw this enigmatic tweet by @milessabin. I followed the link, read the code and thought: "oh nice, I see, at least Miles is having fun playing with Scala's type system". But I wasn't really able to see what that thing could be used for.
Then, this week, developing my log analysis application, I became mildly annoyed about one specific messy point.
There is time and,... time
The log records I'm getting from the log files are all timestamped with a number of millis, which is what Java's Date.getTime returns when you ask for it. That is to say, the number of milliseconds, as a Long, elapsed since January 1st, 1970, 00:00:00.000 GMT (the so-called EPOCH time by people thinking that the world started in the seventies).
Not very user friendly. Usually you would like to display that as a readable date, using the java.text.SimpleDateFormat class for example. So in Scala, you are very tempted to write code like this:
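import java.text.SimpleDateFormat
import java.util.Date

// "HH" is the 24-hour clock (00-23); "hh" would print midnight as 12
val hhmmFormat = new SimpleDateFormat("HH:mm")
implicit def toTimeDisplay(t: Long) = new TimeDisplay(t)
case class TimeDisplay(time: Long) {
  def hhmm = hhmmFormat.format(new Date(time))
}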
> val startTime: Long = 12398234093458L
> startTime.hhmm
res0: java.lang.String = 12:54
Then you move on, there's so much to do. In particular I wanted to be able to specify a time range to select exactly the events occurring during that time. The most straightforward way to do that is to give a "start time" and an "end time":
/**
* DaytimeRange(0, 2*60*60*1000) is 00:00 -> 02:00
*/
case class DaytimeRange(start: Long, end: Long)
Here starts the ugliness. When I want to check if a startTime given by the log file is included in the DaytimeRange, I have to do a conversion to make sure I'm using the proper Longs: the number of milliseconds since the start of the day, not the milliseconds since the EPOCH time!
Similarly, if I blindly try to reuse the hhmm method defined above, I need to make sure I apply that to a number of milliseconds corresponding to an EPOCH time and not just since the beginning of the day.
That's a recipe for disaster,...
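To make the risk concrete, here is a minimal sketch of the Long-only version. The contains method and the sample values are assumptions added for illustration, they are not taken from the actual application.

```scala
object UntypedTimes {
  // Both fields are plain Longs: nothing says which kind of millis is expected
  case class DaytimeRange(start: Long, end: Long) {
    // hypothetical helper: is t inside the range (bounds included)?
    def contains(t: Long): Boolean = start <= t && t <= end
  }

  val twoHours: Long = 2L * 60 * 60 * 1000
  val range = DaytimeRange(0L, twoHours) // 00:00 -> 02:00

  // one hour after midnight, as millis since the start of the day
  val dayMillis: Long = 60L * 60 * 1000
  // one hour after midnight on 2 January 1970 (UTC), as millis since the EPOCH
  val epochMillis: Long = 24L * 60 * 60 * 1000 + dayMillis

  val ok: Boolean = range.contains(dayMillis)     // the right kind of Long
  val oops: Boolean = range.contains(epochMillis) // the wrong kind, yet it compiles
}
```

Both calls type-check, but only the first one asks a meaningful question. The second one silently answers false for an event which did happen at 01:00.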
Twitter forever
Fortunately the answer was right there, in my Twitter timeline (well in my memory of the timeline to be more precise :-)): use "Unboxed newtypes".
It all fits in a few lines of code but makes everything incredibly clear. First we define "Tagged types":
type Tagged[U] = { type Tag = U }
type @@[T, U] = T with Tagged[U]
Then we declare that there are 2 different types of time:
trait Day
trait Epoch
And we declare that a given Long will represent either the number of millis since 1970 or since the beginning of the day:
type Epochtime = Long @@ Epoch
type Daytime = Long @@ Day
Daytime simply means that we have a Long value, with an additional Day type.
Finally, we provide 2 functions to create instances of those types from Longs:
def daytime(i: java.lang.Long): Daytime = i.asInstanceOf[Daytime]
def epochtime(i: java.lang.Long): Epochtime = i.asInstanceOf[Epochtime]
and a method which explicitly converts EPOCH millis to "day" millis:
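import java.util.{Calendar, Date}
import java.util.Calendar._

def epochtimeToDaytime(time: Long): Daytime = {
  val calendar = Calendar.getInstance
  calendar.setTime(new Date(time))
  // 60L keeps the arithmetic in Long, so the result can be passed as a java.lang.Long
  daytime(((calendar.get(HOUR_OF_DAY) * 60L +
            calendar.get(MINUTE)) * 60 +
            calendar.get(SECOND)) * 1000 +
            calendar.get(MILLISECOND))
}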
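Putting the pieces together, a self-contained sketch could look like the following. It is a variation and not the exact code of my application: toDaytimeUtc is a hypothetical name, it takes an Epochtime rather than a raw Long, and it pins the calendar to UTC (an assumption) so that the result does not depend on the time zone of the machine.

```scala
import java.util.{Calendar, Date, TimeZone}

object TaggedTimes {
  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

  trait Day
  trait Epoch

  // java.lang.Long rather than scala.Long, as in the factory functions above
  type Epochtime = java.lang.Long @@ Epoch
  type Daytime   = java.lang.Long @@ Day

  def daytime(i: java.lang.Long): Daytime = i.asInstanceOf[Daytime]
  def epochtime(i: java.lang.Long): Epochtime = i.asInstanceOf[Epochtime]

  // Hypothetical variant of the conversion: only an Epochtime is accepted,
  // and the calendar is fixed to UTC so the result is reproducible
  def toDaytimeUtc(time: Epochtime): Daytime = {
    val calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"))
    calendar.setTime(new Date(time.longValue()))
    val millis: Long =
      ((calendar.get(Calendar.HOUR_OF_DAY) * 60L +
        calendar.get(Calendar.MINUTE)) * 60L +
        calendar.get(Calendar.SECOND)) * 1000L +
        calendar.get(Calendar.MILLISECOND)
    daytime(millis)
  }

  // 2 January 1970, 01:02:03.004 UTC
  val logTime: Epochtime = epochtime(90123004L)
  val sinceMidnight: Daytime = toDaytimeUtc(logTime)

  // None of these lines is accepted by the compiler:
  // val wrong: Daytime = logTime   // an Epochtime is not a Daytime
  // toDaytimeUtc(sinceMidnight)    // a Daytime is not an Epochtime
  // toDaytimeUtc(90123004L)        // a raw Long carries no tag
}
```

The commented-out lines are the whole point: each kind of time has to be created through its own function, so a mix-up becomes a type error instead of a wrong graph.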
Using the new toys
We can use the Daytime type for our DaytimeRange class:
case class DaytimeRange(start: Daytime, end: Daytime)
There's no risk that we now accidentally create a DaytimeRange instance with Longs which do not represent elapsed millis since the beginning of the day. The compiler reminds us to write code like:
/** @return the number of millis from a string representing hours and minutes */
def hhmmToDaytime(s: String): Daytime = ...
DaytimeRange(hhmmToDaytime("10:00"), hhmmToDaytime("10:20"))
And if we want to create a DaytimeRange instance from 2 startTimes found in the log file:
DaytimeRange(epochtimeToDaytime(s1), epochtimeToDaytime(s2))
Similarly, we can use the Epochtime type for the hhmm display:
implicit def toEpochtimeDisplay(t: Epochtime) = new EpochtimeDisplay(t)
case class EpochtimeDisplay(time: Epochtime) {
  // here new Date expects a Long, but this is ok because Epochtime *is* a Long
  def hhmm = hhmmFormat.format(new Date(time))
}
We can safely reuse this code to display a DaytimeRange instance:
case class DaytimeRange(start: Daytime, end: Daytime) {
  // the developer *has* to think about which kind of time he's handling
  def show = daytimeToEpochtime(start).hhmm + " -> " + daytimeToEpochtime(end).hhmm
}
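The daytimeToEpochtime function used by show is not listed in this post. Here is a minimal sketch of what it could look like, under the assumption that everything is computed and displayed in UTC and that a Daytime is anchored on 1 January 1970. For brevity the sketch uses a plain hhmm function instead of the implicit conversion, and a "HH:mm" (24-hour) pattern.

```scala
import java.text.SimpleDateFormat
import java.util.{Calendar, Date, TimeZone}

object DaytimeDisplay {
  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]
  trait Day
  trait Epoch
  type Epochtime = java.lang.Long @@ Epoch
  type Daytime   = java.lang.Long @@ Day
  def daytime(i: java.lang.Long): Daytime = i.asInstanceOf[Daytime]
  def epochtime(i: java.lang.Long): Epochtime = i.asInstanceOf[Epochtime]

  // Assumption: UTC everywhere, so the output is the same on every machine
  private val utc = TimeZone.getTimeZone("UTC")

  // A fresh formatter per call, because SimpleDateFormat is not thread-safe
  private def hhmmFormat: SimpleDateFormat = {
    val format = new SimpleDateFormat("HH:mm")
    format.setTimeZone(utc)
    format
  }

  // Hypothetical helper: anchors a Daytime on 1 January 1970 (UTC) so that it
  // can be formatted like any other Epochtime
  def daytimeToEpochtime(d: Daytime): Epochtime = {
    val calendar = Calendar.getInstance(utc)
    calendar.clear()
    calendar.set(1970, Calendar.JANUARY, 1, 0, 0, 0)
    epochtime(calendar.getTimeInMillis() + d.longValue())
  }

  def hhmm(time: Epochtime): String =
    hhmmFormat.format(new Date(time.longValue()))

  case class DaytimeRange(start: Daytime, end: Daytime) {
    def show: String =
      hhmm(daytimeToEpochtime(start)) + " -> " + hhmm(daytimeToEpochtime(end))
  }

  val tenToTenTwenty: DaytimeRange =
    DaytimeRange(daytime(10L * 60 * 60 * 1000), daytime((10L * 60 + 20) * 60 * 1000))
}
```

With these assumptions, DaytimeDisplay.tenToTenTwenty.show gives "10:00 -> 10:20". A version working in the local time zone would have to pick the reference day more carefully, for instance because of daylight saving changes.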
Final comments
- It's practical
This technique is very practical because it avoids making silly mistakes with Longs representing different concepts while still keeping the ability to use them as Long objects without having to "unbox" them. Indeed we could also have created a case class like:
case class Daytime(time: Long)
But then we would have had to "unbox" the time value every time we wanted to do an addition or a comparison.
- WTF?
I hit a compiler puzzler with my first implementation:
case class DaytimeRange(start: Daytime, end: Daytime)
^
found : Double
required: AnyRef
Note: an implicit exists from scala.Double => java.lang.Double, but methods inherited from Object are
rendered ambiguous. This is to avoid a blanket implicit which would convert any scala.Double to any AnyRef.
You may wish to use a type ascription: `x: java.lang.Double`.
Go figure what that means in this context,... After much head scratching, I found a workaround:
type Epochtime = java.lang.Long @@ Epoch
type Daytime = java.lang.Long @@ Day
def daytime(i: java.lang.Long): Daytime = i.asInstanceOf[Daytime]
def epochtime(i: java.lang.Long): Epochtime = i.asInstanceOf[Epochtime]
I used java.lang.Long instead of scala.Long because it looks like we need to get AnyRef objects while scala.Long is only AnyVal. But the compiler message is still very obscure in that case.
- This is not a unit system!
Because Epochtime and Daytime are still Longs, it is still possible to add them and make a mess!
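A small sketch of that limitation, where sum is a hypothetical function written against plain Longs and the values are made up:

```scala
object NotAUnitSystem {
  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]
  trait Day
  trait Epoch
  type Epochtime = java.lang.Long @@ Epoch
  type Daytime   = java.lang.Long @@ Day
  def daytime(i: java.lang.Long): Daytime = i.asInstanceOf[Daytime]
  def epochtime(i: java.lang.Long): Epochtime = i.asInstanceOf[Epochtime]

  // hypothetical helper which only knows about plain Longs
  def sum(a: Long, b: Long): Long = a + b

  val logTime: Epochtime = epochtime(90123004L)
  val sinceMidnight: Daytime = daytime(3723004L)

  // Both tagged values are accepted wherever a Long is expected, so this
  // compiles even though the result does not mean anything
  val nonsense: Long = sum(logTime, sinceMidnight)
}
```

The tags protect the places which ask for an Epochtime or a Daytime explicitly. As soon as a function asks for a plain Long, the distinction is gone.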
- Kudos to @retronym too
You'll see that Unboxed Tagged Types are also part of the next scalaz7. Jason came up with the @@ type alias and is using the tag types to distinguish "multiplicative" monoids from "additive" monoids. Or conjunctive vs disjunctive. This means that given a Monoid[Boolean], we can specify if it does an AND or if it does an OR. Scalaz is becoming the ultimate toolbox,...
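To give an idea of how tags can select a monoid, here is a hand-rolled sketch. The Monoid trait, the Conjunction and Disjunction tags, and the combineAll function are all assumptions made up for this example. They are not Scalaz's actual definitions, and I use java.lang.Boolean for the same reason as java.lang.Long above.

```scala
object TaggedMonoids {
  type Tagged[U] = { type Tag = U }
  type @@[T, U] = T with Tagged[U]

  // hypothetical tags and type class, for illustration only
  trait Conjunction
  trait Disjunction

  trait Monoid[A] {
    def zero: A
    def append(a: A, b: A): A
  }

  type And = java.lang.Boolean @@ Conjunction
  type Or  = java.lang.Boolean @@ Disjunction

  def and(b: Boolean): And = java.lang.Boolean.valueOf(b).asInstanceOf[And]
  def or(b: Boolean): Or   = java.lang.Boolean.valueOf(b).asInstanceOf[Or]

  // AND: the neutral element is true
  implicit val andMonoid: Monoid[And] = new Monoid[And] {
    def zero: And = and(true)
    def append(a: And, b: And): And = and(a.booleanValue() && b.booleanValue())
  }

  // OR: the neutral element is false
  implicit val orMonoid: Monoid[Or] = new Monoid[Or] {
    def zero: Or = or(false)
    def append(a: Or, b: Or): Or = or(a.booleanValue() || b.booleanValue())
  }

  // The tag on the element type decides which instance the compiler picks
  def combineAll[A](values: List[A])(implicit m: Monoid[A]): A =
    values.foldLeft(m.zero)(m.append)

  val conjunction: And = combineAll(List(and(true), and(false), and(true)))
  val disjunction: Or  = combineAll(List(or(true), or(false), or(true)))
}
```

The same three booleans give false when they are tagged for AND and true when they are tagged for OR, and nothing but the type tells the two cases apart.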