08 October 2007

Scala to heaven, second step: anatomy of a scala script

The objective of this post is to contrast some Java and Scala code aimed at accomplishing the same scripting task: cleaning up my garbage system.

I want to be able to count old items in a database and archive them (through my system API). Of course, if anything goes wrong, I want to be able to restore them.

I'll show the approach I took in Java, then the corresponding Scala way of doing things.

The Java way: connect to the server

First things first: get a server connection. Typically, this is how I do it in Java:

DataServerConnection connection = ConnectionUtil.connect("me", "my password", "my environment file");
// I do my stuff here

Of course, when my job is over, I have to close the connection:

connection.disconnect();

But in this situation, and in plenty of others like writing to an OutputStream, I may simply forget to close my resource. So, here's:

The Scala way: connect to the server

There is, among a gazillion things, a very useful feature in Scala: the possibility of having the parameters of a method evaluated lazily, by declaring them "by name" (the actions: => Any parameter in the code that follows). This means that the parameter you pass to a method is evaluated only when the method body refers to it (and every time it does), and not as the method is called, as is the case in Java.
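To make the evaluation order concrete, here is a minimal sketch contrasting a regular (by-value) parameter with a by-name one. The ByNameDemo object and all of its methods are hypothetical names made up for this illustration; they are not part of my script.

```scala
object ByNameDemo {
  // Records the order in which things happen.
  val events = scala.collection.mutable.ListBuffer.empty[String]

  // The expression we pass as an argument; it leaves a trace when evaluated.
  def argument(): Int = { events += "argument evaluated"; 1 }

  // By-value parameter: the argument is evaluated before the body starts.
  def byValue(x: Int): Int = { events += "body started"; x }

  // By-name parameter: the argument is evaluated when the body refers to it.
  def byName(x: => Int): Int = { events += "body started"; x }

  // A by-name argument is re-evaluated at every reference.
  def twice(x: => Int): Int = x + x

  // A by-name argument that is never referenced is never evaluated.
  def ignored(x: => Int): Int = 0
}
```

Calling ByNameDemo.byValue(ByNameDemo.argument()) records the argument first and the body second; with byName the order is reversed.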

This way, I can write a better connect method:
def connect(user: String, password: String, env: String, actions: => Any) = {
  val connection = ConnectionUtil.connect(user, password, env)
  // the by-name parameter is only evaluated here, once the connection is open
  actions
  connection.disconnect()
}

And use it like this:

connect("me", "my password", "my env. file", actions())

The actions are only performed once the connection is open, and the connection is closed without my having to think about it. And for more readability, I can even use the following syntax:

def connect(user: String, password: String, env: String)(actions: => Any) = {
  val connection = ConnectionUtil.connect(user, password, env)
  actions
  connection.disconnect()
}

connect("me", "my password", "my env. file") {
  actions
}

Nice Ruby/blocks feel, isn't it? [For a better implementation of this pattern, with try/catch and all, please check the Loan Pattern.]
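For the record, here is a minimal sketch of what a more defensive connect could look like, using try/finally so that the connection is closed even when the actions throw. FakeServer is a hypothetical stand-in for ConnectionUtil and DataServerConnection, which are not shown in this post; it only counts open connections so that the behaviour can be checked.

```scala
// Hypothetical stand-in for the real server API (an assumption for this sketch).
object FakeServer {
  var openConnections = 0

  class Connection {
    def disconnect(): Unit = openConnections -= 1
  }

  def connect(user: String, password: String, env: String): Connection = {
    openConnections += 1
    new Connection
  }
}

object Loan {
  // Opens a connection, runs the actions, and always disconnects afterwards.
  // The result of the actions is returned to the caller.
  def connect[A](user: String, password: String, env: String)(actions: => A): A = {
    val connection = FakeServer.connect(user, password, env)
    try actions
    finally connection.disconnect()
  }
}
```

It is used exactly like the version above: Loan.connect("me", "my password", "my env. file") { actions }. If the actions fail, the exception still propagates to the caller, but only after the connection has been closed.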

The Java way: processing stuff

For this script, the overall process is the same whatever the action:
  • get some "Market data items" for some "type" and "currency"
  • count/archive/restore the oldest ones
The usual way to do that in Java is to nest some for loops and do the job inside the innermost loop:
for (String type : types) {
    for (String currency : currencies(type)) {
        for (String name : getMarketDataItemNames(type, currency)) {
            final int id = market().getMarketDataItemId(type, currency, name);
            doAction(action, type, id);
        }
    }
}
This buries the "action" logic deep inside the "selection" logic. One alternative would be to first build the list of elements to process, then process them. But in my case, this would mean dragging a very big chunk of the database into memory.

The Scala way: processing stuff

Scala offers the possibility to cleanly separate the selection logic from the action logic:

// select items
def items = for (itemType <- market.getMarketDataItemTypes.toStream;
                 currency <- currencies(itemType);
                 name <- market.getMarketDataItemNames(itemType, currency);
                 itemId = getItemId(itemType, currency, name))
            yield (itemType, itemId, name)

// archive items
def archive(items: Iterable[Item]) =
  for ((itemType, itemId, itemName) <- items)
    archive(itemType, itemId, itemName)

archive(items)

The for/yield construct returns ("yields") a list composed of "items" (type, id, name), ready to be processed by the archive function. The interesting thing is that this list doesn't have to be built in memory all at once. Because the first generator is converted with toStream, the result is a Stream, i.e. a list whose elements are fetched only as they are needed.
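Here is a small self-contained sketch of that laziness. It assumes a recent Scala (2.13 or later), where LazyList has replaced Stream, and it uses hypothetical in-memory stand-ins (itemTypes, currencies, names) for the market API, with a counter to observe how many "database queries" have actually been made.

```scala
object LazySelection {
  // Counts the calls to the (fake) names query.
  var fetched = 0

  // Hypothetical stand-ins for the market API (assumptions for this sketch).
  def itemTypes: List[String] = List("bond", "swap")
  def currencies(itemType: String): List[String] = List("EUR", "USD")
  def names(itemType: String, currency: String): List[String] = {
    fetched += 1
    List(itemType + "-" + currency + "-1", itemType + "-" + currency + "-2")
  }

  // Same shape as the selection above: the first generator is lazy,
  // so each item type is only queried when the traversal reaches it.
  def items: LazyList[(String, String, String)] =
    for {
      itemType <- itemTypes.to(LazyList)
      currency <- currencies(itemType)
      name     <- names(itemType, currency)
    } yield (itemType, currency, name)
}
```

Building the value with LazySelection.items performs no query at all. Asking for its head queries only the first item type, and the remaining types are queried only if the traversal goes that far.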

The Java way: aggregating results

The last step in my script is to display the current number of processed elements as well as their total number. I did it very simply with Java:
// not final: the running total is updated inside the loop
int totalProcessed = 0;
for (String type : types) {
    for (String currency : currencies(type)) {
        for (String name : getMarketDataItemNames(type, currency)) {
            final int id = market().getMarketDataItemId(type, currency, name);
            System.out.println(action + " item: " + name);
            int processed = doAction(action, type, id);
            totalProcessed += processed;
            System.out.println("Done: " + processed + " Total: " + totalProcessed);
        }
    }
}

Again, the reporting logic is buried inside the loops, and this may indeed be fine for a simple script. In other circumstances, you may want a bit more independence between the functionalities:

The Scala way: aggregating results

The idea here is to be able to write:

report(count(items))
report(archive(items))
report(restore(items))

With the same report function, which will:
  • take a list of Reports resulting from each action,
  • print the current Report,
  • accumulate the current Report into a running total Report,
  • do all this as the items are processed, without having to process everything first and only then do the reporting.
I will not go into every detail of the exact solution (which is a bit complex for my taste, in fact; see below), but here are the principles. First of all, each action yields its result as a Report object, containing the name of the processed item and the result of the processed action:
def archive(items: Iterable[Item]) =
  for ((itemType, itemId, itemName) <- items)
    yield Report(itemName, archive(itemType, itemId, itemName))


Then, the report function judiciously uses the reduce function to do the sum and report each element:

def report(reports: Iterable[Report]) = {
  reports.reduceLeft { (x: Report, y: Report) =>
    (x + y).report
  }
}

Not so readable to the non-expert eye, right?! Press F1:
  • report iterates over a list of Reports
  • it sums all elements 2 by 2 until we have a final result. List("h", "e", "l", "l", "o").reduceLeft((a: String, b: String) => a + b) would produce: "hello"
  • before returning each summed element, it calls that element's report method, so that the aggregated report can print itself to the console:

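// a case class, so that Report(...) needs no "new" and c.processed is accessible
case class Report(itemName: String, processed: Int) {
  // the running total starts with this report's own count, so the first item is not lost
  var total: Int = processed
  def +(c: Report) = { c.total = total + c.processed; c }
  def report = { reportItem; reportTotal; this }
  def reportItem = println("Item " + itemName + ": " + processed)
  def reportTotal = println("total number: " + total)
}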
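
To check the principle end to end, here is a self-contained variation on the same idea. It is only a sketch, not my real script: it swaps reduceLeft for foldLeft starting from an empty Report, so that the first item gets printed as well, and it keeps the running total in an immutable field. The render method, the Reporting object and the out parameter are names invented for this example.

```scala
// Immutable variant: the running total travels with each Report.
final case class Report(itemName: String, processed: Int, total: Int = 0) {
  // Carries this report's running total over to the next report.
  def +(next: Report): Report = next.copy(total = total + next.processed)

  def render: String = "Item " + itemName + ": " + processed + " (total: " + total + ")"
}

object Reporting {
  // Prints every report as soon as it is reached and returns the last one,
  // whose total is the grand total. The reports are traversed only once,
  // so a lazily computed collection is never fully held in memory here.
  def report(reports: Iterable[Report], out: String => Unit = s => println(s)): Report =
    reports.foldLeft(Report("", 0)) { (running, next) =>
      val summed = running + next
      out(summed.render)
      summed
    }
}
```

For example, Reporting.report(List(Report("a", 2), Report("b", 3))) prints one line per item and returns a Report whose total is 5.
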
Conclusion

The sad truth is that my real-world solution is a tad more complex than the one presented above. In "real life", I have different types of Reports because each action doesn't bring the same kind of results.

count only returns counted elements, archive and restore return deleted elements + processed elements (to double-check that the action is ok). Abstracting over this and defining a Summable interface to provide addition over both Int and Tuples proved a bit more challenging than using the plain Java solution.
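
To give an idea of what such an abstraction can look like, here is one possible shape for it, written as a type class with implicits. This is an illustrative sketch only, not the code from my script: the zero and plus members and the sum helper are assumptions made for this example.

```scala
// "Things that can be added together", with a neutral element.
trait Summable[A] {
  def zero: A
  def plus(x: A, y: A): A
}

object Summable {
  // Plain counts, as returned by count.
  implicit val intSummable: Summable[Int] = new Summable[Int] {
    def zero: Int = 0
    def plus(x: Int, y: Int): Int = x + y
  }

  // A pair (e.g. deleted elements, processed elements) is summable
  // as soon as both of its components are.
  implicit def pairSummable[A, B](implicit a: Summable[A], b: Summable[B]): Summable[(A, B)] =
    new Summable[(A, B)] {
      def zero: (A, B) = (a.zero, b.zero)
      def plus(x: (A, B), y: (A, B)): (A, B) = (a.plus(x._1, y._1), b.plus(x._2, y._2))
    }

  // Adds up any collection of summable values.
  def sum[A](values: Iterable[A])(implicit s: Summable[A]): A =
    values.foldLeft(s.zero)(s.plus)
}
```

With this in place, Summable.sum(List(1, 2, 3)) adds plain counts, and the same call on a list of pairs adds them component by component.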

But the good news is that as long as you're not looking for too much abstraction, you will find really neat ways to write common programming logic in Scala.