18 May 2006

Back to language roots (and to plumbing once in a while)

I have been musing a lot about programming languages recently.

Parsing Ruby code

First of all, I needed to analyze some Ruby classes and extract part of the behavior of their operations. For instance, I needed to parse the following code:

if (p1 == 2 && p2 == 1)
  @attribute = 3
else
  @attribute = 4
end

I wanted to extract something like:

p1==2 && p2==1 'implies' @attribute = 3
not(p1==2 && p2==1) 'implies' @attribute = 4

So I started to write my own parser, which would take the Ruby class file and:

-find the class definition
-find the operation definition
-parse the if-then-else-endif expressions (that can be nested,...)

There's got to be a better way!

A better way

Google guided me to a great Ruby library: ParseTree.

ParseTree takes some Ruby code and returns a "sexp" (symbolic expression) that represents the program being parsed. For instance:

def example
  1 + 1
end

[:defn, :example, [:args], [:call, [:lit, 1], :+, [:array, [:lit, 1]]]]

Then, the ParseTree library offers a SexpProcessor class that makes it easy to consume the sexp.
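To give an idea of what consuming a sexp looks like, here is a minimal sketch that walks a hand-written sexp and extracts the "condition implies assignment" pairs I was after. It does not call ParseTree or SexpProcessor: the node shapes ([:if, condition, then, else], [:iasgn, name, value], [:lvar, name]) and the `implications` helper are my assumptions, modeled on the sexp above.

```ruby
# Hand-written sexp (assumed shape) for:
#   if (p1 == 2 && p2 == 1) then @attribute = 3 else @attribute = 4 end
condition = [:and,
             [:call, [:lvar, :p1], :==, [:array, [:lit, 2]]],
             [:call, [:lvar, :p2], :==, [:array, [:lit, 1]]]]
sexp = [:if,
        condition,
        [:iasgn, :@attribute, [:lit, 3]],
        [:iasgn, :@attribute, [:lit, 4]]]

# Walk an :if node and return [condition, consequence] pairs.
# Conditions of nested ifs are accumulated under an :and node.
def implications(node, context = nil)
  return [[context, node]] unless node.is_a?(Array) && node.first == :if

  _, cond, then_branch, else_branch = node
  combine = ->(c) { context ? [:and, context, c] : c }

  pairs = implications(then_branch, combine.call(cond))
  pairs += implications(else_branch, combine.call([:not, cond])) if else_branch
  pairs
end

implications(sexp).each do |cond, consequence|
  puts "#{cond.inspect} implies #{consequence.inspect}"
end
```

The pairs are still sexps at this point, which is fine for analysis but not for display.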

So much for the theory. A programmer's usual practice is less shiny:
  • I also had to download RubyInline, a library that allows C code to be compiled and then called from Ruby code
  • I had to let RubyInline compile the ParseTree C code. It took me some hours of tweaking to get it right, from modifying part of the ParseTree C code to modifying the RubyInline compilation command so that it would work on my Windows laptop (the ParseTree/RubyInline folks don't seem to be willing to live with Microsoft around). If you encounter the same difficulties, send me a mail and I'll try to help you
When I do that, I really feel like a computer plumber. There are so many more interesting things to do with a computer! Anyway.

Then I realized that the trip wasn't over: parts of the sexp had to be translated back to Ruby code again!
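As a rough sketch of what "translating back" means, here is a tiny renderer for a handful of node types. It is not Ruby2Ruby and not part of ParseTree: the `to_ruby` helper and the node shapes it accepts are assumptions of mine, and it only handles binary operator calls with a single argument.

```ruby
# Render a small, assumed subset of sexp nodes back to Ruby source.
def to_ruby(sexp)
  type, *rest = sexp
  case type
  when :lit   then rest[0].inspect
  when :lvar  then rest[0].to_s
  when :iasgn then "#{rest[0]} = #{to_ruby(rest[1])}"
  when :and   then "#{to_ruby(rest[0])} && #{to_ruby(rest[1])}"
  when :not   then "not(#{to_ruby(rest[0])})"
  when :call
    receiver, operator, arguments = rest
    # arguments is assumed to be [:array, single_argument]
    "#{to_ruby(receiver)} #{operator} #{to_ruby(arguments[1])}"
  else
    raise ArgumentError, "unsupported node: #{type.inspect}"
  end
end

puts to_ruby([:call, [:lit, 1], :+, [:array, [:lit, 1]]])
puts to_ruby([:iasgn, :@attribute, [:lit, 3]])
```

A real translator has to cover every node type of the language, which is exactly the part I would rather not write myself.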

Back to the roots of programming

This is where I found (or more exactly, rediscovered) Paul Graham's articles on Lisp. I was really fascinated by the data <-> code equivalence offered by Lisp. The syntax is minimal and the code is already expressed as a syntax tree!
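The equivalence can be imitated in Ruby with nested arrays. This is only a sketch of the idea, not real Lisp: the `evaluate` helper is hypothetical, and each operator is simply a symbol naming a Ruby method that is folded over the operands.

```ruby
# Evaluate a Lisp-like expression held as plain Ruby data (nested arrays).
# [:+, 1, [:*, 2, 3]] stands for (+ 1 (* 2 3)).
def evaluate(expression)
  return expression unless expression.is_a?(Array)

  operator, *operands = expression
  operands.map { |operand| evaluate(operand) }.reduce(operator)
end

program = [:+, 1, [:*, 2, 3]]
puts evaluate(program)

# Because the program is ordinary data, it can be wrapped or rewritten
# before being run.
doubled = [:*, 2, program]
puts evaluate(doubled)
```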

The funny thing is that the first language I was taught in engineering school was Scheme, a Lisp dialect. At that time, I mostly saw the power of recursion, but not this idea of extending the language itself with macros, and so on.

One more funny thing before returning to Ruby: Lisp was not so much invented as a new language as discovered, in an experiment to find an axiomatization of computation other than Turing machines (John McCarthy, in the late 1950s!, see Paul Graham's article).

From Ruby to sexp to Ruby again

Anyway, back to Ruby. The idea is to use another Ruby library, Ruby2Ruby (in the zenhacks gem), which should do the trick, although I have not yet finished the round-trip experiment. The idea behind Ruby2Ruby is to implement most of the Ruby language as Ruby code, leaving only a few primitives translated to C. This leads to some interesting lines of code in the library tests. Check it out:

r2r2r2 = RubyToRubyToRuby.translate(RubyToRuby).sub("RubyToRuby","RubyToRubyToRuby")

Good luck with that!

A mini-language for acceptance testing

Today, I wanted to write acceptance tests for our generation algorithms. The trouble is that our algorithms explore a tree of possibilities based on the system behavior. What I would like to do is:
  • specify a pattern of possible behaviors
  • have some Java code generate the system behavior based on the pattern
Not clear? Let's say I have the following pattern:

[a*|b]*cd. This would mean that, if my system prints a, b, c or d letters when stimulated, then the printed string must follow the [a*|b]*cd pattern.
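For this particular pattern, the check itself can be sketched with an ordinary regular expression. I am assuming here that the square brackets are meant as grouping rather than as a regex character class, so the pattern translates to (?:a*|b)*cd. The `follows_pattern?` helper is hypothetical.

```ruby
# Check that a printed string follows the pattern [a*|b]*cd,
# reading the square brackets as grouping.
def follows_pattern?(output)
  /\A(?:a*|b)*cd\z/.match?(output)
end

%w[aabcd cd abdc abc].each do |output|
  puts "#{output}: #{follows_pattern?(output)}"
end
```

This only checks a string against one fixed pattern. It does not give me the pattern as a structure I could use to generate behaviors, which is why I wanted a parser for the pattern language itself.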

So, rolling up my sleeves, I first tried to implement a parser for this type of expression. Again, implementing my own parser? There's got to be a better way!

A better way, revisited

I thought about JavaCC and ANTLR, and then I recalled an article about JParsec. JParsec is a port of the Parsec Haskell library. The main difference between JParsec and JavaCC or ANTLR is that it is not a code generator: you do not feed it a grammar and get back a parser for your (mini?)language. Instead, you build the parser directly in Java code by combining smaller parser objects.
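To show the combinator idea without a code generator, here is a small sketch in Ruby. It is not JParsec's API: the `char`, `seq`, `alt`, `many` and `accepts?` helpers are names I made up. Each parser returns every position where a match may end, so the repetition of a sub-pattern that can match the empty string still terminates.

```ruby
# A parser is a lambda (input, position) -> array of positions where a
# match starting at `position` may end. An empty array means failure.

# Match a single character.
def char(c)
  ->(input, pos) { input[pos] == c ? [pos + 1] : [] }
end

# Match the given parsers one after the other.
def seq(*parsers)
  lambda do |input, pos|
    parsers.reduce([pos]) do |positions, parser|
      positions.flat_map { |p| parser.call(input, p) }.uniq
    end
  end
end

# Match any one of the given parsers.
def alt(*parsers)
  ->(input, pos) { parsers.flat_map { |parser| parser.call(input, pos) }.uniq }
end

# Match the parser zero or more times, stopping when no new position appears.
def many(parser)
  lambda do |input, pos|
    seen = [pos]
    frontier = [pos]
    until frontier.empty?
      frontier = frontier.flat_map { |p| parser.call(input, p) }.uniq - seen
      seen += frontier
    end
    seen
  end
end

# The input is accepted if some match consumes it entirely.
def accepts?(parser, input)
  parser.call(input, 0).include?(input.length)
end

# [a*|b]*cd, with the square brackets read as grouping
pattern = seq(many(alt(many(char("a")), char("b"))), char("c"), char("d"))
puts accepts?(pattern, "aabcd")
puts accepts?(pattern, "abdc")
```

The grammar is plain code in the host language: `pattern` is an ordinary value that can be passed around, composed, and tested like any other.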

So I tried to define my mini-language parser with JParsec. Unfortunately, I got stuck because of the lack of available documentation (the codehaus server was down all day long, and it still is). At the end of the day, it looked like I was back to plumbing again, having a library and using it blindly, by trial and error.

Perseverance, mini-languages are a must-have

I should collect the many writings on the qualities needed to be a good programmer. Perseverance seems to be one of those. It would be very tempting, in my situation, to give up on the JParsec trial and write my own parser. Or to give up on the whole mini-language idea and write ad-hoc acceptance tests.

However, I feel that perseverance here is important. Mastering the creation of mini-languages is such a powerful tool in your toolbelt.

Just as the best way to leverage assembly language was to create a programming language, the best way to leverage a programming language is to create mini-languages that are adapted to your domain.

Build on your own language

Related to parsers and the use of mini-languages, I would add a concluding thought: every computation should be done within your programming language.
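As an illustration of what I mean, a mini-language can live entirely inside the host language, with no external grammar and no extra build pass. The sketch below is hypothetical (the `BehaviorSpec` class and its `rule` method are names I made up) and reuses the p1/p2 example from the beginning of this post.

```ruby
# A tiny internal mini-language: rules are declared in plain Ruby blocks.
class BehaviorSpec
  attr_reader :rules

  def initialize(&block)
    @rules = []
    instance_eval(&block)
  end

  # Record: when the condition block holds, expect this value.
  def rule(expected, &condition)
    @rules << [condition, expected]
  end

  # Return the expected value of the first rule whose condition holds.
  def expected_for(p1, p2)
    match = @rules.find { |condition, _expected| condition.call(p1, p2) }
    match && match[1]
  end
end

spec = BehaviorSpec.new do
  rule(3) { |p1, p2| p1 == 2 && p2 == 1 }
  rule(4) { |_p1, _p2| true }
end

puts spec.expected_for(2, 1)
puts spec.expected_for(0, 0)
```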

This is why I like Ruby and Rails.

This is why I don't like Java AOP: when you use Java AOP, the annotations have their own specific syntax, and you have an extra "weaving" pass in your development process.

In the end, this is where Lisp may be leading us (or only me?): parsers and code generators should be included in the basic toolkit of any decent programming language and every programmer should master them.
