The Setup
As Curt wrote it, we entered the contest doing things that should have been done well before:
- setting up the wifi access so that I can connect with my laptop (it never worked so Bryan had to get out and buy new network cables)
- exchanging keys to access the Subversion repository
- learning how to use the build and test system
However, the biggest lesson for me in this contest, was the use of Parser Combinators.
Do you speak Martian?
The second programming task in the contest, after setting up the network connection to be able to receive messages and send back commands, was to parse the input messages giving the position of the Martian rover, and its immediate environment.
As the task specifies, there are 5 kinds of messages:
- initialization messages -> what's the map like? what are you technical constants?
- telemetry stream -> where are you, what's your speed, what's around?
- adverse event -> crashed in a crater!
- success -> yes, home!
- end of run -> time and status
For many developers, the most obvious way to transforms meaningless strings to meaningful objects like Telemetry or Martian, would be to start traversing the string, check for characters like 'T' for Telemetry, or 'I' for Initialization then pass the rest of the string to another function and continue slicing the string until all data is analyzed.
Parser combinators
But Haskell developers know better,... They can just describe the grammar of the messages they're parsing by combining parsers, then applying the resulting parser to the string input.
Something like that:
telemetry :: Parser Message
telemetry = do ts <- int
state <- accelStateParser
turning <- turningStateParser
spaces
x <- double
y <- double
dir <- double
speed <- double
obstacles <- obstaclesParser
messageEnd
return $ Telemetry ts state turning x y dir speed obstacles
accelStateParser :: Parser AccelState
accelStateParser = (accelerating <|> braking <|> rolling)
accelerating = stateParser 'a' Accelerating
braking = stateParser 'b' Braking
rolling = stateParser '-' Rolling
stateParser :: Char -> a -> Parser a
stateParser c cons = do _ <- char c
return cons
turningStateParser :: Parser TurningState
turningStateParser = hardleft <|> softleft
<|> hardright <|> softright
<|> straight
hardleft = stateParser 'L' HardLeft
softleft = stateParser 'l' SoftLeft
hardright = stateParser 'R' HardRight
softright = stateParser 'r' SoftRight
straight = stateParser '-' Straight
obstaclesParser :: Parser [Object]
obstaclesParser = do l <- many obstacleParser
return l
obstacleParser :: Parser Object
obstacleParser = do code <- oneOf "bchm"
spaces
case code of
'b' -> object Boulder
'c' -> object Crater
'h' -> object Home
'm' -> martian
_ -> error "Can't get here in parser!"
object :: (Circle -> a) -> Parser a
object cons = do x <- double
y <- double
r <- double
spaces
return $ cons (Circle (x:.y) r)
martian :: Parser Object
martian = do x <- double
y <- double
dir <- double
speed <- double
spaces
return $ Martian (Circle (x:.y) 0.4) dir speed
In the above code, you can read that the parser for telemetry messages is a sequence of different parsers: a timestamp parser, a acceleration state parser, a turning state parser, ..., a parser for lists of obstacles. And a parser for list of obstacles is just "many" parsers for one obstacle. This is indeed as easy as being able to describe a grammar for that mini-language.
"k" should better be 1
But I didn't say that describing a grammar was easy! For instance, at first, we made a mistake in the way we managed spaces. We had almost each parser trying to consume spaces before doing its job. This generated unnecessary ambiguity in the parsing.
For example, let's say that you want to be able to parse this kind of message:
" O T O O T"
(honestly, I don't know why you would like to do something like that but you're the boss here)
You'd better combine those 3 parsers:
- spaces -> will consume any number of spaces
- oneParser = char 'O' >>= spaces -> will consume 'O' followed by any number of spaces
- twoParser = char 'T' >>= spaces -> will consume 'T' followed by any number of spaces
Actually, parser combinators in Haskell are also capable of backtracking and "unparsing" results if the chosen parser was not appropriate (trying things with the the "try" parser combinator). But why add more complexity when things can be described more directly?
Of course, there is a formal theory behind the example above: LL(k) grammars, where LL(1) is a type of grammar only needing to look at the next character to know which parser to use. But there's nothing like a simple "real-life" example to be able to grasp it.
Compare and contrast
Now, honestly, most other contest solutions were not using Parser combinators and the aren't soooo ugly (in Dylan for example). I think they're just less readable and more error-prone because of index manipulation.
For fun and comparison, I also developed a similar parser in Scala:
def telemetryMessage = for { timestamp <- intNumber
accState <- accelState
turnState <- turningState
vehicleX <- doubleNumber
vehicleY <- doubleNumber
vehicleDir <- doubleNumber
vehicleSpeed <- doubleNumber
obstacles <- objects }
yield Telemetry(timestamp, accState(), turnState(),
vehicleX, vehicleY, vehicleDir,
vehicleSpeed, obstacles)
def accelState = (
"a" ^^^ Accelerating
| "b" ^^^ Braking
| "-" ^^^ Rolling
)
def turningState = (
"l" ^^^ SoftLeft
| "L" ^^^ HardLeft
| "-" ^^^ Straight
| "r" ^^^ SoftRight
| "R" ^^^ HardRight
)
def objects = rep(objectParser)
def objectParser: Parser[Object] = (
"b" ~ spaces ~> circleParser ^^ (new Boulder(_))
| "c" ~ spaces ~> circleParser ^^ (new Crater(_))
| "h" ~ spaces ~> circleParser ^^ (new Crater(_))
| "m" ~ spaces ~> martianParser
)
def circleParser = for { x <- doubleNumber
y <- doubleNumber
r <- doubleNumber }
yield new Circle(x, y, r)
def martianParser = for { circle <- circleParser
dir <- doubleNumber
speed <- doubleNumber }
yield new Martian(circle, dir, speed)
The interesting things in the Scala implementation, although it could seem a bit ugly at first sight, are:
- the use of a sequence operator "~>" getting rid of what has already been parsed. Indeed, when we have parsed "b " we know we need to create a Boulder object. The 'b' character and the space are then not usefull anymore
- the use of functions to transform the parsed string into a meaningful object, like "l" ^^^ SoftLeft, creating a new SoftLeft object (it's a case class) when parsing "l"
- implicit definitions. In the example above I don't need to declare my parser for 'l' as letter("l"). Combining "l" with another parser automatically creates a parser for the character 'l'
One of the reasons for the ubiquitous adoption (and hate!) of XML is, I think, the general availability of parsers and binding libraries. This is why this often becomes the easiest option for configuration files (is it changing with YAML and JSON?).
It doesn't have to be that way. I hope that parser combinators can now bring another legitimate option when considering the implementation of an external mini-language for configuration or protocol. They're simple to use and readily available on your Java platform through Scala. Try them at home!
6 comments:
just curious about "double" and "int" parsers you use; could you provide reference or implementation plz.
p.s. Scala implementation is shorter, what a piece of art!
You're right Sanny, I left them out but here they are:
def intNumber: Parser[Int] = wholeNumber <~ spaces ^^ (java.lang.Integer.parseInt(_.mkString))
def doubleNumber: Parser[Double] = floatingPointNumber <~ spaces ^^ (java.lang.Double.parseDouble(_))
And the "spaces" parser too:
def spaces = rep(whiteSpace)
Personally, I think that parser combinators are probably the smartest innovation in the entire world of functional programming. It is interesting that the Haskell syntax is quite a bit cleaner (no need for tildes), but the Scala implementation is shorter. It could just be language familiarity, but I like to think that it's more a factor of Scala's hybrid power. :-)
Thanks, Eric. But I was asking about haskell's "double" and "int" parsers ;-).
Apart from "integer haskell" (which is too hi-level), I could not find any standard ones.
Regards,
This is how we did it in Haskell for the int and double parsers:
int :: Parser Int
int = do digits <- many digit
spaces
return $ read digits
double :: Parser Double
double = do digits <- many doubleChars
spaces
return $ read digits
where doubleChars = oneOf "-.0123456789"
You can notice that the double parser is far from perfect since it will try to parse strings like "1-23.45-6." before failing on "read".
Daniel,
I also love Haskell's syntax. It is truly minimal *and* powerful.
The funny thing is that I don't like the infix notation with backquotes, which you don't need in Scala. On the other hand, in Haskell, you can write "function parameter" which you can't in Scala.
That would be very useful to allow this in some embedded Scala DSLs. But I guess this is not possible to have both notations available in a language with functions as first class objects.
Regarding the Scala implementation size, it seems that the Haskell implementation could also be shortened using Control.Applicative.*> . This is similar to ^^^ in Scala as far as I understand.
Similarly there should be a way to use type classes to get the same effect as implicits for letter parsers, but I haven't experimented with that yet.
Now that the contest is over, I need to find some other pet project to continue training in Haskell. Err, no. Actually I mostly need time!
Post a Comment