23 November 2017

If at first it doesn't succeed...

If at first it doesn’t succeed, try something else

Today we start from a very common programmer need and we end up checking laws! With some considerations about software modularity.

I am currently implementing a command-line application in Haskell. This application needs a OAuthToken to access some webservice. There are 2 ways to get a token:

  • with an environment variable: OAUTH_TOKEN
  • with a call to another command-line tool, ztoken, which gives you a token based on your credentials

This translates to the following Haskell functions

getTokenFromEnvironment :: ExceptT GetTokenError IO OAuthToken

getTokenFromCommandLine :: ExceptT GetTokenError IO OAuthToken

Here I am using:

  • IO as my base monad for interacting with the external world

  • ExceptT GetTokenError as a way to declare an error if I cannot get a token because either I don’t have an OAUTH_TOKEN variable in my environment or ztoken does not accept my credentials. I leave other issues, like network access problems to exceptions in the IO monad

data GetTokenError =
  CommandLineError String |
  EnvironmentError String

What I really want now is a way to combine those 2 calls into one

getToken :: ExceptT GetTokenError IO OAuthToken

The first thing which came to my mind was the Alternative type class and in particular the <|> operator:

(<|>) :: f a -> f a -> f a

This typeclass has an instance for ExceptT e provided that there is a Monoid e, meaning that I can accumulate errors. Actually what I just need is the Alt typeclass which just requires a Semigroup e. For this I need to adapt my error type a little bit:

data GetTokenError =
  CommandLineError String |
  EnvironmentError String |
  RepeatedErrors GetTokenError GetTokenError
  deriving (Eq, Show)

The RepeatedErrors case gives us the possibility to accumulate errors in the case of repeated failed calls to retrieve an OAuthToken. You could argue that it is not particularly well modelled because RepeatedErrors stores errors more like a tree rather than a list. But let’s leave it at that for now.

The Semigroup instance for GetTokenError looks like this:

instance Semigroup GetTokenError where
  e1 <> e2 = RepeatedErrors e1 e2

I can finally define:

getToken :: ExceptT GetTokenError IO OAuthToken
getToken =
  getTokenFromEnvironment <|>

This is really nice because this is exactly what I want to express!

  • get the token from the environment
  • if that doesn’t work get it from the command line

More abstraction on the way

This looks all good but something is annoying. My functions are using ExceptT GetTokenError IO which is a very concrete monad stack. I am going to make further calls to other monad stacks to make HTTP calls and I might have to lift all over the place to align all the different stacks. This is even more annoying if I create a small library for getting tokens because I impose my stack choices on all clients of the API.

There is a way out of this, the monad transformers library (mtl). In the mtl there are all sorts of typeclasses abstracting features provided by monad transformers. One of them is MonadError:

class Monad m => MonadError e m | m -> e where
  throwError :: e -> m a
  catchError :: m a -> (e -> m a) -> m a

If a Monad has a MonadError instance then you can catch and throw errors. Nice. However there is a catch (no pun intended :-)). The | m -> e part of MonadError means that the Monad m you are going to eventually select to support the MonadError functionality is dependent on the error type e.

Concretely this means that this type signature propagates to the top of the application:

getToken :: (MonadError GetTokenError m, MonadIO m) => m OAuthToken

Indeed, if I mix other calls to getToken, for example

getPartitions :: (MonadError GetTokenError m, MonadIO m) => m [Partition]
getPartitions =
  do token <- getToken
     callService token partitionsRequest

Then the MonadError GetTokenError m constraints stays because m is tied to it forever. And if callService declares its own error type, I won’t be able to mix both getToken and callService calls:

callService :: (MonadError HttpError m, MonadIO m) :: OauthToken -> Request a -> a

Maybe one solution would be to define getPartitions as:

getPartitions :: (MonadError (GetTokenError Either ServiceError) m, MonadIO m) => m [Partition]

In a way we are back to the previous problem. We now have a “concrete stack” of errors where we just want “a” structure capable of holding both GetTokenError and HttpError.

Lenses to the rescue

A partial solution to the problem is given by lenses and is wonderfully explained by Georges Wilson in “next level mtl with classy optics” (I also recommend this gist by Nick Partridge):

-- create prisms for GetTokenError
makeClassyPrisms ''GetTokenError

-- now e is anything having a Prism allowing us to extract or inject a GetTokenError
getToken :: (MonadError e m, MonadIO m, AsGetTokenError e) => m OAuthToken

So it becomes kind of easier to mix calls having different error types:

getPartitions :: (MonadError e m, AsGetTokenError e, AsHttpError e, MonadIO m) => m [Partition]

This is probably a better way to mix different error types with MonadError. We don’t get rid of the functional constraint though. The error types still “bubble-up” to the top and the method used to do the authentication is exposed to the clients of getPartitions. So if we switch the authentication mechanism and get different error types we will have to change all the functions calling getPartitions.

I think we need to really take care of this kind of situation because it makes software a lot harder to evolve when a small change has ripple effects across all the software layers.

Errors translation / encapsulation

What we can do is to define a new error type subsuming GetTokenError and HttpError:

data ServiceError =
  AuthenticationError GetTokenError |
  CallError HttpError

makeClassy ''ServiceError

instance AsGetTokenError ServiceError
instance AsHttpError ServiceError

And we need a bit of boilerplate to do the translation between getToken and callService. To be able to “liberate” those 2 functions from their MonadError e m, MonadIO m constraints we can run them with the minimum stack which satisfies these constraints, this is ExceptT <error type> IO a:

runOAuthToken :: (MonadError e m, AsCliError e, MonadIO m) => ExceptT GetTokenError IO a -> m a
runOAuthToken = runExceptTIO (review _AuthenticationError)

runCallService :: (MonadError e m, AsCliError e, MonadIO m) => ExceptT HttpError IO a -> m a
runCallService = runExceptTIO (review _HttpError)

runExceptTIO :: (MonadError e m, MonadIO m) => (f -> e) -> ExceptT f IO a -> m a
runExceptTIO mapError ioa =
  do valueOrError <- liftIO (runExceptT ioa)
     fromEitherM (throwError . mapError) valueOrError

(fromEitherM comes from the from-sum package)

Then we can make both calls and “unify” them under new constraints:

getPartitions :: (MonadError e m, AsServiceError e, MonadIO m) => m [Partition]
getPartitions =
  do token <- runToken getToken
     runService (callService token partitionsRequest)


Unfortunately being more abstract with getToken breaks the use of <|> which was so convenient. One option for fixing it is to define a similar operator for mtl classes:

(<|>) :: (MonadError e m, Semigroup e) => m a -> m a -> m a
(<|>) ma1 ma2 =
  catchError ma1 (\e1 ->
    catchError ma2 (\e2 ->
      throwError (e1 <> e2)))

But this still doesn’t work given the way we have defined our errors as prisms over a general error type! We need to define a extension of <|> which uses the appropriate Prism to aggregate errors:

(<|?>) :: (MonadError e m, Semigroup o) => Prism' e o -> m a -> m a -> m a
(<|?>) p ma1 ma2 = catchError ma1 (\e1 ->
  case e1 ^? p of
    Nothing -> throwError e1
    Just o1 ->
      catchError ma2 (\e2 ->
        case e2 ^? p of
          Nothing -> throwError e1
          Just o2 -> throwError (review p (o1 <> o2))))

And finally

getToken :: (MonadError e m, MonadIO m, AsGetTokenError e) => m OAuthToken
getToken =
  (<|?>) _GetTokenError

This is not syntactically as nice as before but we can improve that:

infix 4 <!>
infix 3 <?>

data Alternating a = Alternating a a

(<!>) :: MonadError e m => m a -> m a -> Alternating (m a)
(<!>) = Alternating

(<?>) :: (MonadError e m, Semigroup o) => Prism' e o -> Alternating (m a) -> m a
(<?>) = error "left to the reader"

getToken =
  _GetTokenError <?>
  getTokenFromEnvironment <|>

What about laws?

This is not the happy end of the story. The Alternative type class specifies that <|> must be an associative operation. This means that we probably need to prove that our <|> and its evil twin <|?> are associative operators. What does that mean for instances of MonadError e m? By the way what is even a MonadError e m?

I was pretty shocked to realize that MonadError doesn’t come up with any laws in Haskell! We need to turn to Purescript for this:

-- | - Left zero: `throwError e >>= f = throwError e`
-- | - Catch:     `catchError (throwError e) f = f e`
-- | - Pure:      `catchError (pure a) f = pure a`

And we probably need an additional law for our <|> operator:

-- | - Catch associativity: `catchError (catchError ma f1) f2 =
--                           catchError ma (\e -> catchError (f1 e) f2)`

Since many instances already exist for MonadError the best we can do is to check if those laws hold for these instances.

I was at first very enthusiastic about that. There are 2 libraries in Haskell which can be used to retroactively find properties for a give API: quickspec and speculate. However I was unable to get any of them to compile / find laws. Since I had been shaving too many yaks on this application I decided to stop there. But I hope I will get some time to get those libraries to find laws.

Anyway, the best I could do was to check the laws for all the instances of MonadError: EitherT, ListT, MaybeT, ExceptT and so on. I verified a bunch of them using QuickCheck and they seem to hold. Actually if that wasn’t the case I would be surprised and would probably question the implementation. I also proved that the laws were holding for the Either instance of MonadError. It goes like this, for the associativity of catch:

-- this is what we want to prove
catchError (catchError m k2) k1 == catchError m (\\e -> catchError (k2 e) k1)

--> case 1: m = Left t
catchError (catchError (Left t) k2) k1 == catchError (Left t) (\\e -> catchError (k2 e) k1)

-- we apply on each side the definition of catchError for Either when the (m a)
-- value is a Left value which is to call the handler, CQFD
catchError (k2 t) k1 = catchError (k2 t) k1

--> case 2: m = Right a
catchError (catchError (Right a) k2) k1 == catchError (Right a) (\\e -> catchError (k2 e) k1)

-- we apply on each side the definition of catchError for Either when the (m a) value
-- is a Right value which is to leave it as it is, CQFD
catchError (Right a) k1 = Right a

-- we do it once more on the left side of the equation
Right a = Right a

I suspect that it is possible to prove the laws for other MonadError instances, StateT would probably be interesting.

What have we learned?

There are plenty of lessons for me in this exercise:

  • Haskell is full of interesting generalization, <|> is one of them and worth spotting in production code

  • Haskell libraries are not necessarily complete and can probably benefit from contributions

  • The modularity story of Haskell is not as obvious as what you might think. I had to think hard to find a satisfying solution and one huge problem is still unsolved. How can I test my full application code with a dummy authentication, which would always succeed? This will be the subject of a following post!

10 September 2017

ICFP 2017

I already blogged last year on ICFP, the awesome conference on Functional Programming, to share a neat trick.

This year I would like to do a small brain dump on some of the sessions which I really liked.

Workshop on higher-programming with effects

Programming a web server with effects

This is one of several talks showing the benefits of programming with a language with natively supports the description of effects in the type system: Koka. Not only Daan Leijen is a great presenter whose enthusiasm is contagious but with Koka it feels entirely natural to write asynchronous code with interleaving, cancellation, resource-cleanup and so on.

The also hints at something which got me really interested, the yield operator to create iterators can be seen as an effect. Coupled to the async support above this makes me think that a whole streaming library could be created as a set of effects!

Workshop on type-driven development

Type-directed diffing of structured data

A fantastic idea: can be do better than line-diffing to diff code and do AST diffs? The presenter introduced a language for representing AST patches (not as easy as it seems) then different possibilities for algorithms doing the diff (clearly a hard problem if you want to find the minimum diff).


Super 8 languages for making movies

I’m not a big fan of dynamic languages but the power of Racket is impressive. The presenter shows how she created a “Tower of DSLs” to create a video-editing tool. Complete with parser, GUI, documentation, integration with video libraries in a lot less lines of code than anything else I know.

Faster coroutine pipelines

Coroutine pipelines are the functional way to create streaming libraries. However they can be made faster. How? But using an encoding based on continuations rather than the “direct” encoding. This trick seems to be pervasive. For example, for algebraic effects, ScalaEffekt uses such an encoding and gets much better performances than eff.

A unified approach for solving seven programming problems

Racket is at it again but with a secret weapon, miniKanren. If you view a computation as a “logical” relationship between a program and some outputs, it is very reasonable to ask your solver to find the programs which give you the right outputs. And find, for example, Quines which are programs which output themselves!

Generic functional parallel algorithms: parallel scan

Scans, computing the prefix sums of a list of ints, are an important ingredient in some algorithms, like regular expression search. If they can be performed in parallel it’s even better! This talk shows that a scan algorithm can be defined for generic datastructures, just defined by sums, products and functor composition. Then by adjusting the decomposition you get different amounts of parallelism and work to compose results.

Effect-driven quickchecking of compilers

Very entertaining presentation showing that it is entirely possible to generate well-type terms to test a compiler.

Compiling to categories

My favourite talk this year. What if you consider lambda and apply in the lambda calculus as 2 operations which could be interpreted differently? Conal Elliott developed a GHC plugin to transform Haskell expressions into objects in a category where function composition corresponds to the composition of morphisms in a category. What does that give us? Instead of applying functions and getting a result we can generate circuits computing them, differentiate them or do interval computations where the results are intervals instead of just being values.

Staged Generic Programming

Generic programming is cool. Write a few lines of code and let the compiler figure the right data structure and / or program. Unfortunately the performances might be very bad compared to hand-written code. With staging we can let the compiler be smarter and generate “unrolled” code which will perform as well as the hand-written one.

I was also glad to get to talk to Jeremy Yallop, the presenter, who is a very nice guy, now working at Docker on unikernels, something to keep an eye on!

Whip: higher-order contracts for modern services

This is looks more like a CUFP talk, with real applications in the industry :-). The idea is to install some “decorators” on webservices to check where contracts are being broken. They have a nice website and I am tempted to try this at work.

Haskell Symposium

Algebraic graphs with Class

A simple but very powerful to describe graphs “algebraically”. Some atoms: the empty graph, the singleton graph and 2 operations: overlay (sort of addition) and connect (sort of multiplication). Based on that you can represent any graph and encode any algorithm against that representation, which can then optimised for different purposes.

Quickspec / Speculate

Those 2 tools will generate the equations that you API is supposed to satisfy. For example, given [], ++ and reverse they should be able to generate reverse ([] ++ xs) == reverse xs. But even more, they can now generate implications and inequalities!

Composable networks and remote monads

The remote monad explores different ways of “grouping” operations when addressing remote services. After tried to do something similar for eff I appreciate a lot more the subject :-).

Testing with crowbar

Impressive example about finding bugs in a unicode library in 10 minutes which QuickCheck can not find in 1 week. The secret? Fuzzing and code coverage. Generate random sequence, turn them to test cases and mutate the sequences until you find some “interesting” part of the code. Those are likely to contain bugs.

ML Family workshop

State machines all the way down

My periodic reminder that I need to find some time to use Idris. Even more so after Edwin Brady said that the state machine programming with the type system tracking before/after state subsumes the effects library! He also told me that he was implementing the Idris compiler in Idris and this was a good example for him about “when not to go too far with types”.

Effects without monads: non determinism

Oleg Kiselyov is back, implementing non-determinism with TTFI (typed tagless final interpreters). This example is particularly interesting because he shows that monads are not necessary to interpret this mini-language. Not only that but that but they cannot even be used if your translation is about generating code. Because you cannot write code which will write code!

Bioinformatics: the typed tagless final way

A concrete example of people using TTFI in real life for large and extensible DSLs.


Bonsai: DSL for serverless decisioning

This DSL is pretty simple, it is just decision trees (if / then / else) but compiled for maximum throughput for the ad industry.

Gens N’ Roses: appetite for reduction

Jacob Stanley talk on Hedgehog where shrinks come for free and failures are beautiful. A “must-use” for Haskell users.

Haskell implementors workshop

Syntactic musings

This was a lightning talk but it made me laugh. Those are proposals to improve haskell syntax with almost zero possibilities to be adopted :-):

How to write bracket without parenthesis:

Instead of

  (acquire some resources)
  (do your stuff which might be
   pretty long)
  (and clean up)

Go (yes, bullet points!)

  . acquire some resources
  . do your stuff which might be
     pretty long
  . and clean up

Then set a value for a bunch of functions

context fixed (value) where
  add a = value + a
  mul a = value * a

to avoid the repetition of value and to avoid accidentally messing with it. This screams like “module/functor envy” to me.

The last one is crazy but brilliant. Get away with grouping with parentheses by removing whitespace where you want to indicate higher precedence

a+b * c+d

Why not :-)?

An experiment in fragment-based code distribution

I thought this idea could really not roll but now I am intrigued. Following Joe Armstrong’s “Why do we need modules at all?” Philip Schuster implemented a build system for haskell where each program is decomposed into “slices” containing just one function and its dependencies. Everything identified by numbers. Advantages: just compile and package what you need, never care about version numbers. But the job of curating all of that to get a coherent snapshot, oh boy! As a data point, running anything on Scotty requires 12.000 slices.


A very intense week as in the previous editions I have been to. I don’t know yet what will stick with me and influence my future work but there’s plenty to think about. I also enjoyed very much the discussions I had with Jonathan and Philipp, the authors of scala-effekt.

20 August 2017

Specs2 4.x

It has been now almost 3 years since the last major version of specs2. But why would a new major version of a testing library be even needed? Because the world doesn’t stop revolving!

In particular Scala can now be compiled to JavaScript and even natively. It is then only natural that specs2 users want to use specs2 on their Scala.js projects:

How hard can this be? First of all specs2 3.x relies on scalaz-stream which doesn’t have a Scala.js build, and more largely on scalaz (which has a Scala.js build).

This is where an issue becomes an opportunity! If I need to find an alternative to scalaz-stream why not remove scalaz altogether? Indeed it is very inconvenient for a test library to rely on third-party libraries which might conflict with users libraries. Because of this conflict I have to publish several versions of specs2 for different versions of scalaz (7.0.x, 7.1.x, 7.2.x and so on). This is pretty time-consuming, not only because of having to wait for more than 30 minutes for all the jars to be published but also because I have to adapt to some small differences between each scalaz version.

Removing scalaz and scalaz-stream is easier said than done:

  1. specs2 relies heavily on functional programming so I need the usual FP suspects: Functor, Applicative, Monad and so on

  2. specs2 interacts with the file system, writes out to the console, executes specifications concurrently so it needs a way to control effects

  3. specs2 is structured around the idea of a “stream of specification fragments” so it needs a streaming abstraction

All of this before I can even consider moving to a Scala.js build!

specs2 own FP / streaming infrastructure

Fortunately I have developed on the side many of the pieces needed above:

  1. eff is a library to handle effects, including concurrency and resources management

  2. producer is a simple streaming library designed to work with any Monad, including the Eff monad

  3. origami implements a notion of composable “folds” which can be applied to any stream, which is a pattern I (re)-discovered on earlier versions of specs2: reporting the results of an executed specification is just “folding” the specification, a bit like using foldLeft on a List

A copy of those libraries is now included in the specs2-common module.

The only thing missing was the FP abstractions. It turns out that it is not too hard to implement the whole menagerie:

  • Functor, Applicative, Monad, Traverse, Foldable, Monoid. Those typeclasses are very tied together
  • Tree and TreeLoc for manipulating trees

I am greatly indebted to the Scalaz and cats projects for this code, which I reduced to the only parts I needed for specs2. Those abstractions and datatypes are now available in the specs2-fp module (not intended for external use).

Execution model

At the heart of specs2 is a Fragment, which is a Description and an Execution yielding a Result. The Description holds an informal description of the specified system and the Execution runs some Scala code to verify that behaviour.

Using Scala.js imposes a big change is specs2. We can never await to the get the result of executing a Fragment. Also since scalaz.concurrent.Task is out of the equation, in specs2 4.x an Execution is implemented using a scala.concurrent.Future, like this:

case class Execution(
 run:       Option[Env => Future[() => Result]],
 executing: Option[Throwable Either Future[Result]])

This is a lot more complicated than just Future[Result]. Why is that?

run takes an action to execute () => Result, possibly concurrently Future[() => Result] and which might depend on the environment Env => Future[() => Result]. You might wonder why we don’t simply use Env => Future[Result]. This is to allow the user to add some behaviour around the result, for example with the AroundEach trait and the def around[R : AsResult](r: =>R): Result method.

We also want to “stream” results, which are executed concurrently and display them as soon as they are available. This is done by having executing holding either a Throwable if we could not even start the execution or the Result being currently computed.

Note that we can never call await in a Scala.js program so when we are executing a Fragment the best we can get is the equivalent of Future[Result] (in reality an Action[Result] which is a type possibly having more effects than just concurrency).

This triggers major changes in the implementation of specs2 and there are rippling effects up to the Runners used to run a specification. However most of the users of specs2 should not be concerned with these changes as the “user” DSL for specs2 isn’t modified. However if you are on the JVM and really need to await to get the result of a fragment execution you can do so by using one of the methods in the org.specs2.control.ExecuteActions object. For example result.runAction(executionEnv) will return an Error Either Result. Running the same method on Scala.js will throw an exception.

Scala.js modules

specs2 runs on Scala.js” is ever going to be a partial truth. There are many limitations related to Scala.js and this means that, while specs2-core can be used on any Scala.js project, there are some specs2 features and modules which cannot be used with Scala.js. Here is a list (up to my current knowledge):

  1. the isolate argument to run each example in its own copy of the specification cannot be used (because this uses reflection to instantiate classes)

  2. no custom Selector/Executor/Printer/Notifier can be passed from the command line (for the same reason)

  3. the RandomSequentialExecution trait cannot be used because the implementation uses a TrieMap not available on Scala.js. This could be fixed in the future.

  4. running only examples which previously failed with was x on the command-line is not possible because this accesses file system

  5. all the modules where scala-xml is used don’t have a Scala.js version because there’s no scala-xml for Scala.js: specs2-form, specs2-html

  6. modules accessing the file system cannot be used: specs2-html, specs2-markdown

  7. the specs2-gwt module defines fragments based on the result of previous fragments, so it currently needs to await. This could be fixed potentially with a smart enough implementation in the future.

Breaking changes

A major version number is also the opportunity to fix some of the API flaws. I tried to limit the changes in the low-level API to the strict necessary but I also wanted to clean-up one important aspect of the high-level API.

When specs2 3.x came out I envisioned that there could be ways to create a specification from arguments passed on the command line, or from an ExecutionContext (to execute futures). With the addition of some traits like CommandLineArguments you could declare:

class MySpec extends Specification with CommandLineArguments {
  def is(args: CommandLine) = s2"""
    try this $ok


class MySpec extends Specification { def is = s2"""
  try this $ex1

  def ex1 = { implicit ec: ExecutionContext => Future(1) must be_==(1).await }

I later realized that it would be a lot more convenient to directly “inject” the command line arguments or the execution context in the specification like so:

class MySpec(args: CommandLine) extends Specification { def is = s2"""
  try this $ex1

 def ex1 = { (1 + 1) must be_==(2).when(args.contains("universe-is-sane"))}
class MySpec(implicit ec: ExecutionContext) extends Specification { def is = s2"""
  try this $ex1

  def ex1 = { Future(1) must be_==(1).await }

Now there were 2 ways to do the same thing, the second one being a lot easier than the first one. In specs2 4.x the traits supporting the first behavior org.specs2.specification.Environment, org.specs2.specification.CommandLineArguments, org.specs2.specification.ExecutionEnvironment have been removed. Same thing for the implicit conversions for functions like implicit ee: ExecutionEnv => Result to create examples.

This will demand some migration efforts for some users but the result will be more concise specifications.

Another word on injected execution environments. From specs2 3.8.9, 2 distinct execution environments are being used in specs2. One for the execution of the specification itself, the other one for the execution of the user examples (to map futures or await for results for example). This is to make sure that some incorrect user code doesn’t prevent the full specification from being executed.

With specs2 4.x we can go one step further and allow each specification to get its own dedicated execution environment and make sure that one specification execution doesn’t impact another one:

import org.specs2.specification.core.{Env, OwnExecutionEnv}

class MySpec extends(val env: Env) extends Specification with OwnExecutionEnv { def is = s2"""

The injected env is used to pass command-line argument to a copy of the user execution env, implicitly available as a member of the OwnExecutionEnv trait. That trait also takes care of shutting down the execution environment once the specification has finished executing so as not to lose resources.

Feedback is essential

In conclusion, specs2 4.x should be a smaller gap from specs2 3.x than specs2 2.x to specs2 3.x. With a lot more to offer!

  1. no more dependencies for specs2-core

  2. a Scala.js version for specs2-core

  3. better control on execution environments

The release notes can be found here.

Please, please, please ask on gitter, on the mailing-list, on twitter if you have any issue. And a big thank you for reporting any mistake, big or small, that I may have made on this release!