23 November 2017

If at first it doesn't succeed...

If at first it doesn’t succeed, try something else

Today we start from a very common programmer need and we end up checking laws! With some considerations about software modularity.

I am currently implementing a command-line application in Haskell. This application needs a OAuthToken to access some webservice. There are 2 ways to get a token:

  • with an environment variable: OAUTH_TOKEN
  • with a call to another command-line tool, ztoken, which gives you a token based on your credentials

This translates to the following Haskell functions

getTokenFromEnvironment :: ExceptT GetTokenError IO OAuthToken

getTokenFromCommandLine :: ExceptT GetTokenError IO OAuthToken

Here I am using:

  • IO as my base monad for interacting with the external world

  • ExceptT GetTokenError as a way to declare an error if I cannot get a token because either I don’t have an OAUTH_TOKEN variable in my environment or ztoken does not accept my credentials. I leave other issues, like network access problems to exceptions in the IO monad

data GetTokenError =
  CommandLineError String |
  EnvironmentError String

What I really want now is a way to combine those 2 calls into one

getToken :: ExceptT GetTokenError IO OAuthToken

The first thing which came to my mind was the Alternative type class and in particular the <|> operator:

(<|>) :: f a -> f a -> f a

This typeclass has an instance for ExceptT e provided that there is a Monoid e, meaning that I can accumulate errors. Actually what I just need is the Alt typeclass which just requires a Semigroup e. For this I need to adapt my error type a little bit:

data GetTokenError =
  CommandLineError String |
  EnvironmentError String |
  RepeatedErrors GetTokenError GetTokenError
  deriving (Eq, Show)

The RepeatedErrors case gives us the possibility to accumulate errors in the case of repeated failed calls to retrieve an OAuthToken. You could argue that it is not particularly well modelled because RepeatedErrors stores errors more like a tree rather than a list. But let’s leave it at that for now.

The Semigroup instance for GetTokenError looks like this:

instance Semigroup GetTokenError where
  e1 <> e2 = RepeatedErrors e1 e2

I can finally define:

getToken :: ExceptT GetTokenError IO OAuthToken
getToken =
  getTokenFromEnvironment <|>

This is really nice because this is exactly what I want to express!

  • get the token from the environment
  • if that doesn’t work get it from the command line

More abstraction on the way

This looks all good but something is annoying. My functions are using ExceptT GetTokenError IO which is a very concrete monad stack. I am going to make further calls to other monad stacks to make HTTP calls and I might have to lift all over the place to align all the different stacks. This is even more annoying if I create a small library for getting tokens because I impose my stack choices on all clients of the API.

There is a way out of this, the monad transformers library (mtl). In the mtl there are all sorts of typeclasses abstracting features provided by monad transformers. One of them is MonadError:

class Monad m => MonadError e m | m -> e where
  throwError :: e -> m a
  catchError :: m a -> (e -> m a) -> m a

If a Monad has a MonadError instance then you can catch and throw errors. Nice. However there is a catch (no pun intended :-)). The | m -> e part of MonadError means that the Monad m you are going to eventually select to support the MonadError functionality is dependent on the error type e.

Concretely this means that this type signature propagates to the top of the application:

getToken :: (MonadError GetTokenError m, MonadIO m) => m OAuthToken

Indeed, if I mix other calls to getToken, for example

getPartitions :: (MonadError GetTokenError m, MonadIO m) => m [Partition]
getPartitions =
  do token <- getToken
     callService token partitionsRequest

Then the MonadError GetTokenError m constraints stays because m is tied to it forever. And if callService declares its own error type, I won’t be able to mix both getToken and callService calls:

callService :: (MonadError HttpError m, MonadIO m) :: OauthToken -> Request a -> a

Maybe one solution would be to define getPartitions as:

getPartitions :: (MonadError (GetTokenError Either ServiceError) m, MonadIO m) => m [Partition]

In a way we are back to the previous problem. We now have a “concrete stack” of errors where we just want “a” structure capable of holding both GetTokenError and HttpError.

Lenses to the rescue

A partial solution to the problem is given by lenses and is wonderfully explained by Georges Wilson in “next level mtl with classy optics” (I also recommend this gist by Nick Partridge):

-- create prisms for GetTokenError
makeClassyPrisms ''GetTokenError

-- now e is anything having a Prism allowing us to extract or inject a GetTokenError
getToken :: (MonadError e m, MonadIO m, AsGetTokenError e) => m OAuthToken

So it becomes kind of easier to mix calls having different error types:

getPartitions :: (MonadError e m, AsGetTokenError e, AsHttpError e, MonadIO m) => m [Partition]

This is probably a better way to mix different error types with MonadError. We don’t get rid of the functional constraint though. The error types still “bubble-up” to the top and the method used to do the authentication is exposed to the clients of getPartitions. So if we switch the authentication mechanism and get different error types we will have to change all the functions calling getPartitions.

I think we need to really take care of this kind of situation because it makes software a lot harder to evolve when a small change has ripple effects across all the software layers.

Errors translation / encapsulation

What we can do is to define a new error type subsuming GetTokenError and HttpError:

data ServiceError =
  AuthenticationError GetTokenError |
  CallError HttpError

makeClassy ''ServiceError

instance AsGetTokenError ServiceError
instance AsHttpError ServiceError

And we need a bit of boilerplate to do the translation between getToken and callService. To be able to “liberate” those 2 functions from their MonadError e m, MonadIO m constraints we can run them with the minimum stack which satisfies these constraints, this is ExceptT <error type> IO a:

runOAuthToken :: (MonadError e m, AsCliError e, MonadIO m) => ExceptT GetTokenError IO a -> m a
runOAuthToken = runExceptTIO (review _AuthenticationError)

runCallService :: (MonadError e m, AsCliError e, MonadIO m) => ExceptT HttpError IO a -> m a
runCallService = runExceptTIO (review _HttpError)

runExceptTIO :: (MonadError e m, MonadIO m) => (f -> e) -> ExceptT f IO a -> m a
runExceptTIO mapError ioa =
  do valueOrError <- liftIO (runExceptT ioa)
     fromEitherM (throwError . mapError) valueOrError

(fromEitherM comes from the from-sum package)

Then we can make both calls and “unify” them under new constraints:

getPartitions :: (MonadError e m, AsServiceError e, MonadIO m) => m [Partition]
getPartitions =
  do token <- runToken getToken
     runService (callService token partitionsRequest)


Unfortunately being more abstract with getToken breaks the use of <|> which was so convenient. One option for fixing it is to define a similar operator for mtl classes:

(<|>) :: (MonadError e m, Semigroup e) => m a -> m a -> m a
(<|>) ma1 ma2 =
  catchError ma1 (\e1 ->
    catchError ma2 (\e2 ->
      throwError (e1 <> e2)))

But this still doesn’t work given the way we have defined our errors as prisms over a general error type! We need to define a extension of <|> which uses the appropriate Prism to aggregate errors:

(<|?>) :: (MonadError e m, Semigroup o) => Prism' e o -> m a -> m a -> m a
(<|?>) p ma1 ma2 = catchError ma1 (\e1 ->
  case e1 ^? p of
    Nothing -> throwError e1
    Just o1 ->
      catchError ma2 (\e2 ->
        case e2 ^? p of
          Nothing -> throwError e1
          Just o2 -> throwError (review p (o1 <> o2))))

And finally

getToken :: (MonadError e m, MonadIO m, AsGetTokenError e) => m OAuthToken
getToken =
  (<|?>) _GetTokenError

This is not syntactically as nice as before but we can improve that:

infix 4 <!>
infix 3 <?>

data Alternating a = Alternating a a

(<!>) :: MonadError e m => m a -> m a -> Alternating (m a)
(<!>) = Alternating

(<?>) :: (MonadError e m, Semigroup o) => Prism' e o -> Alternating (m a) -> m a
(<?>) = error "left to the reader"

getToken =
  _GetTokenError <?>
  getTokenFromEnvironment <|>

What about laws?

This is not the happy end of the story. The Alternative type class specifies that <|> must be an associative operation. This means that we probably need to prove that our <|> and its evil twin <|?> are associative operators. What does that mean for instances of MonadError e m? By the way what is even a MonadError e m?

I was pretty shocked to realize that MonadError doesn’t come up with any laws in Haskell! We need to turn to Purescript for this:

-- | - Left zero: `throwError e >>= f = throwError e`
-- | - Catch:     `catchError (throwError e) f = f e`
-- | - Pure:      `catchError (pure a) f = pure a`

And we probably need an additional law for our <|> operator:

-- | - Catch associativity: `catchError (catchError ma f1) f2 =
--                           catchError ma (\e -> catchError (f1 e) f2)`

Since many instances already exist for MonadError the best we can do is to check if those laws hold for these instances.

I was at first very enthusiastic about that. There are 2 libraries in Haskell which can be used to retroactively find properties for a give API: quickspec and speculate. However I was unable to get any of them to compile / find laws. Since I had been shaving too many yaks on this application I decided to stop there. But I hope I will get some time to get those libraries to find laws.

Anyway, the best I could do was to check the laws for all the instances of MonadError: EitherT, ListT, MaybeT, ExceptT and so on. I verified a bunch of them using QuickCheck and they seem to hold. Actually if that wasn’t the case I would be surprised and would probably question the implementation. I also proved that the laws were holding for the Either instance of MonadError. It goes like this, for the associativity of catch:

-- this is what we want to prove
catchError (catchError m k2) k1 == catchError m (\\e -> catchError (k2 e) k1)

--> case 1: m = Left t
catchError (catchError (Left t) k2) k1 == catchError (Left t) (\\e -> catchError (k2 e) k1)

-- we apply on each side the definition of catchError for Either when the (m a)
-- value is a Left value which is to call the handler, CQFD
catchError (k2 t) k1 = catchError (k2 t) k1

--> case 2: m = Right a
catchError (catchError (Right a) k2) k1 == catchError (Right a) (\\e -> catchError (k2 e) k1)

-- we apply on each side the definition of catchError for Either when the (m a) value
-- is a Right value which is to leave it as it is, CQFD
catchError (Right a) k1 = Right a

-- we do it once more on the left side of the equation
Right a = Right a

I suspect that it is possible to prove the laws for other MonadError instances, StateT would probably be interesting.

What have we learned?

There are plenty of lessons for me in this exercise:

  • Haskell is full of interesting generalization, <|> is one of them and worth spotting in production code

  • Haskell libraries are not necessarily complete and can probably benefit from contributions

  • The modularity story of Haskell is not as obvious as what you might think. I had to think hard to find a satisfying solution and one huge problem is still unsolved. How can I test my full application code with a dummy authentication, which would always succeed? This will be the subject of a following post!

No comments: