I have now been working with Haskell for around 3 months and I can say that it is a pretty enjoyable experience. Using Functional Programming is a lot more straightforward than with Scala and I rarely have to fight the syntax. I haven't been severely bitten by laziness yet (meaning I cannot call myself an "experienced" Haskell developer :-)) and I am even starting to see its advantages in terms of mental flow: "I don't have to care about this, if it's not evaluated it will cost me nothing".
I have also had my share of head scratching with the mtl (Monad Transformer Library) and with lenses. But this is not what I want to write about today. My big preoccupation is not so much how we can use technique X or Y to write something in the small, but rather how we can write code in the large. Already at the level of a service using a database, writing to S3 and publishing/subscribing to an event broker, the question of code organisation becomes a major one.
Organising Haskell code
In languages like Scala you have packages, traits and classes and various dependency injection libraries to give you some guidance. In Haskell you will have to read a handful of blog posts and form your own opinion. Should you use Free Monads and an "onion architecture"? Should you use the mtl, the "next level mtl with classy optics" or a bit of both? Or maybe the Service Pattern? Should we call it the Handle Pattern? There is also a library for dependency injection using Template Haskell.
With my previous Scala project I experimented with new ideas for dependency injection and we open-sourced a library called Grafter to support this approach. This library has served us well even if it could definitely be improved. In Grafter we use a cool technique called "tree rewriting" to help with the wiring and re-wiring of our application. But that library relies on some "tricks" which are not available in Haskell, like the capability to collect objects of a given type at runtime (effectively doing an isInstanceOf test) or reflection to access the properties of a given case class. Not only can we not do that in Haskell, but we don't even have objects to begin with!
So with Haskell I was forced to revisit my choices for structuring code. Maybe I should start with what I want to achieve?
My objectives
My overall goal is to be fairly productive producing code across several applications and teams, possibly sharing libraries, in an environment where objectives can shift really fast.
This means:
- be able to declare components having a public interface which is just a list of functions. "Information hiding" is not just something we learn in textbooks; it is very important for isolating code from changes and preventing everything from becoming a huge bowl of spaghetti. Also, in the kind of systems I am working on, code reuse and evolution do not happen at the function level. One day you use service 1 to get product definitions, the next you have to use service 2 and up to 10 functions need to be reimplemented. So the good old idea of a "component" still makes sense.
- have simple function definitions, mostly in IO, for easy interoperability between components. In Haskell, functions, especially the effectful ones, can easily be abstracted to return some monad m with all sorts of constraints: MonadReader, MonadBaseControl, MonadCatch, MonadFileSystem, MonadLogger and so on. At first glance this doesn't look like a major issue, but my (limited) experience tells me that this leads to (1) complex type signatures and (2) lots of type tetris when interacting with several such functions. Not fun.
- have components declare their dependencies and configuration at their declaration site. This way they are kind of "self-contained" and it should be easy to extract one of them to a new library for sharing.
- have a way to easily wire a full application from a list of all wanted components and configurations. If a new dependency is added to a component, we shouldn't have to change the wiring code. This really helps with maintaining the code. You need a new service? Just add a dependency. You want to split an existing component in 2? This is just 2 or 3 steps.
- have a way to replace specific components in an existing application. This is useful in all sorts of situations. For example you want to test how well your application performs if one endpoint responds faster, or always responds "ok". Just write replace component or something similar and you're done. When you want to test a complex piece of business logic orchestrating different services you can just mock them out, all of them, or just some of them.
My hunch is that all of this is crucial for scaling development, whether that means growing a medium-sized application or growing a team working on interrelated services.
A design space
Interfaces
Objective 1 can be satisfied either by using typeclasses or by creating a record of functions. Typeclasses can declare their requirements: I can do MonadFileSystem if I know how to do MonadIO, for example. And typeclasses can have instances, even with some rules for overriding them. Having some language support is nice! Or is it?
First of all you need to understand how typeclass resolution works and understand the difference between Overlapping and Overlappable. Then you might have to jump through some hoops when mixing together components that have different constraints. Haskell is so polymorphic that it is only when you are really executing an expression that the underlying data structure supporting the constraints "appears". For example if I runReaderT action r then the MonadReader r constraint on action gets resolved and action appears to be a ReaderT r m a. This possibility of delaying the selection of the concrete data structure is a great strength of Haskell, but it also makes the code hard to understand because you have to run the typeclass resolution algorithm in your head to understand what's going on.
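To make the point about delayed resolution concrete, here is a minimal sketch. The names greeting and runGreeting are made up for the illustration, and it assumes the mtl package is available. The action only states a MonadReader constraint; the concrete ReaderT structure shows up at the call to runReaderT.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Reader (MonadReader, ask, runReaderT)

-- Polymorphic action (hypothetical): any monad able to provide a String
-- environment can run it, no concrete data structure is chosen here
greeting :: MonadReader String m => m String
greeting = do
  name <- ask
  pure ("hello " ++ name)

-- Only here does the constraint get resolved: runReaderT forces
-- m to be ReaderT String IO
runGreeting :: IO String
runGreeting = runReaderT greeting "world"
```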
Worse, I experienced a strange bug 2 weeks ago with the nakadi-client library. After a refactoring I was making requests which were unauthenticated. How is that possible? I am using local to modify the environment of MonadNakadi, which is some sort of MonadReader, to access the Zalando event bus. And I am indeed setting an authentication token on the requests. It turns out that I had 2 layers of Reader in my stack and I was probably not modifying the right one. I eventually refactored the code to set authentication earlier in the process, without local, and things were back to normal. My inability to understand what was going on led to a bug which could not be caught by the compiler. Not cool. (The nakadi-client library is very cool though, I'm so grateful it exists :-)).
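The kind of mix-up I suspect can be reproduced in a few lines. This is only a sketch of the general problem, not the nakadi-client API: the App stack, the token strings and all function names are invented for the example, and it uses the plain transformers package.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, ask, local, runReaderT)

-- Hypothetical stack with two Reader layers which both carry a token
type App = ReaderT String (ReaderT String IO)

-- Reads the token from the outer layer
outerToken :: App String
outerToken = ask

-- Reads the token from the inner layer, as a library function might do
innerToken :: App String
innerToken = lift ask

-- 'local' only rewrites the environment of the outer layer
withToken :: String -> App a -> App a
withToken token = local (const token)

-- Both layers start with the same "anonymous" token
runApp :: App a -> IO a
runApp app = runReaderT (runReaderT app "anonymous") "anonymous"

-- The outer layer sees the new token, the inner one is left untouched
demo :: IO (String, String)
demo = runApp (withToken "secret" ((,) <$> outerToken <*> innerToken))
```

Everything typechecks, yet any code reading the inner layer still sees the old token, which is the kind of mistake the compiler cannot catch.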
When I read about the Handle Pattern it was a revelation: this is what I want, a simple collection of functions with a simple interface.
For example I might want to declare a Calculator as:
data Module =
  Module
    { add      :: Int -> Int -> IO Int
    , multiply :: Int -> Int -> IO Int
    }
When I get such a record, the implementation is completely hidden, I am protected from any evolution of the implementation. How do I even create such a component? Like anything in Haskell, with a function:
new :: Adder.Module -> Multiplier.Module -> Module
new adder multiplier = Module (addWith adder) (multiplyWith multiplier)
Besides the guilty pleasure of reusing a well-known OO keyword, this gives us a way to declare the dependencies of our component. And addWith and multiplyWith are (private) parts of the module implementation.
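Put together in a single file, the pattern could look like the sketch below. Since one file cannot hold several modules, the Adder, Multiplier and Calculator types stand in for Adder.Module, Multiplier.Module and Calculator.Module; the field names plus and times and the constAdder test double are assumptions made for the example.

```haskell
-- Single-file stand-ins for the Adder, Multiplier and Calculator modules
newtype Adder = Adder { plus :: Int -> Int -> IO Int }

newtype Multiplier = Multiplier { times :: Int -> Int -> IO Int }

data Calculator = Calculator
  { add      :: Int -> Int -> IO Int
  , multiply :: Int -> Int -> IO Int
  }

newAdder :: Adder
newAdder = Adder (\a b -> pure (a + b))

newMultiplier :: Multiplier
newMultiplier = Multiplier (\a b -> pure (a * b))

-- The constructor function lists the dependencies of the component
newCalculator :: Adder -> Multiplier -> Calculator
newCalculator adder multiplier =
  Calculator (addWith adder) (multiplyWith multiplier)

-- Implementation details: in a real module these would not be exported
addWith :: Adder -> Int -> Int -> IO Int
addWith adder a b = plus adder a b

multiplyWith :: Multiplier -> Int -> Int -> IO Int
multiplyWith multiplier a b = times multiplier a b

-- A test double: callers of Calculator cannot tell the difference in types
constAdder :: Adder
constAdder = Adder (\_ _ -> pure 0)
```

A caller only ever sees the Calculator record, so swapping newAdder for constAdder requires no change on the calling side.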
Interoperability
Each function returns IO so the interoperability is maximal, with no crazy constraints to accommodate.
There is a small issue though. A small issue which drove me mad.
We need to operate our services, not just develop them.
When something goes wrong in production it is incredibly useful to have a FlowId identifying requests coming from clients and flowing through all of our services. But if you have components returning IO values there is no way to pass this FlowId from one component to another. This necessitates a MonadReader FlowId constraint, or maybe a ReaderT FlowId IO a return type. Then we are back to more complex type signatures for something which is actually a very small concern in the scale of things. And that concern is "polluting" our whole ecosystem! The same goes if we want to use a logging library like katip: we need to add something like MonadReader KatipContext everywhere. Just for a single small concern!
I tried many different ideas to get around this issue and end up back in IO but nothing worked, because I believe that nothing can work in a proper functional programming context. If you want the callee to know about its context and its caller, you need to pass some information! So you need a form of Reader, and we decided to extend the IO type to RIO, just one more letter:
newtype RIO a = RIO { runRIO :: Env -> IO a }
where Env is defined as
data Env =
  Env
    { context   :: Maybe Context
    , namespace :: Namespace
    } deriving (Eq, Show)
This means that in addition to doing IO the functions of a component can require a bit of knowledge from the caller:
- a Context, for example to pass the FlowId
- a Namespace (a list of names), for example to describe some nested processes:
processing event { "eventId" = "123" } > getting master data > authenticating { "role" = "admin" }
Context and Namespace are essentially "stringly" data just modelling the context of the caller, with the following properties:
- contexts can be replaced, setting a context removes the previous one
- namespaces are appended, like breadcrumbs on a website
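Here is a minimal sketch of how these two properties could be implemented. The definitions of Context and Namespace are not shown in this post, so the shapes below (a list of key/value pairs and a list of names) are assumptions, as are the helper names askEnv, withContext, withName and emptyEnv.

```haskell
-- Assumed shapes: the post does not show how Context and Namespace are defined
newtype Context = Context [(String, String)] deriving (Eq, Show)

newtype Namespace = Namespace [String] deriving (Eq, Show)

data Env = Env
  { context   :: Maybe Context
  , namespace :: Namespace
  } deriving (Eq, Show)

newtype RIO a = RIO { runRIO :: Env -> IO a }

instance Functor RIO where
  fmap f (RIO g) = RIO (fmap f . g)

instance Applicative RIO where
  pure a = RIO (\_ -> pure a)
  RIO f <*> RIO g = RIO (\env -> f env <*> g env)

instance Monad RIO where
  RIO g >>= f = RIO (\env -> g env >>= \a -> runRIO (f a) env)

-- | Read the environment handed down by the caller
askEnv :: RIO Env
askEnv = RIO pure

-- | Run an action with a new context: the previous one is discarded
withContext :: Context -> RIO a -> RIO a
withContext c (RIO g) = RIO (\env -> g (env { context = Just c }))

-- | Run an action one level deeper in the namespace, like a breadcrumb
withName :: String -> RIO a -> RIO a
withName n (RIO g) = RIO (\env -> g (env { namespace = push (namespace env) }))
  where
    push (Namespace names) = Namespace (names ++ [n])

emptyEnv :: Env
emptyEnv = Env Nothing (Namespace [])
```

Nesting withContext calls keeps only the innermost context, while nesting withName calls accumulates the names from the outermost to the innermost.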
I hear some of you saying that we could get fancy and use something like:
newtype RIO r a = RIO { runRIO :: r -> IO a }
Now we don't have to be concrete in the environment type. However we lose a lot in terms of simplicity and we risk having to deal with RIO r1 a and RIO r2 b and having to find ways to unify r1 and r2. No problem, let's use lenses! And then we go through even more complex type signatures.
I am clearly making a compromise here. By not using the most precise types I expose myself to the danger of programming with strings. But I think this is worth it because we get easier code for things which matter a lot more than flow ids or contextual logging.
Configuration
Inside each "Module" file there should be a declaration for the module configuration:
data Config =
  Config
    { invertSigns :: Bool }
And new becomes:
new :: Config -> Adder.Module -> Multiplier.Module -> Module
new (Config invert) adder multiplier =
  Module (addWith invert adder)
         (multiplyWith multiplier)

-- | Delegate to the Adder component and negate the result when invertSigns is set
--   (this assumes that Adder.Module exposes add :: Int -> Int -> RIO Int)
addWith :: Bool -> Adder.Module -> Int -> Int -> RIO Int
addWith False adder a b = Adder.add adder a b
addWith True  adder a b = negate <$> Adder.add adder a b
This way components are self-contained and easy to extract to libraries.
Replacing components
What do we do about wiring a bunch of components, and in particular how do we replace one of them, right at the bottom of the stack? Do you need to recreate the full application, calling new all over the place?
I found a neat trick to do this: a "registry", and ... the State monad.
Let's create a data structure holding all the components we want to build:
data Modules =
  Modules
    { _adder      :: Maybe Adder.Module
    , _multiplier :: Maybe Multiplier.Module
    , _calculator :: Maybe Calculator.Module
    , _config     :: Config
    }
How do we make a calculator? We get it from Modules. If it is missing we create it with new by first getting its dependencies:
makeCalculator :: State Modules Calculator.Module
makeCalculator = do
  modules <- get
  case _calculator modules of
    Just c  -> pure c
    Nothing -> do
      c <- Calculator.new <$> makeConfig <*> makeAdder <*> makeMultiplier
      -- store the new component in the registry so that it is built only once
      modify (\m -> m { _calculator = Just c })
      pure c
The newly created component is then put in the registry, and the same thing happens recursively for all its dependencies. Done!
With makeCalculator we can pass a Modules value where all the components are set to Nothing:
prodModules =
  Modules
    { _adder      = Nothing
    , _multiplier = Nothing
    , _calculator = Nothing
    , _config     = prodConfig
    }
Replacing a component is super easy, just set it in the registry:
testModules =
  prodModules
    { _adder = Just testAdder
    }
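The whole trick fits in a small self-contained sketch. To keep it short the components below are pure, there are only two of them and there is no Config; the Adder and Calculator types and the field name plus are stand-ins invented for the example, and State comes from the transformers package.

```haskell
import Control.Monad.Trans.State.Strict (State, evalState, gets, modify)

-- Simplified, pure stand-ins for Adder.Module and Calculator.Module
newtype Adder = Adder { plus :: Int -> Int -> Int }

newtype Calculator = Calculator { add :: Int -> Int -> Int }

-- The registry: Nothing means "not built yet"
data Modules = Modules
  { _adder      :: Maybe Adder
  , _calculator :: Maybe Calculator
  }

newCalculator :: Adder -> Calculator
newCalculator adder = Calculator (plus adder)

-- Reuse the registered Adder if there is one, otherwise build and register it
makeAdder :: State Modules Adder
makeAdder = do
  existing <- gets _adder
  case existing of
    Just a  -> pure a
    Nothing -> do
      let a = Adder (+)
      modify (\m -> m { _adder = Just a })
      pure a

-- Same logic one level up: the dependency is obtained through makeAdder
makeCalculator :: State Modules Calculator
makeCalculator = do
  existing <- gets _calculator
  case existing of
    Just c  -> pure c
    Nothing -> do
      c <- newCalculator <$> makeAdder
      modify (\m -> m { _calculator = Just c })
      pure c

prodModules :: Modules
prodModules = Modules { _adder = Nothing, _calculator = Nothing }

-- Replacing the Adder at the bottom of the stack is one record update
testModules :: Modules
testModules = prodModules { _adder = Just (Adder (\_ _ -> 0)) }
```

Evaluating evalState makeCalculator prodModules builds the production wiring, while evalState makeCalculator testModules returns a calculator using the replaced Adder, without any change to the wiring code.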
We can automate a bit of that with 2 typeclasses:
-- Register s m means that there is a possibility to get the module m from a registry s
-- (and possibly get nothing) and also to set it in the registry s
class Register s m where
  access   :: s -> Maybe m
  register :: s -> m -> s
-- Make s m means that there is a way to build the module m given an initial configuration s
class Make s m where
  make :: State s m
Then for a given module a Make instance can be created like this:
instance ( Register s Module
         , Make s Config
         , Make s Adder.Module
         , Make s Multiplier.Module
         ) => Make s Module where
  make = create3 new register make make make
This declares all the dependencies for a given component. create3 is just a function generalizing the makeCalculator above to any constructor, using the implicit typeclass instances, so there's nothing to implement. Adding or removing a dependency of a component becomes pretty trivial.
Inside the top level application we also need to create some Register instances to describe how to interact with the "registry" for each component, and some Make instances to describe how to "make" the various Config values from it:
-- | find the Adder module in the registry
instance Register Modules Adder.Module where
  access s = _adder s
  register s m = s { _adder = Just m }
-- | declare that `Adder.Config` can be made directly
--   by extracting the `Adder.Config` from the `Modules`
--   data structure.
--   This uses the `config` and `adderConfig` lenses generated for `Modules`
instance Make Modules Adder.Config where
  make = get <&> (^. config . adderConfig)
This part will be generated using Template Haskell in the future.
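For readers who want to see the two typeclasses working end to end, here is a reduced sketch. It is not the real create3: the cached helper is a hypothetical, simplified stand-in for it, the components are pure single-file stand-ins, and instances of this shape need the UndecidableInstances extension in addition to the usual multi-parameter ones.

```haskell
{-# LANGUAGE FlexibleContexts, FlexibleInstances, MultiParamTypeClasses, UndecidableInstances #-}
import Control.Monad.Trans.State.Strict (State, evalState, gets, modify)

class Register s m where
  access   :: s -> Maybe m
  register :: s -> m -> s

class Make s m where
  make :: State s m

-- Hypothetical helper: look the module up in the registry first,
-- build and register it only when it is missing
cached :: Register s m => State s m -> State s m
cached build = do
  existing <- gets access
  case existing of
    Just m  -> pure m
    Nothing -> do
      m <- build
      modify (\s -> register s m)
      pure m

-- Simplified, pure stand-ins for Adder.Module and Calculator.Module
newtype Adder = Adder { plus :: Int -> Int -> Int }

newtype Calculator = Calculator { add :: Int -> Int -> Int }

-- Component side: these instances know nothing about the concrete registry s
instance Register s Adder => Make s Adder where
  make = cached (pure (Adder (+)))

instance (Register s Calculator, Make s Adder) => Make s Calculator where
  make = cached (Calculator . plus <$> make)

-- Application side: the concrete registry and how to read and write it
data Modules = Modules
  { _adder      :: Maybe Adder
  , _calculator :: Maybe Calculator
  }

instance Register Modules Adder where
  access = _adder
  register s m = s { _adder = Just m }

instance Register Modules Calculator where
  access = _calculator
  register s m = s { _calculator = Just m }

prodModules :: Modules
prodModules = Modules Nothing Nothing

testModules :: Modules
testModules = prodModules { _adder = Just (Adder (\_ _ -> 0)) }
```

With these instances, evalState make prodModules yields a Calculator as soon as the expected type is known, and adding a dependency to a component only means adding one constraint and one more make.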
Starting services
The real story is always more complicated :-). Some components are "stateful" because they hold a database connection or a cache, so they have to be started. The solution to this new requirement is not difficult: use a new function returning a RIO Module and starting things. Then when doing the application wiring we operate in StateT over RIO instead of just State. This is also what makes the big difference between Config and Module in a component file: Config is for pure data whereas the creation of a Module can trigger some side effects.
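A minimal sketch of the effectful variant follows. To stay self-contained it uses StateT over plain IO rather than RIO, and the Cache component with its IORef counter is invented for the example.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.State.Strict (StateT, evalStateT, gets, modify)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Hypothetical stateful component: holds a mutable counter
newtype Cache = Cache { hits :: IORef Int }

newtype Modules = Modules { _cache :: Maybe Cache }

-- "Starting" the component is a side effect: it allocates the mutable state
newCache :: IO Cache
newCache = Cache <$> newIORef 0

recordHit :: Cache -> IO ()
recordHit cache = modifyIORef (hits cache) (+ 1)

hitCount :: Cache -> IO Int
hitCount cache = readIORef (hits cache)

-- Same registry logic as before, now allowed to run effects while wiring
makeCache :: StateT Modules IO Cache
makeCache = do
  existing <- gets _cache
  case existing of
    Just c  -> pure c
    Nothing -> do
      c <- liftIO newCache
      modify (\m -> m { _cache = Just c })
      pure c
```

Because the started component is stored in the registry, two components depending on the cache get the very same instance rather than two separately started ones.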
Conclusion
This is all very new and currently only tested on a medium-size service. But I really like this approach: not a lot of type magic, enough flexibility, clear guidance on how to write code. It is probable that we will get more questions along the road (otherwise why would the world need such big dependency injection libraries?), and I will report on them. In the meantime please share your ideas and add your comments, especially if you think that we are making a huge mistake somewhere that should be addressed right now!