
18 June 2012

Strong functional programming

Programming in a statically typed functional language (think Haskell) gives quite a few safety guarantees, as well as a lot of expressivity thanks to higher-order functions. However, the type system still lets you write problematic functions like:

loop :: Int -> Int
loop n = 1 + loop n

Why is that problematic? For one, this function never terminates, and that can't be good. But also, and this is very annoying, we can no longer use equational reasoning:

loop 0 = 1 + loop 0

-- subtract (loop 0) from both sides, and use `x - x = 0`
0 = 1

0 = 1 can then allow you to prove anything => #fail.

In this post I'm going to present the ideas of "Strong functional programming" (or "Total functional programming") and show how they help avoid situations like the one above, where unrestricted recursion goes wrong.

Note: most of this post is a transliteration of an article by David Turner, "Total Functional Programming" (2004). I'm also using pieces of "Ensuring Streams Flow" and this blog article by @sigfpe.

Life without ⊥ (bottom)

The type Int which is used for the definition of the loop function is not as simple as it looks. It is actually Int plus ⊥, where ⊥ (called "bottom") is a value denoting run-time errors and non-terminating programs. "Total functional programming" doesn't allow this kind of value, and this is why it is called "Total": functions are always defined on their whole domain (their "input values"), they can never return an undefined value.

Not having to deal with ⊥ has a lot of advantages:

Simpler proofs

As we've seen in the introduction, we can't really assume that (e - e = 0) when (e: Nat) without worrying about the ⊥ case. When ⊥ is present, proof by induction cannot be used as simply as it reads in math textbooks. To refresh your memory, this proof principle states that for any property P:

if P(0)
and for all n, P(n) => P(n + 1)
then for all n, P(n)

However if you allow ⊥ to sneak in, this principle becomes a tad more complex.
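
To give an idea of the extra complexity (this is only a sketch, following the usual treatment of induction over partial values in domain theory), the principle roughly becomes:

if P(⊥)
and P(0)
and for all n, P(n) => P(n + 1)
then for all finite (possibly partial) n, P(n)

and covering the infinite value on top of that requires P to be "admissible", a technical side condition you never see in a math textbook.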

Simpler language design

Argument evaluation

With no bottom value, one divide between programming languages simply disappears. We no longer have to care about strict evaluation (as in Scheme or Scala) vs lazy evaluation (as in Haskell): the result of a function doesn't depend on how its arguments are evaluated:

-- a function returning the first argument
first a b = a

-- with strict evaluation
first 1 ⊥ = ⊥

-- with lazy evaluation
first 1 ⊥ = 1
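
You can check the lazy half of this in ordinary Haskell, where undefined plays the role of ⊥ (and seq simulates what a strict call would do):

first :: a -> b -> a
first a b = a

lazyCase :: Int
lazyCase = first 1 undefined          -- evaluates to 1: the second argument is never demanded

strictCase :: Int
strictCase = (undefined :: Int) `seq` first 1 undefined   -- diverges (throws), as a strict call would
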
Pattern matching

But that's not the only difference, another one is pattern matching. In Haskell, when you expect a pair of values, you won't be able to match that against ⊥:

-- will not match if the argument is ⊥
first (a, b) = a

You can argue that this is a reasonable language decision to make but that's not the only possible one. In Miranda for instance, the match will always succeed:

-- a bottom value can be "lifted" to a pair of bottom values: ⊥ = (⊥, ⊥)
-- so the match succeeds, a is bound to ⊥ and b is bound to ⊥
first ⊥ = ⊥
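
Interestingly, Haskell lets you opt into the Miranda behaviour with an irrefutable (lazy) pattern, written ~(a, b): the match always succeeds and the components are only evaluated when they are used:

firstStrict :: (a, b) -> a
firstStrict (a, b) = a         -- forces the pair: firstStrict undefined diverges

firstLazy :: (a, b) -> a
firstLazy ~(a, b) = a          -- never forces the pair: the match succeeds and the
                               -- result is the first component, i.e. ⊥ here
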
& operator

Same thing goes with the evaluation of operators like & (logical AND). It is defined on Booleans by:

True  & True  = True
True  & False = False
False & True  = False
False & False = False

If you introduce ⊥ you have to define:

⊥ & y = ?
x & ⊥ = ?

Most programming languages decide to be "left-strict", i.e. they evaluate the first argument before trying to evaluate the whole expression:

⊥ & y = ⊥
x & ⊥ = False if x = False
x & ⊥ = ⊥ otherwise

Again, that might seem completely obvious to us after years of using this convention, but we can imagine other alternatives (doubly-strict, right-strict, doubly-non-strict), and this breaks the classical symmetry of the & operator in mathematics.
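
This left-strict convention is the one Haskell follows: the Prelude essentially defines (&&) by matching on its first argument only (reproduced here, hiding the Prelude version so the snippet stands alone):

import Prelude hiding ((&&))

-- left-strict: the first argument decides whether the second is ever looked at
(&&) :: Bool -> Bool -> Bool
True  && x = x
False && _ = False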

More flexibility in language implementation

Reduction

Let's introduce a bit of programming languages theory. When you define a programming language, you need to specify:

  • how to build language expressions from "atomic" blocks. For example, how to build an if expression from a boolean expression and 2 other expressions

    if booleanExpression then expression1 else expression2

  • how to "execute" the language expressions, usually by specifying how to replace values in expressions with other values. For example using a specific boolean value in an if expression:

    a = true
    if a then b else c => b

As you can see, executing a program means "reducing" large expressions to smaller ones, using specific rules. This raises some immediate questions:

  1. can we reach a final, irreducible, expression (a normal form)?

  2. if several "reduction paths" are possible (by using the reduction rules in different ways), do they converge on the same normal form?

If you can show that your programming language satisfies 2. whenever 1. holds, then you can say that it has the "Church-Rosser" property, or that it is confluent. Another version of this property, the "strong" form, says that every reduction sequence leads to a normal form, and that this normal form is unique whatever reduction path you took.
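
For instance, in a tiny arithmetic language, two different reduction paths converge on the same normal form:

(1 + 2) * (3 + 4) => 3 * (3 + 4) => 3 * 7 => 21
(1 + 2) * (3 + 4) => (1 + 2) * 7 => 3 * 7 => 21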

Total functional programming can be called "Strong" because it has this property. The consequence is that the language implementation is free to select various evaluation strategies, for example to get better memory consumption or parallelism, without any fear of changing the program's meaning.

Not all green

If Strong Functional Programming is so easy, why don't we all use it right now? What are the disadvantages then?

Not Turing complete

The first issue with Strong Functional Programming is that it is not Turing complete. Every program it expresses terminates, but it cannot express all programs that terminate. In particular it cannot express its own interpreter. Why is that?

Let's say the interpreter is a function eval taking as arguments a program P and its input I (this example comes from here). Any program is a big sequence of symbols and can be coded into a number, so our eval interpreter can be seen as a function eval :: Nat -> Nat -> Nat. Based on that I can build an evil function in my total language:

evil :: Nat -> Nat
evil code = 1 + eval code code

But there is also a number corresponding to the evil program. Let's call that number 666:

-- this means "apply the evil program aka '666' to the value 666"
-- by definition of the eval interpreter
eval 666 666 = evil 666

-- but if we go back to the definition of evil
evil 666 = 1 + (eval 666 666)

-- we can't get a value for 'evil 666', because it's equivalent to the equation 0 = 1
evil 666 = 1 + evil 666

So evil is a non-terminating program in my language, which is nonetheless supposed to terminate because it is built from + 1 and eval (a terminating program by hypothesis) => fail.

This is a troubling conclusion. One way out is to create a hierarchy of languages where each language has enough power to write an interpreter for the language below it. And, at the top of the hierarchy, we use a Turing-complete language. This is not unlike isolating a typed, functional world from an untyped, side-effecting one (aka "the real world").

Non-terminating!

Yes, by definition. But non-terminating programs are super useful; you wouldn't want your operating system to stop after a few keystrokes saying "I'm done, buddy".

The good news is that we can actually deal with this situation by using codata. But before we look at codata in detail, we're going to enumerate some of the rules of an "Elementary Strong Functional Programming" language (ESFP).

The rules for avoiding non-termination

Let's adopt a Haskell-like syntax for our ESFP language. One thing we must be able to do is define the "usual" datatypes:

data Bool    = True | False
data Nat     = Zero | Suc Nat
data List a  = Nil  | Cons a (List a)
data Tree a  = Nilt | Node a (Tree a) (Tree a)

and some functions on those types:

-- size of a Tree
size :: Tree a -> Nat
size Nilt = 0
size (Node a l r) = 1 + size l + size r

-- filter a List with a function
filter :: (a -> Bool) -> List a -> List a
filter f Nil = Nil
filter f (Cons a rest) = Cons a (filter f rest), if f a
                       = filter f rest,          otherwise

Rule n.1: all case analysis must be complete

This one should feel obvious to anyone who has used data types and pattern matching before. If you forget a case in your pattern matching analysis, you face the risk of a runtime exception: NoMatchError.

But this rule has a larger impact. You will also have to change your "standard" library. For example you can't define the function taking the first element of a list, head, as before. You need to provide a value to use when the list is empty:

-- taking the first element of a list
head :: List a -> a -> a
head Nil default           = default
head (Cons a rest) default = a

The other alternative is to program with a data type where things can't go wrong:

data NonEmptyList a = NCons a (NonEmptyList a)

-- taking the first element of a non-empty list
head :: NonEmptyList a -> a
head (NCons a rest) = a

This kind of discipline doesn't seem so harsh, but other difficulties arise with the use of built-in arithmetic operators, starting with divide. What do you do about divide 1 0?

There are proposals to build data types where "1 / 0" is defined and equal to "1"! It looks so weird to me that I'd certainly try to go with a NonZeroNat data type before anything else.
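
Here is a small sketch of that NonZeroNat idea in Haskell (NonZeroNat and divide are hypothetical names, not standard ones, and the recursion below is not structural; the point is only that a zero divisor cannot even be constructed):

import Numeric.Natural (Natural)

-- a hypothetical type of strictly positive naturals: 1, 2, 3, ...
data NonZeroNat = One | SucNZ NonZeroNat

toNatural :: NonZeroNat -> Natural
toNatural One       = 1
toNatural (SucNZ n) = 1 + toNatural n

-- total by construction: there is no way to pass a zero divisor
divide :: Natural -> NonZeroNat -> Natural
divide n d
  | n < d'    = 0
  | otherwise = 1 + divide (n - d') d
  where d' = toNatural d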

Rule n.2: type recursion must be covariant

I like this rule because it was not obvious to me at all when I first read it. What does it mean?

  1. You can define data types recursively
  2. You can use functions as values
  3. So you can build the following data type

    data Silly a = Very (Silly a -> a)

With the Silly data type I can recurse indefinitely:

bad :: Silly a -> a
bad (Very f) = f (Very f)

ouch :: a
ouch = bad (Very bad)

The ouch value is a non-terminating one, because the evaluation of bad will use bad itself with no base case to stop. Rule n.2 is here to prohibit this. You can't define a data type containing a function that takes this very data type as an input value. The fact that this rule was not obvious raises questions:

  • how do we know that we have found all the rules for ESFP?
  • is there a "minimal" set of rules?

I have no answer to those questions unfortunately :-(.

Rule n.3: recursion must be structural

The previous rule was about data type recursion, this one is about function recursion. More precisely the rule says: "each recursive function call must be on a syntactic sub-component of its formal parameter". This means that if you make a recursive call to the function being defined, it has to be on only a part of the input data, as with the ubiquitous factorial function:

factorial :: Nat -> Nat
factorial Zero = 1
factorial (Suc Zero) = 1

-- we recurse with a sub-component of (Suc n)
factorial (Suc n)    = (Suc n) * (factorial n)

This rule is not too hard to understand. If you always recurse on data that is provably "smaller" than your input data, you can prove that your function will terminate. This rule can also accommodate the Ackermann function, which has 2 Nat parameters:

ack :: Nat -> Nat -> Nat
ack 0 n = n + 1

-- m + 1 is a shortcut for (Suc m)
ack (m + 1) 0       = ack m 1
ack (m + 1) (n + 1) = ack m (ack (m + 1) n)

By the way, if you don't know the Ackermann function, look it up! It grows amazingly fast, even for very small arguments. Try to evaluate ack 4 3 for fun :-)

How restrictive is this rule? Not so much. It lets you program:

  1. "primitive recursive" functions. Those are the functions, like factorial, which are defined using only simple recursion (think how addition (Suc n) m can be defined in terms of addition n m) and by composing other primitive functions

  2. other "total recursive" functions, like ack ("total" meaning "terminating" here). Indeed ack is not "primitive recursive". It was actually especially created to show that not all "total recursive" functions were "primitive recursive"

It turns out that all the functions which can be proven to terminate using first-order logic (without forall and exists) can be programmed with structural recursion. That's a lot of functions, but we have to change our programming practices in order to use only structural recursion. Let's look at an example.

Coding fast exponentiation

We want to code the "power" function. Here's a naive version:

pow :: Nat -> Nat -> Nat
pow x n = 1,                   if n == 0
        = x * (pow x (n - 1)), otherwise

It is primitive recursive but not very efficient because we need n calls to get the result. We can do better than this:

pow :: Nat -> Nat -> Nat
pow x n = 1,                       if n == 0
        = x * pow (x * x) (n / 2), if odd n
        = pow (x * x) (n / 2),     otherwise

Unfortunately this version is not primitive recursive, because when we call pow we're not going from n + 1 to n. It is however obvious that we're "descending" and that this algorithm will terminate. How can we re-code this algorithm with primitive recursion?

The trick is to encode the "descent" in a data type:

-- representation of a binary digit
data Bit  = On | Off

-- we assume a built-in bits function
bits :: Nat -> List Bit

-- then the definition of pow is primitive recursive, because we descend structurally on the List Bit data type
pow :: Nat -> Nat -> Nat
pow x n = pow1 x (bits n)

pow1 :: Nat -> List Bit -> Nat
pow1 x Nil = 1
pow1 x (Cons On rest)  = x * (pow1 (x * x) rest)
pow1 x (Cons Off rest) = pow1 (x * x) rest
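
To make the assumption explicit, here is what bits could look like in plain Haskell, over ordinary integers and lists rather than Nat and List Bit, with the least significant bit first. Note that it relies on the same n / 2 descent, which is presumably why it is assumed to be a built-in:

data Bit = On | Off

-- bits of n, least significant bit first, e.g. bits 6 gives [Off, On, On]
bits :: Integer -> [Bit]
bits 0 = []
bits n
  | odd n     = On  : bits (n `div` 2)
  | otherwise = Off : bits (n `div` 2)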

David Turner conjectures that many of our algorithms can be coded this way, and for the cases where it gets a bit hairier (like Euclid's gcd) we could allow another type of recursion called "Walther recursion". With this kind of recursion we can recognize a larger class of programs where recursion is guaranteed to terminate, because we only use operations that effectively "reduce" the data types (acknowledging, for example, that n / 2 is necessarily smaller than n). Using "Walther recursion" in a language with functions as first-class values is still an unsolved problem though.

Codata for "infinite" computations

After having looked at the rules for bounding recursion and avoiding non-termination, we would still like to be able to program an operating system. The key insight here is that our functions need to terminate, but our data doesn't need to be finite. For example, the Stream data type is infinite:

data Stream a = Cons a (Stream a)

In this perspective, an operating system is a bunch of functions acting on an infinite stream of input values. Let's call this type of infinite data, "codata", and see how we can keep things under control.

Here's our first definition, Colist (similar to the Stream type above):

-- "a <> Colist a" stands for "Cocons a (Colist a)"
codata Colist a = Conil | a <> Colist a

Does it breach the "Strong normalization" property that we described earlier? It doesn't, provided we say that every expression starting with a coconstructor is irreducible (it is already in "normal form"): there is nothing to substitute under it. With a regular List, by contrast, you can evaluate:

Cons (1 + 2) rest = Cons 3 rest

This is why we need to have a new keyword codata to explicitly define this type of data. In a way, this is making a strong distinction between "lazy" and "strict" data, instead of using lazy evaluation to implement infinite data types.

Primitive corecursion

The equivalent of Rule n.3 for codata is that all recursion on codata must be "primitive corecursive". Any idea what that is? It is the opposite, or dual, of structural recursion on data types. Instead of proving that we always reduce the input value when making a recursive call, we need to prove that we always "augment" the result, by using a "coconstructor" to build it:

-- functions on codata must always use a coconstructor for their result
function :: Colist a -> Colist a
function (a <> rest) = 'something' <> (function 'something else')

Then we can define functions like:

ones :: Colist Nat
ones = 1 <> ones

fibonacci :: Colist Nat
fibonacci = f 0 1
            where f a b = a <> (f b (a + b))
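
In ordinary Haskell, lazy lists already behave like this codata, so the same two definitions can be tried out directly (a sketch using regular lists in place of Colist):

ones :: [Integer]
ones = 1 : ones

fibonacci :: [Integer]
fibonacci = f 0 1
  where f a b = a : f b (a + b)

-- take 8 fibonacci == [0,1,1,2,3,5,8,13]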

Another way of looking at the difference between data/recursion and codata/corecursion is that:

  • recursion on data is safe if we break the data up into smaller pieces and then recurse on those pieces
  • corecursion on codata is safe if we wrap the recursive call in a coconstructor, which declares that we are producing one more piece of (potentially infinite) data

Codata is data which is (potentially) infinite, but observable. There are ways to safely extract results, one at a time, in a terminating manner. That's exactly what the fibonacci function does: it builds a sequence of results so that you can safely extract the first one, and then you have a way to build the rest of the values.

This is why people also talk about the duality between data and codata as a duality between constructors and destructors. Or a duality between accessing the internals of a data type and observing the behavior of a codata type.

Coinductive proofs

When we try to reason about codata and corecursion, we can use coinduction (you saw that one coming didn't you :-)?). The principle of coinduction is the following:

2 pieces of codata are the same, "bisimilar" in the literature, if:

  • their finite parts are equal (the "a" in "a <> rest")
  • their infinite parts are the same

In other words, if the 2 sequences always produce the same values. Using that principle we can show the following theorem on infinite structures:

-- iterate a function: x, f x, f (f x), f (f (f x)),...
iterate f x = x <> iterate f (f x)

-- map a function on a colist
comap f Conil = Conil
comap f (a <> rest) = (f a) <> (comap f rest)

Now, we'd like to show that iterate f (f x) = comap f (iterate f x):

iterate f (f x) = (f x) <> iterate f (f (f x))       -- 1. by definition of iterate
                = (f x) <> comap f (iterate f (f x)) -- 2. by hypothesis
                = comap f (x <> iterate f (f x))     -- 3. by definition of comap
                = comap f (iterate f x)              -- 4. by definition of iterate

The coinduction principle is used exactly in step 2. If "iterate f (f (f x))" and "comap f (iterate f (f x))" are the same, then adding a new value "(f x)" will preserve equality.
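
With ordinary lazy Haskell lists, where the Prelude's iterate and map play the roles of iterate and comap, we can at least sanity-check the theorem on finite prefixes:

-- True: both sides produce f x, f (f x), f (f (f x)), ...
check :: Bool
check = take 10 (iterate f (f x)) == take 10 (map f (iterate f x))
  where
    f = (* 2)
    x = 1 :: Integer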

Limitations

The definition of "primitive corecursive" is a bit restrictive. It prevents useful, and well-founded, definitions like:

-- "evens" would need to come first after <> for this definition to be corecursive
evens = 2 <> (comap (+2) evens)

Note that we can't allow any kind of construction with "<>". For example the bad function below is not terminating:

-- infinite lists
codata Colist a = a <> Colist a

-- the tail of an infinite list
cotail :: Colist a -> Colist a
cotail (a <> rest) = rest

-- don't do this at home
bad = 1 <> (cotail bad)

In "Ensuring streams flow", David Turner shows that it is possible to develop an algorithm which will check where the corecursion is safe and where it isn't. The formal argument takes a few pages of explanation but the idea is simple. We need to count the "levels of guardedness":

  • having evens recursing inside comap is not a problem, because comap will add a new coconstructor

  • having bad recursing inside cotail is a problem, because cotail "removes" a coconstructor

So the idea of the algorithm is to count the number of times where we "add" or "remove" coconstructors to determine if the corecursion will be safe or not.

Comadness

To conclude this presentation of codata, I want to briefly introduce comonads and give a short example to build an intuition of what they are.

If you look at the definition in the books and listen to the category theory people, you will read or hear something like: "take the monad definition, change the direction of the arrows and you have a comonad":

-- the comonad operations

-- the dual of return or unit for monads
extract :: W a -> a

-- the dual of bind or flatMap for monads
cobind :: (W a -> b) -> W a -> W b

Intuitively you can think of a comonad as:

  • "I can extract things from a Context" (to contrast with the "I can put things in a context" of monads)

  • If I have a function which "computes a value from an input value which might change depending on the context", W a -> b, then I can use "values in context", W a, to produce other "values in context", W b

This is still not quite clear, so let's give a simple example based on Colist. Colist is a comonad which we can use like that:

-- a Colist of Nats where each new value is the previous value + 1
nats = 0 <> comap (+1) nats

-- a function taking the first 2 elements of a Colist
firstTwo :: Colist a -> (a, a)
firstTwo (a <> b <> rest) = (a, b)

-- now, let's cobind firstTwo to nats
cobind firstTwo nats = (0, 1) <> (1, 2) <> (2, 3) <> ...
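
To make this concrete in plain Haskell, here is a minimal, self-contained sketch of an infinite stream with extract and cobind (the latter is usually called extend in Haskell comonad libraries); the names follow this post, not any particular library:

data Stream a = Cons a (Stream a)

-- extract the current value
extract :: Stream a -> a
extract (Cons a _) = a

-- cobind applies f to the whole stream and to every one of its tails
cobind :: (Stream a -> b) -> Stream a -> Stream b
cobind f s@(Cons _ rest) = Cons (f s) (cobind f rest)

nats :: Stream Integer
nats = go 0 where go n = Cons n (go (n + 1))

firstTwo :: Stream a -> (a, a)
firstTwo (Cons a (Cons b _)) = (a, b)

-- observe a finite prefix of a stream
takeS :: Int -> Stream a -> [a]
takeS n _ | n <= 0 = []
takeS n (Cons a s) = a : takeS (n - 1) s

-- takeS 4 (cobind firstTwo nats) == [(0,1),(1,2),(2,3),(3,4)]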

Sometimes you can read that List is a monad representing the fact that a function can produce non-deterministic results ("zero or many"). By analogy with the List monad, we can say that Colist is a comonad representing the fact that a function can have non-deterministic inputs.

Conclusion

I touched on a lot of subjects with this post, and there would be a lot more to say about each of them (certainly with fewer approximations than mine). In the first part we saw that, when dealing with finite data, the notions of termination, computation and recursion are intimately related. In the second part we saw that it can be beneficial to give a special status to infinite data and to provide for it the same kind of tools (coinductive proofs, corecursion, comonads) that we usually have for data.

I hope that the subject of codata will become more and more popular (this recent proposal for a codo notation in Haskell might be a good sign). We can see a lot of architectures these days turning to "Event Sourcing". I suspect that Codata, Comonads, Costate and all the Cotypeclassopedia will prove very useful in dealing with these torrents of data.

22 March 2012

Coming full circle

In this post I want to show a few of the upcoming features in specs2-1.9 and also take a step back to look at how specs2's features and implementation have unfolded since I started the project one year ago.

Why would you want to rewrite from scratch?

Good question: rewriting from scratch does seem very masochistic :-). Not only because of the sheer number of features to re-implement but, more importantly, because of the difficulty of moving users to a new version.

I explained in detail the reasons why I thought the rewrite was necessary, how it would benefit users and how to migrate, in a previous blog post. What I want to emphasize here is the compromise I decided to make at that time:

  • more concurrency and reliability
  • less practicality

The main reason for this compromise was immutability. Forcing myself into immutability opened the gates to robust and concurrent software, but at the same time I had to cut back on the syntax I had proposed for the original specs1 project. For example it was no longer possible to write:

    // this example can only be "registered" if there are side-effects
    "hello must have 5 letters" in {
      "hello" must have size(5)
    }
    "world must have 5 letters" in {
      "world" must have size(5)
    }

Instead, I proposed to write:

    // examples are "linked" together with the ^ operator
    "hello must have 5 letters" ! e1 ^
    "world must have 5 letters" ! e2

    def e1 = "hello" must have size(5)
    def e2 = "world" must have size(5)

The feedback was immediate. Some developers who were sent a preview liked it, but others told me: "I would NEVER use this horrible syntax". Ok, fair enough, I can't blame you, it's a bit ugly on the right.

I decided then that I would relax my constraints a little bit and add one, just one, var. It would only be used to accumulate examples when building the specification, to enable the good old specs1 style. This was also, at least partially, solving the migration problem since a full rewrite of the specifications was not necessary.

Not everything was as cool as in the original specs1 though. In particular the super-easy "isolated" execution model was missing. Until...

Lightbulb

... Until it struck me, just one month ago, that reintroducing this mode of execution was less than 10 lines of code!

Because I had chosen the functional paradigm of "executing a Specification" <=> "interpreting a data structure" it was really easy to change the interpretation to something declaring: "for each example, execute the code in a clone of the Specification so that all local variables are seen as fresh variables".

In terms of product management, it was a "magic" feature almost for free! To me, this is the illustration of a principle of Software: "if code is a liability, maximise your return on investment".

Maximize the ROI

I'm sure you've already read something like: "features are an asset, code is a liability". But I haven't yet read the logical consequence of it: "Maximize the ROI: extract as many features as you can from your existing code". Indeed every time we write a piece of code, it's worth wondering:

  • how can this be put to a better use?
  • can I add something slightly different to provide a new useful feature?
  • can I generalize it a bit to make sense of it in another context?
  • can I make it composable with an existing feature to get a new one?

That's exactly what happened with the "isolated" feature above. Here's another example of this principle in action.

IO and serialization are not cheap

A few months ago, I developed a feature to create an index page. On this index page I had to show the status of specifications which had been executed as part of the previous run. This meant that I had to store somewhere, on the file system, the results of each specification execution.

In terms of feature price, this is not a particularly cheap one. Every time there is some IO interaction and some kind of serialization, the number of possible issues to consider is not so small. In a way it was really an "iceberg" feature: not a lot of functionality above the surface, but a lot of machinery under the water. So I wondered how I could improve my ROI on this. Hmm... since I'm writing status information to the disk, there might be a way to reuse it!

Indeed, I can use this information to selectively re-run only the failed examples, or the skipped ones, or... And so on. From there, a new "feature space" opens, and the initial investment starts making sense.

It is also possible to maximize the ROI on existing features. A feature is an "asset". Perhaps, but it's not completely free either. You have to explain it, to promote it, to show when and how it interacts with the rest of the software. This is why maximizing the ROI of features makes sense as well.

That's exactly what happened with the brand-new "isolated" feature.

All expectations

Year after year I'm looking at the specifications that people are writing with specs/specs2. Kind of my hallway usability test for open-source libraries... I usually see several "styles" of specifications and one style is pretty frequent:

    "This is an example of building/getting/processing a datastructure" >> {
       // do stuff
       val data = processBigData

       // check expectations
       data must doThis
       data must haveThat
       data.parts must beLikeThis
       // and so on...
    }

In that style of specification, there is usually one action and lots of things to check. It is very unfortunate that most of the testing libraries out there stop at the first failure, because it really makes sense to collect all the failures at once instead of having to execute the specification multiple times to get to all the issues.

Is there a way to "collect" all the failures in specs2-1.8.2? Not really. You can try to use "or" with the expectations:

       (data must doThis) or
       (data must haveThat) or
       (data.parts must beLikeThis)

And that will collect all the failures up to the first success, and then it will stop executing the rest. Besides, the "or" syntax with all the parentheses is not so nice.

After a while I realized that the only way to make this work was to use yet another mutable variable to silently register each result. But then I'd be back to the old specs1 problem. What about concurrency? How can I prevent each concurrently executed example from stepping on another example's results?

Well, that's easy, I have the "isolated" feature now! Each example can run in its own copy of the specification, and the mutable variable will safely collect only the results from that example. The result is a feature that's easy to implement but also easy to use, because it's just a matter of mixing a trait into the specification!

Coming full circle

Can I go further with the same thinking? Why not?!

From the beginning of specs2, I liked the fact that the so-called "Acceptance specification" style allowed me to write a lot of text about my system's behavior and then annotate it with the actual code. The price to pay was all the symbols on the right of the screen, which appeared utterly cryptic to some people (to say it nicely).

Then, last week, I realized that I could rely on something which exists in most code editors: code folding! In a classical JUnit TestCase, a specs1 specification or a specs2 mutable specification, if you fold the code, you're left with only the text, forming a narrative for your system. If you see it like that, any specs2 mutable specification can be turned into an acceptance specification; only a few features were missing.

Not a lot to implement really. Most of the machinery is already provided by the org.specs2.Specification trait. This means that, for a very reasonable "price", you can now use "mutable" specifications in specs2 with a great deal of expressivity (nothing's perfect though, there are small issues due to semicolon inference for example, see here for variations around the "Stack" theme).

Full circle and blurred lines

To me, it really feels that I've come full circle:

  1. I rebuilt from scratch the "Specification" concept from specs1, throwing away the syntax
  2. I reintroduced some mutability very carefully to get some of that syntax back
  3. I built upon all the immutable code to finally end up with exactly what I would have liked to have in specs1!

My only concern now is that newcomers might feel lost because the library is not really prescribing a specific style: "should I use a 'unit' style or an 'acceptance' style? And what are the differences between the 2 anyway?". I hope that they'll realize that this is actually an opportunity. An opportunity to try out different ways of communicating and then choosing the most efficient or pleasing.