## 18 June 2012

### Strong functional programming

Programming in a statically typed functional language (think Haskell) gives quite a few safety guarantees as well as a lot of expressivity thanks to the use of higher-order functions. However the type system lets you write some problematic functions like:

``````loop :: Int -> Int
loop n = 1 + loop n
``````

Why is that problematic? For one, this function never terminates, that can't be good. But also, and this is very annoying, we can not anymore use equational reasoning:

``````loop 0 = 1 + loop 0

// substract (loop 0) from both sides, and use `x - x = 0`
0 = 1
``````

`0 = 1` can then allow you to prove anything `=> #fail`.

In this post I'm going to present the ideas of "Strong functional programming" (or "Total functional programming") and show how they can help avoid this situation above where unrestricted recursion goes wrong.

Note: most of this post is a transliteration of an article of David Turner, "Total Functional Programming", 2004. I'm also using pieces of "Ensuring stream flows" and this blog article by @sigfpe.

### Life without ⊥ (bottom)

The type `Int` which is used for the definition of the `loop` function is not as simple as it looks. It is actually Int (⊥) where ⊥ (called "bottom") a value denoting run-time errors and non-terminating programs. "Total functional programming" doesn't allow this type of values and this why it is called "Total": functions are always defined on their domain (their "input values"), they cannot return an `undefined` value.

Not having to deal with ⊥ has a lot of advantages:

#### Simpler proofs

As we've seen in the introduction, we can't really assume that `(e - e = 0)` when `(e: Nat)` without having to worry about the ⊥ case. When ⊥ is present, proof by induction can not be easily used as read in math textbooks. To refresh your memory, this proof principle states that for any property `P`:

``````if P(0)
and for all n, P(n) => P(n + 1)
then for all n, P(n)
``````

However if you allow ⊥ to sneak in, this principle becomes a tad more complex.

#### Simpler language design

##### Arguments evaluation

With no bottom value, one divide between programming languages just disappears. We don't really care anymore about strict evaluation (as in Scheme, Scala) vs lazy evaluation (as in Haskell), the evaluation of a function doesn't depend on the evaluation of its arguments:

``````-- a function returning the first argument
first a b = a

-- with strict evaluation
first 1 ⊥ = ⊥

-- with lazy evaluation
first 1 ⊥ = 1
``````
##### Pattern matching

But that's not the only difference, another one is pattern matching. In Haskell, when you expect a pair of values, you won't be able to match that against ⊥:

``````-- will not match if (a, b) is ⊥
first (a, b) = a
``````

You can argue that this is a reasonable language decision to make but that's not the only possible one. In Miranda for instance, the match will always succeed:

``````-- a bottom value can be "lifted" to a pair of bottom values
(⊥a, ⊥b) = ⊥
``````
##### & operator

Same thing goes with the evaluation of operators like `&` (logical AND). It is defined on Booleans by:

``````True  & True  = True
True  & False = False
False & True  = False
False & False = False
``````

If you introduce ⊥ you have to define:

``````⊥ & y = ?
x & ⊥ = ?
``````

Most of the programming languages decide to be "left-strict", i.e they evaluate the first argument before trying to evaluate the whole expression:

``````⊥ & y = ⊥
x & ⊥ = False if x = False
x & ⊥ = ⊥ otherwise
``````

Again, that might seem completely obvious to us after years of using this convention, but we can imagine other alternatives (doubly-strict, right-strict, doubly-non-strict), and this breaks the classical symmetry of the `&` operator in mathematics.

#### More flexibility in language implementation

##### Reduction

Let's introduce a bit of programming languages theory. When you define a programming language, you need to specify:

• how to build language expressions from "atomic" blocks. For example, how to build an `if` expression from a boolean expression and 2 other expressions

`if booleanExpression then expression1 else expression2`

• how to "execute" the language expressions, usually by specifying how to replace values in expressions with other values. For example using a specific boolean value in an `if` expression:

`a = true`
`if a then b else c => c`

As you can see, executing a program means "reducing" large expressions to smaller ones, using specific rules. This raises some immediate questions:

1. can we reach a final, irreducible, expression (a normal form)?

2. if several "reduction paths" are possible (by using the reduction rules in different ways), do they converge on the same normal form?

If you can show that your programming language satisfies 2. when 1. is true, then you can say that it has the "Church-Rosser" property or it is confluent. Another version of this property, the "strong" form says that every reduction leads to a normal form and it is unique whatever reduction path you took.

Total functional programming can be called "Strong" because it has this property. The consequence is that the language implementation is free to select various evaluation strategies, for example to get better memory consumption or parallelism, without any fear for changing the program meaning.

### Not all green

If Strong Functional Programming is so easy don't we all use it right now? What are the disadvantages then?

#### Not Turing complete

The first issue with Strong Functional Programming is that it is not Turing complete. It computes programs which always terminate but cannot compute all the programs that always terminate. In particular it can not compute its own interpreter. Why is that?

Let's say the interpreter is a function `eval` taking as arguments, a program `P` and its input `I` (this example comes from here). Any program is a big sequence of symbol and can be coded into a number, so our eval interpreter can be seen as a function :: (Nat, Nat) -> Nat. Based on that I can build an `evil` function in my total language:

``````evil :: Nat -> Nat
evil code = 1 + eval code code
``````

But there is also a number corresponding to the `evil` program. Let's call that number `666`:

``````-- this means "apply the evil program aka '666' to the value 666"
-- by definition of the eval interpreter
eval 666 666 = evil 666

-- but if we go back to the definition of evil
evil 666 = 1 + (eval 666 666)

-- we can't get a value for 'evil 666', because it's equivalent to the equation 0 = 1
evil 666 = 1 + evil 666
``````

So `evil` is a non-terminating program of my language, which is however supposed to terminate because built from `1` and `eval` (a terminating program by our hypothesis) `=> fail`.

This is a troubling conclusion. One way out is to create a hierarchy of languages where each language above has enough power to write an interpreter for the language below. And, at the top of the hierarchy, we use a Turing-complete language. This is not unsimilar to isolating a typed, functional world from a non-typed, side-effecting one (aka "the real world").

#### Non terminating!

Yes, by definition. But non-terminating programs are super useful, you wouldn't want your operating system stop after a few keystrokes saying "I'm done, buddy".

The good news is that we can actually deal with this situation by using codata. But before we look at codata in detail, we're going to enumerate some of the rules of an "elementary" Strong Programming Functional language (or ESFPL).

### The rules for non-termination

Let's adopt a Haskell-like syntax for our ESPFL. One thing we must be able to do is to define the "usual" datatypes:

``````data Bool    = True | False
data Nat     = Zero | Suc Nat
data List a  = Nil  | Cons a (List a)
data Tree a  = Nilt | Node a (Tree a) (Tree a)
``````

and some functions on those types:

``````-- size of a Tree
size a :: Tree a -> Nat
size Nilt = 0
size (Node a l r) = 1 + size l + size r

-- filter a List with a function
filter f Nil = Nil
filter f (Cons a rest) if (f a)  = Cons a (filter f rest)
otherwise = filter f rest
``````

#### Rule n.1: all case analysis must be complete

This one should feel obvious to anyone having used data types and pattern matching before. If you forget a case in your pattern matching analysis, you face the risk of a runtime exception: `NoMatchError`.

But this rule has a larger impact. You will also have to change your "standard" library. For example you can't define the function taking the first element of a list, `head`, as before. You need to provide a value to use when the list is empty:

``````-- taking the first element of a list
head a :: List a a -> a
head (Cons a rest) default = a
``````

The other alternative is to program with a data type where things can't go wrong:

``````data NonEmptyList a = NCons a (NonEmptyList a)

-- taking the first element of a non-empty list
head a :: NonEmptyList a -> a
head (NCons a rest) = a
``````

This kind of discipline doesn't seem so harsh, but other difficulties arise with the use of built-in arithmetic operators, starting with `divide`. What do you do about `divide 1 0`?

There are propositions to build data types where "`1 / 0`" is defined and equal to "`1`"! It looks so weird to me that I'd certainly try to go with a `NonZeroNat` data type before anything else.

#### Rule n.2: type recursion must be covariant

I like this rule because it was not obvious to me at all when I first read it. What does it mean?

1. You can define data types recursively
2. You can use functions as values
3. So you can build the following data type

`data Silly a = Very (Silly -> a)`

With the `Silly` data type I can recurse indefinitely:

``````bad a :: Silly a -> a
bad (Very f) = f (Very f)

ouch :: a
``````

The `ouch` value is a non-terminating one, because the evaluation of `bad` will use `bad` itself with no case for termination. Rule n.2 is here to prohibit this. You can't define a data type with a function that use this data type as an input value. The fact that this rule was not obvious raises questions:

• how do we know that we have found all the rules for ESFP?
• is there a "minimal" set of rules?

I have no answer to those questions unfortunately :-(.

#### Rule n.3: recursion must be structural

The previous rule was about data type recursion, this one is about function recursion. More precisely the rule says that: "each recursive function call must be on a syntactic sub-component of its formal parameter". This means that if you make a recursive call with the function being defined, it has to be on only a part of the input data, like with the ubiquitous `factorial` function:

``````factorial :: Nat -> Nat
factorial Zero = 0
factorial (Suc Zero) = 1

-- we recurse with a sub-component of (Suc n)
factorial (Suc n)    = (Suc n) * (factorial n)
``````

This rule is not too hard to understand. If you always recurse on data that is provably "smaller" than your input data, you can prove that your function will terminate. This rule can also accomodate the Ackermann function, which has 2 `Nat` parameters:

``````ack :: Nat Nat -> Nat
ack 0 n = n + 1

-- m + 1 is a shortcut for (Suc m)
ack (m + 1) 0       = ack m 1
ack (m + 1) (n + 1) = ack m (ack (m + 1) n)
``````

By the way, if you don't know the Ackermann function, look it up! It has an amazing growth, with very small numbers. Try to evaluate `ack 4 3` for fun :-)

How restrictive is this rule? Not so much. This rule authorizes to program:

1. "primitive recursive" functions. Those are the functions, like `factorial`, which are defined using only simple recursion (think how `addition (Suc n) m` can be defined in terms of `addition n m`) and by composing other primitive functions

2. other "total recursive" functions, like `ack` ("total" meaning "terminating" here). Indeed `ack` is not "primitive recursive". It was actually especially created to show that not all "total recursive" functions were "primitive recursive"

It turns out that all the functions which we can be proven to terminate by using first-order logic (without `forall` and `exists`) can be programmed with structural recursion. That's a lot of functions, but we have to change or programming practices in order to use only structural recursion. Let's look at an example.

##### Coding fast exponentiation

We want to code the "power" function. Here's a naive version:

``````pow :: Nat -> Nat ->
pow x n = 1,                   if n == 0
= x * (pow x (n - 1)), otherwise
``````

It is primitive recursive but not very efficient because we need `n` calls to get the result. We can do better than this:

``````pow :: Nat -> Nat -> Nat
pow x n = 1,                       if n == 0
= x * pow (x * x) (n / 2), if odd n
= pow (x * x) (n / 2),     otherwise
``````

Unfortunately this version is not primitive recursive, because when we call `pow` we're not going from `n + 1` to `n`. It is however obvious that we're "descending" and that this algorithm will terminate. How can we re-code this algorithm with primitive recursion?

The trick is to encode the "descent" in a data type:

``````-- representation of a binary digit
data Bit  = On | Off

-- we assume a built-in bits function
bits :: Nat -> List Bit

-- then the definition of pow is primitive recursive, because we descend on the Bit data type
pow :: Nat -> Nat -> Nat
pow x n = pow1 x (bits n)

pow1 :: Nat -> List Bit -> Nat
pow1 x n = 1
pow1 x (Cons On rest)  = x * (pow1 (x * x) rest)
pow1 x (Cons Off rest) = pow1 (x * x) rest
``````

David Turner conjectures that many of our algorithms can be coded this way and for cases where it get a bit hairier (like for Euclid's `gcd`) we could authorize another type of recursion called "Walther's recursion". With this kind of recursion we can recognize a larger class of programs where recursion is guaranteed to terminate because we only use operations effectively "reducing" the data types (acknowledging that `n / 2` is necessarily lower than `n`). Using "Walther's recursion" with a language having functions as first-class values is still an unsolved problem though.

### Codata for "infinite" computations

After having looked at the rules for bounding recursion and avoiding non-termination, we would still like to be able to program an operating system. The key insight here is that our functions need to terminate but our data doesn't need to. For example, the `Stream` data type is infinite:

``````data Stream a = Cons a (Stream a)
``````

In this perspective, an operating system is a bunch of functions acting on an infinite stream of input values. Let's call this type of infinite data, "codata", and see how we can keep things under control.

Here's our first definition, `Colist` (similar to the `Stream` type above):

``````-- "a <> Colist a" stands for "Cocons a (Colist a)"
codata Colist a = Conil | a <> Colist a
``````

Does it breach the "Strong normalization" property that we described earlier? It doesn't if we say that every expression starting with a coconstructor is not reducable (or is in "normal form"), there is nothing which can be substituted. Whereas with a regular `List` you can evaluate:

``````Cons (1 + 2) rest = Cons 3 rest
``````

This is why we need to have a new keyword `codata` to explicitly define this type of data. In a way, this is making a strong distinction between "lazy" and "strict" data, instead of using lazy evaluation to implement infinite data types.

#### Primitive corecursion

The equivalent of Rule n.3 for codata is that all recursion on codata must be "primitive corecursive". You have any idea of what it is? It is kind of the opposite / dual of the structural recursion for data types. Instead of proving that we always reduce the input value when making a recursive call we need to prove that we "augment" the result, by always using a "coconstructor" to create the result:

``````-- functions on codata must always use a coconstructor for their result
function a :: Colist a -> Colist a
function a <> rest = 'something' <> (function 'something else')
``````

Then we can define functions like:

``````ones :: Colist Nat
ones = 1 <> ones

fibonacci :: Colist Nat
fibonacci = f 0 1
where f a b = a <> (fibonacci b (a + b))
``````

Another way of looking at the difference between data/recursion and codata/corecursion is that:

• recursion on data is safe if we break-up the data in smaller pieces then recurse on those pieces
• corecursion on codata is safe if we apply the recursion first then put that in a coconstructor which declares that we have infinite data

Codata is data which is (potentially) infinite, but observable. There are methods to safely extract results, one at the time, in a terminating way. That's exactly what the fibonacci function does. It builds a sequence of results, so that you can safely extract the first one, then you have a way to build the rest of the values.

This is why people also talk about the duality between data and codata as a duality between constructors and destructors. Or a duality between accessing the internals of a data type and observing the behavior of a codata type.

#### Coinductive proofs

When we try to reason about codata and corecursion, we can use coinduction (you saw that one coming didn't you :-)?). The principle of coinduction is the following:

2 pieces of codata are the same, "bisimilar" in the literature, if:

• their finite part are equal (the "`a`" in "`a <> rest`")
• their infinite part are the same

In other words, if the 2 sequences always produce the same values. Using that principle we can show the following theorem on infinite structures:

``````-- iterate a function: x, f x, f (f x), f (f (f x)),...
iterate f x = x <> iterate f (f x)

-- map a function on a colist
comap f Conil =  Conil
comap f a <> rest = (f a) <> (comap f rest)
``````

Now, we'd like to show that `iterate f (f x) = comap f (iterate f x)`:

``````iterate f (f x) = (f x) <> iterate f (f (f x))       -- 1. by definition of iterate
= (f x) <> comap f (iterate f (f x)) -- 2. by hypothesis
= comap f (x <> iterate f (f x))     -- 3. by definition of comap
= comap f (iterate f x)              -- 4. by definition of iterate
``````

The coinduction principle is used exactly in step 2. If "`iterate f (f (f x))`" and "`comap f (iterate f (f x))`" are the same, then adding a new value "`(f x)`" will preserve equality.

#### Limitations

The definition of "primitive corecursive" is a bit restrictive. It prevents useful, and well-founded, definitions like:

``````-- "evens" would need to come first after <> for this definition to be corecursive
evens = 2 <> (comap (+2) evens)
``````

Note that we can't allow any kind of construction with "<>". For example the `bad` function below is not terminating:

``````-- infinite lists
data Colist a = a <> Colist a

-- the tail of an infinite list
cotail a :: Colist a -> Colist a
cotail a <> rest = rest

-- don't do this at home
``````

In "Ensuring streams flow", David Turner shows that it is possible to develop an algorithm which will check where the corecursion is safe and where it isn't. The formal argument takes a few pages of explanation but the idea is simple. We need to count the "levels of guardedness":

• having `evens` recursing inside `comap` is not a problem, because `comap` will add a new coconstructor

• having `bad` recursing inside `cotail` is a problem, because `cotail` "removes" a coconstructor

So the idea of the algorithm is to count the number of times where we "add" or "remove" coconstructors to determine if the corecursion will be safe or not.

To conclude with the presentation of codata, I want to briefly introduce comonads and give a short example to give an intuition of what it is.

If you look at the definition in the books and listen to the category theory people, you will read or hear something like: "get the monad definition, change the arrows direction and you have a comonad":

``````-- the comonad operations

-- the dual of return or unit for monads
extract :: W a -> a

-- the dual of bind or flatMap for monads
cobind :: W a -> b -> W a -> W b
``````

Intuitively you can think of a comonad as:

• "I can extract things from a Context" (to contrast with the "I can put things in a context" of monads)

• If I have a function which "computes values from inputs values which might change depending on the context": `W a -> b`, then I can use "values in context", `W a`, to return other "values in context", `W b`

This is still not quite clear, so let's give a simple example based on `Colist`. `Colist` is a comonad which we can use like that:

``````-- a Colist of Nats where each new value is the previous value + 1
nats = 0 <> comap (+1) nats

-- a function taking the first 2 elements of a Colist
firstTwo a :: Colist a -> (a, a)
firstTwo a <> b <> rest = (a, b)

-- now, let's cobind firstTwo to nats
cobind firstTwo nats = (0, 1) <> (1, 2) <> (2, 3) <> ...
``````

Sometimes you can read that `List` is a monad representing the fact that a function can produce non-deterministic results ("zero or many"). By analogy with the `List` monad, we can say that `Colist` is a comonad that represents the function can have non-deterministic inputs.

### Conclusion

I touched on a lot of subjects with this post, for which there would be a lot to say (and certainly with less approximation that I did). In the first part we saw that, when dealing with finite data, the notions of termination, computation and recursivity and intimately related. In the second part we saw that it can be beneficial to give a special status to infinite data and to provide for it the same kind of tools (proofs with corecursion, comonads) that we usually have for data.

I hope that the subject of codata will become more and more popular (this recent proposal for a `codo` notation in Haskell might be a good sign). We can see a lot of architectures these days turning to "Event Sourcing". I suspect that Codata, Comonads, Costate and all the Cotypeclassopedia will prove very useful in dealing with these torrents of data.

Unknown said...

You might want a Nil cases for this data type because otherwise it is an infinite stream.

data NonEmptyList a = NCons a (NonEmptyList a)

Eric said...

You're right, there's a problem, but I don't think I can add Nil like that because that would be empty. What I should write is:

data NonEmptyList a = NonEmptyList a (List a)

Unknown said...

Hi, nice post! BTW you should cite the seminal work by Dave Turner on "Elementary Strong Functional Programming" which deals precisely with these issues.

conor said...

Thanks for putting this together. I'd add that total languages can perfectly well be considered Turing complete if they can express, say, a TM-simulation as a coinductive computation, giving it an honest type which shows that you can run the thing for any finite time but without guarantee of halting. Similarly, you can write a self-interpreter with a type that conservatively overestimates the risk of nontermination.

Total programming makes no effective restriction on the programs you can express, documenting risk honestly with appropriate types. Partial programming makes a severe restriction on the promises you can make your type system deliver.

conor said...

Apologies for duplicate comment, now deleted. Back button considered harmful.

Eric said...

@Unknown, I did cite his work in the "Note" in italics at the beginning of the work. David Turner's article is precisely what brought me to write this post.

Eric said...

@conor, interesting thoughts, I like this idea of "honest types" :-) Would you happen to know if anyone pursued ESFP, with a concrete language, at least for teaching? You write that there's no effective restriction on programs which can be express but I'd like to understand what would be the costs in terms of programming convenience.

Mark D. Albin said...

Good points, thanks for sharing.

Sean Seefried said...

I really enjoyed this post Eric. There is much food for thought in Turner's work. Agda is a strongly-normalising programming language that I've had a lot of fun playing with, although I have yet to try using codata with it.

conor said...

I know that Turner and Telford did some work on ESFP after Turner's initial "manifesto" (the progenitor of the JFP paper). I don't know how far they took it or whether there's a "system".

Tarmo Uustalu gave a talk some years back http://cs.ioc.ee/~tarmo/tday-veskisilla/uustalu-slides.pdf exploring various monadic encapsulations of partiality. The upshot is that risk of nontermination can be packaged like risk of other effects.

Sometimes, Haskell programmers have reacted to the news that they can still write their dangerous programs, but in a monad, with a distaste which betrays a sense that monadic style is disgusting and suited only to doing disgusting imperative things. I have a certain nostalgie de la boue, looking back fondly on my days writing side-effecting programs applicative style in ML. I'd like to work in the same applicative style no matter which effects (partiality, IO, exceptions, nothing at all...) are enabled and track the permissions in the types. The fact that Haskell does not let me do such a thing, does not mean it's impossible, just that it's a reason to wish for better than Haskell. I've been playing around with alternatives, leading to my new toy, Frank (https://personal.cis.strath.ac.uk/conor.mcbride/pub/Frank/). Frank makes thunking and forcing explicit, but overloads normal applicative syntax for any free monad you like: Frank's type system checks that you only use the side-effects you're entitled to. Like Eff, there's a construct which enables you to make new effects locally available by showing how to handle them.

I recently gave a talk https://personal.cis.strath.ac.uk/conor.mcbride/pub/Totality.pdf on the topic where I explore what partial programming might look like in a total dialect of Frank. The idea is that you write a general recursive program just as before, but the value that's constructed lives in a free monad, with recursive calls becoming monadic operations. Then you can pick your own semantics.

Eric said...

Thanks for all the links, I have lots to study from there!

rdm said...

I am somewhat dubious about your treatment of the issue illustrated by division, here. I am not convinced you are wrong, but ...

A general problem seems to be that the implementation models some external process, and the model may diverge from an accurate representation. When this happens, further computation ceases to be useful. And, detecting this issue can be useful for resource management purposes. (I feel that this is a restatement of the halting problem in concrete terms.)

If we limit our scope to problems which can be completely resolved at compile time, this of course is a non-issue. And, there's something to be said for restructuring requirements to eliminate such issues.

But some ill-bounded problems (for example: least squares fitting) seem interesting and are sometimes useful at runtime, even though we cannot guarantee statically that the results will always be valid.

Seth Tisue said...

Eric, you have a rare ability to write clearly on difficult topics. Bravo.

Unknown said...

Thanks Eric for an interesting post. I'm only just starting to explore concepts like total functions, codata etc. One thought I've been pondering is why in purely functional languages like Haskell it's not simply just assumed that all functions are total unless they are marked otherwise by the programmer? That would simplify reasoning and optimization of the majority of the program greatly, and in practice I don't think it would be any more unsafe than assuming all functions are partial. A simple totality checker can then be added to infer obvious cases where totality status is incorrect. Idris is one language which has implemented ideas in this direction.