21 September 2016

A neat trick from ICFP 2016

ICFP is an awesome conference where I can’t pretend I understand every presentation I’m attending but there is definitely lots of food for thought and some gems which makes it immensely worthwhile.

The gem I want to share here is a paper called “All sorts of permutations”.

This paper shows that you can use the filterM function with a predicate like a -> [True, False] and generate all the sublists of a list.

Similarly, you can evolve the sort function into sortM, taking a monadic comparison function: a -> a -> m Bool. If the comparison function is a -> a -> [True, False] the sortM function will generate the permutations of a list.

The paper goes on exploring if sortM generates all the permutations of a list or exactly the permutations of a list depending on the implementation of the sortM function (insertion sort, merge sort, quicksort,…) and tries to set various restrictions on the comparison function to get there.

At the end of the talk Philip Wadler asked the following question (paraphrased):

This is a neat trick to generate permutations but why should I teach it to my students, considering that there are more ‘classical’ implementations for getting the permutations of a list?"

I think that it is worth showing students because this sortM function gives us a very neat way of writing a QuickCheck generator for distinct elements!

It is indeed not uncommon to have to generate lists of distinct elements for testing applications. For example if you want to test a function scheduling different tasks and these tasks are supposed to be unique, but arriving in any order.

There is unfortunately no permutations :: [a] -> Gen [a] combinator in QuickCheck. So how can you produce random Lists with distinct elements?

One way is to take an existing list of distinct elements and shuffle it to get a random permutation of those elements. This is exactly what sortM gives us, we just need to use the right monad, which happens to be Gen instead of []!

Let’s take an example. Say you want to generate all the permutations of ["Homer", "Marge", "Bart", "Lisa"]

import Test.QuickCheck

-- An implementation of sortM
sortM :: Monad m => (a -> a -> m Bool) -> [a] -> m [a]
sortM _ [] = return []
sortM cmp (x : xs) = do
  ys <- sortM cmp xs
  insertM cmp x ys

insertM :: Monad m => (a -> a -> m Bool) -> a -> [a] -> m [a]
  insertM _ x [] = return [x]
  insertM cmp x yys@(y : ys) = do
    b <- cmp x y
    if b then return (x : yys)
    else fmap (y:) (insertM cmp x ys)

-- A generator for lists of distincts "Simpsons"
type Simpson = String

simpsons = ["Homer", "Marge", "Bart", "Lisa"]

distinctSimpsons :: Gen [Simpson]
distinctSimpsons = sortM (\_ _ -> choose(True, False)) simpsons

Let’s test it:

λ> sample distinctSimpsons

I think this answers Philip’s question, this is a nice application of the sortM “trick” which can be taught to students learning functional programming.

And what looks really promising is that this idea extends to the generation of other data structures. For example we can generate the partitions of a given set with groupByM:

partitionM :: Monad m => (a -> m Bool) -> [a] -> m ([a], [a])
partitionM _ [] = pure ([], [])
partitionM p (x : rest) = do
  b        <- p x
  (xs, ys) <- partitionM p rest
  if b then pure (x:xs, ys)
  else      pure (xs, x:ys)

groupByM :: Monad m => (a -> a -> m Bool) -> [a] -> m [[a]]
groupByM _ [] = pure []
groupByM cmp (x : rest) = do
  (xs, ys) <- partitionM (cmp x) rest
  ps       <- groupByM cmp ys
  pure $ (x:xs):ps

Then we can get all the set partitions of ["Homer","Lisa","Marge","Bart"]

λ> pl $ groupByM (\_ _ -> [True, False]) simpsons

or get some random partitions of that set:

λ> sample $ groupByM (\_ _ -> choose(True, False)) simpsons

Some partitions can be drawn twice but the generation should be uniform over the set of all possible partitions (it would be better to not take my word for it and prove it though :-)).

I suspect that this is more than a useful trick and that this deserves a more systematic treatment to understand all the data structures which we can generate by using the appropriate functions taking predicates or relations as parameters. But for now it is already useful as it is!