06 March 2014

Streaming with previous and next

The Scalaz streams library is very attractive but it might feel unfamiliar because this is not your standard collection library.

This short post shows how to produce a stream of elements from another stream so that we get a triplet with: the previous element, the current element, the next element.

With Scala collections

With regular Scala collections, this is not too hard. We first create a list of all the previous elements. We create them as options because there will not be a previous element for the first element of the list. Then we create a list of next elements (also a list of options) and we zip everything with the input list:

def withPreviousAndNext[T] = (list: List[T]) => {
  val previousElements = None +:
  val nextElements     = list.drop(1).map(Some(_)) :+ None

  // plus some flattening of the triplet
  (previousElements zip list zip nextElements) map { case ((a, b), c) => (a, b, c) }

withPreviousAndNext(List(1, 2, 3))

> List((None,1,Some(2)), (Some(1),2,Some(3)), (Some(2),3,None))

And streams

The code above can be translated pretty straightforwardly to scalaz processes:

def withPreviousAndNext[F[_], T] = (p: Process[F, T]) => {
  val previousElements = emit(None) fby
  val nextElements     = p.drop(1).map(Some(_)) fby emit(None)

  (previousElements zip p zip nextElements).map { case ((a, b), c) => (a, b, c) }

val p1 = emitAll((1 to 3).toSeq).toSource

> Vector((None,1,Some(2)), (Some(1),2,Some(3)), (Some(2),3,None))

However what we generally want with streams is combinators which you can pipe onto a given Process. We want to write

def withPreviousAndNext[T]: Process1[T, T] = ???

val p1 = emitAll((1 to 3).toSeq).toSource
// produces the stream of (previous, current, next)
p1 |> withPreviousAndNext

How can we write this?

As a combinator

The trick is to use recursion to keep state and this is actually how many of the process1 combinators in the library are written. Let's see how this works on a simpler example. What happens if we just want a stream where elements are zipped with their previous value? Here is what we can write:

def withPrevious[T]: Process1[T, (Option[T], T)] = {

  def go(previous: Option[T]): Process1[T, (Option[T], T)] =
    await1[T].flatMap { current =>
      emit((previous, current)) fby go(Some(current))


val p1 = emitAll((1 to 3).toSeq).toSource
(p1 |> withPrevious)

> Vector((None,1), (Some(1),2), (Some(2),3))

Inside the withPrevious method we recursively call go with the state we need to track. In this case we want to keep track of each previous element (and the first call is with None because there is no previous element for the first element of the stream). Then go awaits a new element. Each time there is a new element, we emit it, then call recursively go which is again going to wait for the next element, knowing that the new previous element is now current.

We can do something similar, but a bit more complex for withNext:

def withNext[T]: Process1[T, (T, Option[T])] = {
  def go(current: Option[T]): Process1[T, (T, Option[T])] =
    await1[T].flatMap { next =>
      current match {
        // accumulate the first element
        case None    => go(Some(next))
        // if we have a current element, emit it with the next
        // but when there's no more next, emit it with None
        case Some(c) => (emit((c, Some(next))) fby go(Some(next))).orElse(emit((c, None)))


val p1 = emitAll((1 to 3).toSeq).toSource
(p1 |> withNext)

> Vector((1,Some(2)), (2,Some(3)), (2,None))

Here, we start by accumulating the first element of the stream, and then, when we get to the next, we emit both of them. And we make a recursive call remembering what is now the current element. But the process we return in flatMap has an orElse clause. It says "by the way, if you don't have anymore elements (no more next), just emit current and None".

Now with both withPrevious and withNext we can create a withPreviousAndNext process:

def withPreviousAndNext[T]: Process1[T, (Option[T], T, Option[T])] = {
  def go(previous: Option[T], current: Option[T]): Process1[T, (Option[T], T, Option[T])] =
    await1[T].flatMap { next => { c =>
        emit((previous, c, Some(next))) fby go(Some(c), Some(next))
          go(previous, Some(next))
        ).orElse(emit((current, next, None)))
  go(None, None)

val p1 = emitAll((1 to 3).toSeq).toSource
(p1 |> withPreviousAndNext)

> Vector((None,1,Some(2)), (Some(1),2,Some(3)), (Some(2),3,None))

The code is pretty similar but this time we keep track of both the "previous" element and the "current" one.

emit(last paragraph)

I hope this will help beginners like me to get started with scalaz-stream and I'd be happy if scalaz-stream experts out there leave comments if there's anything which can be improved (is there an effective way to combine withPrevious and withNext to get withPreviousAndNext?

I finally need to add that, in order to get proper performance/side-effect control for the withNext and withPreviousAndNext processes you need to use the lazy branch of scalaz-stream. It contains a fix for orElse which prevents it to be evaluated more than necessary.

1 comment:

Mortimer said...

you say "what we generally want with streams is combinators which you can pipe onto a given Process", why is that? Is it because zip would load the whole stream in memory?