Friday, 13 December 2013

Conceptualising map and flatMap

This time, we’re got a specially selected guest post from Chris at CodingFrog.  He’s way further down the Scala-learning path than I am, but that just means this post contains maximal goodness and he’s very kindly offered to share it with us. It certainly furthered my understanding in many areas, and made me want to look more into monads.  Take it away sir…

In this guest post, I wanted to address a few thoughts about map and flatMap. A number of types in the standard Scala libraries (so-called "monads", though there's a little more to monads than this - but this is not a monad post) have these helpful methods. Both map and flatMap are higher-order functions, meaning they take functions as arguments and apply these to the type's contents.

The first exposure most developers have to the map operation is in the context of a collection type. The map operation on a collection applies a mapping function to all the contents of the collection. Given a collection, for example a list, and a function, the function is applied to each element in the collection and a new collection made up of the results is returned. The mapping function takes a single argument, of the same type (or a supertype) of the collection contents, and returns a result, potentially of another type:

scala> def myFun(x: Int) = x * x

myFun: (x: Int)Int

scala> List(1,2,3) map { myFun }

res0: List[Int] = List(1, 4, 9)

This example shows a named function being passed to map, but of course a lambda could be used as well:

scala> List(1,2,3) map { x => x * x }

res1: List[Int] = List(1, 4, 9)

We can also use the '_' placeholder, though I personally like to be slightly more explicit with my lambdas to maintain readability. Note that I am also using infix notation for map and flatMap in all these examples, but the dotted calling style is perfectly valid as well.

Before going further, I'd like to comment on the naming of map and the relationship between the operation and the data type which shares this name. These two uses of “map” are slightly different, but there is a connection. First, the Map data type, as you likely know, is a type which contains key-value pairs. If you think about this slightly differently, this is a conceptual function which "maps" a relationship - given a key input, it produces a value output. The map operation applies a mapping function to values in a collection. You could even pass a Map data type to a map function to convert a collection of keys into a collection of values!:

scala> val myMap = Map(1 -> 1, 2->4, 3->9)

myMap: scala.collection.immutable.Map[Int,Int] = Map(1 -> 1, 2 -> 4, 3 -> 9)

scala> List(1,2,3) map { myMap }

res2: List[Int] = List(1, 4, 9)

Like map, the first exposure to flatMap is usually in a collection context. Suppose instead of a squaring function that returns a single value, you have a mapping function that returns a list from a given input:

scala> List(1,2,3) map { x => for {

     | y <- 1 to x toList

     | } yield y

     | }

res3: List[List[Int]] = List(List(1), List(1, 2), List(1, 2, 3))

In this case, instead of a simple list List[Int], we got a nested list List[List[Int]]! This may be what we want, but suppose we wanted to have a simple list. We can "flatten" this nested list, to remove a level of wrapping and give us a single list. flatMap does that for us in one step. So far so good:

scala> List(1,2,3) flatMap { x => for {

     | y <- 1 to x toList

     | } yield y

     | }

res4: List[Int] = List(1, 1, 2, 1, 2, 3)

So thinking about collections, it seems like we will be using the map operation much more frequently than we use the flatMap operation. After all, we are much more likely to use a function which gives us a single output for an input rather than one that creates nested collections. Again, we will have cases where we DO want that, but they are less common than the straightforward case.

But this isn't the case for other monad types. Let's look at the type Try. The Try type is a type which represents an operation that can fail. Try has two values - Success(t) which contains a successful result t; and Failure(ex) which contains an exception. In other words, instead of throwing the exception, we've grabbed it and stuffed it in a box. You don't know until you look in the box whether it contains the Success or the Failure (and you haven't jumped up the call stack to a catch clause). You can find out what is in the box with pattern matching:

scala> def examine(myTry: Try[Int]): Int = {

     | myTry match {

     | case Success(t) => t

     | case Failure(e) => -1

     | }

     | }

examine: (myTry: scala.util.Try[Int])Int

This is where I started to struggle to adapt my intuition of List map and flatMap, to the Try ideas of map and flatMap. So what would these mean? Map takes a function which takes a value (of type T) and converts it to another value (of type U, which may or may not be the same as type T). Now map on Try has a specific behaviour guarantee: for Success values, it will call the function on the value in the Success, and wrap the result back up in a Try.:

case Success(t) => Try(f(t))

For a Failure, it just returns this - in other words, it skips calling the function, and just returns the original failure:

case Failure(e) => this

But what if our mapping function wants to be able to represent failures? It's not hard to imagine wanting to have a sequence of operations, each of which could throw some exception. The chain will return a value if all operations in the sequence are successful, or else return the first failure. We can do this by using map. In this case, calling map on the first Try, will call the mapping function, which itself returns a Try value, and wrap that in a Try value:

scala> import scala.util.{Try, Success, Failure}

import scala.util.{Try, Success, Failure}


scala> def dangerOp1(i: Int): Try[Int] = {

     | if (i != 0) Success(i)

     | else Failure(new IllegalArgumentException)

     | }

dangerOp1: (i: Int)scala.util.Try[Int]


scala> val myTry = Success(5)

myTry: scala.util.Success[Int] = Success(5)


scala> myTry map { dangerOp1 }

res5: scala.util.Try[scala.util.Try[Int]] = Success(Success(5))


scala> val myTry2 = Failure(new NoSuchElementException)

myTry2: scala.util.Failure[Nothing] = Failure(java.util.NoSuchElementException)


scala> myTry2 map { dangerOp1 }

res6: scala.util.Try[scala.util.Try[Int]] = Failure(java.util.NoSuchElementException)

So we've gone from Try[T] to Try[Try[T]], just because we wanted to be able to have our mapping function return some exception cases. This could easily get out of hand if we want to keep mapping more and more functions which could return exceptional cases with Try! What can we do?

If we look back at our List example, we see that this is really the same case as when our mapping function itself returned a List. There we went from List[T] to List[List[T]], while here we're going from Try[T] to Try[Try[T]]. Our solution here is the same as it was there: we need to flatten a layer of nesting. We could take our first result, map our function, and then flatten the result, or we could do the map and the flatten in one step:

scala> myTry flatMap { dangerOp1 }

res7: scala.util.Try[Int] = Success(5)


scala> myTry2 flatMap { dangerOp1 }

res8: scala.util.Try[Int] = Failure(java.util.NoSuchElementException)

Learning map on List gave us this good (or at least, better) intuition for what mapping a higher order function onto that List means. But this didn't give us quite a rich enough intuition of what flatMap means. Applying functions that create lists to members of a list is just not a common enough pattern to really feel it in our bones. Other types, though, will give us the nested construction as our default pattern. It's quite easy, maybe even expected, to run a sequence of steps that each produce exceptions wrapped in Try. A sequence of operations where each one may return an error value wrapped in the Option monad seems pretty likely. When we step up to doing asynchronous coding, we can easily envision combining a sequence of operations that will complete in the future, hooking each up to be run in sequence whenever the previous step completes, by using the Future monad. In all these cases, flatMap is a much more natural and basic operation than map, which would keep wrapping each step in another level of nesting. By studying map and flatMap in terms of these types, we can get a better intuitve feel for how these operations combine values of these types, rather than falling back to our List intuition of these operations.

All of this has nice implications for for-comprehensions, which rely heavily on flatMap internally. But let's leave that for further study, and maybe another post.