Wednesday 18 November 2015

The Slow Boat to Folding

I'm back at the Scala-face, working my way through Functional Programming in Scala by Paul Chiusano and Rúnar Bjarnason.

I'm still taking things slowly.  The current task was grokking folding.  Why? As Miran Lipovača says in Learn You A Haskell:
"Folds can be used to implement any function where you traverse a list once, element by element, and then return something based on that. Whenever you want to traverse a list to return something, chances are you want a fold. That's why folds are, along with maps and filters, one of the most useful types of functions in functional programming." 
(I've substituted italics for the authors own bold text)
My journey to this post has been a circuitous one; false assumptions, based on semantics from my primary idiom (imperative, non-maths-based programming) having led me astray.  I know that there's a lot out there already about folding in Scala and so I've attempted to avoid repeating that. Instead I've mapped out the steps which eventually helped me wrap my head around things in a way that stuck.

Why I got confused (part 1) - worrying about "starting at the left or the right and folding it in"

Before we dive into the "how I grokked it", I think it pays to define why I got confused to begin with. My first mistake was to think about these pair of functions in terms of how they ate collections. I had some bonkers origami-style image in mind: a long-piece of paper, with the function starting at one end and folding it up concertina style, element by element.

I know now I shouldn't have bothered. I'd advise you yo do the same because it'll quite possibly be confusing you too. (For starters you're making a new, separate and individual thing from the fold. You're not simply copying the collection, applying a function to each element in turn - that's mapping.)

Why I got confused (part 2) - worrying about tail recursion

And while we're in the habit of dropping things, I also eventually found it useful to stop worrying about the design of the algorithm itself; specifically, "was it tail recursive or not?" Why? It's a level of complexity you don't need in our head right now. It certainly clouded my thinking for far too long. So let it go.

Step 1 - Start thinking about associativity

Now we've freed up all this mental space, what can we put up there in our lovely spacious minds so we can make progress?  The first thing is a nice simple picture of associativity.

Now, if you're like me, you'll already "know" about this topic - it's one of the standard bits you skim when learning a new programming language right? And you skim it because it's easy and obvious, right? So why am I going to bore you with it now? Because it's going to give us our "right" and "left", that's why.

First up is left-associativity.  Say we have a sequence of numbers and operators like so:
1 - 2 + 3 = ?
If I ask you to "add that up" you'll reply "simple, it's 2".  But what happens if we add some parentheses?:
1 - (2 + 3) = ?
You'll then answer, "still simple, it's now -4".  So what changed?  Why the different answer?  Thats because we've changed the order in which you apply the operators.  We've monkey-patched the associativity.

Step 2 - Tackle left-associativity first

What we're doing when we don't have any parens (and possibly aren't really thinking about anything like associativity at all) is using left associativity - it's the default in conventional mathematical notation, and the one we learn without realising it in primary school.  It applies when two operators have the same precedence (which '+' and '-' do), and tells us that when calculating a sequence of operations of equal precedence we should start at the left hand end of the same-precedence run.

We can make it explicit with some more parens:
((1 - 2) + 3) = 2
Brilliant.  Nice and simple.

Step 3 - Tackle right-associativity second

From this, it's easy to guess what this looks like with some explicit, parens-driven right-associativity:
(1 - (2 + 3)) = -4
Nice and simple yet again.

But look, see how I've bolded the parens?  That's the important bit for us as we take the next step.  Go on, look at it a little longer - burn it into your retina a wee bit.  It helps; trust me.

(Note: This has been a ludicrously simplistic overview - if you want to know more, the wikipedia article is nice and parseable, even for non-maths-heads.  Oh, and if I was to write the calculation with proper "right associativity" I'd not need the parens - I'd just write it and say "this is right associative". I put them there however to make us do left-associativity in a "right" way.)

Step 4(a) - Reduce foldLeft

"But wait" I hear you say, "I'm always only adding (or only subtracting) when I foldLeft / foldRight. What does associativity have to do with things?"

Let's put this aside for a minute and come back to folding.  Taking a hint we just got from associativity, let's tackle foldLeft first.  In the answer to Functional Programming in Scala's question 3.10 it looks something like this:

We'll now use this function to fold a simple List, tracing the recursive calls one call at a time:

What do you see? Look at the parentheses. Do they remind you of anything?  Yup, thats it - they look mightily similar to the ones in our previous example in step 2.  What we have some here is some explicit left-associativity.

(Note: I've just assumed here that you know how to trace. Its a great technique to wrap your head round things in the Functional world.  I'm planning on writing a post about it when I get time.  For now, just read it line by line - each is equivalent, but just expands the function calls into the bodies of the function that get executed.)

Step 4(b) - Reduce foldRight

Let's crash on; ther's no point hanging around now we have this nugget.

Here's the simple foldRight from Functional Programming in Scala (listing 3.2) and the associated trace:

The parens should really be jumping out at you now.  Notice anything this time?  Yup, its the same as the previous example in step 3 - this is exlicit (i.e. parens-driven) right-associativity.

Step 5 - See the fact you are starting at the left or right end in the reduced output

Now we're building up a nice bit of "that-makes-sense" momentum lets take the next step.

Remember is step 1 I advised you forget about the "starting from the left (or right) and folding in" bit? We're now in a position to bring this back into the frame.  What might have confused you at the beginning (it confused me anyway) was the fact that the key recursion calls both  (foldLeft and foldRight) start at the head of the List in question and eat their way down to the tail (i.e. both recursively eat away at the collection, one "head" at a time).

However, looking back, we now see we can't start collapsing the fold (i.e. start applying the higher-order function, turning it into its result) until we have the innermost pair of values. In the foldLeft example this is at the left hand side of the traced output, and in foldRight this is at the right hand side.  (If you want some corroboration, see where the seed values are placed in step 4 above (the '0' - on the left in the foldLeft, and on the right in the foldRight.)

With all this in place, you now have the means to see how this fold calculation is built up, and how, when it reaches it's limit (the end of the collection) it can be collapsed again down to a result.

Nice.

Step 6 - See the fact that what is important is why one is trail recursive and one isn't

Before we close, lets continue to ride the momentum of clarity and see what else we can learn.  You may have heard tell of the fact that;
"a call is said to be in tail position if the caller does nothing other than return the value of the recursive call" 
Functional Programming in Scala, Chapter 2, pp.20
What's different is the way the calls are made: foldLeft does it straight, while foldRight does it as the second parameter to a call to the higher order function.  Look; see:


If I tell you that foldLeft is tail-recursive, and foldRight isn't, that should come as no surprise.

In foldRight, we have to apply the function to the result of the recursive call, whereas in foldLeft we have the result there and then and simply pass it back. This is clearly also the source of the "right" and "left"-ness.  The differences in these two expressions is what leads to our parens, and our starting at the right hand end or the left.  It suddenly all comes together.

Taken together it now seems self-evident (to me at least, and hopefully to you if I explained this all clearly enough) how a foldLeft could compile down into a stack-safe loop, while a foldRight would have to use a stack-imperiling non-tail recursive route.

Bonus Extra Step 9 - The order of the params to fold left and fold right is just convention

To close, as we're riding the crest of the grok-wave, it's not a bad idea to see what else we can find to slot into our noggins.

It's not really related to this post apart from the fact that it's to do with fold functions (and we're now in a position to full appreciate it) but how about the order of the params in Haskell's foldLeft and foldRight?

You'll notice if you look it up that the position of the accumulator param differs. It'd be insulting to point out that the order of params in a function doesn't generally matter, so why is this the case? Let's let Miran Lipovača explain again:
"the left fold's binary function has the accumulator as the first parameter and the current value as the second one, the right fold's binary function has the current value as the first parameter and the accumulator as the second one. It kind of makes sense that the right fold has the accumulator on the right, because it folds from the right side"
from Learn You A Haskell
Lovely.