Scala Eye for the Java Guy: June 2013

Thursday, 27 June 2013

Pattern Matching Syntax – Case by Case

I’ve just read about Patter Matching. The concepts made perfect sense, but as is frequently the case with me, the syntax still seemed hard to grasp. Why were there all these bits? Which bit went where? Why?

def my_method(myArg:TheArgType) = myArg match {   // match myArg against the following cases
    case myArg_is_same_as_this => do_this()       // and return the result I’m guessing
    case _ => do_that()                           // (ditto)
}

Scala in Action, Nilanjan Raychaudhuri, Chapter 2 (with a few slight changes for additional clarity)

That seems to be the simplest form. That makes perfect sense. Next we kick it up a notch to replace simple value-matching for type-matching:

def my_method(arg:AnyRef) = arg match { // match the arg against the following cases
case s:String => do_this() // now we have the object:Type syntax
case _ => do_that()
}

Scala in Action, Nilanjan Raychaudhuri, Chapter 2 (with a few slight changes for additional clarity)

Having stepped through it, this still makes a lot of sense – especially the reappearance in the pattern of the object:Type syntax from class and method declarations. Jumping ahead a little (but not too much) I can also see that I can use the matched item reference (e.g. “s”) on the right hand side of the rocket (“=>”):

def my_method(arg:AnyRef) = arg match { // match the arg against the following cases
case s:String => “this works as I know I’ve got a String “ + s
case _ => do_that()
}

Now I’m ready to kick it up the next notch and try on some “Infix Operator Patterns”. First off, I’m forearmed – I know what an Infix Operator is. Next up, my initial reaction is that this syntax looks a little impenetrable:

scala> List(1, 2, 3, 4) match {
            case firstItem :: secondItem :: restOfTheItems => List(firstItem, secondItem)
            case _ => Nil
        }
scala> List[Int] = List(1, 2)

Scala in Action, Nilanjan Raychaudhuri, Chapter 2 (with a few slight changes for additional clarity)

I must admit, the first time I read this It didn’t parse at all. Having now crept up on it slowly, I think I see why. It’s not the sudden abundance of double colons; rather it’s the purpose of the match. We’re using things here to do some extraction, and the case statement is in some ways just a wrapper to ensure that nothing untoward goes on when there isn’t a firstItem and secondItem present.

There’s also the issue that as this example has suddenly moved to the REPL, and we’re just defining our match as a one-off, operating on an explicit List literal which will always match the first pattern. Consequently the second pattern seems a little superfluous. That confused me, as I thought I was missing something to begin with.

Finally (for now) we’re throwing some additional guard clauses into the mix. Again, I’m forearmed, having met guard clauses before.

def rangeMatcher(num:Int) = num match {
    case within10 if within10 <= 10               => println(“within 0 to 10”)
    case within100 if within100 <= 100              => println(“within 11 to 100”)
    case beyond100 if beyond100 < Integer.MAX_VALUE => println(“beyond 100”)
}

Looking back, guard clauses were just “if some_comparison_which_evaluates_to_true_or_false”. In this case we’ve been able to use one where previously we had the type in the type matcher. If I wrap the guard clause in (redundant) parentheses then it’s a little clearer where the boundaries are

def rangeMatcher(num:Int) = num match {
    case within10 (if within10) <= 10 => println(“within 0 to 10”)
    case within100 (if within100) <= 100 => println(“within 11 to 100”)
    case beyond100 (if beyond100) < Integer.MAX_VALUE => println(“beyond 100”)
}

Yet again, moving my way through it slowly has helped all the pieces fall into place. Again Scala wins the prize for not-immediately-obvious-syntax, but then that is eclipsed by winning the prize for incredibly-powerful-in-a-usable-way.

Wednesday, 12 June 2013

Terminology Confusion “Functions / Operations / …?” (Part Two of an Ongoing Series)

Again my lack of a formal (i.e. any) Computing Science education gets the better of me. What is the difference between a Function, a Operation and a Method? (I think I half know some of them, but half-knowing has already got me into trouble.)

Lets hit wikipedia again:

Subroutine (aka Function)

From Wikipedia.

In computer programming, a subroutine is a sequence of program instructions that perform a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed. Subprograms may be defined within programs, or separately in libraries that can be used by multiple programs.

In different programming languages a subroutine may be called a procedure, a function, a routine, a method, or a subprogram. The generic term callable unit is sometimes used.^[1]

As the name subprogram suggests, a subroutine behaves in much the same way as a computer program that is used as one step in a larger program or another subprogram. A subroutine is often coded so that it can be started (called) several times and/or from several places during one execution of the program, including from other subroutines, and then branch back (return) to the next instruction after the call once the subroutine's task is done.

Operation (mathematics)

From Wikipedia.

In its simplest meaning in mathematics and logic, an operation is an action or procedure which produces a new value from one or more input values, called "operands". There are two common types of operations: unary and binary. Unary operations involve only one value, such as negation and trigonometric functions. Binary operations, on the other hand, take two values, and include addition, subtraction,multiplication, division, and exponentiation.

Operations can involve mathematical objects other than numbers. The logical values true and false can be combined using logic operations, such as and, or, and not. Vectors can be added and subtracted. Rotations can be combined using the function compositionoperation, performing the first rotation and then the second. Operations on sets include the binary operations union and intersection and the unary operation of complementation. Operations on functions include composition and convolution.

Method

From Wikipedia.

In object-oriented programming, a method is a subroutine (or procedure) associated with a class. Methods define the behavior to be exhibited by instances of the associated class at program run time. Methods have the special property that at runtime, they have access to data stored in an instance of the class (or class instance or class object or object) they are associated with and are thereby able to control the state of the instance.^[1]

Conclusion

So where does that leave us? Operations are disambiguated quite simply – they are a sub-set operators. Functions are a more general name for Methods, which is tied to Object Oriented programming. Excellent.

Tuesday, 11 June 2013

Infix and Postfix (and Prefix)

Jargon Alert! Two (OK, three) more terms from the mysterious worlds of Maths and Computing Science. Lets go to wikipedia:

Infix Notation

From Wikipedia, the free encyclopedia

Infix notation is the common arithmetic and logical formula notation, in which operators are written infix-style between the operands they act on (e.g. 2 + 2). It is not as simple to parse by computers as prefix notation ( e.g. + 2 2 ) or postfix notation ( e.g. 2 2 + ), but many programming languages use it due to its familiarity.

In infix notation, unlike in prefix or postfix notations, parentheses surrounding groups of operands and operators are necessary to indicate the intended order in which operations are to be performed. In the absence of parentheses, certain precedence rules determine the order of operations.
Update (5th September, 2013): The statement above about parentheses doesn’t refer to the use of parentheses to group the arguments passed to a method. It means that they should be used to indicate evaluation order. (c.f. Atomic Scala pp.54-57). It is common for Scala to be described as having “infix notation” but by this people generally mean that (as in the case with AtomicTest’s “is” method) a method can be called without a preceding dot (‘.’) and following parentheses for arguments. (c.f. Atomic Scala pp.119 for an example of this).

Reverse Polish Notation (aka Postfix Notation)

From Wikipedia, the free encyclopedia

Reverse Polish notation (RPN) is a mathematical notation in which every operator follows all of its operands, in contrast to Polish notation, which puts the operator in the prefix position. It is also known as postfix notation and is parenthesis-free as long as operator arities are fixed. The description "Polish" refers to the nationality of logician Jan Łukasiewicz, who invented (prefix) Polish notation in the 1920s.
The reverse Polish scheme was proposed in 1954 by Burks, Warren, and Wright^[1] and was independently reinvented by F. L. Bauer and E. W. Dijkstra in the early 1960s to reduce computer memory access and utilize the stack to evaluate expressions. The algorithms and notation for this scheme were extended by Australian philosopher and computer scientist Charles Hamblinin the mid-1950s.^[2]^[3]
…
In computer science, postfix notation is often used in stack-based and concatenative programming languages. It is also common in dataflow and pipeline-based systems, including Unix pipelines.

Polish notation (aka Prefix notation)

From Wikipedia, the free encyclopedia

Polish notation, also known as Polish prefix notation or simply prefix notation, is a form of notation for logic, arithmetic, and algebra. Its distinguishing feature is that it places operators to the left of their operands. If the arity of the operators is fixed, the result is a syntax lacking parentheses or other brackets that can still be parsed without ambiguity. The Polish logician Jan Łukasiewicz invented this notation in 1924 in order to simplify sentential logic.
The term Polish notation is sometimes taken (as the opposite of infix notation) to also include Polish postfix notation, or Reverse Polish notation, in which the operator is placed after the operands.^[1]
When Polish notation is used as a syntax for mathematical expressions by interpreters of programming languages, it is readily parsed into abstract syntax trees and can, in fact, define a one-to-one representation for the same. Because of this, Lisp and related programming languages define their entire syntax in terms of prefix notation (and others use postfix notation).

…

In Łukasiewicz 1951 book, Aristotle’s Syllogistic from the Standpoint of Modern Formal Logic, he mentions that the principle of his notation was to write the functors before the arguments to avoid brackets and that he had employed his notation in his logical papers since 1929.^[4] He then goes on to cite, as an example, a 1930 paper he wrote with Alfred Tarski on the sentential calculus.^[5]

While no longer used much in logic,^{[citation needed]} Polish notation has since found a place in computer science.

Simple. Now when we come across this again in the future, I’ll not look so blank.

[Deep Breath] For-Comprehensions

As I already noted, for-comprehensions are the point at which I lost track of the Fast Track to Scala course. I think the word “for” had lulled me into a false sense of security. It didn’t last long.

So now I’ve hit them again, in Chapter 2 of Scala in Action, and this time I’m determined to understand them. The rest of this post will comprise a running commentary on my progress towards this goal.

The Syntax

Already I can see why I got so confused:

‘for’ (‘(‘Enumerators’)’ | ‘{‘Enumerators’}’) {nl} [‘yield’] Expr

Scala in Action, Nilanjan Raychaudhuri, Chapter 2

But having just typed it in (no cut-and-paste for me), I can already see that, as before with functions and function-literals, that either parentheses and curly braces can be used to wrap the ‘Enumerators’. What else can I learn?

A Small First Step

On top of this, it must be my lucky day, as Nilanjan decides to tackle this from a perspective of tradition, and seeing as Java is the new Cobol, tradition is something that makes me very comfortable. Lets iterate through some collections:

val files = new java.io.File(“.”).listFiles
for(file <- files) {
val filename = file.getName
if(filename.endsWith(“.scala”)) println (file)
}

from Scala in Action, Nilanjan Raychaudhuri, Chapter 2

This is so damn legible that I’m already one step ahead of Nilanjan when he says that this looks very much like the equivalent foreach construct in Java. (Mmmmmm, the warm glow of familiarity.) But lets not jump ahead. we’re getting some terminology too – the “file <- files” element of all this is called a generator, and it’s job is to iterate through a collection. Why not call it an “Enumerator” like the previous syntax snippet? Don’t get too far ahead of yourself. I’m sure we’ll get to that.

OK. Continue.

Next, Add Some Definitions and Guard Clauses

Right, now we’re going to do a lot more within the generator.

for(
    file <- files;
    filename = file.getName;
    if(filename.endsWith(“.scala”))
) println (file)

from Scala in Action, Nilanjan Raychaudhuri, Chapter 2

This is precisely one of the concepts I now realise I’d missed during the course. Armed with my knowledge of the Scala-ite’s love of reducing things, I can begin to unpick what has happened here. Nilanjan tells me we can add definitions and guard clauses within a for loop. I also notice that we’ve had to add semi-colons (first sighting of them in Scala-land). The former (filename = file.getName;) simply defines a new val and the latter (if(filename.endsWith(“.scala”)) runs a check against it. If the guard clause is true, then the body of the loop executes, in this case, println (file).

Before we move on, I wonder if we can inline it even more?:

for(
file <- files;
if(file.getName.endsWith(“.scala”))
) println (file)

Sure can. I think one trap I need not to fall into here is to think of this as being like a standard Java for loop. The semi-colons (I am guessing) are there to help the compiler and that is all. I now know I can inline the definitions. But can I have more of them?:

for(
    file <- files;
    filename = file.getName;
    timeNow = new java.util.Date()
    if(filename.endsWith(“.scala”))
) println (“time now: ” + timeNow + “, file: “ + file)

Yup. This works fine, even with the just-noticed missing semi-colon (try it yourself).

But before I inadvertently fall off the deep end again, lets reel in the off-piste enthusiasm and get back to the book.

Multiple Generators

Now we’re getting complicated. Apparently we can specify more than one generator, and the loop is executed across both of them, just as if the latter was inside the former, and in light of that, the Scala syntax seems surprisingly simple (simpler than any way I can think of doing it in Java anyway):

scala> val aList = List(1, 2, 3)
aList: List[Int] = List(1, 2, 3)
scala> val bList = List(4, 5, 6)
bList: List[Int] = List(4, 5, 6)
scala> for { a <- aList ; b <- bList } println(a + b)
5
6
7
6
7
8
7
8
9

from Scala in Action, Nilanjan Raychaudhuri, Chapter 2

Nice.

One more thing before we turn to the next page, Nilanjan threw in that curly-braces-instead-of-parens trick I spotted earlier. Tricksy. He also put everything on one line. I was waiting for that, so that wasn’t so much of a surprise.

Interim Summary – The Imperative Form

Summary-time. Apparently all of these examples of the for-comprehension are in the imperative form. In this form we specify statements that will get executed by the loop (e.g. println (file) and println(a + b)) and nothing gets returns. OK. I get that. However, mapping back to the syntax definition at the top of this post, what I still don’t get is why we just talked about “statements” instead of “expr(essions)”, and “generators” instead of “enumerators”. Perhaps that will become clear later on. Lets keep on trucking.

Onward! The Functional Form (aka “Sequence Comprehension”)

Now we’re working with values rather than executing statements. That means they’re objects, and remembering that everything in Scala is an Object I’ve got my wits about me.

scala> for {a <- aList ; b <- bList} yield a + b
res3: List[Int] = List(5, 6, 7, 6, 7, 8, 7, 8, 9)
from Scala in Action, Nilanjan Raychaudhuri, Chapter 2

Now luckily for me, (although this was the point during the course when I was lulled into a false sense of security) I have come across the yield keyword before, in Ruby-land. But is it really lucky? After some back-and-forth reading / comparison, I’ve decided to not think about Ruby’s yield too much. All I want to take is the warm glow I get from a familiar keyword, but tackle the Scala meaning on it’s own terms. Lets see if I manage.

Nilanjan deftly expresses the difference between this and the previous example. Whereas originally we were just using the a and b values within the loop, now we’re returning the values from the loop. Unsurprisingly with this in mind the result is a List[Int]. We can then store this result and use it:

scala> val result = for {a <- aList ; b <- bList} yield a + b
res3: List[Int] = List(5, 6, 7, 6, 7, 8, 7, 8, 9)
scala> for (r <- result) print(r)
5
6
7
6
7
8
7
8
9

from Scala in Action, Nilanjan Raychaudhuri, Chapter 2

He then points out that while this new form is more verbose (yup) it separates the computation (adding a to b) from the use of the result (in our case, simply printing it). Does his claim that this improves reusability and compatibility? I can see how it does. And now I understand it, and have a nice syntax style I can parse, both styles seem very legible too.

Here We Are & This Is It

That was all surprisingly sane. As usual, taking it slowly, reading around a little, and trying out a few edge cases to cement my understanding helped a great deal. I think I’ll try the Koan related to this now to see how I get on, and if there is anything else I can learn.

A Haunted House

Virginia Woolf published a collection of short stories entitled “A Haunted House”. The eponymous opening story of the collection is among the first things I ever read by her as an author and it took me about 15 minutes to get off the first page. Why is this relevant to my study of Scala? Without spoiling anything about the story itself, it gave me the exact same impression of repeatedly sliding down the surface of the words, never really understanding what was going on. Scala in Action or a ghost story, both could leave me similarly stumped.

But in both cases I really wanted to grasp what was going on. I needed, I realised after a few attempts (each having ended with my looking blankly at the line of words in front of me, realising I’d missed something in the words just a few lines above), that all I needed to do was read very slowly, and very carefully. The problem was, in this information-everywhere age, with Facebook status update streams, and Twitter firehoses, and RSS, and bookmarks, and email, and …, and … it is easy to become accustomed to glancing over things; simple to say “Yup, I read it” when in fact we merely cast our eyes in it’s general direction, smiled at and only subconsciously digested the pretty pictures, checked there were no actions for ourselves and then moved on.

I’ve become so used to this way of reading that I can read an entire work of literature in this trance, not really paying attention to anything at all very much. And so here is why A Haunted House (and everything else by Woolf that I’ve managed to read) is like Scala in Action. Because, to really learn Scala, and to become a better developer, I can’t just consume all the source texts in a cursory way. I need to pick my way through them, line by line sometimes, just like Woolf has taught me to do. I need to always be aware, to keep my mind from wandering, and to stop when “my buffer overfloweth”, but plough on when I still have some mental capacity spare. And I need to enjoy myself as I go. Because challenging yourself is fun.

Monday, 3 June 2013

Terminology Confusion - “Variables” and val (Part One of an Ongoing Series)

In the grand Scala-brain-load I’ve reached variables. It seems that in this case, close reading can lead to confusion. I had assumed that when we talked about a “variable” we only meant a var, but it seems that this can also refer to a val:

“In Scala there are two ways you can define variables: var and val. A val is a single assignment variable sometimes called [a] value.” (Italics are mine)

Scala in Action, Nilanjan Raychaudhuri, Chapter 1

I’m not sniping. Nilanjan is completely correct in his use of “variable” to refer to vals but it highlights the need for me to keep my eyes open and my brain engaged.

Toire Wa?

(Thanks to @ChrisPhelps for his japanese help and input into the form of this post.)

As a language, Japanese is highly sensitive to context. (I suspect that English, as a language, is also highly sensitive to context too, but I’m trying to illustrate a point so run with me.) As a language, Japanese is highly sensitive to context. As an example, and because I’m British, lets consider the following:

Toire wa doko? - トイレはどこ。
Toire wa doko desu ka? - トイレはどこですか。

These are two forms of asking where the nearest toilet is in Japanese. The difference here is not primarily one of politeness (which is also a strong factor in the language) but instead it is one of context. If I wanted to be really terse (in an emergency for example) I could even be understood with this:

Toire wa? - トイレは。

However, if I wanted to be very specific (and, lets be honest, quite polite) I could use this form:

Toire wa doko ni arimasu ka? - トイレはどこにありますか。

And at the extreme end (which you probably won't hear apart from in samurai movies or Rurouni Kenshin) there's always:

Toire wa doko de gozaimasu ka? - トイレはどこでございますか。

In all cases, apart from the tersest, and the verbose, it is totally find with the listener to use all these forms – as long as the context is applicable.

So how does this relate to Scala? Well, it’s struck me, from the time I attended the Typesafe “Fast Track to Scala” course that context can allow you a lot of leeway here too. In fact, in discussions about how the interpreter would view the code we’d written, there was mention of how it is perfectly acceptable to leave out stuff the Scala interpreter (i.e. listener) can discern from context.

So bearing this in mind, you might understand how, as a learner, I’ve only just now seen an instance where taking syntax elements away in Scala made things more understandable. Up until this glimpse of pure joy I’d been working through every example meticulously putting things back in so that I could map back to something approaching the verbosity of Java which I grasp and feel comfortable with. Note I don’t say I like it, but it is a place of safety for me.

However, when I was in the Ruby world (for only about six months mind) I grew to like the clarity and expressiveness of much of the syntax. Things like this and this set my heart racing at the prospect of actually legible code.

So how has Scala managed it? Function literals is the answer. Take this example:

scala> val evenNumbers = List(2, 4, 6, 8, 10)
evenNumbers: List[Int] = List(2, 4, 6, 8, 10)
scala> evenNumbers.foldLeft(0) { _ + _ }
res4: Int = 30

Is it because it actually looks so much like the closure syntax in Ruby? Perhaps.

But lets have a little fiddle. Is it possible to go too far. With standard functions, we can drop the curly braces an things are still fine. Can we do this with their literal cousins? If this works:

scala> def max(a: Int, b:Int) = { if(a> b) a else b }
max: (a: Int, b: Int)Int
scala> max (10, 200)
res9: Int = 200
scala> def max(a: Int, b:Int) = if(a> b) a else b
max: (a: Int, b: Int)Int
scala> max (10, 200)
res6: Int = 200

Does this?:

scala> evenNumbers.foldLeft(0) { _ + _ }
res7: Int = 30
scala> evenNumbers.foldLeft(0) _ + _
res8: String => java.lang.String = <function1>

It seems not, although this isn’t an error, it’s certainly not what I’d hoped for. I’ll need to come back to this later when I’ve learned more.

But before I leave on a happy, satisfied note, there’s one more thing. The book has assumed I’m clever and stated:

Now, with your new knowledge of function literals, it should be pretty obvious that _.isUpper is a function literal:
val hasUpperCase = name.exists(_.isUpper)

Scala in Action, Nilanjan Raychaudhuri, Chapter 1

I’m afraid Nilanjan is being optimistic regarding myself again. Lets try some deduction and see where that gets us…

First Observation: I’d assumed that function literals would come after the method name and the parens. No evidence of this here;
Second Observation: I’d also assumed that function literals would be contained within curly braces. That’s clearly wrong too
Third Observation: isUpper has no parens

So how can I discern that it’s a function literal? The hint in the text seems to be the presence of the underscore (_), but we also learned a little earlier that they can also be used to assign default values to variables. What I can see however is the dot (.) appended after the underscore, and that (so far at least) indicates that a function is being called (isUpper), and the fact it is appended onto an underscore does strongly imply it’s a literal. Would it work with curly braces in place of the standard parens? It turns out it would:

scala> val hasUpperCase = "name".exists{_.isUpper}
hasUpperCase: Boolean = false

The final observation, not related to function literals specifically, means that think I can assume isUpper has no side effects (it has no parens of its own).

I get the feeling I’ll need to come back this. It’s solidifying, but nowhere near enough for my liking yet.