General musings on programming languages, and Java.

Monday, July 28, 2008

Optional Values in Java

If you take some Java code and write pseudocode representing it, you'll probably find that you don't bother with null checks and you don't bother with getters and setters. Sure, in pseudocode you're lazy, but it's more than that - null is usually wrong, so much so that intentional uses of null look like sloppy code.

In fact, if you're writing an API, you probably want to keep null out of your interactions with your users - you want to make sure they realise their mistake if they give you null, and you don't want to give them null, lest they forget to check it. But there are genuine cases where you need some way of representing an optional value.

One particularly popular approach is to use sentinel values - let's say "" for Strings, Double.NaN for doubles, -1 for ints. Now everywhere you read the value you need to check for the sentinel, or be sure that not checking for it won't cause you problems.
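The JDK itself ships a sentinel of this kind: String.indexOf returns -1 when the character is absent. The class and method names below are made up for illustration; the point is that the compiler does nothing to stop the second method from skipping the check.

```java
// Sentinel values in the JDK itself: String.indexOf returns -1 for "not found".
class SentinelExample {
    // Returns the text before the first space, or the whole text if it has no space.
    static String firstWord(String text) {
        int space = text.indexOf(' ');
        if (space == -1) {          // the sentinel check every caller must remember
            return text;
        }
        return text.substring(0, space);
    }

    // The same method with the check forgotten: substring(0, -1) throws
    // StringIndexOutOfBoundsException when the text has no space.
    static String firstWordUnchecked(String text) {
        return text.substring(0, text.indexOf(' '));
    }
}
```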

Another approach is to use an empty list to represent no value, and a list of 1 element otherwise. Again you need to check whether the list is empty before getting the result out.
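A sketch of the empty-list approach, using a hypothetical nickname lookup: the isEmpty() check is a null check by another name, and forgetting it turns into an IndexOutOfBoundsException from get(0).

```java
import java.util.Collections;
import java.util.List;

class ListAsOptional {
    // Hypothetical lookup: an empty list means "no nickname", one element means "has one".
    static List<String> nicknameFor(String name) {
        if (name.equals("Richard")) {
            return Collections.singletonList("Rick");
        }
        return Collections.emptyList();
    }

    static String greeting(String name) {
        List<String> nickname = nicknameFor(name);
        if (nickname.isEmpty()) {   // this check plays the role of a null check
            return "Hello, " + name;
        }
        return "Hello, " + nickname.get(0);
    }
}
```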

You could make a class that might hold a value, that has methods called hasValue() and getValue(). Again, requires a check.
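A sketch of such a class; Holder, of and empty are hypothetical names. getValue() can only complain at runtime, so the burden of calling hasValue() first still falls on every caller.

```java
// Hypothetical holder with the hasValue()/getValue() protocol.
final class Holder<T> {
    private final T value;          // null means "no value", hidden from callers

    private Holder(T value) { this.value = value; }

    static <T> Holder<T> of(T value) {
        if (value == null) {
            throw new NullPointerException("use empty() for a missing value");
        }
        return new Holder<T>(value);
    }

    static <T> Holder<T> empty() { return new Holder<T>(null); }

    boolean hasValue() { return value != null; }

    // Only legal after hasValue() returned true; nothing forces callers to ask first.
    T getValue() {
        if (value == null) {
            throw new IllegalStateException("no value");
        }
        return value;
    }
}
```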

In all these you need to remember to check before you get the value - not much of an improvement over using null directly.

If I categorise some code including null checks (no, not nunchucks), then we'll have something to toy with:

1. foreach

if (x != null) {
 doStuffWith(x);
}
2. map
String s;
if (x == null) {
 s = null;
}
else {
 s = x.toString();
}
3. fold
int length;
if (s == null) {
 length = 0;
}
else {
 length = s.length();
}
Those are some strange names for these categories! Let's tackle foreach first. Think of a value that might be null as a collection containing 0 or 1 elements; foreach is then a loop that runs 0 or 1 times to do something with the value.
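Here is the "collection of 0 or 1 elements" view as real Java, a sketch with hypothetical names: the possibly-null reference becomes an empty or single-element list, and an ordinary foreach loop then takes the place of the null check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ZeroOrOne {
    // Views a possibly-null reference as a collection of 0 or 1 elements.
    static <T> List<T> of(T maybeNull) {
        if (maybeNull == null) {
            return Collections.emptyList();
        }
        return Collections.singletonList(maybeNull);
    }

    // The foreach category as a real loop: the body runs once for a value, never for null.
    static List<String> exclaim(String x) {
        List<String> out = new ArrayList<String>();
        for (String value : of(x)) {
            out.add(value + "!");   // no null check needed inside the loop
        }
        return out;
    }
}
```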

map is a mapping from a domain containing null to a co-domain containing null. For example, a conversion from rectangular coordinates to polar coordinates should probably yield null for a null input, if it doesn't throw an exception.
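The rectangular-to-polar example written in the null-based map style; Rectangular, Polar and Coordinates are made-up types for illustration. Note the null check at the top, which every conversion of this kind has to repeat.

```java
// Hypothetical point types for the rectangular-to-polar example.
final class Rectangular {
    final double x, y;
    Rectangular(double x, double y) { this.x = x; this.y = y; }
}

final class Polar {
    final double r, theta;   // theta in radians
    Polar(double r, double theta) { this.r = r; this.theta = theta; }
}

final class Coordinates {
    // The map category written with null: null in, null out; otherwise convert.
    static Polar toPolar(Rectangular p) {
        if (p == null) {
            return null;
        }
        return new Polar(Math.hypot(p.x, p.y), Math.atan2(p.y, p.x));
    }
}
```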

fold is a more manageable name for a 'catamorphism', a transformation that tends to yield a value simpler than the collection it's applied to (which seems the opposite of a fold in origami). For a possibly-null value, the result is simpler because it is (usually) not null.
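To make "simpler value" concrete, here is a fold that reduces a whole list of strings to one int (FoldExample is a made-up name). On a list holding 0 or 1 strings it behaves exactly like the null-checking length example: 0 for no string, the length for one.

```java
import java.util.List;

class FoldExample {
    // A fold over a list of strings: start from a default, combine each element in.
    static int totalLength(List<String> strings) {
        int acc = 0;                    // the default, returned as-is for an empty list
        for (String s : strings) {
            acc = acc + s.length();     // combine one element into the accumulator
        }
        return acc;
    }
}
```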

Being responsible non-repetitive Java programmers, we'd like to encapsulate our possibly-null value plus the checks into an object with three methods, foreach, map and fold, rather than repeating them everywhere:

interface Optional<T> {
  void foreach(Task<T> task);
  <R> Optional<R> map(Conversion<T,R> conversion);
  <R> R fold(R theDefault, Conversion<T,R> conversion);
}
(You might prefer to make Optional implement Iterable, so that Java's own foreach loop works on it, rather than providing a foreach method.)

In the same way that java.util.Collections.sort can take a Comparator, each of these methods takes an object whose method gets called if and when it is needed.

interface Task<T> { void execute(T value); }
interface Conversion<T,R> { R convert(T value); }
Let's look at how we can convert the earlier null-using code to code using Optional. Here x and s are Optionals, and map hands back another Optional, just as the null-based map handed back a possibly-null value.

1. foreach

x.foreach(doStuff);
2. map
Optional<String> s = x.map(toString);
3. fold
int stringLength = s.fold(0, length);
Of course, the likelihood is that you're not lucky enough to already have doStuff stored as a Task, toString stored as a Conversion and length stored as a Conversion, so perhaps you'd use an anonymous class to provide those. Unfortunately the syntax for anonymous classes bloats the code too much to be readable in a blog (or an IDE).
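A minimal runnable sketch of the whole thing, assuming map is declared to return an Optional<R> (so a missing value stays missing) and that null is never stored inside an Optional. Optionals, some, none and OptionalExamples are hypothetical names for this example, not the Functional Java API; the anonymous classes show exactly the bloat described above.

```java
import java.util.ArrayList;
import java.util.List;

// Callback interfaces, as in the post.
interface Task<T> { void execute(T value); }
interface Conversion<T, R> { R convert(T value); }

// Optional as in the post, with map declared to return an Optional<R>.
interface Optional<T> {
    void foreach(Task<T> task);
    <R> Optional<R> map(Conversion<T, R> conversion);
    <R> R fold(R theDefault, Conversion<T, R> conversion);
}

// Hypothetical factory: some(value) holds a non-null value, none() holds nothing.
final class Optionals {
    private Optionals() { }

    static <T> Optional<T> some(final T value) {
        if (value == null) {
            throw new NullPointerException("use none() for a missing value");
        }
        return new Optional<T>() {
            public void foreach(Task<T> task) {
                task.execute(value);                        // runs exactly once
            }
            public <R> Optional<R> map(Conversion<T, R> conversion) {
                // Throws if the conversion returns null, keeping null out of Optional.
                return Optionals.some(conversion.convert(value));
            }
            public <R> R fold(R theDefault, Conversion<T, R> conversion) {
                return conversion.convert(value);           // the default is ignored
            }
        };
    }

    static <T> Optional<T> none() {
        return new Optional<T>() {
            public void foreach(Task<T> task) { }           // runs zero times
            public <R> Optional<R> map(Conversion<T, R> conversion) {
                return Optionals.<R>none();                 // nothing in, nothing out
            }
            public <R> R fold(R theDefault, Conversion<T, R> conversion) {
                return theDefault;                          // conversion is never called
            }
        };
    }
}

// The three examples, with anonymous classes standing in for doStuff, toString and length.
final class OptionalExamples {
    private OptionalExamples() { }

    // foreach: collect the value, if any, into a list.
    static List<String> toList(Optional<String> x) {
        final List<String> out = new ArrayList<String>();
        x.foreach(new Task<String>() {
            public void execute(String value) { out.add(value); }
        });
        return out;
    }

    // map: Optional<Integer> to Optional<String>.
    static Optional<String> asText(Optional<Integer> x) {
        return x.map(new Conversion<Integer, String>() {
            public String convert(Integer value) { return value.toString(); }
        });
    }

    // fold: the string's length, or 0 when there is no string.
    static int lengthOrZero(Optional<String> s) {
        return s.fold(0, new Conversion<String, Integer>() {
            public Integer convert(String value) { return value.length(); }
        });
    }
}
```

The only places that know whether a value is present are the two anonymous classes inside Optionals; every caller gets the check for free.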

It would be useful to have good syntax for using foreach, map and fold in Java, so that there was at last an attractive alternative to null. For now we'll have to settle for attractive semantics rather than attractive syntax though.

I think this is beautiful because it provides a level of abstraction that moves you further from a potential source of bugs, makes your code more expressive about what it accepts, and lets you capture in an object logic that would otherwise be repeated everywhere.

A complete implementation of Optional is available in Functional Java under the name Option. There, Task is called E, and Conversion is called F. Option is most widely known as Maybe, from Haskell.

May your nulls rest in peace.

Monday, July 21, 2008

Designing an Object

If an object can exist with a method that is not appropriate at every point in the object's lifetime, then the object or the method is flawed.

A class encapsulating a compile phase in an IDE might have a blocking or non-blocking execute() method, plus a getErrorMessages(). There is an obvious protocol in using this class - instantiate, call execute(), call getErrorMessages(). It's not particularly hard to use, though it's also not hard to get wrong. Even if you decide not to help those users who don't bother to learn the protocol, it's worth thinking about whether that protocol should even exist, and what the alternative is.
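A minimal sketch of that protocol, under stated assumptions: the class name is hypothetical, and brace counting stands in for real compilation. Nothing in the type system stops a caller from asking for error messages before execute() has run; the class can only object at runtime.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical compile phase with a call-order protocol:
// construct, call execute(), and only then call getErrorMessages().
final class ProtocolCompilePhase {
    private final String source;
    private List<String> errors;    // null until execute() has run

    ProtocolCompilePhase(String source) { this.source = source; }

    void execute() {
        // Stand-in for real compilation (an assumption): count braces.
        int depth = 0;
        for (char c : source.toCharArray()) {
            if (c == '{') depth++;
            else if (c == '}') depth--;
        }
        errors = depth == 0
            ? Collections.<String>emptyList()
            : Collections.singletonList("unbalanced braces");
    }

    // Only appropriate after execute(); calling it earlier is the mistake the protocol invites.
    List<String> getErrorMessages() {
        if (errors == null) {
            throw new IllegalStateException("execute() has not been called");
        }
        return errors;
    }
}
```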

Many readers would probably, at least when prompted, make execute() return the error messages (or a Future for them), which solves the problem quite well. If you wouldn't, keep adding phases, plus methods only appropriate for each phase, to one class that grows and grows, until you end up agreeing or changing career :) Anyway, in this case it's clear that the object was flawed because it did two things, one after the other: executing the compile phase and delivering the results.
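A sketch of that redesign under the same assumptions (hypothetical class name, brace counting as a stand-in for compiling): execute() returns the messages directly, and executeAsync() returns a Future for them via ExecutorService.submit. There is no longer an order of calls to learn.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical redesign: execute() returns the error messages, so no protocol remains.
final class CompilePhase {
    private final String source;

    CompilePhase(String source) { this.source = source; }

    // Blocking variant: the results are the return value.
    List<String> execute() {
        // Stand-in for real compilation (an assumption): count braces.
        int depth = 0;
        for (char c : source.toCharArray()) {
            if (c == '{') depth++;
            else if (c == '}') depth--;
        }
        return depth == 0
            ? Collections.<String>emptyList()
            : Collections.singletonList("unbalanced braces");
    }

    // Non-blocking variant: a Future for the error messages.
    Future<List<String>> executeAsync(ExecutorService executor) {
        return executor.submit(new Callable<List<String>>() {
            public List<String> call() { return execute(); }
        });
    }
}
```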

I bet most people could train themselves to spot this flaw and remove it, and if anyone only goes that far as a result of reading this post I'll be happy. But most people are probably quite happy with another flaw: java.util.Iterator.next(), which may only be called when hasNext() returns true (Iterator's contract says next() throws NoSuchElementException when there are no more elements). However, moving next() or hasNext() onto another object doesn't really work for Iterator. For a long time I was unhappy with Iterator, but didn't really have a solution, despite trying a couple of things out.
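To make Iterator's protocol concrete: the first method below follows it, and the second calls next() on an exhausted iterator, which compiles fine but throws NoSuchElementException at runtime. The class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

class IteratorProtocol {
    // Correct use: ask hasNext() before every next().
    static int sum(Iterator<Integer> it) {
        int total = 0;
        while (it.hasNext()) {
            total += it.next();
        }
        return total;
    }

    // Skipping the check: returns true if next() on an exhausted iterator throws.
    static boolean nextFailsWhenExhausted() {
        Iterator<Integer> it = Arrays.asList(1).iterator();
        it.next();                  // fine: hasNext() was true
        try {
            it.next();              // hasNext() is now false
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }
}
```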

The biggest direct use of Iterator in Java for many years was in code that has since been replaced by the foreach loop. There are some detractors of, well, anything new, but generally the foreach loop was very well received by the Java community. It provides a higher-level interface than Iterator does: we can write a lot of code with the foreach loop that would have been more verbose and harder to get right using Iterator directly. But foreach is only one abstraction; there are others at a higher level still, which don't (but can) depend on Iterator. If you don't know what those abstractions are, I really think you should take the time to learn about map, filter and reduce, and the more general but less usable parent of those, fold. But this isn't a post about those, so I'll return to the topic at hand.
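A minimal sketch of how fold sits above foreach: fold is written once, using the foreach loop internally, and a reduce-style sum, a filter and a map are then expressed as folds without any caller touching an Iterator. The Step interface and Folds class are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical combining function for fold: merges one element into the accumulator.
interface Step<A, T> { A apply(A acc, T item); }

final class Folds {
    private Folds() { }

    // fold: the only place that iterates; callers never see an Iterator.
    static <T, A> A fold(Iterable<T> items, A initial, Step<A, T> step) {
        A acc = initial;
        for (T item : items) {
            acc = step.apply(acc, item);
        }
        return acc;
    }

    // reduce-style sum, written as a fold.
    static int sum(List<Integer> xs) {
        return fold(xs, 0, new Step<Integer, Integer>() {
            public Integer apply(Integer acc, Integer x) { return acc + x; }
        });
    }

    // filter written as a fold (the accumulator list is mutated in place for brevity).
    static List<Integer> evens(List<Integer> xs) {
        return fold(xs, new ArrayList<Integer>(), new Step<ArrayList<Integer>, Integer>() {
            public ArrayList<Integer> apply(ArrayList<Integer> acc, Integer x) {
                if (x % 2 == 0) acc.add(x);
                return acc;
            }
        });
    }

    // map written as a fold: double every element.
    static List<Integer> doubled(List<Integer> xs) {
        return fold(xs, new ArrayList<Integer>(), new Step<ArrayList<Integer>, Integer>() {
            public ArrayList<Integer> apply(ArrayList<Integer> acc, Integer x) {
                acc.add(x * 2);
                return acc;
            }
        });
    }
}
```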

Iterator has been shown to be flawed, though flawed in a way that is acceptable to most of us and in a way we're used to, and in a way that seems non-trivial to solve (without knowing about map, filter, reduce and fold!). You might not be in a position to, or even want to, replace Iterator, but at least you should know not to copy its design, or bind yourself unnecessarily to it. You're now either armed with a simple way of deciding between two API designs, or you're about to tell me why I'm wrong.

Happy coding.


About Me

A salsa dancing, DJing programmer from Manchester, England.