General musings on programming languages, and Java.

Friday, September 19, 2008

My Scala Coding Style

This was originally a posting to the scala-tools mailing list, in response to someone asking how to make the Eclipse plugin format using braces on the next line and spaces instead of tabs.

For years I used the braces-on-next-line rule, lining up each { with the start of the previous line, and indented with tabs rather than spaces. I was happy with it. I used an 80-character line length limit, which made coding in a text terminal easy, and tabs equivalent to 8 characters (though the actual tab width didn't really matter as I only used indentation to signify nesting depth, not to line up individual characters). My code was very structured.

If you look at my blog's very early days, you will find a post somewhere saying to avoid anonymous classes in Java. This is exactly the opposite of what I'd say now (now I'd say avoid Java :) ). It dawned on me after a while that the reason I didn't like anonymous classes was that my coding style made them a real pain in the backside to use. Specifically, if you have one anonymous class inside another with the above coding style, you end up splitting most lines of actual code.

Then I learned Lisp.

I realised that my overly structured Java code penalised me for nesting. So nesting must be bad, or the coding style must be bad.

Most human languages are highly recursive - within one sentence you can set up phrases, talk about what would happen in the future, and discuss two possible futures or even possible pasts. Only one language is known of that isn't highly recursive - the Piraha language. In Piraha the culture discourages talking about the future, or any event that you haven't seen personally, or been told about by another. A researcher who lived with them and learned their language tried telling them about Jesus (yes, there are God-botherers in linguistics too). One Piraha asked what Jesus looked like - and when the researcher said he didn't know, as Jesus lived 2,000 years ago, the Piraha wasn't interested anymore. This actually caused the researcher to start questioning his own religion, but that's a little beside the point.

I imagine that if you asked a Piraha to design a computer programming language (or a suitable analogy to one - perhaps they wouldn't be interested in computers), it would be limited to a nesting level of 2, by virtue of a 25-space tab and an 80-character line limit. And if they invented braces, I'm sure they would be lined up.

This has parallels in our society too. The more fluent amongst us can happily deal with long sentences, discussing multiple futures. The less fluent (perhaps speakers of a foreign language learning English) will generally prefer shorter sentences, discussing only the present.

In summary, pick a coding style that doesn't punish nesting, unless you want to make the language 'unnaturally' statement-orientated. I have taken this to an extreme - I use one space for indentation, and never place opening OR closing parens on their own lines. I think this is a direct result of learning Lisp, but it took a long time after learning Lisp for me to change.

Plus, silly Scala makes putting the brace on the next line fail to parse sometimes anyway.

Wednesday, September 10, 2008

Implementing the Builder pattern in Java without repeating code

When writing some Java wrappers around some CGI requests at work, I began with a normal implementation of the builder design pattern. Then I realised I was going to have to do this for about 50 CGI requests, that some of them were nested (CGI requests with query parameters to be sent on to a further CGI request on another machine), and that many of the parameters had interesting constraints. While the API might be fine, the implementation we were looking at, Josh Bloch's, encouraged repetition of logic.

Anyway, here's an example to get us started. The problem:

Person person = new Person("John", "Jackson", 1979, 11, 10, "AC2193");

We wanted something more like:

Person person = new Person.Builder()
    .forename("John")
    .surname("Jackson")
    .yearOfBirth(1979)
    .monthOfBirth(11)
    .dateOfBirth(10)
    .nationalInsuranceNumber("AC2193").build();

To keep the code short, we'll use a simpler class as an example, a Person consisting of name and age. As you'll see, even this can be large enough to be interesting.

public class Person
{
    private final String name;
    private final int age;

    private Person(Builder builder)
    {
        // Copy the values out of the builder into the final fields.
        this.name = builder.name;
        this.age = builder.age;
    }

    public static class Builder
    {
        private String name;
        private int age;

        public Builder name(String name)
        {
            this.name = name;
            return this;
        }

        public Builder age(int age)
        {
            this.age = age;
            return this;
        }

        public Person build()
        {
            return new Person(this);
        }
    }

    public String getName()
    {
        return name;
    }

    public int getAge()
    {
        return age;
    }
}

Even for 2 parameters, this is quite a lot of code, though thus far there isn't really any logic to be duplicated. Now let's look at a really simple constraint, that values cannot be set twice.

The most obvious way of trying this would be to have a boolean alongside each field in the builder, e.g.:

private String name;
private boolean nameAlreadySet;

private int age;
private boolean ageAlreadySet;

And then in the name(String) and age(int) methods in the Builder you would check the value of that boolean, and throw an IllegalStateException if the value had already been set. This is clearly a repetition, which can lead to copy-and-paste errors or just make things hard to change.
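
To make the repetition visible, here is a minimal sketch of that approach. The class name NaiveBuilder, the getters and the exception messages are made up for illustration; they are not from the real code.

```java
// Hypothetical illustration of the boolean-per-field approach.
class NaiveBuilder
{
    private String name;
    private boolean nameAlreadySet;

    private int age;
    private boolean ageAlreadySet;

    public NaiveBuilder name(String name)
    {
        // This check has to be written out in every setter.
        if (nameAlreadySet)
            throw new IllegalStateException("name has already been set");
        nameAlreadySet = true;
        this.name = name;
        return this;
    }

    public NaiveBuilder age(int age)
    {
        // The same check again, with the names changed by hand.
        if (ageAlreadySet)
            throw new IllegalStateException("age has already been set");
        ageAlreadySet = true;
        this.age = age;
        return this;
    }

    public String getName()
    {
        return name;
    }

    public int getAge()
    {
        return age;
    }
}
```

With two fields this is bearable. With fifty requests, each with several parameters, every setter carries its own hand-edited copy of the check.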

In object-orientated programming the usual way of handling this would be to package the field with its boolean in an object and call it, say, MaxOnce. There is a good reason not to go down this path, though: it's difficult to chain MaxOnce with other such types, for example when we want BoundedInt, which prevents values outside a certain range, to work with MaxOnce. So we have a problem that the classes don't work together well. Time for another approach.
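
To make the problem concrete, a rough sketch of two such holder classes might look like this. The method names and messages are assumptions for illustration only; I never wrote these classes for real.

```java
// Hypothetical sketch: a field packaged together with its "already set" flag.
class MaxOnce<T>
{
    private T value;
    private boolean alreadySet;

    public void set(T newValue)
    {
        if (alreadySet)
            throw new IllegalStateException("value has already been set");
        value = newValue;
        alreadySet = true;
    }

    public T get()
    {
        return value;
    }
}

// Hypothetical sketch: a holder that rejects values outside [min, max].
class BoundedInt
{
    private final int min;
    private final int max;
    private int value;

    BoundedInt(int min, int max)
    {
        this.min = min;
        this.max = max;
    }

    public void set(int newValue)
    {
        if (newValue < min || newValue > max)
            throw new IllegalArgumentException(newValue + " is outside " + min + ".." + max);
        value = newValue;
    }

    public int get()
    {
        return value;
    }
}
```

Each class owns its value, so neither can wrap the other. A MaxOnce<BoundedInt> would restrict replacing the holder, not setting the int inside it, and getting both rules at once would mean writing a third class by hand.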

It would help if MaxOnce and BoundedInt were more like filters that data gets passed through (or not, if the data is invalid). Enter Parameter.

private final Parameter<String> name = maxOnce("name", null);

private final Parameter<Integer> age = bound(maxOnce("age", 0), 0, Integer.MAX_VALUE);

Notice how bound and maxOnce are chained together in the age parameter. It's easy to see how you might write other filters. Here's a largely useless example:

Parameter<Integer> number = not(5, bound(maxOnce("a number from one to ten, but not five", 0), 1, 10));
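
As a minimal sketch of how such filters might compose, assuming Java 8 or later for lambdas and java.util.Optional: the Filter interface and the Filters class below are hypothetical stand-ins, not the real Parameter code. Each filter is stateless and is handed the currently stored value (if any) along with the incoming one.

```java
import java.util.Optional;

// Hypothetical stand-in for Parameter: a stateless rule that decides what
// the stored value becomes when a new value arrives, or throws to reject it.
interface Filter<T>
{
    // current is empty if nothing has been stored yet.
    T accept(Optional<T> current, T incoming);
}

final class Filters
{
    private Filters() {}

    // Accepts a value only if nothing has been stored so far.
    static <T> Filter<T> maxOnce(String name)
    {
        return (current, incoming) ->
        {
            if (current.isPresent())
                throw new IllegalStateException(name + " has already been set");
            return incoming;
        };
    }

    // Wraps another filter, first rejecting values outside [min, max].
    static Filter<Integer> bound(Filter<Integer> inner, int min, int max)
    {
        return (current, incoming) ->
        {
            if (incoming < min || incoming > max)
                throw new IllegalArgumentException(incoming + " is outside " + min + ".." + max);
            return inner.accept(current, incoming);
        };
    }

    // Wraps another filter, first rejecting one specific value.
    static <T> Filter<T> not(T forbidden, Filter<T> inner)
    {
        return (current, incoming) ->
        {
            if (forbidden.equals(incoming))
                throw new IllegalArgumentException(incoming + " is not allowed");
            return inner.accept(current, incoming);
        };
    }
}
```

Because every filter simply delegates to the one it wraps, a new rule is one small method, and any rule can sit outside any other.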

For a Parameter that has no default, it might be handy to store the value as an Option, rather than use null, or an empty String or some other sentinel. In another case we want to store the value as a TreeMap (for sparse arrays, mentioned later). So generally we'd like to be able to specify an input type and an output type for a Parameter.

private final Parameter<String, Option<String>> name = maxOnce("name");

private final Parameter<Integer, Option<Integer>> age = bound(maxOnce("age"), 0, Integer.MAX_VALUE);

Note that bound and maxOnce work together for the age parameter, as two filters.

In a few cases, the Parameters that we use are allowed to take multiple indexed values. They are effectively sparse arrays. The Parameter's input type is a Pair<Integer, T> and the output type is a TreeMap<Integer, T> - each incoming Pair gets added to the TreeMap.

private final Parameter<Pair<Integer, String>, TreeMap<Integer, String>> = ...;

We can see that the Parameter is more a declaration than it is an actual value. Conveniently, Parameter holds no mutable state. We store the values in a Map<Parameter, Object> (but with slightly better types), wrapped up as a GenericBuilder. We never modify that Map; we copy it whenever we add a value, like CopyOnWriteArrayList does in the Java libraries.
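
A minimal sketch of that copy-on-add idea follows. Key and ImmutableValues are hypothetical names standing in for Parameter and GenericBuilder, and the method names with, get and isDefault are borrowed from the Person code in this post. The real classes do more, such as running the filters.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical key type: it identifies a value but never holds one.
// Keys are compared by identity, so each is declared once as a constant.
final class Key<T>
{
    final String name;

    Key(String name)
    {
        this.name = name;
    }
}

// Hypothetical sketch of an immutable value store: with() returns a new
// instance backed by a copy of the map, leaving the original untouched.
final class ImmutableValues
{
    private final Map<Key<?>, Object> values;

    ImmutableValues()
    {
        this(Collections.<Key<?>, Object>emptyMap());
    }

    private ImmutableValues(Map<Key<?>, Object> values)
    {
        this.values = values;
    }

    <T> ImmutableValues with(Key<T> key, T value)
    {
        Map<Key<?>, Object> copy = new HashMap<Key<?>, Object>(values);
        copy.put(key, value);
        return new ImmutableValues(copy);
    }

    // The cast is safe because with() only ever stores a T under a Key<T>.
    @SuppressWarnings("unchecked")
    <T> T get(Key<T> key)
    {
        return (T) values.get(key);
    }

    // True if no value has been stored for this key.
    boolean isDefault(Key<?> key)
    {
        return !values.containsKey(key);
    }
}
```

Copying the whole map on every add is wasteful for large maps, but a builder rarely holds more than a handful of values, and in exchange any intermediate state can be shared or reused safely.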

Here's the original Person class with a new implementation:

public class Person
{
    private final GenericBuilder finalValues;

    private static final Parameter<String, Option<String>> nameParam = param("name", "The name of the person", Conversion.<String>identity());

    private static final Parameter<Integer, Option<Integer>> ageParam = notNegative(param("age", "The age of the person", Conversion.stringToInt));

    private Person(GenericBuilder finalValues)
    {
        // Both parameters are required, so refuse to build if either is unset.
        if (finalValues.isDefault(nameParam) || finalValues.isDefault(ageParam))
            throw new IllegalStateException();

        this.finalValues = finalValues;
    }

    public static final class Builder
    {
        private GenericBuilder realBuilder = new GenericBuilder();

        public Builder name(String name)
        {
            realBuilder = realBuilder.with(nameParam, name);
            return this;
        }

        public Builder age(int age)
        {
            realBuilder = realBuilder.with(ageParam, age);
            return this;
        }

        public Person build()
        {
            return new Person(realBuilder);
        }
    }
      
    public String getName()
    {
        return finalValues.get(nameParam);
    }

    public int getAge()
    {
        return finalValues.get(ageParam);
    }
}

There are a couple of extra bells and whistles in the real code, such as building and parsing URLs containing the parameters. I have another post taking this one step further (using Parameters from code generation) in the pipeline.

Sunday, September 07, 2008

An IRC Bot in Haskell, 20% code, 80% GRR

I decided to look at Haskell more seriously, after mainly using it to learn functional programming and, well, as a posh calculator. So when I came across Don Stewart's little tutorial on writing an IRC bot, I followed it, but with one use case in mind:

In the #scala channel, often it's handy to display a URL to a bug by its index, particularly as the URLs are a little tricky to remember.

Don's code was excellent, very readable, and on my little Asus EEE, all I had to do besides install ghc6 was to install ghc6-network-dev - for some reason the Xandros packages are quite split up.

All worked fine - I tested it in a private channel, then I wanted to make it work on a remote machine, which Eugene Ciurana lets me use. Eugene is mainly a Java programmer, so he doesn't have Haskell installed at all, and as I write this, it's around 9am on a Sunday his time - I'm not going to bother him!

So I wondered about installing ghc as a user, and downloaded the 64-bit binary distribution for ghc-6.8.3. After extracting it I realised I couldn't see a bin/ directory or similar, so I checked the README file, which pointed me at the INSTALL file. So I typed ./configure --prefix=/home/ricky/ghc/, only to find out that gcc was broken on the machine.

Eugene also lets me use another machine, a 32-bit one, but on there I got this error from ./configure:

"checking for path to top of build tree... utils/pwd/pwd: /lib/i686/libc.so.6: version `GLIBC_2.3' not found (required by utils/pwd/pwd)"

dmwit and Heffalump (no, really) from #haskell suggested I try using ghc 6.8.2 instead of 6.8.3, which I am going to try shortly, but I thought of another solution. My little Asus EEE runs ghc (albeit version 6.6) well enough, so I compiled my file. After using gcc on and off for years, I was surprised that ghc 3.hs didn't do anything useful - #haskell suggested ghc --make 3.hs. This produced an executable file called 3, which I promptly ssh'd to both of Eugene's machines. On both it failed because libgmp wasn't present. So I was advised to try ghc --make 3.hs -optl-static - which seemed to almost kill my EEE. But a minute and a half later my EEE returned to its usual speed, last.fm started playing again, and I had a somewhat larger file called 3, which worked on the target machine!

So, if you wander into #scala now and type ~trac 100 it will give you a link to Scala bug report 100.

I spent more than 4 times as long on the config as I did on writing (well, mainly copying, though I copied without a clipboard) the code. Hopefully that will drop next time!

About Me

A salsa dancing, DJing programmer from Manchester, England.