Ricky's technical blog: April 2007

Walking to the bus stop, with a coffee burning one hand and my breakfast the other, I noticed how poor my area looks; an urban area where many students and young professionals rent houses. The tenants aren't poor, the owners neither, but the area itself is impoverished, because the owners don't care and the tenants don't care.

I got on the bus, and dropped half my coffee on the bus floor as it went over a speed bump, because I was holding the coffee so as not to burn my hand too much. Ah, well, glad it wasn't my bus. It struck me right there how little tenants look after the property they rent. Some local teenagers smashed a bottle of beer outside my house, and I didn't clear it up. I didn't ask the local authorities to do so either. I just looked at it with disdain whenever I passed it, and resented paying my local taxes even more.

In my job as a coder, on a 1.5-man project, I am a tenant. I'm actually a good tenant, in that I look after the codebase pretty well, but I'm still a tenant in the most important, and crippling, way. I don't own my code. This isn't the same as responsibility for my code; I have that. It's my code, but it's not mine. I can't walk away from my employer and reuse the code elsewhere. That's not completely true; for a couple of libraries I wrote for the project my boss has given me licence to release them for free. For those libraries, I actually find myself reserving my best coding practices, because I know that other people will need to read the code. Not just in some abstract "sometime after I leave my job" way, but in a "they already do read it" way.

I don't bother to create separate libraries for code that I can't release separately. In other words, I don't decouple as much as I could, because the practical benefits are too indirect. Realistically, I know that my project will die when I leave. That's not to say that I won't be very gracious in documenting it.
My boss doesn't have the skills to maintain it well, and he's uncertain as to how to make more money from it, so he probably won't want to release the whole thing for free until it suffers from bitrot (which will probably be whenever IPv6 becomes dominant). The fact that I don't own the code means that my goals and the project's goals are slightly at odds with each other.

There's an IP stack in the project, a serialisation library, clever handling of scroll bars when stuff is dragged around the display, but only our 200 students (per year) are benefitting from those. The funding comes from a nationwide education body, but they don't have rights over the code either. In that respect, it's a waste of money, because other establishments can't easily customise it for their requirements, and largely they don't know about it. In fact, even within our department, another member of staff wanted to use the software, and she couldn't, because it didn't have a particular feature. I offered to implement it, but my boss declined because it wasn't part of what the funding was paying for. If it was my code, I'd happily implement it in a separate branch, in my spare time, and I'd feel enthusiastic about it.

Now let's look at the most recent entry in Bugzilla for this project:

"A long term thing for a quiet Friday afternoon (if such a thing exists!) In the same way you have files in the jar for customisation of "installation number", "installed to" etc, I think there may be a long-term need to use a similar mechanism to allow some customisation of the conformance rules (this will be important if other people use IPSim, as there may be some things that I want to give feedback on that other people may not care about"

We lost 150 users already by not doing this for the other member of staff. Plus, I could only code in advance customisations that I can anticipate. If I'm on holiday or busy coding a new feature that we already have funding for, we'll miss out on the opportunity.
For a small project like this one, developer motivation and the quality of the project would increase if the developer could reuse the work elsewhere. That could mean open-sourcing it, as has been done for two small parts (a functional programming library and a Java layout manager), or it could mean an agreement not to use the code for commercial purposes for a fixed term after employment.

How about for a larger team? I'd suggest assigning each source file to a single developer, who has responsibility for that file and actual ownership of it, so that he can reuse the file at some point, even now, in another company. Any changes made to that file by any other developer would then be automatically licensed so that the owner can use them.

CORRECTION: Neal has informed me that, since giving the presentation that this post references, he has learned that there is no such JCP rule. I'll still keep this post, because it makes some useful points (and so do the comments).

Last year, Josh Bloch, with "crazy" Bob Lee and Doug Lea, proposed Concise Instance Creation Expressions (CICE) as their solution for closures in Java.

They didn't address returning from the enclosing scope from within the closure. Theirs is a trivial syntax change, probably fairly straightforward to implement, but it wouldn't address simple cases such as finding the first line in a file that matches an expression. If you're already familiar with why CICE is insufficient, feel free to skip past the code here.

Here's the pseudocode:


forEachLine ln in "/etc/passwd" do
   if ln contains "root"
      return ln

Here's the usual Java 5:


BufferedReader reader=null;
try
{
    reader=...;
    String line;
    while ((line=reader.readLine())!=null)
    {
        if (line.contains("root")) //contains, not matches: matches() tests the whole line
            return line;
    }
}
finally
{
    if (reader!=null)
        try
        {
            reader.close();
        }
        catch (IOException exception)
        {
            log(exception); //but don't rethrow it
        }
}

So, of course, as the good programmers we are, we write an abstraction for that:


forEachLine("/etc/passwd",new LineProcessor()
{
    public void processLine(String line)
    {
        if (line.contains("root"))
            return line; //won't compile
    }
});

There's a problem there: the 'return line;' would return from the processLine method, not the enclosing method, and processLine is declared void. Ok, we can solve that. Let's make LineProcessor return a String, which is normally null unless we want to end the loop. Then forEachLine can also return a String, which will be the return value of the method.


return forEachLine("/etc/passwd",new LineProcessor()
{
    public String processLine(String line)
    {
        return line.contains("root") ? line : null;
    }
});

This works, albeit with some strange semantics, which could be made more obvious by adding a class to hold the result, but that's not important for this post.

So forEachLine now has to know that the LineProcessor it runs might want to return early. When callers of forEachLine don't want to return early, they still have to return null all the time. It's inconvenient; to not punish callers that don't want to return early, we now need two versions of forEachLine, one that supports early return, and one that doesn't. That's confusing. The bad smell wafts up, making you either overload, or use names like "forEachLineWithAPossibleEarlyReturn".
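The helper itself isn't shown above, so here is a minimal sketch of what the two flavours might look like. It rests on a few assumptions that are not in the original code: it reads from a Reader rather than a file name, it wraps IOException in a RuntimeException to keep the callbacks simple, and the names Lines, LineVisitor, firstLineContaining and countLines are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

public class Lines {
    /** Early-return flavour: a non-null result stops the loop. */
    public interface LineProcessor {
        String processLine(String line);
    }

    /** Plain flavour (hypothetical name): no result, no early exit. */
    public interface LineVisitor {
        void visitLine(String line);
    }

    /** Returns the first non-null result from the processor, or null if there is none. */
    public static String forEachLine(Reader source, LineProcessor processor) {
        BufferedReader reader = new BufferedReader(source);
        try {
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    String result = processor.processLine(line);
                    if (result != null)
                        return result; // the "early return" travels out as a value
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (IOException e) {
            // Assumption of this sketch: I/O failures become unchecked.
            throw new RuntimeException(e);
        }
    }

    /** The second version, so that callers with nothing to return needn't return null. */
    public static void forEachLine(Reader source, final LineVisitor visitor) {
        forEachLine(source, new LineProcessor() {
            public String processLine(String line) {
                visitor.visitLine(line);
                return null; // never stop early
            }
        });
    }

    public static String firstLineContaining(Reader source, final String needle) {
        return forEachLine(source, new LineProcessor() {
            public String processLine(String line) {
                return line.contains(needle) ? line : null;
            }
        });
    }

    public static int countLines(Reader source) {
        final int[] count = {0}; // one-element array, because locals captured here must be final
        forEachLine(source, new LineVisitor() {
            public void visitLine(String line) {
                count[0]++;
            }
        });
        return count[0];
    }
}
```

Note that the second forEachLine exists only to hide the "return null" that the first one forces on every caller, which is exactly the duplication complained about above.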

What does CICE do for this?


return forEachLine("/etc/passwd",LineProcessor()
{
    return line.contains("root") ? line : null;
});

That's right. It removes a bit of punctuation and the 'new' keyword. Without underestimating that contribution, it doesn't actually help to make the end result resemble the pseudocode. It would be hard to get back from this end result to the pseudocode; something is lost in translation. It isn't expressive.

The BGGA proposal (named after its authors: Bracha, Gafter, Gosling and von der Ahé) allows this:


forEachLine(String line: "/etc/passwd")
    if (line.contains("root"))
        return line;

Now just to remind you of the pseudocode again:


forEachLine ln in "/etc/passwd" do
   if ln contains "root"
      return ln

It doesn't need a special version of forEachLine for each use case. It's generally better, more expressive, with one wart that I'll mention at the end.

So why does this matter? Neal Gafter, the proposal lead, used to work for Sun and is well respected, so he can easily make this into a JSR. Or not.

Neal Gafter now works for Google. All the work he's put into BGGA has been on his own time; it's not allowed to be his "20% project" (Google employees spend 20% of their paid time on ideas of their own). He is not the JCP's contact for Google. Surprise, surprise, Josh Bloch is, and Josh has his own more limited, less expressive, proposal, as I've demonstrated just now.

The JCP has a rule that disallows individuals from being on the JCP if their employer is on it. I felt sorry for Neal at the end of his presentation on closures at Google. Here's a quote from the last few minutes (emphasis from original):

From the audience: "Is someone planning on opening up a JSR on this?"

Neal: "There's a long answer and, I'm afraid, a longer answer. I was planning on writing a JSR for this. Sun Microsystems actually asked me to write a JSR proposal for this. But I can't do that, because I am not Google's JCP representative. And I cannot do it as an individual contributor to the Java Community Process, because as a Google employee I cannot be an individual contributor to the Java Community Process.

Google's JCP representative is Joshua Bloch, and he has other ideas about what should be done in the Java language in this space. As far as I know he is not currently planning on submitting a JSR on this. My hope is that creating a prototype, which by the way, I'm doing on my own time, will be something that Sun can use as a justification for Sun creating a JSR to do this into the language, because I think that's the only way this will happen in Java at this point."

Here's a little of Josh Bloch on closures:

"I like closures. I think they work very well in languages that are designed around them, like Smalltalk and Scheme and so forth. I think closures in java are a good thing. We've basically had them since 1.1 in the form of anonymous class instances and those are a bit clunky so I definitely see a place for improving support for closures; on the other hand some of the proposals that I've seen floating around are overly complex; they involve massive changes to the type system, things like function types and I'm severely worried about pushing the complexity of the language beyond the point where Joe Java can't use the language anymore.

If we add more support for closures I think it has to be in the spirit of the current support, which means that closures should take the form of instances, of types that have a single abstract method, whether they are interface types such as Runnable , or class types such as TimerTask. I think it would be nice if a better syntax in the current anonymous class syntax were adopted and if requirement with regards to Final variables were somehow made more tangible, which doesn't mean necessarily relaxed; I think it's actually good that you can not close over non Final variables, but it's a pain to actually mark them final, so if they automatically marked themselves final which would be nice."

His points are all good, except that function types aren't really a big overhaul of the type system - they will be resolved into interfaces. They're no bigger a conceptual problem than array types are.
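To make that concrete, a function type taking a String and returning a boolean can be pictured as shorthand for a generic single-method interface. The sketch below writes such an interface by hand; Function1, invoke, filter and linesMentioning are names invented here, and I'm not claiming they match whatever the BGGA prototype actually generates.

```java
import java.util.ArrayList;
import java.util.List;

public class FunctionTypes {
    /** Hypothetical stand-in for "a function from A to R". */
    public interface Function1<A, R> {
        R invoke(A argument);
    }

    /** Keeps the elements for which the predicate returns true. */
    public static <A> List<A> filter(List<A> input, Function1<? super A, Boolean> predicate) {
        List<A> kept = new ArrayList<A>();
        for (A element : input)
            if (predicate.invoke(element))
                kept.add(element);
        return kept;
    }

    /** A caller passing a "function" today, as an anonymous class instance. */
    public static List<String> linesMentioning(List<String> lines, final String needle) {
        return filter(lines, new Function1<String, Boolean>() {
            public Boolean invoke(String line) {
                return line.contains(needle);
            }
        });
    }
}
```

Nothing here goes beyond generics as they already exist in Java 5, which is the sense in which function types need not be a massive change: like array types, they are a notation for a family of types the language can already describe.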

Also, I think that if I could write a forEachLine in the BGGA style, I'd be more than happy for novices to use it.

The wart I mentioned is that Tennent's Correspondence Principle is violated by BGGA when a closure throws a checked exception, and the interface that the closure conversion converts it to doesn't include any checked exceptions.

Neal proposes an extension to the generic type system to allow writing interfaces whose single method throws 0, 1 or many checked exceptions, but for those cases where the interface is already fixed, such as SwingUtilities.invokeAndWait(Runnable), the exceptions cannot be passed back to the caller. The code fails to compile. This is an inconsistency with the rest of the proposal. You can return from within a Runnable closure, but you can't throw a checked exception from within it.
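The single-exception case of that idea can already be approximated with ordinary Java 5 generics, which may help to show what is being asked for. This is only a sketch: Block, runTwice and demo are hypothetical names, and as I understand it the real proposal goes further by letting one type parameter stand for several exception types at once, which plain generics cannot do.

```java
import java.io.IOException;

public class ExceptionTransparency {
    /** A block of code that may throw one checked exception type, E. */
    public interface Block<E extends Exception> {
        void run() throws E;
    }

    /** Runs the block twice; whatever E the block declares flows out to the caller. */
    public static <E extends Exception> void runTwice(Block<E> block) throws E {
        block.run();
        block.run();
    }

    public static String demo() {
        final StringBuilder log = new StringBuilder();

        // E is RuntimeException here, so the caller needs no try/catch.
        runTwice(new Block<RuntimeException>() {
            public void run() {
                log.append("ok ");
            }
        });

        try {
            // E is IOException here, so the compiler makes the caller deal with it.
            runTwice(new Block<IOException>() {
                public void run() throws IOException {
                    throw new IOException("disk on fire");
                }
            });
        } catch (IOException e) {
            log.append("caught ").append(e.getMessage());
        }
        return log.toString(); // "ok ok caught disk on fire"
    }
}
```

Runnable has no such type parameter, and it can't be given one without breaking every existing implementation, which is why the invokeAndWait case is stuck.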

I think that it would be possible to achieve exception transparency without needing anything special on the interface, as checked exceptions are purely a compile-time concept.
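A well-known trick shows that the checking really is done only by the compiler. To be clear, this is an illustration of that claim and not a mechanism that I or the BGGA authors propose; the names SneakyThrow, throwAs and demo are made up for the example.

```java
import java.io.IOException;

public class SneakyThrow {
    /**
     * Rethrows any Throwable without declaring it. The cast to E is erased,
     * so nothing at runtime checks that t really is an E.
     */
    @SuppressWarnings("unchecked")
    private static <E extends Throwable> void throwAs(Throwable t) throws E {
        throw (E) t;
    }

    /** The compiler sees "throws RuntimeException", which needs no declaration. */
    public static void sneakyThrow(Throwable t) {
        SneakyThrow.<RuntimeException>throwAs(t);
    }

    public static String demo() {
        Runnable task = new Runnable() {
            public void run() {
                // Runnable.run() declares no checked exceptions, yet this compiles.
                sneakyThrow(new IOException("from inside run()"));
            }
        };
        try {
            task.run();
            return "nothing thrown";
        } catch (Exception e) {
            // The IOException arrives intact; the JVM never objected to it.
            return e.getClass().getName() + ": " + e.getMessage();
        }
    }
}
```

Calling demo() returns "java.io.IOException: from inside run()". The checked exception passes straight through an interface that doesn't declare it, so the only thing standing in the way of exception transparency is the compiler's bookkeeping.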

Monday, April 09, 2007

Really Own Your Code

Wednesday, April 04, 2007

Is Josh Bloch the biggest problem for closures?