Monday, February 13, 2006

Manage your life with voo2do

When you have a job, a house, several kids and a wife, your day-to-day tasks can quickly get out of hand. You know that you're in trouble when:
  • Your bills are never paid on time
  • You often apologize for things you forgot to do
  • There are important things that you've been postponing for months
  • You feel like there are a thousand things you need to do this week
Yes, I was definitely in trouble.

After a little thinking, I came to the conclusion that I needed a tool to organize my "to-do" list so that every task would receive the proper attention and would not be forgotten. My requirements were simple:
  • It had to be web-based, so that I can access it from my home and my office
  • Tasks should have priorities
  • For some tasks, I want to specify a deadline
  • I want it to be easy to use, otherwise I'll probably stop using it after a while
  • It better be free (everything is free nowadays, isn't it?)
By pure luck I stumbled upon voo2do, which fulfills all these requirements and a few I hadn't thought about. I find it immensely useful and a joy to use.

Another thing that helped me is the Getting Things Done® (GTD) system, by David Allen. I'm not really using the system as is, but it is a very powerful way to, well, get things done. Here's a good introduction to GTD - read it!

One thing in GTD that particularly matched my personal experience is the idea that our life is filled with stuff that, if not taken care of, causes stress and anxiety. We keep thinking about everything that needs to be done, instead of actually doing it. And the first step to reduce this constant stress is to record all this stuff instead of trying to keep it inside our limited memory. And for me, voo2do is a perfect way to do that.

Thanks, Shimon!

Thursday, December 29, 2005

Preparing for Backwards Compatibility

Maintaining backwards compatibility is a pain in the ass. In a perfect world (at least from a developer's point of view), every version of the software is completely free from the shackles of the previous versions. It can be deployed without any regard to the established data, APIs and protocols.

Unfortunately, in the real world users expect the new version to be able to read old data. They expect their plugins and extensions to continue working without having to adapt them to a new API. They even expect old client software to work with the new version of the server, and they'll upgrade the client software only when they damn feel like it.

All these "unreasonable" user expectations force us developers to build backwards compatibility into the product, and it's best to do that early, before there are multiple product versions out there. In this article, I'll show a few tips and techniques for building backwards compatibility support into the software's data files and communication protocols.

The First Step

First and foremost, every piece of data that your application uses should include a version number. This little piece of metadata will allow the application to give differential treatment to different data format versions. For example, if you have an XML configuration file, add a "version" attribute to its root element. If you have a proprietary client-server protocol, add the version number as part of the first message that is sent from the client in order to connect to the server.
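
As a minimal sketch of the XML case, assuming the root element carries a "version" attribute, the version could be read with the JDK's built-in DOM parser. The ConfigVersionReader class and its fallback to "1" for unversioned files are hypothetical, made up for this illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: reads the "version" attribute from the root
// element of an XML configuration document.
class ConfigVersionReader {

  static String readVersion(String xml) {
    try {
      DocumentBuilder builder =
        DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc =
        builder.parse(new InputSource(new StringReader(xml)));
      String version =
        doc.getDocumentElement().getAttribute("version");
      // getAttribute returns an empty string when the attribute is
      // absent; treating such files as version "1" is an assumption
      // of this sketch, not a general rule.
      return version.isEmpty() ? "1" : version;
    }
    catch (Exception e) {
      throw new RuntimeException(
        "Cannot read the data format version", e);
    }
  }

  public static void main(String[] args) {
    // prints 3
    System.out.println(
      readVersion("<config version=\"3\"><item/></config>"));
  }
}
```

Files written before the attribute existed have no version at all, so it helps to decide up front which version they are assumed to be.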

Now that you have a version number in place, be sure to update it each time that the data format changes, no matter how small the change is. The numbering system is up to you - you can use an integer that is incremented each time the format changes. Or the numbering can be in sync with the product's version number, if you find that more convenient.

Processing Versioned Data

Not surprisingly, the code that is responsible for processing the data needs to change whenever the data format changes. The natural tendency is to add "if" statements to handle the differences between versions, like this:

if (version >= 7) {
  // current code
}
else if (version >= 3) {
  // backwards-compatibility code
}
else {
  // ancient legacy code that nobody uses anymore
}

Pretty soon, the code is peppered with ifs and elses and becomes incredibly brittle and unmaintainable. It becomes impossible to refactor or optimize it, for fear of changing its delicate logical structures and breaking its backwards-compatibility. In short, a nightmare.

A much better approach is to "branch" the code - make a copy and modify only the copy, without touching the original. For example, we can have a DataProcessor interface (call it whatever you like) with multiple realizations, one per data format version. So we might have a class called DataProcessor_ver_1 and another called DataProcessor_ver_2, and so on.
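
Here is a rough sketch of what such a branch might look like. The interface and class names follow the text, but the process method and the two record formats (comma-separated in version 1, semicolon-separated in version 2) are invented for the example:

```java
// Hypothetical interface: the name follows the article, the method is
// invented for this sketch.
interface DataProcessor {
  String[] process(String record);
}

// Version 1 of the (made-up) format separates fields with commas.
class DataProcessor_ver_1 implements DataProcessor {
  public String[] process(String record) {
    return record.split(",");
  }
}

// Version 2 started as a full copy of version 1 and was then modified:
// fields are separated by semicolons and surrounding whitespace is
// trimmed. Version 1 was not touched.
class DataProcessor_ver_2 implements DataProcessor {
  public String[] process(String record) {
    String[] fields = record.split(";");
    for (int i = 0; i < fields.length; i++) {
      fields[i] = fields[i].trim();
    }
    return fields;
  }
}
```

Each class knows exactly one format, so neither contains a version check.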

We've all been taught that code duplication is a Bad Thing that Should Be Avoided. In this case however, code duplication is your friend. It allows you to evolve the data format without having the slightest fear that changing the data-processing code will break backwards compatibility. You can refactor the code, remove obsolete sections, optimize it etc. with complete confidence. You can even make radical changes such as replacing the XML parser that you use, as long as the old code can still use the original XML parser. And when you want to check what changed between version 7 and 8, simply open the two classes in your favorite diff utility.

Now you may be tempted to use inheritance, like this:

public class DataProcessor_ver_2 
             extends DataProcessor_ver_1 {
}

Please resist the temptation and make a full copy. Using inheritance is arguably even worse than the "if-else" method, since the logic that differentiates between versions becomes implicit instead of explicit. The code's internal flow is more difficult to follow. Heck, it's not even all in the same file!

Using a Factory

To complete the picture, the application should perform the following steps:

  1. Read the version number of the data
  2. Create the appropriate DataProcessor for this version
  3. Use the DataProcessor to process the data

Step #2 is best accomplished using a factory. The naive approach would look something like this:

public class DataProcessorFactory {

  public static DataProcessor createInstance(
                                String version) {
    try {
      switch (Integer.parseInt(version)) {
        case 1:
          return new DataProcessor_ver_1();
        case 2:
          return new DataProcessor_ver_2();
        case 3:
          return new DataProcessor_ver_3();
        default:
          throw new RuntimeException(
            "Unsupported data processor version " + 
            version);
      }
    }
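    // Catch only the parsing failure, so that the "unsupported
    // version" exception thrown above is not wrapped a second time
    catch (NumberFormatException e) {
      throw new RuntimeException(
        "Cannot create a data processor for version " + 
        version, e);
    }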
  }
    
}

The problem with this approach is that the factory class needs to be updated whenever a new version is added. To get rid of this headache, we can use reflection:

public class DataProcessorFactory {

  public static DataProcessor createInstance(
                                String version) {
    try {
      String className = DataProcessor.class.getName() + 
                         "_ver_" + 
                         version.replace('.', '_');
      Class<?> c = Class.forName(className);
      // Class.newInstance() is deprecated since Java 9
      return (DataProcessor) 
        c.getDeclaredConstructor().newInstance();
    }
    catch (Exception e) {
      throw new RuntimeException(
        "Cannot create a data processor for version " + 
        version, e);
    }
  }

}

Note that the code assumes that the DataProcessor realizations reside in the same package as the DataProcessor interface.
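
To see how version strings map to class names, here is a self-contained sketch of the three steps with a reflection-based factory. Everything is nested inside one demo class only so that it fits in a single file; the describe method and the version numbers are made up for the illustration:

```java
// The nested types' binary names (VersionedFactoryDemo$...) still
// follow the <interface name>_ver_<version> pattern that the factory
// builds, so the same string concatenation works here.
class VersionedFactoryDemo {

  interface DataProcessor {
    String describe();  // hypothetical method, for illustration only
  }

  static class DataProcessor_ver_1 implements DataProcessor {
    public String describe() { return "format 1"; }
  }

  // Version "1.2" maps to the class name suffix "_ver_1_2".
  static class DataProcessor_ver_1_2 implements DataProcessor {
    public String describe() { return "format 1.2"; }
  }

  // Mirrors the reflection-based factory; getDeclaredConstructor()
  // is used because Class.newInstance() is deprecated since Java 9.
  static DataProcessor createInstance(String version) {
    try {
      String className = DataProcessor.class.getName() +
                         "_ver_" +
                         version.replace('.', '_');
      Class<?> c = Class.forName(className);
      return (DataProcessor)
        c.getDeclaredConstructor().newInstance();
    }
    catch (Exception e) {
      throw new RuntimeException(
        "Cannot create a data processor for version " +
        version, e);
    }
  }

  public static void main(String[] args) {
    // Step 1 (reading the version from the data) is omitted here.
    String version = "1.2";
    // Step 2: create the matching processor
    DataProcessor p = createInstance(version);
    // Step 3: use it; prints "format 1.2"
    System.out.println(p.describe());
  }
}
```

Asking for a version that has no matching class, such as "9" in this sketch, fails with the RuntimeException thrown by the factory.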

An Even Better Factory

If your application uses several different data formats, you might find it annoying to keep multiple factories that basically all do the same thing. Instead, you can create a single generic factory that handles all of your data formats:

public class GenericVersionedFactory {

  public static Object createInstance(
                         Class targetClass,  
                         String version) {
    try {
      String className = targetClass.getName() + 
                         "_ver_" + 
                         version.replace('.', '_');
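      Class<?> c = Class.forName(className);
      // Fail early if the class found is not of the expected type
      if (!targetClass.isAssignableFrom(c)) {
        throw new RuntimeException(
          className + " is not a " + 
          targetClass.getName());
      }
      // Class.newInstance() is deprecated since Java 9
      return c.getDeclaredConstructor().newInstance();
    }
    catch (Exception e) {
      throw new RuntimeException(
        "Cannot create a " + targetClass.getName() + 
        " for version " + version, e);
    }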
  }
    
}

The factory expects an additional argument which is the expected interface of the created instance. It uses this argument to infer the target class name, as well as to check that the created instance is of the correct type.

To use the generic factory, you would do something like:

DataLoader dl = 
  (DataLoader) GenericVersionedFactory.createInstance(
                  DataLoader.class, 
                  version);
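
If Java 5 generics are available, the cast at the call site can be avoided. The following is only a sketch of one possible variant, not part of the factory shown above; the TypedVersionedFactory name and the nested DataLoader types are hypothetical and exist only to make the example self-contained:

```java
class TypedVersionedFactory {

  interface DataLoader {  // hypothetical, for illustration only
    String load();
  }

  static class DataLoader_ver_2 implements DataLoader {
    public String load() { return "loaded with format 2"; }
  }

  static <T> T createInstance(Class<T> targetClass,
                              String version) {
    String className = targetClass.getName() +
                       "_ver_" +
                       version.replace('.', '_');
    try {
      // asSubclass throws a ClassCastException if the class found
      // does not implement or extend targetClass, which replaces
      // the explicit isAssignableFrom check.
      Class<? extends T> c =
        Class.forName(className).asSubclass(targetClass);
      return c.getDeclaredConstructor().newInstance();
    }
    catch (Exception e) {
      throw new RuntimeException(
        "Cannot create " + className, e);
    }
  }

  public static void main(String[] args) {
    // No cast is needed: the return type follows the Class argument.
    DataLoader dl = createInstance(DataLoader.class, "2");
    System.out.println(dl.load());
  }
}
```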

Tuesday, October 25, 2005

Firefox Faster than Internet Explorer

There's little doubt that Mozilla Firefox is a better browser than Microsoft Internet Explorer. It is easy to see how tabbed browsing in Firefox makes your life easier, as do its simplicity and better usability.

One thing that I wasn't sure of, though, was whether Firefox is faster. Browsing speed can mean several different things, such as download time, how quickly the page is rendered, and so on. It is also quite difficult to quantify. But at least in one respect, Firefox is indeed much faster - scrolling. I've noticed that Internet Explorer seems to be quite sluggish when scrolling through a long page, so I tried to time just how sluggish it is. I opened a page containing about 190 KB of text [1], held down the "page down" key, and measured how long it took for the browser to scroll through the whole document [2]. The results were dramatic:
  • Internet Explorer - 40 seconds
  • Firefox - 3 seconds
So at least in this respect, Firefox is indeed much faster than Internet Explorer. Granted, on a faster computer the sluggishness of Internet Explorer may be less noticeable, but it still shows that something in its rendering engine scales very poorly.

[1] This is the page I used.
[2] Measured on an 800 MHz Pentium III.

Saturday, October 08, 2005

Java API Pitfalls: Boolean.getBoolean(String)

Creating a public API is a task that should not be taken lightly. Any bad decision at this stage tends to become baked in, remaining there for posterity. One such lousy decision was made by an unnamed Sun engineer, eons ago when dinosaurs roamed the earth and the Java programming language was born.

I can imagine this developer thinking to himself "I really need a method that gets a boolean value from the system properties. Since this method returns a boolean, it obviously belongs in the Boolean class! Now let me think, should I call it getBooleanSystemProperty? Nah, that's way too long and I don't have an IDE with code completion. I'll just call it getBoolean - it's so much catchier and saves typing fourteen characters!".

And thus, the Boolean.getBoolean(String) method was born.

Cut to a few years later: a programmer needs to convert a String to a boolean value. Remembering that there's some static method in the Boolean class for this, she types "Boolean." and waits for the code completion popup to appear. Her eyes scan the list of method names, and stop at getBoolean. "This is it," she thinks, failing to notice that further down the list there's also a method called valueOf.

A few weeks pass until somebody notices that no matter what string value the program receives, it always treats it as "false". It takes another few hours to trace the cause of this bug to the innocent-looking call to getBoolean.

Now you may think that this story is fictitious, but I've seen this mistake being made at least twice. And the fault lies entirely with this method, which is located in the wrong class and has a name that does not convey its purpose accurately. If you don't believe me, just ask Glen Stampoultzis.

What surprises me the most is that Sun didn't choose to deprecate this method, as a way of admitting that they screwed up and flagging that it should not be used. And so, it remains a part of the core Java API, like a landmine waiting quietly to be stepped on by a poor victim.
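
For the skeptics, here is a small sketch of the trap. The class, method and property names are made up for the demonstration, and it assumes that no system property named "true" is defined (which is the normal situation). Boolean.parseBoolean, available since Java 5, is used for the correct version; Boolean.valueOf behaves the same way but returns a Boolean object:

```java
class GetBooleanPitfall {

  // What the programmer in the story wrote: this looks up a system
  // property whose *name* is the given string.
  static boolean viaGetBoolean(String s) {
    return Boolean.getBoolean(s);
  }

  // What she meant: parse the string itself.
  static boolean viaParseBoolean(String s) {
    return Boolean.parseBoolean(s);
  }

  public static void main(String[] args) {
    // Assumes no system property named "true" exists.
    System.out.println(viaGetBoolean("true"));    // false
    System.out.println(viaParseBoolean("true"));  // true

    // getBoolean yields true only for an existing system property
    // whose value is "true". "demo.flag" is a made-up name.
    System.setProperty("demo.flag", "true");
    System.out.println(viaGetBoolean("demo.flag"));  // true
  }
}
```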

Saturday, October 01, 2005

Where Dynamically Typed Languages Fall Short

In the past, dynamically typed languages were considered to be more productive than statically typed ones, and I used to agree with that view. My past experience has shown me that Python, a dynamically typed language, is more productive than Java (statically typed). Somehow Python code feels more malleable and pliable, and making it do what you want is more hassle-free. For example, if you want a hashtable, you just write
map = {"R": "red", "B": "blue", "G": "green"}
instead of Java's tedious
HashMap map = new HashMap();
map.put("R", "red");
map.put("B", "blue");
map.put("G", "green");
I think the convenience of writing Python code can be attributed in equal measure to its elegant syntax and to its dynamic types.

But now the landscape is changing. First, the introduction of the new Java language features in Java 5.0 makes Java code a lot more elegant. Second, and more important, is the appearance of really smart IDEs such as Eclipse. Zef Hemel has already noted that statically typed languages allow IDEs to offer features such as refactoring and code completion. But it goes even further than that.
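
As an illustration of the Java 5.0 features mentioned above, here is a sketch that uses two of them, generics and the enhanced for loop. Picking these two is my own choice for the example, and the class and method names are made up:

```java
import java.util.HashMap;
import java.util.Map;

class Java5MapDemo {

  static Map<String, String> colors() {
    // Generics: the key and value types are part of the map's type.
    Map<String, String> map = new HashMap<String, String>();
    map.put("R", "red");
    map.put("B", "blue");
    map.put("G", "green");
    return map;
  }

  static int totalNameLength(Map<String, String> map) {
    int total = 0;
    // Enhanced for loop: no explicit Iterator and no casts.
    for (String name : map.values()) {
      total += name.length();
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println(totalNameLength(colors()));  // 12
  }
}
```

The map still needs its put calls, but the casts and the explicit Iterator that earlier Java versions required are gone.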

When you open a Java source file in Eclipse, what you get is an extensively hyperlinked document. Ctrl+Click on any variable, method or class name brings you directly to its definition. Ctrl+Shift+G finds all the references to it. You can easily navigate to the superclass's implementation of any overridden method, or to all the subclasses that override it themselves.

These invaluable capabilities are made possible by the fact that the IDE knows everything, or more specifically - the type of every variable in the source code. And this is of course impossible in a dynamically typed language, since it's, well, dynamically typed. Consider the following Python function:
def func(x):
    x.doSomething()
The IDE has absolutely no way of knowing the type of x, and in fact it can be of any type that has a doSomething method. So inevitably, IDEs (and other tools) for statically typed languages can be much more advanced than IDEs for dynamically typed languages.
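
For contrast, here is a sketch of a statically typed counterpart in Java. The Task and PrintTask types are hypothetical, invented only for this comparison:

```java
// Hypothetical types, for illustration only.
interface Task {
  String doSomething();
}

class PrintTask implements Task {
  public String doSomething() { return "printing"; }
}

class StaticTypingDemo {
  // The declared parameter type tells the compiler, and therefore
  // the IDE, that x is a Task. That is enough to offer completion
  // for doSomething(), to jump to the definition of Task, and to
  // list the classes that implement it.
  static String func(Task x) {
    return x.doSomething();
  }

  public static void main(String[] args) {
    System.out.println(func(new PrintTask()));  // printing
  }
}
```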

These advanced features - code completion, automatic refactoring and hyperlinked source code - greatly improve the developer's productivity, thereby closing the gap between dynamically and statically typed languages.