Thursday, December 29, 2005

Preparing for Backwards Compatibility

Maintaining backwards compatibility is a pain in the ass. In a perfect world (at least from a developer's point of view), every version of the software is completely free from the shackles of the previous versions. It can be deployed without any regard to the established data, APIs and protocols.

Unfortunately, in the real world users expect the new version to be able to read old data. They expect their plugins and extensions to continue working without having to adapt them to a new API. They even expect old client software to work with the new version of the server, and they'll upgrade the client software only when they damn well feel like it.

All these "unreasonable" user expectations force us developers to build backwards compatibility into the product, and it's best to do that early, before there are multiple product versions out there. In this article, I'll show a few tips and techniques for building backwards compatibility support into the software's data files and communication protocols.

The First Step

First and foremost, every piece of data that your application uses should include a version number. This little piece of metadata allows the application to treat each data format version differently. For example, if you have an XML configuration file, add a "version" attribute to its root element. If you have a proprietary client-server protocol, include the version number in the first message that the client sends when it connects to the server.
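For the XML configuration file case, here is a minimal sketch of reading the version attribute with the JDK's built-in DOM parser (javax.xml.parsers). The <config version="3"> layout and the ConfigVersionReader class are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical helper for this sketch.
class ConfigVersionReader {

  // Returns the "version" attribute of the document's root element,
  // e.g. "3" for <config version="3">...</config>.
  static String readVersion(String xml) {
    Element root;
    try {
      DocumentBuilder builder =
          DocumentBuilderFactory.newInstance().newDocumentBuilder();
      root = builder.parse(new ByteArrayInputStream(
          xml.getBytes(StandardCharsets.UTF_8))).getDocumentElement();
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new RuntimeException("Cannot parse the configuration file", e);
    }
    // getAttribute returns "" (not null) when the attribute is missing.
    String version = root.getAttribute("version");
    if (version.isEmpty()) {
      throw new IllegalArgumentException(
          "<" + root.getTagName() + "> has no version attribute");
    }
    return version;
  }

  public static void main(String[] args) {
    String xml = "<config version=\"3\"><timeout>30</timeout></config>";
    System.out.println("Config format version: " + readVersion(xml));
  }
}
```

Rejecting a file with no version attribute is a design choice; an application could instead treat unversioned files as the oldest known format.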

Now that you have a version number in place, be sure to update it each time that the data format changes, no matter how small the change is. The numbering system is up to you - you can use an integer that is incremented each time the format changes. Or the numbering can be in sync with the product's version number, if you find that more convenient.

Processing Versioned Data

Not surprisingly, the code that is responsible for processing the data needs to change whenever the data format changes. The natural tendency is to add "if" statements to handle the differences between versions, like this:

if (version >= 7) {
  // current code
}
else if (version >= 3) {
  // backwards-compatibility code
}
else {
  // ancient legacy code that nobody uses anymore
}

Pretty soon, the code is peppered with ifs and elses and becomes incredibly brittle and unmaintainable. It becomes impossible to refactor or optimize it, for fear of changing its delicate logical structures and breaking its backwards-compatibility. In short, a nightmare.

A much better approach is to "branch" the code: make a copy and modify only the copy, without touching the original. For example, we can have a DataProcessor interface (call it whatever you like) with multiple implementations, one per data format version. So we might have a class called DataProcessor_ver_1, another called DataProcessor_ver_2, and so on.
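Here is one way this might look in Java. It is only a sketch: the process method signature and the two toy formats (comma-separated key=value pairs in version 1, semicolon-separated in version 2) are assumptions made up for illustration. The point is that DataProcessor_ver_2 started as a full copy of DataProcessor_ver_1 and was then edited on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical processor interface: turns raw data into key/value settings.
interface DataProcessor {
  Map<String, String> process(String data);
}

// Format version 1: "key=value" pairs separated by commas.
// Once version 2 exists, this class is frozen and never edited.
class DataProcessor_ver_1 implements DataProcessor {
  public Map<String, String> process(String data) {
    Map<String, String> result = new LinkedHashMap<>();
    for (String pair : data.split(",")) {
      String[] kv = pair.split("=", 2);
      if (kv.length != 2) {
        throw new IllegalArgumentException("Malformed entry: " + pair);
      }
      result.put(kv[0].trim(), kv[1].trim());
    }
    return result;
  }
}

// Format version 2: started as a full copy of version 1, then edited
// to use semicolons as separators and to skip empty entries.
class DataProcessor_ver_2 implements DataProcessor {
  public Map<String, String> process(String data) {
    Map<String, String> result = new LinkedHashMap<>();
    for (String pair : data.split(";")) {
      if (pair.trim().isEmpty()) {
        continue; // tolerate ";;" between entries
      }
      String[] kv = pair.split("=", 2);
      if (kv.length != 2) {
        throw new IllegalArgumentException("Malformed entry: " + pair);
      }
      result.put(kv[0].trim(), kv[1].trim());
    }
    return result;
  }
}
```

A later refactoring of DataProcessor_ver_2 cannot change how version 1 data is read, and diffing the two classes shows exactly how the formats differ.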

We've all been taught that code duplication is a Bad Thing that Should Be Avoided. In this case, however, code duplication is your friend. It allows you to evolve the data format without the slightest fear that changing the data-processing code will break backwards compatibility. You can refactor the code, remove obsolete sections, or optimize it with complete confidence. You can even make radical changes such as replacing the XML parser that you use, as long as the old code can still use the original XML parser. And when you want to check what changed between versions 7 and 8, simply open the two classes in your favorite diff utility.

Now you may be tempted to use inheritance, like this:

public class DataProcessor_ver_2 
             extends DataProcessor_ver_1 {
}

Please resist the temptation and make a full copy. Using inheritance is arguably even worse than the "if-else" method, since the logic that differentiates between versions becomes implicit instead of explicit. The code's internal flow is more difficult to follow. Heck, it's not even all in the same file!

Using a Factory

To complete the picture, the application should perform the following steps:

  1. Read the version number of the data
  2. Create the appropriate DataProcessor for this version
  3. Use the DataProcessor to process the data
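Those three steps might be wired together as in the following sketch. It assumes a hypothetical data layout whose first line is a header such as version=2; the VersionedLoader class, the header format and the two toy formats are illustrative only, and the lambdas inside createProcessor stand in for separate DataProcessor_ver_N classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical processor interface for this sketch.
interface DataProcessor {
  List<String> process(String body);
}

class VersionedLoader {

  private static final String HEADER = "version=";

  // Step 2: pick the processor for a version. The lambdas stand in for
  // separate DataProcessor_ver_N classes.
  static DataProcessor createProcessor(int version) {
    switch (version) {
      case 1:
        return body -> splitNames(body, ",");  // v1: one line, comma-separated
      case 2:
        return body -> splitNames(body, "\n"); // v2: one name per line
      default:
        throw new IllegalArgumentException("Unsupported data version " + version);
    }
  }

  // Splits on the separator, trimming entries and dropping empty ones.
  private static List<String> splitNames(String body, String separator) {
    List<String> names = new ArrayList<>();
    for (String part : body.split(separator)) {
      String name = part.trim();
      if (!name.isEmpty()) {
        names.add(name);
      }
    }
    return names;
  }

  // Expects data whose first line is a header such as "version=2".
  static List<String> load(String data) {
    int newline = data.indexOf('\n');
    if (!data.startsWith(HEADER) || newline < 0) {
      throw new IllegalArgumentException("Missing version header");
    }
    // Step 1: read the version number from the header line.
    int version = Integer.parseInt(data.substring(HEADER.length(), newline).trim());
    // Step 2: create the appropriate processor.
    DataProcessor processor = createProcessor(version);
    // Step 3: let the processor handle the rest of the data.
    return processor.process(data.substring(newline + 1));
  }
}
```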

Step #2 is best accomplished using a factory. The naive approach would look something like this:

public class DataProcessorFactory {

  public static DataProcessor createInstance(
                                String version) {
    try {
      switch (Integer.parseInt(version)) {
        case 1:
          return new DataProcessor_ver_1();
        case 2:
          return new DataProcessor_ver_2();
        case 3:
          return new DataProcessor_ver_3();
        default:
          throw new RuntimeException(
            "Unsupported data processor version " + 
            version);
      }
    }
    catch (Exception e) {
      throw new RuntimeException(
        "Cannot create a data processor for version " + 
        version, e);
    }
  }
    
}

The problem with this approach is that the factory class needs to be updated whenever a new version is added. To get rid of this headache, we can use reflection:

public class DataProcessorFactory {

  public static DataProcessor createInstance(
                                String version) {
    try {
      String className = DataProcessor.class.getName() + 
                         "_ver_" + 
                         version.replace('.', '_');
      Class<?> c = Class.forName(className);
      return (DataProcessor) c.getDeclaredConstructor().newInstance();
    }
    catch (Exception e) {
      throw new RuntimeException(
        "Cannot create a data processor for version " + 
        version, e);
    }
  }

}

Note that the code assumes that the DataProcessor implementations reside in the same package as the DataProcessor interface, since each class name is derived from the interface's fully qualified name.
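To see the reflection-based lookup end to end, here is a small self-contained sketch in which the interface and its implementations all live in the same package. The describe method is hypothetical. The sketch uses c.getDeclaredConstructor().newInstance() rather than c.newInstance(), since Class.newInstance() has been deprecated since Java 9, and calls asSubclass so that a class with the right name but the wrong type is rejected.

```java
// Hypothetical interface for this sketch.
interface DataProcessor {
  String describe();
}

class DataProcessor_ver_1 implements DataProcessor {
  public String describe() { return "format 1"; }
}

// A dotted version "2.1" maps to the suffix "_ver_2_1".
class DataProcessor_ver_2_1 implements DataProcessor {
  public String describe() { return "format 2.1"; }
}

class DataProcessorFactory {

  static DataProcessor createInstance(String version) {
    // Same package as the interface: "<interface name>_ver_<version>".
    String className = DataProcessor.class.getName()
        + "_ver_" + version.replace('.', '_');
    try {
      Class<?> c = Class.forName(className);
      // asSubclass throws ClassCastException if c is not a DataProcessor.
      return c.asSubclass(DataProcessor.class)
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new RuntimeException(
          "Cannot create a data processor for version " + version, e);
    }
  }
}
```

An unknown version such as "3" surfaces as a RuntimeException whose cause is the underlying ClassNotFoundException.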

An Even Better Factory

If your application uses several different data formats, you might find it annoying to keep multiple factories that basically all do the same thing. Instead, you can create a single generic factory that handles all of your data formats:

public class GenericVersionedFactory {

  public static Object createInstance(
                         Class targetClass,  
                         String version) {
    try {
      String className = targetClass.getName() + 
                         "_ver_" + 
                         version.replace('.', '_');
      Class<?> c = Class.forName(className);
      if (!targetClass.isAssignableFrom(c)) {
        throw new RuntimeException(
          className + " is not a " + 
          targetClass.getName());
      }
      return c.getDeclaredConstructor().newInstance();
    }
    catch (Exception e) {
      throw new RuntimeException(
        "Cannot create a " + targetClass.getName() + 
        " for version " + version, e);
    }
  }
    
}

The factory takes an additional argument: the interface that the created instance is expected to implement. It uses this argument both to derive the name of the class to load and to check that the loaded class actually implements that interface.

To use the generic factory, you would do something like:

DataLoader dl = 
  (DataLoader) GenericVersionedFactory.createInstance(
                  DataLoader.class, 
                  version);
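If the cast at the call site bothers you, one variation (not the version shown above) is to give the factory a type parameter, so that passing DataLoader.class lets the compiler infer DataLoader as the return type. The DataLoader interface, its load method and DataLoader_ver_1 are hypothetical stand-ins for this sketch.

```java
// Hypothetical interface and implementation for this sketch.
interface DataLoader {
  String load();
}

class DataLoader_ver_1 implements DataLoader {
  public String load() { return "loaded by version 1"; }
}

class GenericVersionedFactory {

  // The type parameter ties the return type to targetClass,
  // so callers get a T back without casting.
  static <T> T createInstance(Class<T> targetClass, String version) {
    String className = targetClass.getName()
        + "_ver_" + version.replace('.', '_');
    try {
      // asSubclass replaces the explicit isAssignableFrom check.
      Class<? extends T> c = Class.forName(className).asSubclass(targetClass);
      return c.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new RuntimeException(
          "Cannot create a " + targetClass.getName()
          + " for version " + version, e);
    }
  }
}
```

With this signature the call site becomes DataLoader dl = GenericVersionedFactory.createInstance(DataLoader.class, version); with no cast.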
