02 August 2006

Agility and legacy code refactoring

Let's say you have some bad legacy code (I have yet to see a codebase without any,...). When, how should you refactor/reengineer it?

In an agile context, there are conflicting forces related to that matter:
  • deliver as much value as possible
  • have a unit test for each line of code
  • refactor mercilessly
  • raise your velocity
Real cases from real life

Ok, this is very abstract, let's take some real cases:

1. You need to implement a cache to lower memory consumption (this is the consequence of a customer request) and you realize that the objects you are building are not properly build:

instance = new Model(xmlContent);

Creating a model doesn't have to be dependent on a specification representation format. You'd better use a factory for that. This code should be refactored,... But is it the right time to do so?

2. You found the fix for a nasty bug. Very very nasty, ... But deeply buried in a horrible mess of interactions. You have to write a unit test to correct the guilty class. Easier said than done. A real unit test may require a lot of refactoring, just to untangle the class dependencies.

What should you do? Write an "integration" test? Take time to write a unit test, refactoring everything that's necessary?

3. You have to add a new import/export functionality. However, your persistence design is so bad that implementing the function looks like hell,...

Should you refactor your persistence design? This may halt the team for many iterations. Should implement the function, going through so many loops and hooks, but doing it in a realistic time?

The books/consulting answer

I would like to point out an easy answer from a well known book or website, but I haven't found any. No, wait, there's one! The famous "consultant answer": "it depends,..."

Whose call is it?

Well, before emitting an opinion on the subject, I would like to add some more questions: whose responsibility is it to answer? Is it the client? Is it the team? Should they bargain refactoring actions around the velocity issue?

I think,... (drum roll,...) that this is,... (drum roll again,...) a development team issue. Let me list several reasons for that:
  • the team has the responsibility to improve its own velocity
  • the team has the best knowledge on how to do it
  • the team has to live with the code day after day and suffer from its sluggishness
  • really empowering the team also means letting it decide on how to improve its velocity
Now, what can be the policy for the 3 above cases?

Refactorings in the 3 cases

1. The "oh, this should be refactored" case

The pair-programmers should not refactor any code that doesn't get in their way. 3 reasons for that:
  • keep the velocity up,
  • don't risk to break existing working code,
  • work on that code when really exercising it (so they don't take the risk to over-refactor)
By the way, this is such a real case that we incurred a several days development penalty lately for not following this rule (hence the idea for this blog entry,...)

2. The "nasty bug" case

This one is a "must do". You have to write a unit test for your fix, even if it is difficult. If you don't do that:
  • you take to risk to face the bug again
  • you may write an "integration test" that will slow your test suite and may be very fragile
  • you open the door for your QA team to write more fragile validation scripts
  • you don't show to path to more testability and better design for the guilty class
Having said that, the refactoring can range from a complete reengineering of the guilty class, to the sole extraction of the guilty behavior in a special-purpose class, passing by the only isolation of the bad class.

On that real case, we chose to reduce as much as possible the class dependencies, just to be able to test the fix, and only that.

2. The "new functionality/bad velocity" case

This one calls for a trade-off (the famous "it depends").

On one hand, you cannot spend several iterations on your brand new this-time-I-will-get-it-right-design. On the other hand, adding functionalities on existing debt adds more debt (and we all know about interest rate composition).

The only duty here is to do at least something. You have to set the path for the reengineering of the offensive design: implement a tiny part of the solution and communicate a new vision to the rest of the team.

Then, on any of the 3 refactoring cases, another pair can pick up the new design and push the codebase towards the new path. The net result of this strategy is:
  • a velocity that will better reflect the codebase state
  • an investment against more debt if not an investment towards a better velocity
This is also a real case,... we chose here to prepare the path for persistency isolation by introducing several abstract classes upon the base class of domain entities to isolate the persistence behavior. Then we have to move some entities to an abstract class with less dependencies.

Let the team clean the house

I think this is very important for the team to communicate a lot here. The first case calls for information: "refactor that, next time you have develop/fix that code".

The second case needs communication around the new concepts introduced to enhance testability or if the extraction that has been done is a bit clumsy.

The third case, presenting a compromise and beginning some kind of pharaonic work around the existing code warrants obviously for a lot of discussion.

It's like house-cleaning,...
  • if you don't do it regularly, it can only get worse
  • you can fix a broken bulb more easily if you have ready-to-use tools and spare parts
  • building a new ground-heat heating system is a long project but this will be a nice replacement to your ever-broken old one. In the meantime, you can still add a new remote control, even if it would be easier on the new system.
  • you have more fun when you appreciate where you live!
[Update 03-08-2006]

Ian Roughley has written a very good article on a non-intrusive technique to write tests for legacy code. Add some logging traces and a Junit appender!

This is a really nice technique to use when you want to fix a bug and don't have time to break all dependencies to write a unit test (or you can't take the risk: see case n. 2).

You can also use it before refactoring some code, but in any case you'll have to get rid of those tests sooner or later (see my comments on the article) to replace them with "real" unit tests.