I came across this question on programmers stack exchange. Basically the user is asking if it is worth writing good clean code, and refactoring code. It's a fair question but it's complicated. As a developer who focuses on clean, well tested code I thought I would take some time to focus on answering this question (and adding a few comments about metrics along the way).

Is is worth it to write clean, refactored code?

The main question (and others like it) seem to focus on the fact that writing clean, refactored code takes time. Is that time spent cleaning code and refactoring it going to make a difference in the viability and sustainability of the project over the long run? It's a fair question. For my clients every moment I spend writing code for them, they have to pay for. You can assign a real dollar value to a clean up task. If it costs $100 to clean up that code will you make $100 because of that clean up? Will that cleanup allow you to make later changes for only $50? Is there enough ROI to actually preform the cleanup task? 

Cleanup Tasks

First we have to define what clean up tasks are. I usually group clean up tasks into XX sets. These groups allow me to focus on one type of clean up task at a time. This reduces the time spent overall because you spend less time chasing your tail. I will explain in a moment, but first lets talk about the groups of tasks. 

DRYing the code base

This means finding similar chunks of logic and consolidating them into one place. A classic example is a User class and a Contact class. Both have email address. So you and DRY your code base by moving the logic for validating the email address to a single place and using that place in both classes. That's a bit abstract, so lets look at an example (in rails).

class User < ActiveRecord::Base
  validates :first_name, presence: true
  validates :last_name, presence: true
  validates :email, with: /\A([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\z/i
end
class Contact < ActiveRecord::Base
  validates :first_name, presence: true
  validates :last_name, presence: true
  validates :phone, presence: true 
  validates :email, with: /\A([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\z/i
end

Can become something like

class User < ActiveRecord::Base
  validates :first_name, presence: true
  validates :last_name, presence: true
  validates :email, with: EMAIL_REGEX
end
class Contact < ActiveRecord::Base
  validates :first_name, presence: true
  validates :last_name, presence: true
  validates :phone, presence: true
  validates :email, with: EMAIL_REGEX
end
# in an initializer
EMAIL_REGEX=/\A([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\z/i

This way when you have to change the EMAIL_REGEX for some reason you only have to change it in one place. This might be a very simple example, but it is a good one. That regex works well, but is pretty hard to read. If there was an issue with it, it would be quite cumbersome to change it in all the places that use email addresses, but if you only have to change it in one place then you have a much simpler task. You can then DRY your code again by noticing that the User and Contact classes are very simular. Maybe you should consolidate them down some more. Are new classes become:

class User < Person
end
class Contact < Person
  validates :phone, presence: true
end
class Person < ActiveRecord::Base
  validates :first_name, presence: true
  validates :last_name, presence: true
  validates :email, with: EMAIL_REGEX
end
# in an initializer
EMAIL_REGEX=/\A([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\z/i

Without getting to technical about the pros and cons of this type of consolidation (STI using inheritance in this case) we can now say that all "people" have to have some things, and that contacts have an additional requirement. This means that when we roll out a new type of person we can inherit from the Person class and get all the logic in that class, and the work that was done to clean up that class. In the long run DRYing the code base can save a massive amount of time and money. 

That is not to say that there are not downsides to doing this. There are times when a less DRY code base is desirable. In our above example, the classes are very simple. The last thing you want to do in a code base is make a hunt and peck task. Having to go to User, to go to Person, to go to Nameable, to go to Name to figure out how to make a middle name required. It is possible to have FirstName be a class, LastName to be a class, both inheriting from the Name class, which inherits from the String class. But the real question is why?. Sure when you implement "MiddleName" you don't have to specify the presence validator, but when you go to debug a problem you have to open up 3 classes to figure out where to make your change. This becomes more apparent in larger classes and applications. It's ok to be a little less dry in favor of navigable, maintainable code.

Which brings us to metrics. These can be very different per language but DRY is one place where you should be aiming for 98% or better in your metrics. 100% might be too much, but you want this metric to be way up there. There is plenty of ROI in the form of fewer bugs, quicker implementation of new features, fewer mistakes when changing the code. In short, if you do nothing else then DRYing your code is with the extra time spent on even the smallest budgets. 

Single Responsibility Principal

Like DRY this principal (SRP) is aimed at making "less code" to maintain. It's very closely related to DRY so I usually do it second. The basic goal is to make a class have one and only one job. It also means that for every job there is only one class. This gets a bit tricky. Some classes are very simple while others are very complex. Complex classes don't mean that you violated SRP, classes can really be complex. Mostly this means going through your more complex classes and seeing if you can delegate some of the logic out into smaller segments. You also have to keep an eye out for places your doing things twice. DRY should have taken care of the "doing things twice" rule but you will find, especially in larger projects, that it doesn't. Usually because your doing a thing two different ways. These two ways can be consolidated to provide a better user experience and more maintainable code.

SRP has two "sides" the first is making sure that every bit of logic only has one place. Mostly this is very similar to DRY, but it extends and looks at "how" you do something instead of just doing the same exact thing. For example (continuing from above):

class Father < Parent
    def is_over_18?
        self.date_of_birth < DateTime.now - 18.years
    end
end
class Mother < Parent
    def is_adult?
        self.age > 18
    end
    def age
        (Date.today - self.date_of_birth).to_i / 365
    end
end
class Parent < Person
end

In this example, while it is DRY in that we don't repeat any logic, were looking to see if a parent is an adult in two different ways. Again it's not exactly repeat code, but it's a repeat "task".  So we have to decide which way is best. "age" sounds better but doesn't let us know if the person is over 18 (yes or no). "is_adult"  gives is a boolean (yes or no) answer as does "is_over_18" so that seems to be the place to start. At the same time "is_adult" seems better but we imply that all adults are older then 18, and at a glance, "is_adult" looks like it messes up leap years and because you have to be older then 18 the first time "is_adult?" says true is when your 19 or older. So to refactor:

class Father < Parent
end
class Mother < Parent
    def age
        (Date.today - self.date_of_birth).to_i / 365
    end
end
class Person < ActiveRecord::Base
  validates :first_name, presence: true
  validates :last_name, presence: true
  validates :email, with: EMAIL_REGEX
  def is_adult?(age_limit=18) 
      self.date_of_birth < DateTime.now - age_limit.years 
  end 
end

You can see that I left age. Mostly because I don't know if it's in use somewhere else. It has a bug and it needs to be fixed but thats not the point of refactoring. By moving "is_adult?" to the person class I have said that is_adult? is relevant to all people not just mothers and fathers.

The second "part" of SRP is to make sure that a class only has a single job. Meaning that it's better to have a class manage one thing and only one thing. This gets tricky. You want your classes to handle one thing (and your methods to only do one thing) but you don't want to make 100 tiny classes. So there is a balance. More on that later. In our running example, figuring out an age really isn't "mother"'s job. It's also a property, that will likely be needed elsewhere. So let's "move" it:

class Mother < Parent
end
class Person < ActiveRecord::Base
  include Ageable
  validates :first_name, presence: true
  validates :last_name, presence: true
  validates :email, with: EMAIL_REGEX
  def is_adult?(age_limit=18) 
      self.date_of_birth < DateTime.now - age_limit.years 
  end 
end
class Wine < ActiveRecord::Base
    include Ageable
end
module Ageable
    def age
        years_old(date_of_birth) if self.responds_to? :date_of_birth
        years_old(bottle_date) if self.responds_to? :bottle_date
    end
private
    def years_old(start_date)
         (Date.today - start_date).to_i / 365
    end
end

While not the best example, it makes the point. Age is now separated out into a new module Ageable, and can be shared (with a little bit of work) between people and wines. In our latest example however the age method does one job, it figures out what property to base the age off of. years_old does the actual math. It's important to note that I didn't fix the bug. Bugs are not part of the code clean up process.

When looking at metrics for SRP I usually aim at something > 85%. It's an important metric, but not as important as DRY. In the real world, I would be tempted to put the age method in both Person and Wine and ditch thee years old method all together. Not because it's the right thing to do, but imagine the bug report about age, you would have to open Mother, then Parent, then Person, then Ageable to be able to fix the problem. On the other side though, fixing the bug report related to a Person would still leave the bug with Wine.  It truly is a balance between easy to maintain and easy to find. Exceeding 85% on SRP (specially the second half) allows you to have a few places that you have logic where it might not belong (strictly speaking) for ease of access but makes sure that getting most of it. With the first part of SRP (logic in only one place) it's better to target for 100% and accept 98%, just like DRY.

Reduce Complexity

This can be  described as just reducing the about of "crap" a developer or application has to go through. It can be a real pain to have an application that is really complex for no reason. Reducing that complexity can help remove obstacles.  Because it's about making it easier to code and not about making a better application I usually do this third. For example, again continuing from above:

class Person < ActiveRecord::Base ... end
class User < Person ... end
class Parent < Person ... end
class Child < Person ... end
class Mother < Parent ... end
class Father < Parent ... end
class Son < Child ... end
class Daughter < Child ... end

Just looking at that makes my head hurt. So if the classes don't have a lot of different code in them then we can reduce them in to fewer classes. We have to decide if the classes are distinct enough, in our usage to warrant the extra classes. For instance, what is the real difference between a mother and a daughter? For our usage, lets say that a Mother has children and a daughter does not. What's the difference between a father and a sun? Again one has children the other does not. What's the difference between A father and a mother? Sex. Same for Sun and Daughter. So I would do something like:

class Person < ActiveRecord::Base ... end
class User < Person ... end
class Person ActiveRecord::Base
  include Ageable
 def is_mother?
    !is_male? and has_children?
 end
 def is_father
    is_male? and has_children?
 end
 def is_sun?
    is_male?
 end
 def is_daughter?
    !is_male?
 end
 def is_male?
    sex == "Male"
  end
  def place_in_tree
    return "Father" if is_male? and has_children?
    return "Mother" if !is_male? and has_children?
    return "Son" if is_male?
    return "Daughter" if !is_male?
  end
 def has_children?
    children.length > 0
 end
end

Once again, not the best example but makes the point. Now instead of having to figure out 8 classes when a person has a problem I only have to figure out 1.  This only works in cases like our example where 6 of the 8 classes weren't really doing anything. As soon as the Father and Mother have to have different logic, they will have to be separated out again. Because of this I usually aim for 75% or better when talking metrics and this type of complexity. The truth is that it's better to have 500 tiny classes then 50 small classes. Complexity Reduction really is just a matter of taste and finding a better balance between code base maintaining, and having to hunt for the place logic exists. 

Making Metrics Happy

The very very last thing I try to do is "make metrics happy". That is to fire up a set of metric tools and try to get a "good" score. It's important to try to do,  but it gives you the least ROI. Having pretty metrics isn't going to make a better application, it will make non-developers happy (in that they can see and understand metrics), and it is a good tool for achieving other goals (like those above), But there is very little to gain other then "side effects". Your not going to make your application any faster by getting a 100% on your overall metrics. Because I use mostly rails, I use Code Climate for this. And while it's definitely valuable, specially from a clients perspective (they can see the code improving in a way they can understand), it can be really trick to make the metrics "happy". In some cases you have to make poor design decisions to make the metrics happy, and that is just silly. In other cases you have to make metric "unhappy" decisions for performance reasons. In the end, code metrics are an important tool to use, but there only a guide to achieving other goals not the goal it's self. 

Because they are a guide and not a true goal, I usually am for 90% or better. In code climate terms a 3.7 GPA. By doing DRY and SRP you get pretty high. Code Climate and other metric tools can help you find places to improve, but that and showing the client an easy to understand visual of the code getting better is what they are really good for. 

Is it worth it?

So far we have talked about what I consider clean up tasks. The next part of the question is "is it worth it"? Again this is a complex question. For the main part the answer is hands down yes. How ever you have to have realistic goals. Nothing in life is black and white. The same is true here too. You can get 98% DRY with very little time and a little practice. Getting to 100% may be impossible in many apps. It may be such a thing that getting to 90% dry only takes an extra hour, but getting to 95% dry would mean missing a deadline.

When trying to decide how much to refactor you have to manage ROI and time.

The Never

When I was looking at Stack Exchange, I noticed a few people complaining that their boss, or client, or what ever complained that they were too slow because they were pushing the "metrics" or "code quality". It made me think instantly of a few things you should never do when it comes to code cleaning. 

  • Never clean it all at once. Clean one or two classes at a time. If you need to clean more then that then your either doing something wrong and need to write better code to start, or your trying to catch up. If your trying to catch up only focus on one spot at a time. It will take time but you will get there.
  • Never separate normal code cleaning from the task that generates it. Adding a feature should be, Add feature, clean code. Those two together should equal the task of writing a feature. Don't think of it as X hours to code the feature and Y hours to clean it up, think of it as X+Y hours to write the feature. 
  • When clearing Technical debt never "all stop". That's just not a real world answer unless things are really bad. Alway try to duct tape then go back and fix. Think of a flat tire. You get a flat tire on the way to the airport you don't just sit there and pout, cancel your trip, and starve on the side of the road. You put on the spare, call AAA, or otherwise get to the airport, and when you get back deal with the need of a replacement tire. Same is true with a big clean up project. Keep moving forward with the project even if it means "using a spare". As long as your cleaning your new stuff, and trying to clear the debt a little at a time, you will get there.

It is worth it. A project, even a small one, should benefit from clean, refactored code, as long as the goals are sensible. Even the largest, most complicated projects, will not benefit from trying to "get 100%". Like most things in life, you have to balance. The extremes are where your projects will suffer. 

 

Coteyr.net Programming LLC. is about one thing. Getting your project done the way you like it. Using Agile development and management techniques, we are able to get even the most complex projects done in a short time frame and on a modest budget.

Feel free to contact me via any of the methods below. My normal hours are 10am to 10pm Eastern Standard Time. In case of emergency I am available 24/7.

Email: coteyr@coteyr.net
Phone: (813) 421-4338
GTalk: coteyr@coteyr.net
Skype: coteyr
Guru: Profile