Monday, November 12, 2012

Grading Legacy Code and Estimating Cost

The above is an interesting set of criteria for the assessment of a codebase. I think they are valuable indicators of code "smells", but they do not really provide a framework for estimating work. There are lots of significant issues that the above article does not mention, and the biggies are all around the users of the system. What's the documentation like, how big is the user base, what is their investment in the system? These are all things that are outside the world of "code" but still have a huge impact on the actual "cost" of changing a system. We, like any economically minded type who might need to manage an actual business, need to build some sort of reasonable cost estimate. So brainstorm some decent cost drivers and name those variables, then figure out an estimate for each of them and get your spreadsheet out. See below for a dummy example I have concocted out of thin air...

Let's name some variables in our model:

TC  - This is Total Cost of the Change Request
U - This is the Cost of the Change to the Users (People Factors)
D - This is the Cost of the Change to the Documentation (Writing, testing, production, distribution)
P - This is the Cost of the Change to the Software (Person Hours to Implement the Change)
Pe - This is the Cost of the Change to the Software Environment (Code Storage, Test Suite, Dev Tools, Reporting Tools etc)
He - This is the Cost of the Change to the Hardware Environment (Programmer Hardware, Testing Hardware, Deployment Hardware, User Hardware, Wires and Networks etc)

Let's assume that each of these variables has a fixed "unit cost". This is, however, a small fantasy, as each will have a different cost model with fixed and variable components. It's your job as an analyst to figure out what these are in each business scenario... AND DOCUMENT THEM.

We can supply some general cost estimates for the model for entertainment purposes.

U - Informing a single user about a change to their software takes 1 email. This has a fixed cost: the time to write the email, spell check it, test it with a couple of sample idiots, figure out who is on the distribution list and send it. Then there is a variable cost: getting the abusive feedback from the 10% of people who cannot read (or who did genuinely find a problem) and responding to each of them with 1-10 follow-up emails (average 3 emails). So the true cost of informing a userbase via a single email will be:

U = 1(Time to write initial email, spell check, test, distribute) + ((Number of Users / 10) * 3(Time to write follow-up email))

If we assume the time to write, test and distribute the initial email is about 1h and the time to write a follow-up email is about 5m, then for a userbase of about 100 we get:

U = 1(60m) + 10*3(5m)
U = 3h 30m at the rate for a technical writer or trainer of $50 per hour
U = $175.00
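For anyone who would rather have the spreadsheet in code, here is a minimal sketch of the U calculation above. The function name and parameter names are my own, hypothetical choices; the default values are just the dummy figures from this post (1h initial email, 1 in 10 users replying, 3 follow-ups of 5m each, $50 per hour).

```python
def user_cost(num_users: int,
              initial_email_min: float = 60.0,
              followup_email_min: float = 5.0,
              users_per_complainer: int = 10,
              avg_followups: float = 3.0,
              rate_per_hour: float = 50.0) -> float:
    """Hypothetical helper: dollar cost of telling a userbase about one change."""
    # Fixed part: write, spell check, test and distribute the initial email once.
    fixed_min = initial_email_min
    # Variable part: 1 in `users_per_complainer` users replies, and each reply
    # costs `avg_followups` follow-up emails of `followup_email_min` minutes.
    variable_min = (num_users / users_per_complainer) * avg_followups * followup_email_min
    # Convert minutes to hours and charge the technical writer/trainer rate.
    return round((fixed_min + variable_min) / 60.0 * rate_per_hour, 2)


print(user_cost(100))  # 175.0
```

Splitting the fixed and variable parts makes it easy to see that the follow-up emails, not the original email, dominate once the userbase grows.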

D - Update the documentation, and remove old copies of the documentation from circulation. Since writing documentation can be done with varying results, this will be a matter for consultation with your technical writer... (what do you mean you do not have a technical writer? You get your programmers to write the user docs? Hahahahahahahhahahahahhahahahaahhahahahah... you get the idea. It is just insulting to assume that programmers have some magical talent for communicating, educating and illuminating people. Go and find a real technical writer who can show they have produced documentation that users enjoy using, and hire them.)

Let's assume that you have the best possible scenario of a simple centralised distribution system (website, internal server, etc.) that allows you to maintain a single copy of your documentation and keep it up to date. You also have a competent technical writer who knows how to communicate with the users, can illustrate the effect of the change in terms of concrete usage scenarios for the various users, and can deal with the testing and feedback around changing the documentation. Let's also assume that the change involves only a single location in the documentation, as there is no cascade of dependencies within the documentation that creates a change-storm.

So assume the documentation work will involve: write the changes and integrate them with the existing docs, test with a sample of the user base, modify if required and then deploy the docs to the doc server. Notify all users. Deal with any resulting follow-up emails (1-2, average 1.1) from the 2% of users who whinge about doc changes and can't understand well-written documentation.

D = 1(Time to write, test, review and distribute) + ((Number of Users / 50) * 1.1(Time to write follow-up email))

If we assume that the time to write a straightforward change to the docs might be 2h, the time to write a follow-up email is as above (5m) and the userbase is the same as above (100), then:

D = (120m) + 2*1.1(5m)
D = 2h 11m at the rate for a Technical Writer of about $50 per hour.
D = $109.17
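The same shape works for D, just with a bigger fixed part and a smaller complaining fraction. Again this is only a sketch: the function and parameter names are hypothetical, and the defaults are the dummy numbers above (2h to write, 1 in 50 users replying, 1.1 follow-ups of 5m, $50 per hour).

```python
def documentation_cost(num_users: int,
                       write_minutes: float = 120.0,
                       followup_email_min: float = 5.0,
                       users_per_complainer: int = 50,
                       avg_followups: float = 1.1,
                       rate_per_hour: float = 50.0) -> float:
    """Hypothetical helper: dollar cost of a single-location doc change."""
    # Fixed part: write, test, review and deploy the change to the doc server.
    fixed_min = write_minutes
    # Variable part: the 2% (1 in 50) who whinge, times average follow-ups.
    variable_min = (num_users / users_per_complainer) * avg_followups * followup_email_min
    # Minutes -> hours -> dollars at the technical writer's rate.
    return round((fixed_min + variable_min) / 60.0 * rate_per_hour, 2)


print(documentation_cost(100))  # 109.17
```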

P - Now this is really the meat of the article above.  It was looking at some of the big variables and trying to give some sense of what they were in the experience of the author.  Fair enough.  The problem was that the author didn't really provide any estimates or even a possible range for what they might be.

The problem really is that we need some estimate for a fairly straightforward change, when there is no such thing. So to keep things moving, let's keep it simple and say:

P = all the changes a single programmer can make in one hour + the cost of unrolling the changes in the event that it turns out to be a bad idea. As this happens once in every 10 changes made... that's a probability of 0.1.
P = 1(hourly rate for a competent programmer) + 0.1(hourly rate for a competent programmer)
P = 1($50) + 0.1($50)
P = $55 
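P is really an expected cost: the hour of work, plus the rollback weighted by how often it happens. Here is a minimal sketch under the post's assumptions. The names are hypothetical, and I am assuming (as the formula above implies) that unrolling a change costs about as much as making it.

```python
def programming_cost(hours: float = 1.0,
                     rollback_probability: float = 0.1,
                     rate_per_hour: float = 50.0) -> float:
    """Hypothetical helper: expected dollar cost of a code change."""
    # The work itself.
    work = hours * rate_per_hour
    # Assumption: unrolling costs the same as the original work, and it is
    # needed in roughly 1 out of every 10 changes.
    expected_rollback = rollback_probability * hours * rate_per_hour
    return round(work + expected_rollback, 2)


print(programming_cost())  # 55.0
```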

Pe - This is always going to be complicated and very dependent upon what the existing system is, the number of programmers and managers who are using the system and what the cost of transitioning to the new system may be. (Also consider the hidden cost of transitioning back to the old system if the new one does not pan out.)

Let's assume that the cost of a simple scenario is something like changing a test suite, while a worst-case scenario is a whole new IDE, Test Suite, Build System, Code Repository, and set of scripts and glue code to make it all work together.

Since it's reasonable to assume that any change will require tweaking a test system, this should form the bulk of our constant. However, as there is a virtually endless stream of "maintenance" costs with maintaining a coding environment, we should include a small constant to represent the cost of keeping the coding environment up to date and the knowledge of the programmers up to speed. Assume we have 5 developers in the shop who need to play nicely with this system.

Pe = (10m for a programmer to write some new tests and verify them) + (Number of Programmers) *(10m for maintenance)

Pe = ($50/60*10) + 5 * ($50/60*10)

So the best case scenario gives us:
Pe = $50
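The best-case Pe above is one programmer's 10 minutes of testing plus 10 minutes of upkeep for each developer in the shop. A minimal sketch, with hypothetical names and the post's dummy figures as defaults:

```python
def environment_cost(num_programmers: int,
                     test_minutes: float = 10.0,
                     maintenance_minutes_per_programmer: float = 10.0,
                     rate_per_hour: float = 50.0) -> float:
    """Hypothetical helper: best-case cost of touching the software environment."""
    # Fixed part: one programmer writes and verifies the new tests.
    # Variable part: every programmer spends a little time keeping the
    # environment (and their knowledge of it) up to date.
    minutes = test_minutes + num_programmers * maintenance_minutes_per_programmer
    return round(minutes / 60.0 * rate_per_hour, 2)


print(environment_cost(5))  # 50.0
```

Note that the maintenance term scales with headcount, so a bigger shop pays more for the same test tweak.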

He - The implications of a system change that causes hardware changes could be pretty much anything. At the small end is something like the addition of a peripheral to one of the developers' workstations, while at the big end is a requirement to update all the machines in userland. The more common scenario might be something like an update to a test machine or a build server, but in general, most change requests would not have an impact on the hardware environment. However, we need to put together a constant for our model, so here goes.

The most frequent scenario would be ... um.... an upgrade of some old machines in userland that have not been kept up to spec.  Say, 1 in 100 machines across the whole of userland.  But since this is only triggered every couple of years...  we can estimate it with the following (N of users still = 100, cost of machine is ~$1000 for a reasonable workstation)

He =  (Number of users/100) * (cost of new machine)  / (2*365)
He = $1.37
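The He formula spreads the price of the occasional replacement machine across the 2 × 365 days between upgrades, exactly as written above. A minimal sketch with hypothetical names and the post's dummy numbers (100 users, 1 in 100 machines, ~$1000 per workstation):

```python
def hardware_cost(num_users: int,
                  users_per_upgrade: int = 100,
                  machine_cost: float = 1000.0,
                  days_between_upgrades: int = 2 * 365) -> float:
    """Hypothetical helper: amortised hardware cost, as in the He formula."""
    # Roughly 1 in every 100 userland machines falls behind spec.
    machines = num_users / users_per_upgrade
    # Spread the cost of those machines over the days between upgrades.
    return round(machines * machine_cost / days_between_upgrades, 2)


print(hardware_cost(100))  # 1.37
```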

So now we have some numbers to play with... let's play with them:

Trigger Scenario
You are handed a code base and a request for change (bug, new feature... whatever) and a question about how long it will take (how many man-hours, or whatever cost metric you use).

This could be:

Response Scenario 1. In the best case scenario, no work is needed and no time is spent because you already know that the change is not required. Ideal but not the norm...
TC = 0U + 0D + 0P + 0Pe + 0He
TC = $0

Response Scenario 2. Some investigation is required, some small code change is required, users need re-education about the change or new feature. This is kind of our minimum possible change that touches the system.
TC  = 1U + 1D + 1P + 1Pe + 1He
TC =  ($175.00) + ($109.17) + ($55) + ($50) + ($1.37)
TC = $390.54
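Once the unit costs are pinned down, each scenario is just a weighted sum, which is exactly what the spreadsheet would do. A minimal sketch: the unit costs are copied from the worked examples above, while `total_cost` and the dictionary layout are my own hypothetical choices.

```python
from typing import Dict

# Unit costs worked out earlier in this post, in dollars.
UNIT_COSTS: Dict[str, float] = {"U": 175.00, "D": 109.17, "P": 55.00, "Pe": 50.00, "He": 1.37}


def total_cost(multipliers: Dict[str, float],
               unit_costs: Dict[str, float] = UNIT_COSTS) -> float:
    """Hypothetical helper: TC = sum of (multiplier x unit cost) over all drivers."""
    # Drivers missing from `multipliers` count as zero units.
    return round(sum(multipliers.get(name, 0) * cost for name, cost in unit_costs.items()), 2)


print(total_cost({}))                                            # Scenario 1: 0.0
print(total_cost({"U": 1, "D": 1, "P": 1, "Pe": 1, "He": 1}))   # Scenario 2: 390.54
```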

Response Scenario 3. Significant changes are required. There is a mess of dependencies. There is some cruft in the middle that will need refactoring, and the documentation is a mess. This is probably going to spill over into some renovation of the programming environment, and there are a bunch of possible hardware issues.


There are about ... well, everyone in userland will be affected. Most will see a couple of changes.
About 20 places in the documentation that look like they need tweaking.
The code looks like it will take a week with two coders on it.... 35hrs per week =
The test suite is ragged and smells like a teenager's sock... maybe 2 coders for a couple of weeks
The hardware could play out, or it might cause a couple of junkers to be upgraded... maybe

TC = 200U + 20D + 140P + 140Pe + 2He
TC =  200($175.00) + 20($109.17) + 140($55) + 140($50) + 2($1.37)
TC = $51,886.14
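It is worth breaking a bigger estimate like this into line items, because it shows where the money actually goes. A minimal sketch: the unit costs and the Scenario 3 multipliers are copied straight from the TC formula above, and `line_items` is a hypothetical helper name.

```python
# Unit costs worked out earlier in this post, in dollars.
UNIT_COSTS = {"U": 175.00, "D": 109.17, "P": 55.00, "Pe": 50.00, "He": 1.37}

# Multipliers for Response Scenario 3, taken from the TC formula above.
SCENARIO_3 = {"U": 200, "D": 20, "P": 140, "Pe": 140, "He": 2}


def line_items(multipliers, unit_costs=UNIT_COSTS):
    """Hypothetical helper: (driver, units, unit cost, subtotal) for each driver."""
    rows = []
    for name, cost in unit_costs.items():
        units = multipliers.get(name, 0)
        rows.append((name, units, cost, round(units * cost, 2)))
    return rows


rows = line_items(SCENARIO_3)
for name, units, cost, subtotal in rows:
    print(f"{name:>2}: {units:>4} x ${cost:>7,.2f} = ${subtotal:>10,.2f}")
total = round(sum(row[3] for row in rows), 2)
print(f"TC = ${total:,.2f}")  # TC = $51,886.14
```

Laid out this way it is obvious that the user-facing cost (U) dwarfs everything else, which is the whole point of looking beyond the code.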

Response Scenario 4. Insert worst-case scenario here... imagine that not only have you been presented with a legacy code base, but it looks like it was written by the minify command, some of your coders are not familiar with the language, it's a proprietary IDE that you do not have licenses for, it is hardware specific so you will need to rejig some or all of the test system, and it may force a forklift upgrade of every machine in userland... oh, and the documentation was written by someone with hostility issues towards users...

I leave this as an exercise for the reader to calculate...
