Friday, June 7, 2013

MATLAB Refactoring

Today I hate Matlab. 

My problem is that I have inherited a pile of scripts that have been hacked on by a range of researchers in multiple countries over multiple years to do multiple experiments... (I hope you get just how ugly that is).  Each of these researchers is a gifted amateur. That is to say, they are very good at hacking in a fairly old-school procedural style of coding. Which is precisely where everyone starts... no problem there.  The problem is the scaling issue, the lack of discipline, the widely varied idiomatic styles and all the usual code-rot problems of a codebase that has been recycled multiple times without any particular clean-out or clean-up.  It's got gunge!

Where do I start?

Firstly, get my tools lined up. Get the code under source control, create a working copy...

Tools
Matlab and Notepad++

Now start reading to see what I have to deal with....

As per Matlab convention... they are basically one massive function per file... some very rudimentary decomposition but still about 5 kloc over 4 files.  Shit....

Scads of cryptic variables.  Lots of them seem to be allocated within complex calculations and then used here and there... then they turn into arrays and cell arrays or just get dumped to files where they die.

Lots of magic numbers.....

Lots of one line calculations doing multiple things with multiple cryptic variables and a scatter of magic numbers.

Did I mention the abysmal lack of whitespace?
Did I mention the old-school style of keeping all names in lower case and minimising the number of characters? (Looks like they were trying to keep it under 5 characters for some insane reason??)
Did I mention the lack of meaningful documentation?
Did I mention the lack of any verifiable test for anything?

The only documentation of intent ("Intentional" descriptions) is a bunch of research papers that have been written off the back of this snarl that document what the researchers think they were doing... which scares me when it's very hard to tell if the code actually did that... reliably.

Now to start the refactoring.

Step 0.  Format the Code for readability.

In Matlab, use Ctrl+A then Ctrl+I to select everything and run the auto-indent function. This works nicely and I have not found any problems with it.

Make sure each of your functions is closed with the Matlab "optional" "end" keyword (it's lowercase, and if one function in a file uses it then they all must).

Now where are the styling tools????  WTF? There are no whitespace tools for Matlab... you must be F@#@$% kidding.

Here is the .m file I wrote to fix that.  http://duncanstools.blogspot.com.au/2013/06/matlab-fix-style-tool-to-fix-formating.html

Step 1.  Name stuff. 

Name the functions as meaningfully as you can. 

 This will change as you work through the codebase. But name them to describe their "function" as much as you understand it.

Start naming the variables to describe what they are doing.  

Matlab has some fairly handy tools to rename variables... they are a little flaky and sometimes fail, which means manually undoing what you have done.  When this happens I use the find tool to work backward up the file until I find the first instance of that variable and try to rename it at that point. Usually this works.  If all else fails use find and replace....
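
To give a rough idea of what Step 1 buys you, here is a small before-and-after sketch. Every name and number in it is invented for illustration (none of it comes from the codebase I inherited); the point is only that the second version says what the first one hides.

```matlab
% Before: cryptic names and magic numbers (hypothetical example).
d = [0.2 0.5 0.9 1.4];
r = sum(d(d > 0.3) * 1000) / 4;

% After: the same calculation with named variables and constants.
% All names and values here are assumptions made up for this sketch.
reactionTimesSec = [0.2 0.5 0.9 1.4];
minValidTimeSec  = 0.3;    % trials at or below this are discarded
msPerSecond      = 1000;   % unit conversion
numTrials        = numel(reactionTimesSec);

validTimesSec       = reactionTimesSec(reactionTimesSec > minValidTimeSec);
totalValidTimeMs    = sum(validTimesSec * msPerSecond);
validTimePerTrialMs = totalValidTimeMs / numTrials;
```

Naming the pieces also exposes a question the one-liner buried: should the total of the valid trials really be divided by the count of all trials? That is exactly the kind of thing you want to surface before touching the logic.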

Step 2. Group Stuff. 

Use Matlab code cells: start a comment with a double percent character and a following space ("%% ") and it will create a separate "region" in the code. This is an easy way to "soft" group a region of code that you think is a candidate for extracting into a function.
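
A minimal sketch of what that looks like, with made-up data and a made-up threshold standing in for the real thing. Each "%% " line starts a new cell, and each cell is a candidate function waiting to be extracted.

```matlab
%% Load data
% Hypothetical stand-in for reading a results file.
rawSamples = [3 5 4 100 6 5];

%% Remove outliers
% The threshold is an invented value for this sketch.
maxPlausibleValue = 50;
cleanSamples = rawSamples(rawSamples <= maxPlausibleValue);

%% Summarise
meanSample = mean(cleanSamples);
```

The variables that one cell hands on to the next (rawSamples, cleanSamples) are a good first guess at the inputs and outputs of the functions you will extract later.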

Start looking for any variables that are "recycled" and split them up.
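
Something like this, as a sketch with invented names and data: one variable doing three jobs gets split into three variables doing one job each.

```matlab
% Before: 'tmp' is recycled for three unrelated things (hypothetical).
samples = [2 4 6 8];
tmp = sum(samples);           % a total
tmp = tmp / numel(samples);   % now a mean
tmp = samples - tmp;          % now a vector of deviations
oldResult = tmp;

% After: one variable per meaning, same arithmetic.
sampleTotal        = sum(samples);
sampleMean         = sampleTotal / numel(samples);
deviationsFromMean = samples - sampleMean;
```

The recycled version even changes from a scalar to a vector half way through, which is the sort of thing that makes a later reader's head hurt.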

Start looking at where variables are allocated, initialised and used. Group them together and see if you can package them up into a nice little unit of functionality.

Break out all the repetitious code and sketch out a reusable block that might work to replace it.  Make a class if it will need to hold state data or a function if it just needs some scope.  Beware of Matlab's scope rules... they are a bit weird if you have come from other languages.

Matlab allows you to have multiple functions in the same file.  Just start the file with the "big" function and then put the "helper" functions below it.  There are no refactoring tools to help you with this but the syntax highlighting and variable highlighting are pretty good and help a lot when extracting sections of code manually.

Package up related material and then clean up the packages.  The exercise of packaging is really an exercise of breaking the total "scope" of the code into small manageable bits that you can hold in your head all at once.

By driving a structural "fence" through a large piece of code (using functional decomposition), and managing what crosses the fence (passes into and out of functions), you are immediately simplifying your mental schema. Once you have the system decomposed into chunks of a size that you can hold in your head comfortably, you are done.  Now just look at each chunk and clean it up.   
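
As a sketch of that layout (the file name, function names and logic are all hypothetical), a single .m file might end up looking like this once a couple of chunks have been pulled out:

```matlab
% File: summariseTrials.m (hypothetical name and contents)
function summary = summariseTrials(trialValues)
    % The "big" function comes first and matches the file name.
    cleaned = removeNegatives(trialValues);
    summary = spreadOf(cleaned);
end

function kept = removeNegatives(values)
    % Helper: has its own workspace, so it only sees 'values'.
    kept = values(values >= 0);
end

function spread = spreadOf(values)
    % Helper: difference between the largest and smallest value.
    spread = max(values) - min(values);
end
```

Calling summariseTrials([-1 2 7 4]) drops the -1 and returns 5. Only the first function is visible from outside the file; the helpers are private to it, and nothing gets into or out of them except through their arguments and return values.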

Step 3. Write Stuff. 

Write comments to document what you think a variable is for.
Write comments to document what a function or group is doing.
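
For functions, Matlab has a convention worth using here: the comment block directly under the function line is what the "help" command prints. A minimal sketch, using a made-up function purely for illustration:

```matlab
function speed = averageSpeed(distanceMetres, durationSeconds)
%AVERAGESPEED Average speed in metres per second.
%   SPEED = AVERAGESPEED(DISTANCEMETRES, DURATIONSECONDS) divides the
%   distance travelled by the time taken. Hypothetical example function.

    % Element-wise division so vectors of trials work too.
    speed = distanceMetres ./ durationSeconds;
end
```

Saved as averageSpeed.m, typing "help averageSpeed" shows that comment block, and averageSpeed(100, 20) returns 5. Putting the units in the argument names saves writing a comment about them.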

The more you write the more clearly the code will come out and the less you need to hold in your mental schema at any point in your coding. This is the objective of all refactoring.  Unload your mental schema of the code so you can get a handle on the whole game. 

Step 4.  Delete Stuff. 

This is the most pleasant step.  Delete all the crap comments, bad formatting, irrelevant and duplicate code, old-style idioms, bad names, unwanted functionality and anything else you can find that is not being used NOW.  Kill it all.  Make your code shiny and clean.  If you need something later, add it back in... don't try to carry it forward based on a "maybe useful later" kind of logic.  This does not pay off.  Go back to the code in the repository and have a look if you want to refer to something that was in an old version.  Be ruthless.  Get rid of everything that is not "Shiny" and "clean smelling" (to borrow from Fowler et al.)


Does Matlab help or hinder refactoring?

Don't get me wrong, Matlab is a powerful toy with a shiny shell and a big price tag, but it's not a productive toolset for larger projects.  Then again, I doubt it's intended to be.  So maybe this is just me thinking about it in a way that's inappropriate.

There are some good features built in that provide some basic tools for the refactoring process, but there is little beyond that.

Find and replace Tool.  It's good, but not great.
Block Comment/Uncomment.  It's there and works without mangling the code
Auto-indent.  It's good.
Auto-Style.  NONE.  Check out AStyle, Profactor StyleManager or PolyStyle if you don't know what I mean. Also this article on CodeGuru - MakeCodeNicer by Alvaro Mendez (Took me a lot of effort to find these references. I haven't used this code in 5 years but it sticks in my memory as part of my essential toolkit back then...)

Higher level refactoring.  NONE. Is this a problem or am I just applying Matlab to bigger things than it's ready for?

Later all...

