Tuesday, March 1, 2011

Risk Managment and Building Experiment systems

I'm busy building a test suit at the moment in E-Prime for a client and it got me thinking about some issues particular to building experimental systems.  The issue I'm looking at is that of embedding analysis code inside the experiment module itself. (By experiment module, I mean the software unit/system that runs the experiment and collects the raw data.)

My thoughts on this is that its a "bad thing"(tm) My general attitude is that during the experimental run there should be the least amount of code running as possible. Just enough to present the stimuli and log the result.  Analysis can be done afterward. I.e Not in Real Time.

This provides a couple of benefits.

1) There is less potential things to go wrong during the experiment and crash/trash a run.
2) You can re-think the analysis later on. (Some good, some bad)
3) Researchers can't play the black box game and "trust" the software to always be right.
4) You can't fix bugs in real time. 
5) It provides separation of functionality.  The experimental software is focused on doing one thing right.
6) It spreads the cost of development over time.
7) It allows you to use different tools for different parts of the tool chain.

This all works fine until you have a feedback system that depends on some sort of calculated property based on the results.  But that's ok... it just takes a little more testing to verify.

None of this makes a system idiot proof. You can still introduce bugs into an analysis tool chain just as easily as into an experiment module. But when the modules are smaller and simpler its easier to verify each one individually.

I've found a couple of researchers who are of different opinions to me and want the system to do everything in a single all consuming step(Generate, collect and analyze). I think this illustrates a general lack of understanding of process and methodology rather than any particular lack of insight into software or programming. My generalization is that they have used fairly high end systems that did a great deal of hand-holding rather than being used to creating their own toolchain out of lower level units.  Neither good nor bad but just a difference in expectations between myself and the client which needs to be managed early and often. It also speaks of a "delegation" mindset. This is fine when there are lots of RA's working on the project and you are delegating to a person that you trust and can fiddle with the process until they get it right. But delegating to a piece of software carries a degree of fragility and untested assumptions. 

The other side of this is that having a researcher who is very clear on their analysis before the data collection begins is both interesting and somewhat risky.  I like that they are prepared and have clarity on what they want to do with the data, but on the other hand, I find that until you really get a look at the raw data, its always a slight unknown. (Which is the point of experimentation.) So I find that crafting an analysis toolchain that will inhale, clean, process and summarise the data all without the researcher having to look at the raw data... worries me.  There are just too many un-tested assumptions hard wired into that process. Too much blind trust.

The other side is obviously just as dangerous. Where the researcher has a vague or non-existent idea of what analysis they want to run on the data and wants to "see" what it looks like before they decide... is usually a bad sign.  Sometimes its just that they're a visual learner and can't articulate what is still clear in their head... the other is where they are just wasting everyone's time and really have no idea what they are doing and should get out of the lab until they figure it out or get a job with a private sector public opinion firm.  How they get their research proposals past Ethics is a mystery to me....

I guess it all comes down to the issues of the relationship between the researchers and their process. Mostly they are exploring a half grasped idea and improvising as they go. This demands a degree of flexibility from the tools and modularity.  Very clean and clear interfaces that don't leak assumptions.  Decomposition of the problem rather than excessive composition of functionality. All this with an eye toward mitigating the risks of:

1) Changing ideas and the associated cost to change
2) Change in research focus
3) Unexpected failures in other modules
3) Insufficient testing due to constant evolution
4) Poor time planning
5) Other parallel projects and tasks cause distraction


Unfinished idea

No comments:

Post a Comment