Wednesday, March 2, 2011

Design of Experiment Systems

I want to outline my ideas on a general pattern for most of the experiment systems I work on.

1) Design Stage
- Capture the requirements
- Walk through some scenarios
- Collect, scrounge, borrow what is needed

2) Experiment Module
- Generate stimuli / survey / test / assessment etc. (see the sketch after this list)
- Sense response 
- Log response
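
Whatever the toolkit, that generate / sense / log loop has the same shape. A minimal Python sketch of it, with the stimulus set, prompt and file name invented purely for illustration:

    # Minimal generate / sense / log loop. The stimuli, prompt and
    # file name are invented for this sketch.
    import csv, random, time

    stimuli = ["red", "green", "blue"]       # hypothetical stimulus set
    random.shuffle(stimuli)

    with open("responses.csv", "w", newline="") as f:
        log = csv.writer(f)
        log.writerow(["trial", "stimulus", "response", "rt_ms"])
        for trial, stim in enumerate(stimuli, start=1):
            print("Stimulus:", stim)         # generate stimulus
            t0 = time.time()
            resp = input("Response? ")       # sense response
            rt = (time.time() - t0) * 1000.0
            log.writerow([trial, stim, resp, round(rt)])  # log response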

3) Library Module
- Collate raw data
- Preserve raw data
- Curate and add metadata

4) Data Transformation Pipeline (see the sketch after this list)
- Clean bad records
- Cull unwanted elements
- Map from one form to another (coding, transforms, etc.)
- Summarise ("cook down")
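
In code I tend to think of those four steps as small composable functions. A hedged Python sketch, where the field names, the toy records and the recoding table are all invented for illustration:

    # Clean -> cull -> map -> summarise as small composable steps.
    # Field names and the recoding table are invented for illustration.
    raw_records = [
        {"subject": 1, "cond": "practice", "rt": 512.0},
        {"subject": 1, "cond": "L", "rt": 432.0},
        {"subject": 1, "cond": "R", "rt": None},     # a bad record
        {"subject": 2, "cond": "R", "rt": 388.0},
    ]

    def clean(records):                      # drop bad records
        return [r for r in records if r["rt"] is not None and r["rt"] > 0]

    def cull(records):                       # drop unwanted elements
        return [r for r in records if r["cond"] != "practice"]

    def map_codes(records):                  # recode one form to another
        coding = {"L": "left", "R": "right"}
        return [dict(r, cond=coding.get(r["cond"], r["cond"])) for r in records]

    def summarise(records):                  # cook down to per-subject means
        means = {}
        for r in records:
            means.setdefault(r["subject"], []).append(r["rt"])
        return {s: sum(v) / len(v) for s, v in means.items()}

    print(summarise(map_codes(cull(clean(raw_records)))))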

5) Analysis Module
- Summary Stats
- Other Stats
-> To Publications and Presentations

6) Presentation Module
- Visualisations, Graphs, Screen Shots etc
-> To Publications and Presentations


To me this is a fairly graceful set of units: it abstracts the process and provides maximum flexibility and reuse.

What do I use for the various Modules?

Design stage
Word, Email, Excel, Project, mind maps etc. This stage is primarily about communication: getting the information and pieces in place to reduce the uncertainty in the project as early as possible. This is also the stage to test the researcher's resolve and see if they have really settled in their mind what they are doing and are committed to seeing it through.

Experiment Module
PointLightLab, E-Prime, ExperimentBuilder, Superlab, Biopac EEG, Eyelink, BrainAmp EEG, Matlab, C++, Perl, Python, Visual Basic, 3ds Max, Premiere, Mudbox, Audacity, Photoshop, SurveyMonkey, Qualtrics, Combustion, hardware hacking... an endless number of tools get used at this stage.

Library Module
CSV files, Access DBs, spreadsheets, log files. Anything that is easily accessed by other software and can still be read and recovered in a couple of years. Binary files are bad.
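
One habit that helps here: write one plain CSV of trial data per session, plus a small JSON sidecar carrying the metadata, so both stay readable in any text editor years later. A sketch with invented session details:

    # One plain CSV of trial data per session, plus a JSON sidecar
    # for the metadata. Session details are invented for this sketch.
    import csv, json

    session = "S001_2011-03-02"
    meta = {"subject": "S001", "task": "visual search",
            "experimenter": "me", "notes": "pilot run"}

    with open(session + ".json", "w") as f:
        json.dump(meta, f, indent=2)         # human-readable, future-proof

    with open(session + ".csv", "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["trial", "stimulus", "rt_ms"])
        w.writerow([1, "target_present", 432])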

Data Transformation Pipeline
Ideally this is completely automated using simple macros and scripts. The reason for the automation is that this module involves a great deal of repetitious labour and the chance of human error is huge.  This should generate a completely reproducible result given the same inputs. So human fiddling is naughty.
I use Perl, Python, VBA macros, Matlab + libs, SciLab, SQL, Visual Basic and C++ (with various libs) when I need to do some heavy lifting.
Generally this is a specific toolchain for every project. The only reuse is where a particular researcher continues to use and evolve the same tool set over multiple experiments or projects.
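
To make the reproducibility point concrete: the driver should rebuild every output from the raw inputs on each run, so there is never a hand-edited intermediate lying around. A Python sketch, where the directory names and column names are placeholders:

    # Rebuild all outputs from the raw inputs on every run:
    # same inputs in, same outputs out, no hand edits in between.
    # Directory and column names here are placeholders.
    import csv, glob, os, shutil

    RAW, OUT = "raw", "derived"
    shutil.rmtree(OUT, ignore_errors=True)   # wipe stale outputs
    os.makedirs(OUT)

    for path in sorted(glob.glob(os.path.join(RAW, "*.csv"))):
        with open(path, newline="") as f:
            rows = [r for r in csv.DictReader(f) if r["rt_ms"]]  # drop blanks
        with open(os.path.join(OUT, os.path.basename(path)), "w", newline="") as f:
            w = csv.DictWriter(f, fieldnames=["trial", "stimulus", "rt_ms"],
                               extrasaction="ignore")
            w.writeheader()
            w.writerows(rows)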

Analysis Module
This module usually comes down to the researcher's favourite stats tool. Where the analysis is fairly straightforward (small data set, known stats models, well-understood assumptions) it's usually just a job of pushing the data into a format suitable for the tool (SPSS/PASW, Excel, R).
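
Most of that pushing is reshaping a wide table (one column per condition) into the long, one-row-per-observation format that R and SPSS both like. A small Python sketch with invented column names and data:

    # Reshape a wide table (one column per condition) into the long
    # one-row-per-observation format R/SPSS expect. Names invented.
    import csv

    wide = [{"subject": "S01", "fast": "310", "slow": "455"},
            {"subject": "S02", "fast": "298", "slow": "470"}]

    with open("for_stats.csv", "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["subject", "condition", "rt_ms"])
        for row in wide:
            for cond in ("fast", "slow"):
                w.writerow([row["subject"], cond, row[cond]])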

Presentation Module
Generating movies, audio tracks, still images, animations, graphs, interactive visualisations. This can be anything from PowerPoint gymnastics through Excel, SigmaPlot, Processing, VBA, 3ds Max, Premiere, QuickTime, Photoshop etc. Usually it's lots of multimedia work driven from stats outputs. Every so often I get to do something more fun like animating a fish or visualising an MRI plot.
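
The simplest version of "driven from stats outputs" is a chart generated straight from a summary table. A matplotlib sketch, with all the numbers invented for illustration:

    # Bar chart of group means with error bars, driven straight from
    # a stats summary. All numbers are invented for illustration.
    import matplotlib.pyplot as plt

    groups = ["control", "treatment"]
    means = [452.0, 398.0]                   # hypothetical mean RTs (ms)
    sems = [12.0, 15.0]                      # hypothetical standard errors
    xs = range(len(groups))

    plt.bar(xs, means, yerr=sems, capsize=4, align="center")
    plt.xticks(xs, groups)
    plt.ylabel("Mean RT (ms)")
    plt.title("Condition means")
    plt.savefig("condition_means.png", dpi=150)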


OK, that's my brain dump for the day...
