Friday, October 8, 2010

Thoughts on this year's crop of research student projects.


The projects themselves are not unusual. I see the same sorts of mistakes from students trying to get up to speed quickly with complex systems and subtle processes that take time and experience to master. Issues like:

  • Not testing their experiments completely before starting to collect data
  • Not keeping a log of their experiment activity
  • Not having any idea how to process the data after it has been collected
  • Having no quality-control concepts
  • Having only the most basic ideas about backup and versioning
  • Designing experiments with no resilience or risk-management concepts
  • No time-management skills
  • Not being computer literate while running computer-based experiments (...words fail me!)
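Two of the items above (logging activity and basic versioning) cost almost nothing to fix. As a minimal sketch of the logging habit — the file name and message wording here are my own invention, not anything from a particular student's project — even a few lines of Python can keep a timestamped record of what happened and when:

```python
# Hypothetical sketch: keep a timestamped log of experiment activity
# in a plain text file, appending one line per event as it happens.
# The file name and messages are invented for illustration.
from datetime import datetime, timezone
from pathlib import Path

def log_activity(message, log_file="lab-log.txt"):
    """Append one timestamped line to the experiment log."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    with open(log_file, "a") as f:
        f.write(f"{stamp}  {message}\n")

log_activity("Calibrated sensor before session 1")
log_activity("Participant 07 withdrew after block 2")

# The log is just text, so it can be read, searched, and versioned easily.
print(Path("lab-log.txt").read_text())
```

A plain text file like this also drops straight into any versioning tool, which covers the backup point as well.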
Again I have been asked to develop a wide array of experiment systems, data processing systems and, in some cases, analysis and visualizations of the final data.

These are all throw-away systems, so they are not being maintained in any way. This year has been remarkable for the size of the data sets I am being asked to work with. The use of automated data collection systems is allowing researchers and students to collect ever greater volumes of data, some of which, I would suggest, is never going to be used and just adds bulk to the task of transforming the data sets and stresses the tools.

This is again a mistake born of ignorance. If the students had done their experiments on paper and then hand-processed all their data, I think they would be a little more constrained the next time they planned an experiment. Is that a good thing, though?

While the downside of automation is that an inexperienced researcher can generate huge amounts of worthless data, it also allows an experienced researcher to "Think Bigger" and not feel constrained by the hard lessons they learned as an undergraduate that have limited their scope in the past.

I still have issues with people trying to keep a foot in both camps: using the automated tools to generate massive data sets and then trying to hand-process them. This ends up being the worst of both worlds. There is a project going on at the moment that has gone down this road and is reaching epic levels of manual labour. Essentially, all the data is coming from databases, being converted to static reports, and then hand-entered back into a new database, all without any intentional transformation of the data. And it's a huge data set, containing more than 1600 items per record. And did I mention they are trying to code it on the fly using an evolving code book so it can kind of end up in an SPSS file like they used to do in the old days... Talk about frustrating. It's going to take months of work and thousands of man-hours of labour, and introduce all manner of human errors into the data... all because the lead investigator has control issues and cannot grasp the scope of the problems they create for everyone else.
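The maddening thing is how little code the direct route takes. This is only a toy sketch — the table names, columns, and code book below are invented, and the real project's schema would obviously be far bigger — but it shows the shape of database-to-database transfer with the coding applied programmatically, instead of printing reports and re-keying them by hand:

```python
# Hypothetical sketch: move records between databases directly and
# apply the code book as a lookup table, instead of hand re-entry.
# All table/column names and codes here are invented for illustration.
import sqlite3

src = sqlite3.connect(":memory:")   # stands in for the source database
dst = sqlite3.connect(":memory:")   # stands in for the new database

src.execute("CREATE TABLE responses (id INTEGER, answer TEXT)")
src.executemany("INSERT INTO responses VALUES (?, ?)",
                [(1, "yes"), (2, "no"), (3, "yes")])

dst.execute("CREATE TABLE coded (id INTEGER, code INTEGER)")

# The "code book" lives in one place, applied identically to every record.
code_book = {"yes": 1, "no": 0}

rows = src.execute("SELECT id, answer FROM responses").fetchall()
dst.executemany("INSERT INTO coded VALUES (?, ?)",
                [(rid, code_book[ans]) for rid, ans in rows])
dst.commit()

print(dst.execute("SELECT * FROM coded").fetchall())
```

When the code book evolves, you change the lookup table and re-run the transfer; with hand entry, every revision means re-keying thousands of records.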

Ahhhhhhhh.

But back to the students. Like most years, it's been interesting. There has been a constant flow of new problems to solve and an acceptable number of repetitions of previous years' problems. Some of them are starting to get a little tedious, but I am still learning new tricks, so it's not a total waste of time.

I am encountering a couple of recurring issues that just add to the misery. One is dependent questions in surveys. The other is designing projects to manage attrition of participants. I have a paper in mind on the first issue, but I have not really decided how I want to deal with the second. It's still evolving in my head. Maybe a blog post will let me work out some angles...

Later.
