Wednesday, October 12, 2011

Processing Eprime, Superlab and PointLightLab data sets... some notes

After working with dozens of projects over the past few years I have evolved the fastest way to go from an Eprime data set into a usable dataset that is both robust and understandable.

Step 1. Export each participants data file from the native format of the data collection software to CSV formatted text file called P1.txt, P2.txt, P3. txt etc.
(Use the E-Data Aid program File->Export, set it to Excel and export the whole lot.)
(PointLightLab already creates .csv files so this step is already done. Just rename the files to P1.csv, P2.csv etc)
(SuperLab generates text files containing csv data so you are already done. Just rename the files to P1.txt P2.txt etc)

Step 2. Open Excel and create a new file called "My Data Set" or whatever. Drag and drop each of the text files generated in the step above onto Excel. This will open them using the CSV parser automatically rather than having to talk excel through each step.  It creates a worksheet named with the file name, which we cunningly set to P1, P2 etc in the step above.
Right click on the worksheet Tab in Excel and select "Move or Copy".
Select the "My Data Set" file in the top drop down list.
Select the position you want the sheet (move to end is the easy choice... but try to get the participants in order. P1, P2 etc)
Now move the sheet.

Rince, repeat for all the participant text files until they are all in the "My Data Set" file.

This is your master data set and you can save it as "My Data Set - Raw.xlsx" if you are disciplined.

Copy this file and call it "My Data Set - Cleaned.xlsx"

Step 3.  Record a "Cleaning" Macro for a worksheet to delete all the unwanted columns and bits and neaten it all up.

Step 3a. Record a "Summary" macro to reduce the remaining data on the sheet into some kind of summary.  Usually this is dependant upon what the researcher is doing. I often join these two macros together and bind them to a keystroke combination for speed.  This allows me to process a worksheet with two keystrokes. 

Step 4. Save these macros and run them on all the participant worksheets.  This step transforms the raw data to the cleaned data. If you stuff up a worksheet, copy it from the raw file back to the cleaned file using the move/copy command and run the macro again. 

Step 5. Create a "Summary" worksheet at the end of the "Cleaned" file.  Use the Excel "Indirect" function to pull all the data from the P1,P2, P3 worksheets onto the summary worksheet. This keeps the data live and you can update and replace the worksheets if required.

If you format this with the Participants down the colums and the DV's across the  row then you will be able to copy and paste straight into SPSS in a later step.

Step 6.  Do rough analysis, make descriptive graphs etc to perform sanity checks, look for weird outliers etc. Plot frequency distributions, check data ranges etc.

Step 7. Go back and fix all the bugs.

Step 8. Move the summary data to SPSS or whatever you destination analysis package is.  (I have some quick and dirty ways to create SPSS files as well... but thats another story.)

No comments:

Post a Comment