Monday, October 24, 2011

Compiler as API

Roslyn project... Sort into the "Things I have wanted to waste time on, time which I don't have..." category.  Now if only they would do it for the VC++ compiler.

Thursday, October 20, 2011

Building Ember Media Manager from Source pt1

I have been having a look at Ember Media Manager and got bored/curious enough to try to build it from source.  After crashing into various instructions on both the original Ember Media Manager project and the Ember Media Manager - R project, I have pulled down the source from both and tried to build it using VB.NET 2008 and 2010 Express.

I'm currently focusing on the Emm-r project.

The first error is in the various solution files. There is one that seems to be openable in 2008, but it does not include all the sub-projects correctly because the paths have changed. This is easily corrected by adding "\trunk" into the paths, tweaking a couple of project names and adding some projects that were not present at all.

The second solution file is openable in 2010 but as I only have the Express version.... (this becomes a problem in the next step)

Building with VB2010Express

The next set of errors are in the form of:

rem SVN Revision -> Version Number

echo %ProgramFiles%  | find /i "(x86)"
if errorlevel  1 (
set ToolExec="C:\EmberR\trunk\SubWCRev.exe"
) else (
set ToolExec="C:\EmberR\trunk\SubWCRev64.exe"
%ToolExec% "C:\EmberR\trunk\EmberAPI\\" "C:\EmberR\trunk\EmberAPI\My Project\AssemblyInfo.temp" "C:\EmberR\trunk\EmberAPI\My Project\AssemblyInfo.vb" exited with with code -1073741515.

These errors are to do with the pre-build events in each project.  The build events use the SubWCRev.exe programs to include the Subversion working copy revision number in the AssemblyInfo.vb file. The problem is that the version of SubWCRev.exe (or SubWCRev64.exe) included in the project is either incomplete (missing libapr_tsvn.dll) or no longer current with the installed version of Subversion.
One solution is to remove the explicit path and allow the command to search the path for the program and find it in the c:\program files\TortoiseSVN\bin\ directory (in my case).  The other option is to do away with the inclusion of the Subversion working copy number in AssemblyInfo.vb and delete the pre-build event script altogether.  Either or... but the question is how to do this and stay current with the project source.  Anyway... getting it to build is step 1.
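
The exit code in that error message is worth decoding. A minimal Python sketch (the helper name to_ntstatus is made up for illustration) that reinterprets the signed value as an unsigned 32-bit Windows status code:

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit process exit code as an unsigned hex status code."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


print(to_ntstatus(-1073741515))  # 0xC0000135
```

0xC0000135 is STATUS_DLL_NOT_FOUND, which fits with the bundled SubWCRev.exe failing to find libapr_tsvn.dll.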

Unfortunately, the 2010 Express edition does not allow access to the pre-build or post-build steps, so even though they are broken, it does not let you fix them.

Build with VB2008Pro

So, my method is to open the 2008 solution file, fix the paths to the sub projects, then fix the pre-build scripts. If I build at this stage I get 867 errors due to all sorts of shit.

So open the 2010 solution file and build using 2010. This seems to build OK, but when I try to run the program it's still in a tangle because it is not running in a useful working directory with its resources.

Building with VB2010Pro
However, building on my work machine with VB2010 Pro works first time (after upgrading the solution file).

Wednesday, October 12, 2011

Future Gridlock

This is an interesting rant by Neal Stephenson where he poses the possibility that his society (the US) no longer has the capacity to make big changes.

I would tend to agree and suggest that the problem is that there is too much inertia in the structure of the US.  The political and social gridlock has reached a point where there is enough pressure from interested parties to prevent change that it's almost impossible to enact large programs of change.  Thus stasis.  And as we know, a static organism rots.

Other societies have gone through this, and if history teaches anything it's that there is either evolutionary change (slow careful steps) or revolutionary change (rapid and substantial). I have no ability to clearly predict which path or when, but I don't think it takes a genius to realize that there is structural gridlock in most of the so-called "western" nations.  If we look to the so-called "Arab world" at the moment, we can see revolutionary change happening after (in some cases) half a century of static rotting.  Their infrastructure, public systems, social orders and classes are all in decay and finally the active parts of their society have "renewed" the system by destroying the existing status quo and remaking their society.

My guess is that the static "western" nations have decades of gridlock to wait out before they reach a stage where revolution can happen. Some more than others.  Urban decay and dislocation have already started to play out in Europe in various places, but only on a small scale. The US is seeing the first widespread protests about its structural problems, but its government and ruling elite have the resources to put them down without much effort at the moment.

This is not to say that evolutionary change has completely stopped. It's still happening all over the place in large and small amounts. The point is that the amount of big change has slowed notably and is moving toward an average of 0. As more and more gridlock builds up, the organism loses the ability to adapt... and the amount of maladaptation to the environment, and consequently the amount of damage sustained just by being in the environment, increases.  (By environment I mean social, political, resources, languages etc.)

Enough ranting for one day....

Processing Eprime, Superlab and PointLightLab data sets... some notes

After working with dozens of projects over the past few years I have evolved the fastest way to go from an Eprime data set into a usable dataset that is both robust and understandable.

Step 1. Export each participant's data file from the native format of the data collection software to a CSV-formatted text file called P1.txt, P2.txt, P3.txt etc.
(Use the E-Data Aid program File->Export, set it to Excel and export the whole lot.)
(PointLightLab already creates .csv files so this step is already done. Just rename the files to P1.csv, P2.csv etc)
(SuperLab generates text files containing csv data so you are already done. Just rename the files to P1.txt P2.txt etc)

Step 2. Open Excel and create a new file called "My Data Set" or whatever. Drag and drop each of the text files generated in the step above onto Excel. This will open them using the CSV parser automatically rather than having to talk Excel through each step.  It creates a worksheet named with the file name, which we cunningly set to P1, P2 etc. in the step above.
Right click on the worksheet Tab in Excel and select "Move or Copy".
Select the "My Data Set" file in the top drop down list.
Select the position you want the sheet (move to end is the easy choice... but try to get the participants in order. P1, P2 etc)
Now move the sheet.

Rinse, repeat for all the participant text files until they are all in the "My Data Set" file.

This is your master data set and you can save it as "My Data Set - Raw.xlsx" if you are disciplined.

Copy this file and call it "My Data Set - Cleaned.xlsx"

Step 3.  Record a "Cleaning" Macro for a worksheet to delete all the unwanted columns and bits and neaten it all up.

Step 3a. Record a "Summary" macro to reduce the remaining data on the sheet into some kind of summary.  Usually this is dependent upon what the researcher is doing. I often join these two macros together and bind them to a keystroke combination for speed.  This allows me to process a worksheet with two keystrokes.

Step 4. Save these macros and run them on all the participant worksheets.  This step transforms the raw data to the cleaned data. If you stuff up a worksheet, copy it from the raw file back to the cleaned file using the move/copy command and run the macro again. 

Step 5. Create a "Summary" worksheet at the end of the "Cleaned" file.  Use the Excel INDIRECT function to pull all the data from the P1, P2, P3 worksheets onto the summary worksheet. This keeps the data live and you can update and replace the worksheets if required.

If you format this with one participant per row (running down the sheet) and the DVs across the columns, then you will be able to copy and paste straight into SPSS in a later step.

Step 6.  Do rough analysis, make descriptive graphs etc to perform sanity checks, look for weird outliers etc. Plot frequency distributions, check data ranges etc.

Step 7. Go back and fix all the bugs.

Step 8. Move the summary data to SPSS or whatever your destination analysis package is.  (I have some quick and dirty ways to create SPSS files as well... but that's another story.)
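
For anyone who would rather script the summary stage than record macros, the participant-per-row layout from Step 5 can be sketched in Python. This is only a rough illustration under several assumptions: each export is plain CSV with the header on the first line, the DV of interest is summarised as a simple mean, and the column names RT and ACC and the helper names are made up for the example.

```python
import csv
import io
from statistics import mean


def summarise_participant(csv_text, dv_columns):
    """Return the mean of each DV column for one participant's CSV export."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for dv in dv_columns:
        # Skip blank cells so partially filled columns do not break the mean.
        values = [float(r[dv]) for r in rows if r.get(dv) not in (None, "")]
        summary[dv] = mean(values) if values else None
    return summary


def build_summary_table(participants, dv_columns):
    """Build one row per participant and one column per DV."""
    table = [["Participant"] + list(dv_columns)]
    # Sort numerically so that P10 comes after P2 rather than after P1.
    for name in sorted(participants, key=lambda n: int(n.lstrip("P"))):
        summary = summarise_participant(participants[name], dv_columns)
        table.append([name] + [summary[dv] for dv in dv_columns])
    return table


# Tiny made-up data standing in for the contents of P1.txt and P2.txt.
participants = {
    "P2": "Trial,RT,ACC\n1,500,1\n2,700,0\n",
    "P1": "Trial,RT,ACC\n1,400,1\n2,600,1\n",
}
for row in build_summary_table(participants, ["RT", "ACC"]):
    print(row)
```

In practice the strings would be read from the exported files, and the cleaning rules from Step 3 would go into summarise_participant.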

My polyglot life... I mean code...

This is a good article on a product that used "polyglot programming".  It reminds me of some of the projects that I have had to Frankenstein together.  PointLightLab is the worst: it uses native C++, Managed C++, WinForms, XML Schema & DTD, XSD for the XML parser, and implements 5 simple (well, 4 simple and 1 convoluted) DSLs for the user-side logic.  On top of which is layered the academic logic of the domain that we are playing within.  It then integrates with (or is integrated by) other packages I have written in Perl, Visual Basic.NET, VBA, MAXScript and others I have probably forgotten.

Pity the poor bastard who inherits this lot. 

Tuesday, October 11, 2011

Hypertext literature in utero

This is an interesting post on the problems with hypertext as a narrative form.  I don't think it holds many answers, but it does summarise some of the problems (loosely).  I don't think much has changed since I tried to teach people about it half a decade ago. It's still not clear what the form and rules are, whether there are going to be genre conventions, and what readers' expectations will be/are.  So the fundamentals are still wildly in flux, even though there are existing works.

I agree with the author's point that it's a challenging and complex beast to create... not because it's intrinsically hard but mainly because no one knows what it should look like.  Once we have conventions, it will be possible to either follow them or violate them, but everyone will know what they are....

Must get back to this one day....

Monday, October 10, 2011

Post book apocalypse

This is a good abstraction of the problems in the publishing industry (and in many transforming industries): the failure to understand their monopoly.  What value do you add to the product that you are selling?  How do you do something that no one else does?  What is the true value of that element and what is the actual recoverable value?  Book publishers are weighed down with dead tree artifacts, presses, staff, warehouses and distribution chains that they are either unwilling or unable to discard.  The things they are discarding are the important bits like editors and designers.... Doh!

Personal printers made commercial printing irrelevant a long time ago.  It's really only been a question of quality that has allowed them to linger on.  Once you can print and bind a book on the desktop with one click of a mouse... they are toast.
Even if you can only do it at a kiosk in the mall... it becomes an on-demand service rather than a specialist industry.  This will be the last gasp of physical publishing.  But the art of preparing the text for publication and designing the final polished product will still be a valuable step in the chain between author(s) and readers.  Adding context, keeping the text up to date and maintaining the contextual information's currency, relevance and history will all require people and systems.  Publishing... minus paper, presses and trucks.  Distribution will still be a big job; advertising, selling etc... it will all still happen.  Legals, copyright, ownership etc... all still there.  Authors, editors and designers will still do exactly the same jobs... probably more.  There are multiple formats to deliver the book in, and there are tradeoffs and decisions to be made. There are channels to publish into that have never existed before.

A book is still unmatched by even a carefully curated website.  They are different.  A book promises completeness and polish. A snapshot of knowledge, story, whatever, that marks an identifiable waypoint in the flow of information and life.  We can compare and discuss the waypoint, we can refer to it. It has stability and finality.  Good or bad. It marks a tiny place in history around which all copies, abridgements, variations and edits will forever orbit. 

Thursday, October 6, 2011

Evolutionary vs Revolutionary Algorithms

This is a nice little post on building a simple evolutionary algorithm.  I have not thought of applying it to string mutation but its a nice little application.

A couple of thoughts occur to me.

The first is that "evolution" is strongly encoded in the mutation algorithm, which single-steps one of the characters in the string a single position (plus or minus).  This produces a slightly random but systematic search strategy.  A "revolutionary" approach would be to randomly replace a character in the string with another character.
If this approach were taken, it would introduce a much greater possible degree of individual mutation.  This is detrimental for a population of one but would be much more valuable in a larger population using cross-over and culling based on fitness.  Beneficial mutations would be quickly reinforced.  In a population of one, there is no chance to move toward beneficial or away from detrimental mutations.  You just keep randomly mutating until you exactly hit the target.
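
To make the difference concrete, here is a rough Python sketch of the two mutation styles. It is not the code from the linked post; the alphabet, the function names and the jump_size measure are all assumptions made for illustration.

```python
import random
import string

ALPHABET = string.ascii_lowercase + " "


def mutate_evolutionary(candidate, rng):
    """Step one randomly chosen character a single position up or down the alphabet."""
    i = rng.randrange(len(candidate))
    index = ALPHABET.index(candidate[i])
    # Stay inside the alphabet: at either end there is only one direction to step.
    steps = [s for s in (-1, 1) if 0 <= index + s < len(ALPHABET)]
    new_char = ALPHABET[index + rng.choice(steps)]
    return candidate[:i] + new_char + candidate[i + 1:]


def mutate_revolutionary(candidate, rng):
    """Replace one randomly chosen character with any character from the alphabet."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]


def jump_size(before, after):
    """Total alphabet distance moved by a mutation."""
    return sum(abs(ALPHABET.index(a) - ALPHABET.index(b)) for a, b in zip(before, after))


rng = random.Random(2011)
parent = "hello world"
for mutate in (mutate_evolutionary, mutate_revolutionary):
    sizes = [jump_size(parent, mutate(parent, rng)) for _ in range(1000)]
    print(mutate.__name__, "largest jump:", max(sizes))
```

The evolutionary mutation always moves exactly one position, while the revolutionary one can land anywhere in the alphabet, which is why it needs a population and a cull to be useful.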

Either way, this is still essentially a search strategy toward a known target. As long as the fitness function and the mutation function can play together to move the solution toward the target, it will eventually get there.

The more interesting application is moving toward an unknown target, which means the fitness function needs to either encode the properties of a desirable solution (beware of assumptions) or have some way to brute-force test each possible solution and collect metrics from the test to measure fitness.

The mutation function can then either take an evolutionary or revolutionary approach to seeking new solutions.

The next level of subtlety is to introduce a more complex fitness function that evaluates both the overall fitness and "sections" of fitness.  In the example above, it would be beneficial to evaluate the overall closeness of the string and the closeness of the first and second halves.  This way, it's possible to introduce multiple mutations in each generation, rather than a single mutation per step.

What is the benefit of this?  It's the same process just running in parallel.  Parallel evolution anyone?

You could add random cross overs along with culling based on part and whole fitness values. (So a solution with a high fitness in one half would not be culled even though its overall fitness may be low. )
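
A small Python sketch of part-and-whole fitness with a cull that spares the half-champions. The scoring (count of matching positions), the splice-at-the-midpoint crossover and all the names are assumptions chosen to keep the example short, not anything taken from the linked post.

```python
def closeness(candidate, target):
    """Higher is better: number of positions that already match the target."""
    return sum(c == t for c, t in zip(candidate, target))


def part_and_whole_fitness(candidate, target):
    """Return (whole, first_half, second_half) closeness scores."""
    mid = len(target) // 2
    return (
        closeness(candidate, target),
        closeness(candidate[:mid], target[:mid]),
        closeness(candidate[mid:], target[mid:]),
    )


def cull(population, target, keep):
    """Keep the `keep` best candidates overall, plus the best candidate in each half."""
    scored = [(part_and_whole_fitness(c, target), c) for c in population]
    ranked = sorted(scored, key=lambda s: s[0][0], reverse=True)
    survivors = [c for _, c in ranked[:keep]]
    for part in (1, 2):
        champion = max(scored, key=lambda s: s[0][part])[1]
        if champion not in survivors:
            survivors.append(champion)
    return survivors


def crossover(left_parent, right_parent):
    """Splice the first half of one parent onto the second half of the other."""
    mid = len(left_parent) // 2
    return left_parent[:mid] + right_parent[mid:]


target = "abcdef"
population = ["abcxxx", "xxxdef", "abxdex", "xxxxxx"]
survivors = cull(population, target, keep=1)
print(survivors)
print(crossover("abcxxx", "xxxdef"))
```

Here "abcxxx" and "xxxdef" each score lower overall than "abxdex" but survive because one half is perfect, and crossing them over produces the target in a single step.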

This allows the solution to evolve at different rates in different parts of its DNA. Again evolutionary rather than revolutionary.

The next stage of fun is to randomly add or remove modules from the solution string and do some sort of fitness evaluation on different parts of the solution string, for different parts of the target.  This allows a valuable mutation that occurs in one section to migrate to a different location in the target string, even though it was useless in the position where it initially occurred.

We will call this a matrix of fitness functions.  More dimensions can be added to the fitness matrix, depending on how bored you are.

The choice really comes down to the type of mutation function: is it evolutionary (incremental steps) or revolutionary (random jumps)?  The first can get trapped on plateaus while the second wanders much more wildly.  We need a mix of both.  The question is how much influence each should have in the mutation algorithm.  The next trick would be to vary that based on the fitness value.  I need to figure out a continuum effect for the mutation algorithm while still keeping it blind to the eventual target state.
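
One possible shape for that continuum, sketched in Python: the mutation function is handed only an error score, never the target, and the worse the score the more likely it is to make a random jump. The linear mapping, the names and the alphabet are all assumptions; any monotonic curve could be swapped in.

```python
import random
import string

ALPHABET = string.ascii_lowercase + " "


def jump_probability(error, worst_error):
    """Map an error score onto [0, 1]: the further from the goal, the more random jumps."""
    if worst_error <= 0:
        return 0.0
    return max(0.0, min(1.0, error / worst_error))


def mutate_mixed(candidate, error, worst_error, rng):
    """Mutate one character, choosing jump or step according to the error score only."""
    i = rng.randrange(len(candidate))
    if rng.random() < jump_probability(error, worst_error):
        # Revolutionary: replace the character with any character.
        new_char = rng.choice(ALPHABET)
    else:
        # Evolutionary: step a single position, staying inside the alphabet.
        index = ALPHABET.index(candidate[i])
        steps = [s for s in (-1, 1) if 0 <= index + s < len(ALPHABET)]
        new_char = ALPHABET[index + rng.choice(steps)]
    return candidate[:i] + new_char + candidate[i + 1:]


rng = random.Random(42)
print(mutate_mixed("hello world", error=90, worst_error=100, rng=rng))  # mostly jumps
print(mutate_mixed("hello world", error=5, worst_error=100, rng=rng))   # mostly steps
```

The error and worst_error values would come from whatever fitness function is in use, so the mutation stays blind to the target itself.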

Endless fun...