Thursday, December 23, 2010

Mature Coding Environment

It’s a day for bitching about programming tools again. I have just finished reading a couple of fanboy posts and articles about various programming languages, IDEs, etc. that are new and hot and just the secret sauce, and I am tired of it.

These systems are IMMATURE. They are not only unworthy of production code, they are not even worthy of hobby coding. They are, to use a technical term... toys. Each is a good idea that is under development and will not be ready for productive use until it has matured.

Why do I bother making what should be a bleedingly obvious comment? Firstly, because I am sick of the endless spew of new coding stuff, and secondly, because it's clearly not obvious to others. Oh, and I'm forcing myself to articulate it because it helps crystallize my thoughts.

So what do I mean by mature? OK. I'll state it up front and then support it, rather than trying to build it via subtle and clever arguments... (Never really works anyway.)

*** A MATURE PROGRAMMING ENVIRONMENT HELPS THE PROGRAMMER BE EFFECTIVE ***

Yeah? And.... like how?

1. A stable language feature set

2. A state of the art IDE or language support in your favorite text editor

3. Comprehensive RICH support for the major aspects of development (develop, test, deploy, maintain)

4. Has tools to extend the programmer via automation. (Debuggers, Static Analysis, formatters, document extraction, profiling, code generators, macros, Refactoring, GUI Designers, test frameworks and runners)

5. Has high level tools for managing complexity (various file/text/logic/structure views, flowcharts, models etc)

6. Integration with a STABLE ecosystem of components (databases, media libraries, GUI systems)

7. Has a rich knowledge base with both breadth and depth on both core competencies and application to more exotic projects.

8. Has systems of support. (No man is an island... shoulders of Giants etc) Forums, discussions, communities where solutions can be found in a timely fashion.

9. Comprehensive documentation for the tools.

10. Is integrated ( or can be integrated ) into a robust workflow that can be packed up for storage and reopened when needed.

Without getting into stupid methodology arguments, I think these aspects make for an environment that gives the working programmer the best chance of getting from A to B with a project in the most effective way. I'm not talking about the self-enrichment that comes from tackling something unknown or hard, or only working in assembler, or any other side effect of programming. I'm talking about getting a job done effectively so you can get on to the next thing/problem/job/stage, whatever. I get that many people enjoy programming and fiddling and wish it would never end. I have those moments, and it's easy to waste time for little real progress. (Refactoring should be stored in a locked cabinet with the other dangerous drugs...) I'm talking about getting a specific, bounded job "done" efficiently. (I may be strange in that I am rarely working on a single project at a time; usually there are half a dozen or more in the air at any time, in a couple of languages with their own domain issues, so I see patterns develop.)

This then becomes a "how long is a piece of string" discussion about what "efficiently" means. Well, it means the system works, satisfies the criteria I have agreed to, will not embarrass me when it's reviewed, and will not be a pain in the ass when I inevitably have to come back to it and make some changes.

So what’s with the MATURE shit? Well, it goes like this... a software system is an investment... it has costs to create and hopefully will return value to someone in its use, not just its creation. It has a life cycle and some sort of diminishing-returns curve. It also potentially has ways to extend its value (maintenance). If you can conceptualise software this way, then the economics of the tool set used to create it are an important variable in the value model. Not that hard if you're used to building financial models or doing basic cost-benefit analysis.

So the initial outlay cost of the tools is a fixed amount, but the cost of using the tools is a variable amount, dependent upon both the task it's being used for (difficulty) and the time the task takes. These two will probably have an exponential relationship; simply meaning that the harder the job, the disproportionately longer it takes. However, time itself is linear, so it will probably not vary too much unless you have an unlimited number of people to throw at the job... (Mythical Man-Month, anyone?)
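As a back-of-the-envelope illustration of this model (every number and the exponent below are invented for the sketch, not measured from anything):

```python
# Toy sketch of the tooling cost model described above.
# All figures are hypothetical illustrations, not measurements.

def project_cost(tool_outlay, difficulty, rate_per_hour, base_hours):
    """Fixed tool cost plus a variable cost that grows with difficulty.

    Difficulty is assumed to multiply time super-linearly: a job twice
    as hard takes more than twice as long.
    """
    hours = base_hours * (difficulty ** 1.5)  # assumed exponent
    return tool_outlay + hours * rate_per_hour

# A mature toolchain: higher fixed outlay, but it cuts effective difficulty.
mature = project_cost(tool_outlay=2000, difficulty=2.0, rate_per_hour=80, base_hours=100)

# An immature toolchain: free to acquire, but everything is harder.
immature = project_cost(tool_outlay=0, difficulty=4.0, rate_per_hour=80, base_hours=100)

print(mature < immature)  # the fixed outlay is dwarfed by the variable cost
```

The point of the toy: past a trivial project size, the variable term dominates, so the purchase price of the tools is the least interesting number in the model.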

These two variable costs are the ones that make this model suck hardest, but they also identify the issues to attack for the maximum gains. (Has anyone profiled the actual activity of programming? I've certainly tried.)

The time variable can be fiddled with a little but has a ceiling. There is only so much work you can get out of someone in a given time period, and adding more people has diminishing returns... so it has some pretty hard limits from the people side of the equation. However... if the tools have an effect on time... then... ah... you see the point... if your tools are throttling the activity of the people, then you are tool-bound... better tools could potentially allow them to reach their maximum productivity.

The other variable is difficulty, which can manifest in all sorts of ways. Difficulty from the platform, from weird APIs, from crappy third-party support, from ugly documentation, from missing language features, from having to write and test the same boilerplate code over and over again... The point, however, is that within the activity of programming itself there are only two states: the first is where you are making forward progress toward the project goals (State A), and the other is where you are stalled or going backward (State B). In State B, you are burning time and resources for no gain. These situations can't always be avoided... or can they?
There can be a bit of grey between these two states, but forcing myself to make a call between State A and State B can clarify the situation in my own head.

Anyway, so after a brief look at my personal economics philosophy I get back to mature programming environments and how it all relates.

The key point is that a mature programming environment has been optimised to reduce the cost in time and the multiplying effect of complexity/difficulty. This optimising never quite ends but there is a distinction between a well developed, mature environment and the solutions available for a "new" language/tool set.

The thought exercise I always use is to conceptualise a few projects: the first is a one-off, throw-away data solution for a single user of maybe 1 kloc; then a simple experiment package of say about 10 kloc for a couple of researchers; the next is a more developed, multi-part tool set for working with motion-capture data of about 70 kloc, for both internal and external users; the last is a larger package with more history, a huge library of resources, multiple generations in service at once, and a large body of users, at about 750 kloc. I mentally try to apply a prospective tool set/language to each of these projects and see if I can imagine using that tool set productively on them.

Honestly, most of the languages and tool sets I see talked up fail before the first hurdle. They're not even worth considering for a tiny throw-away project. Why not? Because their initial setup and investment in the tools is massive in comparison to the time spent on the productive project work! It takes time to find and assemble all the bits, update to the latest builds, scrounge enough information to build a GUI, and hand code everything without any useful samples... etc. Why bother? For larger projects the initial cost is much less significant, but the other issues start to come into play. How well integrated is the tool set? Will it build with a single button click? Will it deploy iteratively? Can I build various types of tests (unit, integration, GUI)? Etc. How does the tool chain scale? How does it manage complexity and extend the limits of the human brain? Can it graphically express code structures? Does it support static analysis tools? Are there profilers, debuggers, code formatters, documentation systems, code generators, and an API for building your own tools against the existing tool set? Is there a macro system for the tools?

These are basic features that a working programmer should expect, but they are so often lacking.

As such, there are very few systems that can conceivably be described as mature. There are a lot that are moving in that direction, and there is an endless slew that are much closer to toys...
I'm just tired of the illiterate fanboys who get lost in the excitement of a shiny new toy without realizing that it's got such a long, hard way to go before it's grown up enough to be a serious contender for anything. That probably makes me seem quite dated...

I must build a table of all the contenders one day... Wikipedia maybe....

Thursday, December 16, 2010

Sloppy Code Article

http://journal.stuffwithstuff.com/2010/11/26/the-biology-of-sloppy-code/

I just read an article about sloppy code. The seed idea is not the gem here; it's the explanation of how to fit it into the mindset of "programmers", and all the issues surrounding the evolution of both the craft and the environment in which we are all working. I found the article deeply resonated with a bunch of half-formed ideas that have been slowly orbiting my conscious and unconscious mind for some time now. This article not only articulated them, but did it with grace and clarity. The linkage with the abstraction levels among the sciences was wonderfully illustrative. It just resonated.

The best aspect was the optimistic spin. I have been reading articles about change in various industries and environments recently, and the common thread has been the fear and uncertainty communicated by the authors, which ends up attaching a negative taint to change in any form. (I understand that change generally means loss and dispossession for many... so it's fair... but still, there have to be some who see the upside.)
Anyway, I found this article strangely uplifting. It has a lot of parallels with what I find myself doing more and more. While I still occasionally get a job where I can break out C++ and hack against some low-level library from Apache or Boost, more often I am writing loose VBA or scripts to drive high-level objects or automate whole executables through some high-level API. This gets stuff done, and it is often quite satisfying to get it done quickly, but it lacks some of the fundamental satisfaction of having constructed it from elemental primitives.

I guess that's why I still get so much satisfaction from going back to raw materials in the shed. I would rather build a lathe out of fundamental components, weld them together, cut and shape and slowly assemble them, rather than buy one and use it. But on the other hand, I have enough experience with turn-key packages that I also enjoy getting something that "just works" and getting stuff done with it. Different levels of abstraction.

The next major building block I am wrestling with is AI. There are enough low-level libraries around to build various simple constructs, but you can still see the bare metal through the library. The question becomes whether to use a library that has rough edges or to build it yourself. There is not a strong enough value proposition to use them as higher-level black boxes and glue something on top, because they are not really high level. They are still just first-generation collections of tools and routines.

I want a library from which I can instantiate a functional AI in a single line of code, be it a neural net or an agent game actor or some other variant that is already done. I can then just build the rest of the experiment, rather than having to go back to almost bare metal, make all the decisions, and construct it slowly.

Now I think about it... I guess I am moving further away from the metal in a number of threads. The attraction of building another 300 kloc program just to get something high enough to run a couple of stepper motors as an abstract unit within its own work envelope is just depressing. Maybe it's just fatigue. Having re-invented the wheel a few times and worked with so many packages that have done the same thing, over and over again, I am just tired. There is a certain point at which the idea of inventing the same wheel in yet another immature language becomes downright depressing. Trying to map the concepts that I have spent countless hours of bloody-minded effort learning onto a simpler, faster way of doing it... almost seems a step backward. The time spent learning BASIC, Pascal, VB, assembly, then C, and learning C++, OOP, managed code, VBA, Perl, Python, Lua, regex, various libraries and windowing toolkits, generic programming, functional programming, logic programming, scripting languages, embedded languages, all the IDEs, debuggers, profilers, static analysis tools, patterns, refactoring, testing frameworks, graphics libraries, game engines, encryption libraries, AI engines, physics engines, and now GPU languages, network stacks, databases, servers, OSs, parallel programming, threads, memory models... the hours and hours spent looking for solutions that you know must be there but will not turn up in a search no matter how you rack your brain to describe them... all fading into irrelevance... Now I can barely perceive the metal through the layers of code. Working in VBA over the objects in Office is a totally different model. So much of what you know is useless. You can do it the easy "Office" way, or you can try to torture the system by mapping your own ideas over it and quickly find the limitations of just what is possible.
It's not really OO, it's not really any of the techniques you may have known; it's impure, it's unpleasant. It's still possible to have control over some things, but not an even level of control; you can still manage lifetimes, but not easily... In the back of your mind the darkness grows, and you start doing things the "Office" way... and then the .NET way... and before you know it you are on the slippery slope to being a competent Access developer. No longer battling against the restrictions, but hacking fast and loose and getting stuff done... not worrying about creating a Q&D object to encapsulate some code, and putting in dirty switch logic that you would be embarrassed to write in C++. It just works. It's not something that will come back to haunt you, because it will get replaced in the next round of refactoring and massaging. In fact, adding the overheads of "quality" just makes the code that little bit more rigid and expensive to change when it needs to change. My feel is that the value proposition has been reconfigured by the lighter, more dynamic systems that combine high-order abstractions with loose glue languages. There is much less value in building comprehensive code that is robust and complete, because the very nature of these systems is fluid. The quality has been pushed from the code you write into the objects you trust. We are delegating at a much higher level. Not only are you delegating responsibility for functionality, but also high-level error handling and self-management.

Suddenly COM has come into its own. Web APIs are next. The objects are not only a black box, they are a black wall. With a tiny window in it. You can talk through that window, and the rest totally, and I mean totally, takes care of itself. This promotes the idea of loose coupling in a way that I had not fully appreciated before. Having some sort of abstract communication mechanism between your glue code (or sloppy code) and the API of a service provider forces you to keep it at such a long arm's length that many of your assumptions are obviously violated in ways that force you to stop making them. Communicating via JSON or XML or some non-native intermediate language stops you depending on things that are too easy to depend on when you are passing longs across to the API of a DLL that you have loaded by hand into memory. There is so much less, and more, to trust in the relationship. The illusion of control has been stripped away that little bit more, and you need to be a little more accepting of how little you know or can depend on in the exchange.
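A toy sketch of that tiny window, in Python (the payload below is invented; imagine it arriving from some service across the wire): the only thing the glue code can depend on is the agreed JSON shape, not types, memory layout, or calling convention.

```python
import json

# Invented example payload, standing in for a response from a remote service.
payload = '{"session": "abc123", "samples": [1.5, 2.5, 4.0]}'

# The JSON shape is the whole contract; everything behind it is a black wall.
data = json.loads(payload)
total = sum(data["samples"])   # work with plain values, nothing native shared
print(total)                   # -> 8.0
```

If the service swaps its OS, language, or internals tomorrow, this code neither knows nor cares, which is exactly the loose coupling being described.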

I think pulling data from a web service in a stateless transaction is a cleansing experience. Much as I once heard John Carmack quoted as saying that working on driver code is good for a programmer's soul, so too is working through a flimsy API, via a crappy intermediate language, with a service hosted on a computer who-knows-where, running an unknown OS, maintained by someone across a public network of wildly fluctuating service availability, and with no illusion of control. It's humbling, frustrating and simple, while being ugly and primal at the same time.

The idea that you can profile the service and get into the guts of the code and fix stuff... just goes away. You need to be more accepting of your limits, and realistic about the choices you can make and the cost of making them. Because the choices are real simple. You can't wait for an updated version of the library, or compile one from source. You can't look for alternative options or roll your own... the whole system has become about the data in the system, rather than the framework of code through which it flows, no matter how the abstractions within the code facilitate reuse or change... it just has to either work or get out of the way.

Moving on... I want high-level APIs (essentially an AI that I can talk to in natural language that will then go and do stuff in an organic way) and I want it now... end of post.

Wednesday, November 17, 2010

Natural Language Processing

http://nlpwp.org/book/chap-words.xhtml

A good resource on both Haskell and NLP. Nothing here that can't be done in another language. I feel much more comfortable mapping this problem space to C++ than to Haskell, simply because I'm more fluent. I also have a sneaking suspicion that the myths about functional programming are less about any intrinsic properties of the language and more about the person holding the hammer...

I have actually done some of this in VBA, which, if anyone is interested, is painful due to the ugly data structures and having to build everything yourself. I know I can use .NET containers etc., but it's still ugly because it's not my favorite hammer, and it's dog slow, hangs unpredictably and... well, it's just dog slow (a very slow dog... ignore the obvious edge cases in this metaphor). I can make any tool work given enough time and energy, but some just make the job harder in ways that the word "harder" does not express completely or gracefully. Think of VBA as a plastic toy hammer with an asthmatic squeaker... C++, on the other hand, is a bit like an antique 2T power hammer with a loose linkage in the return spring and a lumpy anvil, but damn, it can hit the problem hard...
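For a feel of the kind of exercise that chapter works through in Haskell (word frequencies over a corpus), here it is in Python; the toy corpus is my own made-up example, not from the book:

```python
from collections import Counter

# Made-up toy corpus standing in for the book's real text data.
text = "the cat sat on the mat the cat"

# Tokenise on whitespace and tally word frequencies in one pass.
freq = Counter(text.split())
print(freq.most_common(2))   # -> [('the', 3), ('cat', 2)]
```

Which rather supports the point above: nothing in the problem space demands a particular language, just whichever hammer fits your hand.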

Imposter Syndrome

http://blog.asmartbear.com/self-doubt-fraud.html

I like this post. It's well constructed, short and punchy, and speaks on a topic that resonates with me (and some other people I can think of). In the contextual abstract it's a beautiful post. (Although the first post in the comments is a total fanboy tag.) Anyhoo... as for the actual content, it's slightly disturbing. I would suggest that while I identify with impostor syndrome, in reality I would probably conclude that not only have I been there, but I have taken the escape-hatch route. I did not go to the happy place; I have gone to the safe place.

This line of thought becomes a complicated tangle of self-doubt, supposedly objective analysis, excuses, rationalisations and unfulfilled dreams, until reality crashes in, gets dismissed as excuses, exits stage left in a huff, and proceeds to play devil's advocate from the wings, dressed in the guise of a "Gandalf-ian" wise-old-man getup.

Am I confusing the pressure of unreasonable expectations and workload with the fear of failure/discovery issues that were discussed in the post? Perhaps.

Am I pushing myself to learn new things or am I cruising at a safe altitude? Nope. New things every damn day.

Am I building something bigger than myself? Nope. It's just a job that will exist long after I have gone.

Am I endlessly passionate about what I am doing? Bits of it. A great deal is politics and ephemeral bullshit... but I find value in it all. It stretches me in ways that are not always comfortable or pleasant... so I would phrase it as: "It's a challenge every day."

On the topic of being in a safe place vs on the scary bleeding edge... I think as a parent and provider for children, the safety and security aspects trump the bleeding-edge thrill stuff. It's irresponsible to expose your children to risk and stress. Full stop. There are no "but..." arguments after this point. I can think up any number of rationalisations and guilt trips about how I'm wasting my talent, etc., and I accept that I may look like a cop-out, or that I'm hiding behind my children so I don't have to play with the big boys... but it's not about me: providing a safe and low-stress environment for the children comes first, second and third; otherwise I'm a failure as a parent. So while every guilt trip still lands and hurts, I suck it up and do what I know is the right thing. I find ways to enjoy a job that is not thrilling every minute of the day; I humbly accept pay that is far from the bracket I was aiming at; I disappoint the dreams of various people who are looking for "great things" from me (or great money...); I trade all this for a reliable pay packet and a quiet neighborhood with trees and dogs and schools and beaches, where my children can grow up and learn without having to worry about all the other stuff.

Do I still dream? Is the hungry urge to beat the world at something still there? Do I still want to do amazing things and create fantastic tools that I can just about see how to build? Hell yessssss! But I take care of business first. Walk carefully through the minefield. Make safe choices, reduce risk, manage money, balance the budget, keep the cart on the tracks, put one foot after another, ignore temptation, be the money cop, ignore opportunities, stay focused, play the long game, avoid regular fixed costs, reduce debt, stay with what you know, save for a rainy day, don't explore shady service providers (phone companies, banks, internet providers, etc.), be the dull, dependable, safe, sunscreen-wearing, boring person who puts food on the table, a roof over their heads and endless stimulation in their minds.

Oh, and don't ride motorbikes. I miss my bike every day. It was 300 kg of big, black, crazy risk-taking behavior. It reminds me not only of what I have given up but, inversely, what I have given it up for. Better than a tattoo, because the pain does not fade. (Even for a huge battle cruiser, it could hit 180 km/h at redline in 5th... so I hear.)

Now I have the thrill of debugging spreadsheets and labeling equipment.... no comparison really.

Tuesday, November 16, 2010

Tenacious C IDE

http://tenaciousc.com/

This looks like an interesting product. Need to investigate more at a later date.

RESTful services model

http://martinfowler.com/articles/richardsonMaturityModel.html

This is an interesting article on the architecture of a RESTful service. I've been running into this term for a while now and had not really investigated it before, so this was a very useful read. Funny: now that I've pasted the link and looked at where the article was actually hosted, it makes sense why it was so polished. Shows how little attention I'm actually paying to page headers and such...

Curating a blog

Every so often I have the need to go back and clean up old blog posts. It's, strangely enough, about this time of year most years. The research students are done and it's time to clean up, round up the assets, restock the supplies in the labs, archive the tools and software, catch my thoughts, and generally look at the year in review. But back to the point...

Curating a blog. I guess I'm not having any new ideas that have not been had by others who maintained diaries or any other sort of longitudinal writing. It becomes a body of work, and patterns start to emerge.
I wonder if, by cleaning it up and curating it, I'm actually destroying some of the historical traces that would otherwise have been interesting to me in the future. Take the bad spelling, which was a result of starting with a blog package without spell checking (and being too lazy to manually copy/paste into a spell checker). This in itself is valuable insight into why I was blogging at the time. It was about cathartic, train-of-thought writing. Just hacking something onto the page without too much thought about why or who or what. That is being lost, a little at a time, as I go back over old writing, see it with current eyes and standards, and "clean it up" to suit my current frame of reference. Is this a good thing?

Does anyone care or will they ever care?

In this respect paper has more authenticity. I certainly have some piles of paper with random writing on them, and directories on old computers full of random writing. However, it's less structured. There is little sense of a timeline, because so much of it has been moved around from backups and disks that it has lost its original time stamps, or has actually been re-edited at some point. The other side of the coin is that the paper writing is also dateless, so it's not that different, I guess. Mostly just random snippets of bits and pieces. Every so often I try to find the time and energy to digitize it and extract all the supposed pearls, but like all piece-of-string projects, the value is pretty trivial compared to the work to transcribe it all.
It's not like I actually believe there are any great insights lost in it. Mostly it's just childish fragments and scenarios that were meaningful to me at the time.
Midlife crises suck.

The boy is starting to get quite verbal now. He's putting good, meaningful sentences together, and it's possible to get a bit of a conversation going. Two or three sentences, anyway, but he's expressing himself a little more than just frustrated screams or inarticulate noises. Every day it's a bit clearer.

The girl and I have successfully built our second robot together. This one is an OWI 5DOF arm with a USB interface. It's from a kit, so it's not as big a challenge for her. I wanted something that would work once we had put it together, rather than a home-brew system that would have taken endless tweaking and frustrating debugging. She still only has an attention span of a couple of minutes at a time, so we need fast results and a predictable outcome. It's been a great little project so far.
The first robot we built was about six months ago. It was another kit system, using Capsela components, for a simple walking robot with poseable arms. Very trivial, but a good first project. We had results in about 10 minutes without any tools. Like most Capsela systems it's not featured on any websites and seems to have never existed. I found it in a little toyshop, and it seemed to be the only one they had. Anyway....
Now I have to find the next challenge.

Wednesday, November 10, 2010

Character differences between WRPGs and JRPGs

http://www.escapistmagazine.com/articles/view/issues/issue_279/8295-United-We-Stand

A very insightful article about different styles of character and how they tie back to the social models the games derive from.
Lots to think about.

Tuesday, November 9, 2010

Supervisor pattern

http://vasters.com/clemensv/2010/09/28/Cloud+Architecture+The+SchedulerAgentSupervisor+Pattern.aspx

A heavy post on cloud patterns. Need to re-read this and have a think about it.

Improving the performance of a Shared Access Database

http://www.fmsinc.com/microsoftaccess/performance.html

http://www.fmsinc.com/microsoftaccess/Performance/LinkedDatabase.html

http://office.microsoft.com/en-us/access-help/about-sharing-an-access-database-on-a-network-mdb-HP005240860.aspx

I've implemented the trick of holding the lock file open, and it seems to be helping. Counterintuitive after all the energy I have spent releasing resources in other languages and systems, but anyway... it works. I am also doing a lot better by caching results and doing my joins in code rather than via SQL. Anyway, back to the bug hunting...
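For what it's worth, the "cache results and join in code" trick looks something like this, sketched in Python with made-up table data (the real thing is VBA against recordsets, but the shape is the same): pull each table once, then join via a dictionary lookup instead of asking the Jet engine to join across the network share.

```python
# Hypothetical rows already fetched once from the shared database (the cache).
people = [(1, "Alice"), (2, "Bob")]                    # (person_id, name)
orders = [(10, 1, "widget"), (11, 1, "gadget"),
          (12, 2, "sprocket")]                         # (order_id, person_id, item)

# Build the lookup table once.
name_by_id = {pid: name for pid, name in people}

# Hash join: one pass over orders, O(1) lookups, no repeated round trips.
joined = [(oid, name_by_id[pid], item) for oid, pid, item in orders]
print(joined)
```

The win over a shared .mdb is that each table crosses the network exactly once, instead of the join generating chatty index traffic against the file share.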

Monitor Arms

http://www.scorptec.com.au/computer/18/421

Finally found a source for my favorite monitor arms in small quantities. Scorptec have them under the brand name "Manhatten". They are cheap, solid and black, and they work brilliantly. I have about half a dozen already from a different supplier under a different name. Not shiny, but they just work.

Ergonomics Information

http://sheddingbikes.com/posts/1281257293.html

Good post on a variety of issues for programmers and some reasonable solutions.

Learn Python tuts

http://learnpythonthehardway.org/index

Something for my endless supply of free time.

Bit hacks to remember

http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know

These are some useful techniques to keep fresh for the upcoming embedded programming.
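A couple of the classics from that post, written out in Python here for illustration (they translate one-for-one to C for the embedded work):

```python
def is_power_of_two(x):
    # x & (x - 1) clears the lowest set bit; a power of two has only one bit set.
    return x > 0 and (x & (x - 1)) == 0

def lowest_set_bit(x):
    # Two's-complement trick: x & -x isolates the lowest set bit.
    return x & -x

def toggle_bit(x, n):
    # XOR with a mask flips exactly bit n.
    return x ^ (1 << n)

print(is_power_of_two(64))    # -> True
print(lowest_set_bit(12))     # -> 4   (0b1100 -> 0b0100)
print(toggle_bit(0b1010, 0))  # -> 11  (0b1010 -> 0b1011)
```

Worth noting that Python integers are arbitrary precision, so unlike C there is no wrap-around; the bit logic itself is identical.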

40 hour week

http://blogs.msdn.com/b/jmeier/archive/2010/10/21/40-hour-work-week-at-microsoft.aspx

This is an interesting post about working within a bounded time frame. I have had a week or two to think about the issues raised, and I still keep coming up with new and valuable insights from it. It's worth another read at some point.

Firesheep plugin

http://codebutler.com/firesheep

This is a little old but still funny. I'm interested in this from a security point of view, but also in seeing what the reaction to it is. It's one of those things that could catch fire and take off, or disappear into the background noise. It's an interesting experiment in public scrutiny and security by obscurity. The techniques to exploit this hole have been around since cookies were invented and abused for session management over insecure networks... So far it's passed about 1.4 million search results on Google using "firesheep -sheep". The top ten pages are all 100% articles on Firesheep, so I figure the rest of the results are probably pretty good. LOL. Way to shine a light on the issue. Let's see if anything happens.

Edit
The general reaction has been twofold. Firstly, the tech press is generally cheering the political objectives while recommending countermeasures. Secondly, the hysterical non-technical press is decrying the existence of such a terrible weapon... blah blah blah.

Another interesting aspect is the ecosystem of countermeasure tools that are popping up. BlackSheep and FireShepard are the two that have sprung fully formed to offer a solution for the ignorant. Does that not strike you as suspicious? I have read that BlackSheep is actually a DDoS attack client, which I find much more credible than the claim that it magically has some capacity to reach out and touch a passive sniffer application. The description of how it works is kinda credible, but not if you know much about DNS and how Firesheep actually works. Even if it's exploiting a weakness in Firesheep, it's not actually dealing with the underlying issue that is being highlighted. It would be trivial to rework Firesheep to be impervious to BlackSheep's supposed technique.

As for FireShepard:

http://blogs.forbes.com/andygreenberg/2010/10/28/how-to-screw-with-firesheep-snoops-try-fireshepherd/

This page has a lightweight description of how it claims to work. Again its basically trying to attack a weakness in the Firesheep tool rather than patch the problem that FireSheep is highlighting.  Also FireShepard would probably breach the terms of service of any reasonable network because it works by intermittently flooding the network with rubbish packets. This sort of activity would probably set off all sorts of DOS attack detectors, Intrusion systems and just generally piss off any network admins who caught you using it.  Its the equivalent of turning on the sprinkler system in a whole building to put out a single candle (that may or may not be there). And just consider the chaos if one paranoid user on the network starts talking about it to their co-workers and encourages them to also install it.  You then have multiple people intermittently DOS'ing the network segment. Genius.... (Sarcasm)
The first tool sounds like it's a tiny step from being outright scamware, if it's not already malware. The second sounds like a poorly thought out tool with marginal hope of fixing the problem but a much larger potential for getting the user banned or prosecuted.

Nothing has turned up yet about dealing with false positives, or about the social consequences of detecting an attacker and how to handle that ethically or safely.  I would assume that the common witch hunt rules apply: if you think someone is running a sniffer on the network, you can unilaterally employ the "strike first" approach and burn them publicly so you feel all safe again.  There is no actual evidence to go on (unless your Facebook profile has been hijacked by a completely incompetent person who signs all their fake posts with their real name... but then how would you even prove that that was their real name?). Endless fun with digital forensics.

So we have a scary mix of paranoia, uncertainty, ignorance, exploitative tool developers, no useful solutions from most of the affected sites and a bubbling pool of anger, distrust and the usual illusion of invulnerability that internet users get when they feel safe and anonymous. Nothing bad could happen here...

Stats with Cats blog: Ten fatal flaws in data analysis

http://statswithcats.wordpress.com/2010/11/07/ten-fatal-flaws-in-data-analysis/

I should hand this out to all the research students as they ask the same questions again and again....

Irony Language Implementer

http://irony.codeplex.com/

This looks useful... but not right now. It could be used for implementing domain-specific languages that leverage the .NET runtime. It may be a good solution for creating an embedded language to include in experiment generation tools.

iFixit site

http://www.ifixit.com/

This is an interesting site to keep an eye on. Not particularly rich with detail yet but growing. They need some better in depth diagnostic information for many of the devices rather than just a list of which screws to remove. But we live in hope.

Monday, November 8, 2010

Narrative structure for RPG

I am back to thinking about suitable narrative structures for CRPG games.

My current breakdown is something along the lines of:

* Linear narrative - Essentially there is one start and one ending; there may be more or less variability in the middle. (Think Max Payne)
* Broom narrative - Similar to the above but with the addition of a number of different endings. (Stalker is an example)
* Serial missions - One mission at a time, usually with a strong theme. Branching is possible, but it is very expensive to develop whole mission modules that may never be played through.
* Mission Loom - From a single start point the rest of the game is composed of many small missions that overlap and weave together. Good for pickup games or for the completionist gamer who wants to "complete" everything in the walkthroughs.
* Mission Loom with main thread(s) - One or more primary missions are presented to try to tie all the mini-missions together into a common theme. (Think Fallout 1, 2, 3 etc.)

These are essentially all fragile systems that have modular segments that join together to create the narrative context for the play activities. They are fragile for a number of reasons, the first being that they have no capacity to be resilient to any errors in the chain. If one module fails (for whatever reason) it conceptually leaves a gap. Even if the player somehow crosses the gap and gets back into the game, they have experienced the cognitive dissonance of having to "exit" the flow of the game to deal with the broken game engine/level script/whatever bug.  Obviously this is undesirable.

The second issue of fragility is to do with the user's perception of what's going on. It's fine to be able to tune a level until most users get it most of the time, but for complex storytelling that involves any subtlety (and the debate is still open as to how many people would actually buy a game with too much subtle storytelling... but anyway) there is no way to gauge whether the user is following the storyline at a level they want or care about. (I use "care" to represent their level of engagement in the storyline rather than just their engagement in the game activities, which can be a different kind of bear tickling.)

So, to recap, the player can lose interest, lose the thread, or fail to engage with it enough to differentiate the meta-story from the game activities. (Not see the forest for the trees... so to speak.)

The other sort of fragility I was thinking about is the temporal aspect. How do you deal with the amount of playtime the player wants to commit versus the time required to get through the narrative? Can the player commit more time if they enjoy the story, or does the story have to fit the amount of time the player wants to commit?  Both of these scenarios have ugly issues associated with them.  Players are getting a little tired of "sticky" games that demand attention. (This whole issue has been soured by the MMORPGs and their subscription models.)
So players want a quality experience, not a Skinner box.  Something that enriches their life rather than sucks it dry.

This is a similar issue to the one faced by television serials. Keep a narrative arc or make each episode self contained. Keep them wanting more or give them what they want? The eternal dilemma for media producers.

I have been reading an interesting post about working inside a bounded time frame that has some interesting dimensions for writing. These ideas have mixed with an analysis I recently read of Robert Jordan's "Wheel of Time" fiction series.  One point of view is that forcing yourself to work within a clearly bounded time frame helps sharpen everything you are doing; on the other hand, the book series was criticized because it wandered at times without boundary and so the quality suffered.

The point being that forcing a scenario or a narrative into a fixed time frame may be no bad thing. Especially a reasonably short one. Cinematic scripts are usually 120 pages, which forces them to keep it tight. (Conceptually. Let's not argue about all the exceptions to this rule that exist.)

So how would that be applicable to game narratives, assuming game narratives are even well enough defined to be called such a thing?
If we are looking at a simple linear narrative, it's easy to apply a fixed time to the narrative. It just keeps on ticking no matter what the player does. This forces the player to get with the program and stay with it, in effect punishing them pretty severely for making any mistakes or wandering around exploring.

Pick this up later.

Wednesday, November 3, 2010

Swapping keyboards is complicated

http://www.sense-lang.org/typing/games/balloon.php?key=EN

My current favorite typing tutor. I have swapped to a Kinesis Freestyle keyboard and it's been a bit of an adaptation curve. This site provides some useful exercises to help get me back up to speed.

I started by moving the whole keyboard to the ideal 30 degree split but found it too much too soon and put it back together. I have since been progressively moving the angle and adapting slowly.  I am still not a fan of the key layout. I appreciate the loss of the numeric keypad, but I have some kind of brain failure with the backspace and delete keys. I constantly seem to be getting them backward, mainly because the delete key is approximately where I expect the backspace key to be, so it works the reverse of my intuitive expectation. Then, when I realize the mistake, I have to consciously think about the solution and it gets very messy. Suffice to say it's working itself out, but breaking long habits is challenging without the regular practice of a keyboard trainer.

The other irritating thing is the keys on the left hand side of the keyboard. They are essentially useless. They are sort of shortcut keys meant to reduce RSI, but they need to be completely learned. My biggest problem is that I keep trying to feel for the lower left Ctrl key, which used to be the leftmost bottom key, but now it's not. I keep hitting the shortcut key for context menus instead.  Like I said, irritating but not life ending.

I have also been using the Evoluent upright mouse for a while. I would say that my adaptation to it has been less successful. I still habitually try to hold it from the top down, and I find my coordination and precision are pretty crap when I do hold it the right way. Rather than driving from the wrist, I find I end up moving from the elbow, which is much less precise. Probably from lack of practice. Still, it's a whole new set of muscles and nerves to train, so it would be good to get some mouse training games happening.
I happened to do some timed testing tasks with it the other day and my coordination was a bit slower than usual and it made me feel like I was fighting the mouse rather than it being an intuitive extension of my hand.

Blah.  I am still not game to try the Kinesis Advantage keyboard. I'm worried that if I adapt to that, I will get even less capable with all the normal keyboards I use every day. As a tech I need to be able to work on any of the computers in the place, and not being able to drive the keyboards I find would be a big disadvantage. I'm still wondering if I can maintain two similar sets of competency without making a mess of both (as usually happens for me).

VOIP SDK for Telemedicine app

http://blog.phono.com/2010/10/25/behind-the-phone/

This looks like a useful product for some of the telemedicine projects that are floating around. It uses SIP for brokering so I can pretend like I know something about it from my past research. Good to see someone making it more accessible. Version 2 will be very interesting.

Brain Science and Brain Training

http://www.cambridgebrainsciences.com/

Need to keep track of this as a brilliant set of samples. Kinda irritating, because I have had to hack up a bunch of these tests for various experiments in the labs.  These ones are much prettier than mine, which suggests I need to get back to some Flash at some point. Probably just before it all dies from HTML5... figures.

Fastflow Parallel Programming framework

http://calvados.di.unipi.it/dokuwiki/doku.php?id=ffnamespace:about

This looks very interesting. When I suddenly come across a bottle of "free Time" I might just have a drink and a think...

Clang and LLVM for C++ refactoring

http://clang.llvm.org/

Thank the Assembler, I don't have to do it myself. Not that I'm delusional enough to think that I could have; rather that I could see the need, but there appeared to be no solution other than me doing it (somehow) myself.

Finally, I can start to dream of getting some refactoring tools working for C++, and after that we can talk about high level transformation of code... better static analysis... design extraction... whooot!  Well, a boy can dream, can't he?

Monday, November 1, 2010

Philosophy of hacking

Found an interesting post on Lifehacker

http://lifehacker.com/5672997/the-benefits-of-disobedience-why-we-hack

It wraps a philosophical context around the activity and mindset of hacking. I doubt that many hackers have ever conceptualized their motivations in such a way, but the piece does have a certain resonance.  I certainly have never felt that disobedience was my motivation. Usually it was just my frugal farmboy upbringing that motivated me to fix what's broken and extend the life of anything that is useful. Growing up, my reality was that if I couldn't fix it or make it, I didn't have it.

But back to the article. I can see the association that the author is making, and I think it has merit. Mainly because I can't really think of a good argument against it, but I am still wary of suggesting this argument captures the general motivation of most or all "hackers".

The argument gets a little shaky when you drill down into the semantics. "Disobedience" is not really the overarching motivation for most hacking projects; I would suggest that, in the way the author was using the term, it's more a means to an end. A hacker has to "disobey" the rules that say not to do what they are doing, but they are doing it for reason "X". "X" may be anything from fixing a bug, to adding features, to working out how something works, to trying to impress someone and get lucky... whatever. Disobedience is simply a means to an end.

The interesting implication that the author did not make is that the willingness and the ability to be disobedient are essential to successful hacking: the idea of not being afraid to "void your warranty" and not being afraid of "bricking it". This is essentially risk-taking behavior in another form, along with a degree of confidence, combined with the technical experience that leads one to suspect that something is possible.

To extrapolate this train of thought, it's possible to suggest that the environment of warranties, license agreements and all manner of other rule sets that impede the fairly natural instinct of "taking it apart to see what makes it tick" is an environment that habituates curious people to ignore these rules, and thus be less worried about them in other aspects of their lives. Life is just more straightforward when you ignore most of these unenforceable rules ("Thou shalt not mod thy game console") and threats ("void your warranty") and get on and hack your device/software/web service/file format etc. into whatever shape you currently need.

I would also suggest that hacking is an emergent phenomenon in any complex structure. Once a system gets sufficiently complex, opportunities emerge for something else to find more direct ways to exploit resources within that system. This may not be to completely break the system, rather just to subvert parts. Look at all the parasites inside ant and bee colonies. These genomes have accidentally found strategies to hack an otherwise all encompassing system with no "legal" opportunities.  The result being that they make out like bandits.

Alternately, look at any large organisation and consider all the "proper channels" and the way people naturally find ways to get their jobs done even when parts of the system are malfunctioning (bad manager, poor communications, horrible enterprise software etc). People find ways around things, and the more they get used to going around the obstacles, the less they "follow" correct procedure on other issues. Eventually they habitually look for the most straightforward way to achieve their ends, and the system re-aligns or sections are discarded.

Hacking is all around us. Every day I see people who have no idea what's inside a computer, or what to do with a hex editor, hacking away at corporate systems. They use telephones to talk to other parts of the organisation and avoid sending emails through "proper channels", or they feed systems information and data that they know will be accepted, even though it's technically wrong, just to avoid having to deal with broken enterprise software systems. They talk their way around problems and obstacles to get things done. These are not the idealized "geeks" taking apart technology and making things in their bedrooms, but it's still hacking systems to get results that are otherwise not possible. Are these people "disobedient"? In a way, certainly. Is their motivation "disobedience"? I think not. In many cases it's very much the urge to keep the system working, and to fit in and not be the "exception", that forces them to hack the system.

This leads to an interesting question. Will they continue to hack the system when the need is no longer there? Does this "breed" disobedience, or will these people return to being rule-abiding citizens once the enterprise software is fixed or the passive-aggressive supervisor has been fired? Is "hacking" simply a case of people finding the path of least resistance?

I think there are some people who, having had to learn the skills and techniques of hacking their particular problem, will never forget those skills. The question is whether they will ever again be in a position to need to employ them. For other people, I would guess that the discomfort threshold required to make them hack a system is sufficiently high that they will probably never go back to it once the itch is scratched. Because, at the end of the day, hacking is hard, risky and often involves swimming against the flow in many ways. It can involve threats to comfort, safety and security (losing a job, legal action, fines, social stigma etc.), so it puts pressure on some pretty basic motivations. Which means that, generally, the motivation to hack is dampened by many of the rules that the "system" has been constructed from; otherwise it would not be a system in the first place.

Ok, enough D&M.

Saturday, October 30, 2010

Bad book purchase experience

I decided to experiment with something I had been avoiding for a while: buying books off Amazon.  I have been avoiding it because, having done it a few times in the past with very poor results, I had considered the whole experience to have such a low success-to-failure ratio that it was not worth the effort. Even with the illusion of low prices.

Anyway, I tried it again, on the rationalization that ... perhaps things had changed.
So after placing a couple of orders for some tasty looking reading... I waited.  One of the books arrived after just a week. This is equivalent to Amazon implementing teleportation.  Suffice to say I was both impressed and anxious to open the package.
Inside was a book and a packing slip with the name of the book I had ordered in very big letters. Naturally, the actual book delivered did not match the packing slip. It was instead a 19 year old office administration textbook with no residual value that I can detect, except perhaps to a historian.

So I complained and asked for the correct book from the seller. Instead I got a prompt refund and a message from the seller saying that they had checked their stock and did not in fact have that book at all and I should keep what they had sent and not bother to send it back.  Ya Think!

So I tried to follow the train of thought of the bookseller.  They get an order through Amazon and check their stock database. They find an entry for the book (obviously, otherwise they would not have even listed it on Amazon to sell... I assume), so they print a packing slip and hand it to the picker/packer who was doing the packing.  The picker gets the packing slip, wanders to the stock shelves and looks for the book, which they don't find (or do they?)...

This is the point where the thought train goes in one of two directions...

The first: the picker, who has not found the correct book, instead decides to send what is obviously rubbish, packs it and dispatches it. Why? Were they hoping that I would not notice and would accept the substitute without complaining? (Playing the odds? Seems pretty unlikely.) Or were they trying to delay something? Perhaps so they can tell Amazon that the item has been dispatched, and so keep up some sort of statistic? Either way, this results in their company wasting money (shipping and the price of one book) and wasting my time.

The second: the picker likes the look of the book they were supposed to send, so they send some crap that would never sell anyway while stealing the actual book for themselves.  Again, wasting their company's money (shipping cost and the price of two books) and wasting my time.

Either way, the book picker has wasted their company's money and my time. That's not factoring in the time and effort of their staff to respond to my email, check the stock rooms, update their database, and feel pissed off that they are going to get bad feedback on Amazon. Essentially, their book picker should be fired.

If only they had actually confessed that the book could not be found and sent a kill order back through the system, it would have saved time and money for all concerned.

Now I just have to go and bitch about them on Amazon. Ah the little pleasures.

How to fix broken or incomplete torrent downloads

Just taken the time to figure out how to fix torrents that go dead or are incomplete due to running out of seeds or peers with enough pieces to complete them.

The problem
One of my torrents was down to one peer and we both had the same pieces.

My options were to kill the download and try a different torrent, or to find some more peers with the same torrent hash. The only way to do that was to find some other trackers for this same file hash.  Also, I didn't want to leave the old peer hanging, so I needed a solution that would merge a number of torrent files together somehow.

The Solution

1. Hit Google and find every other torrent file for the particular file/file set you were downloading. Make sure the hash is the same.
2. Crack each one open in http://torrenteditor.com and check if there are any other trackers that are tracking the same file hash. (Note the piece size as well, just for interest's sake.)
3a. Either add the tracker URLs to your existing download using copy and paste. (You can do this in Vuze; I have no idea about other BT clients. In Vuze you need to right click on the torrent > Advanced > Trackers/Torrent > Add Tracker URL.)
3b. Or use TorrentEditor to create a new mega torrent file containing every tracker you can find, then import it into your client and point it at the directory that you were downloading into. Vuze will re-check the file and pick up where it left off. (Assuming the piece size is the same. If it's smaller you may lose a little, but if it works you finish an otherwise dead download.)
4. Cross your fingers that you have found enough seeds/peers to complete the file.
5. Remember to keep seeding after you have finished, to help others.
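
Out of curiosity, the "crack each one open and compare" steps can also be scripted. The sketch below is my own rough Python take on the idea, not anything TorrentEditor or Vuze actually does: it bdecodes each .torrent file just far enough to pull out the announce URLs and the piece length, so you can sanity-check that a set of torrent files agree before merging their trackers. (The function names and the tiny hand-made sample torrents are hypothetical illustrations, not a real library.)

```python
# Minimal bencode decoder -- a hypothetical sketch, not a real library.
def bdecode(data, i=0):
    """Decode one bencoded value starting at index i; return (value, next_i)."""
    c = data[i:i+1]
    if c == b'i':                          # integer: i<digits>e
        end = data.index(b'e', i)
        return int(data[i+1:end]), end + 1
    if c == b'l':                          # list: l<items>e
        i, items = i + 1, []
        while data[i:i+1] != b'e':
            v, i = bdecode(data, i)
            items.append(v)
        return items, i + 1
    if c == b'd':                          # dict: d<key><value>...e
        i, d = i + 1, {}
        while data[i:i+1] != b'e':
            k, i = bdecode(data, i)
            d[k], i = bdecode(data, i)
        return d, i + 1
    colon = data.index(b':', i)            # byte string: <length>:<bytes>
    length = int(data[i:colon])
    return data[colon+1:colon+1+length], colon + 1 + length

def collect_trackers(raw_torrents):
    """Gather every announce URL plus the piece lengths from a set of
    .torrent files, so you can check they agree before merging."""
    trackers, piece_lengths = set(), set()
    for raw in raw_torrents:
        meta, _ = bdecode(raw)
        if b'announce' in meta:
            trackers.add(meta[b'announce'].decode())
        for tier in meta.get(b'announce-list', []):
            for url in tier:
                trackers.add(url.decode())
        piece_lengths.add(meta[b'info'][b'piece length'])
    return trackers, piece_lengths

# Two tiny hand-made torrent files pointing at different trackers.
t1 = b'd8:announce19:http://tr1/announce4:infod12:piece lengthi16384eee'
t2 = b'd8:announce19:http://tr2/announce4:infod12:piece lengthi16384eee'
trackers, plens = collect_trackers([t1, t2])
# trackers now holds both URLs; plens == {16384}, so the piece sizes match.
```

Note this only checks the piece length and gathers trackers; confirming the info hashes match is still on you (step 1 above).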

Nothing like frustration to motivate a solution to an ugly problem.

This should be filed under ideas like bridging trackers and merging torrent files. Hope this helps someone else.

Friday, October 29, 2010

The effect of traffic roundabouts on driver attitudes

Had an idea on the way to work this morning. Traffic roundabouts are training drivers to be more opportunistic and proactive, in contrast with traffic lights, which train drivers to be passive and rule-bound (and also frustrated, due to the lack of control and self-determination).

I wonder what effect this has on their satisfaction with life beyond the road?

The perception of control is also interesting, in that generally (on a flat roundabout) you can see everyone and know where you are up to in getting to where you are going, while with traffic lights you never quite know if you are getting a fair deal.

Moving on...

Profiler for VBA code in Access

I found a nice little stack-style profiler for VBA in a book called Access Cookbook. It's been very handy for identifying and tracking down a couple of memory leaks (my fault, as usual). The profiler is in a module called basProfiler and is discussed in an article called "Create an Execution Time Profiler", which has been reprinted all over the web. I had to fix a small bug in the code that formatted the output to the log file. Other than that, it was good to go out of the box.

My usage of basProfiler

1. Import the module.
2. Set the path to the log file to somewhere convenient. (I put mine in my working directory for Subversion to manage.)
3. Instrument the code blocks with the opening and closing lines.

Public Sub Whatever()
     acbProPushStack "Whatever" 'profiler
         'code to be profiled here
         '.....
     acbProPopStack 'profiler
End Sub


I open the log file in Notepad++ on the other monitor and every time I run the code, Notepad++ detects that the file has changed on disk and allows me to reload it. This gives me the power of a real text editor rather than a log window to play with the output.
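
For what it's worth, the push/pop pattern is easy to carry to other languages. This is my own rough Python analogue of the idea (the class and method names here are mine, not the Access Cookbook's): each push records a start time on a stack, each pop logs the elapsed time indented by call depth.

```python
import time

class StackProfiler:
    """Rough sketch of the basProfiler push/pop idea, in Python."""
    def __init__(self):
        self.stack = []              # (name, start time) pairs
        self.lines = []              # log output, indented by call depth

    def push(self, name):
        self.stack.append((name, time.perf_counter()))

    def pop(self):
        name, start = self.stack.pop()
        elapsed = time.perf_counter() - start
        self.lines.append("%s%s: %.6fs" % ("  " * len(self.stack), name, elapsed))

prof = StackProfiler()

def whatever():
    prof.push("Whatever")            # like: acbProPushStack "Whatever"
    total = sum(range(1000))         # code to be profiled here
    prof.pop()                       # like: acbProPopStack
    return total

whatever()
# prof.lines now holds one entry like "Whatever: 0.000012s"
```

In a real program you would write prof.lines to a file instead, which gives you the same reload-in-your-editor workflow as the log file above.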

Simple and elegant. Just the way I like my tools.

I considered hacking it to output a comma-delimited file for import into Excel for analysis, but I ended up not needing that sort of power on this job.

And we are done.

Quicksort for VBA array of custom objects

Can you believe that there is no sort function for the Collection object in Access VBA? Anyway, I have dumped the Collection for a typed array to avoid some of the overhead of type casting Variants everywhere. (Long story involving a profiler and a lot of time, which finally resulted in finding a different source for the bug that was causing the slowdown.)

Back to the main thread. I went looking for a good implementation of a sorting algorithm in VBA. After a couple of absolutely spectacularly crappy implementations, I found this one.

http://www.java2s.com/Code/VBA-Excel-Access-Word/Data-Type/QuickSort2.htm

Which is short, simple and elegant. Compared to some of the other multi-page examples I found, it's really... beautiful.

Anyway, the first thing I did was refactor out the comparison operation and the swap operation, so I could apply the sort to an array of custom objects without adding clutter to the quicksort sub.  By extracting the comparison operation, I can now sort the objects in the array on a variety of properties.  If I get bored I will implement both ascending and descending sort, but I don't need that right now.
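
To illustrate the shape of that refactor (sketched in Python rather than VBA, purely because it's quicker to show; the names here are mine): the comparison is passed in as a function, so one quicksort routine can order custom objects on any property.

```python
def quicksort(items, compare, lo=0, hi=None):
    """In-place quicksort with the comparison extracted. compare(a, b)
    returns a negative, zero, or positive number, strcmp-style."""
    if hi is None:
        hi = len(items) - 1
    if lo >= hi:
        return
    pivot = items[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:
        while compare(items[i], pivot) < 0:
            i += 1
        while compare(items[j], pivot) > 0:
            j -= 1
        if i <= j:
            items[i], items[j] = items[j], items[i]   # extracted swap
            i += 1
            j -= 1
    quicksort(items, compare, lo, j)
    quicksort(items, compare, i, hi)

# A hypothetical custom object, sorted by whichever property you pass in.
class Record:
    def __init__(self, name, score):
        self.name, self.score = name, score

records = [Record("b", 2), Record("a", 3), Record("c", 1)]
quicksort(records, lambda x, y: x.score - y.score)   # ascending by score
# records is now ordered c, b, a
```

Swapping in `lambda x, y: y.score - x.score` gives the descending sort for free, which is the payoff of pulling the comparison out.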

The most painful part of this is not being able to write a template for the code and make it type independent. (Anyone who mentions VBA Variants as being equivalent is outa here...)

Just thought I would share...

Friday, October 8, 2010

Thoughts on this year's crop of research student projects

The projects are not unusual: the same sorts of mistakes by students trying to get up to speed quickly with complex systems and subtle processes that take time and experience to master. Issues like:

  • Not testing their experiments completely before starting to collect data
  • Not keeping a log of their experiment activity
  • Not having any idea about how to process data after it's been collected
  • Having no quality control concepts
  • Having only the most basic ideas about backup and versioning
  • Designing experiments with no resilience or risk management concepts
  • No time management skills
  • Not being computer literate while running computer-based experiments (...words fail me!)
Again I have been asked to develop a wide array of experiment systems, data processing systems and, in some cases, analysis and visualization of the final data.

These are all throwaway systems, so they are not being maintained in any way. This year has been remarkable for the size of the data sets I am being asked to work with. The use of automated data collection systems is allowing researchers and students to collect greater volumes of data, some of which I would suggest is never going to be used and just adds bulk to the task of transforming the data sets and stresses the tools.

This is again a mistake born of ignorance. If the students had done their experiments on paper and then hand-processed all their data, I think they would be a little more constrained when they next planned an experiment. Is that a good thing, though?

While the downside of automation is that an inexperienced researcher can generate huge amounts of worthless data, it also allows an experienced researcher to "Think Bigger" and not feel constrained by the hard lessons they learned as an undergraduate that have limited their scope in the past.

I still have issues with people trying to keep a foot in both camps: using the automated tools to generate massive data sets and then trying to hand-process them.  This ends up being the worst of both worlds.  There is a project going on at the moment that has gone down this road and is reaching epic levels of manual labor.  Essentially, all the data is coming from databases, being converted to static reports and then hand-entered back into a new database, all without any intentional transformation of the data. And it's a huge data set containing more than 1600 items per record. And did I mention they are trying to code it on the fly using an evolving code book, so it can kind of end up in an SPSS file like they used to do in the old days...    Talk about frustrating. It's going to take months of work, thousands of man hours of labour and introduce all manner of human errors into the data... all because the lead investigator has control issues and cannot grasp the scope of the problems they create for everyone else.

Ahhhhhhhh.

But back to the students.  Like most years, it's been interesting. There has been a constant flow of new problems to solve and an acceptable number of repetitions of previous years' problems. Some of them are starting to get a little tedious, but I am still learning new tricks so it's not a total waste of time.

I am encountering a couple of recurring issues that just add to the misery. One is dependent questions in surveys. The other is designing projects to manage attrition of participants. I have a paper in mind on the first issue, but I have not really decided how I want to deal with the second. It's still evolving in my head. Maybe a blog post will let me work out some angles...

Later.

Friday, August 20, 2010

Tools for research students

I keep encountering problems that research students are having that have long been solved in other industries. Nothing new here; it's just frustrating to find someone struggling to re-invent the wheel.

My current wish list for research students would be:

Project Management Tool
Basic task and resource tracking, critical path analysis, Gantt charts
Microsoft Project is the simplest and easiest we have accessible. 
Single user is fine. Little or no collaboration needed.

Project file management
Subversion with TortoiseSVN are my favorite combination.
Still a little complex to explain and use, but it's the best I have so far.


The other issue I constantly deal with is research students trying to re-invent the wheel on their project processes.
  1. Formulate a hypothesis
  2. Come up with an experiment to try to destroy that hypothesis
  3. Perform the experiment to collect data 
  4. Evaluate the results of the experiment against the hypothesis
  5. Publish the results, data and ideas generated

How hard is that, conceptually? I get that it takes some repetition to understand and appreciate the subtlety of the scientific method, but these are research students. They are supposed to have seen this idea in print at least once.

http://en.wikipedia.org/wiki/Scientific_method

I keep having conversations with students who are doing an experiment to "find" something or "prove" something... it bothers me.  All this being said, I remember as a student how weird it seemed the first time I was confronted with the idea of hypothesis testing. It seems totally ass about. So I forgive without reservation and try my best to explain the ideas again... but it still bothers me.

I have the sneaking suspicion that I might be getting a little out of touch with my own ignorance. I may have been doing the same thing too long. It's all getting a bit familiar and I am starting to imagine that I am seeing patterns.  I think a little bit of fear and uncertainty keeps me grounded. I have the disturbing habit of feeling like I know what I am doing a little too frequently at the moment.

Still, there are surprises every day. It's just not the surprises of discovery and success, because I have had all those already; now it's just the surprises of violated assumptions and forgotten but important details and meetings.

Moving on. It seems like I didn't have as much to talk about on the subject I started with as I thought. Such is life.

Friday, July 16, 2010

BrainVision Analyzer 2 Workshop on EEG & TMS and EEG & fMRI

Ahhhh, professional development courses.....

This one was held over three days at the QBI at UQ and hosted by JLM Accutek, the Australian distributors for Brain Products. The lecturer was Dr. Ingmar Gutberlet.

The course was very intensive. Three days of technical demonstrations and in depth software tutorial sessions. I'm still digesting everything that we covered. I guess it will only really sink in once I get some serious practice time back home.

Being on campus at UQ has also been quite thought provoking. It's pretty intimidating to go from a relatively tiny regional campus to one of the Go8 campuses. Something of a culture shock. I have probably got just as much to think about from the campus experience and the people I've met as from the content of the course.

One thing that does need some comment is the quality of the accommodation. I have to say that for the price we paid, I feel we didn't get value for money.

The room takes some figuring out. The weather is freezing at night because of the river, and you need to wake up intermittently and turn on the air conditioner, which turns out to sound like a small jet engine. This makes sleeping a bit challenging.
The plumbing is terrible. They're on a water-saving kick, so someone has gone around and sabotaged the shower with a flow restrictor and a water-saving nozzle. The difference between arctic and third-degree burns is a very fine line.
And then there's the alarm clock. This consists of a 20-ton excavator tearing up a giant hole right beside the building. Strangely enough, the digging starts at about 7am every day and seems to be done for the day about half an hour later. Perhaps it's just my persecution complex...

I have to say that I was surprised by the attendance pattern of some of the other attendees. I get that they're busy and have other calls on their time but it seems like such a waste to sign up and show up for only a couple of sessions. Fully half the attendees were AWOL most of the time.  Makes you wonder what they were getting out of it that was worth the price. 

I think many of the attendees were there to get some practical skills applicable to a particular problem they were facing in their work. Perhaps they were just better able to discriminate which sessions were appropriate for their work. I was a bit of a kid in a candy shop. Everything was good.

Some of the software was a bit rugged; that's the nature of these kinds of systems: half of it is a hack and half of it is done but lacking polish. Usually it's just amazing that it works as well as it does. It's an incredibly complex domain to work with, and the marketplace is saturated while the customers are non-uniform, so the number of users of most features may be quite low. Makes for a hard business environment and low margins.

The people here are different. I've never before been surrounded by such a bunch of high achievers.  This is no bad thing as it has provided a real learning experience.  There are so many things I need to work on that are just not getting exercise at Coffs.  I understand some of the more traveled staff a little better now.

I've spent the time harvesting ideas from everything: the workshop, the people, the campus, the software, the uni website. Maybe it was just the scary amount of coffee I've been drinking to try to stay awake, and the sense of being away from the usual distractions. Now I just need the time to write some of it up before it all turns to smoke.

I need to figure out a good time to leave tomorrow to miss the rush-hour traffic. It was insane coming up. I managed to hit the rush about 110km south of Brisbane and was in rush-hour traffic for more than an hour at freeway speeds. Not really good when fatigue is at its maximum.

Back to thinking and catching up on all the work that's been piling up....

Thursday, July 1, 2010

Building a Calibration Wand for a Phasespace Motion Capture System from a Walkingstick

This post is documenting an interesting hardware hack.

The background.

A research project has just landed that involves using our Phasespace motion capture system. Since it's been idle for some time, I turned it on to check it out and remind myself how all the bits worked. Obviously it was broken.

So after replacing a video card in the hub computer and figuring out that the batteries in the LED driver units were dead, I finally got the rig up and talking. Then I found that the calibration wand was non-functional. Goes without saying really... IGOR rule 101 states: "Any equipment left in the proximity of students will be TOUCHED, no matter what you say, how well it's locked up or how many signs are erected."

The wand is one of those "damage magnets"! It's just too visually attractive. People are fascinated with it and will ask about it first out of all the equipment. It's just too pretty to live!

Anyway, today's IGOR hack-of-the-day is to build a calibration wand for a Phasespace (http://www.phasespace.com/) motion capture system.

Step 1 Scrounging
Find something to use as the wand shaft. Search the store rooms and the junk pile in my office. Nothing... nothing... nothing. I am almost ready to head to metal fabrication and scrounge there when I find an old walking stick that was given to someone as a joke. Perfect. It also has a bit more chic than a length of plastic pipe or whatever else I might have found.

Step 2 Procurement
Think quick and figure out how to attach a string of Multi-LEDs to the stick without destroying them. They have a Velcro backing, so all I need is some Velcro and some hot-melt glue. Time for a "Bunnings Run"(TM).

Shopping list
Hot-melt glue gun and reloads
Some Velcro cable holding tape
More cable ties



Step 3 Assemble the stick 

  Here you can see the walking stick measured up, with pieces of the Velcro tape glued strategically in place, alternating by 90 degrees around the front of the stick.

Here is a detail of two Velcro pads.

When one of the pads is out of alignment, rip it off and do it again.


And a final overview of the stick and Velcro assembly.

Step 4 Building the wiring loom
Wire spool
Punch down tool
Punch down connectors

Now measure out the wire. Remember to add a bit of slack between each LED position, as they are fiddly to position and you don't want them under any tension. Velcro vs wire will also only end one way. Wire wins!
Be careful of the punch down tool too. It doesn't really work the way it's intended on heavily insulated speaker wire. Mostly it tries to puncture your finger rather than securing the wire.

I use a knife to split the wire strands and then remove some of the insulation to help the punch down connector make a good connection. 


Once you have all the connectors on and facing the right way, put some hot glue in the back of each one to make sure it's not going to come off the loom. Let it cool and pull off the hot glue cobwebs.

Step 5 Assembly


Assemble the stick. Lots of cable ties make it look better.


Add a LED driver. Cable ties make everything good.


Now plug in and turn the whole system on, put it into calibration mode, and you can test your wiring. Note how only three of the eight LEDs work. Debugging time! Cut off all the cable ties...

Now take it apart again, pull the cable out of the connectors and hot glue, clean the glue off, cut away a little more insulation and re-assemble the wiring loom. This time, before you put the glue in each connector, assemble and test using calibration mode again. If a connector still does not work, cut away a little more insulation until you have bare wire and then punch it down into the connector again. When all are working, glue them up again.

Note the working LED's this time.

I now have a functional calibration wand. All I need to do is change the values in the wand.rb file to match the positions of the LEDs on this wand and I can get the system calibrated. Get out the ruler and begin measuring...

And that's all folks. Pretty straight forward.

Friday, June 4, 2010

Netflix prize paper and iTunes Genius

http://www2.research.att.com/~volinsky/netflix/
http://www.technologyreview.com/blog/guest/25267/

The Netflix recommendation engine is an interesting problem. I ran across a mention of this in an article on iTunes Genius.

The iTunes Genius system is simply leveraging a massive data set to appear clever. I respect that it's taken a lot of work to get it working, but the essential strategy is not particularly special. It's just the effect of the massive data set that allows it to be viable. It's the same as any system that has a huge "memory" and can effectively leverage it to improve its performance.

The Netflix problem is similar, but it's more of an optimization problem. They are still doing the same thing as any recommendation engine in that they are trying to match a product with a consumer.

It would be interesting to try to look at the properties of the product vs the properties that the consumer thought they were looking for vs the properties of previous products that the consumer had consumed and their rating of that product.

This is all based on a classification problem as well.  How subjective/objective are the properties that are being discussed?

There is another difference: the magnitude of the experience. A music track (the iTunes problem) is a couple of minutes of your life, while a movie may be a couple of hours. If you don't like a song, it's a fairly small cost to discard it, or not even discard it. But a movie that you don't like has a large cost and you will probably avoid it completely in the future, so it generates a much stronger response.

The experience is also different. Over the course of a two-hour movie, the watcher may go through a range of experiences (especially with a good narrative arc), so they may try to report a much more varied response when asked if they liked the movie or not. If you look at some of the film review forums, there are a lot of aspects that get discussed, while music tracks are much quicker and get a much simpler discussion (like or not like). Anyway, these are just data points at the end of the day.

In summary, the iTunes problem is a simple recommendation engine with fairly simple data points and a large set of sample training data. The Netflix problem is twofold: the first part is getting a good recommendation engine and the second is getting it to present a result in a reasonable time. The second part is just an optimization problem.

The recommendation engines have two input problems. The first is classification of the properties of the product being recommended. The second is getting useful data from a consumer about what they might like. It's then just a matter of finding all the possible matches and ranking them using some ranking scheme.
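The match-and-rank step above can be sketched in a few lines. This is a toy sketch only: the product properties, the preference weights and the additive scoring scheme are all invented for illustration, not anything Netflix or iTunes actually does.

```python
# Minimal match-and-rank sketch. Products and the user's inferred
# preferences are described with the same property vocabulary.
products = {
    "Movie A": {"action", "sci-fi"},
    "Movie B": {"romance", "comedy"},
    "Movie C": {"action", "comedy"},
}

# Weights inferred from the consumer's ratings of previous products
# (invented numbers for illustration).
user_prefs = {"action": 0.9, "comedy": 0.4, "romance": 0.1}

def score(properties, prefs):
    """Score a candidate by summing the user's weight for each matching property."""
    return sum(prefs.get(p, 0.0) for p in properties)

ranked = sorted(products, key=lambda name: score(products[name], user_prefs),
                reverse=True)
print(ranked)  # Movie C (1.3) beats Movie A (0.9) beats Movie B (0.5)
```

The interesting engineering is all in where those weights come from and how to avoid scoring every product for every user, which is exactly the optimization problem mentioned above.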

Fair enough this is a problem with real scale issues but it can be simplified by splitting the search space in a couple of ways and doing some pre-computing.

The fact that people are so predictable means that you can probably pre-compute a great deal of this: build a set of "stereotype" user profiles, keep them up to date, and then build an individual profile for each actual user as a function of the nearest "stereotype" with a customized set of deltas to represent their divergence from the stereotype.

It would probably be easy enough at scale to build a hierarchy of stereotypes and move the actual user between more or less specialized stereotypes as their taste changes. Then it simply becomes a matter of searching through the stereotypes for the nearest match, rather than comparing that actual user with each and every film in existence.
All you would need to do is update the stereotypes as each new film is added to the database. Even if there were a few thousand stereotypes, it would still be nice and cheap to keep it all up to date. Sort of an intermediate processing strategy.

The number of stereotypes would probably be something like the number of permutations of combinations of the properties of the product, minus the silly and unpopular. The list could probably be simplified even further by collapsing similar stereotypes for the less popular and increasingly specializing those that are popular. This could then be managed with an evolutionary strategy.
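The stereotype-plus-deltas idea can be made concrete with a small sketch. Everything here is an assumption for illustration: the profile axes, the stereotype centroids and the choice of Euclidean distance.

```python
import math

# Each profile is a vector of affinities over the same axes,
# e.g. (action, comedy, romance). Stereotypes are precomputed centroids.
stereotypes = {
    "action-fan": (0.9, 0.3, 0.1),
    "rom-com-fan": (0.1, 0.7, 0.9),
}

def distance(a, b):
    """Euclidean distance between two profile vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def profile_for(user_vector):
    """Store a user as (nearest stereotype, delta) instead of a full profile."""
    name = min(stereotypes, key=lambda s: distance(stereotypes[s], user_vector))
    delta = tuple(u - s for u, s in zip(user_vector, stereotypes[name]))
    return name, delta

def reconstruct(name, delta):
    """Recover the full profile from the compact stereotype + delta form."""
    return tuple(s + d for s, d in zip(stereotypes[name], delta))

user = (0.8, 0.4, 0.2)
name, delta = profile_for(user)
print(name)  # the nearest stereotype; the delta records the divergence
```

Matching a user then means searching a few thousand stereotypes instead of the whole catalogue, and updating a stereotype updates everyone anchored to it.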

Once the problem starts to be described in terms of entities, it's possible to play all sorts of social and population games with them.

Thursday, June 3, 2010

Thought exercise on applying Neural Nets used to sort galaxy images

http://www.space.com/businesstechnology/computer-learns-galaxies-100601.html

Article on using a neural net to sort galaxies. Good application of known technology, but that's not the point I'm interested in. My interest is how the tool is applied to "help" a human function more effectively.

Imagine the scenario if you can, a human slaving away over a pile of images of galaxies and sorting them into the relevant type piles. No problem except for boredom and scaling. The human can sort them into all the type piles, plus a "weird" pile and maybe a "problem" pile for the ones they are unsure about. Later on have another look at the weird and problem piles, maybe with some friends to help. Finally get them all sorted and start again. Keep in mind that the flow of images never stops.

Now get a computer to do it. Easy enough, but slightly semantically different. Sort all the easy ones into their type piles, put the "maybe" ones into another pile, the "problem" ones into another, and the "weird" ones into a last one. Pass the weird and problem ones to the humans and have them spend some quality time sorting them out.

The beauty of a neural net is that you can now feed the weird and problem items back in (with their new classification applied by the human think tank) as training data and improve the performance of the neural net. This process can occur every time the system finds weird and problem data.
I remember reading someone's idea about exceptions in software as being "an opportunity for more processing". If you think of the whole system (neural net + data + humans) as a single system, then each edge case becomes an opportunity to improve the system.
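The routing loop described above can be sketched like this. The threshold values, pile names and the classifier stub are assumptions for illustration; a real system would call the trained network where `classify` stands in.

```python
# Route each image by classifier confidence: confident results are filed
# automatically, uncertain ones go to humans, and the human verdicts
# become new training data for the next retraining round.
def classify(image):
    """Stand-in for the trained network: returns (label, confidence)."""
    return image["guess"], image["confidence"]

def route(images, sure=0.9, unsure=0.5):
    piles = {"sorted": [], "maybe": [], "problem": []}
    for img in images:
        label, conf = classify(img)
        if conf >= sure:
            piles["sorted"].append((img["name"], label))
        elif conf >= unsure:
            piles["maybe"].append(img["name"])
        else:
            piles["problem"].append(img["name"])  # off to the human think tank
    return piles

images = [
    {"name": "g1", "guess": "spiral", "confidence": 0.97},
    {"name": "g2", "guess": "elliptical", "confidence": 0.70},
    {"name": "g3", "guess": "spiral", "confidence": 0.30},
]
piles = route(images)
# g1 is filed automatically, g2 waits in "maybe", g3 goes to the humans;
# their verdicts on g2 and g3 then get appended to the training set.
```

Each pass through the loop shrinks the "problem" pile the humans see, which is the whole-system improvement described above.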

All in all, it's a pretty boring job, classifying galaxies based on an image (I assume there is a lot more to it, so work with my line of thought rather than the actuality), but the one thing the job does have is a huge, rich data stream and a fairly straightforward classification problem.

So the question arises: could the computer do a job beyond the capacity of the human classifiers? The whole idea of applying a classification structure to a set of data points is to simplify and apply a human-scale structure to the data for some purpose. But what if the software was used instead just to add metadata to the images in much finer granularity than the simple classification scheme used by humans? (This could then be simplified back down if humans wanted to search for a set of images at some later point.)

Taken to its logical conclusion, however, this would generate a set of data that was as complex as the original data stream and provided no additional value. (Interesting that "additional value" in this case equates to "simplified".) So perhaps this is not actually a classification problem; rather, it's a search problem. The data already exists in the original image/data stream (different wavelength images of the galaxy: x-ray, radio etc.), so rather than trying to use the software to add metadata to each image to simplify any future searches, it would be better to have a faster search engine that could look at all the original images in the database and return a set that matched the search parameters, without the additional layer of metadata.

Keep in mind that the metadata is going to be only as accurate as the system (human or NN) that applied it in the first place. All neural nets have some "certainty" or confidence function that essentially means "I am this sure that this image should go in that pile". The implicit inverse of this statement is that the neural net is also "this" sure that the image should NOT go in each of the other possible piles.
And if the neural net is always being retrained, then it may improve over time and change its ideas about which pile earlier images should have gone into. So the metadata may change and evolve.

The other thing is that the metadata scheme itself may change. Obviously with computers it's just a matter of re-classifying all the earlier work. This is just a factor of applying computing power to the problem. It may or may not be physically or economically viable, but it is theoretically the solution.

Which gets me back to my earlier point about not bothering with a metadata scheme: just build a database of images and a better search engine that can work from the raw data, rather than from some pre-constructed but potentially flawed index of metadata that may or may not have evolved.

Conceptually neat, but maybe impractical in reality. This then leads into an argument about how to "optimise" the solution so it becomes practical. Which probably leads back to doing some sort of pre-sort, which then leads to a finer-grained sort, which then leads to applying metadata to help the sort, which then leads back to the original point of building a neural net to apply metadata so a big dumb search engine can build an index and return a result in a reasonable amount of time. Circle complete.

We get to a point where it's a game of pick-your-compromise. The three corners of this trade-off are search time, completeness of search and correctness of search.

And the same optimization strategies keep recurring: more computing power, pre-processing, constant improvement, partial results, imperfect results etc.

As I said, pick your compromise.

Perhaps, rather than seeing the metadata as a subset or simplification of the data within the image for search and indexing purposes (plus the context in which it was captured: time, date, source device, blah blah), use the pre-processing to value-add to the data. Add data that helps to shape future analysis rather than categorisation.
Look for interesting features and make predictions based on current state-of-the-art knowledge, but also do some enrichment of the data set by integrating it with other data sets, and make notes on any gaps in the data or aspects that need to be re-examined from a better angle. Aim for completeness.

This becomes another game of impractical activity, but it is fun nonetheless.

Imagine being able to split the data on a star into layers and drill down into the spectral frequencies of a particular star, and then find that there is some frequency that has been incompletely documented and have the system automatically schedule some telescope time to re-capture that in a future pass but also learn to capture that aspect for all future images because some researchers are interested in that aspect.

So the system could evolve in response to use. Which raises the issue of data that can be generated from the data set. Do we store that for future re-use, or is it more efficient (and less flawed) to discard it and re-generate it when it's next needed, on the assumption that the tool used to re-generate it later will potentially be better and include fewer flaws and errors? This then becomes merely a factor of available computing power at any point in time. And with the cloud, we can start to do some really big data crunching without the previous compromises. It then becomes a factor of how clever the tool and the tool creators are. (parallelism + marshaling + visualization = Data Geek Bliss)

I would be very interested in the size of the neural net they used and some of the other factors, such as the number of classification classes and all the other fun details, but the study seems to be unnamed, and the only identified source may or may not be involved. (His page shows some similar work.)

An issue with all this waffling is the actual quality of the data in the data stream. It's far from "perfect": these are astronomical images in various wavelengths reaching us across space, taken with very good but imperfect devices and then sampled into some digital format with additional limitations, artifacts and assumptions. So to build a perfect system based on imperfect data is possibly another case of me over-engineering something.

Such is life.

Wednesday, June 2, 2010

Simplify for sanity

Reduce, simplify, clarify.

This seems to be the theme for my week at the moment. I have been cleaning out and clearing up at home, at work and on the web. Nothing spectacular, but it's all been lightening the load that I have been dragging around. My task at the moment has been to simplify all my web properties and remove the duplication between them. I am about 50% done so far. I've got a couple more sites that need a refresh and some profiles on various other sites that need to be cleansed, and then I will be up to date.

Probably just in time to do it all again, but it's worth doing anyway.

Saturday, May 29, 2010

Search strategies

Ever lost something in your house? Thought you knew where it was, but it turns out you didn't? When you go looking for it, it's just not there. What do you do?

Search nearby? Search in ever-widening circles around the spot where it should be? Try to retrace your steps? Look in the lost-and-found basket? Ask someone else? Systematically begin searching everywhere? Quarter the house and start a search grid? Do a sampled search of specific areas? Try to apply probability to where it most likely could be? Employ search agents (not your children... really, it doesn't work).

There are some interesting strategies for searching for a thing in an unknown environment. There are a few ways to try to optimize the search, but they are often dependent on properties of either the thing, the environment or the search tool(s). Not always generalizable.
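The ever-widening-circles strategy is essentially an expanding ring search over a grid. A minimal sketch (the grid, the starting point and the use of Chebyshev distance for the rings are all invented for illustration):

```python
def expanding_ring_search(grid, start, target):
    """Visit cells in rings of increasing Chebyshev distance from start."""
    rows, cols = len(grid), len(grid[0])
    sr, sc = start
    for radius in range(max(rows, cols) + 1):
        for r in range(rows):
            for c in range(cols):
                # A ring is the set of cells at exactly this distance.
                # (Rescanning the whole grid per ring is wasteful but keeps
                # the sketch short.)
                if max(abs(r - sr), abs(c - sc)) == radius and grid[r][c] == target:
                    return (r, c, radius)
    return None  # searched everywhere; it really is lost

# The "house": the keys are two rings away from where we thought they were.
house = [
    [".", ".", ".", "."],
    [".", ".", ".", "."],
    ["keys", ".", ".", "."],
]
print(expanding_ring_search(house, (0, 1), "keys"))  # found at radius 2
```

The strategy pays off only when the thing is probably near where you last saw it; with no such prior, it degrades to the systematic search-everywhere grid.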

As you might have guessed, I have lost something.

Friday, May 21, 2010

Artificial Creativity systems and software children

Just read an article on David Cope and his computer-generated music. (This is such an understatement.)

http://www.slate.com/id/2254232

Now I have found an article on him on Wikipedia (which is where I should have started anyway), which gives a much better background.
This reminds me of an article I read on software used to generate original images. Even the background on the creator has echoes. Sort of a "genius builds artificial child to help with creative process then lives off its success" spin.

This is a link to information about AARON, the image creation software. (Which now has a download... nice.)
http://www.kurzweilcyberart.com/
I can't find the original article. I think it was in a copy of Wired back in the day.

Anyway, the common theme I was seeing was the way the subject was presented: that these special-purpose pieces of software, once they started doing something creative, were personified as some sort of "child" of the creator. I know it's not explicit, but it's still the impression I took away from reading the articles.

I wonder if this is the easiest way to communicate these concepts to the reading public? Obviously the creators feel some sense of ownership and creation about the tools that they have spent so much time and effort to build. But why are they not seen as a creator with a really cool tool that extends their personal ability and reach? Surely the macros and tools I write do not have a life of their own. (Although some of them certainly generate a lot of "social activity", as it were.) But they are just extensions of their user's ability. I guess once the tool starts to operate fairly automatically, with little (new) input from the creator, it has crossed some threshold. It certainly becomes an encapsulated thing that does not need the creator to function and can be "triggered" by anyone.

However, as the tool contains a huge library of content and encoded knowledge from the creator, just because it's turned on by someone else, do they get to claim its output as a product of their effort? Does a person who uses a knife get to claim the product of using the knife as a product of their effort? Even if the knife is a product of a huge amount of clever design and work hours by someone else? It's a clear line when the tool is a "dumb" thing that extends the ability of the user but does not enhance it beyond what they could achieve in other ways. But when the tool not only extends the user's ability but supplements it in ways that the user could not otherwise achieve without hiring the original creator of the tool... then it's a more complex case.

I would argue that much of the product of the tool is still more closely related to the tool's creator than to the user of the tool. But that begins to blur the line between intellectual property and real property. Conceptually, when the tool itself changed hands, so did some right to the intellectual property encoded in the tool. (Obviously, as this is the case with any software.) However, it becomes much more emotive when that IP is what has hitherto been seen as some "magical" ability to be "creative". Another of those special properties that people use to try to differentiate themselves from "lower life forms".

But then again, should the tool be identified somehow as being autonomous? As having an existence apart from its creator? Certainly if it continues to collect more information and evolves beyond its creator. But what if there are some limitations, encoded intentionally or unintentionally by its creators, that it cannot evolve past? Or are those just a failure in some way?