
Monday, September 10, 2012

Computational Social Science

http://www.nature.com/news/computational-social-science-making-the-links-1.11243

This is an interesting article that touches on a couple of fun items. A good general background piece. A couple of names to chase up on Academia.edu.


Friday, June 1, 2012

Wolfram SystemModeler

http://blog.wolfram.com/2012/05/23/announcing-wolfram-systemmodeler/

This looks handy for a whole range of research projects.  Must have a look in more detail later.

Tuesday, January 24, 2012

Common Crawl

http://www.commoncrawl.org/data/

This looks interesting for doing research on the internet and text analysis. Need to have a better look at some stage.

Monday, January 23, 2012

Article on social friction

http://socialmediacollective.org/2011/11/28/in-defense-of-friction/

There is lots to think about with this article... todo.

Tuesday, March 1, 2011

Risk Management and Building Experiment Systems

I'm busy building a test suite at the moment in E-Prime for a client, and it got me thinking about some issues particular to building experimental systems.  The issue I'm looking at is that of embedding analysis code inside the experiment module itself. (By experiment module, I mean the software unit/system that runs the experiment and collects the raw data.)

My thinking on this is that it's a "bad thing"(tm). My general attitude is that during the experimental run there should be as little code running as possible: just enough to present the stimuli and log the result.  Analysis can be done afterward, i.e. not in real time.

This provides a number of benefits (a code sketch of the split follows the list).

1) There are fewer things that can go wrong during the experiment and crash/trash a run.
2) You can re-think the analysis later on. (Some good, some bad)
3) Researchers can't play the black box game and "trust" the software to always be right.
4) You can't fix bugs in real time. 
5) It provides separation of functionality.  The experimental software is focused on doing one thing right.
6) It spreads the cost of development over time.
7) It allows you to use different tools for different parts of the tool chain.
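
To make the split concrete, here is a minimal sketch in Python (the actual client work is in E-Prime, so this is purely illustrative, and every name in it is hypothetical). The experiment module does exactly two jobs: present stimuli and log raw rows; anything analytical lives in a separate tool run after the session.

import csv
import time

def run_experiment(stimuli, present, get_response, log_path="raw_log.csv"):
    # Present each stimulus, record the raw response, and log it. No analysis.
    with open(log_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["trial", "stim", "response", "rt"])
        for trial, stim in enumerate(stimuli):
            onset = time.monotonic()
            present(stim)              # e.g. draw to screen
            response = get_response()  # e.g. block until a keypress
            writer.writerow([trial, stim, response, time.monotonic() - onset])
            f.flush()                  # so a crash mid-run doesn't trash the log

# analysis.py would be a separate tool, run after the session, never during it.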

This all works fine until you have a feedback system that depends on some sort of calculated property based on the results.  But that's ok... it just takes a little more testing to verify.
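
When a feedback loop is unavoidable, one way to contain the risk (my own habit, not a rule) is to keep the calculated property in a tiny pure function that can be unit-tested in complete isolation from the experiment loop. A hypothetical staircase update, for instance:

def next_difficulty(level, correct, step=1, lo=1, hi=10):
    # Simple 1-up/1-down staircase: harder after a hit, easier after a miss.
    level = level + step if correct else level - step
    return max(lo, min(hi, level))

# The feedback rule can be verified without ever starting the experiment.
assert next_difficulty(5, True) == 6
assert next_difficulty(1, False) == 1  # clamped at the floor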

None of this makes a system idiot-proof. You can still introduce bugs into an analysis toolchain just as easily as into an experiment module. But when the modules are smaller and simpler, it's easier to verify each one individually.

I've found a couple of researchers who are of a different opinion from mine and want the system to do everything in a single all-consuming step (generate, collect, and analyze). I think this illustrates a general lack of understanding of process and methodology rather than any particular lack of insight into software or programming. My generalization is that they have used fairly high-end systems that did a great deal of hand-holding, rather than being used to creating their own toolchain out of lower-level units.  Neither good nor bad, but a difference in expectations between myself and the client which needs to be managed early and often. It also speaks of a "delegation" mindset. That is fine when there are lots of RAs working on the project and you are delegating to a person you trust, who can fiddle with the process until they get it right. But delegating to a piece of software carries a degree of fragility and untested assumptions.

The other side of this is that having a researcher who is very clear on their analysis before the data collection begins is both interesting and somewhat risky.  I like that they are prepared and have clarity about what they want to do with the data, but on the other hand, I find that until you really get a look at the raw data, it's always a slight unknown. (Which is the point of experimentation.) So crafting an analysis toolchain that will inhale, clean, process and summarise the data, all without the researcher ever looking at the raw data... worries me.  There are just too many untested assumptions hard-wired into that process. Too much blind trust.
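
One way to surface those assumptions rather than hard-wiring them (a sketch only; the column names are hypothetical and match the logging example above): a validation pass that fails loudly before any cleaning or summarising happens, so questionable raw data forces a human to actually look at it.

import csv

def validate_raw(path, max_rt=10.0):
    # Fail loudly on implausible rows rather than silently cleaning them away.
    with open(path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f), start=1):
            rt = float(row["rt"])
            assert 0 < rt < max_rt, f"row {i}: implausible reaction time {rt}"
            assert row["response"] != "", f"row {i}: missing response"
    print(f"{path}: passed basic sanity checks")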

The opposite extreme is just as dangerous: a researcher with a vague or non-existent idea of what analysis they want to run on the data, who wants to "see" what it looks like before they decide, is usually a bad sign.  Sometimes it's just that they're a visual learner and can't yet articulate what is clear in their head. The other case is where they are wasting everyone's time, really have no idea what they are doing, and should get out of the lab until they figure it out or get a job with a private-sector public opinion firm.  How they get their research proposals past Ethics is a mystery to me....

I guess it all comes down to the relationship between researchers and their process. Mostly they are exploring a half-grasped idea and improvising as they go. This demands a degree of flexibility and modularity from the tools: very clean and clear interfaces that don't leak assumptions, and decomposition of the problem rather than excessive composition of functionality. All this with an eye toward mitigating the risks of:

1) Changing ideas and the associated cost to change
2) Change in research focus
3) Unexpected failures in other modules
4) Insufficient testing due to constant evolution
5) Poor time planning
6) Other parallel projects and tasks causing distraction


Unfinished idea

Monday, January 31, 2011

100 books that shaped a century of science

http://www.americanscientist.org/bookshelf/pub/100-or-so-books-that-shaped-a-century-of-science

Some interesting reads here. Surprisingly few that I have actually read. Probably a measure of just how few of these books are on topics that I'm interested in. The ones I have read were generally boring, so perhaps I should look up some of the other ones.... when I have a whole pile of time to burn...

Tuesday, November 16, 2010

Tenacious C IDE

http://tenaciousc.com/

This looks like an interesting product. Need to investigate more at a later date.

Tuesday, November 9, 2010

Supervisor pattern

http://vasters.com/clemensv/2010/09/28/Cloud+Architecture+The+SchedulerAgentSupervisor+Pattern.aspx

A heavy post on cloud patterns. Need to re-read this and have a think about it.

Learn Python tuts

http://learnpythonthehardway.org/index

Something for my endless supply of free time.

Irony Language Implementer

http://irony.codeplex.com/

This looks useful... but not right now. It could be used for implementing domain-specific languages that leverage the .NET runtime, and may be a good solution for creating an embedded language for inclusion in experiment generation tools.

Wednesday, November 3, 2010

Fastflow Parallel Programming framework

http://calvados.di.unipi.it/dokuwiki/doku.php?id=ffnamespace:about

This looks very interesting. When I suddenly come across a bottle of "free time" I might just have a drink and a think...

Thursday, May 13, 2010

Idea Notes - Markup Language for soundtracks for video and movies

The problem: searchable content in a video stream, which is currently hard to index. This also raises accessibility issues for people with language, hearing, or vision differences, etc.

The actual content stream could be dialog (the script track), sound effects (both salient and ambient), characters' activities (salient and ambient), set & location information, and finally camera framing, shot length, etc. You could also make notes on colour palette, lighting, and effects (slo-mo, fast speed, cutting, montage, etc.).


There is a lot that could be borrowed from an animation director's work notes, I guess. The point being that with a common and open language specification it would be possible to reverse-engineer any piece of video and apply this metadata to it.  This would be useful for film restoration as well as feeding a whole slew of useful data into search engines.

All it needs is a catchy name.  Something like OpenVideoDescriptionLanguage (OVDL). Or eXtensibleVideoDescriptionLanguage (XVDL).  Everyone likes four-letter acronyms....
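
To make the idea concrete, here is a hypothetical fragment built with Python's standard library, using the OVDL name floated above. Nothing here is a real specification; every element and attribute name is invented purely for illustration.

import xml.etree.ElementTree as ET

# Build a tiny, invented OVDL fragment describing one scene.
ovdl = ET.Element("ovdl", version="0.1")
scene = ET.SubElement(ovdl, "scene", start="00:01:12.000", end="00:01:30.500")
ET.SubElement(scene, "dialog", speaker="ANNA").text = "We need to leave. Now."
ET.SubElement(scene, "sfx", kind="salient").text = "door slams"
ET.SubElement(scene, "ambient").text = "rain on a tin roof"
ET.SubElement(scene, "camera", shot="close-up", length="3.2s")
ET.SubElement(scene, "lighting").text = "low key, single practical lamp"

print(ET.tostring(ovdl, encoding="unicode"))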