Monday, August 27, 2012

Platform Churn...

This is another post that captures (indirectly) some of the complexity of the platform choices that devs need to try to make.

I have been thinking about this a bit recently: the life span of platforms is shortening and the uncertainty is increasing.  If you assume there is some fairly fixed start-up cost for a developer to get on board a new platform and master it, then the useful lifespan of a platform from one of the large enterprises looks pretty horrible. 

My sense is that it's only the open platforms with big community buy-in (C++, Node.js, Mono, OpenGL etc.), built over other open or flexible platforms (HTML, HTTP etc.), that have any sense of longevity.

Corporate platforms that require large investments and have similarly large political overheads can be canned simply because the corp changed direction, or the politics shifted. The point being that it takes only a few people to effectively kill the platform and the collective investment by the community.  This has to be seen as a giant risk to anyone thinking of committing to one of these monolithic beasts.  (Is there a word to describe single ownership ... I keep thinking "single-point-of-failure", "fragile", but they don't encapsulate the concept cleanly)

While trying to describe the situation I started listing some of the life-span issues:

* Shorter life-span
* Higher Uncertainty
* Greater evolution through their lifespan
* Less open and clear communication
* Abrupt terminations

The flow-on effects that get lost are:

* Documentation and book support
* Best practice and body-of-knowledge growth
* User Community and Ecosystem curation and growth
* Third party integration
* Third party extension
* In-depth security research and testing

The biggest issue that I see is the fatigue in the whole developer sphere.   Committing to a platform is a multi-year investment by a developer. To commit, build experience and knowledge, and then have the platform die can be traumatic in terms of business, job and product, but also in terms of leaving a hole in your CV, wasting all the "personal" time and attention that you may have invested, and robbing you of the cumulative benefit of growing into and with the platform.

I see the net effect as robbing the total developer community of energy and resources.  There are only so many developers coming on-stream every year, and for major products to bloom and die rapidly tears great big holes in the community.  Think of the millions of man hours of labour that have been poured into something like Silverlight.  All that investment, mostly carried not by Microsoft but by independent businesses, individuals, students, courseware developers, book publishers, etc. It's all now basically been made irrelevant and trashed.  Not because Silverlight has been officially executed... but it looks like it's on life support.  Who in their right mind would start a new career focusing on Silverlight at the moment? 

The same problem occurs around all the fad languages.  Especially those with closed business models (Delphi?) or with a single flagship developer who "owns" the language and its future.

I think that having a couple of these platform die-offs on your CV would pretty much end you.  If you assume a 2-4 year commitment to a platform, and that your earnings suck when you are starting up on a new platform and suck again as the platform is dying... then you may only see 1 year out of 4 where you are getting paid the market rate. (Assuming you are fully employed through that period... it's my game, so my assumptions. I'm making a point, not telling a story.) Those are pretty horrible odds.

So what's the answer?  Avoid proprietary platforms?  Wait for one to mature?  Let others take the risk? Follow the crowd?  Ride the pretty Unicorn?

It's getting harder as the walled gardens keep expanding.  Facebook, Apple, Google and Microsoft all control massive portions of the marketplace.  Every single one of them is using platform control to try to advance their political agendas.  The problem is that they are all trying to out-evolve each other at the moment and are thus churning their ecosystems quite badly.  I would expect churn to keep increasing, simply because none of them is able to dominate the space and achieve stability.

In the worst case scenario, dev shops can simply rent talent in whatever platform is currently hot then dump the staff when the platform goes cold. Individual developers are the ones who have to carry the cost of re-tooling and learning a new platform before they are employable again.
In the best case scenario, the dev shop re-trains the staff on the new platform and carries or shares the cost of acquiring the new skill set and experience.  
In neither of these cases does the cost fall on the enterprise that is responsible for the platform and for the killing of, or uncertainty around, that platform.
The platform owner is only interested in launching the platform and getting as many people "on board" as possible to make it a commercial/marketing success (or whatever the political objective of the investment is). Once the platform reaches a self-sustaining level (it has acquired enough support from the user/dev community to achieve its goals and is recruiting enough new talent to replace any losses), the enterprise is essentially along for the ride.  Theoretically the platform (given no changes in the environment) would continue indefinitely.  However, we all know that these things live and die based on the environment. 

There is a rationale for killing a platform.  However, there seems to be little effort spent on trying to transition a community to a replacement platform.  Microsoft seem to be the most successful at providing pathways from one to another.  (I use the term very gently... as success in its truest sense would suggest something more... successful.) They do usually provide some pathway (or at least a suggestion in a poorly linked blog post about where the devs can go and what they should do to themselves once they get there...)

In the end, I have no particular insight.  The fact that there is platform churn at the moment is obvious.  The fact that most of the cost of this churn is borne directly by individual developers is also obvious.  The disconnect between where the cost is borne and where the decisions about a platform's life cycle are made is why the system is particularly unresponsive.

Can this be fixed by people whinging on blogs or forums when their favorite platform gets the axe?  I doubt it, because the developer community is taken for granted. Any company that creates a platform expects developers to eventually colonise it, simply because it's a niche to be exploited.  If there is no value to be had... no one will come.

I would predict that in the future the rate of platform emergence will escalate, and the rate of platform death will escalate with it, simply because more and more people want to own a platform.  In the middle of the pond, the big platforms will continue to live longer, slower lives, but they will still evolve more quickly to compete with all the flash-in-the-pan competitors that emerge.  If one of the big platforms makes a misstep or the environment changes quickly, we may see a spectacular flame-out (see Napster or MySpace for examples), but on the whole, betting on the bigger platforms will be a more secure way to survive as a dev.

Thursday, August 23, 2012

Algorithmic Colour Article..

This is a really well developed article on selecting colour algorithmically.  Very useful for GUI and game design applications as well as explaining some of the failures. 

The variable that is always a problem in these kinds of systems is the final output and the eyeballs that receive it.  The randomness of colour output on all the monitors around here always frustrates me.  No two are the same (except perhaps the Dell UltraSharps). Everything else is just... random.  Bad backlights, crappy ghosting, motion artifacts, dust, flaky illumination, fingerprints... endless pain when trying to achieve "sameness" between research stations.

Wednesday, August 22, 2012

Change'd Management...

I went to a meeting with the campus IT steering committee a few days ago.  This is a regular quarterly gig where the IT stakeholders from the various sector partners get together and run through what's happening, where it's up to, etc.  It also serves as a forum where we can get together and look over each other's fences to see what's happening.  The discussions are generally technical and consist of topics like router loading, authentication and WLAN configuration issues.  Fun stuff like that. This has tended to attract people with an interest and role in IT.

Previously the meetings have been chaired by a person who had a good facilitative manner and showed interest in the subject matter. People talked at a level, and it was collegial.  Meetings moved along, but the material was treated well and the stakeholders had their say in their own time. 

We had a new chair person this time. 

He spent 10 minutes talking shit and making jokes about people being late, talked shit about expensive cars that he could afford but no one else on the panel will ever get paid enough to buy, blew off most of the agenda, talked his way around listening to the operations report, asked if there were any questions and basically bolted.  You know a meeting has gone well when all the people around the table have that pursed mouth look; like they're pissed off but don't have anything specific to say. Just a little bit stunned.  Kind of like their assumptions have been roughly violated!

The fact that the report I had been working on for the past 3 months didn't even get mentioned didn't actually bother me that much... but it was surreal.  It was like a script from "The IT Crowd". 

I felt a complete lack of respect for the people at the meeting, for the purpose of the meeting and for the content matter.  I accept that the people who did get a chance to speak may not be the most dynamic talkers. But they still matter.  The take-away impression is that nothing we do is worth mentioning, our time and preparation are of little consequence, and why did we bother showing up?

I can respect the position that if there is nothing to say in a meeting... why have it?  But on the other hand... making sure no one else has a chance to say anything is different from there being nothing to say. 

I think we have a serious lack of cultural understanding at play. Is "IT" a culture...?

Monday, August 20, 2012

Some bedtime reading

Explanation in Agent-Based Modelling: Functions, Causality or Mechanisms?

Wednesday, August 15, 2012

Debugging errors with playing movies in E-Prime 2.0.10 etc.

Ok, set up the experiment and test it on my main dev machine.  Movies work fine.  Using MOV files in this case.

Move the experiment and the movie files to the laptop... fails miserably due to the video codec.

Install quicktime... fails.
Install VLC... fails.
Install K-lite... plays.  (now using ffdshow)
The experiment runs like a dog. (No hardware acceleration)

Move to a new machine.... test the exp. Fails miserably due to video codec.

Install K-lite... fails.. but differently.
Install DirectX.... better but new failure. (Unable to load sound 203...)
Install VLC and QuickTime just to cover my bases... still fails, but whinging about audio now. Tells me to check the AudioLoadReport in the experiment folder. (The reports are actually in the same folder as the stimuli videos... which is not the same as the experiment folder... but close enough. (BUG, anyone at PST who cares?))

The AudioLoadReport is complete garbage. It's XML, which I could deal with.. but it does not actually say anything useful.

Interestingly, of the 60 MOV files in the stimuli set, 13 have generated AudioLoadReport.xml and VideoLoadReport.xml files... but the rest have not.  This is common to all three computers.  I have had a look at all the video files in GSpot and MediaInfo and they use the same codecs for both audio and video (vanilla MOV container with MPEG-4 AVC video and PCM audio).

Weirdly inconsistent.

I'm now trying to update everything on box 3 to see if it will magically come good.  I'm at a bit of a loss at the moment as to why it is having trouble.  QuickTime, VLC, Media Player Classic and Windows Media Player can all play the files correctly. There is some weird filter graph issue showing up; I think the issue is with my DirectX setup on that machine.

The investigation continues.

Ran CodecConfig.  Still no love.


I used GraphStudio from K-Lite to have a look at the filter graph and found the audio pin was not rendering to anything from the Apple Media Splitter filter.  This seems to be due to the MOV files having PCM audio rather than the AAC that is more usual in MOV files.  I then tried using the Haali Media Splitter and hooking the audio pin to the DirectSound output filter... Success!  So all I had to do was swap the preferred filter for MOV files from the Gabest splitter to Haali using the "Manage preferred DirectShow source filters" tool in K-Lite, and the experiment played flawlessly.

Interestingly, on my main dev machine the Gabest splitter filter is not installed. It has the LAV Splitter as the primary splitter for MOV and it works perfectly.   Download LAV Splitter here

Cruising for Registry Interface Classes for use in Win32 C++

I am looking at dragging some old C-ish Win32 class that handles my registry IO forward into the current century. My options are either to re-write it, test it, debug it... etc., or to find some code that has the job done already.

My criteria would be:

STL aware ( at least for reasonable string type handling )
Fairly modern C++
Not include dependencies for stuff I don't use/want/have (MFC, Managed, .NET, ATL, etc)
Sane Exception design.
Unicode aware or at least cleanly written.
Secure use of std c functions. (_s variants)
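To make the "Sane Exception design" criterion concrete, here is a sketch of roughly what I'd want a wrapper to throw (my own naming, not code from any of the surveyed libraries; the RegOpenKeyEx mention is just for flavour):

```cpp
#include <stdexcept>
#include <string>
#include <sstream>

// Sketch only: an exception that carries the raw Win32 error code
// (the LONG returned by RegOpenKeyEx and friends) alongside a readable
// message, with no MFC/ATL/.NET baggage.
class RegistryError : public std::runtime_error {
public:
    RegistryError(const std::string& action, long win32Error)
        : std::runtime_error(Format(action, win32Error)),
          m_error(win32Error) {}

    long ErrorCode() const { return m_error; }

private:
    static std::string Format(const std::string& action, long e) {
        std::ostringstream os;
        os << action << " failed (Win32 error " << e << ")";
        return os.str();
    }

    long m_error;
};
```

The wrapper methods would then throw this instead of handing back raw LONG codes, which keeps the calling code STL-flavoured rather than Win32-flavoured.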

My survey of the options is:
This is much more functionally complete but needs a complete re-write.  It is interesting in that it uses the NTAPI calls, but the Unicode handling is a mess and needs a re-write, along with the general structure and error handling philosophy.
No maintenance since 2006
This is fairly rudimentary and uses the Win32API calls. (RegCreateKey etc)
No maintenance since 1999
Looks like early Windows Classes Style of code. 
No maintenance since 2003
Written in a kind of C meets MFC style. Very Quick and dirty.
No maintenance since 2002
Written in old school MFC/ATL style Macro madness. Nice work back in the day but a pain to maintain or use going forward.
No maintenance since 2004.
Written in fairly modern c++. STL aware. Unit tests of some description.  All Win32API, fairly simple design.
No maintenance since 2005
This is an interesting style.  Seems to be fairly simple in terms of functionality. Written in older Win32 style with error codes and error flags rather than exceptions. Basic use of templates.
No maintenance since 2001
 This looks nice.
No maintenance since 2004
Registry wrapper with encryption. Interesting but overkill for my needs.
No maintenance since 2007
Yet another "simple" wrapper for the Registry. Pre STL and very old school. Not Unicode compliant.
No maintenance since 2000
A bit old school but well written.
Some comments posting fixes in 2009 for Unicode issues.
MFC audience.
No maintenance since 2001
Another "simple" class
No maintenance since 2004
Looks a bit MFC style. Some bugs in the comments.
No maintenance since 2005.
Uses templates so there is some hope.  A bit dated as far as functionality goes.
No maintenance since 2002
Looks like a simple read/write wrapper.
No maintenance since 2002.
Another simple read/write wrapper.
No maintenance since 2005.
No information that I can find about this.
Very basic functionality. Looks like write only.
No maintenance since 1998.
Looks nice, smells nice. Nice. 
Posted 2012. 

I think my search is over.  This is current-generation code and frankly there is nothing better out there that I can turn up in the time available.  I'm done.

So you formatted your hard drive... oops.

Here is the scenario:

You have a huge external USB hard drive (2TB or so) full of who knows what... but generally things you want to play through your TV. 
You see a USB port on the side of the TV.  You plug in the hard drive.
The TV then formats the hard drive to a RAW format for you.  (Yes, there may have been some message that explained this in cryptic terms, or expected you to have read the manual before doing this reasonable act. No one remembers the message or cares...)

Result... lots of crying, then finding your favorite IT guy (non-gender-specific) and asking them if they can fix it.

Favorite IT guy guesses pretty much what has happened.  (Shit message from the TV which clearly did not articulate the consequences. Shit interface on the TV and generally shit TV... did I mention I hate TVs? They work... but in their own little fantasy land of proprietary unicorns with non-standard horns and cryptic software goblins who are clearly not customer focused. The first phrase they learn after Saruman pops them out of the birth goo is "customer lock-in". They can all go and suck my troll.)

Anyway, when confronted with a drive that has been partially or completely formatted as RAW... what does one do?

One gets a copy of Zero Assumptions Data Recovery, sets aside a couple of days to scan the drive and find all the lost files (and everything else that has ever been deleted), then copies the found data to another drive big enough to hold it all (670GB in this case), which took about another 2 days as it was between two large USB drives. Then one formats the old drive back to FAT32 using GParted.  Then one copies all the found files back to the old drive... again across USB, so again taking a couple of days.  One then hands the fixed drive back to the owner, has "the talk" about not doing it again, and returns to what one was previously doing... a week and a half later.

The owner then spends a few days deleting everything they had previously deleted and trying to figure out the folder names and hierarchy they previously had. 

Job done.

Wednesday, August 8, 2012

Unicode Endgame

First step

Sort out the literal strings in the code.

Debugging Text intended to be dumped to debug log files

This is used for trace messages, dumping error codes and dumping stack traces, none of which the user will (or should... probably) see.

Debugging messages that will be compiled out could stay as simple char *.  Or equally they could all get pulled up to a common standard as a std::wstring.

My feeling is that pulling it all up makes life simpler. But there should not be any crossover between this kind of string and user messages. Perhaps separation of types enforces that assumption.

Exception Messages 

There is a slew of exceptions that pick up explanatory strings.  There are even bare char strings being thrown.  This has to stop. There is actually an exception hierarchy somewhere.  I must dust it off and implement it completely.

But what sort of text is appropriate?

Mostly the text is in two distinct classes: user errors to talk to the user about (which go out via the ExceptionManager interface) and debugging text, much of which is consumed or dumped to the debug log.
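To make that split concrete, a minimal sketch of the hierarchy I have in mind (my naming; nothing that exists in the codebase yet):

```cpp
#include <stdexcept>
#include <string>

// Sketch only: two distinct exception families so that user-facing text
// and debug-log text can never be accidentally mixed.
class AppException : public std::runtime_error {
public:
    explicit AppException(const std::string& debugTextUTF8)
        : std::runtime_error(debugTextUTF8) {}
};

// Carries text destined for the user (to go out via the ExceptionManager
// interface) plus the debug detail for the log.
class UserError : public AppException {
public:
    UserError(const std::string& userTextUTF8, const std::string& debugTextUTF8)
        : AppException(debugTextUTF8), m_userTextUTF8(userTextUTF8) {}

    const std::string& UserText() const { return m_userTextUTF8; }

private:
    std::string m_userTextUTF8;
};

// Internal-only: consumed in code or dumped to the debug log.
class InternalError : public AppException {
public:
    explicit InternalError(const std::string& debugTextUTF8)
        : AppException(debugTextUTF8) {}
};
```

The point is that UserText() only exists on a UserError, so debug strings can't leak to the user by accident, and the type system does the policing for me.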

More Definitions for Bits of Unicode

Code Point

Code Unit  (Titled "Character Encoding")

Code Page

Character - Vague Idea. See Glyph.

Character Encoding -


Grapheme & Grapheme Cluster

So my current question is about the preferred character encoding for use internally in my app.  Advocates for UTF-16 essentially say... everyone else is doing it, so you should too.
Advocates for UTF-8 essentially say... crap decisions were made... don't make them as well.
This answers the question again and extends the above couple of items with more clarity and specifics. Seems to be the same or a similar author.  Best collection of specifics I have found anywhere so far.   I like!

The End Result of my days of Unicode research???

Store the text internally as UTF-8.  Handle most of the stuff as ASCII anyway.  Turn on the _UNICODE flag to deal with the transparent swapping in the Win32 API now that my hand has been forced.  Only transcode to wide characters where needed for compatibility with the APIs in use.  Try like hell not to need to process text.
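For the curious, the guts of that transcoding step look roughly like this. This is a deliberately cut-down sketch (well-formed input assumed, BMP code points only, no error handling); boost::nowide::widen() is the real, fully checked version:

```cpp
#include <string>
#include <cstdint>
#include <cstddef>

// Minimal UTF-8 -> wide-string transcoder. Sketch only: assumes well-formed
// input and, for brevity, handles only code points in the Basic Multilingual
// Plane (1- to 3-byte sequences; no surrogate pairs).
std::wstring Widen(const std::string& utf8)
{
    std::wstring out;
    for (std::size_t i = 0; i < utf8.size(); ) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp;
        std::size_t extra;
        if      (c < 0x80) { cp = c;        extra = 0; }  // 1-byte (ASCII)
        else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }  // 2-byte sequence
        else               { cp = c & 0x0F; extra = 2; }  // 3-byte (BMP max)
        // Fold in the continuation bytes (6 payload bits each).
        for (std::size_t k = 1; k <= extra; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        out.push_back(static_cast<wchar_t>(cp));
        i += extra + 1;
    }
    return out;
}
```

Seeing it spelled out makes the "handle most of the stuff as ASCII anyway" point obvious: for pure-ASCII strings the loop is a straight byte-for-byte copy.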

At least this has forced me to deal with this issue explicitly and understand some of the subtle bugs that were lurking.  As usual... ignorance is no protection.

Boost.Locale for formatting is pretty heavy duty

Boost (Maybe) Nowide is a much simpler solution (you will still need Boost.Locale installed)

It's trivial to use and "Just Works"(tm).

So my Unicode awareness strategy has been:

1) Turn on UNICODE and _UNICODE build flags.
2) Wrap all literals in one of two Macros

#define _UTF8(x)  x
#define _UTF16(x) L##x

These replace the _T(), TEXT() etc. variants that litter various bits of the codebase.  There are a couple of places that I have explicitly left alone; this is where I am including source from someone else's work and it's included "as-is". 

The only other variation is at the interface to XERCES.  There is a lot of text handling already wrapped around this and I need to get my head clear before I simplify all that.  There is a messy transcoding class with an X() macro that seems to transcode between literals and the XERCES XMLCh * type, which I am guessing is probably a UTF-16 wide type used internally in XERCES.  I have just not gotten to this bit yet.

3) Explicitly pick the Win32 API functions that are being used.  So rather than "DrawText", which plays swapsies when _UNICODE is turned on, I have explicitly used "DrawTextW" in the code and used boost::nowide::widen() to pull my internal UTF-8 strings up to UTF-16 at the API call sites.

This forces the compiler to find all the locations where I am passing std::string to an old API call, which I can then address (see below).

4) Naming variables with UTF8 or UTF16 as part of the name to describe what they logically hold.  I know that Hungarian notation is dead... blah blah.  But this is about what is logically stored in the variable, not about its type.  This is simply a transitional technique to force me to consider and explicitly recognise what's going on with the logical content of the variables.  There is too much code for me to physically eyeball and think about everything, so I need to force the compiler to play on my team.

So the code starts to look like this:

//Type defs in a header...
typedef std::string  t_utf8_str;
typedef std::wstring t_utf16_str;

//some literal in the code flow...
t_utf8_str myLiteralUTF8("Tada!");

//Use the literal for various stuff (note the UTF-8 aware cout from the nowide lib)
boost::nowide::cout << myLiteralUTF8; 

//Transcode it only at the interface with the Win32 API
t_utf16_str myLiteralUTF16 = boost::nowide::widen(myLiteralUTF8);

//Use and discard it.
HRESULT hr = SomeAPICallW(myLiteralUTF16.c_str(), etc, etc);

This way, there are no accidental uses of wide strings without explicitly knowing that they contain what I think they contain.

I built a couple of regex searches to trawl the source, find all the quoted strings and wrap them in the _UTF8() macro.  This forced a bunch more implicit conversions to get picked up, which I could then explicitly handle.

The search regex (in Visual Studio find replace dialog) is


The Replace regex is


This simply ignores the #include "someheader.h" lines and any _UTF8("someLiteral") or _UTF16("Some other literal") that may already be in the source.  This allowed me to step through the source and wrap all the literals easily. 
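As an illustration of the idea, the same wrapping can be done programmatically.  This is a sketch, not the regex I actually used in the Find dialog, and it deliberately handles only the simple cases (one-line literals, no trigraph silliness):

```cpp
#include <regex>
#include <string>
#include <cstddef>

// Illustrative only: wrap bare string literals in _UTF8(...), leaving
// #include lines, wide literals (L"...") and already-wrapped
// _UTF8(...)/_UTF16(...) literals alone.
std::string WrapLiterals(const std::string& line)
{
    if (line.find("#include") != std::string::npos)
        return line;                                    // leave headers alone

    // Group 1: an existing wrapper or L prefix; group 2: the literal itself.
    static const std::regex re("(_UTF8\\(|_UTF16\\(|L)?(\"(?:[^\"\\\\]|\\\\.)*\")");

    std::string out;
    std::size_t last = 0;
    for (auto it = std::sregex_iterator(line.begin(), line.end(), re);
         it != std::sregex_iterator(); ++it)
    {
        const std::smatch& m = *it;
        out.append(line, last, static_cast<std::size_t>(m.position(0)) - last);
        if (m[1].matched)
            out += m[0].str();                          // already wrapped: keep as-is
        else
            out += "_UTF8(" + m[2].str() + ")";         // bare literal: wrap it
        last = static_cast<std::size_t>(m.position(0) + m.length(0));
    }
    out.append(line, last, std::string::npos);
    return out;
}
```

Running each source line through something like this gives the same result as the Find/Replace pass, with the advantage that the skip rules are explicit rather than buried in regex lookarounds.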

Tuesday, August 7, 2012

Bloody Unicode

Oh, how to describe my pain?  Is it like a summer's day or... fuck me! Lyrical is just not where my head is at right now.  The following is an exercise in getting my head straight.

I am, yet again, investigating upgrading a creaking old code base to handle Unicode.

Why, you might ask, would I do this stupid thing?  Wellll... simply because I have tried to ignore, hide from or otherwise avoid this issue for long enough, and have finally needed to use a library that is forcing my hand by requiring a Unicode build.  This has caused a cascade of shit of epic proportions.

Don't get me wrong... I think I could avoid this a while longer by simply avoiding this library and going with some second-rate, half-assed replacement... but in reality that is just throwing more effort into a worsening situation... one that I know will come back to bite me... just with bigger teeth.

So do I avoid some technical debt by adding more bad code... or do I confront the demon and do the nasty?  Add to this equation a very limited time span to get it all done..

So, firstly, I google for the current state of play in "Unicode Solutions" and am dismally disappointed to find it's about the same as last time I looked (probably a year or two ago): a few half-applicable tutorial-article things and a couple of libraries with abysmal documentation. The general wisdom of the herd on Stack Overflow and other centres of excellence is pretty thin....

So, lets start with a problem description:

Problem Description

Unicode is simply a different way of handling character data.  Mmmk!  Your code is a mess of single-width character handling shit from about the past 6 generations of coding styles.  MmmmK!

A + B = Problem!

Well.... after looking through the docs and a few battle stories, my conclusion is that I am a little bit fucked.  (Mostly because of the short time to fix... not the actual difficulty.)
Just for fun I turn on the Unicode build and see what happens...  watch those errors fly... about 1000, give or take.  (I hope there is lots of repetition in there...)  This is mildly embarrassing... but I will probably get over it.  Yep. Over it.

Problem Analysis

The code base contains a spicy stew of Win32 C routines, C++ with all manner of string and naked char * handling, along with much use of STL bits and pieces.  Toss in some _T macros here and there and some templates, and it's about as messy as can be imagined.  It's also been stitched together from all sorts of downloaded code with a mess of styles and pushed out without sufficient resources to clean it all up and get some consistency.  Just your average rotting old code base.

About the only thing that seems to be missing is the use of MBCS or other third party string libraries... I'm probably just not looking hard enough yet.

Error Types

From my first cursory look... it looks like lots of type casts and failures to handle types in function definitions.

The ugly stuff seems to be in the interfaces to libraries where they have hard-coded type assumptions in the interfaces.  There is a mix of shit there, from single-byte char * style C strings to STL-style std::string (pointers and refs) through to Win32-style strings and the spew of pointer types to similar constructs that Microsoft eternally tries to baffle us with.

There is a mess of errors generated by the use of STL exception classes, which cannot handle wide characters.  (How weak is that?)

Another pile is just where I have hard-coded a type rather than used my own type definition.  Should be easy to fix.

There is another big pile where I interface with the XERCES XML parser and XSD which should be sorted out once I get the compiler flags untangled.

Time for some thinking and planning...

Logically, the actual app should only really have three types of problem.

1)  Input data (Command Line, Form Fields, read from files or read from port stream)

2) Internal representation (Exceptions text, Code Literals and Constants, data in play)

3) Outputtable strings (screen messages, written to data files, written to ports)

So the question is really:
How to handle these cases elegantly...
How to pull all the crap code up to standard...
and finally how to do it quickly.

Input Data

Most of this stuff is user data.  Like in any app... never trust the user to do something nice.  Assume the worst, parse and clean... then reject out of hand everything that doesn't suit my assumptions.  DO NOT BE KIND TO THE USER.  Don't help them.  Don't quietly fix their mistakes, and finally, do not fix their assumptions.  Report all errors in mind-numbing detail. Make them fix it.

But on to the real problem.  Command line data arrives from the OS as a char * with a null terminator.  Could it contain Unicode characters?  Don't know.   Googled it: yes... but with some issues on Windows (and various other freaky issues on various other platforms... never saw that coming...).
OK, so I could get Unicode characters... presented as a char *?  WTF? OK. Use GetCommandLineW rather than picking up the C string from the params field.  (Should I use it explicitly, or use GetCommandLine and let the _UNICODE compiler macro make the switch... hmmmm.)


Form fields are not a problem at the moment because I'm not dealing with the GUI... per se.  Although I am dealing with keyboard input... but as keystrokes, without caring much about the character... then again...

Reading Unicode from Files

Ok, so a file is stored as ANSI text characters or is it?  Go educate yourself... or confuse yourself even more.

Clearly I am not yet on top of this subject... so let's start with some general definitions.

Reading Unicode from a Port Stream

When I say a port stream, what do I mean?  Keyboard data? Parallel Port characters? Serial Port Packets? IP Network packets?

Keyboard Data - This is not Unicode. It's keyboard scan codes.  Does it represent Unicode in the typist's head? Possibly.  Most of this crap is handled by the OS and GUI for forms apps. But for low-level game code... it's all you, baby!

Parallel Port Data - This is barely a character, let alone an encoded character. Think of it as flag data.  It still has the possibility of being interpreted as something higher level... but if you are trying to pass encoded characters bi-directionally via a parallel port... you probably have bigger problems and can handle sequential parsing and encoding without breaking a sweat.  I only use parallel ports for passing event flags, so I doubt this will be a problem any time soon. (How many times have I heard the echo of those words come back to slap me in the head...)

Serial Port Data - Currently not using the serial port in this project, but it's been talked about. Since it's simply a binary stream, it's more a problem of coordinating the transport layer.  Once you have the stream moving happily, then you can agree on the encoding that the bits represent.  Again, similar to the parallel port... if you are doing this sort of thing, then how the data is encoded is fairly trivial.  Just pack the packets, pass the stream... catch the stream in the buffer, then unpack the packet...  At no point should you need to deal with many "unknowns".  There is little chance of a user trying to pass you a poorly formed packet.  If so, reject it all, complain loudly and make them fix it.

IP Network Data - Well hell... it could be anything.  The point being that within the context of my app... it's probably going to be structured event data with very little text content. So... I will probably treat it as above: buffer it, unpack the packet, then put any text data into a wide string and handle it as required (by passing it to the log files or data files).
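For what it's worth, the "pack the packet / pass the stream / unpack the packet" idea can be sketched in a few lines.  This is illustrative only (my own trivial framing scheme, with the payload agreed to be UTF-8; a real protocol would want at least a checksum):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Sketch of length-prefixed framing for a byte stream: a 4-byte big-endian
// length, then the payload (agreed by both ends to be UTF-8 text).
std::vector<std::uint8_t> PackFrame(const std::string& utf8Payload)
{
    std::vector<std::uint8_t> frame;
    std::uint32_t n = static_cast<std::uint32_t>(utf8Payload.size());
    for (int shift = 24; shift >= 0; shift -= 8)        // big-endian length
        frame.push_back(static_cast<std::uint8_t>(n >> shift));
    frame.insert(frame.end(), utf8Payload.begin(), utf8Payload.end());
    return frame;
}

// Returns true and fills 'payload' once a whole frame has been buffered;
// returns false if the buffer does not yet hold a complete frame.
bool UnpackFrame(const std::vector<std::uint8_t>& buf, std::string& payload)
{
    if (buf.size() < 4) return false;
    std::uint32_t n = (static_cast<std::uint32_t>(buf[0]) << 24) |
                      (static_cast<std::uint32_t>(buf[1]) << 16) |
                      (static_cast<std::uint32_t>(buf[2]) << 8)  |
                       static_cast<std::uint32_t>(buf[3]);
    if (buf.size() < 4u + static_cast<std::size_t>(n)) return false;
    payload.assign(buf.begin() + 4, buf.begin() + 4 + n);
    return true;
}
```

The encoding of the text never enters into the transport at all, which is the point: the stream layer moves bytes, and "it's UTF-8" is just an agreement between the two ends.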

WTF is Unicode again?

Even after all the books, websites and tutorials I've read... (I know the real definitions; it's the definitions in use by the rest of the world that are making the mess.)

Unicode - is a fucking mess created by people trying to sort out a bigger mess, grafted on top of the many different messes created by ANSI-centric language and library designers over the past 50-ish years. Then lots of half-literate bastards have written poorly worded articles, tutorials, books, docs, libraries and operating systems which use the term loosely and thus add to the scale of the mess exponentially.

Unicode data - data stored in various incompatible formats called variously ANSI, UTF-8, UTF-16, UTF-32, Unicode (meaning UTF-16 Little Endian) and Unicode Big Endian (meaning UTF-16 Big Endian). But in reality it's just data... it's how it's used that really makes it Unicode or not.
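To make the "it's just data" point concrete, here's a minimal sketch that encodes a single code point as UTF-8 bytes. (A sketch only; a real decoder/encoder needs validation, surrogate handling etc.)

```cpp
#include <cstdint>
#include <vector>

// Encode one Unicode code point (up to U+10FFFF) as UTF-8 bytes.
// The same code point would be entirely different bytes in UTF-16 LE/BE.
std::vector<uint8_t> toUtf8(uint32_t cp) {
    std::vector<uint8_t> out;
    if (cp < 0x80) {                       // 1 byte: plain ASCII
        out.push_back(static_cast<uint8_t>(cp));
    } else if (cp < 0x800) {               // 2 bytes
        out.push_back(static_cast<uint8_t>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<uint8_t>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {             // 3 bytes
        out.push_back(static_cast<uint8_t>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<uint8_t>(0x80 | (cp & 0x3F)));
    } else {                               // 4 bytes
        out.push_back(static_cast<uint8_t>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<uint8_t>(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

So 'é' (U+00E9) is the two bytes C3 A9 in UTF-8, but E9 00 in UTF-16 LE and 00 E9 in UTF-16 BE... same character, incompatible byte soup.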

Unicode string - some sort of internal software construct containing (in the programmer's head) data that through some mechanism may turn out to be related in some fashion to a Unicode application, system, file, book, triple box set or drinking game.   In reality, it's just data until it's "interpreted" by something or shown to someone.

Hmmm.... I feel like there is some clarity.

In reality, there are only three problems in my app.  The first is to detect and handle Unicode at all input sites.  The second is to deal with an internal representation of the data without messing up its potential Unicode structure. Finally, displaying Unicode where appropriate in all its glory.
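The first problem, detection at input sites, usually starts with sniffing a byte-order mark. A rough sketch (names invented; a no-BOM file still needs heuristics or a default assumption):

```cpp
#include <cstddef>
#include <cstdint>

enum class Encoding { Unknown, Utf8, Utf16LE, Utf16BE };

// Look for a BOM at the start of a raw input buffer.
Encoding sniffBom(const uint8_t* p, size_t n) {
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return Encoding::Utf8;     // UTF-8 BOM
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return Encoding::Utf16LE;  // what Windows calls "Unicode"
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return Encoding::Utf16BE;  // "Unicode Big Endian"
    return Encoding::Unknown;      // no BOM: fall back to a guess
}
```

The Unknown branch is where the real pain lives, but at least the BOM cases are cheap to get right.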

The third part is in reality the ugly issue.  My app is hard coded with the assumption of Left to Right directionality in all sorts of places.  Since I'm manually building most of the screens, there is no help from the OS with anything like a GUI.  If I want to deal with Right to Left directionality in any way gracefully, it will probably mean a mess of work. Which in reality may be irrelevant, as my customer base is Academics and is probably mostly English-literate... whether they like it or not.  Sooooo... is this really a case that is worth any of my time to deal with?

Can I just deal with it by wrapping it up in an object and deal with it later when someone complains?

Should I ignore it completely? Hell even my output files are fundamentally left justified.  Just about everything assumes Left to Right.

More thinking required....

Mostly, I think I can deal with the input and the storage of Unicode text.  I can probably deal with output of any Unicode text if it's Left to Right without much change (assuming font choices do not make a mess of anything), but dealing with Right to Left layout is just going to make a complete mess of everything.  Every screen will need to be individually re-thought, laid out and then tested to make sure it works correctly. Then we start to deal with all the issues of line break rules, wrapping etc.

All this without answering the question about if I even care.

My best guess is that the bulk of my client base will be able to deal with English, either gracefully or not.  This is based on my email list... which, while not all from English-speaking countries, has generally been literate in English.  Also, equivalent systems are primarily available in English.  This is not to say that this is "Right", just that most of the other system developers have taken the easy options and stayed with the European languages of English, French, Spanish etc.

So is it just that my product has defined the clients, or that the clients will define the product?  I have had some interest from China, so I would expect to have to deal with other language sets sometime soon.  They are Left to Right, aren't they?  Educate thy self...

Nope. Top to bottom... but can be written left to right... thankfully.

Ok, so while it's a politically charged situation... from my point of view it will still be some form of string that shows up (due to the OS forcing the client to use Unicode), so the semantic issues are not my problem.

Where does that leave me?

Latin, Greek and Cyrillic based writing systems - Not a problem.
Chinese, Japanese, Korean, Vietnamese - (Top to bottom, but can be left to right) Not too much of a problem.
Arabic, Hebrew - (Right to Left) - Big problem. (Can I solve this simply by ignoring it and making my users adapt?)
Bi-directional Text (BiDi) - Who would do this?  What sort of insane person tries to build systems to mush all this crap together and then builds a sub-system within it to further mangle everything so different Frankenstein bits can live in the same freak'n phrase?  (Yes, I understand why they would do this... in the cold light of day... but then to actually implement a system around this crap... this is the insano bit)

So can I support all these different systems, or am I way down the rabbit hole already?  In reality, most of these systems are irrelevant to my problem space, simply because any text the user feeds in will be used literally, so it doesn't matter.  The only bit that does matter is where that text is used to issue commands to the system. In this case, I can only accept commands in English, simply because currently that's the only language I can test.

The explanation of those commands can be delivered in any language... but that's a documentation issue.

Hmmm... All the XML in the script files is essentially in English... There is no way I can translate all that and support 100 different language variants of the file format... that is just stupidity on a grand scale.

So, to deal with that... translate all the GUI text for the editor and force any non-English speakers to only use the GUI for editing. Unless they want to figure out the equivalent artifact in English within the file format... yuck. But that's about the only reasonable option I think I have.

Is it too late to ask for the Red pill?

Fonts, Locales and OS Language Settings

Do I even want to go here?  How much of this pain can I just chuck back in the Users' lap?

I need coffee...

So in summary, there is a mess of stuff to do.  Some of it has to do with my code being littered with bad habits. The rest is to do with all the habitual assumptions... which have turned out to be very English-centric and are in effect not portable and thus... bad.

My ToDo list is something like this:

1) Extract all the text for errors, exceptions and user messages so it could be internationalised if required.
2) Handle input text in a Unicode aware way.
3) Store text internally as Unicode characters. Use only wide-aware functions to manipulate it.
4) Output text as Unicode characters to the screen. (Including all error messages, dialogs and log files)
5) Put translation of the docs on the todo list so it can conceptually be done later.
6) Figure out how to test all this shit.
7) Actually test it.
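Item 1 on that list boils down to swapping hard-coded message strings for keyed lookups, so a translated table can be dropped in later. A minimal sketch, with invented keys and class names:

```cpp
#include <map>
#include <string>

// Keyed message table: the code refers to messages by stable keys,
// and a translation only has to supply a different table.
class MessageTable {
    std::map<std::string, std::wstring> msgs_;
public:
    void set(const std::string& key, const std::wstring& text) {
        msgs_[key] = text;
    }
    // A missing key is made visible rather than fatal.
    std::wstring get(const std::string& key) const {
        auto it = msgs_.find(key);
        return it != msgs_.end() ? it->second : L"??";
    }
};
```

Whether this table is compiled in or loaded from a data file is exactly the trade-off discussed below with the resource strings.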

I think, in general, that exceptions should not be carrying much text (if any!). My feeling is that this is one of the bad habits that I need to remove from the code base.
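One way to kill that habit is to have exceptions carry only an error code and leave the (translatable) message lookup to the display layer. A sketch, with invented codes:

```cpp
#include <exception>

// Error codes invented for illustration.
enum class ErrorCode { FileNotFound = 1, BadPacket = 2 };

// The exception carries no text at all; whoever catches it maps the
// code to a message in whatever language the UI is running in.
class AppError : public std::exception {
    ErrorCode code_;
public:
    explicit AppError(ErrorCode c) : code_(c) {}
    ErrorCode code() const { return code_; }
};
```

The throw sites then stay language-neutral, which is the whole point.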

Looking over the resource files that already contain all the strings for the XML files... I am not feeling happy.  Since all this stuff is compiled into the exe... perhaps I should extract it all and load it dynamically.  (Everything else is data, why not the error text?)  Although this opens up the possibility of the user trying to change the functionality of the system by hacking the data files... ugh. Yet another debugging condition to consider when supporting them over the phone... yuck. Ok, perhaps compiling the strings in is safer/less complicated.  It does make any translation a bit more complex, and the exe becomes language specific.

Hmmm.... games within games.

Friday, August 3, 2012

Trouble in the walled garden

This article raises some interesting points.  Specifically about the effect of... well, "political instability" within the Apple App Store.  This is equivalent to any other form of instability in an environment or niche.

The population within that environment cannot adapt to the environment because the "time to adapt" is longer than the environment's "time between changes".  Effectively, the population will mostly "suck" at "exploiting" the opportunities in the environment in an optimal way.

Now..... what to do? What to do?  Do what?

Well, if there is an exodus of both customers and sellers from a market place... the market place will either diminish or must recruit more of both.  Since the Apple App Store has a constant stream of new customers (purchasing new devices and being main-lined into the App Store experience), this is partially mitigated.  The other side of this is that there is a whole mass of App developers who are just waiting for the opportunity presented when a big name app exits the market place.  This mitigates the other part of the problem.

The only people who lose are the developers who exit the app store and their customers who follow.  They all go to tiny niche app stores (developer web sites etc.) which are more fragile and have a much higher level of risk.  Let me put it this way... would you bet on a small app developer being here in five years, or Apple being here in five years?

The problem with the position in the article above is that the developers think they are unique and have some irreplaceable "stuff".  This, I would contend, is... "crap".  (Technical term...)

In the event that an App exits the market place simply because the developer is too arrogant to adapt to the changes in the market place... it will be about 0.01 seconds before someone else starts trying to replace them... even if they offer less functionality or do things differently... they will grow and follow what the customers need/want/crave/etc.

No one will miss the apps and developers that have left to find their own destiny in the waste outside the walled garden... simply because they have no way to control the "Unique-ness" they assume they possess.  It can be replaced with other "Unique-ness" of similarly ephemeral specialness.


A few may follow... for a while, but there is no cost to the customers to return to the walled garden when they choose.  There is a huge cost to the developers to leave and then return.  Their position will be lost/diminished/destroyed. And no one will care...  There is no loyalty when the choice is between functionality or no functionality.  The customer will go where their pain is least... however, they also will not try every option to optimise their choice... they will simply stop with the first/nearest option that reduces their pain enough to be worthwhile.  Satisficing.