Friday, June 4, 2010

Netflix prize paper and iTunes Genius

The Netflix recommendation engine is an interesting problem. I ran across a mention of it in an article on iTunes Genius.

The iTunes Genius system is simply leveraging a massive data set to appear clever. I respect that it has taken a lot of work to get it working, but the essential strategy is not particularly special. It's just the effect of the massive data set that makes it viable. It's the same as any system that has a huge "memory" and can effectively leverage it to improve its performance.

The Netflix problem is similar, but it's more of an optimization problem. They are still doing the same thing as any recommendation engine, in that they are trying to match a product with a consumer.

It would be interesting to compare three things: the properties of the product, the properties that the consumer thought they were looking for, and the properties of products the consumer has previously consumed, along with their ratings of those products.

This all rests on a classification problem as well. How subjective or objective are the properties being discussed?

There is another difference: the magnitude of the experience. A music track (the iTunes problem) is a couple of minutes of your life, while a movie may be a couple of hours. If you don't like a song, it's a fairly small cost to discard it, or not even discard it. But a movie that you don't like has a large cost, and you will probably avoid it completely in the future, so it generates a much stronger response.

The experience is also different. Over the course of a two-hour movie, the watcher may go through a range of experiences (especially with a good narrative arc), so they may try to report a much more varied response when asked whether they liked the movie. If you look at some of the film review forums, there are a lot of aspects that get discussed. Music tracks are much quicker and get a much simpler discussion (like or not like). Anyway, these are just data points at the end of the day.

In summary, the iTunes problem is a simple recommendation engine with fairly simple data points and a large set of training data. The Netflix problem is twofold: the first part is getting a good recommendation engine, and the second is getting it to present a result in a reasonable time. The second part is just an optimization problem.

Recommendation engines have two input problems. The first is classifying the properties of the product being recommended. The second is getting useful data from a consumer about what they might like. It's then just a matter of finding all the possible matches and ranking them using some ranking scheme.
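To make the match-and-rank step concrete, here is a minimal sketch in Python. It assumes that products and consumer preferences are both described as weights over the same named properties, and it uses a plain dot product as the ranking scheme. The property names, the `score` and `rank` functions, and the sample catalog are all made up for illustration; a real engine would use something far more elaborate.

```python
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # property name -> weight (hypothetical representation)


def score(product: Vector, prefs: Vector) -> float:
    """Dot product over the properties that both the product and the consumer mention."""
    return sum(weight * prefs[name] for name, weight in product.items() if name in prefs)


def rank(catalog: Dict[str, Vector], prefs: Vector, top_n: int = 3) -> List[Tuple[str, float]]:
    """Score every product against the preferences and return the best matches first."""
    scored = [(title, score(props, prefs)) for title, props in catalog.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical catalog and consumer, purely for illustration.
catalog = {
    "A": {"comedy": 1.0},
    "B": {"horror": 1.0},
    "C": {"comedy": 0.5, "horror": 0.5},
}
prefs = {"comedy": 1.0, "horror": -1.0}
best_first = rank(catalog, prefs)
```

Note that this compares the consumer against every product in the catalog, which is exactly the brute-force cost that becomes a problem at scale.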

Fair enough, this is a problem with real scale issues, but it can be simplified by splitting the search space in a couple of ways and doing some pre-computing.

The fact that people are so predictable means that you can probably pre-compute a great deal of this. Build a set of "stereotype" user profiles and keep them up to date, then build an individual profile for each actual user as a function of the nearest "stereotype", with a customized set of deltas to represent their divergence from it.
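Here is a rough sketch of what "nearest stereotype plus deltas" might look like. It assumes tastes are vectors of weights over named properties and that "nearest" means smallest Euclidean distance; both are my assumptions for the sake of the example, as are all the names (`nearest_stereotype`, `build_profile`, `reconstruct`) and the sample data.

```python
import math
from typing import Dict, Tuple

Vector = Dict[str, float]  # property name -> weight (hypothetical representation)


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance, treating a missing property as weight 0."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def nearest_stereotype(user: Vector, stereotypes: Dict[str, Vector]) -> str:
    """Name of the stereotype whose taste vector is closest to the user's."""
    return min(stereotypes, key=lambda name: distance(user, stereotypes[name]))


def build_profile(user: Vector, stereotypes: Dict[str, Vector]) -> Tuple[str, Vector]:
    """Store a user as (stereotype name, deltas), keeping only the non-zero differences."""
    name = nearest_stereotype(user, stereotypes)
    base = stereotypes[name]
    keys = set(user) | set(base)
    deltas = {
        k: user.get(k, 0.0) - base.get(k, 0.0)
        for k in keys
        if user.get(k, 0.0) != base.get(k, 0.0)
    }
    return name, deltas


def reconstruct(name: str, deltas: Vector, stereotypes: Dict[str, Vector]) -> Vector:
    """Rebuild the user's full taste vector from the stereotype and the deltas."""
    tastes = dict(stereotypes[name])
    for k, d in deltas.items():
        tastes[k] = tastes.get(k, 0.0) + d
    return tastes


# Hypothetical stereotypes and user, purely for illustration.
stereotypes = {
    "action_fan": {"action": 1.0, "romance": 0.0},
    "romantic": {"action": 0.0, "romance": 1.0},
}
user = {"action": 0.75, "romance": 0.25}
name, deltas = build_profile(user, stereotypes)
```

The point of the split is that the stereotype carries the expensive, shared part of the profile, while the deltas stay small and cheap to store per user.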

It would probably be easy enough at scale to build a hierarchy of stereotypes and move the actual user between more or less specialized stereotypes as their taste changes. Then it simply becomes a matter of searching through the stereotypes for the nearest match rather than doing a comparison of that actual user with each and every film in existence.

All you would need to do is to update the stereotypes as each new film is added to the database. Even if there were a few thousand stereotypes, it would still be nice and cheap to keep it all up to date. Sort of an intermediate processing strategy.
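As a sketch of that intermediate processing step, the class below keeps a short pre-computed list of films per stereotype and updates every list whenever a film is added. Serving a user is then just a lookup by stereotype name. The `StereotypeIndex` class, the dot-product `affinity` function, and the sample films are all hypothetical, and the flat dictionary ignores the hierarchy idea to keep things short.

```python
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # property name -> weight (hypothetical representation)


def affinity(film: Vector, stereotype: Vector) -> float:
    """How well a film's properties line up with a stereotype's tastes (dot product)."""
    return sum(v * stereotype.get(k, 0.0) for k, v in film.items())


class StereotypeIndex:
    """Keeps a pre-computed, best-first film list for each stereotype."""

    def __init__(self, stereotypes: Dict[str, Vector], list_size: int = 10) -> None:
        self.stereotypes = stereotypes
        self.list_size = list_size
        self.ranked: Dict[str, List[Tuple[float, str]]] = {name: [] for name in stereotypes}

    def add_film(self, title: str, film: Vector) -> None:
        """One pass over the stereotypes per new film, not one pass per user."""
        for name, taste in self.stereotypes.items():
            entries = self.ranked[name]
            entries.append((affinity(film, taste), title))
            entries.sort(key=lambda entry: entry[0], reverse=True)
            del entries[self.list_size:]  # keep only the top of the list

    def recommend(self, stereotype_name: str) -> List[str]:
        """Serving a user is a lookup of their stereotype's pre-computed list."""
        return [title for _, title in self.ranked[stereotype_name]]


# Hypothetical data, purely for illustration.
index = StereotypeIndex(
    {"action_fan": {"action": 1.0}, "romantic": {"romance": 1.0}},
    list_size=2,
)
index.add_film("Explosions", {"action": 1.0})
index.add_film("Kisses", {"romance": 1.0})
index.add_film("Both", {"action": 0.5, "romance": 0.5})
```

With a few thousand stereotypes, adding a film costs a few thousand small comparisons, regardless of how many actual users there are.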

The number of stereotypes would probably be something like the number of combinations of the properties of the product, minus the silly and unpopular ones. The list could probably be simplified even further by collapsing similar stereotypes for the less popular tastes and increasingly specializing those that are popular. This could then be managed with an evolutionary strategy.
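The collapsing half of that idea can be sketched with a simple greedy merge: repeatedly take the least popular stereotype and, if it falls below some user-count threshold, fold it into its nearest neighbour as a popularity-weighted average. To be clear, this is a plain greedy pass rather than an evolutionary strategy, and the threshold, the `collapse_unpopular` function, and the sample data are assumptions for illustration only.

```python
import math
from typing import Dict, Tuple

Vector = Dict[str, float]  # property name -> weight (hypothetical representation)


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance, treating a missing property as weight 0."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def collapse_unpopular(
    stereotypes: Dict[str, Tuple[Vector, int]], min_users: int
) -> Dict[str, Tuple[Vector, int]]:
    """Fold stereotypes with fewer than min_users users into their nearest neighbour."""
    merged = dict(stereotypes)
    while len(merged) > 1:
        weakest = min(merged, key=lambda n: merged[n][1])
        vec, users = merged[weakest]
        if users >= min_users:
            break  # every remaining stereotype is popular enough
        del merged[weakest]
        nearest = min(merged, key=lambda n: distance(vec, merged[n][0]))
        n_vec, n_users = merged[nearest]
        total = users + n_users
        if total == 0:
            continue  # nobody uses either one, so just drop the weakest
        keys = set(vec) | set(n_vec)
        blended = {
            k: (vec.get(k, 0.0) * users + n_vec.get(k, 0.0) * n_users) / total
            for k in keys
        }
        merged[nearest] = (blended, total)
    return merged


# Hypothetical stereotypes as (taste vector, number of users), purely for illustration.
stereotypes = {
    "action": ({"action": 1.0}, 900),
    "action_musical": ({"action": 1.0, "musical": 1.0}, 100),
    "romance": ({"romance": 1.0}, 500),
}
collapsed = collapse_unpopular(stereotypes, min_users=200)
```

Here the small "action_musical" group gets absorbed into "action", which picks up a faint musical weight in return. Specializing the popular stereotypes would be the reverse operation, splitting a large group when its users diverge enough.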

Once the problem starts to be described in terms of entities, it's possible to play all sorts of social and population games with them.
