
Wednesday, March 14, 2012

Out of control sales bots

http://carlos.bueno.org/2012/02/bots-seized-control.html

There is so much fun to be had with the content in this article. 

The fact that Amazon is now a giant bot feeding ground, run by bots and exploited by other bots, is just a little bit scary.
The fact that the Amazon "business" cannot be "stopped" or reset to rid it of these bots suggests that it's really close to being a permanent "environment".  As long as the bots keep paying the bills... who will try to stop it?

The Matrix has you....

Wednesday, April 27, 2011

Fun with matching algorithms

http://www.theregister.co.uk/2011/04/26/amazon_pricing_algorithms/

This is a fun little example of matching algorithms with limited inputs and a very short feedback loop: two sellers' pricing bots each repricing as a multiple of the other's price.  What do you get.... uh, growth... duh.
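
As a rough illustration of why, here is a minimal sketch of the kind of feedback loop the article describes, assuming two sellers whose bots each reprice as a fixed multiple of the other's price (the multipliers and starting price below are made up, not the actual values from the incident):

    # Two sellers' bots reprice against each other once a "day".
    # undercut < 1 keeps A just below B; markup > 1 keeps B well above A.
    def run_pricing_bots(start_price=100.0, undercut=0.998, markup=1.27, days=30):
        price_a = start_price
        price_b = start_price * markup
        for day in range(days):
            price_a = price_b * undercut   # A reprices just below B
            price_b = price_a * markup     # B reprices well above A
            print(f"day {day:2d}: A = {price_a:,.2f}  B = {price_b:,.2f}")

    run_pricing_bots()

Each round multiplies the price by undercut * markup, which is greater than one, so the prices grow exponentially until a human notices.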

Friday, January 21, 2011

DNA, Game Theory and Risk in partner selection....

http://blog.okcupid.com/index.php/the-mathematics-of-beauty/

This is quite thought provoking.  There are a slew of confounds and different directions to take the research... but that's not what interests me initially.

I was interested in the two types of graph patterns being discussed; not whether they were correct, just the thoughts I had while trying to explain them.

First, the two profiles are essentially the "Normally Attractive" and the "Split Hot or Hate" shapes. That got me thinking about DNA and partner selection, along with some game theory.

The first point is how to explain the "Split Hot or Hate" graph.

Assume that the "Normally Attractive" pattern represents a female partner whose genes are effective enough to be successful in the current environment. This means her DNA has been selected for and reinforced across a significant part of the population, enough to be considered the "normal" look. (This also assumes that appearance, and whatever else was being overtly or covertly evaluated by the fairly simple measure, is tied directly to DNA or social DNA... anyway.)
People who fall outside this "normal" are aberrant in some way, perhaps because they express non-normal characteristics.  Now, if the environment is static, the conservative strategy for picking a mate is to always pick a "normal" mate, on the assumption that others have taken all the risks and normal is good enough to survive.  The higher-risk strategy is to look for someone who exceeds normal in some fashion and try to breed with an exceptional partner.

However, if the environment is not stable, picking a "normal" is no longer the risk-free or conservative strategy, because "normals" are adapted for an environment that was stable yesterday and may not be tomorrow. (The question is whether our DNA is smart enough to express that kind of strategy via testosterone, or just uses blind selection to deal with this scenario.... which is simpler, do you think?)

So how does this explain the "Split Hot or Hate" pattern?  I would suggest that risk taking in mate selection is essential for a healthy population and its ability to explore different mutations, but it's probably only valuable if a small-ish part of the population engages in these risky ventures. The rest of the population should be steered toward breeding only with "normal" partners and avoiding anyone "abnormal". This way the population keeps a large central pool of "normal" stock, with a smaller pool of mutants ready to take advantage of any useful mutations or environmental changes.

So these graphs represent the difference between "normals" and "abnormals": the "normals" get a response pattern suggesting they are acceptable to both conservative and risk-taking partners, while the "abnormals" are either highly attractive to risk-taking partners or highly repulsive to conservative ones.

This study is possibly more interesting because it was done on the male response to female potential mates.  The asymmetry between male and female responses illustrates just how wildly different mate selection is on the two sides of the fence.
The cost to a male of breeding is relatively tiny if there is no husbandry involved, while the cost to a female is potentially a year or so of her life, or roughly 1/15th to 1/20th of her possible breeding events (assuming breeding starts around 15 and fertility lasts until the mid to late 30s).  So for males, breeding with a female outside the "normal" pool is fairly low cost and low risk, while for females it carries a much higher cost for the same risk.

Once you involve husbandry and the social conventions of mating, it gets much more complex but the underlying biology is the same.


I would predict that a similar study from the female point of view would show a similar pattern for "normal" males, but a much smaller percentage of "hot" (high risk but worth it) types and a much larger, clear "hate" group.


This would reflect a higher percentage of "conservative" choices among women, since the cost and risk of breeding with an "abnormal" are higher for them.

It would be interesting to break the scoring given to each partner down into a set of finer-grained plus or minus scores, e.g. (see the sketch after the list):

Like their profile information (score 1-5)
Like their picture (score 1-5)
Overall like (score 1-5)
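
A rough sketch of what could be done with such scores; the 1-5 scale and the variance threshold used to separate the two response patterns are illustrative assumptions, not anything from the study:

    # Given per-rater scores for one profile, distinguish a "Normally
    # Attractive" response pattern (scores clustered near the mean) from a
    # "Split Hot or Hate" pattern (scores piled at both extremes).
    from statistics import mean, pvariance

    def response_pattern(scores, split_threshold=2.0):
        """scores: list of 1-5 ratings from individual raters."""
        avg = mean(scores)
        var = pvariance(scores)
        # High variance on a 1-5 scale means raters disagree strongly:
        # lots of 1s and 5s rather than a pile of 3s and 4s.
        kind = "split hot-or-hate" if var >= split_threshold else "normal"
        return avg, var, kind

    print(response_pattern([3, 4, 3, 4, 3, 4, 3]))  # clustered -> "normal"
    print(response_pattern([1, 5, 1, 5, 1, 5, 2]))  # polarised -> "split hot-or-hate"

The per-aspect breakdown (profile, picture, overall) would just mean running this for each score type and comparing the patterns.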

Later....

Friday, June 4, 2010

Netflix prize paper and iTunes Genius

http://www2.research.att.com/~volinsky/netflix/
http://www.technologyreview.com/blog/guest/25267/

The Netflix recommendation engine is an interesting problem. I ran across a mention of it in an article on iTunes Genius.

The iTunes Genius system is simply leveraging a massive data set to appear clever. I respect that it's taken a lot of work to get it working, but the essential strategy is not particularly special. It's just the effect of the massive data set that makes it viable. It's the same as any system that has a huge "memory" and can effectively leverage it to improve its performance.

The Netflix problem is similar, but it's more of an optimization problem. They are still doing the same thing as any recommendation engine: trying to match a product with a consumer.

It would be interesting to compare the properties of the product, versus the properties the consumer thought they were looking for, versus the properties and ratings of products the consumer had previously consumed.

This all rests on a classification problem as well: how subjective or objective are the properties being discussed?

There is another difference: the magnitude of the experience.  A music track (the iTunes problem) is a couple of minutes of your life, while a movie may be a couple of hours.  If you don't like a song, it's a fairly small cost to discard it, or not even bother discarding it.  But a movie you don't like has a large cost, and you will probably avoid it completely in the future, so it generates a much stronger response.

The experience is also different.  Over the course of a two-hour movie, the watcher may go through a range of experiences (especially with a good narrative arc), so they may report a much more varied response when asked whether they liked the movie.  If you look at some of the film review forums, there are a lot of aspects that get discussed, while music tracks are much quicker and get a much simpler discussion (like or not like).  Anyway, these are just data points at the end of the day.

In summary, the iTunes problem is a simple recommendation engine with fairly simple data points and a large set of training data.  The Netflix problem is twofold: the first part is getting a good recommendation engine, and the second is getting it to present a result in a reasonable time.  The second part is just an optimization problem.

Recommendation engines have two input problems. The first is classification of the properties of the product being recommended. The second is getting useful data from the consumer about what they might like. It's then just a matter of finding all the possible matches and ranking them with some ranking scheme.
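
A hedged sketch of that "match then rank" step, assuming products and consumers are both represented as weights over the same hand-classified properties (the films, tags and weights below are invented; a real engine would learn these from the ratings data):

    # Rank catalogue items by the weighted overlap between the consumer's
    # stated preferences and each product's classified properties.
    def score(user_prefs, product_props):
        return sum(user_prefs.get(prop, 0.0) * weight
                   for prop, weight in product_props.items())

    def recommend(user_prefs, catalogue, top_n=3):
        ranked = sorted(catalogue.items(),
                        key=lambda item: score(user_prefs, item[1]),
                        reverse=True)
        return ranked[:top_n]

    catalogue = {
        "Film A": {"thriller": 1.0, "slow-burn": 0.8},
        "Film B": {"comedy": 1.0, "romance": 0.6},
        "Film C": {"thriller": 0.7, "comedy": 0.5},
    }
    user = {"thriller": 0.9, "comedy": 0.2}
    print(recommend(user, catalogue))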

Fair enough, this is a problem with real scale issues, but it can be simplified by splitting the search space in a couple of ways and doing some pre-computing.

The fact that people are so predictable means that you can probably pre-compute a great deal of this: build a set of "stereotype" user profiles and keep them up to date, then build an individual profile for each actual user as the nearest "stereotype" plus a customized set of deltas representing their divergence from that stereotype.

It would probably be easy enough at scale to build a hierarchy of stereotypes and move the actual user between more and less specialized stereotypes as their taste changes.  Then it simply becomes a matter of searching through the stereotypes for the nearest match, rather than comparing the actual user against each and every film in existence.
All you would need to do is update the stereotypes as each new film is added to the database.  Even with a few thousand stereotypes, it would still be nice and cheap to keep it all up to date. Sort of an intermediate processing strategy.
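
A rough sketch of the "nearest stereotype plus deltas" idea, assuming stereotypes are just centroid preference vectors over product properties (the stereotypes, properties and weights below are invented for illustration):

    import math

    # Pre-computed "stereotype" profiles: centroid preference vectors.
    STEREOTYPES = {
        "action-fan": {"action": 0.9, "comedy": 0.2, "drama": 0.1},
        "drama-fan":  {"action": 0.1, "comedy": 0.3, "drama": 0.9},
    }

    def distance(a, b):
        keys = set(a) | set(b)
        return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

    def assign_stereotype(user_profile):
        # Store each actual user as their nearest stereotype plus deltas.
        name = min(STEREOTYPES, key=lambda s: distance(user_profile, STEREOTYPES[s]))
        deltas = {k: user_profile.get(k, 0.0) - STEREOTYPES[name].get(k, 0.0)
                  for k in set(user_profile) | set(STEREOTYPES[name])}
        return name, deltas

    def effective_profile(name, deltas):
        # Reconstruct the user's profile: stereotype centroid plus deltas.
        base = STEREOTYPES[name]
        return {k: base.get(k, 0.0) + deltas.get(k, 0.0)
                for k in set(base) | set(deltas)}

    user = {"action": 0.7, "comedy": 0.4, "drama": 0.2}
    name, deltas = assign_stereotype(user)
    print(name, deltas)
    print(effective_profile(name, deltas))

The win is that a new film only needs to be scored once per stereotype; each actual user's result is then a cheap correction using their deltas.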

The number of stereotypes would probably be something like the number of combinations of product properties, minus the silly and unpopular ones.  The list could probably be simplified even further by collapsing similar stereotypes among the less popular and increasingly specializing those that are popular. This could then be managed with an evolutionary strategy.

Once the problem starts to be described in terms of entities, it's possible to play all sorts of social and population games with them.