The Cold Start Problem


People who bought this item also bought...

If you've ever shopped online, then you're familiar with recommendations that rely on collaborative filtering, predicting a user's interests from the past behavior of other users with similar interests. There are countless varieties of collaborative filtering -- we've come a long way since researchers at Xerox PARC proposed the method two decades ago.
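As a rough illustration of the idea, here is a minimal sketch of one simple item-to-item variety: counting which items share a basket. The function names (`build_co_purchases`, `also_bought`) and the sample baskets are invented for this example; real systems normalize for item popularity and work at a very different scale.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchases(baskets):
    """Count, for each item, how often every other item shares a basket with it."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        # set() so a duplicated item in one basket is only counted once
        for a, b in permutations(set(basket), 2):
            co_counts[a][b] += 1
    return co_counts


def also_bought(co_counts, item, k=3):
    """Return up to k items most often bought together with `item`."""
    # Sort by count (descending), then by name, so ties are deterministic
    ranked = sorted(co_counts.get(item, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical purchase history, one list per order
baskets = [
    ["tent", "sleeping bag", "stove"],
    ["tent", "sleeping bag"],
    ["tent", "lantern"],
    ["stove", "fuel"],
]
co_counts = build_co_purchases(baskets)
print(also_bought(co_counts, "tent", k=2))
print(also_bought(co_counts, "hammock"))  # never purchased: nothing to recommend
```

Note that an item with no purchase history gets an empty list, which is exactly the gap the rest of this post is about.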

But collaborative filtering depends on the availability of past data. What about recommendations for new products that people have not yet had a chance to purchase, let alone recommend?

Here we encounter what's known as the "cold-start" problem. For the product recommendation case, there's often a work-around -- namely, making inferences about new products based on the historical purchasing behavior of related products.
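One way to picture that work-around is the sketch below, which assumes that "related" means "in the same catalog category" and that co-purchase counts already exist for established products. The function `recommend_for_new_item`, the catalog, and the counts are all hypothetical; relatedness could just as well come from brand, text descriptions, or price band.

```python
from collections import Counter


def recommend_for_new_item(new_item_category, catalog, also_bought_by_item, k=3):
    """Pool the 'also bought' counts of established items in the same category."""
    pooled = Counter()
    for item, category in catalog.items():
        if category == new_item_category:
            # Borrow the purchase history of each related, established item
            pooled.update(also_bought_by_item.get(item, {}))
    # Sort by pooled count (descending), then by name for deterministic ties
    ranked = sorted(pooled.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical catalog and co-purchase counts for established products
catalog = {"tent A": "tents", "tent B": "tents", "stove": "cooking"}
also_bought_by_item = {
    "tent A": {"sleeping bag": 5, "lantern": 2},
    "tent B": {"sleeping bag": 3, "stakes": 4},
    "stove": {"fuel": 6},
}

# A brand-new tent has no history of its own, so it inherits from other tents
print(recommend_for_new_item("tents", catalog, also_bought_by_item, k=2))
```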

But there's a more extreme version of the cold-start problem: launching a new data product, like when LinkedIn first introduced the world to "People You May Know". For this scenario, there's no past data to draw on at all, at least in theory.

In practice, there's usually a way to bootstrap. We can apply what Monica Rogati calls "data recycling" -- using data from other contexts to bootstrap our initial statistical models until we can collect live data. We can also use an "explore / exploit" strategy to optimize the speed of learning.
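To make the two ideas concrete, here is a minimal sketch of one common explore / exploit strategy, epsilon-greedy, with estimates seeded from recycled data. Everything here is an assumption made for illustration: the class name `EpsilonGreedy`, the `prior_means` and `prior_weight` parameters, and the click rates are invented, and treating recycled data as a fixed number of pseudo-observations is just one simple way to blend it with live feedback.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy chooser whose estimates can start from recycled data."""

    def __init__(self, arms, epsilon=0.1, prior_means=None, prior_weight=0, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        prior_means = prior_means or {}
        # Treat each recycled estimate as `prior_weight` pseudo-observations
        self.counts = {arm: (prior_weight if arm in prior_means else 0) for arm in arms}
        self.means = {arm: prior_means.get(arm, 0.0) for arm in arms}

    def choose(self):
        # Explore: with probability epsilon, try a uniformly random option
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.means))
        # Exploit: otherwise pick the option with the best current estimate
        return max(self.means, key=self.means.get)

    def update(self, arm, reward):
        # Incremental running mean over pseudo-observations plus live rewards
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Hypothetical click rates borrowed from another product ("data recycling")
bandit = EpsilonGreedy(
    arms=["layout_a", "layout_b"],
    epsilon=0.1,
    prior_means={"layout_a": 0.1, "layout_b": 0.3},
    prior_weight=10,
    seed=0,
)
arm = bandit.choose()
bandit.update(arm, reward=1.0)  # 1.0 = the user clicked, 0.0 = they did not
```

A larger `prior_weight` trusts the recycled data for longer; a larger `epsilon` learns faster from live traffic at the cost of showing more options that currently look worse.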

Developing a creative solution for a cold-start problem is a great test of a data scientist's prowess -- it also makes for a great interview problem.

But cold-start problems aren't just for data scientists. They come up in everyday life.

For example, consider the human tendency to overgeneralize from first impressions. Or the reliance on stereotypes instead of evidence. Both are attempts to overcome cold-start problems. At one time they may have provided evolutionary advantages, but today they can be crippling cognitive biases.

Still, as human beings, we have to make decisions all the time, and we don't always have the luxury of obtaining the best decision from a model built with representative training data. So we should all get better at facing cold-start problems. At some level, we are all data scientists. So let's try to be good ones.

Vegard Sandvold

Frontend Developer at FINN.no

10y

Very interesting! It reminds me of other challenges with collaborative filtering I've heard of - popularity bias (or the "rich get richer" effect) and early rater bias. Are they relevant in this case?


Cold Start with Cross-Product Marketing data, can be a better start!

Bharat Gera

Founder at Human Centric Healthcare Ecosystem (HCHE)

10y

have been tackling a related problem of anonymous visitors in ecommerce, tough problem. cold start plus limited data - like cold starting with no clue about engine used in the car :-)

Evan Bradley

Building teams and products, solving problems that matter.

10y

Daniel, interesting article. It actually brings up some excellent product features I would love to see in LinkedIn: 1) Building influence on LinkedIn (features for those who are not currently influencers on LinkedIn), 2) People you should learn about, 3) Thought leaders in your area of work, 4) Career Trajectory (people who had your career in their lineage, showing a tree of pathways forward...a fascinating data use from LI). Cold start is very interesting from a consumer experience standpoint, as each person should have a unique taste signature, but as in epidemiology or population genetics, there are clear trends that can be drawn if you have enough good data. Great article. Daniel, do you have any book recommendations to refresh the math for a deeper understanding of cold start? My calculus and matrix algebra are rusty.

Samuel B.

Principal Software Engineer at Voya Health

10y

We are all data scientists, let's try to be good ones.

