The Cold Start Problem

Daniel Tunkelang

High-Class Consultant

Published Apr 29, 2013

People who bought this item also bought...

If you've ever shopped online, then you're familiar with recommendations that rely on collaborative filtering, predicting a user's interests from the past behavior of other users with similar interests. There are countless varieties of collaborative filtering -- we've come a long way since researchers at Xerox PARC proposed the method two decades ago.
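Here is a minimal sketch of the simplest "people who bought this item also bought" recommender. It counts how often pairs of items show up in the same customer's purchases, then ranks an item's co-purchases by that count. This is a plain co-occurrence heuristic, a bare-bones relative of item-to-item collaborative filtering. It is not how any particular retailer implements recommendations. The function names and toy baskets are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each pair of distinct items appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        # set() ignores repeat purchases; sorted() makes pair order deterministic
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_bought(counts, item, k=3):
    """Return up to k items most often bought together with `item`."""
    # .get avoids inserting an empty entry for items we have never seen
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]


# Hypothetical purchase histories, one list per customer.
baskets = [
    ["camera", "sd_card", "tripod"],
    ["camera", "sd_card"],
    ["camera", "camera_bag"],
    ["sd_card", "card_reader"],
]
counts = co_purchase_counts(baskets)
print(also_bought(counts, "camera"))     # sd_card first (bought with camera twice)
print(also_bought(counts, "new_drone"))  # [] : no purchase history yet
```

A brand-new item gets an empty list because it has no purchase history to co-occur with.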

But collaborative filtering depends on the availability of past data. What about recommendations for new products that people have not yet had a chance to purchase, let alone recommend?

Here we encounter what's known as the "cold-start" problem. For the product recommendation case, there's often a work-around -- namely, making inferences about new products based on the historical purchasing behavior of related products.
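One way to sketch that work-around assumes each product has a set of catalog attributes, such as category, type, and brand. First, find the existing products whose attributes overlap most with the new one, measured here by Jaccard similarity. Then borrow their purchase history. The attribute sets, sales figures, and function names below are hypothetical. Jaccard similarity on attributes is only one simple choice of relatedness.

```python
def jaccard(a, b):
    """Attribute overlap: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def estimate_from_related(new_attrs, catalog, sales, k=2):
    """Estimate a new item's sales as the similarity-weighted mean
    of the sales of its k most similar existing items."""
    neighbors = sorted(
        ((jaccard(new_attrs, attrs), item) for item, attrs in catalog.items()),
        reverse=True,
    )[:k]
    total_weight = sum(sim for sim, _ in neighbors)
    if total_weight == 0:
        return None  # nothing related in the catalog: still fully cold
    return sum(sim * sales[item] for sim, item in neighbors) / total_weight


# Hypothetical catalog attributes and historical unit sales.
catalog = {
    "camera_a": {"camera", "mirrorless", "brand_x"},
    "camera_b": {"camera", "dslr", "brand_y"},
    "tripod_c": {"tripod", "brand_x"},
}
sales = {"camera_a": 120, "camera_b": 80, "tripod_c": 40}

print(estimate_from_related({"camera", "mirrorless", "brand_y"}, catalog, sales))  # 100.0
print(estimate_from_related({"drone"}, catalog, sales))  # None
```

The new mirrorless camera has Jaccard similarity 0.5 with each existing camera, so its estimate is the plain average of their sales. A product that shares no attributes with anything in the catalog gets `None`. The work-around only helps when something related already has a history.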

But there's a more extreme version of the cold-start problem: launching a new data product, like when LinkedIn first introduced the world to "People You May Know". For this scenario, there's no past data to draw on at all, at least in theory.

In practice, there's usually a way to bootstrap. We can apply what Monica Rogati calls "data recycling" -- using data from other contexts to bootstrap our initial statistical models until we can collect live data. We can also use an "explore / exploit" strategy to optimize the speed of learning.
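A minimal sketch can combine both ideas, under two assumptions of ours. First, the explore/exploit strategy is Thompson sampling over a Beta-Bernoulli model of click-through rate. This is one common choice; the post doesn't name a specific method. Second, "recycled" data from another context enters the model as prior pseudo-counts. The variant names, priors, and simulated click rates are all hypothetical.

```python
import random


class BetaArm:
    """Beta-Bernoulli posterior over one variant's click-through rate."""

    def __init__(self, prior_clicks=1.0, prior_skips=1.0):
        # Beta(1, 1) is uniform: no prior knowledge at all.
        self.alpha = prior_clicks
        self.beta = prior_skips

    def sample(self, rng):
        return rng.betavariate(self.alpha, self.beta)

    def update(self, clicked):
        if clicked:
            self.alpha += 1
        else:
            self.beta += 1


def thompson_sampling(arms, true_rates, rounds, seed=0):
    """Simulate `rounds` impressions; return how often each arm was shown."""
    rng = random.Random(seed)
    shown = {name: 0 for name in arms}
    for _ in range(rounds):
        # Sample a plausible rate per arm and exploit the best draw;
        # uncertain arms sometimes draw high, which is the exploration.
        chosen = max(arms, key=lambda name: arms[name].sample(rng))
        clicked = rng.random() < true_rates[chosen]  # simulated user response
        arms[chosen].update(clicked)
        shown[chosen] += 1
    return shown


arms = {
    # Hypothetical recycled estimate from another context: ~2 clicks in 20 views.
    "a": BetaArm(prior_clicks=2, prior_skips=18),
    # No recycled data for this variant: uniform prior.
    "b": BetaArm(),
}
true_rates = {"a": 0.05, "b": 0.20}  # unknown to the algorithm; only simulates clicks
print(thompson_sampling(arms, true_rates, rounds=2000, seed=42))
```

In each round, the sketch draws a plausible click rate for every variant from its posterior and shows the variant with the highest draw. Uncertain variants still get explored while evidence accumulates. The recycled prior starts variant `a` near a 10% click rate instead of knowing nothing. As live clicks arrive, the posterior moves toward what users actually do. Larger pseudo-counts would make the recycled data harder to overturn.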

Developing a creative solution for a cold-start problem is a great test of a data scientist's prowess -- it also makes for a great interview problem.

But cold-start problems aren't just for data scientists. They come up in everyday life.

For example, consider the human tendency to overgeneralize from first impressions. Or the reliance on stereotypes instead of evidence. Both are attempts to overcome cold-start problems. At one time they may have provided evolutionary advantages, but today they can be crippling cognitive biases.

Still, as human beings, we have to make decisions all the time, and we don't always have the luxury of obtaining the best decision from a model built with representative training data. So we should all get better at facing cold-start problems. At some level, we are all data scientists. So let's try to be good ones.

Vegard Sandvold

Frontend Developer at FINN.no

10y

Very interesting! It reminds me of other challenges with collaborative filtering I've heard of: popularity bias (or the "rich get richer" effect) and early rater bias. Are they relevant in this case?

Naveena Tripurana

10y

Cold start with cross-product marketing data can be a better start!

Bharat Gera

Founder at Human Centric Healthcare Ecosystem (HCHE)

10y

Have been tackling a related problem of anonymous visitors in e-commerce. Tough problem: cold start plus limited data, like cold-starting a car with no clue about the engine it uses :-)

Evan Bradley

Building teams and products, solving problems that matter.

10y

Daniel, interesting article. It actually brings up some excellent product features I would love to see in LinkedIn: 1) Building influence on LinkedIn (features for those who are not currently influencers on LinkedIn), 2) People you should learn about, 3) Thought leaders in your area of work, 4) Career Trajectory (people who had your career in their lineage, showing a tree of pathways forward; a fascinating data use from LI). Cold start is very interesting from a consumer experience standpoint: each person should have a unique taste signature, but as in epidemiology or population genetics, there are clear trends that can be drawn if you have enough good data. Great article. Daniel, do you have any book recommendations to refresh on the math for a deeper understanding of cold start? My calculus/matrix algebra is rusty.

Samuel B.

Principal Software Engineer at Voya Health

10y

We are all data scientists, let's try to be good ones.

More articles by this author

LLMs and RAG are great. What’s Next?

Apr 18, 2024

Sparse and Dense Representations

Apr 15, 2024

AI-Powered Search: Embedding-Based Retrieval and Retrieval-Augmented Generation (RAG)

Apr 8, 2024

Analyzing the AI Search Opportunity

Apr 2, 2024

LLMs and RAG are Great, But Don’t Throw Away Your Inverted Index Yet

Mar 29, 2024

Learn to Rank = Learn to be Humble

Mar 24, 2024

Hierarchy is Hard!

Mar 14, 2024

An Update on Search Classes

Mar 12, 2024

Making Sense of Null and Low Results

Mar 7, 2024

Precision, Recall, and Desirability

Feb 15, 2024
