The Curious Truths of Big Data


In the world of big data, strange truths about the world begin to emerge. Orange cars are the most reliable used cars to buy. Prepaid phone card sales can predict unrest in Africa. And women with larger breasts spend more money online.

That last one comes from a recent study released by Alibaba, the Chinese website that hopes to be the next Amazon. Data analysts looking at data points for ladies’ underwear sales noticed that women who purchased larger bra sizes spent more online overall.

But is that knowledge useful? Maybe, maybe not.

Correlation does not equal causation.

If you ever took a science class in school, you might have heard the phrase, “Correlation does not equal causation.” Here, it means that just because women who purchase larger-sized bras spend more money online, it doesn’t follow that their larger bra size caused them to spend more money.

And that can be the problem when data analysts are looking at these strange and interesting new truths that emerge from the mass quantities of data to which we now have access. If we take it as true that orange used cars are more reliable, the question then becomes why: Are owners of orange cars more careful? Does the color prevent people from getting in accidents? Or does the color orange have some other magical property that keeps a car running well? The data has no answers.
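One of those possible explanations can be sketched as a small simulation. It is purely illustrative: it assumes a hidden trait, owner carefulness, that makes a driver both more likely to choose an orange car and more likely to keep the car running. Every probability and name here (`simulate_cars`, `reliability_rate`) is invented for the example and is not drawn from any real used-car data. In this model color has no effect on reliability, yet orange cars still come out looking more reliable.

```python
import random

def simulate_cars(n, seed=0):
    """Generate n hypothetical cars as (is_orange, is_reliable) pairs.

    Assumption for illustration only: a hidden 'careful owner' trait
    drives both color choice and reliability. Color itself does nothing.
    """
    rng = random.Random(seed)
    cars = []
    for _ in range(n):
        careful = rng.random() < 0.3  # hidden trait, never seen in the data
        is_orange = rng.random() < (0.20 if careful else 0.02)
        is_reliable = rng.random() < (0.90 if careful else 0.60)
        cars.append((is_orange, is_reliable))
    return cars

def reliability_rate(cars, orange):
    """Share of reliable cars among those whose color flag equals `orange`."""
    group = [reliable for is_orange, reliable in cars if is_orange == orange]
    return sum(group) / len(group)

cars = simulate_cars(100_000)
orange_rate = reliability_rate(cars, orange=True)
other_rate = reliability_rate(cars, orange=False)
print(f"orange: {orange_rate:.2f}, other: {other_rate:.2f}")
```

An analyst who sees only color and reliability would find a real gap between the two groups, with nothing in the data to say whether the color or the owner is behind it.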

Tyler Vigen posts funny charts to his website, Spurious Correlations, that show the danger of simply matching two data sets without any deeper understanding of how the things are related. For example, if correlation is all you have to go by, then the more films Nicolas Cage appears in during a given year, the more swimming pool drownings will result, and an increase in U.S. spending on science causes an increase in suicides by hanging. Spurious indeed, we hope, or U.S. researchers and Nick Cage’s film career are in trouble.
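To see how easily a strong correlation can appear, here is a minimal sketch that computes the Pearson correlation coefficient for two series. The yearly figures are made up for illustration and are not taken from Vigen's site or any real source; the only thing the two series share is that both happen to rise over the same ten years. The variable names are hypothetical as well.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented yearly figures, not real data: two unrelated quantities
# that both drift upward over the same ten-year span.
widgets_sold = [12, 14, 15, 17, 20, 21, 24, 26, 27, 30]
umbrellas_lost = [105, 118, 122, 140, 151, 163, 170, 188, 195, 210]

r = pearson(widgets_sold, umbrellas_lost)
print(f"r = {r:.3f}")
```

A shared upward trend is enough to push the coefficient close to 1, even though neither quantity has anything to do with the other.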

The data-driven crystal ball.

Now that we have all this data, we’re just on the cusp of figuring out how to use it to our advantage. The goal is to be able to use these strange truths to try to predict everything from buying habits to the spread of the flu virus, and the results are just as varied.

Researchers have realized that Twitter updates can more quickly and more accurately predict flu outbreaks than traditional CDC tracking methods — in fact, Twitter data can predict an outbreak up to 8 days in advance with more than 90 percent accuracy.

The African company CellTel realized a similar prediction ability when it noticed an uptick in prepaid phone cards before major incidents of violence and unrest in Congo. They realized that the cards were denominated in U.S. dollars, and people bought them to have something portable and valuable to take with them and protect against local inflation.

Similarly, Alibaba hopes to use the incredible quantities of data it collects (as many as 14 million data points in a single day) to make predictions across the wide variety of businesses it may enter.

“For example, if we have a lot of data on what people purchase in terms of food, groceries, is that data going to be helpful when we do healthcare? I think so,” an executive told online magazine Quartz.

As more companies try to use their data to predict consumer behavior, don’t be surprised to see more of these curious truths emerge. Facebook, of course, has an entire team dedicated to data science, and they frequently post their findings to their Facebook page, like the fact that if your name is Yvette, you are more than 37 percent more likely than the average person to have a sister named Yvonne.

How that helps Facebook’s business plan is yet to be seen.

As always, let me know your views on the topic.

--------------

I really appreciate that you are reading my post. Here, at LinkedIn, I regularly write about management and technology issues and trends. If you would like to read my regular posts then please click 'Follow' and send me a LinkedIn invite. And, of course, feel free to also connect via Twitter, Facebook and The Advanced Performance Institute.


About : Bernard Marr is a globally recognized expert in strategy, performance management, analytics, KPIs and big data. He helps companies and executive teams manage, measure, analyze and improve performance.

His new book is: Big Data: Using Smart Big Data, Analytics and Metrics To Make Better Decisions and Improve Performance

Photo: Shutterstock.com

Fon Nguyen

Data Analyst with interest in backend & engineering

9y

I agree, Bernard. Correlation should be validated with "Why?" to explore the underlying reason. In the case of large-sized underwear buyers, there may be something in the user experience of smaller-size buyers that the company could improve to raise sales. Say, the sizing might not fit, the product display might not be attractive, the statement of benefits might not be clear...

Luis Miguel Serrano Valle

Project manager (PMP). Scrum Master (PSM)

9y

First came the challenge of storing raw data and extracting information from it; next came finding correlations in that information... The real goal, I agree, is establishing causal relationships in the information obtained through big data techniques. The results could be jokes like the ones you describe (bra size or Nicolas Cage appearances) or tips with huge potential value for the company. That's the point. Great post!

Megan Sumner

Love the adventure we call LIFE!

9y

Entertaining and insightful as always! Happy New Year, and looking forward to all your new posts!

Kaushal Devater

Product and services for next generation of business and technology solution

9y

I guess some of the basic science and communication principles, like causal and non-causal effect and impact, will be redefined... but the psychological principles will remain the same as long as humans give birth to humans and not to cyborgs or robots.


Very interesting read, thank you for sharing. I will be adding this book to my reading list.
