Data Science: What's in a Name?

Explore. Hypothesize. Test. Repeat.

That's what scientists do. We explore the world around us, come up with hypotheses that generalize our observations, and then test those hypotheses through controlled experiments. The positive and negative outcomes of those experiments advance our understanding of reality.

The word "scientist" carries a certain cachet, and deservedly so. Scientists have made key discoveries that make our lives better, creating the foundation for advances in technology. Moreover, the scientific method is a harsh taskmaster: it requires that our leaps of faith be falsifiable, and that we determine the truth of our claims through repeatable experiments.

Hence, it's no surprise that many professions -- and even religions -- have wrapped themselves in the flag of science. Science -- especially "rocket science" -- has come to connote anything that requires a high degree of intelligence.

Which leads to the trend of "data scientists" in Silicon Valley and beyond. I use scare quotes because the term acts as a Rorschach test: how a person interprets the term often reveals more about the person than the profession.

Let's try the definition from Wikipedia, which aspires to present a neutral point of view:

Data scientists solve complex data problems through employing deep expertise in some scientific discipline. It is generally expected that data scientists are able to work with various elements of mathematics, statistics and computer science, although expertise in these subjects are not required. However, a data scientist is most likely to be an expert in only one or two of these disciplines and proficient in another two or three.

Drew Conway presents a similar definition using a Venn diagram:

But Drew also points out the challenge with this definition: "the split between substance and methodology is ambiguous, and as such it is unclear how to distinguish among hackers, statisticians, subject matter experts, their overlaps and where data science fits."

It's ironic that a profession devoted to rigorous analysis struggles to converge on a precise definition. Not that there's a lack of debate. You can get a taste of that debate from a popular Quora post, "How would you define data science and data scientists and distinguish it from older related terms?"

I'm not going to attempt to resolve the debate here. There's clearly a need for people who blend math, computer science, software engineering, and product sense -- I should know, since I hired a bunch of them to be data scientists at LinkedIn, and they've made key contributions to our products. Are they analysts? Engineers? Product visionaries? The answer is yes to all of these, which is what makes these folks so hard to hire!

I am, however, skeptical of the use of any term to create an elite club of experts. We are what we do. And if we're going to call ourselves scientists, then the most important thing we can do is follow the scientific method in our quest to understand the world around us and advance the state of technology.

Explore. The reason that data science emphasizes technical skills is that those skills are essential for performing exploratory data analysis.

Hypothesize. The point of exploration is to make surprising observations and generalize those observations to generate hypotheses.

Test. Testing is what makes this process a science. Testing is how we validate hypotheses by subjecting them to cold, harsh reality.

Repeat. Science is an endless process. And, like software engineering, data science at its best is agile and iterative.

Explore. Hypothesize. Test. Repeat. If you do all of these, then you've earned the right to call it whatever you want.

If you'd like to hear more about science as a strategy, watch the video below.

Andrew Pandre, Ph.D.

Data Visualization Consultant and Director

10y

Why are so many people seriously discussing the definition of "Data Science"?

Geoffrey L Flagstad

Managing Director at GLF Media LLC

11y

Not much humility in this group. What is is. Science is dependent on time and volatility.

Istvan Hajnal

Insights Director Ipsos The Netherlands/Global

11y

Great article. Thanks. I recently did a study where I Explored, Hypothesized and Tested. But the Repeat part is not always that easy. In my case it was an election study that investigated an (alleged) flaw in the voting process. I for sure hope this will never be Repeated. But on the other hand, I recommended that other researchers would analyze the same data. That's a "poor man's Repeat", but sometimes that's all you can do. Regards, Istvan

Vlad Didenko

Team Lead at WorldQuant

11y

Contemplation and what's next follow-up for Daniel's Data Science definition: http://blog.didenko.com/2013/02/data-science-as-science.html

Laurence Kennedy

IT Consultant/Software Developer

11y

Good points, Daniel. Isaac Newton brought modern science into being with his Philosophiæ Naturalis Principia Mathematica (Mathematical Principles of Natural Philosophy), so data scientists should know whose shoulders they stand on. I don't doubt that there are a couple of extra circles involved, as what is not contained in the data set is often just as important as what is.
