WHY DATA SCIENCE IS HERE TO STAY

One of my favorite big-picture quotes comes from Microsoft co-founder Bill Gates:

“We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten.”

I recently read an article claiming that Data Science is a fad that will disappear within a relatively short time. In a different article, I read that Big Data technologies have not produced any competitive differentiation or operational improvement. I disagree with both notions: as technology advances, individuals skilled in data science will, with few exceptions, be enormously valuable. What we must consider, however, is the timing for individuals beginning these careers.

Take a look at the latest job listings on LinkedIn for Data Science positions at Microsoft; they serve as evidence of the demand. Attention to this specific role is increasing, and more of these jobs will be created as companies like Microsoft institutionalize it. Learning what does and does not work for the role will translate into better opportunities for people entering the field. Today, a Data Scientist’s job description calls for many different skills, but I wouldn’t be surprised to see a sharper role definition within a year (with better career transition paths thereafter).

A good friend of mine once said, “In a room, for every wall of technology there are at least three walls of APIs and frameworks.” This model has held true for as long as I have been in the technology field. I keep two pictures of my world, then and now, side by side, specific to four areas: architecture, algorithms, design patterns, and languages.

I have seen this play out many times in my career: if you believe that a task or an activity should be automated, trust me, it will become automated. The walls in this model keep growing to accommodate new automations until there is a cry for standardization. When that cry comes, you know we have arrived at the day of mass adoption and customization. The good news is that as technology becomes more pervasive, the time it takes to get these automations is shrinking by the day. Even so, we are still a long way from standardization, especially in Big Data and the other new frontiers of Data Science.

It almost feels like déjà vu (from the 2000s) when you see a new solution (today, a SaaS service) launched daily to address a gap in the current Big Data and Data Science pipeline – from data visualization and tools for managing reproducible research, to algorithm hubs, better container management tools, and new languages that abstract away the underlying complexity. Technology complexity always ends up being resolved by technology, as happened with Continuous Integration, RESTful APIs, newer languages (Scala, Clojure), and so on.

During any early adoption cycle there is some anxiety and fear of failure, so initiatives started by an organization come under the umbrella of a “proof of concept.” This POC investment is typically a hedge: if the technology takes off, the organization can move toward a larger-scale implementation; if not, it can shut the effort down without incurring incremental investment.

Big Data-related technologies, especially Hadoop, have been around for seven years now, and they are not going away. How do I know? The conversation has shifted from the Volume (scale of data) and Variety (different forms of data) dimensions of Big Data to the challenges of Velocity (analysis of streaming data) and Veracity (uncertainty of data). I may be an optimist here, but enterprises understand and have accepted the need to fix the Volume and Variety challenges in the near term, and are looking ahead to establish Centers of Excellence to tackle the newer challenges of Velocity and Veracity.
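To make the Velocity point concrete, here is a minimal sketch in Python of a rolling aggregate computed as events arrive, rather than in a nightly batch job. It is an illustration only; the metric, window size, and field names (timestamp, value) are assumptions for the example, not any particular product’s API.

from collections import deque
from time import time

class SlidingWindowMean:
    """Rolling mean over the last `window_seconds` of events.

    A toy illustration of Velocity: the metric is updated as each
    event arrives instead of waiting for a batch job.
    """

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.events = deque()   # (timestamp, value) pairs
        self.total = 0.0

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        self.total += value
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        while self.events and now - self.events[0][0] > self.window_seconds:
            _, old_value = self.events.popleft()
            self.total -= old_value

    def mean(self):
        return self.total / len(self.events) if self.events else 0.0

# Usage: feed events as they arrive from a queue or socket.
stream = SlidingWindowMean(window_seconds=60)
stream.add(time(), 42.0)
print(stream.mean())

Real streaming engines add distribution, fault tolerance, and exactly-once semantics on top of this idea, but the shift in mindset – update on arrival instead of recompute in batch – is the Velocity dimension in a nutshell.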

Josh Wills of Cloudera defines a data scientist as a “person who is better at statistics than any software engineer and better at software engineering than any statistician.”

When I see research in labs around probabilistic programming languages like BUGS and Church, and the new C# and F# bindings for Infer.NET, I know we are already looking ahead to a possible next generation of software engineers who are also familiar with Bayesian models. They will routinely run large-scale analytics in the cloud by spinning up containers on demand.
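To give a flavor of what “familiar with Bayesian models” means in practice, here is a minimal sketch in plain Python (not Infer.NET or any of the languages above) of a conjugate Beta-Bernoulli update – the simplest kind of inference that probabilistic programming languages automate for far richer models. The conversion-rate scenario and counts are illustrative assumptions.

# Minimal Beta-Bernoulli posterior update in plain Python.
# Probabilistic programming languages (BUGS, Church, Infer.NET)
# automate this kind of inference for much richer models.

def beta_bernoulli_update(prior_alpha, prior_beta, successes, failures):
    """Return the posterior Beta(alpha, beta) after observing the data."""
    return prior_alpha + successes, prior_beta + failures

def beta_mean(alpha, beta):
    """Posterior mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Start from a uniform prior Beta(1, 1) on an unknown conversion rate,
# then observe 27 conversions out of 200 visits (illustrative numbers).
alpha, beta = beta_bernoulli_update(1.0, 1.0, successes=27, failures=173)
print(f"Posterior: Beta({alpha:.0f}, {beta:.0f}), mean = {beta_mean(alpha, beta):.3f}")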

I’ve seen this scenario play out many times with new technologies, organizations, and initiatives. Things start slowly, as they always do, then pick up momentum until you look back a few years later and are blown away by how much has changed. We consistently overestimate the next two years and underestimate the next ten, and that is highly unlikely to change.
