WHY DATA SCIENCE IS HERE TO STAY

One of my favorite big-picture quotes comes from Microsoft co-founder Bill Gates:

“We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next ten.”

I recently read an article claiming that Data Science is a fad that will disappear within a relatively short time. In a different article, I read that Big Data technologies have not produced any competitive differentiation or operational improvement. I disagree with both notions: as technology advances, individuals skilled in data science will, with few exceptions, be enormously valuable. What we must consider, however, is the timing for individuals beginning these careers.

Take a look at the latest job listings on LinkedIn for Data Science positions at Microsoft; they serve as evidence of the demand. Attention to this specific role is increasing, and more of these jobs will be created as companies like Microsoft institutionalize it. Learning what does and does not work for the role will translate into better opportunities for people entering the field. Today, a Data Scientist’s job description calls for many different skills, but I wouldn’t be surprised to see a sharper role definition within a year (with better career transition paths thereafter).

A good friend of mine once said, “In a room, for every wall of technology there are at least three walls of APIs and frameworks.” This model has held true for as long as I have been in the technology field. I keep two pictures of my world, then and now, side by side, specific to four areas: architecture, algorithms, design patterns, and languages.

I have seen this play out many times in my career: if you believe that a task or an activity should be automated, trust me, it will become automated. The walls in this model keep growing to accommodate new automations until there is a cry for standardization. When that cry comes, you know we have arrived at the day of mass adoption and customization. The good news is that as technology becomes more pervasive, the time it takes to get these automations is shrinking by the day. Even so, we are still a long way from standardization, especially in Big Data and the other new frontiers of Data Science.

It almost feels like déjà vu (from the 2000s) when you see a new solution (today, a SaaS service) launched daily to address a gap in the current Big Data and Data Science pipeline – from data visualization and tools for managing reproducible research, to algorithm hubs, better container management tools, and new languages that abstract away the underlying complexity. Technology complexity always ends up being resolved by technology, as happened with Continuous Integration, RESTful APIs, newer languages (Scala, Clojure), and so on.

During any early adoption cycle there is some anxiety and fear of failure, so initiatives started by an organization come under the umbrella of a “proof of concept.” This POC investment is typically a hedge: if the technology takes off, the organization can move toward a larger-scale implementation; if not, it can shut the effort down without incurring incremental investment.

Big Data-related technologies, especially Hadoop, have been around for seven years now, and they are not going away. How do I know? The conversation has shifted from the Volume (scale of data) and Variety (different forms of data) dimensions of Big Data to the challenges of Velocity (analysis of streaming data) and Veracity (uncertainty of data). I may be an optimist here, but enterprises understand and have accepted the need to fix the Volume and Variety challenges in the near term, and are looking ahead to establish Centers of Excellence to tackle the newer challenges of Velocity and Veracity.
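To make the Velocity point concrete, here is a minimal sketch in Python of a rolling aggregate computed as events arrive, rather than in a nightly batch job. It is an illustration only; the metric, window size, and field names (timestamp, value) are assumptions for the example, not any particular product’s API.

from collections import deque
from time import time

class SlidingWindowMean:
    """Rolling mean over the last `window_seconds` of events.

    A toy illustration of Velocity: the metric is updated as each
    event arrives instead of waiting for a batch job.
    """

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.events = deque()   # (timestamp, value) pairs
        self.total = 0.0

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        self.total += value
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events that have fallen out of the window.
        while self.events and now - self.events[0][0] > self.window_seconds:
            _, old_value = self.events.popleft()
            self.total -= old_value

    def mean(self):
        return self.total / len(self.events) if self.events else 0.0

# Usage: feed events as they arrive from a queue or socket.
stream = SlidingWindowMean(window_seconds=60)
stream.add(time(), 42.0)
print(stream.mean())

Real streaming engines add distribution, fault tolerance, and exactly-once semantics on top of this idea, but the shift in mindset – update on arrival instead of recompute in batch – is the Velocity dimension in a nutshell.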

Josh Wills of Cloudera defines a data scientist as a “person who is better at statistics than any software engineer and better at software engineering than any statistician.”

When I see research in labs around probabilistic programming languages like BUGS and Church, and the new C# and F# bindings for Infer.NET, I know we are already looking ahead to a possible next generation of software engineers who are also familiar with Bayesian models. They will routinely run large-scale analytics in the cloud by spinning up containers on demand.
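To give a flavor of what “familiar with Bayesian models” means in practice, here is a minimal sketch in plain Python (not Infer.NET or any of the languages above) of a conjugate Beta-Bernoulli update – the simplest kind of inference that probabilistic programming languages automate for far richer models. The conversion-rate scenario and counts are illustrative assumptions.

# Minimal Beta-Bernoulli posterior update in plain Python.
# Probabilistic programming languages (BUGS, Church, Infer.NET)
# automate this kind of inference for much richer models.

def beta_bernoulli_update(prior_alpha, prior_beta, successes, failures):
    """Return the posterior Beta(alpha, beta) after observing the data."""
    return prior_alpha + successes, prior_beta + failures

def beta_mean(alpha, beta):
    """Posterior mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Start from a uniform prior Beta(1, 1) on an unknown conversion rate,
# then observe 27 conversions out of 200 visits (illustrative numbers).
alpha, beta = beta_bernoulli_update(1.0, 1.0, successes=27, failures=173)
print(f"Posterior: Beta({alpha:.0f}, {beta:.0f}), mean = {beta_mean(alpha, beta):.3f}")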

I’ve seen this scenario play out many times with new technologies, organizations, and initiatives. Things start slowly, as they always do, then pick up momentum until you look back a few years later and are blown away by how much has changed. We consistently overestimate the next two years and underestimate the next ten, and that is highly unlikely to change.
