The Trends in Data Science

Introduction

Since Harvard Business Review named data scientist “the sexiest job of the 21st century” in 2012, demand for the role has soared. According to Glassdoor, the job grew 650% between 2012 and 2018, which shows how much demand there is in this field.

Lately, a lot of people want to become data scientists even though they do not have a degree in CS, mathematics or statistics. What drives them to switch careers to data science? Some are tempted by the money, while others are simply “math nerds” for whom data science is exactly what they are good at.

Luckily, Stack Overflow publishes the results of its annual survey of developers around the world. The results cover which tech stacks respondents use, what job positions they hold, how much they earn per year, and more. We will use the latest results, the 2020 public data, to answer some of the questions in this article. You can download the Stack Overflow public datasets here

Figure: Raw data from the 2020 Stack Overflow public survey dataset

A data scientist is expected to know a large number of stacks: Pandas, NumPy, Docker, Kubernetes, TensorFlow, PyTorch, and more. Since so many stacks are involved in data science, I will try to give some insight into stack trends so that future data scientists get a brief look at which stacks they should learn.

What Are the Most Common Stacks Among Data Scientists?

This part will help you decide which stacks to learn now. A stack having more users doesn’t necessarily mean more jobs are available, but it does give insight into which stacks are currently trending.

Figure: Number of people who work with each stack in the survey

As you can see, HTML/CSS/JS is the most common stack among data scientists. This suggests either that data scientists need web development skills to publish their results to top management, or simply that these skills are the most common because nearly everybody learns web development from day one.

Among data-related stacks, SQL is the most commonly known language. This suggests that SQL is the most important skill to have in this field, since data scientists use it heavily for data preparation, wrangling and analysis.

In terms of databases, MySQL is the most popular, with PostgreSQL in second place. This shows that free, open-source databases are still more popular than licensed ones like MSSQL or Oracle. It’s not a surprise, since people tend to choose the free option (yeah, I love free stuff too).
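Counts like the ones in this section can be produced with a few lines of Python. The sketch below is a minimal illustration, not the exact code behind the chart. It assumes that multi-select answers are stored as semicolon-separated strings, that the relevant columns are named DevType and LanguageWorkedWith, and that the data science role contains the text "Data scientist". The sample rows are made up for the example.

```python
from collections import Counter

# Made-up sample rows; the real survey CSV could be read with csv.DictReader.
# The column names and the semicolon separator are assumptions about the dataset.
SAMPLE_ROWS = [
    {"DevType": "Data scientist or machine learning specialist",
     "LanguageWorkedWith": "Python;SQL;HTML/CSS"},
    {"DevType": "Developer, back-end;Data scientist or machine learning specialist",
     "LanguageWorkedWith": "Python;SQL"},
    {"DevType": "Developer, front-end",
     "LanguageWorkedWith": "JavaScript;HTML/CSS"},
    {"DevType": "Data scientist or machine learning specialist",
     "LanguageWorkedWith": ""},
]


def split_answers(cell):
    """Split a semicolon-separated survey answer into a list of items."""
    return [item.strip() for item in (cell or "").split(";") if item.strip()]


def count_stack(rows, column, role="Data scientist"):
    """Count how many respondents with the given role mention each item."""
    counts = Counter()
    for row in rows:
        if role not in (row.get("DevType") or ""):
            continue  # skip respondents outside the target role
        # set() makes sure a respondent is counted at most once per item
        counts.update(set(split_answers(row.get(column))))
    return counts


language_counts = count_stack(SAMPLE_ROWS, "LanguageWorkedWith")
for name, n in language_counts.most_common():
    print(name, n)
```

The same function can be pointed at a database or platform column by changing the column argument, provided those columns follow the same semicolon-separated layout.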

What Are the Most Wanted Data Science Stacks?

This part helps answer what the future is likely to look like. The most wanted stacks tend to offer features or functionality that other stacks cannot, which is why they are the most wanted.

Figure: Number of people who want to learn each stack

The graph above shows that Linux and Docker are the most desired platforms. This aligns with the trend toward container and microservice architectures in recent years, which looks set to continue.

Among cloud platforms, AWS is still the most wanted by data scientists, in line with its position as the leader in the cloud market. It has a big lead, with Google Cloud Platform in second place and Microsoft Azure in third.

Surprisingly, JavaScript, Node.js and React.js also appear, which shows that web development is still in high demand among data scientists. This aligns with the earlier observation that many data scientists have web development skills so they can publish their results online.
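One way to read "wanted" is to count the technologies a respondent would like to work with next year but does not use today. The sketch below illustrates that reading only. It is an assumption about the method, not necessarily how the chart in this section was built, and the column names PlatformWorkedWith and PlatformDesireNextYear, as well as the sample rows, are likewise assumptions.

```python
from collections import Counter

# Made-up sample rows; column names are assumed, answers are semicolon-separated.
SAMPLE_ROWS = [
    {"PlatformWorkedWith": "Linux;AWS",
     "PlatformDesireNextYear": "Linux;AWS;Docker"},
    {"PlatformWorkedWith": "Windows",
     "PlatformDesireNextYear": "Linux;Docker"},
    {"PlatformWorkedWith": "Docker",
     "PlatformDesireNextYear": "Docker;Kubernetes"},
]


def to_set(cell):
    """Turn a semicolon-separated answer into a set of item names."""
    return {item.strip() for item in (cell or "").split(";") if item.strip()}


def count_newly_wanted(rows, worked_col, desired_col):
    """Count items that respondents desire but do not currently work with."""
    counts = Counter()
    for row in rows:
        newly_wanted = to_set(row.get(desired_col)) - to_set(row.get(worked_col))
        counts.update(newly_wanted)
    return counts


wanted_counts = count_newly_wanted(
    SAMPLE_ROWS, "PlatformWorkedWith", "PlatformDesireNextYear"
)
for name, n in wanted_counts.most_common():
    print(name, n)
```

Dropping the set difference and counting the desired column directly gives the simpler "number of people who want to work with it" figure instead.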

Which Stacks Have the Top Earnings?

In this part, I only use respondents from the United States to remove the bias caused by differing GDP per capita between countries. The graph uses boxplots to show how wide the spread of earnings is for each stack.
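The filtering and grouping described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the chart: the column names Country, ConvertedComp and MiscTechWorkedWith, the "NA" marker for missing pay, and the sample rows are all assumptions. It reports the median and the range per stack, which are the main quantities a boxplot makes visible.

```python
import math
from collections import defaultdict
from statistics import median

# Made-up sample rows; column names and value formats are assumptions.
SAMPLE_ROWS = [
    {"Country": "United States", "ConvertedComp": "120000",
     "MiscTechWorkedWith": "Pandas;Torch/PyTorch"},
    {"Country": "United States", "ConvertedComp": "100000",
     "MiscTechWorkedWith": "Pandas"},
    {"Country": "United States", "ConvertedComp": "140000",
     "MiscTechWorkedWith": "Torch/PyTorch;TensorFlow"},
    {"Country": "United States", "ConvertedComp": "80000",
     "MiscTechWorkedWith": "Pandas;TensorFlow"},
    {"Country": "Germany", "ConvertedComp": "70000",
     "MiscTechWorkedWith": "Pandas"},
    {"Country": "United States", "ConvertedComp": "NA",
     "MiscTechWorkedWith": "Pandas"},
]


def earnings_by_stack(rows, column, country="United States"):
    """Group annual pay by stack for respondents from a single country."""
    groups = defaultdict(list)
    for row in rows:
        if row.get("Country") != country:
            continue  # keep one country so pay levels are comparable
        try:
            pay = float(row.get("ConvertedComp") or "")
        except ValueError:
            continue  # missing or non-numeric pay
        if math.isnan(pay):
            continue
        for item in (row.get(column) or "").split(";"):
            if item.strip():
                groups[item.strip()].append(pay)
    return groups


def summarize(groups):
    """Return count, median, minimum and maximum pay for each stack."""
    return {
        name: {"n": len(pay), "median": median(pay),
               "min": min(pay), "max": max(pay)}
        for name, pay in groups.items()
    }


summary = summarize(earnings_by_stack(SAMPLE_ROWS, "MiscTechWorkedWith"))
# Print stacks from highest to lowest median pay
for name, stats in sorted(summary.items(), key=lambda kv: -kv[1]["median"]):
    print(name, stats)
```

With only a handful of respondents per stack the median is unstable, so on real data it is worth checking the count next to each median before comparing stacks.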

Figure: Annual earnings by stack

Surprisingly, PyTorch has the highest median earnings of all stacks, with a clear gap over the second-placed stack. This suggests that deep learning skills are in high demand, or that PyTorch is regarded as a better deep learning framework than alternatives such as TensorFlow.

Pandas, a must-know stack for data scientists, is placed fourth, but it also shows a wide spread of earnings. This is consistent with it being used by people at every level.

Among cloud platforms, even though Google Cloud Platform is not the most popular, its median income beats that of AWS, which falls to second place. This may reflect Google Cloud Platform's strong tools for data analytics, ML and AI.

Conclusion

The key takeaways from this article are:

  1. Even if you are a data scientist, having web development skills is essential, since we not only process the data but also present it to the top management of the organization.
