A Complete Data Science Roadmap in 2021
Build your own data science curriculum with online resources.
Around three years ago, I did an undergraduate degree in computer science. I chose to major in data science since it was so hyped up at that time.
I realized one year back that my degree did not equip me with the skills necessary to become a data scientist.
And it cost my parents approximately $25K.
This was before I knew about online learning platforms like edX and Coursera.
I taught myself all the skills required to become a data scientist, and I learnt all of them online, outside my degree.
Now, I'm working as a data scientist for a data and AI company.
In an article I wrote last year, I provided a list of courses you could take to break into the data science industry.
I will refresh that list here, and provide you with a few more learning resources that will help you break into data science in 2021.
These courses will teach you so much more than my entire degree program taught me, and for only a fraction of the cost.
Step 1: Learn Python
If you want to learn data science from scratch, the first thing you need to do is learn how to code.
Pick a programming language (either Python or R), and start learning.
I suggest starting out with Python because it is more widely used than R. It is also more general and highly flexible, and you will be able to make the transition to different domains (data analytics, web development) if you have Python knowledge.
To learn Python, take any one of these courses:
DataCamp: Introduction to Python
This DataCamp course will take you through exercises and teach you how to code in Python.
What will you learn in this course?
- You will learn the basics of Python: variables, data types, functions, methods, lists, and arrays. Knowing how to manipulate arrays is a very important skill for a data scientist, which is why the course dedicates an entire module to it.
- After you have a strong foundation in basic Python, this course will teach you to use a library called NumPy. NumPy is a popular Python package used by data scientists to manipulate arrays.
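To give a feel for what "manipulating arrays" means in practice, here is a minimal sketch using NumPy. The height and weight numbers and the variable names are made up for illustration and are not taken from the course.

```python
import numpy as np

# Hypothetical example data: heights in metres and weights in kilograms.
heights = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
weights = np.array([65.4, 59.2, 63.6, 88.4, 68.7])

# Element-wise arithmetic: the formula is applied to every pair of values
# without writing an explicit loop.
bmi = weights / heights ** 2

# Boolean masking: keep only the entries where the condition is True.
tall = heights[heights > 1.75]

# Reshaping and aggregating along an axis.
grid = np.arange(6).reshape(2, 3)   # two rows, three columns
column_sums = grid.sum(axis=0)      # one sum per column

print(bmi.round(1))
print(tall)
print(column_sums)
```

Element-wise operations and boolean masks like these are the building blocks you will keep using later on in libraries that sit on top of NumPy.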
This course will provide you with a basic understanding of programming in Python.
There are a few topics that this course doesn't cover, such as conditional statements and loops.
These are very important concepts, and you shouldn't skip them. I suggest using external resources like freeCodeCamp and YouTube videos to gain an understanding of these concepts.
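If you have never seen conditional statements and loops before, the short sketch below shows roughly what they look like in Python. The function names and the pass mark of 50 are hypothetical, chosen only for this example.

```python
def classify_scores(scores, pass_mark=50):
    """Label each score as 'pass' or 'fail' and count the passes."""
    labels = []
    passes = 0
    for score in scores:            # for loop: visit every item in the list
        if score >= pass_mark:      # conditional: choose a branch per item
            labels.append("pass")
            passes += 1
        else:
            labels.append("fail")
    return labels, passes


def first_power_of_two_above(limit):
    """Return the smallest power of two that is strictly greater than limit."""
    value = 1
    while value <= limit:           # while loop: repeat until the test fails
        value *= 2
    return value


labels, passes = classify_scores([72, 45, 50])
print(labels, passes)
print(first_power_of_two_above(100))
```

Most beginner coding challenges boil down to combining these two ideas, so it is worth practising them until they feel natural.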
Udemy: 2021 Complete Python Bootcamp From Zero to Hero in Python
This is an alternative to the DataCamp course. It is taught by Jose Portilla, who is (in my opinion) the best instructor alive.
I haven't taken this course because I had some basic programming knowledge before entering the field of data science.
However, I have taken his data science and machine learning course. This was the first data science online course I took, and I immediately fell in love with the subject.
Jose's teaching style is incredible. His programming exercises are at just the right level of difficulty, and will push you to think and come up with a solution.
If you are a complete beginner with no programming experience whatsoever, I would 100% recommend taking this course.
The only downside compared to the DataCamp course is that there is no built-in code editor. You will need to set up your own programming environment (but Jose will guide you through this, and it isn't hard at all).
After taking either one of these courses, you should have a good grasp of programming basics.
However, your journey in learning how to code doesn't end here.
You need to learn how to solve problems with the new syntax you learnt.
I once asked a data scientist how he learnt to code, and he suggested a site called HackerRank.
He told me that every time he wanted to learn a new language, he would solve as many problems as he could on the site. He suggested doing around 10 problems a day.
This might be a bit too much when you're just starting out.
When I was just starting to learn programming, I remember taking an entire day to solve just one coding challenge on the site.
However, as my Python and problem-solving skills progressed over time, I started becoming better at it.
Spend around 4-5 hours a day just solving HackerRank problems, and your Python programming skills will improve in no time.
Step 2: Learn Data Science
After you have a better grasp of programming in Python and problem solving, you can start learning the basics of data science and machine learning.
To do this, you can take one (or both) of the following courses:
Udemy: Python for Data Science and Machine Learning Bootcamp
This is the first data science course I've ever taken. I spent around 5 hours a day on this course and completed it within a month.
It is an introductory-level data science course taught by Jose Portilla. It will teach you how to use libraries like NumPy and pandas for data analysis, along with visualization libraries like Matplotlib and Seaborn.
Jose also introduces you to the basics of machine learning. He explains how the different machine learning models work, and then walks you through implementation of these models in Python.
I learnt more in one month from this course than I did in my entire data science degree.
Remember, you need to have some programming experience before taking this course, so make sure to take a Python course before doing this one.
DataCamp: Machine Learning Fundamentals with Python
This course will teach you the fundamentals of machine learning with Python.
In this course, you will learn the theory behind both supervised and unsupervised machine learning algorithms.
You will also learn practical implementation of these models in Python.
I haven't taken this DataCamp course before. However, the course content seems to be more detailed and comprehensive than the Udemy course I took.
A lot of topics (such as model regularization and metrics) were not covered in the Udemy course by Jose Portilla.
I suggest taking Jose's course on Udemy first to learn the basics and understand how to build and train models in Python.
Then, you can take the DataCamp course to fill in the gaps in your learning.
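As a small taste of the evaluation metrics mentioned above, here is a minimal sketch that computes accuracy, precision, and recall for a binary classifier by hand. The labels and predictions are made up for illustration, and the function name is hypothetical; in practice you would normally use a library implementation.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

Precision asks "of everything I flagged as positive, how much was right?", while recall asks "of everything that was actually positive, how much did I find?".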
After taking these two courses, you will have a solid understanding of how machine learning algorithms work and how to implement them in Python.
Now, you are ready to start building your own machine learning projects.
Having a theoretical understanding of machine learning isn't sufficient to break into the industry.
In fact, I got my first data science job because of the data science projects I built and showcased in my portfolio. Once you finish learning data science with the help of these courses, read this article on the types of data science projects you should showcase in your portfolio.
Step 3: Learn statistics
Most people suggest learning statistics before diving into machine learning and data science.
I suggest the opposite.
I suggest learning Python and building machine learning models first.
Once you have a high-level understanding of these models and know how to implement them in Python, you can learn how they work.
You can go in and learn the theory and math behind these models.
This is called the top-down learning approach, and it's how I taught myself data science.
Here are some statistics courses you should take to gain a better understanding of data science and machine learning:
Coursera: Probability and Statistics: To p or not to p?
This course is for you if you have no previous statistical knowledge. It is one of the best introductory statistics courses I've taken.
It will take you through some of the most important concepts in statistics, such as the different probability distributions, standardization, descriptive statistics, random sampling, hypothesis testing, and the central limit theorem.
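To make one of these concepts concrete, here is a minimal sketch of the central limit theorem using only Python's standard library. It is my own illustration, not an exercise from the course: the choice of an exponential distribution, the sample sizes, and the function name are all assumptions made for this example.

```python
import math
import random
import statistics


def sample_means(sample_size, n_samples, seed=0):
    """Draw n_samples samples from a skewed (exponential) distribution
    and return the mean of each sample."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means


# The exponential distribution with rate 1 has mean 1 and standard deviation 1.
means = sample_means(sample_size=50, n_samples=2000)
center = statistics.mean(means)
spread = statistics.stdev(means)

# The theorem predicts the sample means cluster around the true mean (1.0),
# with a spread close to sigma / sqrt(n) = 1 / sqrt(50).
predicted_spread = 1 / math.sqrt(50)
print(round(center, 3), round(spread, 3), round(predicted_spread, 3))
```

Even though individual exponential values are heavily skewed, the averages of many samples pile up in a roughly bell-shaped curve around the true mean, which is why so many statistical tests can lean on the normal distribution.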
The best part about this course: It is tailored for students who come from a non-statistical background.
The instructor of this course, James Abdey, explains the material with interesting examples and case studies.
He explains all the concepts in simple English and doesn't use any complex mathematical notation.
Once you complete this course, you will have a basic understanding of probability and statistics, and of the methods used to make decisions under uncertainty.
edX: Statistical Learning
This course will provide you with an in-depth understanding of machine learning algorithms.
It is the only resource in this list that is taught in R. You don't need to know R programming before taking this course. The instructors will teach you how to code in R before walking you through the practical implementation.
This course covers supervised learning techniques like linear regression, logistic regression, support vector machines, and decision trees. It also covers unsupervised learning algorithms like K-means clustering and principal component analysis.
Unlike all the resources listed above, this course does assume previous calculus and linear algebra background. To take this course, you must be familiar with summation notation and matrix manipulation.
I'm suggesting this course because it goes deep into the intuition behind machine learning models.
It will teach you to pick the best machine learning algorithm based on the distribution of variables.
You will learn the different sampling techniques that can be employed to train your model when you don't have sufficient data available.
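Two common techniques of this kind are k-fold cross-validation and the bootstrap. The sketch below is my own minimal, standard-library version of both, written in Python rather than the R used in the course; the function names and the tiny dataset are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split the indices 0..n_samples-1 into k (train, test) pairs.
    Every index appears in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k roughly equal folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def bootstrap_sample(data, seed=0):
    """Draw len(data) items from data with replacement."""
    rng = random.Random(seed)
    return [rng.choice(data) for _ in data]


splits = k_fold_indices(n_samples=10, k=5)
for train, test in splits:
    print(sorted(train), sorted(test))

print(bootstrap_sample([2.1, 3.4, 5.6, 7.8]))
```

Cross-validation reuses the same small dataset for both training and testing by rotating which fold is held out, while the bootstrap creates many "new" datasets by resampling the one you have.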
You will also get answers to questions like "why can't linear regression be used for classification problems?"
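One part of the answer to that question can be seen in a few lines of code. The sketch below is my own illustration with made-up data, not material from the course: it fits an ordinary least-squares line to 0/1 labels and shows that the "probabilities" it predicts can fall outside the range 0 to 1.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: hours studied and whether the student passed (1) or not (0).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

slope, intercept = fit_line(hours, passed)

# Predictions for inputs outside the observed range.
pred_low = slope * 0 + intercept     # comes out below 0
pred_high = slope * 12 + intercept   # comes out above 1
print(pred_low, pred_high)
```

A straight line keeps going in both directions, so it cannot be read as a probability. Logistic regression addresses this by passing the linear output through a function that squeezes it into the range 0 to 1.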
If you want to get a comprehensive understanding of machine learning algorithms and how they work, I also suggest reading An Introduction to Statistical Learning, the book this course is based on.
I followed the steps above to teach myself data science.
This roadmap helped me break into the data industry and get a job as a data scientist.
Of course, your data science learning journey doesn't end here.
There is so much more to machine learning and data science, and these topics barely scratch the surface of all there is to learn.
This article contains affiliate links. This means that if you click on one and choose to buy a course I linked above, a small portion of the fee you pay will go to me.
As a creator, this helps me grow and continue to create content like this.
However, I only recommend courses I think are good. I have taken almost all of the courses mentioned above, and they have been vital in helping me with my transition to data science.
Thanks for your support!
That's all for this article, thanks for reading!