I came across the term "MLOps engineer" a year ago, while I was teaching myself data science. I read many blog posts by data scientists who strongly suggested learning MLOps skills. They argued that it wasn't sufficient to just build and train models if those models couldn't be used in production.
After working in the data industry for almost two years, I've come to realize that this was completely true. Data science models are of no use if they can't be deployed and continuously learn from new data.
As a data scientist, it is important to at least have a basic understanding of MLOps. In most companies, the data science team isn't just in charge of building and training machine learning models, but also has to put them into production.
The job scope of a data scientist usually entails the entire data science lifecycle - from collecting data to deploying ML models onto the company's server.
Unfortunately, most data science bootcamps and online courses only teach students model training. There is too much focus on creating and evaluating machine learning models, and almost none of these courses cover what happens after the model is built.
In this article, I will try to clear up some of the misconceptions surrounding MLOps and break down the role of an MLOps engineer.
What is MLOps?
MLOps is a set of practices used to deploy and maintain machine learning models in production.
In layman's terms, MLOps covers everything that comes after model building. Once a model is trained and evaluated, it is ready for end use: it can make predictions on new user data entering the system.
Let's take the example of a simple recommendation system on a music streaming site.
Once the model is built based on past user data and is able to make accurate predictions, it is deployed on the site. The model will then take in new data from the site and come up with song recommendations for each user.
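To make this concrete, here is a minimal sketch of what "deployed and serving new data" looks like in code. The model below is a deliberately simple stand-in (a popularity-based recommender); all function and song names are hypothetical, not from any particular streaming site.

```python
# A toy recommender: "trained" on past user data, then serving
# recommendations for new data entering the system.
from collections import Counter

def train_popularity_model(play_history):
    """'Train' on past user data: rank songs by total play count."""
    counts = Counter(song for plays in play_history.values() for song in plays)
    return [song for song, _ in counts.most_common()]

def recommend(model, user_plays, k=3):
    """Serve a prediction: top-k popular songs the user hasn't heard."""
    return [song for song in model if song not in user_plays][:k]

# Past user data the model is built on.
history = {
    "alice": ["song_a", "song_b"],
    "bob":   ["song_a", "song_c"],
    "carol": ["song_a", "song_b", "song_d"],
}
model = train_popularity_model(history)

# New data from the site after deployment.
print(recommend(model, user_plays={"song_a"}))
```

A real system would swap the popularity ranking for a trained model artifact (e.g. collaborative filtering), but the serving shape is the same: load the model once, then call a predict function on each new request.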
Why do we need MLOps?
In the past, a lot of importance was given to the model-building aspect of the machine learning workflow. Model training and evaluation were considered the most time-consuming parts of the data science lifecycle, and the job usually ended there.
There was little importance given to everything that happened after the trained ML model was deployed.
However, as time passed, it became increasingly obvious that deployed machine learning models were starting to underperform. Models that achieved high accuracy during validation were still not able to make correct predictions on new user data entering the system.
This can be attributed to a number of factors:
a) Firstly, the model that was trained on old data might not be adapting to new data flowing into the system.
Let's take the recommender system example again. As users start expanding their song selection and their taste in music changes, the model needs to be updated.
Depending on the nature of the data, these updates might need to happen on a monthly, weekly, or even hourly basis.
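In its simplest form, a scheduled update is just a freshness check that triggers retraining when the model is older than some interval. The sketch below assumes a weekly cadence; the interval and function names are illustrative, not a standard API.

```python
# A minimal staleness check for scheduled model retraining.
from datetime import datetime, timedelta

RETRAIN_INTERVAL = timedelta(days=7)  # weekly, chosen to suit the data

def needs_retraining(last_trained_at, now=None):
    """Return True if the model is older than the retraining interval."""
    now = now or datetime.utcnow()
    return now - last_trained_at >= RETRAIN_INTERVAL

last_trained = datetime(2024, 1, 1)
if needs_retraining(last_trained, now=datetime(2024, 1, 10)):
    print("stale model: trigger retraining on fresh data")
```

In practice this check usually lives in a scheduler or orchestration tool (a cron job, or a pipeline platform) rather than in application code, but the logic is the same.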
b) Next, the model might be unable to generalize to data it has never seen before. If there are discrepancies between training data and real-world data, the ML model won't be able to perform well outside the domain it was trained on. If not fixed, this will lead to inaccurate predictions and poor performance.
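One rough way to catch such discrepancies is to compare the distribution of a feature in live data against the training data. The sketch below uses a simple z-score on the feature mean; the threshold is an arbitrary assumption, and production systems typically use proper statistical drift tests instead.

```python
# A rough drift check: flag live data whose mean has moved far from
# the training mean, measured in training standard deviations.
from statistics import mean, stdev

def drift_detected(train_values, live_values, z_threshold=3.0):
    """Return True if the live mean is implausibly far from training."""
    mu, sigma = mean(train_values), stdev(train_values)
    if sigma == 0:
        return mean(live_values) != mu
    return abs(mean(live_values) - mu) / sigma > z_threshold

train = [5.0, 5.2, 4.8, 5.1, 4.9]
print(drift_detected(train, [5.0, 5.1, 4.95]))  # live data looks like training
print(drift_detected(train, [9.0, 9.2, 8.8]))   # live data has shifted
```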
c) MLOps engineers also need to ensure that the training data is processed using the same techniques as the real-world data entering the system. The same data cleaning/pre-processing techniques need to be applied to new data, to ensure that there are no discrepancies in model prediction.
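The usual way to guarantee this is to put all cleaning steps into one shared function (or pipeline object) that is called both at training time and at serving time. The field names and cleaning steps below are made up for illustration.

```python
# Shared preprocessing: one function applied to both historical
# training rows and live records, so the two can never diverge.

def preprocess(record):
    """Apply identical cleaning at training and serving time."""
    return {
        "age": max(0, int(record.get("age", 0))),         # coerce and clip bad values
        "genre": record.get("genre", "unknown").lower(),  # normalize text
    }

# Used identically on a historical training row...
train_row = preprocess({"age": "27", "genre": "Jazz"})
# ...and on a live record entering the system.
live_row = preprocess({"genre": "ROCK"})
```

Libraries like scikit-learn formalize this idea with `Pipeline` objects that bundle preprocessing and the model into a single artifact, so whatever transformations were fit on training data are replayed exactly on new data.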
d) Also, since all of this happens in real time, the MLOps engineer needs to ensure that system performance isn't negatively impacted. This means they often need to refactor code written by data scientists and optimize it for high performance.
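One common optimization of this kind is caching expensive, repeated computations so they aren't redone on every request. The feature lookup below is a hypothetical stand-in for slow code inherited from a notebook.

```python
# Cache per-user feature computation for real-time serving.
from functools import lru_cache

@lru_cache(maxsize=1024)
def user_features(user_id):
    """Pretend-expensive feature computation, cached per user."""
    return (user_id, hash(user_id) % 97)  # hashable so it can be cached

user_features("alice")  # computed on the first request
user_features("alice")  # served from cache on repeat requests
```

Other typical refactors include vectorizing row-by-row pandas loops, batching model calls, and moving one-off notebook globals into functions that can be profiled and tested.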
Most companies use cloud platforms to deploy and maintain their machine learning models.
As a data scientist, it is a good idea to explore some of these platforms and their MLOps solutions.
If you have worked on a data science project before and have some code sitting in your Jupyter Notebook, it might be a good idea to try deploying it online.
Platforms like Microsoft Azure even provide you with an MLOps pipeline that you can modify to fit your own use case.
You can also try implementing a full-stack machine learning project using only cloud services, in order to get a better idea of the end-to-end ML workflow from data collection to deployment/maintenance.
If you are interested in learning more about model deployment, I have two tutorials to help you get started:
- Build a machine learning web-app with Flask and deploy with GCP
- Deploy a machine learning web-app with Heroku
That's all for this article, thanks for reading!