AI/ML models are the heart of product discovery. Query understanding, catalog understanding, and personalization can all benefit from AI. Building AI models is easier when the datasets are limited to a single customer. For a platform that hosts hundreds of customers, each of which can have ten or more models, the problem becomes significantly harder. This is where AI workflows can help scale the model generation process.
Most data scientists excel at creating models, but only a small percentage of those models make it to production. VentureBeat reports this number to be 13%, while Redapt claims it is 10%. This highlights the need for a good AI platform to bridge the gap between building AI models offline and taking them to production.
Let’s talk about various stages of a model lifecycle to understand this better.
Stages of an AI/ML Model Lifecycle
There are various stages of building an AI model:
- Data Transformations/Preparation
- Model Training
- Model Tuning
- Model Validation
- Model Serving
These stages, in fact, represent steps in an AI workflow. There are multiple open-source ML toolkits available to build an AI pipeline/workflow.
At Unbxd, we have built AI pipelines using Kubeflow to manage the lifecycle of an AI/ML model. Kubeflow comes with almost all the ML toolkits needed to train, tune, and deploy machine learning models. Since the whole infrastructure is built on top of Kubernetes, we can use features such as autoscaling, rollouts, rollbacks, and load balancing to support AI models at scale.
Building AI Workflows and their Orchestration
AI Pipelines are built using a combination of data transformation and machine learning components.
Data Storage and Transformations
The foundation of any good AI/ML model is data. There are two main types of data requirements when building or serving a model: real-time data and historical data.
Historical data is needed to create the initial model artifact and to analyze the model. Real-time (streaming) data is usually needed for model serving.
Historical data is usually served from a data lake or a data warehouse, depending on requirements. We chose a data lake built on top of the S3 object store, using the columnar Parquet data format.
Common data sources like catalog, clickstream, and search API usage are partitioned by customer ID to optimize customer-specific data transformations. These datasets are generated by running Spark jobs on top of raw data in the data lake. We also generate many use-case datasets, such as query-product stats and relevant user impressions, to speed up data analysis and the model generation pipeline.
One of the major advantages of such a data storage layer is that it lets us quickly run selective data transformations, such as query stats and product stats, on a specific event type (click/cart/order).
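To make the benefit of partitioning concrete, here is a minimal sketch of how a job could narrow its input to only the partitions it needs. It assumes a Hive-style directory layout; the bucket name and the partition keys `customer_id` and `event_type` are hypothetical and not taken from our actual data lake.

```python
from itertools import product

# Hypothetical base location; the bucket name and partition keys are assumptions.
BASE = "s3://example-data-lake/clickstream"


def partition_prefixes(customer_ids, event_types):
    """Return the Hive-style partition prefixes a job needs to read."""
    return [
        f"{BASE}/customer_id={cid}/event_type={etype}/"
        for cid, etype in product(customer_ids, event_types)
    ]


# A job computing click and order stats for one customer reads two prefixes
# instead of scanning the whole dataset.
prefixes = partition_prefixes(["acme"], ["click", "order"])
```

A Spark job that filters on the partition columns of a partitioned Parquet dataset gets a similar effect through partition pruning, without building the paths by hand.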
We perform all data transformation tasks on standalone Spark or Spark on Kubernetes, depending on the use case.
Real-time data is powered by a feature store kept in sync by the analytics pipeline or catalog ingestion pipeline, depending on the data source.
Model Training & Tuning
Model training components are responsible for generating the model artifact. These components use TensorFlow/PyTorch/Theano/Spark ML libraries based on the model.
Distributed Model Training and HPO
One of the major problems in model training is the enormous time it takes when the dataset is huge or the model architecture is complex. We usually go for bigger GPU instances, and even then model training can take hours, if not days. Some commonly used ML frameworks (TensorFlow, PyTorch, Horovod) introduced distributed training to solve this problem: multiple instances are used to distribute the dataset and train the model. There are various strategies for distributed training of ML models, which are beyond the scope of this blog post. We use the distributed training operators supported by Kubeflow, chosen based on the model algorithm.
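As a toy illustration of one such strategy, synchronous data parallelism, the sketch below shards a dataset across simulated workers, computes a gradient on each shard, and averages the gradients before every update. It is plain Python with a one-parameter linear model, not one of the Kubeflow operators. The function names, learning rate, and step count are illustrative assumptions.

```python
def shard(data, num_workers):
    """Split the dataset round-robin so each worker gets a disjoint slice."""
    return [data[i::num_workers] for i in range(num_workers)]


def local_gradient(w, batch):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def train(data, num_workers=2, lr=0.01, steps=200):
    shards = shard(data, num_workers)
    w = 0.0
    for _ in range(steps):
        # In a real system each worker computes its gradient in parallel.
        grads = [local_gradient(w, s) for s in shards]
        # Averaging stands in for the all-reduce step between workers.
        w -= lr * sum(grads) / len(grads)
    return w


# Synthetic data generated from y = 3x, so training should recover w close to 3.
data = [(x, 3.0 * x) for x in range(1, 9)]
w = train(data)
```

Because the shards here are equal in size, the averaged gradient matches the full-batch gradient, which is why the result agrees with single-machine training.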
Hyperparameter Tuning & Neural Architecture Search
Automated hyperparameter tuning and neural architecture search are important tasks when building models at scale. We use Katib experiments to perform hyperparameter tuning for most of the models at Unbxd. The following is an example of a personalization Katib experiment:
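The experiment definition itself is not reproduced in this text. As a rough stand-in, the sketch below builds a Katib Experiment manifest as a plain Python dict. The field names follow the Katib `v1beta1` Experiment resource as we understand it and should be checked against the Katib version in use. The experiment name, metric name, parameter ranges, container image, and training command are all hypothetical.

```python
import json


def build_experiment(name, namespace, image):
    """Build a Katib Experiment manifest (random search) as a plain dict."""
    return {
        "apiVersion": "kubeflow.org/v1beta1",
        "kind": "Experiment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Hypothetical metric; the training code must report it.
            "objective": {"type": "maximize", "objectiveMetricName": "ndcg"},
            "algorithm": {"algorithmName": "random"},
            "parallelTrialCount": 3,
            "maxTrialCount": 12,
            "maxFailedTrialCount": 3,
            # Hypothetical search space.
            "parameters": [
                {"name": "learning_rate", "parameterType": "double",
                 "feasibleSpace": {"min": "0.001", "max": "0.1"}},
                {"name": "embedding_dim", "parameterType": "int",
                 "feasibleSpace": {"min": "16", "max": "128"}},
            ],
            "trialTemplate": {
                "primaryContainerName": "training",
                "trialParameters": [
                    {"name": "learningRate", "reference": "learning_rate"},
                    {"name": "embeddingDim", "reference": "embedding_dim"},
                ],
                # Each trial runs as a Kubernetes Job with substituted values.
                "trialSpec": {
                    "apiVersion": "batch/v1",
                    "kind": "Job",
                    "spec": {"template": {"spec": {
                        "restartPolicy": "Never",
                        "containers": [{
                            "name": "training",
                            "image": image,
                            "command": [
                                "python", "train.py",
                                "--lr=${trialParameters.learningRate}",
                                "--dim=${trialParameters.embeddingDim}",
                            ],
                        }],
                    }}},
                },
            },
        },
    }


experiment = build_experiment(
    "personalization-hpo", "kubeflow", "example.com/personalization-train:latest"
)
manifest_json = json.dumps(experiment, indent=2)
```

The resulting JSON could be applied to a cluster that has Katib installed, for example by saving it to a file and using `kubectl apply -f`.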
Model Validation & Deploy Components
Model validation usually consists of running integration tests using the model artifact generated in the model training step. The validation step can also serve as a model selector when multiple models are trained as part of the workflow. Once the model is validated, it is passed on to the next stage, model deployment.
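The model selector role of the validation step can be sketched as follows. This is a simplified illustration, not our validation component: the metric name `ndcg`, the threshold, and the candidate structure are assumptions made for the example.

```python
def select_model(candidates, metric="ndcg", threshold=0.30):
    """Return the best candidate that clears the threshold, or None."""
    passing = [c for c in candidates if c["metrics"].get(metric, 0.0) >= threshold]
    # max() with default=None covers the case where every candidate fails.
    return max(passing, key=lambda c: c["metrics"][metric], default=None)


# Hypothetical candidates produced by the training step of one workflow run.
candidates = [
    {"name": "model-a", "metrics": {"ndcg": 0.28}},
    {"name": "model-b", "metrics": {"ndcg": 0.35}},
    {"name": "model-c", "metrics": {"ndcg": 0.33}},
]
best = select_model(candidates)
```

Returning `None` when no candidate passes gives the workflow a clear signal to stop before the deployment stage.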
Model deploy components are responsible for pushing batch prediction output to sink databases or to model serving components built using KFServing. KFServing gives us the ability to roll out models using the APIs available for different ML platforms (TensorFlow, XGBoost, scikit-learn, etc.).
Model Artifacts & Lineage Tracking
Kubeflow Pipelines support automatic artifact and lineage tracking powered by ML Metadata. ML Pipelines usually involve various types of artifacts like models, data statistics, model evaluation metrics, etc.
Artifact lineage tracking helps us answer some basic questions regarding any model artifact like:
- What datasets were used to train the model?
- What were the metrics generated with this model and dataset?
And many more related questions.
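To show how lineage records can answer such questions, here is a minimal in-memory sketch. It is not the ML Metadata API. The `LineageStore` class, its methods, and the artifact names are hypothetical and only mirror the idea of recording each step's input and output artifacts.

```python
from dataclasses import dataclass


@dataclass
class Execution:
    name: str
    inputs: list
    outputs: list


class LineageStore:
    """Toy lineage store: each execution links input artifacts to outputs."""

    def __init__(self):
        self.executions = []

    def record(self, name, inputs, outputs):
        self.executions.append(Execution(name, list(inputs), list(outputs)))

    def upstream(self, artifact):
        """All artifacts the given artifact was derived from, transitively."""
        found, stack = set(), [artifact]
        while stack:
            current = stack.pop()
            for ex in self.executions:
                if current in ex.outputs:
                    for inp in ex.inputs:
                        if inp not in found:
                            found.add(inp)
                            stack.append(inp)
        return found

    def produced_from(self, artifact):
        """Artifacts directly produced by executions that consumed this one."""
        return {out for ex in self.executions if artifact in ex.inputs
                for out in ex.outputs}


store = LineageStore()
store.record("transform", ["raw/clickstream"], ["dataset/query_product_stats"])
store.record("train", ["dataset/query_product_stats"], ["model/v1"])
store.record("evaluate", ["model/v1", "dataset/query_product_stats"], ["metrics/v1"])

training_data = store.upstream("model/v1")      # datasets behind the model
model_metrics = store.produced_from("model/v1")  # metrics generated with it
```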
AI pipeline run metadata can also be accessed through the metadata server and the pipeline run details API. This helped us build reliable algorithm metric and run details APIs for various use cases.
We have built an AI workflow orchestration layer responsible for generating AI workflows and resource management of each of the components in the above workflow.
Salient features include the following:
- Customer, algorithm-specific AI workflow generation
- Resource templating based on component type & its resource requirement
- Workflow status details APIs using Kubeflow MLMD APIs
- Tier/Vertical/Customer level configurations for each algorithm*
* Here algorithm could be an ML model or simply a data transformation job for some report generation
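The layered configuration and resource templating can be pictured with a short sketch. The precedence order (defaults, then tier, then vertical, then customer), the configuration keys, and the resource values are assumptions made for illustration and do not describe our actual orchestration layer.

```python
# Hypothetical configuration layers for one algorithm.
DEFAULTS = {"schedule": "weekly", "epochs": 5}
TIER = {"enterprise": {"schedule": "daily"}}
VERTICAL = {"fashion": {"epochs": 10}}
CUSTOMER = {"acme": {"epochs": 20}}

# Hypothetical resource templates keyed by component type.
RESOURCES = {
    "spark_transform": {"cpu": "4", "memory": "16Gi"},
    "gpu_training": {"cpu": "8", "memory": "32Gi", "nvidia.com/gpu": "1"},
}


def resolve_config(customer, tier, vertical):
    """Merge layers so that more specific levels override general ones."""
    config = dict(DEFAULTS)
    for layer in (TIER.get(tier, {}), VERTICAL.get(vertical, {}),
                  CUSTOMER.get(customer, {})):
        config.update(layer)
    return config


def component_resources(component_type):
    """Render a Kubernetes-style resources block for a component type."""
    template = RESOURCES[component_type]
    return {"requests": dict(template), "limits": dict(template)}


acme_config = resolve_config("acme", tier="enterprise", vertical="fashion")
training_resources = component_resources("gpu_training")
```

Keeping the overrides in separate layers means a new customer inherits sensible tier and vertical settings without any customer-specific entry.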
Following is the complete AI workflow representation of a personalization model.
The DAG (directed acyclic graph) above shows an AI workflow covering all the stages of a deep recommender-based personalization model for a customer. We host hundreds of such models, scheduled to run daily or weekly depending on the requirement and algorithm.
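For readers unfamiliar with DAG-based workflows, the sketch below shows how the dependencies between stages determine a valid execution order. It uses Python's standard `graphlib` module (Python 3.9+) rather than Kubeflow Pipelines, and the stage names are hypothetical simplifications of a personalization workflow.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on (hypothetical names).
workflow = {
    "transform_clickstream": set(),
    "transform_catalog": set(),
    "train": {"transform_clickstream", "transform_catalog"},
    "validate": {"train"},
    "deploy": {"validate"},
}

# static_order() yields every stage after all of its dependencies.
order = list(TopologicalSorter(workflow).static_order())
```

The two transformation stages have no dependency on each other, so a scheduler is free to run them in parallel before training starts.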
Effectively taking models into production doesn’t have to be hard if all the stages of the ML model lifecycle are considered. As we discussed, building AI workflows for models gives us the ability to reproduce, experiment with, and effectively serve models at scale. We plan to provide more details on the AI pipelines architecture and its various use cases in our next blog posts.