MLOps: Open Source Flow
Module 1: Model Development and Training with PyTorch Lightning
Article: Streamlining Machine Learning with PyTorch Lightning
Introduction:
The foundation of any MLOps pipeline is the machine learning model itself. PyTorch Lightning is a powerful framework that simplifies the process of developing and training complex models. It provides a structured way to organize code, manage hardware resources, and abstract away boilerplate, allowing data scientists to focus on the core logic of their models.
Key Features and Benefits:
Abstraction of Boilerplate: Lightning handles common training loops, hardware management (GPUs, TPUs), and logging, reducing the amount of code you need to write.
Organization and Structure: It enforces a clear structure for your code, making it more readable and maintainable.
Scalability: Lightning makes it easy to scale your training across multiple GPUs or machines.
Reproducibility: It promotes best practices for reproducibility, ensuring that your experiments can be easily replicated.
Example:
Let's illustrate how PyTorch Lightning simplifies training a simple image classification model on the MNIST dataset.
Python
import torch
import torch.nn.functional as F
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import MNIST
from torchvision import transforms
import pytorch_lightning as pl
class MNISTClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        self.log('train_loss', loss)
        return loss

    def validation_step(self, batch, batch_idx):
        # Needed because a validation DataLoader is passed to trainer.fit below
        x, y = batch
        self.log('val_loss', F.cross_entropy(self(x), y))

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
# Data Preparation
transform = transforms.ToTensor()
mnist_full = MNIST(".", train=True, download=True, transform=transform)
train_set, val_set = random_split(mnist_full, [55000, 5000])
train_loader = DataLoader(train_set, batch_size=32)
val_loader = DataLoader(val_set, batch_size=32)
# Model Training
model = MNISTClassifier()
trainer = pl.Trainer(max_epochs=3)
trainer.fit(model, train_loader, val_loader)
Explanation:
The MNISTClassifier class inherits from pl.LightningModule, providing the structure for our model.
The training_step method defines the training logic, including loss calculation; validation_step does the same for the held-out split.
The configure_optimizers method specifies the optimizer to use.
The pl.Trainer handles the training loop, logging, and other boilerplate.
PyTorch Lightning simplifies the training process, making it easier to develop and manage complex machine learning models.
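Later modules load the trained weights back from a checkpoint file. Lightning checkpoints automatically, but the ModelCheckpoint callback gives you control over where checkpoints land and which one is kept. A minimal sketch (the dirpath and filename below are arbitrary choices, not requirements):
Python
from pytorch_lightning.callbacks import ModelCheckpoint

# Keep the checkpoint with the lowest validation loss at a predictable path
checkpoint_cb = ModelCheckpoint(
    dirpath="checkpoints",   # arbitrary output directory
    filename="mnist-best",
    monitor="val_loss",
    mode="min",
)
trainer = pl.Trainer(max_epochs=3, callbacks=[checkpoint_cb])
trainer.fit(model, train_loader, val_loader)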
Module 2: Experiment Tracking with Weights & Biases
Article: Enhancing ML Development with Weights & Biases
Introduction:
In machine learning, experimentation is crucial. Weights & Biases (W&B) is a powerful tool for tracking and visualizing experiments, enabling data scientists to understand model behavior, compare different configurations, and improve reproducibility.
Key Features and Benefits:
Experiment Tracking: Logs metrics, hyperparameters, code, and system information during training.
Visualization: Provides interactive dashboards for visualizing and analyzing experiment results.
Collaboration: Enables teams to collaborate on experiments and share insights.
Reproducibility: Helps ensure reproducibility by tracking all relevant information.
Example:
Let's integrate W&B into our PyTorch Lightning example.
Python
import torch
import torch.nn.functional as F
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import MNIST
from torchvision import transforms
import pytorch_lightning as pl
import wandb
from pytorch_lightning.loggers import WandbLogger
class MNISTClassifier(pl.LightningModule):
    # ... (same model definition as before)

# Data Preparation
# ... (same data loading as before)
# Model Training with W&B
wandb_logger = WandbLogger(project="mnist-classification")
model = MNISTClassifier()
trainer = pl.Trainer(max_epochs=3, logger=wandb_logger)
trainer.fit(model, train_loader, val_loader)
Explanation:
We import wandb and the WandbLogger integration from pytorch_lightning.loggers.
We initialize a WandbLogger with our project name.
We pass the wandb_logger to the pl.Trainer.
Now, W&B will automatically log metrics, hyperparameters, and other relevant information during training, allowing us to visualize and analyze our experiments.
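Beyond automatic metric logging, the logger object exposes a couple of useful hooks. A short sketch (both methods are part of the Lightning W&B integration; the hyperparameter values shown are simply the ones used above):
Python
# Log gradients and parameter histograms for the model during training
wandb_logger.watch(model, log="all")

# Record hyperparameters alongside the run for later comparison
wandb_logger.log_hyperparams({"lr": 1e-3, "batch_size": 32})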
Module 3: Configuration Management with Hydra
Article: Simplifying Configuration with Hydra
Introduction:
Managing complex configurations is a common challenge in machine learning projects. Hydra is a configuration management tool that simplifies this process by providing a hierarchical configuration system and enabling easy experimentation with different settings.
Key Features and Benefits:
Hierarchical Configuration: Organizes configurations into a hierarchical structure, making them easier to manage.
Composition: Allows composing configurations from multiple files, enabling modularity and reuse.
Experimentation: Simplifies running experiments with different configurations by providing a command-line interface.
Reproducibility: Ensures reproducibility by tracking the configuration used for each experiment.
Example:
Let's use Hydra to manage our model's hyperparameters.
Python
import torch
import torch.nn.functional as F
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import MNIST
from torchvision import transforms
import pytorch_lightning as pl
import hydra
from omegaconf import DictConfig, OmegaConf
class MNISTClassifier(pl.LightningModule):
    def __init__(self, cfg):
        super().__init__()
        self.l1 = nn.Linear(28 * 28, cfg.model.hidden_size)
        self.l2 = nn.Linear(cfg.model.hidden_size, 10)
        # configure_optimizers receives no arguments, so keep the lr on the module
        self.lr = cfg.lr

    def forward(self, x):
        x = torch.relu(self.l1(x.view(x.size(0), -1)))
        return self.l2(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

@hydra.main(config_path=".", config_name="config")
def train(cfg: DictConfig):
    # Note: by default Hydra changes the working directory per run,
    # so MNIST(".") downloads into the run's output directory
    transform = transforms.ToTensor()
    mnist_full = MNIST(".", train=True, download=True, transform=transform)
    train_set, val_set = random_split(mnist_full, [55000, 5000])
    train_loader = DataLoader(train_set, batch_size=cfg.batch_size)
    val_loader = DataLoader(val_set, batch_size=cfg.batch_size)
    model = MNISTClassifier(cfg)
    trainer = pl.Trainer(max_epochs=cfg.trainer.max_epochs)
    trainer.fit(model, train_loader, val_loader)

if __name__ == "__main__":
    train()
config.yaml:
YAML
model:
  hidden_size: 128
trainer:
  max_epochs: 3
lr: 0.001
batch_size: 32
Explanation:
We use @hydra.main to decorate our train function, enabling Hydra's configuration management.
We define our configuration in config.yaml, specifying hyperparameters like hidden_size, lr, and batch_size.
We access the configuration using cfg in our train function and MNISTClassifier class.
Hydra simplifies managing complex configurations and running experiments with different settings.
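The payoff shows up at the command line: any value in config.yaml can be overridden per run, and --multirun sweeps over several values in one invocation. Assuming the script above is saved as train.py:
Bash
# Override individual values for a single run
python train.py lr=0.01 batch_size=64

# Launch one run per learning rate
python train.py --multirun lr=0.001,0.01,0.1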
Module 4: Data Version Control with DVC
Article: Managing Data and Models with DVC
Introduction:
Data Version Control (DVC) is an open-source tool that extends Git to handle large datasets and machine learning models. It provides version control, reproducibility, and collaboration for data-intensive projects.
Key Features and Benefits:
Version Control for Data and Models: Enables versioning of data and models alongside code.
Reproducibility: Ensures that experiments can be reproduced by tracking data and model dependencies.
Collaboration: Facilitates collaboration by sharing data and models across teams.
Storage Agnostic: Supports various storage backends, including cloud storage (S3, GCS) and local file systems.
Example:
Let's illustrate how to use DVC to version our trained model and data.
Install DVC:
Bash
pip install dvc
dvc init
Track Data and Model:
Assume we have a trained model file model.pth and our MNIST dataset in a data directory.
Bash
dvc add model.pth
dvc add data
Commit Changes:
Bash
git add model.pth.dvc data.dvc .gitignore
git commit -m "Track model and data with DVC"
Store Data Remotely (e.g., AWS S3):
Bash
pip install "dvc[s3]"  # S3 remote support ships as an optional extra
dvc remote add -d storage s3://your-bucket-name  # -d marks this remote as the default
dvc push
Retrieve Data and Model:
Bash
dvc pull
Explanation:
dvc add tracks the specified files or directories and generates .dvc files that store metadata.
dvc push uploads the data and models to the remote storage.
dvc pull retrieves the data and models from the remote storage.
DVC helps manage large datasets and models, ensuring reproducibility and collaboration.
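DVC can also record the command that produced an artifact, so it can be rebuilt whenever its inputs change. A sketch, assuming the training code lives in a train.py that writes model.pth (if model.pth is already tracked via dvc add, untrack it first so the stage can own it):
Bash
# Register a pipeline stage with its command, dependencies, and outputs
dvc stage add -n train -d train.py -d data -o model.pth python train.py

# Re-run the stage only when train.py or the data has changed
dvc repro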
Module 5: Model Conversion with ONNX
Article: Ensuring Model Portability with ONNX
Introduction:
ONNX (Open Neural Network Exchange) is an open standard format for representing machine learning models. It enables interoperability between different frameworks and runtimes, allowing models to be deployed in various environments.
Key Features and Benefits:
Interoperability: Allows models to be used with different frameworks and runtimes.
Optimization: Enables model optimization for specific hardware and software platforms.
Deployment Flexibility: Provides flexibility in deploying models across various environments.
Example:
Let's convert our PyTorch model to ONNX format.
Python
import torch
import torch.onnx
from model import MNISTClassifier # Assuming your PyTorch Lightning model is in model.py
# Load the trained model
model = MNISTClassifier.load_from_checkpoint("path/to/your/checkpoint.ckpt")
model.eval()
# Create a dummy input
dummy_input = torch.randn(1, 1, 28, 28)
# Export the model to ONNX format
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    export_params=True,
    opset_version=11,
    do_constant_folding=True,
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
)
Explanation:
We load the trained PyTorch model from a checkpoint.
We create a dummy input to trace the model's execution.
We use torch.onnx.export to convert the model to ONNX format, specifying input and output names and dynamic axes.
ONNX ensures that our model can be used with various frameworks and runtimes, improving portability.
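Before shipping the file, it is worth verifying that the exported graph is well formed and numerically matches the PyTorch model. A quick check on the same dummy input used for export:
Python
import numpy as np
import onnx
import onnxruntime

# Structural validation of the exported graph
onnx.checker.check_model(onnx.load("model.onnx"))

# Numerical comparison against the original PyTorch model
session = onnxruntime.InferenceSession("model.onnx")
onnx_out = session.run(None, {"input": dummy_input.numpy()})[0]
torch_out = model(dummy_input).detach().numpy()
np.testing.assert_allclose(torch_out, onnx_out, rtol=1e-3, atol=1e-5)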
Module 6: Containerization with Docker
Article: Packaging Models with Docker for Consistent Deployment
Introduction:
Docker is a platform for building, shipping, and running applications in containers. Containers provide a consistent and isolated environment for running applications, ensuring they work the same way regardless of the underlying infrastructure.
Key Features and Benefits:
Consistency: Ensures that applications run the same way across different environments.
Isolation: Provides isolation between applications, preventing conflicts.
Portability: Enables applications to be easily moved between different machines and cloud platforms.
Scalability: Facilitates scaling applications by running multiple containers.
Example:
Let's create a Dockerfile to package our ONNX model and its dependencies.
Dockerfile:
Dockerfile
FROM python:3.8-slim-buster
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model.onnx .
COPY app.py .
# Sample input referenced by app.py; assumed present in the build context
COPY test_image.png .
CMD ["python", "app.py"]
requirements.txt:
onnxruntime
torch
torchvision
app.py:
Python
import onnxruntime
import numpy as np
from PIL import Image
import torchvision.transforms as transforms
# Load the ONNX model
session = onnxruntime.InferenceSession("model.onnx")
# Preprocessing function
def preprocess(image_path):
    transform = transforms.Compose([
        transforms.Resize((28, 28)),
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    image = Image.open(image_path).convert("L")
    image = transform(image).unsqueeze(0).numpy()
    return image

# Inference function
def predict(image_path):
    input_data = preprocess(image_path)
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name
    output = session.run([output_name], {input_name: input_data})[0]
    return np.argmax(output)

# Example usage
image_path = "test_image.png"
prediction = predict(image_path)
print(f"Prediction: {prediction}")
Build and Run the Docker Image:
Bash
docker build -t mnist-onnx .
docker run --rm mnist-onnx
Explanation:
The Dockerfile specifies the base image, working directory, dependencies, and entry point.
The requirements.txt file lists the Python packages required by the application.
The app.py script loads the ONNX model and performs inference.
Docker ensures that our model and its dependencies are packaged into a consistent and portable container.
Module 7: CI/CD with GitHub Actions and AWS ECR
Article: Automating Deployment with GitHub Actions and AWS ECR
Introduction:
Continuous Integration (CI) and Continuous Delivery (CD) are crucial for automating the build, test, and deployment of machine learning models. GitHub Actions provides a platform for automating these workflows, and AWS Elastic Container Registry (ECR) serves as a secure and scalable repository for Docker images.
Key Features and Benefits:
Automation: Automates the build, test, and deployment process.
Consistency: Ensures consistent deployments across different environments.
Scalability: Enables scalable deployments by leveraging cloud resources.
Collaboration: Facilitates collaboration by providing a centralized platform for managing deployments.
Example:
Let's create a GitHub Actions workflow to build and push our Docker image to AWS ECR.
Create an ECR Repository:
In your AWS account, create an ECR repository to store your Docker images.
Configure AWS Credentials:
Set up AWS credentials in your GitHub repository secrets.
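One way to do this is with the GitHub CLI, assuming it is installed and authenticated (the values are placeholders for your own credentials):
Bash
gh secret set AWS_ACCESS_KEY_ID --body "<your-access-key-id>"
gh secret set AWS_SECRET_ACCESS_KEY --body "<your-secret-access-key>"
gh secret set AWS_REGION --body "us-east-1"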
Create a GitHub Actions Workflow:
Create a .github/workflows/deploy.yml file with the following content:
YAML
name: Deploy to AWS ECR

on:
  push:
    branches:
      - main

jobs:
  build_and_push:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Build and tag Docker image
        id: build-image
        run: |
          docker build -t ${{ steps.login-ecr.outputs.registry }}/mnist-onnx:${{ github.sha }} .
          # ::set-output is deprecated; write step outputs to $GITHUB_OUTPUT instead
          echo "image=${{ steps.login-ecr.outputs.registry }}/mnist-onnx:${{ github.sha }}" >> "$GITHUB_OUTPUT"

      - name: Push Docker image to ECR
        id: push-image
        run: |
          docker push ${{ steps.login-ecr.outputs.registry }}/mnist-onnx:${{ github.sha }}
Explanation:
The workflow is triggered on pushes to the main branch.
It checks out the code, configures AWS credentials, and logs in to ECR.
It builds the Docker image and tags it with the commit SHA.
It pushes the Docker image to ECR.
GitHub Actions automates the build and deployment process, ensuring consistent and scalable deployments.
Module 8: Serverless Deployment with AWS Lambda & API Gateway
Article: Deploying Models as Serverless Functions with AWS Lambda and API Gateway
Introduction:
AWS Lambda allows running code without managing servers, and API Gateway creates API endpoints for accessing Lambda functions. This combination enables serverless deployment of machine learning models, providing scalability and cost-effectiveness.
Key Features and Benefits:
Serverless: Eliminates the need to manage servers.
Scalability: Automatically scales based on demand.
Cost-Effectiveness: Charges only for the compute time consumed.
Integration: Integrates with other AWS services.
Example:
Let's deploy our ONNX model as a Lambda function and expose it through API Gateway.
Create a Lambda Function:
Create a Lambda function in your AWS account.
Configure Lambda Function:
Set the runtime to Python 3.8.
Add the necessary dependencies (ONNX Runtime, Pillow, torchvision) as Lambda layers or include them in the deployment package.
Write the Lambda function code to load the ONNX model and perform inference.
Lambda Function Code (lambda_function.py):
Python
import json
import io
import base64

import numpy as np
import onnxruntime
from PIL import Image
import torchvision.transforms as transforms

# Load the ONNX model once per container, at cold start
session = onnxruntime.InferenceSession("model.onnx")

# Preprocessing function
def preprocess(image_bytes):
    transform = transforms.Compose([
        transforms.Resize((28, 28)),
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    # Wrap the raw bytes so PIL can read them like a file
    image = Image.open(io.BytesIO(image_bytes)).convert("L")
    image = transform(image).unsqueeze(0).numpy()
    return image

# Inference function
def predict(image_bytes):
    input_data = preprocess(image_bytes)
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name
    output = session.run([output_name], {input_name: input_data})[0]
    return np.argmax(output).item()

def lambda_handler(event, context):
    try:
        image_bytes = base64.b64decode(event["body"])
        prediction = predict(image_bytes)
        return {
            "statusCode": 200,
            "body": json.dumps({"prediction": prediction}),
        }
    except Exception as e:
        return {
            "statusCode": 500,
            "body": json.dumps({"error": str(e)}),
        }
Create an API Gateway:
Create an API Gateway to expose the Lambda function as an API.
Configure API Gateway:
Create a POST method.
Integrate the POST method with the Lambda function.
Deploy the API.
Explanation:
The Lambda function loads the ONNX model and performs inference on the input image.
The API Gateway exposes the Lambda function as an API endpoint.
AWS Lambda and API Gateway provide a serverless and scalable way to deploy machine learning models.
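Once deployed, the endpoint can be exercised by POSTing a base64-encoded image. A sketch (the API ID, region, stage, and resource path are placeholders for your own deployment):
Bash
# Encode a test image and send it to the deployed endpoint
base64 test_image.png | curl -X POST --data @- \
  "https://<api-id>.execute-api.<region>.amazonaws.com/prod/predict"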
Module 9: Monitoring with CloudWatch, Elasticsearch, and Kibana
Article: Ensuring Model Reliability with CloudWatch, Elasticsearch, and Kibana
Introduction:
Monitoring is crucial for ensuring the reliability and performance of deployed machine learning models. CloudWatch, Elasticsearch, and Kibana provide a comprehensive monitoring solution for logging, analyzing, and visualizing model metrics and logs.
Key Features and Benefits:
Logging: Collects and stores logs from Lambda functions and other AWS services.
Analysis: Enables searching and analyzing logs using Elasticsearch.
Visualization: Provides interactive dashboards for visualizing metrics and logs using Kibana.
Alerting: Configures alerts based on specific metrics or log patterns.
Example:
Let's set up monitoring for our Lambda function using CloudWatch, Elasticsearch, and Kibana.
Configure CloudWatch Logging:
Lambda automatically sends logs to CloudWatch Logs.
Set up Elasticsearch and Kibana:
Create an Amazon Elasticsearch Service (now OpenSearch Service) domain in your AWS account; Kibana is bundled with the domain.
Configure Log Streaming:
Configure CloudWatch to stream logs to Elasticsearch.
Create Kibana Dashboards:
Create Kibana dashboards to visualize metrics and logs, such as latency, error rates, and prediction distributions.
Explanation:
CloudWatch Logs collects logs from the Lambda function.
Elasticsearch indexes the logs for efficient searching and analysis.
Kibana provides dashboards for visualizing the logs and creating custom visualizations.
CloudWatch, Elasticsearch, and Kibana enable robust monitoring of deployed machine learning models.
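For the alerting piece, a CloudWatch alarm on the Lambda's built-in Errors metric is a reasonable starting point. A sketch using the AWS CLI (the function name and SNS topic ARN are placeholders):
Bash
aws cloudwatch put-metric-alarm \
  --alarm-name mnist-lambda-errors \
  --namespace AWS/Lambda \
  --metric-name Errors \
  --dimensions Name=FunctionName,Value=<your-function-name> \
  --statistic Sum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions <your-sns-topic-arn>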