MLOps: Open Source Flow
Module 1: Model Development and Training with PyTorch Lightning
Article: Streamlining Machine Learning with PyTorch Lightning
Introduction:
The foundation of any MLOps pipeline is the machine learning model itself. PyTorch Lightning is a powerful framework that simplifies the process of developing and training complex models. It provides a structured way to organize code, manage hardware resources, and abstract away boilerplate, allowing data scientists to focus on the core logic of their models.
Key Features and Benefits:
Abstraction of Boilerplate: Lightning handles common training loops, hardware management (GPUs, TPUs), and logging, reducing the amount of code you need to write.
Organization and Structure: It enforces a clear structure for your code, making it more readable and maintainable.
Scalability: Lightning makes it easy to scale your training across multiple GPUs or machines.
Reproducibility: It promotes best practices for reproducibility, ensuring that your experiments can be easily replicated.
Example:
Let's illustrate how PyTorch Lightning simplifies training a simple image classification model on the MNIST dataset.
Python
import torch
import torch.nn.functional as F
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import MNIST
from torchvision import transforms
import pytorch_lightning as pl
class MNISTClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(28 * 28, 10)

    def forward(self, x):
        return torch.relu(self.l1(x.view(x.size(0), -1)))

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.cross_entropy(y_hat, y)
        self.log('train_loss', loss)
        return loss

    def validation_step(self, batch, batch_idx):
        # Needed because a validation DataLoader is passed to trainer.fit below
        x, y = batch
        self.log('val_loss', F.cross_entropy(self(x), y))

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
# Data Preparation
transform = transforms.ToTensor()
mnist_full = MNIST(".", train=True, download=True, transform=transform)
train_set, val_set = random_split(mnist_full, [55000, 5000])
train_loader = DataLoader(train_set, batch_size=32)
val_loader = DataLoader(val_set, batch_size=32)
# Model Training
model = MNISTClassifier()
trainer = pl.Trainer(max_epochs=3)
trainer.fit(model, train_loader, val_loader)
Explanation:
The MNISTClassifier class inherits from pl.LightningModule, providing the structure for our model.
The training_step method defines the training logic, including loss calculation; validation_step does the same for the held-out split.
The configure_optimizers method specifies the optimizer to use.
The pl.Trainer handles the training loop, logging, and other boilerplate.
PyTorch Lightning simplifies the training process, making it easier to develop and manage complex machine learning models.
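Later modules load the trained weights back from a checkpoint file. Lightning checkpoints automatically, but the ModelCheckpoint callback gives you control over where checkpoints land and which one is kept. A minimal sketch (the dirpath and filename below are arbitrary choices, not requirements):
Python
from pytorch_lightning.callbacks import ModelCheckpoint

# Keep the checkpoint with the lowest validation loss at a predictable path
checkpoint_cb = ModelCheckpoint(
    dirpath="checkpoints",   # arbitrary output directory
    filename="mnist-best",
    monitor="val_loss",
    mode="min",
)
trainer = pl.Trainer(max_epochs=3, callbacks=[checkpoint_cb])
trainer.fit(model, train_loader, val_loader)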
Module 2: Experiment Tracking with Weights & Biases
Article: Enhancing ML Development with Weights & Biases
Introduction:
In machine learning, experimentation is crucial. Weights & Biases (W&B) is a powerful tool for tracking and visualizing experiments, enabling data scientists to understand model behavior, compare different configurations, and improve reproducibility.
Key Features and Benefits:
Experiment Tracking: Logs metrics, hyperparameters, code, and system information during training.
Visualization: Provides interactive dashboards for visualizing and analyzing experiment results.
Collaboration: Enables teams to collaborate on experiments and share insights.
Reproducibility: Helps ensure reproducibility by tracking all relevant information.
Example:
Let's integrate W&B into our PyTorch Lightning example.
Python
import torch
import torch.nn.functional as F
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import MNIST
from torchvision import transforms
import pytorch_lightning as pl
import wandb
from pytorch_lightning.loggers import WandbLogger
class MNISTClassifier(pl.LightningModule):
    # ... (same model definition as before)

# Data Preparation
# ... (same data loading as before)
# Model Training with W&B
wandb_logger = WandbLogger(project="mnist-classification")
model = MNISTClassifier()
trainer = pl.Trainer(max_epochs=3, logger=wandb_logger)
trainer.fit(model, train_loader, val_loader)
Explanation:
We import wandb and the WandbLogger integration from pytorch_lightning.loggers.
We initialize a WandbLogger with our project name.
We pass the wandb_logger to the pl.Trainer.
Now, W&B will automatically log metrics, hyperparameters, and other relevant information during training, allowing us to visualize and analyze our experiments.
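Beyond automatic metric logging, the logger object exposes a couple of useful hooks. A short sketch (both methods are part of the Lightning W&B integration; the hyperparameter values shown are simply the ones used above):
Python
# Log gradients and parameter histograms for the model during training
wandb_logger.watch(model, log="all")

# Record hyperparameters alongside the run for later comparison
wandb_logger.log_hyperparams({"lr": 1e-3, "batch_size": 32})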
Module 3: Configuration Management with Hydra
Article: Simplifying Configuration with Hydra
Introduction:
Managing complex configurations is a common challenge in machine learning projects. Hydra is a configuration management tool that simplifies this process by providing a hierarchical configuration system and enabling easy experimentation with different settings.
Key Features and Benefits:
Hierarchical Configuration: Organizes configurations into a hierarchical structure, making them easier to manage.
Composition: Allows composing configurations from multiple files, enabling modularity and reuse.
Experimentation: Simplifies running experiments with different configurations by providing a command-line interface.
Reproducibility: Ensures reproducibility by tracking the configuration used for each experiment.
Example:
Let's use Hydra to manage our model's hyperparameters.
Python
import torch
import torch.nn.functional as F
from torch import nn
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import MNIST
from torchvision import transforms
import pytorch_lightning as pl
import hydra
from omegaconf import DictConfig, OmegaConf
class MNISTClassifier(pl.LightningModule):
    def __init__(self, cfg):
        super().__init__()
        self.l1 = nn.Linear(28 * 28, cfg.model.hidden_size)
        self.l2 = nn.Linear(cfg.model.hidden_size, 10)
        # configure_optimizers receives no arguments, so keep the lr on the module
        self.lr = cfg.lr

    def forward(self, x):
        x = torch.relu(self.l1(x.view(x.size(0), -1)))
        return self.l2(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

@hydra.main(config_path=".", config_name="config")
def train(cfg: DictConfig):
    # Note: by default Hydra changes the working directory per run,
    # so MNIST(".") downloads into the run's output directory
    transform = transforms.ToTensor()
    mnist_full = MNIST(".", train=True, download=True, transform=transform)
    train_set, val_set = random_split(mnist_full, [55000, 5000])
    train_loader = DataLoader(train_set, batch_size=cfg.batch_size)
    val_loader = DataLoader(val_set, batch_size=cfg.batch_size)
    model = MNISTClassifier(cfg)
    trainer = pl.Trainer(max_epochs=cfg.trainer.max_epochs)
    trainer.fit(model, train_loader, val_loader)

if __name__ == "__main__":
    train()
config.yaml:
YAML
model:
  hidden_size: 128
trainer:
  max_epochs: 3
lr: 0.001
batch_size: 32
Explanation:
We use @hydra.main to decorate our train function, enabling Hydra's configuration management.
We define our configuration in config.yaml, specifying hyperparameters like hidden_size, lr, and batch_size.
We access the configuration using cfg in our train function and MNISTClassifier class.
Hydra simplifies managing complex configurations and running experiments with different settings.
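The payoff shows up at the command line: any value in config.yaml can be overridden per run, and --multirun sweeps over several values in one invocation. Assuming the script above is saved as train.py:
Bash
# Override individual values for a single run
python train.py lr=0.01 batch_size=64

# Launch one run per learning rate
python train.py --multirun lr=0.001,0.01,0.1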
Module 4: Data Version Control with DVC
Article: Managing Data and Models with DVC
Introduction:
Data Version Control (DVC) is an open-source tool that extends Git to handle large datasets and machine learning models. It provides version control, reproducibility, and collaboration for data-intensive projects.
Key Features and Benefits:
Version Control for Data and Models: Enables versioning of data and models alongside code.
Reproducibility: Ensures that experiments can be reproduced by tracking data and model dependencies.
Collaboration: Facilitates collaboration by sharing data and models across teams.
Storage Agnostic: Supports various storage backends, including cloud storage (S3, GCS) and local file systems.
Example:
Let's illustrate how to use DVC to version our trained model and data.
Install DVC:
Bash
pip install dvc
dvc init
Track Data and Model:
Assume we have a trained model file model.pth and our MNIST dataset in a data directory.
Bash
dvc add model.pth
dvc add data
Commit Changes:
Bash
git add model.pth.dvc data.dvc .gitignore
git commit -m "Track model and data with DVC"
Store Data Remotely (e.g., AWS S3):
Bash
pip install "dvc[s3]"  # S3 remote support ships as an optional extra
dvc remote add -d storage s3://your-bucket-name  # -d marks this remote as the default
dvc push
Retrieve Data and Model:
Bash
dvc pull
Explanation:
dvc add tracks the specified files or directories and generates .dvc files that store metadata.
dvc push uploads the data and models to the remote storage.
dvc pull retrieves the data and models from the remote storage.
DVC helps manage large datasets and models, ensuring reproducibility and collaboration.
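DVC can also record the command that produced an artifact, so it can be rebuilt whenever its inputs change. A sketch, assuming the training code lives in a train.py that writes model.pth (if model.pth is already tracked via dvc add, untrack it first so the stage can own it):
Bash
# Register a pipeline stage with its command, dependencies, and outputs
dvc stage add -n train -d train.py -d data -o model.pth python train.py

# Re-run the stage only when train.py or the data has changed
dvc repro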
Module 5: Model Conversion with ONNX
Article: Ensuring Model Portability with ONNX
Introduction:
ONNX (Open Neural Network Exchange) is an open standard format for representing machine learning models. It enables interoperability between different frameworks and runtimes, allowing models to be deployed in various environments.
Key Features and Benefits:
Interoperability: Allows models to be used with different frameworks and runtimes.
Optimization: Enables model optimization for specific hardware and software platforms.
Deployment Flexibility: Provides flexibility in deploying models across various environments.
Example:
Let's convert our PyTorch model to ONNX format.
Python
import torch
import torch.onnx
from model import MNISTClassifier # Assuming your PyTorch Lightning model is in model.py
# Load the trained model
model = MNISTClassifier.load_from_checkpoint("path/to/your/checkpoint.ckpt")
model.eval()
# Create a dummy input
dummy_input = torch.randn(1, 1, 28, 28)
# Export the model to ONNX format
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    export_params=True,
    opset_version=11,
    do_constant_folding=True,
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
)
Explanation:
We load the trained PyTorch model from a checkpoint.
We create a dummy input to trace the model's execution.
We use torch.onnx.export to convert the model to ONNX format, specifying input and output names and dynamic axes.
ONNX ensures that our model can be used with various frameworks and runtimes, improving portability.
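Before shipping the file, it is worth verifying that the exported graph is well formed and numerically matches the PyTorch model. A quick check on the same dummy input used for export:
Python
import numpy as np
import onnx
import onnxruntime

# Structural validation of the exported graph
onnx.checker.check_model(onnx.load("model.onnx"))

# Numerical comparison against the original PyTorch model
session = onnxruntime.InferenceSession("model.onnx")
onnx_out = session.run(None, {"input": dummy_input.numpy()})[0]
torch_out = model(dummy_input).detach().numpy()
np.testing.assert_allclose(torch_out, onnx_out, rtol=1e-3, atol=1e-5)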
Module 6: Containerization with Docker
Article: Packaging Models with Docker for Consistent Deployment
Introduction:
Docker is a platform for building, shipping, and running applications in containers. Containers provide a consistent and isolated environment for running applications, ensuring they work the same way regardless of the underlying infrastructure.
Key Features and Benefits:
Consistency: Ensures that applications run the same way across different environments.
Isolation: Provides isolation between applications, preventing conflicts.
Portability: Enables applications to be easily moved between different machines and cloud platforms.
Scalability: Facilitates scaling applications by running multiple containers.
Example:
Let's create a Dockerfile to package our ONNX model and its dependencies.
Dockerfile:
Dockerfile
FROM python:3.8-slim-buster
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model.onnx .
COPY app.py .
# Sample input referenced by app.py; assumed present in the build context
COPY test_image.png .
CMD ["python", "app.py"]
requirements.txt:
onnxruntime
torch
torchvision
app.py:
Python
import onnxruntime
import numpy as np
from PIL import Image
import torchvision.transforms as transforms
# Load the ONNX model
session = onnxruntime.InferenceSession("model.onnx")
# Preprocessing function
def preprocess(image_path):
    transform = transforms.Compose([
        transforms.Resize((28, 28)),
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    image = Image.open(image_path).convert("L")
    image = transform(image).unsqueeze(0).numpy()
    return image

# Inference function
def predict(image_path):
    input_data = preprocess(image_path)
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name
    output = session.run([output_name], {input_name: input_data})[0]
    return np.argmax(output)

# Example usage
image_path = "test_image.png"
prediction = predict(image_path)
print(f"Prediction: {prediction}")
Build and Run the Docker Image:
Bash
docker build -t mnist-onnx .
docker run --rm mnist-onnx
Explanation:
The Dockerfile specifies the base image, working directory, dependencies, and entry point.
The requirements.txt file lists the Python packages required by the application.
The app.py script loads the ONNX model and performs inference.
Docker ensures that our model and its dependencies are packaged into a consistent and portable container.
Module 7: CI/CD with GitHub Actions and AWS ECR
Article: Automating Deployment with GitHub Actions and AWS ECR
Introduction:
Continuous Integration (CI) and Continuous Delivery (CD) are crucial for automating the build, test, and deployment of machine learning models. GitHub Actions provides a platform for automating these workflows, and AWS Elastic Container Registry (ECR) serves as a secure and scalable repository for Docker images.
Key Features and Benefits:
Automation: Automates the build, test, and deployment process.
Consistency: Ensures consistent deployments across different environments.
Scalability: Enables scalable deployments by leveraging cloud resources.
Collaboration: Facilitates collaboration by providing a centralized platform for managing deployments.
Example:
Let's create a GitHub Actions workflow to build and push our Docker image to AWS ECR.
Create an ECR Repository:
In your AWS account, create an ECR repository to store your Docker images.
Configure AWS Credentials:
Set up AWS credentials in your GitHub repository secrets.
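One way to do this is with the GitHub CLI, assuming it is installed and authenticated (the values are placeholders for your own credentials):
Bash
gh secret set AWS_ACCESS_KEY_ID --body "<your-access-key-id>"
gh secret set AWS_SECRET_ACCESS_KEY --body "<your-secret-access-key>"
gh secret set AWS_REGION --body "us-east-1"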
Create a GitHub Actions Workflow:
Create a .github/workflows/deploy.yml file with the following content:
YAML
name: Deploy to AWS ECR

on:
  push:
    branches:
      - main

jobs:
  build_and_push:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Build and tag Docker image
        id: build-image
        run: |
          docker build -t ${{ steps.login-ecr.outputs.registry }}/mnist-onnx:${{ github.sha }} .
          # ::set-output is deprecated; write step outputs to $GITHUB_OUTPUT instead
          echo "image=${{ steps.login-ecr.outputs.registry }}/mnist-onnx:${{ github.sha }}" >> "$GITHUB_OUTPUT"

      - name: Push Docker image to ECR
        id: push-image
        run: |
          docker push ${{ steps.login-ecr.outputs.registry }}/mnist-onnx:${{ github.sha }}
Explanation:
The workflow is triggered on pushes to the main branch.
It checks out the code, configures AWS credentials, and logs in to ECR.
It builds the Docker image and tags it with the commit SHA.
It pushes the Docker image to ECR.
GitHub Actions automates the build and deployment process, ensuring consistent and scalable deployments.
Module 8: Serverless Deployment with AWS Lambda & API Gateway
Article: Deploying Models as Serverless Functions with AWS Lambda and API Gateway
Introduction:
AWS Lambda allows running code without managing servers, and API Gateway creates API endpoints for accessing Lambda functions. This combination enables serverless deployment of machine learning models, providing scalability and cost-effectiveness.
Key Features and Benefits:
Serverless: Eliminates the need to manage servers.
Scalability: Automatically scales based on demand.
Cost-Effectiveness: Charges only for the compute time consumed.
Integration: Integrates with other AWS services.
Example:
Let's deploy our ONNX model as a Lambda function and expose it through API Gateway.
Create a Lambda Function:
Create a Lambda function in your AWS account.
Configure Lambda Function:
Set the runtime to Python 3.8.
Add the necessary dependencies (ONNX Runtime, Pillow, torchvision) as Lambda layers or include them in the deployment package.
Write the Lambda function code to load the ONNX model and perform inference.
Lambda Function Code (lambda_function.py):
Python
import json
import io
import base64

import numpy as np
import onnxruntime
from PIL import Image
import torchvision.transforms as transforms

# Load the ONNX model once per container, at cold start
session = onnxruntime.InferenceSession("model.onnx")

# Preprocessing function
def preprocess(image_bytes):
    transform = transforms.Compose([
        transforms.Resize((28, 28)),
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    # Wrap the raw bytes so PIL can read them like a file
    image = Image.open(io.BytesIO(image_bytes)).convert("L")
    image = transform(image).unsqueeze(0).numpy()
    return image

# Inference function
def predict(image_bytes):
    input_data = preprocess(image_bytes)
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name
    output = session.run([output_name], {input_name: input_data})[0]
    return np.argmax(output).item()

def lambda_handler(event, context):
    try:
        image_bytes = base64.b64decode(event["body"])
        prediction = predict(image_bytes)
        return {
            "statusCode": 200,
            "body": json.dumps({"prediction": prediction}),
        }
    except Exception as e:
        return {
            "statusCode": 500,
            "body": json.dumps({"error": str(e)}),
        }
Create an API Gateway:
Create an API Gateway to expose the Lambda function as an API.
Configure API Gateway:
Create a POST method.
Integrate the POST method with the Lambda function.
Deploy the API.
Explanation:
The Lambda function loads the ONNX model and performs inference on the input image.
The API Gateway exposes the Lambda function as an API endpoint.
AWS Lambda and API Gateway provide a serverless and scalable way to deploy machine learning models.
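Once deployed, the endpoint can be exercised by POSTing a base64-encoded image. A sketch (the API ID, region, stage, and resource path are placeholders for your own deployment):
Bash
# Encode a test image and send it to the deployed endpoint
base64 test_image.png | curl -X POST --data @- \
  "https://<api-id>.execute-api.<region>.amazonaws.com/prod/predict"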
Module 9: Monitoring with CloudWatch, Elasticsearch, and Kibana
Article: Ensuring Model Reliability with CloudWatch, Elasticsearch, and Kibana
Introduction:
Monitoring is crucial for ensuring the reliability and performance of deployed machine learning models. CloudWatch, Elasticsearch, and Kibana provide a comprehensive monitoring solution for logging, analyzing, and visualizing model metrics and logs.
Key Features and Benefits:
Logging: Collects and stores logs from Lambda functions and other AWS services.
Analysis: Enables searching and analyzing logs using Elasticsearch.
Visualization: Provides interactive dashboards for visualizing metrics and logs using Kibana.
Alerting: Configures alerts based on specific metrics or log patterns.
Example:
Let's set up monitoring for our Lambda function using CloudWatch, Elasticsearch, and Kibana.
Configure CloudWatch Logging:
Lambda automatically sends logs to CloudWatch Logs.
Set up Elasticsearch and Kibana:
Create an Amazon Elasticsearch Service (now OpenSearch Service) domain in your AWS account; Kibana is bundled with the domain.
Configure Log Streaming:
Configure CloudWatch to stream logs to Elasticsearch.
Create Kibana Dashboards:
Create Kibana dashboards to visualize metrics and logs, such as latency, error rates, and prediction distributions.
Explanation:
CloudWatch Logs collects logs from the Lambda function.
Elasticsearch indexes the logs for efficient searching and analysis.
Kibana provides dashboards for visualizing the logs and creating custom visualizations.
CloudWatch, Elasticsearch, and Kibana enable robust monitoring of deployed machine learning models.
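For the alerting piece, a CloudWatch alarm on the Lambda's built-in Errors metric is a reasonable starting point. A sketch using the AWS CLI (the function name and SNS topic ARN are placeholders):
Bash
aws cloudwatch put-metric-alarm \
  --alarm-name mnist-lambda-errors \
  --namespace AWS/Lambda \
  --metric-name Errors \
  --dimensions Name=FunctionName,Value=<your-function-name> \
  --statistic Sum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions <your-sns-topic-arn>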