Deploying machine learning models to production is where many data scientists struggle. In this comprehensive guide, I’ll walk through the complete process of taking a TensorFlow model from notebook to production Flask application.
From Notebook to Production
The journey typically involves several stages:
- Model training and validation
- Model optimization for inference
- Containerization
- Deployment infrastructure
- Monitoring and maintenance
Model Optimization
TensorFlow models trained for accuracy aren’t always optimized for production inference. We need to consider:
- Model quantization to reduce size
- Pruning unnecessary weights
- TensorFlow Lite for mobile/edge
- ONNX conversion for cross-platform compatibility
Our approach used quantization-aware training, reducing model size by 70% while maintaining 99% accuracy.
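As an illustration, quantization-aware training with the TensorFlow Model Optimization toolkit looks roughly like the sketch below. The optimizer, loss, and training data are placeholders, not the exact settings we used:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

def quantize_for_inference(model, train_data, train_labels):
    # Wrap the trained model with fake-quantization nodes so it learns
    # to be robust to int8 rounding during a short fine-tuning pass.
    q_aware_model = tfmot.quantization.keras.quantize_model(model)
    q_aware_model.compile(optimizer='adam',
                          loss='sparse_categorical_crossentropy',
                          metrics=['accuracy'])
    q_aware_model.fit(train_data, train_labels, epochs=1)

    # Convert to a quantized TFLite flatbuffer for deployment.
    converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    return converter.convert()
```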
Flask Application Structure
```python
import tensorflow as tf

def load_model():
    return tf.keras.models.load_model('model.h5')

# Load once at startup so every request reuses the same instance.
model = load_model()

def predict(input_data):
    # preprocess/postprocess are the application-specific transforms.
    preprocessed = preprocess(input_data)
    prediction = model.predict(preprocessed)
    return postprocess(prediction)
```
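A minimal Flask wiring around these helpers might look like the following. The `/predict` route, the `instances` field name, and the port are illustrative assumptions, not details from our app:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict_endpoint():
    payload = request.get_json(force=True)
    # 'instances' is a hypothetical field name; assumes postprocess
    # returns a JSON-serializable value.
    result = predict(payload['instances'])
    return jsonify({'prediction': result})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
```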
Docker Containerization
We created a multi-stage Docker build:
- Stage 1: Build Python dependencies
- Stage 2: Copy only production artifacts
- Final: Runtime container with Flask app
This reduced image size from 3GB to 500MB.
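A sketch of that kind of multi-stage Dockerfile is below; the base images, paths, and entrypoint are illustrative rather than our exact build:

```dockerfile
# Stage 1: build Python dependencies into an isolated prefix
FROM python:3.10-slim AS builder
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt

# Stage 2: runtime image with only production artifacts
FROM python:3.10-slim
COPY --from=builder /install /usr/local
COPY app.py model.h5 /app/
WORKDIR /app
CMD ["python", "app.py"]
```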
Deployment Options
| Option | Best For | Tradeoff |
|---|---|---|
| Traditional VMs | Full control | More overhead |
| Kubernetes | High scale | Complexity |
| AWS Lambda | Sporadic traffic | Cold starts |
| Cloud AI Platform | GCP ecosystem | Vendor lock-in |
Performance Considerations
- Model inference time: 50–200 ms, depending on model size
- Request batching for higher throughput
- Caching predictions for identical inputs (see the sketch after this list)
- Async processing for long-running inferences
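Since a fixed model version maps identical inputs to identical outputs, repeated requests can be served from an in-process cache. A minimal sketch, assuming JSON-serializable inputs (the cache size is arbitrary, and the cache must be cleared whenever the model version changes):

```python
import json
from functools import lru_cache

@lru_cache(maxsize=1024)
def _cached_predict(frozen_input):
    # lru_cache needs hashable arguments, so inputs arrive
    # as a canonical JSON string.
    return predict(json.loads(frozen_input))

def predict_with_cache(input_data):
    # Canonicalize so structurally equal inputs share a cache entry.
    return _cached_predict(json.dumps(input_data, sort_keys=True))
```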
Monitoring and Alerting
Production models require continuous monitoring:
- Prediction latency metrics
- Model accuracy tracking (data drift detection)
- Resource utilization (CPU, memory, GPU)
- Error rates and exceptions
Tools we used:
- Prometheus for metrics (instrumentation sketch below)
- Grafana for visualization
- TensorBoard for model-specific metrics
- Custom dashboards for business metrics
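To make the Prometheus piece concrete, here is a minimal instrumentation sketch using the prometheus_client library; the metric names and port are hypothetical:

```python
from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical metric names; adapt to your naming conventions.
PREDICTION_LATENCY = Histogram('prediction_latency_seconds',
                               'Time spent running inference')
PREDICTION_ERRORS = Counter('prediction_errors_total',
                            'Number of failed predictions')

def instrumented_predict(input_data):
    with PREDICTION_LATENCY.time():
        try:
            return predict(input_data)
        except Exception:
            PREDICTION_ERRORS.inc()
            raise

# Expose metrics on a dedicated port for Prometheus to scrape.
start_http_server(9100)
```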
Lessons Learned
- Test your model with production-like data
- Plan for model updates without downtime
- Monitor for data drift early
- Implement proper versioning for models (a loading sketch follows this list)
- Document preprocessing steps thoroughly
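To make the versioning lesson concrete, here is a minimal sketch that loads the newest model from a versioned directory layout. The `models/v1/`, `models/v2/`, … layout and the helper names are illustrative assumptions:

```python
import re
from pathlib import Path
import tensorflow as tf

MODEL_ROOT = Path('models')  # assumed layout: models/v1/, models/v2/, ...

def latest_version():
    # Pick the highest-numbered vN subdirectory.
    versions = [int(m.group(1)) for p in MODEL_ROOT.iterdir()
                if (m := re.fullmatch(r'v(\d+)', p.name))]
    return max(versions)

def load_versioned_model():
    version = latest_version()
    model = tf.keras.models.load_model(str(MODEL_ROOT / f'v{version}'))
    return model, version
```

Rebinding the module-level `model` name to a freshly loaded version is atomic in Python, so in-flight requests finish on the old object while new requests pick up the new one, which helps with the update-without-downtime lesson above.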
Production ML deployment is a continuous journey of improvement, monitoring, and optimization.