DevOps with Docker - Lessons from Production Hell
Real-world Docker lessons learned while maintaining 99.9% uptime at Intelity and scaling applications across multiple companies. Spoiler: It's not just about docker run.
When I first started containerizing applications at Bhoos Games, I thought Docker was just a fancy way to avoid “it works on my machine” problems. Three companies and countless production deployments later, I’ve learned that Docker is both your best friend and your worst nightmare - sometimes simultaneously.
The Journey: From Docker Newbie to Production Survivor
My Docker journey started at Bhoos Games when we needed to containerize our React Native game engine servers. The challenge? Multiple games, different Node.js versions, and a deployment process that involved more prayer than science.
Pro Tip: Start Small, Think Big
Don’t try to containerize your entire monolith on day one. Start with stateless services, learn the ropes, then gradually move to more complex components. Your future self will thank you.
The Dockerfile That Changed Everything
At Intelity, our conversational AI platform needed to handle multiple ML models, FastAPI endpoints, and maintain sub-second latency. Here’s the Dockerfile structure that saved our sanity:
# Multi-stage build for ML applications
FROM python:3.9-slim as base
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
FROM base as development
COPY requirements-dev.txt .
RUN pip install --no-cache-dir -r requirements-dev.txt
COPY . .
CMD ["uvicorn", "main:app", "--reload", "--host", "0.0.0.0"]
FROM base as production
COPY . .
RUN useradd --create-home --shell /bin/bash app \
&& chown -R app:app /app
USER app
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Multi-stage Builds Are Your Friend
Use multi-stage builds to keep production images lean. Development dependencies don’t belong in production containers. This single change reduced our image size by 60%.
Docker Compose: Orchestrating the Chaos
At Ensemble Matrix, while working on the cheque clearing automation project, we had a complex setup: computer vision services, database cleanup tools, and monitoring systems. Docker Compose became our orchestration lifeline.
version: '3.8'
services:
  app:
    build:
      context: .
      target: production
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql://user:pass@db:5432/myapp
      - REDIS_URL=redis://redis:6379
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    restart: unless-stopped

  db:
    image: postgres:13
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d myapp"]
      interval: 30s
      timeout: 10s
      retries: 3

  redis:
    image: redis:6-alpine
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:
The Production Reality Check
Here’s what they don’t tell you in Docker tutorials: production is where dreams go to die. At Intelity, maintaining 99.9% uptime meant learning these lessons the hard way:
Resource Limits Are Not Optional
One rogue container brought down our entire Kubernetes cluster because we didn’t set memory limits. Always, ALWAYS set resource constraints.
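Compose can encode these caps directly; here's a minimal sketch against the app service from the file above (the numbers are illustrative, and note that it's recent Docker Compose releases that honor deploy.resources outside Swarm - the docker run equivalents are --cpus and --memory):

```yaml
services:
  app:
    deploy:
      resources:
        limits:
          cpus: "0.50"      # throttled past half a core
          memory: 512M      # OOM-killed past this ceiling
        reservations:
          memory: 256M      # scheduler guarantees at least this much
```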
Health Checks Save Lives
A container that starts doesn’t mean it’s healthy. Implement proper health checks for all your services. Your load balancer will thank you.
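At the image level, Docker's HEALTHCHECK instruction covers this; a sketch that assumes the FastAPI image above ships curl and serves a /health route (check both assumptions against your own image):

```dockerfile
# mark the container unhealthy after 3 consecutive failed probes
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD curl -fsS http://localhost:8000/health || exit 1
```

Worth knowing: Kubernetes ignores Docker's HEALTHCHECK entirely and relies on its own liveness and readiness probes, which is why the Deployment later in this post defines them separately.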
Logging Strategy Matters
Use structured logging and send logs to stdout/stderr. Let Docker handle log rotation and forwarding. JSON logs are your friend for parsing.
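A minimal stdlib-only sketch of the JSON-to-stdout pattern (the field names are my own convention, not a standard - most teams reach for a library like structlog instead):

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""
    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)

# stdout, so Docker's logging driver captures and forwards it
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("request served")
```

One JSON object per line is the shape log collectors expect; anything multi-line (stack traces included) should be packed into a single record.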
Kubernetes: When Docker Compose Isn’t Enough
At Intelity, as our conversational AI platform grew, we needed more than Docker Compose could offer. Enter Kubernetes - the orchestrator that makes you question your life choices one day and thank yourself for them the next.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatbot-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: chatbot-api
  template:
    metadata:
      labels:
        app: chatbot-api
    spec:
      containers:
        - name: api
          image: myregistry/chatbot-api:latest
          ports:
            - containerPort: 8000
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 5
Observability from Day One
Implement metrics, logging, and tracing before you need them. When your chatbot starts hallucinating at 3 AM, you’ll want to know why without playing detective.
The Deployment Pipeline That Actually Works
After working across multiple companies, here’s the deployment pipeline that has saved my sanity more times than I can count:
The Golden Pipeline
1. Code Push: Developer pushes to feature branch
2. Build & Test: CI runs tests, builds Docker image
3. Security Scan: Scan image for vulnerabilities
4. Deploy to Staging: Automatic deployment to staging environment
5. Integration Tests: Run full integration test suite
6. Manual Approval: Team lead approves production deployment
7. Rolling Deployment: Zero-downtime deployment to production
8. Health Checks: Verify deployment success
9. Rollback Ready: Automatic rollback if health checks fail
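The build and security-scan stages above can be sketched as a hypothetical GitHub Actions job - the registry name is a placeholder and Trivy is just one scanner option among several:

```yaml
name: build-and-scan
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image tagged with the commit SHA
        run: docker build -t myregistry/chatbot-api:${{ github.sha }} .
      - name: Fail the build on known vulnerabilities
        run: trivy image --exit-code 1 --severity HIGH,CRITICAL myregistry/chatbot-api:${{ github.sha }}
      - name: Push to registry
        run: docker push myregistry/chatbot-api:${{ github.sha }}
```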
Common Pitfalls (And How to Avoid Them)
The "Latest" Tag Trap
Never use “latest” in production. I learned this when a midnight auto-deployment broke our entire platform because “latest” had changed.
Solution: Use semantic versioning or commit SHAs for image tags.
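A sketch of deriving an immutable tag from a commit SHA - the SHA below is a stand-in; in CI you'd get the real one from `git rev-parse HEAD` or your CI's built-in variable:

```shell
# stand-in commit SHA; in CI this comes from `git rev-parse HEAD`
SHA="a1b2c3d4e5f6a7b8c9d0"
# short, immutable image tag derived from it
TAG="myapp:$(printf '%s' "$SHA" | cut -c1-7)"
echo "$TAG"   # myapp:a1b2c3d
```

The point is that this tag names exactly one build forever, so "what is actually running in production right now?" always has an answer.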
The Secrets in Environment Variables
Don’t put secrets in Dockerfiles or docker-compose.yml files. They end up in your image layers and version control.
Solution: Use Docker secrets, Kubernetes secrets, or external secret management systems.
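In Kubernetes, for example, the database URL from the compose file earlier could move into a Secret (the names here are hypothetical):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:                  # stringData spares you hand-base64-encoding values
  DATABASE_URL: postgresql://user:pass@db:5432/myapp
# then, in the Deployment's container spec, pull every key in as an env var:
#   containers:
#     - name: api
#       envFrom:
#         - secretRef:
#             name: db-credentials
```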
The Persistent Data Problem
Containers are ephemeral, but your data isn’t. Don’t store important data in container filesystems.
Solution: Use volumes, external databases, and proper backup strategies.
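The compose file earlier already uses named volumes; the distinction worth internalizing is named volume versus bind mount (the init.sql path here is illustrative):

```yaml
services:
  db:
    image: postgres:13
    volumes:
      # named volume: Docker-managed, survives `docker compose down` (without -v)
      - postgres_data:/var/lib/postgresql/data
      # bind mount: a host path - fine for read-only config, risky for hot data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql:ro
volumes:
  postgres_data:
```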
Monitoring and Debugging: Your Sanity Savers
At Intelity, with our sub-second latency requirements, monitoring isn’t optional. Here’s what we monitor:
Container Metrics
- CPU and memory usage
- Network I/O
- Container restart count
- Health check status
Application Metrics
- Response times
- Error rates
- Request throughput
- Business metrics
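As a toy illustration of two of these numbers, here's how p95 latency (nearest-rank definition) and error rate fall out of a batch of request samples - the data is made up, and in practice a metrics library computes this for you:

```python
import math

# (latency_ms, status_code) per request - made-up sample data
samples = [(120, 200), (95, 200), (210, 200), (88, 500), (130, 200),
           (400, 200), (101, 200), (99, 200), (250, 500), (110, 200)]

latencies = sorted(ms for ms, _ in samples)
# nearest-rank p95: the value at the ceil(0.95 * n)-th position
rank = math.ceil(0.95 * len(latencies))
p95 = latencies[rank - 1]

# error rate: share of 5xx responses
error_rate = sum(1 for _, code in samples if code >= 500) / len(samples)

print(f"p95={p95}ms error_rate={error_rate:.0%}")  # p95=400ms error_rate=20%
```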
# Quick debugging commands that save time
# Check container logs
docker logs -f container_name
# Execute into running container
docker exec -it container_name /bin/bash
# Check container resource usage
docker stats
# Inspect container configuration
docker inspect container_name
# Check container processes
docker exec container_name ps aux
The Future: What’s Next?
Docker and containerization continue to evolve. At Intelity, we’re exploring serverless containers, edge computing with Docker, and improved security practices. The key is to stay curious and keep learning from production failures (because they will happen).
Key Takeaways
- Start simple, but think about production from day one
- Resource limits and health checks are non-negotiable
- Implement observability before you need it
- Never use “latest” tags in production
- Automate everything, but keep rollback strategies ready
- Learn from failures - they’re your best teachers
Conclusion
Docker transformed how we build and deploy applications, but it’s not magic. It’s a tool that requires understanding, respect, and proper implementation. The lessons I’ve learned across Bhoos Games, Ensemble Matrix, Tathyakar, and Intelity have taught me that successful containerization is 20% Docker knowledge and 80% understanding your application’s needs.
Remember: containers are not VMs, microservices are not a silver bullet, and production will always find new ways to surprise you. Embrace the chaos, monitor everything, and always have a rollback plan.