
DevOps with Docker - Lessons from Production Hell

Real-world Docker lessons learned while maintaining 99.9% uptime at Intelity and scaling applications across multiple companies. Spoiler: It's not just about docker run.


When I first started containerizing applications at Bhoos Games, I thought Docker was just a fancy way to avoid “it works on my machine” problems. Three companies and countless production deployments later, I’ve learned that Docker is both your best friend and your worst nightmare - sometimes simultaneously.

The Journey: From Docker Newbie to Production Survivor

My Docker journey started at Bhoos Games when we needed to containerize our React Native game engine servers. The challenge? Multiple games, different Node.js versions, and a deployment process that involved more prayer than science.

Pro Tip: Start Small, Think Big

Don’t try to containerize your entire monolith on day one. Start with stateless services, learn the ropes, then gradually move to more complex components. Your future self will thank you.

The Dockerfile That Changed Everything

At Intelity, our conversational AI platform needed to handle multiple ML models, FastAPI endpoints, and maintain sub-second latency. Here’s the Dockerfile structure that saved our sanity:

# Multi-stage build for ML applications

# Shared base: runtime dependencies only, cached until requirements.txt changes
FROM python:3.9-slim AS base
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Development: adds dev dependencies and hot reload
FROM base AS development
COPY requirements-dev.txt .
RUN pip install --no-cache-dir -r requirements-dev.txt
COPY . .
CMD ["uvicorn", "main:app", "--reload", "--host", "0.0.0.0"]

# Production: no dev dependencies, runs as a non-root user
FROM base AS production
COPY . .
RUN useradd --create-home --shell /bin/bash app \
    && chown -R app:app /app
USER app
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Multi-stage Builds Are Your Friend

Use multi-stage builds to keep production images lean. Development dependencies don’t belong in production containers. This single change reduced our image size by 60%.

Docker Compose: Orchestrating the Chaos

At Ensemble Matrix, while working on the cheque clearing automation project, we had a complex setup: computer vision services, database cleanup tools, and monitoring systems. Docker Compose became our orchestration lifeline.

# The top-level "version" key is obsolete in the Compose Specification, so it is omitted
services:
  app:
    build: 
      context: .
      target: production
    ports:
      - "8000:8000"
    environment:
      # Placeholder credentials (here and in the db service) for local development only
      - DATABASE_URL=postgresql://user:pass@db:5432/myapp
      - REDIS_URL=redis://redis:6379
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    restart: unless-stopped
    
  db:
    image: postgres:13
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d myapp"]
      interval: 30s
      timeout: 10s
      retries: 3
      
  redis:
    image: redis:6-alpine
    volumes:
      - redis_data:/data
      
volumes:
  postgres_data:
  redis_data:

The Production Reality Check

Here’s what they don’t tell you in Docker tutorials: production is where dreams go to die. At Intelity, maintaining 99.9% uptime meant learning these lessons the hard way:

Resource Limits Are Not Optional

One rogue container brought down our entire Kubernetes cluster because we didn’t set memory limits. Always, ALWAYS set resource constraints.
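
One way to make this rule stick is a small lint step in CI that rejects manifests without limits. Here's a minimal sketch, assuming the manifest has already been parsed into a dict (for example by a YAML loader). The function name `containers_missing_limits` and the sample `manifest` are made up for illustration.

```python
from __future__ import annotations

from typing import Any


def containers_missing_limits(deployment: dict[str, Any]) -> list[str]:
    """Return names of containers that lack a CPU or memory limit."""
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    missing = []
    for container in pod_spec.get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        if "memory" not in limits or "cpu" not in limits:
            missing.append(container.get("name", "<unnamed>"))
    return missing


# Hypothetical Deployment, already parsed into a dict.
manifest = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "api",
         "resources": {"limits": {"memory": "512Mi", "cpu": "500m"}}},
        # Requests alone do not cap usage, so this one is flagged.
        {"name": "worker",
         "resources": {"requests": {"memory": "256Mi"}}},
    ]}}}
}

print(containers_missing_limits(manifest))  # ['worker']
```

Failing the build when the returned list is non-empty catches the problem before it reaches a cluster.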

Health Checks Save Lives

A container that starts doesn’t mean it’s healthy. Implement proper health checks for all your services. Your load balancer will thank you.
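
Slim images often ship without curl, so a health check command can be written in Python instead. The sketch below is one possible probe script. The default URL and the `/health` path are assumptions, so adjust them to whatever your service exposes.

```python
from __future__ import annotations

import http.client
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True only if the endpoint answers with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Refused connections, timeouts, 4xx/5xx responses, and malformed
        # URLs all count as unhealthy.
        return False


def main(argv: list[str]) -> int:
    # Hypothetical default; pass the real URL as the first argument.
    url = argv[1] if len(argv) > 1 else "http://127.0.0.1:8000/health"
    # Docker treats exit code 0 as healthy and 1 as unhealthy.
    return 0 if is_healthy(url) else 1


# To use as a script, end the file with:
#     import sys; raise SystemExit(main(sys.argv))
```

Saved as, say, `healthcheck.py`, it could back a Dockerfile `HEALTHCHECK CMD python healthcheck.py` instruction or the `test` entry of a Compose healthcheck.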

Logging Strategy Matters

Use structured logging and write logs to stdout/stderr. Let Docker’s logging driver handle rotation and forwarding, but configure it: the default json-file driver only rotates once you set max-size. JSON logs are your friend for parsing.
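
As a rough illustration, here's a stdlib-only sketch of JSON logging to stdout. The field names (`time`, `level`, `logger`, `message`) and the logger name are arbitrary choices for this example, not a standard schema.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str) -> logging.Logger:
    handler = logging.StreamHandler(sys.stdout)  # stdout, not a file
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]   # replace handlers to avoid duplicate lines
    logger.setLevel(logging.INFO)
    logger.propagate = False      # keep the root logger from re-emitting
    return logger


log = build_logger("chatbot-api")
log.info("model loaded in %.2fs", 1.5)  # emits one JSON object on stdout
```

Because every line is a complete JSON object, whatever collects the container's output can parse it without custom regexes.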

Kubernetes: When Docker Compose Isn’t Enough

At Intelity, as our conversational AI platform grew, we needed more than Docker Compose could offer. Enter Kubernetes - the orchestrator that makes you question your life choices and appreciate them simultaneously.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatbot-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: chatbot-api
  template:
    metadata:
      labels:
        app: chatbot-api
    spec:
      containers:
      - name: api
        image: myregistry/chatbot-api:1.4.2  # pinned tag (example version), never "latest"
        ports:
        - containerPort: 8000
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
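
The probes in that manifest assume the app answers on `/health` and `/ready`. Here's a dependency-free sketch of what those two endpoints might do, using the standard library rather than FastAPI to keep it self-contained. The `STATE` flag and the helper names are hypothetical, and the same logic would map onto two routes in a real framework.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Flipped to True once startup work (e.g. loading models) has finished.
# Hypothetical module-level flag; a real app would track this in its own state.
STATE = {"ready": False}


def probe_status(path: str, ready: bool) -> int:
    """Map a probe path to an HTTP status code."""
    if path == "/health":
        return 200                      # liveness: the process can answer
    if path == "/ready":
        return 200 if ready else 503    # readiness: safe to send traffic?
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        self.send_response(probe_status(self.path, STATE["ready"]))
        self.end_headers()

    def log_message(self, format, *args) -> None:
        pass  # keep probe traffic out of the logs


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    return ThreadingHTTPServer(("0.0.0.0", port), ProbeHandler)


# Usage: server = make_server(); STATE["ready"] = True; server.serve_forever()
```

Keeping the two checks separate matters: a failing liveness probe restarts the container, while a failing readiness probe only takes the pod out of rotation until it recovers.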

Observability from Day One

Implement metrics, logging, and tracing before you need them. When your chatbot starts hallucinating at 3 AM, you’ll want to know why without playing detective.
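
For the metrics part, even a tiny in-process recorder beats nothing. The sketch below times a block of code and reports a nearest-rank percentile. `LatencyRecorder` is an illustrative name, and a production setup would more likely export to a proper metrics system than keep samples in a list.

```python
import math
import time
from contextlib import contextmanager


class LatencyRecorder:
    """Collect durations in seconds and report percentiles."""

    def __init__(self) -> None:
        self.samples = []

    @contextmanager
    def measure(self):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the wrapped code raised.
            self.samples.append(time.perf_counter() - start)

    def percentile(self, pct: float) -> float:
        """Nearest-rank percentile of the recorded samples."""
        if not self.samples:
            raise ValueError("no samples recorded")
        ordered = sorted(self.samples)
        rank = max(1, math.ceil(pct / 100 * len(ordered)))
        return ordered[rank - 1]


recorder = LatencyRecorder()
with recorder.measure():
    sum(range(10_000))  # stand-in for handling a request
print(f"p95 latency: {recorder.percentile(95):.6f}s")
```

Tracking a high percentile rather than the average is what surfaces the slow requests that a sub-second latency target cares about.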

The Deployment Pipeline That Actually Works

After working across multiple companies, here’s the deployment pipeline that has saved my sanity more times than I can count:

The Golden Pipeline

  1. Code Push: Developer pushes to feature branch
  2. Build & Test: CI runs tests, builds Docker image
  3. Security Scan: Scan image for vulnerabilities
  4. Deploy to Staging: Automatic deployment to staging environment
  5. Integration Tests: Run full integration test suite
  6. Manual Approval: Team lead approves production deployment
  7. Rolling Deployment: Zero-downtime deployment to production
  8. Health Checks: Verify deployment success
  9. Rollback Ready: Automatic rollback if health checks fail
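
Steps 7 through 9 boil down to a small control loop. Here's a sketch under stated assumptions: `deploy` and `is_healthy` are hypothetical callables standing in for whatever your tooling actually does (a CLI call, an API request), and the retry counts are arbitrary.

```python
import time
from typing import Callable


def deploy_with_rollback(
    deploy: Callable[[str], None],
    is_healthy: Callable[[], bool],
    new_version: str,
    previous_version: str,
    attempts: int = 5,
    delay_seconds: float = 2.0,
) -> str:
    """Deploy new_version and return whichever version is left running."""
    deploy(new_version)
    for _ in range(attempts):
        if is_healthy():
            return new_version
        time.sleep(delay_seconds)   # give the new version time to come up
    deploy(previous_version)        # health checks never passed: roll back
    return previous_version


# Example with stand-in callables: the health check always fails,
# so the function rolls back to the previous version.
history = []
result = deploy_with_rollback(
    history.append, lambda: False, "v2", "v1", attempts=2, delay_seconds=0
)
print(result, history)  # v1 ['v2', 'v1']
```

The important design choice is that the previous version is an explicit input: a rollback only works if you know exactly what to roll back to.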

Common Pitfalls (And How to Avoid Them)

The "Latest" Tag Trap

Never use “latest” in production. I learned this when a midnight auto-deployment broke our entire platform because the image behind “latest” had changed.

Solution: Use semantic versioning or commit SHAs for image tags.
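
A pre-deploy check can enforce this automatically. The sketch below accepts digests, semantic versions, and git-SHA-looking tags, and rejects everything else. The `PINNED_TAG` pattern is an illustrative policy, not a standard, so tune it to your own tagging scheme.

```python
import re

# A short or full git SHA (7 to 40 hex chars), or a semantic version such
# as 1.4.2 or v1.4.2-rc.1. Illustrative policy only.
PINNED_TAG = re.compile(
    r"^(?:[0-9a-f]{7,40}|v?\d+\.\d+\.\d+(?:-[0-9A-Za-z.]+)?)$"
)


def is_pinned(image: str) -> bool:
    """Return True if the image reference has an immutable-looking tag or digest."""
    if "@sha256:" in image:
        return True
    # Look for the tag in the last path component only, so a registry
    # port (localhost:5000/app) is not mistaken for a tag.
    last_component = image.rsplit("/", 1)[-1]
    if ":" not in last_component:
        return False  # no tag at all means Docker falls back to "latest"
    tag = last_component.rsplit(":", 1)[1]
    return bool(PINNED_TAG.match(tag))


for ref in ("myregistry/chatbot-api:latest", "myregistry/chatbot-api:1.4.2"):
    print(ref, is_pinned(ref))
```

Run against every image reference in your manifests, this fails fast on both `:latest` and untagged images.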

The Secrets in Environment Variables

Don’t put secrets in Dockerfiles or docker-compose.yml files. They end up in your image layers and version control.

Solution: Use Docker secrets, Kubernetes secrets, or external secret management systems.
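
On the application side, this usually means reading secrets from mounted files rather than baked-in values. Here's a minimal sketch, assuming secrets are mounted under `/run/secrets` (where Docker places them) with an environment variable as a fallback. The helper name `read_secret` and the upper-casing convention are assumptions for this example.

```python
import os
from pathlib import Path
from typing import Optional


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> Optional[str]:
    """Look up a secret by name, preferring a mounted file over the environment."""
    secret_file = Path(secrets_dir) / name
    if secret_file.is_file():
        # Mounted secret files often end with a newline; strip it.
        return secret_file.read_text().strip()
    # Fallback for local development: DB_PASSWORD for "db_password".
    return os.environ.get(name.upper())


# Returns None when neither the file nor the variable exists.
db_password = read_secret("db_password")
```

The application never needs to know whether the value came from Docker, Kubernetes, or an external secret manager, as long as it shows up as a file or a variable at runtime.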

The Persistent Data Problem

Containers are ephemeral, but your data isn’t. Don’t store important data in container filesystems.

Solution: Use volumes, external databases, and proper backup strategies.

Monitoring and Debugging: Your Sanity Savers

At Intelity, with our sub-second latency requirements, monitoring isn’t optional. We monitor at two levels: container metrics and application metrics.

# Quick debugging commands that save time
# Check container logs
docker logs -f container_name

# Open a shell in a running container (use /bin/sh if the image has no bash, e.g. Alpine)
docker exec -it container_name /bin/bash

# Check container resource usage
docker stats

# Inspect container configuration
docker inspect container_name

# Check container processes
docker exec container_name ps aux

The Future: What’s Next?

Docker and containerization continue to evolve. At Intelity, we’re exploring serverless containers, edge computing with Docker, and improved security practices. The key is to stay curious and keep learning from production failures (because they will happen).

Key Takeaways

  • Start simple, but think about production from day one
  • Resource limits and health checks are non-negotiable
  • Implement observability before you need it
  • Never use “latest” tags in production
  • Automate everything, but keep rollback strategies ready
  • Learn from failures - they’re your best teachers

Conclusion

Docker transformed how we build and deploy applications, but it’s not magic. It’s a tool that requires understanding, respect, and proper implementation. The lessons I’ve learned across Bhoos Games, Ensemble Matrix, Tathyakar, and Intelity have taught me that successful containerization is 20% Docker knowledge and 80% understanding your application’s needs.

Remember: containers are not VMs, microservices are not a silver bullet, and production will always find new ways to surprise you. Embrace the chaos, monitor everything, and always have a rollback plan.
