Docker Comprehensive Primer

The Ultimate Guide to Containerization and Agent-Enhanced Development


Table of Contents

  1. Introduction & Philosophy
  2. Installation & Setup
  3. Core Concepts & Architecture
  4. Dockerfile & Image Creation
  5. Container Management
  6. Networking & Storage
  7. Docker Compose
  8. Security Best Practices
  9. Performance Optimization
  10. CI/CD Integration
  11. Agent Integration Patterns
  12. Troubleshooting
  13. Real-World Workflows
  14. Best Practices

Introduction & Philosophy

What is Docker?

Docker is an open-source platform that enables developers to build, deploy, and run applications in containers. Containers package up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.

Core Philosophy

"Build Once, Run Anywhere" - Docker embodies the principle that applications should be portable and consistent across all environments. For agent-enhanced development, this means:

  • Consistent Agent Environments - Agents work the same way across development, testing, and production
  • Isolated Agent Execution - Each agent can run in its own container with specific dependencies
  • Scalable Agent Deployments - Easy to scale up/down agent workloads based on demand
  • Reproducible Agent Workflows - Eliminate "works on my machine" for agent configurations

Agent-Enhanced Benefits

Docker provides unique advantages for AI agent development:

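  1. Environment Isolation: Each agent runs with its own filesystem and process namespace, so agents do not interfere with each other or the host by default (containers still share the host kernel)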
  2. Dependency Management: Each agent has its exact required dependencies
  3. Resource Limits: Prevent runaway agents from consuming all system resources
  4. Security Boundaries: Sandbox agent execution for safety
  5. Deployment Consistency: Agents behave identically across environments

Installation & Setup

System Requirements

Operating Systems

  • Linux: Ubuntu 18.04+, Debian 10+, CentOS 7+, Fedora 32+
  • macOS: 10.15 Catalina or later (Intel and Apple Silicon)
  • Windows: Windows 10/11 Pro, Enterprise, or Education (Build 19041+)
  • WSL 2: Required for Windows support

Hardware Requirements

  • RAM: 4GB minimum, 8GB recommended for agent workloads
  • CPU: 2 cores minimum, 4+ cores recommended for concurrent agents
  • Storage: 4GB free space minimum, SSD recommended for image layers
  • Virtualization: Must be enabled in BIOS/UEFI

Installation Methods

Method 1: Docker Desktop (macOS, Windows, Linux)

# Download the installer from the official website
# https://www.docker.com/products/docker-desktop

# After installation, verify
docker --version
docker compose version

Method 2: Docker Engine (Linux)

# Ubuntu (on Debian, replace "ubuntu" with "debian" in the two download.docker.com URLs below)
sudo apt update
sudo apt install apt-transport-https ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install docker-ce docker-ce-cli containerd.io

# Add user to docker group (avoid sudo)
sudo usermod -aG docker $USER
newgrp docker

# Verify installation
docker run hello-world

Post-Installation for Agent Development

# Create agent development network
docker network create agent-dev-network

# Pull common base images for agents
docker pull python:3.11-slim
docker pull rust:1.70-alpine
docker pull node:18-alpine

# Verify agent-ready setup
docker run --rm python:3.11-slim python --version

Core Concepts & Architecture

Docker Objects for Agent Development

Images as Agent Templates

Images serve as templates for agent environments:

# List agent-ready images
docker images --filter "reference=*agent*"

# Build agent image
docker build -t my-agent:latest .

# Tag for different environments
docker tag my-agent:latest my-agent:dev
docker tag my-agent:latest my-agent:prod

Containers as Agent Instances

Each container runs a specific agent instance:

# Run agent container
docker run -d --name agent-worker-1 my-agent:latest

# Scale agents horizontally
docker run -d --name agent-worker-2 my-agent:latest
docker run -d --name agent-worker-3 my-agent:latest

# Monitor agent containers (the name filter matches substrings, so no wildcard is needed)
docker ps --filter "name=agent-"

Agent Networking Patterns

Isolated Agent Networks

# Create isolated network for sensitive agents
docker network create --internal secure-agents

# Connect agents to secure network
docker run -d --name secure-agent --network secure-agents my-agent:latest

Agent Communication Networks

# Create shared network for agent collaboration
docker network create agent-mesh

# Deploy coordinating agents
docker run -d --name coordinator --network agent-mesh coordinator-agent:latest
docker run -d --name worker-1 --network agent-mesh worker-agent:latest
docker run -d --name worker-2 --network agent-mesh worker-agent:latest

Dockerfile & Image Creation

Understanding Dockerfiles

A Dockerfile is a text document that contains all the commands a user could run on the command line to create an image. For agent development, Dockerfiles define the exact environment each agent needs.

Basic Dockerfile Structure

# Base image selection
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Copy requirements and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose port (if applicable)
EXPOSE 8080

# Define environment variables
ENV AGENT_MODE=production
ENV LOG_LEVEL=info

# Run the agent
CMD ["python", "agent.py"]
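
The Dockerfile above sets AGENT_MODE and LOG_LEVEL and then starts agent.py, but the primer does not show that file. The following is a minimal sketch of how such an agent might read those settings, assuming they arrive as plain environment variables; the function names and the fallback defaults are hypothetical.

```python
import logging
import os
from typing import Mapping


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read the settings the Dockerfile defines with ENV (hypothetical agent.py helper)."""
    mode = env.get("AGENT_MODE", "development")
    level_name = env.get("LOG_LEVEL", "info").upper()
    # Fall back to INFO if LOG_LEVEL is not a standard logging level name.
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.INFO
    return {"mode": mode, "log_level": level}


def configure_logging(level: int) -> None:
    # basicConfig logs to stderr, which `docker logs` captures along with stdout.
    logging.basicConfig(level=level, format="%(asctime)s %(levelname)s %(message)s")
```

Keeping the parsing in a function that accepts any mapping makes it easy to test without touching the real environment.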

Best Practices for Agent Dockerfiles

  1. Choose Minimal Base Images: Use alpine or slim variants to reduce attack surface and image size.
  2. Layer Caching: Put frequently changing files (like agent code) after stable layers (like dependencies).
  3. Multi-Stage Builds: Separate build-time dependencies from runtime.

Multi-Stage Build Example for Agents

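# Build stage for compiling agent dependencies
FROM rust:1.70 AS builder
WORKDIR /app
COPY Cargo.toml Cargo.lock ./
# Build against a stub main.rs first so the dependency layer is cached
RUN mkdir src && echo "fn main() {}" > src/main.rs && cargo build --release
COPY src ./src
# Refresh the timestamp so cargo rebuilds the real sources
RUN touch src/main.rs && cargo build --release

# Runtime stage - minimal image
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY --from=builder /app/target/release/agent /usr/local/bin/agent
# Run as a non-root user (Dockerfiles do not support trailing comments on an instruction)
USER 1000:1000
CMD ["agent"]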

Image Optimization Techniques

  • Use .dockerignore: Exclude unnecessary files like .git, node_modules, tests.
  • Combine RUN Commands: Reduce layers by combining related RUN instructions.
  • Remove Cache After Install: Clean package caches in the same layer as installation.

# Optimized dependency installation
RUN apt-get update && \
    apt-get install -y python3-pip && \
    pip install --no-cache-dir -r requirements.txt && \
    apt-get purge -y --auto-remove python3-pip && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

Agent-Specific Image Patterns

For AI agents, consider including ML libraries, but keep images lean:

FROM python:3.11-slim

# Install only necessary ML components
RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

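# For GPU agents (requires the NVIDIA Container Toolkit on the host)
# FROM nvidia/cuda:11.8.0-runtime-ubuntu20.04
# The CUDA base image ships without Python; install python3 and pip before this step
# RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118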

Container Management

Basic Container Lifecycle

Creating and Running Containers

# Run a container from an image
docker run -d --name my-agent python:3.11-slim python -c "print('Agent ready')"

# Run with port mapping
docker run -d -p 8080:8080 --name web-agent my-agent:latest

# Run with environment variables
docker run -d -e API_KEY=secret --name secure-agent my-agent:latest

# Interactive container for debugging
docker run -it --rm my-agent:latest bash

Managing Running Containers

# List running containers
docker ps

# List all containers (including stopped)
docker ps -a

# Stop a container
docker stop my-agent

# Start a stopped container
docker start my-agent

# Restart a container
docker restart my-agent

# Remove a stopped container
docker rm my-agent
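
docker stop sends SIGTERM to the container's main process and follows up with SIGKILL once the grace period (10 seconds by default) runs out. An agent that ignores SIGTERM is therefore killed mid-task. The sketch below shows one way a Python agent could exit its work loop cleanly; ShutdownFlag and run_agent are hypothetical names, not part of any library.

```python
import signal
import threading


class ShutdownFlag:
    """Set when the process receives SIGTERM or SIGINT."""

    def __init__(self) -> None:
        self._event = threading.Event()

    def install(self) -> None:
        # Must be called from the main thread; signal.signal enforces this.
        signal.signal(signal.SIGTERM, self._handle)
        signal.signal(signal.SIGINT, self._handle)

    def _handle(self, signum, frame) -> None:
        self._event.set()

    def requested(self) -> bool:
        return self._event.is_set()

    def wait(self, timeout: float) -> bool:
        # Sleeps up to `timeout` seconds, returning early on shutdown.
        return self._event.wait(timeout)


def run_agent(flag: ShutdownFlag, do_work, interval: float = 1.0) -> int:
    """Call do_work() until shutdown is requested; return the iteration count."""
    iterations = 0
    while not flag.requested():
        do_work()
        iterations += 1
        flag.wait(interval)
    return iterations
```

For the signal to reach the agent, it has to run as the container's main process, which is what the exec-form CMD ["python", "agent.py"] provides.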

Advanced Container Management for Agents

Resource Limiting

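# CPU and memory limits for agents
docker run -d \
  --name limited-agent \
  --cpus="0.5" \
  --memory="512m" \
  my-agent:latest

# GPU allocation for ML agents (requires the NVIDIA Container Toolkit).
# my-gpu-agent is a placeholder for an image built on a CUDA base that also
# bundles Python; the plain nvidia/cuda runtime image does not include it.
docker run -d \
  --gpus device=0 \
  --name gpu-agent \
  my-gpu-agent:latest \
  python agent.py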
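
An agent can read the limit set with --memory from inside the container, for example to size caches or batch sizes. The following sketch assumes the standard Linux cgroup file locations (memory.max for cgroup v2, memory.limit_in_bytes for cgroup v1); the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Standard cgroup locations as seen from inside a Linux container.
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")


def parse_limit(raw: str) -> Optional[int]:
    """Return the limit in bytes, or None when no limit is set."""
    value = raw.strip()
    if value == "max":  # cgroup v2 spelling of "unlimited"
        return None
    limit = int(value)
    # cgroup v1 reports a huge number (close to 2**63) when unlimited.
    return None if limit >= 2**60 else limit


def memory_limit_bytes() -> Optional[int]:
    """Try cgroup v2 first, then v1; return None if neither file is readable."""
    for path in (CGROUP_V2_LIMIT, CGROUP_V1_LIMIT):
        try:
            return parse_limit(path.read_text())
        except (OSError, ValueError):
            continue
    return None
```

With --memory="512m", parse_limit would see 536870912, since Docker interprets the m suffix as mebibytes.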

Volume Mounting for Persistent Data

# Mount local directory for agent data
docker run -d \
  -v /host/agent-data:/app/data \
  --name persistent-agent \
  my-agent:latest

# Named volumes for better management
docker volume create agent-logs
docker run -d \
  -v agent-logs:/app/logs \
  my-agent:latest

Health Checks

# Add a health check to the Dockerfile (curl must be installed in the image;
# slim base images do not include it)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:8080/health || exit 1

# Inspect container health from the host
docker inspect --format='{{.State.Health.Status}}' my-agent
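
The health check above probes http://localhost:8080/health, so the agent has to serve that path. Here is a minimal sketch using only the Python standard library; the handler, the JSON body, and make_health_server are assumptions for illustration, not something the primer prescribes.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
        else:
            body = b"not found"
            self.send_response(404)
            self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # keep health probes out of the agent's logs


def make_health_server(port: int = 8080) -> ThreadingHTTPServer:
    # Bind to all interfaces so the endpoint is reachable inside the container
    # and through a published port.
    return ThreadingHTTPServer(("0.0.0.0", port), HealthHandler)
```

An agent would typically run make_health_server().serve_forever() in a daemon thread next to its main loop, and return a non-200 status when it considers itself unhealthy.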

Container Logging and Monitoring

# View container logs
docker logs my-agent

# Follow logs in real-time
docker logs -f my-agent

# Logs with timestamps
docker logs -t my-agent

# Inspect running processes
docker top my-agent

Networking & Storage

Docker Networking Fundamentals

Default Networking

  • Bridge Network: Default for standalone containers; isolated from the host network, and containers on the default bridge can reach each other by IP address but not by name
  • Host Network: Container shares host's network stack
  • None Network: Container has no network interface

# Run on host network (high performance, less isolation)
docker run -d --network host my-agent:latest

# Run with no networking (for offline agents)
docker run -d --network none my-agent:latest

Custom Bridge Networks

# Create custom network
docker network create my-app-network

# Run containers on custom network
docker run -d --name app --network my-app-network my-app:latest
docker run -d --name db --network my-app-network postgres:13

# Containers can communicate by name
# From app container: curl db:5432

Advanced Networking for Agents

Multi-Container Agent Communication

# Create agent-specific network
docker network create agent-network

# Deploy agents with service discovery
docker run -d --name coordinator --network agent-network coordinator:latest
docker run -d --name worker1 --network agent-network worker:latest
docker run -d --name worker2 --network agent-network worker:latest

# Workers can reach coordinator by name
# curl coordinator:8080/api

External Network Access

# docker-compose.yml with external network
version: '3.8'
services:
  agent:
    image: my-agent:latest
    networks:
      - default
      - external-monitoring

networks:
  external-monitoring:
    external: true
    name: monitoring-net

Storage and Volumes

Volume Types

  1. Named Volumes: Managed by Docker, best for persistent data
  2. Bind Mounts: Map host directories, good for development
  3. Tmpfs Mounts: In-memory storage, temporary data

# Create named volume
docker volume create app-data

# Run with named volume
docker run -d -v app-data:/app/data my-app:latest

# Bind mount for development (quoted so paths with spaces work)
docker run -d -v "$(pwd)/src":/app/src -w /app/src my-app:latest

Agent Data Persistence Patterns

# Persistent agent state
docker volume create agent-state
docker run -d \
  -v agent-state:/app/state \
  -v /var/log/agents:/app/logs \
  my-agent:latest

# Shared volumes for agent collaboration
docker volume create shared-knowledge
docker run -d -v shared-knowledge:/knowledge worker1:latest
docker run -d -v shared-knowledge:/knowledge worker2:latest

Backup and Restore Volumes

# Backup volume
docker run --rm -v agent-data:/source -v "$(pwd)":/backup \
  alpine tar czf /backup/agent-data.tar.gz -C /source .

# Restore volume
docker run --rm -v agent-data:/target -v "$(pwd)":/backup \
  alpine tar xzf /backup/agent-data.tar.gz -C /target

Storage Drivers

  • Overlay2 (default on modern Linux): Good performance, copy-on-write
  • AUFS: Legacy, avoid for new setups
  • Btrfs/ZFS: Advanced features, filesystem-specific

# Check storage driver
docker info | grep 'Storage Driver'

# Configure storage driver (/etc/docker/daemon.json)
# {
#   "storage-driver": "overlay2"
# }

Docker Compose

Introduction to Docker Compose

Docker Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application's services.

Basic docker-compose.yml Structure

version: '3.8'

services:
  web:
    build: .
    ports:
      - "8080:80"
    depends_on:
      - db

  db:
    image: postgres:13
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    volumes:
      - db-data:/var/lib/postgresql/data

volumes:
  db-data:

Running Compose Applications

# Start services
docker compose up -d

# View logs
docker compose logs -f web

# Scale services
docker compose up -d --scale web=3

# Stop services
docker compose down

# Remove volumes (data loss warning)
docker compose down -v

Compose for Agent Orchestration

Simple Agent Swarm

version: '3.8'

services:
  coordinator:
    build: ./coordinator
    ports:
      - "8080:8080"
    environment:
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis

  worker:
    build: ./worker
    deploy:
      replicas: 5
    environment:
      - COORDINATOR_URL=http://coordinator:8080
    depends_on:
      - coordinator
      - redis

  redis:
    image: redis:alpine
    volumes:
      - redis-data:/data

volumes:
  redis-data:
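
In its short list form, depends_on only controls start order; it does not wait until Redis or the coordinator actually accepts connections. A worker can therefore start before its dependencies are ready. The sketch below shows one way a worker might wait for the endpoints named in COORDINATOR_URL and REDIS_URL; host_port and wait_for_port are hypothetical helpers, and the default ports table is an assumption.

```python
import socket
import time
from urllib.parse import urlsplit


def host_port(url: str) -> tuple:
    """Extract (host, port) from URLs such as redis://redis:6379."""
    parts = urlsplit(url)
    default_ports = {"http": 80, "https": 443, "redis": 6379}
    port = parts.port or default_ports.get(parts.scheme)
    if parts.hostname is None or port is None:
        raise ValueError(f"cannot determine host and port from {url!r}")
    return parts.hostname, port


def wait_for_port(host: str, port: int, timeout: float = 30.0, delay: float = 0.5) -> bool:
    """Retry a TCP connect until it succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=delay):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(delay)
```

A worker could call wait_for_port(*host_port(os.environ["COORDINATOR_URL"])) at startup and exit with an error if it returns False. A successful TCP connect only shows that something is listening, so an application-level health check is still the stronger signal.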

Development Compose Setup

version: '3.8'

services:
  agent-dev:
    build:
      context: .
      dockerfile: Dockerfile.dev
    volumes:
      - .:/app
      - /app/node_modules # Avoid overwriting node_modules
    ports:
      - "8000:8000"
      - "9229:9229" # Node debugger
    environment:
      - DEBUG=true
      - NODE_ENV=development
    command: npm run dev

  mock-api:
    image: postman/newman:alpine
    volumes:
      - ./mocks:/etc/newman
    command: run collection.json

  database:
    image: postgres:15
    environment:
      POSTGRES_DB: agent_dev
      POSTGRES_USER: dev
      POSTGRES_PASSWORD: dev
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

Advanced Compose Features

Health Checks in Compose

services:
  agent:
    image: my-agent:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    depends_on:
      db:
        condition: service_healthy

Resource Constraints

services:
  cpu-intensive-agent:
    image: compute-agent:latest
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '0.5'
          memory: 1G

Secrets Management

# docker-compose.yml
version: '3.8'

services:
  agent:
    image: my-agent:latest
    secrets:
      - api_key
      - db_password

secrets:
  api_key:
    file: ./secrets/api_key.txt
  db_password:
    file: ./secrets/db_pass.txt
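
Compose mounts each secret granted to a service as a file under /run/secrets/<name> inside the container, so the agent reads api_key and db_password from files rather than from the environment. A minimal sketch of such a reader follows; read_secret and its environment-variable fallback are hypothetical conveniences for local runs.

```python
import os
from pathlib import Path
from typing import Optional


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> Optional[str]:
    """Return the secret's value, or None if it is not available."""
    path = Path(secrets_dir) / name
    try:
        # Secret files often end with a newline that is not part of the value.
        return path.read_text().rstrip("\n")
    except OSError:
        pass
    # Hypothetical fallback for runs without Compose: API_KEY, DB_PASSWORD, ...
    return os.environ.get(name.upper())
```

Calling read_secret("api_key") inside the agent container above would return the contents of ./secrets/api_key.txt from the host.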

Compose Override Patterns

# docker-compose.yml (base)
version: '3.8'
services:
  agent:
    image: my-agent:${ENV:-dev}
    ports:
      - "8080:8080"

# docker-compose.prod.yml (overrides)
version: '3.8'
services:
  agent:
    image: my-agent:prod
    # Compose appends list options such as ports when merging files,
    # so this publishes 80:8080 in addition to the base file's 8080:8080
    ports:
      - "80:8080"
    environment:
      - NODE_ENV=production

# Run with overrides
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d

Security Best Practices

Agent Container Security

# Secure agent Dockerfile
FROM python:3.11-slim-bullseye

# Create non-root user for agent
RUN groupadd -r agent && useradd -r -g agent agent

# Install dependencies as root
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy agent code and set ownership
COPY --chown=agent:agent agent/ /app/
WORKDIR /app

# Switch to agent user
USER agent

# Read-only root filesystem at run time (agent writes go to tmpfs or mounted volumes only)
# docker run --read-only --tmpfs /tmp my-agent:latest

Network Security for Agents

# Secure agent network configuration
services:
  public-agent:
    image: my-agent:latest
    networks:
      - public-facing
      - internal-agents

  private-agent:
    image: sensitive-agent:latest
    networks:
      - internal-agents
    # No external network access

  data-agent:
    image: data-processor:latest
    networks:
      - internal-agents
      - database-network

networks:
  public-facing:
  internal-agents:
    internal: true # No external access
  database-network:
    internal: true

Resource Limits for Agent Safety

# Resource-limited agent deployment
services:
  worker-agent:
    image: worker-agent:latest
    deploy:
      resources:
        limits:
          cpus: '0.5' # Prevent CPU hogging
          memory: 512M # Limit memory usage
        reservations:
          cpus: '0.1'
          memory: 128M
    security_opt:
      - no-new-privileges:true # Prevent privilege escalation
    cap_drop:
      - ALL # Drop all capabilities
    cap_add:
      - NET_BIND_SERVICE # Only add what's needed

Additional Security Measures

  1. Regular Scanning: Use tools like Trivy or Clair to scan images for vulnerabilities.
  2. Signing Images: Use Docker Content Trust or cosign for image signing.
  3. Runtime Security: Implement tools like Falco for runtime behavior monitoring.
  4. Secrets Management: Never hardcode secrets; use Docker secrets or external managers like Vault.

# Scan image for vulnerabilities
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
  aquasec/trivy image my-agent:latest

# Enable content trust
export DOCKER_CONTENT_TRUST=1
docker pull my-agent:latest

Performance Optimization

Agent Image Optimization

# Multi-stage build for minimal agent images
FROM python:3.11 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

FROM python:3.11-slim AS runtime
# Copy only what's needed
COPY --from=builder /root/.local /root/.local
COPY agent/ /app/
WORKDIR /app

# Ensure agent can find packages
ENV PATH=/root/.local/bin:$PATH

CMD ["python", "main.py"]

Agent Resource Monitoring

# Monitor agent resource usage (the name filter matches substrings)
docker stats --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.Name}}" \
  $(docker ps --filter "name=agent-" -q)

# Agent-specific monitoring script (save as its own file)
#!/bin/bash
while true; do
  echo "$(date): Agent Resource Usage"
  docker stats --no-stream --format "{{.Container}}: CPU {{.CPUPerc}}, Memory {{.MemUsage}}" \
    $(docker ps --filter "name=agent-" -q)
  sleep 30
done
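
For anything beyond eyeballing the output, docker stats can emit one JSON object per container with --format '{{json .}}', which is easier to process than the table layout. The sketch below parses that output and flags busy agents; the function names and the 80% threshold are hypothetical, and the sample data in any test is illustrative rather than real output.

```python
import json
import subprocess


def parse_stats(output: str) -> list:
    """Turn `docker stats --format '{{json .}}'` output into dicts with float percentages."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        raw = json.loads(line)
        rows.append({
            "name": raw["Name"],
            "cpu": float(raw["CPUPerc"].rstrip("%")),
            "mem": float(raw["MemPerc"].rstrip("%")),
        })
    return rows


def busy_agents(rows: list, cpu_threshold: float = 80.0) -> list:
    """Names of containers at or above the CPU threshold (in percent)."""
    return [row["name"] for row in rows if row["cpu"] >= cpu_threshold]


def collect_stats() -> list:
    # Requires the docker CLI and a running daemon.
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{json .}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_stats(result.stdout)
```

CPU percentages are relative to a single core, so a container limited with --cpus=2 can legitimately report up to 200%.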

High-Performance Agent Networking

# Create optimized network for agent communication
# (an MTU of 9000 only helps if the underlying host network supports jumbo frames)
docker network create \
  --driver bridge \
  --opt com.docker.network.driver.mtu=9000 \
  --opt com.docker.network.bridge.name=agent-br0 \
  high-perf-agents

# Deploy agents with performance tuning
docker run -d \
  --name fast-agent \
  --network high-perf-agents \
  --memory=1g \
  --cpus=2 \
  my-agent:latest

Build and Runtime Optimizations

  • Layer Caching: Order Dockerfile instructions to maximize cache hits.
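  • Parallel Builds: Use BuildKit, which builds independent stages in parallel.
  • Image Pruning: Regularly clean up unused images, containers, and volumes.

# Enable BuildKit for faster builds (already the default builder since Docker Engine 23.0)
export DOCKER_BUILDKIT=1
docker build -t my-agent:latest .

# Prune unused resources (warning: -a removes all unused images, not only dangling ones,
# and --volumes removes unused volumes)
docker system prune -a --volumes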

# Analyze image layers
docker history my-agent:latest

CI/CD Integration

Docker in CI/CD Pipelines

Docker enables consistent environments across CI, testing, and deployment stages.

GitHub Actions Example

# .github/workflows/docker-ci.yml
name: Docker CI/CD

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2

      - name: Login to DockerHub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: user/app:latest,user/app:${{ github.sha }}

      - name: Run tests
        run: |
          docker run --rm user/app:${{ github.sha }} npm test

  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - name: Deploy to staging
        run: |
          # Deploy to Kubernetes, ECS, etc.
          kubectl set image deployment/app agent=user/app:${{ github.sha }}

GitLab CI Example

# .gitlab-ci.yml
stages:
  - build
  - test
  - deploy

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"

build:
  stage: build
  image: docker:20.10
  services:
    - docker:20.10-dind
  script:
    # Log in with GitLab's predefined registry credentials before pushing
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t $CI_REGISTRY_IMAGE:latest .
    - docker push $CI_REGISTRY_IMAGE:latest
  only:
    - main

test:
  stage: test
  image: docker:20.10
  services:
    - docker:20.10-dind
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker run --rm $CI_REGISTRY_IMAGE:latest npm test

deploy:
  stage: deploy
  image: alpine
  script:
    - apk add --no-cache kubectl
    - kubectl set image deployment/agent agent=$CI_REGISTRY_IMAGE:latest
  only:
    - main
  environment:
    name: production

Automated Testing with Docker

Integration Testing

# docker-compose.test.yml
version: '3.8'

services:
  agent:
    build: .
    environment:
      - TEST_MODE=true
    depends_on:
      - test-db

  test-db:
    image: postgres:15
    environment:
      POSTGRES_DB: test
      POSTGRES_USER: test
      POSTGRES_PASSWORD: test

  tests:
    image: node:18-alpine
    volumes:
      - .:/app
    working_dir: /app
    command: npm run test:integration
    depends_on:
      - agent
      - test-db

# Run integration tests; the exit code of the tests service becomes the command's exit code
docker compose -f docker-compose.test.yml up --abort-on-container-exit --exit-code-from tests

Security Scanning in CI

# Add to CI pipeline
security-scan:
  stage: test
  image:
    name: aquasec/trivy:latest
    entrypoint: [""] # Clear the image's trivy entrypoint so GitLab can run the script
  script:
    - trivy image --exit-code 1 --no-progress $CI_REGISTRY_IMAGE:latest
  allow_failure: false

Deployment Strategies

Blue-Green Deployments

#!/bin/bash
# blue-green-deploy.sh

OLD_VERSION=$(kubectl get deployment agent -o jsonpath='{.spec.template.spec.containers[0].image}')
NEW_VERSION="my-agent:${CI_COMMIT_SHA}"

# Deploy green
kubectl apply -f agent-green.yaml # image: $NEW_VERSION

# Health check
kubectl rollout status deployment/agent-green

# Switch traffic
kubectl patch service agent -p '{"spec":{"selector":{"version":"green"}}}'

# Cleanup blue
kubectl delete deployment agent-blue

Canary Releases

# k8s canary deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-canary
spec:
  replicas: 2 # 10% traffic
  selector:
    matchLabels:
      app: agent
      version: canary
  template:
    metadata:
      labels:
        app: agent
        version: canary
    spec:
      containers:
        - name: agent
          image: my-agent:v1.1.0-canary
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-stable
spec:
  replicas: 18 # 90% traffic
  selector:
    matchLabels:
      app: agent
      version: stable
  template:
    metadata:
      labels:
        app: agent
        version: stable
    spec:
      containers:
        - name: agent
          image: my-agent:v1.0.0
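
The 10% and 90% figures in the manifest follow from the replica counts: 2 of 20 pods run the canary. This holds only under the assumption that a single Service selects both Deployments (for example by the shared app: agent label) and spreads requests evenly across pods. A small sketch of the arithmetic, with hypothetical helper names:

```python
def split_replicas(total: int, canary_percent: float) -> tuple:
    """Return (canary, stable) replica counts for a replica-weighted canary."""
    if total < 2 or not 0 < canary_percent < 100:
        raise ValueError("need at least 2 replicas and a percentage between 0 and 100")
    # Keep at least one pod on each side, otherwise it is not a canary.
    canary = min(total - 1, max(1, round(total * canary_percent / 100)))
    return canary, total - canary


def traffic_share(canary: int, stable: int) -> float:
    """Approximate canary traffic share in percent under even load balancing."""
    return 100 * canary / (canary + stable)
```

For example, split_replicas(20, 10) yields the (2, 18) split used above. With small totals the achievable share is coarse: 3 replicas cannot go below one third canary traffic, which is where weighted routing at the ingress or service mesh becomes the better tool.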

Rollback Strategies

# Rollback to previous version
kubectl rollout undo deployment/agent

# Verify rollback
kubectl rollout status deployment/agent
kubectl logs deployment/agent -f

# Image-based rollback
kubectl set image deployment/agent agent=my-agent:v1.0.0
kubectl rollout status deployment/agent

Agent Integration Patterns

Single Agent Container

# Dockerfile for basic agent
FROM python:3.11-slim

# Install agent dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy agent code
WORKDIR /app
COPY agent/ .

# Set up agent user (security)
RUN useradd -m agent
USER agent

# Health check for agent
HEALTHCHECK --interval=30s --timeout=10s \
CMD python health_check.py || exit 1

# Run agent
CMD ["python", "main.py"]
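
The HEALTHCHECK above runs health_check.py, which the primer does not show. Docker treats exit code 0 as healthy and 1 as unhealthy, so the script only needs to map a probe result to one of those. A minimal sketch, assuming the agent serves an HTTP endpoint at http://localhost:8080/health as in the earlier health check example (the URL and function name are assumptions):

```python
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy: the probe must reach the local agent directly.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check_health(url: str = "http://localhost:8080/health", timeout: float = 5.0) -> int:
    """Return 0 if the endpoint answers with HTTP 200, otherwise 1."""
    try:
        with _opener.open(url, timeout=timeout) as response:
            return 0 if response.status == 200 else 1
    except (urllib.error.URLError, OSError, ValueError):
        return 1

# As a script, end the file with: raise SystemExit(check_health())
```

Using a Python probe avoids installing curl in the slim image, and the script's timeout should stay below the --timeout value given to HEALTHCHECK.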

Multi-Agent Orchestration

# docker-compose.yml for agent swarm
version: '3.8'

services:
  coordinator:
    build: ./coordinator
    environment:
      - AGENT_NETWORK=agent-mesh
    networks:
      - agent-mesh
    depends_on:
      - redis

  worker-agents:
    build: ./worker
    deploy:
      replicas: 3
    environment:
      - COORDINATOR_URL=http://coordinator:8080
    networks:
      - agent-mesh
    depends_on:
      - coordinator

  redis:
    image: redis:alpine
    networks:
      - agent-mesh

networks:
  agent-mesh:
    driver: bridge

Agent Development Environment

# docker-compose.dev.yml
version: '3.8'

services:
  agent-dev:
    build:
      context: .
      dockerfile: Dockerfile.dev
    volumes:
      # Mount source for live reload
      - ./agent:/app/agent
      # Persistent agent data
      - agent-data:/app/data
    ports:
      - "8080:8080" # Agent API
      - "9229:9229" # Debug port
    environment:
      - DEBUG=true
      - LOG_LEVEL=debug
    networks:
      - dev-network

  mock-services:
    image: wiremock/wiremock:latest
    ports:
      - "8081:8080"
    volumes:
      - ./mocks:/home/wiremock
    networks:
      - dev-network

volumes:
  agent-data:

networks:
  dev-network:
    driver: bridge

Production Agent Deployment

#!/bin/bash
# deploy-agents.sh

# Build production images
docker build -t agent:${VERSION} .

# Deploy with rolling update
docker service update --image agent:${VERSION} production-agents

# Health check deployment
timeout 60 bash -c 'until docker service ps production-agents | grep Running; do sleep 1; done'

echo "Agents deployed successfully"

Troubleshooting

Common Agent Issues

Agent Won't Start

# Check agent logs
docker logs agent-container-name

# Check agent health
docker exec agent-container-name python health_check.py

# Inspect agent configuration
docker inspect agent-container-name | grep -A 10 "Env"

Agent Performance Issues

# Monitor agent resources in real-time
docker stats agent-container-name

# Check agent system calls (requires strace to be installed in the image)
docker exec agent-container-name strace -p 1

# Profile agent memory usage
docker exec agent-container-name cat /proc/1/status

Agent Network Problems

# Test agent connectivity
docker exec agent-container-name ping other-agent

# Check agent network configuration
docker exec agent-container-name ip addr show

# Verify agent DNS resolution
docker exec agent-container-name nslookup coordinator

Agent Development Debugging

# Run agent in debug mode
docker run -it --rm \
  -v "$(pwd)":/app \
  -p 9229:9229 \
  my-agent:dev \
  python -m pdb main.py

# Open a shell in a running agent container for inspection
docker exec -it agent-container-name bash

# View agent logs with timestamps
docker logs -t -f agent-container-name

General Docker Troubleshooting

Image/Build Issues

# Clear build cache
docker builder prune -a

# Debug build process
docker build --no-cache --progress=plain .

# Inspect image layers and their sizes
docker history my-image:latest

Storage/Volume Problems

# List volumes
docker volume ls

# Inspect volume
docker volume inspect my-volume

# Remove unused volumes
docker volume prune

Network Debugging

# Inspect network
docker network inspect bridge

# Test connectivity between containers
docker exec container1 ping container2

# Check DNS resolution
docker run --rm busybox nslookup google.com

Common Error Messages

  • "no space left on device": Clean up images/containers/volumes with docker system prune -a
  • "permission denied": Add user to docker group or use sudo
  • "port already in use": Check with netstat -tuln | grep :8080 and stop conflicting process
  • "image not found": Verify image name/tag, pull if necessary
  • "container name already in use": Remove existing container or use different name

Real-World Workflows

CI/CD Pipeline for Agents

# .github/workflows/agent-pipeline.yml
name: Agent CI/CD Pipeline

on:
  push:
    branches: [main]
    paths: ['agents/**']

jobs:
  test-agents:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Build agent images
        run: |
          docker build -t test-agent:latest ./agents/

      - name: Test agent functionality
        run: |
          docker run --rm test-agent:latest python -m pytest

      - name: Security scan
        run: |
          docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
            aquasec/trivy image test-agent:latest

  deploy-agents:
    needs: test-agents
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Deploy to production
        run: |
          # Each job runs on a fresh runner, so the tested image is not available here; rebuild it.
          # Pushing assumes a registry login step has been added before this one.
          docker build -t production-agent:${{ github.sha }} ./agents/
          docker push production-agent:${{ github.sha }}

Agent Monitoring and Observability

# Monitoring stack for agents
version: '3.8'

services:
  agents:
    image: my-agent:latest
    deploy:
      replicas: 3
    networks:
      - agent-network
      - monitoring

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      # Example value only; set a real password before exposing Grafana
      - GF_SECURITY_ADMIN_PASSWORD=admin
    networks:
      - monitoring

networks:
  agent-network:
  monitoring:
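
Prometheus pulls metrics over HTTP from the targets listed in prometheus.yml, so each agent has to expose them in the Prometheus text exposition format. In practice the prometheus_client library usually handles this; the sketch below only illustrates the format with the standard library, and the metric names are hypothetical.

```python
def render_metrics(metrics: dict) -> str:
    """Render {name: (type, help, value)} in the Prometheus text exposition format."""
    lines = []
    for name, (kind, help_text, value) in sorted(metrics.items()):
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {kind}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

An agent could return this string from a /metrics handler with the content type text/plain; version=0.0.4, and prometheus.yml would then list the agents service as a scrape target on the shared monitoring network.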

Agent Backup and Recovery

#!/bin/bash
# backup-agents.sh

# Backup agent configurations
docker run --rm \
  -v agent-configs:/source \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/agent-configs-$(date +%Y%m%d).tar.gz -C /source .

# Backup agent data
docker run --rm \
  -v agent-data:/source \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/agent-data-$(date +%Y%m%d).tar.gz -C /source .

# Export agent images
docker save my-agent:latest | gzip > my-agent-$(date +%Y%m%d).tar.gz

Development to Production Workflow

  1. Local Development: Use docker-compose.dev.yml with volume mounts
  2. CI Testing: Build and test images in isolated environment
  3. Staging Deployment: Deploy to staging cluster with production config
  4. Production Rollout: Blue-green or canary deployment
  5. Monitoring: Set up alerts for key metrics
  6. Rollback Plan: Automated rollback on failure detection

Best Practices

Agent Development Workflow

#!/bin/bash
# agent-dev-workflow.sh

# Start development environment
echo "Starting agent development environment..."
docker compose -f docker-compose.dev.yml up -d

# Wait for services
sleep 5

# Run agent tests
docker compose -f docker-compose.dev.yml exec agent-dev python -m pytest

# Show agent status
docker compose -f docker-compose.dev.yml ps

echo "Agent development environment ready!"
echo "Agent API: http://localhost:8080"
echo "Debug port: 9229"

Agent Production Checklist

  • Agent runs as non-root user
  • Resource limits configured
  • Health checks implemented
  • Logging configured
  • Security scanning passed
  • Performance benchmarks met
  • Backup strategy in place
  • Monitoring configured
  • Documentation updated

Agent Security Guidelines

  1. Principle of Least Privilege: Agents should have minimal required permissions
  2. Network Isolation: Use internal networks for agent communication
  3. Resource Limits: Prevent agents from consuming all system resources
  4. Input Validation: Validate all inputs before processing
  5. Secret Management: Use Docker secrets or external secret managers
  6. Regular Updates: Keep base images and dependencies updated
  7. Audit Logging: Log all agent actions for compliance

General Docker Best Practices

  1. Use Official Images: Start with trusted base images
  2. Keep Images Small: Multi-stage builds, minimal base images
  3. Scan Regularly: Automate vulnerability scanning
  4. Tag Consistently: Use semantic versioning for images
  5. Clean Up: Regularly prune unused resources
  6. Document: Maintain Dockerfiles as code with clear comments
  7. Test Locally: Validate images before pushing to registry

This Docker primer provides the foundation for building secure, scalable, and efficient agent-enhanced development workflows. By following these patterns and best practices, teams can confidently deploy AI agents in production environments.