Docker Comprehensive Primer
The Ultimate Guide to Containerization and Agent-Enhanced Development
Table of Contents
- Introduction & Philosophy
- Installation & Setup
- Core Concepts & Architecture
- Dockerfile & Image Creation
- Container Management
- Networking & Storage
- Docker Compose
- Security Best Practices
- Performance Optimization
- CI/CD Integration
- Agent Integration Patterns
- Troubleshooting
- Real-World Workflows
- Best Practices
Introduction & Philosophy
What is Docker?
Docker is an open-source platform that enables developers to build, deploy, and run applications in containers. Containers package up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.
Core Philosophy
"Build Once, Run Anywhere" - Docker embodies the principle that applications should be portable and consistent across all environments. For agent-enhanced development, this means:
- Consistent Agent Environments - Agents work the same way across development, testing, and production
- Isolated Agent Execution - Each agent can run in its own container with specific dependencies
- Scalable Agent Deployments - Easy to scale up/down agent workloads based on demand
- Reproducible Agent Workflows - Eliminate "works on my machine" for agent configurations
Agent-Enhanced Benefits
Docker provides unique advantages for AI agent development:
- Environment Isolation: Agents can't interfere with each other or the host system
- Dependency Management: Each agent has its exact required dependencies
- Resource Limits: Prevent runaway agents from consuming all system resources
- Security Boundaries: Sandbox agent execution for safety
- Deployment Consistency: Agents behave identically across environments
Installation & Setup
System Requirements
Operating Systems
- Linux: Ubuntu 18.04+, Debian 10+, CentOS 7+, Fedora 32+
- macOS: 10.15 Catalina or later (Intel and Apple Silicon)
- Windows: Windows 10/11 Pro, Enterprise, or Education (Build 19041+)
- WSL 2: Required for Windows support
Hardware Requirements
- RAM: 4GB minimum, 8GB recommended for agent workloads
- CPU: 2 cores minimum, 4+ cores recommended for concurrent agents
- Storage: 4GB free space minimum, SSD recommended for image layers
- Virtualization: Must be enabled in BIOS/UEFI
Installation Methods
Method 1: Docker Desktop (Recommended for Mac/Windows)
# Download from official website
# https://www.docker.com/products/docker-desktop
# After installation, verify
docker --version
docker compose version
Method 2: Docker Engine (Linux)
# Ubuntu (on Debian, replace "ubuntu" with "debian" in both URLs below)
sudo apt update
sudo apt install apt-transport-https ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
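# The plugins provide the "docker compose" and "docker buildx" subcommands used in this guide
sudo apt install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin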
# Add user to docker group (avoid sudo)
sudo usermod -aG docker $USER
newgrp docker
# Verify installation
docker run hello-world
Post-Installation for Agent Development
# Create agent development network
docker network create agent-dev-network
# Pull common base images for agents
docker pull python:3.11-slim
docker pull rust:1.70-alpine
docker pull node:18-alpine
# Verify agent-ready setup
docker run --rm python:3.11-slim python --version
Core Concepts & Architecture
Docker Objects for Agent Development
Images as Agent Templates
Images serve as templates for agent environments:
# List agent-ready images
docker images --filter "reference=*agent*"
# Build agent image
docker build -t my-agent:latest .
# Tag for different environments
docker tag my-agent:latest my-agent:dev
docker tag my-agent:latest my-agent:prod
Containers as Agent Instances
Each container runs a specific agent instance:
# Run agent container
docker run -d --name agent-worker-1 my-agent:latest
# Scale agents horizontally
docker run -d --name agent-worker-2 my-agent:latest
docker run -d --name agent-worker-3 my-agent:latest
# Monitor agent containers
docker ps --filter "name=agent-*"
Agent Networking Patterns
Isolated Agent Networks
# Create isolated network for sensitive agents
docker network create --internal secure-agents
# Connect agents to secure network
docker run -d --name secure-agent --network secure-agents my-agent:latest
Agent Communication Networks
# Create shared network for agent collaboration
docker network create agent-mesh
# Deploy coordinating agents
docker run -d --name coordinator --network agent-mesh coordinator-agent:latest
docker run -d --name worker-1 --network agent-mesh worker-agent:latest
docker run -d --name worker-2 --network agent-mesh worker-agent:latest
Dockerfile & Image Creation
Understanding Dockerfiles
A Dockerfile is a text document that contains all the commands a user could run on the command line to create an image. For agent development, Dockerfiles define the exact environment each agent needs.
Basic Dockerfile Structure
# Base image selection
FROM python:3.11-slim
# Set working directory
WORKDIR /app
# Copy requirements and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code
COPY . .
# Expose port (if applicable)
EXPOSE 8080
# Define environment variables
ENV AGENT_MODE=production
ENV LOG_LEVEL=info
# Run the agent
CMD ["python", "agent.py"]
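The Dockerfile above sets AGENT_MODE and LOG_LEVEL and then starts agent.py, but the source does not show that file. The following is a minimal sketch of how such an entry point might read those two variables. The helper name load_config and the fallback defaults are assumptions made for illustration only.

```python
import logging
import os


def load_config(env=None):
    """Read agent settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        # Defaults are assumptions; the Dockerfile sets both variables explicitly.
        "mode": env.get("AGENT_MODE", "development"),
        "log_level": env.get("LOG_LEVEL", "info").upper(),
    }


def main():
    config = load_config()
    # Unknown level names fall back to INFO instead of raising.
    level = getattr(logging, config["log_level"], None)
    if not isinstance(level, int):
        level = logging.INFO
    logging.basicConfig(level=level)
    logging.getLogger("agent").info("starting in %s mode", config["mode"])


if __name__ == "__main__":
    main()
```

Reading configuration from the environment keeps the image identical across environments; only the ENV or -e values change.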
Best Practices for Agent Dockerfiles
- Choose Minimal Base Images: Use alpine or slim variants to reduce attack surface and image size.
- Layer Caching: Put frequently changing files (like agent code) after stable layers (like dependencies).
- Multi-Stage Builds: Separate build-time dependencies from runtime.
Multi-Stage Build Example for Agents
# Build stage for compiling agent dependencies
FROM rust:1.70 AS builder
WORKDIR /app
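COPY Cargo.toml Cargo.lock ./
# Build a placeholder main.rs first so the dependency layer is cached
RUN mkdir src && echo "fn main() {}" > src/main.rs && cargo build --release
COPY src ./src
# Refresh the timestamp so cargo rebuilds the real sources
RUN touch src/main.rs && cargo build --release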
# Runtime stage - minimal image
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y ca-certificates && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY --from=builder /app/target/release/agent /usr/local/bin/agent
# Run as a non-root user (Dockerfile comments must be on their own line)
USER 1000:1000
CMD ["agent"]
Image Optimization Techniques
- Use .dockerignore: Exclude unnecessary files like .git, node_modules, and tests.
- Combine RUN Commands: Reduce layers by combining related RUN instructions.
- Remove Cache After Install: Clean package caches in the same layer as installation.
# Optimized dependency installation
RUN apt-get update && \
apt-get install -y python3-pip && \
pip install --no-cache-dir -r requirements.txt && \
apt-get purge -y --auto-remove python3-pip && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
Agent-Specific Image Patterns
For AI agents, consider including ML libraries, but keep images lean:
FROM python:3.11-slim
# Install only necessary ML components
RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
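# For GPU agents (requires the NVIDIA Container Toolkit on the host)
# FROM nvidia/cuda:11.8.0-runtime-ubuntu20.04
# RUN apt-get update && apt-get install -y python3-pip && rm -rf /var/lib/apt/lists/*
# RUN pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu118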
Container Management
Basic Container Lifecycle
Creating and Running Containers
# Run a container from an image
docker run -d --name my-agent python:3.11-slim python -c "print('Agent ready')"
# Run with port mapping
docker run -d -p 8080:8080 --name web-agent my-agent:latest
# Run with environment variables
docker run -d -e API_KEY=secret --name secure-agent my-agent:latest
# Interactive container for debugging
docker run -it --rm my-agent:latest bash
Managing Running Containers
# List running containers
docker ps
# List all containers (including stopped)
docker ps -a
# Stop a container
docker stop my-agent
# Start a stopped container
docker start my-agent
# Restart a container
docker restart my-agent
# Remove a stopped container
docker rm my-agent
Advanced Container Management for Agents
Resource Limiting
# CPU and memory limits for agents
docker run -d \
--name limited-agent \
--cpus="0.5" \
--memory="512m" \
my-agent:latest
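# GPU allocation for ML agents (requires the NVIDIA Container Toolkit on the host)
# my-gpu-agent is a hypothetical image built on nvidia/cuda:11.8.0-runtime-ubuntu20.04
# with Python and agent.py added; the stock CUDA image ships neither
docker run -d \
--gpus device=0 \
--name gpu-agent \
my-gpu-agent:latest \
python agent.py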
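The limit flags above take human-readable values. As a rough illustration of what they translate to: Docker reads memory suffixes as powers of 1024, and its documentation describes --cpus as shorthand for a CPU quota over a 100000-microsecond period. The helper names below are hypothetical, and the sketch only does the arithmetic; it does not call Docker and handles single-letter suffixes only.

```python
_UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def memory_to_bytes(value):
    """Convert a Docker-style memory string such as '512m' to bytes."""
    text = value.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # a bare number is already in bytes


def cpus_to_quota(cpus, period_us=100_000):
    """Return the CPU quota in microseconds equivalent to --cpus=<cpus>."""
    return round(cpus * period_us)


print(memory_to_bytes("512m"))  # 536870912
print(cpus_to_quota(0.5))       # 50000
```

So the limited-agent container may use at most half of one CPU's time in each scheduling period and 512 MiB of memory.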
Volume Mounting for Persistent Data
# Mount local directory for agent data
docker run -d \
-v /host/agent-data:/app/data \
--name persistent-agent \
my-agent:latest
# Named volumes for better management
docker volume create agent-logs
docker run -d \
-v agent-logs:/app/logs \
my-agent:latest
Health Checks
# Add healthcheck to Dockerfile
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8080/health || exit 1
# Inspect container health
docker inspect --format='{{.State.Health.Status}}' my-agent
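The HEALTHCHECK above assumes the agent answers on http://localhost:8080/health, but the source does not show that endpoint. A minimal sketch using only the Python standard library might look like the following. The class and function names are hypothetical, and the JSON body is an assumption; the check only relies on the 200 status.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answer GET /health with 200 and a small JSON body; 404 otherwise."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep probe requests out of the container logs


def make_server(port=8080):
    # Bind to all interfaces; the HEALTHCHECK itself only needs localhost.
    return HTTPServer(("0.0.0.0", port), HealthHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

Note that the curl-based check also needs curl inside the image, which slim base images typically do not include.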
Container Logging and Monitoring
# View container logs
docker logs my-agent
# Follow logs in real-time
docker logs -f my-agent
# Logs with timestamps
docker logs -t my-agent
# Inspect running processes
docker top my-agent
Networking & Storage
Docker Networking Fundamentals
Default Networking
- Bridge Network: Default for standalone containers (isolated by default)
- Host Network: Container shares host's network stack
- None Network: Container has no network interface
# Run on host network (high performance, less isolation)
docker run -d --network host my-agent:latest
# Run with no networking (for offline agents)
docker run -d --network none my-agent:latest
Custom Bridge Networks
# Create custom network
docker network create my-app-network
# Run containers on custom network
docker run -d --name app --network my-app-network my-app:latest
docker run -d --name db --network my-app-network postgres:13
# Containers can communicate by name
# From the app container, the database is reachable at db:5432
Advanced Networking for Agents
Multi-Container Agent Communication
# Create agent-specific network
docker network create agent-network
# Deploy agents with service discovery
docker run -d --name coordinator --network agent-network coordinator:latest
docker run -d --name worker1 --network agent-network worker:latest
docker run -d --name worker2 --network agent-network worker:latest
# Workers can reach coordinator by name
# curl coordinator:8080/api
External Network Access
# docker-compose.yml with external network
version: '3.8'
services:
  agent:
    image: my-agent:latest
    networks:
      - default
      - external-monitoring
networks:
  external-monitoring:
    external: true
    name: monitoring-net
Storage and Volumes
Volume Types
- Named Volumes: Managed by Docker, best for persistent data
- Bind Mounts: Map host directories, good for development
- Tmpfs Mounts: In-memory storage, temporary data
# Create named volume
docker volume create app-data
# Run with named volume
docker run -d -v app-data:/app/data my-app:latest
# Bind mount for development
docker run -d -v $(pwd)/src:/app/src -w /app/src my-app:latest
Agent Data Persistence Patterns
# Persistent agent state
docker volume create agent-state
docker run -d \
-v agent-state:/app/state \
-v /var/log/agents:/app/logs \
my-agent:latest
# Shared volumes for agent collaboration
docker volume create shared-knowledge
docker run -d -v shared-knowledge:/knowledge worker1:latest
docker run -d -v shared-knowledge:/knowledge worker2:latest
Backup and Restore Volumes
# Backup volume
docker run --rm -v agent-data:/source -v $(pwd):/backup \
alpine tar czf /backup/agent-data.tar.gz -C /source .
# Restore volume
docker run --rm -v agent-data:/target -v $(pwd):/backup \
alpine tar xzf /backup/agent-data.tar.gz -C /target
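Before restoring into a volume, it can help to confirm that the archive opens and contains what is expected. The sketch below uses Python's tarfile module for that; list_backup and verify_backup are hypothetical helper names, not part of Docker.

```python
import tarfile


def list_backup(archive_path):
    """Return the regular-file member names stored in a .tar.gz backup."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


def verify_backup(archive_path, required=()):
    """True if the archive opens, holds files, and contains every required name."""
    try:
        names = list_backup(archive_path)
    except (tarfile.TarError, OSError, EOFError):
        return False
    return bool(names) and all(name in names for name in required)
```

An archive created with tar -C /source . usually stores member names with a leading ./ prefix, so required names would be written as, for example, ./state.json.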
Storage Drivers
- Overlay2 (default on modern Linux): Good performance, copy-on-write
- AUFS: Legacy, avoid for new setups
- Btrfs/ZFS: Advanced features, filesystem-specific
# Check storage driver
docker info | grep 'Storage Driver'
# Configure storage driver (daemon.json)
# {
# "storage-driver": "overlay2"
# }
Docker Compose
Introduction to Docker Compose
Docker Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application's services.
Basic docker-compose.yml Structure
version: '3.8'
services:
  web:
    build: .
    ports:
      - "8080:80"
    depends_on:
      - db
  db:
    image: postgres:13
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
    volumes:
      - db-data:/var/lib/postgresql/data
volumes:
  db-data:
Running Compose Applications
# Start services
docker compose up -d
# View logs
docker compose logs -f web
# Scale services (a service that publishes a fixed host port cannot run several replicas on one host)
docker compose up -d --scale web=3
# Stop services
docker compose down
# Remove volumes (data loss warning)
docker compose down -v
Compose for Agent Orchestration
Simple Agent Swarm
version: '3.8'
services:
  coordinator:
    build: ./coordinator
    ports:
      - "8080:8080"
    environment:
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis
  worker:
    build: ./worker
    deploy:
      replicas: 5
    environment:
      - COORDINATOR_URL=http://coordinator:8080
    depends_on:
      - coordinator
      - redis
  redis:
    image: redis:alpine
    volumes:
      - redis-data:/data
volumes:
  redis-data:
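The list form of depends_on used above only controls start order; it does not wait for the coordinator to accept requests. A worker can compensate by retrying with exponential backoff at startup. The sketch below is one way to do that. The function names are hypothetical, and the /health path on the coordinator is an assumption that the source does not confirm.

```python
import os
import time
import urllib.error
import urllib.request


def wait_until_ready(probe, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call probe() until it returns True, doubling the delay after each failure."""
    delay = base_delay
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2
    return False


def coordinator_is_up():
    # COORDINATOR_URL comes from the Compose file; the /health path is an assumption.
    url = os.environ.get("COORDINATOR_URL", "http://coordinator:8080") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    if not wait_until_ready(coordinator_is_up):
        raise SystemExit("coordinator did not become ready")
```

Passing sleep as a parameter keeps the retry loop easy to test without real delays.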
Development Compose Setup
version: '3.8'
services:
  agent-dev:
    build:
      context: .
      dockerfile: Dockerfile.dev
    volumes:
      - .:/app
      - /app/node_modules  # Avoid overwriting node_modules
    ports:
      - "8000:8000"
      - "9229:9229"  # Node debugger
    environment:
      - DEBUG=true
      - NODE_ENV=development
    command: npm run dev
  mock-api:
    image: postman/newman:alpine
    volumes:
      - ./mocks:/etc/newman
    command: run collection.json
  database:
    image: postgres:15
    environment:
      POSTGRES_DB: agent_dev
      POSTGRES_USER: dev
      POSTGRES_PASSWORD: dev
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
Advanced Compose Features
Health Checks in Compose
services:
  agent:
    image: my-agent:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    depends_on:
      db:
        condition: service_healthy
Resource Constraints
services:
  cpu-intensive-agent:
    image: compute-agent:latest
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '0.5'
          memory: 1G
Secrets Management
# docker-compose.yml
version: '3.8'
services:
  agent:
    image: my-agent:latest
    secrets:
      - api_key
      - db_password
secrets:
  api_key:
    file: ./secrets/api_key.txt
  db_password:
    file: ./secrets/db_pass.txt
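Inside the container, Compose mounts each secret as a file under /run/secrets/ named after the secret. A minimal sketch of how the agent might read one is shown here. The helper name read_secret and the environment-variable fallback are assumptions for illustration, intended for local runs without Compose secrets.

```python
import os
from pathlib import Path


def read_secret(name, secrets_dir="/run/secrets"):
    """Return the secret's contents, or fall back to an upper-cased env var."""
    path = Path(secrets_dir) / name
    if path.is_file():
        return path.read_text(encoding="utf-8").strip()
    # Fallback for local runs without mounted secrets; an assumption of this sketch.
    value = os.environ.get(name.upper())
    if value is None:
        raise KeyError(f"secret {name!r} not found")
    return value
```

With the Compose file above, read_secret("api_key") would return the contents of ./secrets/api_key.txt from the host.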
Compose Override Patterns
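# docker-compose.yml (base)
version: '3.8'
services:
  agent:
    image: my-agent:${ENV:-dev}
    ports:
      - "8080:8080"
# docker-compose.prod.yml (overrides)
# Note: Compose appends list options such as ports when merging files,
# so the merged service publishes both 8080:8080 and 80:8080.
version: '3.8'
services:
  agent:
    image: my-agent:prod
    ports:
      - "80:8080"
    environment:
      - NODE_ENV=production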
# Run with overrides
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
Security Best Practices
Agent Container Security
# Secure agent Dockerfile
FROM python:3.11-slim-bullseye
# Create non-root user for agent
RUN groupadd -r agent && useradd -r -g agent agent
# Install dependencies as root
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy agent code and set ownership
COPY --chown=agent:agent agent/ /app/
WORKDIR /app
# Switch to agent user
USER agent
# Read-only filesystem (agent writes to mounted volumes only)
# docker run --read-only -v /tmp my-agent:latest
Network Security for Agents
# Secure agent network configuration
services:
  public-agent:
    image: my-agent:latest
    networks:
      - public-facing
      - internal-agents
  private-agent:
    image: sensitive-agent:latest
    networks:
      - internal-agents
      # No external network access
  data-agent:
    image: data-processor:latest
    networks:
      - internal-agents
      - database-network
networks:
  public-facing:
  internal-agents:
    internal: true  # No external access
  database-network:
    internal: true
Resource Limits for Agent Safety
# Resource-limited agent deployment
services:
  worker-agent:
    image: worker-agent:latest
    deploy:
      resources:
        limits:
          cpus: '0.5'    # Prevent CPU hogging
          memory: 512M   # Limit memory usage
        reservations:
          cpus: '0.1'
          memory: 128M
    security_opt:
      - no-new-privileges:true  # Prevent privilege escalation
    cap_drop:
      - ALL  # Drop all capabilities
    cap_add:
      - NET_BIND_SERVICE  # Only add what's needed
Additional Security Measures
- Regular Scanning: Use tools like Trivy or Clair to scan images for vulnerabilities.
- Signing Images: Use Docker Content Trust or cosign for image signing.
- Runtime Security: Implement tools like Falco for runtime behavior monitoring.
- Secrets Management: Never hardcode secrets; use Docker secrets or external managers like Vault.
# Scan image for vulnerabilities
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
aquasec/trivy image my-agent:latest
# Enable content trust
export DOCKER_CONTENT_TRUST=1
docker pull my-agent:latest
Performance Optimization
Agent Image Optimization
# Multi-stage build for minimal agent images
FROM python:3.11 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt
FROM python:3.11-slim AS runtime
# Copy only what's needed
COPY --from=builder /root/.local /root/.local
COPY agent/ /app/
WORKDIR /app
# Ensure agent can find packages
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "main.py"]
Agent Resource Monitoring
# Monitor agent resource usage
docker stats --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.Name}}" \
$(docker ps --filter "name=agent-*" -q)
# Agent-specific monitoring script
#!/bin/bash
while true; do
echo "$(date): Agent Resource Usage"
docker stats --no-stream --format "{{.Container}}: CPU {{.CPUPerc}}, Memory {{.MemUsage}}" \
$(docker ps --filter "name=agent-*" -q)
sleep 30
done
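The monitoring script above prints one line per container in the form "<container>: CPU <percent>, Memory <usage>". To act on that output, for example to flag agents above a CPU threshold, the lines can be parsed as in the sketch below. The function names, the 80% threshold, and the sample lines in the usage note are assumptions for illustration.

```python
import re

LINE = re.compile(r"^(?P<container>\S+): CPU (?P<cpu>[\d.]+)%, Memory (?P<mem>.+)$")


def parse_stats_line(line):
    """Split one formatted stats line into container, CPU percent, and memory text."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    return {
        "container": match["container"],
        "cpu_percent": float(match["cpu"]),
        "memory": match["mem"],
    }


def busy_containers(lines, cpu_threshold=80.0):
    """Return IDs of containers whose CPU usage exceeds the threshold."""
    parsed = (parse_stats_line(line) for line in lines)
    return [p["container"] for p in parsed if p and p["cpu_percent"] > cpu_threshold]
```

For instance, given the lines "a1b2c3: CPU 93.10%, Memory 410MiB / 512MiB" and "d4e5f6: CPU 2.05%, Memory 50MiB / 512MiB", busy_containers would return only the first container ID. Lines that do not match the format are skipped.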
High-Performance Agent Networking
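# Create optimized network for agent communication
# (an MTU of 9000 only helps if the host network supports jumbo frames; otherwise keep the default of 1500)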
docker network create \
--driver bridge \
--opt com.docker.network.driver.mtu=9000 \
--opt com.docker.network.bridge.name=agent-br0 \
high-perf-agents
# Deploy agents with performance tuning
docker run -d \
--name fast-agent \
--network high-perf-agents \
--memory=1g \
--cpus=2 \
my-agent:latest
Build and Runtime Optimizations
- Layer Caching: Order Dockerfile instructions to maximize cache hits.
- Parallel Builds: Use BuildKit for parallel layer building.
- Image Pruning: Regularly clean up unused images, containers, and volumes.
# Enable BuildKit for faster builds (already the default builder since Docker Engine 23.0)
export DOCKER_BUILDKIT=1
docker build -t my-agent:latest .
# Prune unused resources (removes all unused images and volumes; data in those volumes is lost)
docker system prune -a --volumes
# Analyze image layers
docker history my-agent:latest
CI/CD Integration
Docker in CI/CD Pipelines
Docker enables consistent environments across CI, testing, and deployment stages.
GitHub Actions Example
# .github/workflows/docker-ci.yml
name: Docker CI/CD
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Login to DockerHub
        uses: docker/login-action@v2
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: user/app:latest,user/app:${{ github.sha }}
      - name: Run tests
        run: |
          docker run --rm user/app:${{ github.sha }} npm test
  deploy:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - name: Deploy to staging
        run: |
          # Deploy to Kubernetes, ECS, etc.
          kubectl set image deployment/app agent=user/app:${{ github.sha }}
GitLab CI Example
# .gitlab-ci.yml
stages:
  - build
  - test
  - deploy
variables:
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"
build:
  stage: build
  image: docker:20.10
  services:
    - docker:20.10-dind
  script:
    # Authenticate with the project's registry before pushing
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t $CI_REGISTRY_IMAGE:latest .
    - docker push $CI_REGISTRY_IMAGE:latest
  only:
    - main
test:
  stage: test
  image: docker:20.10
  services:
    - docker:20.10-dind
  script:
    # A private registry also requires a login before pulling
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker run --rm $CI_REGISTRY_IMAGE:latest npm test
deploy:
  stage: deploy
  image: alpine
  script:
    - apk add --no-cache kubectl
    - kubectl set image deployment/agent agent=$CI_REGISTRY_IMAGE:latest
  only:
    - main
  environment:
    name: production
Automated Testing with Docker
Integration Testing
# docker-compose.test.yml
version: '3.8'
services:
  agent:
    build: .
    environment:
      - TEST_MODE=true
    depends_on:
      - test-db
  test-db:
    image: postgres:15
    environment:
      POSTGRES_DB: test
      POSTGRES_USER: test
      POSTGRES_PASSWORD: test
  tests:
    image: node:18-alpine
    volumes:
      - .:/app
    working_dir: /app
    command: npm run test:integration
    depends_on:
      - agent
      - test-db
# Run integration tests
docker compose -f docker-compose.test.yml up --abort-on-container-exit tests
Security Scanning in CI
# Add to CI pipeline
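security-scan:
  stage: test
  image:
    name: aquasec/trivy:latest
    # Clear the image's trivy entrypoint so GitLab can run the script in a shell
    entrypoint: [""]
  script:
    - trivy image --exit-code 1 --no-progress $CI_REGISTRY_IMAGE:latest
  allow_failure: false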
Deployment Strategies
Blue-Green Deployments
#!/bin/bash
# blue-green-deploy.sh
OLD_VERSION=$(kubectl get deployment agent -o jsonpath='{.spec.template.spec.containers[0].image}')
NEW_VERSION="my-agent:${CI_COMMIT_SHA}"
# Deploy green
kubectl apply -f agent-green.yaml # image: $NEW_VERSION
# Health check
kubectl rollout status deployment/agent-green
# Switch traffic
kubectl patch service agent -p '{"spec":{"selector":{"version":"green"}}}'
# Cleanup blue
kubectl delete deployment agent-blue
Canary Releases
# k8s canary deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-canary
spec:
  replicas: 2  # 10% traffic
  selector:
    matchLabels:
      app: agent
      version: canary
  template:
    metadata:
      labels:
        app: agent
        version: canary
    spec:
      containers:
        - name: agent
          image: my-agent:v1.1.0-canary
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-stable
spec:
  replicas: 18  # 90% traffic
  selector:
    matchLabels:
      app: agent
      version: stable
  template:
    metadata:
      labels:
        app: agent
        version: stable
    spec:
      containers:
        - name: agent
          image: my-agent:v1.0.0
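The 10%/90% split above comes from the replica ratio (2 of 20 pods), which assumes a Service that selects only app: agent so requests spread roughly evenly across all pods. The arithmetic for other ratios can be sketched as follows; canary_split is a hypothetical helper, and keeping at least one canary pod is a design choice of this sketch.

```python
def canary_split(total_replicas, canary_percent):
    """Return (canary, stable) replica counts for a replica-ratio canary."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    canary = round(total_replicas * canary_percent / 100)
    if canary_percent > 0 and total_replicas > 0:
        canary = max(canary, 1)  # keep at least one canary pod
    return canary, total_replicas - canary


print(canary_split(20, 10))  # (2, 18)
```

With few replicas the achievable percentages are coarse: one canary pod out of four already receives about a quarter of the traffic.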
Rollback Strategies
# Rollback to previous version
kubectl rollout undo deployment/agent
# Verify rollback
kubectl rollout status deployment/agent
kubectl logs deployment/agent -f
# Image-based rollback
kubectl set image deployment/agent agent=my-agent:v1.0.0
kubectl rollout status deployment/agent
Agent Integration Patterns
Single Agent Container
# Dockerfile for basic agent
FROM python:3.11-slim
# Install agent dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy agent code
WORKDIR /app
COPY agent/ .
# Set up agent user (security)
RUN useradd -m agent
USER agent
# Health check for agent
HEALTHCHECK --interval=30s --timeout=10s \
CMD python health_check.py || exit 1
# Run agent
CMD ["python", "main.py"]
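The HEALTHCHECK above runs health_check.py, which the source does not show. One possible shape for it, assuming the agent exposes an HTTP endpoint at http://localhost:8080/health (an assumption, since this Dockerfile does not declare a port), is sketched below. The function name is_healthy is hypothetical.

```python
import sys
import urllib.error
import urllib.request


def is_healthy(url="http://localhost:8080/health", timeout=5, opener=urllib.request.urlopen):
    """Return True when the endpoint answers with HTTP 200."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    # Exit code 0 marks the container healthy, 1 unhealthy.
    sys.exit(0 if is_healthy() else 1)
```

Using the standard library avoids adding curl to the image just for the probe. The timeout here should stay below the HEALTHCHECK --timeout value so the script reports failure itself.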
Multi-Agent Orchestration
# docker-compose.yml for agent swarm
version: '3.8'
services:
  coordinator:
    build: ./coordinator
    environment:
      - AGENT_NETWORK=agent-mesh
    networks:
      - agent-mesh
    depends_on:
      - redis
  worker-agents:
    build: ./worker
    deploy:
      replicas: 3
    environment:
      - COORDINATOR_URL=http://coordinator:8080
    networks:
      - agent-mesh
    depends_on:
      - coordinator
  redis:
    image: redis:alpine
    networks:
      - agent-mesh
networks:
  agent-mesh:
    driver: bridge
Agent Development Environment
# docker-compose.dev.yml
version: '3.8'
services:
  agent-dev:
    build:
      context: .
      dockerfile: Dockerfile.dev
    volumes:
      # Mount source for live reload
      - ./agent:/app/agent
      # Persistent agent data
      - agent-data:/app/data
    ports:
      - "8080:8080"  # Agent API
      - "9229:9229"  # Debug port
    environment:
      - DEBUG=true
      - LOG_LEVEL=debug
    networks:
      - dev-network
  mock-services:
    image: wiremock/wiremock:latest
    ports:
      - "8081:8080"
    volumes:
      - ./mocks:/home/wiremock
    networks:
      - dev-network
volumes:
  agent-data:
networks:
  dev-network:
    driver: bridge
Production Agent Deployment
#!/bin/bash
# deploy-agents.sh
# Build production images
docker build -t agent:${VERSION} .
# Deploy with rolling update
docker service update --image agent:${VERSION} production-agents
# Health check deployment
timeout 60 bash -c 'until docker service ps production-agents | grep Running; do sleep 1; done'
echo "Agents deployed successfully"
Troubleshooting
Common Agent Issues
Agent Won't Start
# Check agent logs
docker logs agent-container-name
# Check agent health
docker exec agent-container-name python health_check.py
# Inspect agent configuration
docker inspect agent-container-name | grep -A 10 "Env"
Agent Performance Issues
# Monitor agent resources in real-time
docker stats agent-container-name
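# Check agent system calls (strace must be installed in the image;
# some setups also need the container started with --cap-add=SYS_PTRACE)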
docker exec agent-container-name strace -p 1
# Profile agent memory usage
docker exec agent-container-name cat /proc/1/status
Agent Network Problems
# Test agent connectivity
docker exec agent-container-name ping other-agent
# Check agent network configuration
docker exec agent-container-name ip addr show
# Verify agent DNS resolution
docker exec agent-container-name nslookup coordinator
Agent Development Debugging
# Run agent in debug mode
docker run -it --rm \
-v $(pwd):/app \
-p 9229:9229 \
my-agent:dev \
python -m pdb main.py
# Open a shell in a running agent container for inspection
docker exec -it agent-container-name bash
# View agent logs with timestamps
docker logs -t -f agent-container-name
General Docker Troubleshooting
Image/Build Issues
# Clear build cache
docker builder prune -a
# Debug build process
docker build --no-cache --progress=plain .
# Inspect image layers and their sizes
docker history my-image:latest
Storage/Volume Problems
# List volumes
docker volume ls
# Inspect volume
docker volume inspect my-volume
# Remove unused volumes
docker volume prune
Network Debugging
# Inspect network
docker network inspect bridge
# Test connectivity between containers
docker exec container1 ping container2
# Check DNS resolution
docker run --rm busybox nslookup google.com
Common Error Messages
- "no space left on device": Clean up images/containers/volumes with docker system prune -a
- "permission denied": Add user to docker group or use sudo
- "port already in use": Check with netstat -tuln | grep :8080 and stop the conflicting process
- "image not found": Verify image name/tag, pull if necessary
- "container name already in use": Remove existing container or use different name
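For the "port already in use" case, a script can check a port before starting a container instead of parsing netstat output. The following is a minimal sketch using Python's socket module; port_in_use is a hypothetical helper, and it only detects listeners that accept TCP connections on the given host address.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print(port_in_use(8080))
```

A True result for 8080 means another process or container already listens there, so docker run -p 8080:8080 would fail until it is stopped or a different host port is chosen.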
Real-World Workflows
CI/CD Pipeline for Agents
# .github/workflows/agent-pipeline.yml
name: Agent CI/CD Pipeline
on:
  push:
    branches: [main]
    paths: ['agents/**']
jobs:
  test-agents:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build agent images
        run: |
          docker build -t test-agent:latest ./agents/
      - name: Test agent functionality
        run: |
          docker run --rm test-agent:latest python -m pytest
      - name: Security scan
        run: |
          docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
            aquasec/trivy image test-agent:latest
  deploy-agents:
    needs: test-agents
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      # Each job runs on a fresh runner, so the image built in test-agents
      # is not available here and is rebuilt before pushing
      - uses: actions/checkout@v3
      - name: Deploy to production
        run: |
          # Assumes a registry login step has run before the push
          docker build -t production-agent:${{ github.sha }} ./agents/
          docker push production-agent:${{ github.sha }}
Agent Monitoring and Observability
# Monitoring stack for agents
version: '3.8'
services:
  agents:
    image: my-agent:latest
    deploy:
      replicas: 3
    networks:
      - agent-network
      - monitoring
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    networks:
      - monitoring
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    networks:
      - monitoring
networks:
  agent-network:
  monitoring:
Agent Backup and Recovery
#!/bin/bash
# backup-agents.sh
# Backup agent configurations
docker run --rm \
-v agent-configs:/source \
-v $(pwd):/backup \
alpine tar czf /backup/agent-configs-$(date +%Y%m%d).tar.gz -C /source .
# Backup agent data
docker run --rm \
-v agent-data:/source \
-v $(pwd):/backup \
alpine tar czf /backup/agent-data-$(date +%Y%m%d).tar.gz -C /source .
# Export agent images
docker save my-agent:latest | gzip > my-agent-$(date +%Y%m%d).tar.gz
Development to Production Workflow
- Local Development: Use docker-compose.dev.yml with volume mounts
- CI Testing: Build and test images in isolated environment
- Staging Deployment: Deploy to staging cluster with production config
- Production Rollout: Blue-green or canary deployment
- Monitoring: Set up alerts for key metrics
- Rollback Plan: Automated rollback on failure detection
Best Practices
Agent Development Workflow
#!/bin/bash
# agent-dev-workflow.sh
# Start development environment
echo "Starting agent development environment..."
docker compose -f docker-compose.dev.yml up -d
# Wait for services
sleep 5
# Run agent tests
docker compose -f docker-compose.dev.yml exec agent-dev python -m pytest
# Show agent status
docker compose -f docker-compose.dev.yml ps
echo "Agent development environment ready!"
echo "Agent API: http://localhost:8080"
echo "Debug port: 9229"
Agent Production Checklist
- Agent runs as non-root user
- Resource limits configured
- Health checks implemented
- Logging configured
- Security scanning passed
- Performance benchmarks met
- Backup strategy in place
- Monitoring configured
- Documentation updated
Agent Security Guidelines
- Principle of Least Privilege: Agents should have minimal required permissions
- Network Isolation: Use internal networks for agent communication
- Resource Limits: Prevent agents from consuming all system resources
- Input Validation: Validate all inputs before processing
- Secret Management: Use Docker secrets or external secret managers
- Regular Updates: Keep base images and dependencies updated
- Audit Logging: Log all agent actions for compliance
General Docker Best Practices
- Use Official Images: Start with trusted base images
- Keep Images Small: Multi-stage builds, minimal base images
- Scan Regularly: Automate vulnerability scanning
- Tag Consistently: Use semantic versioning for images
- Clean Up: Regularly prune unused resources
- Document: Maintain Dockerfiles as code with clear comments
- Test Locally: Validate images before pushing to registry
This Docker primer provides the foundation for building secure, scalable, and efficient agent-enhanced development workflows. By following these patterns and best practices, teams can confidently deploy AI agents in production environments.