Kubernetes

AI-Optimized Primer | Production-ready container orchestration patterns for Claude Code, Cursor & AI agents

Primer Metadata
  • Category: Infrastructure & DevOps
  • Difficulty: Intermediate to Advanced
  • Examples: 85+ copy-paste YAML configurations
  • Tokens: 32,000+ comprehensive coverage
  • Last Updated: January 28, 2025
  • AI-Ready: Optimized for agent consumption

The Ultimate Guide to Container Orchestration and Cloud-Native Application Deployment

Tags: #kubernetes #containers #orchestration #devops #cloud-native

Table of Contents

  1. Introduction & Philosophy
  2. Installation & Setup
  3. Core Concepts & Architecture
  4. Workload Management
  5. Networking & Service Discovery
  6. Storage & Persistence
  7. Configuration Management
  8. Security Best Practices
  9. Monitoring & Observability
  10. Advanced Features
  11. Custom Resource Definitions (CRDs) and Custom Resources
  12. Helm Package Management
  13. Service Mesh
  14. Serverless and Knative
  15. Autoscaling
  16. Resource Management
  17. Multi-Cluster Management
  18. Real-World Patterns and Use Cases
  19. GitOps Workflows
  20. Troubleshooting
  21. CI/CD & GitOps Integration
  22. Best Practices
  23. Resources & Next Steps

Introduction & Philosophy

What is Kubernetes?

Kubernetes (often abbreviated as K8s) is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. It provides a framework for running distributed systems resiliently, handling scaling and failover for applications, and providing deployment patterns.

Core Philosophy

"Declarative Configuration and Desired State Management" - Kubernetes embodies the principle that you declare what you want, not how to achieve it. The system continuously works to match the actual state to your desired state.

Key philosophical principles:

  • Declarative Configuration: Describe the desired state, and Kubernetes works to achieve and maintain it
  • Automation: Automates manual processes involved in deploying, scaling, and managing containerized applications
  • Self-Healing: Automatically restarts failed containers, replaces and reschedules containers when nodes die
  • Extensibility: Designed to be extended and customized without changing the core system
  • Portability: Runs anywhere - on-premises, hybrid cloud, public cloud, and edge computing
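
A minimal illustration of the declarative loop, assuming a running cluster, a configured kubectl, and a Deployment manifest like the nginx example later in this guide (the file name and app=nginx label are illustrative):

# Declare the desired state (e.g. a Deployment with 3 replicas)
kubectl apply -f deployment.yaml

# Simulate a failure by deleting one of its pods
kubectl delete pod <one-of-the-nginx-pods>

# Watch the Deployment controller recreate it to match the declared state
kubectl get pods -l app=nginx --watch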

Key Differentiators

  1. Not a PaaS: Doesn't limit supported languages/runtimes or frameworks
  2. Container-Centric: Manages containers rather than VMs or bare metal
  3. Loosely Coupled: Components are extensible and can be swapped out
  4. Ecosystem Rich: Large and growing ecosystem of tools and integrations
  5. Cloud-Native: Designed for cloud architectures and microservices

Architecture Overview

┌────────────────────────────────────────────────────────────┐
│ Control Plane │
│ ┌─────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ API Server │ │ etcd │ │ Controller Mgr │ │
│ │ │ │ (Key-Value │ │ │ │
│ │ Frontend │ │ Store) │ │ Node Controller │ │
│ │ │ │ │ │ Replication Ctrl │ │
│ └─────────────┘ └──────────────┘ └──────────────────┘ │
│ │ │ │
│ ┌──────────────┐ ┌────────────────────┐ │
│ │ Scheduler │ │ Cloud Controller │ │
│ │ │ │ Mgr │ │
│ └──────────────┘ └────────────────────┘ │
└────────────────────────────────────────────────────────────┘

│ Watches/Communicates

┌─────────────────────────────────────────────────────────────┐
│ Worker Nodes │
│ │
│ Node 1 Node 2 Node N
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ │ kubelet │ │ kubelet │ │ kubelet │
│ │ │ │ │ │ │
│ └─────────────┘ └─────────────┘ └─────────────┘
│ │ │ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ │ kube-proxy │ │ kube-proxy │ │ kube-proxy │
│ │ │ │ │ │ │
│ └─────────────┘ └─────────────┘ └─────────────┘
│ │ │ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ │ Container │ │ Container │ │ Container │
│ │ Runtime │ │ Runtime │ │ Runtime │
│ │ (Docker/ │ │ (Docker/ │ │ (Docker/ │
│ │ containerd │ │ containerd │ │ containerd │
│ │ /CRI-O) │ │ /CRI-O) │ │ /CRI-O) │
│ └─────────────┘ └─────────────┘ └─────────────┘
│ │ │ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ │ Pods │ │ Pods │ │ Pods │
│ │ (Containers)│ │ (Containers)│ │ (Containers)│
│ └─────────────┘ └─────────────┘ └─────────────┘
└─────────────────────────────────────────────────────────────┘

Installation & Setup

System Requirements

Minimum Requirements for Production Clusters

  • CPU: 2+ CPUs per machine
  • RAM: 2GB+ RAM per machine
  • Disk: 20GB+ free disk space
  • Network: Full network connectivity between all machines
  • Unique Identifiers: Unique hostname, MAC address, and product_uuid for each node
  • Swap: Disabled (by default, the kubelet refuses to start if swap is detected; see the snippet after this list)
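
For example, on a typical Linux node you might disable swap like this before installing the kubelet (the sed pattern assumes standard /etc/fstab entries):

# Turn swap off for the running system
sudo swapoff -a

# Comment out swap entries in /etc/fstab so it stays off after a reboot
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab

# Verify: the Swap line should report 0
free -h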

Required Ports

  • Control Plane Nodes: 6443 (API server), 2379-2380 (etcd), 10250 (kubelet), 10259 (kube-scheduler), 10257 (kube-controller-manager)
  • Worker Nodes: 10250 (kubelet), 30000-32767 (NodePort services)

Local Development Setups

Minikube

# Installation
curl -LO https://github.com/kubernetes/minikube/releases/latest/download/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube

# Start cluster
minikube start --driver=docker

# Access dashboard
minikube dashboard

# Example deployment
kubectl create deployment hello-minikube --image=kicbase/echo-server:1.0
kubectl expose deployment hello-minikube --type=NodePort --port=8080

Kind (Kubernetes in Docker)

# Installation
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.30.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind

# Create single-node cluster
kind create cluster

# Create multi-node cluster
kind create cluster --config - <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
EOF

Production Installation Methods

kubeadm

# On all nodes:
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg

# Add Kubernetes repository
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.34/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.34/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

# Install Kubernetes components
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

# Initialize control plane (on master node)
sudo kubeadm init --pod-network-cidr=192.168.0.0/16

# Set up kubectl for regular user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Install network plugin (e.g., Calico)
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

Cloud Provider Options

Amazon EKS

# Install eksctl
curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin

# Create cluster
eksctl create cluster \
--name my-cluster \
--region us-east-1 \
--node-type t3.medium \
--nodes 3

Google Kubernetes Engine (GKE)

# Create cluster
gcloud container clusters create my-cluster \
--zone us-central1-a \
--machine-type e2-medium \
--num-nodes 3

# Configure kubectl
gcloud container clusters get-credentials my-cluster --zone us-central1-a

Azure Kubernetes Service (AKS)

# Create resource group
az group create --name myResourceGroup --location eastus

# Create AKS cluster
az aks create \
--resource-group myResourceGroup \
--name myAKSCluster \
--node-count 3 \
--enable-addons monitoring \
--generate-ssh-keys

# Configure kubectl
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster

Post-Installation Configuration

Common Setup Tasks

# Verify cluster status
kubectl cluster-info
kubectl get nodes

# Install Helm package manager
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Install Kubernetes Dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0/aio/deploy/recommended.yaml

Core Concepts & Architecture

Kubernetes Objects

Object Structure

Nearly every Kubernetes object includes two nested fields that govern its configuration and state:

  • spec: Describes the desired state
  • status: Describes the current state, updated by Kubernetes

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: nginx:latest
status:
  phase: Running

Common Kubernetes Objects

  1. Pod: Smallest deployable unit
  2. Service: Network abstraction for pods
  3. Deployment: Manages ReplicaSets and provides declarative updates
  4. StatefulSet: Manages stateful applications
  5. ConfigMap: Stores non-sensitive configuration data
  6. Secret: Stores sensitive data
  7. Volume: Storage for containers
  8. Namespace: Virtual cluster within a physical cluster
  9. Ingress: Manages external access to services
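
A quick way to explore these (and every other) object kind straight from the API server:

# List all object kinds the cluster serves, with short names and API groups
kubectl api-resources

# Show the documented schema for a specific kind or field
kubectl explain deployment.spec.strategy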

Namespaces

Namespaces provide a scope for names within a cluster:

# List namespaces
kubectl get namespaces

# Create namespace
kubectl create namespace my-namespace

# Set current namespace
kubectl config set-context --current --namespace=my-namespace

Labels and Selectors

Labels are key/value pairs attached to objects:

metadata:
  labels:
    app: my-app
    version: v1.2
    environment: production

Selectors are used to filter objects:

selector:
  matchLabels:
    app: my-app
  matchExpressions:
  - key: environment
    operator: In
    values:
    - production
    - staging

Annotations

Annotations provide metadata about objects:

metadata:
  annotations:
    description: "Frontend web server"
    contact: "team@example.com"
    last-modified: "2024-01-15T10:00:00Z"

Workload Management

Pods

Pods are the smallest deployable units in Kubernetes:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
  - name: nginx
    image: nginx:1.25
    ports:
    - containerPort: 80

Multi-Container Pods

apiVersion: v1
kind: Pod
metadata:
  name: multi-container-pod
spec:
  containers:
  - name: nginx
    image: nginx:1.25
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/nginx/html
  - name: content-creator
    image: busybox
    command: ["/bin/sh", "-c"]
    args:
    - |
      while true; do
        echo "$(date) - Hello from sidecar!" > /usr/share/nginx/html/index.html;
        sleep 10;
      done
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: shared-data
    emptyDir: {}

Deployments

Deployments manage ReplicaSets and provide declarative updates:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        ports:
        - containerPort: 80

Rolling Updates

# Update deployment image
kubectl set image deployment/nginx-deployment nginx=nginx:1.26

# Rollback to previous version
kubectl rollout undo deployment/nginx-deployment

# Check rollout status
kubectl rollout status deployment/nginx-deployment
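
The rollout behaviour can also be tuned on the Deployment itself; a sketch of the strategy block (the values are illustrative and should be adapted):

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra pod above the desired replica count
      maxUnavailable: 0    # never drop below the desired replica count during the update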

StatefulSets

StatefulSets for stateful applications:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-statefulset
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        env:
        - name: POSTGRES_DB
          value: mydb
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: username
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        volumeMounts:
        - name: postgres-data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: postgres-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi

DaemonSets

DaemonSets ensure all (or some) nodes run a pod:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-logging
spec:
  selector:
    matchLabels:
      name: fluentd-logging
  template:
    metadata:
      labels:
        name: fluentd-logging
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd:v1.16
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: containers
          mountPath: /var/lib/docker/containers
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: containers
        hostPath:
          path: /var/lib/docker/containers

Jobs and CronJobs

Jobs for batch processing:

apiVersion: batch/v1
kind: Job
metadata:
  name: batch-job
spec:
  template:
    spec:
      containers:
      - name: batch
        image: busybox
        command: ["echo", "Hello from batch job!"]
      restartPolicy: Never

CronJobs for scheduled tasks:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello-cronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command: ["echo", "Hello from cron job!"]
          restartPolicy: OnFailure

Networking & Service Discovery

Cluster Networking Models

Kubernetes imposes the following fundamental requirements on any networking implementation:

  • All pods can communicate with all other pods without NAT
  • All nodes can communicate with all pods without NAT
  • The IP that a pod sees itself as is the same IP that others see it as
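
A quick way to confirm the flat pod network, assuming the nginx Deployment from earlier is running (the label and test image are illustrative):

# Find the IP of one nginx pod
kubectl get pods -l app=nginx -o wide

# From a throwaway pod, reach that pod IP directly (no NAT in between)
kubectl run net-test --rm -it --image=busybox:1.36 --restart=Never -- wget -qO- http://<pod-ip>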

Service Types

ClusterIP

Default service type, exposes the service on a cluster-internal IP:

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080

NodePort

Exposes the service on each Node's IP at a static port:

apiVersion: v1
kind: Service
metadata:
  name: my-nodeport-service
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30007

LoadBalancer

Exposes the service externally using a cloud provider's load balancer:

apiVersion: v1
kind: Service
metadata:
  name: my-loadbalancer-service
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 80

Service Discovery

Kubernetes provides two primary modes of service discovery:

  • DNS: Services are assigned DNS names
  • Environment Variables: Services are available as environment variables
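
For a Service named my-service in the default namespace, the two mechanisms look roughly like this (the cluster.local suffix and the example ClusterIP are the usual defaults):

# DNS: any pod can resolve the service name
kubectl run dns-test --rm -it --image=busybox:1.36 --restart=Never -- \
  nslookup my-service.default.svc.cluster.local

# Environment variables: injected into pods created after the Service exists
kubectl exec <some-pod> -- env | grep MY_SERVICE
# MY_SERVICE_SERVICE_HOST=10.96.0.12   (example ClusterIP)
# MY_SERVICE_SERVICE_PORT=80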

Network Policies

Network policies control traffic flow between pods:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      role: db
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          project: myproject
    ports:
    - protocol: TCP
      port: 6379
  egress:
  - to:
    - ipBlock:
        cidr: 10.0.0.0/24
    ports:
    - protocol: TCP
      port: 5978

Ingress

Ingress manages external access to services:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - http:
      paths:
      - path: /app1
        pathType: Prefix
        backend:
          service:
            name: app1-service
            port:
              number: 80
      - path: /app2
        pathType: Prefix
        backend:
          service:
            name: app2-service
            port:
              number: 80

Storage & Persistence

Volume Types

emptyDir

EmptyDir is created when a Pod is assigned to a node and exists as long as that Pod is running on that node:

volumes:
- name: cache-volume
  emptyDir:
    sizeLimit: 500Mi

hostPath

HostPath mounts a file or directory from the host node's filesystem:

volumes:
- name: host-volume
  hostPath:
    path: /data
    type: DirectoryOrCreate

Persistent Volumes and Claims

PersistentVolume (PV)

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: slow
  hostPath:
    path: /mnt/data

PersistentVolumeClaim (PVC)

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 8Gi
  storageClassName: slow

Storage Classes

StorageClasses enable dynamic provisioning:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
- debug
volumeBindingMode: Immediate

Using Volumes in Pods

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: nginx
    volumeMounts:
    - name: my-volume
      mountPath: /data
  volumes:
  - name: my-volume
    persistentVolumeClaim:
      claimName: my-pvc

Configuration Management

ConfigMaps

ConfigMaps store non-sensitive configuration data:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-config
data:
  database_url: "postgres://localhost:5432/mydb"
  feature_flags: |
    feature1=true
    feature2=false
    feature3=true

Using ConfigMaps

As environment variables:

envFrom:
- configMapRef:
    name: my-config

As mounted files:

volumeMounts:
- name: config-volume
  mountPath: /etc/config
volumes:
- name: config-volume
  configMap:
    name: my-config

Secrets

Secrets store sensitive data:

apiVersion: v1
kind: Secret
metadata:
  name: my-secret
type: Opaque
data:
  username: YWRtaW4=         # base64 encoded
  password: MWYyZDFlMmU2N2Rm # base64 encoded
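
The base64 values above are encoding, not encryption; a sketch of creating the same Secret without hand-encoding anything (the literals decode to the values shown above):

# Let kubectl handle the base64 encoding
kubectl create secret generic my-secret \
  --from-literal=username=admin \
  --from-literal=password=1f2d1e2e67df

# Decode a stored value when debugging
kubectl get secret my-secret -o jsonpath='{.data.password}' | base64 -d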

Kustomize

Kustomize enables configuration customization:

# kustomization.yaml
resources:
- deployment.yaml
- service.yaml

configMapGenerator:
- name: app-config
  files:
  - config.properties

images:
- name: my-app
  newTag: v1.2.3

Security Best Practices

Authentication and Authorization

RBAC (Role-Based Access Control)

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Pod Security

Security Context

securityContext:
  runAsUser: 1000
  runAsGroup: 3000
  fsGroup: 2000

Pod Security Standards

Note: the PodSecurityPolicy API (kept below for reference) was deprecated in Kubernetes 1.21 and removed in 1.25; Pod Security Standards are now enforced by the built-in Pod Security Admission controller via namespace labels.

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
  - ALL
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  - 'downwardAPI'
  - 'persistentVolumeClaim'
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
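
A minimal namespace-label configuration for Pod Security Admission, the replacement mechanism (the namespace name is illustrative; the pod-security.kubernetes.io labels are the standard ones):

apiVersion: v1
kind: Namespace
metadata:
  name: restricted-apps
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject non-compliant pods
    pod-security.kubernetes.io/warn: restricted      # warn on kubectl apply
    pod-security.kubernetes.io/audit: restricted     # record violations in audit logs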

Network Security

Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Secrets Management

External Secrets Operator

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: db-credentials
  data:
  - secretKey: username
    remoteRef:
      key: secret/data/database
      property: username
  - secretKey: password
    remoteRef:
      key: secret/data/database
      property: password

Monitoring & Observability

Metrics Collection

Prometheus

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod

Metrics Server

# Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Check resource usage
kubectl top nodes
kubectl top pods

Logging

Fluentd Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*_app_*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      format json
      time_format %Y-%m-%dT%H:%M:%S.%NZ
    </source>

    <match kubernetes.**>
      @type elasticsearch
      host elasticsearch
      port 9200
      index_name fluentd
      type_name _doc
    </match>

Health Checks

Liveness and Readiness Probes

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20

readinessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 10

Advanced Features

Custom Resource Definitions (CRDs)

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: crontabs.stable.example.com
spec:
  group: stable.example.com
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              cronSpec:
                type: string
              image:
                type: string
  scope: Namespaced
  names:
    plural: crontabs
    singular: crontab
    kind: CronTab
    shortNames:
    - ct

Operators

Operator SDK Example

// controllers/crontab_controller.go
func (r *CronTabReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
// Fetch the CronTab instance
instance := &stablev1.CronTab{}
err := r.Get(ctx, req.NamespacedName, instance)
if err != nil {
// Handle error
}

// Reconcile logic here
return ctrl.Result{}, nil
}

Service Mesh

Istio Gateway

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
name: bookinfo-gateway
spec:
selector:
istio: ingressgateway
servers:
- port:
number: 80
name: http
protocol: HTTP
hosts:
- "*"

Custom Resource Definitions (CRDs) and Custom Resources

Overview

CRDs extend Kubernetes API to define custom resources, allowing you to treat your applications as native Kubernetes objects.

Creating a CRD

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.myapp.example.com
spec:
  group: myapp.example.com
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              engine:
                type: string
                enum: [mysql, postgresql, mongodb]
              version:
                type: string
              size:
                type: string
                pattern: '^\d+(Gi|Mi)$'
              replicas:
                type: integer
                minimum: 1
                maximum: 10
          status:
            type: object
            properties:
              phase:
                type: string
              message:
                type: string
              readyReplicas:
                type: integer
    additionalPrinterColumns:
    - name: Engine
      type: string
      jsonPath: .spec.engine
    - name: Size
      type: string
      jsonPath: .spec.size
    - name: Status
      type: string
      jsonPath: .status.phase
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
    shortNames:
    - db

Creating a Custom Resource

apiVersion: myapp.example.com/v1alpha1
kind: Database
metadata:
  name: prod-mysql
spec:
  engine: mysql
  version: "8.0"
  size: 20Gi
  replicas: 3

Validation and Webhooks

Admission Webhook for Validation

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: database-validator
webhooks:
- name: database-validator.myapp.example.com
  rules:
  - apiGroups: ["myapp.example.com"]
    apiVersions: ["v1alpha1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["databases"]
  clientConfig:
    service:
      name: database-webhook-service
      namespace: myapp-system
      path: "/validate"
  admissionReviewVersions: ["v1"]
  sideEffects: None

Conversion Webhook for Version Migration

In apiextensions.k8s.io/v1, conversion webhooks are configured on the CRD itself under spec.conversion rather than as a separate API object:

# Excerpt from the Database CRD spec
spec:
  conversion:
    strategy: Webhook
    webhook:
      conversionReviewVersions: ["v1", "v1alpha1"]
      clientConfig:
        service:
          name: database-converter-service
          namespace: myapp-system
          path: /convert

Best Practices for CRDs

  1. Start with v1alpha1 for unstable APIs
  2. Use semantic versioning
  3. Provide clear validation schemas
  4. Include status subresource for reporting state (see the snippet after this list)
  5. Use labels and annotations for metadata
  6. Implement proper garbage collection
  7. Document API changes thoroughly
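
A minimal sketch of item 4, enabling the status (and optional scale) subresources on the Database CRD shown earlier; the field names follow the apiextensions.k8s.io/v1 schema:

# Excerpt: add under the v1alpha1 entry in spec.versions
subresources:
  status: {}                                   # serve /status separately from spec
  scale:                                       # optional: enables kubectl scale
    specReplicasPath: .spec.replicas
    statusReplicasPath: .status.readyReplicas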

Operators and Operator Pattern

What is an Operator?

An operator extends Kubernetes to automate the management of complex applications using custom resources and controllers.

Operator Architecture

┌─────────────────────────────────────────────────┐
│ Kubernetes API │
└─────────────────────┬───────────────────────────┘

┌─────────────────────▼───────────────────────────┐
│ Custom Controller │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────┐ │
│ │ Watch CRDs │ │ Reconcile │ │ Act │ │
│ │ │──▶ Loop │──▶ │ │
│ └─────────────┘ └─────────────┘ └─────────┘ │
└─────────────────────────────────────────────────┘

Building an Operator with Operator SDK

Project Structure

my-operator/
├── api/
│ └── v1alpha1/
│ ├── database_types.go
│ ├── database_webhook.go
│ └── groupversion_info.go
├── controllers/
│ └── database_controller.go
├── config/
│ ├── crd/
│ ├── manager/
│ └── webhook/
├── main.go
└── go.mod

Controller Implementation

// controllers/database_controller.go
package controllers

import (
    "context"
    "time"

    appsv1 "k8s.io/api/apps/v1"
    "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/apimachinery/pkg/runtime"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"

    myappv1alpha1 "myapp.example.com/api/v1alpha1"
)

type DatabaseReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // Fetch the Database instance
    database := &myappv1alpha1.Database{}
    if err := r.Get(ctx, req.NamespacedName, database); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Check if the backing Deployment exists
    deployment := &appsv1.Deployment{}
    if err := r.Get(ctx, req.NamespacedName, deployment); err != nil {
        if errors.IsNotFound(err) {
            // Create the Deployment
            return r.createDeployment(ctx, database)
        }
        return ctrl.Result{}, err
    }

    // Update status
    database.Status.ReadyReplicas = deployment.Status.ReadyReplicas
    if err := r.Status().Update(ctx, database); err != nil {
        return ctrl.Result{}, err
    }

    return ctrl.Result{RequeueAfter: time.Minute * 5}, nil
}

Operator Patterns

  1. Stateless Operator: Manages resources without persistent state
  2. Stateful Operator: Maintains state about managed resources
  3. Leader Election: Runs only one instance in HA setup
  4. Finalizers: Clean up resources before deletion

Example: Database Operator with Finalizer

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // ... existing code ...

    // Handle deletion
    if !database.ObjectMeta.DeletionTimestamp.IsZero() {
        if containsString(database.ObjectMeta.Finalizers, "database.finalizer") {
            // Perform cleanup
            if err := r.cleanupDatabase(ctx, database); err != nil {
                return ctrl.Result{}, err
            }

            // Remove finalizer
            database.ObjectMeta.Finalizers = removeString(database.ObjectMeta.Finalizers, "database.finalizer")
            if err := r.Update(ctx, database); err != nil {
                return ctrl.Result{}, err
            }
        }
        return ctrl.Result{}, nil
    }

    // Add finalizer if not present
    if !containsString(database.ObjectMeta.Finalizers, "database.finalizer") {
        database.ObjectMeta.Finalizers = append(database.ObjectMeta.Finalizers, "database.finalizer")
        if err := r.Update(ctx, database); err != nil {
            return ctrl.Result{}, err
        }
    }

    // ... rest of reconciliation logic ...
}
Popular Operators

  1. Prometheus Operator: Manages Prometheus, Alertmanager, and related components
  2. Elasticsearch Operator: Automates Elasticsearch cluster management
  3. PostgreSQL Operator: Manages PostgreSQL clusters
  4. Strimzi Operator: Runs Apache Kafka on Kubernetes
  5. Cert-Manager: Automates certificate management

Operator Maturity Levels

  • Level 1: Basic installation
  • Level 2: Seamless upgrades
  • Level 3: Full lifecycle API
  • Level 4: Auto-pilot (automatic healing, scaling)
  • Level 5: Autonomous operations

Helm Package Management

Helm Architecture

Helm 3 is client-only: the helm CLI pulls charts from a chart repository or OCI registry, renders the templates locally, and applies the resulting manifests through the Kubernetes API, storing release state as Secrets in the release namespace. (The Helm 2 client → Tiller → API server flow shown in older diagrams no longer exists.)

Creating a Helm Chart

Chart Structure

my-chart/
├── Chart.yaml # Chart metadata
├── values.yaml # Default values
├── values.schema.json # Values schema validation
├── charts/ # Dependency charts
├── templates/ # Manifest templates
│ ├── deployment.yaml
│ ├── service.yaml
│ ├── ingress.yaml
│ ├── _helpers.tpl # Template helpers
│ └── NOTES.txt # Installation notes
└── tests/ # Test templates
└── test-connection.yaml

Chart.yaml

apiVersion: v2
name: my-app
description: A Helm chart for my application
type: application
version: 0.1.0
appVersion: "1.16.0"
keywords:
- web
- nginx
home: https://github.com/myorg/my-app
sources:
- https://github.com/myorg/my-app
maintainers:
- name: John Doe
email: john@example.com
dependencies:
- name: redis
version: "^14.0.0"
repository: "https://charts.bitnami.com/bitnami"

Template with Values

# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "my-app.fullname" . }}
labels:
{{- include "my-app.labels" . | nindent 4 }}
spec:
replicas: {{ .Values.replicaCount }}
selector:
matchLabels:
{{- include "my-app.selectorLabels" . | nindent 6 }}
template:
metadata:
labels:
{{- include "my-app.selectorLabels" . | nindent 8 }}
spec:
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- name: http
containerPort: 80
protocol: TCP
env:
{{- range $key, $value := .Values.env }}
- name: {{ $key }}
value: {{ $value | quote }}
{{- end }}
resources:
{{- toYaml .Values.resources | nindent 12 }}
{{- with .Values.nodeSelector }}
nodeSelector:
{{- toYaml . | nindent 8 }}
{{- end }}

Values.yaml

# Default values for my-app.
replicaCount: 1

image:
repository: nginx
pullPolicy: IfNotPresent
tag: ""

env:
NODE_ENV: production
DEBUG: "false"

service:
type: ClusterIP
port: 80

ingress:
enabled: false
className: ""
annotations: {}
hosts:
- host: chart-example.local
paths:
- path: /
pathType: ImplementationSpecific

resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 100m
memory: 128Mi

autoscaling:
enabled: false
minReplicas: 1
maxReplicas: 100
targetCPUUtilizationPercentage: 80

Advanced Helm Features

Template Functions

# Using built-in functions
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: NAMESPACE
value: {{ .Release.Namespace | quote }}
- name: CONFIG_HASH
value: {{ .Values.config | toYaml | sha256sum | trunc 8 | quote }}

Conditionals and Loops

# Conditional inclusion
{{- if .Values.ingress.enabled -}}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: {{ include "my-app.fullname" . }}
annotations:
{{- toYaml .Values.ingress.annotations | nindent 4 }}
spec:
# ... ingress spec
{{- end }}

# Loop through volumes
volumes:
{{- range .Values.volumes }}
- name: {{ .name }}
persistentVolumeClaim:
claimName: {{ .claimName }}
{{- end }}

Named Templates

# templates/_helpers.tpl
{{- define "my-app.labels" -}}
helm.sh/chart: {{ include "my-app.chart" . }}
{{ include "my-app.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

Helm Repository Management

Creating a Helm Repository

# Package chart
helm package my-chart/

# Create index
helm repo index .

# Serve with nginx
docker run -v $(pwd):/usr/share/nginx/html -p 8080:80 nginx

Adding and Using Repositories

# Add repository
helm repo add my-repo https://my-repo.example.com/charts

# Update repositories
helm repo update

# Install chart
helm install my-release my-repo/my-chart

# Search charts
helm search repo my-repo

Helm 3 Features

  • No Tiller component
  • Client-side only
  • Improved security
  • Better library support
  • Helm tests
  • OCI registry support (see the example after this list)
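
A brief sketch of the OCI workflow (requires Helm 3.8+; the registry host and chart name are illustrative):

# Log in to an OCI registry
helm registry login registry.example.com

# Push a packaged chart
helm push my-app-0.1.0.tgz oci://registry.example.com/charts

# Install straight from the registry
helm install my-release oci://registry.example.com/charts/my-app --version 0.1.0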

Service Mesh

Service Mesh Architecture

┌─────────────────────────────────────────────────────────────────┐
│ Service Mesh │
├─────────────────────┬─────────────────────┬─────────────────────┤
│ Control Plane │ Data Plane │ Application │
│ │ │ │
│ ┌─────────────┐ │ ┌─────────────┐ │ ┌─────────────┐ │
│ │ Istiod │ │ │ Envoy │ │ │ Micro- │ │
│ │ (Pilot) │ │ │ Proxy │ │ │ service │ │
│ │ │ │ │ │ │ │ │ │
│ └─────────────┘ │ └─────────────┘ │ └─────────────┘ │
│ ┌─────────────┐ │ ┌─────────────┐ │ ┌─────────────┐ │
│ │ Citadel │ │ │ Envoy │ │ │ Micro- │ │
│ │ (mTLS) │ │ │ Proxy │ │ │ service │ │
│ │ │ │ │ │ │ │ │ │
│ └─────────────┘ │ └─────────────┘ │ └─────────────┘ │
│ ┌─────────────┐ │ │ │
│ │ Galley │ │ │ │
│ │ (Config) │ │ │ │
│ │ │ │ │ │
│ └─────────────┘ │ │ │
└─────────────────────┴─────────────────────┴─────────────────────┘

Istio Installation and Configuration

Install Istio

# Download Istio
curl -L https://istio.io/downloadIstio | sh -

# Install demo profile
istioctl install --set profile=demo -y

# Enable sidecar injection
kubectl label namespace default istio-injection=enabled

Gateway and VirtualService

# Gateway
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
name: myapp-gateway
spec:
selector:
istio: ingressgateway
servers:
- port:
number: 80
name: http
protocol: HTTP
hosts:
- myapp.example.com

# VirtualService
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: myapp
spec:
hosts:
- myapp.example.com
gateways:
- myapp-gateway
http:
- match:
- headers:
user-agent:
regex: ".*Mobile.*"
route:
- destination:
host: myapp
subset: v2
- route:
- destination:
host: myapp
subset: v1

DestinationRule

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: myapp
spec:
host: myapp
trafficPolicy:
connectionPool:
tcp:
maxConnections: 100
http:
http1MaxPendingRequests: 100
maxRequestsPerConnection: 10
outlierDetection:
consecutiveGatewayErrors: 5
interval: 30s
baseEjectionTime: 30s
subsets:
- name: v1
labels:
version: v1
- name: v2
labels:
version: v2

Service Mesh Features

Traffic Management

  • Request routing
  • Load balancing
  • Retries and timeouts (see the sketch after this list)
  • Fault injection
  • Mirroring traffic
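
A minimal sketch combining several of these features in a single Istio VirtualService (the reviews host and subset names are illustrative):

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - fault:
      delay:
        percentage:
          value: 10            # inject a 2s delay into 10% of requests
        fixedDelay: 2s
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,connect-failure
    timeout: 10s
    route:
    - destination:
        host: reviews
        subset: v1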

Security

  • mTLS between services
  • RBAC for services
  • JWT validation
  • Audit logging

Observability

  • Distributed tracing
  • Metrics collection
  • Access logging
  • Custom metrics

Linkerd Service Mesh

Install Linkerd

# Install CLI
curl -sL https://run.linkerd.io/install | sh

# Install on cluster
linkerd install | kubectl apply -f -

# Check installation
linkerd check

# Inject mesh
kubectl get deploy -o yaml | linkerd inject - | kubectl apply -f -

Linkerd Service Profile

apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
name: myapp.default.svc.cluster.local
spec:
routes:
- name: GET /api/users
condition:
method: GET
path: /api/users
- name: POST /api/users
condition:
method: POST
path: /api/users
retryBudget:
retryRatio: 0.2
minRetriesPerSecond: 10
ttl: 10s

Service Mesh Best Practices

  1. Start with monitoring before enabling policies
  2. Use incremental rollout
  3. Monitor performance impact
  4. Implement proper mTLS key rotation
  5. Use service profiles for optimization

Serverless and Knative

Knative Architecture

┌─────────────────────────────────────────────────────────────────┐
│ Knative │
├─────────────────────┬─────────────────────┬─────────────────────┤
│ Serving │ Eventing │ Build │
│ │ │ │
│ ┌─────────────┐ │ ┌─────────────┐ │ ┌─────────────┐ │
│ │ Autoscaler │ │ │ Broker │ │ │ Build │ │
│ │ │ │ │ │ │ │ │ │
│ └─────────────┘ │ └─────────────┘ │ └─────────────┘ │
│ ┌─────────────┐ │ ┌─────────────┐ │ ┌─────────────┐ │
│ │ Activator │ │ │ Trigger │ │ │ Tekton │ │
│ │ │ │ │ │ │ │ │ │
│ └─────────────┘ │ └─────────────┘ │ └─────────────┘ │
│ ┌─────────────┐ │ │ │
│ │ Queue │ │ │ │
│ │ │ │ │ │
│ └─────────────┘ │ │ │
└─────────────────────┴─────────────────────┴─────────────────────┘

Knative Serving

Install Knative Serving

# Install Serving
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.10.0/serving-core.yaml

# Install networking layer (Contour)
kubectl apply -f https://github.com/knative/net-contour/releases/download/knative-v1.10.0/contour.yaml

# Configure default networking
kubectl patch configmap/config-network \
--namespace knative-serving \
--type merge \
--patch '{"data":{"ingress.class":"contour.ingress.networking.knative.dev"}}'

Knative Service

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: hello-world
spec:
template:
spec:
containers:
- image: gcr.io/knative-samples/helloworld-go
ports:
- containerPort: 8080
env:
- name: TARGET
value: "World"

Autoscaling Configuration

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: autoscaled-service
spec:
template:
metadata:
annotations:
autoscaling.knative.dev/min-scale: "1"
autoscaling.knative.dev/max-scale: "10"
autoscaling.knative.dev/target: "100"
autoscaling.knative.dev/target-utilization-percentage: "70"
spec:
containers:
- image: my-app:latest

Knative Eventing

Install Eventing

# Install Eventing
kubectl apply -f https://github.com/knative/eventing/releases/download/knative-v1.10.0/eventing-core.yaml

# Install Channel (MTChannelBasedBroker)
kubectl apply -f https://github.com/knative/eventing/releases/download/knative-v1.10.0/mt-channel-broker.yaml

Event Source (PingSource)

The older CronJobSource was replaced by PingSource, which emits a CloudEvent on a cron schedule:

apiVersion: sources.knative.dev/v1
kind: PingSource
metadata:
  name: ping-source
spec:
  schedule: "*/1 * * * *"
  contentType: "application/json"
  data: '{"message": "Hello world!"}'
  sink:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: event-display

Trigger

apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
name: my-service-trigger
spec:
broker: default
filter:
attributes:
type: dev.knative.sources.ping
subscriber:
ref:
apiVersion: serving.knative.dev/v1
kind: Service
name: my-service

Serverless Patterns

  1. Event-driven architecture
  2. API Gateway integration
  3. Stream processing
  4. Scheduled tasks
  5. Chatbots and automation

Autoscaling

Horizontal Pod Autoscaler (HPA)

Basic HPA

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: my-app-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 50
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 50
periodSeconds: 60
- type: Pods
value: 2
periodSeconds: 60
selectPolicy: Max

HPA with Custom Metrics

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: custom-metrics-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
minReplicas: 1
maxReplicas: 20
metrics:
- type: Pods
pods:
metric:
name: requests_per_second
target:
type: AverageValue
averageValue: "1000"
- type: External
external:
metric:
name: queue_messages_ready
selector:
matchLabels:
queue: "tasks"
target:
type: AverageValue
averageValue: "30"

Vertical Pod Autoscaler (VPA)

Install VPA

# Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/
./hack/vpa-up.sh

VPA Configuration

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: my-app-vpa
spec:
targetRef:
apiVersion: "apps/v1"
kind: "Deployment"
name: "my-app"
updatePolicy:
updateMode: "Auto"
resourcePolicy:
containerPolicies:
- containerName: "my-app"
minAllowed:
cpu: "100m"
memory: "100Mi"
maxAllowed:
cpu: "1"
memory: "1Gi"
controlledResources: ["cpu", "memory"]

Cluster Autoscaler

Cluster Autoscaler on GKE

# Enable cluster autoscaler
gcloud container clusters update my-cluster \
--enable-autoscaling \
--min-nodes 1 \
--max-nodes 10 \
--node-pool default-pool

Node Auto-Provisioning

# Enable node auto-provisioning
gcloud container clusters update my-cluster \
--enable-autoprovisioning \
--autoprovisioning-config-file=config.yaml

Configuration File

# config.yaml
resourceLimits:
- resourceType: 'cpu'
minimum: '4'
maximum: '100'
- resourceType: 'memory'
minimum: '4'
maximum: '1000'
autoprovisioningNodePoolDefaults:
diskSizeGb: 100
diskType: 'pd-ssd'
management:
autoRepair: true
autoUpgrade: true

KEDA (Kubernetes Event-Driven Autoscaler)

Install KEDA

# Install KEDA
kubectl apply -f https://github.com/kedacore/keda/releases/download/v2.8.0/keda.yaml

Scale based on RabbitMQ queue

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: rabbitmq-consumer
spec:
scaleTargetRef:
name: rabbitmq-consumer
pollingInterval: 30
cooldownPeriod: 300
minReplicaCount: 0
maxReplicaCount: 30
triggers:
- type: rabbitmq
metadata:
host: "amqp://user:password@rabbitmq:5672"
queueName: "myqueue"
queueLength: "20"

Scale based on Prometheus metrics

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: prometheus-scaler
spec:
scaleTargetRef:
name: my-app
minReplicaCount: 1
maxReplicaCount: 20
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus:9090
metricName: http_requests_total
threshold: '100'
query: sum(rate(http_requests_total{deployment="my-app"}[2m]))

Resource Management

Resource Quotas

Namespace Quota

apiVersion: v1
kind: ResourceQuota
metadata:
name: compute-resources
namespace: development
spec:
hard:
pods: "10"
requests.cpu: "4"
requests.memory: "8Gi"
limits.cpu: "10"
limits.memory: "16Gi"
persistentvolumeclaims: "4"

Object Count Quota

apiVersion: v1
kind: ResourceQuota
metadata:
name: object-counts
namespace: production
spec:
hard:
configmaps: "10"
persistentvolumeclaims: "4"
replicationcontrollers: "20"
secrets: "10"
services: "10"
services.loadbalancers: "2"

Limit Ranges

Default Limits

apiVersion: v1
kind: LimitRange
metadata:
name: default-limits
namespace: default
spec:
limits:
- default:
cpu: "500m"
memory: "512Mi"
defaultRequest:
cpu: "250m"
memory: "256Mi"
type: Container

Min/Max Constraints

apiVersion: v1
kind: LimitRange
metadata:
name: min-max-limits
namespace: production
spec:
limits:
- min:
cpu: "100m"
memory: "128Mi"
max:
cpu: "2"
memory: "4Gi"
type: Container

Quality of Service (QoS) Classes

Guaranteed QoS

apiVersion: v1
kind: Pod
metadata:
name: guaranteed-pod
spec:
containers:
- name: my-container
image: my-app:latest
resources:
limits:
cpu: "1"
memory: "1Gi"
requests:
cpu: "1"
memory: "1Gi"

Burstable QoS

apiVersion: v1
kind: Pod
metadata:
name: burstable-pod
spec:
containers:
- name: my-container
image: my-app:latest
resources:
requests:
cpu: "500m"
memory: "512Mi"

BestEffort QoS

apiVersion: v1
kind: Pod
metadata:
name: besteffort-pod
spec:
containers:
- name: my-container
image: my-app:latest

Priority Classes

Define Priority Classes

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: high-priority
value: 1000000
globalDefault: false
description: "High priority class for critical services"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: low-priority
value: 1000
globalDefault: true
description: "Low priority class for batch jobs"

Use Priority Class

apiVersion: apps/v1
kind: Deployment
metadata:
name: critical-service
spec:
template:
spec:
priorityClassName: high-priority
containers:
- name: critical-app
image: critical-app:latest

Pod Disruption Budget

Ensure Minimum Availability

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: my-app-pdb
spec:
minAvailable: 2
selector:
matchLabels:
app: my-app

Allow Maximum Disruptions

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: my-app-pdb
spec:
maxUnavailable: 1
selector:
matchLabels:
app: my-app

Multi-Cluster Management

Cluster API (CAPI)

Install Cluster API

# Install clusterctl
curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v1.3.0/clusterctl-linux-amd64 -o clusterctl
chmod +x clusterctl
sudo mv clusterctl /usr/local/bin/

# Initialize management cluster
clusterctl init --infrastructure aws

Cluster Manifest

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
name: my-workload-cluster
namespace: default
spec:
clusterNetwork:
pods:
cidrBlocks:
- 192.168.0.0/16
controlPlaneRef:
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
name: my-workload-cluster-control-plane
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
name: my-workload-cluster

Worker Machine Deployment

apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
name: my-workload-cluster-md-0
spec:
clusterName: my-workload-cluster
replicas: 3
selector:
matchLabels: {}
template:
spec:
clusterName: my-workload-cluster
version: v1.25.0
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSMachineTemplate
name: my-workload-cluster-md-0

KubeFed (Kubernetes Federation)

Install KubeFed

# Install KubeFed control plane
kubectl apply -f https://github.com/kubernetes-sigs/kubefed/releases/download/v0.9.0/kubefed-operator.yaml

# Create federation
kubefedctl join cluster1 --cluster-context cluster1 --host-cluster-context cluster1
kubefedctl join cluster2 --cluster-context cluster2 --host-cluster-context cluster1

Federated Deployment

apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
name: my-app
namespace: default
spec:
template:
metadata:
labels:
app: my-app
spec:
replicas: 3
selector:
matchLabels:
app: my-app
template:
metadata:
labels:
app: my-app
spec:
containers:
- name: my-app
image: my-app:latest
placement:
clusters:
- name: cluster1
- name: cluster2
overrides:
- clusterName: cluster2
clusterOverrides:
- path: "/spec/replicas"
value: 5

Rancher Multi-Cluster Management

Install Rancher

# Install Rancher using Helm
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm install rancher rancher-latest/rancher \
--namespace cattle-system \
--create-namespace \
--set hostname=rancher.example.com

Multi-Cluster Application

# Cluster template
apiVersion: management.cattle.io/v3
kind: Cluster
metadata:
name: production-cluster
spec:
dockerEngine:
storageDriver: overlay2
kubernetesVersion: v1.25.0
rancherKubernetesEngineConfig:
rkeConfig:
network:
plugin: calico
services:
etcd:
snapshot: true
creation: "6h"
retention: "24h"

Cross-Cluster Service Discovery

Multi-Cluster Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: global-ingress
annotations:
kubernetes.io/ingress.class: "nginx"
nginx.ingress.kubernetes.io/rewrite-target: /
nginx.ingress.kubernetes.io/upstream-vhost: "$service_name.$namespace.svc.cluster.local"
spec:
rules:
- host: myapp.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: my-app
port:
number: 80

Real-World Patterns and Use Cases

Microservices Architecture Patterns

Backend for Frontend (BFF) Pattern

# Frontend Service
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontend-service
spec:
replicas: 3
selector:
matchLabels:
app: frontend
template:
metadata:
labels:
app: frontend
spec:
containers:
- name: frontend
image: frontend:latest
env:
- name: API_GATEWAY_URL
value: "http://api-gateway:8080"
---
# API Gateway
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-gateway
spec:
replicas: 2
template:
spec:
containers:
- name: gateway
image: gateway:latest
env:
- name: USER_SERVICE_URL
value: "http://user-service:8081"
- name: ORDER_SERVICE_URL
value: "http://order-service:8082"

Circuit Breaker Pattern with Istio

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: payment-service
spec:
host: payment-service
trafficPolicy:
connectionPool:
tcp:
maxConnections: 100
connectTimeout: 30ms
tcpKeepalive:
time: 7200s
interval: 75s
outlierDetection:
consecutiveGatewayErrors: 5
interval: 30s
baseEjectionTime: 30s
maxEjectionPercent: 50

Stateful Applications

PostgreSQL with StatefulSet

apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
spec:
serviceName: postgres
replicas: 3
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: postgres
image: postgres:13
env:
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: postgres-secret
key: username
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: password
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: PGDATA
value: /var/lib/postgresql/data/pgdata
ports:
- containerPort: 5432
name: postgres
volumeMounts:
- name: data
mountPath: /var/lib/postgresql/data
livenessProbe:
exec:
command:
- pg_isready
- -U
- $(POSTGRES_USER)
- -d
- postgres
initialDelaySeconds: 30
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: fast-ssd
resources:
requests:
storage: 10Gi

Redis Cluster

apiVersion: apps/v1
kind: StatefulSet
metadata:
name: redis-cluster
spec:
serviceName: redis-cluster
replicas: 6
podManagementPolicy: Parallel
template:
spec:
containers:
- name: redis
image: redis:6.2-alpine
command:
- redis-server
- /etc/redis/redis.conf
- --cluster-enabled
- --cluster-config-file
- /data/nodes.conf
- --cluster-node-timeout
- "5000"
- --appendonly
- "yes"
- --protected-mode
- "no"
ports:
- containerPort: 6379
name: client
- containerPort: 16379
name: gossip
volumeMounts:
- name: data
mountPath: /data
- name: config
mountPath: /etc/redis
volumes:
- name: config
configMap:
name: redis-cluster-config
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi

Batch Processing and ETL Workflows

CronJob for Daily Reports

apiVersion: batch/v1
kind: CronJob
metadata:
name: daily-report
spec:
schedule: "0 2 * * *" # Run at 2 AM daily
jobTemplate:
spec:
template:
spec:
containers:
- name: report-generator
image: report-generator:latest
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: db-credentials
key: url
- name: REPORT_DATE
value: "$(date -d 'yesterday' +%Y-%m-%d)"
resources:
requests:
cpu: "100m"
memory: "256Mi"
limits:
cpu: "500m"
memory: "1Gi"
restartPolicy: OnFailure
concurrencyPolicy: Forbid

Argo Workflows

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
name: etl-pipeline
spec:
entrypoint: etl-pipeline
volumes:
- name: workdir
persistentVolumeClaim:
claimName: etl-workspace
templates:
- name: etl-pipeline
dag:
tasks:
- name: extract
template: extract-data
- name: transform
template: transform-data
dependencies: [extract]
- name: load
template: load-data
dependencies: [transform]

- name: extract-data
script:
image: python:3.9
command: [python]
source: |
import requests
import pandas as pd

# Extract data from API
response = requests.get("https://api.example.com/data")
data = response.json()
df = pd.DataFrame(data)
df.to_csv("/work/raw_data.csv", index=False)
print("Data extracted successfully")
volumeMounts:
- name: workdir
mountPath: /work

- name: transform-data
script:
image: python:3.9
command: [python]
source: |
import pandas as pd

# Transform data
df = pd.read_csv("/work/raw_data.csv")
df['processed_date'] = pd.Timestamp.now()
df.to_parquet("/work/processed_data.parquet")
print("Data transformed successfully")
volumeMounts:
- name: workdir
mountPath: /work

- name: load-data
script:
image: python:3.9
command: [python]
source: |
import pandas as pd
import psycopg2

# Load to database
conn = psycopg2.connect(
host="postgres",
database="analytics",
user="analytics",
password="password"
)

df = pd.read_parquet("/work/processed_data.parquet")
# Load logic here
print("Data loaded successfully")
volumeMounts:
- name: workdir
mountPath: /work

Machine Learning Workloads

Kubeflow Pipeline

apiVersion: kubeflow.org/v1beta1
kind: KFPipeline
metadata:
name: ml-training-pipeline
spec:
pipelineSpec:
pipelines:
- name: training-pipeline
components:
- name: data-preprocessing
implementation:
container:
image: data-prep:latest
command: ["python", "/app/preprocess.py"]
inputs:
artifacts:
- name: raw-data
path: /data/raw
outputs:
artifacts:
- name: processed-data
path: /data/processed

- name: model-training
implementation:
container:
image: trainer:latest
command: ["python", "/app/train.py"]
inputs:
artifacts:
- name: training-data
path: /data/processed
outputs:
artifacts:
- name: model
path: /model
parameters:
- name: accuracy
valueFrom:
path: /accuracy.txt

dependencies:
model-training:
after: [data-preprocessing]

GPU-enabled Training Job

apiVersion: batch/v1
kind: Job
metadata:
name: model-training
spec:
template:
spec:
containers:
- name: trainer
image: nvidia/cuda:11.3.1-base
command: ["python", "train.py"]
resources:
limits:
nvidia.com/gpu: 2
memory: "16Gi"
requests:
nvidia.com/gpu: 2
memory: "16Gi"
volumeMounts:
- name: dataset
mountPath: /data
- name: models
mountPath: /models
volumes:
- name: dataset
persistentVolumeClaim:
claimName: dataset-pvc
- name: models
persistentVolumeClaim:
claimName: models-pvc
nodeSelector:
cloud.google.com/gke-accelerator: nvidia-tesla-t4
tolerations:
- key: nvidia.com/gpu
operator: Exists
effect: NoSchedule

Edge Computing and IoT

Edge Device Pattern

# Edge Controller Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: edge-controller
spec:
replicas: 3
template:
spec:
containers:
- name: controller
image: edge-controller:latest
env:
- name: DEVICE_REGISTRY_URL
value: "http://device-registry:8080"
- name: MESSAGE_BROKER_URL
value: "mqtt://mqtt-broker:1883"
resources:
limits:
cpu: "500m"
memory: "1Gi"
---
# Device Registry StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: device-registry
spec:
serviceName: device-registry
replicas: 1
template:
spec:
containers:
- name: registry
image: device-registry:latest
ports:
- containerPort: 8080
volumeMounts:
- name: data
mountPath: /data
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi

Hybrid and Multi-Cloud Deployments

Hybrid Cloud with Cluster API

# Cloud cluster template
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
name: cloud-cluster
spec:
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AWSCluster
name: cloud-cluster
---
# On-premises cluster template
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
name: onprem-cluster
spec:
infrastructureRef:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereCluster
name: onprem-cluster

Multi-Cloud Ingress Controller

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: global-app
annotations:
kubernetes.io/ingress.class: "global-ingress"
global-ingress/load-balancer: "multi-cloud"
global-ingress/health-check: "/health"
spec:
rules:
- host: app.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: app-service
port:
number: 80

GitOps Workflows

Argo CD

Install Argo CD

# Install Argo CD
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Access Argo CD UI
kubectl port-forward svc/argocd-server -n argocd 8080:443

Argo CD Application

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: my-app
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/myorg/my-app.git
targetRevision: HEAD
path: kubernetes/manifests
destination:
server: https://kubernetes.default.svc
namespace: my-app
syncPolicy:
automated:
prune: true
selfHeal: true
allowEmpty: false
syncOptions:
- CreateNamespace=true
- Validate=false
- PrunePropagationPolicy=foreground
- PruneLast=true
ignoreDifferences:
- group: apps
kind: Deployment
jsonPointers:
- /spec/replicas

Argo CD App of Apps Pattern

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: root-app
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/myorg/infrastructure.git
targetRevision: HEAD
path: argocd/apps
destination:
server: https://kubernetes.default.svc
namespace: argocd
syncPolicy:
automated:
prune: true
selfHeal: true

Flux CD

Install Flux

# Install Flux CLI
curl -s https://toolkit.fluxcd.io/install.sh | sudo bash

# Bootstrap Flux
flux bootstrap github \
--owner=myorg \
--repository=infrastructure \
--path=clusters/my-cluster \
--personal

Flux Kustomization

apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
name: my-app
namespace: flux-system
spec:
interval: 5m
path: "./apps/my-app/overlays/production"
prune: true
validation: client
healthChecks:
- kind: Deployment
name: my-app
namespace: my-app
force: false

Flux HelmRelease

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
name: my-app
namespace: flux-system
spec:
interval: 5m
chart:
spec:
chart: my-app
version: "1.0.0"
sourceRef:
kind: HelmRepository
name: my-repo
namespace: flux-system
values:
replicaCount: 3
image:
tag: "1.0.0"
resources:
limits:
cpu: 500m
memory: 512Mi

GitOps Best Practices

  1. Structure your repository properly
infrastructure/
├── clusters/
│ ├── prod/
│ │ ├── flux-system/
│ │ └── apps/
│ └── staging/
│ ├── flux-system/
│ └── apps/
└── apps/
├── my-app/
│ ├── base/
│ └── overlays/
│ ├── production/
│ └── staging/
└── another-app/
  2. Use sealed secrets for sensitive data
# Install Sealed Secrets
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.18.0/controller.yaml

# Seal a secret
kubeseal --format yaml < secret.yaml > sealed-secret.yaml
  3. Implement proper CI/CD pipelines
# GitHub Actions workflow
name: Deploy to Kubernetes
on:
push:
branches: [main]
pull_request:
branches: [main]

jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3

- name: Validate manifests
run: |
kubectl apply -f kubernetes/ --dry-run=client

- name: Setup kubeconfig
run: |
mkdir -p $HOME/.kube
echo "${{ secrets.KUBECONFIG }}" | base64 -d > $HOME/.kube/config

- name: Deploy to cluster
if: github.ref == 'refs/heads/main'
run: |
kubectl apply -f kubernetes/
  4. Monitor deployments and rollbacks
# Argo CD Rollback
argocd app rollback my-app --revision HEAD~1

# Flux Rollback
flux suspend kustomization my-app
git revert HEAD
flux resume kustomization my-app

Troubleshooting

Common Issues and Solutions

Pod Issues

# Check pod status
kubectl get pods

# Describe pod for details
kubectl describe pod <pod-name>

# View pod logs
kubectl logs <pod-name>

# View logs from previous instance
kubectl logs <pod-name> --previous

Networking Issues

# Check service endpoints
kubectl get endpoints <service-name>

# Test DNS resolution
kubectl run -it --rm dnsutils --image=tutum/dnsutils -- nslookup <service-name>

# Check network policies
kubectl get networkpolicy

Debugging Tools

kubectl debug

# Debug a running pod
kubectl debug <pod-name> --image=busybox --target=container-name

# Copy files from pod
kubectl cp <pod-name>:/path/to/file ./local-file

# Port forwarding
kubectl port-forward <pod-name> 8080:80

CI/CD & GitOps Integration

Argo CD Application

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: guestbook
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/argoproj/argocd-example-apps.git
targetRevision: HEAD
path: guestbook
destination:
server: https://kubernetes.default.svc
namespace: guestbook
syncPolicy:
automated:
prune: true
selfHeal: true

GitOps with Flux

apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
name: apps
namespace: flux-system
spec:
interval: 5m0s
path: "./apps/production"
prune: true
sourceRef:
kind: GitRepository
name: flux-system

Real-World Workflows

Microservices Architecture

# Frontend Service
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontend
spec:
replicas: 3
selector:
matchLabels:
app: frontend
template:
metadata:
labels:
app: frontend
spec:
containers:
- name: frontend
image: myapp/frontend:v1.0
env:
- name: BACKEND_URL
value: "http://backend-service:8080"
ports:
- containerPort: 3000

Stateful Application (Database)

# MySQL StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mysql
spec:
serviceName: mysql
replicas: 1
template:
spec:
containers:
- name: mysql
image: mysql:8.0
env:
- name: MYSQL_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-secret
key: root-password
volumeMounts:
- name: mysql-persistent-storage
mountPath: /var/lib/mysql
volumeClaimTemplates:
- metadata:
name: mysql-persistent-storage
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi

ML Workload with GPU

apiVersion: apps/v1
kind: Deployment
metadata:
name: ml-training
spec:
replicas: 1
template:
spec:
containers:
- name: training
image: tensorflow/tensorflow:latest-gpu
resources:
limits:
nvidia.com/gpu: 1

Best Practices

Development Best Practices

Use ConfigMaps and Secrets

# Externalize configuration
envFrom:
- configMapRef:
name: app-config
- secretRef:
name: app-secrets

Health Checks

# Always include health checks
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10

readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5

Production Best Practices

Resource Limits

resources:
limits:
cpu: "1"
memory: "1Gi"
requests:
cpu: "0.5"
memory: "512Mi"

Pod Disruption Budgets

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: my-app-pdb
spec:
minAvailable: 2
selector:
matchLabels:
app: my-app

Security Best Practices

Non-root User

securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000

Read-only Filesystem

securityContext:
readOnlyRootFilesystem: true

Monitoring Best Practices

Structured Logging

env:
- name: LOG_FORMAT
value: json

Custom Metrics

# Expose custom metrics endpoint
ports:
- name: metrics
containerPort: 9090

Resources & Next Steps

Tools and Ecosystem

  • Helm: Package manager for Kubernetes
  • Kustomize: Native Kubernetes configuration management
  • Istio: Service mesh
  • Prometheus: Monitoring and alerting
  • Grafana: Visualization and dashboards
  • Jaeger: Distributed tracing
  • Argo CD: GitOps continuous delivery
  • Flux CD: GitOps toolkit

Next Steps

  1. Practice with Minikube or Kind for local development
  2. Deploy a simple application to understand the basics
  3. Explore advanced features like operators and service mesh
  4. Join the community and contribute to open source projects
  5. Consider certification to validate your knowledge

Kubernetes has revolutionized how we deploy and manage containerized applications at scale. By following these practices and leveraging its powerful features, teams can achieve reliable, scalable, and efficient deployments across hybrid and multi-cloud environments.