Production-grade Kubernetes deployment on Amazon EKS using Helm charts, demonstrating modern cloud-native practices and DevOps methodologies.
This project deploys a TODO REST API on Amazon EKS with:
- Helm-based deployment model
- Infrastructure as Code with Terraform
- Observability stack with Prometheus and Grafana
- Auto-scaling with Horizontal Pod Autoscaler and Cluster Autoscaler
- AWS Integration using IRSA (IAM Roles for Service Accounts)
┌─────────────────────────────────────────────────┐
│ GitHub Repositories │
│ ┌──────────────┐ ┌──────────────┐ │
│ │Infrastructure│ │ Applications │ │
│ └──────┬───────┘ └──────┬───────┘ │
└─────────┼──────────────────────┼─────────────────┘
│ │
▼ ▼
┌────────────────EKS Cluster──────────────────────┐
│ │
│ Control Plane (Managed by AWS) │
│ ┌────────────────────────────────┐ │
│ │ • K8s API Server │ │
│ │ • etcd │ │
│ │ • Scheduler │ │
│ └────────────────────────────────┘ │
│ │
│ Worker Nodes (EC2 in Private Subnets) │
│ ┌────────────────────────────────┐ │
│ │ Platform Services: │ │
│ │ • AWS Load Balancer Controller │ │
│ │ • External Secrets Operator │ │
│ │ • Metrics Server │ │
│ │ • Cluster Autoscaler │ │
│ │ │ │
│ │ Application Workloads: │ │
│ │ • TODO API (2-10 pods) │ │
│ │ • HPA (auto-scaling) │ │
│ │ │ │
│ │ Monitoring: │ │
│ │ • Prometheus │ │
│ │ • Grafana │ │
│ └────────────────────────────────┘ │
└──────────────────────────────────────────────────┘
│
▼
AWS Services
• RDS PostgreSQL
• Secrets Manager
• ALB (Ingress)
• ECR
- AWS CLI (configured with credentials)
- Terraform >= 1.5.0
- kubectl >= 1.28
- Helm >= 3.12
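A quick sanity check that the required tools are installed and meet the minimum versions listed above:

# Verify tool versions before deploying
aws --version
terraform version
kubectl version --client
helm version --short

# Confirm the AWS CLI is authenticated
aws sts get-caller-identity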
# Run the setup script (macOS)
chmod +x scripts/setup-tools.sh
./scripts/setup-tools.sh

git clone https://github.com/treehousepnw/eks-todo-gitops.git
cd eks-todo-gitops

# Deploy dev environment
chmod +x scripts/deploy-cluster.sh
./scripts/deploy-cluster.sh dev

This will:
- Create VPC with public/private subnets
- Deploy EKS control plane
- Launch managed node groups
- Configure kubectl access
- Create base namespaces
Time: ~15 minutes
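If you prefer to run Terraform by hand, the script presumably wraps something like the following (a sketch only; the exact flags inside deploy-cluster.sh may differ):

cd terraform
terraform init
terraform plan -var-file=environments/dev.tfvars
terraform apply -var-file=environments/dev.tfvars

# Point kubectl at the new cluster (cluster name comes from dev.tfvars)
aws eks update-kubeconfig --region us-west-2 --name <cluster-name>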
# Check cluster
kubectl cluster-info
# Check nodes
kubectl get nodes
# Check namespaces
kubectl get namespaces

# Deploy TODO API with Helm
helm upgrade --install todo-api ./helm/todo-api \
-n apps --create-namespace \
-f helm/todo-api/values.yaml
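To override chart values at install time instead of editing values.yaml, pass --set flags. The keys image.tag and replicaCount below are assumptions about this chart's values; check helm/todo-api/values.yaml for the real names:

helm upgrade --install todo-api ./helm/todo-api \
  -n apps --create-namespace \
  -f helm/todo-api/values.yaml \
  --set image.tag=v1.0.1 \
  --set replicaCount=3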
# Verify deployment
kubectl get pods -n apps
kubectl get svc -n apps

eks-todo-gitops/
├── terraform/
│ ├── vpc/ # VPC module
│ ├── eks/ # EKS cluster module
│ ├── main.tf # Root module
│ ├── variables.tf
│ ├── outputs.tf
│ └── environments/
│ └── dev.tfvars # Dev configuration
├── kubernetes/
│ ├── platform/ # Platform services (Week 2)
│ ├── monitoring/ # Observability (Week 5)
│ └── apps/ # Applications (Week 3)
├── helm/
│ └── todo-api/ # Helm chart (Week 3)
├── scripts/
│ ├── setup-tools.sh
│ └── deploy-cluster.sh
└── README.md
- VPC design for EKS
- EKS cluster deployment
- Managed node groups
- IAM Roles for Service Accounts (IRSA), sketched below
- kubectl configuration
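As a sketch of how IRSA ties a Kubernetes service account to an IAM role: the role ARN and service account name below are placeholders; the real values come from the Terraform outputs.

kubectl apply -n apps -f - <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: todo-api
  annotations:
    # IAM role assumed by pods that use this service account (placeholder ARN)
    eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/todo-api-irsa
EOF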
- AWS Load Balancer Controller (install sketch after this list)
- External Secrets Operator
- Metrics Server
- Cluster Autoscaler
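For reference, the AWS Load Balancer Controller is typically installed from the eks-charts Helm repository roughly as follows; this assumes the IRSA-backed service account from Week 1 already exists, and the cluster name is a placeholder:

helm repo add eks https://aws.github.io/eks-charts
helm repo update

helm upgrade --install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=<cluster-name> \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller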
- Helm chart development (workflow sketch after this list)
- Kubernetes manifests
- ConfigMaps and Secrets
- Ingress configuration
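A typical local workflow while iterating on the chart (generic Helm commands, not specific to this repo):

# Render the chart locally without installing it
helm template todo-api ./helm/todo-api -f helm/todo-api/values.yaml

# Catch chart errors early
helm lint ./helm/todo-api

# Preview what an upgrade would change (requires the helm-diff plugin)
helm plugin install https://github.com/databus23/helm-diff
helm diff upgrade todo-api ./helm/todo-api -n apps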
- Network policies
- Pod security standards
- Resource quotas and limits (example after this list)
- Multi-environment setup (staging/prod)
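As an example of the resource-quota piece, a namespace-level quota for the apps namespace (the numbers are illustrative, not the project's actual limits):

kubectl apply -n apps -f - <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: apps-quota
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
    pods: "20"
EOF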
- Prometheus Operator (install sketch after this list)
- Grafana dashboards
- Application metrics
- Alerting rules
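The Prometheus Operator and Grafana usually arrive together via the kube-prometheus-stack chart; a minimal install sketch (the Grafana service name shown is the chart default for this release name):

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  -n monitoring --create-namespace

# Grafana is bundled; port-forward to reach the UI locally
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring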
Future enhancements planned for this project:
- GitOps with ArgoCD - Implement declarative, Git-driven deployments with automatic sync
- CI/CD Pipeline - GitHub Actions workflow for automated testing and image builds
- Service Mesh - Istio or Linkerd for advanced traffic management
- Secrets Management - External Secrets Operator with AWS Secrets Manager
- Backup & Disaster Recovery - Velero for cluster backup and restore
| Service | Configuration | Monthly Cost |
|---|---|---|
| EKS Control Plane | Managed | $73 |
| EC2 Nodes | 2x t3.medium | $60 |
| NAT Gateway | 3 AZs | $105 |
| ALB | 1 load balancer | $20 |
| EBS Volumes | 40GB total | $4 |
| Total | | ~$262 |
- Use Spot instances for nodes (up to ~70% savings)
- Use 1 NAT Gateway instead of 3 for dev (saves $70)
- Scale down nodes when not in use
- Use t3.small instead of t3.medium (saves $30)
Optimized dev cost: ~$120/month
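One concrete way to "scale down nodes when not in use" is to zero out the managed node group's scaling config when idle (cluster and node group names are placeholders):

# Scale the node group to zero when idle
aws eks update-nodegroup-config \
  --cluster-name <cluster-name> \
  --nodegroup-name <nodegroup-name> \
  --scaling-config minSize=0,maxSize=2,desiredSize=0

# Scale it back up before working
aws eks update-nodegroup-config \
  --cluster-name <cluster-name> \
  --nodegroup-name <nodegroup-name> \
  --scaling-config minSize=1,maxSize=2,desiredSize=2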
# Get cluster info
kubectl cluster-info
# List all pods
kubectl get pods --all-namespaces
# Check node status
kubectl get nodes -o wide
# View logs
kubectl logs -f <pod-name> -n <namespace>
# Exec into pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/bash
# Port forward
kubectl port-forward svc/<service-name> 8080:80 -n <namespace>
# Apply manifests
kubectl apply -f <file.yaml>
# Delete resources
kubectl delete -f <file.yaml>

Symptom: unable to listen on port or address already in use
Cause: Another process (previous port-forward, local server) is using the port.
Solution:
# Find what's using the port
lsof -i :8080
# Kill the process
kill -9 <PID>
# Or use a different local port
kubectl port-forward svc/todo-api 9090:80 -n apps

Symptom: Changes to application code don't appear after deployment.
Cause: The image tag (:latest) hasn't changed, so the Deployment spec is identical, no rollout is triggered, and nodes keep running the cached image.
Solutions:
# Option 1: Force image pull (temporary)
kubectl rollout restart deployment/todo-api -n apps
# Option 2: Use unique tags (recommended)
docker build -t todo-api:v1.0.1 .
# Update values.yaml with new tag
# Option 3: Set imagePullPolicy in deployment
# spec.containers[].imagePullPolicy: Always

Symptom: Pods can't resolve hostnames (RDS endpoint, external services).
Cause: CoreDNS not running or misconfigured.
Diagnosis:
# Check CoreDNS pods
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Test DNS from a pod
kubectl run -it --rm debug --image=busybox -- nslookup kubernetes.default
# Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns

Solutions:
# Restart CoreDNS
kubectl rollout restart deployment/coredns -n kube-system
# If CoreDNS pods are pending, check node resources
kubectl describe nodes | grep -A 5 "Allocated resources"

Symptom: could not connect to server or connection refused
Diagnosis:
# Check if RDS is accessible from pod
kubectl run -it --rm debug --image=postgres:15 -n apps -- \
psql -h <RDS_ENDPOINT> -U todoadmin -d tododb
# Check security group allows traffic
# RDS SG should allow ingress on 5432 from EKS node SG

Common causes:
- Security group not allowing EKS → RDS traffic
- RDS in different VPC or wrong subnets
- Incorrect credentials (check Secrets Manager)
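To check the first cause above, inspect the RDS security group's ingress rules and confirm they reference the node security group (the group ID and cluster name are placeholders):

# List ingress rules on the RDS security group
aws ec2 describe-security-groups \
  --group-ids <rds-sg-id> \
  --query 'SecurityGroups[0].IpPermissions'

# Find the security groups attached to the worker nodes
aws ec2 describe-instances \
  --filters "Name=tag:kubernetes.io/cluster/<cluster-name>,Values=owned" \
  --query 'Reservations[].Instances[].SecurityGroups[].GroupId'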
Symptom: kubectl get nodes shows NotReady status
Diagnosis:
# Check node conditions
kubectl describe node <node-name>
# Check VPC CNI (aws-node) logs; kubelet logs require SSH or SSM onto the node
kubectl logs -n kube-system -l k8s-app=aws-node
# Check for resource pressure
kubectl top nodes

Common causes:
- Node out of disk space
- Too many pods (IP exhaustion)
- VPC CNI issues
Diagnosis:
# Check events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
# Describe pod for scheduling issues
kubectl describe pod <pod-name> -n <namespace>
# Check logs for crash reasons
kubectl logs <pod-name> -n <namespace> --previous

Common causes:
- Pending: Insufficient resources, node selector mismatch, PVC not bound
- CrashLoopBackOff: Application error, missing env vars, bad config
# Update kubeconfig
aws eks update-kubeconfig --region us-west-2 --name <cluster-name>
# Verify AWS credentials
aws sts get-caller-identity
# Check cluster endpoint is reachable
curl -k https://<cluster-endpoint>/healthz
# Verify connection
kubectl cluster-info

Symptom: Pod stuck in ImagePullBackOff or ErrImagePull
Diagnosis:
kubectl describe pod <pod-name> -n <namespace> | grep -A 10 Events

Common causes and solutions:
# 1. ECR login expired - re-authenticate
aws ecr get-login-password --region us-west-2 | \
docker login --username AWS --password-stdin <account>.dkr.ecr.us-west-2.amazonaws.com
# 2. Image doesn't exist - verify image
aws ecr describe-images --repository-name <repo-name>
# 3. Node can't reach ECR - check NAT Gateway and security groups

Symptom: HPA shows <unknown> for metrics
Cause: Metrics server not installed or not collecting metrics.
Solution:
# Check metrics server
kubectl get deployment metrics-server -n kube-system
# Install if missing
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Verify metrics
kubectl top pods -n apps

To destroy the cluster and avoid charges:
cd terraform
terraform destroy -var-file=environments/dev.tfvars

Warning: This will delete all resources including the cluster and data.
Derek Ogletree
- Portfolio: TreehousePNW
- LinkedIn: linkedin.com/in/trenigma
- Blog: blog.trenigma.dev
This project is for portfolio and educational purposes.
Status: Week 1 Complete ✅ | Next: Platform Services
Built with ❤️ for learning Kubernetes and modern DevOps practices