EKS TODO API with GitOps


Production-grade Kubernetes deployment on Amazon EKS using Helm charts, demonstrating modern cloud-native practices and DevOps methodologies.

Project Overview

This project deploys a TODO REST API on Amazon EKS with:

  • Helm-based deployment model
  • Infrastructure as Code with Terraform
  • Observability stack with Prometheus and Grafana
  • Auto-scaling with Horizontal Pod Autoscaler and Cluster Autoscaler
  • AWS Integration using IRSA (IAM Roles for Service Accounts)

Architecture

┌─────────────────────────────────────────────────┐
│              GitHub Repositories                 │
│  ┌──────────────┐     ┌──────────────┐         │
│  │Infrastructure│     │ Applications │         │
│  └──────┬───────┘     └──────┬───────┘         │
└─────────┼──────────────────────┼─────────────────┘
          │                      │
          ▼                      ▼
┌────────────────EKS Cluster──────────────────────┐
│                                                  │
│  Control Plane (Managed by AWS)                 │
│  ┌────────────────────────────────┐            │
│  │ • K8s API Server                │            │
│  │ • etcd                          │            │
│  │ • Scheduler                     │            │
│  └────────────────────────────────┘            │
│                                                  │
│  Worker Nodes (EC2 in Private Subnets)         │
│  ┌────────────────────────────────┐            │
│  │ Platform Services:              │            │
│  │ • AWS Load Balancer Controller │            │
│  │ • External Secrets Operator    │            │
│  │ • Metrics Server               │            │
│  │ • Cluster Autoscaler           │            │
│  │                                 │            │
│  │ Application Workloads:          │            │
│  │ • TODO API (2-10 pods)         │            │
│  │ • HPA (auto-scaling)           │            │
│  │                                 │            │
│  │ Monitoring:                     │            │
│  │ • Prometheus                    │            │
│  │ • Grafana                       │            │
│  └────────────────────────────────┘            │
└──────────────────────────────────────────────────┘
          │
          ▼
    AWS Services
    • RDS PostgreSQL
    • Secrets Manager
    • ALB (Ingress)
    • ECR

Prerequisites

Tools Required

  • AWS CLI (configured with credentials)
  • Terraform >= 1.5.0
  • kubectl >= 1.28
  • Helm >= 3.12

Quick Setup

# Run the setup script (macOS)
chmod +x scripts/setup-tools.sh
./scripts/setup-tools.sh
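
If you prefer to install the tools by hand on macOS, the script roughly boils down to Homebrew installs along these lines (an assumption; the actual script may pin versions or use different methods):

# Manual equivalent with Homebrew (sketch; the real script may differ)
brew install awscli kubectl helm
brew tap hashicorp/tap && brew install hashicorp/tap/terraform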

Quick Start

1. Clone and Setup

git clone https://github.com/treehousepnw/eks-todo-gitops.git
cd eks-todo-gitops

2. Deploy EKS Cluster

# Deploy dev environment
chmod +x scripts/deploy-cluster.sh
./scripts/deploy-cluster.sh dev

This will:

  • Create VPC with public/private subnets
  • Deploy EKS control plane
  • Launch managed node groups
  • Configure kubectl access
  • Create base namespaces

Time: ~15 minutes
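
The script wraps the Terraform workflow in the terraform/ directory. A rough sketch of the equivalent manual steps (the cluster name is a placeholder and the namespace names are illustrative; the actual script may differ):

# Roughly what deploy-cluster.sh automates (sketch)
cd terraform
terraform init
terraform apply -var-file=environments/dev.tfvars

# Point kubectl at the new cluster
aws eks update-kubeconfig --region us-west-2 --name <cluster-name>

# Create base namespaces (names illustrative)
kubectl create namespace apps
kubectl create namespace monitoring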

3. Verify Cluster

# Check cluster
kubectl cluster-info

# Check nodes
kubectl get nodes

# Check namespaces
kubectl get namespaces

4. Deploy the Application

# Deploy TODO API with Helm
helm upgrade --install todo-api ./helm/todo-api \
  -n apps --create-namespace \
  -f helm/todo-api/values.yaml

# Verify deployment
kubectl get pods -n apps
kubectl get svc -n apps
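
To confirm the release itself is healthy, Helm and kubectl can report on it directly (the Deployment name assumes the chart follows the usual release-name convention; adjust if it differs):

# Inspect the Helm release and wait for the rollout to finish
helm status todo-api -n apps
helm get values todo-api -n apps
kubectl rollout status deployment/todo-api -n apps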

Project Structure

eks-todo-gitops/
├── terraform/
│   ├── vpc/              # VPC module
│   ├── eks/              # EKS cluster module
│   ├── main.tf           # Root module
│   ├── variables.tf
│   ├── outputs.tf
│   └── environments/
│       └── dev.tfvars    # Dev configuration
├── kubernetes/
│   ├── platform/         # Platform services (Week 2)
│   ├── monitoring/       # Observability (Week 5)
│   └── apps/             # Applications (Week 3)
├── helm/
│   └── todo-api/         # Helm chart (Week 3)
├── scripts/
│   ├── setup-tools.sh
│   └── deploy-cluster.sh
└── README.md

Learning Outcomes

Week 1 (Current): EKS Foundation ✅

  • VPC design for EKS
  • EKS cluster deployment
  • Managed node groups
  • IAM Roles for Service Accounts (IRSA)
  • kubectl configuration

Week 2: Platform Services

  • AWS Load Balancer Controller
  • External Secrets Operator
  • Metrics Server
  • Cluster Autoscaler

Week 3: Application Deployment

  • Helm chart development
  • Kubernetes manifests
  • ConfigMaps and Secrets
  • Ingress configuration

Week 4: Production Hardening

  • Network policies
  • Pod security standards
  • Resource quotas and limits
  • Multi-environment setup (staging/prod)

Week 5: Observability

  • Prometheus Operator
  • Grafana dashboards
  • Application metrics
  • Alerting rules

What's Next

Future enhancements planned for this project:

  • GitOps with ArgoCD - Implement declarative, Git-driven deployments with automatic sync
  • CI/CD Pipeline - GitHub Actions workflow for automated testing and image builds
  • Service Mesh - Istio or Linkerd for advanced traffic management
  • Secrets Management - External Secrets Operator with AWS Secrets Manager
  • Backup & Disaster Recovery - Velero for cluster backup and restore

Cost Breakdown

Dev Environment (Monthly)

Service             Configuration      Cost
EKS Control Plane   Managed            $73
EC2 Nodes           2x t3.medium       $60
NAT Gateway         3 AZs              $105
ALB                 1 load balancer    $20
EBS Volumes         40GB total         $4
Total                                  ~$262/month

Cost Optimization Tips

  • Use Spot instances for nodes (up to ~70% savings)
  • Use 1 NAT Gateway instead of 3 for dev (saves ~$70)
  • Scale down nodes when not in use (see the sketch below)
  • Use t3.small instead of t3.medium (saves ~$30)

Optimized dev cost: ~$120/month
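
The node scale-down tip doesn't require touching Terraform; a minimal sketch using the AWS CLI (cluster and node group names are placeholders, and the scaling numbers are illustrative):

# Scale the managed node group down when the cluster is idle
aws eks update-nodegroup-config \
  --cluster-name <cluster-name> \
  --nodegroup-name <nodegroup-name> \
  --scaling-config minSize=0,maxSize=2,desiredSize=0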

Common Commands

# Get cluster info
kubectl cluster-info

# List all pods
kubectl get pods --all-namespaces

# Check node status
kubectl get nodes -o wide

# View logs
kubectl logs -f <pod-name> -n <namespace>

# Exec into pod
kubectl exec -it <pod-name> -n <namespace> -- /bin/bash

# Port forward
kubectl port-forward svc/<service-name> 8080:80 -n <namespace>

# Apply manifests
kubectl apply -f <file.yaml>

# Delete resources
kubectl delete -f <file.yaml>

Troubleshooting Guide

Port Conflicts with kubectl port-forward

Symptom: unable to listen on port or address already in use

Cause: Another process (previous port-forward, local server) is using the port.

Solution:

# Find what's using the port
lsof -i :8080

# Kill the process
kill -9 <PID>

# Or use a different local port
kubectl port-forward svc/todo-api 9090:80 -n apps

Docker Image Caching Issues

Symptom: Changes to application code don't appear after deployment.

Cause: The image tag (e.g. :latest) hasn't changed, so existing pods keep running the old image and nodes may reuse a cached copy instead of pulling the new build.

Solutions:

# Option 1: Force image pull (temporary)
kubectl rollout restart deployment/todo-api -n apps

# Option 2: Use unique tags (recommended)
docker build -t todo-api:v1.0.1 .
# Update values.yaml with new tag

# Option 3: Set imagePullPolicy in deployment
# spec.containers[].imagePullPolicy: Always
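
If the chart exposes the image tag as a value (a common convention, but check helm/todo-api/values.yaml for the actual key), option 2 can be rolled out without editing files:

# Push the rebuilt image under a new tag, then point the release at it
# (repository URL and the image.tag key are assumptions - adjust to the chart)
docker push <account>.dkr.ecr.us-west-2.amazonaws.com/todo-api:v1.0.1
helm upgrade --install todo-api ./helm/todo-api -n apps \
  -f helm/todo-api/values.yaml --set image.tag=v1.0.1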

DNS Resolution Failures in Pods

Symptom: Pods can't resolve hostnames (RDS endpoint, external services).

Cause: CoreDNS not running or misconfigured.

Diagnosis:

# Check CoreDNS pods
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Test DNS from a pod
kubectl run -it --rm debug --image=busybox -- nslookup kubernetes.default

# Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns

Solutions:

# Restart CoreDNS
kubectl rollout restart deployment/coredns -n kube-system

# If CoreDNS pods are pending, check node resources
kubectl describe nodes | grep -A 5 "Allocated resources"

Database Connection Errors

Symptom: could not connect to server or connection refused

Diagnosis:

# Check if RDS is accessible from pod
kubectl run -it --rm debug --image=postgres:15 -n apps -- \
  psql -h <RDS_ENDPOINT> -U todoadmin -d tododb

# Check security group allows traffic
# RDS SG should allow ingress on 5432 from EKS node SG

Common causes:

  1. Security group not allowing EKS → RDS traffic
  2. RDS in different VPC or wrong subnets
  3. Incorrect credentials (check Secrets Manager)
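
For cause 1, the fix is usually a single ingress rule on the RDS security group; a minimal sketch with placeholder IDs (look them up in the VPC console or Terraform outputs):

# Allow PostgreSQL (5432) from the EKS node security group to the RDS security group
aws ec2 authorize-security-group-ingress \
  --group-id <rds-security-group-id> \
  --protocol tcp \
  --port 5432 \
  --source-group <eks-node-security-group-id>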

Nodes Not Ready

Symptom: kubectl get nodes shows NotReady status

Diagnosis:

# Check node conditions
kubectl describe node <node-name>

# Check kubelet logs (on the node or via SSM)
kubectl logs -n kube-system -l k8s-app=aws-node

# Check for resource pressure
kubectl top nodes

Common causes:

  1. Node out of disk space
  2. Too many pods (IP exhaustion)
  3. VPC CNI issues

Pods Stuck in Pending/CrashLoopBackOff

Diagnosis:

# Check events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Describe pod for scheduling issues
kubectl describe pod <pod-name> -n <namespace>

# Check logs for crash reasons
kubectl logs <pod-name> -n <namespace> --previous

Common causes:

  • Pending: Insufficient resources, node selector mismatch, PVC not bound
  • CrashLoopBackOff: Application error, missing env vars, bad config

kubectl Connection Issues

# Update kubeconfig
aws eks update-kubeconfig --region us-west-2 --name <cluster-name>

# Verify AWS credentials
aws sts get-caller-identity

# Check cluster endpoint is reachable
curl -k https://<cluster-endpoint>/healthz

# Verify connection
kubectl cluster-info

Image Pull Errors (ImagePullBackOff)

Symptom: Pod stuck in ImagePullBackOff or ErrImagePull

Diagnosis:

kubectl describe pod <pod-name> -n <namespace> | grep -A 10 Events

Common causes and solutions:

# 1. Local ECR auth expired - re-authenticate before pushing images
#    (worker nodes pull from ECR using their node IAM role, not docker login;
#    if pod pulls fail, check the node role has AmazonEC2ContainerRegistryReadOnly)
aws ecr get-login-password --region us-west-2 | \
  docker login --username AWS --password-stdin <account>.dkr.ecr.us-west-2.amazonaws.com

# 2. Image doesn't exist - verify image
aws ecr describe-images --repository-name <repo-name>

# 3. Node can't reach ECR - check NAT Gateway and security groups

HPA Not Scaling

Symptom: HPA shows <unknown> for metrics

Cause: Metrics server not installed or not collecting metrics.

Solution:

# Check metrics server
kubectl get deployment metrics-server -n kube-system

# Install if missing
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify metrics
kubectl top pods -n apps
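
Once metrics are flowing, confirm the HPA is reading them (the HPA name here assumes it matches the todo-api release; adjust if the chart names it differently):

# TARGETS should show a percentage instead of <unknown>
kubectl get hpa -n apps
kubectl describe hpa todo-api -n apps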

Cleanup

To destroy the cluster and avoid charges:

cd terraform
terraform destroy -var-file=environments/dev.tfvars

Warning: This will delete all resources including the cluster and data.
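
Note that resources created from inside the cluster (for example an ALB provisioned for an Ingress) are not in Terraform state, and leaving them behind can make the VPC teardown hang. Removing the application resources first is a safe habit:

# Remove Helm releases and Ingress-created load balancers before terraform destroy
helm uninstall todo-api -n apps
kubectl delete ingress --all -n apps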

Resources

Author

Derek Ogletree

License

This project is for portfolio and educational purposes.


Status: Week 1 Complete ✅ | Next: Platform Services

Built with ❤️ for learning Kubernetes and modern DevOps practices
