Comprehensive container security monitoring covering the entire lifecycle from build to runtime
- Problem Statement & Solution
- Key Features & Capabilities
- Architecture Overview
- Technology Stack
- Prerequisites & Requirements
- Quick Start Guide
- Terraform Modules Structure
- CI/CD Pipeline Setup
- API Documentation
- Troubleshooting Guide
- Contributing Guidelines
- Security Development Practices
- Roadmap & Future Plans
- Cost Considerations
- Contact
WatchDog tackles critical container security challenges that organizations face in production environments:
- Problem: Manual vulnerability scanning leads to vulnerable images in production
- Solution: Automated Trivy scanning in GitHub Actions that breaks the build on HIGH/CRITICAL vulnerabilities
- Problem: Limited visibility into container behavior after deployment
- Solution: Runtime security monitoring and active threat detection with Falco
- Problem: Security teams overwhelmed by manual processes
- Solution: Automated threat detection and response with a human in the loop: suspicious instances are tagged for manual inspection
- Vulnerability Scanning: Trivy integration scanning for CRITICAL and HIGH severity vulnerabilities
- Secure Base Images: Multi-stage Docker builds with minimal attack surface
- Registry Security: ECR integration with automatic image scanning on push
- Build-Breaking Security: CI/CD pipeline fails on security violations
- Custom Security Rules: S3-based rule management with automated updates
- Real-time Alerting: Falco security events are collected in CloudWatch Logs and forwarded to Slack channels via webhooks
- Host Monitoring: Comprehensive coverage of ECS instances
- Performance Monitoring: Resource usage and performance metrics in Grafana
- Secure AWS Deployment: Production-ready AWS architecture with security best practices
- Network Segmentation: VPC with proper security groups and network isolation
- Identity & Access Management: Least-privilege IAM roles and policies
- SSL/TLS Termination: ALB with ACM certificate integration
- HTTPS Redirect: Automatic HTTP to HTTPS redirection
- Domain-based Deployment: Support for custom domain names
- Container Insights: Deep visibility into ECS cluster and task performance
- Health Monitoring: Application and infrastructure health checks
- Centralized Logging: Structured logging with centralized collection.
- Custom Metrics: Security-focused metrics and dashboards
- Dual-level Scaling: Both ECS service tasks and EC2 instances auto-scale
- Self-healing: Automatic replacement of unhealthy containers
- Infrastructure as Code: Complete infrastructure automation with Terraform
- Git Integration: Git-based deployment and configuration management
- CPU Target Tracking: Configurable CPU utilization targets
- Instance Lifecycle Management: Automatic instance replacement with configurable lifetime
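The CPU target tracking and dual-level scaling above can be approximated with a simple proportional rule: scale capacity by the ratio of observed CPU to target CPU. The sketch below is only a mental model, not AWS's actual algorithm, which also applies cooldowns, CloudWatch alarm evaluation periods, and scale-in protection. The function name and the min/max bounds are hypothetical and are not variables from this project's Terraform modules.

```python
import math


def estimate_desired_tasks(current_tasks: int, avg_cpu_percent: float,
                           target_cpu_percent: float,
                           min_tasks: int = 1, max_tasks: int = 10) -> int:
    """Approximate the task count a CPU target tracking policy aims for.

    Proportional model: N tasks averaging X% CPU with a target of T% need
    roughly ceil(N * X / T) tasks to bring the average back near T%.
    The result is clamped to [min_tasks, max_tasks] (hypothetical bounds).
    """
    if target_cpu_percent <= 0:
        raise ValueError("target_cpu_percent must be positive")
    desired = math.ceil(current_tasks * avg_cpu_percent / target_cpu_percent)
    return max(min_tasks, min(max_tasks, desired))


if __name__ == "__main__":
    # 2 tasks at 90% CPU with a 60% target -> ceil(180 / 60) = 3 tasks
    print(estimate_desired_tasks(2, 90.0, 60.0))
```

Roughly the same reasoning applies one level down. As tasks are added, the capacity providers grow the EC2 Auto Scaling group so that new tasks have instances to land on.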
- FastAPI Application: Lightweight Python web framework serving security APIs, with health check endpoints for monitoring
- Amazon ECS: Container orchestration with automatic scaling, capacity providers for efficient resource utilization, and service auto-scaling based on CPU metrics
- Application Load Balancer: High availability with multi-AZ deployment, health checks with automatic failover, and SSL/TLS termination
- Amazon ECR: Private container registry with vulnerability scanning, integrated with CI/CD for automated deployments
- VPC: Network isolation with public/private subnets and least-privilege security groups
- Bastion Host: SSH access point in the public subnet for administrative tasks, acting as a secure gateway to private resources with access controlled by security groups
- Lambda: Tags an instance with "ToBeInspected" when Falco detects a threat on that instance
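Below is a minimal sketch of what such a tagging Lambda could look like. It rests on several assumptions:

- Falco events reach the function through a CloudWatch Logs subscription filter, whose payload is base64-encoded, gzip-compressed JSON.
- Each host writes to a log stream named after its EC2 instance ID.
- The tag key/value, the regex, and the handler name are illustrative only; the project's actual function may differ.

The EC2 client is injectable, so the logic can be tested without AWS.

```python
import base64
import gzip
import json
import re

# Hypothetical tag; the real function may use a different key or value.
TAG_KEY = "ToBeInspected"
TAG_VALUE = "true"
# Assumption: the log stream name contains the EC2 instance ID (8 or 17 hex chars).
INSTANCE_ID_RE = re.compile(r"\bi-[0-9a-f]{8}(?:[0-9a-f]{9})?\b")


def decode_subscription_event(event: dict) -> dict:
    """Decode a CloudWatch Logs subscription payload (base64 + gzip JSON)."""
    raw = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(raw))


def handler(event: dict, context=None, ec2_client=None) -> list:
    """Tag the instance whose log stream carried Falco alerts; return tagged IDs."""
    payload = decode_subscription_event(event)
    # CONTROL_MESSAGE payloads and empty batches need no action.
    if payload.get("messageType") != "DATA_MESSAGE" or not payload.get("logEvents"):
        return []
    match = INSTANCE_ID_RE.search(payload.get("logStream", ""))
    if not match:
        return []
    instance_id = match.group(0)
    if ec2_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        ec2_client = boto3.client("ec2")
    ec2_client.create_tags(
        Resources=[instance_id],
        Tags=[{"Key": TAG_KEY, "Value": TAG_VALUE}],
    )
    return [instance_id]
```

In Lambda, the runtime calls handler(event, context) and the boto3 client is created on demand. In a unit test, pass a stub that exposes a create_tags method.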
- Internet Boundary: ALB with public access, security groups filtering traffic
- Application Boundary: ECS tasks in private network, only ALB access allowed
- Management Boundary: IAM roles with least privilege, no root access
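One way to spot-check the Internet and Application boundaries is to audit security group ingress for rules open to the whole internet. The sketch below parses the JSON printed by aws ec2 describe-security-groups and flags public rules on any port other than 80/443. Treating only those two ports as legitimately public (for the ALB) is an assumption of this example. The bastion's SSH rule is flagged only if it is open to 0.0.0.0/0 rather than restricted to your IP.

```python
import json


def open_ingress(describe_sg_json: str,
                 allowed_public_ports=frozenset({80, 443})) -> list:
    """Return (group_name, rule) pairs for ingress open to the whole internet."""
    findings = []
    for group in json.loads(describe_sg_json).get("SecurityGroups", []):
        for perm in group.get("IpPermissions", []):
            open_v4 = any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", []))
            open_v6 = any(r.get("CidrIpv6") == "::/0" for r in perm.get("Ipv6Ranges", []))
            if not (open_v4 or open_v6):
                continue  # source is a specific CIDR or another security group
            low, high = perm.get("FromPort"), perm.get("ToPort")
            if perm.get("IpProtocol") == "-1" or low is None:
                findings.append((group.get("GroupName"), "all traffic"))
            elif not (low == high and low in allowed_public_ports):
                findings.append((group.get("GroupName"), f"{perm['IpProtocol']} {low}-{high}"))
    return findings
```

Feed it the output of aws ec2 describe-security-groups --filters Name=vpc-id,Values=<vpc-id> --output json.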
- Python 3.12: Modern, actively maintained Python release
- FastAPI: Modern, high-performance web framework
- Uvicorn: ASGI server with excellent performance
- AWS ECS: Container orchestration and management
- AWS ECR: Private container registry with security scanning
- AWS ALB: Application load balancing and SSL termination
- AWS VPC: Network isolation and security
- AWS IAM: Identity and access management
- AWS CloudWatch: Monitoring, logging, and alerting
- Terraform: Infrastructure provisioning and management
- Modular Architecture: Reusable Terraform modules for scalability
- State Management: Remote state storage (S3 + DynamoDB, in progress)
- GitHub Actions: Automated CI/CD pipelines
- Docker: Containerization with multi-stage, least-privilege builds
- Trivy: Comprehensive vulnerability scanner
- Falco: Runtime security monitoring with threat detection and response
- Security Scanning: Automated security checks in pipeline
- CloudWatch Logs: Centralized logging for Falco security events
- S3: Secure storage for Falco custom rules with versioning
- Slack Integration: Falco alerts sent to a Slack channel
- Grafana: Visualizes EC2 and ECS task metrics and resource usage
- Lambda: Tags instances that have emitted Falco alerts
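To illustrate the Slack path, here is a sketch that turns a single Falco event into an incoming-webhook payload and posts it using only the standard library. Its assumptions:

- Events use Falco's JSON output, with fields such as rule, priority, and output.
- The webhook URL is a placeholder.
- How events actually travel from CloudWatch Logs to Slack in this project (for example, through a Lambda subscriber) is not specified here.

```python
import json
import urllib.request

# Falco priority names, most to least severe.
FALCO_PRIORITIES = ["emergency", "alert", "critical", "error",
                    "warning", "notice", "informational", "debug"]


def should_alert(falco_event: dict, minimum: str = "warning") -> bool:
    """True when the event is at least as severe as `minimum` (unknown = skip)."""
    rank = {name: i for i, name in enumerate(FALCO_PRIORITIES)}
    level = rank.get(str(falco_event.get("priority", "")).lower(),
                     len(FALCO_PRIORITIES))
    return level <= rank[minimum.lower()]


def build_slack_payload(falco_event: dict) -> dict:
    """Turn one Falco JSON event into a Slack incoming-webhook payload."""
    rule = falco_event.get("rule", "unknown rule")
    priority = falco_event.get("priority", "unknown")
    output = falco_event.get("output", "")
    return {"text": f":rotating_light: Falco [{priority}] {rule}\n{output}"}


def post_to_slack(webhook_url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the payload as JSON to the (placeholder) webhook URL; return HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A subscriber would call should_alert(event) first and only then call post_to_slack(webhook_url, build_slack_payload(event)). This keeps low-priority noise out of the channel.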
- AWS Account: With programmatic access
- Docker: Version 20.10 or later
- Terraform: Version 1.0 or later
- Python: Version 3.12 or later
- Git: For version control and CI/CD
- Slack: A Slack workspace and an incoming webhook URL. See the post by fortum-tech on how to create one.
- SSH Key Pair: For EC2 instance access (create in AWS Console)
- AWS CLI: Configured with appropriate credentials
- Make: For running development tasks (recommended)
- MFA Enabled: Multi-factor authentication on AWS account
- Secure Credential Storage: Use AWS IAM roles or secure credential management
- Network Security: Understand VPC and security group implications
- Domain & SSL Certificate: Valid domain with ACM certificate for HTTPS
- Local IP Configuration: Your public IP address for bastion host access
- IAM Identity Center: Enabled in your AWS account
- Identity Center User: A user in IAM Identity Center with access to the required resources
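A quick preflight script can confirm the command-line prerequisites before you start. This sketch only checks two things: that each tool is on PATH, and that the interpreter meets the Python 3.12 requirement. It does not verify the Docker (20.10+) or Terraform (1.0+) versions, AWS credentials, or MFA. The tool lists are this example's own assumption.

```python
import shutil
import sys

# Tools used by the Quick Start; make is optional but recommended.
REQUIRED_TOOLS = ["git", "docker", "terraform", "aws"]
OPTIONAL_TOOLS = ["make"]


def preflight(min_python=(3, 12)) -> dict:
    """Map each tool to True/False (found on PATH) plus a python_ok flag."""
    report = {tool: shutil.which(tool) is not None
              for tool in REQUIRED_TOOLS + OPTIONAL_TOOLS}
    report["python_ok"] = sys.version_info[:2] >= tuple(min_python)
    return report


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'ok' if ok else 'MISSING':8}{name}")
```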
git clone https://github.com/yourusername/watchdog.git
cd watchdog
cd terraform
# Create terraform.tfvars file
cp terraform.example.tfvars terraform.tfvars
# Edit with your values
nano terraform.tfvars
terraform init
terraform apply -target=module.ecr
# Get ALB DNS name from Terraform output
terraform output alb_dns_name
# Get ECR login token
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin your-account.dkr.ecr.us-east-1.amazonaws.com
# Build and push (or use GitHub Actions)
docker build -t watchdog:latest .
docker tag watchdog:latest your-account.dkr.ecr.us-east-1.amazonaws.com/watchdog-repo:latest
docker push your-account.dkr.ecr.us-east-1.amazonaws.com/watchdog-repo:latest
terraform plan
terraform apply
# Test the application
curl -kv https://your-alb-dns-name/api
# Deploy Falco rules using the dedicated workflow
git add falco/custom_rules.yaml
git commit -m "Update Falco rules"
git push origin main
# Or deploy manually to S3
aws s3 cp falco/custom_rules.yaml s3://your-falco-bucket/custom_rules.yaml
- Check ECR vulnerability scan results in the AWS Console
- Monitor Falco logs in CloudWatch
- Review security group configurations
- Validate IAM role permissions
- Verify automated threat detection:
  - Set the your-local-ip variable in terraform.tfvars.
  - SSH into an ECS host from the bastion and perform a prohibited action (e.g., sudo -i).
  - After a short wait, the instance should be tagged ToBeInspected.
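To confirm the result from a script rather than the console, the sketch below lists instances that carry the tag. It uses boto3's describe_instances paginator with the tag-key filter. It assumes ToBeInspected is the tag key; if the Lambda writes it as a tag value instead, adjust the filter accordingly.

```python
def flagged_instances(ec2_client, tag_key: str = "ToBeInspected") -> list:
    """Return IDs of EC2 instances that carry `tag_key` (any value)."""
    instance_ids = []
    paginator = ec2_client.get_paginator("describe_instances")
    pages = paginator.paginate(Filters=[{"Name": "tag-key", "Values": [tag_key]}])
    for page in pages:
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                instance_ids.append(instance["InstanceId"])
    return instance_ids
```

Call it as flagged_instances(boto3.client("ec2")), using credentials for the account that runs the cluster.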
- Enable IAM Identity Center (SSO)
- Go to IAM Identity Center in AWS Console → Enable.
- Note the User Portal URL (e.g., https://my-sso.awsapps.com/start).
- Add Users / Groups
- In IAM Identity Center, create users or groups (e.g., GrafanaAdmins).
- Assign them to the Grafana workspace as ADMIN / EDITOR / VIEWER.
- Add a CloudWatch Data Source in Grafana (requires ADMIN privileges)
- Log into Grafana via the SSO User Portal.
- Go to Configuration → Data sources → Add data source.
- Select CloudWatch → set Default region (your ECS region).
- Save & Test → should return "Data source is working."
- Import Dashboards
- Go to Dashboards → Import.
- Import a dashboard from grafana.com (by dashboard ID or JSON).
- Select the CloudWatch data source.
- Save, or customize as needed.
The infrastructure is organized into reusable modules:
terraform/
├── main.tf # Root module orchestration
├── variables.tf # Root variables
├── outputs.tf # Root outputs
├── provider.tf # AWS provider configuration
└── modules/
    ├── bastion/ # Bastion host for secure access
    ├── grafana/ # Amazon Managed Grafana workspace
    ├── falco/ # Falco security monitoring setup
    ├── lambda/ # Auto tag instance with "ToBeInspected"
    ├── vpc/ # Network infrastructure
    ├── security_group/ # Security groups
    ├── iam/ # IAM roles and policies
    ├── ecr/ # Container registry
    ├── ecs_cluster/ # ECS cluster and launch template
    ├── ecs_service/ # ECS service and task definition
    ├── alb/ # Application load balancer
    └── asg/ # Auto scaling group
Configure the following secrets in your GitHub repository:
AWS_ACCESS_KEY: Your AWS access key
AWS_SECRET_KEY: Your AWS secret key
AWS_REGION: Your AWS region (e.g., us-east-1)
ECR_REPOSITORY: Your ECR repository name (e.g., watchdog-repo)
FALCO_BUCKT_NAME: Falco bucket name from the Terraform output (falco_bucket_name)
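Rather than copying values by hand, you can read the Terraform outputs programmatically. This sketch parses terraform output -json, which maps each output name to an object with a value field. It then maps two outputs to the secret names listed above. That mapping is an assumption based on the output names used elsewhere in this README.

```python
import json
import subprocess

# Assumed mapping: GitHub secret name -> Terraform output name.
SECRET_SOURCES = {
    "ECR_REPOSITORY": "ecr_repo_name",
    "FALCO_BUCKT_NAME": "falco_bucket_name",
}


def parse_outputs(raw_json: str) -> dict:
    """Flatten `terraform output -json` ({name: {"value": ...}}) into {name: value}."""
    return {name: entry["value"] for name, entry in json.loads(raw_json).items()}


def secrets_from_outputs(outputs: dict) -> dict:
    """Pick the outputs that back GitHub secrets, keyed by secret name."""
    return {secret: outputs[name] for secret, name in SECRET_SOURCES.items() if name in outputs}


def load_outputs(workdir: str = "terraform") -> dict:
    """Run terraform in `workdir` (the configuration must already be applied)."""
    result = subprocess.run(["terraform", "output", "-json"], cwd=workdir,
                            capture_output=True, text=True, check=True)
    return parse_outputs(result.stdout)
```

You could then store each resulting pair with the GitHub CLI, for example gh secret set ECR_REPOSITORY --body watchdog-repo.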
The CI/CD pipeline (.github/workflows/ci.yml) performs:
- Security Scanning: Trivy scans for vulnerabilities
- Build Breaking: Fails on HIGH/CRITICAL vulnerabilities
- Image Building: Creates optimized Docker image
- Registry Push: Pushes to ECR with automatic deployment
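You can also reproduce the build-breaking rule outside GitHub Actions, for example to triage a report locally. The sketch below reads a report produced with trivy fs --format json --output report.json . and applies the same policy as the workflow settings: only CRITICAL/HIGH findings block, and unfixed ones are skipped. It illustrates the policy only; it is not how trivy-action works internally, since the action already enforces this through exit-code.

```python
import json

BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def blocking_findings(report: dict, ignore_unfixed: bool = True) -> list:
    """Return (target, vulnerability_id, severity) tuples that should fail the build."""
    findings = []
    for result in report.get("Results") or []:
        # Targets without findings may omit "Vulnerabilities" or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") not in BLOCKING_SEVERITIES:
                continue
            if ignore_unfixed and not vuln.get("FixedVersion"):
                continue  # same intent as ignore-unfixed: true
            findings.append((result.get("Target"), vuln.get("VulnerabilityID"),
                             vuln["Severity"]))
    return findings


def gate(report_path: str) -> int:
    """Print blocking findings; return 1 if any exist (like exit-code: '1'), else 0."""
    with open(report_path, encoding="utf-8") as handle:
        findings = blocking_findings(json.load(handle))
    for target, vuln_id, severity in findings:
        print(f"{severity:8} {vuln_id} in {target}")
    return 1 if findings else 0
```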
The Trivy scanner configuration:
- name: Run Trivy vulnerability scanner
  uses: aquasecurity/trivy-action@0.28.0
  with:
    scan-type: 'fs'
    format: 'table'
    exit-code: '1' # Fail build on vulnerabilities
    ignore-unfixed: true # Ignore unfixed vulnerabilities
    severity: 'CRITICAL,HIGH' # Only fail on critical/high
    timeout: '5m'
The Falco workflow (.github/workflows/falco_workflow.yml) performs:
- Rule Validation: Ensures custom rules are properly formatted (not implemented yet)
- S3 Deployment: Uploads rules to dedicated S3 bucket
- Automatic Updates: EC2 instances pull updated rules via cron job
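Since rule validation is not implemented yet, here is one possible starting point: a structural check over the parsed rules file. It assumes the YAML has already been loaded, e.g. with PyYAML's yaml.safe_load. It checks only two things: the required keys for rule, macro, and list entries, and the priority name. It does not compile conditions; full validation requires Falco itself.

```python
# Required keys per entry type when defining (not appending to) an item.
REQUIRED_KEYS = {
    "rule": {"desc", "condition", "output", "priority"},
    "macro": {"condition"},
    "list": {"items"},
}
VALID_PRIORITIES = {"emergency", "alert", "critical", "error", "warning",
                    "notice", "informational", "info", "debug"}
HEADER_KEYS = {"required_engine_version", "required_plugin_versions"}


def validate_rules(docs) -> list:
    """Return human-readable problems found in a parsed Falco rules file."""
    if not isinstance(docs, list):
        return ["top level must be a list of rule/macro/list entries"]
    errors = []
    for index, item in enumerate(docs):
        if not isinstance(item, dict):
            errors.append(f"entry {index}: expected a mapping")
            continue
        kind = next((k for k in REQUIRED_KEYS if k in item), None)
        if kind is None:
            if not HEADER_KEYS & set(item):
                errors.append(f"entry {index}: no rule/macro/list key")
            continue
        if item.get("append") or "override" in item:
            continue  # appends/overrides may legitimately omit fields
        missing = REQUIRED_KEYS[kind] - set(item)
        if missing:
            errors.append(f"{kind} {item[kind]!r}: missing {sorted(missing)}")
        priority = item.get("priority")
        if (kind == "rule" and priority is not None
                and str(priority).lower() not in VALID_PRIORITIES):
            errors.append(f"rule {item['rule']!r}: unknown priority {priority!r}")
    return errors
```

In the workflow, a non-empty result could fail the job before the upload to S3.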
GET /api: Basic health check endpoint.
Request:
# Health check with HTTPS
curl https://yourdomain.com/api
# Or using ALB DNS directly
curl -kv https://your-alb-dns-name/api
Response:
{
"Hello from Project Watchdog!"
}
Response Codes:
- 200 OK: Application is healthy
- 500 Internal Server Error: Application error
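For scripted checks, such as after terraform apply or in CI, a small poller can wait for the endpoint to return 200 OK. In this sketch the retry count and delay are arbitrary, and fetch is injectable for testing. The default fetch uses urllib with normal certificate verification, so point it at your custom domain. The ALB DNS name would fail verification, which is why the curl example above uses -k.

```python
import time
import urllib.request


def _default_fetch(url: str) -> int:
    # urlopen raises HTTPError (an OSError subclass) for 4xx/5xx responses.
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


def wait_until_healthy(url: str, attempts: int = 10, delay: float = 3.0,
                       fetch=_default_fetch) -> bool:
    """Poll `url` until it answers 200 OK or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            status = fetch(url)
        except OSError:
            status = None  # refused connection, DNS or TLS failure, HTTP error status
        if status == 200:
            return True
        if attempt < attempts:
            time.sleep(delay)
    return False
```

Usage: wait_until_healthy("https://yourdomain.com/api") returns True once the service is up.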
# Get important deployment information
terraform output curl_testing_app_url # curl cmd for testing
terraform output ecr_repo_name # ECR repository name
terraform output falco_bucket_name # S3 bucket for Falco rules
terraform output bastion_ssh_command # SSH command for bastion access
terraform output grafana_workspace_endpoint # Endpoint to view the Grafana dashboard
Issue: "Error creating ECR repository"
Error: Error creating ECR repository: InvalidParameterException: Repository name "watchdog-repo" already exists
Solution:
# Check existing repositories
aws ecr describe-repositories
# Either delete existing repo or change name in variables
terraform destroy -target=module.ecr
terraform apply
Issue: "No available subnets for ALB"
Error: ALB requires at least 2 subnets in different AZs
Solution:
- Ensure your AWS region has at least 2 availability zones
- Check VPC module creates subnets in different AZs
- Verify subnet configurations in terraform/modules/vpc/main.tf
Issue: ALB showing unhealthy targets
# Check target group health
aws elbv2 describe-target-health --target-group-arn <target-group-arn>
Common Causes:
- Wrong Health Check Path: ALB checks /api, but the app serves a different path
- Security Group Rules: Traffic blocked between ALB and ECS
- Container Not Binding: App not listening on 0.0.0.0:8000
- Startup Time: App takes longer than health check grace period
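When targets are unhealthy, the Reason code returned by describe-target-health usually narrows down which of the causes above applies. The sketch below summarizes non-healthy targets from the CLI's JSON output. The reason-to-cause hints are a rough heuristic of this example, not an official mapping.

```python
import json

# Heuristic hints linking ELB reason codes to the common causes above.
HINTS = {
    "Target.ResponseCodeMismatch": "health check path or status: /api may not return the expected code",
    "Target.Timeout": "security group rules, or the app is not listening on 0.0.0.0:8000",
    "Target.FailedHealthChecks": "connection failed: check container port binding and security groups",
    "Elb.InitialHealthChecking": "still starting: compare startup time with the grace period",
}


def summarize_target_health(raw_json: str) -> list:
    """Return (target_id, port, state, reason, hint) for every non-healthy target."""
    rows = []
    for desc in json.loads(raw_json).get("TargetHealthDescriptions", []):
        health = desc.get("TargetHealth", {})
        if health.get("State") == "healthy":
            continue
        reason = health.get("Reason", "")
        target = desc.get("Target", {})
        rows.append((target.get("Id"), target.get("Port"), health.get("State"),
                     reason, HINTS.get(reason, "see the ELB docs for this reason code")))
    return rows
```

Save the CLI output with --output json > health.json, then pass the file's contents to summarize_target_health.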
Solution:
# 1. Verify health check configuration
aws elbv2 describe-target-groups --names watchdog-tg
# 2. Test health check path directly
# SSH to EC2 instance and curl localhost:container-port/api
# 3. Check security groups
# Groups in a non-default VPC must be looked up by filter, not --group-names
aws ec2 describe-security-groups --filters Name=group-name,Values=watchdog-sg,watchdog-alb-sg
- Local Container Storage Limitations
- Issue: No persistent storage for development
- Workaround: Use external databases or mock services
- Solution: Add Docker Compose with persistent volumes
- Resource Constraints on t2.micro
- Issue: Limited CPU/memory on free tier instances
- Workaround: Use t3.small or higher for production
- Solution: Implement resource monitoring and right-sizing
Issue: "Certificate not found for domain"
Error: No certificate found for domain yourdomain.com
Solution:
- Ensure the ACM certificate exists in the same region as the ALB
- Verify domain validation is complete
- Check certificate status in AWS Console
Issue: "Access denied to S3 bucket"
Solution:
- Verify IAM roles have S3 access permissions
- Check that the bucket name matches the Terraform output
- Ensure GitHub Actions secrets are configured correctly
- Fork the Repository
- Create Feature Branch:
git checkout -b feature/your-feature-name
- Make Changes: Follow coding standards
- Run Tests: Ensure all tests pass
- Security Scan: Run local security scans
- Submit Pull Request: Include description and tests
- Input Validation: Validate all user inputs
- Error Handling: Don't expose sensitive information in errors
- Dependency Management: Regular security updates
- Secret Management: Never commit secrets to code
- Least Privilege: Minimal required permissions
- Runtime Security Monitoring: Falco rules for container behavior analysis
- Infrastructure Security: Bastion host for controlled administrative access
- SSL/TLS Encryption: End-to-end encrypted communications
- Automated threat response
- State Management: Terraform remote state with S3 + DynamoDB (in progress)
Compute Costs:
- ECS Tasks: No additional charge (pay for EC2 instances)
- EC2 Instances: Primary cost driver
- t2.micro: $8.50/month (free tier eligible)
- t3.small: $15.00/month (better performance)
- t3.medium: $30.00/month (production recommended)
Storage Costs:
- ECR Storage: $0.10/GB/month
- EBS Volumes: $0.10/GB/month (GP2)
- CloudWatch Logs: $0.50/GB ingested (after free tier)
Network Costs:
- ALB: $0.0225/hour ($16.20/month) + $0.008/LCU-hour
- Data Transfer: $0.09/GB out to the internet (after the 100 GB/month free allowance)
Monitoring Costs:
- Amazon Managed Grafana: 90-day free trial (max 5 users)
- $9.00 per active Editor/Admin user per month (after the trial)
- $5.00 per active Viewer user per month (after the trial)
Instance Selection:
# Development
instance_type = "t2.micro" # $8.50/month, free tier eligible
# Staging
instance_type = "t3.small" # $15.00/month, good for testing
# Production
instance_type = "t3.medium" # $30.00/month, production ready
AWS Free Tier allowances:
- EC2: 750 hours of t2.micro instances per month
- ELB: 750 hours of Application Load Balancer
- ECR: 500MB of storage per month
- CloudWatch: 5GB of log data per month
- Data Transfer: 100 GB out to the internet per month
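You can combine the figures above into a rough monthly estimate. The sketch has the following assumptions built in:

- It hard-codes the prices quoted in this section, which vary by region and change over time.
- It assumes 720 hours per month, as in the ALB figure.
- It bills ALB LCUs per LCU-hour.
- It ignores free tier allowances and data transfer.
- The default EBS and ECR sizes are hypothetical.

```python
HOURS_PER_MONTH = 720  # same basis as the ALB figure: $0.0225 x 720 = $16.20

# On-demand monthly prices quoted in this section (approximate, region-dependent).
INSTANCE_MONTHLY_USD = {"t2.micro": 8.50, "t3.small": 15.00, "t3.medium": 30.00}


def estimate_monthly_cost(instance_type: str, instance_count: int = 1,
                          ebs_gb_per_instance: float = 30, ecr_gb: float = 1,
                          log_ingest_gb: float = 0, avg_lcu: float = 1,
                          grafana_admins: int = 0, grafana_viewers: int = 0) -> dict:
    """Rough monthly USD estimate per component; ignores free tier and data transfer."""
    costs = {
        "ec2": INSTANCE_MONTHLY_USD[instance_type] * instance_count,
        "alb": (0.0225 + 0.008 * avg_lcu) * HOURS_PER_MONTH,
        "ebs": 0.10 * ebs_gb_per_instance * instance_count,
        "ecr": 0.10 * ecr_gb,
        "cloudwatch_logs": 0.50 * log_ingest_gb,
        "grafana": 9.00 * grafana_admins + 5.00 * grafana_viewers,
    }
    costs["total"] = round(sum(costs.values()), 2)
    return costs
```

For example, estimate_monthly_cost("t3.small", instance_count=2, log_ingest_gb=2, grafana_admins=1) shows that the instances and the ALB dominate the bill. This matches the note above that EC2 is the primary cost driver.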
Built with ❤️ by Ofor David Tochukwu Open to feedback and collaboration. My Email: davidoforincloud@gmail.com

