Increase EKS node group max size to 50 #7
base: main
Conversation
- Increase arm_max_size from 10 to 50
- Allows cluster to handle increased traffic during peak periods
- Supports upcoming product launch scaling requirements
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
Pre-merge checks and finishing touches: ✅ 3 checks passed
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
prod/eks/terragrunt.hcl (1 hunk)
🧰 Additional context used
🔍 Remote MCP: Exa, Linear, Notion
Summary of Additional Context for PR Review
CRITICAL CONFLICT FOUND ⚠️
The EKS Auto-Scaling Policy in Notion explicitly states that the maximum node limit of 10 is a hard constraint based on AWS service quotas, the monthly budget allocation ($15,000/month for prod compute), and IP address availability in private subnets.
The proposed change to increase arm_max_size from 10 to 50 directly violates this established policy.
Key Policy Details from Notion:
Production Environment Constraints:
- Maximum Nodes: 10 (Budget cap + service quota limit)
- Budget alerts configured at 75%, 90%, and 100% of $15,000/month, with automatic scale-down to minimum nodes if the budget is exceeded (a sketch of these thresholds follows this list)
- Process for requesting increases requires: capacity planning request to platform-team, FinOps review for budget impact, and AWS service quota increase (2-3 day lead time)
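For illustration only, here is a minimal Terraform sketch of what budget alerts at those thresholds could look like; the resource name, subscriber address, and the use of aws_budgets_budget are assumptions, not details from the repository or the Notion policy.

```hcl
# Hypothetical sketch of the 75% / 90% / 100% budget alerts described in the policy.
# The budget name and subscriber address are placeholders.
resource "aws_budgets_budget" "prod_compute" {
  name         = "prod-compute-monthly" # assumed budget name
  budget_type  = "COST"
  limit_amount = "15000"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  # One notification per threshold called out in the Notion policy.
  dynamic "notification" {
    for_each = [75, 90, 100]
    content {
      comparison_operator        = "GREATER_THAN"
      threshold                  = notification.value
      threshold_type             = "PERCENTAGE"
      notification_type          = "ACTUAL"
      subscriber_email_addresses = ["platform-team@example.com"] # placeholder address
    }
  }
}
```

The automatic scale-down to minimum nodes that the policy mentions would need separate, alert-driven automation and is not shown here.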
AWS EKS Best Practices Context:
AWS EKS documentation indicates that NodegroupScalingConfig specifies minimum size, maximum size, and desired size properties, with important notes about Cluster Autoscaler behavior and scaling policy interactions. The proposed 5x increase in max size should trigger careful review against service quotas and cost controls.
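For reference, this is a minimal sketch of that scaling configuration using the Terraform AWS provider's aws_eks_node_group resource; the cluster name, role ARN, subnet IDs, and min_size are placeholders rather than values from this repository.

```hcl
# Minimal sketch of an EKS managed node group's scaling configuration.
# Everything except instance types, desired size, and the max-size cap is a placeholder.
resource "aws_eks_node_group" "arm" {
  cluster_name    = "prod" # assumed cluster name
  node_group_name = "arm"
  node_role_arn   = "arn:aws:iam::123456789012:role/eks-node-role" # placeholder ARN
  subnet_ids      = ["subnet-aaaa1111", "subnet-bbbb2222"]         # placeholder private subnets

  instance_types = ["t4g.large", "t4g.xlarge"]

  scaling_config {
    min_size     = 1  # assumed; not shown in the diff
    desired_size = 3
    max_size     = 10 # the policy's current hard cap
  }
}
```

Because the Cluster Autoscaler scales between min_size and max_size, raising max_size alone lifts the ceiling the autoscaler can reach, which is exactly why quota, budget, and subnet IP headroom all need checking first.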
Missing Information for Complete Review:
- No Linear Issues Found - No related infrastructure capacity planning tickets found in Linear (Rapid Response, Engineering, or Product teams)
- No FinOps Approval - No evidence of FinOps review in linked tickets or documents
- No AWS Quota Check - No verification that AWS service quota limits can accommodate 50 nodes per node group (a quota lookup sketch follows this list)
- Budget Impact Unknown - 5x increase in max nodes could significantly impact the $15,000/month budget
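One hedged way to close the quota gap is a Service Quotas lookup; the quota name below is an assumption and should be confirmed in the AWS Service Quotas console before relying on it.

```hcl
# Hypothetical quota lookup; confirm the exact quota name/code before relying on it.
data "aws_servicequotas_service_quota" "nodes_per_nodegroup" {
  service_code = "eks"
  quota_name   = "Nodes per managed node group" # assumed quota name
}

output "eks_nodes_per_managed_node_group" {
  value = data.aws_servicequotas_service_quota.nodes_per_nodegroup.value
}
```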
```diff
  arm_instance_types = ["t4g.large", "t4g.xlarge"]
  arm_desired_size = 3
- arm_max_size = 10
+ arm_max_size = 50
```
CRITICAL: This change violates the established EKS Auto-Scaling Policy and must not be merged without proper approvals.
Increasing arm_max_size from 10 to 50 directly conflicts with the documented EKS Auto-Scaling Policy found in Notion, which explicitly states that the maximum node limit of 10 is a hard constraint based on:
- AWS service quota limits - No verification that quotas support 50 nodes
- Budget cap - $15,000/month for prod compute with alerts at 75%, 90%, and 100%; a 5x capacity increase could breach the budget and trigger the automatic scale-down
- IP address availability - Private subnet IP exhaustion risk
Missing Required Approvals:
- ❌ No capacity planning request found in Linear
- ❌ No FinOps review for budget impact (5x increase)
- ❌ No AWS service quota increase request (requires 2-3 day lead time)
The policy requires a formal process: capacity planning request → FinOps review → AWS quota increase before any changes to max node limits.
Based on external tools context (Notion EKS Auto-Scaling Policy).
Please provide evidence of:
- Approved capacity planning ticket
- FinOps sign-off on budget impact
- Confirmed AWS service quota increase to support 50 nodes
- Verification that private subnets have sufficient IP addresses
Until these prerequisites are met, this change should be reverted to arm_max_size = 10.
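For orientation only, a hypothetical excerpt of the reverted inputs in prod/eks/terragrunt.hcl; apart from the arm_* values visible in the diff, the module source and surrounding structure are assumptions.

```hcl
# Hypothetical terragrunt.hcl excerpt showing the reverted value.
# Only the arm_* lines come from the diff; the source URL is a placeholder.
terraform {
  source = "git::https://example.com/infrastructure-modules.git//eks?ref=v1.0.0"
}

inputs = {
  arm_instance_types = ["t4g.large", "t4g.xlarge"]
  arm_desired_size   = 3
  arm_max_size       = 10 # reverted to the policy cap until approvals are in place
}
```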
🤖 Prompt for AI Agents
In prod/eks/terragrunt.hcl around line 28, changing arm_max_size from 10 to 50
violates the EKS Auto-Scaling Policy; revert the value back to arm_max_size = 10
and do not merge this change until the following are completed and referenced in
the PR: (1) an approved capacity planning ticket in Linear, (2) documented
FinOps sign-off on the budget impact, (3) an AWS service quota increase
confirmed to support 50 nodes, and (4) verification that private subnets have
sufficient IP addresses; include links or IDs for each approval in the PR
description before attempting any future increase.