
averyjennings commented Dec 15, 2025

  • Increase arm_max_size from 10 to 50
  • Allows cluster to handle increased traffic during peak periods
  • Supports upcoming product launch scaling requirements

Summary by CodeRabbit

  • Chores
    • Increased maximum ARM-based node group capacity from 10 to 50 nodes in the production environment, enhancing scalability for workload distribution.



coderabbitai bot commented Dec 15, 2025

Walkthrough

The prod/eks/terragrunt.hcl file was updated to increase the maximum size of the ARM (Graviton) node group from 10 to 50 instances, raising the upper bound for horizontal scaling.

Changes

  • ARM Node Group Scaling Configuration (prod/eks/terragrunt.hcl): Updated arm_max_size parameter from 10 to 50, raising the maximum node count for ARM-based instances

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

  • Single parameter modification with clear intent
  • Configuration change only; no logic or dependencies affected
  • Verify the new limit (50) aligns with infrastructure capacity and cost projections

Poem

🐰 A config tweak, both wise and spry,
Fifty ARM nodes now touch the sky!
From ten to fifty, scaling grows,
Where Graviton's swift power flows!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and concisely summarizes the main change: increasing the EKS node group max size from 10 to 50, which aligns with the file changes and PR objectives.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/increase-eks-node-limits

Comment @coderabbitai help to get the list of available commands and usage tips.


coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 34c3d06 and 67ef734.

📒 Files selected for processing (1)
  • prod/eks/terragrunt.hcl (1 hunk)
🧰 Additional context used
🔍 Remote MCP Exa, Linear, Notion


Summary of Additional Context for PR Review

CRITICAL CONFLICT FOUND ⚠️

The EKS Auto-Scaling Policy in Notion explicitly states that the maximum node limit of 10 is a hard constraint based on AWS service quota, monthly budget allocation ($15,000/month for prod compute), and IP address availability in private subnets.

The proposed change to increase arm_max_size from 10 to 50 directly violates this established policy.

Key Policy Details from Notion:

Production Environment Constraints:

  • Maximum Nodes: 10 (Budget cap + service quota limit)
  • Budget alerts configured at 75%, 90%, and 100% of $15,000/month, with automatic scale-down to minimum nodes if budget is exceeded
  • Process for requesting increases requires: capacity planning request to platform-team, FinOps review for budget impact, and AWS service quota increase (2-3 day lead time)
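The cap above can be made enforceable in code rather than only in Notion. A minimal sketch, assuming the module behind prod/eks/terragrunt.hcl declares `arm_max_size` as a Terraform input variable (its real declaration is not part of this PR); the description, default, lower bound of 1, and message text are hypothetical:

```hcl
# Hypothetical declaration in the EKS module consumed by prod/eks/terragrunt.hcl.
variable "arm_max_size" {
  description = "Maximum node count for the ARM (Graviton) managed node group"
  type        = number
  default     = 10

  validation {
    # Mirrors the prod cap from the EKS Auto-Scaling Policy (max 10 nodes).
    condition     = var.arm_max_size >= 1 && var.arm_max_size <= 10
    error_message = "The arm_max_size value must be between 1 and 10; larger values need a capacity planning request, FinOps review and an AWS service quota increase."
  }
}
```

With this validation in place, a plan run with `arm_max_size = 50` fails at input validation instead of quietly widening the scaling range.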

AWS EKS Best Practices Context:

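AWS EKS documentation: NodegroupScalingConfig defines minSize, maxSize, and desiredSize for a managed node group. The API reference notes that if the Kubernetes Cluster Autoscaler is used, desiredSize should not be changed directly, because that can make Cluster Autoscaler suddenly scale up or down; maxSize is the largest number of nodes the group can scale out to, and the value you may set is bounded by Amazon EKS service quotas. The proposed 5x increase in max size should therefore trigger careful review against service quotas and cost controls.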
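In the Terraform AWS provider these three fields appear as the `scaling_config` block of `aws_eks_node_group`. The sketch below assumes the module passes the Terragrunt inputs straight through; `arm_min_size`, the node group name, the AMI type, and the cluster, role and subnet variables are hypothetical, not taken from this repository. The `ignore_changes` entry follows the provider's documented pattern for leaving `desired_size` to Cluster Autoscaler:

```hcl
variable "cluster_name" { type = string }
variable "node_role_arn" { type = string }
variable "private_subnet_ids" { type = list(string) }
variable "arm_instance_types" { type = list(string) }
variable "arm_min_size" { type = number } # hypothetical; not shown in the PR
variable "arm_desired_size" { type = number }
variable "arm_max_size" { type = number }

resource "aws_eks_node_group" "arm" {
  cluster_name    = var.cluster_name
  node_group_name = "arm"                    # hypothetical name
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.private_subnet_ids   # nodes (and, by default, pods) draw IPs from these
  ami_type        = "AL2023_ARM_64_STANDARD" # assumed Graviton AMI type
  instance_types  = var.arm_instance_types

  scaling_config {
    min_size     = var.arm_min_size
    desired_size = var.arm_desired_size
    max_size     = var.arm_max_size # the value this PR raises from 10 to 50
  }

  lifecycle {
    # Let Cluster Autoscaler adjust desired_size without Terraform reverting it.
    ignore_changes = [scaling_config[0].desired_size]
  }
}
```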

Missing Information for Complete Review:

  1. No Linear Issues Found - No related infrastructure capacity planning tickets found in Linear (Rapid Response, Engineering, or Product teams)
  2. No FinOps Approval - No evidence of FinOps review in linked tickets or documents
  3. No AWS Quota Check - No verification that AWS service quota limits can accommodate 50 nodes per node group
  4. Budget Impact Unknown - 5x increase in max nodes could significantly impact the $15,000/month budget
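To put rough numbers on item 4, the worst case for the ARM group alone is max size × hourly price × hours per month. A sketch as a provider-free Terraform config, assuming Linux on-demand pricing for t4g.xlarge in us-east-1 ($0.1344/hour; the prod region is not stated here) and the common 730-hour month:

```hcl
locals {
  # ASSUMPTION: on-demand Linux price for t4g.xlarge in us-east-1; verify for the prod region.
  t4g_xlarge_hourly_usd = 0.1344
  hours_per_month       = 730   # common monthly approximation
  prod_compute_budget   = 15000 # from the EKS Auto-Scaling Policy

  # Worst case: the ARM group pinned at max size on the larger instance type all month.
  arm_cost_at_10 = 10 * local.t4g_xlarge_hourly_usd * local.hours_per_month
  arm_cost_at_50 = 50 * local.t4g_xlarge_hourly_usd * local.hours_per_month
}

output "arm_worst_case_monthly_usd" {
  value = {
    max_10 = local.arm_cost_at_10
    max_50 = local.arm_cost_at_50
  }
}

output "arm_share_of_budget_at_50" {
  value = local.arm_cost_at_50 / local.prod_compute_budget
}
```

Under those assumptions the ceiling moves from about $981 to about $4,906 per month, roughly a third of the $15,000 budget, before EBS, data transfer, any T4g CPU credit surcharges, and the rest of prod compute are counted. Whether that fits depends on current spend, which this thread does not show.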


  arm_instance_types = ["t4g.large", "t4g.xlarge"]
  arm_desired_size = 3
- arm_max_size = 10
+ arm_max_size = 50

⚠️ Potential issue | 🔴 Critical

CRITICAL: This change violates the established EKS Auto-Scaling Policy and must not be merged without proper approvals.

Increasing arm_max_size from 10 to 50 directly conflicts with the documented EKS Auto-Scaling Policy found in Notion, which explicitly states that the maximum node limit of 10 is a hard constraint based on:

  1. AWS service quota limits - No verification that quotas support 50 nodes
  2. Budget cap - $15,000/month for prod compute with alerts at 75%, 90%, and 100%; a 5x capacity increase could push spend past the cap, which triggers automatic scale-down to minimum nodes
  3. IP address availability - Private subnet IP exhaustion risk
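Item 3 can be checked with simple arithmetic. Assuming the default Amazon VPC CNI (secondary IPs, no prefix delegation), each node can draw up to ENIs × IPv4 addresses per ENI from its subnet, which is 4 × 15 = 60 for t4g.xlarge, and AWS reserves 5 addresses in every subnet. The subnet CIDRs below are hypothetical placeholders:

```hcl
locals {
  # ASSUMPTION: hypothetical private subnet CIDRs; the real ones are not in this PR.
  private_subnet_cidrs = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]

  # Total addresses per subnet minus the 5 that AWS reserves.
  usable_ips = sum([
    for cidr in local.private_subnet_cidrs :
    pow(2, 32 - tonumber(split("/", cidr)[1])) - 5
  ])

  # t4g.xlarge: 4 ENIs x 15 IPv4 addresses each, the per-node upper bound.
  ips_per_node_upper_bound = 4 * 15

  ips_upper_bound_at_10 = 10 * local.ips_per_node_upper_bound
  ips_upper_bound_at_50 = 50 * local.ips_per_node_upper_bound
}

output "ip_headroom" {
  value = {
    usable_ips = local.usable_ips
    fits_at_10 = local.ips_upper_bound_at_10 <= local.usable_ips
    fits_at_50 = local.ips_upper_bound_at_50 <= local.usable_ips
  }
}
```

With these example /24s there are 753 usable addresses: 10 nodes need at most 600, while 50 nodes could need up to 3,000.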

Missing Required Approvals:

  • ❌ No capacity planning request found in Linear
  • ❌ No FinOps review for budget impact (5x increase)
  • ❌ No AWS service quota increase request (requires 2-3 day lead time)
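The quota side can be read before requesting an increase. A sketch, assuming the AWS provider is configured for the prod account and region; the EKS quota name is as shown in the Service Quotas console at the time of writing and should be verified, and `L-1216C47A` is the EC2 quota for running on-demand standard (A, C, D, H, I, M, R, T, Z) instances, measured in vCPUs, which covers T4g:

```hcl
# Assumes the AWS provider is configured (e.g. via AWS_PROFILE / AWS_REGION) for prod.
data "aws_servicequotas_service_quota" "eks_nodes_per_node_group" {
  service_code = "eks"
  quota_name   = "Nodes per managed node group" # verify the exact name in your account
}

data "aws_servicequotas_service_quota" "ec2_standard_on_demand_vcpus" {
  service_code = "ec2"
  quota_code   = "L-1216C47A" # Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances
}

locals {
  # t4g.xlarge has 4 vCPUs; 50 nodes at that size is the worst case for this group.
  arm_vcpus_at_50 = 50 * 4
}

output "quota_check" {
  value = {
    eks_nodes_per_group_quota = data.aws_servicequotas_service_quota.eks_nodes_per_node_group.value
    ec2_standard_vcpu_quota   = data.aws_servicequotas_service_quota.ec2_standard_on_demand_vcpus.value
    arm_vcpus_needed_at_50    = local.arm_vcpus_at_50
  }
}
```

50 t4g.xlarge nodes need 200 vCPUs against that quota, which is shared with every other standard on-demand instance in the same account and region.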

The policy requires a formal process: capacity planning request → FinOps review → AWS quota increase before any changes to max node limits.

Based on external tools context (Notion EKS Auto-Scaling Policy).

Please provide evidence of:

  1. Approved capacity planning ticket
  2. FinOps sign-off on budget impact
  3. Confirmed AWS service quota increase to support 50 nodes
  4. Verification that private subnets have sufficient IP addresses

Until these prerequisites are met, this change should be reverted to arm_max_size = 10.
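If the team wants the limit to be raisable without editing module code, one option is to require the approval references as inputs. A sketch, assuming Terraform 1.4 or later (for `terraform_data`; resource preconditions need 1.2); the four approval variables are hypothetical names, not an existing convention in this repository:

```hcl
variable "arm_max_size" {
  type = number
}

# Hypothetical inputs recording the approvals requested in this review.
variable "capacity_ticket_id" {
  type    = string
  default = ""
}

variable "finops_approval_id" {
  type    = string
  default = ""
}

variable "quota_increase_id" {
  type    = string
  default = ""
}

variable "subnet_ip_check_id" {
  type    = string
  default = ""
}

resource "terraform_data" "arm_scaling_guardrail" {
  input = var.arm_max_size

  lifecycle {
    precondition {
      # Anything above the policy cap of 10 must reference all four approvals.
      condition = var.arm_max_size <= 10 || alltrue([
        var.capacity_ticket_id != "",
        var.finops_approval_id != "",
        var.quota_increase_id != "",
        var.subnet_ip_check_id != "",
      ])
      error_message = "Raising arm_max_size above 10 requires capacity_ticket_id, finops_approval_id, quota_increase_id and subnet_ip_check_id."
    }
  }
}
```

Plans keep working at 10 or below with no extra inputs; above 10, the plan fails until all four references are supplied, and if they are set in terragrunt.hcl they end up in version control next to arm_max_size.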

🤖 Prompt for AI Agents
In prod/eks/terragrunt.hcl around line 28, changing arm_max_size from 10 to 50
violates the EKS Auto-Scaling Policy; revert the value back to arm_max_size = 10
and do not merge this change until the following are completed and referenced in
the PR: (1) an approved capacity planning ticket in Linear, (2) documented
FinOps sign-off on the budget impact, (3) an AWS service quota increase
confirmed to support 50 nodes, and (4) verification that private subnets have
sufficient IP addresses; include links or IDs for each approval in the PR
description before attempting any future increase.
