Skip to content

Conversation

@kvaps
Copy link
Member

@kvaps kvaps commented Jan 26, 2026

Summary

Extends the rotate-ca command with automatic cluster node discovery and post-rotation configuration updates:

  • Auto-discover control plane and worker nodes from Kubernetes API
  • Auto-set --control-plane-nodes, --worker-nodes, --output, and --k8s-endpoint flags
  • Post-processing after successful rotation:
    • Update secrets.yaml with new CA certificates from cluster
    • Update secrets.encrypted.yaml if exists
    • Update talosconfig.encrypted if exists
    • Update kubeconfig via talm kubeconfig

Changes

  • Validate that only one endpoint/node is provided (rotate-ca requires single control plane node)
  • Use WithClientNoNodes for COSI requests to avoid multi-node proxying issues
  • Disable --with-docs and --with-examples by default
  • Update help text and add usage examples

Test plan

  • Test auto-discovery of cluster nodes
  • Test secrets.yaml update after rotation
  • Test encrypted files update (talosconfig.encrypted, secrets.encrypted.yaml, kubeconfig.encrypted)
  • Test validation error when multiple endpoints provided

Summary by CodeRabbit

  • Enhancements

    • rotate-ca now auto-discovers cluster nodes, derives endpoints and output paths, invokes original rotate flow, and updates CA certs across configs after rotation.
    • Post-rotation can optionally re-encrypt secrets, refresh Talos config, and refresh kubeconfig automatically.
    • Command help/examples updated and docs/examples flags disabled by default.
  • Bug Fixes

    • Init flow now fails reliably if Talos config decryption/re-encryption fails.
  • Behavior Changes

    • kubeconfig flag handling tightened to respect explicit flag presence.

✏️ Tip: You can customize this high-level summary in your review settings.

- Auto-discover control plane and worker nodes from Kubernetes API
- Auto-set --control-plane-nodes and --worker-nodes flags
- Auto-set --output to project talosconfig path
- Auto-set --k8s-endpoint from --endpoints flag
- Post-processing after rotation:
  - Update secrets.yaml with new CA from cluster
  - Update talosconfig.encrypted if exists
  - Update kubeconfig via talm kubeconfig
- Validate only one endpoint/node is provided
- Use endpoint as target node for COSI requests
- Disable --with-docs and --with-examples by default
- Update help text and examples

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>
@coderabbitai
Copy link

coderabbitai bot commented Jan 26, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Reworks the rotate-ca wrapper to accept and invoke the original RunE, adds pre-rotation auto-discovery (project root, kubeconfig, endpoints, cluster nodes), performs the original rotation, and runs post-rotation updates (secrets, optional re-encryption, talosconfig and kubeconfig refresh). Tightens talosconfig init error handling.

Changes

Cohort / File(s) Summary
rotate-ca handler implementation
pkg/commands/rotate_ca_handler.go
Rewrites wrapRotateCACommand to accept originalRunE and replace PreRunE with a RunE flow. Adds pre-rotation detection (project root, talos-provided kubeconfig, endpoint derivation, cluster node discovery) and post-rotation steps: update secrets.yaml from cluster CAs, optionally write secrets.encrypted.yaml, re-encrypt talosconfig.encrypted, and optionally refresh talos kubeconfig. Adds helpers: discoverClusterNodes, getKubeconfigFromTalos, updateKubeconfigEndpoint, updateSecretsFromCluster, updateTalosconfigEncryption, runKubeconfigCmd. Expands imports (context, net/path, YAML, k8s client, Talos APIs, crypto/safety libs).
wrapper call site
pkg/commands/talosctl_wrapper.go
Updates wrapRotateCACommand signature and invocation to pass the original RunE (wrapRotateCACommand(wrappedCmd, originalRunE)) and tightens kubeconfig --force flag handling by checking flag presence before setting.
init command behavior
pkg/commands/init.go
Changes control flow to return an error if handleTalosconfigEncryption(false) fails when talosconfig.encrypted exists but talosconfig is missing (previously the error was ignored).

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as rotate-ca CLI
    participant Wrapper as Pre/Post Wrapper
    participant TalosAPI as Talos API
    participant KubeAPI as Kubernetes API
    participant Original as Original rotate-ca Logic
    participant FS as Filesystem

    User->>CLI: run rotate-ca
    CLI->>Wrapper: enter wrapper (with originalRunE)
    Wrapper->>FS: detect project root & output path
    Wrapper->>TalosAPI: fetch kubeconfig
    TalosAPI-->>Wrapper: kubeconfig
    Wrapper->>KubeAPI: list nodes / derive endpoint
    KubeAPI-->>Wrapper: node list (control-plane / workers)
    Wrapper->>Original: invoke originalRunE (perform CA rotation)
    Original-->>Wrapper: rotation complete
    Wrapper->>KubeAPI: fetch cluster CA certs
    KubeAPI-->>Wrapper: CA certs
    Wrapper->>FS: update secrets.yaml and optional encrypted files
    Wrapper->>CLI: optionally run kubeconfig refresh command
    FS-->>User: updated configs and secrets
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped in to wrap the rotate-ca flow,
Found roots and endpoints where configs grow.
I fetched kubeconfigs, polled nodes with delight,
Rewrote secrets and sealed them up tight.
A rabbit's small hop to keep the cluster right.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(rotate-ca): add auto-discovery and post-processing' directly and clearly describes the main changes in the pull request.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @kvaps, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the rotate-ca command by introducing automated cluster node discovery and comprehensive post-rotation configuration management. It streamlines the CA rotation process by automatically identifying cluster nodes and updating relevant local configuration files such as secrets.yaml, talosconfig.encrypted, and kubeconfig to reflect the new CA certificates, thereby reducing manual intervention and potential errors.

Highlights

  • Automated Node Discovery: The rotate-ca command now automatically discovers control plane and worker nodes from the Kubernetes API, eliminating the need for manual specification.
  • Automatic Flag Configuration: Flags such as --control-plane-nodes, --worker-nodes, --output, and --k8s-endpoint are now automatically set based on discovered cluster information, streamlining command execution.
  • Comprehensive Post-Rotation Updates: After a successful CA rotation, the command performs several post-processing steps, including updating secrets.yaml with new CA certificates, re-encrypting secrets.encrypted.yaml and talosconfig.encrypted if they exist, and updating the kubeconfig via talm kubeconfig.
  • COSI Request Handling Improvement: Implemented WithClientNoNodes for COSI requests to prevent multi-node proxying issues when fetching CA certificates from the cluster.
  • Default Flag Changes: The --with-docs and --with-examples flags are now disabled by default for the rotate-ca command, simplifying its usage.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@kvaps kvaps marked this pull request as ready for review January 26, 2026 12:58
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly enhances the rotate-ca command by adding auto-discovery of cluster nodes and post-rotation processing to update local configuration files. The changes are well-structured and add valuable automation. My review focuses on a few key areas to improve correctness and maintainability: ensuring the correct node is targeted for API calls, improving context propagation, and refactoring for code clarity and robustness. I've identified a critical issue where a specific control-plane node is not being used for COSI requests, which could lead to failures when a VIP is in use.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@pkg/commands/rotate_ca_handler.go`:
- Around line 172-182: The targetNode value extracted in rotate_ca_handler.go is
never actually used inside updateSecretsFromCluster (which currently calls
WithClientNoNodes); either remove the unused parameter from
updateSecretsFromCluster and its call site (delete targetNode variable and stop
passing it), or modify updateSecretsFromCluster to accept and use targetNode by
replacing WithClientNoNodes with a client initializer that sets the node(s)
(e.g., create the Talos/Kubernetes/COSI client with the targetNode in its
context) so the COSI requests are directed to the specified node; update all
references to the updateSecretsFromCluster signature accordingly.
- Around line 131-134: The StringSlice flags are being set with
cmd.Flags().Set(...) which stores the whole comma-separated string as one
element (affecting later GetStringSlice() use and cpNodes[0]); change the flag
population to use the FlagSet.SetSlice API so each element is set separately —
obtain the pflag.FlagSet via cmd.Flags().(*pflag.FlagSet) and call
SetSlice("control-plane-nodes", cpNodes) and, when len(wNodes) > 0,
SetSlice("worker-nodes", wNodes); alternatively, call cmd.Flags().Set(...) once
per node string if SetSlice is not available.
🧹 Nitpick comments (3)
pkg/commands/rotate_ca_handler.go (3)

148-158: Consider reusing normalizeEndpoint from talosctl_wrapper.go.

This endpoint normalization logic (stripping protocol, extracting host, adding https:// and :6443) is duplicated from normalizeEndpoint() in talosctl_wrapper.go. Consider extracting to a shared helper to reduce duplication.

♻️ Suggested refactor
 		// Set --k8s-endpoint from GlobalArgs.Endpoints
 		if !cmd.Flags().Changed("k8s-endpoint") && len(GlobalArgs.Endpoints) > 0 {
-			host := GlobalArgs.Endpoints[0]
-			host = strings.TrimPrefix(host, "https://")
-			host = strings.TrimPrefix(host, "http://")
-			if h, _, err := net.SplitHostPort(host); err == nil {
-				host = h
-			}
-			k8sEndpoint := fmt.Sprintf("https://%s:6443", host)
+			k8sEndpoint := normalizeEndpoint(GlobalArgs.Endpoints[0])
 			cmd.Flags().Set("k8s-endpoint", k8sEndpoint)
 		}

282-305: Additional duplication of endpoint normalization logic.

This is the third occurrence of the endpoint normalization pattern (lines 289-296). Strongly consider extracting this to a shared helper function.

♻️ Extract shared helper

Create a helper function that can be reused across all three locations:

// extractHost extracts the host from an endpoint, stripping protocol and port.
func extractHost(endpoint string) string {
	host := endpoint
	host = strings.TrimPrefix(host, "https://")
	host = strings.TrimPrefix(host, "http://")
	if h, _, err := net.SplitHostPort(host); err == nil {
		host = h
	}
	return host
}

Then normalizeEndpoint and these functions can use extractHost.


394-404: Potential state mutation on shared command object.

Setting cmd.Flags().Set("force", "true") modifies the global Commands slice entry. If this command is run multiple times or in tests, the flag will remain set. This is likely fine for CLI usage but worth noting.

Additionally, the loop could break early after finding the command to avoid unnecessary iterations.

♻️ Minor optimization
 func runKubeconfigCmd() error {
 	for _, cmd := range Commands {
 		if cmd.Name() == "kubeconfig" {
 			// Set --force to avoid interactive prompt
 			cmd.Flags().Set("force", "true")
 			return cmd.RunE(cmd, []string{})
 		}
 	}
 	return fmt.Errorf("kubeconfig command not found")
 }

The current implementation is actually fine since return exits the loop. No change needed.

- Check total endpoints + nodes count in rotate-ca validation
- Propagate decryption errors in init.go instead of ignoring them
- Check flag existence with Lookup() before calling Set()
- Handle all Set() errors properly

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Andrei Kvapil <[email protected]>
@kvaps kvaps force-pushed the feat/rotate-ca-auto-discovery branch from 595e6b0 to f68e14b Compare January 26, 2026 13:40
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@pkg/commands/rotate_ca_handler.go`:
- Around line 401-415: runKubeconfigCmd currently calls cmd.RunE directly which
bypasses the command's PreRunE setup (e.g., wrapTalosCommand’s initialization);
change the invocation to use the command lifecycle so PreRunE runs first —
either call cmd.Execute() on the located "kubeconfig" command (preferred) or
explicitly invoke cmd.PreRunE(cmd, []string{}) and handle its error before
calling cmd.RunE(cmd, []string{}); keep the existing logic that sets the "force"
flag and propagate any errors from PreRunE/Execute/RunE.
♻️ Duplicate comments (4)
pkg/commands/rotate_ca_handler.go (4)

238-242: Use command context instead of context.Background().

Using context.Background() prevents proper cancellation and timeout handling. The command's context should be propagated from the RunE function.

♻️ Propagate context from command

Update discoverClusterNodes signature to accept context:

-func discoverClusterNodes() (controlPlane []string, workers []string, err error) {
+func discoverClusterNodes(ctx context.Context) (controlPlane []string, workers []string, err error) {

Update the call site in RunE:

-		cpNodes, wNodes, err := discoverClusterNodes()
+		cpNodes, wNodes, err := discoverClusterNodes(cmd.Context())

Use the passed context:

-	nodes, err := clientset.CoreV1().Nodes().List(context.Background(), metav1.ListOptions{})
+	nodes, err := clientset.CoreV1().Nodes().List(ctx, metav1.ListOptions{})

403-413: Command lookup by iteration is inefficient.

Searching for the kubeconfig command by iterating through the Commands slice on each call is inefficient. Consider creating a lookup map keyed by command name during initialization.


130-137: StringSlice flag setting with comma-joined strings does not work as intended.

Cobra's Set() method treats the entire comma-separated string as a single value rather than splitting it. When cpNodes is ["10.0.0.1", "10.0.0.2"], calling Set("control-plane-nodes", "10.0.0.1,10.0.0.2") stores it as a single element ["10.0.0.1,10.0.0.2"].

This causes cpNodes[0] on line 183 to be "10.0.0.1,10.0.0.2" instead of "10.0.0.1", breaking the targetNode logic.

🐛 Recommended fix: Call Set() for each node individually
-		if err := cmd.Flags().Set("control-plane-nodes", strings.Join(cpNodes, ",")); err != nil {
-			return fmt.Errorf("failed to set control-plane-nodes: %w", err)
-		}
-		if len(wNodes) > 0 {
-			if err := cmd.Flags().Set("worker-nodes", strings.Join(wNodes, ",")); err != nil {
-				return fmt.Errorf("failed to set worker-nodes: %w", err)
-			}
-		}
+		for _, node := range cpNodes {
+			if err := cmd.Flags().Set("control-plane-nodes", node); err != nil {
+				return fmt.Errorf("failed to set control-plane-nodes: %w", err)
+			}
+		}
+		for _, node := range wNodes {
+			if err := cmd.Flags().Set("worker-nodes", node); err != nil {
+				return fmt.Errorf("failed to set worker-nodes: %w", err)
+			}
+		}

179-189: The targetNode parameter is extracted but unused in updateSecretsFromCluster.

targetNode is passed to updateSecretsFromCluster (line 187) but inside that function, WithClientNoNodes is used (line 326) which doesn't configure the client to use the target node. The COSI requests may fail or connect to the wrong node if the endpoint is a VIP.

🔧 Option: Set endpoint to targetNode before COSI calls
 func updateSecretsFromCluster(updateTalos, updateKubernetes bool, targetNode string) error {
 	secretsPath := ResolveSecretsPath(Config.TemplateOptions.WithSecrets)

 	// Load existing secrets
 	bundle, err := secrets.LoadBundle(secretsPath)
 	if err != nil {
 		return fmt.Errorf("failed to load secrets bundle: %w", err)
 	}

+	// Temporarily set endpoint to the target node for COSI requests
+	originalEndpoints := GlobalArgs.Endpoints
+	if targetNode != "" {
+		GlobalArgs.Endpoints = []string{targetNode}
+	}
+	defer func() { GlobalArgs.Endpoints = originalEndpoints }()
+
 	// Use WithClientNoNodes to avoid automatic node setting
 	err = WithClientNoNodes(func(ctx context.Context, c *client.Client) error {
🧹 Nitpick comments (1)
pkg/commands/rotate_ca_handler.go (1)

74-82: Consider handling Set() errors consistently.

The errors from Set() for --with-docs and --with-examples are ignored with blank identifiers. While unlikely to fail for these optional boolean flags, this is inconsistent with the careful error handling elsewhere in this PR.

🔧 Suggested improvement
 	if f := wrappedCmd.Flags().Lookup("with-docs"); f != nil {
 		f.DefValue = "false"
-		_ = wrappedCmd.Flags().Set("with-docs", "false")
+		if err := wrappedCmd.Flags().Set("with-docs", "false"); err != nil {
+			// Log or ignore - these are optional flags
+		}
 	}
 	if f := wrappedCmd.Flags().Lookup("with-examples"); f != nil {
 		f.DefValue = "false"
-		_ = wrappedCmd.Flags().Set("with-examples", "false")
+		if err := wrappedCmd.Flags().Set("with-examples", "false"); err != nil {
+			// Log or ignore - these are optional flags
+		}
 	}

@kvaps kvaps merged commit c558560 into main Jan 26, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants