Skip to content

OCPBUGS-67159: openstack: Reserve addresses for load balancer#10375

Open
stephenfin wants to merge 1 commit intoopenshift:mainfrom
shiftstack:OCPBUGS-67159
Open

OCPBUGS-67159: openstack: Reserve addresses for load balancer#10375
stephenfin wants to merge 1 commit intoopenshift:mainfrom
shiftstack:OCPBUGS-67159

Conversation

@stephenfin
Copy link
Contributor

@stephenfin stephenfin commented Mar 10, 2026

When using Octavia, we use the .5 and .7 addresses on the first machine network for API and Ingress VIP, respectively. However, we were not reserving these address when creating the network. This can result in an IP collision if another service happens to allocate one of these two IPs before the load balancer is created. This can happen if DHCP agents are running on the controller/network nodes, as each agent creates its own DHCP port on the subnet.

The solution is to use allocation pools - which are designed exactly for this purpose - and configure one on the subnet we create (via CAPO). We reserve everything < .10 in case we need more addresses down the line.

Summary by CodeRabbit

  • Bug Fixes
    • Fixed OpenStack network handling to compute and populate explicit allocation pools (start/end) for subnets while preserving CIDR and DNS nameserver settings, ensuring correct IP allocation bounds.

@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 10, 2026
@openshift-ci-robot
Copy link
Contributor

@stephenfin: This pull request references Jira Issue OCPBUGS-67159, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

When using Octavia, we use the .5 and .7 addresses on the first machine network for API and Ingress VIP, respectively. However, we were not reserving these address when creating the network. This can result in an IP collision if another service happens to allocate one of these two IPs before the load balancer is created. This can happen if DHCP agents are running on the controller/network nodes, as each agent creates its own DHCP port on the subnet.

The solution is to use allocation pools - which are designed exactly for this purpose - and configure one on the subnet we create (via CAPO). We reserve everything < .10 in case we need more addresses down the line.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link

coderabbitai bot commented Mar 10, 2026

Walkthrough

Modified OpenStack cluster manifest generation to compute subnet allocation pools using the go-cidr library. When no control plane port is specified, the code derives subnet CIDR, allocation start, and allocation end from the install configuration and populates AllocationPools alongside CIDR and DNSNameservers.

Changes

Cohort / File(s) Summary
OpenStack CIDR Allocation
pkg/asset/manifests/openstack/cluster.go
Added github.com/apparentlymart/go-cidr/cidr import; compute networkCIDR via capiutils.CIDRFromInstallConfig, derive allocationStart with cidr.Host() and allocationEnd via cidr.AddressRange()/cidr.Dec(); set Subnet.CIDR to networkCIDR.String() and add AllocationPools (Start/End) while retaining DNSNameservers.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: reserving addresses for a load balancer in OpenStack. It accurately reflects the primary objective of the PR, which is to configure allocation pools to reserve addresses below .10 to prevent IP collisions with VIPs.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Stable And Deterministic Test Names ✅ Passed The PR modifies pkg/asset/manifests/openstack/cluster.go but does not modify any Ginkgo test files. No Ginkgo tests are affected by this change.
Test Structure And Quality ✅ Passed The custom check for Ginkgo test structure is not applicable to this PR as no test files were modified.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/asset/manifests/openstack/cluster.go`:
- Around line 90-100: The allocation pool end currently uses the CIDR's last
address from cidr.AddressRange (allocationEnd), which for IPv4 is the broadcast
address and is rejected by Neutron; update the logic where
openStackCluster.Spec.ManagedSubnets is built so that after calling
cidr.AddressRange(&networkCIDR.IPNet) you detect IPv4 (networkCIDR.IP.To4() !=
nil) and decrement allocationEnd by one IP (subtract one from the byte
representation) to exclude the broadcast address before assigning it to
capo.AllocationPool.End; keep allocationStart as returned and only adjust
allocationEnd for IPv4 to match the baremetal pattern.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1de8d424-77a5-425e-951e-15b25b5bf6bf

📥 Commits

Reviewing files that changed from the base of the PR and between f7fc26d and b0164ea.

📒 Files selected for processing (1)
  • pkg/asset/manifests/openstack/cluster.go

Copy link
Member

@mandre mandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This restores the behavior we had with the terraform-based installer:

allocation_pool {
start = cidrhost(local.nodes_cidr_block, 10)
end = cidrhost(local.nodes_cidr_block, pow(2, (32 - split("/", local.nodes_cidr_block)[1])) - 2)
}

/lgtm

@openshift-ci openshift-ci bot requested review from gryf and mandre March 10, 2026 16:43
@mandre
Copy link
Member

mandre commented Mar 10, 2026

/lgtm cancel

CodeRabbit caught a potential bug: we got the allocation pool too big, we should drop the last 2 addresses from the end.

When using Octavia, we use the .5 and .7 addresses on the first machine
network for API and Ingress VIP, respectively [1]. However, we were not
reserving these address when creating the network. This can result in an
IP collision if another service happens to allocate one of these two IPs
before the load balancer is created. This can happen if DHCP agents are
running on the controller/network nodes, as each agent creates its own
DHCP port on the subnet.

The solution is to use allocation pools - which are designed exactly for
this purpose - and configure one on the subnet we create (via CAPO). We
reserve everything < .10 in case we need more addresses down the line.
We were previously doing this with Terraform but lost this during the
migration to CAPO.

[1] `SetPlatformDefaults` in `pkg/types/openstack/defaults/platform.go`
[2] https://github.com/openshift/installer/blob/release-4.20/data/data/openstack/masters/private-network.tf#L37-L43

Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
@stephenfin
Copy link
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Mar 10, 2026
@openshift-ci-robot
Copy link
Contributor

@stephenfin: This pull request references Jira Issue OCPBUGS-67159, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (imatza@redhat.com), skipping review request.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. and removed lgtm Indicates that a PR is ready to be merged. labels Mar 10, 2026
@openshift-ci-robot
Copy link
Contributor

@stephenfin: This pull request references Jira Issue OCPBUGS-67159, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (imatza@redhat.com), skipping review request.

Details

In response to this:

When using Octavia, we use the .5 and .7 addresses on the first machine network for API and Ingress VIP, respectively. However, we were not reserving these address when creating the network. This can result in an IP collision if another service happens to allocate one of these two IPs before the load balancer is created. This can happen if DHCP agents are running on the controller/network nodes, as each agent creates its own DHCP port on the subnet.

The solution is to use allocation pools - which are designed exactly for this purpose - and configure one on the subnet we create (via CAPO). We reserve everything < .10 in case we need more addresses down the line.

Summary by CodeRabbit

  • Bug Fixes
  • Fixed OpenStack network handling to compute and populate explicit allocation pools (start/end) for subnets while preserving CIDR and DNS nameserver settings, ensuring correct IP allocation bounds.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/asset/manifests/openstack/cluster.go (1)

85-101: Please add a regression test for the allocation-pool bounds.

This path now depends on a couple of boundary decisions (.10 start and last-usable end). A focused unit test here would make future off-by-one regressions much harder.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/asset/manifests/openstack/cluster.go` around lines 85 - 101, Add a
focused unit test that asserts the allocation-pool bounds produced in
openStackCluster.Spec.ManagedSubnets match the intended boundary decisions
(start at host .10 and end at the last-usable IP). In a new test in the same
package, create a deterministic installConfig CIDR, call the function that
builds the OpenStack cluster object (the code path using
capiutils.CIDRFromInstallConfig and producing capo.SubnetSpec with
AllocationPool), then assert that the AllocationPool.Start equals the value
returned by cidr.Host(&networkCIDR.IPNet, 10).String() and AllocationPool.End
equals cidr.Dec(broadcastIP).String() where broadcastIP comes from
cidr.AddressRange(&networkCIDR.IPNet); fail the test if those values differ to
prevent off-by-one regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/asset/manifests/openstack/cluster.go`:
- Around line 90-91: The code unconditionally decrements allocationEnd via
cidr.Dec(broadcastIP), which wrongly removes one usable address for IPv6; change
it so you only decrement for IPv4 subnets by detecting IPv4 (e.g., test
networkCIDR.IPNet.IP.To4() != nil) before calling cidr.Dec; keep the use of
cidr.AddressRange to obtain broadcastIP but only set allocationEnd =
cidr.Dec(broadcastIP) when the IP is IPv4, otherwise use the broadcastIP (or its
equivalent) directly for IPv6 in the allocationEnd variable.

---

Nitpick comments:
In `@pkg/asset/manifests/openstack/cluster.go`:
- Around line 85-101: Add a focused unit test that asserts the allocation-pool
bounds produced in openStackCluster.Spec.ManagedSubnets match the intended
boundary decisions (start at host .10 and end at the last-usable IP). In a new
test in the same package, create a deterministic installConfig CIDR, call the
function that builds the OpenStack cluster object (the code path using
capiutils.CIDRFromInstallConfig and producing capo.SubnetSpec with
AllocationPool), then assert that the AllocationPool.Start equals the value
returned by cidr.Host(&networkCIDR.IPNet, 10).String() and AllocationPool.End
equals cidr.Dec(broadcastIP).String() where broadcastIP comes from
cidr.AddressRange(&networkCIDR.IPNet); fail the test if those values differ to
prevent off-by-one regressions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 696abdfd-c952-4593-ad8a-df0f19fb11ae

📥 Commits

Reviewing files that changed from the base of the PR and between b0164ea and 1a220e5.

📒 Files selected for processing (1)
  • pkg/asset/manifests/openstack/cluster.go

@stephenfin
Copy link
Contributor Author

/test e2e-openstack-proxy
/test e2e-openstack-ovn

Copy link
Member

@mandre mandre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Copy link
Member

@gryf gryf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@stephenfin
Copy link
Contributor Author

/verified later by @stephenfin

@openshift-ci-robot
Copy link
Contributor

@stephenfin: Only users can be targets for the /verified later command.

Details

In response to this:

/verified later by @stephenfin

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 11, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 11, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gryf, mandre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 11, 2026
@stephenfin
Copy link
Contributor Author

/verified later stephenfin

@openshift-ci-robot
Copy link
Contributor

@stephenfin: Only users can be targets for the /verified later command.

Details

In response to this:

/verified later stephenfin

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 11, 2026

@stephenfin: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-openstack-ovn 1a220e5 link true /test e2e-openstack-ovn
ci/prow/e2e-aws-ovn-rhcos10-devpreview 1a220e5 link false /test e2e-aws-ovn-rhcos10-devpreview
ci/prow/e2e-openstack-proxy 1a220e5 link false /test e2e-openstack-proxy

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@stephenfin
Copy link
Contributor Author

/test e2e-openstack-proxy
/test e2e-openstack-ovn

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants