Add IP Address Management#4046
Conversation
|
Review Complete Files Reviewed: 2 By Severity:
The new IPAM package introduces a territory-boundary bug in both IPv4 and IPv6 cursor-advance loops: when an IP from one device-type territory is deallocated and the opposite-type cursor advances far enough to reach it, the freed IP is silently allocated to the wrong device type, causing the same address to be handed to both a node and an extclient simultaneously. Files Reviewed (2 files) |
There was a problem hiding this comment.
Risk: 🟠 High (70/100) — 1 high finding · 1277 LOC across 2 files
Overview
This PR introduces a new ipam package providing stateful, dual-territory IPv4 and IPv6 address allocators. The overall design is sound (two cursors approaching from opposite ends, a shared sorted freelist with territory guards) and the test coverage is broad. However, there is a high-severity logic bug in both the IPv4 and IPv6 cursor-advance loops that breaks the core territory-isolation guarantee.
Bug: Cursor Walk Ignores Opposite-Territory Boundary
Affected code
ipam/ipam.go—IPv4Allocator.AllocateNode(lines 160–176)ipam/ipam.go—IPv4Allocator.AllocateExtclient(lines 197–213)ipam/ipam.go—IPv6Allocator.AllocateNode(lines 383–407)ipam/ipam.go—IPv6Allocator.AllocateExtclient(lines 430–454)
Root Cause
The freelist recycling path correctly uses the cursor as a territory guard (!a.nodeCursor.Less(a.freelist[0]) etc.). However, the cursor-advance (walk) path only checks a.allocated[addr] — it does not check whether addr has crossed into the opposite territory.
When Deallocate is called, the IP is removed from a.allocated and inserted into the freelist. If the advancing cursor subsequently reaches that freed IP, it is allocated to the wrong device type — the freelist guard only fires at the start of the method based on the front/back of the list, not during the cursor walk.
Concrete Reproduction (IPv4, 10.0.0.0/28)
- Allocate nodes
.1–.7(nodeCursor=.7); allocate extclients.14–.8(extCursor=.8). - Deallocate extclient
.13→.13removed fromallocated; freelist =[.13]. - Call
AllocateNode:- Freelist guard:
freelist[0]=.13 > nodeCursor=.7→ skips freelist (correct). - Cursor walk:
.8(allocated),.9,.10,.11,.12(all allocated),.13— not inallocated→ returned as a node IP.extCursorstays at.8.
- Freelist guard:
.13is now assigned to a node; the nextAllocateExtclientwill also recycle.13from the freelist, creating a duplicate assignment.
The symmetric bug hits AllocateExtclient when node IPs are freed and the extclient cursor retreats past them. Both IPv4 and IPv6 are affected identically.
Fix
Add a territory-boundary guard inside each cursor-advance loop before the allocated lookup:
// In AllocateNode — stop if addr has reached extclient territory
if !addr.Less(a.extCursor) {
return netip.Addr{}, ErrExhausted
}
// In AllocateExtclient — stop if addr has retreated into node territory
if !a.nodeCursor.Less(addr) {
return netip.Addr{}, ErrExhausted
}Why Tests Miss This
TestIPv4CursorsMeetFreelistRecycle frees node IPs .1, .2 and extclient IPs .14, .13, then calls AllocateNode exactly twice — both calls recycle via the freelist (.1 and .2 satisfy <= nodeCursor=.7). The test never calls AllocateNode a third time with the extclient IPs still in the freelist, which is where the bug manifests. A regression test should be added.
| if errors.Is(err, iplib.ErrAddressOutOfRange) || errors.Is(err, iplib.ErrBroadcastAddress) { | ||
| return netip.Addr{}, ErrExhausted | ||
| } | ||
| cursor = next | ||
| addr := netIPv4ToAddr(cursor) | ||
| if _, taken := a.allocated[addr]; !taken { | ||
| a.nodeCursor = addr | ||
| a.allocated[addr] = struct{}{} | ||
| return addr, nil | ||
| } | ||
| } | ||
| } | ||
|
|
||
| // AllocateExtclient picks the next available IP from the high end of the | ||
| // subnet and marks it as allocated. It first attempts to reuse a freed IP | ||
| // from the back of the free list, but only if that IP lies within the | ||
| // extclient territory (i.e., it is not below the current extclient cursor). |
There was a problem hiding this comment.
🟠 Cursor-advance loop allocates freed opposite-territory IPs to the wrong device type (bug)
In both IPv4Allocator and IPv6Allocator, the AllocateNode and AllocateExtclient cursor-advance loops skip IPs that are present in a.allocated. When Deallocate removes an IP from allocated and places it on the freelist, the cursor walk can reach and allocate that IP regardless of which territory originally owned it.
IPv4 example (10.0.0.0/28): allocate nodes .1–.7 and extclients .14–.8. Deallocate extclient .13 (removed from allocated, freelist = [.13]). Call AllocateNode: freelist guard fires (freelist[0]=.13 > nodeCursor=.7) and correctly skips the freelist. Cursor walk then visits .8–.12 (all in allocated), hits .13 (not in allocated), and returns it as a node IP. extCursor remains .8, so .13 is now simultaneously a live extclient-territory address and an allocated node address — a duplicate assignment.
The symmetric bug exists in AllocateExtclient when node IPs are freed and the extclient cursor retreats past them. Both the IPv4 and IPv6 variants have identical cursor-walk code without this guard.
💡 Suggestion: Add a territory-boundary check inside each cursor-advance loop, immediately after computing addr and before the allocated lookup:
- In
AllocateNode(IPv4 and IPv6):if !addr.Less(a.extCursor) { return netip.Addr{}, ErrExhausted } - In
AllocateExtclient(IPv4 and IPv6):if !a.nodeCursor.Less(addr) { return netip.Addr{}, ErrExhausted }
This ensures neither cursor can walk into the other's territory even when IPs from that territory have been freed (and are thus absent from allocated).
📋 Prompt for AI Agents
In ipam/ipam.go, the cursor-advance loops in AllocateNode and AllocateExtclient do not enforce the territory boundary, causing freed opposite-territory IPs to be allocated to the wrong device type. Fix by adding boundary guards in four places:
- IPv4Allocator.AllocateNode (around line 164-165): after
addr := netIPv4ToAddr(cursor), add before the allocated-map check:
if !addr.Less(a.extCursor) {
return netip.Addr{}, ErrExhausted
}- IPv4Allocator.AllocateExtclient (around line 206-207): after
addr := netIPv4ToAddr(cursor), add:
if !a.nodeCursor.Less(addr) {
return netip.Addr{}, ErrExhausted
}- IPv6Allocator.AllocateNode (around line 391-393): after
addr := netIPv6ToAddr(cursor)and before theaddr == a.subnetMaxcheck, add:
if !addr.Less(a.extCursor) {
return netip.Addr{}, ErrExhausted
}- IPv6Allocator.AllocateExtclient (around line 444-446): after
addr := netIPv6ToAddr(cursor)and before theaddr == a.networkAddrcheck, add:
if !a.nodeCursor.Less(addr) {
return netip.Addr{}, ErrExhausted
}Also add a regression test in ipam_test.go: on 10.0.0.0/28, allocate nodes .1-.7 (nodeCursor=.7) and extclients .14-.8 (extCursor=.8), then deallocate extclient .13, then call AllocateNode and assert it returns ErrExhausted rather than .13.
Describe your changes
Provide Issue ticket number if applicable/not in title
Provide testing steps
Checklist before requesting a review