Skip to content

NM-347: fix acl cache race issue#4054

Merged
abhishek9686 merged 5 commits into
developfrom
NM-347
Jun 11, 2026
Merged

NM-347: fix acl cache race issue#4054
abhishek9686 merged 5 commits into
developfrom
NM-347

Conversation

@abhishek9686

Copy link
Copy Markdown
Member

Describe your changes

Provide Issue ticket number if applicable/not in title

Provide testing steps

Checklist before requesting a review

  • My changes affect only 10 files or less.
  • I have performed a self-review of my code and tested it.
  • If it is a new feature, I have added thorough tests, my code is <= 1450 lines.
  • If it is a bugfix, my code is <= 200 lines.
  • My functions are <= 80 lines.
  • I have had my code reviewed by a peer.
  • My unit tests pass locally.
  • Netmaker is awesome.

@tenki-reviewer

tenki-reviewer Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Review Complete

Files Reviewed: 1
Findings: 1

By Severity:

  • 🟠 High: 1

The PR correctly replaces the fragile len(aclCacheMap) > 0 cache-hit guard with an explicit aclCacheFullyLoaded flag, but the new bool is read and written outside aclCacheMutex, introducing a data race that Go's race detector would flag in production.

Files Reviewed (1 files)
logic/acls.go

@tenki-reviewer tenki-reviewer Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Risk: 🟡 Medium (45/100) — 1 high finding · 20 LOC across 1 file


Overview

This single-file change to logic/acls.go improves the ACL list cache by introducing aclCacheFullyLoaded bool so that an empty but fully-populated cache (len == 0) is no longer mistakenly treated as a cache miss. The intent is sound: resetAclCacheLocked() wipes the map and clears the flag under the mutex before a fresh DB load, then the flag is set after the load completes.

Data Race on aclCacheFullyLoaded

The aclCacheMutex (sync.RWMutex) is already used to protect aclCacheMap inside the helper functions (storeAclInCache, removeAclFromCache, listAclFromCache, resetAclCacheLocked). However, two accesses to aclCacheFullyLoaded introduced in this PR are not protected:

  • Read at line 2965: if servercfg.CacheEnabled() && aclCacheFullyLoaded { — no lock held
  • Write at line 3013: aclCacheFullyLoaded = true — no lock held

resetAclCacheLocked() does write aclCacheFullyLoaded = false under the mutex, making the locking discipline inconsistent. Two goroutines concurrently calling ListAcls() can race: one finishes the DB load and is about to set the flag to true, while the other reads false and begins another DB fetch + resetAclCacheLocked() call, clearing the first goroutine's in-progress cache population mid-flight.

Fix: Use sync/atomic.Bool (Go 1.19+) for the flag — replace the declaration with var aclCacheFullyLoaded atomic.Bool, use .Load() for the read and .Store(true/false) for writes. This is the minimal, zero-overhead fix that satisfies Go's memory model without restructuring the existing mutex usage.

Other Notes

The resetAclCacheLocked() helper itself is well-formed — it atomically wipes the map and clears the flag while holding the write lock. The only gap is that the two new bare accesses outside that function bypass the protection.

Comment thread logic/acls.go Outdated
@abhishek9686 abhishek9686 merged commit 6bdc4b6 into develop Jun 11, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants