Skip to content

Conversation

@LukeAVanDrie
Copy link
Contributor

What type of PR is this?
/kind feature

What this PR does / why we need it:

This PR implements dynamic priority provisioning, allowing the Flow Control layer to accept and process requests with priority levels that were not explicitly configured at startup.

Previously, priority bands had to be statically defined in the config. Requests matching an unknown priority were rejected. This change enables a "zero-config" operational experience where the system automatically provisions new priority bands (with their corresponding queues, policies, stats tracking etc.) Just-In-Time (JIT) based on a default template (which itself is optionally configurable).

Key Changes:

  1. Hybrid State Model (sync.Map + Mutex):

    • Refactored FlowRegistry and registryShard to use sync.Map for operational state (queues, statistics). This ensures the data path remains lock-free for reads while allowing safe, concurrent additions of new bands.
    • Retained the standard map protected by RWMutex for the Config definition to ensure a consistent source of truth for topology limits (MaxBytes) and observability metadata.
  2. Dynamic Provisioning Logic:

    • Registry: Added ensurePriorityBand, which atomically updates the global configuration and orchestrates the rollout of the new band to all active shards.
    • Shard: Added addPriorityBand, which instantiates the runtime components (Queues, DispatchPolicies) for the new priority level.
  3. Configuration & Bootstrapping:

    • Introduced DefaultPriorityBand to Config. This template controls the policy and queue settings for dynamically created bands.
    • Optional Configuration: This template is optional. If omitted, the system applies sensible defaults ("FCFS" ordering, "BestHead" dispatch) automatically.
    • Updated the main entry point to use dynamic bootstrapping, removing hardcoded static priority definitions to validate the zero-config path.

Which issue(s) this PR fixes:
Fixes #1792

Does this PR introduce a user-facing change?:

flowcontrol: Enabled dynamic priority provisioning for arbitrary integer priority levels defined in `InferenceObjective` resources without requiring a restart or static configuration updates. The system now automatically creates resources for new priority levels detected at runtime using a default configuration template.

This commit implements dynamic provisioning for priority bands, enabling
the system to accept requests with priority levels that were not
explicitly configured at startup.

Key Changes:

1. 	Hybrid State Model:
  	- Adopts `sync.Map` in `FlowRegistry` and `registryShard` for
	  	operational state (queues, stats). This enables lock-free reads on
		  the hot path and safe concurrent writes when provisioning new
		  bands.
   	- Retains `map` + `RWMutex` for the `Config` definition to ensure a
			consistent source of truth for topology limits and names.

2. 	Dynamic Provisioning Logic:
   	- `prepareNewFlow` implements an optimistic lock-free check for band
			existence.
   	- `ensurePriorityBand` handles the atomic JIT creation of new bands,
			updating both the global config definitions and the active shard
			runtimes.

3. 	Configuration Extensions:
   	- Added `DefaultPriorityBand` to `Config` to serve as the template
	 		for dynamically created bands.
   	- Updated validation logic to support empty static band lists,
			defaulting instead to a purely dynamic configuration.
   	- Removed obsolescent test cases enforcing static band requirements.
This commit extends the test suite to validate the new dynamic priority
provisioning capabilities in the FlowRegistry and Shard components.

Key Updates:
1. 	Shard Tests:
   	- Added `TestShard_DynamicProvisioning` to verify that
	 		`addPriorityBand` correctly updates internal state (`sync.Map`,
			sorted levels) and maintains idempotency.
   	- Verified that accessing missing configuration triggers the
			expected invariant violation panic.

2. 	Registry Tests:
   	- Added `TestFlowRegistry_DynamicProvisioning` to verify the
			end-to-end flow: creating a new priority via `WithConnection`
			updates the global config, stats, and propagates to all active
			shards.
   	- Added concurrency tests to ensure multiple clients requesting the
	 		same new priority simultaneously do not race or duplicate work.
   	- Verified that dynamic bands are correctly persisted across scaling
			events.

3. 	Config Tests:
   	- Updated `TestNewConfig` to support empty static band
			configurations (now valid).
   	- Added verification for `DefaultPriorityBand` template
			initialization and cloning behavior.
Removes the explicit hardcoded configuration of the "Default" priority
band (Priority 0) in the main entry point.

The system now relies entirely on the newly implemented
`DefaultPriorityBand` mechanism to dynamically provision Priority 0
just-in-time upon the first request. This simplifies the wiring code and
validates the zero-config startup path.
@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 16, 2025
@netlify
Copy link

netlify bot commented Dec 16, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 7ec6d8a
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6940afdedf10240008793f04
😎 Deploy Preview https://deploy-preview-2006--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Dec 16, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @LukeAVanDrie. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 16, 2025
@LukeAVanDrie
Copy link
Contributor Author

Justification for sync.Map over other concurrency primitives is in #1792. There are several valid approaches though and I am not too opinionated here.

@ahg-g
Copy link
Contributor

ahg-g commented Dec 16, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 16, 2025
// protected by its own lock.
fr.mu.RLock()
_, exists := fr.config.PriorityBands[key.Priority]
fr.mu.RUnlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fine grained management of locking is error prone, do we really need it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current JIT implementation, yes. In the controller implementation you suggested, probably not depending on how we handle the race condition between reconciliation and request serving.

// Apply defaults to the template.
builder.config.DefaultPriorityBand.applyDefaults()
if builder.config.DefaultPriorityBand.PriorityName == "" {
builder.config.DefaultPriorityBand.PriorityName = "Dynamic-Default"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be the infObj object name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be much better for observability. The JIT path in this PR makes this tricky. If we manage to move to the controller path, this becomes much easier to populate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question though... How do we handle N InferenceObjectives mapped to the same priority level? Right now, priority bands are unique per int. It would not be simple (and perhaps not desirable) to break that.

I guess we could make the name a composite of the objectives mapped to it `"InfObjA,InfObjB,..."? The name itself then becomes dynamic data and is no longer immutable (though I don't foresee any issues with this).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point on multiple infObjs mapping to the same priority. I think the name has no value and it is rather confusing since naturally one will expect the infObj name to be logged. So I recommend removing it.


// If the band was missing, we must acquire the Write Lock to create it.
if !exists {
if err := fr.ensurePriorityBand(key.Priority); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was expecting that priority bands are created/deleted as a trigger of InfObj reconcile events. When a new infObj is created we create a band, when deleted we delete the band. doing it in the request processing path is an anti-pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Driving this from the InferenceObjective controller is cleaner; however, it introduces specific synchronization and lifecycle challenges that the JIT approach handled implicitly. We need to decide how to handle these scenarios:

Consistency:

  1. Operator applies InferenceObjective { Priority: 10 }.
  2. Client immediately sends curl matching that objective.
  3. The request hits the EPP before the Reconciler loop has updated the FlowRegistry.
  4. The FlowRegistry sees "Priority 10", checks its map, sees nothing.

With the current JIT behavior: it works. With the InferenceObjective reconciliation path it fails (503 Service Unavailable or similar) or MUST fall back to a default priority band (e.g., "Priority 0"). I am not yet sure just how narrow we can make this race condition. Perhaps, we can make these reconciliation steps fully atomic, but I am not so certain that this will be simple or feasible.

No InferenceObjective:

An InferenceObjective is not required. Does "Priority 0" exist if no InferenceObjective defines it?
Traffic without an objective defaults to "Priority 0," so we need to always provision a "Default/0" band on startup, regardless of CRDs, to catch fall-through traffic. This places us right back at a hybrid of static + controller-managed config.

Lifecycle Management:

If we listen to InferenceObjective events, we must also handle Delete. When deleted, we must drain and remove the associated priority band. What if flows are currently active (have buffered requests) in that band? We need a graceful drain mechanism before destroying the sync.Map entry.

This isn't strictly an issue with the CRD reconciliation approach; my current PR only adds bands (which is likely okay for now). The state management issue just becomes more apparent under the CRD model though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consistency:

This scenario can also happen when we look up the InfObj, it may not have been created by the time the request got to the epp, so it is not a new problem.

Perhaps, we can make these reconciliation steps fully atomic, but I am not so certain that this will be simple or feasible.

We only need to be eventually consistent, but when creating a new infObj should trigger a state update in the epp that includes the availability of the infObj itself as well as all associated state like the Band, this state update in the epp can happen atomically.

An InferenceObjective is not required. Does "Priority 0" exist if no InferenceObjective defines it?
Traffic without an objective defaults to "Priority 0," so we need to always provision a "Default/0" band on startup, regardless of CRDs, to catch fall-through traffic. This places us right back at a hybrid of static + controller-managed config.

The default priority is a special case for which we create a band for it on startup.

All that being said, I am ok proceeding with the current approach you have, but please open an issue to track this. I am sure we will revisit it when we introduce the flow CRD

@LukeAVanDrie
Copy link
Contributor Author

@ahg-g

We have two distinct lifecycles intersecting here:

  1. Flows (ID): Defined by Request Headers. High cardinality, ephemeral. These must remain JIT/Data-Plane driven.
  2. Bands (Priority): Defined by InferenceObjective. Low cardinality, stable. These can be Control-Plane driven.

My initial implementation attempts to unify these into a single JIT path for simplicity. If we split them, WithConnection (the data path) changes from 'Create Band if missing' to 'Fail if Band missing'. This makes the system strictly dependent on the reconciler finishing its work before traffic can be served for a new InferenceObjective.

I am happy to move to this split model (once we resolve these open questions) , but I believe the sync.Map refactoring in this PR is the necessary underlying mechanism to support either model safely. Perhaps we can work towards getting this merged without the hook into the WithConnection JIT path and handle the trigger in a followup PR?

@ahg-g
Copy link
Contributor

ahg-g commented Dec 16, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 16, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, LukeAVanDrie

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 16, 2025
@k8s-ci-robot k8s-ci-robot merged commit 36ca5ab into kubernetes-sigs:main Dec 16, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Flow Control] Implement Dynamic Priority Band Configuration

3 participants