Commit ce44527

fix: address review feedback — Decimal money, atomic store, BudgetLimit model, scoping fix
- float → Decimal throughout: BudgetLimit.amount, BudgetWindow, store, tests
- BudgetLimit + BudgetWindow model: config refactored from flat fields to limits: list[BudgetLimit]; each limit has amount, currency, scope_by, window
- Atomic check_and_record(): eliminates TOCTOU race on get_spend()+record_spend(); InMemorySpendStore implements with threading.Lock (single-process); docs note production stores need DB-level atomics (Postgres FOR UPDATE, Redis Lua)
- scope_by field: independent per-dimension budget isolation; scope_by=(channel,) means channel A spend does not count against channel B budget
- selector.path fix: config examples and README updated to use 'input' not '*'; Step vs raw-dict distinction documented; evaluator auto-detects format
- EvaluatorResult.error usage: malformed payload returns matched=False, error=None; error field reserved for crashes/timeouts/missing deps only
- README: custom store example updated with scope param and check_and_record; stale malformed-input docs corrected; Known Limitations updated
- Tests: or True removed; all assertions verify actual store state; lan17 channel isolation test (90 in A + 20 in B) passes with scope_by semantics
1 parent c36c7e2 commit ce44527

10 files changed

+1188
-510
lines changed

evaluators/contrib/financial-governance/README.md

Lines changed: 144 additions & 37 deletions
@@ -10,10 +10,12 @@ As agents transact autonomously via protocols like [x402](https://github.com/coi
 
 Tracks cumulative agent spend and enforces rolling budget limits. Stateful — records approved transactions and checks new ones against accumulated spend.
 
-- **Per-transaction cap** — reject any single payment above a threshold
-- **Rolling period budget** — reject payments that would exceed a time-windowed budget
-- **Context-aware overrides** — different limits per channel, agent, or session via evaluate metadata
+- **Per-transaction cap** — reject any single payment above a threshold (`BudgetLimit` with no window)
+- **Rolling period budget** — reject payments that would exceed a time-windowed budget (`BudgetWindow(kind="rolling", ...)`)
+- **Calendar-aligned budget** — reject payments that exceed a day/week/month budget (`BudgetWindow(kind="fixed", ...)`)
+- **Scoped budgets** — independent counters per channel, agent, or session via `scope_by`
 - **Pluggable storage** — abstract `SpendStore` protocol with built-in `InMemorySpendStore`; bring your own PostgreSQL, Redis, etc.
+- **Atomic enforcement** — `check_and_record()` prevents TOCTOU races in single-process deployments
 
 ### `financial_governance.transaction_policy`
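The feature list in this hunk distinguishes rolling windows from calendar-aligned ones. The sketch below shows one way the start of each window could be computed. `window_start` is a hypothetical helper, not part of the package, and the Monday week start is an assumption (the README only says fixed windows are UTC by default).

```python
from datetime import datetime, timedelta, timezone


def window_start(kind, now, seconds=None, unit=None):
    """Return the UTC datetime at which the current budget window begins.

    Hypothetical helper: mirrors the kind/seconds/unit fields of BudgetWindow
    but is not the package's implementation.
    """
    if kind == "rolling":
        if seconds is None:
            raise ValueError("rolling windows need 'seconds'")
        # Sliding window: everything recorded since now - seconds counts.
        return now - timedelta(seconds=seconds)
    if kind == "fixed":
        midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
        if unit == "day":
            return midnight
        if unit == "week":
            # Assumption: weeks start on Monday (weekday() == 0).
            return midnight - timedelta(days=now.weekday())
        if unit == "month":
            return midnight.replace(day=1)
        raise ValueError(f"unsupported unit: {unit!r}")
    raise ValueError(f"unsupported kind: {kind!r}")


now = datetime(2024, 5, 15, 13, 30, tzinfo=timezone.utc)  # a Wednesday
print(window_start("rolling", now, seconds=86400))
print(window_start("fixed", now, unit="week"))
```

A rolling window moves with every call, while a fixed window resets at the calendar boundary, so a burst of spend just before midnight UTC is forgotten sooner under `kind="fixed"`.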

@@ -35,16 +37,25 @@ pip install -e ".[dev]"
 
 ### Spend Limit
 
+The `spend_limit` evaluator is configured via a list of `BudgetLimit` objects. Each limit is evaluated independently — the first violation wins.
+
 ```yaml
 controls:
   - name: spend-limit
     evaluator:
       type: financial_governance.spend_limit
       config:
-        max_per_transaction: 100.0       # Max USDC per single payment
-        max_per_period: 1000.0           # Rolling 24h budget
-        period_seconds: 86400            # Budget window (default: 24 hours)
-        currency: USDC                   # Currency to govern
+        limits:
+          # Per-transaction cap: single payment ≤ 100 USDC
+          - amount: "100.00"
+            currency: USDC
+          # Per-channel rolling 24h budget: each channel limited to 1000 USDC/day
+          - amount: "1000.00"
+            currency: USDC
+            scope_by: [channel]
+            window:
+              kind: rolling
+              seconds: 86400
     selector:
       path: input  # Extract step.input (transaction dict)
     action: deny
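The new README text says each limit is evaluated independently and the first violation wins. A minimal sketch of that rule follows. `Limit` and `first_violation` are hypothetical stand-ins, not the package's `BudgetLimit` or evaluator, and skipping limits whose currency does not match is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Sequence


@dataclass(frozen=True)
class Limit:
    """Hypothetical stand-in for BudgetLimit (amount, currency, windowed or not)."""
    amount: Decimal
    currency: str
    windowed: bool = False


def first_violation(
    limits: Sequence[Limit],
    amount: Decimal,
    currency: str,
    spent_in_window: Decimal,
) -> Optional[int]:
    """Return the index of the first limit the payment would break, else None."""
    for index, limit in enumerate(limits):
        if limit.currency != currency:
            continue  # assumption: a limit only governs its own currency
        # A limit without a window is a per-transaction cap: prior spend is ignored.
        already_spent = spent_in_window if limit.windowed else Decimal("0")
        if already_spent + amount > limit.amount:
            return index
    return None


limits = [
    Limit(Decimal("100.00"), "USDC"),                  # per-transaction cap
    Limit(Decimal("1000.00"), "USDC", windowed=True),  # rolling budget
]
print(first_violation(limits, Decimal("150"), "USDC", Decimal("0")))
print(first_violation(limits, Decimal("75"), "USDC", Decimal("950")))
```

Because evaluation stops at the first hit, the order of entries under `limits` decides which limit is reported when a payment breaks more than one.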
@@ -61,8 +72,8 @@ controls:
         allowed_currencies: [USDC, USDT]
         blocked_recipients: ["0xDEAD..."]
         allowed_recipients: ["0xALICE...", "0xBOB..."]
-        min_amount: 0.01
-        max_amount: 5000.0
+        min_amount: "0.01"
+        max_amount: "5000.00"
     selector:
       path: input
     action: deny
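This hunk switches `min_amount` and `max_amount` from bare floats to quoted strings. The sketch below illustrates why: binary floats cannot represent most decimal fractions exactly, while `Decimal` built from a string can. `parse_amount` is a hypothetical helper; refusing floats outright is a choice made for this sketch, and the actual evaluator may convert them instead.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_amount(value: object) -> Optional[Decimal]:
    """Parse a monetary amount into Decimal, or return None if it is unusable.

    Hypothetical helper: accepts Decimal, int, and numeric strings only.
    """
    if isinstance(value, (bool, float)):
        return None  # sketch-only policy: floats already carry rounding error
    if not isinstance(value, (Decimal, int, str)):
        return None
    try:
        amount = Decimal(str(value))
    except InvalidOperation:
        return None
    # Reject NaN and Infinity, which Decimal parses without raising.
    return amount if amount.is_finite() else None


# Float arithmetic drifts; Decimal arithmetic on string inputs does not.
print(0.1 + 0.2 == 0.3)
print(parse_amount("0.10") + parse_amount("0.20") == Decimal("0.30"))
```

Quoting the values in YAML matters for the same reason: an unquoted `0.01` is loaded as a float before any `Decimal` conversion can happen.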
@@ -82,58 +93,106 @@ The transaction dict (from `step.input`) should contain:
 ```python
 # step.input — transaction payload
 {
-    "amount": 50.0,           # required — transaction amount
-    "currency": "USDC",       # required — payment currency
-    "recipient": "0xABC...",  # required — payment recipient
+    "amount": "50.00",         # required — Decimal or numeric string
+    "currency": "USDC",        # required — payment currency
+    "recipient": "0xABC...",   # required — payment recipient
+    # optional context fields (used for scope_by)
+    "channel": "slack",
+    "agent_id": "agent-42",
+    "session_id": "sess-1",
 }
 ```
 
+> **Note:** Use `Decimal` or string representations for `amount` — never raw `float`. Floating-point arithmetic is imprecise for money. The evaluator internally converts to `Decimal`.
+
+## BudgetLimit Model
+
+```python
+from decimal import Decimal
+from agent_control_evaluator_financial_governance.spend_limit import (
+    BudgetLimit, BudgetWindow, SpendLimitConfig, SpendLimitEvaluator,
+)
+
+# Per-transaction cap (no window)
+cap = BudgetLimit(amount=Decimal("100"), currency="USDC")
+
+# Rolling 24-hour budget, scoped per channel
+rolling = BudgetLimit(
+    amount=Decimal("1000"),
+    currency="USDC",
+    scope_by=("channel",),
+    window=BudgetWindow(kind="rolling", seconds=86400),
+)
+
+# Calendar-day budget (UTC)
+daily = BudgetLimit(
+    amount=Decimal("500"),
+    currency="USDC",
+    window=BudgetWindow(kind="fixed", unit="day"),
+)
+
+config = SpendLimitConfig(limits=[cap, rolling, daily])
+evaluator = SpendLimitEvaluator(config)
+```
+
+### BudgetWindow
+
+| kind | Required fields | Notes |
+|------|----------------|-------|
+| `"rolling"` | `seconds` | Sliding window from `now - seconds` |
+| `"fixed"` | `unit` (`"day"`, `"week"`, or `"month"`) | Calendar-aligned, UTC by default |
+
+### scope_by semantics
+
+`scope_by` lists the context dimension keys to isolate spend buckets. Each dimension is **independent**:
+
+- `scope_by=()` (default) — global budget: all spend in that currency shares one counter
+- `scope_by=("channel",)` — one counter per unique `channel` value
+- `scope_by=("agent_id",)` — one counter per unique `agent_id`
+- `scope_by=("channel", "agent_id")` — one counter per unique `(channel, agent_id)` pair
+
+Spend in `channel-A` does **not** count against `channel-B`'s budget.
+
 ## Context-Aware Limits
 
-Context fields (`channel`, `agent_id`, `session_id`) and per-context limit overrides can be provided in two ways:
+Context fields (`channel`, `agent_id`, `session_id`) can be provided in two ways:
 
 **Option A: Via `step.context`** (recommended for engine integration)
 
 ```python
 step = Step(
     type="tool",
     name="payment",
-    input={"amount": 75.0, "currency": "USDC", "recipient": "0xABC"},
+    input={"amount": "75.00", "currency": "USDC", "recipient": "0xABC"},
     context={
         "channel": "experimental",
         "agent_id": "agent-42",
-        "channel_max_per_transaction": 50.0,
-        "channel_max_per_period": 200.0,
     },
 )
 ```
 
-When using `selector.path: "*"`, the evaluator merges `step.context` fields into the transaction data automatically. When using `selector.path: "input"`, context fields must be included directly in `step.input`.
+When using `selector.path: "*"`, the evaluator merges `step.context` fields into the transaction data automatically. Fields already present in `step.input` are never overwritten by context.
 
 **Option B: Inline in the transaction dict** (simpler, for direct SDK use)
 
 ```python
 result = await evaluator.evaluate({
-    "amount": 75.0,
+    "amount": "75.00",
     "currency": "USDC",
     "recipient": "0xABC",
     "channel": "experimental",
-    "channel_max_per_transaction": 50.0,
-    "channel_max_per_period": 200.0,
+    "agent_id": "agent-42",
 })
 ```
 
-Spend budgets are **scoped by context** — spend in channel A does not count against channel B's budget. When no context fields are present, budgets are global.
-
 ## Custom SpendStore
 
-The `SpendStore` protocol requires two methods. Implement them for your backend:
+The `SpendStore` protocol requires three methods. Implement them for your backend:
 
 ```python
+from decimal import Decimal
 from agent_control_evaluator_financial_governance.spend_limit import (
-    SpendStore,
-    SpendLimitConfig,
-    SpendLimitEvaluator,
+    SpendStore, SpendLimitConfig, SpendLimitEvaluator,
 )
 
 class PostgresSpendStore:
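The `scope_by` semantics added in this hunk amount to choosing a bucket key from the transaction's context fields. The sketch below is one possible reading, using the 90-in-A plus 20-in-B case from the commit message. `scope_key` and `record` are hypothetical helpers and the in-memory dict is only for illustration; the package's store may key its counters differently.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Mapping, Tuple

ScopeKey = Tuple[Tuple[str, object], ...]


def scope_key(scope_by: Tuple[str, ...], context: Mapping[str, object]) -> ScopeKey:
    """Build a bucket key from only the dimensions named in scope_by."""
    return tuple((dim, context.get(dim)) for dim in scope_by)


def record(totals: Dict[ScopeKey, Decimal], scope_by, context, amount: Decimal) -> Decimal:
    """Add amount to the bucket selected by scope_by and return the new total."""
    key = scope_key(scope_by, context)
    totals[key] += amount
    return totals[key]


# Per-channel buckets: 90 in channel A and 20 in channel B stay separate.
per_channel: Dict[ScopeKey, Decimal] = defaultdict(Decimal)
record(per_channel, ("channel",), {"channel": "A", "agent_id": "agent-42"}, Decimal("90"))
record(per_channel, ("channel",), {"channel": "B", "agent_id": "agent-42"}, Decimal("20"))

# Global bucket: scope_by=() maps every transaction to the same empty key.
global_totals: Dict[ScopeKey, Decimal] = defaultdict(Decimal)
record(global_totals, (), {"channel": "A"}, Decimal("90"))
record(global_totals, (), {"channel": "B"}, Decimal("20"))

print(dict(per_channel))
print(dict(global_totals))
```

With a 100 USDC limit, both channel buckets stay under budget, whereas the same two payments against a global bucket total 110 and would exceed it.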
@@ -142,24 +201,70 @@ class PostgresSpendStore:
     def __init__(self, connection_string: str):
         self._conn = connect(connection_string)
 
-    def record_spend(self, amount: float, currency: str, metadata: dict | None = None) -> None:
+    def record_spend(
+        self,
+        amount: Decimal,
+        currency: str,
+        metadata: dict | None = None,
+    ) -> None:
         self._conn.execute(
-            "INSERT INTO agent_spend (amount, currency, metadata, recorded_at) VALUES (%s, %s, %s, NOW())",
-            (amount, currency, json.dumps(metadata)),
+            "INSERT INTO agent_spend (amount, currency, metadata, recorded_at)"
+            " VALUES (%s, %s, %s, NOW())",
+            (str(amount), currency, json.dumps(metadata)),
         )
 
-    def get_spend(self, currency: str, since_timestamp: float) -> float:
+    def get_spend(
+        self,
+        currency: str,
+        start: float,
+        end: float | None = None,
+        scope: dict[str, str] | None = None,
+    ) -> Decimal:
+        # Build WHERE clause for scope filtering
+        clauses = [
+            "currency = %s",
+            "recorded_at >= to_timestamp(%s)",
+        ]
+        params = [currency, start]
+        if end is not None:
+            clauses.append("recorded_at <= to_timestamp(%s)")
+            params.append(end)
+        if scope:
+            for k, v in scope.items():
+                clauses.append(f"metadata->>{k!r} = %s")
+                params.append(v)
+        where = " AND ".join(clauses)
         row = self._conn.execute(
-            "SELECT COALESCE(SUM(amount), 0) FROM agent_spend WHERE currency = %s AND recorded_at >= to_timestamp(%s)",
-            (currency, since_timestamp),
+            f"SELECT COALESCE(SUM(amount), 0) FROM agent_spend WHERE {where}",
+            params,
         ).fetchone()
-        return float(row[0])
+        return Decimal(str(row[0]))
+
+    def check_and_record(
+        self,
+        amount: Decimal,
+        currency: str,
+        limit: Decimal,
+        start: float,
+        end: float | None = None,
+        scope: dict[str, str] | None = None,
+        metadata: dict | None = None,
+    ) -> tuple[bool, Decimal]:
+        # Use a DB transaction for atomicity
+        with self._conn.transaction():
+            current = self.get_spend(currency, start, end, scope)
+            if current + amount > limit:
+                return False, current
+            self.record_spend(amount, currency, metadata)
+            return True, current
 
 # Use it:
 store = PostgresSpendStore("postgresql://...")
 evaluator = SpendLimitEvaluator(config, store=store)
 ```
 
+> **Single-process atomicity note:** `InMemorySpendStore.check_and_record()` uses a `threading.Lock` to atomically check-and-record within a single process. For multi-process or distributed deployments, your custom store must implement true database-level atomics (e.g., PostgreSQL `SELECT ... FOR UPDATE`, Redis Lua scripts).
+
 ## Running Tests
 
 ```bash
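The atomicity note in this hunk describes check-and-record under a `threading.Lock`. The sketch below shows that pattern in isolation, reduced to a single counter. `LockedCounter` is a hypothetical class, not the package's `InMemorySpendStore`, and it ignores currency, windows, and scope.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from decimal import Decimal
from typing import Tuple


class LockedCounter:
    """Hypothetical single-bucket store: the check and the write share one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.total = Decimal("0")

    def check_and_record(self, amount: Decimal, limit: Decimal) -> Tuple[bool, Decimal]:
        """Return (approved, spend before this call); record only when approved."""
        with self._lock:
            current = self.total
            if current + amount > limit:
                return False, current
            # No other thread can read or write between the check and this update.
            self.total = current + amount
            return True, current


def approvals_under_contention(workers: int = 50) -> Tuple[int, Decimal]:
    """Fire `workers` concurrent 10-unit payments at a 100-unit budget."""
    store = LockedCounter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(
                lambda _: store.check_and_record(Decimal("10"), Decimal("100")),
                range(workers),
            )
        )
    approved = sum(1 for ok, _ in results if ok)
    return approved, store.total


print(approvals_under_contention())
```

The lock only protects threads inside one process. The `PostgresSpendStore` example in the hunk above wraps a read and an insert in a transaction, which on its own does not stop two concurrent transactions from both passing the check; that is the gap the note's `SELECT ... FOR UPDATE` and Redis Lua suggestions are meant to close.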
@@ -170,10 +275,12 @@ pytest tests/ -v
 
 ## Design Decisions
 
-1. **Decoupled from data source** — The `SpendStore` protocol means no new tables in core Agent Control. Bring your own persistence.
-2. **Context-aware limits** — Override keys in the evaluate data dict allow per-channel, per-agent, or per-session limits without multiple evaluator instances.
-3. **Python SDK compatible** — Uses the standard evaluator interface; works with both the server and the Python SDK evaluation engine.
-4. **Fail-open on errors** — Missing or malformed data returns `matched=False` with an `error` field, following Agent Control conventions.
+1. **Decimal for money** — All monetary amounts use `Decimal`, never `float`. Floating-point arithmetic is unsuitable for financial calculations.
+2. **BudgetLimit + BudgetWindow models** — Expressive, composable budget definitions that replace the previous flat config. Each limit is independent; first violation wins.
+3. **Independent scope dimensions** — `scope_by=("channel",)` creates a separate counter for each channel value. Spend in one channel is completely isolated from another.
+4. **Atomic check_and_record()** — Eliminates the TOCTOU race of separate `get_spend()` + `record_spend()` calls. Single-process safe with `threading.Lock`; production stores should use DB-level atomics.
+5. **Decoupled from data source** — The `SpendStore` protocol means no new tables in core Agent Control. Bring your own persistence.
+6. **Fail-open on malformed input** — Missing or malformed data returns `matched=False, error=None`, following Agent Control conventions. The `error` field is reserved for evaluator crashes, not policy decisions.
 
 ## Related Projects
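Design decision 6 changes how malformed input is reported: `matched=False` with `error=None`, keeping `error` for evaluator failures. A minimal sketch of that convention follows. `Result` and `evaluate_cap` are hypothetical stand-ins for `EvaluatorResult` and the evaluator, and treating `matched=True` as "the deny action should fire" is an assumption about Agent Control semantics.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Any, Optional


@dataclass
class Result:
    """Hypothetical stand-in for EvaluatorResult."""
    matched: bool
    error: Optional[str] = None
    message: str = ""


def evaluate_cap(payload: Any, cap: Decimal) -> Result:
    """Check one per-transaction cap, failing open on malformed payloads."""
    if not isinstance(payload, dict):
        # Malformed input is not an evaluator failure, so error stays None.
        return Result(matched=False, message="payload is not a dict")
    try:
        amount = Decimal(str(payload["amount"]))
    except (KeyError, InvalidOperation):
        return Result(matched=False, message="missing or unparseable amount")
    if not amount.is_finite():
        return Result(matched=False, message="amount is not a finite number")
    if amount > cap:
        # Assumption: matched=True is what triggers the control's deny action.
        return Result(matched=True, message=f"{amount} exceeds cap {cap}")
    return Result(matched=False)


print(evaluate_cap({"amount": "150.00"}, Decimal("100")))
print(evaluate_cap({"currency": "USDC"}, Decimal("100")))
```

Fail-open means a payload the evaluator cannot read is allowed through rather than denied, so callers that need fail-closed behaviour would have to validate payloads before they reach the control.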

evaluators/contrib/financial-governance/src/agent_control_evaluator_financial_governance/__init__.py

Lines changed: 19 additions & 6 deletions
@@ -3,7 +3,8 @@
 Provides two evaluators for enforcing financial policy on AI agent transactions:
 
 - ``financial_governance.spend_limit``: Tracks cumulative spend against rolling
-  period budgets and per-transaction caps.
+  period budgets and per-transaction caps.  Uses the :class:`BudgetLimit` /
+  :class:`BudgetWindow` model for expressive, scoped budget definitions.
 - ``financial_governance.transaction_policy``: Static policy checks — allowlists,
   blocklists, amount bounds, and permitted currencies.
 
@@ -14,14 +15,22 @@
 
     {
         "condition": {
-            "selector": {"path": "*"},
+            "selector": {"path": "input"},
             "evaluator": {
                 "name": "financial_governance.spend_limit",
                 "config": {
-                    "max_per_transaction": 100.0,
-                    "max_per_period": 1000.0,
-                    "period_seconds": 86400,
-                    "currency": "USDC"
+                    "limits": [
+                        {
+                            "amount": "100.00",
+                            "currency": "USDC"
+                        },
+                        {
+                            "amount": "1000.00",
+                            "currency": "USDC",
+                            "scope_by": ["channel"],
+                            "window": {"kind": "rolling", "seconds": 86400}
+                        }
+                    ]
                 }
             }
         },
@@ -30,6 +39,8 @@
 """
 
 from agent_control_evaluator_financial_governance.spend_limit import (
+    BudgetLimit,
+    BudgetWindow,
     SpendLimitConfig,
     SpendLimitEvaluator,
 )
@@ -41,6 +52,8 @@
 __all__ = [
     "SpendLimitEvaluator",
     "SpendLimitConfig",
+    "BudgetLimit",
+    "BudgetWindow",
     "TransactionPolicyEvaluator",
     "TransactionPolicyConfig",
 ]
Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,14 @@
11
"""Spend-limit evaluator package."""
22

3-
from .config import SpendLimitConfig
3+
from .config import BudgetLimit, BudgetWindow, SpendLimitConfig
44
from .evaluator import SpendLimitEvaluator
55
from .store import InMemorySpendStore, SpendStore
66

77
__all__ = [
88
"SpendLimitEvaluator",
99
"SpendLimitConfig",
10+
"BudgetLimit",
11+
"BudgetWindow",
1012
"SpendStore",
1113
"InMemorySpendStore",
1214
]