Add guard observability hooks to Safety Agent SDKs#1106
Conversation
- Replace classification-focused tests with functional tests
- Tests now verify response structure instead of specific classification values
- Updated 13 test files (10 TypeScript, 3 Python)
- Tests now accept any valid classification ('pass' or 'block')
- Added vitest types to tsconfig.json for better IDE support
- Fixed TypeScript configuration to include test files
- Add URL validation to prevent SSRF attacks - Block private/internal IP addresses (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) - Block localhost access and file:// protocol - Validate URL format, protocol, and length (max 2048 chars) - Implement security checks in both Python and TypeScript SDKs - Maintain backward compatibility with existing valid URLs Fixes superagent-ai#1076
- Replace string-based IP checks with actual DNS resolution - Use ipaddr.js library for robust IPv4/IPv6 private range checking - Add DNS lookup to prevent hostname-based SSRF attacks - Support all IPv6 private ranges (fc00::/7, fe80::/10, ::ffff:127.0.0.0/104) - Implement fail-safe: treat DNS failures as private (secure by default) - Add comprehensive unit tests for SSRF protection Security improvements: - Blocks hostnames that resolve to private IPs (e.g., attacker.com -> 127.0.0.1) - Proper IPv4 and IPv6 private range detection - IPv4-mapped IPv6 address support (::ffff:127.0.0.1) - DNS resolution prevents bypass via hostname manipulation Fixes SSRF vulnerability where previous implementation only checked hostname strings, not actual resolved IP addresses.
- Change rootDir from './src' to '.' to allow test files compilation - Fixes TS6059 error where test files were included but not under rootDir - Build now compiles successfully with both src and tests
- Block private/internal IP addresses (IPv4 and IPv6) - Block localhost, loopback, and link-local addresses - DNS resolution check to prevent hostname-based SSRF bypasses - Protocol validation (blocks file://, only allows http/https) - URL length validation (max 2048 characters) - Fail-safe: DNS failures treated as private Fixes test expectations for IPv6 and DNS resolution error messages
|
@dushyantzz is attempting to deploy a commit to the Superagent Team on Vercel. A member of the Team first needs to authorize it. |
|
The CI failures are occuring due to missing API keys in the GitHub Actions environment. |
|
@cursor review |
|
@dushyantzz how does this hook into existing observability providers like Langfuse etc.? Care to give some details? |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| try: | ||
| result = hook(event) | ||
| if inspect.isawaitable(result): | ||
| asyncio.create_task(result) |
There was a problem hiding this comment.
Async hook exceptions not caught in Python
Medium Severity
When an async hook raises an exception, it's not caught or suppressed. The asyncio.create_task(result) call schedules the coroutine to run in the background, but provides no exception handler for the task. If the async hook raises an exception during execution, Python logs an "Task exception was never retrieved" warning when the task is garbage collected, violating the stated intent to "swallow hook errors to avoid impacting guard execution."
|
@cursor how does this hook into existing observability tools like Langfuse etc.? |
|
Perfect! Now I have a complete understanding of the observability hooks implementation. Let me create a comprehensive response explaining how these hooks integrate with observability tools like Langfuse. Based on the PR implementation, I can now provide a detailed answer about how the observability hooks work and how they can integrate with tools like Langfuse. The hooks provide lifecycle events (start, segment, result, error) that can be used to send telemetry data to any observability platform. How the Observability Hooks WorkThe observability hooks added to the Safety Agent SDKs provide a callback-based integration pattern that allows you to instrument the Hook ArchitectureThe implementation provides 4 lifecycle hooks for the guard flow:
Integration with Langfuse (Example)Here's how you would integrate with Langfuse or similar observability platforms: TypeScript: import { createClient } from 'safety-agent';
import { Langfuse } from 'langfuse';
const langfuse = new Langfuse({
publicKey: process.env.LANGFUSE_PUBLIC_KEY,
secretKey: process.env.LANGFUSE_SECRET_KEY,
});
const client = createClient({ apiKey: process.env.SUPERAGENT_API_KEY });
const trace = langfuse.trace({ name: 'safety-guard-check' });
const result = await client.guard({
input: "user message to analyze",
model: "openai/gpt-4o-mini",
hooks: {
onStart: (event) => {
trace.update({
metadata: {
model: event.model,
inputType: event.inputType,
inputSize: event.inputSize,
segmentCount: event.segmentCount,
}
});
},
onSegment: (event) => {
// Log each segment as a span
trace.span({
name: `guard-segment-${event.kind}`,
input: { size: event.segmentSize, units: event.segmentUnits },
output: {
classification: event.result.classification,
violations: event.result.violation_types
},
metadata: {
index: event.index,
totalCount: event.count,
durationMs: event.durationMs,
},
usage: {
promptTokens: event.result.usage.promptTokens,
completionTokens: event.result.usage.completionTokens,
totalTokens: event.result.usage.totalTokens,
}
});
},
onResult: (event) => {
trace.update({
output: {
classification: event.result.classification,
reasoning: event.result.reasoning,
violations: event.result.violation_types,
cweCodes: event.result.cwe_codes,
},
usage: {
promptTokens: event.result.usage.promptTokens,
completionTokens: event.result.usage.completionTokens,
totalTokens: event.result.usage.totalTokens,
},
metadata: {
durationMs: event.durationMs,
segmentCount: event.segmentCount,
}
});
},
onError: (event) => {
trace.update({
level: 'ERROR',
statusMessage: event.error.message,
});
},
},
});
await langfuse.flushAsync();Python: from safety_agent import create_client, GuardHooks
from langfuse import Langfuse
langfuse = Langfuse(
public_key=os.environ["LANGFUSE_PUBLIC_KEY"],
secret_key=os.environ["LANGFUSE_SECRET_KEY"],
)
client = create_client(api_key=os.environ["SUPERAGENT_API_KEY"])
trace = langfuse.trace(name="safety-guard-check")
def on_start(event):
trace.update(metadata={
"model": event.model,
"input_type": event.input_type,
"input_size": event.input_size,
"segment_count": event.segment_count,
})
def on_segment(event):
trace.span(
name=f"guard-segment-{event.kind}",
input={"size": event.segment_size, "units": event.segment_units},
output={
"classification": event.result.classification,
"violations": event.result.violation_types,
},
metadata={
"index": event.index,
"total_count": event.count,
"duration_ms": event.duration_ms,
},
usage={
"prompt_tokens": event.result.usage.prompt_tokens,
"completion_tokens": event.result.usage.completion_tokens,
"total_tokens": event.result.usage.total_tokens,
}
)
def on_result(event):
trace.update(
output={
"classification": event.result.classification,
"reasoning": event.result.reasoning,
"violations": event.result.violation_types,
"cwe_codes": event.result.cwe_codes,
},
usage={
"prompt_tokens": event.result.usage.prompt_tokens,
"completion_tokens": event.result.usage.completion_tokens,
"total_tokens": event.result.usage.total_tokens,
},
metadata={
"duration_ms": event.duration_ms,
"segment_count": event.segment_count,
}
)
hooks = GuardHooks(
on_start=on_start,
on_segment=on_segment,
on_result=on_result,
)
result = await client.guard(
input="user message to analyze",
model="openai/gpt-4o-mini",
hooks=hooks,
)
langfuse.flush()Key Benefits
Integration PatternsThe hooks support multiple integration patterns:
The design is intentionally minimal and flexible - you receive structured event objects and can transform/send them however you need for your observability stack. |
|
@homanp The hooks are provider agnostic and don’t hardcode any observability vendor. It emits structured lifecycle events (onStart, onSegment, onResult, onError) with timing + usage + classifications. |
I was thinking one could hook it into OpenTelemetry calls that already exists into upstream apps. But perhaps this is a better way to just keep it agnostic. A good start. Will test it out shortly! |
Description
Adds observability hooks to the Safety Agent guard flow in both TS and Python SDKs (start/segment/result/error), with docs and tests.
Type of Change
-[ ] Bug fix
-[x] New feature
-[ ] Breaking change
-[x] Documentation update
Testing
cd sdk/typescript && npm test (failed: missing provider API keys)
cd sdk/python && PYTHONPATH=./src python -m pytest (failed: missing provider API keys; hook tests pass)
Checklist
-[x] Code follows project style guidelines
-[ ] Tests pass locally
-[x] Documentation updated (if needed)