NathanMaine-Labs
Senior TPM who builds. 12+ years enterprise delivery. Agentic AI, compliance automation, conversational AI. 48+ repos.
Senior Technical Program Manager | AI Systems Builder | 12+ Years Enterprise Delivery
I build the AI systems I've spent a career learning to manage: compliance LLMs, governed inference gateways, evaluation harnesses for autonomous agents, and patent-pending token governance infrastructure. 12+ years of enterprise delivery across identity platforms (700K users), data unification (89M records, 95.48% match rate), and $20M+ multi-cloud programs.
CMMC Compliance AI Platform (v1.5.1)
The only CMMC-specific fine-tuned LLM suite in the open-source ecosystem. Four models (7B–72B) trained for $77 in total compute, deployed fully air-gapped via Ollama. 708 tests across three tiers, including 140 blind holdout scenarios that caught 3 real security bugs the 568 internal tests missed. 27 CMMC controls across 5 families (AC, AU, IA, SC, SI) plus 3 DFARS clauses.
Commercial AI gateways log to editable databases — useless for compliance audits
Every AI request passes through an 11-step pipeline that logs who asked what, blocks sensitive data, and chains each entry to the previous one using SHA-256 hashes — any tampering breaks the chain
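The hash-chaining step can be sketched in a few lines. This is a minimal illustration of the idea, not the gateway's actual pipeline: the entry fields, genesis sentinel, and function names here are assumptions.

```python
import hashlib
import json

def chain_entry(prev_hash, record):
    """One append-only audit entry: the hash covers the previous entry's hash
    plus this record, so editing any earlier record breaks every later hash."""
    payload = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    return {"record": record, "prev_hash": prev_hash, "hash": digest}

def verify_chain(entries):
    """Recompute every hash from the genesis sentinel; any tampering fails."""
    prev = "0" * 64
    for e in entries:
        payload = json.dumps(e["record"], sort_keys=True)
        if e["prev_hash"] != prev or \
           hashlib.sha256((prev + payload).encode()).hexdigest() != e["hash"]:
            return False
        prev = e["hash"]
    return True

# Build a three-entry log, then tamper with the middle record.
log, prev = [], "0" * 64
for user, action in [("alice", "query"), ("bob", "export"), ("carol", "query")]:
    entry = chain_entry(prev, {"user": user, "action": action})
    log.append(entry)
    prev = entry["hash"]

intact = verify_chain(log)            # True before tampering
log[1]["record"]["user"] = "mallory"  # edit one historical record
tampered = verify_chain(log)          # False: the chain is broken from entry 1 on
```

Because each hash commits to the previous one, an auditor only needs the final hash to detect any edit anywhere in the log.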
CMMC compliance consultants cost $125K–$250K and no open-source AI alternative exists
Four AI models fine-tuned on government compliance documents, sized from fast lookups (7B) to deep multi-framework analysis (72B), running entirely on local hardware with zero cloud dependency
Standard AI vulnerability scanners don't test whether models leak regulated data like CUI, HIPAA, or DFARS content
Four custom probes and six detectors for NVIDIA's garak scanner that specifically try to trick compliance models into revealing controlled information. PR #1619 — 20 files, 1,599 lines. Dev fork
Compliance policies written in documents are hard to trace and enforce with software
Reads policy Markdown files and converts them into a connected graph where each requirement links to its evidence artifacts and enforcement points
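The parsing idea can be sketched as follows. The heading and `Evidence:` conventions below are invented for illustration; the real tool's policy format may differ.

```python
import re
from collections import defaultdict

def parse_policy(markdown):
    """Build a requirement -> evidence adjacency map from a policy doc where
    '## REQ-...' headings open a requirement and 'Evidence:' lines list artifacts."""
    graph, current = defaultdict(list), None
    for line in markdown.splitlines():
        match = re.match(r"##\s+(REQ-\S+)", line)
        if match:
            current = match.group(1)
            graph[current]  # register the requirement even if no evidence follows
        elif current and line.startswith("Evidence:"):
            artifacts = line[len("Evidence:"):].split(",")
            graph[current].extend(a.strip() for a in artifacts)
    return dict(graph)

doc = """\
## REQ-AC-001
Restrict access to authorized users.
Evidence: iam_policy.json, access_review.csv
## REQ-AU-002
Retain audit logs.
Evidence: log_retention.cfg
"""
graph = parse_policy(doc)
```

Once requirements are nodes and evidence artifacts are edges, "which requirements have no evidence" becomes a simple graph query.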
AI Agent Evaluation & Dark Factory Testing
140 black-box behavioral scenarios live in a physically separate holdout repository; the AI that builds the platform never sees the tests, and an agent cannot game what it cannot see. The first sweep caught 3 real security bugs that passed all 568 internal tests: broken MFA setup (High), a missing X-Content-Type-Options header (Medium), and a missing X-Frame-Options header (Medium). The architecture converged independently on the same design as StrongDM's Software Factory pattern (published February 2026); the Agentic Evaluation Sandbox predates that publication, created December 2025.
568 internal tests all passed but 3 real security bugs shipped — visible tests get gamed by AI agents
140 black-box HTTP scenarios in a separate repo the AI never sees. Docker digital twin with mock Ollama (no GPU). Covers auth, RBAC, PII/CUI blocking, prompt injection, audit integrity, security headers, and 8 CMMC policy rules. First sweep caught 3 bugs — all fixed in v1.5.1
AI agents can learn to game their own tests when the tests live inside the codebase
The original Dark Factory framework (December 2025). Defines four evaluation roles — Doer, Judge, Adversary, Observer — with holdout scenarios and probabilistic satisfaction scoring instead of simple pass/fail
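Probabilistic satisfaction scoring can be illustrated with a toy aggregator. The rule here (mean of per-criterion Judge probabilities against a threshold) is an assumption for illustration; the framework's actual scoring may differ.

```python
from statistics import mean

def satisfaction(judge_probs, threshold=0.8):
    """Aggregate per-criterion probabilities from the Judge role into one
    score with a margin, instead of collapsing to a binary pass/fail."""
    score = mean(judge_probs)
    return {"score": round(score, 3),
            "satisfied": score >= threshold,
            "margin": round(score - threshold, 3)}

# A borderline scenario: two strong criteria, one weak one.
borderline = satisfaction([0.9, 0.7, 0.85])
```

The margin is the point of this over pass/fail: a scenario that scrapes by at 0.817 gets flagged differently than one that passes at 0.99.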
Voice assistants and NLU classifiers break under noisy or unusual input — failures need to be found before users find them
Injects noise, varied phrasing, and edge cases into voice endpoints, classifies each response into three outcome states, and generates a robustness report with evidence
Writing load test plans for steady traffic, burst traffic, and endurance runs is repetitive and error-prone
Takes a service profile and SLO targets as input, then generates ready-to-run test configurations with pass/fail thresholds derived directly from the SLOs
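A minimal sketch of that derivation, with invented profile/SLO field names and burst/endurance multipliers; only the shape of the transformation (SLO targets become pass/fail thresholds verbatim) reflects the description above.

```python
def make_plans(profile, slo):
    """Derive steady, burst, and endurance plans from a service profile;
    pass/fail thresholds are copied straight from the SLO targets."""
    thresholds = {"p95_ms": slo["p95_ms"], "max_error_rate": slo["error_rate"]}
    rps = profile["baseline_rps"]
    return {
        "steady":    {"rps": rps,     "duration_s": 600,   "thresholds": thresholds},
        "burst":     {"rps": rps * 5, "duration_s": 120,   "thresholds": thresholds},
        "endurance": {"rps": rps,     "duration_s": 14400, "thresholds": thresholds},
    }

plans = make_plans({"baseline_rps": 200}, {"p95_ms": 250, "error_rate": 0.01})
```

Deriving thresholds from the SLOs rather than hand-typing them removes the usual copy-paste drift between the SLO doc and the load test plan.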
Line coverage numbers don't tell you whether every function's actual behavior is tested
Walks the code's syntax tree to find every function, matches each one against existing test files, and generates skeleton tests for anything that's untested
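The core of that walk fits in a few lines using Python's `ast` module. The matching rule here (a function is "tested" if its name appears anywhere in the test file) is a simplification of whatever the real tool does.

```python
import ast

def untested_functions(source, test_source):
    """Walk the module's syntax tree, collect every function name, and
    report the ones never referenced in the test file."""
    funcs = [node.name for node in ast.walk(ast.parse(source))
             if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))]
    test_tree = ast.parse(test_source)
    referenced = {n.id for n in ast.walk(test_tree) if isinstance(n, ast.Name)} \
               | {n.attr for n in ast.walk(test_tree) if isinstance(n, ast.Attribute)}
    return [f for f in funcs if f not in referenced]

src = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"
tests = "import mod\n\ndef test_add():\n    assert mod.add(1, 2) == 3\n"
missing = untested_functions(src, tests)  # ['sub'] has no test
```

Line coverage would report `sub` as covered if any test happened to import the module; walking the AST asks the sharper question of whether each function is actually exercised.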
Agent Infrastructure
Purpose-built components for agent memory, recovery, planning, and coordination. Deterministic and auditable — identical inputs always produce identical outputs.
AI agents forget what happened in previous tasks — they have no persistent, queryable memory
Stores facts and relationships in a knowledge graph, retrieves relevant memories using similarity search, and can explain why it recalled something by tracing the graph path
When an AI agent fails mid-task, most systems just crash instead of recovering
Wraps agent tasks in retry logic with exponential backoff, fallback chains (try Plan B if Plan A fails), and circuit breakers that stop calling a broken service
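Those three mechanisms compose naturally; here is a compact sketch with invented names and thresholds, not the library's actual API:

```python
import time

class CircuitBreaker:
    """Stop calling a service after `max_failures` consecutive errors."""
    def __init__(self, max_failures=3):
        self.failures, self.max_failures = 0, max_failures

    @property
    def open(self):
        return self.failures >= self.max_failures

    def record(self, ok):
        self.failures = 0 if ok else self.failures + 1

def run_with_recovery(plans, breaker, retries=2, base_delay=0.01):
    """Try each plan in order (fallback chain); retry each with exponential
    backoff; refuse to call anything once the breaker is open."""
    for plan in plans:
        for attempt in range(retries + 1):
            if breaker.open:
                raise RuntimeError("circuit open")
            try:
                result = plan()
                breaker.record(True)
                return result
            except Exception:
                breaker.record(False)
                time.sleep(base_delay * 2 ** attempt)  # 0.01s, 0.02s, 0.04s...
    raise RuntimeError("all plans exhausted")

calls = []
def plan_a():
    calls.append("a"); raise IOError("primary service down")
def plan_b():
    calls.append("b"); return "ok"

breaker = CircuitBreaker(max_failures=5)
result = run_with_recovery([plan_a, plan_b], breaker)
```

Plan A is retried three times with growing delays, then the chain falls back to Plan B, which succeeds and resets the breaker.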
When multiple agents compete for tasks, some get overloaded while others sit idle
Coordinates agents using weighted round-robin allocation with capacity constraints and a skew-ratio metric that detects and corrects workload imbalance
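The allocation idea can be approximated in a few lines. This sketch uses a least-utilization pick as a stand-in for weighted round-robin and an invented skew definition (max/min utilization); the real coordinator's formulas may differ.

```python
def assign(tasks, capacity):
    """Capacity-weighted allocation: each task goes to the agent with the
    lowest load-to-capacity ratio that still has headroom; a skew ratio
    (max/min utilization) flags imbalance when it drifts above 1."""
    load = {agent: 0 for agent in capacity}
    for _ in tasks:
        open_agents = [a for a in capacity if load[a] < capacity[a]]
        pick = min(open_agents, key=lambda a: load[a] / capacity[a])
        load[pick] += 1
    util = [load[a] / capacity[a] for a in capacity]
    skew = max(util) / min(util) if min(util) > 0 else float("inf")
    return load, skew

# Agent x can hold 4 tasks, agent y can hold 2; distribute 6 tasks.
load, skew = assign(range(6), {"x": 4, "y": 2})
```

Both agents end at full utilization, so the skew ratio is 1.0; a naive unweighted round-robin would have overloaded `y` and left `x` idle capacity.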
Architecture reviews are inconsistent — different reviewers check different things
Accepts YAML architecture briefs, runs them through an LLM, and generates structured reviews with risk assessments, open questions, and checklists. Includes stub mode for testing without an LLM
AI operations metrics are scattered across tools with no unified view
ETL pipeline that ingests KPI snapshots from multiple sources, aggregates them using suffix-based heuristics, and produces a single evidence-backed report
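One plausible reading of "suffix-based heuristics", sketched with invented conventions (`_count` keys summed, `_ms` keys averaged, everything else last-write-wins); the pipeline's actual rules are not documented here.

```python
def aggregate(snapshots):
    """Merge KPI snapshots by key suffix: counters are summed, latencies
    are averaged, and remaining keys take the most recent value."""
    merged = {}
    for key in {k for snap in snapshots for k in snap}:
        values = [snap[key] for snap in snapshots if key in snap]
        if key.endswith("_count"):
            merged[key] = sum(values)
        elif key.endswith("_ms"):
            merged[key] = sum(values) / len(values)
        else:
            merged[key] = values[-1]  # last snapshot wins
    return merged

snapshots = [
    {"req_count": 10, "p95_ms": 200},
    {"req_count": 5, "p95_ms": 300, "status": "green"},
]
report = aggregate(snapshots)
```

Encoding the aggregation rule in the key name keeps the pipeline source-agnostic: any tool that emits `*_count` and `*_ms` keys merges correctly without per-source config.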
Meetings produce hours of audio — decisions and action items get lost because nobody re-listens
Captures audio in real time, transcribes with faster-whisper, identifies who said what via speaker diarization, and generates structured summaries through an LLM. v2.0 adds NVIDIA GPU acceleration
Salesforce AI agents can access org data without respecting field-level security or sharing rules — a compliance risk
Auto-discovers the org schema (objects, fields, relationships), enforces FLS and sharing rules, then runs safe actions (SOQL, Flow, Apex) within those boundaries
Modern AI is all neural nets — classic rule-based reasoning that can explain its conclusions is underrepresented
Common Lisp expert system that works backward from a goal, checking rules and facts until it can prove or disprove the goal, with certainty scores on every conclusion
Demonstrating practical symbolic AI with a real-world application and a complete dev environment
Rule-based car troubleshooting system with a forward chaining inference engine — starts from symptoms and fires rules until it reaches a diagnosis. Includes VS Code + SBCL setup
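The forward-chaining loop itself is small; here is the same idea in Python (the repo's engine is Common Lisp, and these troubleshooting rules are invented examples):

```python
def forward_chain(facts, rules):
    """Fire every rule whose premises are all satisfied, adding its
    conclusion as a new fact, until a full pass adds nothing new."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    ({"engine_cranks", "no_start"}, "fuel_or_spark_issue"),
    ({"fuel_or_spark_issue", "no_fuel_smell"}, "check_spark_plugs"),
]
diagnosis = forward_chain({"engine_cranks", "no_start", "no_fuel_smell"}, rules)
```

Starting from observed symptoms, the first rule derives an intermediate fact, which then satisfies the second rule's premises: rules chain forward until the diagnosis appears in the fact set.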
Holdout scenario evaluation harness for AI agents. Doer/Judge/Adversary/Observer roles, probabilistic satisfaction scoring, append-only JSONL audit trails with integrity hashes. Created Dec 2025.
Suite of 4 fine-tuned LLMs (7B/14B/32B/72B) for CMMC 2.0, NIST 800-171, NIST 800-53, HIPAA, and DFARS compliance. Air-gappable, runs on Ollama with zero cloud dependency.
Compliance-first LLM gateway with tamper-evident audit trails, policy-as-code enforcement, and compliance evidence export. Built for regulated industries.