
ShipShape Phase 2 implementation #7

Open
cxk280 wants to merge 299 commits into COG-GTM:master from cxk280:shipshape-implementation

cxk280 commented May 20, 2026


ShipShape Phase 2 Implementation

Summary

  • closes Phase 2 ShipShape implementation gates across bundle size, API latency, database queries, tests/coverage, runtime handling, accessibility, and type-safety hotspots
  • documents implementation evidence in FIXES_IMPLEMENTATION.md and keeps audit addendum current in SHIPSHAPE_AUDIT_REPORT.md
  • adds a changed-file coverage gate plus package coverage ratchet to control the remaining low-overall-coverage risk

Verification

  • npx pnpm@10.27.0 build
  • DATABASE_URL=postgres://ship:ship_dev_password@localhost:5433/ship_dev npx pnpm@10.27.0 --filter @ship/api test
  • npx pnpm@10.27.0 --filter @ship/web test
  • DATABASE_URL=postgres://ship:ship_dev_password@localhost:5433/ship_dev npx pnpm@10.27.0 test:coverage:changed
  • COREPACK_INTEGRITY_KEYS=0 PLAYWRIGHT_WORKERS=1 ./node_modules/.bin/playwright test e2e/accessibility-stretch.spec.ts --project=chromium --reporter=line
  • SHIPSHAPE_BASE_URL=http://localhost:3002 SHIPSHAPE_CONCURRENCY=50 SHIPSHAPE_DURATION_MS=5000 node scripts/shipshape-latency-benchmark.mjs
  • DATABASE_URL=postgres://ship:ship_dev_password@localhost:5433/ship_dev SESSION_SECRET=local-dev-session-secret-not-for-production E2E_TEST=1 npx pnpm@10.27.0 --filter @ship/api exec tsx ../scripts/shipshape-query-count.ts
  • git diff --check

Key Results

  • main app chunk: 470.98 kB / 140.68 kB gzip, below Vite's 500 KiB warning threshold
  • API P95: documents summary 198ms vs 1,210ms baseline; team grid 119ms vs 1,818ms baseline
  • DB flow queries: 25 vs 33 baseline and 26 target
  • tests: API 454/454, web 153/153
  • changed-line coverage: 226/226 (100.00%), plus API/web package coverage ratchet
  • accessibility: 0 Critical/Serious axe violations across Login, Docs, Document Editor, Projects, Team, and My Week
  • type-safety hotspots: top-three audited API route core violations reduced from 185 to 77

Shawn Jones and others added 30 commits January 20, 2026 14:20
The converter was treating inline marks (bold, italic, link, etc.) as
block elements. In TipTap/ProseMirror, these should be marks on text
nodes, not wrapper elements.

Before: { type: "bold", content: [{ type: "text", text: "Note:" }] }
After: { type: "text", text: "Note:", marks: [{ type: "bold" }] }
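
A minimal sketch of the conversion described above, assuming a simplified node shape. `DocNode`, `MARK_TYPES`, and `liftMarks` are hypothetical names used for illustration; the converter's real types are not shown in this PR, and link attributes are omitted for brevity.

```typescript
// Hypothetical node shape; the real converter's types are not shown in this PR.
interface DocNode {
  type: string;
  text?: string;
  marks?: { type: string }[];
  content?: DocNode[];
}

// Assumed mark names, taken from the examples in the commit message.
const MARK_TYPES = ["bold", "italic", "link"];

// Turn mark wrappers into `marks` on the text nodes they contain.
function liftMarks(node: DocNode, active: { type: string }[] = []): DocNode[] {
  const children = node.content ?? [];
  if (MARK_TYPES.indexOf(node.type) !== -1) {
    // A mark wrapper disappears; its children inherit the mark.
    const marks = active.concat([{ type: node.type }]);
    const lifted: DocNode[] = [];
    for (const child of children) lifted.push(...liftMarks(child, marks));
    return lifted;
  }
  if (node.type === "text") {
    if (active.length === 0) return [node];
    return [{ ...node, marks: (node.marks ?? []).concat(active) }];
  }
  // Block nodes keep their wrapper; only their children are rewritten.
  if (node.content === undefined) return [node];
  const content: DocNode[] = [];
  for (const child of children) content.push(...liftMarks(child, active));
  return [{ ...node, content }];
}
```

Calling `liftMarks` on the "Before" node yields a single text node carrying a `bold` mark, matching the "After" shape.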

Co-Authored-By: Claude <noreply@anthropic.com>
Previously, dragging a Word doc or PDF onto the editor would trigger
browser download. Now non-image files are handled by FileAttachment
extension and embedded as downloadable cards.

- Add ProseMirror plugin with handleDrop and handlePaste
- Filter out images (already handled by ImageUploadExtension)
- Refactor upload logic into shared handleFileUpload function

Co-Authored-By: Claude <noreply@anthropic.com>
Add bottom padding to scrollable areas throughout the app so users
see empty space when reaching the end, signaling there's no more
content. Uses pb-20 (~80px) for lists/sidebars and pb-32 (~128px)
for editor content areas.

Co-Authored-By: Claude <noreply@anthropic.com>
Make sticky sprint headers fully opaque so content doesn't show
through when scrolling. Use ring highlight for current sprint
instead of semi-transparent background.

Co-authored-by: Claude <noreply@anthropic.com>
Story: api-route-tests
PRD: audit-remediation

- auth.test.ts: 16 tests (login, logout, session, CSRF, security)
- sprints.test.ts: 19 tests (CRUD, lifecycle, issues, hypothesis)
- issues.test.ts: 21 tests (CRUD, state, filtering, bulk operations)

The issues tests were updated to match the new belongs_to association
model (array of {id, type} instead of direct project_id field).

Test results: 360/361 passing

Co-Authored-By: Claude <noreply@anthropic.com>
- Add cancelled flag to prevent state updates after cleanup
- Store updateUsersCallback reference for proper listener removal
- Call awareness.setLocalState(null) before destroy to notify peers
- Also fix type errors in auth.test.ts (getCookiesArray helper)

Story: yjs-cleanup-race
PRD: Audit Remediation - Critical Blockers

Co-Authored-By: Claude <noreply@anthropic.com>
- Added signal parameter to uploadFile() with abort checks
- Added abortController option to ImageUploadOptions interface
- Editor.tsx creates AbortController and aborts on cleanup
- Prevents uploads from completing into wrong document after navigation
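
The abort checks above might look roughly like the sketch below. It is illustrative only: `SignalLike` is a hypothetical structural type (a real `AbortController`'s `signal` satisfies it), and the `send` and `insert` callbacks stand in for the network call and the editor insertion, which the commit does not show.

```typescript
// Hypothetical minimal signal type; AbortController's `signal` has this shape.
interface SignalLike {
  readonly aborted: boolean;
}

function abortError(): Error {
  const err = new Error("upload aborted");
  err.name = "AbortError";
  return err;
}

function isAbortError(err: unknown): boolean {
  return err instanceof Error && err.name === "AbortError";
}

// Throws an AbortError if the signal has been aborted.
function throwIfAborted(signal?: SignalLike): void {
  if (signal !== undefined && signal.aborted) throw abortError();
}

// `send` and `insert` are stand-ins for the real upload and editor calls.
async function uploadFile(
  file: string,
  send: (file: string) => Promise<string>,
  insert: (url: string) => void,
  signal?: SignalLike,
): Promise<boolean> {
  try {
    throwIfAborted(signal); // before starting any work
    const url = await send(file);
    throwIfAborted(signal); // the user may have navigated away while awaiting
    insert(url);
    return true;
  } catch (err) {
    if (isAbortError(err)) return false; // an abort is not a user-facing error
    throw err;
  }
}
```

The second check is the one that matters for the bug described: it runs after the upload resolves but before anything is inserted into the document.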

Story: imageupload-cancel
PRD: audit-remediation

Co-Authored-By: Claude <noreply@anthropic.com>
Story: fileattachment-cancel
PRD: audit-remediation

- Add abortSignal to CreateSlashCommandsOptions interface
- Pass signal to triggerFileUpload from slash commands
- Check abort status at key points in triggerFileUpload
- Handle AbortError gracefully (log but don't alert)
- Add documentId to useMemo deps for fresh signal on navigation

Prevents file uploads from completing into wrong document after navigation.

Co-Authored-By: Claude <noreply@anthropic.com>
Story: consolidate-editors
PRD: audit-remediation

- Update /projects/:id to use DocumentRedirect → /documents/:id
- Update /sprints/:id to use DocumentRedirect → /documents/:id
- Update /programs/:programId/sprints/:id to use DocumentRedirect
- Deprecate ProjectEditorPage and SprintEditorPage imports
- /issues/:id was already using this pattern

All document types now route through the canonical UnifiedDocumentPage,
ensuring consistent editor behavior and maintainability.

Co-Authored-By: Claude <noreply@anthropic.com>
- Create unified PropertiesPanel replacing type-specific sidebars
- Add CardGrid usage in ProgramEditor for projects view
- Create useUnifiedDocuments hook consolidating type-specific contexts
- Extract shared document-crud utilities from route files
- Mark legacy contexts as @deprecated with migration guides

Iterations:
- consolidate-sidebars: PropertiesPanel.tsx (240 lines)
- use-cardgrid: ProgramEditor uses CardGrid for projects
- consolidate-contexts: useUnifiedDocuments.ts (161 lines)
- api-route-utilities: document-crud.ts (312 lines)

PRD: audit-remediation
All 16/16 stories complete

Co-Authored-By: Claude <noreply@anthropic.com>
- Add waf.tf with full WebACL config (rate limiting, AWS managed rules, bot control)
- Add cloudfront-logging.tf with Kinesis stream and real-time log config
- 180-day retention on Kinesis, 100% sampling, all fields captured
- Use variable to optionally provide external WAF ARN
- Add regex pattern set for static file exemptions (/api/, common
  static file extensions)
- Add Rule 0 with AntiDDoS managed rule group and scope_down_statement
  to exclude static files from DDoS challenges
- Renumber existing rules (priorities 1-7)
- Note: Detailed AntiDDoS config (Challenge sensitivity HIGH,
  SensitivityToBlock LOW) must be configured in AWS Console as
  Terraform provider 5.100.0 doesn't yet support the
  aws_managed_rules_anti_ddos_rule_set config block

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Story: deprecate-legacy-issue-fields-frontend
PRD: reusable-issues-list

- Add BelongsTo type and helper functions (getAssociationId, getProgramId, etc.)
- Remove project_id/sprint_id from frontend Issue interface
- Update all frontend code to use belongs_to array with helpers
- Fix backward compatibility in optimistic updates with type assertions

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove project_id/sprint_id from CreateIssueInput and UpdateIssueInput interfaces
- Update useCreateIssue and useUpdateIssue mutations to use belongs_to
- Change ProgramEditor quick-add to use belongs_to: [{ id, type: 'program' }]
- Remove duplicate legacy fields from IssuesList optimistic updates

Story: deprecate-legacy-issue-mutations
PRD: reusable-issues-list

Co-Authored-By: Claude <noreply@anthropic.com>
- Add inheritedContext prop to IssuesListProps for explicit context setting
- Compute effective context from inheritedContext or locked filter props
- Infer program from project's program_id when project is set
- Use internal useCreateIssue when self-fetching is enabled
- Build belongs_to array from effective context for new issues
- Add getProjectId, getProjectTitle, getSprintTitle helpers

Story: context-inherited-issue-creation
PRD: Reusable Issues List Component

Co-Authored-By: Claude <noreply@anthropic.com>
- Add issue_iterations table (migration 026)
- Add POST /api/issues/:id/iterations endpoint
- Add GET /api/issues/:id/iterations endpoint

Iterations track Claude's work progress (pass/fail/in_progress) directly
on issues. Can be aggregated by project/sprint via document_associations
for retros and reports.

Co-Authored-By: Claude <noreply@anthropic.com>
Replace custom ProjectIssuesList component (~700 lines) with the enhanced
IssuesList component configured with locked filters and inherited context.

Changes:
- ProjectEditor now uses IssuesList with lockedProjectId prop
- Program/project filters hidden via showProgramFilter/showProjectFilter=false
- inheritedContext passes projectId and programId for issue creation
- enableKeyboardNavigation=false to avoid conflicts with project editor nav
- IssuesList enhanced with better filter visibility logic

Story: integrate-in-project-page
PRD: reusable-issues-list

Co-Authored-By: Claude <noreply@anthropic.com>
Remove project_id and sprint_id fields from Document, CreateDocumentInput,
and UpdateDocumentInput interfaces. These fields are replaced by the
belongs_to array for document associations.

Preserves SprintReviewProperties.sprint_id which is still required.

Story: cleanup-shared-types
PRD: reusable-issues-list

Co-Authored-By: Claude <noreply@anthropic.com>
Story: url-sync-embedded
PRD: reusable-issues-list

- Add urlParamPrefix prop for namespaced URL params (e.g., issues_state)
- Integrate useSearchParams for bidirectional URL state sync
- Filter changes update URL, URL params restore filter on load
- Browser back/forward navigation works correctly
- Add urlParamPrefix="issues" to ProjectEditor integration
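
To illustrate the namespacing, here is a small sketch of how a prefix could map filter names to URL parameter names and back. The helper names are hypothetical, and plain records are used in place of the router's search-params object so the example stays self-contained.

```typescript
type Filters = Record<string, string>;

// "state" with prefix "issues" becomes "issues_state"; no prefix leaves it as-is.
function paramName(prefix: string | undefined, name: string): string {
  return prefix ? `${prefix}_${name}` : name;
}

// Filter changes -> URL params.
function filtersToParams(filters: Filters, prefix?: string): Record<string, string> {
  const params: Record<string, string> = {};
  for (const name of Object.keys(filters)) {
    const value = filters[name];
    if (value) params[paramName(prefix, name)] = value; // omit empty filters
  }
  return params;
}

// URL params -> restored filters; only the listed filter names are read.
function paramsToFilters(
  params: Record<string, string>,
  names: string[],
  prefix?: string,
): Filters {
  const filters: Filters = {};
  for (const name of names) {
    const value = params[paramName(prefix, name)];
    if (value !== undefined) filters[name] = value;
  }
  return filters;
}
```

With a prefix, an embedded list ignores an unprefixed `state` parameter that belongs to the host page, which is the point of namespacing.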

Co-Authored-By: Claude <noreply@anthropic.com>
- Fixed BulkUpdateRequest interface to use project_id/sprint_id instead of belongs_to
  (matches API expectations at api/src/routes/issues.ts)
- Updated optimistic update logic to convert project_id/sprint_id to belongs_to array changes
- Updated handleBulkMoveToSprint and handleBulkAssignProject to use correct field names
- Toast now shows when issues are moved out of locked filter context

Story: bulk-action-toast
PRD: reusable-issues-list

Co-Authored-By: Claude <noreply@anthropic.com>
Implement Gmail-like selection persistence where checkbox selections
are maintained when navigating to an issue detail and returning to
the list.

- Add SelectionPersistenceContext for app-level selection storage
- IssuesList retrieves/persists selection via selectionPersistenceKey
- Uses useRef to avoid re-renders on selection changes
- Wrap app with SelectionPersistenceProvider in App.tsx

Story: selection-state-persistence
PRD: reusable-issues-list

Co-Authored-By: Claude <noreply@anthropic.com>
Implements a centralized tab registry system that allows UnifiedDocumentPage
to render appropriate tabs based on document type. This fixes the bug where
accessing projects and programs via /documents/:id URL did not show tabs.

- Add document-tabs.tsx with tab configurations for project and program types
- Create lazy-loaded tab components for all project tabs (Details, Issues, Sprints, Retro)
- Create lazy-loaded tab components for all program tabs (Overview, Issues, Projects, Sprints)
- Update UnifiedDocumentPage to use the tab registry for consistent behavior

Co-Authored-By: Claude <noreply@anthropic.com>
- Implement undo capability for bulk status, sprint, assignee, and project changes
- Use useRef instead of useState for undo state to avoid stale closure in toast onClick
- Add Cmd+Z keyboard shortcut for undo (only when no input focused)
- Auto-clear undo state after 30 seconds
- Add initialSelectedIds prop to useSelection for selection persistence

Stories: bulk-action-undo, selection-state-persistence
PRD: reusable-issues-list

Co-Authored-By: Claude <noreply@anthropic.com>
- Add useSelection.test.ts with 11 tests covering:
  - Basic selection (toggle, select all, clear)
  - initialSelectedIds restoration
  - Range selection
  - Focus management (moveFocus, home/end)
  - Extend selection with shift+arrow

- Add SelectionPersistenceContext.test.tsx with 9 tests covering:
  - Provider requirement enforcement
  - Selection state storage and retrieval
  - Separate selections for different keys
  - Clear individual and all selections
  - Persistence across re-renders

PRD: reusable-issues-list
Phase 3: post-completion tests

Co-Authored-By: Claude <noreply@anthropic.com>
feat: add configuration-based tab registry for document types

Adds a declarative tab configuration system for document types:
- ProgramIssuesTab, ProgramOverviewTab, ProgramProjectsTab, ProgramSprintsTab
- ProjectDetailsTab, ProjectIssuesTab, ProjectRetroTab, ProjectSprintsTab
- Tab registry in lib/document-tabs.tsx
- Updated UnifiedDocumentPage to use tab registry

Co-Authored-By: Claude <noreply@anthropic.com>
Story: wildcard-route
PRD: Document Tab Deep Linking

Add optional :tab? parameter to documents route, enabling URLs like
/documents/:id/issues. This is the foundation for deep linking tabs.

Co-Authored-By: Claude <noreply@anthropic.com>
- Extract tab parameter from URL in UnifiedDocumentPage
- Derive activeTab from URL using useMemo (replaces useState)
- Navigate to URL on tab change (first tab gets clean URL)
- Enable shareable links and browser history for document tabs

Stories: url-driven-tab-state, navigate-on-tab-change
PRD: Document Tab Deep Linking

Co-Authored-By: Claude <noreply@anthropic.com>
- Add useEffect to detect invalid tab in URL
- Redirect to base URL with replace: true (prevents back-button loops)
- Log warning for debugging without user-facing error
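
The URL-driven tab state and invalid-tab fallback can be sketched as a pure function. This is an assumption-laden illustration: `TAB_REGISTRY`, `resolveTab`, and `tabUrl` are hypothetical names, and the tab ids are simply the lowercased tab names from the earlier commit messages.

```typescript
interface TabConfig {
  id: string;
  label: string;
}

// Hypothetical registry; ids are lowercased from the tab names listed above.
const TAB_REGISTRY: Record<string, TabConfig[]> = {
  project: [
    { id: "details", label: "Details" },
    { id: "issues", label: "Issues" },
    { id: "sprints", label: "Sprints" },
    { id: "retro", label: "Retro" },
  ],
  program: [
    { id: "overview", label: "Overview" },
    { id: "issues", label: "Issues" },
    { id: "projects", label: "Projects" },
    { id: "sprints", label: "Sprints" },
  ],
};

interface TabResolution {
  activeTab: string | null;
  redirectTo: string | null; // when set, navigate here with replace: true
}

function resolveTab(docType: string, docId: string, urlTab?: string): TabResolution {
  const base = `/documents/${docId}`;
  const tabs = TAB_REGISTRY[docType] ?? [];
  const first = tabs[0];
  if (first === undefined) {
    // Document types without tabs: any tab segment is invalid.
    return { activeTab: null, redirectTo: urlTab ? base : null };
  }
  if (!urlTab) return { activeTab: first.id, redirectTo: null };
  const valid = tabs.some((t) => t.id === urlTab);
  return valid
    ? { activeTab: urlTab, redirectTo: null }
    : { activeTab: first.id, redirectTo: base };
}

// The first tab gets the clean URL; other tabs get a path segment.
function tabUrl(docType: string, docId: string, tabId: string): string {
  const base = `/documents/${docId}`;
  const first = (TAB_REGISTRY[docType] ?? [])[0];
  return first !== undefined && first.id === tabId ? base : `${base}/${tabId}`;
}
```

Keeping the resolution pure makes the redirect decision easy to unit test; the component only has to act on `redirectTo`.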

Story: invalid-tab-fallback
PRD: Document Tab Deep Linking

Co-Authored-By: Claude <noreply@anthropic.com>
- Security review: No vulnerabilities found
- Added 22 unit tests for document-tabs module
- Tests cover tab configuration, validation, and label resolution
- All post-completion criteria met

PRD: Document Tab Deep Linking

Co-Authored-By: Claude <noreply@anthropic.com>
Change from query-param-based tabs (?tab=issues) to route-segment-based
tabs (/sprints/:id/plan/issues). This provides cleaner URLs for bookmarking
and sharing.

- Add optional :tab? route parameter to sprint planning route
- Use useParams instead of searchParams for tab state
- Update setActiveTab to use navigate() for route changes

Story: sprint-planning-tab-routes
PRD: project-centric-sprint-planning

Co-Authored-By: Claude <noreply@anthropic.com>
cxk280 and others added 30 commits May 24, 2026 09:40
…ducible audits

Final adversarial re-grade (shipshape/CLAUDE_FINAL_AUDIT.md) found and fixed:

- Accessibility: restore ARIA tree semantics in App.tsx (their removal was
  breaking the unmodified accessibility-remediation.spec.ts 2.13). Fix serious
  color-contrast on Projects ICE badge, My Week day labels, and FilterTabs count
  badge. All 6 axe target pages now report 0 critical/serious (scripts/shipshape-axe-scan.mjs).
- Security/CSP: allow the app's own Google Fonts CDN in style-src/font-src
  (the stylesheet was CSP-blocked on the Railway deployment); script-src stays nonce-only.
- Dependencies: close all high/critical advisories via pinned pnpm.overrides
  (fast-xml-parser, hono, @hono/node-server, express-rate-limit, fast-uri,
  path-to-regexp). pnpm audit --prod now reports 0 high/critical; probe findings 12 -> 2.
- Performance: rewrite /api/projects correlated per-row subqueries into
  pre-aggregated CTEs (byte-identical output, +22% throughput).
- Tooling: add scripts/shipshape-type-violations.ts (reproduces the 25% type-safety
  gate: 950 core / 25.84% / PASS) and scripts/shipshape-axe-scan.mjs.
- Docs/deck updated to match the verified final state.

All 622 unit tests pass; build clean; verified in-browser locally and on Railway.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Docker start command runs `migrate.js && index.js`, and migrate.ts called
loadProductionSecrets() unconditionally, so NODE_ENV=production crash-looped on
the AWS SSM credentials lookup before the server started — LOAD_SSM=false only
gated index.ts. Move the LOAD_SSM/RAILWAY_ENVIRONMENT bypass into
loadProductionSecrets so every startup entrypoint (index, migrate, seed) skips
SSM and uses the platform-injected env vars.
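
A rough sketch of the gate after the move, under stated assumptions: the exact conditions (`LOAD_SSM` equal to `"false"`, or any non-empty `RAILWAY_ENVIRONMENT`) are inferred from the message above, and `fetchFromSsm` is a hypothetical stand-in for the AWS SSM lookup.

```typescript
type Env = Record<string, string | undefined>;

// Assumed conditions: SSM is only consulted in production, and is skipped
// when LOAD_SSM is "false" or the app runs on Railway.
function shouldLoadSsm(env: Env): boolean {
  if (env["NODE_ENV"] !== "production") return false;
  if (env["LOAD_SSM"] === "false") return false;
  if (env["RAILWAY_ENVIRONMENT"]) return false;
  return true;
}

// Every entrypoint (index, migrate, seed) calls this one function, so the
// bypass lives in a single place. `fetchFromSsm` is a hypothetical stand-in.
async function loadProductionSecrets(
  env: Env,
  fetchFromSsm: () => Promise<Env>,
): Promise<Env> {
  if (!shouldLoadSsm(env)) return env; // use the platform-injected env vars
  const secrets = await fetchFromSsm();
  return { ...env, ...secrets };
}
```

Putting the check inside the loader, instead of at each call site, is what prevents a new entrypoint from reintroducing the crash loop.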

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
It was the only ShipShape-authored markdown still in the repo root; relocate it
beside the rest of the audit deliverables and fix its self-reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pshape/)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… docs

- Accessibility: production axe scan (real seed data) surfaced serious
  color-contrast that the local scan missed. My Week: today label text-accent ->
  text-foreground, and future rows use a dashed border instead of opacity-40
  (opacity dimming dropped text below WCAG AA). Projects: empty-state CTA
  text-accent -> text-foreground underline (no accent-blue shade meets AA for
  small text on the dark background).
- Security: SECURITY_PROBE.md now has an explicit Remediation Summary — what was
  fixed (CSP, WebSocket, fonts, all high/critical deps) and what was not
  (2 medium stored-XSS mitigated at output; 6 moderate + 1 low deps deferred).
- Removed docs that fulfill no assignment requirement / are internal scratch:
  GRADING_REVIEW, SOCIAL_POST_DRAFT, SHIPSHAPE_KICKOFF_NOTES,
  AGENT_PERSONAS_UAT_FIXES, RAILWAY_DEPLOYMENT (deploy URL kept in the demo
  outline). Fixed resulting dangling references.
- CLAUDE_FINAL_AUDIT deployment section rewritten to reflect the real SSM-gate
  fix and production-mode switch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DEMO_VIDEO_OUTLINE.md relocated to ~/code/gauntlet/SHIPSHAPE_DEMO_VIDEO_OUTLINE.md
(outside the repo, per request). The deployed app URL + demo login are now
recorded in CLAUDE_FINAL_AUDIT.md so the repo still documents the live app.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLI-only probe today; designed a dark-theme dashboard mock (run controls,
summary stats, per-attack-surface checks, severity-coded findings, remediation
summary). Figma file + rendered PNG export referenced from SECURITY_PROBE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…idence)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SHIPSHAPE_AUDIT_REPORT.md -> AUDIT_REPORT_MVP.md
SHIPSHAPE_ORIENTATION.md -> ORIENTATION.md
CLAUDE_AUDIT_OF_CODEX_AUDITS.md -> CLAUDE_AUDIT_OF_CODEX_AUDITS_MVP.md
CODEX_AUDIT_OF_CODEX_AUDIT.md -> CODEX_AUDIT_OF_CODEX_AUDIT_MVP.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…leanup)

- Backend: port the CLI probe into an in-process TS service
  (api/src/services/securityProbe.ts) so it runs from the deployed container
  (the runtime image excludes scripts/). Adds POST /api/security-probe/run,
  super-admin gated, targeting the app's OWN origin (RAILWAY_PUBLIC_DOMAIN or
  request host — never user-supplied, no SSRF), with a single-run lock.
- Auto-cleanup: every document the input-sanitization checks create is deleted
  before the report returns (verified locally: 2 created → 2 deleted, 0 leaked).
- Frontend: web/src/pages/SecurityProbe.tsx at /security-probe with its own
  login layer (same admin credentials, super-admin required) + a dashboard
  matching the Figma mock (summary cards, checks-by-surface, findings).
- Move the demo deck out of the repo (per request); update SECURITY_PROBE.md.

Verified locally end-to-end: login → Run Probe → 14/16 checks, 2 findings,
0 critical/high, test docs auto-cleaned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Browser-verified end-to-end on https://shipshape-app-production-7ed8.up.railway.app:
login layer → Run Probe → 13/16 checks, 2 findings, 0 critical/high, 2 test docs
auto-cleaned. (Member privilege-escalation check skipped on prod — member login
rate-limited/seed-dependent — probe degrades gracefully rather than failing.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…isioning

Category 8, toward an all-checks-pass probe:
- Sanitize document titles and TipTap plain-text nodes server-side
  (api/src/utils/sanitizeContent.ts) on create + content PATCH — strips HTML
  tags from titles/paragraph text while preserving code blocks. This actually
  remediates the two stored-XSS findings (payloads are neutralized at input, on
  top of the existing React/TipTap output encoding), so the probe's two
  input-sanitization checks now pass. All 465 API tests still pass (no-op on
  normal text).
- Probe self-provisions a least-privilege member via the super-admin
  invite+accept flow when direct member login fails (e.g. a setup-only
  deployment that was never seeded), so the privilege-escalation check runs on
  any instance instead of skipping. Member credentials are env-configurable
  (SHIP_SECURITY_MEMBER_EMAIL/PASSWORD).
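
The input sanitization in the first bullet could be sketched as follows. This is not the contents of `sanitizeContent.ts`, which the PR does not show: the node type, function names, and the regex-based tag stripping are assumptions, and a simple regex is a deliberately naive stand-in for whatever the real sanitizer does. `codeBlock` is TipTap's node name for code blocks.

```typescript
// Hypothetical simplified TipTap JSON node.
interface TipTapNode {
  type: string;
  text?: string;
  content?: TipTapNode[];
}

// Naive tag stripping for illustration; ordinary text passes through unchanged.
function stripTags(value: string): string {
  return value.replace(/<[^>]*>/g, "");
}

function sanitizeTitle(title: string): string {
  return stripTags(title).trim();
}

// Strip tags from plain-text nodes, but leave anything inside a code block alone.
function sanitizeContent(node: TipTapNode, insideCode = false): TipTapNode {
  const inCode = insideCode || node.type === "codeBlock";
  const out: TipTapNode = { ...node };
  if (typeof node.text === "string" && !inCode) {
    out.text = stripTags(node.text);
  }
  if (node.content !== undefined) {
    out.content = node.content.map((child) => sanitizeContent(child, inCode));
  }
  return out;
}
```

The function returns a new tree and leaves its input untouched, which keeps it a no-op on normal text as the commit message notes.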

Verified locally: probe now reports 16/16 checks, 0 findings, test docs auto-cleaned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e green

- ensureMember read the raw response shape, but admin/auth routes wrap responses
  in { success, data }. Read both shapes so the probe finds the workspace and
  self-provisions the member via invite+accept on a setup-only deployment.
- Refactored the in-process probe to a generic request<T>() and a typed audit
  shape, and trimmed the sanitizer casts, so the new code stops blowing the
  Category-1 type-safety gate (back to 955 core / 25.45% reduction, PASS).
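
Reading both response shapes can be done with one small helper. The sketch below is an assumption about how that might look; `unwrap` and the `Workspace` type are hypothetical names, not the probe's actual code.

```typescript
// Accepts either a raw payload or a { success, data } envelope and returns the payload.
function unwrap<T>(body: unknown): T {
  if (
    typeof body === "object" &&
    body !== null &&
    "success" in body &&
    "data" in body
  ) {
    return (body as { data: T }).data;
  }
  return body as T;
}

// Hypothetical payload type for illustration.
interface Workspace {
  id: string;
}
```

For example, `unwrap<Workspace>({ success: true, data: { id: "w1" } })` and `unwrap<Workspace>({ id: "w1" })` both yield an object whose `id` is `"w1"`.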

Local probe: 16/16 checks, 0 findings, test docs auto-cleaned. 465 API tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… web UI

- CLAUDE_FINAL_AUDIT + FIXES_IMPLEMENTATION + SECURITY_PROBE: reflect the
  stored-XSS input-sanitization remediation, the deployed Security Probe web UI
  (auto-cleanup + member self-provisioning), and the probe reaching 16/16 / 0
  findings (CLI + deployed). Regenerated CLI probe evidence (16/16).
- AUDIT_REPORT_MVP: keep it a clean MVP-checkpoint snapshot (post-MVP work moved
  to the final docs, per the _MVP-naming convention).
- AI_COST_ANALYSIS: addendum reflecting the Claude (Opus 4.7) final pass.
- Updated the live 16/16 run screenshot evidence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n root)

The app shell locks html/body/#root to height:100% overflow:hidden for the
fixed 4-panel layout. The standalone /security-probe page used min-h-screen, so
a tall probe report overflowed the clipped root and could not scroll. Give the
page its own scroll container (h-screen overflow-y-auto) on the dashboard, login,
and loading views.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l bug

A full-page screenshot masked an interactive scroll/clip bug that only surfaced
when a real user tried to scroll — interactive checks catch what tall
screenshots hide.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FleetGraph is an agent embedded in Ship. One LangGraph.js graph runs two
modes — proactive (node-cron sweep + a debounced issue-mutation hook) and
on-demand (context-aware chat scoped to the current view). It detects stale,
overdue, unassigned, and unestimated work, reasons with tiered Claude, surfaces
findings, and gates any state mutation behind human approval.

- api/src/fleetgraph/: graph, deterministic detectors, parallel fetch, tiered
  LLM (Bedrock by default; Anthropic API when ANTHROPIC_API_KEY is set),
  Postgres checkpointer, findings/dedup store, hybrid triggers.
- Routes: on-demand chat, inbox, and HITL approval resume. Migration 039 adds
  fleetgraph_findings + fleetgraph_pending_approvals.
- Web: bottom-right dock (notification bell/inbox, approval cards, embedded
  chat); new fleetgraph:* /events WebSocket event types.
- LangSmith tracing throughout (different graph paths per condition).
- migrate.ts: tolerate "already exists" so re-runs on an existing DB don't
  abort before later migrations.
- ssm.ts: optionally load LangSmith config from SSM in production.
- Docs: FLEETGRAPH.md, PRESEARCH.md, architecture deck under docs/fleetgraph/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move `node dist/db/migrate.js` out of the container CMD into Railway's
deploy.preDeployCommand (railway.json); the app now starts with
`node dist/index.js` only. A failed migration now fails the deploy while the
previous healthy version keeps serving, instead of crash-looping the container
(which caused a brief prod outage). Non-Railway targets (AWS EB) must run
migrate as their own pre-deploy/release step — noted in the Dockerfile.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ALTER TYPE ... RENAME VALUE has no IF EXISTS, so on a DB bootstrapped from the
current schema.sql (enum already at the week labels) migration 033 crashed
("sprint_plan is not an existing enum label"), breaking brand-new-env bootstrap.
Guard each rename to run only when the old label exists and the new one does
not — a clean no-op on a current schema, still correct on a genuinely old DB.
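
One way to express such a guard is a PL/pgSQL `DO` block that consults `pg_enum` before renaming. The helper below builds that SQL as a string; it is a sketch, not migration 033 itself. The function name is hypothetical, and the enum type name and new label used in the usage note are placeholders, since the source only names the old label `sprint_plan`.

```typescript
const IDENT = /^[a-z_][a-z0-9_]*$/;

// Build a rename that only runs when the old label exists and the new one does not.
function guardedEnumRename(enumType: string, oldLabel: string, newLabel: string): string {
  for (const name of [enumType, oldLabel, newLabel]) {
    // Values are interpolated into SQL, so accept plain identifiers only.
    if (!IDENT.test(name)) throw new Error(`unsafe identifier: ${name}`);
  }
  const hasLabel = (label: string): string =>
    "EXISTS (SELECT 1 FROM pg_enum e JOIN pg_type t ON t.oid = e.enumtypid " +
    `WHERE t.typname = '${enumType}' AND e.enumlabel = '${label}')`;
  return [
    "DO $$",
    "BEGIN",
    `  IF ${hasLabel(oldLabel)} AND NOT ${hasLabel(newLabel)} THEN`,
    `    ALTER TYPE ${enumType} RENAME VALUE '${oldLabel}' TO '${newLabel}';`,
    "  END IF;",
    "END $$;",
  ].join("\n");
}
```

For instance, `guardedEnumRename("document_type", "sprint_plan", "weekly_plan")` (placeholder type and new label) produces a block that is a no-op on a database already at the new label.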

Verified: a fresh database now applies all 41 migrations and seeds successfully.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cross-entity proactive detection, run only on workspace-wide (cron/digest) sweeps:
- detectSprintSlip: active sprints where elapsed-fraction outpaces done-fraction
  (worse when confidence is high) -> "at risk of slipping" finding to the sprint owner.
- detectCapacity: per-person sum(estimate) of open assigned work vs capacity_hours ->
  overload finding + HITL reassign of the lowest-priority item to the teammate with the
  most slack; notifies the person + their reports_to.
- New parallel fetchMeta node (workspace sprint window + per-sprint progress counts).
- executeApproved now handles "reassign" (set assignee_id) and notifies the new assignee.
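
The two detectors can be sketched as pure functions over synthetic inputs. Everything here is an assumption for illustration: the field names, the `tolerance` of 0.15, and the convention that a lower `priorityRank` means lower priority are not taken from the source, and the confidence weighting mentioned above is left out.

```typescript
interface SprintProgress {
  startMs: number;
  endMs: number;
  totalIssues: number;
  doneIssues: number;
}

// At risk when the elapsed fraction outpaces the done fraction by more than `tolerance`.
function detectSprintSlip(p: SprintProgress, nowMs: number, tolerance = 0.15): boolean {
  const span = p.endMs - p.startMs;
  if (span <= 0) return false;
  const elapsed = Math.min(1, Math.max(0, (nowMs - p.startMs) / span));
  const done = p.totalIssues === 0 ? 1 : p.doneIssues / p.totalIssues;
  return elapsed - done > tolerance;
}

interface Person {
  id: string;
  capacityHours: number;
}

interface OpenIssue {
  id: string;
  assigneeId: string;
  estimateHours: number;
  priorityRank: number; // assumption: lower rank = lower priority
}

interface OverloadFinding {
  personId: string;
  overloadHours: number;
  moveIssueId: string | null; // lowest-priority item to propose for reassignment
  toPersonId: string | null; // teammate with the most slack, if any
}

function detectCapacity(people: Person[], issues: OpenIssue[]): OverloadFinding[] {
  const load = new Map<string, number>();
  for (const issue of issues) {
    load.set(issue.assigneeId, (load.get(issue.assigneeId) ?? 0) + issue.estimateHours);
  }
  const findings: OverloadFinding[] = [];
  for (const person of people) {
    const used = load.get(person.id) ?? 0;
    if (used <= person.capacityHours) continue;
    const mine = issues
      .filter((i) => i.assigneeId === person.id)
      .sort((a, b) => a.priorityRank - b.priorityRank);
    let target: Person | undefined;
    let bestSlack = 0;
    for (const other of people) {
      if (other.id === person.id) continue;
      const slack = other.capacityHours - (load.get(other.id) ?? 0);
      if (slack > bestSlack) {
        target = other;
        bestSlack = slack;
      }
    }
    findings.push({
      personId: person.id,
      overloadHours: used - person.capacityHours,
      moveIssueId: mine[0]?.id ?? null,
      toPersonId: target?.id ?? null,
    });
  }
  return findings;
}
```

In this sketch the reassignment is only proposed in the finding; applying it would still go through the human approval step described above.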

Detectors verified with synthetic inputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The embedded chat now returns {reply, action?}: when a message clearly requests a
change (reassign / set state / priority / estimate / due date) the model emits a
concrete action with validated ids + payload (schema enforced per kind), routed
through the SAME human gate as proactive findings. The gate is now action-based
(findings optional), so chat actions and detector actions share one approval path.
runOndemandChat returns a pendingApproval; the dock renders an inline Apply/Cancel
in the chat. executeApproved handles set_state/priority/estimate/due_date/reassign.

Verified e2e against real data: "reassign #9200 to Alice" -> proposed -> approved -> reassigned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
api/src/fleetgraph/evals/: labeled scenarios (cases.ts) feed synthetic Ship state to
the deterministic detectors; run-evals.ts scores precision / recall / F1 and
quiet-accuracy (cases that must stay silent), prints a per-case report, and exits
non-zero on any miss (CI-usable). Optional LangSmith dataset upload when
LANGCHAIN_API_KEY is set. New `pnpm fleetgraph:eval`.
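
The scoring described above can be sketched in a few lines. The case shape and function name are hypothetical (they are not taken from `cases.ts` or `run-evals.ts`), signal types within a case are assumed to be unique, and a "quiet" case is taken to mean one whose expected signal list is empty.

```typescript
interface EvalCase {
  name: string;
  expected: string[]; // signal types the detectors should raise
  actual: string[]; // signal types the detectors did raise
}

interface EvalScore {
  precision: number;
  recall: number;
  f1: number;
  quietCorrect: number;
  quietTotal: number;
  misses: string[]; // names of cases that were not an exact match
}

function scoreCases(cases: EvalCase[]): EvalScore {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  let quietCorrect = 0;
  let quietTotal = 0;
  const misses: string[] = [];
  for (const c of cases) {
    const hits = c.actual.filter((s) => c.expected.indexOf(s) !== -1).length;
    const caseFp = c.actual.length - hits;
    const caseFn = c.expected.length - hits;
    tp += hits;
    fp += caseFp;
    fn += caseFn;
    if (c.expected.length === 0) {
      quietTotal += 1;
      if (c.actual.length === 0) quietCorrect += 1;
    }
    if (caseFp > 0 || caseFn > 0) misses.push(c.name);
  }
  const precision = tp + fp === 0 ? 1 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 1 : tp / (tp + fn);
  const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1, quietCorrect, quietTotal, misses };
}
```

A CI wrapper would then exit non-zero whenever `misses` is non-empty, which is the behavior the commit describes.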

Current: 10/10 cases exact, P/R/F1 = 1.00, quiet-accuracy 5/5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New triggerKind 'digest': fetch -> digestNode synthesizes a per-project narrative
(what's moving, what's at risk, the single most important next action) via batched
Tier-2 calls, emits one autonomous finding per project (deduped one-per-project-per-day),
routed through surfaceAuto to notify owner/accountable/admins. Daily node-cron
(FLEETGRAPH_DIGEST_CRON, advisory-locked) + runDigest. Falls back to a deterministic
summary when no model is available; chunked so responses don't truncate.

Verified against real data: 15 projects synthesized with concrete narratives; dedup holds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
prepare loads per-type dismissal counts; dedup drops novel signals whose type has
been dismissed >= FLEETGRAPH_DISMISS_SUPPRESS_THRESHOLD (default 3) times, so the
agent learns to stop surfacing what a team keeps rejecting.
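
A minimal sketch of the suppression rule, assuming a simple signal shape. `filterNovel` is named in a later commit in this PR, but the signature and fields used here are guesses for illustration only.

```typescript
interface Signal {
  type: string; // e.g. "stale" or "overdue"
  key: string; // dedup key for the specific finding
}

// Drop signals that were already surfaced, and signals whose type has been
// dismissed at least `threshold` times.
function filterNovel(
  signals: Signal[],
  seenKeys: string[],
  dismissals: Record<string, number>,
  threshold = 3,
): Signal[] {
  return signals.filter((signal) => {
    if (seenKeys.indexOf(signal.key) !== -1) return false;
    return (dismissals[signal.type] ?? 0) < threshold;
  });
}
```

With one new `stale` signal, the result has one entry while `stale` has two dismissals and none once it reaches three.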

Verified at the dedup node: novelSignals 1 -> 0 as dismissals cross the threshold.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Use Cases: add sprint-slip, capacity overload (HITL reassign), on-demand chat
actions, daily project digest, and adaptive suppression. Graph diagram: add the
fetchMeta node, the digest path, the on-demand action branch, and dedup-level
suppression. Note the `pnpm fleetgraph:eval` harness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Relocate the prior-week shipshape audit/analysis markdown into docs/gauntlet/shipshape
to keep assignment artifacts grouped under docs/.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four eval layers covering the agent end-to-end:
- L1 deterministic (api/src/fleetgraph/evals): 17 detector cases (all signal types,
  severities, recipient resolution) + 10 dedup/suppression cases via a pure exported
  filterNovel() — precision/recall/quiet-accuracy, exits non-zero on any miss.
- L2 integration (__tests__/graph.integration.test.ts, vitest + real DB, hermetic via
  FLEETGRAPH_DISABLE_LLM): quiet / autonomous / HITL approve+dismiss+snooze / digest paths,
  plus the autonomy-boundary guardrail (no mutation before approval; only allowed fields written).
- L3 LLM-graded (evals/llm, LangSmith evaluate): triage keep-critical, reasoning faithfulness
  + chat groundedness (Claude-as-judge), chat action-extraction exact-match. Local authoritative
  scoring + best-effort LangSmith experiment logging; threshold-gated.
- L4 adversarial: prompt-injection cases assert no unsafe/out-of-scope actions.

Refactors (behavior-preserving): dedup.ts (pure filterNovel), llm.ts FLEETGRAPH_DISABLE_LLM,
graph.ts eval wrappers. Scripts: fleetgraph:eval:llm, fleetgraph:eval:all.

.circleci/config.yml: install/type-check/build, eval_deterministic, test_integration (postgres
service), eval_llm (context secrets), and deploy → `railway up --service shipshape-app` on master
(ships the app + the in-process agent together). Secrets via the `fleetgraph` context.

Verified locally: L1 27/27, L2 6/6 on a fresh DB, L3+L4 all pass (triage 1.00, faithfulness 0.97,
groundedness 1.00, action-extraction 1.00, adversarial 1.00).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mg/node)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cimg/node)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

9 participants