ShipShape Phase 2 implementation #7
Open
cxk280 wants to merge 299 commits into
The converter was treating inline marks (bold, italic, link, etc.) as
block elements. In TipTap/ProseMirror, these should be marks on text
nodes, not wrapper elements.
Before: { type: "bold", content: [{ type: "text", text: "Note:" }] }
After: { type: "text", text: "Note:", marks: [{ type: "bold" }] }
Co-Authored-By: Claude <noreply@anthropic.com>
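The Before/After shapes in this commit can be illustrated with a small converter. This is a minimal sketch, not the converter from the PR: the `DocNode` type, the `MARK_TYPES` list, and the `liftMarks` name are assumptions made for illustration.

```typescript
// Hypothetical node shapes, loosely modeled on TipTap/ProseMirror JSON.
type Mark = { type: string };
type DocNode = {
  type: string;
  text?: string;
  marks?: Mark[];
  content?: DocNode[];
};

// Assumed list of inline mark names; the commit only says "bold, italic, link, etc."
const MARK_TYPES = ["bold", "italic", "link"];

// Replace mark wrappers with marks on the text nodes they contain.
function liftMarks(node: DocNode, inherited: Mark[] = []): DocNode[] {
  if (MARK_TYPES.indexOf(node.type) !== -1) {
    // A wrapper such as { type: "bold", content: [...] } disappears;
    // its children inherit the mark instead.
    const marks = inherited.concat([{ type: node.type }]);
    let lifted: DocNode[] = [];
    for (const child of node.content ?? []) {
      lifted = lifted.concat(liftMarks(child, marks));
    }
    return lifted;
  }
  if (node.type === "text") {
    const marks = (node.marks ?? []).concat(inherited);
    return [marks.length > 0 ? { ...node, marks } : node];
  }
  // Any other node is kept, with its children converted.
  if (!node.content) return [node];
  let children: DocNode[] = [];
  for (const child of node.content) {
    children = children.concat(liftMarks(child, inherited));
  }
  return [{ ...node, content: children }];
}

const before: DocNode = { type: "bold", content: [{ type: "text", text: "Note:" }] };
const after = liftMarks(before);
console.log(JSON.stringify(after));
```

For the Before input from the commit message, the result is a single text node carrying a `bold` mark, which matches the After shape.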
Previously, dragging a Word doc or PDF onto the editor would trigger a browser download. Now non-image files are handled by the FileAttachment extension and embedded as downloadable cards.
- Add ProseMirror plugin with handleDrop and handlePaste
- Filter out images (already handled by ImageUploadExtension)
- Refactor upload logic into shared handleFileUpload function
Co-Authored-By: Claude <noreply@anthropic.com>
Add bottom padding to scrollable areas throughout the app so users see empty space when reaching the end, signaling there's no more content. Uses pb-20 (~80px) for lists/sidebars and pb-32 (~128px) for editor content areas. Co-Authored-By: Claude <noreply@anthropic.com>
Make sticky sprint headers fully opaque so content doesn't show through when scrolling. Use ring highlight for current sprint instead of semi-transparent background. Co-authored-by: Claude <noreply@anthropic.com>
Story: api-route-tests
PRD: audit-remediation
- auth.test.ts: 16 tests (login, logout, session, CSRF, security)
- sprints.test.ts: 19 tests (CRUD, lifecycle, issues, hypothesis)
- issues.test.ts: 21 tests (CRUD, state, filtering, bulk operations)
The issues tests were updated to match the new belongs_to association
model (array of {id, type} instead of direct project_id field).
Test results: 360/361 passing
Co-Authored-By: Claude <noreply@anthropic.com>
- Add cancelled flag to prevent state updates after cleanup
- Store updateUsersCallback reference for proper listener removal
- Call awareness.setLocalState(null) before destroy to notify peers
- Also fix type errors in auth.test.ts (getCookiesArray helper)
Story: yjs-cleanup-race
PRD: Audit Remediation - Critical Blockers
Co-Authored-By: Claude <noreply@anthropic.com>
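The cleanup ordering described in this commit can be sketched as follows. The `AwarenessLike` interface, `connectPresence`, and `FakeAwareness` are hypothetical stand-ins written for this example; only `cancelled`, `updateUsersCallback`, and `setLocalState(null)` before `destroy` come from the commit message.

```typescript
// Minimal stand-in for the parts of an awareness object used here.
// The method names mirror the commit message; this is not the y-protocols type.
interface AwarenessLike {
  on(event: "change", cb: () => void): void;
  off(event: "change", cb: () => void): void;
  setLocalState(state: unknown): void;
  destroy(): void;
}

// Subscribes to awareness changes and returns a cleanup function.
function connectPresence(awareness: AwarenessLike, onUsersChanged: () => void): () => void {
  let cancelled = false;

  // Keep one stable reference so off() removes the same function on() added.
  const updateUsersCallback = (): void => {
    if (cancelled) return; // ignore events that arrive after cleanup
    onUsersChanged();
  };
  awareness.on("change", updateUsersCallback);

  return (): void => {
    cancelled = true;
    awareness.off("change", updateUsersCallback);
    awareness.setLocalState(null); // tell peers this client has left
    awareness.destroy();
  };
}

// In-memory fake used only to exercise the sketch.
class FakeAwareness implements AwarenessLike {
  calls: string[] = [];
  private listeners: Array<() => void> = [];
  on(_event: "change", cb: () => void): void {
    this.listeners.push(cb);
  }
  off(_event: "change", cb: () => void): void {
    this.listeners = this.listeners.filter((l) => l !== cb);
  }
  setLocalState(state: unknown): void {
    this.calls.push(state === null ? "setLocalState(null)" : "setLocalState(value)");
  }
  destroy(): void {
    this.calls.push("destroy");
  }
  emit(): void {
    for (const l of this.listeners.slice()) l();
  }
}

const awareness = new FakeAwareness();
let updates = 0;
const cleanup = connectPresence(awareness, () => {
  updates += 1;
});
awareness.emit(); // handled: one update
cleanup();
awareness.emit(); // listener removed, nothing happens
```

Storing the callback in a variable matters because `off` can only remove a listener when it receives the same function reference that was passed to `on`.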
- Added signal parameter to uploadFile() with abort checks
- Added abortController option to ImageUploadOptions interface
- Editor.tsx creates AbortController and aborts on cleanup
- Prevents uploads from completing into wrong document after navigation
Story: imageupload-cancel
PRD: audit-remediation
Co-Authored-By: Claude <noreply@anthropic.com>
Story: fileattachment-cancel
PRD: audit-remediation
- Add abortSignal to CreateSlashCommandsOptions interface
- Pass signal to triggerFileUpload from slash commands
- Check abort status at key points in triggerFileUpload
- Handle AbortError gracefully (log but don't alert)
- Add documentId to useMemo deps for fresh signal on navigation
Prevents file uploads from completing into wrong document after navigation.
Co-Authored-By: Claude <noreply@anthropic.com>
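The abort check at the final step of an upload could look roughly like this. It is a simplified synchronous sketch: `EditorDoc`, `throwIfAborted`, and `completeUpload` are hypothetical names, and the real upload functions in the PR are presumably asynchronous.

```typescript
// Hypothetical document model; the real editor state is more involved.
interface EditorDoc {
  id: string;
  attachments: string[];
}

function throwIfAborted(signal?: AbortSignal): void {
  if (signal && signal.aborted) {
    const err = new Error("Upload aborted");
    err.name = "AbortError";
    throw err;
  }
}

// Last step of an upload: attach the uploaded URL only if the editor
// that started the upload is still mounted (signal not aborted).
function completeUpload(doc: EditorDoc, url: string, signal?: AbortSignal): boolean {
  try {
    throwIfAborted(signal);
    doc.attachments.push(url);
    return true;
  } catch (err) {
    if (err instanceof Error && err.name === "AbortError") {
      console.log("upload cancelled for document " + doc.id); // log, don't alert
      return false;
    }
    throw err;
  }
}

const controller = new AbortController();
const doc: EditorDoc = { id: "doc-1", attachments: [] };

const first = completeUpload(doc, "/files/spec.pdf", controller.signal); // inserted
controller.abort(); // what editor cleanup would do on navigation
const second = completeUpload(doc, "/files/late.pdf", controller.signal); // skipped
```

Creating one `AbortController` per mounted document is what makes the "fresh signal on navigation" point work: a signal that has been aborted stays aborted, so a new document needs a new controller.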
Story: consolidate-editors
PRD: audit-remediation
- Update /projects/:id to use DocumentRedirect → /documents/:id
- Update /sprints/:id to use DocumentRedirect → /documents/:id
- Update /programs/:programId/sprints/:id to use DocumentRedirect
- Deprecate ProjectEditorPage and SprintEditorPage imports
- /issues/:id was already using this pattern
All document types now route through the canonical UnifiedDocumentPage, ensuring consistent editor behavior and maintainability.
Co-Authored-By: Claude <noreply@anthropic.com>
- Create unified PropertiesPanel replacing type-specific sidebars
- Add CardGrid usage in ProgramEditor for projects view
- Create useUnifiedDocuments hook consolidating type-specific contexts
- Extract shared document-crud utilities from route files
- Mark legacy contexts as @deprecated with migration guides
Iterations:
- consolidate-sidebars: PropertiesPanel.tsx (240 lines)
- use-cardgrid: ProgramEditor uses CardGrid for projects
- consolidate-contexts: useUnifiedDocuments.ts (161 lines)
- api-route-utilities: document-crud.ts (312 lines)
PRD: audit-remediation
All 16/16 stories complete
Co-Authored-By: Claude <noreply@anthropic.com>
- Add waf.tf with full WebACL config (rate limiting, AWS managed rules, bot control)
- Add cloudfront-logging.tf with Kinesis stream and real-time log config
- 180-day retention on Kinesis, 100% sampling, all fields captured
- Use variable to optionally provide external WAF ARN
- Add regex pattern set for static file exemptions (/api/, common static file extensions)
- Add Rule 0 with AntiDDoS managed rule group and scope_down_statement to exclude static files from DDoS challenges
- Renumber existing rules (priorities 1-7)
- Note: Detailed AntiDDoS config (Challenge sensitivity HIGH, SensitivityToBlock LOW) must be configured in AWS Console as Terraform provider 5.100.0 doesn't yet support the aws_managed_rules_anti_ddos_rule_set config block
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Story: deprecate-legacy-issue-fields-frontend
PRD: reusable-issues-list
- Add BelongsTo type and helper functions (getAssociationId, getProgramId, etc.)
- Remove project_id/sprint_id from frontend Issue interface
- Update all frontend code to use belongs_to array with helpers
- Fix backward compatibility in optimistic updates with type assertions
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove project_id/sprint_id from CreateIssueInput and UpdateIssueInput interfaces
- Update useCreateIssue and useUpdateIssue mutations to use belongs_to
- Change ProgramEditor quick-add to use belongs_to: [{ id, type: 'program' }]
- Remove duplicate legacy fields from IssuesList optimistic updates
Story: deprecate-legacy-issue-mutations
PRD: reusable-issues-list
Co-Authored-By: Claude <noreply@anthropic.com>
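A minimal sketch of the belongs_to model and its helpers, for readers unfamiliar with the change. The helper names `getAssociationId`, `getProgramId`, and `getProjectId` appear in the commit messages, but their signatures here are assumed; the `"project"` and `"sprint"` type values, `getSprintId`, and `setAssociation` are hypothetical.

```typescript
// Assumed association kinds; only 'program' is spelled out in the commits.
type AssociationType = "program" | "project" | "sprint";

interface BelongsTo {
  id: string;
  type: AssociationType;
}

interface Issue {
  id: string;
  title: string;
  belongs_to: BelongsTo[]; // replaces the direct project_id / sprint_id fields
}

// Return the id of the first association of the given type, if any.
function getAssociationId(issue: Issue, type: AssociationType): string | undefined {
  for (const assoc of issue.belongs_to) {
    if (assoc.type === type) return assoc.id;
  }
  return undefined;
}

const getProgramId = (issue: Issue): string | undefined => getAssociationId(issue, "program");
const getProjectId = (issue: Issue): string | undefined => getAssociationId(issue, "project");
const getSprintId = (issue: Issue): string | undefined => getAssociationId(issue, "sprint");

// Hypothetical helper: return a copy with one association type replaced,
// or removed when id is null. Other association types are left untouched.
function setAssociation(issue: Issue, type: AssociationType, id: string | null): Issue {
  const others = issue.belongs_to.filter((a) => a.type !== type);
  return { ...issue, belongs_to: id === null ? others : others.concat([{ id, type }]) };
}

const issue: Issue = {
  id: "issue-1",
  title: "Fix login redirect",
  belongs_to: [{ id: "prog-1", type: "program" }],
};
const inSprint = setAssociation(issue, "sprint", "sprint-7");
console.log(getProgramId(inSprint), getSprintId(inSprint), getProjectId(inSprint));
```

Keeping reads behind helpers means call sites do not need to know whether an association is stored as a field or as an array entry.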
- Add inheritedContext prop to IssuesListProps for explicit context setting
- Compute effective context from inheritedContext or locked filter props
- Infer program from project's program_id when project is set
- Use internal useCreateIssue when self-fetching is enabled
- Build belongs_to array from effective context for new issues
- Add getProjectId, getProjectTitle, getSprintTitle helpers
Story: context-inherited-issue-creation
PRD: Reusable Issues List Component
Co-Authored-By: Claude <noreply@anthropic.com>
- Add issue_iterations table (migration 026)
- Add POST /api/issues/:id/iterations endpoint
- Add GET /api/issues/:id/iterations endpoint
Iterations track Claude's work progress (pass/fail/in_progress) directly on issues. Can be aggregated by project/sprint via document_associations for retros and reports.
Co-Authored-By: Claude <noreply@anthropic.com>
Replace custom ProjectIssuesList component (~700 lines) with the enhanced IssuesList component configured with locked filters and inherited context.
Changes:
- ProjectEditor now uses IssuesList with lockedProjectId prop
- Program/project filters hidden via showProgramFilter/showProjectFilter=false
- inheritedContext passes projectId and programId for issue creation
- enableKeyboardNavigation=false to avoid conflicts with project editor nav
- IssuesList enhanced with better filter visibility logic
Story: integrate-in-project-page
PRD: reusable-issues-list
Co-Authored-By: Claude <noreply@anthropic.com>
Remove project_id and sprint_id fields from Document, CreateDocumentInput, and UpdateDocumentInput interfaces. These fields are replaced by the belongs_to array for document associations. Preserves SprintReviewProperties.sprint_id which is still required. Story: cleanup-shared-types PRD: reusable-issues-list Co-Authored-By: Claude <noreply@anthropic.com>
Story: url-sync-embedded
PRD: reusable-issues-list
- Add urlParamPrefix prop for namespaced URL params (e.g., issues_state)
- Integrate useSearchParams for bidirectional URL state sync
- Filter changes update URL, URL params restore filter on load
- Browser back/forward navigation works correctly
- Add urlParamPrefix="issues" to ProjectEditor integration
Co-Authored-By: Claude <noreply@anthropic.com>
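The namespacing idea behind urlParamPrefix can be sketched with plain `URLSearchParams`, independent of the router hook. The `paramName`, `writeFilters`, and `readFilters` functions are hypothetical; only the `issues_state` naming pattern comes from the commit.

```typescript
type Filters = Record<string, string | undefined>;

// "issues" + "state" -> "issues_state"; an empty prefix leaves the key as-is.
function paramName(prefix: string, key: string): string {
  return prefix ? prefix + "_" + key : key;
}

// Filter changes update the URL: return a copy of the current params with the
// prefixed filter keys set, and cleared filters removed.
function writeFilters(current: URLSearchParams, prefix: string, filters: Filters): URLSearchParams {
  const next = new URLSearchParams(current.toString());
  for (const key of Object.keys(filters)) {
    const value = filters[key];
    if (value === undefined || value === "") {
      next.delete(paramName(prefix, key));
    } else {
      next.set(paramName(prefix, key), value);
    }
  }
  return next;
}

// URL params restore filters on load: read back only the keys this list owns.
function readFilters(params: URLSearchParams, prefix: string, keys: string[]): Filters {
  const result: Filters = {};
  for (const key of keys) {
    const value = params.get(paramName(prefix, key));
    if (value !== null) result[key] = value;
  }
  return result;
}

const url = writeFilters(new URLSearchParams("page=2"), "issues", { state: "in_progress" });
console.log(url.toString()); // page=2&issues_state=in_progress
console.log(readFilters(url, "issues", ["state", "assignee"]));
```

The prefix keeps an embedded list from clobbering unrelated query parameters that belong to the host page, such as `page` in the example.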
- Fixed BulkUpdateRequest interface to use project_id/sprint_id instead of belongs_to (matches API expectations at api/src/routes/issues.ts)
- Updated optimistic update logic to convert project_id/sprint_id to belongs_to array changes
- Updated handleBulkMoveToSprint and handleBulkAssignProject to use correct field names
- Toast now shows when issues are moved out of locked filter context
Story: bulk-action-toast
PRD: reusable-issues-list
Co-Authored-By: Claude <noreply@anthropic.com>
Implement Gmail-like selection persistence where checkbox selections are maintained when navigating to an issue detail and returning to the list.
- Add SelectionPersistenceContext for app-level selection storage
- IssuesList retrieves/persists selection via selectionPersistenceKey
- Uses useRef to avoid re-renders on selection changes
- Wrap app with SelectionPersistenceProvider in App.tsx
Story: selection-state-persistence
PRD: reusable-issues-list
Co-Authored-By: Claude <noreply@anthropic.com>
Implements a centralized tab registry system that allows UnifiedDocumentPage to render appropriate tabs based on document type. This fixes the bug where accessing projects and programs via /documents/:id URL did not show tabs.
- Add document-tabs.tsx with tab configurations for project and program types
- Create lazy-loaded tab components for all project tabs (Details, Issues, Sprints, Retro)
- Create lazy-loaded tab components for all program tabs (Overview, Issues, Projects, Sprints)
- Update UnifiedDocumentPage to use the tab registry for consistent behavior
Co-Authored-By: Claude <noreply@anthropic.com>
- Implement undo capability for bulk status, sprint, assignee, and project changes
- Use useRef instead of useState for undo state to avoid stale closure in toast onClick
- Add Cmd+Z keyboard shortcut for undo (only when no input focused)
- Auto-clear undo state after 30 seconds
- Add initialSelectedIds prop to useSelection for selection persistence
Stories: bulk-action-undo, selection-state-persistence
PRD: reusable-issues-list
Co-Authored-By: Claude <noreply@anthropic.com>
- Add useSelection.test.ts with 11 tests covering:
  - Basic selection (toggle, select all, clear)
  - initialSelectedIds restoration
  - Range selection
  - Focus management (moveFocus, home/end)
  - Extend selection with shift+arrow
- Add SelectionPersistenceContext.test.tsx with 9 tests covering:
  - Provider requirement enforcement
  - Selection state storage and retrieval
  - Separate selections for different keys
  - Clear individual and all selections
  - Persistence across re-renders
PRD: reusable-issues-list
Phase 3: post-completion tests
Co-Authored-By: Claude <noreply@anthropic.com>
feat: add configuration-based tab registry for document types
Adds a declarative tab configuration system for document types:
- ProgramIssuesTab, ProgramOverviewTab, ProgramProjectsTab, ProgramSprintsTab
- ProjectDetailsTab, ProjectIssuesTab, ProjectRetroTab, ProjectSprintsTab
- Tab registry in lib/document-tabs.tsx
- Updated UnifiedDocumentPage to use tab registry
Co-Authored-By: Claude <noreply@anthropic.com>
Story: wildcard-route PRD: Document Tab Deep Linking Add optional :tab? parameter to documents route, enabling URLs like /documents/:id/issues. This is the foundation for deep linking tabs. Co-Authored-By: Claude <noreply@anthropic.com>
- Extract tab parameter from URL in UnifiedDocumentPage
- Derive activeTab from URL using useMemo (replaces useState)
- Navigate to URL on tab change (first tab gets clean URL)
- Enable shareable links and browser history for document tabs
Stories: url-driven-tab-state, navigate-on-tab-change
PRD: Document Tab Deep Linking
Co-Authored-By: Claude <noreply@anthropic.com>
- Add useEffect to detect invalid tab in URL
- Redirect to base URL with replace: true (prevents back-button loops)
- Log warning for debugging without user-facing error
Story: invalid-tab-fallback
PRD: Document Tab Deep Linking
Co-Authored-By: Claude <noreply@anthropic.com>
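The URL-driven tab state and invalid-tab fallback from the last few commits can be summarized as a pure function over a tab registry. This is a sketch under assumptions: the tab labels come from the commit messages, while the tab ids, `DOCUMENT_TABS`, `tabUrl`, and `resolveTab` are hypothetical and the real registry lives in lib/document-tabs.tsx.

```typescript
interface TabConfig {
  id: string;
  label: string;
}

// Labels are taken from the commits; the lowercase ids are assumed.
const DOCUMENT_TABS: Record<string, TabConfig[]> = {
  project: [
    { id: "details", label: "Details" },
    { id: "issues", label: "Issues" },
    { id: "sprints", label: "Sprints" },
    { id: "retro", label: "Retro" },
  ],
  program: [
    { id: "overview", label: "Overview" },
    { id: "issues", label: "Issues" },
    { id: "projects", label: "Projects" },
    { id: "sprints", label: "Sprints" },
  ],
};

interface TabResolution {
  activeTab: string | null; // tab to render, or null if the type has no tabs
  redirectTo: string | null; // set when the URL tab is invalid
}

// URL to navigate to on tab change; the first tab gets the clean URL.
function tabUrl(documentType: string, documentId: string, tabId: string): string {
  const first = (DOCUMENT_TABS[documentType] ?? [])[0];
  const base = "/documents/" + documentId;
  return first !== undefined && first.id === tabId ? base : base + "/" + tabId;
}

// Derive the active tab from the optional :tab route parameter.
function resolveTab(documentType: string, documentId: string, urlTab: string | undefined): TabResolution {
  const tabs = DOCUMENT_TABS[documentType] ?? [];
  const first = tabs[0];
  const base = "/documents/" + documentId;
  if (first === undefined) {
    return { activeTab: null, redirectTo: urlTab ? base : null };
  }
  if (!urlTab) return { activeTab: first.id, redirectTo: null };
  const known = tabs.some((t) => t.id === urlTab);
  return known
    ? { activeTab: urlTab, redirectTo: null }
    : { activeTab: first.id, redirectTo: base }; // caller redirects with replace: true
}

console.log(resolveTab("project", "42", "issues"));
console.log(resolveTab("project", "42", "bogus"));
console.log(tabUrl("program", "7", "overview"), tabUrl("program", "7", "projects"));
```

Because the active tab is derived from the URL rather than stored in component state, browser back/forward and shared links select the right tab without extra synchronization.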
- Security review: No vulnerabilities found
- Added 22 unit tests for document-tabs module
- Tests cover tab configuration, validation, and label resolution
- All post-completion criteria met
PRD: Document Tab Deep Linking
Co-Authored-By: Claude <noreply@anthropic.com>
Change from query-param-based tabs (?tab=issues) to route-segment-based tabs (/sprints/:id/plan/issues). This provides cleaner URLs for bookmarking and sharing.
- Add optional :tab? route parameter to sprint planning route
- Use useParams instead of searchParams for tab state
- Update setActiveTab to use navigate() for route changes
Story: sprint-planning-tab-routes
PRD: project-centric-sprint-planning
Co-Authored-By: Claude <noreply@anthropic.com>
…ducible audits
Final adversarial re-grade (shipshape/CLAUDE_FINAL_AUDIT.md) found and fixed:
- Accessibility: restore ARIA tree semantics in App.tsx (their removal was breaking the unmodified accessibility-remediation.spec.ts 2.13). Fix serious color-contrast on Projects ICE badge, My Week day labels, and FilterTabs count badge. All 6 axe target pages now report 0 critical/serious (scripts/shipshape-axe-scan.mjs).
- Security/CSP: allow the app's own Google Fonts CDN in style-src/font-src (the stylesheet was CSP-blocked on the Railway deployment); script-src stays nonce-only.
- Dependencies: close all high/critical advisories via pinned pnpm.overrides (fast-xml-parser, hono, @hono/node-server, express-rate-limit, fast-uri, path-to-regexp). pnpm audit --prod now reports 0 high/critical; probe findings 12 -> 2.
- Performance: rewrite /api/projects correlated per-row subqueries into pre-aggregated CTEs (byte-identical output, +22% throughput).
- Tooling: add scripts/shipshape-type-violations.ts (reproduces the 25% type-safety gate: 950 core / 25.84% / PASS) and scripts/shipshape-axe-scan.mjs.
- Docs/deck updated to match the verified final state.
All 622 unit tests pass; build clean; verified in-browser locally and on Railway.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Docker start command runs `migrate.js && index.js`, and migrate.ts called loadProductionSecrets() unconditionally, so NODE_ENV=production crash-looped on the AWS SSM credentials lookup before the server started — LOAD_SSM=false only gated index.ts. Move the LOAD_SSM/RAILWAY_ENVIRONMENT bypass into loadProductionSecrets so every startup entrypoint (index, migrate, seed) skips SSM and uses the platform-injected env vars. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
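Moving the bypass into the shared loader, so that every entrypoint gets it, might look like the sketch below. It is simplified to be synchronous and to take the environment as a parameter; `shouldSkipSsm`, the `fetchFromSsm` callback, and the placement of the `NODE_ENV` check are assumptions, while `LOAD_SSM`, `RAILWAY_ENVIRONMENT`, and `loadProductionSecrets` are named in the commit.

```typescript
type Env = Record<string, string | undefined>;

// The bypass described in the commit: explicit opt-out, or running on Railway.
function shouldSkipSsm(env: Env): boolean {
  return env["LOAD_SSM"] === "false" || Boolean(env["RAILWAY_ENVIRONMENT"]);
}

// Every entrypoint (index, migrate, seed) calls this one function, so the
// bypass only has to live here. fetchFromSsm is a hypothetical stand-in for
// the AWS SSM lookup.
function loadProductionSecrets(env: Env, fetchFromSsm: () => Env): Env {
  if (env["NODE_ENV"] !== "production" || shouldSkipSsm(env)) {
    return env; // use the platform-injected variables as-is
  }
  return { ...env, ...fetchFromSsm() };
}

let ssmCalls = 0;
const fakeSsm = (): Env => {
  ssmCalls += 1;
  return { SESSION_SECRET: "from-ssm" };
};

// Railway: production mode, but SSM is never contacted.
const onRailway = loadProductionSecrets(
  { NODE_ENV: "production", RAILWAY_ENVIRONMENT: "production", SESSION_SECRET: "from-platform" },
  fakeSsm,
);
const callsAfterRailway = ssmCalls;

// AWS-style deployment: production mode without a bypass, so SSM is consulted.
const onAws = loadProductionSecrets({ NODE_ENV: "production" }, fakeSsm);
```

Gating inside the loader rather than at each call site is what prevents the migrate entrypoint from diverging from the server entrypoint again.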
It was the only ShipShape-authored markdown still in the repo root; relocate it beside the rest of the audit deliverables and fix its self-reference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pshape/) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… docs
- Accessibility: production axe scan (real seed data) surfaced serious color-contrast that the local scan missed. My Week: today label text-accent -> text-foreground, and future rows use a dashed border instead of opacity-40 (opacity dimming dropped text below WCAG AA). Projects: empty-state CTA text-accent -> text-foreground underline (no accent-blue shade meets AA for small text on the dark background).
- Security: SECURITY_PROBE.md now has an explicit Remediation Summary: what was fixed (CSP, WebSocket, fonts, all high/critical deps) and what was not (2 medium stored-XSS mitigated at output; 6 moderate + 1 low deps deferred).
- Removed docs that fulfill no assignment requirement / are internal scratch: GRADING_REVIEW, SOCIAL_POST_DRAFT, SHIPSHAPE_KICKOFF_NOTES, AGENT_PERSONAS_UAT_FIXES, RAILWAY_DEPLOYMENT (deploy URL kept in the demo outline). Fixed resulting dangling references.
- CLAUDE_FINAL_AUDIT deployment section rewritten to reflect the real SSM-gate fix and production-mode switch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DEMO_VIDEO_OUTLINE.md relocated to ~/code/gauntlet/SHIPSHAPE_DEMO_VIDEO_OUTLINE.md (outside the repo, per request). The deployed app URL + demo login are now recorded in CLAUDE_FINAL_AUDIT.md so the repo still documents the live app. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLI-only probe today; designed a dark-theme dashboard mock (run controls, summary stats, per-attack-surface checks, severity-coded findings, remediation summary). Figma file + rendered PNG export referenced from SECURITY_PROBE.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…idence) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SHIPSHAPE_AUDIT_REPORT.md -> AUDIT_REPORT_MVP.md
SHIPSHAPE_ORIENTATION.md -> ORIENTATION.md
CLAUDE_AUDIT_OF_CODEX_AUDITS.md -> CLAUDE_AUDIT_OF_CODEX_AUDITS_MVP.md
CODEX_AUDIT_OF_CODEX_AUDIT.md -> CODEX_AUDIT_OF_CODEX_AUDIT_MVP.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…leanup)
- Backend: port the CLI probe into an in-process TS service (api/src/services/securityProbe.ts) so it runs from the deployed container (the runtime image excludes scripts/). Adds POST /api/security-probe/run, super-admin gated, targeting the app's OWN origin (RAILWAY_PUBLIC_DOMAIN or request host; never user-supplied, no SSRF), with a single-run lock.
- Auto-cleanup: every document the input-sanitization checks create is deleted before the report returns (verified locally: 2 created → 2 deleted, 0 leaked).
- Frontend: web/src/pages/SecurityProbe.tsx at /security-probe with its own login layer (same admin credentials, super-admin required) + a dashboard matching the Figma mock (summary cards, checks-by-surface, findings).
- Move the demo deck out of the repo (per request); update SECURITY_PROBE.md.
Verified locally end-to-end: login → Run Probe → 14/16 checks, 2 findings, 0 critical/high, test docs auto-cleaned.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Browser-verified end-to-end on https://shipshape-app-production-7ed8.up.railway.app: login layer → Run Probe → 13/16 checks, 2 findings, 0 critical/high, 2 test docs auto-cleaned. (Member privilege-escalation check skipped on prod — member login rate-limited/seed-dependent — probe degrades gracefully rather than failing.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…isioning
Category 8, toward an all-checks-pass probe:
- Sanitize document titles and TipTap plain-text nodes server-side (api/src/utils/sanitizeContent.ts) on create + content PATCH; strips HTML tags from titles/paragraph text while preserving code blocks. This actually remediates the two stored-XSS findings (payloads are neutralized at input, on top of the existing React/TipTap output encoding), so the probe's two input-sanitization checks now pass. All 465 API tests still pass (no-op on normal text).
- Probe self-provisions a least-privilege member via the super-admin invite+accept flow when direct member login fails (e.g. a setup-only deployment that was never seeded), so the privilege-escalation check runs on any instance instead of skipping. Member credentials are env-configurable (SHIP_SECURITY_MEMBER_EMAIL/PASSWORD).
Verified locally: probe now reports 16/16 checks, 0 findings, test docs auto-cleaned.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
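The input-side sanitization described in this commit (strip tags from titles and text nodes, leave code blocks alone) could be sketched like this. It is not the contents of sanitizeContent.ts: `ContentNode`, `stripTags`, and `sanitizeNode` are hypothetical, and the regex is a deliberately naive tag stripper used only for illustration.

```typescript
interface ContentNode {
  type: string;
  text?: string;
  content?: ContentNode[];
}

// Naive tag stripper for illustration only; it removes anything shaped like
// <...>. A production sanitizer would likely need a real HTML parser.
function stripTags(input: string): string {
  return input.replace(/<[^>]*>/g, "");
}

// Walk a TipTap-style JSON tree and strip tags from text nodes,
// returning code blocks untouched so code samples keep their angle brackets.
function sanitizeNode(node: ContentNode): ContentNode {
  if (node.type === "codeBlock") return node;
  const result: ContentNode = { ...node };
  if (typeof node.text === "string") result.text = stripTags(node.text);
  if (node.content) result.content = node.content.map((child) => sanitizeNode(child));
  return result;
}

const title = stripTags("<img src=x onerror=alert(1)>Roadmap");

const sanitized = sanitizeNode({
  type: "doc",
  content: [
    { type: "paragraph", content: [{ type: "text", text: "<b>Hi</b> there" }] },
    { type: "codeBlock", content: [{ type: "text", text: "<div>kept</div>" }] },
  ],
});
console.log(title, JSON.stringify(sanitized));
```

Text without angle-bracket pairs passes through unchanged, which is consistent with the commit's note that the sanitizer is a no-op on normal text.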
…e green
- ensureMember read the raw response shape, but admin/auth routes wrap responses
in { success, data }. Read both shapes so the probe finds the workspace and
self-provisions the member via invite+accept on a setup-only deployment.
- Refactored the in-process probe to a generic request<T>() and a typed audit
shape, and trimmed the sanitizer casts, so the new code stops blowing the
Category-1 type-safety gate (back to 955 core / 25.45% reduction, PASS).
Local probe: 16/16 checks, 0 findings, test docs auto-cleaned. 465 API tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
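Reading both response shapes can be done with one small helper. This is a sketch: `unwrap` and the `Workspace` type are hypothetical, and only the `{ success, data }` wrapper shape comes from the commit message.

```typescript
// Accept either a raw payload or the { success, data } envelope and return the payload.
// The cast to T is unchecked; a stricter version would validate the shape.
function unwrap<T>(body: unknown): T {
  if (typeof body === "object" && body !== null) {
    const record = body as Record<string, unknown>;
    if (typeof record["success"] === "boolean" && "data" in record) {
      return record["data"] as T;
    }
  }
  return body as T;
}

// Hypothetical payload type for illustration.
interface Workspace {
  id: string;
}

const fromRaw = unwrap<Workspace[]>([{ id: "w1" }]);
const fromWrapped = unwrap<Workspace[]>({ success: true, data: [{ id: "w1" }] });
console.log(fromRaw[0]?.id === fromWrapped[0]?.id);
```

Putting the shape check in one place keeps callers such as a member-provisioning step from depending on which route family produced the response.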
… web UI
- CLAUDE_FINAL_AUDIT + FIXES_IMPLEMENTATION + SECURITY_PROBE: reflect the stored-XSS input-sanitization remediation, the deployed Security Probe web UI (auto-cleanup + member self-provisioning), and the probe reaching 16/16 / 0 findings (CLI + deployed). Regenerated CLI probe evidence (16/16).
- AUDIT_REPORT_MVP: keep it a clean MVP-checkpoint snapshot (post-MVP work moved to the final docs, per the _MVP-naming convention).
- AI_COST_ANALYSIS: addendum reflecting the Claude (Opus 4.7) final pass.
- Updated the live 16/16 run screenshot evidence.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n root) The app shell locks html/body/#root to height:100% overflow:hidden for the fixed 4-panel layout. The standalone /security-probe page used min-h-screen, so a tall probe report overflowed the clipped root and could not scroll. Give the page its own scroll container (h-screen overflow-y-auto) on the dashboard, login, and loading views. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l bug A full-page screenshot masked an interactive scroll/clip bug that only surfaced when a real user tried to scroll — interactive checks catch what tall screenshots hide. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FleetGraph is an agent embedded in Ship. One LangGraph.js graph runs two modes: proactive (node-cron sweep + a debounced issue-mutation hook) and on-demand (context-aware chat scoped to the current view). It detects stale, overdue, unassigned, and unestimated work, reasons with tiered Claude, surfaces findings, and gates any state mutation behind human approval.
- api/src/fleetgraph/: graph, deterministic detectors, parallel fetch, tiered LLM (Bedrock by default; Anthropic API when ANTHROPIC_API_KEY is set), Postgres checkpointer, findings/dedup store, hybrid triggers.
- Routes: on-demand chat, inbox, and HITL approval resume. Migration 039 adds fleetgraph_findings + fleetgraph_pending_approvals.
- Web: bottom-right dock (notification bell/inbox, approval cards, embedded chat); new fleetgraph:* /events WebSocket event types.
- LangSmith tracing throughout (different graph paths per condition).
- migrate.ts: tolerate "already exists" so re-runs on an existing DB don't abort before later migrations.
- ssm.ts: optionally load LangSmith config from SSM in production.
- Docs: FLEETGRAPH.md, PRESEARCH.md, architecture deck under docs/fleetgraph/.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move `node dist/db/migrate.js` out of the container CMD into Railway's deploy.preDeployCommand (railway.json); the app now starts with `node dist/index.js` only. A failed migration now fails the deploy while the previous healthy version keeps serving, instead of crash-looping the container (which caused a brief prod outage). Non-Railway targets (AWS EB) must run migrate as their own pre-deploy/release step — noted in the Dockerfile. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ALTER TYPE ... RENAME VALUE has no IF EXISTS, so on a DB bootstrapped from the
current schema.sql (enum already at the week labels) migration 033 crashed
("sprint_plan is not an existing enum label"), breaking brand-new-env bootstrap.
Guard each rename to run only when the old label exists and the new one does
not — a clean no-op on a current schema, still correct on a genuinely old DB.
Verified: a fresh database now applies all 41 migrations and seeds successfully.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
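One way to express the guard is a DO block that checks pg_enum before renaming. The sketch below only builds the SQL text. The `guardedEnumRename` helper, the `document_type` type name, and the `weekly_plan` target label are hypothetical placeholders; only `sprint_plan` as an old label comes from the commit message.

```typescript
// Only plain lowercase identifiers are accepted, since values are interpolated into SQL.
const IDENTIFIER = /^[a-z_][a-z0-9_]*$/;

// Build a PostgreSQL DO block that renames an enum label only when the old
// label exists and the new one does not, so the statement is a no-op on a
// database that already has the current schema.
function guardedEnumRename(typeName: string, oldLabel: string, newLabel: string): string {
  for (const name of [typeName, oldLabel, newLabel]) {
    if (!IDENTIFIER.test(name)) throw new Error("Unsafe identifier: " + name);
  }
  const hasLabel = (label: string): string =>
    "EXISTS (SELECT 1 FROM pg_enum e JOIN pg_type t ON t.oid = e.enumtypid " +
    "WHERE t.typname = '" + typeName + "' AND e.enumlabel = '" + label + "')";
  return [
    "DO $$",
    "BEGIN",
    "  IF " + hasLabel(oldLabel) + " AND NOT " + hasLabel(newLabel) + " THEN",
    "    ALTER TYPE " + typeName + " RENAME VALUE '" + oldLabel + "' TO '" + newLabel + "';",
    "  END IF;",
    "END",
    "$$;",
  ].join("\n");
}

// Type name and target label are placeholders, not the values used in migration 033.
const sql = guardedEnumRename("document_type", "sprint_plan", "weekly_plan");
console.log(sql);
```

The two-sided check is what makes the migration safe in both directions: on a fresh schema the old label is missing, and on a partially migrated database the new label may already be present.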
Cross-entity proactive detection, run only on workspace-wide (cron/digest) sweeps:
- detectSprintSlip: active sprints where elapsed-fraction outpaces done-fraction (worse when confidence is high) -> "at risk of slipping" finding to the sprint owner.
- detectCapacity: per-person sum(estimate) of open assigned work vs capacity_hours -> overload finding + HITL reassign of the lowest-priority item to the teammate with the most slack; notifies the person + their reports_to.
- New parallel fetchMeta node (workspace sprint window + per-sprint progress counts).
- executeApproved now handles "reassign" (set assignee_id) and notifies the new assignee.
Detectors verified with synthetic inputs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
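The sprint-slip rule (elapsed fraction outpacing done fraction) can be illustrated with a small detector. This is a sketch, not the detector in the PR: the field names, the `tolerance` parameter and its 0.15 default, and the finding shape are assumptions, and the confidence weighting mentioned in the commit is not modeled.

```typescript
interface SprintProgress {
  id: string;
  startMs: number;
  endMs: number;
  totalIssues: number;
  doneIssues: number;
}

interface SlipFinding {
  sprintId: string;
  elapsed: number; // fraction of the sprint window that has passed
  done: number; // fraction of issues completed
  gap: number; // elapsed - done
}

// Flag active sprints whose elapsed fraction exceeds the done fraction by more
// than `tolerance` (an assumed threshold).
function detectSprintSlip(sprints: SprintProgress[], nowMs: number, tolerance = 0.15): SlipFinding[] {
  const findings: SlipFinding[] = [];
  for (const s of sprints) {
    const duration = s.endMs - s.startMs;
    if (duration <= 0 || s.totalIssues === 0) continue; // nothing to measure
    if (nowMs < s.startMs || nowMs > s.endMs) continue; // only active sprints
    const elapsed = (nowMs - s.startMs) / duration;
    const done = s.doneIssues / s.totalIssues;
    const gap = elapsed - done;
    if (gap > tolerance) findings.push({ sprintId: s.id, elapsed, done, gap });
  }
  return findings;
}

// Synthetic input: both sprints are 80% elapsed; only the first is behind.
const slipFindings = detectSprintSlip(
  [
    { id: "sprint-a", startMs: 0, endMs: 1000, totalIssues: 10, doneIssues: 2 },
    { id: "sprint-b", startMs: 0, endMs: 1000, totalIssues: 10, doneIssues: 9 },
  ],
  800,
);
console.log(slipFindings);
```

Keeping the detector a pure function of its inputs is what makes it testable with synthetic data, as the commit notes.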
The embedded chat now returns {reply, action?}: when a message clearly requests a
change (reassign / set state / priority / estimate / due date) the model emits a
concrete action with validated ids + payload (schema enforced per kind), routed
through the SAME human gate as proactive findings. The gate is now action-based
(findings optional), so chat actions and detector actions share one approval path.
runOndemandChat returns a pendingApproval; the dock renders an inline Apply/Cancel
in the chat. executeApproved handles set_state/priority/estimate/due_date/reassign.
Verified e2e against real data: "reassign #9200 to Alice" -> proposed -> approved -> reassigned.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
api/src/fleetgraph/evals/: labeled scenarios (cases.ts) feed synthetic Ship state to the deterministic detectors; run-evals.ts scores precision / recall / F1 and quiet-accuracy (cases that must stay silent), prints a per-case report, and exits non-zero on any miss (CI-usable). Optional LangSmith dataset upload when LANGCHAIN_API_KEY is set. New `pnpm fleetgraph:eval`. Current: 10/10 cases exact, P/R/F1 = 1.00, quiet-accuracy 5/5. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
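For readers unfamiliar with the metrics, the scoring can be sketched as below. The `EvalCase` shape and `scoreCases` are hypothetical and do not reproduce run-evals.ts; the sketch assumes each case lists distinct signal names, and it treats a case with no expected signals as a "quiet" case.

```typescript
interface EvalCase {
  name: string;
  expected: string[]; // signals the detectors should emit (empty = must stay silent)
  actual: string[]; // signals the detectors did emit
}

interface Scores {
  precision: number;
  recall: number;
  f1: number;
  quietCorrect: number;
  quietTotal: number;
  exact: number; // cases with no false positives and no false negatives
}

function scoreCases(cases: EvalCase[]): Scores {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  let quietCorrect = 0;
  let quietTotal = 0;
  let exact = 0;
  for (const c of cases) {
    const hits = c.actual.filter((s) => c.expected.indexOf(s) !== -1).length;
    const caseFp = c.actual.length - hits;
    const caseFn = c.expected.length - hits;
    tp += hits;
    fp += caseFp;
    fn += caseFn;
    if (caseFp === 0 && caseFn === 0) exact += 1;
    if (c.expected.length === 0) {
      quietTotal += 1;
      if (c.actual.length === 0) quietCorrect += 1;
    }
  }
  // With nothing predicted (or nothing expected), treat the metric as perfect.
  const precision = tp + fp === 0 ? 1 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 1 : tp / (tp + fn);
  const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1, quietCorrect, quietTotal, exact };
}

const scores = scoreCases([
  { name: "stale issue", expected: ["stale"], actual: ["stale"] },
  { name: "overdue and unassigned", expected: ["overdue", "unassigned"], actual: ["overdue"] },
  { name: "healthy project", expected: [], actual: [] },
  { name: "noisy on healthy issue", expected: [], actual: ["stale"] },
]);
const hasMiss = scores.exact !== 4; // a CI runner would exit non-zero in this case
console.log(scores, hasMiss);
```

In this synthetic set there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all come out to 2/3, and quiet-accuracy is 1 of 2.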
New triggerKind 'digest': fetch -> digestNode synthesizes a per-project narrative (what's moving, what's at risk, the single most important next action) via batched Tier-2 calls, emits one autonomous finding per project (deduped one-per-project-per-day), routed through surfaceAuto to notify owner/accountable/admins. Daily node-cron (FLEETGRAPH_DIGEST_CRON, advisory-locked) + runDigest. Falls back to a deterministic summary when no model is available; chunked so responses don't truncate. Verified against real data: 15 projects synthesized with concrete narratives; dedup holds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
prepare loads per-type dismissal counts; dedup drops novel signals whose type has been dismissed >= FLEETGRAPH_DISMISS_SUPPRESS_THRESHOLD (default 3) times, so the agent learns to stop surfacing what a team keeps rejecting. Verified at the dedup node: novelSignals 1 -> 0 as dismissals cross the threshold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
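The suppression rule can be sketched as a pure filter over candidate signals. The name `filterNovel` is borrowed from a later commit in this PR, but its signature here is assumed, as are the `Signal` shape and the `type:entityId` dedup key; the threshold default of 3 and the `>=` comparison come from the commit message.

```typescript
interface Signal {
  type: string; // e.g. "stale", "overdue"
  entityId: string;
}

// Drop signals that were already surfaced, and signals whose type the team has
// dismissed at least `suppressThreshold` times.
function filterNovel(
  signals: Signal[],
  surfacedKeys: string[],
  dismissalCounts: Record<string, number>,
  suppressThreshold = 3,
): Signal[] {
  return signals.filter((s) => {
    const key = s.type + ":" + s.entityId;
    if (surfacedKeys.indexOf(key) !== -1) return false; // duplicate of an existing finding
    const dismissed = dismissalCounts[s.type] ?? 0;
    return dismissed < suppressThreshold;
  });
}

const candidates: Signal[] = [{ type: "stale", entityId: "issue-1" }];
const belowThreshold = filterNovel(candidates, [], { stale: 2 });
const atThreshold = filterNovel(candidates, [], { stale: 3 });
console.log(belowThreshold.length, atThreshold.length); // 1 0
```

This mirrors the verification in the commit: the number of novel signals drops from 1 to 0 once dismissals for that type reach the threshold.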
Use Cases: add sprint-slip, capacity overload (HITL reassign), on-demand chat actions, daily project digest, and adaptive suppression. Graph diagram: add the fetchMeta node, the digest path, the on-demand action branch, and dedup-level suppression. Note the `pnpm fleetgraph:eval` harness. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Relocate the prior-week shipshape audit/analysis markdown into docs/gauntlet/shipshape to keep assignment artifacts grouped under docs/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four eval layers covering the agent end-to-end:
- L1 deterministic (api/src/fleetgraph/evals): 17 detector cases (all signal types, severities, recipient resolution) + 10 dedup/suppression cases via a pure exported filterNovel(); precision/recall/quiet-accuracy, exits non-zero on any miss.
- L2 integration (__tests__/graph.integration.test.ts, vitest + real DB, hermetic via FLEETGRAPH_DISABLE_LLM): quiet / autonomous / HITL approve+dismiss+snooze / digest paths, plus the autonomy-boundary guardrail (no mutation before approval; only allowed fields written).
- L3 LLM-graded (evals/llm, LangSmith evaluate): triage keep-critical, reasoning faithfulness + chat groundedness (Claude-as-judge), chat action-extraction exact-match. Local authoritative scoring + best-effort LangSmith experiment logging; threshold-gated.
- L4 adversarial: prompt-injection cases assert no unsafe/out-of-scope actions.
Refactors (behavior-preserving): dedup.ts (pure filterNovel), llm.ts FLEETGRAPH_DISABLE_LLM, graph.ts eval wrappers. Scripts: fleetgraph:eval:llm, fleetgraph:eval:all.
.circleci/config.yml: install/type-check/build, eval_deterministic, test_integration (postgres service), eval_llm (context secrets), and deploy → `railway up --service shipshape-app` on master (ships the app + the in-process agent together). Secrets via the `fleetgraph` context.
Verified locally: L1 27/27, L2 6/6 on a fresh DB, L3+L4 all pass (triage 1.00, faithfulness 0.97, groundedness 1.00, action-extraction 1.00, adversarial 1.00).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mg/node) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cimg/node) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ShipShape Phase 2 Implementation
Summary
FIXES_IMPLEMENTATION.md and keeps audit addendum current in SHIPSHAPE_AUDIT_REPORT.md
Verification
- npx pnpm@10.27.0 build
- DATABASE_URL=postgres://ship:ship_dev_password@localhost:5433/ship_dev npx pnpm@10.27.0 --filter @ship/api test
- npx pnpm@10.27.0 --filter @ship/web test
- DATABASE_URL=postgres://ship:ship_dev_password@localhost:5433/ship_dev npx pnpm@10.27.0 test:coverage:changed
- COREPACK_INTEGRITY_KEYS=0 PLAYWRIGHT_WORKERS=1 ./node_modules/.bin/playwright test e2e/accessibility-stretch.spec.ts --project=chromium --reporter=line
- SHIPSHAPE_BASE_URL=http://localhost:3002 SHIPSHAPE_CONCURRENCY=50 SHIPSHAPE_DURATION_MS=5000 node scripts/shipshape-latency-benchmark.mjs
- DATABASE_URL=postgres://ship:ship_dev_password@localhost:5433/ship_dev SESSION_SECRET=local-dev-session-secret-not-for-production E2E_TEST=1 npx pnpm@10.27.0 --filter @ship/api exec tsx ../scripts/shipshape-query-count.ts
- git diff --check
Key Results
- 470.98 kB / 140.68 kB gzip, below Vite's 500 KiB warning threshold
- 198ms vs 1,210ms baseline; team grid 119ms vs 1,818ms baseline
- 25 vs 33 baseline and 26 target
- 226/226 (100.00%), plus API/web package coverage ratchet
- 185 to 77