294 commits
24dbf8c
feat(Sprint 3): Implement cross-filing OCF derivation for 10-Q
Jan 21, 2026
88e538a
feat(banking): Implement arithmetic extractor and semantic fixes for …
Jan 21, 2026
8c3ad9e
feat(banking): Expand sector coverage and adding guardrails
Jan 21, 2026
3e603f9
feat(banking): Implement Street View Cash/Debt extraction
Jan 21, 2026
cde1250
fix(banking): Add Dealer catch-all for GS ShortTermDebt
Jan 21, 2026
3636267
fix(validator): Implement Flow/Stock Guardrail for Banking Metrics
Jan 21, 2026
759dc6c
Refine Banking Extraction: Dual-Track logic, E2E Skill, and Documenta…
Jan 21, 2026
d469a9c
fix(banking): Improve GAAP extraction for Cash and ShortTermDebt vali…
Jan 22, 2026
e73a239
feat(config): Add banking sector company configurations
Jan 22, 2026
5dd947d
docs(banking): Update extraction guide with dual-track architecture
Jan 22, 2026
35a2ae9
fix(banking): Remediate GAAP extraction for dealers, maturity schedul…
Jan 22, 2026
2287498
docs(banking): Update extraction guide with GAAP remediation details
Jan 22, 2026
aff1849
feat(banking): Implement Architect Directives for GAAP extraction
Jan 23, 2026
2029bac
docs(banking): Add response to Architect's 5 technical questions
Jan 23, 2026
9f8d366
feat(banking): Implement archetype-driven GAAP extraction with suffix…
Jan 23, 2026
97f1735
fix(banking): Resolve 10-Q extraction regressions and JPM repos subtr…
Jan 24, 2026
dadbb80
fix(banking): Phase 4 - Fix WFC 10-Q repos detection and trading excl…
Jan 24, 2026
db47526
docs(banking): Add Phase 4 extraction evolution report
Jan 24, 2026
b6625e3
docs(banking): Comprehensive update to developer guide for Phase 4
Jan 24, 2026
3d1adf9
feat(skill): Add write-evolution-report skill with ENE integration
Jan 24, 2026
3acdf57
feat(banking): Implement ADR-005 fingerprinting and ADR-012 safe fall…
Jan 24, 2026
1f902b2
docs(banking): Add Phase 5 extraction evolution report
Jan 25, 2026
7f398f1
feat(banking): Add known divergences support for 10-K extraction
Jan 25, 2026
7c42b6e
feat(testing): Add Standard Industrial E2E test framework
Jan 25, 2026
0915e8d
fix(industrial): Add known divergences for OperatingIncome validation
Jan 26, 2026
abd5a13
feat(testing): Add mode presets for E2E test coverage levels
Jan 26, 2026
eaa4cc6
feat(metrics): Expand Archetype A coverage with 7 new metrics
Jan 26, 2026
3d300fc
feat(extraction): Add facts-based fallback for XBRL concept mapping
Jan 26, 2026
06a7255
fix(extraction): Address E2E validation failures from Evolution Report
Jan 26, 2026
9c6c86f
fix(extraction): Correct date filter format in quarterly derivation
Jan 26, 2026
8ef60fa
feat(e2e): Add divergence statistics to E2E test skill outputs
Jan 26, 2026
a71cf47
fix(extraction): Resolve 8 XBRL extraction bugs across PBF, UNH, UPS,…
Jan 28, 2026
b58bbc1
docs: Add upstream feature analysis report for merge decision-making
Mar 2, 2026
05e5ed7
chore: commit WIP changes before upstream merge
Mar 2, 2026
da72eca
chore: stage all fork-local files before upstream merge
Mar 2, 2026
3772363
merge: Sync with upstream/main v5.19.1 (478 commits)
Mar 2, 2026
35dd3a8
chore: import upstream gaap_mappings and section_membership as supple…
Mar 2, 2026
6679ca9
docs: Add post-merge financial database assessment report
Mar 2, 2026
df85d81
feat: expand known_concepts at config load time using upstream GAAP m…
Mar 2, 2026
a5fc957
feat: wire ExperimentLedger into E2E script + measure GAAP expansion …
Mar 2, 2026
632676e
feat: add regression detection system with golden master promotion an…
Mar 2, 2026
743b705
feat: add yfinance reference snapshot system for deterministic E2E va…
Mar 2, 2026
956d7a3
data: generate initial yfinance reference snapshots for 43 companies
Mar 2, 2026
33eedfe
docs: add 3-phase roadmap for financial database completion
Mar 3, 2026
c9e0a72
data: Phase 1 infrastructure activation — 479 golden masters, 5834 ex…
Mar 3, 2026
816624a
fix: Phase 2 — resolve structural failures via concept fixes and skip…
Mar 3, 2026
b3c4228
feat: Phase 3 — golden master regression gate + SaaS company configs
Mar 3, 2026
fd49f3c
data: add SaaS company yfinance snapshots and initial E2E baseline
Mar 3, 2026
1f1505f
feat: Phase 3 expansion — 9 new companies across 3 sectors + SaaS fixes
Mar 4, 2026
c12b831
fix: Phase 3 E2E validation — all 11 untested companies now passing
Mar 4, 2026
7d0bd9d
feat: add user-facing StandardizedFinancials API with 24 cross-compan…
Mar 4, 2026
5d7eedc
feat: add FinancialDatabase — SQLite-backed store of standardized met…
Mar 4, 2026
64b4fcd
feat: add automated company onboarding pipeline for S&P 100 expansion
Mar 4, 2026
f4ae83c
feat: onboard 40 S&P 100 companies — 96 total tickers, 665 golden mas…
Mar 4, 2026
45d08d4
feat: add pipeline orchestrator for agent-driven database expansion
Mar 4, 2026
466f6ed
feat: add --no-ai flag, is_dimensioned fix, and pipeline tracking wit…
Mar 5, 2026
524a6ac
fix: thread filing_date and form_type through validation pipeline
Mar 5, 2026
e192e32
fix: add OSError retry in orchestrator and surface silent XBRL failures
Mar 6, 2026
af4f4e5
fix: resolve NFLX WeightedAverageSharesDiluted gap (95% → 100%)
Mar 17, 2026
54aab29
chore: add E2E reports, diagnosis skill, and update gitignore for han…
Mar 17, 2026
618e39b
chore: add branch status doc, pyyaml dependency, and sudo rule
sangicook Mar 17, 2026
4caf08f
docs: add auto-eval strategy report applying autoresearch ethos to Ed…
sangicook Mar 17, 2026
bdd1d52
feat: implement autonomous auto-eval system for XBRL config optimization
sangicook Mar 17, 2026
bae2619
fix: resolve 5 CQS gaps via auto-eval config changes (CQS 0.906 → 0.927)
sangicook Mar 17, 2026
146c6fa
fix: add multi-period verification and concept discovery to auto-eval…
sangicook Mar 17, 2026
832681b
docs: rewrite standardization README with auto-eval documentation
sangicook Mar 17, 2026
b913253
feat: implement Haiku swarm architecture for parallel auto-eval
sangicook Mar 17, 2026
6da1fe2
refactor: replace Haiku agents with Python ThreadPoolExecutor for mec…
sangicook Mar 17, 2026
5134b35
fix: handle None facts and boost scout proposal rate with concept var…
sangicook Mar 17, 2026
48db3de
fix: strip_prefix handles us-gaap_ underscore form + expand concept v…
sangicook Mar 17, 2026
9487b4b
fix: scout extracts XBRL values directly when yfinance unavailable
sangicook Mar 17, 2026
3e3748d
docs: add auto-eval results log tracking CQS evolution across sessions
sangicook Mar 18, 2026
47bff56
feat: add 50-company expansion cohort for auto-eval Session 3
sangicook Mar 18, 2026
29a3c97
feat: expand auto-eval to 50 companies — CQS 0.9206, 0 regressions
sangicook Mar 18, 2026
5e577e0
docs: update auto-eval docs to reflect Session 3 implementation
sangicook Mar 18, 2026
219f5b2
fix: correct gap classification, add per-metric tolerance, record eva…
sangicook Mar 18, 2026
477a239
fix: NULL variance bug + config depth optimization — CQS 0.9535 → 0.9796
sangicook Mar 18, 2026
cd850d1
feat: strict tolerances + two-score architecture (EF-CQS / SA-CQS)
sangicook Mar 18, 2026
08bd2d0
feat: wire Auto-Solver + two-score architecture into auto-eval loop
sangicook Mar 18, 2026
134df7c
feat: parallelize company evaluation with ProcessPoolExecutor
sangicook Mar 18, 2026
64deb19
fix: cap solver candidates at 50 + escalate after graveyard failure
sangicook Mar 18, 2026
f0962fd
feat: multi-period validation + standardization upgrade for solver
sangicook Mar 18, 2026
1a8be2b
refactor: clean up auto-eval/solver dead code and duplication
sangicook Mar 18, 2026
a19de94
docs: update Session 7 baselines, results, and standardization README
sangicook Mar 18, 2026
ffa9fcb
feat: period-aware search pipeline for auto-solver (v2)
sangicook Mar 18, 2026
6d02f4c
fix: resolve auto-eval loop stall (6 bugs)
sangicook Mar 18, 2026
1beba9e
test: add 23 verification tests for auto-eval stall fixes
sangicook Mar 18, 2026
4c4a0a7
feat: add clear_graveyard_entries() and update ledger after live veri…
sangicook Mar 18, 2026
809bbe1
feat: improve auto-resolver for hard gaps with evidence-driven proposals
sangicook Mar 19, 2026
24f3ba1
feat: add GPT-5.4 escalation for exhausted auto-eval gaps
sangicook Mar 19, 2026
2b1c6a1
feat: add eval_cohort param to run_overnight() + 50-company auto-eval…
sangicook Mar 19, 2026
dc27db8
feat: add multi-agent auto-eval with parallel workers + composite-met…
sangicook Mar 19, 2026
a20c029
fix: CQS evaluation gate + wire standardization into primary extraction
sangicook Mar 19, 2026
64befb2
feat: add observability sensors for CQS gate + SA promotion fixes
sangicook Mar 19, 2026
f09ecb4
fix: invalidate config cache after apply/revert + add module-level lo…
sangicook Mar 19, 2026
2e5964a
feat: add 4 standardization formulas from multi-agent auto-eval session
sangicook Mar 19, 2026
57da569
fix: use snapshot-based revert to preserve KEPT changes across experi…
sangicook Mar 19, 2026
0a6d6b9
feat: add in-memory config isolation for parallel auto-eval workers
sangicook Mar 19, 2026
2d1f75e
feat: add agent team architecture for 100+ company auto-eval
sangicook Mar 19, 2026
44f321c
feat: fix team-eval bugs, onboard 27 companies, define 500-company co…
sangicook Mar 19, 2026
5871b92
data: record 100-company team-eval run (Phase D results)
sangicook Mar 19, 2026
cc44bf6
docs: update standardization README with team architecture and 100/50…
sangicook Mar 19, 2026
c3db8e9
docs: add subscription-grade quality target to auto-eval README
sangicook Mar 19, 2026
d3e6ce4
feat: optimize team-eval validation from 2.28h to <5 min
sangicook Mar 19, 2026
c6b6eec
feat: replace in-memory XBRL cache with disk-backed cache to fix OOM
sangicook Mar 19, 2026
ffe7fcb
data: record team-eval run with disk-backed cache (5 workers, 100 com…
sangicook Mar 19, 2026
74022ec
feat: persist gap details in worker checkpoints
sangicook Mar 19, 2026
bb057ba
feat: Tier 3 Python-level XBRL extraction fixes for 31 gaps
sangicook Mar 20, 2026
911bb63
docs: add CQS improvement loop workflow and update README with Tier 3…
sangicook Mar 20, 2026
4b29c13
feat: add session state persistence and root cause classifier for CQS…
sangicook Mar 20, 2026
b1e7b93
fix: regression veto now compares against baseline instead of absolut…
sangicook Mar 20, 2026
fda112c
docs: add next-gen CQS loop implementation plan
sangicook Mar 20, 2026
100b77e
chore: checkpoint auto-eval overnight run results
sangicook Mar 20, 2026
3e85a60
feat: loop efficiency — derive gaps from CQSResult + proposal dedup c…
sangicook Mar 20, 2026
7d85db3
feat: regression diff pipeline — diagnose and auto-fix golden master …
sangicook Mar 20, 2026
d7d34fe
fix: correct field name mismatch in diagnose_regression and add integ…
sangicook Mar 20, 2026
053a86c
feat: reference adjudication — trust hierarchy and reference_disputed…
sangicook Mar 20, 2026
bc4518a
feat: industry archetype templates with forbidden/required metrics
sangicook Mar 20, 2026
b2a1686
feat: richer formula solver — subtraction search, scale normalization…
sangicook Mar 20, 2026
c33dced
feat: AI agent routing infrastructure — dispatch framework for Phase 2
sangicook Mar 21, 2026
ff16f80
fix: enable solver subtraction and scale search in proposal path
sangicook Mar 21, 2026
de24aec
feat: Tier 1 CQS loop improvements — unblock proposals, sign handling…
sangicook Mar 21, 2026
259a515
config: auto-eval accepted proposals from Phase C 100-company run
sangicook Mar 21, 2026
873a453
fix: store real XBRL concepts in golden masters, guard regression pip…
sangicook Mar 21, 2026
e120168
fix: Phase 1 governance — promotion threshold, metric tolerances, sto…
sangicook Mar 21, 2026
6709eaf
feat: Phase 2 SEC-native self-validation — wire internal consistency …
sangicook Mar 21, 2026
efe84b7
refactor: cleanup from code review — remove bad equation, derive stra…
sangicook Mar 21, 2026
0a9622e
feat: Phase 3 metric expansion — 19 to 37 base metrics + 6 derived + …
sangicook Mar 21, 2026
3d6df0c
feat: Phase 4 SEC Company Facts API as second reference source
sangicook Mar 21, 2026
10f04ca
feat: thread use_sec_facts through pipeline to unlock SEC facts fallback
sangicook Mar 22, 2026
5493021
fix: prepend us-gaap: prefix in SEC facts lookup — known_concepts are…
sangicook Mar 22, 2026
3e5c99e
fix: propagate reference_value through fast-path gap derivation
sangicook Mar 22, 2026
fc679ef
docs: update roadmap tracking with overnight run 004 results
sangicook Mar 22, 2026
b410326
feat: honest scoring, semantic constraints, and confidence states
sangicook Mar 22, 2026
a29f835
feat: Part C — RFA/SMA sub-scores, canonical fact provenance, Calcben…
sangicook Mar 22, 2026
79f3f3e
refactor: simplify decision gates, publish confidence, and canonical …
sangicook Mar 22, 2026
8c4da67
fix: honest coverage scoring, regression cleanup, and RFA/SMA gap rou…
sangicook Mar 23, 2026
9c295b2
feat: wire canonical fact store provenance through extraction pipeline
sangicook Mar 23, 2026
a45e0a7
feat: two-step auto-eval architecture with gap manifest and AI consul…
sangicook Mar 23, 2026
8f78771
feat: native Claude Code agents for gap consultation (gap-solver + ga…
sangicook Mar 23, 2026
3ff69e2
feat: typed action schema — AI emits semantic intents, compiler write…
sangicook Mar 24, 2026
f6fb017
test: E2E proof that typed actions fix raw pipeline's path invention …
sangicook Mar 24, 2026
1f8ff89
feat: live benchmark infrastructure for 50-company typed action evalu…
sangicook Mar 24, 2026
f2473a0
feat: capability-aware triage, sign-aware gaps, and applicability CQS…
sangicook Mar 24, 2026
aa76145
docs: consolidate autonomous system docs from 14 files to 2 + mainten…
sangicook Mar 25, 2026
72ee5ff
docs: optimize update-autonomous-docs skill for better triggering and…
sangicook Mar 25, 2026
a52a480
docs: restrict consult-consensus skill to manual-only triggering
sangicook Mar 25, 2026
e0c7c94
docs: optimize CLAUDE.md files and add stance patterns to consensus s…
sangicook Mar 25, 2026
2312bf0
docs: consensus session 005 — subscription-grade readiness requirements
sangicook Mar 25, 2026
cd4f310
feat: subscription-grade XBRL extraction system (8 phases)
sangicook Mar 25, 2026
15e19c6
feat: SEC-native primacy, progress printing, and Lead Agent Closed Loop
sangicook Mar 26, 2026
50e5c0c
feat: Phase 7 Lead Agent Closed Loop pipeline
sangicook Mar 26, 2026
1bf75aa
test: closed-loop E2E validation for AI gap resolution pipeline
sangicook Mar 26, 2026
765ed7d
refactor: make live benchmark self-running with OpenRouter API
sangicook Mar 26, 2026
33511a0
feat: closed-loop pipeline optimizations O1-O6 (Consensus 006)
sangicook Mar 26, 2026
bfe590b
docs: consensus 007 — value-grounded AI consultation architecture
sangicook Mar 26, 2026
f1fe3e9
feat: value-grounded AI consultation O7-O9 (Consensus 007)
sangicook Mar 26, 2026
5a84a21
fix: correct _do_retry argument signatures for parse/compile/validate
sangicook Mar 26, 2026
e1dc191
feat: manifest caching O10, deterministic downgrade O11, consensus 009
sangicook Mar 27, 2026
b0c717a
feat: gap-aware compiler with namespace normalization O12-O14 (Consen…
sangicook Mar 27, 2026
a9d0a38
docs: consensus 010 — AI prompt effectiveness diagnosis (O15-O20)
sangicook Mar 27, 2026
93acf55
feat: semantic AI prompt redesign O15-O20 (Consensus 010)
sangicook Mar 27, 2026
c976b67
feat: in-memory config bug fixes O21-O27, known_divergences wiring, c…
sangicook Mar 27, 2026
e02010b
feat: diagnostic logging O28-O32 for SA pipeline and eval loop
sangicook Mar 27, 2026
a89fd03
feat: MappingSource.OVERRIDE separates company overrides from exclusi…
Mar 27, 2026
5ee061c
fix: add missing logger import to tree_parser.py
Mar 27, 2026
36f1c76
feat: signed formula engine & companion fixes O49-O52 (Consensus 016)
Mar 28, 2026
74da0e6
refactor: remove dead MCP/GPT escalation path from auto-eval loop
Mar 31, 2026
6fda5fa
feat: graveyard replay breaks 0% KEEP rate — CQS 0.8224 → 0.8237
Mar 31, 2026
514fea2
fix: forbidden metrics CQS scoring, derivation planner wiring, diverg…
Apr 1, 2026
8453606
docs: update autonomous system docs for Consensus 017 + verification …
Apr 1, 2026
9a76d90
fix: resolve 3 XBRL gaps via config — CQS 0.8293 → 0.8300 (Gap Resolu…
Apr 1, 2026
cc33e20
feat: CQS scoring integrity reform — penalize extraction_failed exclu…
Apr 1, 2026
eddd1f1
fix: raw_cqs now subtracts forbidden-metric free passes, merge duplic…
Apr 1, 2026
dc14924
docs: update autonomous system docs for Consensus 018 scoring integri…
Apr 1, 2026
295de9a
feat: Consensus 019 Day 1 — formula purge + extraction_failed fixes
Apr 2, 2026
2b775dc
feat: remove extraction_failed exclusions for COGS(6) + OperatingInco…
Apr 2, 2026
8177da9
docs: update autonomous system docs for Consensus 019 Phase A
Apr 2, 2026
4e35194
feat: Phase B extraction fixes — classify remaining gaps correctly
Apr 2, 2026
1c8234f
docs: update autonomous docs for Phase B — CQS 0.8180, extraction_fai…
Apr 2, 2026
b365a35
fix: restore corrupted dollar amounts in ShortTermDebt notes
Apr 2, 2026
30163b0
docs: update autonomous docs for Phase C — formulas reverted, config …
Apr 2, 2026
6aec979
docs: add ADR-001 for separating standardization package from upstream
Apr 2, 2026
1c350ce
docs: add ADR-002 for EdgarTools data trust analysis and SEC API path…
Apr 2, 2026
83a1125
feat: Consensus 020 Scoring Integrity Sprint — CQS v2 baseline
Apr 2, 2026
76f49a0
feat: Phase 8+9 — metric importance tiers, company quality tiers, Sho…
Apr 3, 2026
02f044e
feat: Phase 10 — EF-CQS 0.8311 → 0.8684, known_divergences fix, impor…
Apr 3, 2026
f9e8640
docs: update autonomous docs for Phase 10 — EF-CQS 0.8684, M8.1+M9.1 …
Apr 3, 2026
4c88070
refactor: extract _build_known_divergences_by_ticker and _build_metri…
Apr 3, 2026
bab8015
feat: Phase 10 Steps 5-7 — closed-loop run, company quality tiers, ex…
Apr 3, 2026
b50e007
feat: Phase 11 — systematic gap investigation, EF-CQS 0.8684 → 0.8740
Apr 3, 2026
5799f86
feat: Consensus 021 — config collapse, industry expansion, TotalLiabi…
Apr 4, 2026
aa735b7
fix: TotalLiabilities composite formula — fix YAML nesting + add fall…
Apr 4, 2026
4615092
feat: expand industry map for 100-company cohort
Apr 4, 2026
59f26c9
fix: classify PFE OperatingIncome as reference mismatch, not extracti…
Apr 4, 2026
65e76e5
fix: MCD WeightedAverageSharesDiluted — add scale_factor for iXBRL
Apr 4, 2026
f58030e
fix: BAC/C ShareRepurchases — classify as reference mismatch
Apr 4, 2026
2ddbe88
feat: ShortTermDebt composite — remove divergences for 5 companies
Apr 4, 2026
a9ed123
feat: ShortTermDebt composite — add LongTermDebtCurrent fallback conc…
Apr 4, 2026
8b5601d
feat: onboard 50 new companies for 100-company cohort
Apr 4, 2026
9d40c99
feat: Phase D — 100-company cohort at EF-CQS 0.8544
Apr 4, 2026
cdeee55
docs: update autonomous docs for Phase 13 — 100-co EF-CQS 0.8544
Apr 4, 2026
72243b9
refactor: simplify Phase 13 code — deduplicate formula loops, consoli…
Apr 4, 2026
6ae9fae
docs: Consensus 022 — autonomous loop fate decided (Option B)
Apr 4, 2026
32344a3
docs: subscription-grade roadmap design spec
Apr 4, 2026
f24174c
feat: subscription-grade data contract — two-tier confidence signals,…
Apr 4, 2026
9292910
feat: Phase 14 — fix forbidden_by_ticker scoring bug, EF-CQS 0.8544→0…
Apr 5, 2026
74ebb6b
feat: NCI scope check + YAML migration — remove hardcoded Python cons…
Apr 5, 2026
9fcfe53
docs: expansion pipeline design spec — 3-skill agent-driven company c…
Apr 5, 2026
356c888
docs: add deep-consensus amendments to expansion pipeline spec
Apr 5, 2026
87fb271
feat: add industry JSON fallback in config_loader (Amendment 1)
Apr 5, 2026
53fa03a
feat: confidence_scorer with per-root-cause thresholds (Amendment 2)
Apr 5, 2026
179508b
feat: report_generator — cohort and escalation markdown reports
Apr 5, 2026
517cdf8
feat: redirect update_company_tiers to JSON overrides (Amendment 1)
Apr 5, 2026
dc2b591
feat: config_applier — single JSON write path for expansion pipeline
Apr 5, 2026
3bd523f
feat: expand_cohort — inner loop for company onboarding pipeline
Apr 5, 2026
8777933
feat: investigate_gaps — outer loop with priority queue and pattern d…
Apr 5, 2026
51c57d1
docs: add expansion pipeline skills to CLAUDE.md + output directories
Apr 5, 2026
3fa104b
fix: address code review findings across expansion pipeline
Apr 5, 2026
a6a4ee4
fix: normalize root-cause taxonomy so confidence scorer handles all 1…
Apr 5, 2026
8f59ba3
feat: enrich UnresolvedGapEntry with evidence fields for confidence s…
Apr 5, 2026
2ee4581
refactor: consolidate update_company_tiers() to use config_applier
Apr 5, 2026
151e9ea
test: add integration tests for full evidence-to-scoring path
Apr 5, 2026
0a27c71
feat: add override exclusion analyzer to identify promotable industry…
Apr 5, 2026
21f52d8
refactor: replace duplicate SIC range loading with cached config_load…
Apr 5, 2026
41bb2f2
feat: promote industry-level exclusions and remove redundant per-comp…
Apr 5, 2026
a6af132
cleanup: delete 20 empty override files and remove empty sub-dicts fr…
Apr 5, 2026
3c5df7d
refactor: simplify — cache SIC ranges, reuse config_loader, consolida…
Apr 5, 2026
36c4ba4
docs: add E2E calibration run artifacts (10-company cohort validation)
Apr 5, 2026
7857d03
fix: derive components_found/needed from extraction_evidence in expan…
Apr 5, 2026
1e717e7
feat: implement _try_deterministic_fix() with sign-error and concept-…
Apr 5, 2026
cd6c075
fix: use explicit None checks instead of truthiness in _try_determini…
Apr 5, 2026
9cd63ab
feat: add evidence sidecar JSON to preserve XBRL evidence across mark…
Apr 5, 2026
142b14e
fix: use timezone-aware UTC, add gap_type to sidecar key for uniqueness
Apr 5, 2026
8e2930e
feat: inject peer_count into evidence before confidence scoring
Apr 5, 2026
4c26afa
refactor: simplify Phase 3 code after review
Apr 5, 2026
20e1d4e
feat: replace binary graduated/needs_investigation with three-tier qu…
Apr 5, 2026
332e858
Merge feature/ai-concept-mapping: 14 phases of autonomous XBRL extrac…
Apr 5, 2026
6312bee
docs: add Run 022 post-merge CQS baseline to roadmap
Apr 5, 2026
898d898
docs: add 50-company expansion validation cohort report
Apr 5, 2026
f4babd1
refactor: remove redundant quality_tier field from CompanyResult
Apr 6, 2026
6038f33
chore: capture auto-fix artifacts from 50-company expansion validation
Apr 6, 2026
7de5637
fix: resolve PPE gaps for 15 cohort companies via known_divergences
Apr 6, 2026
08d9ba8
chore: ignore .worktrees directory for isolated workspaces
Apr 6, 2026
8486d70
feat(auto-eval): Sub-project A — strict EF-CQS + determinism CI gate
Apr 6, 2026
7f53663
refactor(auto-eval): freeze DETERMINISM_TEST_COHORT as tuple
Apr 6, 2026
2f206cc
test(auto-eval): pin EDGAR_DETERMINISM_DEGRADED parsing contract
Apr 6, 2026
2501f1e
test(auto-eval): cover legacy from_dict payload without ef_cqs_strict
Apr 6, 2026
e977da4
test(auto-eval): strengthen test_strict_zero_division_safe with mixed…
Apr 6, 2026
bcc3edd
test(auto-eval): determinism gate also asserts on ef_cqs_strict
Apr 6, 2026
42 changes: 42 additions & 0 deletions .agent/skills/bank-sector-test/SKILL.md
@@ -0,0 +1,42 @@
---
name: bank-sector-test
description: "Run standardized E2E validation for banking sector companies (GSIBs) against yfinance. Use when refining industry logic or verifying banking-specific fixes."
---

# Bank Sector E2E Test

## Overview
This skill runs a standardized End-to-End (E2E) validation test for a predefined list of major banking institutions. It verifies XBRL concept mappings against yfinance data for:
- **Banks**: BK, C, GS, JPM, MS, PNC, STT, USB, WFC
- **Scope**: 2 years of 10-Ks, 2 quarters of 10-Qs
- **Metrics**: All metrics defined in `metrics.yaml` (unless filtered)

## When to Use This Skill
- After modifying `industry_logic/` for banking extraction.
- After updating `industry_metrics.yaml`.
- Before merging changes that affect financial sector companies.
- To verify "Street View" logic (ShortTermDebt, CashAndEquivalents) for banks.

## How to Run

From the project root:

```bash
# Run standard bank test
// turbo
python .agent/skills/bank-sector-test/scripts/run_bank_e2e.py

# Run for specific metrics only
// turbo
python .agent/skills/bank-sector-test/scripts/run_bank_e2e.py --metrics ShortTermDebt,CashAndEquivalents
```
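
When the test is driven from another Python tool rather than typed by hand, the two invocations above can be assembled programmatically. The following is a minimal sketch, assuming only the script path and the `--metrics` flag shown above; the helper name `build_bank_e2e_command` is hypothetical and not part of the skill.

```python
import sys

# Script path as given in this skill (relative to the project root).
SCRIPT = ".agent/skills/bank-sector-test/scripts/run_bank_e2e.py"


def build_bank_e2e_command(metrics=None):
    """Return the argv list for the bank E2E script.

    Hypothetical helper: `metrics` is an optional list of metric names,
    joined with commas for the --metrics flag shown above.
    """
    cmd = [sys.executable, SCRIPT]
    if metrics:
        cmd += ["--metrics", ",".join(metrics)]
    return cmd
```

Passing the result to `subprocess.run(..., check=True)` from the project root would run the test and raise if the script exits with a non-zero status.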

## Reports
Reports are generated in: `sandbox/notes/008_bank_sector_expansion/reports/`

1. **`e2e_banks_YYYY-MM-DD_HHMM.json`**: Detailed failure log.
2. **`e2e_banks_YYYY-MM-DD_HHMM.md`**: Markdown summary with pass rates and top failure stats.
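
Because each run writes a new timestamped pair of files, it can help to locate the most recent report automatically. The sketch below assumes only the directory and the `e2e_banks_YYYY-MM-DD_HHMM` naming pattern stated above; the helper names `report_timestamp` and `latest_report` are hypothetical, and the internal structure of the JSON report is deliberately not assumed.

```python
import re
from datetime import datetime
from pathlib import Path

# Directory and filename pattern as documented in this skill.
REPORT_DIR = Path("sandbox/notes/008_bank_sector_expansion/reports")
REPORT_NAME = re.compile(r"^e2e_banks_(\d{4}-\d{2}-\d{2}_\d{4})\.(json|md)$")


def report_timestamp(filename):
    """Parse the YYYY-MM-DD_HHMM stamp from a report filename, or return None."""
    match = REPORT_NAME.match(filename)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d_%H%M")


def latest_report(report_dir=REPORT_DIR, suffix=".json"):
    """Return the newest report path with the given suffix, or None if there is none."""
    dated = []
    for path in Path(report_dir).glob(f"e2e_banks_*{suffix}"):
        stamp = report_timestamp(path.name)
        if stamp is not None:  # skip files that do not follow the pattern
            dated.append((stamp, path))
    if not dated:
        return None
    # Compare on the parsed timestamp only, not on the path.
    return max(dated, key=lambda item: item[0])[1]
```

Calling `latest_report(suffix=".md")` selects the Markdown summary instead of the JSON failure log.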

## Troubleshooting
- **Execution Path Error**: Check if `fallback_to_tree` is correctly set in `industry_metrics.yaml`.
- **High Variance in Debt**: Check strict deduction logic or "Economic View" consistency (e.g., NetRepos inclusion).