The parsing library behind parseland, plus its evaluation harness and dashboard. Code without evals isn't really done, so the evals live here.
```python
from parseland_lib.parse import parse_page
from parseland_lib.s3 import get_landing_page_from_r2

url = 'https://doi.org/10.1002/andp.19033150414'
lp = get_landing_page_from_r2(url)
response = parse_page(lp)
print(response)
```

```
parseland-lib/
├── parseland_lib/          Library source
├── tests/                  Library tests
├── eval/                   Offline eval harness (Python)
│   ├── parseland_eval/     Harness package
│   ├── runs/               Benchmark run JSON
│   └── Gold Standard For Parseland - Sheet1.{csv,json}
│                           100-row hand-annotated gold standard
└── dashboard/              Static dashboard (Vite + React + TS)
```
```bash
# 1. Eval harness
cd eval
/opt/homebrew/bin/python3.11 -m venv .venv && source .venv/bin/activate
pip install -e '.[dev]'
pip install -r ../requirements.txt
python -m parseland_eval fetch
python -m parseland_eval run --label baseline

# 2. Dashboard
cd ../dashboard
npm install
npm run dev   # → http://localhost:5173
```

Per-field, at three strictnesses (see eval/README.md for the matrix):
| Field | Strict | Soft | Fuzzy |
|---|---|---|---|
| Authors | last + first-initial | — | rapidfuzz ≥ 85 |
| Affiliations | exact | normalized | token_set_ratio ≥ 85 |
| Abstract | exact | Levenshtein / normalized | Levenshtein / raw |
| PDF URL | canonicalized exact | — | — |
Scores are aggregated per row, per publisher domain, and per failure mode (`paywall`, `login`, `bot_check`, `broken_url`, `no_abstract`, `non_article`, `image_only`, `clean`).
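To make the matrix concrete, here is a minimal sketch of two of its cells: strict author matching (last name plus first initial) scored as F1, and a Levenshtein-based similarity for abstracts. It is not the harness code. The function names are hypothetical, names are assumed to be in "First Last" order, and the similarity is assumed to be `1 - distance / max(len)`; the actual definitions live in `eval/`.

```python
from typing import Iterable


def author_key(full_name: str) -> tuple[str, str]:
    """Reduce a 'First Last' name to (last name, first initial), lowercased."""
    parts = full_name.strip().lower().split()
    if not parts:
        return ("", "")
    return (parts[-1], parts[0][0])


def authors_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """F1 over the sets of author keys; two empty lists count as a match."""
    pred_keys = {author_key(name) for name in predicted}
    gold_keys = {author_key(name) for name in gold}
    if not pred_keys and not gold_keys:
        return 1.0
    true_pos = len(pred_keys & gold_keys)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_keys)
    recall = true_pos / len(gold_keys)
    return 2 * precision * recall / (precision + recall)


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def abstract_similarity(predicted: str, gold: str) -> float:
    """Assumed definition: 1 - distance / length of the longer string."""
    longest = max(len(predicted), len(gold))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(predicted, gold) / longest
```

With these definitions, `authors_f1(["Albert Einstein"], ["A. Einstein"])` counts as a full match, because both names reduce to the same key.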
| Metric | Score |
|---|---|
| Authors F1 (soft) | 33.1% |
| Affiliations F1 (fuzzy) | 81.1% |
| Abstract Levenshtein | 26.4% |
| Abstract present rate | 27.0% |
| PDF URL accuracy | 12.0% |
Most of the Authors and Abstract loss comes from Elsevier `linkinghub.elsevier.com` redirects, Oxford "Thanks for visiting…" gates, and login-wall landing pages. See the heatmap and failure-mode bar in `dashboard/`.
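The per-domain and per-failure-mode breakdowns amount to a group-and-average over scored rows. The sketch below assumes each row is a dict with `url`, `failure_mode`, and `authors_f1` fields. Those field names, the helper names, and the sample rows are all hypothetical and are not taken from the harness or from any real run.

```python
from collections import defaultdict
from statistics import mean
from urllib.parse import urlparse


def publisher_domain(url: str) -> str:
    """Return the host part of a landing-page URL, or '' if it has none."""
    return urlparse(url).hostname or ""


def mean_score_by(rows, key_fn, score_field):
    """Group rows by key_fn(row) and average score_field within each group."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[key_fn(row)].append(row[score_field])
    return {key: mean(scores) for key, scores in buckets.items()}


# Made-up rows for illustration only.
rows = [
    {"url": "https://linkinghub.elsevier.com/retrieve/pii/X",
     "failure_mode": "paywall", "authors_f1": 0.0},
    {"url": "https://linkinghub.elsevier.com/retrieve/pii/Y",
     "failure_mode": "clean", "authors_f1": 1.0},
    {"url": "https://example.org/article/1",
     "failure_mode": "clean", "authors_f1": 0.5},
]

by_mode = mean_score_by(rows, lambda r: r["failure_mode"], "authors_f1")
by_domain = mean_score_by(rows, lambda r: publisher_domain(r["url"]), "authors_f1")
print(by_mode)
print(by_domain)
```

Passing the grouping key as a function keeps one helper usable for both breakdowns.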
- Establish baseline: run `python -m parseland_eval run --label baseline` once.
- Make a parser change in `parseland_lib/publisher/parsers/…`.
- Re-run: `python -m parseland_eval run --label fix-elsevier-2026-04-16`.
- Compare: the dashboard renders the delta vs the previous run; the trend chart accumulates.
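The compare step can also be approximated from the command line. The sketch below assumes each run file under `eval/runs/` is a JSON object with a top-level `"metrics"` mapping of metric name to score. That schema, the function names, and the example numbers are assumptions for illustration; check an actual run file before relying on it.

```python
import json
from pathlib import Path


def load_metrics(path: str) -> dict[str, float]:
    """Read the (assumed) top-level "metrics" mapping from a run JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))["metrics"]


def metric_deltas(previous: dict[str, float],
                  current: dict[str, float]) -> dict[str, float]:
    """Return current minus previous for every metric present in both runs."""
    shared = previous.keys() & current.keys()
    return {name: round(current[name] - previous[name], 4)
            for name in sorted(shared)}


# Hypothetical numbers, not real results.
baseline = {"authors_f1_soft": 0.50, "pdf_url_accuracy": 0.25}
candidate = {"authors_f1_soft": 0.75, "pdf_url_accuracy": 0.25}
print(metric_deltas(baseline, candidate))
```

A positive delta means the labelled run improved on the baseline for that metric.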