
Paper Methods

David Fillmore edited this page Mar 24, 2026 · 4 revisions

This page maps manuscript methods text to concrete implementation and analysis paths in the repository.

Paper Scope

Current default: a JOSS-oriented software paper supported by existing workflow examples, rather than a paper centered on the results of a single scientific campaign.

Software Description Map

| Topic | Primary paths |
| --- | --- |
| Lineage and differentiation from MELODIES-MONET | ../DAVINCI-MONET/README.md, ../DAVINCI-MONET/CLAUDE.md, ../DAVINCI-MONET/davinci_monet/pipeline/, ../DAVINCI-MONET/davinci_monet/pairing/ |
| CLI and user surface | ../DAVINCI-MONET/davinci_monet/cli/ |
| Config schema and migration | ../DAVINCI-MONET/davinci_monet/config/ |
| Pipeline stages | ../DAVINCI-MONET/davinci_monet/pipeline/ |
| Pairing strategies | ../DAVINCI-MONET/davinci_monet/pairing/ |
| Plotting system | ../DAVINCI-MONET/davinci_monet/plots/ |
| Statistics framework | ../DAVINCI-MONET/davinci_monet/stats/ |
| Tests and validation | ../DAVINCI-MONET/tests/, ../DAVINCI-MONET/pyproject.toml, ../DAVINCI-MONET/CHANGELOG.md |

Workflow Types To Describe

| Workflow | Why it matters | Evidence paths |
| --- | --- | --- |
| Paired model-vs-observation | Core atmospheric model evaluation use case | ../DAVINCI-MONET/analyses/asia-aq/ |
| Observation-only | Enables campaign analysis even without model fields | ../DAVINCI-MONET/analyses/dc3/ |
| Satellite swath-to-grid | Extends the framework beyond point and track observations | ../DAVINCI-MONET/analyses/modis-aod/, ../DAVINCI-MONET/davinci_monet/pairing/ |

Candidate Methods Subsections

For JOSS, treat these as working notes for the "Software design" and "State of the field" sections, not as a sign that the paper needs a long standalone methods chapter.

Lineage And Differentiation

This is the most important subsection for JOSS reviewers. The differentiation must be obvious on first read.

Questions to answer:

  • What changed enough from MELODIES-MONET to justify a separate software paper?
  • Which architectural changes are easy to explain and matter to users?
  • Which workflow capabilities are genuinely new or substantially cleaner in DAVINCI?

Concrete differentiators to draw from (see full comparison table in Paper Outline):

  1. Procedural → stage-based pipeline: MELODIES-MONET uses a manual .open_models().open_obs().pair_data() sequence. DAVINCI uses composable Stage Protocol objects with shared PipelineContext, enabling obs-only auto-detection and pluggable stage ordering.
  2. No types → typed runtime with stronger static checks: MELODIES-MONET has no type hints. DAVINCI ships a py.typed marker, is broadly type-annotated, enables several stricter mypy checks (check_untyped_defs, disallow_incomplete_defs, no_implicit_optional), and validates config against Pydantic schemas at parse time.
  3. Reader-centric → geometry-driven pairing: MELODIES-MONET pairs by data source. DAVINCI auto-detects geometry (POINT, TRACK, PROFILE, SWATH, GRID) from dataset structure and dispatches to specialized strategies.
  4. No obs-only mode → auto-detected obs-only pipeline: An entirely new capability, backed by an ObsPlotter base class with five dedicated renderers.
  5. Ad hoc satellite handling → unified swath-to-grid binning: Numba-accelerated, configurable grid modes (match_model, resolution, explicit).
  6. 1,000+ tests vs. limited coverage in MELODIES-MONET: Synthetic data fixtures cover plot types, pairing strategies, config parsing, and integration paths.
  7. 1,630x observation load speedup: Time filtering at file and data level, not available in MELODIES-MONET.
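The third differentiator (geometry-driven pairing) can be illustrated with a minimal sketch, assuming a dataset can be summarized as a mapping from each coordinate name to the dimension names it spans. Only the five DataGeometry member names come from this page; the detection rules, detect_geometry, choose_strategy, and the strategy descriptions are hypothetical and are not the logic in davinci_monet/pairing/.

```python
from __future__ import annotations

from enum import Enum
from typing import Mapping, Tuple

# Hypothetical dataset summary: coordinate name -> names of the dimensions it spans.
Coords = Mapping[str, Tuple[str, ...]]


class DataGeometry(Enum):
    POINT = "point"
    TRACK = "track"
    PROFILE = "profile"
    SWATH = "swath"
    GRID = "grid"


def detect_geometry(coords: Coords) -> DataGeometry:
    """Infer geometry from coordinate layout (illustrative rules, checked in order)."""
    lat_dims = tuple(coords["lat"])
    lon_dims = tuple(coords["lon"])
    if lat_dims == ("lat",) and lon_dims == ("lon",):
        return DataGeometry.GRID  # independent 1-D latitude and longitude axes
    if len(lat_dims) == 2:
        return DataGeometry.SWATH  # 2-D (scan line, pixel) coordinate arrays
    if "level" in coords:
        return DataGeometry.PROFILE  # vertical coordinate on non-gridded data
    if lat_dims == ("site",):
        return DataGeometry.POINT  # one fixed location per site
    if lat_dims == ("time",):
        return DataGeometry.TRACK  # location changes along the time axis
    raise ValueError(f"cannot infer geometry from lat dims {lat_dims}")


# Hypothetical registry: one pairing approach per geometry, selected by lookup.
PAIRING_STRATEGY = {
    DataGeometry.POINT: "sample model at each site location",
    DataGeometry.TRACK: "interpolate model in space and time along the track",
    DataGeometry.PROFILE: "interpolate model to each observed vertical level",
    DataGeometry.SWATH: "bin swath pixels to a grid before comparing",
    DataGeometry.GRID: "regrid model and observations to a common grid",
}


def choose_strategy(coords: Coords) -> str:
    """Dispatch on detected geometry rather than on the data source."""
    return PAIRING_STRATEGY[detect_geometry(coords)]


if __name__ == "__main__":
    aircraft = {"lat": ("time",), "lon": ("time",), "alt": ("time",)}
    print(detect_geometry(aircraft), "->", choose_strategy(aircraft))
```

Checking the rules in a fixed order keeps dispatch deterministic: a gridded field that also has a level axis is classified as GRID before the PROFILE rule is reached.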

Configuration And Execution

Questions to answer:

  • How does a YAML config map to runtime stages?
  • What parts of the workflow are explicit in config versus inferred by the pipeline? (Key: obs-only mode is auto-detected from config structure)
  • How does Pydantic validation prevent misconfiguration before runtime?
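One way to picture the config-to-stage mapping and obs-only auto-detection is the sketch below. It assumes the YAML has already been loaded into a plain dict (for example with yaml.safe_load) and uses hypothetical top-level keys (model, obs, plots) and hypothetical stage names. A frozen dataclass with hand-written checks stands in for the Pydantic schema so the example runs on the standard library alone; it shows the fail-before-runtime idea, not DAVINCI's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Dict, List

ALLOWED_KEYS = {"model", "obs", "plots"}  # hypothetical top-level sections


@dataclass(frozen=True)
class RunConfig:
    """Stdlib stand-in for a Pydantic config schema (hypothetical fields)."""

    obs: Dict[str, Any]
    models: Dict[str, Any] = field(default_factory=dict)
    plots: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def parse(cls, raw: Dict[str, Any]) -> RunConfig:
        # Reject problems at parse time, before any data file is opened.
        unknown = set(raw) - ALLOWED_KEYS
        if unknown:
            raise ValueError(f"unknown top-level keys: {sorted(unknown)}")
        if not raw.get("obs"):
            raise ValueError("config must define at least one observation source")
        return cls(obs=raw["obs"], models=raw.get("model") or {}, plots=raw.get("plots") or {})

    @property
    def obs_only(self) -> bool:
        # Inferred from structure: with no model section there is nothing to pair.
        return not self.models


def plan_stages(cfg: RunConfig) -> List[str]:
    """Map a validated config to an ordered list of hypothetical stage names."""
    stages = [] if cfg.obs_only else ["load_models"]
    stages.append("load_obs")
    if not cfg.obs_only:
        stages.append("pair")
    stages.append("statistics")
    if cfg.plots:
        stages.append("plotting")
    return stages


if __name__ == "__main__":
    cfg = RunConfig.parse({"obs": {"dc8": {}}, "plots": {"timeseries": {}}})
    print(cfg.obs_only, plan_stages(cfg))
```

In this sketch a misspelled section name such as "modle" fails at parse time instead of silently switching the run into obs-only mode.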

Geometry-Aware Pairing

Questions to answer:

  • What geometry types are handled directly by the runtime? (5 DataGeometry values: POINT, TRACK, PROFILE, SWATH, GRID)
  • How should swath-to-grid binning be explained without overstating it as a separate enum geometry? (Answer: it is a strategy that operates on SWATH-geometry input and produces GRID-geometry output. The runtime dispatches it through the MODIS L2 reader path, not through DataGeometry enum dispatch.)
  • What are the performance characteristics? (Numba JIT compilation; configurable grid modes: match_model, resolution, explicit)
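The core of swath-to-grid binning in a resolution-style grid mode can be sketched as an accumulate-then-average loop. This pure-Python version is an illustration, not SwathGridStrategy: the function name, mean-per-cell aggregation, NaN convention for missing pixels and empty cells, and half-open cell boundaries are all assumptions. A tight per-pixel loop like this is the kind of code Numba's @njit compiles well once the inputs are NumPy arrays, which is what a Numba-vs-pure-Python comparison would measure.

```python
from __future__ import annotations

import math
from typing import List, Sequence


def bin_swath_to_grid(
    lats: Sequence[float],
    lons: Sequence[float],
    values: Sequence[float],
    lat0: float,
    lon0: float,
    res: float,
    nlat: int,
    nlon: int,
) -> List[List[float]]:
    """Average swath pixels into an nlat x nlon grid with cell size res degrees.

    (lat0, lon0) is the south-west corner of cell [0][0]. Cells with no valid
    pixels are NaN. Pixels outside the grid or with NaN values are ignored.
    """
    sums = [[0.0] * nlon for _ in range(nlat)]
    counts = [[0] * nlon for _ in range(nlat)]
    for lat, lon, val in zip(lats, lons, values):
        if math.isnan(val):
            continue  # fill value or missing retrieval
        i = math.floor((lat - lat0) / res)
        j = math.floor((lon - lon0) / res)
        if 0 <= i < nlat and 0 <= j < nlon:
            sums[i][j] += val
            counts[i][j] += 1
    return [
        [sums[i][j] / counts[i][j] if counts[i][j] else math.nan for j in range(nlon)]
        for i in range(nlat)
    ]
```

Keeping sums and counts separate lets the same pass support other aggregations (for example, pixel counts per cell as a coverage diagnostic) without re-reading the swath.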

Statistics And Plotting

Questions to answer:

  • Which metrics are first-class runtime outputs? (27 paired metrics; fixed descriptive set for obs-only)
  • Which plots are used as paper evidence versus user-facing examples?
  • How do obs-only plotters differ from paired plotters architecturally? (ObsPlotter vs. BasePlotter base class)
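This page does not list the 27 paired metrics, so the sketch below only illustrates the shape of the two outputs using a few textbook definitions: mean bias (MB), normalized mean bias (NMB, in percent), root-mean-square error (RMSE), and Pearson correlation (R) for paired data, plus a descriptive summary for obs-only runs. Function names, dictionary keys, and the choice of metrics are assumptions, not the API of davinci_monet/stats/.

```python
from __future__ import annotations

import math
from statistics import fmean, median, pstdev
from typing import Dict, Sequence


def paired_metrics(obs: Sequence[float], mod: Sequence[float]) -> Dict[str, float]:
    """A few common model-vs-observation metrics (illustrative subset)."""
    if len(obs) != len(mod) or not obs:
        raise ValueError("need equal-length, non-empty paired samples")
    diffs = [m - o for o, m in zip(obs, mod)]
    mean_o, mean_m = fmean(obs), fmean(mod)
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(obs, mod))
    denom = math.sqrt(sum((o - mean_o) ** 2 for o in obs) * sum((m - mean_m) ** 2 for m in mod))
    total_obs = sum(obs)
    return {
        "N": len(obs),
        "MB": fmean(diffs),  # mean bias, model minus observation
        "NMB": 100.0 * sum(diffs) / total_obs if total_obs else math.nan,  # percent
        "RMSE": math.sqrt(fmean([d * d for d in diffs])),
        "R": cov / denom if denom else math.nan,  # undefined if either series is constant
    }


def obs_only_summary(obs: Sequence[float]) -> Dict[str, float]:
    """Descriptive statistics for obs-only runs, where there is no model to compare against."""
    if not obs:
        raise ValueError("need at least one observation")
    return {
        "N": len(obs),
        "mean": fmean(obs),
        "median": median(obs),
        "std": pstdev(obs),
        "min": min(obs),
        "max": max(obs),
    }
```

The split mirrors the paired/obs-only distinction above: the obs-only summary needs no second series, so it can run on any observation variable without a pairing stage.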

Performance

Treat this as optional for JOSS unless the numbers become central to the software contribution:

  • Time filtering at load: 1,630x speedup (163 s to 0.1 s for a 5-month file)
  • Numba-accelerated grid binning vs. a pure-Python baseline
  • Configurable Dask concurrency for pairing
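The two time-filtering levels can be sketched as follows, assuming observation file names carry a single YYYYMMDD date stamp and that loaded records are (timestamp, value) pairs; both conventions and the function names are hypothetical. The file-level pass discards files from the name alone, without opening them; the data-level pass trims what remains to the analysis window. It shows the selection logic only and makes no attempt to reproduce the 1,630x figure.

```python
from __future__ import annotations

import re
from datetime import datetime
from typing import Iterable, List, Tuple

# Hypothetical naming convention: one YYYYMMDD stamp somewhere in each file name.
_DATE_STAMP = re.compile(r"(\d{8})")


def select_files(paths: Iterable[str], start: datetime, end: datetime) -> List[str]:
    """File-level filter: keep files whose date stamp falls within [start, end]."""
    keep = []
    for path in paths:
        match = _DATE_STAMP.search(path)
        if match is None:
            keep.append(path)  # date unknown from the name; let the data-level filter decide
            continue
        day = datetime.strptime(match.group(1), "%Y%m%d").date()
        if start.date() <= day <= end.date():
            keep.append(path)
    return keep


def filter_records(
    records: Iterable[Tuple[datetime, float]], start: datetime, end: datetime
) -> List[Tuple[datetime, float]]:
    """Data-level filter: keep only records timestamped within [start, end]."""
    return [rec for rec in records if start <= rec[0] <= end]
```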

Case Study Design

Questions to answer:

  • Why are ASIA-AQ, DC3, and MODIS AOD enough to support the paper narrative? (Each exercises a distinct workflow type and geometry coverage without redundancy)
  • Which workflows do they cover?
  • Which one or two examples are enough for the compact JOSS evidence package?

| Case study | Workflow | Geometries | Novel aspect |
| --- | --- | --- | --- |
| ASIA-AQ | Paired | POINT, TRACK | Multi-observation breadth |
| DC3 | Obs-only | TRACK, GRID (LMA) | No-model pipeline |
| MODIS-AOD | Paired + swath-to-grid | SWATH -> GRID | Satellite binning |

Validation And QA

Use this section to gather concise methods language about software quality:

  • Test coverage and test categories in ../DAVINCI-MONET/tests/
  • Validation evidence in ../DAVINCI-MONET/tests/, ../DAVINCI-MONET/pyproject.toml, and checked-in analysis workflows
  • Known limitations and gaps that should be disclosed honestly

Methods Risks

  • Avoid describing not-yet-implemented behavior as production capability.
  • Keep observation-only statistics wording accurate: computed and plotted, but not currently exported to CSV by save_results.
  • Be explicit about which figures come from checked-in artifacts versus regenerated outputs.
  • SwathGridStrategy is not dispatched via the DataGeometry enum, so do not describe it as a sixth geometry type. It is a strategy used by the MODIS reader pipeline.
  • Performance numbers (1,630x, etc.) need to be reproducible on a named machine with a frozen commit. Record the benchmark setup.
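A small harness like the one below is one way to record the benchmark setup alongside each number. The function names and record fields are hypothetical; the commit hash is passed in explicitly (for example, the output of git rev-parse HEAD) rather than detected, and the record keeps the minimum and median of several repeats so a single noisy run does not set the headline figure.

```python
from __future__ import annotations

import json
import platform
import statistics
import time
from typing import Any, Callable, Dict


def benchmark(
    func: Callable[[], object], *, label: str, commit: str, repeats: int = 5
) -> Dict[str, Any]:
    """Time func() `repeats` times and bundle the timings with environment details."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return {
        "label": label,
        "commit": commit,  # frozen commit the numbers belong to
        "machine": platform.node(),
        "platform": platform.platform(),
        "processor": platform.processor(),
        "python": platform.python_version(),
        "repeats": repeats,
        "min_s": min(timings),
        "median_s": statistics.median(timings),
    }


def speedup(baseline: Dict[str, Any], candidate: Dict[str, Any]) -> float:
    """Ratio of median runtimes, e.g. 163 s / 0.1 s gives about 1,630x."""
    return baseline["median_s"] / candidate["median_s"]


if __name__ == "__main__":
    record = benchmark(lambda: sum(range(100_000)), label="toy-sum", commit="<commit-hash>")
    print(json.dumps(record, indent=2))
```

Writing each record to a JSON file next to the manuscript figures gives reviewers the machine, Python version, and commit for every reported speedup.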

Related pages:
