Configuration

David Fillmore edited this page Mar 24, 2026 · 12 revisions

Configuration Guide

DAVINCI control files are YAML documents that drive the full pipeline.

The current runtime accepts both the older plots.*.data references and the newer explicit pairs blocks. The checked-in analysis configs under analyses/ use pairs.

Two implementation details matter for accuracy:

  • pairs is a runtime-supported top-level section preserved as an extra field on the loaded config object. It is used heavily by the current analysis configs and plotting stage, even though it is not yet a dedicated typed field on MonetConfig.
  • In stats, the schema-native key is stat_list, while the pipeline runtime also accepts metrics for compatibility.

Top-level sections

| Section  | Purpose                                                             |
|----------|---------------------------------------------------------------------|
| analysis | Time window, output/log paths, style, debug                         |
| model    | Model datasets, variable mapping, variable settings                 |
| obs      | Observation datasets and preprocessing                              |
| pairs    | Runtime-supported named model/observation variable pair definitions |
| plots    | Plot jobs, each referencing one or more pairs                       |
| stats    | Statistics metric configuration                                     |

Minimal structure

analysis:
  start_time: "2024-07-01"
  end_time: "2024-07-31"
  output_dir: ./output

model:
  cmaq:
    mod_type: cmaq
    files: /data/model/*.nc
    mapping:
      airnow:
        o3: O3

obs:
  airnow:
    obs_type: pt_sfc
    filename: /data/obs/airnow.nc

pairs:
  cmaq_airnow_o3:
    model: cmaq
    obs: airnow
    variable:
      model_var: O3
      obs_var: o3

plots:
  o3_scatter:
    type: scatter
    pairs: [cmaq_airnow_o3]

stats:
  metrics: [N, MB, RMSE, R, NMB, NME, IOA]

analysis

Common keys:

  • start_time, end_time: multiple date formats are accepted
  • output_dir: directory for figures and CSV output
  • log_dir: directory for pipeline markdown logs (pipeline_YYYYMMDD_HHMMSS.md)
  • debug: verbose timing/progress behavior
  • style: plotting style block

analysis:
  start_time: "2024-02-01"
  end_time: "2024-02-29"
  output_dir: ./output
  log_dir: ./logs
  debug: false
  style:
    theme: ncar          # ncar | default
    context: default     # default | presentation | publication
  city_labels:
    Seoul: [37.5, 127.0]
    Tokyo: [35.7, 139.7]
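
The log_dir entry above names its files pipeline_YYYYMMDD_HHMMSS.md. As a minimal sketch of that naming pattern, assuming the timestamp is the run start time, a file name could be built as follows. The helper log_path is hypothetical and is not part of the DAVINCI API.

```python
from datetime import datetime
from pathlib import Path


def log_path(log_dir: str, started: datetime) -> Path:
    """Build a pipeline_YYYYMMDD_HHMMSS.md path inside log_dir (hypothetical helper)."""
    return Path(log_dir) / started.strftime("pipeline_%Y%m%d_%H%M%S.md")


example = log_path("./logs", datetime(2024, 2, 1, 9, 30, 5))
print(example.name)  # pipeline_20240201_093005.md
```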

model

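Each model entry needs file paths and a mapping that ties observation variable names to model variable names. In the example below, the airnow variable ozone maps to the model variable O3.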

model:
  cesm_asiaq:
    mod_type: generic
    files: /path/to/model/*.nc
    radius_of_influence: 15000
    mapping:
      airnow:
        ozone: O3
        pm25: PM25
    variables:
      O3:
        unit_scale: 1.0e9
        units: ppb
        vmin_plot: 0
        vmax_plot: 100
        vdiff_plot: 30

Supported model readers include cmaq, wrfchem, ufs, rrfs, cesm_fv, cesm_se, and generic.
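
To illustrate the variables block, here is a minimal sketch of what unit_scale might do, assuming it is a plain multiplicative factor applied to the raw model values (for example mol/mol to ppb for O3). The function name and the sample numbers are illustrative only; the actual conversion step inside the pipeline may differ.

```python
import math

# Settings copied from the model.variables.O3 example above.
o3_settings = {"unit_scale": 1.0e9, "units": "ppb"}


def apply_unit_scale(values, settings):
    """Multiply raw values by unit_scale (assumed multiplicative; defaults to 1.0)."""
    scale = settings.get("unit_scale", 1.0)
    return [value * scale for value in values]


raw_mixing_ratio = [4.2e-8, 5.5e-8]  # mol/mol, made-up sample values
scaled = apply_unit_scale(raw_mixing_ratio, o3_settings)
print([round(value, 3) for value in scaled], o3_settings["units"])
```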

obs

Observation entries provide source files and optional preprocessing.

obs:
  airnow:
    obs_type: pt_sfc
    filename: /path/to/airnow.nc
    variables:
      ozone:
        obs_min: 0
        obs_max: 200

  pandora:
    obs_type: pt_sfc
    filename: /path/to/pandora_no2.nc
    resample: "h"
    min_obs_count: 3
    track_obs_count: true

Common obs_type values used in this codebase include pt_sfc, aircraft, sonde, lma, sat_swath_clm, and sat_grid_clm.
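
The preprocessing keys in the pandora and airnow examples can be pictured with a small standard-library sketch. It assumes that obs_min/obs_max discard out-of-range values, that resample: "h" means hourly averaging (as in the pandas frequency alias), that min_obs_count is the minimum number of valid samples per hour, and that track_obs_count keeps the per-hour sample count. These semantics are inferred from the key names, not confirmed against the readers.

```python
from collections import defaultdict
from datetime import datetime


def in_valid_range(value, obs_min=None, obs_max=None):
    """True when value lies inside the optional [obs_min, obs_max] bounds."""
    if obs_min is not None and value < obs_min:
        return False
    if obs_max is not None and value > obs_max:
        return False
    return True


def hourly_means(samples, min_obs_count=1, obs_min=None, obs_max=None):
    """Average (time, value) samples per hour, keeping hours with enough valid samples.

    Returns {hour_start: (mean, count)}.
    """
    bins = defaultdict(list)
    for when, value in samples:
        if in_valid_range(value, obs_min, obs_max):
            bins[when.replace(minute=0, second=0, microsecond=0)].append(value)
    return {
        hour: (sum(vals) / len(vals), len(vals))
        for hour, vals in sorted(bins.items())
        if len(vals) >= min_obs_count
    }


samples = [
    (datetime(2024, 7, 1, 0, 5), 40.0),
    (datetime(2024, 7, 1, 0, 25), 44.0),
    (datetime(2024, 7, 1, 0, 45), 48.0),
    (datetime(2024, 7, 1, 0, 55), 999.0),  # above obs_max, dropped
    (datetime(2024, 7, 1, 1, 10), 50.0),   # only one sample in this hour
]
result = hourly_means(samples, min_obs_count=3, obs_min=0, obs_max=200)
print(result)
```

With min_obs_count=3, only the 00:00 hour survives in this sample, because the 01:00 hour holds a single valid value.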

Satellite observations

For satellite L2 swath data, use obs_type: sat_swath_clm with a sat_type field to specify the product:

obs:
  terra_modis:
    obs_type: sat_swath_clm
    sat_type: modis_l2
    filename: /path/to/MOD04_L2.*.hdf
    grid_source: my_model        # model whose grid to bin onto
    time_resolution: "1D"        # temporal binning frequency
    load_binned: true            # load pre-binned file if available
    save_binned: true            # save binned result for reuse
    binned_file: /path/to/binned.nc
    variables:
      AOD_550_Dark_Target_Deep_Blue_Combined:
        source_name: AOD_550_Dark_Target_Deep_Blue_Combined
        rename: AOD_550
        obs_min: 0.0
        obs_max: 10.0
        unit_scale: 0.001

Supported sat_type values used by the current readers include modis_l2, tropomi, and tempo_l2_no2, plus the generic L2/L3 reader paths.
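
The load_binned, save_binned, and binned_file keys describe a cache for the binning step. A minimal sketch of that control flow follows, assuming a missing binned_file falls back to re-binning. The callables bin_swaths, load, and save are hypothetical stand-ins for the real binning and file I/O, and JSON is used here only to keep the example self-contained.

```python
import json
import tempfile
from pathlib import Path


def get_binned(obs_cfg, bin_swaths, load, save):
    """Return binned data, reusing binned_file when load_binned allows it."""
    binned_file = obs_cfg.get("binned_file")
    if obs_cfg.get("load_binned") and binned_file and Path(binned_file).exists():
        return load(binned_file)
    data = bin_swaths()
    if obs_cfg.get("save_binned") and binned_file:
        save(data, binned_file)
    return data


calls = []


def fake_bin():
    """Stand-in for the expensive swath binning; records each invocation."""
    calls.append("binned")
    return {"AOD_550": [0.12, 0.30]}


def save_json(data, path):
    Path(path).write_text(json.dumps(data))


def load_json(path):
    return json.loads(Path(path).read_text())


with tempfile.TemporaryDirectory() as tmp:
    cfg = {
        "load_binned": True,
        "save_binned": True,
        "binned_file": str(Path(tmp) / "binned.json"),
    }
    first = get_binned(cfg, fake_bin, load_json, save_json)   # bins, then saves
    second = get_binned(cfg, fake_bin, load_json, save_json)  # loads the saved file
```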

Lightning observations (LMA)

obs:
  oklma:
    obs_type: lma
    filename: /path/to/oklma/*.nc
    variables:
      flash_extent_density:
        obs_min: 0
        units: "flashes/grid cell"

Supported LMA networks: OKLMA, COLMA, NALMA (auto-detected from filename).
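
How the reader detects the network is not described here. One plausible approach, shown purely as an assumption, is a case-insensitive search for the network name anywhere in the configured path. The function detect_lma_network is hypothetical.

```python
from typing import Optional

KNOWN_NETWORKS = ("OKLMA", "COLMA", "NALMA")  # networks listed above


def detect_lma_network(filename: str) -> Optional[str]:
    """Return the first known network name found in the path, else None."""
    upper = filename.upper()
    for network in KNOWN_NETWORKS:
        if network in upper:
            return network
    return None


print(detect_lma_network("/path/to/oklma/*.nc"))  # OKLMA
```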

pairs

pairs define exactly which model/obs variable combinations downstream plots use.

pairs:
  cesm_airnow_o3:
    model: cesm_asiaq
    obs: airnow
    variable:
      model_var: O3
      obs_var: ozone
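
In the examples on this page, each pair's obs_var and model_var agree with the corresponding entry under model.<name>.mapping.<obs>. A sketch of a sanity check for that agreement is shown below. It assumes the mapping is keyed by observation variable name, and it is not known whether the runtime itself enforces this; pair_matches_mapping is a hypothetical helper operating on an already-loaded config dictionary.

```python
def pair_matches_mapping(config, pair_name):
    """Compare a pair's variables with model.<name>.mapping.<obs> (assumed obs_var -> model_var)."""
    pair = config["pairs"][pair_name]
    mapping = config["model"][pair["model"]].get("mapping", {})
    obs_map = mapping.get(pair["obs"], {})
    variable = pair["variable"]
    return obs_map.get(variable["obs_var"]) == variable["model_var"]


config = {
    "model": {"cesm_asiaq": {"mapping": {"airnow": {"ozone": "O3", "pm25": "PM25"}}}},
    "obs": {"airnow": {}},
    "pairs": {
        "cesm_airnow_o3": {
            "model": "cesm_asiaq",
            "obs": "airnow",
            "variable": {"model_var": "O3", "obs_var": "ozone"},
        }
    },
}
print(pair_matches_mapping(config, "cesm_airnow_o3"))  # True
```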

plots

Each plot references one or more pair names.

plots:
  o3_timeseries:
    type: timeseries
    pairs: [cesm_airnow_o3]
    title: "O3 Time Series"
    show_uncertainty: true
    aggregate_dim: site

  o3_spatial_bias:
    type: spatial_bias
    pairs: [cesm_airnow_o3]
    title: "O3 Spatial Bias"
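
Because plots refer to pairs by name, and pairs refer to model and obs entries by name, a typo in any of these names is easy to miss. The sketch below walks a loaded config dictionary and lists references that do not resolve. It is an illustrative check only, not the logic behind davinci-monet validate, and find_unknown_references is a hypothetical name.

```python
def find_unknown_references(config):
    """List problems for plot -> pair and pair -> model/obs name references."""
    problems = []
    pairs = config.get("pairs", {})
    for plot_name, plot in config.get("plots", {}).items():
        for pair_name in plot.get("pairs", []):
            if pair_name not in pairs:
                problems.append(f"plot {plot_name!r}: unknown pair {pair_name!r}")
    for pair_name, pair in pairs.items():
        if pair.get("model") not in config.get("model", {}):
            problems.append(f"pair {pair_name!r}: unknown model {pair.get('model')!r}")
        if pair.get("obs") not in config.get("obs", {}):
            problems.append(f"pair {pair_name!r}: unknown obs {pair.get('obs')!r}")
    return problems


config = {
    "model": {"cesm_asiaq": {}},
    "obs": {"airnow": {}},
    "pairs": {"cesm_airnow_o3": {"model": "cesm_asiaq", "obs": "airnow"}},
    "plots": {
        "o3_timeseries": {"type": "timeseries", "pairs": ["cesm_airnow_o3"]},
        # Deliberate typo: "03" (zero three) instead of "o3".
        "o3_spatial_bias": {"type": "spatial_bias", "pairs": ["cesm_airnow_03"]},
    },
}
problems = find_unknown_references(config)
print(problems)
```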

Available plot types registered in the package include:

Paired (model vs. obs):

  • timeseries, diurnal, scatter, taylor, boxplot
  • spatial_bias, spatial_overlay, spatial_distribution
  • curtain, scorecard
  • site_timeseries, flight_timeseries, per_site_timeseries, track_map_3d

Observation-only (no model required):

  • obs_timeseries, obs_vertical_profile, obs_flight_track, obs_histogram, obs_lma_density

Observation-only plots take an obs key (a single observation name) and a variable key instead of pairs:

plots:
  dc8_track:
    type: obs_flight_track
    obs: dc8
    variable: "O3"
    title: "DC-8 Flight Tracks: O3"

stats

For paired model-vs-observation runs, the current runtime accepts the metric list under either the metrics key or the stat_list key.

stats:
  metrics: [N, MO, MP, MB, RMSE, R, R2, NMB, NME, IOA]
  round_output: 3
  output_table: true
  per_flight: true

Use stat_list if you want to follow schema naming; metrics is accepted by the pipeline runtime for compatibility.
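
A sketch of how a consumer of the stats block could read either key is shown below. Which key wins when both are present is not stated on this page; preferring stat_list here is an assumption of the example, and requested_metrics is a hypothetical helper.

```python
def requested_metrics(stats_cfg):
    """Return metric names from stat_list, falling back to metrics.

    Precedence when both keys are present is an assumption of this sketch.
    """
    if "stat_list" in stats_cfg:
        return list(stats_cfg["stat_list"])
    return list(stats_cfg.get("metrics", []))


print(requested_metrics({"metrics": ["N", "MB", "RMSE"]}))    # ['N', 'MB', 'RMSE']
print(requested_metrics({"stat_list": ["N", "MB", "RMSE"]}))  # ['N', 'MB', 'RMSE']
```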

Observation-only runs currently compute a fixed descriptive set (N, mean, median, std, min, max, p10, p25, p75, p90) in ObsStatisticsStage; the stats block is not yet used to customize those metrics.
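
For orientation, the fixed descriptive set can be reproduced with the Python standard library. This is a sketch, not the ObsStatisticsStage code: the percentile interpolation method and the choice of population standard deviation are assumptions and may differ from what the stage computes.

```python
import statistics


def describe(values):
    """Compute N, mean, median, std, min, max, and p10/p25/p75/p90 for a sample."""
    data = sorted(values)
    # 99 cut points; index k - 1 holds the k-th percentile.
    pct = statistics.quantiles(data, n=100, method="inclusive")
    return {
        "N": len(data),
        "mean": statistics.fmean(data),
        "median": statistics.median(data),
        "std": statistics.pstdev(data),  # population std; an assumption
        "min": data[0],
        "max": data[-1],
        "p10": pct[9],
        "p25": pct[24],
        "p75": pct[74],
        "p90": pct[89],
    }


summary = describe(range(1, 102))  # the integers 1..101
print(summary["N"], summary["mean"], summary["p10"], summary["p90"])
```

Note that statistics.quantiles needs at least two data points on most Python versions, so a real implementation would have to handle very small samples separately.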

Environment variables in YAML

String values support ${VAR} expansion.

analysis:
  output_dir: ${ASIA_AQ_ANALYSIS}/output

model:
  cesm:
    files: ${ASIA_AQ_DATA}/CAM/*.nc
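
The effect of ${VAR} expansion can be sketched with os.path.expandvars applied to every string in a loaded config. This mirrors the behavior described above but is not necessarily how DAVINCI implements it. In particular, expandvars also expands the bare $VAR form and leaves unset variables untouched, which the pipeline may or may not do. The directory value below is made up.

```python
import os


def expand_env(node):
    """Recursively expand ${VAR} references in every string of a loaded config."""
    if isinstance(node, str):
        return os.path.expandvars(node)
    if isinstance(node, dict):
        return {key: expand_env(value) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_env(item) for item in node]
    return node


os.environ["ASIA_AQ_DATA"] = "/data/asia_aq"  # illustrative value only
config = {"model": {"cesm": {"files": "${ASIA_AQ_DATA}/CAM/*.nc"}}}
expanded = expand_env(config)
print(expanded["model"]["cesm"]["files"])  # /data/asia_aq/CAM/*.nc
```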

Observation-only mode

When the config has no model section, the pipeline automatically switches to observation-only mode. The current stage order is load_observations -> obs_statistics -> obs_plotting -> save_results.

Plots are written to analysis.output_dir as usual. Observation-only descriptive statistics are computed, but the current save_results stage does not export them to CSV.

# No model section — triggers obs-only pipeline
analysis:
  start_time: "2012-05-18"
  end_time: "2012-06-22"
  output_dir: ./output

obs:
  dc8:
    obs_type: aircraft
    filename: /path/to/dc8_merge_*.ict
    variables:
      "O3": { source_name: O3_ESRL }

plots:
  dc8_profile:
    type: obs_vertical_profile
    obs: dc8
    variable: "O3"
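
The mode switch described in this section can be summarized in a few lines. The sketch assumes the test is simply whether a top-level model key exists in the loaded config; how an empty model section is treated is not covered here. The stage names are taken from the text above, and is_obs_only is a hypothetical helper.

```python
# Stage order for observation-only runs, as listed in this section.
OBS_ONLY_STAGES = ["load_observations", "obs_statistics", "obs_plotting", "save_results"]


def is_obs_only(config):
    """True when the loaded config has no top-level model section."""
    return "model" not in config


config = {
    "analysis": {"start_time": "2012-05-18", "end_time": "2012-06-22"},
    "obs": {"dc8": {"obs_type": "aircraft"}},
    "plots": {"dc8_profile": {"type": "obs_vertical_profile", "obs": "dc8", "variable": "O3"}},
}
if is_obs_only(config):
    print(" -> ".join(OBS_ONLY_STAGES))
```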

Validate before running

davinci-monet validate config.yaml --show
davinci-monet run config.yaml
