Problem
`pivot metrics show` fails to parse metrics YAML files that contain Python-specific YAML tags. This happens when stage functions annotate outputs as `pivot.metric` using types like `pd.DataFrame` or custom dataclasses.
Affected stages in eval-pipeline
| Stage | Type | Error |
|---|---|---|
| `ga_paper/generate_agent_summary:summary` | `pd.DataFrame` | `could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:pandas.core.frame.DataFrame'` |
| `eval_pipeline_horizon/calculate_baseline_statistics` | `SourceStats` dataclass | `could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:eval_pipeline.horizon.calculate_baseline_statistics.SourceStats'` |
| `eval_pipeline_horizon/generate_agent_summary:agent_summary` | `pd.DataFrame` | Same `DataFrame` error as the first row |
| `mr_time_horizon_1_0/generate_agent_summary:agent_summary` | `pd.DataFrame` | Same `DataFrame` error as the first row |
| `mr_time_horizon_1_1/generate_agent_summary:agent_summary` | `pd.DataFrame` | Same `DataFrame` error as the first row |
Current behavior
Pivot writes these metrics using `yaml.dump()` with default settings, which embeds Python-specific tags (`!!python/object:...`). When `pivot metrics show` later reads them with safe YAML loading, it can't parse the tags and emits a warning:
```
Failed to parse metrics from ga_paper/generate_agent_summary:summary: Failed to parse .../summary.yaml: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:pandas.core.frame.DataFrame'
```
Beyond that warning, the metric is simply dropped from the output.
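A minimal sketch of the failure, assuming PyYAML and using a hypothetical two-field stand-in for `SourceStats` (the real fields are not listed in this issue). With default settings, `yaml.dump()` tags the dataclass as `!!python/object:...`, and `yaml.safe_load()` rejects that tag with the same "could not determine a constructor" error seen in the warning.

```python
import dataclasses

import yaml  # PyYAML


@dataclasses.dataclass
class SourceStats:
    """Hypothetical stand-in for the real SourceStats dataclass."""

    count: int
    mean: float


def dump_then_safe_load(value):
    """Write with yaml.dump() defaults, then read back with yaml.safe_load().

    Returns (yaml_text, loaded_value, error); error is None on success.
    """
    text = yaml.dump(value)
    try:
        return text, yaml.safe_load(text), None
    except yaml.constructor.ConstructorError as exc:
        return text, None, exc


text, loaded, error = dump_then_safe_load(SourceStats(count=3, mean=1.5))
print(text)   # first line is a !!python/object:...SourceStats tag
print(error)  # the error message names the unsupported python/object tag
```

Plain dicts, lists and scalars round-trip through the same function without error, which is why converting values before writing avoids the problem.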
Expected behavior
Pivot should either:
1. Serialize at write time: when a metric value is a DataFrame or dataclass, automatically convert it to a YAML-safe dict before writing (e.g., `df.to_dict()`, `dataclasses.asdict()`).
2. Deserialize at read time: use a restricted set of constructors that can handle common types such as DataFrames and dataclasses.
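The read-time option could look roughly like this, assuming PyYAML. `MetricsLoader`, `allow_object_tag` and `load_metrics` are hypothetical names, not existing Pivot code. The loader stays on `SafeLoader` and adds constructors only for explicitly allowlisted `!!python/object:` tags, returning the dumped attributes as a plain dict instead of importing the class. This suits simple dataclasses like `SourceStats`, whose dumped state is a flat mapping; a DataFrame's dumped state likely nests further Python-specific tags for pandas and NumPy internals, so it would need considerably more handling.

```python
import yaml  # PyYAML

_OBJECT_TAG_PREFIX = "tag:yaml.org,2002:python/object:"


class MetricsLoader(yaml.SafeLoader):
    """SafeLoader that additionally accepts allowlisted python/object tags."""


def _object_state_as_dict(loader, node):
    # A plain object's dumped state is a YAML mapping of its attributes.
    # Build an ordinary dict from it; the original class is never imported.
    return loader.construct_mapping(node, deep=True)


def allow_object_tag(dotted_name):
    """Allowlist one class, given as 'package.module.ClassName'."""
    # add_constructor() on the subclass leaves yaml.SafeLoader itself untouched.
    MetricsLoader.add_constructor(_OBJECT_TAG_PREFIX + dotted_name, _object_state_as_dict)


def load_metrics(text):
    """Parse metrics YAML; unknown python/* tags still raise ConstructorError."""
    return yaml.load(text, Loader=MetricsLoader)
```

For example, calling `allow_object_tag("eval_pipeline.horizon.calculate_baseline_statistics.SourceStats")` once would let `load_metrics()` read that stage's metric as a plain dict, while any other `python/*` tag is still rejected.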
Option 1 is probably better: it keeps the YAML files human-readable and avoids the security concerns of `yaml.unsafe_load`.
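The write-time option could be sketched as follows, assuming Pivot's writer can pass each metric value through a conversion step before dumping. `to_yaml_safe` and `_safe_key` are hypothetical helpers, not existing Pivot functions. Dataclasses go through `dataclasses.asdict()`, and DataFrame-like objects are detected by duck typing (a `to_dict()` method), so the sketch has no pandas dependency. NumPy scalars are unwrapped via `tolist()`, since `np.float64` subclasses `float` but is not a type `yaml.safe_dump()` can represent.

```python
import dataclasses
import datetime

# Exact scalar types yaml.safe_dump() can write without Python-specific tags.
_SAFE_SCALARS = (type(None), bool, int, float, str, datetime.date, datetime.datetime)


def _safe_key(key):
    """Mapping keys must stay hashable, so reduce them to simple scalars."""
    if hasattr(key, "tolist"):  # e.g. numpy integer index labels
        key = key.tolist()
    return key if type(key) in _SAFE_SCALARS else str(key)


def to_yaml_safe(value):
    """Recursively convert a metric value into plain dicts, lists and scalars."""
    # Exact type check, so subclasses such as numpy.float64 fall through below.
    if type(value) in _SAFE_SCALARS:
        return value
    if dataclasses.is_dataclass(value) and not isinstance(value, type):
        return to_yaml_safe(dataclasses.asdict(value))
    if isinstance(value, dict):
        return {_safe_key(k): to_yaml_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [to_yaml_safe(v) for v in value]
    if hasattr(value, "to_dict"):  # pd.DataFrame / pd.Series, via duck typing
        return to_yaml_safe(value.to_dict())
    if hasattr(value, "tolist"):  # numpy scalars and arrays
        return to_yaml_safe(value.tolist())
    raise TypeError(f"cannot convert {type(value).__name__} to a YAML-safe value")
```

The result can then be written with `yaml.safe_dump()`, which raises instead of emitting `!!python/...` tags. An unsupported type would therefore fail loudly at stage execution time rather than surfacing later as a parse warning in `pivot metrics show`. Note that `df.to_dict()` with its default orient produces `{column: {index: value}}`.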
Reproduction
```bash
cd eval_pipeline/difficulty
pivot metrics show --all
# Look for "Failed to parse metrics" warnings
```
Context
Found during E2E smoke testing of the eval-pipeline (199 stages across 7 sub-pipelines). The metrics write correctly at stage execution time but can't be read back by the CLI.