generate_presentation: image embedding in slides via ppt-rs::add_image #3209

@oxoxDev

Summary

generate_presentation (#2778, PR #3016) currently emits text-only slides: titles, bodies, bullets, and speaker notes. Real decks usually include images such as charts, screenshots, and photo placeholders. ppt-rs already supports image embedding via Image::from_bytes(...).add_to(...) on the slide builder. Wire it through so the agent can produce a deck with images alongside text.

Problem

Users asking the agent to "make a deck with the Q3 revenue chart" or "include a screenshot of the dashboard" get a text-only deck back today. The tool spec on SlideSpec has no images field, the Rust engine doesn't accept image payloads, and there's no path from a chat-uploaded image or a Composio-fetched file to a slide-embedded asset.

Solution (optional)

  1. Extend SlideSpec (src/openhuman/tools/impl/presentation/types.rs) with an optional images: Vec<SlideImage> field. SlideImage carries either an in-workspace artifact id, a [FILE:path] marker (per #2777, "Extend multimodal input to accept document and file attachments beyond images"), or a remote URL with a size cap.
  2. Resolve image references in the engine (engine.rs build_slides): walk each SlideImage, fetch bytes (with a deadline-bounded HTTP client for URLs, tokio::fs::read for FILE markers, artifacts::store::get_artifact_bytes for artifact ids), decode + validate (PNG/JPEG/WebP only, max 5 MB), and hand off to ppt-rs's Image::from_bytes.
  3. Layout heuristics: when a slide has both bullets and images, lay them out side-by-side; when only images, full-bleed. Start with a fixed two-column grid for ≤2 images, 2x2 grid for 3-4. Defer arbitrary positioning.
  4. Tests: round-trip a PNG, then assert that an image entry is present in [Content_Types].xml and that a ppt/media/image1.png entry exists in the resulting zip. Also cover size-cap rejection and MIME rejection.
  5. Orchestrator prompt update (agent_registry/agents/orchestrator/prompt.md): document that generate_presentation accepts an images array, and clarify that the grounding rule covers image fetches (e.g. the agent must call research/memory_tree before claiming a chart shows X).
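
As a rough illustration of steps 1 and 2, the sketch below models the three reference kinds and the decode-time validation using only the standard library. Everything beyond the names SlideImage, PNG/JPEG/WebP, and the 5 MB cap is an assumption: the variant and field names, the ImageFormat and ImageError types, the validate_image helper, and the reading of "5 MB" as 5 * 1024 * 1024 bytes are all hypothetical, and format detection by magic bytes is one possible approach, not necessarily what the engine would use.

```rust
// Hypothetical sketch: only `SlideImage` is named in the issue; every other
// identifier here is illustrative.

/// Per-image size cap from the issue, read here as 5 MiB (an assumption).
pub const MAX_IMAGE_BYTES: usize = 5 * 1024 * 1024;

/// One image reference on a slide, one variant per resolution path.
#[derive(Debug, Clone, PartialEq)]
pub enum SlideImage {
    /// In-workspace artifact id.
    Artifact { id: String },
    /// Path taken from a `[FILE:path]` marker.
    File { path: String },
    /// Remote URL, fetched with a deadline and the size cap.
    Url { url: String },
}

/// Formats the issue allows.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ImageFormat {
    Png,
    Jpeg,
    WebP,
}

#[derive(Debug, PartialEq)]
pub enum ImageError {
    TooLarge { len: usize },
    UnsupportedFormat,
}

/// Detect the format from the leading magic bytes of the payload.
pub fn sniff_format(bytes: &[u8]) -> Option<ImageFormat> {
    const PNG_MAGIC: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    if bytes.starts_with(&PNG_MAGIC) {
        Some(ImageFormat::Png)
    } else if bytes.starts_with(&[0xFF, 0xD8, 0xFF]) {
        Some(ImageFormat::Jpeg)
    } else if bytes.len() >= 12 && &bytes[0..4] == b"RIFF" && &bytes[8..12] == b"WEBP" {
        // WebP is a RIFF container whose form type at offset 8 is "WEBP".
        Some(ImageFormat::WebP)
    } else {
        None
    }
}

/// Apply the size cap first, then the format allow-list.
pub fn validate_image(bytes: &[u8]) -> Result<ImageFormat, ImageError> {
    if bytes.len() > MAX_IMAGE_BYTES {
        return Err(ImageError::TooLarge { len: bytes.len() });
    }
    sniff_format(bytes).ok_or(ImageError::UnsupportedFormat)
}
```

Checking the size before sniffing keeps the rejection path cheap for oversized payloads. Sniffing the bytes rather than trusting a declared Content-Type would also cover the FILE and artifact paths, which carry no MIME header.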

Acceptance criteria

  • SlideSpec.images accepted: passing SlideImage entries on a slide attaches the image at generation time.
  • Three resolution paths: artifact id, [FILE:path] marker (depends on #2777, "Extend multimodal input to accept document and file attachments beyond images"), and remote URL (deadline-bounded, ≤5 MB, allowed MIME).
  • Layout heuristic: single image full-bleed; multiple images grid-arranged.
  • Tests: round-trip a PNG with the deck and assert the media entry in the zip; reject oversized; reject wrong MIME.
  • Diff coverage ≥ 80%, per .github/workflows/pr-ci.yml.
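
One possible reading of the layout heuristic is sketched below. The Layout enum, image_grid, and choose_layout are hypothetical names, and the sketch assumes that a lone image without bullets goes full-bleed, two images share a 2x1 grid, three or four use a 2x2 grid, and more than four images are unsupported for now (returned as None), since arbitrary positioning is deferred.

```rust
// Hypothetical sketch of the layout decision; none of these names come
// from the existing engine.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Layout {
    /// No images on the slide.
    TextOnly,
    /// A single image with no bullets fills the slide.
    FullBleed,
    /// Images only, arranged in a cols x rows grid.
    ImageGrid { cols: u32, rows: u32 },
    /// Bullets in one column, an image grid in the other.
    SideBySide { cols: u32, rows: u32 },
}

/// Grid shape for a given image count; `None` when the count is outside
/// what the fixed grids cover (assumed here to be more than four).
pub fn image_grid(count: usize) -> Option<(u32, u32)> {
    match count {
        1 => Some((1, 1)),
        2 => Some((2, 1)),
        3 | 4 => Some((2, 2)),
        _ => None,
    }
}

/// Pick a layout from whether the slide has bullets and how many images it has.
pub fn choose_layout(has_bullets: bool, image_count: usize) -> Option<Layout> {
    if image_count == 0 {
        return Some(Layout::TextOnly);
    }
    let (cols, rows) = image_grid(image_count)?;
    Some(match (has_bullets, image_count) {
        (true, _) => Layout::SideBySide { cols, rows },
        (false, 1) => Layout::FullBleed,
        (false, _) => Layout::ImageGrid { cols, rows },
    })
}
```

Returning None for unsupported counts leaves the caller free to reject the slide or truncate the image list; which of those is preferable is not specified in the issue.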
