Z.AI Vision Pi Extension

8 vision AI tools for the pi coding agent, powered by Z.AI / Zhipu AI (GLM-4.6v, GLM-4.5v).

Features

Analyze images, screenshots, diagrams, charts, and videos
Extract text / OCR from screenshots with formatting preserved
Diagnose error screenshots with root-cause analysis and suggested fixes
Convert UI mockups to code, prompts, specs, or descriptions
Visual regression testing — compare expected vs actual UI screenshots
Local files are auto base64-encoded; remote URLs pass through directly
Auto-retry with exponential backoff on rate limits and transient errors
AbortSignal support — long-running analysis can be cancelled mid-flight

Installation

Global install

# Copy the extension into pi's global extensions folder
cp -r $(pwd) ~/.pi/agent/extensions/zai-vision/

Project-local install

# Copy into the current project's pi extensions directory
cp -r $(pwd) .pi/extensions/zai-vision/

Symlink for development

# Live reload as you edit
ln -s $(pwd) ~/.pi/agent/extensions/zai-vision

After installing, restart pi or run /reload to load the extension.

Configuration

Variable	Required	Default	Description
`Z_AI_API_KEY`	✅	—	API key from Z.AI or Zhipu
`Z_AI_MODE`	❌	`ZAI`	Platform: `ZAI` (https://api.z.ai) or `ZHIPU` (https://open.bigmodel.cn)

Set them in your shell profile or before launching pi:

export Z_AI_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
export Z_AI_MODE="ZAI"   # or "ZHIPU"

The extension validates the key on session start and shows a warning notification if it is missing or looks like a placeholder.

Platform defaults

Mode	Base URL	Default Model
`ZAI`	`https://api.z.ai/api/paas/v4/`	`glm-4.6v`
`ZHIPU`	`https://open.bigmodel.cn/api/paas/v4/`	`glm-4.6v`

Available Tools

`analyze_image`

General-purpose image understanding — identify objects, scenes, text, and answer open-ended questions.

Parameter	Type	Required	Description
`image_source`	`string`	✅	Path to a local image file or a remote URL (http/https)
`prompt`	`string`	❌	Optional additional instructions for the analysis

Example use case: "What objects are in this photo?" / ~/Downloads/photo.png

`extract_text_from_screenshot`

Extract and transcribe all visible text from a screenshot while preserving layout and formatting.

Parameter	Type	Required	Description
`image_source`	`string`	✅	Path or URL to the screenshot
`prompt`	`string`	❌	Optional instructions (e.g. "only the code block")
`programming_language`	`string`	❌	Language hint for better code extraction (e.g. `python`, `typescript`)

Example use case: "Extract the Python code from this terminal screenshot."

`diagnose_error_screenshot`

Analyze error screenshots (stack traces, dialog boxes, browser consoles) to identify root cause and suggest fixes.

Parameter	Type	Required	Description
`image_source`	`string`	✅	Path or URL to the error screenshot
`prompt`	`string`	❌	Specific question about the error
`context`	`string`	❌	Additional context about what was happening when the error occurred

Example use case: "Why is this TypeError happening?" with context: "I just upgraded React to 19."

`understand_technical_diagram`

Explain architecture diagrams, UML, flowcharts, ER diagrams, network topologies, and sequence diagrams.

Parameter	Type	Required	Description
`image_source`	`string`	✅	Path or URL to the diagram
`prompt`	`string`	❌	Optional focus instructions
`diagram_type`	`enum`	❌	One of: `architecture`, `uml`, `flowchart`, `er`, `network`, `sequence`, `other`

Example use case: "Explain this microservices architecture diagram."

`analyze_data_visualization`

Extract insights from charts, dashboards, plots, and data visualizations.

Parameter	Type	Required	Description
`image_source`	`string`	✅	Path or URL to the chart / dashboard
`prompt`	`string`	❌	Optional instructions
`analysis_focus`	`enum`	❌	One of: `trends`, `anomalies`, `comparisons`, `metrics`, `comprehensive`

Example use case: "What trends does this Grafana dashboard show?" with analysis_focus: trends

`ui_to_artifact`

Convert a UI screenshot or mockup into a concrete artifact.

Parameter	Type	Required	Description
`image_source`	`string`	✅	Path or URL to the UI screenshot
`output_type`	`enum`	✅	`code`, `prompt`, `spec`, or `description`
`prompt`	`string`	❌	Optional extra instructions

Example use cases:

output_type: code → "Turn this Figma mockup into React + Tailwind."
output_type: spec → "Write a design spec for this settings page."
output_type: prompt → "Generate an image-generation prompt that recreates this UI."

`ui_diff_check`

Compare two UI screenshots (expected vs actual) and report visual differences.

Parameter	Type	Required	Description
`expected_image`	`string`	✅	Path or URL to the reference / expected screenshot
`actual_image`	`string`	✅	Path or URL to the actual / current screenshot
`prompt`	`string`	❌	Optional focus (e.g. "only compare the header")

Example use case: "Compare expected.png vs actual.png and list all visual regressions."

`analyze_video`

Analyze video content and answer questions about events, characters, actions, and scenes.

Parameter	Type	Required	Description
`video_source`	`string`	✅	Path to a local video file or a remote URL
`prompt`	`string`	❌	Optional instructions for the analysis

Supported formats: .mp4, .mov, .mkv
Size limits: images ≤ 5 MB; videos ≤ 50 MB (local base64), 200 MB (remote URL)

Example use case: "Describe what happens in this 10-second screen recording."

Usage Examples

These are typical conversational invocations — pi decides which tool to call based on your message.

User: analyze this screenshot: ~/Downloads/error.png
→ pi calls diagnose_error_screenshot

User: turn this mockup into React: https://example.com/mockup.png
→ pi calls ui_to_artifact with output_type=code

User: compare expected.png vs actual.png
→ pi calls ui_diff_check

User: extract the code from this screenshot
→ pi calls extract_text_from_screenshot

User: what's in this video? ~/Downloads/demo.mp4
→ pi calls analyze_video

User: explain this architecture diagram
→ pi calls understand_technical_diagram

Troubleshooting

Symptom	Cause	Fix
`⚠️ MISSING_API_KEY` notification	`Z_AI_API_KEY` not set	Export the env var and restart pi
`⚠️ INVALID_API_KEY`	Key is wrong, expired, or a placeholder	Regenerate at Z.AI or Zhipu console
`Error: File too large`	Image > 5 MB or video > 50 MB (local)	Compress, resize, or use a remote URL
`Error: Unsupported format`	Wrong file extension	Convert to `.png`, `.jpg`, `.jpeg`, `.mp4`, `.mov`, or `.mkv`
`Error: Rate limited after retries`	Too many requests	Wait a moment and retry
`Error: TIMEOUT`	Request exceeded 5 minutes	Retry with a smaller image/video or shorter prompt
Tool returns generic `API_ERROR`	Bad request or server error	Check that URLs are publicly accessible and the file isn't corrupted

Checking status

Run /zai in pi to see the current configuration, model, and API key mask:

/zai
→ Z.AI Vision Extension — v0.1.0
  Platform:    ZAI
  Base URL:    https://api.z.ai/api/paas/v4/
  Model:       glm-4.6v
  API Key:     sk-abcd1...9xyz
  Timeout:     300s
  Max tokens:  32,768
  Status:      ✅ Ready

Development

# Install dev dependencies
npm install

# Type check
npx tsc --noEmit

# Reload extension in pi after edits
/reload

Project structure

src/
├── index.ts              # Entry point — registers tools + /zai command
├── config.ts             # Env validation (Z_AI_API_KEY, Z_AI_MODE)
├── api.ts                # HTTP client with retry, abort, timeout
├── file-service.ts       # Local file validation + base64 encoding
├── prompts.ts            # System prompts per tool category
├── shared/
│   ├── types.ts          # Shared interfaces and VisionError
│   └── tool-factory.ts   # createImageTool / createVideoTool factory
└── tools/
    ├── general-image.ts
    ├── text-extraction.ts
    ├── error-diagnosis.ts
    ├── diagram-analysis.ts
    ├── data-viz.ts
    ├── ui-to-artifact.ts
    ├── ui-diff.ts
    └── video-analysis.ts

License

Apache-2.0

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
src		src
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
package.json		package.json
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Z.AI Vision Pi Extension

Features

Installation

Global install

Project-local install

Symlink for development

Configuration

Platform defaults

Available Tools

`analyze_image`

`extract_text_from_screenshot`

`diagnose_error_screenshot`

`understand_technical_diagram`

`analyze_data_visualization`

`ui_to_artifact`

`ui_diff_check`

`analyze_video`

Usage Examples

Troubleshooting

Checking status

Development

Project structure

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Z.AI Vision Pi Extension

Features

Installation

Global install

Project-local install

Symlink for development

Configuration

Platform defaults

Available Tools

analyze_image

extract_text_from_screenshot

diagnose_error_screenshot

understand_technical_diagram

analyze_data_visualization

ui_to_artifact

ui_diff_check

analyze_video

Usage Examples

Troubleshooting

Checking status

Development

Project structure

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`analyze_image`

`extract_text_from_screenshot`

`diagnose_error_screenshot`

`understand_technical_diagram`

`analyze_data_visualization`

`ui_to_artifact`

`ui_diff_check`

`analyze_video`

Packages