GitHub - utsavgu/agentaction: Part of the project for DevFest'26

Step 8: Agent Action Layer

This module implements only Step 8:

Input: AOI metadata + screenshot crop path
Output: lightweight, reversible AssistPayload JSON for UI actions.
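
As a rough illustration of the input side, the sketch below models the AOI metadata as a dataclass. The field names mirror the CLI flags and the text_hint mentioned under Behavior; the class layout, the READER_STATES constant, and the validation step are assumptions made for illustration, not the module's actual definitions.

```python
from dataclasses import dataclass
from typing import Optional

# Reader states listed in this README.
READER_STATES = ("confused", "interested", "skimming", "revising")


@dataclass
class AOIEvent:
    """Hypothetical container for the Step 8 input (layout is an assumption)."""

    doc_id: str
    aoi_id: str
    aoi_type: str  # e.g. "paragraph"
    state: str  # one of READER_STATES
    image_path: str  # path to the screenshot crop
    text_hint: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject states the router would not know how to handle.
        if self.state not in READER_STATES:
            raise ValueError(f"unknown reader state: {self.state!r}")


event = AOIEvent("doc-123", "aoi-9", "paragraph", "confused", "crop.png")
print(event.state)
```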

Install

python -m pip install -r requirements.txt

CLI

python agent_action.py \
  --image path.png \
  --doc_id X \
  --aoi_id Y \
  --aoi_type paragraph \
  --state confused

K2 Configuration Placeholders

The file agent_action.py includes editable placeholders:

DEFAULT_K2_BASE_URL = "https://YOUR_K2_BASE_URL"
DEFAULT_K2_MODEL = "YOUR_K2_MODEL"
DEFAULT_K2_API_KEY_ENV = "K2_API_KEY"

You can either edit those constants or pass values at runtime.
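
One way to picture how the two options interact is the sketch below, in which a value passed at runtime wins and the constant is the fallback. The helper name resolve_k2_config and the returned dictionary layout are hypothetical; only the three constants and the k2_api_key_present field name come from this README.

```python
import os
from typing import Mapping, Optional

DEFAULT_K2_BASE_URL = "https://YOUR_K2_BASE_URL"
DEFAULT_K2_MODEL = "YOUR_K2_MODEL"
DEFAULT_K2_API_KEY_ENV = "K2_API_KEY"


def resolve_k2_config(
    base_url: Optional[str] = None,
    model: Optional[str] = None,
    api_key_env: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> dict:
    """Hypothetical helper: runtime values win, constants are the fallback."""
    env_name = api_key_env or DEFAULT_K2_API_KEY_ENV
    return {
        "k2_base_url": base_url or DEFAULT_K2_BASE_URL,
        "k2_model": model or DEFAULT_K2_MODEL,
        "k2_api_key_env": env_name,
        # Only record whether a key is visible; never echo the key itself.
        "k2_api_key_present": bool(environ.get(env_name)),
    }


config = resolve_k2_config(model="my-model", environ={"K2_API_KEY": "secret"})
print(config)
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.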

Exact Run Steps (with your API key)

Set your API key env var (replace the value):

export K2_API_KEY="PASTE_YOUR_REAL_KEY_HERE"

Run in K2 mode (replace URL/model):

python agent_action.py \
  --image /absolute/path/to/crop.png \
  --doc_id doc-123 \
  --aoi_id aoi-9 \
  --aoi_type paragraph \
  --state confused \
  --llm_mode k2 \
  --k2_base_url "https://YOUR_K2_BASE_URL" \
  --k2_model "YOUR_K2_MODEL" \
  --k2_api_key_env K2_API_KEY

The output JSON includes:

- telemetry.llm_config.k2_api_key_present: confirms that the API key is visible.
- telemetry.llm_preview: placeholder text showing whether the configuration is complete.
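
A quick way to check the first of these fields is sketched below. The function name k2_key_visible is hypothetical, and the sample string is a minimal hand-written fragment rather than real output; it assumes the payload is available as a JSON string and that the field is a boolean.

```python
import json


def k2_key_visible(payload_json: str) -> bool:
    """Return True when the payload reports that the K2 API key was found."""
    payload = json.loads(payload_json)
    # Walk the nested keys defensively so a missing section reads as False.
    llm_config = payload.get("telemetry", {}).get("llm_config", {})
    return bool(llm_config.get("k2_api_key_present", False))


# Hand-written fragment for illustration only.
sample = '{"telemetry": {"llm_config": {"k2_api_key_present": true}}}'
print(k2_key_visible(sample))  # prints True
```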

Behavior:

1. Validates the image path.
2. Acquires text in strict priority order:

   1. AOIEvent.text_hint, if non-empty and longer than 20 characters.
   2. doc_text_provider.get_text(doc_id, aoi_id) (stub interface).
   3. OCR fallback via pytesseract.
   4. Image-only heuristics, if the OCR output is poor or empty.

3. Routes by reader state (confused, interested, skimming, revising).
4. Returns 1–3 action cards with the following buttons (the first four are required):
   - Explain (explain_short)
   - Explain deeper (explain_expanded)
   - Dismiss (dismiss)
   - I already know this (feedback_known)
   - Make flashcards (optional)
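
The text acquisition order described above can be sketched as a simple fallback chain. This is a minimal illustration under stated assumptions, not the module's implementation: acquire_text, provider_lookup, and run_ocr are hypothetical names, the two sources are injected as callables so the sketch runs without pytesseract installed, and "poor" OCR output is approximated here as empty or whitespace-only because the README does not define the criterion.

```python
from typing import Callable, Optional, Tuple

MIN_HINT_CHARS = 20  # the hint must be longer than 20 characters


def acquire_text(
    text_hint: Optional[str],
    provider_lookup: Callable[[], Optional[str]],
    run_ocr: Callable[[], Optional[str]],
) -> Tuple[str, str]:
    """Return (text, source), trying each source in priority order."""
    if text_hint and len(text_hint) > MIN_HINT_CHARS:
        return text_hint, "text_hint"
    provided = provider_lookup()
    if provided and provided.strip():
        return provided, "doc_text_provider"
    ocr_text = run_ocr()
    if ocr_text and ocr_text.strip():
        return ocr_text.strip(), "ocr"
    # Nothing usable: the caller falls back to image-only heuristics.
    return "", "image_only"


# The hint is too short and the provider stub returns nothing, so OCR is used.
text, source = acquire_text("too short", lambda: None, lambda: "  E = mc^2 \n")
print(source, repr(text))  # prints: ocr 'E = mc^2'
```

In real use, run_ocr could wrap a call such as pytesseract.image_to_string on the opened crop, and provider_lookup could wrap doc_text_provider.get_text(doc_id, aoi_id).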

Runnable examples (required)

Run all 3 examples:

python agent_action.py --run_examples

This prints JSON payloads for:

- paragraph confusion
- equation confusion
- code confusion

Example output shape:

{
  "aoi_id": "aoi-p-1",
  "doc_id": "doc-paragraph",
  "state": "confused",
  "extracted_text": "Photosynthesis converts light energy into chemical energy...",
  "detected_language": "en",
  "actions": [
    {
      "title": "Direct explanation",
      "body": "Start here: ...",
      "buttons": [
        { "label": "Explain", "action_id": "explain_short" },
        { "label": "Explain deeper", "action_id": "explain_expanded" },
        { "label": "Dismiss", "action_id": "dismiss" },
        { "label": "I already know this", "action_id": "feedback_known" }
      ]
    }
  ],
  "suggested_prompts": [
    "[explain_short|short]\\n...",
    "[explain_short|expanded]\\n...",
    "[explain_expanded|short]\\n...",
    "[explain_expanded|expanded]\\n..."
  ],
  "telemetry": {
    "ocr_used": false,
    "confidence": 0.92,
    "heuristics": {
      "priority_order": [
        "text_hint_if_len_gt_20",
        "doc_text_provider_get_text",
        "ocr_with_pytesseract",
        "image_only_type_heuristics"
      ]
    }
  }
}
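
A consumer of this payload may want to sanity-check it before rendering. The sketch below checks the two constraints stated under Behavior: 1 to 3 action cards, and the four required action_id values. The function name check_payload is hypothetical, and treating the required buttons as a per-card requirement is an assumption, since the README does not say whether they apply to each card or to the payload as a whole.

```python
REQUIRED_ACTION_IDS = {"explain_short", "explain_expanded", "dismiss", "feedback_known"}


def check_payload(payload: dict) -> list:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    actions = payload.get("actions", [])
    if not 1 <= len(actions) <= 3:
        problems.append(f"expected 1-3 action cards, got {len(actions)}")
    for index, card in enumerate(actions):
        # Collect the action ids present on this card and compare to the required set.
        ids = {button.get("action_id") for button in card.get("buttons", [])}
        missing = REQUIRED_ACTION_IDS - ids
        if missing:
            problems.append(f"card {index} missing buttons: {sorted(missing)}")
    return problems


# Trimmed copy of the example payload, keeping only the fields being checked.
sample = {
    "actions": [
        {
            "title": "Direct explanation",
            "buttons": [
                {"label": "Explain", "action_id": "explain_short"},
                {"label": "Explain deeper", "action_id": "explain_expanded"},
                {"label": "Dismiss", "action_id": "dismiss"},
                {"label": "I already know this", "action_id": "feedback_known"},
            ],
        }
    ]
}
print(check_payload(sample))  # prints []
```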

