utsavgu/agentaction

Step 8: Agent Action Layer

This module implements only Step 8:

Input: AOI metadata + screenshot crop path
Output: lightweight, reversible AssistPayload JSON for UI actions.
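For a typed view of that payload, here is a minimal sketch using standard-library dataclasses. The class and field names are assumptions read off the example output shape later in this README (aoi_id, doc_id, state, actions, buttons, suggested_prompts, telemetry). The real module may build plain dicts instead.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Button:
    label: str      # text shown in the UI, e.g. "Explain"
    action_id: str  # stable id the UI sends back, e.g. "explain_short"


@dataclass
class ActionCard:
    title: str
    body: str
    buttons: list[Button] = field(default_factory=list)


@dataclass
class AssistPayload:
    # Field names mirror the README's example output; treat them as assumptions.
    aoi_id: str
    doc_id: str
    state: str
    extracted_text: str
    detected_language: str
    actions: list[ActionCard] = field(default_factory=list)
    suggested_prompts: list[str] = field(default_factory=list)
    telemetry: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses and lists of them.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, raw: str) -> "AssistPayload":
        data = json.loads(raw)
        # Rebuild the nested card/button objects from plain dicts.
        data["actions"] = [
            ActionCard(a["title"], a["body"], [Button(**b) for b in a["buttons"]])
            for a in data.get("actions", [])
        ]
        return cls(**data)
```

Serializing with `to_json` and parsing back with `from_json` yields an equal object, which is convenient in tests.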

Install

python -m pip install -r requirements.txt

CLI

python agent_action.py \
  --image path.png \
  --doc_id X \
  --aoi_id Y \
  --aoi_type paragraph \
  --state confused
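The flags above could be parsed with the standard-library argparse module. This is a hypothetical sketch, not the parser in agent_action.py: the --state choices come from the reader states listed under Behavior, and validate_image_path is an illustrative helper for the "validates image path" step.

```python
import argparse
from pathlib import Path

# Reader states listed in this README; the real CLI may accept others.
READER_STATES = ["confused", "interested", "skimming", "revising"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the CLI example."""
    parser = argparse.ArgumentParser(description="Step 8 agent action layer (sketch)")
    parser.add_argument("--image", help="path to the AOI screenshot crop")
    parser.add_argument("--doc_id", help="document identifier")
    parser.add_argument("--aoi_id", help="AOI identifier")
    parser.add_argument("--aoi_type", help="e.g. paragraph, equation, code")
    parser.add_argument("--state", choices=READER_STATES, help="reader state used for routing")
    parser.add_argument("--run_examples", action="store_true", help="print the bundled example payloads")
    return parser


def validate_image_path(path: str) -> Path:
    """Fail early if the crop does not exist (illustrative helper)."""
    p = Path(path).expanduser()
    if not p.is_file():
        raise FileNotFoundError(f"image not found: {p}")
    return p
```

With `choices`, argparse rejects an unknown `--state` value with a usage error before any text acquisition runs.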

K2 Configuration Placeholders

The file agent_action.py includes editable placeholders:

  • DEFAULT_K2_BASE_URL = "https://YOUR_K2_BASE_URL"
  • DEFAULT_K2_MODEL = "YOUR_K2_MODEL"
  • DEFAULT_K2_API_KEY_ENV = "K2_API_KEY"

You can either edit those constants or override them at runtime with the --k2_base_url, --k2_model, and --k2_api_key_env flags.

Exact Run Steps (with your API key)

  1. Set your API key env var (replace the value):
export K2_API_KEY="PASTE_YOUR_REAL_KEY_HERE"
  2. Run in K2 mode (replace URL/model):
python agent_action.py \
  --image /absolute/path/to/crop.png \
  --doc_id doc-123 \
  --aoi_id aoi-9 \
  --aoi_type paragraph \
  --state confused \
  --llm_mode k2 \
  --k2_base_url "https://YOUR_K2_BASE_URL" \
  --k2_model "YOUR_K2_MODEL" \
  --k2_api_key_env K2_API_KEY
  3. The output JSON includes:
    • telemetry.llm_config.k2_api_key_present, which confirms whether the key is visible to the process
    • telemetry.llm_preview, placeholder text indicating whether the config is complete
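The configuration steps above amount to "runtime flag, else constant" for the URL and model, plus an environment lookup for the key. The sketch below shows one way to resolve that and build the two telemetry fields. Only the three DEFAULT_K2_* constants and the telemetry field names come from this README; K2Config, resolve_k2_config, k2_telemetry, and the preview wording are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Placeholder constants as documented for agent_action.py.
DEFAULT_K2_BASE_URL = "https://YOUR_K2_BASE_URL"
DEFAULT_K2_MODEL = "YOUR_K2_MODEL"
DEFAULT_K2_API_KEY_ENV = "K2_API_KEY"


@dataclass
class K2Config:
    base_url: str
    model: str
    api_key_env: str
    api_key_present: bool  # only a boolean; the key itself is never stored here


def resolve_k2_config(base_url: Optional[str] = None, model: Optional[str] = None,
                      api_key_env: Optional[str] = None,
                      environ: Optional[Mapping[str, str]] = None) -> K2Config:
    """Runtime values win over the constants; the key is read from the environment."""
    env = os.environ if environ is None else environ
    key_env = api_key_env or DEFAULT_K2_API_KEY_ENV
    return K2Config(
        base_url=base_url or DEFAULT_K2_BASE_URL,
        model=model or DEFAULT_K2_MODEL,
        api_key_env=key_env,
        api_key_present=bool(env.get(key_env, "").strip()),
    )


def k2_telemetry(cfg: K2Config) -> dict:
    """Build llm_config / llm_preview without ever echoing the key value."""
    missing = []
    if cfg.base_url == DEFAULT_K2_BASE_URL:
        missing.append("base_url")
    if cfg.model == DEFAULT_K2_MODEL:
        missing.append("model")
    if not cfg.api_key_present:
        missing.append(f"${cfg.api_key_env}")
    preview = "K2 config complete" if not missing else "K2 config incomplete: set " + ", ".join(missing)
    return {"llm_config": {"k2_api_key_present": cfg.api_key_present}, "llm_preview": preview}
```

Passing `environ` explicitly keeps the function testable without touching the real process environment.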

Behavior:

  • Validates the image path.
  • Acquires text in strict priority order:
    1. AOIEvent.text_hint, if it is longer than 20 characters.
    2. doc_text_provider.get_text(doc_id, aoi_id) (stub interface).
    3. OCR fallback via pytesseract.
    4. Image-only heuristics if the OCR output is poor or empty.
  • Routes by reader state (confused, interested, skimming, revising).
  • Returns 1–3 action cards with the required buttons:
    • Explain (explain_short)
    • Explain deeper (explain_expanded)
    • Dismiss (dismiss)
    • I already know this (feedback_known)
    • Make flashcards (optional)
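The text-acquisition priority order above can be expressed as a short fallback chain. This is a sketch, not the module's code: the document provider and the OCR step are injected as a hypothetical DocTextProvider protocol and a zero-argument callable, so the chain runs without Tesseract installed. A real OCR callable could wrap `pytesseract.image_to_string(PIL.Image.open(path))`. Stripping whitespace before the 20-character check and the `min_ocr_chars` threshold for "poor" OCR are assumptions; the returned labels reuse the names from `telemetry.heuristics.priority_order`.

```python
from typing import Callable, Optional, Protocol


class DocTextProvider(Protocol):
    """Hypothetical shape of the doc_text_provider stub interface."""
    def get_text(self, doc_id: str, aoi_id: str) -> Optional[str]: ...


def acquire_text(text_hint: Optional[str], doc_id: str, aoi_id: str,
                 provider: Optional[DocTextProvider],
                 ocr: Optional[Callable[[], Optional[str]]],
                 min_ocr_chars: int = 10) -> tuple[str, str]:
    """Return (text, source_label), trying each source in priority order."""
    # 1. Use the AOI's own text hint when it is longer than 20 characters.
    hint = (text_hint or "").strip()
    if len(hint) > 20:
        return hint, "text_hint_if_len_gt_20"
    # 2. Ask the document text provider for the AOI's text.
    if provider is not None:
        text = (provider.get_text(doc_id, aoi_id) or "").strip()
        if text:
            return text, "doc_text_provider_get_text"
    # 3. Fall back to OCR; very short output is treated as "poor".
    if ocr is not None:
        text = (ocr() or "").strip()
        if len(text) >= min_ocr_chars:
            return text, "ocr_with_pytesseract"
    # 4. No usable text: callers switch to image-only type heuristics.
    return "", "image_only_type_heuristics"
```

A caller can derive `telemetry.ocr_used` as `source == "ocr_with_pytesseract"`.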

Runnable examples (required)

Run all 3 examples:

python agent_action.py --run_examples

This prints JSON payloads for:

  1. paragraph confusion
  2. equation confusion
  3. code confusion

Example output shape:

{
  "aoi_id": "aoi-p-1",
  "doc_id": "doc-paragraph",
  "state": "confused",
  "extracted_text": "Photosynthesis converts light energy into chemical energy...",
  "detected_language": "en",
  "actions": [
    {
      "title": "Direct explanation",
      "body": "Start here: ...",
      "buttons": [
        { "label": "Explain", "action_id": "explain_short" },
        { "label": "Explain deeper", "action_id": "explain_expanded" },
        { "label": "Dismiss", "action_id": "dismiss" },
        { "label": "I already know this", "action_id": "feedback_known" }
      ]
    }
  ],
  "suggested_prompts": [
    "[explain_short|short]\\n...",
    "[explain_short|expanded]\\n...",
    "[explain_expanded|short]\\n...",
    "[explain_expanded|expanded]\\n..."
  ],
  "telemetry": {
    "ocr_used": false,
    "confidence": 0.92,
    "heuristics": {
      "priority_order": [
        "text_hint_if_len_gt_20",
        "doc_text_provider_get_text",
        "ocr_with_pytesseract",
        "image_only_type_heuristics"
      ]
    }
  }
}
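A consumer of this payload might sanity-check it before rendering. The sketch below assumes every card carries all four required buttons and that `state` is one of the four reader states listed under Behavior; `validate_payload` is a hypothetical helper. The optional Make flashcards button is not checked because its action_id is not documented here.

```python
REQUIRED_ACTION_IDS = {"explain_short", "explain_expanded", "dismiss", "feedback_known"}
READER_STATES = {"confused", "interested", "skimming", "revising"}


def validate_payload(payload: dict) -> list[str]:
    """Return human-readable problems; an empty list means the shape looks right."""
    errors = []
    if payload.get("state") not in READER_STATES:
        errors.append(f"unknown state: {payload.get('state')!r}")
    actions = payload.get("actions", [])
    # The README promises between one and three action cards.
    if not 1 <= len(actions) <= 3:
        errors.append(f"expected 1-3 action cards, got {len(actions)}")
    for i, card in enumerate(actions):
        ids = {b.get("action_id") for b in card.get("buttons", [])}
        missing = REQUIRED_ACTION_IDS - ids
        if missing:
            errors.append(f"card {i} missing buttons: {sorted(missing)}")
    return errors
```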

About

Part of the project for DevFest'26
