🌐 Browser Agent

AI-powered browser automation with Vision and LangGraph.

An intelligent agent that can see, understand, and interact with web pages like a human. Built on LangGraph for robust state management and Ollama for flexible local or cloud-based vision models.

✨ Features

| Feature | Description |
| --- | --- |
| 🧠 LangGraph Orchestration | Robust state-machine logic for reliable task execution |
| 🔍 Vision-First Interaction | Analyzes real-time screenshots to understand page state |
| 🖱️ Coordinate Precision | Uses (x, y) coordinates for interactions, avoiding fragile selectors |
| 💾 Dual Memory System | Short-term context (session) + long-term persisted memory |
| ⌨️ Full Keyboard/Mouse | Types, scrolls, and presses keys (Enter, Tab, etc.) |
| 🔄 Stateful Loops | Automatically captures the screen after every action for verification |
| 💬 Professional CLI | Clean, interactive terminal interface with progress feedback |

🚀 Quick Start

Prerequisites

Python 3.12+
uv package manager
Ollama installed and running

Installation

# Clone the repo
git clone https://github.com/RaheesAhmed/browseragent.git
cd browseragent

# Install dependencies
uv sync

# Install Playwright browsers
uv run playwright install

# Pull recommended vision model
ollama pull qwen2.5vl  # Or: ollama pull llama3.2-vision

Run

uv run python main.py

💡 Usage Examples

❯ "Go to google.com and search for 'LangGraph documentation'"
❯ "Log in to github.com and check my recent notifications"
❯ "Visit Wikipedia and tell me about the history of Artificial Intelligence"

🧠 Architecture

The agent uses a StateGraph to manage the execution loop, ensuring it always "sees" the browser state before making its next decision.

graph TD
    START -->|Initialize| Capture[Capture Screen]
    Capture -->|Vision Input| Agent[LLM Agent]
    Agent -->|Tool Calls| Tools[Execution Node]
    Tools -->|Result| Capture
    Agent -->|Finish| END
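
The loop in the diagram can be sketched in plain Python, without LangGraph, to show the control flow. This is a minimal illustration and not the code in src/agent.py: `AgentState`, `run_loop`, `capture`, `decide`, and `execute` are hypothetical names standing in for the graph state, the screenshot step, the LLM call, and the execution node, and `max_steps` is an added safety cap that the diagram does not show.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AgentState:
    """State carried around the loop (field names are illustrative)."""
    task: str
    screenshot: bytes = b""
    history: list = field(default_factory=list)


def run_loop(
    state: AgentState,
    capture: Callable[[], bytes],
    decide: Callable[[AgentState], Optional[dict]],
    execute: Callable[[dict], str],
    max_steps: int = 20,
) -> AgentState:
    """Capture -> decide -> execute, repeated until decide() returns None."""
    for _ in range(max_steps):
        state.screenshot = capture()   # Capture Screen
        action = decide(state)         # LLM Agent
        if action is None:             # Finish -> END
            break
        result = execute(action)       # Execution Node
        state.history.append((action, result))
    return state


if __name__ == "__main__":
    # Scripted stand-in for the model: one navigate action, then finish.
    script = iter([{"tool": "navigate", "url": "https://example.com"}, None])
    final = run_loop(
        AgentState(task="open example.com"),
        capture=lambda: b"fake-png-bytes",
        decide=lambda s: next(script),
        execute=lambda a: f"ran {a['tool']}",
    )
    print(final.history)
```

The property the sketch preserves is that every decision is preceded by a fresh capture, so the model never acts on a stale view of the page.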

⚙️ Configuration

Model Selection

Edit MODEL in src/config.py to change the model used by Ollama. You can switch between local models (e.g., qwen2.5vl) and cloud-based models.

MODEL = "minimax-m2.5:cloud"  # Edit this to change the model!
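
If editing the file for every model switch is inconvenient, one option is to let an environment variable override the default. The sketch below is an assumption about how that could look, not what src/config.py currently does: the variable name `BROWSER_AGENT_MODEL` and the helper `resolve_model` are hypothetical.

```python
import os

DEFAULT_MODEL = "minimax-m2.5:cloud"  # default value shown above


def resolve_model(env=None):
    """Return the model name, preferring BROWSER_AGENT_MODEL (hypothetical) if set."""
    env = os.environ if env is None else env
    value = env.get("BROWSER_AGENT_MODEL", "").strip()
    # Fall back to the default when the variable is unset or blank.
    return value or DEFAULT_MODEL


MODEL = resolve_model()
```

With this in place, `BROWSER_AGENT_MODEL=llama3.2-vision uv run python main.py` would select a different model for a single run.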

Advanced Settings

Browser State: Managed in src/browser_manager.py (Default: 1280x800 viewport).
Tools: Extendable list in src/tools.py.

🎮 CLI Controls

| Command | Action |
| --- | --- |
| `exit` / `quit` | Close the agent |
| `Ctrl+C` | Force terminate session |
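
The controls above amount to a small read loop. The following is a rough sketch of that behavior, not the code in main.py: `repl`, `read_line`, and `handle` are hypothetical names, and the input function is injected so the loop can be exercised without a terminal.

```python
from typing import Callable

EXIT_COMMANDS = {"exit", "quit"}


def repl(read_line: Callable[[str], str], handle: Callable[[str], None]) -> str:
    """Read tasks until an exit command, end of input, or Ctrl+C.

    Returns the reason the loop stopped.
    """
    while True:
        try:
            line = read_line("❯ ").strip()
        except KeyboardInterrupt:   # Ctrl+C
            return "interrupted"
        except EOFError:            # input stream closed
            return "eof"
        if not line:
            continue                # ignore empty prompts
        if line.lower() in EXIT_COMMANDS:
            return "exit"
        handle(line)


if __name__ == "__main__":
    # Scripted input for demonstration; pass the built-in input() in a real terminal.
    lines = iter(["Go to google.com and search for 'LangGraph documentation'", "quit"])
    reason = repl(lambda prompt: next(lines), lambda task: print(f"(would run) {task}"))
    print(f"Session closed: {reason}")
```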

📁 Project Structure

browseragent/
├── main.py              # CLI entry point (Run agent)
├── src/
│   ├── agent.py         # LangGraph state machine & reasoning
│   ├── browser_manager.py # Playwright low-level control
│   ├── tools.py         # External tools (Navigation, Memory)
│   └── config.py        # Global settings (Model selection)
├── memory.json          # Persistent long-term memory store
└── .env                 # Environment variables
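
To illustrate how a session list and a JSON file such as memory.json can together provide short-term and long-term memory, here is a minimal sketch. It is an assumption about the general approach only: the `Memory` class, its method names, and the flat key/value layout of the file are hypothetical and may not match the project's tools.

```python
import json
from pathlib import Path


class Memory:
    """Short-term notes in a list; long-term key/value pairs in a JSON file."""

    def __init__(self, path="memory.json"):
        self.path = Path(path)
        self.session = []  # short-term: discarded when the process exits

    def _load(self):
        # A missing or empty file is treated as an empty store.
        if not self.path.exists():
            return {}
        text = self.path.read_text(encoding="utf-8")
        return json.loads(text) if text.strip() else {}

    def remember(self, note):
        """Add a note to the current session only."""
        self.session.append(note)

    def save(self, key, value):
        """Persist a key/value pair so later sessions can read it."""
        data = self._load()
        data[key] = value
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    def get(self, key, default=None):
        """Read a persisted value, or return default if the key is absent."""
        return self._load().get(key, default)
```

Rereading the file on every call keeps the sketch simple and means a second process sees saved values immediately, at the cost of extra disk reads.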

🛠️ Browser Actions

| Action | Description |
| --- | --- |
| `navigate` | Go to a specific URL |
| `click_at_location` | Click at (x, y) coordinates |
| `type_text` | Type text and optionally press Enter |
| `press_key` | Single key press (Tab, Escape, etc.) |
| `scroll_page` | Scroll up or down |
| `save/get_memory` | Interact with persistent memory |
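
The browser actions map naturally onto Playwright's `Page` API (`page.goto`, `page.mouse.click`, `page.keyboard.type`, `page.keyboard.press`, `page.mouse.wheel`). The sketch below shows one way such a mapping could look. It is not the implementation in src/tools.py: the parameter names (`press_enter`, `direction`, `amount`), the 600-pixel default, and the `dispatch` helper are assumptions made for illustration. The functions accept any object that exposes Playwright's synchronous `Page` interface.

```python
def navigate(page, url):
    page.goto(url)


def click_at_location(page, x, y):
    page.mouse.click(x, y)


def type_text(page, text, press_enter=False):
    page.keyboard.type(text)
    if press_enter:
        page.keyboard.press("Enter")


def press_key(page, key):
    page.keyboard.press(key)


def scroll_page(page, direction="down", amount=600):
    # Positive delta_y scrolls down, negative scrolls up.
    delta = amount if direction == "down" else -amount
    page.mouse.wheel(0, delta)


ACTIONS = {
    "navigate": navigate,
    "click_at_location": click_at_location,
    "type_text": type_text,
    "press_key": press_key,
    "scroll_page": scroll_page,
}


def dispatch(page, name, **kwargs):
    """Look up an action by name and run it against the page."""
    if name not in ACTIONS:
        raise ValueError(f"unknown action: {name}")
    ACTIONS[name](page, **kwargs)
```

A call such as `dispatch(page, "type_text", text="LangGraph documentation", press_enter=True)` types the query and submits it. Rejecting unknown names up front keeps a malformed tool call from the model from failing silently.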

📝 License

MIT

Built with ❤️ using LangGraph, Playwright, and Ollama
