feat: Firecrawl cloud browser driver + dedicated search/scrape tools #1

Open
developersdigest wants to merge 23 commits into main from feat/firecrawl-browser-driver

Conversation

@developersdigest
Member

Summary

Adds Firecrawl as a first-class browser driver alongside openclaw and extension, plus dedicated firecrawl_search and firecrawl_scrape agent tools.

Browser driver (driver: "firecrawl"):

  • Session lifecycle via Firecrawl v2 API (POST /v2/browser, DELETE /v2/browser/:id) — no SDK, raw fetch()
  • Auto-creates a firecrawl profile when API key is present (matches search/scrape auto-enablement)
  • Dynamic cdpUrl management: ensureBrowserAvailable() creates a cloud session, injects the WSS URL into profile state, and Playwright connects via connectOverCDP()
  • Session reuse: checks existing session reachability before creating new ones
  • Surfaces liveViewUrl and interactiveLiveViewUrl in browser status for human-in-the-loop
  • Firecrawl-aware health checks (WSS handshake instead of /json/version)
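
For reviewers, the session lifecycle above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the base URL, the Bearer auth header, the response field names (`id`, `cdpUrl`), and the function names are all assumptions. Only the two endpoints and the `ttl`/`activityTtl` body fields come from this PR.

```typescript
// Assumed base URL; a self-hosted deployment would differ.
const FIRECRAWL_BASE_URL = "https://api.firecrawl.dev";

// Assumed response shape. Only liveViewUrl/interactiveLiveViewUrl are named in this PR.
interface BrowserSession {
  id: string;
  cdpUrl: string;
  liveViewUrl?: string;
  interactiveLiveViewUrl?: string;
}

function authHeaders(apiKey: string): Record<string, string> {
  // Bearer auth is an assumption here.
  return { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" };
}

// POST /v2/browser: create a cloud browser session.
async function createBrowserSession(
  apiKey: string,
  opts: { ttl: number; activityTtl: number },
  fetchImpl: typeof fetch = fetch,
): Promise<BrowserSession> {
  const res = await fetchImpl(`${FIRECRAWL_BASE_URL}/v2/browser`, {
    method: "POST",
    headers: authHeaders(apiKey),
    body: JSON.stringify(opts),
  });
  if (!res.ok) throw new Error(`session create failed: HTTP ${res.status}`);
  return (await res.json()) as BrowserSession;
}

// DELETE /v2/browser/:id: release the cloud session.
async function deleteBrowserSession(
  apiKey: string,
  id: string,
  fetchImpl: typeof fetch = fetch,
): Promise<void> {
  const res = await fetchImpl(`${FIRECRAWL_BASE_URL}/v2/browser/${encodeURIComponent(id)}`, {
    method: "DELETE",
    headers: authHeaders(apiKey),
  });
  if (!res.ok) throw new Error(`session delete failed: HTTP ${res.status}`);
}
```

The returned WSS URL is what would then be handed to Playwright's `connectOverCDP()`. Passing `fetchImpl` as a parameter is only there to make the sketch easy to exercise with a fake.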

Search & scrape tools:

  • firecrawl_search: v2 search with data.web response format handling
  • firecrawl_scrape: scrape with markdown output

Config & onboarding:

  • Firecrawl OAuth onboarding step (API key setup)
  • Zod schema for firecrawl config section
  • driver: "firecrawl" added to browser profile type

Bug fixes included:

  • Preserve dynamic cdpUrl across config hot-reload (was being clobbered by applyResolvedConfig())
  • Use runtime getCdpUrl() in all route handlers (snapshot, agent actions, status)
  • Handle v2 search nested data.web response format
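
As an illustration of the last fix, accepting both response shapes can be sketched like this. The `SearchResult` fields and the function name are hypothetical. Only the two envelope shapes (`{ data: [...] }` and `{ data: { web: [...] } }`) are taken from this PR.

```typescript
// Hypothetical result shape; only `url` is relied on here.
interface SearchResult {
  url: string;
  title?: string;
  description?: string;
}

// Accepts either the v1 flat envelope or the v2 nested one and
// returns the list of web results, or [] when neither shape matches.
function extractWebResults(body: unknown): SearchResult[] {
  if (typeof body !== "object" || body === null) return [];
  const data = (body as { data?: unknown }).data;
  if (Array.isArray(data)) return data as SearchResult[]; // v1: { data: [...] }
  if (typeof data === "object" && data !== null) {
    const web = (data as { web?: unknown }).web;
    if (Array.isArray(web)) return web as SearchResult[]; // v2: { data: { web: [...] } }
  }
  return [];
}
```

Returning an empty array for unrecognized shapes is a design choice for the sketch; the tool could equally surface an error.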

Test plan

  • pnpm test -- --run src/browser/firecrawl-browser.test.ts — session create/delete/reachability
  • pnpm test -- --run src/browser/server-context.firecrawl-availability.test.ts — availability, reachability, stop, config-refresh clobber fix
  • pnpm test -- --run src/browser/config.test.ts — firecrawl profile resolution
  • pnpm test -- --run src/agents/tools/firecrawl-tools.test.ts — search/scrape tools + v2 format
  • pnpm test -- --run src/commands/onboard-firecrawl.test.ts — onboarding flow
  • End-to-end headless: browser start, browser open, browser snapshot, browser stop against live Firecrawl API
  • Live search + scrape verified working

…e contexts

Three bugs fixed:

1. applyResolvedConfig() in resolved-config-refresh.ts unconditionally
   overwrote runtime.profile with a freshly resolved profile on every
   request, resetting cdpUrl back to "" and destroying the dynamic WSS
   URL that ensureBrowserAvailable() set from the firecrawl session.

2. ensureBrowserAvailable() early return path (existing session still
   alive) didn't re-apply cdpUrl, so if config refresh cleared it
   between requests, it stayed empty.

3. Route handlers (snapshot, tab context) read cdpUrl from the static
   profileCtx.profile instead of the runtime profile state. Added
   getCdpUrl() getter to ProfileContext that reads from runtime state.
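
The interaction between fixes 1 and 3 can be sketched in simplified form. The types and function names below are stand-ins invented for illustration, not the actual code in resolved-config-refresh.ts. The idea is that a config refresh keeps the session-derived cdpUrl, and route handlers read it through a getter over runtime state.

```typescript
// Simplified stand-in types (hypothetical).
interface Profile {
  name: string;
  driver: "openclaw" | "extension" | "firecrawl";
  cdpUrl: string;
}

interface ProfileRuntime {
  profile: Profile;
}

// Replace the runtime profile with the freshly resolved one, but keep the
// dynamic cdpUrl for firecrawl profiles, whose resolved cdpUrl is always "".
function applyResolvedProfile(runtime: ProfileRuntime, resolved: Profile): void {
  const dynamicCdpUrl = runtime.profile.cdpUrl;
  const keepDynamic =
    resolved.driver === "firecrawl" && resolved.cdpUrl === "" && dynamicCdpUrl !== "";
  runtime.profile = keepDynamic ? { ...resolved, cdpUrl: dynamicCdpUrl } : resolved;
}

// Route handlers call getCdpUrl() instead of caching a static profile object.
function createProfileContext(runtime: ProfileRuntime): { getCdpUrl: () => string } {
  return { getCdpUrl: () => runtime.profile.cdpUrl };
}
```

Because the getter closes over `runtime` rather than a profile snapshot, it reflects whatever the most recent session or refresh wrote.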

Also fixes API request body field names (ttlTotal/ttlWithoutActivity →
ttl/activityTtl) to match the actual Firecrawl v2 browser API, and
removes a debug console.error left over from a prior debugging session.

The Firecrawl v2 search API returns { data: { web: [...] } } instead
of the v1 flat { data: [...] }. Handle both formats gracefully.

Capture both live view URLs (liveViewUrl and interactiveLiveViewUrl) from
the Firecrawl v2 browser API response. The interactive view allows human
click/type; the read-only view is watch-only.

…sions

focusTab and closeTab in selection ops used the static profile.cdpUrl
(always empty for firecrawl) instead of the runtime cdpUrl set by
ensureBrowserAvailable. Also fix listProfiles to detect active firecrawl
sessions (no RunningChrome process) and report the runtime cdpUrl.

Add a FIRECRAWL_TOOL_NAMES constant and update applyFirecrawlKey to merge
the Firecrawl tool names into tools.alsoAllow, deduplicating against
existing entries. Also append source=openclaw to the Firecrawl auth URL
used in the browser auth flow.

Trim verbose mocking; add assertions for the alsoAllow tool merge,
deduplication, and the source=openclaw auth URL param.
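
A rough sketch of the behavior these assertions cover is below. The contents of FIRECRAWL_TOOL_NAMES are assumed to be the two tools this PR adds, and the helper names and example auth URL are hypothetical.

```typescript
// Assumed contents: the two tools introduced by this PR.
const FIRECRAWL_TOOL_NAMES = ["firecrawl_search", "firecrawl_scrape"] as const;

// Merge the Firecrawl tool names into an existing alsoAllow list,
// keeping existing order and dropping duplicates.
function mergeAlsoAllow(existing: readonly string[] | undefined): string[] {
  return Array.from(new Set<string>([...(existing ?? []), ...FIRECRAWL_TOOL_NAMES]));
}

// Add source=openclaw to an auth URL without disturbing other query params.
function withSourceParam(authUrl: string): string {
  const url = new URL(authUrl);
  url.searchParams.set("source", "openclaw");
  return url.toString();
}
```

Using a `Set` relies on its insertion-order guarantee, so entries the user already configured stay in place.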
Apply the same fc- prefix check to browser auth results,
not just manual entry. Drop unnecessary async from generateCodeChallenge.

Prevents a stalled request from blocking the spinner indefinitely.
The outer catch already handles network errors, so this just aborts cleanly.
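
One common way to get this abort-on-stall behavior is an AbortController paired with a timer. The helper below is a generic sketch under that assumption, not necessarily how the PR implements it. `fetchWithTimeout` and its parameters are hypothetical names.

```typescript
// Abort the request if no response arrives within timeoutMs.
// Note: the timer is cleared once the response headers arrive, so a slow
// body read after that point is not covered by this sketch.
async function fetchWithTimeout(
  url: string,
  init: RequestInit,
  timeoutMs: number,
  fetchImpl: typeof fetch = fetch,
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...init, signal: controller.signal });
  } finally {
    clearTimeout(timer); // avoid a dangling timer on success or failure
  }
}
```

The rejection caused by the abort propagates to the caller, which matches the note that an outer catch already handles network errors.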
- ensureBrowserAvailable now re-reads the API key from disk instead of
  using the stale value captured at startup, so runtime config updates
  take effect without a restart.
- Profile creation route and service now forward driver="firecrawl"
  instead of silently dropping it to undefined.

Importing resolveFirecrawlApiKey from web-fetch.ts pulled in
external-content.js, which broke the firecrawl availability test mock.
Inline the trivial config+env lookup instead. Also fix formatting.
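
An inlined lookup of this kind might look something like the sketch below. The config shape, the function name, and the FIRECRAWL_API_KEY environment variable name are assumptions for illustration; the actual field names in this repo may differ.

```typescript
// Hypothetical config shape.
interface FirecrawlConfigLike {
  firecrawl?: { apiKey?: string };
}

type Env = Record<string, string | undefined>;

// Prefer the key from config; fall back to the environment.
// Blank or whitespace-only values are treated as missing.
function lookupFirecrawlApiKey(cfg: FirecrawlConfigLike, env: Env): string | undefined {
  const fromConfig = cfg.firecrawl?.apiKey?.trim();
  if (fromConfig) return fromConfig;
  const fromEnv = env["FIRECRAWL_API_KEY"]?.trim();
  return fromEnv ? fromEnv : undefined;
}
```

In real use the second argument would be `process.env`. Taking it as a parameter keeps the helper free of imports, which is the point of inlining it.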
- Clean up stale firecrawl sessions before creating replacements to
  avoid leaking cloud resources
- Add abort timeouts to firecrawl-browser.ts fetch calls (30s create,
  10s delete) so stalled requests don't block indefinitely
- Preserve base path prefix in resolveSearchEndpoint so reverse-proxied
  deployments resolve correctly
- Skip CDP port allocation for firecrawl profiles since they use cloud
  sessions, not local ports
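
To illustrate the base-path fix in the list above: resolving an absolute path against a base URL discards any prefix, so the prefix has to be joined explicitly. The sketch below uses a hypothetical helper name and assumes the search path is /v2/search.

```typescript
// Build the search endpoint while keeping any path prefix on the base URL,
// e.g. a reverse proxy mounted at /firecrawl.
function joinSearchEndpoint(baseUrl: string): string {
  const url = new URL(baseUrl);
  const prefix = url.pathname.replace(/\/+$/, ""); // "/firecrawl/" -> "/firecrawl", "/" -> ""
  url.pathname = `${prefix}/v2/search`;
  return url.toString();
}

// The pitfall: an absolute path replaces the whole base path.
const droppedPrefix = new URL("/v2/search", "https://proxy.example/firecrawl/").toString();
// droppedPrefix is "https://proxy.example/v2/search"
```

With the helper, `https://proxy.example/firecrawl/` resolves to `https://proxy.example/firecrawl/v2/search`, while a bare host still resolves to `/v2/search`.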