Skip to content

Conversation

@TechNickAI
Copy link
Owner

Summary

  • Replaces bloated Playwright MCP (~16.5k tokens) with a minimal skill that costs near-zero until invoked
  • SKILL.md is 68 lines vs 454 in lackeyjb/playwright-skill
  • Progressive disclosure pattern: tiny description for triggering, full docs when loaded

Changes

New skill at plugins/core/skills/playwright-browser/:

  • SKILL.md - Minimal execution pattern, no examples Claude already knows from training
  • run.js - Universal executor with auto-install, CI detection, proper async handling
  • package.json - Playwright dependency
  • API_REFERENCE.md - Advanced patterns only (network mocking, auth persistence, multi-tab, downloads, video, traces)

Design Decisions

Decision Rationale
No dev server detection with prompts Breaks autonomous workflows. Use PLAYWRIGHT_BASE_URL env var instead
Visible browser by default Better for debugging. CI detection auto-enables headless
spawn() for script execution Properly awaits async code before process exit
Injected variables for inline code BASE_URL, HEADLESS, chromium, etc. available without boilerplate
Skip basic API examples Claude knows Playwright from training - don't waste tokens teaching it

Testing

  • Code review agent verified no critical bugs
  • Prompt engineer reviewed SKILL.md against prompt engineering standards
  • Fixed async execution issue caught during review

🤖 Generated with Claude Code

Replaces bloated Playwright MCP (~16.5k tokens) with a minimal skill
that costs near-zero until invoked:

- SKILL.md: 68 lines vs 454 in lackeyjb/playwright-skill
- run.js: Auto-install, CI detection for headless, proper async handling
- API_REFERENCE.md: Advanced patterns only (network mocking, auth
  persistence, multi-tab, downloads, video, traces)

Design decisions:
- No dev server detection with prompts (use PLAYWRIGHT_BASE_URL env var)
- Visible browser by default, headless in CI
- Uses spawn() for script execution to properly await async code
- Injected variables (BASE_URL, HEADLESS, chromium, etc.) for inline code

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@claude
Copy link

claude bot commented Dec 30, 2025

PR Review: Playwright Browser Skill

Overall Assessment

This is an excellent implementation that demonstrates best practices in progressive disclosure and token efficiency. The design philosophy is sound, the code is clean, and the prompt engineering follows the project's standards. I have a few suggestions for improvement, but this is ready to merge with minor fixes.

Strengths

Architecture & Design

  • Progressive disclosure pattern: Brilliant approach to minimize token usage until the skill is actually needed
  • Auto-installation: Removes friction for users - they can just run the skill
  • CI detection: Smart default to headless in CI environments
  • Injected variables: Clean pattern for providing common variables without boilerplate
  • Proper async handling: Using spawn() instead of execSync properly awaits async code

Prompt Engineering

  • Minimal token usage: SKILL.md is concise and follows prompt engineering best practices
  • Skips basic examples: Correctly assumes Claude knows Playwright from training
  • Clear structure: Good use of XML tags for section boundaries
  • Goal-focused: Doesn't over-prescribe implementation details

Code Quality

  • Error handling: Proper error propagation and logging
  • Multiple input modes: Supports file, inline, and stdin execution
  • Cleanup: Temp files are properly cleaned up in finally blocks
  • Type safety: Good use of promises and proper async/await patterns

Issues to Address

1. Security: Unquoted Path in Line 59 (run.js:59)

Severity: Medium

return { code: fs.readFileSync(path.resolve(args[0]), "utf8"), isFile: true };

If args[0] contains spaces or special characters, this could fail or cause unexpected behavior. While path.resolve() does handle this, it's best practice to be explicit.

Recommendation: This is actually fine as-is since fs.readFileSync and path.resolve handle spaces correctly. However, if this path is ever used in shell commands, ensure proper quoting.

2. Potential Race Condition with Temp Files (run.js:136, 150)

Severity: Low

const tempFile = path.join(__dirname, `.temp-${Date.now()}.js`);

Using Date.now() for temp files could theoretically cause conflicts if two scripts run in the same millisecond.

Recommendation: Use a more robust approach:

const tempFile = path.join(__dirname, `.temp-${Date.now()}-${process.pid}.js`);

Or use the tmp or fs.mkdtemp() for guaranteed uniqueness.

3. Error Handling: Silent Catch in Cleanup (run.js:142-143, 156-158)

Severity: Low

try {
  fs.unlinkSync(tempFile);
} catch {}

Silent failures in cleanup can hide issues. At minimum, log errors in debug mode.

Recommendation:

try {
  fs.unlinkSync(tempFile);
} catch (err) {
  // Intentionally ignore ENOENT (file already deleted)
  if (err.code !== 'ENOENT') {
    console.warn('Failed to cleanup temp file:', err.message);
  }
}

4. Documentation: Inconsistent Terminology

Severity: Very Low

In SKILL.md line 32, the inline example uses single-letter variable names (b, p) which differs from the more descriptive example on lines 52-56 (browser, page).

Recommendation: Use consistent naming in examples. Either both should use full names or both should use abbreviations. I recommend full names for clarity:

node $SKILL_DIR/run.js "const browser = await chromium.launch(); const page = await browser.newPage(); await page.goto('http://localhost:3000'); console.log(await page.title()); await browser.close();"

5. Missing Error Context in isCompleteScript Detection (run.js:74-81)

Severity: Very Low

The isCompleteScript() function might incorrectly classify scripts that contain the strings but aren't actually complete scripts (e.g., in comments or strings).

Recommendation: This is fine for the intended use case. If issues arise, consider more robust detection, but the current implementation follows the "simple until proven wrong" principle.

Performance Considerations

  • Installation check: require.resolve() is efficient for checking if Playwright is installed
  • Spawn vs exec: Good choice using spawn() for potentially long-running scripts
  • Temp file cleanup: Proper cleanup prevents disk space issues

Security Considerations

  • No injection vulnerabilities: Code is properly written to temp files, not eval'd
  • No shell expansion issues: Arguments are passed to Node.js directly, not through shell
  • CI detection: Comprehensive check for CI environments

Testing Recommendations

While the PR description mentions code review by agents, I recommend manual testing for:

  1. Multiple simultaneous executions: Verify temp file naming doesn't conflict
  2. Failed installations: Test behavior when npm install fails
  3. Large scripts: Verify stdin input works with large files
  4. Error propagation: Confirm errors in user scripts are properly reported

Nitpicks

  1. Line 18 (run.js): process.chdir(__dirname) changes global state. Consider whether this is necessary or if relative paths could be used instead.

  2. package.json: Missing engines field. Consider adding:

    "engines": {
      "node": ">=18.0.0"
    }
  3. API_REFERENCE.md: Excellent content, but consider adding a note about Playwright version compatibility at the top.

Verdict

Approve with minor fixes recommended

This PR demonstrates excellent engineering and follows the project's prompt engineering standards. The issues identified are minor and don't block merging. I particularly appreciate the progressive disclosure pattern and the attention to token efficiency.

The only must-fix is making temp file names more robust (add PID). The rest are nice-to-haves.

Great work on this implementation!

Nick Sullivan and others added 2 commits December 29, 2025 20:11
Pulled best practices from lackeyjb/playwright-skill:
- cleanupOldTempFiles() for disk hygiene
- Custom headers support (PW_HEADER_NAME, PW_EXTRA_HEADERS)
- CI sandbox args (--no-sandbox, --disable-setuid-sandbox)
- Better error output formatting with emojis

Changed headless default:
- Now headless by default (less intrusive)
- PLAYWRIGHT_HEADED=true for visible browser
- SKILL.md guides Claude on when to use headed mode
  (user says "show me", "watch", "debug", etc.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Remove prescriptive word list. State the goal, trust the model.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@claude
Copy link

claude bot commented Dec 30, 2025

Code Review: Playwright Browser Skill

This is an excellent implementation of a progressive disclosure skill. The code quality is high, and the design decisions are well-reasoned. Below are my findings across different review categories.


✅ Strengths

Design & Architecture

  • Progressive disclosure pattern: Minimal SKILL.md (68 lines) provides just enough context for triggering, with advanced patterns deferred to API_REFERENCE.md. This is exactly the right pattern for LLM-to-LLM communication.
  • Smart auto-installation: Detecting missing Playwright and installing automatically reduces friction significantly.
  • Flexible execution modes: Supporting file, inline, and stdin execution covers all use cases elegantly.
  • CI awareness: Auto-detecting CI environments and applying appropriate browser args is thoughtful.

Code Quality

  • Clean separation of concerns: Each function has a single, clear responsibility.
  • Good error handling: Try-catch blocks with informative error messages throughout.
  • Proper async handling: Using spawn() to properly await async scripts before process exit (run.js:173-190).
  • Cleanup management: Proactively cleaning up old temp files prevents disk bloat (run.js:70-86).

Prompt Engineering

The SKILL.md follows prompt engineering best practices well:

  • XML tags with semantic names: <objective>, <execution>, <headless-vs-headed> etc. are self-documenting.
  • Clear examples: Showing actual usage patterns rather than just describing them.
  • Goal-focused instructions: Telling the LLM what to achieve, not micro-managing how.
  • Consistent terminology: Using the same terms throughout.

🔍 Issues Found

🔴 Critical: Command Injection Vulnerability (run.js:91-99)

Location: run.js:91-99 (getCodeToExecute function)

Issue: The inline code path uses args.join(" ") which can break with certain shell characters or multiple arguments:

if (args.length > 0) {
  console.log("⚡ Executing inline code");
  return args.join(" ");  // ⚠️ Vulnerability
}

Attack vector: If the shell splits arguments unexpectedly (e.g., unquoted spaces, semicolons, backticks), the code could be malformed or execute unintended commands.

Example:

node run.js console.log('test'); malicious.code()
# args = ['console.log(\'test\');', 'malicious.code()']
# joined = "console.log('test'); malicious.code()"

Recommendation: Since the first argument check (line 91) already handles file paths, the remaining arguments should be treated as a single inline code string. Consider:

if (args.length > 0 && !fs.existsSync(args[0])) {
  console.log("⚡ Executing inline code");
  return args.join(" ");  // This is actually fine since we expect the shell to have already split it
}

Actually, upon further review, this might be working as intended - the shell's argument parsing is being trusted here. However, the documentation should clarify that inline code should be properly quoted when calling the script.

Severity: Medium (requires specific invocation patterns to exploit, but should be documented)


🟡 Medium: Race Condition in Temp File Naming (run.js:205)

Location: run.js:205

Issue: Using Date.now() for temp file naming could theoretically collide if two instances run in the same millisecond:

const tempFile = path.join(__dirname, `.temp-${Date.now()}.js`);

Recommendation: Add process PID or use a random component:

const tempFile = path.join(__dirname, `.temp-${Date.now()}-${process.pid}.js`);

Severity: Low (very unlikely in practice, but easy to fix)


🟡 Medium: Incomplete Script Detection Heuristic (run.js:115-122)

Location: run.js:115-122

Issue: The isCompleteScript() function uses simple string matching which can have false positives/negatives:

function isCompleteScript(code) {
  const hasRequire = code.includes("require(");
  const hasAsyncWrapper =
    code.includes("(async () =>") ||
    code.includes("(async()=>") ||
    code.includes("(async function");
  return hasRequire && hasAsyncWrapper;
}

Problems:

  • Would match commented-out code: // const x = require("playwright")
  • Wouldn't match: const playwright = require("playwright"); in a comment followed by inline code
  • Wouldn't match ES6 imports: import { chromium } from 'playwright'

Recommendation: Either:

  1. Make the heuristic more robust (check for actual code, not comments)
  2. Document this limitation in SKILL.md
  3. Consider a more explicit marker (e.g., // @complete-script comment)

Severity: Medium (could cause unexpected behavior, but rare)


🟡 Medium: Missing Browser Cleanup Documentation

Location: SKILL.md and API_REFERENCE.md

Issue: None of the examples explicitly show browser cleanup with await browser.close() or proper context disposal. While the examples in SKILL.md do show cleanup (line 76), the API_REFERENCE.md examples often leave browsers/contexts open.

Example (API_REFERENCE.md:100-113):

const context = await browser.newContext({
  recordVideo: {
    dir: '/tmp/videos/',
    size: { width: 1280, height: 720 }
  }
});

const page = await context.newPage();
// ... do things ...
await page.close();

const videoPath = await page.video().path();
console.log('Video saved:', videoPath);
// ⚠️ Missing: await context.close(); await browser.close();

Impact: Memory leaks and zombie browser processes, especially problematic in CI.

Recommendation: Add cleanup to all examples, or add a "Cleanup" section showing proper teardown.

Severity: Medium (can cause real problems in production use)


🟢 Minor: Silent Error Handling (run.js:79-85)

Location: Multiple locations with empty catch blocks

Issue: Several places have silent error handling that might hide bugs:

tempFiles.forEach((file) => {
  try {
    fs.unlinkSync(path.join(__dirname, file));
  } catch {
    // Ignore - file might be in use
  }
});

Recommendation: At minimum, log to stderr in debug mode:

} catch (e) {
  if (process.env.DEBUG) {
    console.error(`Failed to cleanup ${file}:`, e.message);
  }
}

Severity: Low (intentional design choice, but could hinder debugging)


🟢 Minor: API_REFERENCE.md Line 139 - Outdated Device Name

Location: API_REFERENCE.md:143

Issue: The example uses devices['iPhone 14'] but Playwright's device descriptors might not include this exact string. Should verify against current Playwright device list.

Recommendation: Either verify the device name is correct for Playwright 1.49.0, or use a more common device like 'iPhone 13' or 'Pixel 5'.

Severity: Low (example code that might not work as shown)


🟢 Minor: Inconsistent Error Message Format

Location: run.js:64-65, 107-111, 165-166, 223

Issue: Error messages use inconsistent emoji prefixes:

  • console.error("❌ Failed to install Playwright:", e.message); (run.js:64)
  • console.error("❌ No code to execute"); (run.js:107)
  • console.error('❌ Error:', error.message); (run.js:165)
  • console.error("❌ Fatal:", error.message); (run.js:223)

Some include the error object, some just the message, some have different prefix patterns.

Recommendation: Standardize on a consistent format throughout.

Severity: Low (cosmetic, but consistency helps debugging)


🎯 Recommendations

High Priority

  1. Document shell escaping requirements for inline code execution in SKILL.md
  2. Add browser cleanup examples to API_REFERENCE.md showing proper teardown
  3. Improve script detection heuristic or document its limitations

Medium Priority

  1. Add process PID to temp file naming to prevent theoretical race conditions
  2. Standardize error message formatting for better debugging experience

Low Priority

  1. Add debug mode with DEBUG env var for troubleshooting
  2. Verify device names in API_REFERENCE.md examples against current Playwright

🔒 Security Review

✅ Good Security Practices

  • Scoped to __dirname: Changing to __dirname (line 25) prevents directory traversal
  • No eval(): Using spawn() instead of eval() is much safer
  • Sanitized temp files: Using timestamped names prevents collisions and leaks

⚠️ Potential Concerns

  • Code execution by design: This tool executes arbitrary code, which is its purpose but should be documented
  • Auto-installation: Running npm install and npx playwright install automatically could be abused if the script is called with malicious intent (though this is a development tool)

Recommendation: Add a security note to SKILL.md explaining this is a development tool and should not be exposed to untrusted code.


📊 Test Coverage

Missing: No automated tests detected. Consider adding:

  • Unit tests for helper functions (isCI, shouldBeHeadless, isCompleteScript)
  • Integration tests for the three execution modes (file, inline, stdin)
  • Error handling tests

🎓 Prompt Engineering Assessment

The SKILL.md follows the project's prompt engineering rules exceptionally well:

Front-loads critical information: <objective> comes first
Uses semantic XML tags: <headless-vs-headed> not <section-3>
Goal-focused instructions: Tells what to achieve, not how
Minimal token usage: 68 lines vs 454 in comparable skills
Progressive disclosure: Advanced patterns deferred to API_REFERENCE.md
Clear examples: Shows actual usage, not just descriptions
Consistent terminology: "headless" and "headed" used throughout

Minor suggestions:

  • The <defaults> section (line 56-58) could be moved to <execution> since it's execution-related
  • Consider adding a <troubleshooting> section for common issues

📝 Documentation Quality

SKILL.md ⭐⭐⭐⭐⭐

Excellent. Clear, concise, follows all prompt engineering best practices.

API_REFERENCE.md ⭐⭐⭐⭐

Very good coverage of advanced patterns. Needs cleanup examples added.

run.js comments ⭐⭐⭐⭐

Good header documentation. Inline comments are clear and helpful.


🎬 Final Verdict

This is high-quality work that demonstrates solid engineering practices and excellent understanding of the progressive disclosure pattern. The design decisions are well-reasoned and documented.

Ready to merge after:

  1. Adding browser cleanup to API_REFERENCE.md examples
  2. Documenting shell escaping requirements in SKILL.md
  3. Addressing the script detection heuristic (either improve or document limitations)

Nice to have:

  • Tests (even basic ones)
  • Debug mode for troubleshooting
  • Process PID in temp file names

Great work overall! The token savings and progressive disclosure pattern are exactly what this project needs. 🎉

@claude
Copy link

claude bot commented Dec 30, 2025

PR Review: Playwright Browser Skill

This is a well-designed implementation that follows progressive disclosure principles. The code quality is high, and the architecture is sound. Below are my findings organized by category.

✅ Strengths

Architecture & Design:

  • Excellent progressive disclosure pattern: minimal SKILL.md (68 lines) triggers the skill, detailed API_REFERENCE.md only loaded when needed
  • Smart auto-install mechanism with CI detection
  • Clean separation of concerns: run.js handles execution, SKILL.md guides Claude, API_REFERENCE.md provides advanced patterns
  • Proper async handling using spawn() instead of exec() prevents premature process exits

Code Quality:

  • Clean, readable code with good documentation
  • Proper error handling throughout
  • Automatic cleanup of temporary files
  • Sensible defaults (headless by default, auto-detect CI environment)

LLM-Friendly Design:

  • Follows prompt engineering best practices from rules/prompt-engineering.mdc
  • Clear goal-focused instructions rather than over-prescriptive steps
  • Good use of XML tags for structured sections
  • Trusts Claude to know Playwright basics (doesn't waste tokens on basic examples)

🔍 Code Quality Observations

run.js:115-122 - isCompleteScript() detection:
The heuristic for detecting complete scripts is simple but potentially fragile. Consider these edge cases:

// Would be detected as "complete" and not wrapped:
const code = `
const pw = require('playwright');
(async () => { /* user's async code */ })();
`;

// Would NOT be detected and get double-wrapped:
const code = `
import { chromium } from 'playwright';
(async () => { /* user's async code */ })();
`;

The require() check won't match ES module imports. This is likely fine for the current use case (Node.js scripts) but worth noting. The double-wrapping would cause syntax errors, which would be caught, but could confuse users.

run.js:88-113 - Argument parsing:
The logic correctly prioritizes: file path → inline code → stdin. However, there's a subtle issue:

if (args.length > 0 && fs.existsSync(args[0])) {
  // Execute as file
}
if (args.length > 0) {
  // Execute as inline - THIS COULD MATCH A FAILED FILE PATH
  return args.join(" ");
}

If someone passes a file path that doesn't exist, it falls through to inline code execution, which would try to execute the file path as JavaScript. This would fail gracefully with an error, but a clearer error message would be better:

if (args.length > 0) {
  if (fs.existsSync(args[0])) {
    const filePath = path.resolve(args[0]);
    console.log(`📄 Executing: ${filePath}`);
    return fs.readFileSync(filePath, "utf8");
  }
  // If it looks like a file path but doesn't exist, warn the user
  if (args[0].endsWith('.js') || args[0].includes('/')) {
    console.warn(`⚠️  Warning: '${args[0]}' looks like a file path but doesn't exist. Treating as inline code.`);
  }
  console.log("⚡ Executing inline code");
  return args.join(" ");
}

run.js:124-146 - getExtraHeadersCode():
Returns code as a string that gets injected. This is clever but the parsing logic silently fails on invalid JSON. Consider whether this should log a warning:

try {
  const parsed = JSON.parse(headersJson);
  if (typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed)) {
    return parsed;
  }
  console.warn('⚠️  PW_EXTRA_HEADERS is not a valid object, ignoring');
} catch (e) {
  console.warn('⚠️  PW_EXTRA_HEADERS is not valid JSON, ignoring:', e.message);
}

🛡️ Security Considerations

Low Risk - Code Execution by Design:
This tool executes arbitrary JavaScript code, which is its purpose. The security model relies on:

  1. Users only running this in trusted environments
  2. Claude Code's existing security boundaries

This is appropriate for a developer tool. No changes needed.

Temporary File Handling (run.js:205, 217-218):
Temporary files use timestamp-based names (.temp-${Date.now()}.js). This is fine for single-user environments but could have race conditions if multiple processes run simultaneously. Consider using a random component:

const tempFile = path.join(__dirname, `.temp-${Date.now()}-${Math.random().toString(36).slice(2, 8)}.js`);

This is a minor issue and probably not worth changing unless concurrent execution is a goal.

Auto-install (run.js:53-68):
Automatically runs npm install which could execute arbitrary code from package.json scripts. This is standard npm behavior and users should understand this risk. The code correctly inherits the current directory context, so it only installs within the skill directory. No changes needed.

🎯 Best Practices & Conventions

Follows Repository Standards:

  • ✅ No commits without explicit user permission (this is a PR, appropriate)
  • ✅ Clear commit messages with emoji prefixes
  • ✅ Good PR description with summary, changes, and testing sections
  • ✅ Follows prompt engineering guidelines from rules/prompt-engineering.mdc

SKILL.md Design:
The <headless-vs-headed> section (lines 38-44) follows the latest commit's philosophy of trusting Claude:

Use headed when user wants to see the browser. You know when that is.

This is excellent. It's goal-focused rather than prescriptive (no word lists), trusting the model's judgment.

📊 Performance Considerations

Token Efficiency:
The progressive disclosure design is excellent for token efficiency:

  • SKILL.md: ~400 tokens (minimal, always loaded)
  • API_REFERENCE.md: ~1500 tokens (only loaded when needed)
  • vs. previous MCP: ~16,500 tokens

Runtime Performance:

  • Auto-install only runs once (cached by npm)
  • Temp file cleanup is efficient (simple filter + unlink)
  • spawn() is appropriate for this use case (better than exec for long-running processes)

🧪 Test Coverage

Missing Tests:
There are no automated tests for this skill. Consider adding:

  1. Unit tests for run.js:

    • isCompleteScript() detection
    • Argument parsing logic
    • Header parsing from environment variables
    • Temp file cleanup
  2. Integration tests:

    • Execute a simple script file
    • Execute inline code
    • Execute from stdin
    • Verify auto-install works
    • Test CI detection
  3. Example scripts:

    • A /tmp/test-playwright.js that validates the injected variables are available
    • Test that complete scripts aren't double-wrapped

While tests aren't blocking for a skill (this is developer tooling), they would increase confidence, especially around the script detection heuristics.

💡 Suggestions (Optional Improvements)

  1. Error context for inline code:
    When inline code fails, the error shows the line number in the temp file, which doesn't match the user's mental model. Consider adding source mapping or clearer error context.

  2. Verbose mode:
    Add a PLAYWRIGHT_VERBOSE=true option to show the final wrapped code before execution. Useful for debugging why scripts aren't working.

  3. Dry run mode:
    PLAYWRIGHT_DRY_RUN=true to show what would be executed without running it.

  4. Browser type selection:
    Currently hardcoded to chromium in the auto-install. Consider supporting firefox/webkit via environment variable.

🎯 Verdict

Recommendation: Approve with minor suggestions

This is high-quality work that achieves its design goals. The code is clean, follows project conventions, and implements progressive disclosure effectively. The few issues identified are minor edge cases that won't affect normal usage.

The lack of automated tests is the main gap, but for a developer tool like this, manual testing may be sufficient. If this skill becomes widely used, adding tests would be valuable.

Blocking issues: None
Non-blocking suggestions: 7 items noted above (error handling improvements, tests, verbose mode)

Great work on following the prompt engineering principles and keeping the LLM-facing interface minimal and goal-focused! 🎭

- Warn if .js file path doesn't exist (treating as inline code)
- Warn if PW_EXTRA_HEADERS is invalid JSON

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@TechNickAI
Copy link
Owner Author

Thanks for the thorough review! Addressed the two quick wins:

  1. ✅ Added warning when .js file path doesn't exist (now warns before treating as inline code)
  2. ✅ Added warning for invalid PW_EXTRA_HEADERS JSON

Deferring for later:

  • ES module detection - Node.js scripts are the primary use case
  • Tests - can add if this skill sees heavy use
  • Verbose/dry-run modes - nice to have, not blocking

const path = require("path");
const { execSync, spawn } = require("child_process");

process.chdir(__dirname);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relative file paths resolve incorrectly after chdir

The process.chdir(__dirname) at line 25 executes before getCodeToExecute() processes file arguments. When a user passes a relative path like ./my-test.js, the fs.existsSync() and path.resolve() calls resolve it relative to the skill directory instead of the caller's working directory. This causes valid script files to not be found, triggering the fallback that treats the file path as inline JavaScript code, which then fails to execute.

Additional Locations (1)

Fix in Cursor Fix in Web

@TechNickAI TechNickAI closed this Dec 30, 2025
@TechNickAI TechNickAI reopened this Dec 30, 2025
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@claude
Copy link

claude bot commented Dec 30, 2025

Code Review: Playwright Browser Skill

Summary

Excellent implementation of a minimal, progressive disclosure skill for Playwright automation. The design achieves its stated goal of replacing a bloated MCP with a near-zero token cost until invoked. The code is well-structured, handles edge cases thoughtfully, and follows the project's prompt engineering principles.

Strengths

1. Progressive Disclosure Pattern ✅

The skill perfectly demonstrates the progressive disclosure pattern:

  • Minimal SKILL.md (78 lines) for triggering
  • Separate API_REFERENCE.md for advanced patterns
  • "Trust the model" approach - assumes Claude knows Playwright basics

2. Excellent Prompt Engineering

The SKILL.md follows prompt-engineering best practices:

  • Goal-focused over prescriptive (line 41: "You know when that is")
  • Positive framing throughout
  • Clear examples with injected variables
  • Semantic XML tags (<objective>, <execution>, not numbered steps)

3. Robust Error Handling

  • Auto-installation with graceful fallback (run.js:53-67)
  • Temp file cleanup prevents disk bloat (run.js:70-86)
  • Helpful warnings for edge cases (run.js:98-99, 144-146)
  • Proper async handling with spawn() instead of execSync

4. CI/CD Awareness

  • Auto-detects CI environments (run.js:27-35)
  • Applies sandbox args only when needed (run.js:157)
  • Headless by default, configurable via env var

Issues Found

Code Quality

Minor: Race Condition in Temp File Cleanup

Location: run.js:212-226

The temp file cleanup in the finally block could fail if the script is still executing when cleanup runs. While unlikely due to the await, there's a theoretical race.

Suggestion: Add a small delay or verify the child process has exited:

finally {
  // Small delay to ensure spawn process has fully released file
  await new Promise(resolve => setTimeout(resolve, 100));
  try {
    fs.unlinkSync(tempFile);
  } catch {}
}

Minor: String Interpolation Security

Location: run.js:170

User-provided code is injected directly into the wrapped script. While this is the intended functionality, there's no sanitization of the code string.

Impact: Low risk since this is a local execution tool, but malicious inline code could break the wrapper or inject unexpected behavior.

Recommendation: Document this behavior clearly or add a warning if code contains template literal escape sequences.

Minor: Inconsistent Error Output

Location: run.js:172-174 vs run.js:230-231

Error handling uses different formats:

  • Wrapped code: console.error('❌ Error:', error.message)
  • Main: console.error("❌ Fatal:", error.message)

Suggestion: Use consistent terminology ("Error" vs "Fatal") or document when each applies.

Best Practices

Low Priority: Missing Input Validation

Location: run.js:88-117

The getCodeToExecute() function doesn't validate that stdin actually contains code when reading from pipe. Empty stdin would pass through.

Suggestion: Add validation:

if (!process.stdin.isTTY) {
  console.log("📥 Reading from stdin");
  const code = fs.readFileSync(0, "utf8");
  if (!code.trim()) {
    console.error("❌ No code provided via stdin");
    process.exit(1);
  }
  return code;
}

Performance

Optimization: Redundant CI Detection

Location: run.js:27-35 and run.js:157

isCI() is called every execution but the result is constant for a given environment. Consider caching:

const IS_CI = isCI();
const CI_ARGS = IS_CI ? "['--no-sandbox', '--disable-setuid-sandbox']" : "[]";

Minor impact, but improves code clarity.

Security

Medium: Command Injection via Environment Variables

Location: run.js:128-153

The getExtraHeadersCode() function generates code that reads environment variables but doesn't validate them. A malicious PW_HEADER_NAME could inject code.

Example Attack:

PW_HEADER_NAME='x", "evil": process.exit(1), "y":"z' node run.js "console.log('test')"

Impact: Local execution context only, but still a concern.

Recommendation: Add validation:

function getExtraHeaders() {
  const headerName = process.env.PW_HEADER_NAME;
  const headerValue = process.env.PW_HEADER_VALUE;
  if (headerName && headerValue) {
    // Validate header name follows HTTP spec
    if (!/^[a-zA-Z0-9-]+$/.test(headerName)) {
      console.warn('⚠️  PW_HEADER_NAME contains invalid characters, ignoring');
      return null;
    }
    return { [headerName]: headerValue };
  }
  // ... rest
}

Testing Coverage

The PR description mentions:

Code review agent verified no critical bugs
Prompt engineer reviewed SKILL.md against prompt engineering standards

Missing:

  • No automated tests for run.js
  • No example scripts demonstrating usage
  • No integration test for auto-install flow

Recommendation: Add a test/ directory with:

  • Unit tests for helper functions (isCI, shouldBeHeadless, etc.)
  • Integration test that runs a simple script
  • Test for auto-install (mocked)

Design Decisions Review

Decision Assessment
No dev server detection with prompts ✅ Correct. Prompts break autonomous workflows
Visible browser by default → Changed to headless ✅ Good revision based on review
spawn() for script execution ✅ Proper async handling
Injected variables for inline code ✅ Reduces boilerplate
Skip basic API examples ✅ Trusts Claude's training data

Documentation

SKILL.md

  • Clear, concise, follows progressive disclosure
  • Good use of semantic XML tags
  • Examples demonstrate patterns effectively

API_REFERENCE.md

  • Comprehensive coverage of advanced patterns
  • Examples are clear and copy-pasteable
  • Good balance of breadth vs depth

Minor suggestion: Add a "When to use this reference" section at the top to guide users.

Recommendations

Must Fix

  1. Address command injection vulnerability in header validation (Security: Medium)

Should Fix

  1. Add basic test coverage (at minimum, smoke test for run.js)
  2. Standardize error message format

Nice to Have

  1. Add race condition protection in temp file cleanup
  2. Validate stdin input
  3. Cache CI detection result
  4. Add usage examples in a examples/ directory

Overall Assessment

Quality: High
Security: Medium (one moderate issue)
Maintainability: High
Design: Excellent
Documentation: Excellent

This is a well-designed skill that achieves its goals elegantly. The progressive disclosure pattern and minimal token cost are exactly what this project needs. With the security fix and basic test coverage, this would be production-ready.

The design decisions are sound and well-justified. The shift from headed to headless by default based on review feedback shows good iteration.

Recommendation: Merge after addressing the header validation security issue. Testing can be added in a follow-up PR if needed.

} catch {
// Ignore directory read errors
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition deletes temp files from concurrent instances

The cleanupOldTempFiles function deletes ALL .temp-*.js files despite its name suggesting it only cleans old files. When multiple instances of run.js execute concurrently, one instance can delete another's temp file before its child process opens it. The sequence is: Instance A creates temp file and spawns child → Instance B starts, runs cleanup, deletes A's file → A's child fails to read the deleted file. The try-catch handles files currently open but not the race window between file creation and the spawned child opening it.

Additional Locations (1)

Fix in Cursor Fix in Web

}

const code = getCodeToExecute();
const tempFile = path.join(__dirname, `.temp-${Date.now()}.js`);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temp file name collision when processes start same millisecond

The temp file name uses only Date.now() for uniqueness, which has millisecond precision. If two instances of run.js start within the same millisecond (possible with automated tooling or parallel CI), both create the same temp file path. Process A writes its code, then Process B overwrites it with different code, then Process A executes Process B's code instead of its own. Adding process.pid or a random component to the filename would prevent this collision.

Fix in Cursor Fix in Web

- Description now starts with "Use when..." per CLAUDE.md guidelines
- Bump plugin version 8.5.0 → 8.6.0 for new playwright-browser skill

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@claude
Copy link

claude bot commented Dec 30, 2025

Code Review: Playwright Browser Skill

This is a well-designed implementation that follows the progressive disclosure pattern effectively. The code quality is high overall, with good error handling and thoughtful design decisions. Here are my detailed findings:


✅ Strengths

Architecture & Design:

  • Progressive disclosure pattern is excellent - minimal token cost until invoked
  • Auto-installation removes friction for users
  • Multiple input modes (file, inline, stdin) provide flexibility
  • CI detection with automatic headless/headed switching is smart
  • Temp file cleanup prevents disk bloat over time

Code Quality:

  • Clean separation of concerns with single-purpose functions
  • Proper async/await handling using spawn() instead of execSync
  • Good error messages with emoji indicators for UX
  • Environment variable validation with helpful warnings

Documentation:

  • SKILL.md is concise and focused (68 lines vs 454 in reference)
  • API_REFERENCE.md separates advanced patterns from basics
  • Clear examples in documentation
  • Follows prompt-engineering.mdc principles (goal-focused, minimal prescriptiveness)

🔍 Issues & Concerns

1. Security: Arbitrary Code Execution (High Severity)

Location: run.js:170 (inline code wrapping)

The skill executes user-provided code directly without sanitization. While this is inherent to Playwright automation, there are safety improvements to consider:

// Current implementation injects code directly
${code}

Concerns:

  • No validation of code content before execution
  • Could execute malicious code if LLM is prompt-injected
  • File path traversal possible in getCodeToExecute() (line 93)

Recommendations:

  • Add a warning in SKILL.md about the trust model
  • Consider sandboxing options or at least document security implications
  • Validate file paths to prevent traversal: path.resolve() alone doesn't prevent ../../etc/passwd

2. Race Condition in Temp File Creation (Medium Severity)

Location: run.js:212

const tempFile = path.join(__dirname, `.temp-${Date.now()}.js`);

Issue: If two instances run in the same millisecond, they'll use the same filename, causing conflicts.

Fix:

const tempFile = path.join(__dirname, `.temp-${Date.now()}-${Math.random().toString(36).substring(7)}.js`);

Or use Node's built-in temp file creation:

const crypto = require('crypto');
const tempFile = path.join(__dirname, `.temp-${crypto.randomBytes(8).toString('hex')}.js`);

3. Incomplete isCompleteScript Detection (Medium Severity)

Location: run.js:119-126

function isCompleteScript(code) {
  const hasRequire = code.includes("require(");
  const hasAsyncWrapper =
    code.includes("(async () =>") ||
    code.includes("(async()=>") ||
    code.includes("(async function");
  return hasRequire && hasAsyncWrapper;
}

Issues:

  • Misses ES6 imports: import { chromium } from 'playwright'
  • Misses top-level await (Node 14.8+)
  • String-based detection is fragile (would match code in comments or strings)

Suggestions:

  1. Add import detection: code.includes("import ") || code.includes("require(")
  2. Document the heuristic limitations
  3. Consider a more robust AST-based approach or accept that wrapping harmless complete scripts is okay

4. Missing Error Context in runScript (Low-Medium Severity)

Location: run.js:180-198

child.on("close", (code) => {
  if (code === 0) {
    resolve();
  } else {
    reject(new Error(`Script exited with code ${code}`));
  }
});

Issue: When scripts fail, users only see exit code, not what failed.

Improvement:

child.on("close", (code) => {
  if (code === 0) {
    resolve();
  } else {
    reject(new Error(`Script exited with code ${code}. Check output above for details.`));
  }
});

Even better: Capture stderr separately for better error reporting.


5. Cleanup Race Condition (Low Severity)

Location: run.js:223-225

} finally {
  try {
    fs.unlinkSync(tempFile);
  } catch {}
}

Issue: Silent catch swallows all errors, including permission issues that users should know about.

Better:

} finally {
  try {
    fs.unlinkSync(tempFile);
  } catch (e) {
    if (e.code !== 'ENOENT') {
      console.warn(`⚠️  Failed to cleanup temp file: ${e.message}`);
    }
  }
}

6. Undocumented EXTRA_HEADERS Behavior

Location: run.js:128-153, SKILL.md:56

SKILL.md mentions EXTRA_HEADERS as an injected variable but doesn't explain:

  • How to use it in scripts
  • That it's null if not configured
  • Example usage

Add to SKILL.md:

- `EXTRA_HEADERS` - Custom headers object (null if not set)
  Example: PW_EXTRA_HEADERS='{"Authorization":"Bearer token"}'

🎯 Best Practices & Style

Minor Improvements:

  1. Consistent quoting - Mix of single/double quotes. Pick one (prefer single for JS).

  2. Magic strings - Consider constants:

const TEMP_FILE_PREFIX = '.temp-';
const TEMP_FILE_EXTENSION = '.js';
  1. Documentation for getExtraHeadersCode - This function returns code-as-string which is unusual. Add a comment explaining why.

  2. Package.json could specify Node version:

"engines": {
  "node": ">=14.8.0"
}

🔒 Security Considerations

  1. Trust Model: This skill executes arbitrary code. Document that it should only be used with trusted prompts.

  2. Environment Variable Injection: PW_EXTRA_HEADERS is parsed as JSON - already validated ✅

  3. File Path Validation: Consider rejecting absolute paths outside allowed directories.

  4. Dependency Security: Playwright is regularly updated. Consider adding a note about keeping dependencies current.


🧪 Test Coverage

Missing: No automated tests for this skill.

Recommendations:

  • Add basic smoke tests for the three input modes (file, inline, stdin)
  • Test auto-install logic (mock the npm commands)
  • Test error cases (invalid JSON in PW_EXTRA_HEADERS, missing files, etc.)
  • Test temp file cleanup

Example test structure:

// test/playwright-browser.test.js
describe('run.js', () => {
  it('should execute inline code');
  it('should execute file-based scripts');
  it('should handle stdin input');
  it('should cleanup temp files');
  it('should warn on invalid PW_EXTRA_HEADERS');
});

📊 Performance

Good:

  • Lazy installation (only when needed)
  • Temp file cleanup prevents disk bloat
  • Spawn instead of exec for better memory handling

Considerations:

  • Playwright installation is ~300MB. Document this for users on limited bandwidth/storage.

📋 Summary

Verdict:Approve with minor recommendations

This is a solid implementation that achieves its design goals effectively. The identified issues are mostly minor, with the exception of the security considerations around arbitrary code execution - which is inherent to the tool's purpose but should be documented.

Priority Fixes:

  1. ⚠️ Add security documentation about trust model
  2. 🔧 Fix temp file race condition (use crypto.randomBytes)
  3. 📝 Document EXTRA_HEADERS usage in SKILL.md
  4. 🛡️ Improve error context in runScript

Nice-to-Haves:

  • Add basic test coverage
  • Document Node.js version requirements
  • Improve isCompleteScript detection
  • Better error reporting from child process

The progressive disclosure pattern is excellent and this will be a valuable addition to the plugin ecosystem. Great work on keeping it minimal while still being feature-complete! 🎭

@TechNickAI TechNickAI merged commit bbea2e9 into main Dec 30, 2025
3 checks passed
@TechNickAI TechNickAI deleted the add-playwright-browser-skill branch December 30, 2025 02:46
code.includes("(async()=>") ||
code.includes("(async function");
return hasRequire && hasAsyncWrapper;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pattern matching in comments breaks variable injection

The isCompleteScript function uses String.includes() to detect whether code needs the injected variable wrapper. This matches patterns like require( and (async () => anywhere in the string, including inside comments or string literals. If user code contains both patterns in comments (e.g., // use require('playwright') and // wrap with (async () => ...)), the function incorrectly returns true, causing the code to run without the injected chromium, BASE_URL, HEADLESS, and other variables. This results in confusing "X is not defined" runtime errors.

Fix in Cursor Fix in Web

const headerValue = process.env.PW_HEADER_VALUE;
if (headerName && headerValue) {
return { [headerName]: headerValue };
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial header environment variables silently ignored without warning

When PW_HEADER_NAME is set without PW_HEADER_VALUE (or vice versa), the header configuration is silently ignored without any warning. This is inconsistent with the behavior for PW_EXTRA_HEADERS, which logs a warning when the JSON is invalid. A user who sets only PW_HEADER_NAME expecting a header to be sent may experience silent authentication or request failures, making the issue difficult to debug.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants