fix: skip unavailable tree-sitter parsers #459

Open
letsjo wants to merge 1 commit into tirth8205:main from letsjo:fix/parser-load-timeout

@letsjo letsjo commented May 10, 2026

Summary

  • lazily load tree_sitter_language_pack parsers instead of importing the package at module import time
  • probe parser loading in a short child process before loading it in the main graph build
  • mark only the failing language unavailable when a parser load hangs/fails, allowing the rest of the repository graph build to continue
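
For illustration, a minimal sketch of the child-process probe described in the second bullet. This is not the code in this PR: `probe_succeeds`, `PROBE_TEMPLATE`, and `DEFAULT_TIMEOUT_SECONDS` are hypothetical names, and the 5-second default is an assumption (the description only names the `CRG_PARSER_LOAD_TIMEOUT_SECONDS` variable, not its default).

```python
import os
import subprocess
import sys

# Assumed default; the PR description does not state the real one.
DEFAULT_TIMEOUT_SECONDS = 5.0

# Code run in the child process. Only get_parser() is taken from the PR text.
PROBE_TEMPLATE = (
    "import tree_sitter_language_pack as pack; pack.get_parser({language!r})"
)


def probe_succeeds(language, template=PROBE_TEMPLATE, timeout=None):
    """Return True if the parser for `language` loads in a child process.

    A hang shows up as a timeout and a crash as a non-zero exit code;
    neither can block or take down the calling process.
    """
    if timeout is None:
        timeout = float(
            os.environ.get("CRG_PARSER_LOAD_TIMEOUT_SECONDS", DEFAULT_TIMEOUT_SECONDS)
        )
    try:
        result = subprocess.run(
            [sys.executable, "-c", template.format(language=language)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

The `template` parameter exists only so the sketch can be exercised without the language pack installed, for example with a script that sleeps past the timeout.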

Why

On macOS with tree-sitter-language-pack==0.13.0, loading the TSX binding can hang inside tree_sitter_language_pack.get_parser("tsx"). Because parser loading previously happened in the main process, one broken language binding could block the entire graph build until an outer timeout killed it.

This change makes that failure language-scoped: the affected language is skipped, while all other languages still contribute graph context.
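
Seen from the caller's side, "language-scoped" might look roughly like the loop below. It is a simplified sketch, not the project's graph builder: `build_graph` and its `(path, language)` input are hypothetical, and the only behavior borrowed from this PR is that the parser lookup returns `None` for an unavailable language.

```python
from collections import Counter


def build_graph(files, get_parser):
    """Parse every file whose language has a working parser.

    `files` is an iterable of (path, language) pairs and `get_parser`
    returns a parser object, or None when the language is unavailable.
    """
    parsed = []
    skipped = Counter()
    for path, language in files:
        parser = get_parser(language)
        if parser is None:
            # Only this language is dropped; the loop keeps going.
            skipped[language] += 1
            continue
        parsed.append(path)  # a real build would parse the file here
    return parsed, dict(skipped)
```

With a lookup that returns `None` only for `"tsx"`, a mixed repository still yields results for its Python and TypeScript files, and the skipped count records what was left out.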

Verification

  • uv run python -X faulthandler -m pytest tests/test_parser.py::TestCodeParser::test_parser_probe_timeout_marks_language_unavailable -q -p no:asyncio
  • uv run python -m py_compile code_review_graph/parser.py tests/test_parser.py
  • git diff --check
  • local smoke against a mixed repo where the installed CLI hangs on TSX:
    • baseline installed code-review-graph build --skip-flows: timed out after 10s
    • patched source via PYTHONPATH with CRG_PARSER_LOAD_TIMEOUT_SECONDS=1: completed in ~1.9s, produced 640 nodes / 6555 edges while skipping unavailable tsx

Note: ruff check currently hangs in my local dev environment even with --no-cache; no lint output was produced.

Copilot AI review requested due to automatic review settings May 10, 2026 09:09

Copilot AI left a comment

Pull request overview

This PR improves resilience of the graph build when a specific tree_sitter_language_pack grammar hangs or fails to load by (1) avoiding module-import-time parser loading and (2) proactively probing parser loadability in a short-lived child process, so only the problematic language is skipped rather than blocking the entire build.

Changes:

  • Lazily import tree_sitter_language_pack inside CodeParser._get_parser() instead of at module import time.
  • Add a subprocess-based probe (_parser_load_probe_succeeds) to detect hanging/failing language bindings and mark only that language as unavailable.
  • Add a regression test ensuring a probe timeout marks a language unavailable and _get_parser() returns None.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| code_review_graph/parser.py | Adds subprocess probing + unavailable-language tracking; switches to lazy import of tree_sitter_language_pack. |
| tests/test_parser.py | Adds a test validating timeout probing behavior and resets the unavailable-language cache between tests. |

Comment on lines 834 to +841
def _get_parser(self, language: str):  # type: ignore[arg-type]
    if language in _UNAVAILABLE_LANGUAGES:
        return None
    if language not in self._parsers:
        if not _parser_load_probe_succeeds(language):
            _UNAVAILABLE_LANGUAGES.add(language)
            logger.warning("Skipping unavailable tree-sitter parser for %s", language)
            return None
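
The excerpt above stops before the parser is actually loaded. A self-contained sketch of the same pattern, runnable without the language pack, could look like this. It is an illustration under assumptions rather than the PR's code: `LazyParserCache` is a hypothetical class, the probe and loader are injected callables, and the unavailable set is kept per instance here instead of in the module-level `_UNAVAILABLE_LANGUAGES`.

```python
import logging

logger = logging.getLogger(__name__)


class LazyParserCache:
    """Load parsers on first use and remember languages that failed the probe."""

    def __init__(self, probe, loader):
        self._probe = probe    # callable: language -> bool
        self._loader = loader  # callable: language -> parser object
        self._parsers = {}
        self._unavailable = set()

    def get_parser(self, language):
        if language in self._unavailable:
            return None  # already known to be broken; do not probe again
        if language not in self._parsers:
            if not self._probe(language):
                self._unavailable.add(language)
                logger.warning(
                    "Skipping unavailable tree-sitter parser for %s", language
                )
                return None
            self._parsers[language] = self._loader(language)
        return self._parsers[language]
```

Because a failed language is recorded, the comparatively expensive probe runs at most once per language, and a language that passed is served from the cache afterwards.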