fix: skip unavailable tree-sitter parsers #459
Open
letsjo wants to merge 1 commit into
Pull request overview
This PR improves resilience of the graph build when a specific tree_sitter_language_pack grammar hangs or fails to load by (1) avoiding module-import-time parser loading and (2) proactively probing parser loadability in a short-lived child process, so only the problematic language is skipped rather than blocking the entire build.
Changes:
- Lazily import `tree_sitter_language_pack` inside `CodeParser._get_parser()` instead of at module import time.
- Add a subprocess-based probe (`_parser_load_probe_succeeds`) to detect hanging/failing language bindings and mark only that language as unavailable.
- Add a regression test ensuring a probe timeout marks a language unavailable and `_get_parser()` returns `None`.
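
The subprocess probe described in the list above could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's actual implementation: `probe_succeeds`, `probe_language`, and the 5-second default are hypothetical names and values. Only the `CRG_PARSER_LOAD_TIMEOUT_SECONDS` variable and the `tree_sitter_language_pack.get_parser(...)` call appear in the PR description.

```python
import os
import subprocess
import sys
from typing import Optional

# Hypothetical default; the PR description only names the environment variable.
DEFAULT_TIMEOUT_SECONDS = 5.0


def probe_succeeds(snippet: str, timeout: Optional[float] = None) -> bool:
    """Run `snippet` in a short-lived child interpreter.

    Returns True only if the child exits with status 0 before the timeout,
    so a hang or crash in the child cannot block the calling process.
    """
    if timeout is None:
        timeout = float(
            os.environ.get("CRG_PARSER_LOAD_TIMEOUT_SECONDS", DEFAULT_TIMEOUT_SECONDS)
        )
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return False
    return result.returncode == 0


def probe_language(language: str, timeout: Optional[float] = None) -> bool:
    """Check in a child process whether the grammar for `language` loads."""
    snippet = (
        "import tree_sitter_language_pack as pack; "
        f"pack.get_parser({language!r})"
    )
    return probe_succeeds(snippet, timeout)
```

Running the load in a child interpreter costs one process start per language, but it is the only way to put a hard deadline on a native binding that hangs without returning control to Python.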
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `code_review_graph/parser.py` | Adds subprocess probing + unavailable-language tracking; switches to lazy import of `tree_sitter_language_pack`. |
| `tests/test_parser.py` | Adds a test validating timeout probing behavior and resets the unavailable-language cache between tests. |
Comment on lines 834 to +841

    def _get_parser(self, language: str):  # type: ignore[arg-type]
        if language in _UNAVAILABLE_LANGUAGES:
            return None
        if language not in self._parsers:
            if not _parser_load_probe_succeeds(language):
                _UNAVAILABLE_LANGUAGES.add(language)
                logger.warning("Skipping unavailable tree-sitter parser for %s", language)
                return None
Summary
- Load `tree_sitter_language_pack` parsers lazily instead of importing the package at module import time

Why
On macOS with `tree-sitter-language-pack==0.13.0`, loading the TSX binding can hang inside `tree_sitter_language_pack.get_parser("tsx")`. Because parser loading previously happened in the main process, one broken language binding could block the entire graph build until an outer timeout killed it.

This change makes that failure language-scoped: the affected language is skipped, while all other languages still contribute graph context.
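
To illustrate what "language-scoped" means here, the following is a minimal sketch of a parser cache that remembers unavailable languages and lets the rest of a build continue. `LazyParserCache` and `usable_languages` are hypothetical names, and the probe and loader are injected as plain callables so the example runs without `tree_sitter_language_pack`. The PR itself tracks unavailable languages in a module-level `_UNAVAILABLE_LANGUAGES` set rather than on an instance.

```python
import logging
from typing import Callable, Dict, List, Optional, Set

logger = logging.getLogger(__name__)


class LazyParserCache:
    """Hypothetical stand-in for the parser-loading part of CodeParser."""

    def __init__(
        self,
        probe: Callable[[str], bool],
        load: Callable[[str], object],
    ) -> None:
        self._probe = probe  # e.g. a subprocess-based load check
        self._load = load    # e.g. tree_sitter_language_pack.get_parser
        self._parsers: Dict[str, object] = {}
        self._unavailable: Set[str] = set()

    def get_parser(self, language: str) -> Optional[object]:
        if language in self._unavailable:
            return None  # known to be broken; do not probe again
        if language not in self._parsers:
            if not self._probe(language):
                self._unavailable.add(language)
                logger.warning(
                    "Skipping unavailable tree-sitter parser for %s", language
                )
                return None
            # Only load in-process once the probe has succeeded.
            self._parsers[language] = self._load(language)
        return self._parsers[language]


def usable_languages(cache: LazyParserCache, languages: List[str]) -> List[str]:
    """Return the languages that still have a working parser."""
    return [lang for lang in languages if cache.get_parser(lang) is not None]
```

With a probe that rejects only `tsx`, `usable_languages(cache, ["python", "tsx", "go"])` keeps `python` and `go`, which mirrors the behavior described above: one bad binding is dropped and the others are still parsed.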
Verification
- `uv run python -X faulthandler -m pytest tests/test_parser.py::TestCodeParser::test_parser_probe_timeout_marks_language_unavailable -q -p no:asyncio`
- `uv run python -m py_compile code_review_graph/parser.py tests/test_parser.py`
- `git diff --check`
- `code-review-graph build --skip-flows`: timed out after 10s
- `PYTHONPATH` with `CRG_PARSER_LOAD_TIMEOUT_SECONDS=1`: completed in ~1.9s, produced 640 nodes / 6555 edges while skipping unavailable `tsx`

Note: `ruff check` currently hangs in my local dev environment even with `--no-cache`; no lint output was produced.