Skip to content

fix: escape file paths in GraphQL queries and add batch-size limits#275

Open
travellingsoldier85 wants to merge 1 commit intoentrius:testfrom
travellingsoldier85:fix/graphql-file-path-escaping-and-batch-limits
Open

fix: escape file paths in GraphQL queries and add batch-size limits#275
travellingsoldier85 wants to merge 1 commit intoentrius:testfrom
travellingsoldier85:fix/graphql-file-path-escaping-and-batch-limits

Conversation

@travellingsoldier85
Copy link

Summary

Fix two issues in the file content fetching functions used for token-based PR scoring:

1. GraphQL injection from unescaped file paths

File paths containing double quotes or backslashes are interpolated directly into GraphQL query strings, breaking query syntax and causing the entire file content fetch to fail silently. This means PRs touching files with special characters in their paths get scored as 0.

Fix:

  • Add _escape_graphql_expression() helper that escapes \ and "
  • Apply escaping in both fetch_file_contents_batch and fetch_file_contents_with_base

2. No batch-size limit for large PRs

PRs with many files generate a single GraphQL query with one object lookup per file. GitHub's GraphQL API has query complexity limits, so large PRs can trigger 502 errors and lose all file contents for scoring.

Fix:

  • Add _MAX_FILES_PER_GRAPHQL_BATCH = 50 constant
  • Split both fetch functions into batched requests

Tests

  • Escaping correctness tests
  • Batch splitting behavior tests
  • Special character handling and edge cases

Related Issues

No existing issue — discovered during code review of the scoring pipeline.

Testing

  • All new tests pass
  • Existing tests unaffected
  • Changes are backward compatible

Fix two issues in the file content fetching functions used for
token-based PR scoring:

1. **GraphQL injection from unescaped file paths**: File paths containing
   double quotes or backslashes are interpolated directly into GraphQL
   query strings, breaking query syntax and causing the entire file
   content fetch to fail silently. This means PRs touching files with
   special characters in their paths get scored as 0.
   - Add _escape_graphql_expression() helper that escapes \ and "
   - Apply escaping in both fetch_file_contents_batch and
     fetch_file_contents_with_base

2. **No batch-size limit for large PRs**: PRs with many files generate
   a single GraphQL query with one object lookup per file. GitHub's
   GraphQL API has query complexity limits, so large PRs can trigger
   502 errors and lose all file contents for scoring.
   - Add _MAX_FILES_PER_GRAPHQL_BATCH = 50 constant
   - Split both fetch functions into batched requests
   - Extract _fetch_file_contents_with_base_batch() as internal helper

Tests added for escaping correctness, batch splitting behavior,
special character handling, and edge cases (empty input, added/removed
files, failed batches).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant