CSV upload endpoints return 500 on malformed CSV files #723

@bencap

Summary

When a user uploads a malformed scores, counts, or calibration classes CSV file (e.g., inconsistent column counts across rows or extra unescaped commas), the API returns a 500 Internal Server Error instead of a user-friendly 400 Bad Request. The underlying error is a pandas.errors.ParserError, which is not currently caught.

Problem

csv_data_to_df in src/mavedb/lib/score_sets.py wraps pd.read_csv. Three upload handlers call this function and catch UnicodeDecodeError to return a 400:

  • parse_score_set_variants_uploads in src/mavedb/routers/score_sets.py (scores and counts files)
  • The calibration creation endpoint in src/mavedb/routers/score_calibrations.py
  • The calibration update endpoint in src/mavedb/routers/score_calibrations.py

None of these handlers catches pandas.errors.ParserError, so the error raised for a malformed CSV propagates as an unhandled exception and results in a 500.

Steps to Reproduce

  1. Upload a score set with a scores CSV where one row has more columns than the header — e.g., a row contains an unescaped comma in a value.
  2. Observe the API returns a 500 with a body similar to: Error tokenizing data. C error: Expected 1 fields in line 17, saw 4
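
The same failure can be reproduced outside the API with a few lines of pandas. This is only a minimal sketch: the column names and rows are made up for illustration, `try_parse` is a hypothetical helper, and it calls `pd.read_csv` directly rather than going through `csv_data_to_df`.

```python
import io

import pandas as pd

# Made-up scores file: the third line carries extra, unescaped commas,
# so it has more fields than the header declares.
MALFORMED_CSV = (
    "hgvs_nt,score\n"
    "c.1A>G,0.5\n"
    "c.2T>C,1,2,extra\n"
)


def try_parse(text: str) -> str:
    """Return 'ok' if pandas parses the text, else the ParserError message."""
    try:
        pd.read_csv(io.StringIO(text))
    except pd.errors.ParserError as exc:
        # pandas appends a trailing newline to tokenizer errors; strip it.
        return str(exc).strip()
    return "ok"


message = try_parse(MALFORMED_CSV)
print(message)
```

With the default C parser, the printed message has the same "Error tokenizing data" shape as the 500 body quoted above.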

Expected Behavior

The API returns a 400 Bad Request with a clear message such as:
"Error parsing scores file: <pandas error details>. Ensure the file is a valid CSV."

Observed Behavior

A 500 Internal Server Error is returned, exposing raw pandas internals to the client and giving the user no actionable guidance.

Acceptance Criteria

  • Uploading a malformed scores CSV returns a 400 with a descriptive error message.
  • Uploading a malformed counts CSV returns a 400 with a descriptive error message.
  • Uploading a malformed calibration classes CSV returns a 400 with a descriptive error message.
  • UnicodeDecodeError and pandas.errors.ParserError are both handled in all three upload handlers.
  • Tests cover the ParserError path for each handler.

Implementation Notes

  • Add except pd.errors.ParserError alongside the existing except UnicodeDecodeError in:
    • parse_score_set_variants_uploads in src/mavedb/routers/score_sets.py (for both scores and counts uploads)
    • The calibration creation handler in src/mavedb/routers/score_calibrations.py
    • The calibration update handler in src/mavedb/routers/score_calibrations.py
  • The error message should include the pandas error text to help users diagnose which line/column caused the issue.
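
One possible shape for the change, sketched under several assumptions: `BadRequest` stands in for whatever exception the routers actually raise to produce a 400, `read_upload` stands in for `csv_data_to_df` (whose real signature is not shown in this issue), and `parse_upload_or_400` plus the UnicodeDecodeError message wording are hypothetical. Only the `except pd.errors.ParserError` branch and its message format come from the notes above.

```python
import io

import pandas as pd


class BadRequest(Exception):
    """Hypothetical stand-in for the framework exception that yields a 400."""

    status_code = 400

    def __init__(self, detail: str) -> None:
        super().__init__(detail)
        self.detail = detail


def read_upload(raw: bytes) -> pd.DataFrame:
    """Stand-in for csv_data_to_df: a thin wrapper around pd.read_csv."""
    return pd.read_csv(io.BytesIO(raw))


def parse_upload_or_400(raw: bytes, label: str) -> pd.DataFrame:
    """Parse an uploaded CSV, converting known parse failures into a 400."""
    try:
        return read_upload(raw)
    except UnicodeDecodeError as exc:
        # Existing behaviour per the issue; the message wording is assumed.
        raise BadRequest(
            f"Error decoding {label} file: {exc}. Ensure the file is UTF-8 encoded."
        ) from exc
    except pd.errors.ParserError as exc:
        # New branch: include the pandas text so users can find the bad line.
        details = str(exc).strip()
        raise BadRequest(
            f"Error parsing {label} file: {details}. Ensure the file is a valid CSV."
        ) from exc
```

Routing all three handlers through one helper like this would keep the two `except` branches in a single place, though adding the branch inline in each handler satisfies the acceptance criteria equally well.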

Metadata

Assignees

No one assigned

    Labels

    app: backend (Task implementation touches the backend)
    type: enhancement (Enhancement to an existing feature)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests