Summary
When a user uploads a scores, counts, or calibration classes CSV file that is malformed (e.g., inconsistent column counts across rows, extra unescaped commas), the API returns a 500 Internal Server Error instead of a user-friendly 400 Bad Request. The underlying error is a pandas.errors.ParserError, which is not currently caught.
Problem
csv_data_to_df in src/mavedb/lib/score_sets.py wraps pd.read_csv. Three upload handlers call this function and catch UnicodeDecodeError to return a 400:
- parse_score_set_variants_uploads in src/mavedb/routers/score_sets.py (scores and counts files)
- The calibration creation endpoint in src/mavedb/routers/score_calibrations.py
- The calibration update endpoint in src/mavedb/routers/score_calibrations.py
None of these handlers catch pandas.errors.ParserError, so a malformed CSV propagates as an unhandled exception and results in a 500.
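The gap can be seen in isolation with pandas alone. This is a minimal sketch that calls pd.read_csv directly as a stand-in for csv_data_to_df (its real signature is not shown here); the CSV content and the classify helper are hypothetical. A ragged row makes the default C tokenizer raise pandas.errors.ParserError, which is not a subclass of UnicodeDecodeError, so the existing except clause never sees it.

```python
import io

import pandas as pd

# Hypothetical scores CSV: the header declares three columns, but the second
# data row has an unescaped comma in its comment, producing four fields.
MALFORMED_SCORES = "hgvs_nt,score,comment\nc.1A>G,0.5,ok\nc.2C>T,0.8,low, noisy\n"


def classify(text: str) -> str:
    """Report which exception, if any, pd.read_csv raises for the given text."""
    try:
        pd.read_csv(io.StringIO(text))
    except UnicodeDecodeError:
        # The only exception the upload handlers currently translate into a 400.
        return "UnicodeDecodeError"
    except pd.errors.ParserError:
        # What the C tokenizer actually raises for a row with too many fields.
        return "ParserError"
    return "parsed"
```

classify(MALFORMED_SCORES) returns "ParserError", so in the handlers this exception escapes the UnicodeDecodeError clause and surfaces as a 500.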
Steps to Reproduce
- Upload a score set with a scores CSV in which one row has more columns than the header (e.g., a row contains an unescaped comma in a value).
- Observe that the API returns a 500 with a body similar to:
  Error tokenizing data. C error: Expected 1 fields in line 17, saw 4
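The extra fields in step 1 come from a comma that is not quoted. As a minimal sketch using made-up rows, writing the file with Python's csv module quotes such a value automatically, and pd.read_csv then reads it back as a single field. This only illustrates what a valid file looks like; it is not part of the fix.

```python
import csv
import io

import pandas as pd

# Hypothetical rows: the comment value legitimately contains a comma.
rows = [["hgvs_nt", "score", "comment"], ["c.1A>G", "0.5", "low, noisy"]]

buffer = io.StringIO()
# The default QUOTE_MINIMAL policy wraps any field containing the delimiter in quotes.
csv.writer(buffer).writerows(rows)
quoted_text = buffer.getvalue()

# The quoted comma no longer splits the field, so every row has three columns.
df = pd.read_csv(io.StringIO(quoted_text))
```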
Expected Behavior
The API returns a 400 Bad Request with a clear message such as:
"Error parsing scores file: <pandas error details>. Ensure the file is a valid CSV."
Observed Behavior
A 500 Internal Server Error is returned, exposing raw pandas internals to the client and giving the user no actionable guidance.
Implementation Notes
- Add except pd.errors.ParserError alongside the existing except UnicodeDecodeError in:
  - parse_score_set_variants_uploads in src/mavedb/routers/score_sets.py (for both scores and counts uploads)
  - The calibration creation handler in src/mavedb/routers/score_calibrations.py
  - The calibration update handler in src/mavedb/routers/score_calibrations.py
- The error message should include the pandas error text to help users diagnose which line/column caused the issue.
Acceptance Criteria
- UnicodeDecodeError and pandas.errors.ParserError are both handled in all three upload handlers.
- … ParserError path for each handler.