Open
Labels: enhancement, help wanted, verl
Description
Currently, data conversion steps (e.g., converting data to PyArrow arrays) only run inside trainer.fit. This causes two issues:
- Data format errors are only caught during training, not during development.
- When debugging in VS Code, breakpoints set in Python libraries (e.g., arrow_writer.py) are not hit during trainer.dev, because the conversion code never runs there. This makes debugging harder.
Example
In trainer.fit, one conversion step is:
# From: /root/miniconda3/envs/agl/lib/python3.12/site-packages/datasets/arrow_writer.py
out = pa.array(cast_to_python_objects(data, only_1d_for_numpy=True))
If the data format is incorrect, this call may raise an error during trainer.fit, but a breakpoint set on it is not hit during trainer.dev.
Proposed Solution
Run the same data validation/conversion steps in trainer.dev that trainer.fit already runs. This will:
- Catch data format errors earlier.
- Allow breakpoints in data processing code to be triggered during development.
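As a rough illustration of the idea, the sketch below eagerly runs a conversion function over every column of a small batch and collects the failures, so format errors show up before training starts. The names `validate_rows` and `strict_column` are hypothetical and are not part of the trainer API. `strict_column` is only a standard-library stand-in for the real conversion; in practice one would presumably pass a wrapper around the `pa.array(cast_to_python_objects(...))` call shown above.

```python
from typing import Any, Callable, Iterable, Mapping


def strict_column(values: list[Any]) -> list[Any]:
    """Stand-in converter (assumption): reject columns whose non-None values mix types."""
    kinds = {type(v).__name__ for v in values if v is not None}
    if len(kinds) > 1:
        raise TypeError(f"mixed types in column: {sorted(kinds)}")
    return values


def validate_rows(
    rows: Iterable[Mapping[str, Any]],
    convert: Callable[[list[Any]], Any] = strict_column,
) -> dict[str, str]:
    """Run `convert` on every column now and return {column name: error message}."""
    rows = list(rows)

    # Collect every column name that appears in any row.
    columns: dict[str, list[Any]] = {}
    for row in rows:
        for key in row:
            columns.setdefault(key, [])

    # Build column-wise lists, using None for missing values.
    for row in rows:
        for key, values in columns.items():
            values.append(row.get(key))

    # Convert each column eagerly so a debugger can stop inside `convert`.
    errors: dict[str, str] = {}
    for key, values in columns.items():
        try:
            convert(values)
        except Exception as exc:
            errors[key] = f"{type(exc).__name__}: {exc}"
    return errors


# Hypothetical sample batch: the "reward" column mixes a float and a string.
sample = [
    {"prompt": "2 + 2?", "reward": 1.0},
    {"prompt": "3 + 3?", "reward": "high"},
]
problems = validate_rows(sample)
print(problems)
```

Because the conversion is called directly in the development path, a breakpoint placed inside the converter (or inside the library code it calls) would be reached without starting a full training run.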