Take raw robot data, sprinkle in metadata, and rebake it into fresh, standardized LeRobot datasets, ready to serve your machine learning models.
Hot out of the oven and still rising! Currently under active development. Expect changes to the API and command-line arguments.
Rebake turns the messy leftovers of robotic experiments into neatly baked datasets. By converting and unifying recordings from different robots into the LeRobot format, Rebake makes it easy to share, compare, and train on consistent data.
Think of it as your robot kitchen:
- Raw ingredients: rosbags and metadata
- Recipe: filtering rules and packaging
- Fresh loaf: ML-ready LeRobot datasets

No more format fragmentation, just tasty, standardized data your models will love.
Coming Soon
| Robot Platform | Status | Data Format | Features | Config |
|---|---|---|---|---|
| Toyota HSR | Production | rosbag | Full conversion, visualization | hsr |
Want your robot supported? Open an issue or contribute a plugin!
- Multi-robot support (planned): Extensible plugin system for adding new robot platforms.
- Flexible processing modes: Convert individual episodes or combine many into larger datasets.
- Cloud-ready: AWS Batch integration for scaling conversion jobs to big datasets.
- Built-in visualization: Generate videos and HTML views to quickly inspect your data.
- Data management tools: Filter, merge, and analyze converted datasets with ease.
- Production-tested: Currently supports Toyota HSR, with more robots on the way.
- Python 3.10 or higher
- Docker and Docker Compose (the commands below use the `docker compose` syntax)
- Toyota HSR recorded data (rosbag format with meta.json metadata)
- Currently supports HSR only - other robots coming soon
- pip or uv package manager
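A quick way to confirm the prerequisites above before building anything is a small preflight check. The sketch below is only an illustration: `require_cmd` and `preflight` are hypothetical helper names, not part of Rebake, and the list of commands checked (`python3`, `docker`, `git`, `uv`) is an assumption based on the commands used elsewhere in this README.

```shell
# Sketch: report which prerequisite commands are missing from PATH.
# require_cmd and preflight are hypothetical helpers, not part of Rebake.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

preflight() {
  status=0
  # Assumed tool list; swap uv for pip if that is what you use.
  for cmd in python3 docker git uv; do
    require_cmd "$cmd" || status=1
  done
  # HSR_DATASET_DIR is the variable exported before running the container.
  if [ -z "${HSR_DATASET_DIR:-}" ]; then
    echo "missing: HSR_DATASET_DIR is not set"
    status=1
  fi
  return "$status"
}
```

Running `preflight` prints one line per check and returns a non-zero status if anything is missing. It does not verify versions, so the Python 3.10 requirement still needs a separate look (for example `python3 --version`).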
# Clone the repository
git clone https://github.com/airoa-org/rebake.git
cd rebake
# Set your HSR dataset directory
export HSR_DATASET_DIR=/path/to/your/rosbag/data
# Build and run the container
cd docker
docker compose build hsr_data_converter
docker compose run hsr_data_converter
# Inside the container, install dependencies
GIT_LFS_SKIP_SMUDGE=1 uv sync
# Convert rosbag to LeRobot format (HSR example)
uv run -m hsr_data_converter.rosbag2lerobot.main \
--raw_dir /root/datasets \
--out_dir ./output \
--fps 10 \
--robot_type hsr \
--conversion_type individual

Your LeRobot dataset will be ready in ./output with videos, metadata, and structured data!
# Visualize the converted dataset
uv run src/hsr_data_converter/visualize/lerobot_dataset.py \
--repo-id your_dataset_name \
--root ./output/{Episode directory name} \
--episode-index 0

The easiest way to get started is to use the provided Docker environment:
git clone https://github.com/airoa-org/rebake.git
cd rebake
git submodule update --init --recursive
cd docker
docker compose build hsr_data_converter
docker compose run hsr_data_converter

For development, or if you prefer a local installation:
# Install dependencies with uv
GIT_LFS_SKIP_SMUDGE=1 uv sync
# Initialize submodules
git submodule update --init --recursive

Convert HSR recorded data to LeRobot format:
uv run -m hsr_data_converter.rosbag2lerobot.main \
--raw_dir /path/to/rosbags \
--out_dir /path/to/output \
--fps 10 \
--robot_type hsr \
--conversion_type aggregate \
--separate_per_primitive false

- `individual`: Convert each rosbag to a separate dataset
- `aggregate`: Combine multiple rosbags into a single dataset
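In `individual` mode each rosbag becomes its own dataset, so inspecting the results means pointing the visualizer at each output subdirectory in turn. The sketch below prints one visualizer command per subdirectory as a dry run. `print_visualize_cmds` is a hypothetical helper, and using the directory name as `--repo-id` is an assumption for illustration, not documented Rebake behavior.

```shell
# Sketch: print one visualizer command per converted episode directory.
# print_visualize_cmds is a hypothetical helper; using the directory name
# as --repo-id is an assumption, not documented Rebake behavior.
print_visualize_cmds() {
  out_dir="${1:-./output}"
  for dir in "$out_dir"/*/; do
    # Skip the unexpanded glob when out_dir has no subdirectories.
    [ -d "$dir" ] || continue
    name=$(basename "$dir")
    echo "uv run src/hsr_data_converter/visualize/lerobot_dataset.py --repo-id $name --root ${dir%/} --episode-index 0"
  done
}
```

Calling `print_visualize_cmds ./output` only prints the commands, so you can review them before running any. Paths containing spaces would need extra quoting before the output is executed.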
Remove specific episodes based on criteria:
uv run src/hsr_data_converter/filter_episodes.py \
--input_dataset_path ./input_dataset \
--output_dataset_path ./filtered_dataset \
--chunk_size 1000

Combine multiple datasets:
uv run src/hsr_data_converter/merge_dataset.py \
--sources ./dataset1 ./dataset2 \
--output ./merged_dataset \
--fps 10

Generate dataset visualization:
uv run src/hsr_data_converter/visualize/lerobot_dataset.py \
--repo-id dataset_name \
--root ./dataset_path \
--episode-index 0

HSR recorded data should follow this structure:
dataset_directory/
├── template-061707-25-04-30-09-01-51/
│   ├── data.bag
│   └── meta.json
├── template-061707-25-04-30-09-02-45/
│   ├── data.bag
│   └── meta.json
├── template-061707-25-04-30-09-03-36/
│   ├── data.bag
│   └── meta.json
├── template-061707-25-04-30-09-04-28/
│   ├── data.bag
│   └── meta.json
└── ...
Note: Each episode directory contains a `data.bag` file (rosbag recording) and a `meta.json` file (episode metadata) with HSR-specific topic structure.
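Before starting a long conversion, it can help to confirm that every episode directory actually has both files. The following is a minimal sketch of such a check. `check_episodes` is a hypothetical helper, not a Rebake command, and it only tests that the files exist, not that their contents are valid.

```shell
# Sketch: list episode directories that lack data.bag or meta.json.
# check_episodes is a hypothetical helper, not part of Rebake.
check_episodes() {
  root="$1"
  missing=0
  for dir in "$root"/*/; do
    # Skip the unexpanded glob when root has no subdirectories.
    [ -d "$dir" ] || continue
    for f in data.bag meta.json; do
      if [ ! -f "${dir}${f}" ]; then
        echo "missing ${f} in ${dir%/}"
        missing=1
      fi
    done
  done
  # Non-zero status signals at least one incomplete episode.
  return "$missing"
}
```

For example, `check_episodes "$HSR_DATASET_DIR"` prints one line per missing file and returns a non-zero status if any episode is incomplete.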
# Format code
make format
# Run linting (ruff + mypy)
make lint
# Run tests
make test
# Run tests with coverage
make test-coverage

- `make format` - Format code with ruff
- `make lint` - Run linting checks (ruff + mypy)
- `make test` - Run all unit tests
- `make test-coverage` - Run tests with coverage report
- `make ruff-check` - Check code style only
- `make ruff-fix` - Fix code style issues
- `make mypy` - Run type checking
# Run specific test
uv run pytest tests/test_rosbag2lerobot.py -v
# Run with coverage
make test-coverage

- Docker build fails: Ensure Docker and nvidia-docker are properly installed
- Memory errors: Increase Docker memory allocation for large datasets
- Permission errors: Check file permissions and Docker volume mounts
- Missing dependencies: Run `git submodule update --init --recursive`
- Report issues on GitHub Issues
We welcome contributions! Whether you're fixing bugs, adding features, or supporting new robot platforms, your help is appreciated.
Quick start:
- Fork the repository
- Create a feature branch
- Make your changes and add tests
- Run quality checks: `make format && make lint && make test`
- Open a Pull Request

For detailed instructions, development setup, and guidelines, please see our Contributing Guide.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Made with ❤️ by the AIRoA Team