LLM model used to exhibit the capabilities of our novel STRIDE-1 dataset


tera-ai/tardis


TARDIS STRIDE: A Spatio-Temporal Road Image Dataset for Exploration and Autonomy

Héctor Carrión* · Yutong Bai* · Víctor A. Hernández Castro*
Kishan Panaganti · Matthew Trang · Ayush Zenith · Tony Zhang · Pietro Perona · Jitendra Malik

(* equal contribution)

Getting Started

Installation

Conda

Check your system's CUDA version with nvcc:

nvcc --version
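
If you want to check the CUDA version from a setup script rather than by eye, the following is a minimal sketch. It assumes nvcc prints its usual line containing "release X.Y" (for example "release 11.8, V11.8.89"); the function names `parse_cuda_release` and `detect_cuda_release` are hypothetical helpers and are not part of this repository.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_cuda_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from text such as 'release 11.8, V11.8.89'.

    Assumption: the output contains 'release <major>.<minor>'.
    Returns None when no such pattern is found.
    """
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def detect_cuda_release() -> Optional[Tuple[int, int]]:
    """Run `nvcc --version`; return None if nvcc is not on PATH or unparseable."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_cuda_release(result.stdout)
```

Calling `detect_cuda_release()` returns a `(major, minor)` tuple when nvcc is installed and `None` otherwise, so a script can fall back or stop early before creating the environment.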

Create and activate virtual environment with required Python dependencies:

conda env create -f gpu_environment.yml -n tardis
conda activate tardis

Docker

Another approach is to build from our Dockerfile:

docker build -f Dockerfile --platform=linux/amd64 -t tardis .

Downloading Dataset

The full tokenized dataset is made available through two downloadable files in a public GCS bucket:

gsutil -m cp gs://tera-tardis/STRIDE-1/training.jsonl . # ~327GB
gsutil -m cp gs://tera-tardis/STRIDE-1/testing.jsonl . # ~9GB
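
Because the training file is roughly 327GB, it is usually safer to stream it than to load it whole. The sketch below peeks at the first few records of a JSONL file. It assumes only that each non-empty line is one JSON value; the record schema is not described here, so no field names are assumed. `iter_jsonl` and `peek` are hypothetical helper names.

```python
import itertools
import json
from typing import Any, Iterator, List


def iter_jsonl(path: str) -> Iterator[Any]:
    """Yield one parsed record per non-empty line, reading lazily."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def peek(path: str, n: int = 3) -> List[Any]:
    """Return the first n records without reading the rest of the file."""
    return list(itertools.islice(iter_jsonl(path), n))


# Example (not run here): inspect the smaller test split first.
# for record in peek("testing.jsonl"):
#     print(type(record))
```

Starting with `testing.jsonl` keeps the experiment cheap; the same functions work unchanged on `training.jsonl`.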

Weights

The checkpoint/state used for evaluation of the model was saved in MessagePack format and is made available through this downloadable file:

gsutil -m cp gs://tera-tardis/STRIDE-1/checkpoint.msgpack . # ~10GB
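
Taken together, the dataset and checkpoint downloads come to roughly 346GB according to the size comments on the commands above. A quick free-space check before starting can save a failed transfer. This is a sketch only: the sizes are the approximate figures quoted above, the 5GB safety margin is an arbitrary assumption, and `required_gb` and `has_room` are hypothetical helper names.

```python
import shutil
from typing import Iterable

# Approximate sizes in GB, copied from the comments on the gsutil commands.
DOWNLOAD_SIZES_GB = {
    "training.jsonl": 327,
    "testing.jsonl": 9,
    "checkpoint.msgpack": 10,
}


def required_gb(files: Iterable[str]) -> int:
    """Sum the approximate sizes of the requested downloads."""
    return sum(DOWNLOAD_SIZES_GB[name] for name in files)


def has_room(target_dir: str, files: Iterable[str], margin_gb: float = 5.0) -> bool:
    """Return True if target_dir has space for the files plus a safety margin."""
    free_gb = shutil.disk_usage(target_dir).free / 1e9
    return free_gb >= required_gb(files) + margin_gb


# Example (not run here): check the current directory for the evaluation files.
# print(has_room(".", ["testing.jsonl", "checkpoint.msgpack"]))
```

For evaluation alone, only `testing.jsonl` and `checkpoint.msgpack` are needed, which is about 19GB.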

Training

Single VM

To train on a single VM, you may use this script:

EasyLM/scripts/train.sh

Distributed (Kubernetes)

To train using Kubernetes, submit the Kubernetes Job as stated in .kubernetes/setup-cluster.sh.

Testing

Single VM

We provide evaluation code only for the single-VM configuration, as opposed to distributed setups.

gsutil -m cp gs://tera-tardis/STRIDE-1/checkpoint.msgpack .
python -m EasyLM.models.llama.convert_easylm_to_hf \
    --load_checkpoint='trainstate_params::checkpoint.msgpack' \
    --model_size='vqlm_1b' \
    --output_dir='.'

For a more detailed breakdown of eval, please see this notebook

Safeguards

The dataset consists of Google Street View data that has been thoroughly cleansed and blurred to protect people's privacy, and it is free of ill intent, nudity, and sensitive information. For more information, refer to Google's policy.

Contacts

Citation

If you found this code/work to be useful in your own research, please consider citing as follows:

@article{carrion2025_tardis_stride,
  title={{TARDIS STRIDE}: A Spatio-Temporal Road Image Dataset for Exploration and Autonomy},
  author={Héctor Carrión and Yutong Bai and Víctor A. Hernández Castro and Kishan Panaganti and Ayush Zenith and Matthew Trang and Tony Zhang and Pietro Perona and Jitendra Malik},
  journal={arXiv preprint},
  year={2025},
}
