If you use the pipelines from your own script (not the CLI), __call__ isn't run under torch.inference_mode(). The text encoder then keeps ~37 GB of graph/activations alive, so after you drop it and load the transformer you OOM.
Fix: either apply @torch.inference_mode() in the relevant places (i.e. on each pipeline's __call__), or note in the README that callers should wrap pipeline calls in torch.inference_mode().
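A minimal sketch of the difference (using a stand-in nn.Linear as a hypothetical "text encoder", not this repo's actual pipeline): outside inference_mode, autograd tracks the forward pass and keeps the activation graph alive; inside it, no graph is built, so activations can be freed as soon as the call returns.

```python
import torch

# Stand-in "text encoder" for illustration only; the real one is the
# pipeline component that holds ~37 GB of graph/activations.
encoder = torch.nn.Linear(8, 8)
x = torch.randn(2, 8)

# Without inference_mode: autograd builds a graph through the call.
y_tracked = encoder(x)
print(y_tracked.requires_grad)  # True: activations are retained for backward

# With inference_mode: no graph, memory reclaimable after the call.
with torch.inference_mode():
    y_free = encoder(x)
print(y_free.requires_grad)  # False
```

The same effect applies whether you use the context manager at the call site or decorate the pipeline's __call__ with @torch.inference_mode().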