vLLM

vllm Public

A high-throughput and memory-efficient inference and serving engine for LLMs

vllm-omni Public

A framework for efficient model inference with omni-modality models

recipes Public

Common recipes to run vLLM

JavaScript 914 326

llm-compressor Public

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 3.5k 580

speculators Public

A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM

Python 608 154

semantic-router Public

Intelligent Mixture-of-Models Router for Efficient Heterogeneous LLMs Inference

Provide feedback