Pinned
- inference_pipeline (Public, Python)
  Benchmarking and optimizing transformer inference across PyTorch, ONNX Runtime, and TensorRT, with latency/throughput analysis on GPU and CPU (see the latency-loop sketch after this list).
- llm_serving_benchmarks (Public, Python)
  Benchmarking LLM inference serving with vLLM: latency, throughput, concurrency scaling, GPU memory usage, and KV-cache behavior (see the vLLM sketch after this list).
- qwen2.5-7b-vllm-prefill-benchmarks (Public, Python)
  Prefill performance study on Qwen2.5-7B using vLLM. Compares static vs. mixed (bucketed) prefill under eager execution and CUDA Graphs, with controlled concurrency and real-world latency/throughput … (see the vLLM sketch after this list).
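The first pinned repo centers on latency/throughput measurement across inference runtimes. Here is a minimal sketch of the kind of timing loop such a benchmark typically uses, shown in PyTorch; the helper name, warmup count, and iteration count are illustrative and not taken from the repo:

```python
import time

import torch


def measure_latency(model, example_input, warmup=10, iters=100):
    """Mean per-call latency in seconds, with warmup and GPU sync.

    Hypothetical helper for illustration; not taken from inference_pipeline.
    """
    model.eval()
    with torch.no_grad():
        # Warmup runs stabilize GPU clocks, allocator state, and lazy init.
        for _ in range(warmup):
            model(example_input)
        # Synchronize so queued CUDA kernels don't leak across the timer.
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(example_input)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters
```

Throughput then falls out as batch_size / latency, and the same loop shape carries over to ONNX Runtime (timing `session.run`) and TensorRT engines, with each runtime's own synchronization semantics.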
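The two vLLM repos measure serving and prefill behavior. A hedged sketch of an offline throughput measurement with vLLM's public Python API follows; the checkpoint, prompt set, and request count are assumptions, while `enforce_eager` is vLLM's actual switch for disabling CUDA Graph capture (the eager-vs-graphs axis in the prefill study):

```python
import time

from vllm import LLM, SamplingParams

# Assumed configuration; the repos' actual settings are not shown on this page.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # assumed checkpoint
    enforce_eager=False,               # True disables CUDA Graph capture
    gpu_memory_utilization=0.9,
)
params = SamplingParams(temperature=0.0, max_tokens=128)
prompts = ["Describe KV-cache reuse in one paragraph."] * 64  # assumed load

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(out.outputs[0].token_ids) for out in outputs)
print(f"{len(prompts)} requests in {elapsed:.2f}s, "
      f"{generated / elapsed:.1f} generated tok/s")
```

Setting `max_tokens=1` makes the same run prefill-dominated (a single decode step per request), which is one way to isolate prefill cost along the lines of the Qwen2.5-7B study.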
