karun2328
Pinned

  1. inference_pipeline (Public)

    Benchmarking and optimizing transformer inference across PyTorch, ONNXRuntime, and TensorRT with latency/throughput analysis on GPU and CPU.

    Python
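The repository's own harness isn't shown here, but the latency/throughput analysis it describes can be sketched with a minimal, backend-agnostic timing loop. The `dummy_model` function below is a hypothetical stand-in for a real forward pass (a PyTorch, ONNXRuntime, or TensorRT call):

```python
import statistics
import time

def benchmark(fn, n_warmup=5, n_iters=50):
    """Time fn() repeatedly; report latency percentiles and throughput."""
    for _ in range(n_warmup):  # warm-up runs exclude one-time setup costs
        fn()
    latencies = []
    start = time.perf_counter()
    for _ in range(n_iters):
        t0 = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    latencies.sort()
    return {
        "p50_ms": statistics.median(latencies) * 1e3,
        "p95_ms": latencies[int(0.95 * (len(latencies) - 1))] * 1e3,
        "throughput_rps": n_iters / total,
    }

def dummy_model():
    """Hypothetical stand-in for an inference call."""
    sum(i * i for i in range(10_000))

print(benchmark(dummy_model))
```

Reporting percentiles rather than a single mean is the usual convention here, since tail latency (p95/p99) often diverges sharply from the median on shared GPUs.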

  2. llm_serving_benchmarks (Public)

    Benchmarking LLM inference serving with vLLM, analyzing latency, throughput, concurrency scaling, GPU memory usage, and KV-cache behavior.

    Python
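The concurrency-scaling measurement mentioned above can be sketched without vLLM itself: issue a fixed batch of requests at several concurrency levels and compare aggregate throughput. `fake_request` below is a hypothetical stand-in for a call to a real serving endpoint:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(delay=0.01):
    """Hypothetical stand-in for one serving request (I/O-bound wait)."""
    time.sleep(delay)

def throughput_at_concurrency(concurrency, total_requests=32):
    """Run total_requests with at most `concurrency` in flight; return req/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(lambda _: fake_request(), range(total_requests)))
    elapsed = time.perf_counter() - start
    return total_requests / elapsed

for c in (1, 4, 16):
    print(f"concurrency={c:2d}  throughput={throughput_at_concurrency(c):.1f} req/s")
```

Because the requests here are pure waits, throughput scales almost linearly with concurrency; against a real GPU-bound server the curve flattens once the batcher saturates, which is exactly the knee such a benchmark looks for.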

  3. Personal_website (Public)

    Welcome to my world.

    TypeScript

  4. qwen2.5-7b-vllm-prefill-benchmarks (Public)

    Prefill performance study on Qwen2.5-7B using vLLM. Compares static vs mixed (bucketed) prefill under eager execution and CUDA Graphs, with controlled concurrency and real-world latency/throughput …

    Python
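The static-vs-bucketed comparison above hinges on padding waste: padding every prompt in a batch to the longest prompt burns compute on pad tokens, while rounding prompts up to per-bucket boundaries wastes far less. A minimal sketch of that accounting (bucket size and prompt lengths are illustrative assumptions, not taken from the repo):

```python
def padded_tokens_static(lengths):
    """Static prefill: pad every prompt to the longest in the batch."""
    return max(lengths) * len(lengths)

def padded_tokens_bucketed(lengths, bucket_size=128):
    """Bucketed prefill: round each prompt up to its bucket boundary,
    then batch prompts within the same bucket together."""
    buckets = {}
    for n in lengths:
        boundary = -(-n // bucket_size) * bucket_size  # ceil to bucket edge
        buckets.setdefault(boundary, []).append(n)
    return sum(b * len(v) for b, v in buckets.items())

# Illustrative mix of short and long prompts (token counts).
lengths = [30, 90, 140, 600, 610, 1000]
print(padded_tokens_static(lengths))    # 6000 padded tokens
print(padded_tokens_bucketed(lengths))  # 2816 padded tokens
```

Under CUDA Graphs the bucketing also bounds the set of distinct input shapes, so each bucket's graph can be captured once and replayed, which is why the two knobs are studied together.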