Closed
133 commits
217f996
[bugfix] fix memory leak in fs connector (#1656)
chunxiaozheng Sep 29, 2025
c46e7e8
[CD]: add flash infer (#1699)
sammshen Sep 29, 2025
92c9ed1
[Docs] Layerwise Docs (#1697)
sammshen Sep 29, 2025
a2c5ba5
[Benchmark] configure user stacking at high qps in multi-round-qa (#1…
yuezhu1 Sep 29, 2025
89ce712
[minor fix]fix pin_count type (#1685)
chunxiaozheng Sep 29, 2025
7afee73
[Docs] Add minimal runnable quick start guide (#1725)
kobe0938 Sep 30, 2025
4e5d036
[Benchmark] Emphasize query round results in long_doc_qa (#1730)
kobe0938 Sep 30, 2025
7eb6f80
[Core] Remove `lookup_id` and use `req_id` instead (#1729)
YaoJiayi Sep 30, 2025
08ccb7d
[minor fix] use parent __init__ to init meta (#1716)
chunxiaozheng Sep 30, 2025
81161f3
[Core][RemoteBackend] Implement remove api for remote backend and fs_…
maobaolong Oct 1, 2025
4a84f34
[Core] Add async Redis cluster connector + unit tests (#1638)
lindseywn Oct 1, 2025
9b7ecc7
Bump docker/login-action from 3.5.0 to 3.6.0 (#1723)
dependabot[bot] Oct 1, 2025
8cc81cd
[CI][BugFix]: Enable non-CUDA unit testing for all tests (#1677)
hickeyma Oct 1, 2025
d8d7d3e
[Feat] Generate random instance id when the instance id is not define…
ruizhang0101 Oct 1, 2025
3c6d429
[Docs] P2P KV Cache Sharing (#1735)
kobe0938 Oct 2, 2025
761b9fb
[Docs] Add RunLLM (#1736)
kobe0938 Oct 2, 2025
85d9d93
[Misc]: Improve the description provided for PRs (#1679)
hickeyma Oct 2, 2025
a1e9ffa
[core]Add cache/clear api to internal_api_server (#1711)
maobaolong Oct 2, 2025
6b8deed
Support remove around quotes in env config (#1671)
maobaolong Oct 2, 2025
8133383
Small fixes to getting_started.md (#1737)
SuperGoodGame Oct 2, 2025
53d93f0
[Doc] Fix raw-html syntax in index.rst (#1738)
Siddhant-Ray Oct 2, 2025
6219443
[Benchmark] Make long_doc_qa_recommender more user friendly (#1731)
kobe0938 Oct 2, 2025
70b1598
[CI/Build] Add async to comprehensive tests (#1744)
Shaoting-Feng Oct 5, 2025
49ea659
Fix bug in nixl_channel.py where incorrect variables were used when c…
xleoken Oct 7, 2025
033f7a9
Bump ossf/scorecard-action from 2.4.2 to 2.4.3 (#1753)
dependabot[bot] Oct 7, 2025
7cfb0d6
Bump step-security/harden-runner from 2.13.0 to 2.13.1 (#1752)
dependabot[bot] Oct 7, 2025
9da0ed1
[CI/Build] Add P2P full test (#1747)
Shaoting-Feng Oct 7, 2025
cc9623e
[Model] Add Qwen3 model support for CacheBlend (#1633)
junl666 Oct 9, 2025
af38585
docs: Update model reference from Llama-3.1-70B to Llama-3.1-8B (#1632)
jay-tau Oct 9, 2025
2dfa768
[Doc] Update example code (#1760)
JZhou3083 Oct 9, 2025
fb0d8c0
[Doc] fix cpu offloading example doc (#1740)
cr7258 Oct 9, 2025
e85b2e8
[Docs] Add kv cache calculator (#1763)
kobe0938 Oct 10, 2025
21d89ed
fix error in local cpu backend's clear() (#1766)
ziruiliu Oct 10, 2025
c0d7c09
[Docs] Update expired lmcache slack link (#1774)
kobe0938 Oct 12, 2025
523f8ae
[Docs] Add vllm production stack for Kubernetes Deployment (#1764)
kobe0938 Oct 12, 2025
32b1b03
Introduce a basic check tool for verify lmcache env and config work a…
maobaolong Oct 14, 2025
87cc3e3
implement clear_lookup_status in hit_limit_lookup_client (#1761)
chunxiaozheng Oct 14, 2025
32f4b3e
[FSConnector] support read ahead in FSConnector (#1771)
chunxiaozheng Oct 14, 2025
9946f1b
[Core] SGLang Kernel Update & TP Balance (#1510)
Oasis-Git Oct 14, 2025
b33add6
[observability]: P2P Stats Monitoring (#1754)
sammshen Oct 15, 2025
acf6f3a
[Docs] Add KV Cache Sizes for Popular Models in faq (#1769)
kobe0938 Oct 15, 2025
db078d9
Support adjust the effective memory by system available memory and re…
maobaolong Oct 15, 2025
0b0e11a
[feature][controller] support query worker info (#1462)
chunxiaozheng Oct 16, 2025
0dce442
support start lookup server on other rank (#1466)
chunxiaozheng Oct 16, 2025
e009015
[Bugfix] Add KV Cache format in gds backend (#1324)
muma378 Oct 16, 2025
9121dc8
Add VRAM Calculator link (#1808)
kobe0938 Oct 16, 2025
7e2d400
[metrics] add request_cache_hit_rate metric (#1800)
chunxiaozheng Oct 16, 2025
5fadf97
Fix duplicate cache_policy.update_on_hit() calls in LocalDiskBackend …
KevinCheung2259 Oct 17, 2025
feb5e3a
[bugfix]fix insert_key error in LocalDiskBackend (#1811)
chunxiaozheng Oct 17, 2025
580a8f0
[Core] Support NIXL storage obj backend (#1557)
tshmilnvidia Oct 17, 2025
89a8621
Guarded the async serializer usage so we only wrap the backend load c…
DongDongJu Oct 17, 2025
fdadfea
Bump github/codeql-action from 3 to 4 (#1784)
dependabot[bot] Oct 17, 2025
5afe968
[Core] Add batched_async_contains related method to fs connector (#1776)
maobaolong Oct 20, 2025
f98ff15
Update README with LMCache citation and features (#1829)
YaoJiayi Oct 20, 2025
5c196ba
Update README.md
junchenj Oct 20, 2025
0beff4d
[Core] NixlStorageBackend support eviction (#1775)
tshmilnvidia Oct 20, 2025
7824ec4
Bump actions/stale from 10.0.0 to 10.1.0 (#1751)
dependabot[bot] Oct 20, 2025
45b2708
Bump actions/setup-python from 5.6.0 to 6.0.0 (#1750)
dependabot[bot] Oct 20, 2025
30ce810
disable async_serializer in pd (#1818)
novahow Oct 20, 2025
399c0cd
[CI/Build] Add comprehensive test for layerwise KV transfer (#1822)
Shaoting-Feng Oct 20, 2025
3de1434
[FIX][Adapt_vllm] Fix ci failed by get_kv_cache_torch_dtype missed (#…
maobaolong Oct 21, 2025
91bceef
[Agents] Add prefix hit rate vs pool size analysis (#1838)
kobe0938 Oct 21, 2025
a5596c1
Valkey connector (#1743)
bluayer Oct 21, 2025
47d5ed5
[core] add batched_contains interface (#1778)
chunxiaozheng Oct 22, 2025
9be5b7c
[hotfix] Reduce async loading log verbosity (#1849)
DongDongJu Oct 23, 2025
45e64f4
[bugfix] fix batched_contains bug when lookup server on other rank (#…
chunxiaozheng Oct 24, 2025
acf29db
[CI]: add new url triton dependency from latest vllm to fix integrati…
sammshen Oct 24, 2025
b1f0980
[#1839][DOC]Introduce async_loading document (#1857)
maobaolong Oct 24, 2025
469143b
[optimize] reduce calls to contains for sync path (#1828)
chunxiaozheng Oct 24, 2025
bf897e1
[Release]: Pre 0.3.9 patch (#1852)
sammshen Oct 24, 2025
41779a1
[feat] Add prompt tokens metrics (#1860)
ruizhang0101 Oct 24, 2025
7749a25
[CI]: add harden tests to publish CI (#1873)
sammshen Oct 24, 2025
95ab8c2
Adapt vllm pr27188, cannot import cdiv (#1896)
maobaolong Oct 28, 2025
fff53a6
Bump lewagon/wait-on-check-action from 1.3.4 to 1.4.1 (#1892)
dependabot[bot] Oct 28, 2025
d2eef79
[Agents] Non-prefix caching hit rate vs pool size (#1851)
kobe0938 Oct 28, 2025
d6c4083
[Core][1/N] Message queue for LMCache Multi-process Mode (#1853)
ApostaC Oct 29, 2025
c832579
Update slack invite in meetings.rst (#1915)
nijaba Oct 29, 2025
11adb2f
[CI] publish harden more endpoints (#1874)
sammshen Oct 29, 2025
0b4c037
[CI]: remove the wait for action (#1916)
sammshen Oct 29, 2025
9ece5b6
[CI]: light dockerfile build bug (#1919)
sammshen Oct 30, 2025
f4ae4a5
[Feat]: Pipeline Parallelism (#1813)
sammshen Oct 30, 2025
825e2f5
[core]add error handling in kv loading (#1835)
ziruiliu Oct 30, 2025
ca11218
fix to use correct mem obj in async loading (#1867)
ziruiliu Oct 30, 2025
ad92d02
fix: minor changes to make SGL+LMCache work for TP==1 (#1904)
ziqifan617 Oct 30, 2025
9f5d7b2
Update bug report template with latest onboarding dashboard (#1906)
kobe0938 Oct 30, 2025
b58a908
[feature] support set list of lookup servers (#1872)
chunxiaozheng Oct 30, 2025
88e607f
[patch]: allow 0 buffer + layerwise detection for lmserver (#1798)
sammshen Oct 31, 2025
348c678
[Bugfix]Fix connection is none issue while get timeout (#1913)
maobaolong Oct 31, 2025
f0dc68e
[bugfix] fix plugin interpreter lookup error (#1911)
chunxiaozheng Nov 1, 2025
fc9ca4f
[MINOR] Use type alias for process_tokens return value (#1894)
maobaolong Nov 1, 2025
1b5e5be
Add a LMCacheBypassLookupClient to support lookup without communicate…
maobaolong Nov 1, 2025
416a286
[Core][Bugfix]Refactor Audit connector, avoid missing apis (#1887)
maobaolong Nov 2, 2025
09ac8f7
[hotfix] guard local CPU backend creation (#1905)
DongDongJu Nov 3, 2025
94645fa
[optimize] catch exception in batched_contains (#1920)
chunxiaozheng Nov 4, 2025
fafd038
[bugfix] Fix the lookup socket state error once timeout occurred (#1929)
maobaolong Nov 4, 2025
c0ab509
[core][bugfix] fix memory leak in batched_put (#1927)
chunxiaozheng Nov 4, 2025
3faa10b
[PATCH]: add dtype to CEK (#1859)
sammshen Nov 4, 2025
efbaffa
[Docs]: Add docs for InfiniStore storage backend (#1783)
profetia Nov 4, 2025
cf0f849
[CI/Build] Loosen threshold for local cpu and layerwise tests (#1947)
Shaoting-Feng Nov 5, 2025
3732d47
[CI]: audit connector unit test patch (#1951)
sammshen Nov 5, 2025
1cfe3bf
[InternalApi] Support config import module for the run_script api (#1…
maobaolong Nov 5, 2025
5260e66
[Improve|Core] Add default batched_get_non_blocking and batched_async…
maobaolong Nov 5, 2025
dde5145
Add async loading event related metrics (#1935)
maobaolong Nov 5, 2025
9a75ceb
[Bugfix] Fix layerwise codepath (#1950)
YaoJiayi Nov 5, 2025
97bc1c4
Bump actions/download-artifact from 5.0.0 to 6.0.0 (#1890)
dependabot[bot] Nov 5, 2025
a539c1d
Bump actions/upload-artifact from 4.6.2 to 5.0.0 (#1891)
dependabot[bot] Nov 5, 2025
4cdd0d9
[Core] Initial Addition SageMaker HyperPod remote connector (#1937)
ningziwen Nov 5, 2025
f4c0e2a
refactor: Use type alias instead of tuple for ProcessedChunk/ProcessT…
maobaolong Nov 6, 2025
fe3ca1b
[Bugfix] Fix key comparison (#1954)
YaoJiayi Nov 6, 2025
6b7e3e9
[metrics]filter out 0 hit rate (#1921)
chunxiaozheng Nov 6, 2025
df9dcd0
[CI/Build] Add /opt/venv/bin to PATH (#1925)
ningziwen Nov 6, 2025
ca1cb9d
[logging]: less verbose async logs (#1820)
sammshen Nov 6, 2025
e5ab00a
[refactor] unified reconstruct cache engine key (#1955)
chunxiaozheng Nov 6, 2025
95862fb
Add XPU support to LMCache for CPU/disk offloading
zhenwei-intel Sep 1, 2025
9934b86
update device and log info
zhenwei-intel Oct 2, 2025
2e72c1e
remove _lmcache_nvtx_annotate
zhenwei-intel Oct 16, 2025
866d32b
using vllm current platform
zhenwei-intel Nov 7, 2025
23bddb9
[feat] Support get the env of current process (#1944)
maobaolong Nov 7, 2025
a64ca32
Adapt hash func to vllm recent changes (#1952)
maobaolong Nov 10, 2025
59102a5
[CI/Build] Move /opt/venv/bin to PATH to BASE image (#1962)
ningziwen Nov 11, 2025
c4a7166
[bugfix]: only update skip_leading_tokens on last PP rank in wait_for…
tianlang-wq Nov 11, 2025
17962f4
Mooncake prioritizes obtaining the master address from master_ server…
tianlang-wq Nov 11, 2025
d5be14f
[controller][bugfix] do not start lmcache worker on scheduler (#1965)
chunxiaozheng Nov 11, 2025
aca44fd
[controller] simplify config when p2p is not enabled (#1963)
chunxiaozheng Nov 12, 2025
6f164e1
[Model] Added support for Qwen2 (#1934)
yaoyanglee Nov 12, 2025
ab85309
Update meetings.rst (#1994)
nijaba Nov 13, 2025
e01afb3
[Core][Mooncake]: add NUMA affinity and batched operations support (#…
xiaguan Nov 14, 2025
8c86926
[optimize] do not initialize storage manager in some rank (#1986)
chunxiaozheng Nov 14, 2025
cd04aa0
Introduce a dynamic expend memory allocator (#1899)
maobaolong Nov 14, 2025
dfa0503
[fix] Fix HyperPod connector release lease API (#1968)
ningziwen Nov 14, 2025
8ab94fd
[CI/Build] Support for required comprehensive test (#2001)
Shaoting-Feng Nov 15, 2025
6f7327a
[CI/Build] Limit the p2p latency threshold (#2003)
Shaoting-Feng Nov 15, 2025
2519583
Merge branch 'dev' into add_xpu
zhenwei-intel Nov 16, 2025
3 changes: 3 additions & 0 deletions .buildkite/cases/comprehensive-cases.txt
@@ -3,3 +3,6 @@ local_disk.yaml
 local_cpu_mla.yaml
 pd.yaml
 multi_device.yaml
+async.yaml
+p2p.yaml
+layerwise.yaml
28 changes: 28 additions & 0 deletions .buildkite/configs/async.yaml
@@ -0,0 +1,28 @@
+workload:
+  type: long_doc_qa
+  max-inflight-requests: 20
+  sleep-time-after-warmup: 20
+  expected-latency-gain: 1.5
+  num-documents: 20
+  repeat-count: 1
+  hit-miss-ratio: 2:2
+
+docker:
+  env:
+    - "LMCACHE_CHUNK_SIZE=256"
+    - "LMCACHE_LOCAL_CPU=False"
+    - "LMCACHE_MAX_LOCAL_CPU_SIZE=70"
+    - "LMCACHE_MAX_LOCAL_DISK_SIZE=70"
+    - "LMCACHE_LOCAL_DISK=\"file:///local/end-to-end-tests/local/\""
+    - "LMCACHE_ENABLE_ASYNC_LOADING=True"
+    - "LMCACHE_EXTRA_CONFIG={\"lookup_backoff_time\": 0.01, \"use_odirect\": True}"
+    - "LMCACHE_SAVE_UNFULL_CHUNK=False"
+
+vllm:
+  model: "meta-llama/Llama-3.1-8B-Instruct"
+  args:
+    - "--load-format"
+    - "dummy"
+    - "--no-enable-prefix-caching"
+    - "--kv-transfer-config"
+    - "{\"kv_connector\":\"LMCacheConnectorV1\",\"kv_role\":\"kv_both\"}"
22 changes: 22 additions & 0 deletions .buildkite/configs/layerwise.yaml
@@ -0,0 +1,22 @@
+workload:
+  type: long_doc_qa
+  max-inflight-requests: 20
+  expected-latency-gain: 3
+
+docker:
+  env:
+    - "LMCACHE_CHUNK_SIZE=256"
+    - "LMCACHE_LOCAL_CPU=True"
+    - "LMCACHE_MAX_LOCAL_CPU_SIZE=5"
+    - "LMCACHE_USE_LAYERWISE=true"
+
+vllm:
+  model: "meta-llama/Llama-3.2-1B-Instruct"
+  args:
+    - "--load-format"
+    - "dummy"
+    - "--no-enable-prefix-caching"
+    - "--kv-transfer-config"
+    - "{\"kv_connector\":\"LMCacheConnectorV1\",\"kv_role\":\"kv_both\"}"
+    - "--compilation-config"
+    - "{\"cudagraph_mode\":\"PIECEWISE\"}"
2 changes: 1 addition & 1 deletion .buildkite/configs/local_cpu.yaml
@@ -1,7 +1,7 @@
 workload:
   type: long_doc_qa
   max-inflight-requests: 20
-  expected-latency-gain: 3.7
+  expected-latency-gain: 3.6

 docker:
   env:
63 changes: 63 additions & 0 deletions .buildkite/configs/p2p.yaml
@@ -0,0 +1,63 @@
+workload:
+  type: long_doc_qa
+  num-documents: 20
+  max-inflight-requests: 2
+  repeat-count: 1
+  expected-latency: 4
+
+feature:
+  type: p2p
+
+docker1:
+  env:
+    - "LMCACHE_MAX_LOCAL_CPU_SIZE=60"
+    - "LMCACHE_ENABLE_ASYNC_LOADING=True"
+    - "LMCACHE_ENABLE_P2P=True"
+    - "LMCACHE_P2P_HOST=localhost"
+    - "LMCACHE_P2P_INIT_PORTS=8200"
+    - "LMCACHE_P2P_LOOKUP_PORTS=8201"
+    - "LMCACHE_TRANSFER_CHANNEL=nixl"
+    - "LMCACHE_ENABLE_CONTROLLER=True"
+    - "LMCACHE_LMCACHE_INSTANCE_ID=lmcache_instance_1"
+    - "LMCACHE_LMCACHE_WORKER_PORTS=8500"
+    - "LMCACHE_EXTRA_CONFIG={\"lookup_backoff_time\": 0.001}"
+    - "LMCACHE_SAVE_UNFULL_CHUNK=False"
+    - "PYTHONHASHSEED=123"
+  pull-port: 8300
+  reply-port: 8400
+
+docker2:
+  env:
+    - "LMCACHE_MAX_LOCAL_CPU_SIZE=60"
+    - "LMCACHE_ENABLE_ASYNC_LOADING=True"
+    - "LMCACHE_ENABLE_P2P=True"
+    - "LMCACHE_P2P_HOST=localhost"
+    - "LMCACHE_P2P_INIT_PORTS=8202"
+    - "LMCACHE_P2P_LOOKUP_PORTS=8203"
+    - "LMCACHE_TRANSFER_CHANNEL=nixl"
+    - "LMCACHE_ENABLE_CONTROLLER=True"
+    - "LMCACHE_LMCACHE_INSTANCE_ID=lmcache_instance_2"
+    - "LMCACHE_LMCACHE_WORKER_PORTS=8501"
+    - "LMCACHE_EXTRA_CONFIG={\"lookup_backoff_time\": 0.001}"
+    - "LMCACHE_SAVE_UNFULL_CHUNK=False"
+    - "PYTHONHASHSEED=123"
+  pull-port: 8300
+  reply-port: 8400
+
+vllm1:
+  model: "meta-llama/Llama-3.1-8B-Instruct"
+  args:
+    - "--load-format"
+    - "dummy"
+    - "--no-enable-prefix-caching"
+    - "--kv-transfer-config"
+    - "{\"kv_connector\":\"LMCacheConnectorV1\",\"kv_role\":\"kv_both\"}"
+
+vllm2:
+  model: "meta-llama/Llama-3.1-8B-Instruct"
+  args:
+    - "--load-format"
+    - "dummy"
+    - "--no-enable-prefix-caching"
+    - "--kv-transfer-config"
+    - "{\"kv_connector\":\"LMCacheConnectorV1\",\"kv_role\":\"kv_both\"}"
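Review note on the port layout in `p2p.yaml`: `run_p2p_lmcache` starts both containers with `--network host`, so the per-instance `*_PORTS` values presumably have to differ between `docker1` and `docker2`, while `pull-port` and `reply-port` are identical because both instances report to the same controller. The sketch below is a hypothetical sanity check, not part of this PR: `ports_of` and `port_collisions` are illustrative names, and the env lists are copied by hand from the config instead of being parsed with `yq`.

```shell
# Hypothetical helpers, not part of the PR.

# Print the value of every *_PORTS entry in a newline-separated KEY=VALUE list.
ports_of() {
    printf '%s\n' "$1" | while IFS= read -r entry; do
        case "${entry%%=*}" in
            *_PORTS) printf '%s\n' "${entry#*=}" ;;
        esac
    done
}

# Print each port claimed by both lists; no output means no clash.
port_collisions() {
    for port in $(ports_of "$1"); do
        for other in $(ports_of "$2"); do
            if [ "$port" = "$other" ]; then
                printf '%s\n' "$port"
            fi
        done
    done
}

# Values copied from docker1 and docker2 in p2p.yaml.
instance1='LMCACHE_P2P_INIT_PORTS=8200
LMCACHE_P2P_LOOKUP_PORTS=8201
LMCACHE_LMCACHE_WORKER_PORTS=8500'
instance2='LMCACHE_P2P_INIT_PORTS=8202
LMCACHE_P2P_LOOKUP_PORTS=8203
LMCACHE_LMCACHE_WORKER_PORTS=8501'

collisions=$(port_collisions "$instance1" "$instance2")
if [ -z "$collisions" ]; then
    echo "no port collisions"
else
    echo "colliding ports: $collisions"
fi
```

A check like this only compares the entries it is given; it does not know about the vLLM API ports or the controller port, which the test script picks at run time.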
137 changes: 135 additions & 2 deletions .buildkite/scripts/vllm-integration-tests.sh
@@ -272,6 +272,123 @@ run_pd_lmcache() {
     sleep 10
 }

+run_p2p_lmcache() {
+    local docker1="$1"
+    local vllm1="$2"
+    local docker2="$3"
+    local vllm2="$4"
+    local cfg_name="$5"
+    LOGFILE1="/tmp/build_${BUILD_ID}_${cfg_name}1.log"
+    LOGFILE2="/tmp/build_${BUILD_ID}_${cfg_name}2.log"
+
+    ########## Instance 1 ##########
+    # docker args
+    docker1_args=(
+        --runtime nvidia
+        --network host
+        --gpus "device=0"
+        --volume ~/.cache/huggingface:/root/.cache/huggingface
+        --env VLLM_USE_FLASHINFER_SAMPLER=0
+        --env HF_TOKEN="$HF_TOKEN"
+        --env UCX_TLS=tcp
+        --ipc host
+        --shm-size 4G
+    )
+    while IFS= read -r e; do
+        [[ -n $e ]] && docker1_args+=(--env "$e")
+    done < <(yq -r '.env[]?' <<<"$docker1")
+    pull=$(yq -er '."pull-port"' <<<"$docker1" 2>/dev/null)
+    docker1_args+=(--env "LMCACHE_CONTROLLER_PULL_URL=localhost:$pull")
+    reply=$(yq -er '."reply-port"' <<<"$docker1" 2>/dev/null)
+    docker1_args+=(--env "LMCACHE_CONTROLLER_REPLY_URL=localhost:$reply")
+
+    # vllm args
+    vllm1_model="$(yq -r '.model' <<<"$vllm1")"
+    mapfile -t vllm1_cli_args < <(yq -r '.args // [] | .[]' <<<"$vllm1")
+    cmd_args1=(
+        lmcache/vllm-openai:build-latest
+        "$vllm1_model"
+    )
+    cmd_args1+=("${vllm1_cli_args[@]}")
+    cmd_args1+=("--port" "$PORT1")
+
+    ##### Controller part start #####
+    if [ ! -d ".venv" ]; then
+        UV_PYTHON=python3 uv -q venv
+    fi
+    source .venv/bin/activate
+    uv pip install -r "$ORIG_DIR/requirements/build.txt" > /dev/null 2>&1
+    uv pip install torch==2.7.1 httpx fastapi uvicorn > /dev/null 2>&1
+    uv pip install -e "$ORIG_DIR" --no-build-isolation > /dev/null 2>&1
+    # Start controller
+    PYTHONHASHSEED=123 lmcache_controller \
+        --host localhost \
+        --port "$PORT" \
+        --monitor-ports "{\"pull\": ${pull}, \"reply\": ${reply}}" \
+        > "/tmp/build_${BUILD_ID}_${cfg_name}_controller.log" 2>&1 &
+    sleep 10
+    ##### Controller part end #####
+
+    # Start docker
+    CID1=$(
+        docker run -d \
+            "${docker1_args[@]}" \
+            "${cmd_args1[@]}"
+    )
+
+    # Health check
+    wait_for_openai_api_server "$PORT1" "$vllm1_model" "$CID1"
+
+    # Logging
+    touch "$LOGFILE1"
+    docker logs -f "$CID1" >>"$LOGFILE1" 2>&1 &
+
+    ########## Instance 2 ##########
+    # docker args
+    docker2_args=(
+        --runtime nvidia
+        --network host
+        --gpus "device=1"
+        --volume ~/.cache/huggingface:/root/.cache/huggingface
+        --env VLLM_USE_FLASHINFER_SAMPLER=0
+        --env HF_TOKEN="$HF_TOKEN"
+        --env UCX_TLS=tcp
+        --ipc host
+        --shm-size 4G
+    )
+    while IFS= read -r e; do
+        [[ -n $e ]] && docker2_args+=(--env "$e")
+    done < <(yq -r '.env[]?' <<<"$docker2")
+    pull=$(yq -er '."pull-port"' <<<"$docker2" 2>/dev/null)
+    docker2_args+=(--env "LMCACHE_CONTROLLER_PULL_URL=localhost:$pull")
+    reply=$(yq -er '."reply-port"' <<<"$docker2" 2>/dev/null)
+    docker2_args+=(--env "LMCACHE_CONTROLLER_REPLY_URL=localhost:$reply")
+
+    # vllm args
+    vllm2_model="$(yq -r '.model' <<<"$vllm2")"
+    mapfile -t vllm2_cli_args < <(yq -r '.args // [] | .[]' <<<"$vllm2")
+    cmd_args2=(
+        lmcache/vllm-openai:build-latest
+        "$vllm2_model"
+    )
+    cmd_args2+=("${vllm2_cli_args[@]}")
+    cmd_args2+=("--port" "$PORT2")
+
+    # Start docker
+    CID2=$(
+        docker run -d \
+            "${docker2_args[@]}" \
+            "${cmd_args2[@]}"
+    )
+
+    # Health check
+    wait_for_openai_api_server "$PORT2" "$vllm2_model" "$CID2"
+
+    # Logging
+    touch "$LOGFILE2"
+    docker logs -f "$CID2" >>"$LOGFILE2" 2>&1 &
+}
+
 usage() {
     echo "Usage: $0 [OPTIONS]"
     echo " "
@@ -315,6 +432,7 @@ test_vllmopenai_server_with_lmcache_integrated() {

 run_long_doc_qa() {
     local workload_config="$1"
+    local port="$2"

     echo "→ Running long_doc_qa with customed workload config:"
     printf '%s\n' "$workload_config"
@@ -349,7 +467,7 @@ run_long_doc_qa() {
     uv -q pip install openai pandas matplotlib
     python3 "$ORIG_DIR/benchmarks/long_doc_qa/long_doc_qa.py" \
         "${workload_args[@]}" \
-        --port="$PORT" \
+        --port="$port" \
         --output="response.txt"
 }

@@ -433,6 +551,15 @@ for cfg_name in "${CONFIG_NAMES[@]}"; do
         decoder_vllm_args="$(yq '.["vllm-decoder"]' "$cfg_file")"
         run_pd_lmcache "$prefiller_docker_args" "$prefiller_vllm_args" "$decoder_docker_args" "$decoder_vllm_args" "$cfg_name"
         model="$(yq -r '.["vllm-prefiller"].model' "$cfg_file")"
+    elif [[ "$feature_type" == "p2p" ]]; then
+        PORT1=$(find_available_port 8177)
+        docker1_args="$(yq '.["docker1"]' "$cfg_file")"
+        vllm1_args="$(yq '.["vllm1"]' "$cfg_file")"
+        PORT2=$(find_available_port 8277)
+        docker2_args="$(yq '.["docker2"]' "$cfg_file")"
+        vllm2_args="$(yq '.["vllm2"]' "$cfg_file")"
+        run_p2p_lmcache "$docker1_args" "$vllm1_args" "$docker2_args" "$vllm2_args" "$cfg_name"
+        model="$(yq -r '.["vllm1"].model' "$cfg_file")"
     elif [[ -z "$feature_type" ]]; then
         docker_args="$(yq '.docker' "$cfg_file")"
         vllm_args="$(yq '.vllm' "$cfg_file")"
@@ -446,7 +573,13 @@ for cfg_name in "${CONFIG_NAMES[@]}"; do
         test_vllmopenai_server_with_lmcache_integrated "$model"
     elif [ "$test_mode" = "long_doc_qa" ]; then
         workload_yaml="$(yq "(.workload * {\"model\": \"$model\"}) | del(.type)" "$cfg_file")"
-        run_long_doc_qa "$workload_yaml"
+        if [[ "$feature_type" == "p2p" ]]; then
+            tmp_workload_yaml=$(jq 'del(."expected-latency")' <<< "$workload_yaml")
+            run_long_doc_qa "$tmp_workload_yaml" "$PORT1"
+            run_long_doc_qa "$workload_yaml" "$PORT2"
+        else
+            run_long_doc_qa "$workload_yaml" "$PORT"
+        fi
     fi

     cleanup 0
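Review note on the p2p dispatch in `vllm-integration-tests.sh`: the workload runs twice, first against instance 1 with `expected-latency` deleted, then against instance 2 with the bound intact. One plausible reading is that the first pass only populates instance 1's cache, so that the second pass measures requests served through P2P transfer; the PR does not state this explicitly. The sketch below traces that control flow under those assumptions. The stub `run_long_doc_qa`, the `CALLS` variable, the `dispatch_long_doc_qa` wrapper, and the fixed port values are all illustrative, and `grep -v` stands in for the `jq` deletion on a simplified one-key-per-line workload.

```shell
# Illustrative only: a stub replaces the real run_long_doc_qa so the control
# flow can be traced without vLLM, Docker, or jq.
CALLS=""
run_long_doc_qa() {
    # $1 = workload (one "key: value" per line), $2 = port
    case "$1" in
        *expected-latency:*) CALLS="$CALLS $2:latency-checked" ;;
        *) CALLS="$CALLS $2:warmup" ;;
    esac
}

# Hypothetical wrapper around the branch added in this PR.
dispatch_long_doc_qa() {
    feature_type="$1"
    workload="$2"
    if [ "$feature_type" = "p2p" ]; then
        # Stand-in for: jq 'del(."expected-latency")' <<< "$workload_yaml"
        warmup=$(printf '%s\n' "$workload" | grep -v '^expected-latency:')
        run_long_doc_qa "$warmup" "$PORT1"    # pass 1: no latency bound
        run_long_doc_qa "$workload" "$PORT2"  # pass 2: bound enforced
    else
        run_long_doc_qa "$workload" "$PORT"
    fi
}

# Example ports; the real script picks them with find_available_port.
PORT1=8177
PORT2=8277
workload='num-documents: 20
max-inflight-requests: 2
repeat-count: 1
expected-latency: 4'

dispatch_long_doc_qa p2p "$workload"
echo "calls:$CALLS"
```

The order matters here: swapping the two calls would apply the latency bound to the instance that has to compute everything from scratch.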
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/blank_issue.md
@@ -6,7 +6,7 @@ labels: ''
 assignees: ''
 ---
 **Label**
-Please label your issue so that it can easily be easily categorized under [LMCache Onboarding](https://github.com/LMCache/LMCache/issues/627)
+Please label your issue so that it can easily be easily categorized under [LMCache Onboarding](https://github.com/LMCache/LMCache/issues/1882)

 **Summary**
 A concise overview of the issue you want to raise.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.md
@@ -7,7 +7,7 @@ assignees: ''

 ---
 **Label**
-Please label your issue with "bug" and any other relevant labels so that it can easily be easily categorized under [LMCache Onboarding](https://github.com/LMCache/LMCache/issues/627)
+Please label your issue with "bug" and any other relevant labels so that it can easily be easily categorized under [LMCache Onboarding](https://github.com/LMCache/LMCache/issues/1882)

 **Describe the bug**
 A clear and concise description of what the bug is.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/feature_request.md
@@ -7,7 +7,7 @@ assignees: ''

 ---
 **Label**
-Please label your issue with "new feature" and any other relevant labels so that it can easily be easily categorized under [LMCache Onboarding](https://github.com/LMCache/LMCache/issues/627)
+Please label your issue with "new feature" and any other relevant labels so that it can easily be easily categorized under [LMCache Onboarding](https://github.com/LMCache/LMCache/issues/1882)

 **Is your feature request related to a problem? Please describe.**
 A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
44 changes: 8 additions & 36 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -1,40 +1,12 @@
-FILL IN THE PR DESCRIPTION HERE
+<!-- Thanks for your contribution to LMCache! Here are some tips for you:
+1. Make sure to read the Contributing Guide before submitting your PR: https://github.com/LMCache/LMCache/blob/dev/CONTRIBUTING.md
+2. If this PR closes another issue, add 'Fixes #<issue number>' somewhere in the PR summary. GitHub will automatically close that issue when this PR gets merged. Alternatively, adding 'Refs #<issue number>' will not close the issue, but help provide the reviewer more context.-->

-FIX #xxxx (*link existing issues this PR will resolve*)
+**What this PR does / why we need it**:

-**PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE**
+**Special notes for your reviewers**:

----
+**If applicable**:

-<details>
-<!-- inside this <details> section, markdown rendering does not work, so we use raw html here. -->
-<summary><b> PR Checklist (Click to Expand) </b></summary>
-
-<p>Thank you for your contribution to LMCache! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.</p>
-
-<h3>PR Title and Classification</h3>
-<p>Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:</p>
-<ul>
-<li><code>[Bugfix]</code> for bug fixes.</li>
-<li><code>[CI/Build]</code> for build or continuous integration improvements.</li>
-<li><code>[Doc]</code> for documentation fixes and improvements.</li>
-<li><code>[Model]</code> for adding a new model or improving an existing model. Model name should appear in the title.</li>
-<li><code>[Core]</code> for changes in the core LMCache logic (e.g., <code>LMCacheEngine</code>, <code>Backend</code> etc.)</li>
-<li><code>[Misc]</code> for PRs that do not fit the above categories. Please use this sparingly.</li>
-</ul>
-<p><strong>Note:</strong> If the PR spans more than one category, please include all relevant prefixes.</p>
-
-<h3>Code Quality</h3>
-
-<p>The PR need to meet the following code quality standards:</p>
-
-<ul>
-<li>The code need to be well-documented to ensure future contributors can easily understand the code.</li>
-<li> Please include sufficient unit tests to ensure the change is stay correct and robust. The unit and integration tests will always run and our comprehensive test will be triggered after the "full" label is tagged onto a PR.</li>
-</ul>
-
-<h3>What to Expect for the Reviews</h3>
-
-We aim to address all PRs in a timely manner. If no one reviews your PR within 5 days, please @-mention one of KuntaiDu, ApostaC or YaoJiayi.
-
-</details>
+- [ ] this PR contains user facing changes - docs added
+- [ ] this PR contains unit tests
17 changes: 17 additions & 0 deletions .github/workflows/automerge-labeler.yml
@@ -0,0 +1,17 @@
+name: Label auto-merge PRs
+
+on:
+  pull_request_target:
+    types: [ auto_merge_enabled, auto_merge_disabled ]
+
+permissions:
+  pull-requests: write
+
+jobs:
+  add_remove_labels:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: ubuntudroid/automerge-labeler@v1
+        with:
+          token: ${{ secrets.GITHUB_TOKEN }}
+          label: 'full'