Performance of llama.cpp on Intel GPU with SYCL backend #23313
Replies: 16 comments 46 replies
Compiled with:
cmake -B build-sycl -DGGML_SYCL=ON -DGGML_SYCL_F16=ON -DGGML_SYCL_TARGET=INTEL -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_FLAGS="-march=znver4" -DCMAKE_CXX_FLAGS="-march=znver4" -DCMAKE_BUILD_TYPE=Release && cmake --build build-sycl --config Release -j 16
Single B70:
Dual B70:
If instead compiled and run with F16 off:
cmake -B build-sycl -DGGML_SYCL=ON -DGGML_SYCL_F16=OFF -DGGML_SYCL_TARGET=INTEL -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCMAKE_C_FLAGS="-march=znver4" -DCMAKE_CXX_FLAGS="-march=znver4" -DCMAKE_BUILD_TYPE=Release && cmake --build build-sycl --config Release -j 16
Single B70:
Dual B70:
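For anyone reproducing both variants, here is a minimal sketch that assembles the two configure commands from the flags quoted above, so the F16 and F32 builds differ only in `GGML_SYCL_F16`. The helper names and the separate `build-sycl-f16` / `build-sycl-f32` directories are my own assumptions, not something the poster used; the script only prints the commands and does not run them.

```python
import shlex


def sycl_cmake_args(f16, build_dir="build-sycl", march="znver4"):
    """Return the cmake configure command as an argument list.

    Only flags quoted in the post are used. Pass march=None to drop the
    -march flags (znver4 is specific to the poster's AMD host CPU).
    """
    args = [
        "cmake", "-B", build_dir,
        "-DGGML_SYCL=ON",
        f"-DGGML_SYCL_F16={'ON' if f16 else 'OFF'}",
        "-DGGML_SYCL_TARGET=INTEL",
        "-DCMAKE_C_COMPILER=icx",
        "-DCMAKE_CXX_COMPILER=icpx",
        "-DCMAKE_BUILD_TYPE=Release",
    ]
    if march:
        args += [
            f"-DCMAKE_C_FLAGS=-march={march}",
            f"-DCMAKE_CXX_FLAGS=-march={march}",
        ]
    return args


def build_args(build_dir="build-sycl", jobs=16):
    """Return the cmake build command as an argument list."""
    return ["cmake", "--build", build_dir, "--config", "Release", "-j", str(jobs)]


if __name__ == "__main__":
    # Hypothetical directory names, chosen so both builds can coexist.
    for f16, build_dir in ((True, "build-sycl-f16"), (False, "build-sycl-f32")):
        print(shlex.join(sycl_cmake_args(f16, build_dir)))
        print(shlex.join(build_args(build_dir)))
```

Keeping the two builds in separate directories avoids reconfiguring between runs when comparing F16 against F32.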
And with a much more interesting model, namely Qwen 3.6 27B:
q4:
q8:
Ooft. A770 16GB, i5-14600K, current CachyOS.
fp16:
fp32:
Very weird: compared with the one in the table, prefill is much better but decode performance is half.
B580, AMD Ryzen 7 5700X3D, Ubuntu 25.10, built with fp16 in
Intel Arc Pro B50, Intel i7-8700, 32GB RAM
build: c0c7e14 (9298)
build: 2f6c815 (9397)
Command line arguments:
Build options: cmake .. -B build -DGGML_VULKAN=1 -DGGML_RPC=ON
Nothing else changed between these runs: I tested my old version, ran "git pull", rebuilt, and retested.
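When comparing two builds like this, it can help to express the difference as a percentage per test. A minimal sketch follows; the function names are mine, the `pp512` / `tg128` keys assume the default llama-bench test names, and the numbers in the example are placeholders, not results from this post.

```python
def relative_change(old_tps, new_tps):
    """Percent change in tokens/s from the old build to the new one."""
    if old_tps <= 0:
        raise ValueError("old_tps must be positive")
    return (new_tps - old_tps) / old_tps * 100.0


def compare_builds(old, new):
    """Compare two {test_name: tokens_per_second} dicts on their shared keys."""
    return {name: relative_change(old[name], new[name]) for name in old if name in new}


if __name__ == "__main__":
    # Placeholder values only, for illustration.
    old_build = {"pp512": 100.0, "tg128": 20.0}
    new_build = {"pp512": 90.0, "tg128": 21.0}
    for name, pct in compare_builds(old_build, new_build).items():
        print(f"{name}: {pct:+.1f}%")
```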
I hope it helps.
255H, Arc 140T, 32GB RAM
build: d4c8e2c (9442)
~/llama.cpp$ cmake -B build/ReleaseOV -G Ninja -DCMAKE_BUILD_TYPE=Release -DGGML_OPENVINO=ON
~/llama.cpp$ GGML_OPENVINO_STATEFUL_EXECUTION=1
Specs: https://www.asrockind.com/en-gb/NUC%20BOX-358H
📊 Intel Panther Lake Xe3 iGPU (12 EU) Benchmark Matrix: OpenVINO vs. Vulkan vs. SYCL
Benchmarking sweep across all three major acceleration backends available in
Environment
Models Tested
📈 Performance Summary Matrix
🛠️ Deep-Dive Analysis
📋 Raw Build & Execution Logs
1. Qwen3 80B MoE: OpenVINO Crash Log
HW: Ryzen 5 5600X, DDR4-3600 128GB, Arc B570
FP32
FP16
HW: Ryzen 5 5700X, DDR4-3600 64GB, Arc B580 + Arc Pro B60 (24GB)
Every run below reports build: 65ef50a (9501).
llama build
B60 + B580:
./llama-bench -m ~/llama-2-7b.Q4_0.gguf -fa 1
On Arc B60:
./llama-bench -m ~/llama-2-7b.Q4_0.gguf -fa 0 -dev sycl0
./llama-bench -m ~/llama-2-7b.Q4_0.gguf -dev sycl0 -fa 1
On Arc B580:
./llama-bench -m ~/llama-2-7b.Q4_0.gguf -dev sycl1 -fa 0
./llama-bench -m ~/llama-2-7b.Q4_0.gguf -dev sycl1 -fa 1
VULKAN
After prompt processing, the GPU frequency drops back to its minimum and TG is low:
./llama-bench -m ~/llama-2-7b.Q4_0.gguf -fa 0
I then set the minimum GPU frequency to 2300 on the B60 and 2683 on the B580:
./llama-bench -m ~/llama-2-7b.Q4_0.gguf -fa 0
./llama-bench -m ~/llama-2-7b.Q4_0.gguf -fa 1
On B60:
./llama-bench -m ~/llama-2-7b.Q4_0.gguf -dev vulkan1 -fa 0
./llama-bench -m ~/llama-2-7b.Q4_0.gguf -dev vulkan1 -fa 1
On B580:
./llama-bench -m ~/llama-2-7b.Q4_0.gguf -dev vulkan0 -fa 0
./llama-bench -m ~/llama-2-7b.Q4_0.gguf -dev vulkan0 -fa 1
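The sweep above is a device × flash-attention grid, so the command list can be generated instead of typed by hand. A rough sketch, using only the `-m`, `-fa`, and `-dev` flags that appear in the commands above; `bench_commands` is a hypothetical helper, and `None` stands for "no `-dev` flag" (all devices). It prints the commands and does not execute them.

```python
import os
import shlex
from itertools import product


def bench_commands(model, devices, fa_values=(0, 1), binary="./llama-bench"):
    """Build one llama-bench argument list per (device, -fa) combination.

    A device of None means the -dev flag is omitted entirely.
    """
    commands = []
    for device, fa in product(devices, fa_values):
        cmd = [binary, "-m", model, "-fa", str(fa)]
        if device is not None:
            cmd += ["-dev", device]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Expand "~" here, because shlex.join would quote it and block shell expansion.
    model = os.path.expanduser("~/llama-2-7b.Q4_0.gguf")
    for cmd in bench_commands(model, [None, "sycl0", "sycl1"]):
        print(shlex.join(cmd))
```

Swapping the device list for `[None, "vulkan0", "vulkan1"]` gives the Vulkan half of the sweep.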
Arc A380
I know, I know... I got it for its AV1 encoding/decoding, not LLMs, but here we are...
F16
F32
Got a couple of warnings during execution:
Other info:
Build config (F16/F32 variations):
Found 1 SYCL devices:
SYCL Optimization Feature:
build: 7c158fb (b9518)
Hi,
./llama-bench -fa 0,1 -m ../../models/llama-2-7b.Q4_0.gguf
build: 6471e3c (9607)
sycl-ls
Thanks for your great work!
A380 - Docker
Apologies for testing in Docker: my local environment is messed up in all sorts of ways, so I'm unable to test on bare metal. I'll share the Docker Compose file and commands to reproduce in case anyone is interested. I'll also attach other benchmarks to compare the current state as of commit e95dae1. All results are the third results printed.
SYCL F16
SYCL F32
I ran a similar Docker image a few days ago and I remember my results being far better; I'm not sure what has happened with the pp.
Vulkan
ggml_vulkan: Found 1 Vulkan devices:
OpenVINO
OpenVINO: using device GPU
Docker compose
services:
  bench-openvino:
    build:
      context: .
      dockerfile: .devops/openvino.Dockerfile
      target: full
    image: llama.cpp:full-openvino-local
    devices:
      - /dev/dri:/dev/dri
    volumes:
      - llama-cache:/models
    environment:
      - LD_LIBRARY_PATH=/app
      - GGML_OPENVINO_DEVICE=${GGML_OPENVINO_DEVICE:-GPU}
      - GGML_OPENVINO_STATEFUL_EXECUTION=1
      - LLAMA_CACHE=/models
    entrypoint: /app/llama-bench
    command:
      - -hf
      - ${HF_REPO:-TheBloke/Llama-2-7B-GGUF:Q4_0}
      - -fa
      - "1"
      - -ngl
      - "99"
  bench-sycl-f16:
    build:
      context: .
      dockerfile: .devops/intel.Dockerfile
      target: full
      args:
        GGML_SYCL_F16: "ON"
    image: llama.cpp:full-sycl-f16-local
    devices:
      - /dev/dri:/dev/dri
    volumes:
      - llama-cache:/models
    environment:
      - LLAMA_CACHE=/models
      - ONEAPI_DEVICE_SELECTOR=level_zero:0
      - ZES_ENABLE_SYSMAN=1
    entrypoint: /app/llama-bench
    command:
      - -hf
      - ${HF_REPO:-TheBloke/Llama-2-7B-GGUF:Q4_0}
      - -fa
      - "1,0"
      - -ngl
      - "99"
  bench-sycl-f32:
    build:
      context: .
      dockerfile: .devops/intel.Dockerfile
      target: full
      args:
        GGML_SYCL_F16: "OFF"
    image: llama.cpp:full-sycl-f32-local
    devices:
      - /dev/dri:/dev/dri
    volumes:
      - llama-cache:/models
    environment:
      - LLAMA_CACHE=/models
      - ONEAPI_DEVICE_SELECTOR=level_zero:0
      - ZES_ENABLE_SYSMAN=1
    entrypoint: /app/llama-bench
    command:
      - -hf
      - ${HF_REPO:-TheBloke/Llama-2-7B-GGUF:Q4_0}
      - -fa
      - "1,0"
      - -ngl
      - "99"
  bench-vulkan:
    build:
      context: .
      dockerfile: .devops/vulkan.Dockerfile
      target: full
    image: llama.cpp:full-vulkan-local
    devices:
      - /dev/dri:/dev/dri
    volumes:
      - llama-cache:/models
    environment:
      - LLAMA_CACHE=/models
    entrypoint: /app/llama-bench
    command:
      - -hf
      - ${HF_REPO:-TheBloke/Llama-2-7B-GGUF:Q4_0}
      - -fa
      - "1,0"
      - -ngl
      - "99"
volumes:
  llama-cache:
    name: llama-cache
Commands
docker compose run --build --rm bench-openvino # OpenVINO
docker compose run --build --rm bench-sycl-f16 # SYCL F16
docker compose run --build --rm bench-sycl-f32 # SYCL F32
docker compose run --build --rm bench-vulkan # Vulkan
For repeated runs remove the
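The four services in that compose file differ only in Dockerfile, image tag, extra environment variables, build args, and the `-fa` list. Below is a sketch that generates the same structure from one template, which makes the differences easier to see. `bench_service` and `compose_file` are hypothetical helpers of mine; the output is JSON, which is also valid YAML, though I have not checked it against the poster's setup.

```python
import json

HF_REPO = "${HF_REPO:-TheBloke/Llama-2-7B-GGUF:Q4_0}"
SYCL_ENV = ("ONEAPI_DEVICE_SELECTOR=level_zero:0", "ZES_ENABLE_SYSMAN=1")
OPENVINO_ENV = (
    "LD_LIBRARY_PATH=/app",
    "GGML_OPENVINO_DEVICE=${GGML_OPENVINO_DEVICE:-GPU}",
    "GGML_OPENVINO_STATEFUL_EXECUTION=1",
)


def bench_service(dockerfile, image, fa, extra_env=(), build_args=None):
    """Return one llama-bench service definition as a plain dict."""
    build = {"context": ".", "dockerfile": dockerfile, "target": "full"}
    if build_args:
        build["args"] = dict(build_args)
    return {
        "build": build,
        "image": image,
        "devices": ["/dev/dri:/dev/dri"],
        "volumes": ["llama-cache:/models"],
        "environment": [*extra_env, "LLAMA_CACHE=/models"],
        "entrypoint": "/app/llama-bench",
        "command": ["-hf", HF_REPO, "-fa", fa, "-ngl", "99"],
    }


def compose_file():
    """Assemble the four benchmark services and the shared model cache volume."""
    services = {
        "bench-openvino": bench_service(
            ".devops/openvino.Dockerfile", "llama.cpp:full-openvino-local", "1",
            extra_env=OPENVINO_ENV),
        "bench-vulkan": bench_service(
            ".devops/vulkan.Dockerfile", "llama.cpp:full-vulkan-local", "1,0"),
    }
    for name, flag in (("f16", "ON"), ("f32", "OFF")):
        services[f"bench-sycl-{name}"] = bench_service(
            ".devops/intel.Dockerfile", f"llama.cpp:full-sycl-{name}-local", "1,0",
            extra_env=SYCL_ENV, build_args={"GGML_SYCL_F16": flag})
    return {"services": services, "volumes": {"llama-cache": {"name": "llama-cache"}}}


if __name__ == "__main__":
    print(json.dumps(compose_file(), indent=2))
```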
@toomanybyt3s Could you check the driver with the following commands?
Hardware: Intel Core Ultra 5 250K Plus, DDR5-6400 16GBx1, Intel Arc B580 LE with the minimum core clock set to 2850 MHz, since the clock seems to drop during inference despite plenty of thermal headroom.
Gemma4 E4B has 4B active parameters, so I calculated the effective memory bandwidth utilization as 45%. There seems to be some overhead with MoE models in general.
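For readers who want to repeat that kind of estimate, here is one way the arithmetic might look. This is a sketch under my own assumptions, not necessarily the poster's method: decode is treated as memory-bound, every generated token reads all active weights once, and KV-cache and activation traffic are ignored. The 4.5 bits per weight (Q4_0), the 456 GB/s peak figure, and the token rate in the example are placeholders to be replaced with your own numbers.

```python
def bandwidth_utilization(active_params, bits_per_weight, tokens_per_s, peak_gb_per_s):
    """Estimate the fraction of peak memory bandwidth used during decode.

    Assumes each generated token reads every active weight exactly once
    and that nothing else contributes to memory traffic.
    """
    bytes_per_token = active_params * bits_per_weight / 8
    achieved_bytes_per_s = bytes_per_token * tokens_per_s
    return achieved_bytes_per_s / (peak_gb_per_s * 1e9)


if __name__ == "__main__":
    # Placeholder inputs for illustration only, not measurements from this thread.
    util = bandwidth_utilization(
        active_params=4e9,      # 4B active parameters
        bits_per_weight=4.5,    # assumed quantization (Q4_0 stores 4.5 bits per weight)
        tokens_per_s=100.0,     # hypothetical decode rate
        peak_gb_per_s=456.0,    # assumed peak bandwidth; check your card's spec
    )
    print(f"estimated utilization: {util:.0%}")
```

Because the KV cache is ignored, the estimate tends to understate real traffic at long context lengths.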
Purpose
This discussion is for sharing performance data on Intel GPUs with the SYCL backend.
The performance data is for reference only, since we don't double-check it.
It must not be used for any commercial purpose.
Rule
Testing with the default settings (environment variables) is encouraged.
If you want to add data measured with special build or run settings, please create a new table.
Create or update the tables directly, following the format.
Insert a new record instead of updating an existing one with the same keys; sort the records by col1, col2, col3.
Add your comments at the end for further discussion.
Don't add tables that compare with other hardware, frameworks, or backends.
Please run the test more than once and update the table with the stable data.
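One possible way to decide whether repeated runs are "stable" is to look at the coefficient of variation of the measured t/s values. The sketch below is only a suggestion: the 2% threshold and the function name are my assumptions, not part of the rules above, and the sample numbers are placeholders.

```python
from statistics import mean, stdev


def is_stable(samples, max_cv=0.02):
    """Return (stable, mean, cv) for repeated tokens/s measurements.

    cv is the sample standard deviation divided by the mean. The default
    threshold of 2% is an arbitrary choice for illustration.
    """
    if len(samples) < 2:
        raise ValueError("need at least two runs to judge stability")
    avg = mean(samples)
    cv = stdev(samples) / avg
    return cv <= max_cv, avg, cv


if __name__ == "__main__":
    # Placeholder t/s values from three hypothetical runs.
    stable, avg, cv = is_stable([100.0, 101.0, 99.0])
    print(f"stable={stable} mean={avg:.1f} cv={cv:.1%}")
```

If the spread is too large, rerunning after pinning the GPU clock (as some posters in this thread did) may help before reporting a number.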
Performance data on Intel GPU
Default setting
Build:
Run:
Data:
[Results table lost in extraction. Surviving fragments: column labels FP16 and t/s; host entries DDR5-6400 16GB; 32GB (Medium); 64GB with 24.04.4; 5600X, DDR4-3600, 128GB, 24.04.4, 6.17.0-29-generic; 5700X3D with 25.10; Panther Lake (12 EUs) with 26.04.]
More PP/TG Types:
[Results table lost in extraction. Surviving fragments: column labels FP16 and t/s; host entries Panther Lake (12 EUs) with 26.04.]