
ggml-cpu: add AVX-512-VNNI dot product for Q2_0 (x86) #44

Draft: khosravipasha wants to merge 1 commit into pr/q2_0-cpu from pr/q2_0-x86


@khosravipasha

Collaborator

PR draft for testing and initial review.

Copilot AI left a comment


Pull request overview

Adds an x86 implementation of ggml_vec_dot_q2_0_q8_0 that uses AVX-512-VNNI (when available) to accelerate dot products of Q2_0 weights with Q8_0 activations, and removes the x86 rename of the generic fallback so the generic and native definitions no longer collide.

Changes:

  • Implement ggml_vec_dot_q2_0_q8_0 in arch/x86/quants.c, with an AVX-512-VNNI + AVX-512VL path and a scalar fallback.
  • Stop renaming ggml_vec_dot_q2_0_q8_0_generic to ggml_vec_dot_q2_0_q8_0 on x86 in arch-fallback.h (since a native x86 symbol now exists).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • ggml/src/ggml-cpu/arch/x86/quants.c: adds the x86 AVX-512-VNNI implementation of the Q2_0·Q8_0 dot product (plus a scalar fallback).
  • ggml/src/ggml-cpu/arch-fallback.h: removes the x86 macro rename that would otherwise clash with the new native implementation.


const __m256i qy = _mm256_loadu_si256((const __m256i *) yb->qs);
const __m128i src = _mm_loadl_epi64((const __m128i *) &x[i].qs[k * 8]); // 8 bytes
// replicate each byte 4x, then extract field c via (b<<(6-2c))>>6 & 3
const __m256i rep = _mm256_set_m128i(_mm_shuffle_epi8(src, idxhi), _mm_shuffle_epi8(src, idxlo));
Comment on lines +601 to +605
#else
for (int i = 0; i < nb; i++) {
    const float d0 = GGML_CPU_FP16_TO_FP32(x[i].d);

    float sumi = 0.0f;
Co-authored-by: bri-prism <288398250+bri-prism@users.noreply.github.com>

Labels: None yet
Projects: None yet


2 participants