ggml-cpu: add AVX-512-VNNI dot product for Q2_0 (x86) #44
Draft
khosravipasha wants to merge 1 commit into
Pull request overview
Adds an x86 implementation of `ggml_vec_dot_q2_0_q8_0` that uses AVX-512-VNNI (when available) to accelerate dot products of Q2_0 weights against Q8_0 activations, and drops the x86 generic-fallback rename so that the new native symbol does not collide with it.
Changes:
- Implement `ggml_vec_dot_q2_0_q8_0` in `arch/x86/quants.c`, with an AVX-512-VNNI + AVX-512VL path and a scalar fallback.
- Stop renaming `ggml_vec_dot_q2_0_q8_0_generic` to `ggml_vec_dot_q2_0_q8_0` on x86 in `arch-fallback.h` (since a native x86 symbol now exists).
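To make the VNNI path easier to review, here is a minimal scalar sketch of what one 32-bit lane of the `VPDPBUSD` instruction (`_mm256_dpbusd_epi32`) computes: four unsigned-byte by signed-byte products accumulated into an `int32`. The helper names `dpbusd_lane` and `dot_u2_s8` are hypothetical and do not appear in the PR. The sketch also assumes the 2-bit weights have already been unpacked to codes in 0..3, and it leaves out whatever offset and scale handling the real Q2_0 format needs.

```c
#include <stdint.h>

// Scalar model of one 32-bit lane of VPDPBUSD (_mm256_dpbusd_epi32):
// acc + sum over 4 bytes of (unsigned byte a[j]) * (signed byte b[j]).
static int32_t dpbusd_lane(int32_t acc, const uint8_t a[4], const int8_t b[4]) {
    for (int j = 0; j < 4; j++) {
        acc += (int32_t) a[j] * (int32_t) b[j];
    }
    return acc;
}

// Hypothetical helper (not from the PR): integer dot product of n unpacked
// 2-bit codes (values 0..3) with n signed 8-bit activations.
// Assumes n is a multiple of 4.
static int32_t dot_u2_s8(const uint8_t *codes, const int8_t *q8, int n) {
    int32_t acc = 0;
    for (int i = 0; i < n; i += 4) {
        acc = dpbusd_lane(acc, codes + i, q8 + i);
    }
    return acc;
}
```

Because the unpacked codes are non-negative, they fit the unsigned operand of the instruction, while the Q8_0 activations take the signed operand.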
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `ggml/src/ggml-cpu/arch/x86/quants.c` | Adds x86 AVX-512-VNNI implementation for Q2_0·Q8_0 dot product (plus fallback). |
| `ggml/src/ggml-cpu/arch-fallback.h` | Removes x86 macro rename that would otherwise clash with the new native implementation. |
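The symbol clash described for `arch-fallback.h` can be illustrated with a small sketch. Every name below (`HAVE_NATIVE_DOT`, `dot`, `dot_generic`) is a hypothetical stand-in rather than a real ggml symbol, and the actual header contents are not shown in this PR excerpt. The idea is that a fallback macro compiles the generic function under the public name, which is only safe when no native definition of that name exists.

```c
#include <stddef.h>

// All names here are hypothetical stand-ins, not the real ggml symbols.
// Set to 0 to model a target with no architecture-specific kernel.
#define HAVE_NATIVE_DOT 1

#if !HAVE_NATIVE_DOT
// Fallback rename: the generic definition below is compiled under the
// public name `dot`, so callers need no per-architecture dispatch.
#define dot_generic dot
#endif

// Portable reference implementation.
static int dot_generic(const int *a, const int *b, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

#if HAVE_NATIVE_DOT
// Stand-in for the architecture-specific kernel (two-way unrolled here).
// If the rename above were still active, this would be a second
// definition of `dot` and the build would fail.
static int dot(const int *a, const int *b, size_t n) {
    int s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
    }
    if (i < n) {
        s0 += a[i] * b[i];
    }
    return s0 + s1;
}
#endif
```

With `HAVE_NATIVE_DOT` set to 1, both `dot` and `dot_generic` exist side by side, which mirrors the state after the rename is removed for x86.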
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
    const __m256i qy = _mm256_loadu_si256((const __m256i *) yb->qs);
    const __m128i src = _mm_loadl_epi64((const __m128i *) &x[i].qs[k * 8]); // 8 bytes
    // replicate each byte 4x, then extract field c via (b<<(6-2c))>>6 & 3
    const __m256i rep = _mm256_set_m128i(_mm_shuffle_epi8(src, idxhi), _mm_shuffle_epi8(src, idxlo));
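The field-extraction comment in the snippet above can be checked with a scalar sketch. It treats `b` as an 8-bit value, so the left shift discards the bits above the selected field before the right shift brings it down to bits 1:0. The function names are hypothetical, and the output ordering (copy `c` of each replicated byte yields field `c`) is an assumption about the kernel rather than something the excerpt confirms.

```c
#include <stdint.h>

// Extract 2-bit field c (0..3) of byte b using the shift form from the
// comment: move the field into the top two bits, truncate to 8 bits,
// then shift it back down to bits 1:0.
static uint8_t field_shift(uint8_t b, int c) {
    const uint8_t up = (uint8_t) (b << (6 - 2 * c));
    return (uint8_t) ((up >> 6) & 3);
}

// Conventional form of the same extraction, used as a cross-check.
static uint8_t field_ref(uint8_t b, int c) {
    return (uint8_t) ((b >> (2 * c)) & 3);
}

// Hypothetical unpack of 8 packed bytes into 32 codes: each source byte is
// conceptually replicated 4 times, and copy c yields field c.
// The output ordering is an assumption, not taken from the PR.
static void unpack_8_bytes(const uint8_t src[8], uint8_t out[32]) {
    for (int i = 0; i < 32; i++) {
        out[i] = field_shift(src[i / 4], i % 4);
    }
}
```

For example, the byte `0xE4` (binary `11 10 01 00`) unpacks to the codes 0, 1, 2, 3, with field 0 taken from the lowest two bits.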
Comment on lines +601 to +605
    #else
        for (int i = 0; i < nb; i++) {
            const float d0 = GGML_CPU_FP16_TO_FP32(x[i].d);

            float sumi = 0.0f;
0f07ba4 to a69cff5
3f213d3 to 19d565f
Co-authored-by: bri-prism <288398250+bri-prism@users.noreply.github.com>
a69cff5 to dc7c932
19d565f to 1ab5667
Draft PR for testing and initial review.