feat: add CompactVec collection with auto-spill support#4
feat: add CompactVec collection with auto-spill support#4
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds a new CompactVec<T, N> collection type that optimizes memory usage by storing up to N elements inline before automatically spilling to heap-allocated storage. This complements the existing string-focused types (CowStr and InlineStr) by providing a similar optimization pattern for generic vector collections.
Key changes:
- Implements
CompactVec<T, N>with automatic inline-to-heap transition - Provides comprehensive trait implementations (Clone, Debug, Index, Iterator, etc.)
- Includes Serde support when the feature is enabled
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| src/lib.rs | Adds the new compact_vec module and re-exports its types |
| src/compact_vec.rs | Complete implementation of CompactVec with 739 lines including documentation, core logic, trait implementations, and comprehensive tests |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| impl<T: fmt::Debug, const N: usize> fmt::Debug for CompactVec<T, N> { | ||
| fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { | ||
| write!(f, "CompactVec<{N}> {s:?}", s = self.as_slice()) |
There was a problem hiding this comment.
The Debug format string uses a named parameter s but also includes the const generic N directly in the format string using the inline constant syntax. While this works, it would be more consistent to format the capacity value explicitly, especially since it's available via self.capacity(). Consider using a more standard Debug format that shows whether the data is inline or heap-allocated, which would be more informative for debugging.
| write!(f, "CompactVec<{N}> {s:?}", s = self.as_slice()) | |
| let storage = if self.heap.is_some() { "heap" } else { "inline" }; | |
| f.debug_struct("CompactVec") | |
| .field("len", &self.len) | |
| .field("capacity", &self.capacity()) | |
| .field("storage", &storage) | |
| .field("elements", &self.as_slice()) | |
| .finish() |
| fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error> | ||
| where | ||
| A: SeqAccess<'de>, | ||
| { | ||
| let mut vec = CompactVec::new(); | ||
| while let Some(value) = seq.next_element::<T>()? { | ||
| vec.push(value); | ||
| } | ||
| Ok(vec) | ||
| } |
There was a problem hiding this comment.
The Deserialize implementation doesn't attempt to use the size hint from the deserializer to pre-allocate capacity. When deserializing sequences with known lengths (like JSON arrays), this could be used to avoid multiple reallocations. Consider using seq.size_hint() to create the CompactVec with appropriate initial capacity via with_capacity().
| Some(ref mut heap) => { | ||
| // Dropping the Vec will drop its contents automatically. | ||
| heap.clear(); |
There was a problem hiding this comment.
The explicit call to heap.clear() is redundant. When the Vec is dropped (which happens automatically at the end of the function), it will drop all its contents. This extra clear() call adds unnecessary overhead without any benefit.
| Some(ref mut heap) => { | |
| // Dropping the Vec will drop its contents automatically. | |
| heap.clear(); | |
| Some(_) => { | |
| // Dropping the Vec will drop its contents automatically; no need to clear. |
| Some(heap) => heap, | ||
| None => { | ||
| let mut vec = Vec::with_capacity(self.len); | ||
| for i in 0..self.len { | ||
| unsafe { | ||
| vec.push(self.inline[i].assume_init_read()); | ||
| } | ||
| } |
There was a problem hiding this comment.
After taking the heap Vec or reading inline elements, the Drop implementation will still run on self. Since heap is taken via take() (leaving it as None), the Drop impl will attempt to drop inline elements based on self.len. However, those elements have already been moved via assume_init_read(). This could lead to double-free or use-after-free. Consider using mem::forget(self) after moving the elements to prevent the Drop implementation from running, or set self.len = 0 before returning.
| Some(heap) => heap, | |
| None => { | |
| let mut vec = Vec::with_capacity(self.len); | |
| for i in 0..self.len { | |
| unsafe { | |
| vec.push(self.inline[i].assume_init_read()); | |
| } | |
| } | |
| Some(heap) => { | |
| // All elements live in `heap`. After taking it, ensure `Drop` | |
| // sees an empty inline vector. | |
| self.len = 0; | |
| heap | |
| } | |
| None => { | |
| // All elements currently live inline; move them into a Vec. | |
| let len = self.len; | |
| let mut vec = Vec::with_capacity(len); | |
| for i in 0..len { | |
| unsafe { | |
| vec.push(self.inline[i].assume_init_read()); | |
| } | |
| } | |
| // We've moved out all initialized inline elements; prevent | |
| // `Drop` from attempting to drop them again. | |
| self.len = 0; |
| impl<T, I: Into<usize>, const N: usize> Index<I> for CompactVec<T, N> { | ||
| type Output = T; | ||
| fn index(&self, index: I) -> &Self::Output { | ||
| let index = index.into(); | ||
| match self.heap { | ||
| Some(ref heap) => &heap[index], | ||
| None => { | ||
| assert!(index < self.len, "index out of bounds"); | ||
| // SAFETY: index < len ensures the element is initialized. | ||
| unsafe { &*self.inline[index].as_ptr() } | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The Index and IndexMut trait implementations accept I: Into<usize> as a generic bound, which is broader than the standard library's approach. This allows types like u8, u16, u32, etc. to be used as indices. However, this is inconsistent with how standard Rust collections work (which implement Index for specific types like usize, Range<usize>, etc.). This could lead to confusion and unexpected behavior. Consider following the standard library pattern by implementing Index for specific index types.
| pub fn extend<I: IntoIterator<Item = T>>(&mut self, iter: I) { | ||
| for item in iter { | ||
| self.push(item); | ||
| } | ||
| } |
There was a problem hiding this comment.
The extend method pushes items one at a time without pre-allocating space when the iterator provides a size hint. This can lead to multiple reallocations when extending with a large iterator. Consider checking if the iterator implements ExactSizeIterator or provides a lower bound via size_hint(), and pre-allocating capacity accordingly to improve performance.
| pub const fn len(&self) -> usize { | ||
| match &self.heap { | ||
| Some(heap) => heap.len(), | ||
| None => self.len, | ||
| } | ||
| } |
There was a problem hiding this comment.
The len() method is marked as const, but it calls heap.len() which requires the Vec's len() method to be const. While this works in recent Rust versions (1.39+), the as_slice() and as_mut_slice() methods are also marked const and use more complex unsafe operations. Verify that all const operations here are actually valid in const contexts, as const support for these patterns may be limited in older Rust versions that this crate might target.
| pub fn with_capacity(capacity: usize) -> Self { | ||
| if capacity <= N { | ||
| Self::new() | ||
| } else { | ||
| Self { | ||
| // SAFETY: uninitialized array is valid for inline storage. | ||
| inline: unsafe { | ||
| MaybeUninit::<[MaybeUninit<T>; N]>::uninit().assume_init() | ||
| }, | ||
| len: 0, | ||
| heap: Some(Vec::with_capacity(capacity)), | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Missing test coverage for the with_capacity method when creating a heap-backed instance. While line 620-622 tests that with_capacity(12) creates a heap-backed instance, there's no test that verifies pushing elements into such a pre-allocated heap-backed vector works correctly, or that the capacity is actually used (no intermediate reallocations occur).
| // previous capacity for amortized growth. | ||
| let mut vec = Vec::with_capacity(N * 2 + 1); |
There was a problem hiding this comment.
The spill capacity calculation uses N * 2 + 1 which is reasonable for most cases, but for N = 0, this results in a capacity of 1, which means immediate reallocation on the next push. While this works, consider using a more standard growth strategy like max(N * 2, 4) or max(N * 2, 8) to avoid frequent reallocations in the edge case where N is very small.
| // previous capacity for amortized growth. | |
| let mut vec = Vec::with_capacity(N * 2 + 1); | |
| // previous capacity for amortized growth. Use a small | |
| // minimum to avoid tiny allocations when N is very small. | |
| let spill_capacity = if N == 0 { 4 } else { N * 2 }; | |
| let mut vec = Vec::with_capacity(spill_capacity); |
| impl<T, const N: usize> CompactVec<T, N> { | ||
| /// Creates a new empty `CompactVec` with the specified inline | ||
| /// capacity. No heap allocation occurs until more than `N` elements | ||
| /// are pushed. | ||
| pub fn new() -> Self { | ||
| // SAFETY: An uninitialized array of `MaybeUninit<T>` is valid. | ||
| let inline = | ||
| unsafe { MaybeUninit::<[MaybeUninit<T>; N]>::uninit().assume_init() }; | ||
| Self { | ||
| inline, | ||
| len: 0, | ||
| heap: None, | ||
| } | ||
| } | ||
|
|
||
| /// Creates a new `CompactVec` with enough capacity to hold at least | ||
| /// `capacity` elements without reallocation. If `capacity` is less | ||
| /// than or equal to `N`, no heap allocation will occur. Otherwise, | ||
| /// a `Vec` of the requested capacity will be allocated. | ||
| pub fn with_capacity(capacity: usize) -> Self { | ||
| if capacity <= N { | ||
| Self::new() | ||
| } else { | ||
| Self { | ||
| // SAFETY: uninitialized array is valid for inline storage. | ||
| inline: unsafe { | ||
| MaybeUninit::<[MaybeUninit<T>; N]>::uninit().assume_init() | ||
| }, | ||
| len: 0, | ||
| heap: Some(Vec::with_capacity(capacity)), | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /// Returns the number of elements in the vector. | ||
| pub const fn len(&self) -> usize { | ||
| match &self.heap { | ||
| Some(heap) => heap.len(), | ||
| None => self.len, | ||
| } | ||
| } | ||
|
|
||
| /// Returns `true` if the vector contains no elements. | ||
| pub const fn is_empty(&self) -> bool { | ||
| self.len() == 0 | ||
| } | ||
|
|
||
| /// Returns the total capacity of the vector. When stored inline, | ||
| /// this equals `N`; when stored on the heap, this delegates to the | ||
| /// internal `Vec`'s capacity. | ||
| pub fn capacity(&self) -> usize { | ||
| match &self.heap { | ||
| Some(heap) => heap.capacity(), | ||
| None => N, | ||
| } | ||
| } | ||
|
|
||
| /// Returns `true` if the data is currently stored inline (i.e., | ||
| /// `len() <= N` and `heap` is `None`). | ||
| pub fn is_inline(&self) -> bool { | ||
| self.heap.is_none() | ||
| } | ||
|
|
||
| /// Provides an immutable slice of all elements in the vector. | ||
| pub const fn as_slice(&self) -> &[T] { | ||
| match &self.heap { | ||
| Some(heap) => heap.as_slice(), | ||
| None => { | ||
| // SAFETY: The first `len` elements of `inline` are | ||
| // initialized. We create a slice of that many elements. | ||
| unsafe { | ||
| core::slice::from_raw_parts( | ||
| self.inline.as_ptr() as *const T, | ||
| self.len, | ||
| ) | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /// Provides a mutable slice of all elements in the vector. | ||
| pub const fn as_mut_slice(&mut self) -> &mut [T] { | ||
| match &mut self.heap { | ||
| Some(heap) => heap.as_mut_slice(), | ||
| None => { | ||
| // SAFETY: The first `len` elements of `inline` are | ||
| // initialized. We create a mutable slice of that many | ||
| // elements. No aliasing occurs because either `heap` is | ||
| // `None` (so no other references exist) or we go into the | ||
| // `Some` branch above. | ||
| unsafe { | ||
| core::slice::from_raw_parts_mut( | ||
| self.inline.as_mut_ptr() as *mut T, | ||
| self.len, | ||
| ) | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /// Pushes a value onto the end of the vector. If the inline | ||
| /// storage is full, all existing elements are moved into a new | ||
| /// `Vec` and subsequent pushes are delegated to that vector. | ||
| pub fn push(&mut self, value: T) { | ||
| match self.heap { | ||
| Some(ref mut heap) => { | ||
| heap.push(value); | ||
| } | ||
| None => { | ||
| if self.len < N { | ||
| // SAFETY: We have capacity in `inline` at index `len`. | ||
| unsafe { | ||
| self.inline[self.len].as_mut_ptr().write(value); | ||
| } | ||
| self.len += 1; | ||
| } else { | ||
| // Spill to heap: allocate a new Vec with double the | ||
| // previous capacity for amortized growth. | ||
| let mut vec = Vec::with_capacity(N * 2 + 1); | ||
| // Move the existing inline elements into the Vec. | ||
| for i in 0..self.len { | ||
| // SAFETY: `i < len` so inline[i] is initialized. | ||
| unsafe { | ||
| vec.push(self.inline[i].assume_init_read()); | ||
| } | ||
| } | ||
| vec.push(value); | ||
| self.heap = Some(vec); | ||
| // We no longer use the inline storage, so reset len. | ||
| self.len = 0; | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /// Removes the last element from the vector and returns it, or | ||
| /// `None` if it is empty. If popping from a heap‑backed vector | ||
| /// results in a length that can be stored inline, the data is | ||
| /// automatically moved back into the inline storage to free the | ||
| /// heap allocation. | ||
| pub fn pop(&mut self) -> Option<T> { | ||
| match self.heap { | ||
| Some(ref mut heap) => { | ||
| let value = heap.pop(); | ||
| if let Some(v) = value { | ||
| // If the remaining length fits into inline storage, move | ||
| // back onto the stack. | ||
| if heap.len() <= N { | ||
| let mut new_len = 0; | ||
| for elem in heap.drain(..) { | ||
| // SAFETY: We have ensured that `heap.len()` | ||
| // is less than or equal to `N`, so there is | ||
| // enough space in `inline` to store all | ||
| // remaining elements. | ||
| unsafe { | ||
| self.inline[new_len].as_mut_ptr().write(elem); | ||
| } | ||
| new_len += 1; | ||
| } | ||
| self.heap = None; | ||
| self.len = new_len; | ||
| } | ||
| Some(v) | ||
| } else { | ||
| None | ||
| } | ||
| } | ||
| None => { | ||
| if self.len == 0 { | ||
| None | ||
| } else { | ||
| self.len -= 1; | ||
| // SAFETY: `len` has been decremented, so the element | ||
| // at index `len` is initialized and can be read. After | ||
| // reading, we leave the memory uninitialized. | ||
| Some(unsafe { self.inline[self.len].assume_init_read() }) | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /// Clears the vector, removing all values. This resets the vector | ||
| /// back to an empty inline state, deallocating any heap storage. | ||
| pub fn clear(&mut self) { | ||
| match self.heap { | ||
| Some(ref mut heap) => { | ||
| heap.clear(); | ||
| self.heap = None; | ||
| self.len = 0; | ||
| } | ||
| None => { | ||
| // Drop all inline elements | ||
| for i in 0..self.len { | ||
| unsafe { | ||
| self.inline[i].assume_init_drop(); | ||
| } | ||
| } | ||
| self.len = 0; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /// Returns an iterator over the slice. | ||
| pub fn iter(&self) -> core::slice::Iter<'_, T> { | ||
| self.as_slice().iter() | ||
| } | ||
|
|
||
| /// Returns a mutable iterator over the slice. | ||
| pub fn iter_mut(&mut self) -> core::slice::IterMut<'_, T> { | ||
| self.as_mut_slice().iter_mut() | ||
| } | ||
|
|
||
| /// Extends the vector with the contents of an iterator. Items are | ||
| /// pushed individually, potentially causing a spill from inline to | ||
| /// heap if the total number of elements exceeds the inline capacity. | ||
| pub fn extend<I: IntoIterator<Item = T>>(&mut self, iter: I) { | ||
| for item in iter { | ||
| self.push(item); | ||
| } | ||
| } | ||
|
|
||
| /// Consumes the `CompactVec` and returns a standard `Vec<T>` with | ||
| /// identical contents. This performs at most one allocation and | ||
| /// moves all elements out of the inline storage if necessary. | ||
| pub fn into_vec(mut self) -> Vec<T> { | ||
| match self.heap.take() { | ||
| Some(heap) => heap, | ||
| None => { | ||
| let mut vec = Vec::with_capacity(self.len); | ||
| for i in 0..self.len { | ||
| unsafe { | ||
| vec.push(self.inline[i].assume_init_read()); | ||
| } | ||
| } | ||
| vec | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The CompactVec API is missing common Vec methods like reserve(), reserve_exact(), shrink_to_fit(), truncate(), insert(), remove(), etc. While the current API provides basic functionality, users might expect these standard collection methods to be available. Consider documenting which Vec methods are intentionally omitted and why, or implementing them for API completeness.
No description provided.