
Report correct L2 cache size on ARM (Neoverse V1/V2)#372

Merged
malfet merged 4 commits into pytorch:main from Rohanjames1997:wrong_l2
Mar 12, 2026

Conversation

@Rohanjames1997 (Contributor)

Fixes #369

This is my first attempt at fixing the L2 size reporting issue. Comments and feedback are welcome.

I tested on Arm Neoverse V1 and V2 EC2 instances. I reused the reproducer in #369

#include <cstddef>
#include <cstdio>
#include <cpuinfo.h>
size_t l2_bytes() {
  if (!cpuinfo_initialize()) return 0;
  const cpuinfo_processor* p = cpuinfo_get_current_processor();
  if (!p || !p->cache.l2) return 0;
  return p->cache.l2->size; // bytes
}
int main() { std::printf("%zu\n", l2_bytes()); }

This now returns the expected results on Arm Neoverse V1 and V2.


Additionally, here is the output of ./cache-info.

1. On Neoverse-V1:
Max cache size (upper bound): 4194304 bytes
L1 instruction cache: 64 x 64 KB, 4-way set associative (256 sets), 64 byte lines, shared by 1 processors
L1 data cache: 64 x 64 KB, 4-way set associative (256 sets), 64 byte lines, shared by 1 processors
L2 data cache: 64 x 1 MB (inclusive), 8-way set associative (2048 sets), 64 byte lines, shared by 1 processors
2. On Neoverse-V2:
Max cache size (upper bound): 4194304 bytes
L1 instruction cache: 64 x 64 KB, 4-way set associative (256 sets), 64 byte lines, shared by 1 processors
L1 data cache: 64 x 64 KB, 4-way set associative (256 sets), 64 byte lines, shared by 1 processors
L2 data cache: 64 x 2 MB (inclusive), 8-way set associative (4096 sets), 64 byte lines, shared by 1 processors

@Radu2k left a comment

This looks great, thanks for addressing the issue! The only suggestion I would have is to add a check that the level matches the index used, or to check all indexes and match by level. This slipped through my fingers when I wrote the implementation in vLLM and could lead to wrong cache values.

Here is the output of tree -L 2 at /sys/devices/system/cpu/cpu0/cache; if you cat the level file, you should see 2 at index2:

.
├── index0
│   ├── allocation_policy
│   ├── coherency_line_size
│   ├── level
│   ├── number_of_sets
│   ├── shared_cpu_list
│   ├── shared_cpu_map
│   ├── size
│   ├── type
│   ├── uevent
│   ├── ways_of_associativity
│   └── write_policy
├── index1
│   ├── allocation_policy
│   ├── coherency_line_size
│   ├── level
│   ├── number_of_sets
│   ├── shared_cpu_list
│   ├── shared_cpu_map
│   ├── size
│   ├── type
│   ├── uevent
│   ├── ways_of_associativity
│   └── write_policy
├── index2
│   ├── allocation_policy
│   ├── coherency_line_size
│   ├── level
│   ├── number_of_sets
│   ├── shared_cpu_list
│   ├── shared_cpu_map
│   ├── size
│   ├── type
│   ├── uevent
│   ├── ways_of_associativity
│   └── write_policy
├── index3
│   ├── allocation_policy
│   ├── coherency_line_size
│   ├── level
│   ├── number_of_sets
│   ├── shared_cpu_list
│   ├── shared_cpu_map
│   ├── size
│   ├── type
│   ├── uevent
│   ├── ways_of_associativity
│   └── write_policy
└── uevent
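The level check suggested above could look something like this minimal sketch. The helper name and its use are hypothetical, not the actual PR code; it just illustrates reading the integer in a sysfs-style level file and returning 0 on any failure so the caller can fall back to defaults.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the integer stored in a sysfs-style "level"
 * file. Returns 0 on any failure (missing file, empty file) so the
 * caller can fall back to the hardcoded defaults. */
static unsigned read_cache_level(const char* path) {
	FILE* f = fopen(path, "r");
	if (f == NULL) {
		return 0;
	}
	char buf[16];
	unsigned level = 0;
	if (fgets(buf, sizeof(buf), f) != NULL) {
		level = (unsigned) strtoul(buf, NULL, 10);
	}
	fclose(f);
	return level;
}
```

A caller would then verify that, for example, read_cache_level("/sys/devices/system/cpu/cpu0/cache/index2/level") returns 2 before trusting the size reported under index2.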

@Rohanjames1997 (Contributor, Author)

Thanks @Radu2k! That's a good catch.

I can error out with 0 and fall back to the previous hardcoded values if the level file under index2 does not read 2. Will push that commit now.


checking indexes and matching by level

I tried this as well, although it can get a little involved, with a few more potential file opens. If you and the other reviewers deem it necessary, I can push that fix next.

cc: @malfet

@Rohanjames1997 Rohanjames1997 requested a review from Radu2k February 5, 2026 17:59
@Radu2k left a comment

Approved, LGTM 👍

@Rohanjames1997 (Contributor, Author)

Will fix the linter soon. Anything else of note? @malfet

@malfet (Contributor) commented Feb 9, 2026

@Rohanjames1997 I wonder why reading from sysfs is an ARM-specific thing rather than a generic Linux one?
And if those entries are not available, what would it return now (0, it seems) compared to the previous behavior?

I know that many sysfs entries are not available from AWS Lambdas for security reasons.

@malfet left a comment

Please:

  • Fix lint
  • Use existing APIs/abstractions
  • Add some links to the docs explaining why sysfs should be used, which kernel versions added support for this entry, and whether it's arch specific or generic


/* Verify the index actually corresponds to the requested cache level */
snprintf(path, sizeof(path), "/sys/devices/system/cpu/cpu%u/cache/index%u/level", cpu_id, cache_level);
FILE* file = fopen(path, "r");
Contributor

Can you explain why cpuinfo_linux_parse_small_file should not be used there?

Contributor (Author)

Thanks! I had no idea about that function 🙂

@Rohanjames1997 (Contributor, Author) commented Feb 9, 2026

@malfet Thanks much for the review! As you can tell this is my first contrib to the repo.

It looks like x86 CPUs have a CPUID instruction that directly returns cache properties (src/x86/cache/deterministic.c:14-95). ARM has no equivalent, so hardcoded values were used before.
If sysfs is unavailable, the function returns 0 and the code falls back to the existing hardcoded values: the same behavior as before.

Does that sound reasonable?

I will fix lint, use existing APIs & update docs next.
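The fallback behavior described above amounts to something like the following sketch. All names here are hypothetical (the stub stands in for the sysfs query and simply pretends sysfs could not be read); only the shape of the fallback matches what the comment describes.

```c
#include <stdint.h>

/* Hypothetical hardcoded default, standing in for the values the
 * library used before this PR. */
#define HARDCODED_L2_SIZE (1024u * 1024u)

/* Hypothetical stand-in for the sysfs query; returns 0 when sysfs is
 * unavailable (e.g. in a restricted AWS Lambda environment). */
static uint32_t query_l2_size_from_sysfs(void) {
	return 0; /* pretend the sysfs entries could not be read */
}

/* When the sysfs query fails, keep the pre-PR behavior by falling
 * back to the hardcoded value. */
static uint32_t effective_l2_size(void) {
	uint32_t size = query_l2_size_from_sysfs();
	if (size == 0) {
		size = HARDCODED_L2_SIZE;
	}
	return size;
}
```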

Comment on lines +39 to +47
uint32_t value = 0;
const char* p = text_start;
while (p < text_end && *p >= '0' && *p <= '9') {
    value = value * 10 + (*p - '0');
    p++;
}
if (p == text_start || value == 0) {
    return false;
}
Contributor

A) This code looks identical to the code on lines 25-29; why not move it to a shared function?
B) Doesn't this function already exist in the standard library, as strtoull?

Contributor (Author)

Good catch. I've extracted it into a helper function and used strtoul.

Contributor (Author)

Hmm, looking at it more closely, I see that similar parsing logic is present in processors.c.

One probable reason could be that cpuinfo_linux_parse_small_file passes a non-null-terminated buffer (text_start to text_end), while strtoul requires a null-terminated string and would read past the buffer.

What do you think @malfet ?
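One way to reconcile strtoul with a non-null-terminated slice, sketched here under the assumption stated above (the helper name is hypothetical): copy the [text_start, text_end) range into a small null-terminated stack buffer before parsing.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: apply strtoul to a non-null-terminated
 * [text_start, text_end) slice by first copying it into a bounded,
 * null-terminated stack buffer. Returns 0 on any failure (empty
 * slice, slice too long, or no digits consumed). */
static unsigned long parse_bounded_uint(const char* text_start,
                                        const char* text_end) {
	char buf[32];
	size_t len = (size_t) (text_end - text_start);
	if (len == 0 || len >= sizeof(buf)) {
		return 0;
	}
	memcpy(buf, text_start, len);
	buf[len] = '\0';
	char* end = NULL;
	unsigned long value = strtoul(buf, &end, 10);
	return (end == buf) ? 0 : value;
}
```

This keeps strtoul from reading past the buffer at the cost of one bounded copy per parse.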

Contributor

Do you know if there is a document somewhere explaining how to properly query cache sizes? I.e., couldn't one use cpuid to uniquely identify whether the L2 cache is shared or not?

I.e., shouldn't one be able to detect the cache configuration during

switch (midr & (CPUINFO_ARM_MIDR_IMPLEMENTER_MASK | CPUINFO_ARM_MIDR_PART_MASK)) {

or somewhere inside chipset.c?

@fbarchard (Collaborator)

Could you include N1 and N2 as well?

@Rohanjames1997 (Contributor, Author)

@fbarchard I unfortunately don't have access to N1 and N2 machines to test my changes.

@zhili03

zhili03 commented Mar 11, 2026

Hi @Rohanjames1997, nice work, and thanks for fixing the bug! We have run the test on Neoverse N1 (AWS Graviton 2), and the results below are as expected:

Max cache size (upper bound): 4194304 bytes
L1 instruction cache: 32 x 64 KB, 4-way set associative (256 sets), 64 byte lines, shared by 1 processors
L1 data cache: 32 x 64 KB, 4-way set associative (256 sets), 64 byte lines, shared by 1 processors
L2 data cache: 32 x 1 MB (inclusive), 8-way set associative (2048 sets), 64 byte lines, shared by 1 processors

Besides, the l2c_test.cpp output is 1048576, which is consistent with the 1 MB L2 cache on Neoverse N1. Will look into Neoverse N2 soon if possible.

cc: @Radu2k @malfet @fbarchard

@zhili03

zhili03 commented Mar 11, 2026


Managed to perform the test on Neoverse N2 (Azure D4ps v6); the results are below:

Max cache size (upper bound): 4194304 bytes
L1 instruction cache: 4 x 64 KB, 4-way set associative (256 sets), 64 byte lines, shared by 1 processors
L1 data cache: 4 x 64 KB, 4-way set associative (256 sets), 64 byte lines, shared by 1 processors
L2 data cache: 4 x 1 MB (inclusive), 8-way set associative (2048 sets), 64 byte lines, shared by 1 processors

Again, the l2c_test.cpp output is 1048576. Therefore, the PR produces correct output for Neoverse N2 as well.

@malfet (Contributor) commented Mar 11, 2026

[EDIT] I finally understand why accessing sysfs is the way to go: doing something like https://github.com/torvalds/linux/blob/master/arch/arm64/kernel/cacheinfo.c is not possible from userland. Moreover, parsing ACPI should also be done at the kernel level.

@malfet malfet merged commit 7607ca5 into pytorch:main Mar 12, 2026
12 checks passed
@Rohanjames1997 (Contributor, Author)

Thanks @malfet ! 🚀


Successfully merging this pull request may close these issues:

cpuinfo reports incorrect L2 cache size on ARM (Neoverse V1/V2)