Add Ability to Activate per Sample Timestamps#496
Conversation
| unwind_state->sample.pid = per_process_id; | ||
| unwind_state->sample.tid = per_thread_id; | ||
| unwind_state->sample.collected_at = bpf_ktime_get_boot_ns(); | ||
| unwind_state->sample.collected_at = bpf_ktime_get_ns(); |
There was a problem hiding this comment.
Why was this helper changed? We would like to account for the time the computer has been sleeping, no?
There was a problem hiding this comment.
Because that helper wasn't available for at RHEL 8.10's kernel, whereas the one I'm using is available universally.
I was surprised at this.
And yes, sleep time isn't accounted for, but the intervals are so small that it's not (currently) an issue.
Would you prefer I add a config flag (like use_boot_ns, for instance) similar to other flags you have in lightswitch_config already so that kernels that have it could use bpf_ktime_get_boot_ns(), and older kernels like RHEL 8.10 can fall back to bpf_ktime_get_ns()?
There was a problem hiding this comment.
It wasn't that hard to auto-detect whether the capability exists, just as you've done for other things, so I did that.
There was a problem hiding this comment.
BUT... after making that change, something interesting happened - 2 tests started to get verifier failures, but the binary worked just fine.
I explain what I had to do to get rid of that in the commit message for 6d0be10.
What's different now is that the flags default to false, but the check against system info will set them to the correct detected values - apparently the binary already did this, but the tests did not. Fine for the one flag that's apparently available on every kernel we've ever used it with, but the flag I just added isn't always available.
Setting any of these flags to true by default gives the verifier the chance to check for the non-present features, which the verifier apparently doesn't like (and probably only on the older kernels like RHEL 8.10).
Convoluted!
There was a problem hiding this comment.
Beyond the fact that lightswitch itself runs without issues on this distro version, additional proof that RedHat 8.10 has support for bpf_ktime_get_boot_ns():
# cat /etc/os-release | grep 8.10
VERSION="8.10 (Ootpa)"
VERSION_ID="8.10"
PRETTY_NAME="Red Hat Enterprise Linux 8.10 (Ootpa)"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.10
REDHAT_SUPPORT_PRODUCT_VERSION="8.10"
# sudo cat /proc/kallsyms |& grep bpf_ktime_get_boot_ns
ffffffff9b461c60 T bpf_ktime_get_boot_ns
ffffffff9c237b80 R bpf_ktime_get_boot_ns_proto
As you mentioned, the tests hardcode certain settings that might not be available, that's the ultimate root cause of the issue while running the tests on older kernels, but this all seems unrelated to me to the bpf_ktime_get_boot helper. I think we should not have this feature detection since all the kernels we support should have it (RHEL 8 backported it, it seems).
There was a problem hiding this comment.
Yes, bpf_ktime_get_boot_ns has been there all the time, it was task_pt_regs_helper that was missing all along on RHEL 8.10, and I mentally switched them.
So bpf_ktime_get_boot_ns doesn't need a detector at present (personally, I'd have it anyway, just so everything is consistent), so I'll remove it.
The genesis of this issue seems to have been that these tests have been failing for quite some time on RHEL 8.10, but since we don't explicitly run the test suite as part of generating the package, this was simply never noticed.
The binary auto-detects whether the helper is available, so the problem never shows up there - I just assumed that if the tests died with a verifier error, the binary surely would, and never tried it.
| long, | ||
| help = "Attach a per-sample collected_at timestamp (ms) label to pprof output" | ||
| )] | ||
| pub(crate) enable_ts_per_sample: bool, |
There was a problem hiding this comment.
We should maybe emit a warning if the profile format is not pprof and this is enabled since other formats support high resultion timestamps already (see the firefox profiler collector)
There was a problem hiding this comment.
Fine - do you not expect to support those timestamp types as well, in possible future PRs?
There was a problem hiding this comment.
So the warning would be (currently only for the firefox profiler collector format, and any others you may add in future): "--enable-ts-per-sample is not necessary for output format X - such timestamps are provided by default"?
Or what would you prefer?
There was a problem hiding this comment.
Let me think about this, need to make sure we don't under/over design this feature
| ); | ||
| } | ||
|
|
||
| fn hash_of(sample: &RawSample) -> u64 { |
There was a problem hiding this comment.
Should this impement Hash/er directly?
There was a problem hiding this comment.
Suppose so. I'll see how involved (or easy) it is.
There was a problem hiding this comment.
So, would you prefer this as a trait implementation (which would only be for the tests), or to alternatively use a simpler BuildHasher variant?
I'm doing the trait first, but let me know if you want me to switch.
|
Tangentially related but I was thinking of adding support for OTel which supports "aggregated" profiles à la pprof and also timestamped samples |
Certainly doable, but one format per PR, eh? |
- add collected_at to AggregatedSample
- propagate collected_at in convert.rs and emit pprof label
- RawSample::hash — now includes (self.collected_at / 1_000_000) (ms bucket),
so identical stacks at different milliseconds become separate aggregated
entries
- AggregatedSample — gains pub collected_at: u64 field (milliseconds
since Unix epoch)
- RawAggregatedSample::process() — sets collected_at:
self.sample.collected_at / 1_000_000 (ns → ms)
- new hash/bucketing tests
- tests for same stack in different ms buckets stays separate; same stack
within one ms bucket merges
- symbolize_profile — propagates collected_at through symbolization
- to_pprof — emits a collected_at numeric pprof label
(value = ms, unit = milliseconds) on every sample
- test: collected_at label round-trips through protobuf encode/decode
- Replace with what is available
…r to true - The actual binary doesn't fail the way the tests do, because it actually does detection from system info query and resets the flags as needed, and the branch is pruned by the verifier - The reason the tests fail though is that use_task_pt_regs_helper defaults to true, which means the verifier ALWAYS sees a live branch calling bpf_task_pt_regs and bpf_get_current_task_btf. If either of the helpers we need to use (including the one added in this PR) isn't available on RHEL 8.10, the verifier rejects the program with EINVAL.
| let unit_str = &profile.string_table[lbl.num_unit as usize]; | ||
| assert_eq!(unit_str, "milliseconds"); | ||
|
|
||
| // Verify after encode/decode round-trip |
There was a problem hiding this comment.
This test is making sure that the (de)serialization works, we don't need to check for this, we can assume it works and that upstream is well tested
There was a problem hiding this comment.
Did you mean:
- // Verify in memory
- // Verify after encode/decode round-trip
Or both? Or basically the entire test?
| unwind_state->sample.collected_at = bpf_ktime_get_boot_ns(); | ||
| unwind_state->sample.collected_at = lightswitch_config.use_ktime_get_boot_ns | ||
| ? bpf_ktime_get_boot_ns() | ||
| : bpf_ktime_get_ns(); |
There was a problem hiding this comment.
Let's back out this change. This helper is always present
- It actually is always available in kernels of interest to us
Currently, lightswitch applies one timestamp per session (every 5 seconds, by default) across all samples collected during the session window. In the case of creating a pprof per session and shipping the result to lightswitch-backend, the timestamp is applied upon actual ingestion time, not collection time.
This PR has the following rationale:
--enable-ts-per-samplecollected_at. This will allow the backend to continue operating as-is until support is added for the new label at that levelA separate POC has already been performed, adding support for these timestamps in lightswitch-backend and the UI, just to confirm this approach is functional, and achieves the stated goals.