feat(provider): add External Metrics provider#1863
aryan9600
left a comment
thank you for opening this PR!
Note: if that's okay, we'll squash commits after a few rounds of review so we can fix the DCO
aryan9600
left a comment
lgtm! 🎖️
please squash this into 1-2 commits and sign them off, thanks
@aryan9600 Done!
CI is failing because of unformatted code - could you run
@aryan9600 Done
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1863      +/-   ##
==========================================
- Coverage   39.44%   30.19%   -9.25%
==========================================
  Files         287      288       +1
  Lines       22706    18532    -4174
==========================================
- Hits         8956     5596    -3360
+ Misses      12777    12201     -576
+ Partials      973      735     -238
The Datadog provider often hits API rate limits on larger deployments. The Datadog Cluster Agent can batch metric queries and expose them through an endpoint compatible with the Kubernetes External Metrics API. This implementation allows using this endpoint, as well as any other server implementing the Kubernetes External Metrics API, including the k8s API server itself.
Co-authored-by: Johan Lore <johan.lore@decathlon.com>
Co-authored-by: Maxime Véroone <maxime.veroone@decathlon.com>
Signed-off-by: Johan Lore <johan.lore@decathlon.com>
Signed-off-by: Maxime Véroone <maxime.veroone@decathlon.com>
Signed-off-by: Johan Lore <johan.lore@decathlon.com>
Proposed addition
The current Datadog metric provider relies on Datadog's Metric API.
However, this API has fairly low rate limits, and users with even a moderately sized infrastructure tend to hit them quickly as they scale their usage of Flagger or Datadog-based autoscaling (such as KEDA).
Datadog offers a more scalable alternative: its Cluster Agent batches requests in groups of 35 (see Cluster Agent Autoscaling Metrics). It then makes these metrics available within the cluster by exposing an endpoint that follows the Kubernetes External Metrics API.
Note
This endpoint is not documented by Datadog, as they expect users to register the agent with the control plane as the cluster's external metrics provider. The metrics are then available through the k8s API server, which removes the need to query the endpoint directly.
However, because the endpoint implements a Kubernetes API, its behavior is predictable and stable enough to be used directly.
We relied on the way KEDA implemented a similar feature during design and implementation. However, Flagger is not an autoscaling solution, so we are not going to mimic the metric proxy KEDA operates. We simply propose to query the external metrics server directly. By doing this, we also chose to make the provider generic and compatible with any external metrics server. The downside is that we cannot abstract away the way Datadog names its metrics, which isn't trivial.
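To make "querying the external metrics server directly" concrete, here is a minimal Go sketch of what such a query could look like. It is not the code from this PR: the helper names (`metricPath`, `parseQuantity`, `firstValue`) are hypothetical, and the simplified quantity parser only handles plain decimals and the milli (`m`) suffix, whereas real code would more likely use `resource.ParseQuantity` from `k8s.io/apimachinery`. The request path and the `items[].value` field follow the `external.metrics.k8s.io/v1beta1` API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// externalMetricValueList mirrors the subset of the
// external.metrics.k8s.io/v1beta1 ExternalMetricValueList used here.
type externalMetricValueList struct {
	Items []struct {
		MetricName string `json:"metricName"`
		Value      string `json:"value"` // Kubernetes quantity, e.g. "250m" or "3"
	} `json:"items"`
}

// metricPath (hypothetical helper) builds the request path for one
// namespaced external metric, with an optional label selector.
func metricPath(namespace, metric, labelSelector string) string {
	p := "/apis/external.metrics.k8s.io/v1beta1/namespaces/" +
		url.PathEscape(namespace) + "/" + url.PathEscape(metric)
	if labelSelector != "" {
		p += "?labelSelector=" + url.QueryEscape(labelSelector)
	}
	return p
}

// parseQuantity (hypothetical helper) is a deliberately simplified parser:
// it only understands plain decimals and the milli ("m") suffix.
func parseQuantity(q string) (float64, error) {
	if strings.HasSuffix(q, "m") {
		v, err := strconv.ParseFloat(strings.TrimSuffix(q, "m"), 64)
		return v / 1000, err
	}
	return strconv.ParseFloat(q, 64)
}

// firstValue decodes a response body and returns the first item's value.
func firstValue(body []byte) (float64, error) {
	var list externalMetricValueList
	if err := json.Unmarshal(body, &list); err != nil {
		return 0, err
	}
	if len(list.Items) == 0 {
		return 0, fmt.Errorf("no metric values returned")
	}
	return parseQuantity(list.Items[0].Value)
}

func main() {
	// The metric name and selector below are made up for illustration.
	fmt.Println(metricPath("default", "requests-per-second", "app=demo"))

	// A canned response body standing in for a real HTTP call.
	body := []byte(`{"items":[{"metricName":"requests-per-second","value":"250m"}]}`)
	v, err := firstValue(body)
	fmt.Println(v, err)
}
```

Because the path and payload are defined by the Kubernetes API rather than by Datadog, the same request shape should work against any server that implements the External Metrics API; only the metric name and label selector are provider-specific.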
fix: #1235
Any alternatives you've considered?
We considered modifying the Datadog metric provider instead of adding an external metrics provider. However, we felt that a separate provider had the benefit of supporting other external metrics servers while keeping the code Datadog-agnostic.
We could theoretically make it even more generic and support any Kubernetes metrics API (standard, Custom or External), but I think Flagger already offers this.
Disclaimer