[Draft] Design Proposal: Scheduler Management, Observability & Log Explorer UI#267
Conversation
Signed-off-by: aryunewaskar77-art <aaryaanewaskar@gmail.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Code Review
This pull request introduces a detailed proposal for adding a /scheduler section to the Volcano Dashboard, encompassing configuration management, metrics observability, and log exploration. The proposal outlines a 12-week implementation plan targeting the new-ui branch using tRPC and Server-Sent Events. The review feedback highlights several technical improvements for the proposed implementation, specifically regarding the robustness of the log streaming logic (handling UTF-8 and partial lines), aligning default values for log tailing, adding necessary input validation for pod names, and ensuring RBAC rules are flexible enough for non-standard installations.
|
|
||
| export const getRecentLogsSchema = z.object({ | ||
| podName: z.string(), | ||
| tailLines: z.number().min(1).max(10000).default(500), |
There was a problem hiding this comment.
A maximum of 10,000 lines for tailLines is quite high and could put significant pressure on the Kubernetes API server. Additionally, there is a discrepancy between the default value in the schema (500) and the Route Handler (100 on line 226). Consider lowering the maximum and aligning the defaults for consistency.
| Path: apps/web/src/app/api/scheduler/logs/stream/route.ts | ||
| export async function GET(req: Request) { | ||
| const { searchParams } = new URL(req.url); | ||
| const podName = searchParams.get("pod"); |
There was a problem hiding this comment.
| passThrough.on("data", (chunk) => { | ||
| for (const line of chunk.toString("utf8").split("\n")) { | ||
| if (line) controller.enqueue(`data: ${line}\n\n`); | ||
| } | ||
| }); |
There was a problem hiding this comment.
The current SSE implementation snippet enqueues partial lines if a log line is split across chunks. It also risks corrupting multi-byte UTF-8 characters. Data should be buffered until a newline is encountered, and StringDecoder should be used for safe decoding.
| passThrough.on("data", (chunk) => { | |
| for (const line of chunk.toString("utf8").split("\n")) { | |
| if (line) controller.enqueue(`data: ${line}\n\n`); | |
| } | |
| }); | |
| const { StringDecoder } = await import("node:string_decoder"); | |
| const decoder = new StringDecoder("utf8"); | |
| let buffer = ""; | |
| passThrough.on("data", (chunk) => { | |
| buffer += decoder.write(chunk); | |
| const lines = buffer.split("\n"); | |
| buffer = lines.pop() ?? ""; | |
| for (const line of lines) { | |
| controller.enqueue(`data: ${line}\n\n`); | |
| } | |
| }); |
| Add to ClusterRole rules: | ||
| - apiGroups: [""] | ||
| resources: ["configmaps"] | ||
| resourceNames: ["volcano-scheduler-configmap"] |
There was a problem hiding this comment.
Hardcoding volcano-scheduler-configmap in the RBAC rules assumes the default installation name. If the ConfigMap name is customized, the dashboard will lose access. It is recommended to document this limitation or use a more flexible approach (e.g., allowing access to all ConfigMaps in the namespace if scoped correctly).
This draft PR contains a design proposal and implementation discussion for issue #197.
Goals:
validate design assumptions
gather maintainer feedback
discuss incremental implementation strategy
Related:
Issue #197
Exploratory work: #262
Prerequisites Task: #254
No production implementation included.