Add pre/post lease hooks #21
Conversation
Update pyproject.toml files to adjust raw-options root paths from '../..' to '../../..' to account for monorepo subdirectory.
Update multiversion.sh to use correct paths with python/ prefix in worktree structure.
- Update Python container files for monorepo build paths
- Copy Kind cluster config with dex nodeport pre-configured
- Configure controller and e2e values with certificate placeholder
- Patch deploy_with_helm.sh to support EXTRA_VALUES for Helm overlay pattern
- Add unified GitHub Actions workflows with path filters
- Configure dependabot for all package ecosystems
- Remove old .github directories from subdirectories
- Install e2e test scripts (setup-e2e.sh, run-e2e.sh, tests.bats)
📝 Walkthrough

This PR introduces comprehensive exporter lifecycle management with status tracking, lifecycle hooks, and enhanced logging. It adds protobuf definitions for status streaming, implements hook execution for before/after-lease operations, and refactors session handling to support lease-scoped operations with real-time status updates and log source routing.
Sequence Diagram(s)

```mermaid
sequenceDiagram
actor Client
participant AsyncDriverClient
participant ExporterService
participant ControllerService
participant Exporter
participant HookExecutor
participant LeaseContext
Client->>AsyncDriverClient: call_async(method, args)
AsyncDriverClient->>AsyncDriverClient: check_exporter_status()
alt Status check fails
AsyncDriverClient-->>Client: ExporterNotReady
else Status OK
AsyncDriverClient->>ExporterService: DriverCall RPC
ExporterService-->>AsyncDriverClient: result
AsyncDriverClient-->>Client: result
end
Client->>AsyncDriverClient: wait_for_lease_ready_streaming(timeout)
AsyncDriverClient->>ControllerService: StreamStatus RPC
loop Status updates
ControllerService-->>AsyncDriverClient: StreamStatusResponse
alt Status == LEASE_READY
AsyncDriverClient-->>Client: ready
else Status == BEFORE_LEASE_HOOK
AsyncDriverClient->>AsyncDriverClient: continue waiting
else Timeout
AsyncDriverClient-->>Client: timeout error
end
end
Exporter->>ControllerService: RegisterRequest
ControllerService-->>Exporter: UUID registered
Exporter->>Exporter: serve()
Exporter->>ControllerService: StreamStatus (listen for lease events)
loop Lease lifecycle
ControllerService-->>Exporter: StatusResponse (LEASE_READY)
Exporter->>LeaseContext: initialize lease scope
Exporter->>HookExecutor: run_before_lease_hook()
HookExecutor->>HookExecutor: execute hook subprocess
HookExecutor-->>Exporter: hook complete
Exporter->>ControllerService: ReportStatus(LEASE_READY)
Exporter->>Exporter: handle_lease()
Exporter->>ControllerService: StreamStatus (listen for clients)
Exporter->>Exporter: process client connections
ControllerService-->>Exporter: EndSessionRequest
Exporter->>HookExecutor: run_after_lease_hook()
HookExecutor-->>Exporter: hook complete
Exporter->>ControllerService: RequestLeaseRelease()
end
Exporter->>ControllerService: UnregisterRequest
```

```mermaid
sequenceDiagram
actor User
participant Shell
participant SessionManager
participant HookExecutor
participant Driver
participant LeaseContext
User->>Shell: shell command with lease
Shell->>Shell: _run_shell_with_lease_async()
Shell->>SessionManager: create session context
SessionManager->>LeaseContext: initialize lease scope
Shell->>HookExecutor: run_before_lease_hook()
HookExecutor->>Driver: instantiate device
HookExecutor->>HookExecutor: execute hook subprocess (pty)
HookExecutor-->>Shell: stream logs in real-time
HookExecutor-->>Shell: before-lease complete
Shell->>Shell: wait_for_lease_ready()
Shell->>Shell: run shell command
User->>Shell: interact with shell
Shell->>Shell: log_stream_async() with source routing
Shell->>HookExecutor: run_after_lease_hook()
HookExecutor->>HookExecutor: execute hook subprocess
HookExecutor-->>Shell: stream logs in real-time
HookExecutor-->>Shell: after-lease complete
Shell->>SessionManager: end session
SessionManager->>LeaseContext: mark lease ended
Shell-->>User: exit shell
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
python/packages/jumpstarter/jumpstarter/client/lease.py (1)
249-279: Off-by-one in retry logic results in 4 retries instead of 5.

The check `if retries_left > 1` triggers the error when `retries_left == 1`, meaning only 4 retry attempts occur instead of the documented 5. The decrement happens after the check, so the sequence is 5→4→3→2→1 (error).

🔧 Proposed fix

```diff
-    retries_left = 5
+    max_retries = 5
+    retries_left = max_retries
     logger.info("Waiting for ready connection at %s", path)
     while True:
         try:
             with ExitStack() as stack:
                 async with client_from_path(path, self.portal, stack, allow=self.allow, unsafe=self.unsafe) as _:
                     # Connection established
                     break
         except AioRpcError as e:
-            if retries_left > 1:
-                retries_left -= 1
-            else:
+            retries_left -= 1
+            if retries_left <= 0:
                 logger.error("Max retries reached while waiting for ready connection at %s", path)
                 raise ConnectionError("Max retries reached while waiting for ready connection at %s" % path) from e
             if e.code().name == "UNAVAILABLE":
                 logger.warning("Still waiting for connection to be ready at %s", path)
             else:
                 logger.warning("Waiting for ready connection to %s: %s", path, e)
             await sleep(5)
```
🤖 Fix all issues with AI agents
In `@python/packages/jumpstarter-cli/jumpstarter_cli/get.py`:
- Around line 33-38: The reconstructed Exporter created when include_leases is
true is omitting the exporter.status field, so when creating exporter_with_lease
you must copy the original exporter's status into the new Exporter instance;
update the Exporter(...) call (the reconstruction code in client code that
builds exporter_with_lease) to pass status=exporter.status alongside namespace,
name, labels, online, and lease so the status column is preserved.
In `@python/packages/jumpstarter/jumpstarter/client/core.py`:
- Around line 125-174: The method wait_for_lease_ready_streaming currently logs
a warning when anyio.move_on_after times out and then returns silently; change
it to explicitly signal failure by raising a DriverError (e.g., raise
DriverError("Timeout waiting for exporter to become ready")) after the with
anyio.move_on_after(timeout): block instead of just logging, so callers of
wait_for_lease_ready_streaming (and any code expecting exceptions from
wait_for_lease_ready) will observe the timeout; keep the existing fallback for
AioRpcError.UNIMPLEMENTED to call wait_for_lease_ready(timeout).
In `@python/packages/jumpstarter/jumpstarter/client/grpc.py`:
- Line 90: The current parsing drops valid enum value 0 because of a truthy
check; remove the `and data.status` truthy guard and always map/assign
`data.status` to the `status: ExporterStatus | None` variable (e.g., `status =
ExporterStatus(data.status)` or equivalent mapping) instead of returning None
when it's 0; update the same pattern in the other occurrences referenced (around
the other checks at lines ~96-99) so the required proto enum value is preserved
even when it's 0 (EXPORTER_STATUS_UNSPECIFIED).
In `@python/packages/jumpstarter/jumpstarter/common/enums.py`:
- Around line 26-27: Fix the typo in the docstring for the enum constant
AFTER_LEASE_HOOK: change "releaseed" to "released" so the docstring reads "Lease
was released, but exporter is executing after lease hook" (update the string
literal associated with AFTER_LEASE_HOOK in the enums module).
In `@python/packages/jumpstarter/jumpstarter/exporter/hooks.py`:
- Around line 448-467: The before-lease hook currently never signals the
controller to release a lease on on_failure='endLease'; update
run_before_lease_hook to accept an optional request_lease_release callable
(mirror the signature used by run_after_lease_hook), wire that parameter through
the call site in exporter.py by passing self._request_lease_release, and inside
the HookExecutionError except branch for the endLease path invoke
request_lease_release() (if provided) before reporting status so the controller
actually releases the lease; ensure the new parameter is optional and
non-breaking if not supplied.
In `@python/packages/jumpstarter/jumpstarter/exporter/logging.py`:
- Around line 31-39: The get_source_for_record implementation may return a
shorter, generic mapped_logger match before a more specific one; update it to
pick the longest matching logger prefix: inside the existing with self._lock
block, iterate over self._child_handlers items and collect mapped_logger keys
where logger_name.startswith(mapped_logger), then select the mapped_logger with
the greatest length (max by len) and return its associated source; if no match,
return self.source. Keep references to get_source_for_record,
self._child_handlers, logger_name and self.source to locate and update the
logic.
- Around line 59-66: context_log_source currently unconditionally removes the
child handler on exit, losing any prior mapping; modify context_log_source to
capture the existing handler/mapping for logger_name before calling
add_child_handler (e.g., via whatever accessor or self._child_handlers lookup
you have), then in the finally block restore that previous mapping: if a
previous handler existed re-add it (using add_child_handler or direct restore),
otherwise remove the handler (remove_child_handler) as currently done; reference
the context_log_source, add_child_handler and remove_child_handler symbols when
making the change.
In `@python/packages/jumpstarter/jumpstarter/exporter/session.py`:
- Around line 46-49: The StreamStatus logic can miss updates because
_status_update_event is replaced after setting it; add a monotonic integer
version counter (e.g., _status_version) incremented whenever
_current_status/_status_message change and when creating/replacing
_status_update_event, then when yielding status in StreamStatus (or the
generator around lines ~197-257) capture the current version before awaiting the
event and only wait if the version hasn't advanced; after the wait re-check the
version in a loop to avoid missed updates. Update all places that set
_status_update_event/_current_status to increment _status_version so the
generator can reliably detect missed updates without blocking indefinitely.
🧹 Nitpick comments (11)
python/packages/jumpstarter-cli/jumpstarter_cli/shell.py (1)
142-145: Redundant timeout handling.

The `move_on_after(300)` scope and the `timeout=300.0` parameter both specify a 5-minute timeout. If `wait_for_hook_status_streaming` handles its own timeout internally, the outer `move_on_after` scope may never trigger. Consider using only one timeout mechanism for clarity.

♻️ Suggested simplification

```diff
-with anyio.move_on_after(300) as timeout_scope:  # 5 minute timeout
-    completed = await client.wait_for_hook_status_streaming(
-        ExporterStatus.AVAILABLE, timeout=300.0
-    )
-    if completed:
-        logger.info("afterLease hook completed")
-    else:
-        logger.debug("Hook completion not confirmed")
-if timeout_scope.cancelled_caught:
-    logger.warning("Timeout waiting for afterLease hook to complete")
+completed = await client.wait_for_hook_status_streaming(
+    ExporterStatus.AVAILABLE, timeout=300.0
+)
+if completed:
+    logger.info("afterLease hook completed")
+elif completed is None:
+    logger.warning("Timeout waiting for afterLease hook to complete")
+else:
+    logger.debug("Hook completion not confirmed")
```

python/packages/jumpstarter/jumpstarter/exporter/lease_context.py (2)
54-57: Consider using explicit validation instead of `assert`.

Using `assert` for runtime validation can be problematic since assertions are disabled when Python runs with optimization flags (`-O` or `-OO`). For dataclass invariants that must always hold, consider raising `ValueError` directly.

♻️ Suggested fix

```diff
 def __post_init__(self):
     """Validate that required resources are present."""
-    assert self.before_lease_hook is not None, "LeaseScope requires a before_lease_hook event"
-    assert self.lease_name, "LeaseScope requires a non-empty lease_name"
+    if self.before_lease_hook is None:
+        raise ValueError("LeaseContext requires a before_lease_hook event")
+    if not self.lease_name:
+        raise ValueError("LeaseContext requires a non-empty lease_name")
```
67-69: Method `is_active()` will always return `True` after construction.

Since `__post_init__` validates that `lease_name` is non-empty, `is_active()` will always return `True` for any valid `LeaseContext` instance. If this is intentional (e.g., for semantic clarity or future-proofing where `lease_name` might become mutable), consider documenting this behavior. Otherwise, the method may be misleading.
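If the intent is instead for `is_active()` to reflect live lease state, a minimal sketch of one alternative follows. It assumes `LeaseContext` keeps the `lease_ended` event referenced elsewhere in this PR, and the field names are illustrative; the object must be constructed inside a running event loop for the anyio `Event`:

```python
from dataclasses import dataclass, field

from anyio import Event


@dataclass
class LeaseContext:
    lease_name: str
    # Set when the lease ends (mirrors the lease_ended signal used elsewhere in this PR)
    lease_ended: Event = field(default_factory=Event)

    def is_active(self) -> bool:
        # True only while the lease exists and has not ended,
        # rather than restating the __post_init__ invariant
        return bool(self.lease_name) and not self.lease_ended.is_set()
```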
python/packages/jumpstarter/jumpstarter/config/exporter_test.py (1)

140-141: Consider adding a timeout assertion for completeness.

The test verifies script content but doesn't assert that the `timeout: 600` value is correctly parsed and preserved during round-trip.

💡 Suggested addition

```diff
 assert config.hooks.before_lease.script == 'echo "Pre-lease hook for $LEASE_NAME"\nj power on\n'
 assert config.hooks.after_lease.script == 'echo "Post-lease hook for $LEASE_NAME"\nj power off\n'
+assert config.hooks.before_lease.timeout == 600
+assert config.hooks.after_lease.timeout == 600
```

python/packages/jumpstarter-kubernetes/jumpstarter_kubernetes/exporters.py (1)
82-106: Consider simplifying the status fallback logic.

The current approach works but is slightly verbose. The status variable is set conditionally on line 82, then `or "Unknown"` is applied again on lines 92 and 102.

♻️ Optional simplification

```diff
 def rich_add_rows(self, table, devices: bool = False):
-    status = self.status.exporter_status if self.status else "Unknown"
+    status = (self.status.exporter_status if self.status else None) or "Unknown"
     if devices:
         if self.status is not None:
             for d in self.status.devices:
                 labels = []
                 if d.labels is not None:
                     for label in d.labels:
                         labels.append(f"{label}:{str(d.labels[label])}")
                 table.add_row(
                     self.metadata.name,
-                    status or "Unknown",
+                    status,
                     self.status.endpoint,
                     time_since(self.metadata.creation_timestamp),
                     ",".join(labels),
                     d.uuid,
                 )
     else:
         table.add_row(
             self.metadata.name,
-            status or "Unknown",
+            status,
             self.status.endpoint,
             str(len(self.status.devices) if self.status and self.status.devices else 0),
             time_since(self.metadata.creation_timestamp),
         )
```

python/packages/jumpstarter/jumpstarter/common/types.py (1)
16-18: Remove or complete the empty `TYPE_CHECKING` block.

The block contains only `pass`, but the comment suggests async stub type aliases should be defined here. Either remove this dead code or add the intended type definitions.

♻️ Proposed fix

If no additional type definitions are needed, remove the block:

```diff
-# Async stub type aliases are only available for type checking (defined in .pyi files)
-if TYPE_CHECKING:
-    pass
-
```

python/packages/jumpstarter/jumpstarter/exporter/hooks.py (1)
309-323: Avoid blocking `wait()` in the async timeout path.

After `process.kill()`, the direct `process.wait()` can block the event loop if the process stalls. Use the same thread offload pattern as earlier waits.

🛠️ Suggested fix

```diff
-    if process.poll() is None:
-        process.kill()
-        try:
-            process.wait()
-        except Exception:
-            pass
+    if process.poll() is None:
+        process.kill()
+        try:
+            await anyio.to_thread.run_sync(process.wait)
+        except Exception:
+            pass
```

python/packages/jumpstarter/jumpstarter/client/core.py (2)
190-206: Elapsed time calculation doesn't account for RPC duration.

The elapsed time increments by `poll_interval` after each sleep, but the time spent in `get_status_async()` is not accounted for. With long network latencies, the actual wait could exceed `timeout` significantly. This is a minor concern given the 5-minute default timeout.

♻️ More accurate timing using monotonic clock

```diff
+    import time
+
     poll_interval = 0.5  # seconds
-    elapsed = 0.0
+    start_time = time.monotonic()
     poll_count = 0
     self.logger.debug("Waiting for exporter to be ready...")
-    while elapsed < timeout:
+    while (time.monotonic() - start_time) < timeout:
         poll_count += 1
-        self.logger.debug("[POLL %d] Calling GetStatus (elapsed: %.1fs)...", poll_count, elapsed)
+        elapsed = time.monotonic() - start_time
+        self.logger.debug("[POLL %d] Calling GetStatus (elapsed: %.1fs)...", poll_count, elapsed)
         # ... rest of loop ...
-        self.logger.debug("[POLL %d] Sleeping for %.1fs before next poll...", poll_count, poll_interval)
         await anyio.sleep(poll_interval)
-        elapsed += poll_interval
```
368-395: Pre-call status check adds latency to every driver call.

The `check_exporter_status()` call before each `DriverCall` adds a `GetStatus` RPC round-trip to every driver operation. For high-frequency driver calls, this could be significant overhead. Consider caching the status with a short TTL, or only checking when the status might have changed.

However, if driver calls are typically infrequent or correctness is paramount, this conservative approach is reasonable.
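One possible shape for such a cache, as a minimal sketch only; the TTL value is arbitrary, and `get_status_async()` is assumed to be the existing RPC accessor rather than a confirmed signature:

```python
import time


class CachedStatusMixin:
    """Sketch: reuse the last GetStatus result for a short TTL to avoid one RPC per driver call."""

    _status_cache_ttl = 1.0  # seconds; assumed value, tune as needed
    _cached_status = None
    _cached_at = 0.0

    async def check_exporter_status(self):
        now = time.monotonic()
        if self._cached_status is not None and (now - self._cached_at) < self._status_cache_ttl:
            # Serve from cache, skipping the GetStatus round-trip
            return self._cached_status
        status = await self.get_status_async()  # assumed existing accessor
        self._cached_status = status
        self._cached_at = now
        return status
```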
python/packages/jumpstarter/jumpstarter/exporter/exporter.py (2)
315-317: Consider using lazy string formatting in logger calls.

Using f-strings in logger calls means the string formatting occurs even when the log level would filter out the message. Use `%`-style formatting for deferred evaluation.

♻️ Use lazy logging format

```diff
-    logger.info(f"Updated status to {status}: {message}")
+    logger.info("Updated status to %s: %s", status, message)
 except Exception as e:
-    logger.error(f"Failed to update status: {e}")
+    logger.error("Failed to update status: %s", e)
```
647-670: Duplicate lease-end handling logic.

The lease-end handling (lines 654-670) is nearly identical to the unleased-after-leased handling (lines 713-727). Both paths signal `lease_ended`, wait for `after_lease_hook_done`, and clear the lease context. Consider extracting a helper method.

♻️ Extract lease-end handling to a helper method

```python
async def _handle_lease_end(self) -> None:
    """Signal lease end and wait for afterLease hook to complete."""
    if not self._lease_context:
        return
    logger.info("Lease ended, signaling handle_lease to run afterLease hook")
    self._lease_context.lease_ended.set()
    # Wait for the hook to complete
    with CancelScope(shield=True):
        await self._lease_context.after_lease_hook_done.wait()
    logger.info("afterLease hook completed")
    # Clear lease scope for next lease
    self._lease_context = None
```

Then use it in both places:

```python
if lease_changed:
    await self._handle_lease_end()
    self.stop()
    break
```
```diff
 include_leases = "leases" in with_options
 include_online = "online" in with_options
-exporters = config.list_exporters(filter=selector, include_leases=include_leases, include_online=include_online)
+include_status = "status" in with_options
+exporters = config.list_exporters(
+    filter=selector, include_leases=include_leases, include_online=include_online, include_status=include_status
+)
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# First, locate the client.py file and examine the list_exporters function
find . -path "*jumpstarter/config/client.py" -type f
```

Repository: jumpstarter-dev/monorepo
Length of output: 126
🏁 Script executed:
```bash
#!/bin/bash
# Examine the list_exporters function implementation
cat -n "$(find . -path '*jumpstarter/config/client.py' -type f | head -1)"
```

Repository: jumpstarter-dev/monorepo
Length of output: 17683
🏁 Script executed:
```bash
#!/bin/bash
# Also check the get.py file to see the context
find . -path "*jumpstarter-cli/jumpstarter_cli/get.py" -type f
```

Repository: jumpstarter-dev/monorepo
Length of output: 124
🏁 Script executed:
```bash
#!/bin/bash
# Find the Exporter class definition
find . -path "*/jumpstarter/client/grpc*" -type f | head -20
```

Repository: jumpstarter-dev/monorepo
Length of output: 186
🏁 Script executed:
```bash
#!/bin/bash
# Also check the get.py file to see how include_status is used for display
cat -n "$(find . -path '*jumpstarter-cli/jumpstarter_cli/get.py' -type f | head -1)"
```

Repository: jumpstarter-dev/monorepo
Length of output: 2242
🏁 Script executed:
```bash
#!/bin/bash
# Examine the Exporter class definition in grpc.py
cat -n "$(find . -path '*jumpstarter/client/grpc.py' -type f | head -1)" | head -200
```

Repository: jumpstarter-dev/monorepo
Length of output: 8794
🏁 Script executed:
```bash
#!/bin/bash
# Search for how include_status is used in model_print or response objects
rg -n "include_status|include_leases|include_online" --type py -B2 -A2 python/packages/jumpstarter/
```

Repository: jumpstarter-dev/monorepo
Length of output: 5483
Status field is lost when reconstructing exporters with leases.
When `include_leases=True`, the code in python/packages/jumpstarter/jumpstarter/config/client.py (lines 187–193) reconstructs `Exporter` objects but omits the `status` field. The original exporter objects contain status data (extracted from protobuf), but the new `Exporter(...)` call only passes namespace, name, labels, online, and lease, leaving `status` as `None`. When users request `--with leases,status`, the response's `include_status` flag is set correctly, but individual exporter objects no longer have status data, so the status column displays "UNKNOWN" instead of actual status values.
Copy the status field when reconstructing the exporter with lease information:
```python
exporter_with_lease = Exporter(
    namespace=exporter.namespace,
    name=exporter.name,
    labels=exporter.labels,
    online=exporter.online,
    status=exporter.status,  # Add this line
    lease=lease,
)
```

🤖 Prompt for AI Agents
In `@python/packages/jumpstarter-cli/jumpstarter_cli/get.py` around lines 33 - 38,
The reconstructed Exporter created when include_leases is true is omitting the
exporter.status field, so when creating exporter_with_lease you must copy the
original exporter's status into the new Exporter instance; update the
Exporter(...) call (the reconstruction code in client code that builds
exporter_with_lease) to pass status=exporter.status alongside namespace, name,
labels, online, and lease so the status column is preserved.
```python
async def wait_for_lease_ready_streaming(self, timeout: float = 300.0) -> None:
    """Wait for exporter to report LEASE_READY status using streaming.

    Uses StreamStatus RPC for real-time status updates instead of polling.
    This is more efficient and provides immediate notification of status changes.

    Args:
        timeout: Maximum time to wait in seconds (default: 5 minutes)
    """
    import anyio

    self.logger.debug("Waiting for exporter to be ready (streaming)...")
    seen_before_lease_hook = False

    try:
        with anyio.move_on_after(timeout):
            async for response in self.stub.StreamStatus(jumpstarter_pb2.StreamStatusRequest()):
                status = ExporterStatus.from_proto(response.status)
                self.logger.debug("StreamStatus received: %s", status)

                if status == ExporterStatus.LEASE_READY:
                    self.logger.info("Exporter ready, starting shell...")
                    return
                elif status == ExporterStatus.BEFORE_LEASE_HOOK_FAILED:
                    self.logger.warning("beforeLease hook failed")
                    return
                elif status == ExporterStatus.AFTER_LEASE_HOOK:
                    # Lease ended before becoming ready
                    raise DriverError("Lease ended before becoming ready")
                elif status == ExporterStatus.BEFORE_LEASE_HOOK:
                    seen_before_lease_hook = True
                    self.logger.debug("beforeLease hook is running...")
                elif status == ExporterStatus.AVAILABLE:
                    if seen_before_lease_hook:
                        # Lease ended - AVAILABLE after BEFORE_LEASE_HOOK indicates lease released
                        raise DriverError("Lease ended before becoming ready")
                    else:
                        # Initial AVAILABLE state - waiting for lease assignment
                        self.logger.debug("Exporter status: AVAILABLE (waiting for lease assignment)")
                else:
                    self.logger.debug("Exporter status: %s (waiting...)", status)

        self.logger.warning("Timeout waiting for beforeLease hook to complete")
    except AioRpcError as e:
        if e.code() == StatusCode.UNIMPLEMENTED:
            # StreamStatus not implemented, fall back to polling
            self.logger.debug("StreamStatus not implemented, falling back to polling")
            await self.wait_for_lease_ready(timeout)
        else:
            raise DriverError(f"Error streaming status: {e.details()}") from e
```
Potential silent failure after timeout.
When the timeout expires via `move_on_after`, the method logs a warning but then implicitly returns `None`. This may leave the caller unaware that the wait didn't succeed. Consider either raising an exception or returning a boolean to indicate success/failure.
🔧 Suggested improvement to handle timeout explicitly
```diff
     try:
-        with anyio.move_on_after(timeout):
+        with anyio.move_on_after(timeout) as cancel_scope:
             async for response in self.stub.StreamStatus(jumpstarter_pb2.StreamStatusRequest()):
                 status = ExporterStatus.from_proto(response.status)
                 self.logger.debug("StreamStatus received: %s", status)
                 if status == ExporterStatus.LEASE_READY:
                     self.logger.info("Exporter ready, starting shell...")
                     return
                 elif status == ExporterStatus.BEFORE_LEASE_HOOK_FAILED:
                     self.logger.warning("beforeLease hook failed")
                     return
                 elif status == ExporterStatus.AFTER_LEASE_HOOK:
                     # Lease ended before becoming ready
                     raise DriverError("Lease ended before becoming ready")
                 elif status == ExporterStatus.BEFORE_LEASE_HOOK:
                     seen_before_lease_hook = True
                     self.logger.debug("beforeLease hook is running...")
                 elif status == ExporterStatus.AVAILABLE:
                     if seen_before_lease_hook:
                         # Lease ended - AVAILABLE after BEFORE_LEASE_HOOK indicates lease released
                         raise DriverError("Lease ended before becoming ready")
                     else:
                         # Initial AVAILABLE state - waiting for lease assignment
                         self.logger.debug("Exporter status: AVAILABLE (waiting for lease assignment)")
                 else:
                     self.logger.debug("Exporter status: %s (waiting...)", status)
-        self.logger.warning("Timeout waiting for beforeLease hook to complete")
+        if cancel_scope.cancelled_caught:
+            self.logger.warning("Timeout waiting for beforeLease hook to complete")
     except AioRpcError as e:
```
except AioRpcError as e:🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/client/core.py` around lines 125 -
174, The method wait_for_lease_ready_streaming currently logs a warning when
anyio.move_on_after times out and then returns silently; change it to explicitly
signal failure by raising a DriverError (e.g., raise DriverError("Timeout
waiting for exporter to become ready")) after the with
anyio.move_on_after(timeout): block instead of just logging, so callers of
wait_for_lease_ready_streaming (and any code expecting exceptions from
wait_for_lease_ready) will observe the timeout; keep the existing fallback for
AioRpcError.UNIMPLEMENTED to call wait_for_lease_ready(timeout).
```diff
     name: str
     labels: dict[str, str]
     online: bool = False
+    status: ExporterStatus | None = None
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Find proto files
find . -type f -name "*.proto" | head -20
```

Repository: jumpstarter-dev/monorepo
Length of output: 308
🏁 Script executed:
```bash
#!/bin/bash
# Search for Exporter message and ExporterStatus enum in proto files
rg -l "message Exporter|enum ExporterStatus" --type-list
fd -e proto
```

Repository: jumpstarter-dev/monorepo
Length of output: 705
🏁 Script executed:
```bash
#!/bin/bash
# Search more comprehensively for proto files
fd -e proto -o
```

Repository: jumpstarter-dev/monorepo
Length of output: 184
🏁 Script executed:
```bash
#!/bin/bash
# Search for Exporter message and ExporterStatus enum
rg "message Exporter|enum ExporterStatus" protocol/proto/ -C2
```

Repository: jumpstarter-dev/monorepo
Length of output: 833
🏁 Script executed:
```bash
#!/bin/bash
# Read the grpc.py file around lines 90-100
cat -n python/packages/jumpstarter/jumpstarter/client/grpc.py | sed -n '80,110p'
```

Repository: jumpstarter-dev/monorepo
Length of output: 1379
🏁 Script executed:
```bash
#!/bin/bash
# Read the full Exporter message definition
rg "message Exporter" protocol/proto/jumpstarter/client/v1/client.proto -A20
```

Repository: jumpstarter-dev/monorepo
Length of output: 795
🏁 Script executed:
```bash
#!/bin/bash
# Read the full ExporterStatus enum definition
rg "enum ExporterStatus" protocol/proto/jumpstarter/v1/common.proto -A10
```

Repository: jumpstarter-dev/monorepo
Length of output: 783
Remove the truthy check on status enum—value 0 (UNSPECIFIED) is valid.
The `status` field in the proto is not optional and defaults to 0 (`EXPORTER_STATUS_UNSPECIFIED`), a valid enum value. The current truthy check `if hasattr(data, "status") and data.status:` drops this value, yielding `None` instead. Since the field is required, no presence check is needed.
Proposed fix
```diff
-    status = None
-    if hasattr(data, "status") and data.status:
-        status = ExporterStatus.from_proto(data.status)
+    status = ExporterStatus.from_proto(data.status)
```

Also applies to: 96-99
🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/client/grpc.py` at line 90, The
current parsing drops valid enum value 0 because of a truthy check; remove the
`and data.status` truthy guard and always map/assign `data.status` to the
`status: ExporterStatus | None` variable (e.g., `status =
ExporterStatus(data.status)` or equivalent mapping) instead of returning None
when it's 0; update the same pattern in the other occurrences referenced (around
the other checks at lines ~96-99) so the required proto enum value is preserved
even when it's 0 (EXPORTER_STATUS_UNSPECIFIED).
```python
AFTER_LEASE_HOOK = common_pb2.EXPORTER_STATUS_AFTER_LEASE_HOOK
"""Lease was releaseed, but exporter is executing after lease hook"""
```
Typo in docstring.
"releaseed" should be "released".
📝 Fix typo
```diff
 AFTER_LEASE_HOOK = common_pb2.EXPORTER_STATUS_AFTER_LEASE_HOOK
-"""Lease was releaseed, but exporter is executing after lease hook"""
+"""Lease was released, but exporter is executing after lease hook"""
```

📝 Committable suggestion
+ """Lease was released, but exporter is executing after lease hook"""📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
AFTER_LEASE_HOOK = common_pb2.EXPORTER_STATUS_AFTER_LEASE_HOOK
"""Lease was released, but exporter is executing after lease hook"""
```
🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/common/enums.py` around lines 26 -
27, Fix the typo in the docstring for the enum constant AFTER_LEASE_HOOK: change
"releaseed" to "released" so the docstring reads "Lease was released, but
exporter is executing after lease hook" (update the string literal associated
with AFTER_LEASE_HOOK in the enums module).
```python
except HookExecutionError as e:
    if e.should_shutdown_exporter():
        # on_failure='exit' - shut down the entire exporter
        logger.error("beforeLease hook failed with on_failure='exit': %s", e)
        await report_status(
            ExporterStatus.BEFORE_LEASE_HOOK_FAILED,
            f"beforeLease hook failed (on_failure=exit, shutting down): {e}",
        )
        logger.error("Shutting down exporter due to beforeLease hook failure with on_failure='exit'")
        # Exit code 1 tells the CLI not to restart the exporter
        shutdown(exit_code=1)
    else:
        # on_failure='endLease' - just block this lease, exporter stays available
        logger.error("beforeLease hook failed with on_failure='endLease': %s", e)
        await report_status(
            ExporterStatus.BEFORE_LEASE_HOOK_FAILED,
            f"beforeLease hook failed (on_failure=endLease): {e}",
        )
        # TODO: We need to implement a controller-side mechanism to end the lease here
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Search for lease-release handling tied to before-lease hook failures
rg -n -C3 'BEFORE_LEASE_HOOK_FAILED|endLease|request_lease_release' -g '*.py'
```

Repository: jumpstarter-dev/monorepo
Length of output: 23246
🏁 Script executed:
```bash
#!/bin/bash
# Find where execute_before_lease_hook is called
rg -n 'execute_before_lease_hook' -g '*.py' -B 2 -A 5
```

Repository: jumpstarter-dev/monorepo
Length of output: 6953
🏁 Script executed:
```bash
#!/bin/bash
# Search for where execute_before_lease_hook is called in exporter.py
rg -n 'execute_before_lease_hook' python/packages/jumpstarter/jumpstarter/exporter/exporter.py -B 5 -A 10
```

Repository: jumpstarter-dev/monorepo
Length of output: 50
🏁 Script executed:
```bash
#!/bin/bash
# Find all places where execute_before_lease_hook is called
rg -n 'execute_before_lease_hook' --type py
```

Repository: jumpstarter-dev/monorepo
Length of output: 1189
🏁 Script executed:
```bash
#!/bin/bash
# Find where HookExecutor is instantiated and used in exporter.py
rg -n 'HookExecutor|hook_executor' python/packages/jumpstarter/jumpstarter/exporter/exporter.py -B 3 -A 8
```

Repository: jumpstarter-dev/monorepo
Length of output: 3674
🏁 Script executed:
```bash
#!/bin/bash
# Find run_before_lease_hook method definition
rg -n 'def run_before_lease_hook' python/packages/jumpstarter/jumpstarter/exporter/hooks.py -A 30
```

Repository: jumpstarter-dev/monorepo
Length of output: 1744
🏁 Script executed:
```bash
#!/bin/bash
# Get the full run_before_lease_hook method
rg -n 'async def run_before_lease_hook' python/packages/jumpstarter/jumpstarter/exporter/hooks.py -A 100 | head -110
```

Repository: jumpstarter-dev/monorepo
Length of output: 5286
Add lease-release mechanism for before-lease hook failures with on_failure='endLease'.
The TODO at line 466 points to a real gap: when a before-lease hook fails with on_failure='endLease', the lease is never explicitly released to the controller. Unlike the after-lease hook (which has a request_lease_release parameter passed to its orchestration method), the before-lease hook lacks any mechanism to signal lease termination. This leaves the lease stuck in an active state.
Add `request_lease_release` as an optional parameter to `run_before_lease_hook()` (matching the pattern in `run_after_lease_hook()`), invoke it in the `on_failure='endLease'` error path, and pass `self._request_lease_release` from the call site in exporter.py, as sketched below.
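A minimal sketch of that wiring; `request_lease_release` matches the prompt below, while the other parameter names (`report_status`, `shutdown`) are assumptions based on the diff excerpt earlier in this review:

```python
from typing import Awaitable, Callable, Optional


async def run_before_lease_hook(
    self,
    report_status,
    shutdown,
    # New and optional, so existing call sites keep working unchanged
    request_lease_release: Optional[Callable[[], Awaitable[None]]] = None,
) -> None:
    try:
        await self.execute_before_lease_hook(report_status)
    except HookExecutionError as e:
        if e.should_shutdown_exporter():
            # on_failure='exit' path stays as shown in the diff above
            shutdown(exit_code=1)
        else:
            logger.error("beforeLease hook failed with on_failure='endLease': %s", e)
            if request_lease_release is not None:
                # Ask the controller to actually end the lease before reporting failure
                await request_lease_release()
            await report_status(
                ExporterStatus.BEFORE_LEASE_HOOK_FAILED,
                f"beforeLease hook failed (on_failure=endLease): {e}",
            )
```

The call site in exporter.py would then pass `request_lease_release=self._request_lease_release`, mirroring how `run_after_lease_hook` is already wired.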
🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/exporter/hooks.py` around lines 448 -
467, The before-lease hook currently never signals the controller to release a
lease on on_failure='endLease'; update run_before_lease_hook to accept an
optional request_lease_release callable (mirror the signature used by
run_after_lease_hook), wire that parameter through the call site in exporter.py
by passing self._request_lease_release, and inside the HookExecutionError except
branch for the endLease path invoke request_lease_release() (if provided) before
reporting status so the controller actually releases the lease; ensure the new
parameter is optional and non-breaking if not supplied.
```python
def get_source_for_record(self, record):
    """Determine the appropriate log source for a record."""
    with self._lock:
        # Check if this record comes from a logger with a specific source mapping
        logger_name = record.name
        for mapped_logger, source in self._child_handlers.items():
            if logger_name.startswith(mapped_logger):
                return source
        return self.source
```
Prefer the most specific logger prefix match.
If mappings overlap, a generic prefix can shadow a more specific one, so the intended source never applies. Selecting the longest matching prefix avoids misrouted log sources.
🛠️ Suggested fix
```diff
-        for mapped_logger, source in self._child_handlers.items():
-            if logger_name.startswith(mapped_logger):
-                return source
-        return self.source
+        matched_source = None
+        matched_len = -1
+        for mapped_logger, source in self._child_handlers.items():
+            if logger_name.startswith(mapped_logger) and len(mapped_logger) > matched_len:
+                matched_source = source
+                matched_len = len(mapped_logger)
+        return matched_source if matched_source is not None else self.source
```
In `@python/packages/jumpstarter/jumpstarter/exporter/logging.py` around lines 31
- 39, The get_source_for_record implementation may return a shorter, generic
mapped_logger match before a more specific one; update it to pick the longest
matching logger prefix: inside the existing with self._lock block, iterate over
self._child_handlers items and collect mapped_logger keys where
logger_name.startswith(mapped_logger), then select the mapped_logger with the
greatest length (max by len) and return its associated source; if no match,
return self.source. Keep references to get_source_for_record,
self._child_handlers, logger_name and self.source to locate and update the
logic.
```python
@contextmanager
def context_log_source(self, logger_name: str, source: LogSource):
    """Context manager to temporarily set a log source for a specific logger."""
    self.add_child_handler(logger_name, source)
    try:
        yield
    finally:
        self.remove_child_handler(logger_name)
```
Restore any previous mapping on context exit.
context_log_source always removes the mapping, which drops any existing entry (or nested override) after the context ends. Preserve and restore the prior mapping instead.
🛠️ Suggested fix
```diff
 @contextmanager
 def context_log_source(self, logger_name: str, source: LogSource):
     """Context manager to temporarily set a log source for a specific logger."""
-    self.add_child_handler(logger_name, source)
+    with self._lock:
+        prev = self._child_handlers.get(logger_name, None)
+        self._child_handlers[logger_name] = source
     try:
         yield
     finally:
-        self.remove_child_handler(logger_name)
+        with self._lock:
+            if prev is None:
+                self._child_handlers.pop(logger_name, None)
+            else:
+                self._child_handlers[logger_name] = prev
```
In `@python/packages/jumpstarter/jumpstarter/exporter/logging.py` around lines 59
- 66, context_log_source currently unconditionally removes the child handler on
exit, losing any prior mapping; modify context_log_source to capture the
existing handler/mapping for logger_name before calling add_child_handler (e.g.,
via whatever accessor or self._child_handlers lookup you have), then in the
finally block restore that previous mapping: if a previous handler existed
re-add it (using add_child_handler or direct restore), otherwise remove the
handler (remove_child_handler) as currently done; reference the
context_log_source, add_child_handler and remove_child_handler symbols when
making the change.
```python
_current_status: ExporterStatus = field(init=False, default=ExporterStatus.AVAILABLE)
_status_message: str = field(init=False, default="")
_status_update_event: Event = field(init=False)
```
StreamStatus can miss the latest status update.
Lines 205-207 replace `_status_update_event` after setting it. If a status change happens between the initial yield and the first wait, the stream can block forever and never emit the new status. A simple version counter prevents missed updates.
🛠️ Suggested fix
```diff
@@
-    _status_update_event: Event = field(init=False)
+    _status_update_event: Event = field(init=False)
+    _status_version: int = field(init=False, default=0)
@@
-        # Signal status change for StreamStatus subscribers
-        self._status_update_event.set()
-        # Create a new event for the next status change
-        self._status_update_event = Event()
+        # Signal status change for StreamStatus subscribers
+        self._status_version += 1
+        self._status_update_event.set()
+        # Create a new event for the next status change
+        self._status_update_event = Event()
@@
-        # Send current status immediately
-        yield jumpstarter_pb2.StreamStatusResponse(
+        last_version = self._status_version
+        # Send current status immediately
+        yield jumpstarter_pb2.StreamStatusResponse(
             status=self._current_status.to_proto(),
             message=self._status_message,
         )
@@
-        # Wait for status change event
-        current_event = self._status_update_event
-        await current_event.wait()
-
-        # Send the updated status
-        logger.debug("StreamStatus() sending update: %s", self._current_status)
-        yield jumpstarter_pb2.StreamStatusResponse(
-            status=self._current_status.to_proto(),
-            message=self._status_message,
-        )
+        current_event = self._status_update_event
+        if self._status_version == last_version:
+            await current_event.wait()
+
+        if self._status_version != last_version:
+            last_version = self._status_version
+            logger.debug("StreamStatus() sending update: %s", self._current_status)
+            yield jumpstarter_pb2.StreamStatusResponse(
+                status=self._current_status.to_proto(),
+                message=self._status_message,
+            )
```

Also applies to: 197-257
🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/exporter/session.py` around lines 46
- 49, The StreamStatus logic can miss updates because _status_update_event is
replaced after setting it; add a monotonic integer version counter (e.g.,
_status_version) incremented whenever _current_status/_status_message change and
when creating/replacing _status_update_event, then when yielding status in
StreamStatus (or the generator around lines ~197-257) capture the current
version before awaiting the event and only wait if the version hasn't advanced;
after the wait re-check the version in a loop to avoid missed updates. Update
all places that set _status_update_event/_current_status to increment
_status_version so the generator can reliably detect missed updates without
blocking indefinitely.
@kirkbrauer I am deleting this attempt, and leaving it for you
Imported from jumpstarter-dev/jumpstarter#606