
@mangelajo mangelajo commented Jan 21, 2026

Imported from jumpstarter-dev/jumpstarter#606

Summary by CodeRabbit

  • New Features

    • Added exporter status tracking (Available, Offline, Lease Ready, etc.) with real-time streaming and display.
    • Introduced lifecycle hook support for custom scripts during lease setup and teardown.
    • Enhanced logging with source-based routing for before-lease, after-lease, driver, and system logs.
    • Added async lease handling with improved shell execution and log streaming.
  • Bug Fixes

    • Improved connection readiness detection with retry logic and timeout handling.
    • Fixed exit code propagation from exporter processes.


mangelajo and others added 30 commits January 21, 2026 22:14
Update pyproject.toml files to adjust raw-options root paths
from '../..' to '../../..' to account for monorepo subdirectory.
Update multiversion.sh to use correct paths with python/ prefix
in worktree structure.
- Update Python container files for monorepo build paths
- Copy Kind cluster config with dex nodeport pre-configured
- Configure controller and e2e values with certificate placeholder
- Patch deploy_with_helm.sh to support EXTRA_VALUES for Helm overlay pattern
- Add unified GitHub Actions workflows with path filters
- Configure dependabot for all package ecosystems
- Remove old .github directories from subdirectories
- Install e2e test scripts (setup-e2e.sh, run-e2e.sh, tests.bats)
coderabbitai bot commented Jan 21, 2026

📝 Walkthrough

This PR introduces comprehensive exporter lifecycle management with status tracking, lifecycle hooks, and enhanced logging. It adds protobuf definitions for status streaming, implements hook execution for before/after-lease operations, and refactors session handling to support lease-scoped operations with real-time status updates and log source routing.

Changes

Cohort / File(s) Summary
Protocol Buffer Definitions
python/buf.gen.yaml, python/packages/jumpstarter-protocol/jumpstarter_protocol/jumpstarter/client/v1/client_pb2*, python/packages/jumpstarter-protocol/jumpstarter_protocol/jumpstarter/v1/{common,jumpstarter,kubernetes,router}_pb2*
Added mypy and mypy-grpc plugin entries to buf generator config. Generated comprehensive Python type stubs and gRPC interfaces for v1 protocol messages, including ExporterStatus enum, Lease management, status streaming (GetStatus, StreamStatus, EndSession), and Hook-related request/response types.
Status & Logging Infrastructure
python/packages/jumpstarter/jumpstarter/common/{enums.py,types.py,__init__.py}, python/packages/jumpstarter/jumpstarter/exporter/{logging.py,logging_protocol.py}
Introduced ExporterStatus and LogSource enums with protobuf mapping. Added LoggerRegistration protocol and enhanced LogHandler with per-logger source routing via child handler mappings. Added context manager for temporary source assignment and get_logger factory for registering logger sources.
Lease & Hook Lifecycle
python/packages/jumpstarter/jumpstarter/exporter/{lease_context.py,hooks.py,hooks_test.py}
Introduced LeaseContext dataclass to encapsulate lease-scoped resources, events, and metadata with status tracking. Implemented HookExecutor for before-lease and after-lease hook execution with environment setup, timeout handling, failure strategies (warn/endLease/exit), and real-time log streaming via PTY-based subprocess.
Session & Exporter Refactor
python/packages/jumpstarter/jumpstarter/exporter/{session.py,exporter.py}
Enhanced Session with status lifecycle, multi-port/hook-socket serving, and logger source registration. Completely refactored Exporter with: controller registration/communication, status reporting with streaming, lease-scoped handling, hook integration, connection bridging, and exit code support. Introduced stop() semantics accepting exit_code and wait flags.
Driver Client Status Handling
python/packages/jumpstarter/jumpstarter/client/{core.py,grpc.py,lease.py}
Added status awareness to AsyncDriverClient with get_status_async(), check_exporter_status() pre-call validation, and wait_for_lease_ready_streaming/polling. Introduced end_session_async() for session termination. Enhanced log_stream_async with LogSource-based routing and configurable system log visibility. Updated Exporter model to include status field and WithOptions.show_status flag. Added _wait_for_ready_connection() helper with retry logic for Unix socket readiness.
Configuration & Types
python/packages/jumpstarter/jumpstarter/{config/{client.py,exporter.py,exporter_test.py},common/utils.py,driver/base.py}
Added include_status parameter to list_exporters(). Introduced HookConfigV1Alpha1 and HookInstanceConfigV1Alpha1 models for hook configuration. Updated serve_unix_async to call _wait_for_ready_connection(). Enhanced Exporter construction with hook_executor, channel_factory, device_factory, tls, and grpc_options. Updated driver base logger to use centralized get_logger factory.
CLI Updates & Test Fixtures
python/packages/jumpstarter-cli/{jumpstarter_cli/{get.py,run.py,shell.py},jumpstarter_cli_common/opt.py}, python/packages/jumpstarter-cli-admin/{create_test.py,get_test.py}
Added "status" option to exporter view (get.py). Updated run.py to capture and prioritize exporter.exit_code in return logic. Refactored shell.py to add _run_shell_only helper, introduce _run_shell_with_lease_async for hook-aware execution with real-time hook output, and graceful ImportError handling for optional OIDC. Updated logging config to use RichHandler(show_path=False). Test fixtures updated to include exporterStatus/statusMessage fields and verify status in CLI outputs.
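The summary above mentions an ExporterStatus enum with protobuf mapping in jumpstarter/common/enums.py. A minimal sketch of what such an enum could look like — member names and ordering here are assumptions drawn from the statuses named in this review, not the actual implementation:

```python
from enum import Enum


class ExporterStatus(str, Enum):
    # hypothetical members, ordered to mirror the proto enum's integer values
    UNSPECIFIED = "UNSPECIFIED"
    AVAILABLE = "AVAILABLE"
    BEFORE_LEASE_HOOK = "BEFORE_LEASE_HOOK"
    LEASE_READY = "LEASE_READY"
    AFTER_LEASE_HOOK = "AFTER_LEASE_HOOK"
    OFFLINE = "OFFLINE"

    @classmethod
    def from_proto(cls, value: int) -> "ExporterStatus":
        # proto enums arrive as ints; map by declaration order, falling back
        # to UNSPECIFIED for values this client does not know about
        members = list(cls)
        if 0 <= value < len(members):
            return members[value]
        return cls.UNSPECIFIED
```

The fallback keeps older clients forward-compatible when the server adds new status values.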

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant AsyncDriverClient
    participant ExporterService
    participant ControllerService
    participant Exporter
    participant HookExecutor
    participant LeaseContext

    Client->>AsyncDriverClient: call_async(method, args)
    AsyncDriverClient->>AsyncDriverClient: check_exporter_status()
    alt Status check fails
        AsyncDriverClient-->>Client: ExporterNotReady
    else Status OK
        AsyncDriverClient->>ExporterService: DriverCall RPC
        ExporterService-->>AsyncDriverClient: result
        AsyncDriverClient-->>Client: result
    end

    Client->>AsyncDriverClient: wait_for_lease_ready_streaming(timeout)
    AsyncDriverClient->>ControllerService: StreamStatus RPC
    loop Status updates
        ControllerService-->>AsyncDriverClient: StreamStatusResponse
        alt Status == LEASE_READY
            AsyncDriverClient-->>Client: ready
        else Status == BEFORE_LEASE_HOOK
            AsyncDriverClient->>AsyncDriverClient: continue waiting
        else Timeout
            AsyncDriverClient-->>Client: timeout error
        end
    end

    Exporter->>ControllerService: RegisterRequest
    ControllerService-->>Exporter: UUID registered

    Exporter->>Exporter: serve()
    Exporter->>ControllerService: StreamStatus (listen for lease events)
    loop Lease lifecycle
        ControllerService-->>Exporter: StatusResponse (LEASE_READY)
        Exporter->>LeaseContext: initialize lease scope
        Exporter->>HookExecutor: run_before_lease_hook()
        HookExecutor->>HookExecutor: execute hook subprocess
        HookExecutor-->>Exporter: hook complete
        Exporter->>ControllerService: ReportStatus(LEASE_READY)
        Exporter->>Exporter: handle_lease()
        Exporter->>ControllerService: StreamStatus (listen for clients)
        Exporter->>Exporter: process client connections
        ControllerService-->>Exporter: EndSessionRequest
        Exporter->>HookExecutor: run_after_lease_hook()
        HookExecutor-->>Exporter: hook complete
        Exporter->>ControllerService: RequestLeaseRelease()
    end

    Exporter->>ControllerService: UnregisterRequest
sequenceDiagram
    actor User
    participant Shell
    participant SessionManager
    participant HookExecutor
    participant Driver
    participant LeaseContext

    User->>Shell: shell command with lease
    Shell->>Shell: _run_shell_with_lease_async()
    Shell->>SessionManager: create session context
    SessionManager->>LeaseContext: initialize lease scope
    
    Shell->>HookExecutor: run_before_lease_hook()
    HookExecutor->>Driver: instantiate device
    HookExecutor->>HookExecutor: execute hook subprocess (pty)
    HookExecutor-->>Shell: stream logs in real-time
    HookExecutor-->>Shell: before-lease complete
    
    Shell->>Shell: wait_for_lease_ready()
    Shell->>Shell: run shell command
    User->>Shell: interact with shell
    Shell->>Shell: log_stream_async() with source routing
    
    Shell->>HookExecutor: run_after_lease_hook()
    HookExecutor->>HookExecutor: execute hook subprocess
    HookExecutor-->>Shell: stream logs in real-time
    HookExecutor-->>Shell: after-lease complete
    
    Shell->>SessionManager: end session
    SessionManager->>LeaseContext: mark lease ended
    Shell-->>User: exit shell

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 Status flows like rivers through the lease,
Hooks dance before and after with ease,
Logs route by source in rainbow lines,
Context guards the lease design,
From protocol to hook's embrace, ✨
Jumpstarter finds its graceful place!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Docstring Coverage — ⚠️ Warning: docstring coverage is 27.96%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
Description Check — ✅ Passed (check skipped because CodeRabbit's high-level summary is enabled).
Title Check — ✅ Passed: the title 'Add pre/post lease hooks' clearly and accurately summarizes the main change, the implementation of lifecycle hooks for leases.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/packages/jumpstarter/jumpstarter/client/lease.py (1)

249-279: Off-by-one in retry logic results in 4 retries instead of 5.

The check if retries_left > 1 triggers error when retries_left == 1, meaning only 4 retry attempts occur instead of the documented 5. The decrement happens after the check, so: 5→4→3→2→1 (error).

🔧 Proposed fix
-        retries_left = 5
+        max_retries = 5
+        retries_left = max_retries
         logger.info("Waiting for ready connection at %s", path)
         while True:
             try:
                 with ExitStack() as stack:
                     async with client_from_path(path, self.portal, stack, allow=self.allow, unsafe=self.unsafe) as _:
                         # Connection established
                         break
             except AioRpcError as e:
-                if retries_left > 1:
-                    retries_left -= 1
-                else:
+                retries_left -= 1
+                if retries_left <= 0:
                     logger.error("Max retries reached while waiting for ready connection at %s", path)
                     raise ConnectionError("Max retries reached while waiting for ready connection at %s" % path) from e
                 if e.code().name == "UNAVAILABLE":
                     logger.warning("Still waiting for connection to be ready at %s", path)
                 else:
                     logger.warning("Waiting for ready connection to %s: %s", path, e)
                 await sleep(5)
🤖 Fix all issues with AI agents
In `@python/packages/jumpstarter-cli/jumpstarter_cli/get.py`:
- Around line 33-38: The reconstructed Exporter created when include_leases is
true is omitting the exporter.status field, so when creating exporter_with_lease
you must copy the original exporter's status into the new Exporter instance;
update the Exporter(...) call (the reconstruction code in client code that
builds exporter_with_lease) to pass status=exporter.status alongside namespace,
name, labels, online, and lease so the status column is preserved.

In `@python/packages/jumpstarter/jumpstarter/client/core.py`:
- Around line 125-174: The method wait_for_lease_ready_streaming currently logs
a warning when anyio.move_on_after times out and then returns silently; change
it to explicitly signal failure by raising a DriverError (e.g., raise
DriverError("Timeout waiting for exporter to become ready")) after the with
anyio.move_on_after(timeout): block instead of just logging, so callers of
wait_for_lease_ready_streaming (and any code expecting exceptions from
wait_for_lease_ready) will observe the timeout; keep the existing fallback for
AioRpcError.UNIMPLEMENTED to call wait_for_lease_ready(timeout).

In `@python/packages/jumpstarter/jumpstarter/client/grpc.py`:
- Line 90: The current parsing drops valid enum value 0 because of a truthy
check; remove the `and data.status` truthy guard and always map/assign
`data.status` to the `status: ExporterStatus | None` variable (e.g., `status =
ExporterStatus(data.status)` or equivalent mapping) instead of returning None
when it's 0; update the same pattern in the other occurrences referenced (around
the other checks at lines ~96-99) so the required proto enum value is preserved
even when it's 0 (EXPORTER_STATUS_UNSPECIFIED).

In `@python/packages/jumpstarter/jumpstarter/common/enums.py`:
- Around line 26-27: Fix the typo in the docstring for the enum constant
AFTER_LEASE_HOOK: change "releaseed" to "released" so the docstring reads "Lease
was released, but exporter is executing after lease hook" (update the string
literal associated with AFTER_LEASE_HOOK in the enums module).

In `@python/packages/jumpstarter/jumpstarter/exporter/hooks.py`:
- Around line 448-467: The before-lease hook currently never signals the
controller to release a lease when on_failure='endLease'; update
run_before_lease_hook to accept an optional request_lease_release callable
(mirror the signature used by run_after_lease_hook), wire that parameter through
the call site in exporter.py by passing self._request_lease_release, and inside
the HookExecutionError except branch for the endLease path invoke
request_lease_release() (if provided) before reporting status so the controller
actually releases the lease; ensure the new parameter is optional and
non-breaking if not supplied.

In `@python/packages/jumpstarter/jumpstarter/exporter/logging.py`:
- Around line 31-39: The get_source_for_record implementation may return a
shorter, generic mapped_logger match before a more specific one; update it to
pick the longest matching logger prefix: inside the existing with self._lock
block, iterate over self._child_handlers items and collect mapped_logger keys
where logger_name.startswith(mapped_logger), then select the mapped_logger with
the greatest length (max by len) and return its associated source; if no match,
return self.source. Keep references to get_source_for_record,
self._child_handlers, logger_name and self.source to locate and update the
logic.
- Around line 59-66: context_log_source currently unconditionally removes the
child handler on exit, losing any prior mapping; modify context_log_source to
capture the existing handler/mapping for logger_name before calling
add_child_handler (e.g., via whatever accessor or self._child_handlers lookup
you have), then in the finally block restore that previous mapping: if a
previous handler existed re-add it (using add_child_handler or direct restore),
otherwise remove the handler (remove_child_handler) as currently done; reference
the context_log_source, add_child_handler and remove_child_handler symbols when
making the change.

In `@python/packages/jumpstarter/jumpstarter/exporter/session.py`:
- Around line 46-49: The StreamStatus logic can miss updates because
_status_update_event is replaced after setting it; add a monotonic integer
version counter (e.g., _status_version) incremented whenever
_current_status/_status_message change and when creating/replacing
_status_update_event, then when yielding status in StreamStatus (or the
generator around lines ~197-257) capture the current version before awaiting the
event and only wait if the version hasn't advanced; after the wait re-check the
version in a loop to avoid missed updates. Update all places that set
_status_update_event/_current_status to increment _status_version so the
generator can reliably detect missed updates without blocking indefinitely.
🧹 Nitpick comments (11)
python/packages/jumpstarter-cli/jumpstarter_cli/shell.py (1)

142-145: Redundant timeout handling.

The move_on_after(300) and timeout=300.0 parameter both specify a 5-minute timeout. If wait_for_hook_status_streaming handles its own timeout internally, the outer move_on_after scope may never trigger. Consider using only one timeout mechanism for clarity.

♻️ Suggested simplification
-                                    with anyio.move_on_after(300) as timeout_scope:  # 5 minute timeout
-                                        completed = await client.wait_for_hook_status_streaming(
-                                            ExporterStatus.AVAILABLE, timeout=300.0
-                                        )
-                                        if completed:
-                                            logger.info("afterLease hook completed")
-                                        else:
-                                            logger.debug("Hook completion not confirmed")
-                                    if timeout_scope.cancelled_caught:
-                                        logger.warning("Timeout waiting for afterLease hook to complete")
+                                    completed = await client.wait_for_hook_status_streaming(
+                                        ExporterStatus.AVAILABLE, timeout=300.0
+                                    )
+                                    if completed:
+                                        logger.info("afterLease hook completed")
+                                    elif completed is None:
+                                        logger.warning("Timeout waiting for afterLease hook to complete")
+                                    else:
+                                        logger.debug("Hook completion not confirmed")
python/packages/jumpstarter/jumpstarter/exporter/lease_context.py (2)

54-57: Consider using explicit validation instead of assert.

Using assert for runtime validation can be problematic since assertions are disabled when Python runs with optimization flags (-O or -OO). For dataclass invariants that must always hold, consider raising ValueError directly.

♻️ Suggested fix
     def __post_init__(self):
         """Validate that required resources are present."""
-        assert self.before_lease_hook is not None, "LeaseScope requires a before_lease_hook event"
-        assert self.lease_name, "LeaseScope requires a non-empty lease_name"
+        if self.before_lease_hook is None:
+            raise ValueError("LeaseContext requires a before_lease_hook event")
+        if not self.lease_name:
+            raise ValueError("LeaseContext requires a non-empty lease_name")
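The hazard behind this suggestion is easy to demonstrate: with the interpreter's `-O` flag, `assert` statements are stripped entirely, so assert-based validation silently disappears while an explicit `raise ValueError` would survive. A small self-contained check:

```python
import subprocess
import sys

# Run a failing assert in a child interpreter with -O: the assert is
# compiled away, so execution falls through to the print instead of raising.
snippet = "assert False, 'unreachable under -O'\nprint('validation skipped')"
result = subprocess.run(
    [sys.executable, "-O", "-c", snippet],
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # validation skipped
```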

67-69: Method is_active() will always return True after construction.

Since __post_init__ validates that lease_name is non-empty, is_active() will always return True for any valid LeaseContext instance. If this is intentional (e.g., for semantic clarity or future-proofing where lease_name might become mutable), consider documenting this behavior. Otherwise, the method may be misleading.

python/packages/jumpstarter/jumpstarter/config/exporter_test.py (1)

140-141: Consider adding timeout assertion for completeness.

The test verifies script content but doesn't assert that the timeout: 600 value is correctly parsed and preserved during round-trip.

💡 Suggested addition
     assert config.hooks.before_lease.script == 'echo "Pre-lease hook for $LEASE_NAME"\nj power on\n'
     assert config.hooks.after_lease.script == 'echo "Post-lease hook for $LEASE_NAME"\nj power off\n'
+    assert config.hooks.before_lease.timeout == 600
+    assert config.hooks.after_lease.timeout == 600
python/packages/jumpstarter-kubernetes/jumpstarter_kubernetes/exporters.py (1)

82-106: Consider simplifying the status fallback logic.

The current approach works but is slightly verbose. The status variable is set conditionally on line 82, then or "Unknown" is applied again on lines 92 and 102.

♻️ Optional simplification
     def rich_add_rows(self, table, devices: bool = False):
-        status = self.status.exporter_status if self.status else "Unknown"
+        status = (self.status.exporter_status if self.status else None) or "Unknown"
         if devices:
             if self.status is not None:
                 for d in self.status.devices:
                     labels = []
                     if d.labels is not None:
                         for label in d.labels:
                             labels.append(f"{label}:{str(d.labels[label])}")
                     table.add_row(
                         self.metadata.name,
-                        status or "Unknown",
+                        status,
                         self.status.endpoint,
                         time_since(self.metadata.creation_timestamp),
                         ",".join(labels),
                         d.uuid,
                     )

         else:
             table.add_row(
                 self.metadata.name,
-                status or "Unknown",
+                status,
                 self.status.endpoint,
                 str(len(self.status.devices) if self.status and self.status.devices else 0),
                 time_since(self.metadata.creation_timestamp),
             )
python/packages/jumpstarter/jumpstarter/common/types.py (1)

16-18: Remove or complete the empty TYPE_CHECKING block.

The block contains only pass but the comment suggests async stub type aliases should be defined here. Either remove this dead code or add the intended type definitions.

♻️ Proposed fix

If no additional type definitions are needed, remove the block:

-# Async stub type aliases are only available for type checking (defined in .pyi files)
-if TYPE_CHECKING:
-    pass
-
python/packages/jumpstarter/jumpstarter/exporter/hooks.py (1)

309-323: Avoid blocking wait() in the async timeout path.

After process.kill(), the direct process.wait() can block the event loop if the process stalls. Use the same thread offload pattern as earlier waits.

🛠️ Suggested fix
-                        if process.poll() is None:
-                            process.kill()
-                            try:
-                                process.wait()
-                            except Exception:
-                                pass
+                        if process.poll() is None:
+                            process.kill()
+                            try:
+                                await anyio.to_thread.run_sync(process.wait)
+                            except Exception:
+                                pass
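The thread-offload pattern referenced above can be shown in isolation. This sketch uses the stdlib `run_in_executor`, which plays the same role as `anyio.to_thread.run_sync` in hooks.py: the blocking `process.wait()` runs on a worker thread, so the event loop stays responsive even if the process stalls:

```python
import asyncio
import subprocess
import sys


async def reap(process: subprocess.Popen) -> int:
    # process.wait() blocks; hand it to a worker thread instead of
    # calling it directly from the coroutine
    return await asyncio.get_running_loop().run_in_executor(None, process.wait)


proc = subprocess.Popen([sys.executable, "-c", "pass"])
rc = asyncio.run(reap(proc))
```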
python/packages/jumpstarter/jumpstarter/client/core.py (2)

190-206: Elapsed time calculation doesn't account for RPC duration.

The elapsed time increments by poll_interval after each sleep, but the time spent in get_status_async() is not accounted for. With long network latencies, the actual wait could exceed timeout significantly. This is a minor concern given the 5-minute default timeout.

♻️ More accurate timing using monotonic clock
+        import time
+
         poll_interval = 0.5  # seconds
-        elapsed = 0.0
+        start_time = time.monotonic()
         poll_count = 0

         self.logger.debug("Waiting for exporter to be ready...")
-        while elapsed < timeout:
+        while (time.monotonic() - start_time) < timeout:
             poll_count += 1
-            self.logger.debug("[POLL %d] Calling GetStatus (elapsed: %.1fs)...", poll_count, elapsed)
+            elapsed = time.monotonic() - start_time
+            self.logger.debug("[POLL %d] Calling GetStatus (elapsed: %.1fs)...", poll_count, elapsed)
             # ... rest of loop ...
-            self.logger.debug("[POLL %d] Sleeping for %.1fs before next poll...", poll_count, poll_interval)
             await anyio.sleep(poll_interval)
-            elapsed += poll_interval

368-395: Pre-call status check adds latency to every driver call.

The check_exporter_status() call before each DriverCall adds a GetStatus RPC round-trip to every driver operation. For high-frequency driver calls, this could be significant overhead. Consider caching the status with a short TTL or only checking when the status might have changed.

However, if driver calls are typically infrequent or correctness is paramount, this conservative approach is reasonable.
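A short-TTL cache of the kind suggested above could look like this. The class and names are illustrative, not part of the codebase; `fetch` stands in for whatever callable performs the real GetStatus RPC:

```python
import time


class StatusCache:
    """Cache a fetched status for a short TTL to avoid one RPC per driver call."""

    def __init__(self, fetch, ttl: float = 1.0):
        self._fetch = fetch             # callable doing the real status lookup
        self._ttl = ttl
        self._value = None
        self._stamp = -float("inf")     # forces a fetch on first use

    def get(self):
        now = time.monotonic()
        if now - self._stamp > self._ttl:
            self._value = self._fetch()
            self._stamp = now
        return self._value


calls = []
cache = StatusCache(lambda: calls.append(1) or "AVAILABLE", ttl=60.0)
cache.get()
cache.get()  # second call within the TTL is served from the cache
```

A one-second TTL would amortize the round-trip across bursts of driver calls while still catching status transitions quickly.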

python/packages/jumpstarter/jumpstarter/exporter/exporter.py (2)

315-317: Consider using lazy string formatting in logger calls.

Using f-strings in logger calls means the string formatting occurs even when the log level would filter out the message. Use %-style formatting for deferred evaluation.

♻️ Use lazy logging format
-            logger.info(f"Updated status to {status}: {message}")
+            logger.info("Updated status to %s: %s", status, message)
         except Exception as e:
-            logger.error(f"Failed to update status: {e}")
+            logger.error("Failed to update status: %s", e)

647-670: Duplicate lease-end handling logic.

The lease-end handling (lines 654-670) is nearly identical to the unleased-after-leased handling (lines 713-727). Both paths signal lease_ended, wait for after_lease_hook_done, and clear the lease context. Consider extracting a helper method.

♻️ Extract lease-end handling to a helper method
async def _handle_lease_end(self) -> None:
    """Signal lease end and wait for afterLease hook to complete."""
    if not self._lease_context:
        return
    
    logger.info("Lease ended, signaling handle_lease to run afterLease hook")
    self._lease_context.lease_ended.set()
    
    # Wait for the hook to complete
    with CancelScope(shield=True):
        await self._lease_context.after_lease_hook_done.wait()
    logger.info("afterLease hook completed")
    
    # Clear lease scope for next lease
    self._lease_context = None

Then use it in both places:

if lease_changed:
    await self._handle_lease_end()
    self.stop()
    break

Comment on lines 33 to +38
include_leases = "leases" in with_options
include_online = "online" in with_options
exporters = config.list_exporters(filter=selector, include_leases=include_leases, include_online=include_online)
include_status = "status" in with_options
exporters = config.list_exporters(
filter=selector, include_leases=include_leases, include_online=include_online, include_status=include_status
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, locate the client.py file and examine the list_exporters function
find . -path "*jumpstarter/config/client.py" -type f

Repository: jumpstarter-dev/monorepo

Length of output: 126


🏁 Script executed:

#!/bin/bash
# Examine the list_exporters function implementation
cat -n "$(find . -path '*jumpstarter/config/client.py' -type f | head -1)"

Repository: jumpstarter-dev/monorepo

Length of output: 17683


🏁 Script executed:

#!/bin/bash
# Also check the get.py file to see the context
find . -path "*jumpstarter-cli/jumpstarter_cli/get.py" -type f

Repository: jumpstarter-dev/monorepo

Length of output: 124


🏁 Script executed:

#!/bin/bash
# Find the Exporter class definition
find . -path "*/jumpstarter/client/grpc*" -type f | head -20

Repository: jumpstarter-dev/monorepo

Length of output: 186


🏁 Script executed:

#!/bin/bash
# Also check the get.py file to see how include_status is used for display
cat -n "$(find . -path '*jumpstarter-cli/jumpstarter_cli/get.py' -type f | head -1)"

Repository: jumpstarter-dev/monorepo

Length of output: 2242


🏁 Script executed:

#!/bin/bash
# Examine the Exporter class definition in grpc.py
cat -n "$(find . -path '*jumpstarter/client/grpc.py' -type f | head -1)" | head -200

Repository: jumpstarter-dev/monorepo

Length of output: 8794


🏁 Script executed:

#!/bin/bash
# Search for how include_status is used in model_print or response objects
rg -n "include_status|include_leases|include_online" --type py -B2 -A2 python/packages/jumpstarter/

Repository: jumpstarter-dev/monorepo

Length of output: 5483


Status field is lost when reconstructing exporters with leases.

When include_leases=True, the code in python/packages/jumpstarter/jumpstarter/config/client.py (lines 187–193) reconstructs Exporter objects but omits the status field. The original exporter objects contain status data (extracted from protobuf), but the new Exporter(...) call only passes namespace, name, labels, online, and lease, leaving status as None. When users request --with leases,status, the response's include_status flag is set correctly, but individual exporter objects no longer have status data, so the status column displays "UNKNOWN" instead of actual status values.

Copy the status field when reconstructing the exporter with lease information:

exporter_with_lease = Exporter(
    namespace=exporter.namespace,
    name=exporter.name,
    labels=exporter.labels,
    online=exporter.online,
    status=exporter.status,  # Add this line
    lease=lease,
)
🤖 Prompt for AI Agents
In `@python/packages/jumpstarter-cli/jumpstarter_cli/get.py` around lines 33 - 38,
The reconstructed Exporter created when include_leases is true is omitting the
exporter.status field, so when creating exporter_with_lease you must copy the
original exporter's status into the new Exporter instance; update the
Exporter(...) call (the reconstruction code in client code that builds
exporter_with_lease) to pass status=exporter.status alongside namespace, name,
labels, online, and lease so the status column is preserved.

Comment on lines +125 to +174
    async def wait_for_lease_ready_streaming(self, timeout: float = 300.0) -> None:
        """Wait for exporter to report LEASE_READY status using streaming.

        Uses StreamStatus RPC for real-time status updates instead of polling.
        This is more efficient and provides immediate notification of status changes.

        Args:
            timeout: Maximum time to wait in seconds (default: 5 minutes)
        """
        import anyio

        self.logger.debug("Waiting for exporter to be ready (streaming)...")
        seen_before_lease_hook = False

        try:
            with anyio.move_on_after(timeout):
                async for response in self.stub.StreamStatus(jumpstarter_pb2.StreamStatusRequest()):
                    status = ExporterStatus.from_proto(response.status)
                    self.logger.debug("StreamStatus received: %s", status)

                    if status == ExporterStatus.LEASE_READY:
                        self.logger.info("Exporter ready, starting shell...")
                        return
                    elif status == ExporterStatus.BEFORE_LEASE_HOOK_FAILED:
                        self.logger.warning("beforeLease hook failed")
                        return
                    elif status == ExporterStatus.AFTER_LEASE_HOOK:
                        # Lease ended before becoming ready
                        raise DriverError("Lease ended before becoming ready")
                    elif status == ExporterStatus.BEFORE_LEASE_HOOK:
                        seen_before_lease_hook = True
                        self.logger.debug("beforeLease hook is running...")
                    elif status == ExporterStatus.AVAILABLE:
                        if seen_before_lease_hook:
                            # Lease ended - AVAILABLE after BEFORE_LEASE_HOOK indicates lease released
                            raise DriverError("Lease ended before becoming ready")
                        else:
                            # Initial AVAILABLE state - waiting for lease assignment
                            self.logger.debug("Exporter status: AVAILABLE (waiting for lease assignment)")
                    else:
                        self.logger.debug("Exporter status: %s (waiting...)", status)

            self.logger.warning("Timeout waiting for beforeLease hook to complete")
        except AioRpcError as e:
            if e.code() == StatusCode.UNIMPLEMENTED:
                # StreamStatus not implemented, fall back to polling
                self.logger.debug("StreamStatus not implemented, falling back to polling")
                await self.wait_for_lease_ready(timeout)
            else:
                raise DriverError(f"Error streaming status: {e.details()}") from e
⚠️ Potential issue | 🟡 Minor

Potential silent failure after timeout.

When the timeout expires via move_on_after, the method logs a warning but then implicitly returns None. This may leave the caller unaware that the wait didn't succeed. Consider either raising an exception or returning a boolean to indicate success/failure.

🔧 Suggested improvement to handle timeout explicitly
-        try:
-            with anyio.move_on_after(timeout):
+        try:
+            with anyio.move_on_after(timeout) as cancel_scope:
                 async for response in self.stub.StreamStatus(jumpstarter_pb2.StreamStatusRequest()):
                     status = ExporterStatus.from_proto(response.status)
                     self.logger.debug("StreamStatus received: %s", status)

                     if status == ExporterStatus.LEASE_READY:
                         self.logger.info("Exporter ready, starting shell...")
                         return
                     elif status == ExporterStatus.BEFORE_LEASE_HOOK_FAILED:
                         self.logger.warning("beforeLease hook failed")
                         return
                     elif status == ExporterStatus.AFTER_LEASE_HOOK:
                         # Lease ended before becoming ready
                         raise DriverError("Lease ended before becoming ready")
                     elif status == ExporterStatus.BEFORE_LEASE_HOOK:
                         seen_before_lease_hook = True
                         self.logger.debug("beforeLease hook is running...")
                     elif status == ExporterStatus.AVAILABLE:
                         if seen_before_lease_hook:
                             # Lease ended - AVAILABLE after BEFORE_LEASE_HOOK indicates lease released
                             raise DriverError("Lease ended before becoming ready")
                         else:
                             # Initial AVAILABLE state - waiting for lease assignment
                             self.logger.debug("Exporter status: AVAILABLE (waiting for lease assignment)")
                     else:
                         self.logger.debug("Exporter status: %s (waiting...)", status)

-            self.logger.warning("Timeout waiting for beforeLease hook to complete")
+            if cancel_scope.cancelled_caught:
+                self.logger.warning("Timeout waiting for beforeLease hook to complete")
         except AioRpcError as e:
🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/client/core.py` around lines 125 -
174, The method wait_for_lease_ready_streaming currently logs a warning when
anyio.move_on_after times out and then returns silently; change it to explicitly
signal failure by raising a DriverError (e.g., raise DriverError("Timeout
waiting for exporter to become ready")) after the with
anyio.move_on_after(timeout): block instead of just logging, so callers of
wait_for_lease_ready_streaming (and any code expecting exceptions from
wait_for_lease_ready) will observe the timeout; keep the existing fallback for
AioRpcError.UNIMPLEMENTED to call wait_for_lease_ready(timeout).
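The "make the timeout observable" idea can be sketched with stdlib asyncio instead of anyio (the helper names below are illustrative, not the project's API): the waiter either returns normally or raises an explicit error, so a caller can never mistake a timeout for success:

```python
import asyncio

class WaitTimeout(Exception):
    """Raised when the readiness wait exceeds its deadline."""

async def wait_ready(event: asyncio.Event, timeout: float) -> None:
    # wait_for raises TimeoutError on expiry; convert it into a
    # domain-specific error so callers cannot silently miss the failure.
    try:
        await asyncio.wait_for(event.wait(), timeout)
    except asyncio.TimeoutError as e:
        raise WaitTimeout("timed out waiting for LEASE_READY") from e

async def main() -> tuple[bool, bool]:
    ready = asyncio.Event()
    # Case 1: event never set, the wait fails loudly.
    try:
        await wait_ready(ready, 0.05)
        timed_out = False
    except WaitTimeout:
        timed_out = True
    # Case 2: event set, the wait returns normally.
    ready.set()
    await wait_ready(ready, 0.05)
    return timed_out, True

print(asyncio.run(main()))  # (True, True)
```

This mirrors the suggested `cancelled_caught` check: the key property is that the timeout path is a distinct, visible outcome rather than an implicit `return None`.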

name: str
labels: dict[str, str]
online: bool = False
status: ExporterStatus | None = None
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find proto files
find . -type f -name "*.proto" | head -20

Repository: jumpstarter-dev/monorepo

Length of output: 308


🏁 Script executed:

#!/bin/bash
# Search for Exporter message and ExporterStatus enum in proto files
rg -l "message Exporter|enum ExporterStatus" --type-list
fd -e proto

Repository: jumpstarter-dev/monorepo

Length of output: 705


🏁 Script executed:

#!/bin/bash
# Search more comprehensively for proto files
fd -e proto -o

Repository: jumpstarter-dev/monorepo

Length of output: 184


🏁 Script executed:

#!/bin/bash
# Search for Exporter message and ExporterStatus enum
rg "message Exporter|enum ExporterStatus" protocol/proto/ -C2

Repository: jumpstarter-dev/monorepo

Length of output: 833


🏁 Script executed:

#!/bin/bash
# Read the grpc.py file around lines 90-100
cat -n python/packages/jumpstarter/jumpstarter/client/grpc.py | sed -n '80,110p'

Repository: jumpstarter-dev/monorepo

Length of output: 1379


🏁 Script executed:

#!/bin/bash
# Read the full Exporter message definition
rg "message Exporter" protocol/proto/jumpstarter/client/v1/client.proto -A20

Repository: jumpstarter-dev/monorepo

Length of output: 795


🏁 Script executed:

#!/bin/bash
# Read the full ExporterStatus enum definition
rg "enum ExporterStatus" protocol/proto/jumpstarter/v1/common.proto -A10

Repository: jumpstarter-dev/monorepo

Length of output: 783


Remove the truthy check on status enum—value 0 (UNSPECIFIED) is valid.

The status field in the proto is not optional and defaults to 0 (EXPORTER_STATUS_UNSPECIFIED), a valid enum value. The current truthy check if hasattr(data, "status") and data.status: drops this value, yielding None instead. Since the field is required, no presence check is needed.

Proposed fix
-        status = None
-        if hasattr(data, "status") and data.status:
-            status = ExporterStatus.from_proto(data.status)
+        status = ExporterStatus.from_proto(data.status)

Also applies to: 96-99

🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/client/grpc.py` at line 90, The
current parsing drops valid enum value 0 because of a truthy check; remove the
`and data.status` truthy guard and always map/assign `data.status` to the
`status: ExporterStatus | None` variable (e.g., `status =
ExporterStatus(data.status)` or equivalent mapping) instead of returning None
when it's 0; update the same pattern in the other occurrences referenced (around
the other checks at lines ~96-99) so the required proto enum value is preserved
even when it's 0 (EXPORTER_STATUS_UNSPECIFIED).
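The underlying Python behavior is easy to demonstrate: protobuf enum fields are plain ints, and an `IntEnum` member with value 0 is falsy, so a truthy guard silently discards the UNSPECIFIED member (toy enum below, not the generated one):

```python
from enum import IntEnum

class ExporterStatus(IntEnum):
    UNSPECIFIED = 0
    AVAILABLE = 1

def parse_status_buggy(raw: int):
    # Mirrors `if data.status:` from the review: value 0 is falsy,
    # so the valid UNSPECIFIED member is dropped to None.
    return ExporterStatus(raw) if raw else None

def parse_status_fixed(raw: int) -> ExporterStatus:
    # Required proto3 scalar fields always carry a value; convert unconditionally.
    return ExporterStatus(raw)

print(parse_status_buggy(0))  # None, UNSPECIFIED is lost
print(parse_status_fixed(0))  # ExporterStatus.UNSPECIFIED
```

The same falsy-zero trap applies to any proto3 scalar default (empty string, 0, False), which is why presence checks on non-optional fields are generally wrong.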

Comment on lines +26 to +27
AFTER_LEASE_HOOK = common_pb2.EXPORTER_STATUS_AFTER_LEASE_HOOK
"""Lease was releaseed, but exporter is executing after lease hook"""
⚠️ Potential issue | 🟡 Minor

Typo in docstring.

"releaseed" should be "released".

📝 Fix typo
     AFTER_LEASE_HOOK = common_pb2.EXPORTER_STATUS_AFTER_LEASE_HOOK
-    """Lease was releaseed, but exporter is executing after lease hook"""
+    """Lease was released, but exporter is executing after lease hook"""
🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/common/enums.py` around lines 26 -
27, Fix the typo in the docstring for the enum constant AFTER_LEASE_HOOK: change
"releaseed" to "released" so the docstring reads "Lease was released, but
exporter is executing after lease hook" (update the string literal associated
with AFTER_LEASE_HOOK in the enums module).

Comment on lines +448 to +467
except HookExecutionError as e:
    if e.should_shutdown_exporter():
        # on_failure='exit' - shut down the entire exporter
        logger.error("beforeLease hook failed with on_failure='exit': %s", e)
        await report_status(
            ExporterStatus.BEFORE_LEASE_HOOK_FAILED,
            f"beforeLease hook failed (on_failure=exit, shutting down): {e}",
        )
        logger.error("Shutting down exporter due to beforeLease hook failure with on_failure='exit'")
        # Exit code 1 tells the CLI not to restart the exporter
        shutdown(exit_code=1)
    else:
        # on_failure='endLease' - just block this lease, exporter stays available
        logger.error("beforeLease hook failed with on_failure='endLease': %s", e)
        await report_status(
            ExporterStatus.BEFORE_LEASE_HOOK_FAILED,
            f"beforeLease hook failed (on_failure=endLease): {e}",
        )
        # TODO: We need to implement a controller-side mechanism to end the lease here

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for lease-release handling tied to before-lease hook failures
rg -n -C3 'BEFORE_LEASE_HOOK_FAILED|endLease|request_lease_release' -g '*.py'

Repository: jumpstarter-dev/monorepo

Length of output: 23246


🏁 Script executed:

#!/bin/bash
# Find where execute_before_lease_hook is called
rg -n 'execute_before_lease_hook' -g '*.py' -B 2 -A 5

Repository: jumpstarter-dev/monorepo

Length of output: 6953


🏁 Script executed:

#!/bin/bash
# Search for where execute_before_lease_hook is called in exporter.py
rg -n 'execute_before_lease_hook' python/packages/jumpstarter/jumpstarter/exporter/exporter.py -B 5 -A 10

Repository: jumpstarter-dev/monorepo

Length of output: 50


🏁 Script executed:

#!/bin/bash
# Find all places where execute_before_lease_hook is called
rg -n 'execute_before_lease_hook' --type py

Repository: jumpstarter-dev/monorepo

Length of output: 1189


🏁 Script executed:

#!/bin/bash
# Find where HookExecutor is instantiated and used in exporter.py
rg -n 'HookExecutor|hook_executor' python/packages/jumpstarter/jumpstarter/exporter/exporter.py -B 3 -A 8

Repository: jumpstarter-dev/monorepo

Length of output: 3674


🏁 Script executed:

#!/bin/bash
# Find run_before_lease_hook method definition
rg -n 'def run_before_lease_hook' python/packages/jumpstarter/jumpstarter/exporter/hooks.py -A 30

Repository: jumpstarter-dev/monorepo

Length of output: 1744


🏁 Script executed:

#!/bin/bash
# Get the full run_before_lease_hook method
rg -n 'async def run_before_lease_hook' python/packages/jumpstarter/jumpstarter/exporter/hooks.py -A 100 | head -110

Repository: jumpstarter-dev/monorepo

Length of output: 5286


Add lease-release mechanism for before-lease hook failures with on_failure='endLease'.

The TODO at line 466 points to a real gap: when a before-lease hook fails with on_failure='endLease', the lease is never explicitly released to the controller. Unlike the after-lease hook (which has a request_lease_release parameter passed to its orchestration method), the before-lease hook lacks any mechanism to signal lease termination. This leaves the lease stuck in an active state.

Add request_lease_release as an optional parameter to run_before_lease_hook() (matching the pattern in run_after_lease_hook()), invoke it in the on_failure='endLease' error path, and pass self._request_lease_release from the call site in exporter.py.

🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/exporter/hooks.py` around lines 448 -
467, The before-lease hook currently never signals the controller to release a
lease on on_failure='endLease'; update run_before_lease_hook to accept an
optional request_lease_release callable (mirror the signature used by
run_after_lease_hook), wire that parameter through the call site in exporter.py
by passing self._request_lease_release, and inside the HookExecutionError except
branch for the endLease path invoke request_lease_release() (if provided) before
reporting status so the controller actually releases the lease; ensure the new
parameter is optional and non-breaking if not supplied.
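The wiring that prompt describes can be sketched as an optional async callable threaded into the hook runner (names and signatures below are illustrative, not the project's actual API):

```python
import asyncio
from typing import Awaitable, Callable, Optional

class HookExecutionError(Exception):
    pass

async def run_before_lease_hook(
    hook: Callable[[], Awaitable[None]],
    request_lease_release: Optional[Callable[[], Awaitable[None]]] = None,
) -> None:
    try:
        await hook()
    except HookExecutionError:
        # endLease path: ask the controller to release the lease, if wired in.
        # The parameter is optional, so existing call sites keep working.
        if request_lease_release is not None:
            await request_lease_release()
        raise

async def main() -> list[str]:
    calls: list[str] = []

    async def failing_hook() -> None:
        raise HookExecutionError("boom")

    async def release() -> None:
        calls.append("released")

    try:
        await run_before_lease_hook(failing_hook, request_lease_release=release)
    except HookExecutionError:
        calls.append("raised")
    return calls

print(asyncio.run(main()))  # ['released', 'raised']
```

Keeping the callable optional with a `None` default is what makes the change non-breaking for callers that do not care about lease release.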

Comment on lines +31 to +39
    def get_source_for_record(self, record):
        """Determine the appropriate log source for a record."""
        with self._lock:
            # Check if this record comes from a logger with a specific source mapping
            logger_name = record.name
            for mapped_logger, source in self._child_handlers.items():
                if logger_name.startswith(mapped_logger):
                    return source
            return self.source
⚠️ Potential issue | 🟡 Minor

Prefer the most specific logger prefix match.

If mappings overlap, a generic prefix can shadow a more specific one, so the intended source never applies. Selecting the longest matching prefix avoids misrouted log sources.

🛠️ Suggested fix
-            for mapped_logger, source in self._child_handlers.items():
-                if logger_name.startswith(mapped_logger):
-                    return source
-            return self.source
+            matched_source = None
+            matched_len = -1
+            for mapped_logger, source in self._child_handlers.items():
+                if logger_name.startswith(mapped_logger) and len(mapped_logger) > matched_len:
+                    matched_source = source
+                    matched_len = len(mapped_logger)
+            return matched_source if matched_source is not None else self.source
🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/exporter/logging.py` around lines 31
- 39, The get_source_for_record implementation may return a shorter, generic
mapped_logger match before a more specific one; update it to pick the longest
matching logger prefix: inside the existing with self._lock block, iterate over
self._child_handlers items and collect mapped_logger keys where
logger_name.startswith(mapped_logger), then select the mapped_logger with the
greatest length (max by len) and return its associated source; if no match,
return self.source. Keep references to get_source_for_record,
self._child_handlers, logger_name and self.source to locate and update the
logic.
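Longest-prefix routing can be shown standalone with a plain dictionary standing in for `_child_handlers` (function and source names below are illustrative):

```python
def pick_source(logger_name: str, mappings: dict[str, str], default: str) -> str:
    """Return the source mapped to the longest matching logger-name prefix."""
    best_prefix = None
    for prefix in mappings:
        if logger_name.startswith(prefix):
            # Prefer the most specific (longest) prefix, regardless of dict order.
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    return mappings[best_prefix] if best_prefix is not None else default

mappings = {
    "jumpstarter": "system",
    "jumpstarter.hooks": "before-lease",
}
print(pick_source("jumpstarter.hooks.shell", mappings, "driver"))  # before-lease
print(pick_source("jumpstarter.exporter", mappings, "driver"))     # system
print(pick_source("unrelated", mappings, "driver"))                # driver
```

With the first-match version, whether `jumpstarter.hooks.shell` routed to `system` or `before-lease` would depend on dictionary insertion order; the longest-prefix version is deterministic.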

Comment on lines +59 to +66
    @contextmanager
    def context_log_source(self, logger_name: str, source: LogSource):
        """Context manager to temporarily set a log source for a specific logger."""
        self.add_child_handler(logger_name, source)
        try:
            yield
        finally:
            self.remove_child_handler(logger_name)
⚠️ Potential issue | 🟡 Minor

Restore any previous mapping on context exit.

context_log_source always removes the mapping, which drops any existing entry (or nested override) after the context ends. Preserve and restore the prior mapping instead.

🛠️ Suggested fix
 @contextmanager
 def context_log_source(self, logger_name: str, source: LogSource):
     """Context manager to temporarily set a log source for a specific logger."""
-    self.add_child_handler(logger_name, source)
+    with self._lock:
+        prev = self._child_handlers.get(logger_name, None)
+        self._child_handlers[logger_name] = source
     try:
         yield
     finally:
-        self.remove_child_handler(logger_name)
+        with self._lock:
+            if prev is None:
+                self._child_handlers.pop(logger_name, None)
+            else:
+                self._child_handlers[logger_name] = prev
🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/exporter/logging.py` around lines 59
- 66, context_log_source currently unconditionally removes the child handler on
exit, losing any prior mapping; modify context_log_source to capture the
existing handler/mapping for logger_name before calling add_child_handler (e.g.,
via whatever accessor or self._child_handlers lookup you have), then in the
finally block restore that previous mapping: if a previous handler existed
re-add it (using add_child_handler or direct restore), otherwise remove the
handler (remove_child_handler) as currently done; reference the
context_log_source, add_child_handler and remove_child_handler symbols when
making the change.
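The save-and-restore idea generalizes to any mapping; a minimal generic version (illustrative, not the project's helper) that also nests correctly:

```python
from contextlib import contextmanager

_MISSING = object()  # sentinel: distinguishes "absent key" from "value is None"

@contextmanager
def override(mapping: dict, key, value):
    """Temporarily set mapping[key], restoring the prior entry on exit."""
    prev = mapping.get(key, _MISSING)
    mapping[key] = value
    try:
        yield
    finally:
        if prev is _MISSING:
            mapping.pop(key, None)  # key did not exist before: remove it
        else:
            mapping[key] = prev     # key existed: put the old value back

sources = {"jumpstarter.hooks": "before-lease"}
with override(sources, "jumpstarter.hooks", "after-lease"):
    assert sources["jumpstarter.hooks"] == "after-lease"
    with override(sources, "jumpstarter.driver", "driver"):
        assert sources["jumpstarter.driver"] == "driver"
# Prior mapping restored, temporary key removed.
print(sources)  # {'jumpstarter.hooks': 'before-lease'}
```

The sentinel object is what lets the restore path tell "key was absent" apart from "key was mapped to a falsy value", which a plain `dict.get` default cannot.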

Comment on lines +46 to 49
_current_status: ExporterStatus = field(init=False, default=ExporterStatus.AVAILABLE)
_status_message: str = field(init=False, default="")
_status_update_event: Event = field(init=False)

⚠️ Potential issue | 🟠 Major

StreamStatus can miss the latest status update.

Line 205-207 replaces _status_update_event after setting it. If a status change happens between the initial yield and the first wait, the stream can block forever and never emit the new status. A simple version counter prevents missed updates.

🛠️ Suggested fix
@@
-    _status_update_event: Event = field(init=False)
+    _status_update_event: Event = field(init=False)
+    _status_version: int = field(init=False, default=0)
@@
-        # Signal status change for StreamStatus subscribers
-        self._status_update_event.set()
-        # Create a new event for the next status change
-        self._status_update_event = Event()
+        # Signal status change for StreamStatus subscribers
+        self._status_version += 1
+        self._status_update_event.set()
+        # Create a new event for the next status change
+        self._status_update_event = Event()
@@
-        # Send current status immediately
-        yield jumpstarter_pb2.StreamStatusResponse(
+        last_version = self._status_version
+        # Send current status immediately
+        yield jumpstarter_pb2.StreamStatusResponse(
             status=self._current_status.to_proto(),
             message=self._status_message,
         )
@@
-            # Wait for status change event
-            current_event = self._status_update_event
-            await current_event.wait()
-
-            # Send the updated status
-            logger.debug("StreamStatus() sending update: %s", self._current_status)
-            yield jumpstarter_pb2.StreamStatusResponse(
-                status=self._current_status.to_proto(),
-                message=self._status_message,
-            )
+            current_event = self._status_update_event
+            if self._status_version == last_version:
+                await current_event.wait()
+
+            if self._status_version != last_version:
+                last_version = self._status_version
+                logger.debug("StreamStatus() sending update: %s", self._current_status)
+                yield jumpstarter_pb2.StreamStatusResponse(
+                    status=self._current_status.to_proto(),
+                    message=self._status_message,
+                )

Also applies to: 197-257

🤖 Prompt for AI Agents
In `@python/packages/jumpstarter/jumpstarter/exporter/session.py` around lines 46
- 49, The StreamStatus logic can miss updates because _status_update_event is
replaced after setting it; add a monotonic integer version counter (e.g.,
_status_version) incremented whenever _current_status/_status_message change and
when creating/replacing _status_update_event, then when yielding status in
StreamStatus (or the generator around lines ~197-257) capture the current
version before awaiting the event and only wait if the version hasn't advanced;
after the wait re-check the version in a loop to avoid missed updates. Update
all places that set _status_update_event/_current_status to increment
_status_version so the generator can reliably detect missed updates without
blocking indefinitely.

@mangelajo
Copy link
Member Author

@kirkbrauer I am deleting this attempt, and leaving it for you

@mangelajo mangelajo closed this Jan 22, 2026
@mangelajo mangelajo deleted the import/python-pr-606 branch January 22, 2026 16:51