Skip to content

Conversation

@megatnt1122
Copy link
Collaborator

@megatnt1122 megatnt1122 commented Nov 25, 2025

Ticket

#1522

Description

Logging improvements to data_router

Tasks

  • - A description of the PR has been provided, and a diagram included if it is a new feature.
  • - Formatter has been run
  • - CHANGELOG comment has been added
  • - Labels have been assigned to the pr
  • - A reviwer has been added
  • - A user has been assigned to work on the pr
  • - If new feature a unit test has been added

Summary by Sourcery

Improve observability and permission handling in the data router API endpoints.

New Features:

  • Add structured request logging for all key data router endpoints, including correlation IDs and request status.

Enhancements:

  • Centralize permission checks by replacing direct usage of the permissions module with equivalent helpers on the shared support library.
  • Capture and log success and failure details (including results and errors where appropriate) for create, update, export, dependency graph, locking, path resolution, transfer, allocation/owner change, and delete operations to aid debugging and auditing.

@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Nov 25, 2025

Reviewer's Guide

Refactors data_router to use centralized permission helpers and introduces structured request lifecycle logging (start/success/failure) across all data routes for better observability and correlation.

Sequence diagram for POST data/get with task initialization and logging

sequenceDiagram
    actor User
    participant ClientApp
    participant DataRouter
    participant GLib
    participant Logger
    participant ArangoDB
    participant GTasks

    User->>ClientApp: Request data download
    ClientApp->>DataRouter: POST /data/get?client=clientId

    DataRouter->>GLib: getUserFromClientID(clientId)
    GLib-->>DataRouter: client

    DataRouter->>Logger: logRequestStarted(client, correlationId, POST, data/get, Started)

    DataRouter->>ArangoDB: _executeTransaction(read metadata, validate ids)
    activate ArangoDB
    ArangoDB->>GLib: resolveDataID for each id
    GLib-->>ArangoDB: data ids
    ArangoDB-->>DataRouter: validated ids
    deactivate ArangoDB

    DataRouter->>GTasks: taskInitDataGet(client, path, encrypt, ids)
    GTasks-->>DataRouter: task descriptor result

    DataRouter-->>ClientApp: HTTP 200 result (task info)
    DataRouter->>Logger: logRequestSuccess(client, correlationId, POST, data/get, Success, result)

    alt Failure during transaction or task init
        DataRouter->>Logger: logRequestFailure(client, correlationId, POST, data/get, Failure, result, error)
        DataRouter->>GLib: handleException(error, res)
        GLib-->>ClientApp: Error response
    end
Loading

Class diagram for updated data_router modules and logging helpers

classDiagram
    class DataRouter {
        +post_create(req, res)
        +post_create_batch(req, res)
        +post_update(req, res)
        +post_update_batch(req, res)
        +post_update_md_err_msg(req, res)
        +post_update_size(req, res)
        +get_view(req, res)
        +post_export(req, res)
        +get_dep_graph_get(req, res)
        +get_lock(req, res)
        +get_path(req, res)
        +get_list_by_alloc(req, res)
        +post_get(req, res)
        +post_put(req, res)
        +post_alloc_chg(req, res)
        +post_owner_chg(req, res)
        +post_delete(req, res)
        -recordCreate(client, record, result)
        -recordUpdate(client, record, result)
    }

    class GLib {
        +getUserFromClientID(clientId)
        +getUserFromClientID_noexcept(clientId)
        +hasManagerPermProj(client, ownerId)
        +hasPermissions(client, resource, perms)
        +hasAdminPermObject(client, objectId)
        +getPermissions(client, resource, perms)
        +ensureAdminPermUser(client, userId)
        +ensureManagerPermProj(client, projId)
        +resolveDataID(id, client)
        +resolveDataCollID(id, client)
        +getObject(id, client)
        +hasPublicRead(dataId)
        +computeDataPath(loc, localOnly)
        +saveRecentGlobusPath(client, path, transferType)
        +handleException(error, res)
        <<constants>> PERM_CREATE
        <<constants>> PERM_WR_META
        <<constants>> PERM_WR_REC
        <<constants>> PERM_RD_REC
        <<constants>> PERM_RD_META
        <<constants>> PERM_RD_DATA
        <<constants>> TT_DATA_EXPORT
        <<constants>> TT_DATA_GET
        <<constants>> TT_DATA_PUT
    }

    class Logger {
        +logRequestStarted(client, correlationId, httpVerb, routePath, status, description)
        +logRequestSuccess(client, correlationId, httpVerb, routePath, status, description, extra)
        +logRequestFailure(client, correlationId, httpVerb, routePath, status, description, extra, error)
    }

    class ArangoDB {
        +_executeTransaction(config)
        +c_document(id)
        +d_document(id)
        +owner_firstExample(query)
        +loc_firstExample(query)
        +_update(id, patch)
        +_remove(id)
        +_query(query, bindVars)
    }

    class GTasks {
        +taskInitDataGet(client, path, encrypt, ids)
        +taskInitDataPut(client, path, encrypt, ids, check, srcRepoId)
        +taskInitRecAllocChg(client, projId, ids, opts)
        +taskInitRecOwnerChg(client, ids, collId, opts)
        +taskInitRecCollDelete(client, ids)
    }

    DataRouter --> GLib : uses
    DataRouter --> Logger : logs via
    DataRouter --> ArangoDB : executes transactions
    DataRouter --> GTasks : initializes tasks
    GLib --> ArangoDB : helper DB access
    GLib <.. DataRouter : centralized permissions and utilities
    Logger <.. DataRouter : request lifecycle logging
Loading

Flow diagram for generic data_router request lifecycle logging

flowchart TD
    A[Receive HTTP request
client, headers, body] --> B[Resolve client
getUserFromClientID or getUserFromClientID_noexcept]
    B --> C[logRequestStarted
client, correlationId,
httpVerb, routePath,
status Started,
description]
    C --> D{Handler uses
ArangoDB and helpers}

    D --> E[Build result data
or side effects only]
    E --> F[Send success response
res.send result or empty]
    F --> G[logRequestSuccess
client, correlationId,
httpVerb, routePath,
status Success,
description, extra]

    D --> H[Exception thrown
validation, permissions,
DB error, task error]
    H --> I[logRequestFailure
client, correlationId,
httpVerb, routePath,
status Failure,
description, extra,
error]
    I --> J[handleException
via g_lib.handleException]
    J --> K[Error response
status code and body]

    subgraph RetryLoop[Optional retry loop
for write conflict 1200]
        D --> L{Arango errorNum 1200
and retries left?}
        L -- yes --> D
        L -- no --> H
    end
Loading

File-Level Changes

Change Details Files
Add structured request lifecycle logging to all major data_router endpoints using a shared logger with correlation IDs and route metadata.
  • Import shared logger module and define a base path for data routes
  • For each endpoint, capture the client once at the top-level and log request start with HTTP verb, route path, status, and human-readable description
  • On successful completion, log request success including result or key output data where appropriate
  • On exceptions, log request failures with error details and any partial result data before delegating to existing exception handling
  • Ensure variables like client/result are declared in outer scope so they are available in both success and failure logging blocks
core/database/foxx/api/data_router.js
Unify permission checks to use g_lib helpers instead of the separate permissions module throughout data_router.
  • Replace permissions.hasManagerPermProj and permissions.hasPermissions usages with equivalent g_lib helper functions and constants
  • Update record update, view, lock, path, and listing logic to use g_lib.hasAdminPermObject, g_lib.getPermissions, and g_lib permission flags for consistency
  • Remove redundant permissions import now that all permission checks are routed through g_lib
  • Keep overall permission semantics and error handling the same while centralizing implementation
core/database/foxx/api/data_router.js

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • In the /update/batch handler you now set client = req.queryParams.client instead of using g_lib.getUserFromClientID, which changes client from a user object to a raw id string and will impact downstream permission checks and task initialization that expect a full client object.
  • In the /export transaction you replaced results = [] with result = [] but still appear to use results later (for pushes, res.send(results), and logging), which will either reference an undefined variable or leave the logged/sent value empty; align the variable name consistently (result vs results).
  • Some of the new logging calls have inconsistencies or likely mistakes (e.g., /dep/graph/get logs httpVerb: "POST" for a GET route, and /update/size logs before client is set and uses extra: "undefined"), so it would be good to standardize these fields and ensure client and extra carry meaningful values across all handlers.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In the /update/batch handler you now set `client = req.queryParams.client` instead of using `g_lib.getUserFromClientID`, which changes `client` from a user object to a raw id string and will impact downstream permission checks and task initialization that expect a full client object.
- In the /export transaction you replaced `results = []` with `result = []` but still appear to use `results` later (for pushes, `res.send(results)`, and logging), which will either reference an undefined variable or leave the logged/sent value empty; align the variable name consistently (`result` vs `results`).
- Some of the new logging calls have inconsistencies or likely mistakes (e.g., /dep/graph/get logs `httpVerb: "POST"` for a GET route, and /update/size logs before `client` is set and uses `extra: "undefined"`), so it would be good to standardize these fields and ensure `client` and `extra` carry meaningful values across all handlers.

## Individual Comments

### Comment 1
<location> `core/database/foxx/api/data_router.js:1279-1288` </location>
<code_context>

 router
     .get("/view", function (req, res) {
+        let results = null;
+        let client = null;
         try {
</code_context>

<issue_to_address>
**issue (bug_risk):** The /view handler logs `results`, but that variable is never populated and the actual response uses `data`.

In this route, `results` is never updated, so the success log will always show `extra: null` instead of the actual payload. To log the returned data, either assign `results = [data];` before `res.send`, or update the log call to use the real payload, e.g. `extra: { results: [data] }` or `extra: data` directly.
</issue_to_address>

### Comment 2
<location> `core/database/foxx/api/data_router.js:264-273` </location>
<code_context>
             try {
-                var result = {
+                client = g_lib.getUserFromClientID(req.queryParams.client);
+                logger.logRequestStarted({
+                    client: client,
+                    correlationId: req.headers["x-correlation-id"],
</code_context>

<issue_to_address>
**suggestion:** Logging for /dep/graph/get uses HTTP verb POST even though this is a GET route.

Please update the logger calls in this handler to use `httpVerb: "GET"` so the logs match the route definition (`router.get("/dep/graph/get", ...)`) and remain accurate for method-based filtering and monitoring.

Suggested implementation:

```javascript
            logger.logRequestStarted({
                client: client,
                correlationId: req.headers["x-correlation-id"],
                httpVerb: "GET",
                routePath: basePath + "/dep/graph/get",
                status: "Started",
                description: "Get dependency graph",
            });

```

```javascript
            logger.logRequestCompleted({
                client: client,
                correlationId: req.headers["x-correlation-id"],
                httpVerb: "GET",
                routePath: basePath + "/dep/graph/get",
                status: "Completed",
                description: "Get dependency graph",
            });

```

```javascript
            logger.logRequestFailed({
                client: client,
                correlationId: req.headers["x-correlation-id"],
                httpVerb: "GET",
                routePath: basePath + "/dep/graph/get",
                status: "Failed",
                description: "Get dependency graph",
                error: err && err.stack ? err.stack : err,
            });

```

- Ensure these logger calls are indeed within the `router.get("/dep/graph/get", ...)` handler.
- Do NOT change `httpVerb` for other routes, such as the `/create` POST route you showed; those should remain `"POST"`.
- If there are any other logger calls (e.g., `logRequestStarted`, `logRequestCompleted`, `logRequestFailed`) associated with `/dep/graph/get` that are not shown here, update their `httpVerb` field to `"GET"` in the same way.
</issue_to_address>

### Comment 3
<location> `core/database/foxx/api/data_router.js:30-36` </location>
<code_context>
        if (owner_id != client._id) {
            if (!g_lib.hasManagerPermProj(client, owner_id)) {
                var parent_coll = g_db.c.document(parent_id);
                if (!g_lib.hasPermissions(client, parent_coll, g_lib.PERM_CREATE)) {
                    throw error.ERR_PERM_DENIED;
                }
            }
        }

</code_context>

<issue_to_address>
**suggestion (code-quality):** Merge nested if conditions ([`merge-nested-ifs`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/JavaScript/Default-Rules/merge-nested-ifs))

```suggestion
        if (owner_id != client._id && !g_lib.hasManagerPermProj(client, owner_id)) {
              var parent_coll = g_db.c.document(parent_id);
              if (!g_lib.hasPermissions(client, parent_coll, g_lib.PERM_CREATE)) {
                  throw error.ERR_PERM_DENIED;
              }
        }

```

<br/><details><summary>Explanation</summary>Reading deeply nested conditional code is confusing, since you have to keep track of which
conditions relate to which levels. We therefore strive to reduce nesting where
possible, and the situation where two `if` conditions can be combined using
`and` is an easy win.
</details>
</issue_to_address>

### Comment 4
<location> `core/database/foxx/api/data_router.js:1304-1308` </location>
<code_context>

</code_context>

<issue_to_address>
**issue (code-quality):** Use `const` or `let` instead of `var`. ([`avoid-using-var`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/JavaScript/Default-Rules/avoid-using-var))

<details><summary>Explanation</summary>`const` is preferred as it ensures you cannot reassign references (which can lead to buggy and confusing code).
`let` may be used if you need to reassign references - it's preferred to `var` because it is block- rather than
function-scoped.

From the [Airbnb JavaScript Style Guide](https://airbnb.io/javascript/#references--prefer-const)
</details>
</issue_to_address>

### Comment 5
<location> `core/database/foxx/api/data_router.js:1802` </location>
<code_context>

</code_context>

<issue_to_address>
**issue (code-quality):** Use `const` or `let` instead of `var`. ([`avoid-using-var`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/JavaScript/Default-Rules/avoid-using-var))

<details><summary>Explanation</summary>`const` is preferred as it ensures you cannot reassign references (which can lead to buggy and confusing code).
`let` may be used if you need to reassign references - it's preferred to `var` because it is block- rather than
function-scoped.

From the [Airbnb JavaScript Style Guide](https://airbnb.io/javascript/#references--prefer-const)
</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +1279 to +1288
let results = null;
let client = null;
try {
const client = g_lib.getUserFromClientID_noexcept(req.queryParams.client);
client = g_lib.getUserFromClientID_noexcept(req.queryParams.client);
logger.logRequestStarted({
client: client,
correlationId: req.headers["x-correlation-id"],
httpVerb: "GET",
routePath: basePath + "/view",
status: "Started",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): The /view handler logs results, but that variable is never populated and the actual response uses data.

In this route, results is never updated, so the success log will always show extra: null instead of the actual payload. To log the returned data, either assign results = [data]; before res.send, or update the log call to use the real payload, e.g. extra: { results: [data] } or extra: data directly.

Comment on lines +264 to 273
logger.logRequestStarted({
client: client,
correlationId: req.headers["x-correlation-id"],
httpVerb: "POST",
routePath: basePath + "/create",
status: "Started",
description: "Create a new data record",
});
result = {
results: [],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Logging for /dep/graph/get uses HTTP verb POST even though this is a GET route.

Please update the logger calls in this handler to use httpVerb: "GET" so the logs match the route definition (router.get("/dep/graph/get", ...)) and remain accurate for method-based filtering and monitoring.

Suggested implementation:

            logger.logRequestStarted({
                client: client,
                correlationId: req.headers["x-correlation-id"],
                httpVerb: "GET",
                routePath: basePath + "/dep/graph/get",
                status: "Started",
                description: "Get dependency graph",
            });
            logger.logRequestCompleted({
                client: client,
                correlationId: req.headers["x-correlation-id"],
                httpVerb: "GET",
                routePath: basePath + "/dep/graph/get",
                status: "Completed",
                description: "Get dependency graph",
            });
            logger.logRequestFailed({
                client: client,
                correlationId: req.headers["x-correlation-id"],
                httpVerb: "GET",
                routePath: basePath + "/dep/graph/get",
                status: "Failed",
                description: "Get dependency graph",
                error: err && err.stack ? err.stack : err,
            });
  • Ensure these logger calls are indeed within the router.get("/dep/graph/get", ...) handler.
  • Do NOT change httpVerb for other routes, such as the /create POST route you showed; those should remain "POST".
  • If there are any other logger calls (e.g., logRequestStarted, logRequestCompleted, logRequestFailed) associated with /dep/graph/get that are not shown here, update their httpVerb field to "GET" in the same way.

@megatnt1122 megatnt1122 self-assigned this Nov 25, 2025
@megatnt1122 megatnt1122 added Component: Database Relates to database microservice / data model Type: Refactor Imlplementation change, same functionality Priority: Low Lower priority work. labels Nov 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Component: Database Relates to database microservice / data model Priority: Low Lower priority work. Type: Refactor Imlplementation change, same functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants