
Reducing complexity of implementation in order to be able to add Atlas text search token based pagination #1046

Open

kbuma wants to merge 13 commits into materialsproject:main from kbuma:search-pagination

Conversation

kbuma (Collaborator) commented Dec 19, 2025

Summary

Major changes:

  • remove parallelization of server requests
  • re-implement handling of list criteria for parameters in endpoints that do not accept lists (this was tied into the parallelization code previously)
  • re-implement handling of list criteria that is too large for a single request (this was also tied into the parallelization code previously; a sketch of the serial batching fallback follows this list)
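A minimal sketch of what that serial batching fallback can look like. The names here (`split_values`, `base_criteria`, `material_ids`, `_submit_request`) are illustrative assumptions, not the actual mp_api client internals:

```python
# Hypothetical sketch of serially batched requests replacing the old
# parallel fan-out; all names below are illustrative, not the real client.
batch_size = 100  # assumed cap balancing URL length vs. request count
results = []
for start in range(0, len(split_values), batch_size):
    batch = split_values[start : start + batch_size]
    # Re-join the batch into the comma-separated form the endpoint expects
    criteria = {**base_criteria, "material_ids": ",".join(batch)}
    results.extend(_submit_request(criteria))
```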

tsmathis (Collaborator) left a comment:

Not really much to say on my end; I am curious, though, about the performance/execution time of this implementation vs. the parallel approach.

```python
    r -= 1
except MPRestError as e:
    # If we get a 422 or 414 error, or 0 results for comma-separated params,
    # split into batches
    if "422" in str(e) or "414" in str(e) or "Got 0 results" in str(e):
```
A collaborator suggested simplifying the condition:

```python
any(trace in str(e) for trace in ("422", "414", "Got 0 results"))
```
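Equivalently, the check could live in a small helper so the `except` clause stays short. This is a sketch of the suggestion, not the merged code, and `_should_batch` is a hypothetical name:

```python
def _should_batch(e: Exception) -> bool:
    """True when the error indicates the request should be split into batches."""
    return any(trace in str(e) for trace in ("422", "414", "Got 0 results"))
```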

kbuma (Author) replied:

addressed with 18b0b79

```python
]
# Batch the split values to reduce number of requests
# Use batches of up to 100 values to balance URL length and request count
batch_size = min(100, max(1, len(split_values) // 10))
```
A collaborator commented:

Should the batch size be chosen according to the limits we (may) impose on a Query? Or, alternatively, should there be a check on the length of a batch after fixing the batch size? That way, excessively long queries get rejected (e.g., if I query for 1M task IDs, 100 batches would still give me an overly long list of task IDs).
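A sketch of the kind of up-front guard being suggested here. Neither the cap value nor the names (`MAX_CRITERIA_VALUES`, `make_batches`) come from the PR; they are assumptions for illustration:

```python
MAX_CRITERIA_VALUES = 10_000  # hypothetical cap on total values per query

def make_batches(split_values: list[str], batch_size: int) -> list[list[str]]:
    """Split values into batches, rejecting excessively long queries up front."""
    if len(split_values) > MAX_CRITERIA_VALUES:
        raise ValueError(
            f"Query lists {len(split_values)} values; "
            f"at most {MAX_CRITERIA_VALUES} are allowed."
        )
    return [
        split_values[i : i + batch_size]
        for i in range(0, len(split_values), batch_size)
    ]
```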

kbuma (Author) replied:

That would change the existing user interface; I'm trying to avoid doing that as part of this refactor.

kbuma (Author) commented Feb 6, 2026

@esoteric-ephemera @tsmathis I've addressed all outstanding comments. Ready for re-review.

A comment from tsmathis was marked as resolved.

tsmathis (Collaborator) commented Feb 7, 2026

Nothing else to add from my end

kbuma (Author) commented Feb 7, 2026

> still a couple of .extend(...)s
>
> this one should be fine, it's just the final step, right? https://github.com/kbuma/api/blob/858accf6b00d90055eb21d642e0ac4eabf8ba921/mp_api/client/core/client.py#L732
>
> this one is in a while loop: https://github.com/kbuma/api/blob/858accf6b00d90055eb21d642e0ac4eabf8ba921/mp_api/client/core/client.py#L806

@tsmathis addressed with fc25c1e
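For context on the while-loop concern, a hypothetical sketch of draining a token-paginated endpoint by collecting pages and flattening once at the end, rather than calling `.extend` on every iteration. `fetch_page` and the token shape are assumptions, not the actual fc25c1e change:

```python
from itertools import chain

def fetch_all(fetch_page):
    """Drain a token-paginated endpoint; `fetch_page` is a hypothetical
    callable returning (list_of_docs, next_token_or_None)."""
    pages = []
    token = None
    while True:
        docs, token = fetch_page(token)
        pages.append(docs)
        if token is None:
            break
    # Flatten once at the end instead of extending inside the loop
    return list(chain.from_iterable(pages))
```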

