Reducing complexity of implementation in order to be able to add Atlas text search token based pagination #1046
kbuma wants to merge 13 commits into materialsproject:main from
Conversation
tsmathis left a comment
Not really much to say on my end; I am curious, though, about the performance/execution time of this implementation vs. the parallel approach.
mp_api/client/core/client.py
Outdated
r -= 1
except MPRestError as e:
    # If we get 422 or 414 error, or 0 results for comma-separated params, split into batches
    if "422" in str(e) or "414" in str(e) or "Got 0 results" in str(e):
any(trace in str(e) for trace in ("422", "414", "Got 0 results"))
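The reviewer's suggestion collapses the three chained substring checks into a single `any()` call. A minimal sketch of the equivalent condition; `should_split` is a hypothetical helper used here for illustration, not part of the actual client, and the error markers ("422", "414", "Got 0 results") are taken from the diff:

```python
def should_split(error: Exception) -> bool:
    """True when the error message suggests retrying the request in smaller batches."""
    # Hypothetical helper: same markers as the diff's chained `or` checks,
    # rewritten as a single any() over a tuple of substrings.
    msg = str(error)
    return any(trace in msg for trace in ("422", "414", "Got 0 results"))
```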
# Batch the split values to reduce number of requests
# Use batches of up to 100 values to balance URL length and request count
batch_size = min(100, max(1, len(split_values) // 10))
Should the batch size be chosen according to the limits we (may) impose on a Query? Or alternatively, should there be a check on the length of a batch after fixing the batch size? That way excessively long queries get rejected (e.g., I query for 1M task IDs, 100 batches would still give me an overly-long list of task IDs)
That would change the existing user interface. Trying to avoid doing that as part of this refactor.
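For reference, a sketch of the batch-size heuristic from the diff combined with the kind of length guard the reviewer proposed; `make_batches` and `MAX_VALUES_PER_QUERY` are hypothetical names chosen for this sketch, and the cap value is an assumed placeholder, not an actual API limit:

```python
MAX_VALUES_PER_QUERY = 100_000  # hypothetical cap, not a real client setting


def make_batches(split_values: list[str]) -> list[list[str]]:
    """Split values into request batches, rejecting excessively long queries."""
    # Reviewer's suggested guard: reject the query up front instead of
    # issuing an overly long series of requests.
    if len(split_values) > MAX_VALUES_PER_QUERY:
        raise ValueError(
            f"Query lists more than {MAX_VALUES_PER_QUERY} values; narrow the query."
        )
    # Heuristic from the diff: up to 100 values per batch, targeting
    # roughly 10 batches for small inputs.
    batch_size = min(100, max(1, len(split_values) // 10))
    return [
        split_values[i : i + batch_size]
        for i in range(0, len(split_values), batch_size)
    ]
```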
Force-pushed from 77913b3 to 28e74e1
…d flatten once with chain.from_iterable
@esoteric-ephemera @tsmathis I've addressed all outstanding comments. Ready for re-review.
Nothing else to add from my end
…ion with append + chain.from_iterable
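The commit message refers to collecting per-batch result lists with `append` and then flattening once with `itertools.chain.from_iterable`, rather than concatenating lists inside the loop. A minimal sketch of that pattern; `fetch_batch` and `gather` are hypothetical stand-ins for the client's per-batch request loop:

```python
from itertools import chain


def fetch_batch(batch):
    # Placeholder for a real per-batch API request in this sketch.
    return [v * 2 for v in batch]


def gather(batches):
    """Collect per-batch results, then flatten once at the end."""
    per_batch = []
    for batch in batches:
        per_batch.append(fetch_batch(batch))
    # One flattening pass instead of repeated `results += batch_results`,
    # which rebuilds or grows the accumulator on every iteration.
    return list(chain.from_iterable(per_batch))
```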
Summary
Major changes: