
internal tagging #377

Open
rguo123 wants to merge 1 commit into main from 03-05-internal_tagging

Conversation


@rguo123 rguo123 commented Mar 5, 2025


Important

Adds tag completion waiting mechanism to AtlasMapTags and updates version to 3.4.2.

  • Behavior:
    • Adds wait_time parameter to df and get_tags in AtlasMapTags to specify max wait time for tag completion.
    • Implements is_tag_complete in AtlasMapTags to check tag completion status.
    • Uses /v1/project/projection/tags/robotag endpoint to ensure tag processing.
    • Logs a warning if a tag is unavailable after waiting.
  • Misc:
    • Updates version in setup.py from 3.4.1 to 3.4.2.

This description was created by Ellipsis for 075d630. It will automatically update as commits are pushed.

rguo123 commented Mar 5, 2025


This stack of pull requests is managed by Graphite. Learn more about stacking.

@rguo123 rguo123 marked this pull request as ready for review March 10, 2025 19:43
@rguo123 rguo123 force-pushed the 03-05-internal_tagging branch from c10ace8 to c122f10 Compare March 10, 2025 19:44
@rguo123 rguo123 requested a review from mcembalest March 10, 2025 19:46
Comment thread nomic/data_operations.py
"project_id": self.dataset.id,
"tag_id": tag_id,
},
).json()["is_complete"]


Consider adding error handling for the GET request in is_tag_complete. Currently, it directly calls .json() without checking status codes.
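One way to act on this comment is sketched below. It is not the PR's code: the status endpoint URL is not visible in the excerpt, so it is passed in as `url`, and sending `project_id` and `tag_id` as query parameters is an assumption. The `http` argument is a hypothetical injection point for anything with a requests-style `get`, such as the `requests` module itself or a `requests.Session`.

```python
from typing import Any, Dict


def is_tag_complete(
    http: Any,
    url: str,
    headers: Dict[str, str],
    project_id: str,
    tag_id: str,
) -> bool:
    """Return True if the server reports the tag as complete.

    `http` is any object with a requests-style `get` method.
    `url` is the status endpoint, which the excerpt does not show.
    """
    response = http.get(
        url,
        headers=headers,
        # Assumption: the identifiers travel as query parameters.
        params={"project_id": project_id, "tag_id": tag_id},
        timeout=30,
    )
    # Raise on 4xx/5xx instead of parsing an error body as JSON.
    response.raise_for_status()
    payload = response.json()
    if "is_complete" not in payload:
        raise ValueError(f"Unexpected response for tag {tag_id}: missing 'is_complete'")
    return bool(payload["is_complete"])
```

Checking the status first means a failed request surfaces as an HTTP error rather than as a `KeyError` on `"is_complete"`.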

Comment thread nomic/data_operations.py
wait_start = time.time()
# Wait up to 5 minutes for tag to be completed
while not is_complete:
# Sleep 5 seconds


The comment mentions sleeping 5 seconds, but the code uses time.sleep(15). Please adjust the comment or the sleep duration for consistency.

Suggested change
- # Sleep 5 seconds
+ # Sleep 15 seconds

Comment thread nomic/data_operations.py

@property
- def df(self, overwrite: bool = False) -> pd.DataFrame:
+ def df(self, overwrite: bool = False, wait_time: int = 120) -> pd.DataFrame:


Avoid using a @property with parameters. Consider converting df to a regular method so that parameters like overwrite and wait_time can be explicitly passed.
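The reason this matters: a property getter is always called with `self` only, so `overwrite` and `wait_time` can never be anything but their defaults, and writing `tags.df(wait_time=10)` would try to call the returned DataFrame. A minimal sketch of one possible split follows. `TagsSketch`, `get_df`, and `_load` are hypothetical names, and a plain dict stands in for the DataFrame so the example runs without pandas.

```python
class TagsSketch:
    """Hypothetical stand-in for AtlasMapTags; only the access pattern matters."""

    def __init__(self) -> None:
        self._cache = None
        self.loads = 0  # counts how often the expensive load runs

    def _load(self, wait_time: int) -> dict:
        # Placeholder for the real download; returns a dict, not a DataFrame.
        self.loads += 1
        return {"wait_time": wait_time}

    def get_df(self, overwrite: bool = False, wait_time: int = 120) -> dict:
        """Regular method: callers can pass overwrite and wait_time explicitly."""
        if overwrite or self._cache is None:
            self._cache = self._load(wait_time)
        return self._cache

    @property
    def df(self) -> dict:
        """Parameterless property kept for existing callers; uses the defaults."""
        return self.get_df()
```

With this shape, `tags.df` keeps working for existing code, while `tags.get_df(overwrite=True, wait_time=300)` covers callers that need to tune the wait.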

Comment thread nomic/data_operations.py Outdated
keep_tags.append(tag)
else:
# Use robotag route instead of v1/n so we guarantee only one request gets launched
requests.post(self.dataset.atlas_api_path + "/v1/project/projection/tags/robotag", headers=self.dataset.header,


The POST request to the /v1/project/projection/tags/robotag endpoint lacks error handling. Consider checking the response status and handling failures appropriately.
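A possible shape for that check is sketched here, under stated assumptions. The route string comes from the diff, but the request body is truncated in the excerpt, so it is passed in as an opaque `payload`, and sending it as JSON is a guess. `launch_robotag` and the injected `http` object (anything with a requests-style `post`) are hypothetical names.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)

# Route taken from the diff under review.
ROBOTAG_ROUTE = "/v1/project/projection/tags/robotag"


def launch_robotag(
    http: Any,
    api_path: str,
    headers: Dict[str, str],
    payload: Dict[str, Any],
) -> bool:
    """POST to the robotag route and report whether the server accepted it."""
    response = http.post(
        api_path + ROBOTAG_ROUTE,
        headers=headers,
        json=payload,  # Assumption: the body is sent as JSON.
        timeout=30,
    )
    if not response.ok:
        # Warn and let the caller decide; there is no point polling for
        # completion of a tag whose processing never started.
        logger.warning("robotag request failed with status %s", response.status_code)
        return False
    return True
```

Returning a boolean lets the caller skip the wait loop for a tag whose request was rejected, instead of polling until the timeout.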

@rguo123 rguo123 force-pushed the 03-05-internal_tagging branch from c122f10 to 075d630 Compare March 10, 2025 19:49
Comment thread nomic/data_operations.py

def get_tags(self) -> List[Dict[str, str]]:
def is_tag_complete(self, tag_id) -> bool:
is_complete = requests.get(


Consider adding error handling (e.g. response.raise_for_status) in is_tag_complete to gracefully handle HTTP errors.

Comment thread nomic/data_operations.py
keep_tags.append(tag)
else:
# Use robotag route instead of v1/n so we guarantee only one request gets launched
requests.post(


Consider adding error handling for the robotag POST request to ensure it succeeds before proceeding.

Comment thread nomic/data_operations.py
)
wait_start = time.time()
# Wait up to 5 minutes for tag to be completed
while not is_complete:


There is an inconsistency in the get_tags method in the AtlasMapTags class. The inline comment mentions sleeping for 5 seconds and waiting up to 5 minutes for a tag to be completed, but the code actually calls time.sleep(15) and uses a default wait_time of 120 seconds. Please update either the code or the comments so that they both reflect the intended behavior.
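One way to keep comments and behavior from drifting apart is to make both durations named parameters and describe them in the docstring only. The sketch below is not the PR's loop: `wait_for_tag`, `poll_interval`, and the injectable `sleep` and `clock` arguments are hypothetical, while the defaults of 120 and 15 seconds mirror the values the comment above reports from the code.

```python
import time
from typing import Callable


def wait_for_tag(
    is_complete: Callable[[], bool],
    wait_time: float = 120,
    poll_interval: float = 15,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_complete() every poll_interval seconds for up to wait_time seconds.

    Returns True as soon as the tag is complete, False if the deadline passes.
    """
    deadline = clock() + wait_time
    while True:
        if is_complete():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Sleep for one interval, but never past the deadline.
        sleep(min(poll_interval, remaining))
```

A caller could then log the warning described in the PR summary whenever this returns `False`. Passing `sleep` and `clock` in makes the loop testable without real delays.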
