[Feature] Add asyncio support #28

@izellevy

Is this your first time submitting a feature request?

  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing functionality

Describe the feature

Currently, the load_dataset, list_datasets and to_pinecone_index functions are synchronous. They can be long-running, so calling them from an asyncio application blocks the event loop until they return. The goal is to add async equivalents of these functions.
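To make the problem concrete, here is a minimal sketch of a stopgap that works today: offloading a blocking call to a worker thread with `asyncio.to_thread` (Python 3.9+). `load_dataset_blocking` is a hypothetical stand-in for any of the slow synchronous functions, not the library's real implementation.

```python
import asyncio
import time


def load_dataset_blocking(name: str) -> dict:
    """Hypothetical stand-in for a slow, synchronous library call."""
    time.sleep(0.2)  # simulates a download
    return {"name": name, "rows": 3}


async def load_dataset_async(name: str) -> dict:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(load_dataset_blocking, name)


async def heartbeat(ticks: list) -> None:
    # Other coroutines keep running while the download is in progress.
    for _ in range(3):
        await asyncio.sleep(0.05)
        ticks.append(time.monotonic())


async def main() -> tuple:
    ticks: list = []
    dataset, _ = await asyncio.gather(load_dataset_async("demo"), heartbeat(ticks))
    return dataset, len(ticks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

A thread wrapper like this only hides the blocking call; native async versions would avoid the thread overhead, which is what this issue asks for.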

list_datasets: gcsfs and s3fs both support async operation, so adding an async equivalent should be relatively easy.

to_pinecone_index: This might require Pinecone Client 3.0, so we might need to wait until it is stable.

load_dataset: The behavior here needs to improve. Currently load_dataset does not actually load the dataset; it only creates a Dataset object, which might confuse users. Long-running tasks should be visible to the user, and the download should be explicit. (Currently the download happens lazily, on property access to queries/documents or when calling the head function.)
See https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html#sklearn.datasets.load_iris as an example.

To change this, I suggest renaming load_dataset to get_dataset_loader (or another name) and adding two async methods to fetch the data: dataset_loader.load_documents and dataset_loader.load_queries. In that case we would need to deprecate load_dataset, but keep it for several versions with a DeprecationWarning. Some refactoring might also be needed.
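A rough sketch of what the proposed shape could look like. Only the names get_dataset_loader, load_documents, load_queries and load_dataset come from the proposal above; the `DatasetLoader` class, its `_fetch` helper, and the placeholder data are assumptions for illustration.

```python
import asyncio
import warnings
from dataclasses import dataclass


@dataclass
class DatasetLoader:
    """Hypothetical loader: cheap to construct, downloads only when asked."""

    name: str

    async def load_documents(self) -> list:
        # Explicit, awaitable download of the documents.
        return await asyncio.to_thread(self._fetch, "documents")

    async def load_queries(self) -> list:
        # Explicit, awaitable download of the queries.
        return await asyncio.to_thread(self._fetch, "queries")

    def _fetch(self, part: str) -> list:
        # Placeholder for the real blocking download (assumption).
        return [f"{self.name}/{part}/{i}" for i in range(2)]


def get_dataset_loader(name: str) -> DatasetLoader:
    # No I/O happens here; the name makes that clear to the caller.
    return DatasetLoader(name)


def load_dataset(name: str) -> DatasetLoader:
    # Deprecated entry point, kept for several versions.
    warnings.warn(
        "load_dataset is deprecated; use get_dataset_loader instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return get_dataset_loader(name)
```

For simplicity the deprecated shim returns the new loader here. A real shim would probably have to keep returning the existing Dataset object so that current callers do not break.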

Describe alternatives you've considered

We can keep the API as is, but as asyncio is becoming more and more popular, I think it is a good idea to catch up.

Who will this benefit?

to_pinecone_index_async will be especially useful for big bulk upserts. The other changes will improve the user experience.
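As an illustration of why an async version helps with bulk upserts, the sketch below sends batches concurrently with a bounded number of requests in flight. `upsert_batch` is a hypothetical coroutine supplied by the caller, not a Pinecone client API, and `fake_upsert_batch` only simulates network latency.

```python
import asyncio
from typing import Awaitable, Callable, Iterator, Sequence


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    # Split the input into consecutive batches of at most `size` items.
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def upsert_all(
    vectors: Sequence,
    upsert_batch: Callable[[Sequence], Awaitable[int]],
    batch_size: int = 100,
    max_concurrency: int = 4,
) -> int:
    # The semaphore caps how many batches are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def send(batch: Sequence) -> int:
        async with semaphore:
            return await upsert_batch(batch)

    counts = await asyncio.gather(*(send(b) for b in chunked(vectors, batch_size)))
    return sum(counts)


async def fake_upsert_batch(batch: Sequence) -> int:
    # Hypothetical stand-in for a network call; returns the upserted count.
    await asyncio.sleep(0.01)
    return len(batch)


if __name__ == "__main__":
    total = asyncio.run(upsert_all(list(range(250)), fake_upsert_batch))
    print(total)
```

The concurrency cap is a design choice: unbounded `gather` over many batches could overwhelm the server or hit rate limits.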

Are you interested in contributing this feature?

Sure, I think we need to have a discussion first and plan the changes properly.

Anything else?

No response

Labels: enhancement (New feature or request)