Allow users to generate a queryID for large queries #292

Description

@mjwestgate

Currently, there is no way to run very large queries, such as those that require complex spatial polygons or involve many species. There is a logical solution here (already implemented in galah-Python, I believe): upload the large query to the ALA using this API, then use the returned queryID for later downloads and queries. Syntax could look something like this:

# cache a query on ALA
# noting that `copy_to()` is a dplyr generic, but isn't in galah yet
query_id <- galah_call() |>
  filter(taxonConceptID %in% vector_of_many_species) |>
  copy_to("data/query_id")   # I'm guessing at the syntax for now, but this could use the same string as `url_lookup()`?

# call that query ID to get counts for many species
result <- galah_call() |>
  filter(qid == query_id) |>
  distinct(speciesID, .keep_all = FALSE) |>
  select(count) |> # should really be `dplyr::add_count()`, but again not implemented yet
  collect()

As a side note, this would also help us close issue #53.
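
To make the upload step a little more concrete, here is a rough sketch in plain R of what could happen under the hood. None of it is existing galah code: `build_taxon_filter()`, `needs_caching()` and `cache_query()` are hypothetical helpers, the 2000-character threshold is an arbitrary conservative guess at a safe URL length, and the `endpoint` must be supplied by the caller because the API URL is not spelled out here. The assumption that the service returns the queryID as plain text in the response body is also unverified.

```r
# Hypothetical helper: collapse many taxon IDs into a single
# Solr-style filter string, e.g. taxonConceptID:("a" OR "b").
build_taxon_filter <- function(ids, field = "taxonConceptID") {
  quoted <- paste0('"', ids, '"')
  paste0(field, ":(", paste(quoted, collapse = " OR "), ")")
}

# Hypothetical helper: decide whether a query is too long to send
# as a GET URL. The threshold is an assumption, not an ALA limit.
needs_caching <- function(query_string, max_chars = 2000L) {
  encoded <- utils::URLencode(query_string, reserved = TRUE)
  nchar(encoded) > max_chars
}

# Hypothetical helper: POST the query to a caching endpoint and
# return the ID. `endpoint` has no default on purpose; the response
# is ASSUMED to be a plain-text ID.
cache_query <- function(query_string, endpoint) {
  if (!requireNamespace("httr2", quietly = TRUE)) {
    stop("This sketch needs the httr2 package for the POST request.")
  }
  resp <- httr2::request(endpoint) |>
    httr2::req_body_form(q = query_string) |>
    httr2::req_perform()
  trimws(httr2::resp_body_string(resp))
}

# Offline demonstration: 500 placeholder IDs produce a filter that
# is too long for a URL, while a single ID does not.
many_ids <- paste0("id_", seq_len(500))
long_query <- build_taxon_filter(many_ids)
short_query <- build_taxon_filter(many_ids[1])
print(needs_caching(long_query))
print(needs_caching(short_query))
```

In this sketch only queries that fail the length check would need `cache_query()`; shorter queries could keep using the existing GET-based path unchanged.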

Metadata
Labels

enhancement: New feature or request