Use the outerbounds-project skill.

The goal is to build a project that continuously fetches company information from Snowflake, fetches each company's website, interprets it using a local LLM, and enriches the company information with tags generated by the LLM. The results can be explored with a custom UI, deployed on Outerbounds.

The project has the following components:

  1. An ETL flow that fetches data from Snowflake
  • Refer to example_data.py for a sample
  • Only include rows with a valid website URL
  • Data is processed in batches of at most 100 rows
  • Store the IDs of rows that were processed; the next time the flow executes, fetch the next batch
  • Store the processing state in an artifact, and use the Metaflow client to retrieve the state
  • Include an option for resetting state
  • Schedule the flow to run hourly
  2. A flow that enriches rows
  • Triggered by the ETL flow using events
  • Process a batch of rows in parallel using foreach, with 10 parallel tasks
  • Request the landing page of the company website
  • Use a small local LLM to extract 5 descriptive tags for each company
  • Include a card showing what was processed (successfully or not), and the tags produced
  • Fetch the results artifact from the previous run of the flow and merge it with the new results
  3. An interactive UI for exploring companies and tags
  • Using the results produced by (2)
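
The batching, fan-out, and merge rules listed above can be sketched in plain Python, independent of Metaflow and Snowflake. This is a minimal illustration, not the intended implementation. The row shape (dicts with `id` and `website` keys) and all function names are assumptions made for the example; the actual schema is whatever example_data.py defines. In the real flows, the processed-ID set and the merged results would be stored as artifacts.

```python
from urllib.parse import urlparse

BATCH_SIZE = 100  # "batches of at most 100 rows"
NUM_TASKS = 10    # "10 parallel tasks"


def has_valid_url(row):
    """Accept only rows whose website parses as an http(s) URL with a host.

    The "website" key is a hypothetical column name.
    """
    url = (row.get("website") or "").strip()
    try:
        parsed = urlparse(url)
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def next_batch(rows, processed_ids, batch_size=BATCH_SIZE):
    """Pick up to batch_size unprocessed rows that have a valid URL.

    Returns the batch and the updated set of processed IDs, which is the
    state an ETL run would persist for the next run.
    """
    batch = []
    for row in rows:
        if len(batch) == batch_size:
            break
        if row["id"] not in processed_ids and has_valid_url(row):
            batch.append(row)
    return batch, processed_ids | {row["id"] for row in batch}


def split_for_foreach(batch, num_tasks=NUM_TASKS):
    """Split a batch into at most num_tasks round-robin shards."""
    shards = [batch[i::num_tasks] for i in range(num_tasks)]
    return [shard for shard in shards if shard]


def merge_results(previous, new):
    """Merge tag results keyed by company ID; new results win on conflict."""
    merged = dict(previous)
    merged.update(new)
    return merged


# Usage with made-up rows; an empty set is also what a state reset restores.
rows = [{"id": i, "website": f"https://example{i}.com"} for i in range(250)]
rows.append({"id": 999, "website": "not a url"})
state = set()
batch, state = next_batch(rows, state)
shards = split_for_foreach(batch)
```

Keeping the state as a set of processed IDs makes the reset option trivial: a reset run stores an empty set. Round-robin sharding keeps the 10 foreach tasks evenly loaded even when a batch holds fewer than 100 rows.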