Use the outerbounds-project skill.
The goal is to build a project that continuously fetches company information from Snowflake, fetches each company's website, interprets it using a local LLM, and enriches the company information with tags generated by the LLM. The results can be explored with a custom UI, deployed on Outerbounds.
The project has the following components:
- An ETL flow that fetches data from Snowflake
- Refer to example_data.py for a sample
- Only include rows with a valid website URL
- Data is processed in batches of at most 100 rows
- Store the IDs of rows that were processed; the next time the flow executes, fetch the next batch
- Store the state of processing in an artifact, use Metaflow client to retrieve the state
- Include an option for resetting state
- Schedule the flow to run hourly
- A flow that enriches rows
- Triggered by the ETL flow using events
- Process a batch of rows in parallel using foreach, 10 parallel tasks
- Request the landing page of the company website
- Use a small local LLM to extract 5 descriptive tags for each company
- Include a card showing what was processed (successfully or not), and the tags produced
- Fetch an artifact from the previous run of the flow, merge it with the new results
- An interactive UI for exploring companies and tags
- Using the results produced by the enrichment flow (2)
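
For reference, the batching and state rules above could be sketched roughly as follows. This is a minimal illustration, not a required implementation: the row fields `id` and `website`, the http(s)-only validity rule, and the helper names `has_valid_url`, `next_batch`, and `merge_tags` are all assumptions that do not come from example_data.py. In the real flows, `processed_ids` and the tag dictionary would be stored as artifacts and read back with the Metaflow client.

```python
from urllib.parse import urlparse

BATCH_SIZE = 100  # "at most 100 rows" per the spec above


def has_valid_url(row):
    """Return True if the row's website looks like an http(s) URL (assumed rule)."""
    url = (row.get("website") or "").strip()
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def next_batch(rows, processed_ids, reset=False, batch_size=BATCH_SIZE):
    """Pick the next unprocessed rows; return (batch, updated processed IDs).

    `reset=True` ignores the stored state and starts from the beginning.
    """
    seen = set() if reset else set(processed_ids)
    candidates = [r for r in rows if r["id"] not in seen and has_valid_url(r)]
    batch = candidates[:batch_size]
    return batch, seen | {r["id"] for r in batch}


def merge_tags(previous, new):
    """Merge tags from an earlier run with new results; newer entries win."""
    merged = dict(previous)
    merged.update(new)
    return merged


if __name__ == "__main__":
    # Hypothetical rows standing in for the Snowflake query result.
    rows = [{"id": i, "website": f"https://example.com/{i}"} for i in range(250)]
    batch, state = next_batch(rows, processed_ids=set())
    print(len(batch), len(state))
```

Keeping the selection logic in plain functions like these makes it easy to test outside a flow; the flow step would only load the previous state, call the helper, and save the updated state as an artifact.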