Technical Lesson — Building a Semantic Retrieval Workflow

Introduction

A semantic retrieval workflow turns embeddings into useful search results. In this lesson, you will build the backend sequence that takes stored documents, embeds them, embeds a user query, compares similarity scores, ranks the results, and returns the top matches.

You will write a Python script for a developer documentation scenario by following the Identify → Assemble → Execute → Verify process. Along the way, you will work with document metadata, query processing, cosine similarity, top-k ranking, and relevance checks.

Scenario

You are a junior backend developer on a platform team. The team maintains internal developer documentation, but users often search with everyday language instead of exact article titles.

A developer asks:

“Why does the mobile app say my token is expired?”

The documentation site contains several articles about authentication, API keys, billing, and dashboard performance. Your task is to build a small semantic retrieval workflow that returns the most relevant documentation articles first.

Tools and Resources

Python 3.10 or newer
Visual Studio Code or another code editor
Terminal or integrated terminal
pipenv
Ollama installed and running locally
An embedding model, such as embeddinggemma
Python package: ollama

Set up the embedding model:

ollama pull embeddinggemma
ollama run embeddinggemma "Hello world"

Install the project dependencies listed in the Pipfile, then enter the virtual environment:

pipenv install
pipenv shell

To run the script later from inside the pipenv shell, use:

python semantic_retrieval_lesson.py

Instructions

Follow the technical process: Identify → Assemble → Execute → Verify.

Step 1: Identify the retrieval goal and output contract

First, you will define what the retrieval workflow should solve and what information it should return.

Action

Create a short planning note that identifies the user need, searchable content, expected top result, and returned output.

Breakdown

Start by identifying the user role:

User or role:
Developers searching internal platform documentation.

Add the business problem:

Business problem:
Developers lose time when natural-language questions do not match documentation titles.

Add the user query that the backend will process:

User query:
"Why does the mobile app say my token is expired?"

Add the content the backend can search:

Searchable content:
Developer documentation summaries about authentication, API keys, billing, and dashboard performance.

Add the result you expect to rank first:

Expected top result:
An article about refreshing expired API access tokens.

Add the information each returned result should include:

Returned output:
Top 3 ranked results with document ID, title, category, similarity score, and source text.

Add a verification goal:

Verification goal:
The top result should be about expired or invalid API access tokens, not billing or dashboard performance.

Your completed planning note should look like this:

User or role:
Developers searching internal platform documentation.

Business problem:
Developers lose time when natural-language questions do not match documentation titles.

User query:
"Why does the mobile app say my token is expired?"

Searchable content:
Developer documentation summaries about authentication, API keys, billing, and dashboard performance.

Expected top result:
An article about refreshing expired API access tokens.

Returned output:
Top 3 ranked results with document ID, title, category, similarity score, and source text.

Verification goal:
The top result should be about expired or invalid API access tokens, not billing or dashboard performance.

You should have a clear output contract for your search workflow.

The output contract helps you design the code. A retrieval workflow is not only about finding text. It also needs to return enough source information for the user, frontend, or future RAG system to trust the result.
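One way to make the output contract concrete is to write it down as a type. The sketch below is only an illustration: the name RetrievalResult and the helper is_complete() are hypothetical and are not part of the lesson script. The field names follow the returned output listed in the planning note.

```python
from typing import TypedDict


class RetrievalResult(TypedDict):
    """Hypothetical shape for one returned result (the name is illustrative)."""

    id: str
    title: str
    category: str
    score: float
    text: str


def is_complete(result: dict) -> bool:
    """Return True when a result carries every field named in the contract."""
    required = RetrievalResult.__annotations__.keys()
    return all(field in result for field in required)


example: RetrievalResult = {
    "id": "DEV-101",
    "title": "Refreshing Expired API Access Tokens",
    "category": "authentication",
    "score": 0.0,  # placeholder value, not a real similarity score
    "text": "Explains how to refresh expired API access tokens.",
}

print(is_complete(example))                             # True
print(is_complete({"text": "only the matching text"}))  # False
```

Writing the contract as a type is optional. Its main benefit is that a missing field shows up as a failed check rather than as a confusing result later.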

Step Hint

Do not return only the matching text. Include metadata such as document ID, title, category, and similarity score.

Step Feedback

This step is strong when it connects the retrieval workflow to a real user need and a clear returned output.

Step 2: Assemble the file, imports, model, and documents

Next, you will create the Python file, import the tools you need, choose one embedding model, set the number of results to return, and store searchable documents with traceable metadata.

Action

Create a Python file named semantic_retrieval_lesson.py and add the imports, model name, top-k value, and document list.

Breakdown

Create the file:

touch semantic_retrieval_lesson.py

Open semantic_retrieval_lesson.py.

At the top of the file, import the tools the script will use:

from math import sqrt
from typing import Any, Dict, List

import ollama

These imports are used for specific parts of the workflow:

sqrt calculates vector magnitudes for cosine similarity.
Any, Dict, and List are type hints that make the data structures easier to read as you store strings, scores, and vectors together.
ollama sends text to the local embedding model and receives vector outputs.

Create a reusable constant for the embedding model:

MODEL = "embeddinggemma"

This keeps the model choice in one place. The same model must be used for both document embeddings and query embeddings.

Create a reusable constant for the number of results to return:

TOP_K = 3

TOP_K limits the output to the strongest matches. Returning only a few ranked results helps reduce noise.

Create the document list:

DOCUMENTS: List[Dict[str, str]] = []

Add the first searchable document:

DOCUMENTS.append(
    {
        "id": "DEV-101",
        "title": "Refreshing Expired API Access Tokens",
        "category": "authentication",
        "text": (
            "Explains how to refresh expired API access tokens, check token lifetime, "
            "and retry a request with a valid bearer token."
        ),
    }
)

This document should be the strongest expected match for the user query about an expired token.

Add another authentication-related document:

DOCUMENTS.append(
    {
        "id": "DEV-102",
        "title": "Fixing Invalid Authorization Headers",
        "category": "authentication",
        "text": (
            "Shows how to format authorization headers, include bearer tokens, "
            "and troubleshoot rejected API requests caused by malformed headers."
        ),
    }
)

This result may be related, but it is not exactly the same as an expired token.

Add an API key onboarding document:

DOCUMENTS.append(
    {
        "id": "DEV-103",
        "title": "Creating a New Developer API Key",
        "category": "onboarding",
        "text": (
            "Guides a new developer through creating an API key, copying the key value, "
            "and storing credentials securely."
        ),
    }
)

This gives the workflow a partial match that is still about API access.

Add a billing document:

DOCUMENTS.append(
    {
        "id": "DEV-104",
        "title": "Understanding Dashboard Billing Limits",
        "category": "billing",
        "text": (
            "Explains plan limits, monthly usage caps, billing warnings, "
            "and how to upgrade an account."
        ),
    }
)

This gives the workflow an unrelated document to rank lower for authentication queries.

Add a frontend performance document:

DOCUMENTS.append(
    {
        "id": "DEV-105",
        "title": "Troubleshooting Slow Dashboard Pages",
        "category": "frontend",
        "text": (
            "Covers browser caching, loading states, client-side rendering delays, "
            "and slow dashboard performance."
        ),
    }
)

This gives the workflow another unrelated option so you can check whether the ranking makes sense.

Your file should now include:

imports,
one model constant,
one TOP_K constant,
and five searchable documents with IDs, titles, categories, and source text.

Metadata makes retrieval easier to inspect and trust. If you return a result without source details, users and developers cannot easily verify where the information came from.
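A quick check can confirm that every document carries the metadata before you embed anything. The sketch below is a minimal example under stated assumptions: REQUIRED_FIELDS and find_incomplete() are hypothetical names, and the DEV-999 record is invented to show a failing case.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("id", "title", "category", "text")  # hypothetical constant


def find_incomplete(documents: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each problem document to the required fields it lacks or leaves blank."""
    problems: Dict[str, List[str]] = {}
    for position, document in enumerate(documents):
        missing = [
            field for field in REQUIRED_FIELDS
            if not document.get(field, "").strip()
        ]
        if missing:
            # Fall back to the list position when the ID itself is missing.
            label = document.get("id") or f"document #{position}"
            problems[label] = missing
    return problems


sample_documents = [
    {
        "id": "DEV-101",
        "title": "Refreshing Expired API Access Tokens",
        "category": "authentication",
        "text": "Explains how to refresh expired API access tokens.",
    },
    # Invented record with a blank title, used only to show a failure.
    {"id": "DEV-999", "title": "", "category": "billing", "text": "Plan limits."},
]

print(find_incomplete(sample_documents))  # {'DEV-999': ['title']}
```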

Step Hint

Short summaries are fine for this first workflow. Later, longer documents may need to be split into chunks before embedding.
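If you later need to chunk longer documents, one simple approach is to split on words with a small overlap between neighboring chunks. The sketch below is an assumption, not part of this lesson's script: chunk_words() and its default sizes are hypothetical, and real projects often split on sentences or headings instead.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into word windows; consecutive windows share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []

    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(words):
            break

    return chunks


sample = " ".join(f"w{i}" for i in range(12))  # "w0 w1 ... w11"
for chunk in chunk_words(sample, chunk_size=5, overlap=2):
    print(chunk)
```

Each chunk would then be embedded as its own record, with the parent document ID kept alongside it so results stay traceable.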

Step Feedback

This step is strong when the documents are varied enough to test whether semantic retrieval separates related and unrelated content.

Step 3: Execute one embedding request

Next, you will create a helper function that sends one text input to the embedding model and returns one vector.

Action

Add a function named get_embedding() below the document records.

Breakdown

Start the helper function with a clear name, parameter, return type, and docstring:

def get_embedding(text: str) -> List[float]:
    """Return one embedding vector for one text input."""

Inside the function, send the text to the local Ollama embedding model:

    response = ollama.embed(model=MODEL, input=text)

This line tells Ollama which model to use and which text to convert into an embedding.

Return the first embedding vector from the response:

    return response["embeddings"][0]

The response stores embeddings in a list because the API can return embeddings for one input or multiple inputs. In this lesson, each call sends one text, so the vector you need is the first item.

Your completed helper function should look like this:

def get_embedding(text: str) -> List[float]:
    """Return one embedding vector for one text input."""
    response = ollama.embed(model=MODEL, input=text)
    return response["embeddings"][0]

You should now have a reusable function that accepts one text string and returns one list of numbers.

This helper function keeps embedding generation separate from the rest of the workflow. Instead of rewriting the Ollama call for every document and query, you can call get_embedding() whenever the backend needs a vector.
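To see why the helper reads index 0, it can help to look at the response shape without calling the model. The sketch below uses a plain dictionary as a stand-in for the response. It assumes the real response can be read with the same ["embeddings"] subscript used in get_embedding(). The function name first_embedding() and the numbers are made up for illustration.

```python
from typing import Any, List, Mapping


def first_embedding(response: Mapping[str, Any]) -> List[float]:
    """Return the first vector from an embed-style response, or raise if absent."""
    embeddings = response["embeddings"]
    if not embeddings:
        raise ValueError("response contained no embeddings")
    return list(embeddings[0])


# Stand-in for a real response: one input text produces a list holding one vector.
fake_response = {"embeddings": [[0.12, -0.03, 0.57]]}

vector = first_embedding(fake_response)
print(len(vector), vector[0])  # 3 0.12
```

A real embedding vector is much longer than three numbers, but the nesting is the same: a list of vectors, one per input.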

Step Hint

Make sure this function is not indented inside a document record. It should start at the left edge of the file.

Step Feedback

This step is strong when the function has one clear job: convert one text input into one embedding vector.

Step 4: Execute cosine similarity

Next, you will create a function that compares two embedding vectors and returns a similarity score.

Action

Add a function named cosine_similarity() below get_embedding().

Breakdown

Start the function with two vector parameters and a float return type:

def cosine_similarity(vector_a: List[float], vector_b: List[float]) -> float:
    """Compare two vectors by cosine similarity."""

Calculate the dot product:

    dot_product = sum(a * b for a, b in zip(vector_a, vector_b))

The dot product combines matching positions in both vectors. It is one part of measuring whether the vectors point in a similar direction.

Calculate the magnitude of the first vector:

    magnitude_a = sqrt(sum(a * a for a in vector_a))

Calculate the magnitude of the second vector:

    magnitude_b = sqrt(sum(b * b for b in vector_b))

Magnitude is the vector's length. Cosine similarity measures direction only, so the dot product is divided by both magnitudes to cancel out the effect of length.

Add a guard for zero-length vectors:

    if magnitude_a == 0 or magnitude_b == 0:
        return 0.0

This prevents division by zero if either vector has no magnitude.

Return the cosine similarity score:

    return dot_product / (magnitude_a * magnitude_b)

Your completed function should look like this:

def cosine_similarity(vector_a: List[float], vector_b: List[float]) -> float:
    """Compare two vectors by cosine similarity."""
    dot_product = sum(a * b for a, b in zip(vector_a, vector_b))
    magnitude_a = sqrt(sum(a * a for a in vector_a))
    magnitude_b = sqrt(sum(b * b for b in vector_b))

    if magnitude_a == 0 or magnitude_b == 0:
        return 0.0

    return dot_product / (magnitude_a * magnitude_b)

You should now have a function that can compare a query embedding with a document embedding.

Cosine similarity gives the backend a ranking signal. Higher scores usually mean the texts are closer in meaning, but you still need to inspect the returned source text.
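Before trusting the scores on real embeddings, you can check the function on small vectors where the answer is known. The sketch below repeats the lesson's cosine_similarity() so it runs on its own. The two-dimensional test vectors are hand-picked for illustration and are not real embeddings.

```python
from math import sqrt
from typing import List


def cosine_similarity(vector_a: List[float], vector_b: List[float]) -> float:
    """Same calculation as the lesson function, repeated so this runs alone."""
    dot_product = sum(a * b for a, b in zip(vector_a, vector_b))
    magnitude_a = sqrt(sum(a * a for a in vector_a))
    magnitude_b = sqrt(sum(b * b for b in vector_b))

    if magnitude_a == 0 or magnitude_b == 0:
        return 0.0

    return dot_product / (magnitude_a * magnitude_b)


# Same direction, different length: length does not change the score.
same_direction = cosine_similarity([3.0, 4.0], [6.0, 8.0])
# Perpendicular vectors share no direction.
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Opposite directions give the lowest possible score.
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
# The zero-length guard returns 0.0 instead of dividing by zero.
zero_vector = cosine_similarity([0.0, 0.0], [1.0, 2.0])

print(same_direction, perpendicular, opposite, zero_vector)  # 1.0 0.0 -1.0 0.0
```

Note that zip() stops at the shorter vector, so two vectors of different lengths would be compared silently. Using one embedding model for everything keeps the lengths equal.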

Step Hint

Do not compare embeddings from different models. Use the same model for the documents and the user query.

Step Feedback

This step is strong when the function returns one score and does not depend on a specific query or document.

Step 5: Execute document indexing

Next, you will embed each document and keep each embedding connected to its source metadata.

Action

Add a function named build_index() below cosine_similarity().

Breakdown

Start the function with one parameter for the document list:

def build_index(documents: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Embed each document and keep the embedding attached to source metadata."""

Create an empty list to store indexed documents:

    index: List[Dict[str, Any]] = []

Loop through each document:

    for document in documents:

Inside the loop, combine the title and document text:

        searchable_text = f"{document['title']}. {document['text']}"

The title often contains useful meaning. Combining the title and text gives the embedding model a stronger description of the document.

Create an embedding for the searchable text:

        embedding = get_embedding(searchable_text)

Store the original metadata and the new embedding together:

        index.append({**document, "embedding": embedding})

This keeps the document ID, title, category, source text, and vector connected.

After the loop, return the completed index:

    return index

Your completed function should look like this:

def build_index(documents: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Embed each document and keep the embedding attached to source metadata."""
    index: List[Dict[str, Any]] = []

    for document in documents:
        searchable_text = f"{document['title']}. {document['text']}"
        embedding = get_embedding(searchable_text)
        index.append({**document, "embedding": embedding})

    return index

You should now have a function that creates an in-memory index.

This step creates the searchable representation of your stored content. In a larger application, a vector database like Chroma could store these embeddings and metadata. Here, you keep them in memory so you can see the retrieval workflow clearly.
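If you want to test the indexing logic without a running model, one option is to pass the embedding function in as a parameter. The sketch below is a variation for illustration, not the lesson's build_index(): build_index_with() and fake_embed() are hypothetical names, and the fake embedder returns toy numbers that carry no meaning.

```python
from typing import Any, Callable, Dict, List


def build_index_with(
    documents: List[Dict[str, str]],
    embed: Callable[[str], List[float]],
) -> List[Dict[str, Any]]:
    """Variation of build_index() that receives the embedding function."""
    index: List[Dict[str, Any]] = []

    for document in documents:
        searchable_text = f"{document['title']}. {document['text']}"
        # {**document, ...} copies the metadata, so the source record is unchanged.
        index.append({**document, "embedding": embed(searchable_text)})

    return index


def fake_embed(text: str) -> List[float]:
    """Toy stand-in, NOT a real embedding: character count and word count."""
    return [float(len(text)), float(len(text.split()))]


docs = [
    {
        "id": "DEV-101",
        "title": "Tokens",
        "category": "authentication",
        "text": "Refresh expired tokens.",
    }
]

index = build_index_with(docs, fake_embed)
print(index[0]["id"], index[0]["embedding"])
print("embedding" in docs[0])  # False: the original document was not modified
```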

Step Hint

Keep the original metadata with the embedding. If the vector becomes separated from its document ID or title, the result will be hard to verify.

Step Feedback

This step is strong when every embedding remains traceable to its original document.

Step 6: Execute query search and top-k ranking

Next, you will embed the user query, compare it to every document embedding, sort the results by score, and return only the top matches.

Action

Add a function named search() below build_index().

Breakdown

Start the function with parameters for the query, index, and number of results:

def search(query: str, index: List[Dict[str, Any]], top_k: int = TOP_K) -> List[Dict[str, Any]]:
    """Embed a query, compare it to each document, and return top-ranked results."""

Create an embedding for the user query:

    query_embedding = get_embedding(query)

Create an empty list for scored results:

    scored_results: List[Dict[str, Any]] = []

Loop through each indexed document:

    for document in index:

Compare the query embedding to the document embedding:

        score = cosine_similarity(query_embedding, document["embedding"])

Store the returned fields for this result:

        scored_results.append(
            {
                "id": document["id"],
                "title": document["title"],
                "category": document["category"],
                "score": score,
                "text": document["text"],
            }
        )

The result includes the score and source metadata. It does not return the raw embedding because users do not need to read the vector.

Sort the results from highest score to lowest score:

    ranked_results = sorted(
        scored_results,
        key=lambda result: result["score"],
        reverse=True,
    )

Return only the top results:

    return ranked_results[:top_k]

Your completed function should look like this:

def search(query: str, index: List[Dict[str, Any]], top_k: int = TOP_K) -> List[Dict[str, Any]]:
    """Embed a query, compare it to each document, and return top-ranked results."""
    query_embedding = get_embedding(query)
    scored_results: List[Dict[str, Any]] = []

    for document in index:
        score = cosine_similarity(query_embedding, document["embedding"])
        scored_results.append(
            {
                "id": document["id"],
                "title": document["title"],
                "category": document["category"],
                "score": score,
                "text": document["text"],
            }
        )

    ranked_results = sorted(
        scored_results,
        key=lambda result: result["score"],
        reverse=True,
    )

    return ranked_results[:top_k]

You should now have the core retrieval workflow:

documents → document embeddings → user query → query embedding → similarity scores → ranked top-k results

This is the retrieval workflow in action. The backend is comparing the meaning of the user’s query to the meaning of stored documents, then ranking the closest matches.
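The ranking step can be traced by hand on a toy example. The sketch below uses made-up two-dimensional vectors in place of real embeddings, and the names cosine(), rank(), and toy_vectors are hypothetical. It only illustrates the score, sort, and slice sequence.

```python
from math import sqrt
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity with the same zero-length guard as the lesson."""
    dot_product = sum(x * y for x, y in zip(a, b))
    magnitude_a = sqrt(sum(x * x for x in a))
    magnitude_b = sqrt(sum(y * y for y in b))
    if magnitude_a == 0 or magnitude_b == 0:
        return 0.0
    return dot_product / (magnitude_a * magnitude_b)


def rank(
    query_vector: List[float],
    vectors: Dict[str, List[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Score every vector against the query, sort descending, keep top_k."""
    scored = [(doc_id, cosine(query_vector, vector)) for doc_id, vector in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hand-made vectors: A points almost the same way as the query, D the opposite way.
toy_vectors = {
    "A": [0.9, 0.1],
    "B": [0.1, 0.9],
    "C": [0.7, 0.7],
    "D": [-1.0, 0.0],
}

top = rank([1.0, 0.0], toy_vectors, top_k=2)
print([doc_id for doc_id, _ in top])  # ['A', 'C']
```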

Step Hint

Sort in descending order so the highest similarity score appears first.

Step Feedback

This step is strong when the returned results are ranked, limited to a useful top-k value, and traceable to source documents.

Step 7: Execute multiple query tests

Next, you will display the ranked results and test whether the workflow handles different user intents.

Action

Add a print helper, a main() function, and the script entry point. Then run the file.

Breakdown

Start with a helper function that prints one query and its ranked results:

def print_results(query: str, results: List[Dict[str, Any]]) -> None:
    """Display ranked results in a readable format."""

Print the query and a divider:

    print(f"\nQuery: {query}")
    print("-" * 72)

Loop through the ranked results with a rank number:

    for rank, result in enumerate(results, start=1):

Print the document ID, title, and score:

        print(f"{rank}. {result['id']} | {result['title']} | score={result['score']:.4f}")

Print the category and source text:

        print(f"   category: {result['category']}")
        print(f"   {result['text']}")

Your completed print helper should look like this:

def print_results(query: str, results: List[Dict[str, Any]]) -> None:
    """Display ranked results in a readable format."""
    print(f"\nQuery: {query}")
    print("-" * 72)

    for rank, result in enumerate(results, start=1):
        print(f"{rank}. {result['id']} | {result['title']} | score={result['score']:.4f}")
        print(f"   category: {result['category']}")
        print(f"   {result['text']}")

Create the main() function:

def main() -> None:

Build the in-memory index:

    index = build_index(DOCUMENTS)

Add three test queries:

    test_queries = [
        "Why does the mobile app say my token is expired?",
        "My API request fails even though I added a bearer token.",
        "How can I increase my monthly usage limit?",
    ]

The first two queries should usually rank authentication documents highly. The third query should usually rank the billing document highly.

Loop through the test queries:

    for query in test_queries:

Run the search for each query:

        results = search(query, index, top_k=TOP_K)

Print the ranked results:

        print_results(query, results)

Add the script entry point at the bottom of the file:

if __name__ == "__main__":
    main()

Your completed main() function and entry point should look like this:

def main() -> None:
    index = build_index(DOCUMENTS)

    test_queries = [
        "Why does the mobile app say my token is expired?",
        "My API request fails even though I added a bearer token.",
        "How can I increase my monthly usage limit?",
    ]

    for query in test_queries:
        results = search(query, index, top_k=TOP_K)
        print_results(query, results)


if __name__ == "__main__":
    main()

Run the file from inside your pipenv shell:

python semantic_retrieval_lesson.py

Your output should show ranked results for each query.

Example output pattern:

Query: Why does the mobile app say my token is expired?
------------------------------------------------------------------------
1. DEV-101 | Refreshing Expired API Access Tokens | score=0.5357
   category: authentication
   Explains how to refresh expired API access tokens, check token lifetime, and retry a request with a valid bearer token.
2. DEV-102 | Fixing Invalid Authorization Headers | score=0.3431
   category: authentication
   Shows how to format authorization headers, include bearer tokens, and troubleshoot rejected API requests caused by malformed headers.
3. DEV-105 | Troubleshooting Slow Dashboard Pages | score=0.2919
   category: frontend
   Covers browser caching, loading states, client-side rendering delays, and slow dashboard performance.

Exact rankings and scores may vary by model. Authentication-related queries should usually rank authentication documents above billing or frontend documents.

Testing multiple queries helps you verify consistency. A workflow that works for one query may still fail on another query.
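Reading the printed output works for three queries, but a small automated check scales better as the query list grows. The sketch below is one possible approach: top_category_matches() is a hypothetical helper, and sample_results is a hand-written stand-in for search() output that reuses the IDs and scores from the example output pattern.

```python
from typing import Any, Dict, List


def top_category_matches(results: List[Dict[str, Any]], expected: str) -> bool:
    """Return True when the first ranked result has the expected category."""
    return bool(results) and results[0]["category"] == expected


# Stand-in for search() output, trimmed to the fields this check reads.
sample_results = [
    {"id": "DEV-101", "category": "authentication", "score": 0.5357},
    {"id": "DEV-102", "category": "authentication", "score": 0.3431},
]

print(top_category_matches(sample_results, "authentication"))  # True
print(top_category_matches(sample_results, "billing"))         # False
print(top_category_matches([], "billing"))                     # False
```

In the real script, you could pair each test query with the category you expect to rank first and report any query where the check returns False.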

Step Hint

Use test queries that represent different intents. Include at least one query that should match a non-authentication document.

Step Feedback

This step is strong when different queries produce different top results that match the user’s likely intent.

Step 8: Verify relevance, source traceability, and RAG readiness

Next, you will decide whether the retrieval output is useful enough to trust.

Action

Review the output and write a short verification note.

Breakdown

Use these checks:

Functional output:
Did the workflow return ranked results?

Vector consistency:
Were documents and queries embedded with the same model?

Relevance:
Does the top result answer the user’s actual need?

Intent alignment:
Does the result match the meaning of the query, not just a shared word?

Top-k quality:
Are the extra results useful or noisy?

Source traceability:
Can each result be traced back to a document ID and title?

RAG readiness:
Would the top result provide grounded context for an AI-generated response?

A strong verification note might look like this:

For the expired token query, DEV-101 ranked first. This makes sense because the result explains token lifetime and retrying requests with a valid bearer token.

The output includes ID, title, category, score, and source text, so the result is traceable. This result could support a future RAG answer, but I would still confirm that the article is current before using it as final context.

You should have a short verification note that connects the ranking back to the original user need.

Verification protects users from weak retrieval. A semantic retrieval workflow can run correctly and still return incomplete or misleading context. Before using retrieved content in RAG, always check whether the result is relevant and source-grounded.
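Part of this review can be turned into a simple flag that tells you which results most need a human read. The sketch below is only a starting point under loud assumptions: needs_review(), MIN_SCORE, and MIN_MARGIN are hypothetical, the threshold values are arbitrary, and the near_tie scores are invented. A result that passes the flag still has to be read against the user's intent.

```python
from typing import Any, Dict, List

MIN_SCORE = 0.40   # hypothetical threshold; tune it against real queries
MIN_MARGIN = 0.05  # hypothetical gap between the first and second result


def needs_review(results: List[Dict[str, Any]]) -> bool:
    """Flag ranked results that look too weak or too close to accept unread."""
    if not results:
        return True

    top_score = results[0]["score"]
    if top_score < MIN_SCORE:
        return True

    # A near tie suggests the ranking could easily flip between two documents.
    if len(results) > 1 and top_score - results[1]["score"] < MIN_MARGIN:
        return True

    return False


# Scores taken from the example output pattern in this lesson.
clear_winner = [{"id": "DEV-101", "score": 0.5357}, {"id": "DEV-102", "score": 0.3431}]
# Invented scores that show a near tie.
near_tie = [{"id": "DEV-101", "score": 0.41}, {"id": "DEV-102", "score": 0.40}]

print(needs_review(clear_winner))  # False
print(needs_review(near_tie))      # True
print(needs_review([]))            # True
```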

Step Hint

A high score is not the same as certainty. Read the top result before deciding whether it is useful.

Step Feedback

This step is strong when your verification explains both what ranked first and why it is or is not useful.

Step 9: Reflect on how this workflow prepares for larger retrieval tools

Next, you will connect the in-memory workflow to vector databases, retrievers, APIs, and RAG.

Action

Write a short reflection that explains what this script proves and what would change in a larger system.

Breakdown

Answer these questions:

1. What content did the workflow embed?
2. What did the workflow compare?
3. How did the workflow rank results?
4. Why did metadata matter?
5. What would change if this moved into Chroma, LangChain, Flask, or RAG?

A completed reflection might look like this:

The script embedded developer documentation summaries and embedded each user query with the same model. It compared the query embedding with each document embedding using cosine similarity, then sorted the results by score and returned the top matches.

Metadata mattered because each result needed an ID, title, category, and source text so the output could be checked. In a larger system, Chroma could store embeddings and metadata, LangChain could wrap the search as a retriever, Flask could expose the workflow through an API route, and RAG could use the retrieved source text as context before generating an answer.

You should have a reflection that explains how the manual workflow prepares you for future tools.

This step helps you avoid treating Chroma, LangChain, or RAG as magic. Those tools still depend on the same retrieval sequence: prepare content, embed text, compare meaning, rank results, return source-grounded context, and verify quality.
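As a small preview of the RAG hand-off, the retrieved results can be turned into a context string that keeps each source ID next to its text. The sketch below is an assumption about one reasonable format. build_context() is a hypothetical helper, and the bracketed ID layout is illustrative, not a format required by any particular tool.

```python
from typing import Any, Dict, List


def build_context(results: List[Dict[str, Any]]) -> str:
    """Join retrieved results into one context string with source labels."""
    blocks = [
        f"[{result['id']}] {result['title']}\n{result['text']}"
        for result in results
    ]
    return "\n\n".join(blocks)


# Stand-in for search() output, using two documents from this lesson.
retrieved = [
    {
        "id": "DEV-101",
        "title": "Refreshing Expired API Access Tokens",
        "text": "Explains how to refresh expired API access tokens.",
    },
    {
        "id": "DEV-102",
        "title": "Fixing Invalid Authorization Headers",
        "text": "Shows how to format authorization headers.",
    },
]

print(build_context(retrieved))
```

Keeping the document ID inside the context lets a generated answer point back to its source, which is the same traceability goal the lesson applies to search results.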

Step Hint

Focus your reflection on the workflow, not only the tool names. The tools change, but the retrieval logic remains similar.

Step Feedback

This step is strong when it clearly separates “the code returned results” from “the retrieval output is useful and traceable.”

Complete Code Checkpoint

Use this completed file to check your work after you have built the script step by step.

from math import sqrt
from typing import Any, Dict, List

import ollama


MODEL = "embeddinggemma"
TOP_K = 3

DOCUMENTS: List[Dict[str, str]] = []

DOCUMENTS.append(
    {
        "id": "DEV-101",
        "title": "Refreshing Expired API Access Tokens",
        "category": "authentication",
        "text": (
            "Explains how to refresh expired API access tokens, check token lifetime, "
            "and retry a request with a valid bearer token."
        ),
    }
)

DOCUMENTS.append(
    {
        "id": "DEV-102",
        "title": "Fixing Invalid Authorization Headers",
        "category": "authentication",
        "text": (
            "Shows how to format authorization headers, include bearer tokens, "
            "and troubleshoot rejected API requests caused by malformed headers."
        ),
    }
)

DOCUMENTS.append(
    {
        "id": "DEV-103",
        "title": "Creating a New Developer API Key",
        "category": "onboarding",
        "text": (
            "Guides a new developer through creating an API key, copying the key value, "
            "and storing credentials securely."
        ),
    }
)

DOCUMENTS.append(
    {
        "id": "DEV-104",
        "title": "Understanding Dashboard Billing Limits",
        "category": "billing",
        "text": (
            "Explains plan limits, monthly usage caps, billing warnings, "
            "and how to upgrade an account."
        ),
    }
)

DOCUMENTS.append(
    {
        "id": "DEV-105",
        "title": "Troubleshooting Slow Dashboard Pages",
        "category": "frontend",
        "text": (
            "Covers browser caching, loading states, client-side rendering delays, "
            "and slow dashboard performance."
        ),
    }
)


def get_embedding(text: str) -> List[float]:
    """Return one embedding vector for one text input."""
    response = ollama.embed(model=MODEL, input=text)
    return response["embeddings"][0]


def cosine_similarity(vector_a: List[float], vector_b: List[float]) -> float:
    """Compare two vectors by cosine similarity."""
    dot_product = sum(a * b for a, b in zip(vector_a, vector_b))
    magnitude_a = sqrt(sum(a * a for a in vector_a))
    magnitude_b = sqrt(sum(b * b for b in vector_b))

    if magnitude_a == 0 or magnitude_b == 0:
        return 0.0

    return dot_product / (magnitude_a * magnitude_b)


def build_index(documents: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Embed each document and keep the embedding attached to source metadata."""
    index: List[Dict[str, Any]] = []

    for document in documents:
        searchable_text = f"{document['title']}. {document['text']}"
        embedding = get_embedding(searchable_text)
        index.append({**document, "embedding": embedding})

    return index


def search(query: str, index: List[Dict[str, Any]], top_k: int = TOP_K) -> List[Dict[str, Any]]:
    """Embed a query, compare it to each document, and return top-ranked results."""
    query_embedding = get_embedding(query)
    scored_results: List[Dict[str, Any]] = []

    for document in index:
        score = cosine_similarity(query_embedding, document["embedding"])
        scored_results.append(
            {
                "id": document["id"],
                "title": document["title"],
                "category": document["category"],
                "score": score,
                "text": document["text"],
            }
        )

    ranked_results = sorted(
        scored_results,
        key=lambda result: result["score"],
        reverse=True,
    )

    return ranked_results[:top_k]


def print_results(query: str, results: List[Dict[str, Any]]) -> None:
    """Display ranked results in a readable format."""
    print(f"\nQuery: {query}")
    print("-" * 72)

    for rank, result in enumerate(results, start=1):
        print(f"{rank}. {result['id']} | {result['title']} | score={result['score']:.4f}")
        print(f"   category: {result['category']}")
        print(f"   {result['text']}")


def main() -> None:
    index = build_index(DOCUMENTS)

    test_queries = [
        "Why does the mobile app say my token is expired?",
        "My API request fails even though I added a bearer token.",
        "How can I increase my monthly usage limit?",
    ]

    for query in test_queries:
        results = search(query, index, top_k=TOP_K)
        print_results(query, results)


if __name__ == "__main__":
    main()

Considerations

Common issues

Issue	Why it matters	How to respond
Weak document summaries	The embedding may not represent enough meaning.	Add clearer source text or chunk longer documents.
Missing metadata	Users cannot trace where results came from.	Include ID, title, category, and source text.
Different embedding models	Query and document vectors may not compare reliably.	Use one approved model for all embeddings.
Top-k is too high	Too many results may add noise.	Start with top 3 and adjust based on task.
Top-k is too low	Useful context may be missed.	Test whether top 5 improves coverage.
Scores are trusted blindly	Similarity does not guarantee correctness.	Read the source text and compare it to user intent.
No RAG readiness check	Weak retrieval can lead to weak AI responses.	Confirm relevance and source grounding before generation.

Decision point: manual similarity vs Chroma

Option	Use when	Tradeoff
Manual in-memory similarity	You are learning or testing a small dataset.	Easy to inspect, but not scalable.
Chroma vector store	You need to store and search many embeddings.	More realistic, but adds tooling complexity.
LangChain retriever	You need reusable retrieval in a RAG pipeline.	Useful abstraction, but can hide some mechanics.

In this lesson, the in-memory workflow is intentional. It helps you see what happens before a vector database or retriever abstraction handles storage and search for you.
