Technical Lesson — Generating and Comparing Embeddings

Introduction

Embeddings help backend applications compare meaning instead of only matching exact words. In this lesson, you will generate embeddings for short text inputs, inspect the vector outputs, and compare related and unrelated texts using cosine similarity.

You will build a Python script for a backend search scenario by following the Identify → Assemble → Execute → Verify process. The script generates embeddings, keeps a source label with each text, scores similarity, prints ranked output, and runs relevance checks to confirm that related texts score closer than unrelated texts.

Scenario

You are a junior backend developer helping a developer support team improve documentation search. Users keep asking questions like:

“My API request keeps getting rejected even though I send a token.”

The best documentation article may use more formal wording, such as:

“Fix invalid API authentication tokens.”

A keyword search may not connect those ideas reliably. Your task is to generate embeddings for the user query and a few documentation summaries, compare the vectors, and verify whether the strongest match makes sense.
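
To see why keyword matching struggles with this pair, the sketch below counts the words the query and the article title share. The simple tokenizer and the helper name shared_words are assumptions made for illustration; they are not part of the lesson script.

```python
import re
from typing import Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def shared_words(text_a: str, text_b: str) -> Set[str]:
    """Return the words that appear in both texts."""
    return tokenize(text_a) & tokenize(text_b)


query = "My API request keeps getting rejected even though I send a token."
article = "Fix invalid API authentication tokens."

overlap = shared_words(query, article)
print(sorted(overlap))  # ['api']
```

Only one word overlaps, and even "token" and "tokens" do not match exactly. A production keyword search would likely add stemming, but the gap between "rejected" and "invalid authentication" would remain.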

Tools and Resources

Python 3.10 or newer
Visual Studio Code or another code editor
pipenv
Terminal or integrated terminal
Ollama installed and running locally
An embedding model, such as embeddinggemma
Python package: ollama

Setup commands:

ollama pull embeddinggemma
pipenv install ollama
pipenv shell

To run the script later, use:

python embedding_lesson.py

Instructions

Follow the technical process: Identify → Assemble → Execute → Verify.

Step 1: Identify the comparison goal

To start, you will define what the embeddings need to compare before you write any code.

Action

Create a short planning note that identifies the user query, the candidate documents, and the result you expect to be most related.

Breakdown

Start by identifying the user problem:

User problem:
Users search developer documentation using natural language.

Add the user query that the backend will compare:

Query:
"My API request keeps getting rejected even though I send a token."

Add the document you expect to be the strongest match:

Expected strongest match:
"Fix invalid API authentication tokens."

Add one document you expect to be weaker or unrelated:

Expected weaker or unrelated match:
"Understanding monthly billing limits."

Finally, write why you expect that ranking:

Why:
The query is about API authentication, not billing.

Your completed planning note should look like this:

User problem:
Users search developer documentation using natural language.

Query:
"My API request keeps getting rejected even though I send a token."

Expected strongest match:
"Fix invalid API authentication tokens."

Expected weaker or unrelated match:
"Understanding monthly billing limits."

Why:
The query is about API authentication, not billing.

You should have a short written plan that identifies:

the search problem,
the query,
the candidate texts,
and which result you expect to be most related.

This step keeps you from treating embeddings as random numbers. You are defining what the numbers need to help the backend compare. Later, you will use this plan to decide whether the similarity scores behave in a useful way.

Step Hint

Do not start with the model. Start with the comparison problem. A clear comparison goal makes verification easier later.

Step Feedback

This step is strong when your expected match is based on user intent, not only shared words.

Step 2: Assemble the text inputs and model choice

Next, you will create the Python file, import the tools you need, choose one embedding model, and store each text input with a traceable label.

Action

Create a Python file named embedding_lesson.py and add the model name and text inputs.

Breakdown

Create the file:

touch embedding_lesson.py

Open embedding_lesson.py.

At the top of the file, import the tools the script will use:

from math import sqrt
from typing import Any, Dict, List

import ollama

These imports are used for specific parts of the workflow:

sqrt calculates vector magnitudes for cosine similarity.
Any, Dict, and List are type hints. They make the data structure easier to read and trace as you store strings, scores, and vectors together.
ollama sends text to the local embedding model and receives vector outputs.

Create a reusable constant for the embedding model:

MODEL = "embeddinggemma"

This keeps the model choice in one place. If you later switch to a different embedding model, you can update this one value instead of searching through the full script.

Create a list of texts that you will embed:

TEXTS: List[Dict[str, str]] = [
    {
        "id": "Q1",
        "label": "user query",
        "text": "My API request keeps getting rejected even though I send a token.",
    },
    {
        "id": "D101",
        "label": "strong match",
        "text": "Fix invalid API authentication tokens by checking expiration, signature, and authorization headers.",
    },
    {
        "id": "D102",
        "label": "partial match",
        "text": "Set up API keys for a new developer account and store credentials securely.",
    },
    {
        "id": "D103",
        "label": "unrelated",
        "text": "Understand monthly billing limits and update the payment method for an account.",
    },
]

Your file should now include:

imports,
one model constant,
and a labeled list of text inputs.

Each item should include:

an ID,
a label,
and the text that will be embedded.

Labels help you trace each embedding back to its original text. Vectors are not readable summaries, so you need the original source text available for display, for relevance checking, and later as context in retrieval-augmented generation (RAG).
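
Because traceability depends on every item carrying its fields, a quick check before embedding can catch gaps early. The following is a minimal sketch; the helper name validate_texts and the REQUIRED_FIELDS constant are hypothetical and not part of the lesson script.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("id", "label", "text")  # hypothetical constant for this sketch


def validate_texts(items: List[Dict[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means every item is traceable."""
    problems: List[str] = []
    seen_ids = set()
    for position, item in enumerate(items):
        # Every item needs a non-empty id, label, and text.
        for field in REQUIRED_FIELDS:
            if not item.get(field, "").strip():
                problems.append(f"item {position}: missing or empty '{field}'")
        # IDs must be unique so a vector maps back to exactly one text.
        item_id = item.get("id")
        if item_id:
            if item_id in seen_ids:
                problems.append(f"item {position}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return problems


sample = [
    {"id": "Q1", "label": "user query", "text": "My API request keeps getting rejected."},
    {"id": "Q1", "label": "", "text": "Fix invalid API authentication tokens."},
]
for problem in validate_texts(sample):
    print(problem)
```

Running this on the sample reports the empty label and the repeated ID on the second item. Calling validate_texts(TEXTS) on the lesson's list should return an empty list.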

Step Hint

Use the same embedding model for every text you plan to compare. Embeddings from different models may not be comparable.

Step Feedback

This step is strong when each text is meaningful enough to represent a real search idea and every text has a source label.

Step 3: Execute one embedding request

The next step is to create a helper function that sends one text input to the embedding model and returns one vector.

Action

Add a function named get_embedding() below the TEXTS list.

Breakdown

Start the helper function with a clear name, parameter, return type, and docstring:

def get_embedding(text: str) -> List[float]:
    """Return one embedding vector for one text input."""

Inside the function, send the text to the local Ollama embedding model:

    response = ollama.embed(model=MODEL, input=text)

This line tells Ollama which model to use and which text to convert into an embedding.

Return the first embedding vector from the response:

    return response["embeddings"][0]

The response stores embeddings in a list because the API can return embeddings for one input or multiple inputs. In this lesson, each call sends one text, so the vector you need is the first item.
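
Since the response holds a list of embeddings, several texts could in principle be embedded in one request. The sketch below assumes the installed ollama client accepts a list of strings for input; the helper get_embeddings and the stand-in fake_embed are hypothetical, and the stand-in exists only so the example runs without a local model.

```python
from typing import Any, Callable, Dict, List, Sequence

MODEL = "embeddinggemma"


def get_embeddings(texts: Sequence[str], embed_fn: Callable[..., Any]) -> List[List[float]]:
    """Embed several texts in one request and return one vector per text."""
    response = embed_fn(model=MODEL, input=list(texts))
    embeddings = response["embeddings"]
    # Guard against a response that does not line up with the inputs.
    if len(embeddings) != len(texts):
        raise ValueError(f"expected {len(texts)} embeddings, got {len(embeddings)}")
    return embeddings


def fake_embed(model: str, input: List[str]) -> Dict[str, Any]:
    """Stand-in for ollama.embed; returns made-up two-value vectors."""
    return {"embeddings": [[float(len(text)), 1.0] for text in input]}


vectors = get_embeddings(["short", "a longer text"], embed_fn=fake_embed)
print(len(vectors))  # 2
```

With Ollama running, passing embed_fn=ollama.embed would replace the stand-in. The lesson keeps one text per call because it is easier to trace while learning.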

Your completed helper function should look like this:

def get_embedding(text: str) -> List[float]:
    """Return one embedding vector for one text input."""
    response = ollama.embed(model=MODEL, input=text)
    return response["embeddings"][0]

You should now have a reusable function that accepts one text string and returns one list of numbers.

This helper function keeps embedding generation separate from the rest of the script. Instead of rewriting the Ollama call for every document, you can call get_embedding() for each text input.

Step Hint

Make sure this function is not indented inside the TEXTS list. It should start at the left edge of the file.

Step Feedback

This step is strong when the function has one clear job: convert one text input into one embedding vector.

Step 4: Execute embedding generation for every text

Now, you will loop through the text inputs, generate an embedding for each one, store the embedding with the original metadata, and print a confirmation message.

Action

Add a main() function that embeds every item in TEXTS.

Breakdown

Start the main() function below get_embedding():

def main() -> None:

Create an empty list to store the embedded versions of the texts:

    embedded_texts: List[Dict[str, Any]] = []

This list will hold each original item plus its new embedding vector.

Loop through the original text inputs:

    for item in TEXTS:

Inside the loop, generate an embedding from the item’s text:

        embedding = get_embedding(item["text"])

Store the original fields and the new embedding together:

        embedded_texts.append({**item, "embedding": embedding})

This keeps the ID, label, original text, and embedding connected.

Print the vector length for each item:

        print(f"{item['id']} created vector with {len(embedding)} dimensions")

This gives you quick evidence that each input produced a vector.

Add the script entry point at the bottom of the file:

if __name__ == "__main__":
    main()

Your completed main() function and script entry point should look like this:

def main() -> None:
    embedded_texts: List[Dict[str, Any]] = []

    for item in TEXTS:
        embedding = get_embedding(item["text"])
        embedded_texts.append({**item, "embedding": embedding})
        print(f"{item['id']} created vector with {len(embedding)} dimensions")


if __name__ == "__main__":
    main()

Run the file:

python embedding_lesson.py

Your terminal should print a vector length for each text.

Example output pattern:

Q1 created vector with 768 dimensions
D101 created vector with 768 dimensions
D102 created vector with 768 dimensions
D103 created vector with 768 dimensions

Your exact vector length may differ depending on the model.

At this point, each text has been transformed into a numerical representation. You do not need to read every number. You need to confirm that the backend can now compare the texts in the same vector space.

Step Hint

If you get an error, check that Ollama is running, the model name is correct, and the Python package is installed.
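
One way to surface that checklist at runtime is to wrap the embedding call. This is a sketch under assumptions: try_embedding and broken_embed are hypothetical names, and the broad except is deliberate because the exact exception types depend on the installed client version.

```python
from typing import Callable, List, Optional


def try_embedding(text: str, embed_one: Callable[[str], List[float]]) -> Optional[List[float]]:
    """Return the embedding, or None after printing a troubleshooting checklist."""
    try:
        return embed_one(text)
    except Exception as error:  # broad on purpose; see the note above
        print(f"Embedding request failed: {error}")
        print("Check that Ollama is running, the model name is correct, and the ollama package is installed.")
        return None


def broken_embed(text: str) -> List[float]:
    """Stand-in that simulates a failed request."""
    raise ConnectionError("could not reach the local Ollama server")


result = try_embedding("test", broken_embed)
print(result)  # None
```

In the lesson script, you could pass get_embedding as embed_one and skip any item that returns None.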

Step Feedback

This step is strong when every input produces an embedding and all embeddings have the same number of dimensions.

Step 5: Execute cosine similarity

Next, you will create a function that compares two embedding vectors and returns a similarity score.

Action

Add a cosine_similarity() function below get_embedding() and above main().

Breakdown

Start the function with two vector parameters:

def cosine_similarity(vector_a: List[float], vector_b: List[float]) -> float:
    """Compare two vectors by cosine similarity."""

Calculate the dot product:

    dot_product = sum(a * b for a, b in zip(vector_a, vector_b))

The dot product multiplies values in the same positions and adds the results together.
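
A worked example with small, made-up vectors may make the arithmetic concrete. These values are illustrative and are not embeddings from a model.

```python
vector_a = [1.0, 2.0, 3.0]
vector_b = [4.0, 5.0, 6.0]

# Multiply values in matching positions: 1*4, 2*5, 3*6
products = [a * b for a, b in zip(vector_a, vector_b)]
print(products)  # [4.0, 10.0, 18.0]

# Add the products together to get the dot product
dot_product = sum(products)
print(dot_product)  # 32.0
```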

Calculate the magnitude of the first vector:

    magnitude_a = sqrt(sum(a * a for a in vector_a))

Calculate the magnitude of the second vector:

    magnitude_b = sqrt(sum(b * b for b in vector_b))

Add a guard for empty or all-zero vectors. Their magnitude is 0, which would otherwise cause a division by zero:

    if magnitude_a == 0 or magnitude_b == 0:
        return 0.0

Return the cosine similarity score:

    return dot_product / (magnitude_a * magnitude_b)

Your completed function should look like this:

def cosine_similarity(vector_a: List[float], vector_b: List[float]) -> float:
    """Compare two vectors by cosine similarity."""
    dot_product = sum(a * b for a, b in zip(vector_a, vector_b))
    magnitude_a = sqrt(sum(a * a for a in vector_a))
    magnitude_b = sqrt(sum(b * b for b in vector_b))

    if magnitude_a == 0 or magnitude_b == 0:
        return 0.0

    return dot_product / (magnitude_a * magnitude_b)

You should now have a reusable function that compares two embedding vectors and returns one number.
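
One thing the function does not check is vector length: zip() stops at the shorter input, so two vectors with different dimensions would be compared on their overlapping positions without any error. The variant below is a sketch of a stricter version; the name cosine_similarity_strict is hypothetical.

```python
from math import sqrt
from typing import List


def cosine_similarity_strict(vector_a: List[float], vector_b: List[float]) -> float:
    """Cosine similarity that refuses vectors of different lengths."""
    if len(vector_a) != len(vector_b):
        raise ValueError(f"dimension mismatch: {len(vector_a)} vs {len(vector_b)}")

    dot_product = sum(a * b for a, b in zip(vector_a, vector_b))
    magnitude_a = sqrt(sum(a * a for a in vector_a))
    magnitude_b = sqrt(sum(b * b for b in vector_b))

    if magnitude_a == 0 or magnitude_b == 0:
        return 0.0

    return dot_product / (magnitude_a * magnitude_b)


try:
    cosine_similarity_strict([1.0, 0.0], [1.0, 0.0, 5.0])
except ValueError as error:
    print(error)  # dimension mismatch: 2 vs 3
```

Whether to raise or to report the mismatch in a verification step is a design choice. Raising fails fast, which suits a small script like this one.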

Cosine similarity compares the direction of two vectors, not their length, and returns a value between -1 and 1. In semantic search, a higher score usually means the texts are closer in meaning. You still need to check whether the result actually answers the user’s need.
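
The focus on direction can be checked with toy two-dimensional vectors. The sketch below repeats the lesson's function so it runs on its own; the input values are made up for illustration.

```python
from math import isclose, sqrt
from typing import List


def cosine_similarity(vector_a: List[float], vector_b: List[float]) -> float:
    """Compare two vectors by cosine similarity."""
    dot_product = sum(a * b for a, b in zip(vector_a, vector_b))
    magnitude_a = sqrt(sum(a * a for a in vector_a))
    magnitude_b = sqrt(sum(b * b for b in vector_b))
    if magnitude_a == 0 or magnitude_b == 0:
        return 0.0
    return dot_product / (magnitude_a * magnitude_b)


# Same direction, different length: the score stays at (approximately) 1.
same_direction = cosine_similarity([1.0, 2.0], [10.0, 20.0])
# Perpendicular vectors share no direction.
right_angle = cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Opposite directions give the lowest possible score.
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])

print(isclose(same_direction, 1.0))  # True
print(right_angle)  # 0.0
print(opposite)  # -1.0
```

Scaling a vector by ten does not change its score, which is why cosine similarity is a common choice when only the direction of an embedding should matter.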

Step Hint

Do not compare a query embedding from one model with document embeddings from another model.
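
If embeddings are ever stored and reused, recording the model name next to each vector makes this rule checkable. The sketch below assumes a hypothetical "model" field on each item, which the lesson script does not store; require_same_model is likewise an illustrative name.

```python
from typing import Any, Dict


def require_same_model(item_a: Dict[str, Any], item_b: Dict[str, Any]) -> None:
    """Raise if two stored embeddings were created by different models."""
    if item_a["model"] != item_b["model"]:
        raise ValueError(
            f"{item_a['id']} uses {item_a['model']} but {item_b['id']} uses {item_b['model']}"
        )


# Made-up records; "some-other-model" is a placeholder, not a real model name.
query = {"id": "Q1", "model": "embeddinggemma", "embedding": [0.2, 0.8]}
document = {"id": "D101", "model": "some-other-model", "embedding": [0.3, 0.7]}

try:
    require_same_model(query, document)
except ValueError as error:
    print(error)  # Q1 uses embeddinggemma but D101 uses some-other-model
```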

Step Feedback

This step is strong when the score output can help you explain why one document is more likely to match the query than another.

Step 6: Compare the query to the documents

Next, you will compare the user query embedding with each document embedding, store the scores, sort the results, and print a ranked list.

Action

Update main() so it compares the query to each document after all embeddings are created.

Breakdown

Inside main(), after the embedding generation loop, identify the query item:

    query_item = embedded_texts[0]

Store the query embedding in its own variable:

    query_embedding = query_item["embedding"]

Create an empty list for scored results:

    results: List[Dict[str, Any]] = []

Loop through the document items. Use embedded_texts[1:] so you skip the query and compare only the documents:

    for item in embedded_texts[1:]:

Inside the loop, compare the query embedding with the current document embedding:

        score = cosine_similarity(query_embedding, item["embedding"])

Store the original document fields, the embedding, and the similarity score together:

        results.append({**item, "score": score})

Sort the results from highest similarity to lowest similarity:

    results.sort(key=lambda item: item["score"], reverse=True)

Print a readable ranked list:

    print("\nRanked similarity results:")
    for rank, item in enumerate(results, start=1):
        print(f"{rank}. {item['id']} | {item['label']} | score={item['score']:.4f}")
        print(f"   {item['text']}")

Your updated main() function should look like this:

def main() -> None:
    embedded_texts: List[Dict[str, Any]] = []

    for item in TEXTS:
        embedding = get_embedding(item["text"])
        embedded_texts.append({**item, "embedding": embedding})
        print(f"{item['id']} created vector with {len(embedding)} dimensions")

    query_item = embedded_texts[0]
    query_embedding = query_item["embedding"]

    results: List[Dict[str, Any]] = []

    for item in embedded_texts[1:]:
        score = cosine_similarity(query_embedding, item["embedding"])
        results.append({**item, "score": score})

    results.sort(key=lambda item: item["score"], reverse=True)

    print("\nRanked similarity results:")
    for rank, item in enumerate(results, start=1):
        print(f"{rank}. {item['id']} | {item['label']} | score={item['score']:.4f}")
        print(f"   {item['text']}")

Run the file again:

python embedding_lesson.py

Your output should show a ranked list of documents compared to the query.

Example output pattern:

Q1 created vector with 768 dimensions
D101 created vector with 768 dimensions
D102 created vector with 768 dimensions
D103 created vector with 768 dimensions

Ranked similarity results:
1. D101 | strong match | score=0.5784
   Fix invalid API authentication tokens by checking expiration, signature, and authorization headers.
2. D102 | partial match | score=0.4006
   Set up API keys for a new developer account and store credentials securely.
3. D103 | unrelated | score=0.2588
   Understand monthly billing limits and update the payment method for an account.

Exact scores will vary by model.

The backend now has the core behavior of semantic comparison: embed the query, compare it to embedded documents, rank by similarity, and show the original source text. This is a small version of the retrieval behavior used later in vector search and RAG workflows.
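
A search endpoint rarely returns every document, so a natural next step is to keep only the best few results. The sketch below is one possible approach; top_matches, k, and min_score are hypothetical names, and the scores are illustrative rather than real model output.

```python
from typing import Any, Dict, List


def top_matches(results: List[Dict[str, Any]], k: int = 2, min_score: float = 0.3) -> List[Dict[str, Any]]:
    """Return up to k results, highest score first, dropping scores below min_score."""
    ranked = sorted(results, key=lambda item: item["score"], reverse=True)
    return [item for item in ranked[:k] if item["score"] >= min_score]


results = [
    {"id": "D103", "score": 0.26},
    {"id": "D101", "score": 0.58},
    {"id": "D102", "score": 0.40},
]

print([item["id"] for item in top_matches(results)])  # ['D101', 'D102']
print([item["id"] for item in top_matches(results, min_score=0.5)])  # ['D101']
```

A useful threshold depends on the model and the documents, so treat 0.3 as a placeholder to tune against real queries.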

Step Hint

Ranking is more useful than printing unsorted scores. Users usually need the best match near the top.

Step Feedback

This step is strong when the ranked output shows the expected strong match above the unrelated result.

Step 7: Verify the output and similarity behavior

Next, you will check whether the embedding workflow produced consistent vectors and whether the top result matches the original search intent.

Action

Add a verification checkpoint at the end of main().

Breakdown

Check whether all embeddings have the same number of dimensions:

    dimension_lengths = {len(item["embedding"]) for item in embedded_texts}

Print a pass message if the vector dimensions are consistent:

    print("\nVerification:")

    if len(dimension_lengths) == 1:
        print(f"PASS: All embeddings have {next(iter(dimension_lengths))} dimensions.")
    else:
        print(f"REVIEW: Embeddings have inconsistent dimensions: {dimension_lengths}")

Identify the top-ranked result:

    top_result = results[0]

Print the top result:

    print(f"Top result: {top_result['id']} ({top_result['label']})")

Compare the top result with your expected strongest match:

    if top_result["id"] == "D101":
        print("PASS: The API authentication article ranked highest.")
    else:
        print("REVIEW: The expected article did not rank highest. Review the inputs, model choice, and scores.")

Your verification code should look like this:

    dimension_lengths = {len(item["embedding"]) for item in embedded_texts}

    print("\nVerification:")

    if len(dimension_lengths) == 1:
        print(f"PASS: All embeddings have {next(iter(dimension_lengths))} dimensions.")
    else:
        print(f"REVIEW: Embeddings have inconsistent dimensions: {dimension_lengths}")

    top_result = results[0]
    print(f"Top result: {top_result['id']} ({top_result['label']})")

    if top_result["id"] == "D101":
        print("PASS: The API authentication article ranked highest.")
    else:
        print("REVIEW: The expected article did not rank highest. Review the inputs, model choice, and scores.")

Run the file again:

python embedding_lesson.py

Your terminal should now show vector creation, ranked similarity results, and verification messages.

Example output pattern:

Q1 created vector with 768 dimensions
D101 created vector with 768 dimensions
D102 created vector with 768 dimensions
D103 created vector with 768 dimensions

Ranked similarity results:
1. D101 | strong match | score=0.5784
   Fix invalid API authentication tokens by checking expiration, signature, and authorization headers.
2. D102 | partial match | score=0.4006
   Set up API keys for a new developer account and store credentials securely.
3. D103 | unrelated | score=0.2588
   Understand monthly billing limits and update the payment method for an account.
   
Verification:
PASS: All embeddings have 768 dimensions.
Top result: D101 (strong match)
PASS: The API authentication article ranked highest.

Verification is not just checking that the code ran. You are checking that the output supports the original search goal. If the unrelated billing article ranks highest, the backend workflow is not behaving in a useful way for this scenario.
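
The same idea can be extended to check the bottom of the ranking and the gap between the top two scores. This is a sketch, not part of the lesson script: check_ranking is a hypothetical helper, and the scores are copied from the example output pattern above.

```python
from typing import Any, Dict, List


def check_ranking(results: List[Dict[str, Any]], expected_top_id: str, expected_last_id: str) -> Dict[str, Any]:
    """Summarize whether the ranking matches the planning note."""
    if not results:
        raise ValueError("results must contain at least one scored document")
    ranked = sorted(results, key=lambda item: item["score"], reverse=True)
    # Margin between the best and second-best score; 0.0 if there is only one result.
    margin = ranked[0]["score"] - ranked[1]["score"] if len(ranked) > 1 else 0.0
    return {
        "top_ok": ranked[0]["id"] == expected_top_id,
        "last_ok": ranked[-1]["id"] == expected_last_id,
        "margin": margin,
    }


results = [
    {"id": "D101", "score": 0.5784},
    {"id": "D102", "score": 0.4006},
    {"id": "D103", "score": 0.2588},
]

report = check_ranking(results, expected_top_id="D101", expected_last_id="D103")
print(report["top_ok"], report["last_ok"], f"{report['margin']:.4f}")  # True True 0.1778
```

A small margin between the top two results is a prompt to read both documents, since the ranking could flip with a slightly different query.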

Step Hint

A high similarity score does not automatically mean the result is correct. Always compare the top result back to the user’s intent.

Step Feedback

This step is strong when you can explain whether the scores support the expected search behavior.

Step 8: Reflect on how this supports semantic search

Finally, you will connect the completed script to the larger backend retrieval workflow.

Action

Write a short reflection that explains what your script can now do and what it is not doing yet.

Breakdown

Answer these questions:

1. What text did the script embed?
2. Which model created the embeddings?
3. Which document ranked highest?
4. Why does the top result match or not match the user’s intent?
5. What would need to change before this became a larger retrieval system?

A completed reflection might look like this:

The script embedded one user query and three documentation summaries using the same embedding model. The API authentication article ranked highest, which matches the user’s intent because the query is about a rejected API request and token. The billing article ranked lower, which makes sense because it does not address authentication.

This is not a full retrieval system yet because the documents are stored in a small in-memory list. In a larger backend application, the documents and embeddings would need to be stored with source metadata, and the search workflow would need to return ranked results for many possible documents.
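
As a first step beyond the in-memory list, embeddings and their metadata could be written to disk so documents are not re-embedded on every run. The sketch below uses a JSON file as an assumption for illustration; the helper names, the "model" field, and the short made-up vector are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def save_embeddings(records: List[Dict[str, Any]], path: Path) -> None:
    """Write records (metadata plus vectors) to a JSON file."""
    path.write_text(json.dumps(records), encoding="utf-8")


def load_embeddings(path: Path) -> List[Dict[str, Any]]:
    """Read records back from a JSON file."""
    return json.loads(path.read_text(encoding="utf-8"))


records = [
    {
        "id": "D101",
        "label": "strong match",
        "text": "Fix invalid API authentication tokens.",
        "model": "embeddinggemma",
        "embedding": [0.1, 0.2, 0.3],  # made-up values; real vectors are much longer
    }
]

# A temporary folder keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "embeddings.json"
    save_embeddings(records, path)
    restored = load_embeddings(path)

print(restored == records)  # True
```

A JSON file works for a handful of documents. For many documents, a store built for vector search is the more likely choice.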

You should have a short reflection that connects the code output back to semantic search and future RAG readiness.

This step helps you avoid seeing embeddings as a standalone tool. In backend AI applications, embeddings are useful because they support retrieval: finding the most relevant source content before returning results or generating an answer.

Step Hint

Focus your reflection on what the script proves. It proves that related texts can rank closer than unrelated texts. It does not prove that every future search result will be correct.

Step Feedback

This step is strong when it clearly separates “the code ran” from “the result is useful for the user’s search need.”

Complete Code Checkpoint

Use this completed file to check your work after you have built the script step by step.

from math import sqrt
from typing import Any, Dict, List

import ollama


MODEL = "embeddinggemma"

TEXTS: List[Dict[str, str]] = [
    {
        "id": "Q1",
        "label": "user query",
        "text": "My API request keeps getting rejected even though I send a token.",
    },
    {
        "id": "D101",
        "label": "strong match",
        "text": "Fix invalid API authentication tokens by checking expiration, signature, and authorization headers.",
    },
    {
        "id": "D102",
        "label": "partial match",
        "text": "Set up API keys for a new developer account and store credentials securely.",
    },
    {
        "id": "D103",
        "label": "unrelated",
        "text": "Understand monthly billing limits and update the payment method for an account.",
    },
]


def get_embedding(text: str) -> List[float]:
    """Return one embedding vector for one text input."""
    response = ollama.embed(model=MODEL, input=text)
    return response["embeddings"][0]


def cosine_similarity(vector_a: List[float], vector_b: List[float]) -> float:
    """Compare two vectors by cosine similarity."""
    dot_product = sum(a * b for a, b in zip(vector_a, vector_b))
    magnitude_a = sqrt(sum(a * a for a in vector_a))
    magnitude_b = sqrt(sum(b * b for b in vector_b))

    if magnitude_a == 0 or magnitude_b == 0:
        return 0.0

    return dot_product / (magnitude_a * magnitude_b)


def main() -> None:
    embedded_texts: List[Dict[str, Any]] = []

    for item in TEXTS:
        embedding = get_embedding(item["text"])
        embedded_texts.append({**item, "embedding": embedding})
        print(f"{item['id']} created vector with {len(embedding)} dimensions")

    query_item = embedded_texts[0]
    query_embedding = query_item["embedding"]

    results: List[Dict[str, Any]] = []

    for item in embedded_texts[1:]:
        score = cosine_similarity(query_embedding, item["embedding"])
        results.append({**item, "score": score})

    results.sort(key=lambda item: item["score"], reverse=True)

    print("\nRanked similarity results:")
    for rank, item in enumerate(results, start=1):
        print(f"{rank}. {item['id']} | {item['label']} | score={item['score']:.4f}")
        print(f"   {item['text']}")

    dimension_lengths = {len(item["embedding"]) for item in embedded_texts}

    print("\nVerification:")

    if len(dimension_lengths) == 1:
        print(f"PASS: All embeddings have {next(iter(dimension_lengths))} dimensions.")
    else:
        print(f"REVIEW: Embeddings have inconsistent dimensions: {dimension_lengths}")

    top_result = results[0]
    print(f"Top result: {top_result['id']} ({top_result['label']})")

    if top_result["id"] == "D101":
        print("PASS: The API authentication article ranked highest.")
    else:
        print("REVIEW: The expected article did not rank highest. Review the inputs, model choice, and scores.")


if __name__ == "__main__":
    main()

Considerations

Use the same embedding model for the query and every document.
Keep IDs, labels, and original text with each embedding.
Do not treat embeddings as readable summaries.
Do not trust similarity scores without a relevance check.
Include at least one unrelated document so you can tell whether the model separates meanings in a useful way.
Expect exact vector dimensions and similarity scores to vary by model.

Common Issues

Issue	Why it matters	How to respond
Ollama is not running	The model call cannot complete	Start Ollama before running the script
Model name is misspelled	The embedding request fails because Ollama cannot find the model	Use the exact model name you pulled, such as embeddinggemma
Text inputs are too vague	Vague text may produce weak comparisons	Add enough context for the model to represent meaning
Source labels are missing	You cannot trace vectors back to text	Store ID, label, title, or source with each embedding
Similarity is trusted without review	A related result may still be incomplete	Read the original text and compare it to user intent

Decision Point: Why not use Chroma yet?

In this lesson, you are comparing a few embeddings directly so you can see the mechanics. In a larger application, a vector database like Chroma can store embeddings and metadata for more efficient retrieval. The core idea remains the same: embed text, compare meaning, rank results, and verify relevance.
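
To make the role of a vector database more concrete, the sketch below shows the two operations such a store typically offers: adding records and querying by similarity. TinyVectorStore is a hypothetical in-memory class written for this example. It does not use or imitate Chroma's API, and the two-value vectors are made up.

```python
from math import sqrt
from typing import Any, Dict, List, Optional


def cosine_similarity(vector_a: List[float], vector_b: List[float]) -> float:
    """Compare two vectors by cosine similarity."""
    dot_product = sum(a * b for a, b in zip(vector_a, vector_b))
    magnitude_a = sqrt(sum(a * a for a in vector_a))
    magnitude_b = sqrt(sum(b * b for b in vector_b))
    if magnitude_a == 0 or magnitude_b == 0:
        return 0.0
    return dot_product / (magnitude_a * magnitude_b)


class TinyVectorStore:
    """Hypothetical in-memory store that keeps text, metadata, and vectors together."""

    def __init__(self) -> None:
        self._records: List[Dict[str, Any]] = []

    def add(self, doc_id: str, text: str, embedding: List[float],
            metadata: Optional[Dict[str, str]] = None) -> None:
        """Store one document with its embedding and optional metadata."""
        self._records.append(
            {"id": doc_id, "text": text, "embedding": embedding, "metadata": metadata or {}}
        )

    def query(self, query_embedding: List[float], n_results: int = 2) -> List[Dict[str, Any]]:
        """Return the n_results records most similar to the query, best first."""
        scored = [
            {**record, "score": cosine_similarity(query_embedding, record["embedding"])}
            for record in self._records
        ]
        scored.sort(key=lambda record: record["score"], reverse=True)
        return scored[:n_results]


store = TinyVectorStore()
store.add("D101", "Fix invalid API authentication tokens.", [0.9, 0.1], {"label": "strong match"})
store.add("D103", "Understand monthly billing limits.", [0.1, 0.9], {"label": "unrelated"})

matches = store.query([1.0, 0.0])
print([match["id"] for match in matches])  # ['D101', 'D103']
```

This version still compares the query against every record. A real vector database adds persistence and indexing so that search stays fast as the collection grows.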
