Remove semantic search + embeddings #500

Closed
inFocus7 wants to merge 7 commits into agentregistry-dev:cleanup/remove-import-enrichment from inFocus7:cleanup/remove-semantic-search-feat

@inFocus7 inFocus7 commented May 15, 2026

Built on top of other cleanup work #491

note: this got autoclosed since the base branch merged, moved changes to #505

Description

Removes the unwired semantic-search / embeddings feature. The HTTP handlers, the indexer, and the public Store surface are deleted; the semantic_embedding* columns and HNSW indexes are dropped by a new v1alpha1 migration on upgrade.

What's removed:

  • internal/registry/api/handlers/v0/embeddings/: the indexer kickoff and job-status endpoints. Never wired into the running router.
  • internal/registry/embeddings/: the indexer, helpers, and Provider interface. Never constructed at boot.
  • ⚠️ pkg/semantic/ and the v1alpha1store.Store semantic methods (SetEmbedding, GetEmbeddingMetadata, SemanticList, VectorLiteral). Public-API breaking: downstream consumers importing these symbols will get compile errors.
  • EmbeddingsConfig and the AGENT_REGISTRY_EMBEDDINGS_ENABLED env var.
  • The migrator's generic MigratorConfig.Skip predicate, which only ever gated 003.

Migrations:

  • 003_embeddings.sql is retained as a no-op (comment-only) for historical consistency with installs that have version 203 recorded in schema_migrations. Plain postgres images can apply it.
  • 008_drop_semantic_embeddings.sql drops the columns and HNSW indexes with IF EXISTS, safe on installs that never ran 003 originally.

Change Type

/kind breaking_change
/kind cleanup

Changelog

Remove embeddings and semantic search.
⚠️: If you enabled embeddings and use the pgvector postgres image, keep the image during the upgrade. You can switch to a standard postgres image in a follow-up release.

Additional Notes

Upgrade notes

Operators currently running with AGENT_REGISTRY_EMBEDDINGS_ENABLED=true on a postgres image that includes pgvector (e.g. pgvector/pgvector:pg16) must keep the pgvector image during this upgrade.

Migration 208 drops the semantic_embedding* columns and HNSW indexes, which are typed against pgvector. PostgreSQL requires the extension's shared library to be loadable on disk to execute the drops. Attempting to upgrade the application and switch to a plain postgres image in the same Helm release fails with:

ERROR: could not access file "$libdir/vector": No such file or directory

To remove the pgvector dependency:

  1. Upgrade the application first, keeping the pgvector image.
  2. Verify migration 208 applied: SELECT version FROM schema_migrations WHERE version=208;
  3. In a follow-up Helm release, swap the bundled postgres image to plain postgres of the same major version.

Operators who ran with AGENT_REGISTRY_EMBEDDINGS_ENABLED=false (the default) have nothing to drop and can upgrade freely on plain postgres.
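
For anyone scripting the image swap, the rule in these upgrade notes could be encoded roughly as follows. This is a minimal sketch: `canSwapToPlainPostgres` and its parameters are hypothetical and do not exist in this repo; only the version number 208 comes from the notes above. The applied versions are assumed to have been read from schema_migrations beforehand.

```go
package main

import "fmt"

// dropMigrationVersion is the schema_migrations version the upgrade notes
// associate with 008_drop_semantic_embeddings.sql.
const dropMigrationVersion = 208

// canSwapToPlainPostgres is a hypothetical helper (not part of the registry).
// It reports whether the pgvector image can be replaced with plain postgres,
// given whether embeddings were ever enabled and which migration versions
// are recorded as applied.
func canSwapToPlainPostgres(embeddingsWereEnabled bool, appliedVersions []int) bool {
	if !embeddingsWereEnabled {
		// No pgvector-typed columns or indexes were created, so nothing
		// needs the extension's shared library in order to be dropped.
		return true
	}
	for _, v := range appliedVersions {
		if v == dropMigrationVersion {
			// The drops already ran while the pgvector image was in place.
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canSwapToPlainPostgres(true, []int{203}))      // false
	fmt.Println(canSwapToPlainPostgres(true, []int{203, 208})) // true
	fmt.Println(canSwapToPlainPostgres(false, nil))            // true
}
```

The check is deliberately conservative: with embeddings enabled it only returns true once 208 is recorded, which matches steps 1 to 3 above.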

inFocus7 and others added 6 commits May 15, 2026 13:41
Removes internal/registry/api/handlers/v0/embeddings and
internal/registry/embeddings. Neither was wired into the running app
(registry_app.go never constructed the Indexer or jobs.Manager, and the
router never registered the handler); the existing config comment
already announced that the public surface had been removed pending a
rebuild. These packages have no other in-tree callers, so deletion is
safe and the build is unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…emantic

Removes the public Store surface for the semantic-search feature
(SetEmbedding, GetEmbeddingMetadata, SemanticList, VectorLiteral) along
with the pkg/semantic types they used. This is a public-API breaking
change for any downstream consumer that imported these symbols; the
parent commit removed the in-tree caller (the internal indexer) and
nothing else in this repo references them.

Inlines decodeRow back into scanRow in helpers.go now that SemanticList
was its only other caller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmbeddingsConfig only gated whether the pgvector migration
(003_embeddings.sql) applied. With the semantic-search feature gone
the flag has no remaining effect, so it's dropped along with the
MigratorConfig.Skip predicate that was its sole consumer.

AGENT_REGISTRY_EMBEDDINGS_ENABLED is silently ignored by the env
parser (caarlos0/env tolerates unknown vars), so a deployment that
still sets it boots cleanly.

NewPostgreSQL and v1alpha1store.MigratorConfig both lose their
embeddings parameter and become single-argument and no-argument,
respectively.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops the HNSW indexes and semantic_embedding* columns added by the
original 003_embeddings.sql on the four affected tables (agents,
mcp_servers, skills, prompts). The pgvector extension is intentionally
left installed — extensions are database-global and may be in use by
downstream schemas.

Every statement in 008 is IF EXISTS so the migration is safe on
installs that had embeddings enabled historically (003 created the
columns, 008 removes them) and on installs that did not (the columns
never existed and the drops are no-ops).
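
The shape of those idempotent drops can be sketched as below. This is an illustration only: the four table names come from this commit message, but the index name `idx_<table>_semantic_embedding_hnsw` and the single column name `semantic_embedding` are hypothetical placeholders, and the real 008 file may name and order things differently.

```go
package main

import "fmt"

// tables lists the four tables named in the commit message.
var tables = []string{"agents", "mcp_servers", "skills", "prompts"}

// dropStatements builds IF EXISTS-guarded DDL for one table.
// The index and column names are hypothetical placeholders.
func dropStatements(table string) []string {
	return []string{
		// Drop the HNSW index first; IF EXISTS makes it a no-op when absent.
		fmt.Sprintf("DROP INDEX IF EXISTS idx_%s_semantic_embedding_hnsw;", table),
		// Then drop the column; again a no-op if 003 never created it.
		fmt.Sprintf("ALTER TABLE %s DROP COLUMN IF EXISTS semantic_embedding;", table),
	}
}

func main() {
	for _, t := range tables {
		for _, stmt := range dropStatements(t) {
			fmt.Println(stmt)
		}
	}
}
```

Because every statement carries IF EXISTS, running the generated SQL twice, or on a database that never had the columns, succeeds without changing anything.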

Also replaces the contents of 003_embeddings.sql with comments only.
Previously the file was gated by the embeddings runtime flag's Skip
predicate; with that gate removed in the prior commit, 003 would now
run on every install and its `CREATE EXTENSION vector` statement would
fail on plain postgres images (e.g. the Helm chart's default
postgres:18). Replacing the body with comments keeps the file in the
migration sequence so the schema_migrations row for version 203 still
records cleanly, while no longer requiring pgvector binaries to be
available.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AGENTS.md no longer claims pgvector is part of the stack and drops
the `/v0/embeddings/index` example from the authz section (that
endpoint was removed earlier in this PR).

scripts/kind/README.md drops the pgvector setup section — the kind
environment uses the same bundled postgres image as Helm and no
longer needs the extension.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bundled dev postgres no longer needs the vector extension —
003_embeddings.sql is a comment-only no-op and 008's drops are gated
on IF EXISTS. Matches the Helm chart's plain-postgres default and
removes the only remaining pgvector reference in the repo.

Keeping the major version at 16 (same as pgvector/pgvector:pg16) so
existing dev `postgres_data` volumes continue to work without a
manual wipe; the vector extension stays cataloged with a dangling
binary, which is harmless at boot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collaborator Author

we need to either delete this file or no-op it. we no longer implement embeddings or its config (e.g. embeddings enablement), so there is no in-code Skip anymore. without that skip, this migration would run on every install and try to enable the extension, which doesn't exist in base postgres.

on fresh installs this migration will apply but not do anything, which keeps the 00N list of migrations consistent. on upgrades where it is already recorded, this file isn't re-applied, so no issues there.
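
to illustrate the fresh-install vs. upgrade behavior, here is a rough sketch of how a version-tracking migrator picks what to run. the `migration` type and `pending` function are hypothetical and are not the repo's actual migrator; the version numbers 203 and 208 are the ones used in the PR description.

```go
package main

import "fmt"

// migration pairs a recorded version with its file name (hypothetical type).
type migration struct {
	Version int
	Name    string
}

// pending returns the migrations whose version is not yet recorded as
// applied, preserving their original order.
func pending(all []migration, applied map[int]bool) []migration {
	var out []migration
	for _, m := range all {
		if !applied[m.Version] {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	all := []migration{
		{203, "003_embeddings.sql"},
		{208, "008_drop_semantic_embeddings.sql"},
	}

	// Fresh install: nothing recorded, so the comment-only 003 runs
	// (as a no-op) and gets recorded, followed by 008.
	fresh := pending(all, map[int]bool{})

	// Upgrade with 203 already recorded: 003 is skipped, only 008 runs.
	upgrade := pending(all, map[int]bool{203: true})

	fmt.Println(len(fresh), len(upgrade)) // 2 1
}
```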

Signed-off-by: Fabian Gonzalez <fabian.gonzalez@solo.io>
@inFocus7 inFocus7 changed the title Cleanup/remove semantic search Remove semantic search + embeddings May 15, 2026
@inFocus7 inFocus7 deleted the branch agentregistry-dev:cleanup/remove-import-enrichment May 19, 2026 20:02
@inFocus7 inFocus7 closed this May 19, 2026