Remove semantic search + embeddings #505

Merged
inFocus7 merged 10 commits into agentregistry-dev:main from inFocus7:cleanup/remove-semantic-search-feat on May 20, 2026

@inFocus7 inFocus7 commented May 19, 2026

Note: the original PR was auto-closed when its underlying base PR merged.

Description

Removes the unwired semantic-search / embeddings feature. The HTTP handlers, the indexer, and the public Store surface are deleted; the semantic_embedding* columns and HNSW indexes are dropped by a new v1alpha1 migration on upgrade.

What's removed:

  • internal/registry/api/handlers/v0/embeddings/: the indexer kickoff and job-status endpoints. Never wired into the running router.
  • internal/registry/embeddings/: the indexer, helpers, and Provider interface. Never constructed at boot.
  • ⚠️ pkg/semantic/ and the v1alpha1store.Store semantic methods (SetEmbedding, GetEmbeddingMetadata, SemanticList, VectorLiteral). Public-API breaking: downstream consumers importing these symbols will get compile errors.
  • EmbeddingsConfig and the AGENT_REGISTRY_EMBEDDINGS_ENABLED env var.
  • The migrator's generic MigratorConfig.Skip predicate, which only ever gated 003.

Migrations:

  • 003_embeddings.sql is retained as a comment-only no-op so the migration sequence stays consistent for installs that already have version 203 (the version recorded for this file) in schema_migrations. Because it no longer contains any statements, plain postgres images can apply it.
  • 008_drop_semantic_embeddings.sql drops the columns and HNSW indexes with IF EXISTS, safe on installs that never ran 003 originally.

Change Type

/kind breaking_change
/kind cleanup

Changelog

Remove embeddings and semantic search.
⚠️ If you previously enabled embeddings and use the pgvector postgres image, keep that image in place while the upgrade migration runs. Once the upgrade completes, you can `DROP EXTENSION vector;` and switch to a standard Postgres image in a follow-up release.

Additional Notes

Upgrade notes

Operators currently running with AGENT_REGISTRY_EMBEDDINGS_ENABLED=true on a postgres image that includes pgvector (e.g. pgvector/pgvector:pg16) must keep the pgvector image during this upgrade.

Migration 208 drops the semantic_embedding* columns and HNSW indexes, which are typed against pgvector. PostgreSQL requires the extension's shared library to be loadable on disk to execute the drops. Attempting to upgrade the application and switch to a plain postgres image in the same Helm release fails with:

ERROR: could not access file "$libdir/vector": No such file or directory

To remove the pgvector dependency:

  1. Upgrade the application first, keeping the pgvector image.
  2. Verify migration 208 applied: SELECT version FROM schema_migrations WHERE version=208;
  3. In a follow-up Helm release, swap the bundled postgres image to plain postgres of the same major version.

Operators who ran with AGENT_REGISTRY_EMBEDDINGS_ENABLED=false (the default) have nothing to drop and can upgrade freely on plain postgres.
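
The ordering constraint in these upgrade notes can be sketched as a small decision function. This is a minimal sketch, not tooling shipped with the registry: `clusterState` and `canSwapToPlainPostgres` are hypothetical names, and the second query simply reads PostgreSQL's standard `pg_extension` catalog. The first query is the one quoted in step 2.

```go
package main

import "fmt"

// Queries an operator could run by hand. The first is quoted from the upgrade
// notes; the second reads PostgreSQL's pg_extension catalog.
const (
	migrationAppliedQuery   = `SELECT version FROM schema_migrations WHERE version=208;`
	extensionInstalledQuery = `SELECT extname FROM pg_extension WHERE extname = 'vector';`
)

// clusterState is a hypothetical summary of what the operator found.
type clusterState struct {
	EmbeddingsWereEnabled bool // the original 003 created pgvector-typed columns
	Migration208Applied   bool // schema_migrations has a row for version 208
}

// canSwapToPlainPostgres reports whether moving off the pgvector image is safe
// yet, together with a short explanation.
func canSwapToPlainPostgres(s clusterState) (bool, string) {
	switch {
	case !s.EmbeddingsWereEnabled:
		return true, "no pgvector-typed columns exist; upgrade freely on plain postgres"
	case !s.Migration208Applied:
		return false, "keep the pgvector image until migration 208 has applied"
	default:
		return true, "migration 208 applied; swap the image in a follow-up release"
	}
}

func main() {
	fmt.Println("check:", migrationAppliedQuery)
	fmt.Println("check:", extensionInstalledQuery)

	states := []clusterState{
		{EmbeddingsWereEnabled: false},
		{EmbeddingsWereEnabled: true, Migration208Applied: false},
		{EmbeddingsWereEnabled: true, Migration208Applied: true},
	}
	for _, s := range states {
		ok, why := canSwapToPlainPostgres(s)
		fmt.Printf("%+v -> swap=%v (%s)\n", s, ok, why)
	}
}
```

The middle case is the one that produces the `$libdir/vector` error if the image is swapped in the same release as the application upgrade.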

inFocus7 and others added 7 commits May 19, 2026 16:13
Removes internal/registry/api/handlers/v0/embeddings and
internal/registry/embeddings. Neither was wired into the running app
(registry_app.go never constructed the Indexer or jobs.Manager, and the
router never registered the handler); the existing config comment
already announced that the public surface had been removed pending a
rebuild. These packages have no other in-tree callers, so deletion is
safe and the build is unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…emantic

Removes the public Store surface for the semantic-search feature
(SetEmbedding, GetEmbeddingMetadata, SemanticList, VectorLiteral) along
with the pkg/semantic types they used. This is a public-API breaking
change for any downstream consumer that imported these symbols; the
parent commit removed the in-tree caller (the internal indexer) and
nothing else in this repo references them.

Inlines decodeRow back into scanRow in helpers.go now that SemanticList
was its only other caller.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmbeddingsConfig only gated whether the pgvector migration
(003_embeddings.sql) applied. With the semantic-search feature gone
the flag has no remaining effect, so it's dropped along with the
MigratorConfig.Skip predicate that was its sole consumer.

AGENT_REGISTRY_EMBEDDINGS_ENABLED is silently ignored by the env
parser (caarlos0/env tolerates unknown vars), so a deployment that
still sets it boots cleanly.

NewPostgreSQL and v1alpha1store.MigratorConfig both lose their
embeddings parameter and become single- / no-argument.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops the HNSW indexes and semantic_embedding* columns added by the
original 003_embeddings.sql on the four affected tables (agents,
mcp_servers, skills, prompts). The pgvector extension is intentionally
left installed — extensions are database-global and may be in use by
downstream schemas.

Every statement in 008 is IF EXISTS so the migration is safe on
installs that had embeddings enabled historically (003 created the
columns, 008 removes them) and on installs that did not (the columns
never existed and the drops are no-ops).

Also replaces the contents of 003_embeddings.sql with comments only.
Previously the file was gated by the embeddings runtime flag's Skip
predicate; with that gate removed in the prior commit, 003 would now
run on every install and its `CREATE EXTENSION vector` statement would
fail on plain postgres images (e.g. the Helm chart's default
postgres:18). Replacing the body with comments keeps the file in the
migration sequence so the schema_migrations row for version 203 still
records cleanly, while no longer requiring pgvector binaries to be
available.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AGENTS.md no longer claims pgvector is part of the stack and drops
the `/v0/embeddings/index` example from the authz section (that
endpoint was removed earlier in this PR).

scripts/kind/README.md drops the pgvector setup section — the kind
environment uses the same bundled postgres image as Helm and no
longer needs the extension.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bundled dev postgres no longer needs the vector extension —
003_embeddings.sql is a comment-only no-op and 008's drops are gated
on IF EXISTS. Matches the Helm chart's plain-postgres default and
removes the only remaining pgvector reference in the repo.

Keeping the major version at 16 (same as pgvector/pgvector:pg16) so
existing dev `postgres_data` volumes continue to work without a
manual wipe; the vector extension stays cataloged with a dangling
binary, which is harmless at boot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Fabian Gonzalez <fabian.gonzalez@solo.io>
@inFocus7 inFocus7 enabled auto-merge May 20, 2026 16:42

@nikolasmatt nikolasmatt left a comment

Clean removal: no stale references in Go, charts, or docs, and I confirmed the handler was never wired. One important comment on 008 and two nits.

Also: the PR body's breaking-change list omits the MigratorConfig signature change (MigratorConfig(bool) → MigratorConfig()) and the removal of the exported MigratorConfig.Skip field. Downstream consumers will hit compile errors, so both are worth listing alongside pkg/semantic.
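
To make the shape of that break concrete, here is a self-contained sketch using local stand-in types. None of these declarations are the real v1alpha1store API: the `Skip func(string) bool` signature, the type names, and the `plan` helper are all assumptions chosen only to show why a caller that passed the embeddings flag or read `Skip` would stop compiling.

```go
package main

import "fmt"

// migratorConfigV1 is a hypothetical stand-in for the old shape: built from
// the embeddings flag and exposing an exported Skip predicate.
type migratorConfigV1 struct {
	Skip func(file string) bool // assumed signature, for illustration only
}

// newMigratorConfigV1 mimics a constructor that took the embeddings flag.
func newMigratorConfigV1(embeddingsEnabled bool) migratorConfigV1 {
	return migratorConfigV1{
		Skip: func(file string) bool {
			// Per the PR, the predicate only ever gated 003.
			return !embeddingsEnabled && file == "003_embeddings.sql"
		},
	}
}

// migratorConfigV2 is a stand-in for the new shape: no argument, no Skip.
type migratorConfigV2 struct{}

func newMigratorConfigV2() migratorConfigV2 { return migratorConfigV2{} }

// plan returns the files that would run; a nil predicate skips nothing.
func plan(files []string, skip func(string) bool) []string {
	var out []string
	for _, f := range files {
		if skip != nil && skip(f) {
			continue
		}
		out = append(out, f)
	}
	return out
}

func main() {
	files := []string{"003_embeddings.sql", "008_drop_semantic_embeddings.sql"}

	before := newMigratorConfigV1(false)
	fmt.Println("before:", plan(files, before.Skip))

	// After the change there is no flag to pass and no Skip field to read,
	// so every file is in the plan (003 being a comment-only no-op).
	_ = newMigratorConfigV2()
	fmt.Println("after: ", plan(files, nil))
}
```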

Comment thread pkg/registry/v1alpha1store/migrations/008_drop_semantic_embeddings.sql Outdated
Comment thread pkg/registry/v1alpha1store/migrations/003_embeddings.sql Outdated
Comment thread scripts/kind/README.md
Signed-off-by: Fabian Gonzalez <fabian.gonzalez@solo.io>
@inFocus7 inFocus7 added this pull request to the merge queue May 20, 2026
Merged via the queue into agentregistry-dev:main with commit e7f877b May 20, 2026
8 checks passed
@inFocus7 inFocus7 deleted the cleanup/remove-semantic-search-feat branch May 20, 2026 18:56
nikolasmatt added a commit to nikolasmatt/agentregistry that referenced this pull request May 20, 2026
- HappyPath: explicit assertions on "7 applied, 0 pending" and
  version == 8 (the highest OSS migration after agentregistry-dev#505's 007/008
  additions).
- DownErrNotReversible: every OSS migration now ships a
  RAISE-EXCEPTION .down.sql, so the assertion matches "not
  reversible" from the propagated PostgreSQL error.
- ForceWritesRowOnly: force uses the unoffset version 1 (the +200
  offset is gone with the engine swap). Adds an explicit assertion
  that the v1alpha1 schema does NOT exist after `force 1` — proves
  go-migrate's Force writes the bookkeeping row without running the
  migration SQL.
- GotoInferredSource (new): single-source binaries accept `goto 2`
  without --source; the version subcommand confirms the new state.

No-DB cases (MissingDSN, DBUrlPrecedence, Help, ArgValidation) are
unchanged from PR 503. All 8 tests run green against a local
Postgres on the dev port.

2 participants