chore(infra): migrate from Elasticsearch to OpenSearch#216
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReplaces Elasticsearch client/DSL imports and exceptions with OpenSearch equivalents, updates mapping/docstring terminology, adds os-v2 mapping initializers, and implements a Celery task with tests to index OpenSearch cluster/node statistics. ChangesElasticsearch to OpenSearch Migration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
90c7f04 to
7ae371f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@rero_mef/monitoring/tasks.py`:
- Around line 43-44: Replace the silent handler "except Exception: return None"
in the snapshot lookup (the shown except block in rero_mef/monitoring/tasks.py)
with a guarded catch that captures the exception as e and logs it (e.g.,
logger.exception or logger.error with traceback) before returning/handling the
fallback; specifically change the bare except to "except Exception as e:" and
call the module's logger (or monitoring logger) to record the error and context
(search query, client info, function name) so OpenSearch search/client failures
are visible.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d1b7e221-99f9-4270-aa53-9e98d690c854
⛔ Files ignored due to path filters (26)
CLAUDE.mdis excluded by none and included by noneINSTALL.mdis excluded by none and included by nonedocker-compose.full.ymlis excluded by none and included by nonedocker-compose.ymlis excluded by none and included by nonedocker-services.ymlis excluded by none and included by nonedocker/dashboards/mef-dashboard.ndjsonis excluded by none and included by nonedocker/elasticsearch/Dockerfileis excluded by none and included by nonedocker/wait-for-services.shis excluded by none and included by nonepyproject.tomlis excluded by none and included by nonerero_mef/agents/gnd/mappings/os-v2/agents_gnd/gnd-agent-v0.0.1.jsonis excluded by none and included by nonerero_mef/agents/idref/mappings/os-v2/agents_idref/idref-agent-v0.0.1.jsonis excluded by none and included by nonerero_mef/agents/mef/mappings/os-v2/mef/mef-v0.0.1.jsonis excluded by none and included by nonerero_mef/agents/rero/mappings/os-v2/agents_rero/rero-agent-v0.0.1.jsonis excluded by none and included by nonerero_mef/agents/viaf/mappings/os-v2/viaf/viaf-v0.0.1.jsonis excluded by none and included by nonerero_mef/concepts/gnd/mappings/os-v2/concepts_gnd/gnd-concept-v0.0.1.jsonis excluded by none and included by nonerero_mef/concepts/idref/mappings/os-v2/concepts_idref/idref-concept-v0.0.1.jsonis excluded by none and included by nonerero_mef/concepts/mef/mappings/os-v2/concepts_mef/mef-concept-v0.0.1.jsonis excluded by none and included by nonerero_mef/concepts/rero/mappings/os-v2/concepts_rero/rero-concept-v0.0.1.jsonis excluded by none and included by nonerero_mef/config.pyis excluded by!rero_mef/config.pyand included byrero_mef/**/*.pyrero_mef/places/gnd/mappings/os-v2/places_gnd/gnd-place-v0.0.1.jsonis excluded by none and included by nonerero_mef/places/idref/mappings/os-v2/places_idref/idref-place-v0.0.1.jsonis excluded by none and included by nonerero_mef/places/mef/mappings/os-v2/places_mef/mef-place-v0.0.1.jsonis excluded by none and included by nonescripts/setupis excluded by none and included by nonescripts/setup-dashboardsis excluded by none and included by nonescripts/testis excluded by none and included by noneuv.lockis excluded by!**/*.lockand included by none
📒 Files selected for processing (48)
rero_mef/agents/__init__.pyrero_mef/agents/api.pyrero_mef/agents/gnd/mappings/__init__.pyrero_mef/agents/gnd/mappings/os-v2/__init__.pyrero_mef/agents/idref/mappings/__init__.pyrero_mef/agents/idref/mappings/os-v2/__init__.pyrero_mef/agents/idref/mappings/v7/__init__.pyrero_mef/agents/mef/mappings/__init__.pyrero_mef/agents/mef/mappings/os-v2/__init__.pyrero_mef/agents/rero/mappings/__init__.pyrero_mef/agents/rero/mappings/os-v2/__init__.pyrero_mef/agents/viaf/api.pyrero_mef/agents/viaf/mappings/__init__.pyrero_mef/agents/viaf/mappings/os-v2/__init__.pyrero_mef/api.pyrero_mef/api_mef.pyrero_mef/cli.pyrero_mef/concepts/__init__.pyrero_mef/concepts/gnd/mappings/__init__.pyrero_mef/concepts/gnd/mappings/os-v2/__init__.pyrero_mef/concepts/idref/mappings/__init__.pyrero_mef/concepts/idref/mappings/os-v2/__init__.pyrero_mef/concepts/mef/mappings/__init__.pyrero_mef/concepts/mef/mappings/os-v2/__init__.pyrero_mef/concepts/rero/mappings/__init__.pyrero_mef/concepts/rero/mappings/os-v2/__init__.pyrero_mef/concepts/rero/mappings/v7/__init__.pyrero_mef/ext.pyrero_mef/filter.pyrero_mef/monitoring/__init__.pyrero_mef/monitoring/api.pyrero_mef/monitoring/cli.pyrero_mef/monitoring/tasks.pyrero_mef/monitoring/views.pyrero_mef/places/__init__.pyrero_mef/places/gnd/mappings/__init__.pyrero_mef/places/gnd/mappings/os-v2/__init__.pyrero_mef/places/idref/mappings/__init__.pyrero_mef/places/idref/mappings/os-v2/__init__.pyrero_mef/places/mef/mappings/__init__.pyrero_mef/places/mef/mappings/os-v2/__init__.pyrero_mef/query.pyrero_mef/tasks.pyrero_mef/views.pytests/api/test_all_mef_rest.pytests/api/test_entity_serializers.pytests/conftest.pytests/ui/test_ext.py
💤 Files with no reviewable changes (2)
- rero_mef/agents/idref/mappings/v7/init.py
- rero_mef/concepts/rero/mappings/v7/init.py
✅ Files skipped from review due to trivial changes (32)
- rero_mef/concepts/gnd/mappings/os-v2/init.py
- rero_mef/concepts/gnd/mappings/init.py
- rero_mef/agents/init.py
- rero_mef/places/idref/mappings/os-v2/init.py
- rero_mef/agents/idref/mappings/os-v2/init.py
- rero_mef/monitoring/init.py
- rero_mef/agents/gnd/mappings/os-v2/init.py
- rero_mef/concepts/rero/mappings/os-v2/init.py
- rero_mef/places/mef/mappings/os-v2/init.py
- rero_mef/agents/rero/mappings/init.py
- rero_mef/places/idref/mappings/init.py
- rero_mef/concepts/init.py
- rero_mef/concepts/mef/mappings/os-v2/init.py
- rero_mef/monitoring/cli.py
- rero_mef/places/gnd/mappings/os-v2/init.py
- rero_mef/concepts/idref/mappings/os-v2/init.py
- rero_mef/agents/mef/mappings/init.py
- rero_mef/concepts/mef/mappings/init.py
- rero_mef/agents/api.py
- tests/conftest.py
- rero_mef/agents/idref/mappings/init.py
- rero_mef/concepts/idref/mappings/init.py
- rero_mef/agents/viaf/mappings/init.py
- rero_mef/places/init.py
- rero_mef/places/gnd/mappings/init.py
- rero_mef/agents/mef/mappings/os-v2/init.py
- rero_mef/query.py
- rero_mef/places/mef/mappings/init.py
- rero_mef/concepts/rero/mappings/init.py
- rero_mef/agents/gnd/mappings/init.py
- rero_mef/agents/viaf/mappings/os-v2/init.py
- rero_mef/agents/rero/mappings/os-v2/init.py
🚧 Files skipped from review as they are similar to previous changes (12)
- tests/api/test_entity_serializers.py
- rero_mef/tasks.py
- rero_mef/api_mef.py
- tests/ui/test_ext.py
- tests/api/test_all_mef_rest.py
- rero_mef/filter.py
- rero_mef/ext.py
- rero_mef/monitoring/views.py
- rero_mef/monitoring/api.py
- rero_mef/views.py
- rero_mef/cli.py
- rero_mef/api.py
e679982 to
070deda
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
rero_mef/ext.py (1)
85-93:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winFix invalid Python 3 exception syntax
Line 87 uses
except RuntimeError, AttributeError:which is Python 2 syntax and invalid in Python 3. This will cause a SyntaxError and prevent the module from being imported.Proposed fix
- except RuntimeError, AttributeError: + except (RuntimeError, AttributeError):🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rero_mef/ext.py` around lines 85 - 93, The except clause uses Python 2 syntax and must be updated: in the try/except around assigning current_search_client (the block that logs "Skipping %s alias setup" using alias_name and app.logger), replace the invalid "except RuntimeError, AttributeError:" with a Python 3-compatible form such as "except (RuntimeError, AttributeError):" (or separate except blocks) so the module can import without a SyntaxError.
🧹 Nitpick comments (1)
tests/ui/test_monitoring.py (1)
303-309: ⚡ Quick winMove function-local imports to module scope
Line 305 and Line 385 add deferred imports without a circular-import justification. Please move them to top-level imports to keep import behavior predictable and consistent.
Proposed diff
+import logging +from datetime import UTC, datetime, timedelta from unittest.mock import MagicMock, patch import pytest from click.testing import CliRunner @@ def test_last_snapshot_returns_none_and_logs_on_exception(caplog): """Returns None and logs the exception when the search call fails.""" - import logging client = _make_client(raise_on_search=True) with caplog.at_level(logging.ERROR, logger="rero_mef.monitoring.tasks"): result = _last_snapshot(client, "node1") @@ def test_index_os_stats_rates_computed_from_prev_snapshot(): """Rates are non-zero when a previous snapshot provides a baseline.""" - from datetime import UTC, datetime, timedelta - prev_ts = (datetime.now(UTC) - timedelta(seconds=60)).strftime( "%Y-%m-%dT%H:%M:%S.000Z" )As per coding guidelines: "
**/*.py: Organize imports in order: standard library → third-party → local, sorted within groups, with all imports at the top of the file" and "**/*.py: Only use deferred (inside-function) imports when they genuinely break circular dependencies, and document why with a comment."Also applies to: 385-389
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/ui/test_monitoring.py` around lines 303 - 309, This test file has function-local imports (e.g., the "import logging" inside test_last_snapshot and another deferred import around lines 385-389) that should be moved to module-level; move these imports to the top of tests/ui/test_monitoring.py, organize them into the standard library → third-party → local groups and sort them within each group, and remove the in-function import statements from functions like test_last_snapshot and the other test at ~385; if any import truly breaks a circular dependency, keep it deferred but add a short comment explaining why and reference the specific symbol(s) that require the deferred import.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@rero_mef/ext.py`:
- Around line 85-93: The except clause uses Python 2 syntax and must be updated:
in the try/except around assigning current_search_client (the block that logs
"Skipping %s alias setup" using alias_name and app.logger), replace the invalid
"except RuntimeError, AttributeError:" with a Python 3-compatible form such as
"except (RuntimeError, AttributeError):" (or separate except blocks) so the
module can import without a SyntaxError.
---
Nitpick comments:
In `@tests/ui/test_monitoring.py`:
- Around line 303-309: This test file has function-local imports (e.g., the
"import logging" inside test_last_snapshot and another deferred import around
lines 385-389) that should be moved to module-level; move these imports to the
top of tests/ui/test_monitoring.py, organize them into the standard library →
third-party → local groups and sort them within each group, and remove the
in-function import statements from functions like test_last_snapshot and the
other test at ~385; if any import truly breaks a circular dependency, keep it
deferred but add a short comment explaining why and reference the specific
symbol(s) that require the deferred import.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7480e9f3-d322-4ba0-95af-46fc6cbffc54
⛔ Files ignored due to path filters (26)
CLAUDE.mdis excluded by none and included by noneINSTALL.mdis excluded by none and included by nonedocker-compose.full.ymlis excluded by none and included by nonedocker-compose.ymlis excluded by none and included by nonedocker-services.ymlis excluded by none and included by nonedocker/dashboards/mef-dashboard.ndjsonis excluded by none and included by nonedocker/elasticsearch/Dockerfileis excluded by none and included by nonedocker/wait-for-services.shis excluded by none and included by nonepyproject.tomlis excluded by none and included by nonerero_mef/agents/gnd/mappings/os-v2/agents_gnd/gnd-agent-v0.0.1.jsonis excluded by none and included by nonerero_mef/agents/idref/mappings/os-v2/agents_idref/idref-agent-v0.0.1.jsonis excluded by none and included by nonerero_mef/agents/mef/mappings/os-v2/mef/mef-v0.0.1.jsonis excluded by none and included by nonerero_mef/agents/rero/mappings/os-v2/agents_rero/rero-agent-v0.0.1.jsonis excluded by none and included by nonerero_mef/agents/viaf/mappings/os-v2/viaf/viaf-v0.0.1.jsonis excluded by none and included by nonerero_mef/concepts/gnd/mappings/os-v2/concepts_gnd/gnd-concept-v0.0.1.jsonis excluded by none and included by nonerero_mef/concepts/idref/mappings/os-v2/concepts_idref/idref-concept-v0.0.1.jsonis excluded by none and included by nonerero_mef/concepts/mef/mappings/os-v2/concepts_mef/mef-concept-v0.0.1.jsonis excluded by none and included by nonerero_mef/concepts/rero/mappings/os-v2/concepts_rero/rero-concept-v0.0.1.jsonis excluded by none and included by nonerero_mef/config.pyis excluded by!rero_mef/config.pyand included byrero_mef/**/*.pyrero_mef/places/gnd/mappings/os-v2/places_gnd/gnd-place-v0.0.1.jsonis excluded by none and included by nonerero_mef/places/idref/mappings/os-v2/places_idref/idref-place-v0.0.1.jsonis excluded by none and included by nonerero_mef/places/mef/mappings/os-v2/places_mef/mef-place-v0.0.1.jsonis excluded by none and included by nonescripts/setupis excluded by none and included by nonescripts/setup-dashboardsis excluded by none and included by nonescripts/testis excluded by none and included by noneuv.lockis excluded by!**/*.lockand included by none
📒 Files selected for processing (49)
rero_mef/agents/__init__.pyrero_mef/agents/api.pyrero_mef/agents/gnd/mappings/__init__.pyrero_mef/agents/gnd/mappings/os-v2/__init__.pyrero_mef/agents/idref/mappings/__init__.pyrero_mef/agents/idref/mappings/os-v2/__init__.pyrero_mef/agents/idref/mappings/v7/__init__.pyrero_mef/agents/mef/mappings/__init__.pyrero_mef/agents/mef/mappings/os-v2/__init__.pyrero_mef/agents/rero/mappings/__init__.pyrero_mef/agents/rero/mappings/os-v2/__init__.pyrero_mef/agents/viaf/api.pyrero_mef/agents/viaf/mappings/__init__.pyrero_mef/agents/viaf/mappings/os-v2/__init__.pyrero_mef/api.pyrero_mef/api_mef.pyrero_mef/cli.pyrero_mef/concepts/__init__.pyrero_mef/concepts/gnd/mappings/__init__.pyrero_mef/concepts/gnd/mappings/os-v2/__init__.pyrero_mef/concepts/idref/mappings/__init__.pyrero_mef/concepts/idref/mappings/os-v2/__init__.pyrero_mef/concepts/mef/mappings/__init__.pyrero_mef/concepts/mef/mappings/os-v2/__init__.pyrero_mef/concepts/rero/mappings/__init__.pyrero_mef/concepts/rero/mappings/os-v2/__init__.pyrero_mef/concepts/rero/mappings/v7/__init__.pyrero_mef/ext.pyrero_mef/filter.pyrero_mef/monitoring/__init__.pyrero_mef/monitoring/api.pyrero_mef/monitoring/cli.pyrero_mef/monitoring/tasks.pyrero_mef/monitoring/views.pyrero_mef/places/__init__.pyrero_mef/places/gnd/mappings/__init__.pyrero_mef/places/gnd/mappings/os-v2/__init__.pyrero_mef/places/idref/mappings/__init__.pyrero_mef/places/idref/mappings/os-v2/__init__.pyrero_mef/places/mef/mappings/__init__.pyrero_mef/places/mef/mappings/os-v2/__init__.pyrero_mef/query.pyrero_mef/tasks.pyrero_mef/views.pytests/api/test_all_mef_rest.pytests/api/test_entity_serializers.pytests/conftest.pytests/ui/test_ext.pytests/ui/test_monitoring.py
💤 Files with no reviewable changes (2)
- rero_mef/agents/idref/mappings/v7/init.py
- rero_mef/concepts/rero/mappings/v7/init.py
✅ Files skipped from review due to trivial changes (29)
- rero_mef/monitoring/init.py
- rero_mef/concepts/mef/mappings/os-v2/init.py
- rero_mef/agents/gnd/mappings/init.py
- rero_mef/places/gnd/mappings/init.py
- rero_mef/agents/viaf/mappings/init.py
- rero_mef/places/mef/mappings/os-v2/init.py
- rero_mef/concepts/gnd/mappings/init.py
- rero_mef/places/mef/mappings/init.py
- rero_mef/agents/idref/mappings/os-v2/init.py
- rero_mef/concepts/idref/mappings/os-v2/init.py
- rero_mef/agents/rero/mappings/init.py
- rero_mef/agents/mef/mappings/os-v2/init.py
- rero_mef/places/init.py
- rero_mef/tasks.py
- rero_mef/concepts/gnd/mappings/os-v2/init.py
- tests/api/test_entity_serializers.py
- rero_mef/concepts/rero/mappings/os-v2/init.py
- rero_mef/agents/api.py
- rero_mef/concepts/mef/mappings/init.py
- rero_mef/agents/rero/mappings/os-v2/init.py
- rero_mef/agents/mef/mappings/init.py
- rero_mef/agents/idref/mappings/init.py
- rero_mef/places/idref/mappings/os-v2/init.py
- rero_mef/agents/init.py
- rero_mef/places/idref/mappings/init.py
- rero_mef/agents/gnd/mappings/os-v2/init.py
- rero_mef/agents/viaf/mappings/os-v2/init.py
- rero_mef/places/gnd/mappings/os-v2/init.py
- rero_mef/concepts/idref/mappings/init.py
d09ef6e to
05d0559
Compare
| rero-mef is the Python/Flask backend for the RERO Metadata Enrichment Framework (MEF). It provides OAI harvesting, record processing, and metadata enrichment for library data pipelines. The project is built on Invenio, with PostgreSQL, Elasticsearch, Celery, and Redis for backend services. All development and task running is managed via `uv` and `poethepoet`. | ||
|
|
||
| **Stack:** Python 3.12, Flask (Invenio), PostgreSQL, Elasticsearch 7, Celery, Redis | ||
| **Stack:** Python 3.12, Flask (Invenio), PostgreSQL, OpenSearch 2, Celery, Redis |
| // Suppress the Dart Sass "legacy JS API" module warning — it fires for | ||
| // every SCSS file and is not actionable until upstream (bootstrap-sass, | ||
| // invenio-theme) migrates to the modern API. | ||
| ignoreWarnings: [/legacy JS API/], |
There was a problem hiding this comment.
What is the link to OpenSearch?
| { | ||
| test: /\.(scss|css)$/, | ||
| use: [MiniCssExtractPlugin.loader, "css-loader", "sass-loader"], | ||
| use: [ |
There was a problem hiding this comment.
No obvious link to OpenSearch?
| "sqlitedict>=2.1.0", | ||
| "rero-invenio-base>=0.3.1", | ||
| # "rero-invenio-base>=0.3.1", | ||
| "rero-invenio-base @ git+https://github.com/rerowep/rero-invenio-base@wep-opensearch", |
There was a problem hiding this comment.
Publish a rero-invenio-base release and update this before integrating.
| - "6379:6379" | ||
| db: | ||
| image: postgres:17 | ||
| image: postgres:18-alpine |
There was a problem hiding this comment.
This should always be in sync with what is currently used in the production servers. Should we migrate to postgres18 in prod on the next release?
* replace docker/elasticsearch/Dockerfile with OPENSEARCH_PLUGINS=analysis-icu env var — no custom image needed * rename docker-compose service es → os; update all links and INVENIO_SEARCH_HOSTS in docker-services.yml and docker-compose.full.yml * add OPENSEARCH_INITIAL_ADMIN_PASSWORD to satisfy OpenSearch 2.12+ entrypoint check even when plugins.security.disabled=true * switch rero-invenio-base to git branch wep-opensearch to remove transitive elasticsearch-py 7.x dependency and unblock urllib3 upgrade to 2.7 * drop five now-resolved urllib3 CVE exceptions from scripts/test (CVE-2025-50181, CVE-2025-66418, CVE-2025-66471, CVE-2026-21441, CVE-2026-44431); update pytest CVE comment to name correct blocker (pytest-invenio 4.0.1 pins pytest<9.0.0) * replace "Elasticsearch"/"elasticsearch" with "Search"/"search" in all docstrings and comments throughout rero_mef/ and tests/ * update CLAUDE.md stack description: Elasticsearch → OpenSearch Co-Authored-by: Peter Weber <peter.weber@rero.ch>
Summary by CodeRabbit
New Features
Updates