refactor(briklab/scripts): notion-based lib, Makefile infra entrypoint, E2E hardening #18
Merged
…ttp,wait}
Add lib/transverse/{log,env,http,wait}.sh as single-source-of-truth notion
modules: briklab.log.* (colors + leveled logging), briklab.env.* (.env /
versions.env helpers), briklab.http.* (one curl transport, auth passed by the
caller), and briklab.wait.until (one poll-until-ready loop, no eval).
common.sh becomes a backward-compat facade that sources the four modules and
re-exposes the legacy names (log_*, save_to_env, reload_env, load_env,
load_versions, check_http) plus the colors and root-path vars, so every
existing caller keeps working unchanged. Purely additive: no behavior change.
Replace the per-service wait_for_* poll loops (gitlab, gitea, jenkins, nexus), the ArgoCD port-forward wait, the cli _wait_for_http helper and recovery's _recover_wait with briklab.wait.until. Route container-state checks through briklab.check.container_running, fixing infra-refresh's divergent .State.Running probe. Replace the unsafe kill $(lsof ...) with pkill -f, and give infra-refresh and infra-verify proper source guards.
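For readers unfamiliar with the pattern, a poll-until-ready loop without eval can be sketched roughly as below. The function name `wait_until` and its argument order (timeout, interval, probe command) are assumptions for illustration, not the actual signature of briklab.wait.until.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a poll-until-ready helper; not the repository's code.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( SECONDS + timeout ))
  # The probe is run as a plain argument vector ("$@"), so no eval is needed.
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
  return 0
}
```

A caller would pass the probe as separate words, for example `wait_until 120 2 curl -sf http://localhost:8080/login` (URL chosen only as an example).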
…port
Add briklab.http.request (response body + trailing status line, never-fail) and
migrate all five E2E API clients (gitlab, jenkins, gitea, nexus, argocd) off raw
curl onto the shared briklab.http.{get,post_json,delete,code,request} transport.
Auth (PRIVATE-TOKEN / token / Bearer / basic) and per-call options (-X, -L, -o,
-g, cookie jars, --max-time) are passed as arguments. Collapses gitlab-api's
three divergent curl shapes into one transport and removes ~30 duplicated curl
invocations.
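The "response body + trailing status line, never-fail" shape could look something like the sketch below. The names `http_request`, `http_status` and `http_body` and the argument order are assumptions; the real briklab.http.request may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the response body, then the HTTP status on a
# final line, and always return 0 so the caller decides what counts as failure.
# Usage: http_request <url> [extra curl args such as -H, -X, --max-time ...]
http_request() {
  local url="$1"
  shift
  # -w appends the status code after the body; curl errors are swallowed.
  curl -s -w '\n%{http_code}' "$@" "$url" 2>/dev/null || true
}

# Split a captured response into its two parts.
http_status() { printf '%s\n' "$1" | tail -n 1; }
http_body()   { printf '%s\n' "$1" | sed '$d'; }
```

Under these assumptions a client captures the response once and inspects both parts, with auth supplied by the caller: `resp="$(http_request "$url" -H "PRIVATE-TOKEN: $token")"`, then `http_status "$resp"` and `http_body "$resp"`.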
Rename the credential helpers (ensure_*) to briklab.auth.* and the infra-verify / smoke-test helpers (verify_*, check/skip/is_running) to briklab.verify.*, collapsing the orchestration layer's divergent naming conventions into the <notion>.<submodule>.<verb> scheme. Definitions and all call sites move together; no behavior change.
Move token propagation (GitLab CI variables, Jenkins token-staleness restart) into recovery.sh as briklab.recover.gitlab_ci_vars / jenkins_token, routing their HTTP through briklab.http.*. infra-refresh.sh now sources recovery.sh directly instead of reaching down through e2e/lib/auth.sh, and composes the existing briklab.check.* / briklab.recover.* heals instead of duplicating per-token check wrappers. No behavior change to the infra-refresh command.
Break the 832-line assert.sh into assert/core.sh (domain-agnostic engine: counters, lifecycle, generic + JSON + status + job-log assertions) and assert/report.sh (Brik aggregate-report business outcomes, package/promote, and deploy-state delegations). assert.sh becomes a thin facade sourcing both. Drop the "is the lib loaded?" type guards on the delegating assertions: a test that calls them has already sourced the domain lib, so an undefined call now fails loudly instead of producing a soft "not loaded" assertion failure. Normalize the private helpers to the assert._verb convention. No change to the public assertion API.
The Nexus docker repo enforces basic auth, but the k3d registries.yaml only declared the mirror endpoint with no credentials, so containerd pulled anonymously, got 401, and deployed pods were stuck in ImagePullBackOff (the brik-deploy job then failed). Add a configs: auth block for both the registry host (nexus.briklab.test:8082) and the mirror endpoint host (brik-nexus:8082, the host containerd actually contacts), reusing the admin credentials the CI already uses to push.
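As an illustration of the fix, a registries.yaml with credentials for both hosts could be generated along these lines. The function name, the http scheme of the mirror endpoint, and the way credentials are passed in are assumptions; only the two host names come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: emit a k3d registries.yaml that declares auth for the
# registry host and for the mirror endpoint host containerd actually contacts.
# Usage: write_registries_yaml <username> <password>
write_registries_yaml() {
  local user="$1" pass="$2"
  cat <<EOF
mirrors:
  "nexus.briklab.test:8082":
    endpoint:
      - "http://brik-nexus:8082"   # scheme is an assumption
configs:
  "nexus.briklab.test:8082":
    auth:
      username: "${user}"
      password: "${pass}"
  "brik-nexus:8082":
    auth:
      username: "${user}"
      password: "${pass}"
EOF
}
```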
…etup
Move the two ArgoCD Application manifests (brik-e2e-gitops, brik-e2e-rollback) into a standalone setup/argocd-apps.sh that k3d.sh invokes at the end of a fresh cluster bring-up. The apps can now be re-applied against an existing cluster without recreating it. The two near-identical manifests are folded into a single parameterized helper.
Replace the ~17 inline _set_group_variable calls in setup_nexus_ci_variables with a key|value|masked table iterated in a single loop, and extract the SSH file-type variable into a _set_group_file_variable helper. The produced set of group CI/CD variables is unchanged; the count/total accounting is now exact.
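A table-driven loop of this kind might look like the sketch below. The variable names, values and the stand-in `set_group_variable` are invented for illustration; the real helper talks to the GitLab API.

```shell
#!/usr/bin/env bash
# Stand-in for the real setter (hypothetical): it only records the call.
set_group_variable() { printf 'set %s masked=%s\n' "$1" "$3"; }

# Hypothetical sketch: iterate a key|value|masked table in a single loop.
setup_ci_variables() {
  local table=(
    "NEXUS_URL|http://nexus.example:8081|false"
    "NEXUS_USER|admin|false"
    "NEXUS_PASSWORD|changeme|true"
  )
  local row key value masked count=0
  for row in "${table[@]}"; do
    # Split the row on '|'; values containing '|' would need another separator.
    IFS='|' read -r key value masked <<<"$row"
    if set_group_variable "$key" "$value" "$masked"; then
      count=$(( count + 1 ))
    fi
  done
  # The total is derived from the table itself, so the accounting stays exact.
  printf '%d/%d variables set\n' "$count" "${#table[@]}"
}
```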
The mktemp + printf + chmod block that builds a throwaway GIT_ASKPASS script was repeated verbatim in four git functions (push, push_tag, push_branch, trigger_via_push). Fold it into a single e2e.git._askpass_file helper; the public function signatures are unchanged.
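The folded helper presumably resembles the following sketch. The name `make_askpass_file`, the file mode and the choice to print the path on stdout are assumptions, not the actual e2e.git._askpass_file implementation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: build a throwaway GIT_ASKPASS script that prints a token.
# Usage: path="$(make_askpass_file "$token")"
make_askpass_file() {
  local token="$1" file
  file="$(mktemp)" || return 1
  # %q shell-quotes the token so the generated script prints it verbatim.
  printf '#!/usr/bin/env bash\nprintf "%%s\\n" %q\n' "$token" > "$file"
  chmod 700 "$file"
  printf '%s\n' "$file"
}
```

A caller would then run something like `GIT_ASKPASS="$path" git push ...` and remove the file afterwards.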
briklab.verify.cmd took a command string and ran it through eval. Take the command and its arguments directly and run "$@", dropping the eval. The sole caller (cli/setup.sh) passes its command as separate words.
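The difference is easiest to see in a small sketch. The function name `verify_cmd` and the leading label argument are assumptions; the point is only that the command arrives as separate words and is run with "$@".

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a verify helper that runs its arguments directly.
# Usage: verify_cmd <label> <command> [args...]
verify_cmd() {
  local label="$1"
  shift
  # "$@" preserves each argument exactly; nothing is re-parsed by the shell,
  # so spaces or metacharacters in arguments cannot change what is executed.
  if "$@" >/dev/null 2>&1; then
    printf 'ok: %s\n' "$label"
  else
    printf 'FAIL: %s\n' "$label"
    return 1
  fi
}
```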
…ait helpers
The v0.1.0 -> v0.2.0 git chain built verbatim by both rollback callbacks becomes e2e.git.build_release_chain. The 'wait for a Jenkins job to exist' poll duplicated across jenkins-test.sh and jenkins-rollback.sh becomes e2e.jenkins.wait_job_exists, built on briklab.wait.until. No behavior change beyond the now-silent poll.
…helper
The download + assert.aggregate_v1 + conditional image_tag + conditional promote_succeeded tail was duplicated near-verbatim in gitlab-test.sh and jenkins-test.sh. Extract it into e2e.scenario.assert_aggregate (new lib/scenario.sh); the platform-specific run-id discovery and download command stay in the test scripts and are passed in. Behavior unchanged.
The deploy taxonomy was globbed in three places: cli/test.sh decided --with-deploy with *deploy*/*gitops*/*rollback*, and both suites matched *-deploy-gitops|*-deploy-rollback for the ArgoCD precheck and post-run sync assertion. Replace them with e2e.scenario.needs_deploy and e2e.scenario.is_gitops in lib/scenario.sh, so the CLI no longer hardcodes the E2E naming. Behavior is identical across the current scenario set.
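Centralizing the globs could be sketched as two small predicates. The function names below are simplified, and the exact patterns each predicate uses are an assumption inferred from the globs quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: one place that knows the scenario naming scheme.

# True when a scenario needs the deploy stack (the old --with-deploy globs).
scenario_needs_deploy() {
  case "$1" in
    *deploy*|*gitops*|*rollback*) return 0 ;;
    *) return 1 ;;
  esac
}

# True when a scenario goes through ArgoCD (the old suite-level globs).
scenario_is_gitops() {
  case "$1" in
    *-deploy-gitops|*-deploy-rollback) return 0 ;;
    *) return 1 ;;
  esac
}
```

Callers then ask `scenario_needs_deploy "$name"` instead of repeating the patterns, so a naming change touches a single file.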
Both suites asserted the ArgoCD sync after a green *-deploy-gitops run -- the GitLab suite through a _suite_assert_gitops_sync wrapper, Jenkins inline. Replace both with e2e.scenario.gitops_postcheck, removing the wrapper and the duplicated is_gitops guard.
Separate infra lifecycle from the E2E test CLI. A root Makefile and a new scripts/infra.sh thin dispatcher own create/start/stop/restart/clean/k3d and version-artifact generation; briklab.sh keeps test/setup/status/logs/reset/preflight and redirects the moved commands.
- Add Makefile (init/start/stop/restart/clean/clean-force/k3d-start/k3d-stop/versions/versions-check) delegating to scripts/infra.sh.
- Add scripts/infra.sh lifecycle dispatcher.
- Move generate-versions.sh to scripts/lib/versions.sh as the briklab.versions notion (generate/check); resolve its root independently of the dispatcher.
- Extract check_prereqs + the rich load_env into scripts/lib/cli/prereqs.sh, shared by both dispatchers.
- Slim briklab.sh: drop lifecycle dispatch, point help at the Makefile.
- Repoint all generate-versions references (versions.yml, docker-compose.yml, env.sh, runner-images.sh, README, generated artifact headers) to make versions.
…agation
Reflect the lifecycle split (Makefile/infra.sh vs briklab.sh) and the generated versions workflow in the architecture guide and README.
- Add an Entrypoints section: Make/infra.sh own the lifecycle, briklab.sh owns test/config/ops; both thin over lib/.
- Document make versions / versions-check and the briklab.versions notion.
- Explain when to run infra-refresh: a lab reset rotates the ArgoCD signing key, and only infra-refresh propagates a fresh ARGOCD_AUTH_TOKEN to the GitLab CI variables; a stale CI token makes brik-deploy fail on argocd app sync.
- Refresh the directory structure (Makefile, infra.sh, lib/versions.sh, lib/cli/prereqs.sh, versions.yml/env, generated config artifacts).
- Add Known Gotchas / Troubleshooting entries for the stale ArgoCD CI token.
…index
Push-driven Jenkins scenarios (workflow-trunk-main, workflow-trunk-tag) timed out with "No build found for SHA" after a lab reset. The multibranch job is never re-indexed: there is no PeriodicFolderTrigger, and the gitea-plugin only manages webhooks at org level while brik is a Gitea user, so a freshly recreated repo has no webhook to notify Jenkins.
Trigger an explicit Multibranch scan right after the push. A branch push then auto-builds via BranchDiscoveryTrait; a tag push makes the tag sub-job appear so the existing explicit /build step runs.
- Add e2e.jenkins.scan_multibranch (POST /job/<job>/build).
- Call it after the push in jenkins-test.sh (push mode).
- Document the issue and the GitLab ArgoCD-token reset case in docs/e2e-known-issues.md, cross-linked from the architecture gotchas.
Validated live: workflow-trunk-main 9/9, workflow-trunk-tag 10/10.
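A scan trigger of this shape can be sketched as below. Only the POST /job/<job>/build endpoint comes from the commit message; the function name, the basic-auth user/API-token arguments and the accepted status codes (2xx or 3xx) are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: ask Jenkins to re-scan a multibranch job after a push.
# Usage: jenkins_scan_multibranch <base_url> <job> <user> <api_token>
jenkins_scan_multibranch() {
  local base_url="$1" job="$2" user="$3" token="$4"
  local code
  # Discard the body and keep only the HTTP status of the POST.
  code="$(curl -s -o /dev/null -w '%{http_code}' -X POST \
    -u "${user}:${token}" "${base_url}/job/${job}/build")" || true
  case "$code" in
    2*|3*) return 0 ;;
    *)
      printf 'scan request for %s returned HTTP %s\n' "$job" "${code:-000}" >&2
      return 1
      ;;
  esac
}
```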
Summary
Structural refactor of briklab/scripts/ (chantier #30) aligning it with brik's notion doctrine (<notion>.<submodule>.<verb>, single source of truth, small focused files), plus a clean split of infra lifecycle from the E2E test CLI and two E2E reliability fixes found while validating end to end.
No behaviour change for the lab; the same cmd_* flows now sit behind a shared scripts/lib/ backbone with two thin dispatchers.
What changed
Transverse + lib restructure
- … briklab.{log,env,http,wait}.* notions; route E2E API clients and polling/container probes through the briklab.http.* and briklab.wait.* SoT.
- … briklab.*; fold infra-refresh into the recovery layer; split the E2E assertion library into core + report.
Setup / infra decomposition
- … k3d setup (setup/argocd-apps.sh).
- … helper; replace eval with "$@" in verify.cmd.
E2E dedup (scenario layer)
- scenario.sh: hoist the aggregate-report validation tail, the rollback release-chain / Jenkins job-wait helpers, centralize deploy/gitops scenario classification, and unify the gitops post-run sync check across suites.
Makefile infra entrypoint (lifecycle split)
- Makefile -> scripts/infra.sh owns the lab lifecycle (init/start/stop/restart/clean/k3d-*/versions); briklab.sh keeps test/config/ops and redirects the moved commands.
- briklab.versions.* notion (scripts/lib/versions.sh); make versions / make versions-check.
- check_prereqs + load_env bootstrap extracted to lib/cli/prereqs.sh.
E2E fixes
- … pushes get indexed (no webhook for user-owned Gitea repos, no periodic scan).
Docs
- docs/architecture.md: Entrypoints section, versions workflow, infra-refresh guidance, refreshed directory tree, Known Gotchas.
- docs/e2e-known-issues.md (Jenkins multibranch scan + GitLab ArgoCD-token reset cases).
Test plan
- shellcheck --severity=error --external-sources over scripts/ (CI parity): clean.
- make versions-check: artifacts in sync with versions.yml.
- make help / make versions; briklab.sh redirects moved lifecycle commands.
- … node-full-cve, workflow-trunk-main/tag, node-deploy-rollback) after infra-refresh.
- workflow-trunk-main 9/9 and workflow-trunk-tag 10/10 (the scan-after-push fix).