Merged
18 commits
ec40bf9
refactor(scripts): extract transverse helpers into briklab.{log,env,h…
jeanjerome Jun 5, 2026
4813f24
refactor(scripts): route polling and container probes through the SoT
jeanjerome Jun 5, 2026
d7badcf
refactor(scripts): route E2E API clients through briklab.http.* trans…
jeanjerome Jun 5, 2026
6796525
refactor(scripts): unify auth and verify helpers under briklab.* notions
jeanjerome Jun 5, 2026
93acd36
refactor(scripts): fold infra-refresh into the recovery layer
jeanjerome Jun 5, 2026
7824edd
refactor(scripts): split the E2E assertion library into core and report
jeanjerome Jun 5, 2026
bcd31cc
fix(scripts): authenticate k3d image pulls against the Nexus registry
jeanjerome Jun 5, 2026
3628d95
refactor(scripts): split ArgoCD application provisioning out of k3d s…
jeanjerome Jun 5, 2026
9cec051
refactor(scripts): drive the GitLab Nexus CI variables from a table
jeanjerome Jun 5, 2026
24174c6
refactor(scripts): extract the GIT_ASKPASS script creation into a helper
jeanjerome Jun 5, 2026
02272c9
refactor(scripts): run verify.cmd via "$@" instead of eval
jeanjerome Jun 5, 2026
87ee54b
refactor(scripts): hoist the rollback release-chain and Jenkins job-w…
jeanjerome Jun 5, 2026
f4a7f7f
refactor(scripts): hoist the aggregate-report validation tail into a …
jeanjerome Jun 5, 2026
f440733
refactor(scripts): centralize the deploy/gitops scenario classification
jeanjerome Jun 5, 2026
869ce00
refactor(scripts): unify the gitops post-run sync check across suites
jeanjerome Jun 5, 2026
fc2ce42
refactor(scripts): split infra lifecycle into Makefile + infra.sh
jeanjerome Jun 5, 2026
60d3eb2
docs: document Makefile infra entrypoint and infra-refresh token prop…
jeanjerome Jun 5, 2026
3be03ac
fix(e2e): scan Jenkins Multibranch after push so branch/tag triggers …
jeanjerome Jun 5, 2026
39 changes: 39 additions & 0 deletions Makefile
@@ -0,0 +1,39 @@
# Briklab - infra lifecycle entrypoint.
# Thin wrapper over scripts/infra.sh. Testing/config live in scripts/briklab.sh.
INFRA := ./scripts/infra.sh

.PHONY: help init start stop restart clean clean-force \
	k3d-start k3d-stop versions versions-check

help: ## Show this help
	@$(INFRA) help

init: ## First launch (start + setup + k3d + smoke-test)
	@$(INFRA) init

start: ## Start all containers
	@$(INFRA) start

stop: ## Stop all containers
	@$(INFRA) stop

restart: ## Stop + start
	@$(INFRA) restart

clean: ## Delete all data and volumes (prompts for confirmation)
	@$(INFRA) clean

clean-force: ## Delete all data and volumes (no prompt)
	@$(INFRA) clean --yes

k3d-start: ## Create k3d cluster + install ArgoCD
	@$(INFRA) k3d-start

k3d-stop: ## Destroy the k3d cluster
	@$(INFRA) k3d-stop

versions: ## Regenerate versions.env + Jenkins plugins + image lock from versions.yml
	@$(INFRA) versions

versions-check: ## Fail if any generated artifact drifts from versions.yml
	@$(INFRA) versions --check
46 changes: 26 additions & 20 deletions README.md
@@ -24,7 +24,7 @@ The alternatives are bad:
- Renting GitLab/Jenkins cloud accounts per contributor: expensive, slow to iterate, shared state across PRs.
- Hand-rolling GitLab CE + Jenkins (with Configuration-as-Code) + Nexus + k3d + ArgoCD in Docker: days of wiring per contributor for PAT registration, runner registration, Job DSL, Nexus repository creation, ArgoCD port-forwards.

Briklab wires it once. Every contributor runs `./scripts/briklab.sh init` and gets the full stack ready in 5 minutes.
Briklab wires it once. Every contributor runs `make init` and gets the full stack ready in 5 minutes.

For internal architecture details, see [docs/architecture.md](docs/architecture.md).

@@ -85,7 +85,7 @@ Add to Docker Desktop (Settings > Docker Engine):
### Initialize

```bash
./scripts/briklab.sh init
make init
```

> GitLab takes 3-5 minutes on first start. Jenkins builds a custom Docker image on first start. Nexus takes 2-3 minutes. The script waits automatically.
@@ -122,15 +122,21 @@ Setup creates 6 hosted repositories for artifact publishing:

## CLI Commands

### Lifecycle
### Lifecycle (Makefile)

Infra lifecycle is driven by the root `Makefile` (or `./scripts/infra.sh <command>`
directly). Testing, configuration and reset stay on `./scripts/briklab.sh`.

| Command | Description |
|---------|-------------|
| `briklab.sh init` | First launch (start + setup + smoke-test) |
| `briklab.sh start` | Start all containers (+ set root password) |
| `briklab.sh stop` | Stop all containers |
| `briklab.sh restart` | Stop + start |
| `briklab.sh clean` | Delete all data and volumes (irreversible) |
| `make init` | First launch (start + setup + k3d + smoke-test) |
| `make start` | Start all containers |
| `make stop` | Stop all containers |
| `make restart` | Stop + start |
| `make clean` | Delete all data and volumes (prompts; `make clean-force` skips it) |
| `make k3d-start` / `make k3d-stop` | Create / destroy the k3d cluster + ArgoCD |
| `make versions` | Regenerate versions.env + Jenkins plugins + image lock from `versions.yml` |
| `make versions-check` | Fail if any generated artifact drifts from `versions.yml` (CI guard) |

### Configuration

@@ -198,25 +204,23 @@ deploy/gitops scenarios (or `--all`) the ArgoCD + cluster checks are blocking.

### Kubernetes

| Command | Description |
|---------|-------------|
| `briklab.sh k3d-start` | Create k3d cluster + install ArgoCD |
| `briklab.sh k3d-stop` | Destroy the k3d cluster |
k3d lifecycle lives in the Makefile: `make k3d-start` / `make k3d-stop` (see the
Lifecycle table above).

## Typical Workflow

```bash
# Day 1 - Full setup
./scripts/briklab.sh init # First time setup (~5 min)
make init # First time setup (~5 min)
./scripts/briklab.sh test --gitlab --all # Run GitLab E2E suite
./scripts/briklab.sh test --jenkins --all # Run Jenkins E2E suite
./scripts/briklab.sh stop # Done for the day
make stop # Done for the day

# Day N
./scripts/briklab.sh start # Restart (fast, data preserved)
make start # Restart (fast, data preserved)
./scripts/briklab.sh test --gitlab # Quick GitLab smoke test
./scripts/briklab.sh test --jenkins # Quick Jenkins smoke test
./scripts/briklab.sh stop # Done
make stop # Done
```

## E2E Testing
@@ -305,10 +309,12 @@ Full suite run on 2026-04-18

**Nexus repository creation fails** -- If `setup` is run before Nexus is fully ready, repository creation may fail. Wait for the healthcheck to pass, then re-run: `./scripts/briklab.sh setup`

**k3d cluster already exists** -- `k3d cluster delete brik && ./scripts/briklab.sh k3d-start`
**k3d cluster already exists** -- `k3d cluster delete brik && make k3d-start`

**ArgoCD won't sync** -- ArgoCD default polling is ~3 minutes. Use `argocd app get <app> --refresh hard` to force, or run `./scripts/briklab.sh infra-refresh` to renew port-forwards and tokens.

**`brik-deploy` fails with `token signature is invalid`** -- After a lab reset (`make clean` + `make init`) or any k3d/ArgoCD recreation, the ArgoCD signing key rotates and the `ARGOCD_AUTH_TOKEN` stored in GitLab CI variables goes stale. The `test` self-heal only refreshes the local token in `.env`; run `./scripts/briklab.sh infra-refresh` to propagate a fresh token to the GitLab CI variables (and Jenkins), then re-run the deploy/gitops scenarios.

**E2E timeout** -- Use `--batch-size 4` to limit concurrent pipelines. Check runner saturation with `./scripts/briklab.sh logs runner`. Run `./scripts/briklab.sh infra-refresh` if tokens expired.

**Reset between E2E runs** -- `./scripts/briklab.sh reset --gitlab` cleans repos, k8s namespaces, ArgoCD apps, and Nexus artifacts.
@@ -319,13 +325,13 @@ For the complete list of known issues and solutions, see [docs/architecture.md -

```bash
# Stop containers (data preserved)
./scripts/briklab.sh stop
make stop

# Delete all data and volumes (irreversible, requires confirmation)
./scripts/briklab.sh clean
make clean

# Delete k3d cluster
k3d cluster delete brik
make k3d-stop

# Full removal: after clean, remove Docker images manually
docker rmi gitlab/gitlab-ce:18.10.1-ce.0 gitlab/gitlab-runner:alpine3.21-bleeding
2 changes: 1 addition & 1 deletion config/brik-images.lock.yaml
@@ -1,4 +1,4 @@
# GENERATED by scripts/generate-versions.sh from versions.yml -- DO NOT EDIT
# GENERATED from versions.yml by 'make versions' -- DO NOT EDIT
# Source of truth: versions.yml (runner_images). Consumed by scripts/lib/runner-images.sh (briklab.runner_images.pull).
# Digest-pinned so a clean rebuild pulls the exact images the E2E suite was validated against.
# Format: <ref>:<tag>@<digest>. The pull helper fetches the digest and tags it <ref>:<tag> locally.
2 changes: 1 addition & 1 deletion config/jenkins/plugins.txt
@@ -1,4 +1,4 @@
# GENERATED by scripts/generate-versions.sh from versions.yml -- DO NOT EDIT
# GENERATED from versions.yml by 'make versions' -- DO NOT EDIT
# Source of truth: versions.yml (jenkins_plugins). Consumed by images/jenkins/Dockerfile.
configuration-as-code:2077.v41f1011a_5110
git:5.10.1
2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -3,7 +3,7 @@
# K8s cluster is managed separately via k3d (see scripts/lib/setup/k3d.sh)
#
# Image versions and build args come from versions.env (generated from
# versions.yml by scripts/generate-versions.sh). briklab.sh loads it into the
# versions.yml by 'make versions'). The dispatchers load it into the
# environment automatically; for a bare invocation, source it first:
#
# Usage:
67 changes: 62 additions & 5 deletions docs/architecture.md
@@ -26,6 +26,54 @@ All services run together via a single `docker-compose.yml`:

---

## Entrypoints

Two concerns, two entrypoints, one shared `scripts/lib/` backbone:

| Entrypoint | Owns | Commands |
|------------|------|----------|
| `Makefile` -> `scripts/infra.sh` | Infra lifecycle | `init`, `start`, `stop`, `restart`, `clean` / `clean-force`, `k3d-start`, `k3d-stop`, `versions`, `versions-check` |
| `scripts/briklab.sh` | Test + config + ops | `test`, `setup`, `reset`, `preflight`, `status`, `logs`, `smoke-test`, `infra-refresh` |

```bash
make init # create the whole lab (~5 min)
./scripts/briklab.sh test --gitlab --all # run the GitLab E2E suite
./scripts/briklab.sh test --jenkins --all
make stop # done for the day; make start to resume
```

Both dispatchers are thin: they set shared paths, source `lib/common.sh` +
`lib/cli/prereqs.sh` (the `check_prereqs` / `load_env` bootstrap), then dispatch
to `cmd_*` functions in `lib/cli/*` and notion modules in `lib/`. The lifecycle
commands moved out of `briklab.sh` into `infra.sh`; `briklab.sh` redirects them
with a hint (`'start' is an infra command -- use: make start`).
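
The redirect can be pictured with a small sketch. The function names (`is_infra_command`, `infra_hint`, `dispatch`) are hypothetical and not taken from the repository; only the command list and the hint wording come from the text above. The real `briklab.sh` may be structured differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the redirect; helper names are illustrative only.

# Succeeds for commands that now belong to the Makefile / infra.sh.
is_infra_command() {
  case "$1" in
    init|start|stop|restart|clean|k3d-start|k3d-stop|versions) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the hint shown when an infra command is passed to briklab.sh.
infra_hint() {
  printf "'%s' is an infra command -- use: make %s\n" "$1" "$1"
}

# Route a command: infra commands get the hint on stderr and status 2,
# anything else would be handed to its cmd_* function (only echoed here).
dispatch() {
  if is_infra_command "$1"; then
    infra_hint "$1" >&2
    return 2
  fi
  printf 'would run cmd_%s\n' "$(printf '%s' "$1" | tr '-' '_')"
}
```

Returning a non-zero status on a redirected command is an assumption of this sketch; it keeps a script that still calls `briklab.sh start` from silently continuing.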

### Versions are generated

`versions.yml` is the single source of truth for every component/tool version.
`make versions` regenerates the derived artifacts (`versions.env`,
`config/jenkins/plugins.txt`, `config/brik-images.lock.yaml`) via the
`briklab.versions.*` notion (`scripts/lib/versions.sh`); `make versions-check`
fails if any artifact drifts. Never edit the generated files by hand.
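
A drift check of this kind usually amounts to regenerating each artifact into a scratch location and comparing it with the committed file. The sketch below shows only that comparison step; `check_drift` is a hypothetical helper, and the actual `briklab.versions.*` functions may work differently.

```bash
# Hypothetical helper, not the repository's implementation.
# check_drift COMMITTED_FILE FRESHLY_GENERATED_FILE
# Succeeds when both files are byte-identical; otherwise reports drift and fails.
check_drift() {
  if cmp -s "$1" "$2"; then
    return 0
  fi
  printf 'drift: %s is out of date -- run: make versions\n' "$1" >&2
  return 1
}
```

A CI guard would call it once per generated artifact and fail the job on the first non-zero status.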

### When to run `infra-refresh`

After a lab restart, a `make clean` + `make init`, or any k3d/ArgoCD recreation,
the ArgoCD server signing key rotates and previously issued tokens are
invalidated. The `test` self-heal (`preflight --fix`) refreshes the **local**
ArgoCD token in `.env`, but only `infra-refresh` **propagates** the fresh token
to the GitLab group CI variables (`briklab.recover.gitlab_ci_vars`) and to
Jenkins. A stale CI token makes `brik-deploy` fail on `argocd app sync` with
`token signature is invalid` even though the GitOps manifests pushed cleanly.

```bash
./scripts/briklab.sh infra-refresh # regenerate + propagate tokens, then re-test
```

Run it before deploy/gitops scenarios whenever the lab was reset between runs.

---

## Design Principles

### 1. Single compose, all services
@@ -89,7 +137,7 @@ The Runner uses `extra_hosts` to map `gitlab.briklab.test` and `nexus.briklab.te
## Setup Flow

```
0. init
0. make init (-> scripts/infra.sh init)
|-- 1. check_prereqs (docker, jq)
|-- 2. prepare .env (copy .env.example if missing)
|-- 3. docker compose up -d (wait for healthchecks)
@@ -233,20 +281,26 @@ parser stays happy.

```
briklab/
|-- Makefile # Infra lifecycle entrypoint (-> scripts/infra.sh)
|-- docker-compose.yml # All services (GitLab + Runner + Gitea + Jenkins + Nexus + SSH Target)
|-- .env.example # Variables template
|-- versions.yml # SINGLE SOURCE OF TRUTH for component/tool versions
|-- versions.env # GENERATED by 'make versions' (image tags + build args)
|-- scripts/
| |-- briklab.sh # Thin CLI dispatcher (commands -> lib/cli/*)
| |-- briklab.sh # Test/config CLI (test/setup/reset/status/logs/preflight/infra-refresh)
| |-- infra.sh # Infra lifecycle CLI (init/start/stop/clean/k3d/versions)
| +-- lib/
| |-- common.sh # Shared utilities (logging, retry, env loading)
| |-- versions.sh # briklab.versions.* (generate/check derived artifacts)
| |-- checks.sh # Pure state predicates (single probe truth)
| |-- preflight.sh # E2E readiness gate (--fix self-heals)
| |-- recovery.sh # briklab.recover.* (mutating: node/controller/token)
| |-- runner-images.sh # Pre-pull set derived from brik's registry
| |-- infra-verify.sh # verify_* presentation (for setup)
| |-- infra-refresh.sh # Token/port-forward refresh + propagate
| |-- cli/ # Command modules (sourced by briklab.sh)
| | |-- lifecycle.sh # start/stop/restart/status/logs/clean/k3d/init
| |-- cli/ # Command modules (sourced by both dispatchers)
| | |-- prereqs.sh # check_prereqs + load_env bootstrap (shared)
| | |-- lifecycle.sh # init/start/stop/restart/status/logs/clean/k3d (-> infra.sh)
| | |-- setup.sh # setup + smoke-test
| | |-- test.sh # test (preflight --fix -> run)
| | +-- reset.sh # reset
@@ -291,8 +345,9 @@ briklab/
| |-- jenkins/ # Custom Jenkins image (Dockerfile + entrypoint)
| +-- ssh-target/ # SSH target container (Dockerfile + entrypoint)
|-- config/
| |-- brik-images.lock.yaml # GENERATED by 'make versions' (digest-pinned runner images)
| +-- jenkins/
| |-- plugins.txt # Required plugins
| |-- plugins.txt # GENERATED by 'make versions' (pinned plugins)
| +-- casc.yaml # Jenkins Configuration-as-Code
|-- docs/
| |-- architecture.md # This file
@@ -378,6 +433,8 @@ briklab/
| Gitea API 404 on org repos | `brik` is a user, not an organization | Use `/api/v1/user/repos` not `/api/v1/orgs/brik/repos` |
| ArgoCD doesn't pick up new commits | Default polling interval ~3 minutes | Call `?refresh=hard` before sync |
| GitLab can't mask variables with spaces | GitLab restriction on masked variable format | Avoid spaces in masked CI variable values |
| `brik-deploy` fails: `token signature is invalid` | ArgoCD key rotated on lab reset; CI variable holds a stale token (`test` self-heal refreshes only `.env`, not CI) | Run `./scripts/briklab.sh infra-refresh` to propagate a fresh `ARGOCD_AUTH_TOKEN` to GitLab CI |
| Jenkins push scenario: `No build found for SHA` | Multibranch job not indexed after a lab reset (no webhook for user-owned repos, no periodic scan) | The Jenkins suite scans after each push; see [e2e-known-issues.md](e2e-known-issues.md) |

---

108 changes: 108 additions & 0 deletions docs/e2e-known-issues.md
@@ -0,0 +1,108 @@
# Briklab E2E - Known Issues

Living record of E2E behaviours that are not bugs in the brik runtime but stem
from the lab's own state or third-party tooling. Each entry says what you see,
why it happens, and how it is handled.

---

## Jenkins: push-triggered Multibranch scenarios need an explicit scan after a lab reset

**Status:** resolved in the suite (2026-06-06).

**Symptom**

A push-driven Jenkins scenario (`workflow-trunk-main`, `workflow-trunk-tag`)
times out:

```
Triggering via git push (ref: main)...
Push SHA: 8e61511b...
Waiting for build triggered by SHA 8e61511b...
[ERROR] No build found for SHA 8e61511b... after 300s
```

All API-triggered scenarios (node-full, node-complete, node-deploy-gitops,
node-plan-tag, node-full-cve, the rollback) pass. Only the scenarios that rely
on `git push -> Jenkins build` fail.

**Root cause**

`node-workflow-trunk` is a `WorkflowMultiBranchProject`. A push only produces a
build if Jenkins is notified to index the repository. In this lab there is no
such notification after a reset:

- The Jenkins job has **no `PeriodicFolderTrigger`** (no scheduled scan).
- The freshly recreated Gitea repo has **no webhook**. The Jenkins gitea-plugin
is configured with `manageHooks: true`, but it manages hooks at the **org**
level, and `brik` is a Gitea **user**, not an organisation, so no per-repo
webhook is ever created (verified: a manual scan discovers branches/tags but
still registers no webhook).
- Neither `setup` nor the push step triggers a scan.

So after a `make clean` + `make init` (or any repo recreation) the multibranch
job is never re-indexed, and the push to `main` is never built. Tag scenarios
fail too, because the tag sub-job is only discovered by a scan.

This is independent of the brik runtime and of the `Makefile`/`infra.sh`
lifecycle split: the pipeline itself is fine (a manual "Scan Now" discovers
`main` + `v0.1.0` in ~5s and the branch auto-builds to SUCCESS).

**Resolution**

`jenkins-test.sh` now triggers an explicit Multibranch scan immediately after
the push, via `e2e.jenkins.scan_multibranch` (POST `/job/<job>/build`):

- branch push (`main`): the scan indexes the new commit and the
`BranchDiscoveryTrait` auto-builds the branch;
- tag push (`v0.2.0`): the scan makes the tag sub-job appear, then the existing
tag step issues the explicit `/build`.

Validated live: `workflow-trunk-main` 9/9, `workflow-trunk-tag` 10/10.
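
For reference, the scan request can be reproduced by hand. This is a minimal sketch, not the body of `e2e.jenkins.scan_multibranch`: the helper names `scan_url` and `scan_multibranch` and the `JENKINS_URL` variable are assumptions, while the endpoint (POST `/job/<job>/build`), the host and the admin credentials variable are the ones used elsewhere in this document.

```bash
# Hypothetical helpers; only the endpoint shape comes from the text above.
JENKINS_URL="${JENKINS_URL:-http://jenkins.briklab.test:9090}"

# Build the scan endpoint for a Multibranch job name.
scan_url() {
  printf '%s/job/%s/build' "${JENKINS_URL%/}" "$1"
}

# Ask Jenkins to index the job. Assumes basic auth is accepted for this POST;
# depending on the Jenkins security settings a CSRF crumb or an API token
# may be required instead of the admin password.
scan_multibranch() {
  curl -fsS -X POST -u "admin:${JENKINS_ADMIN_PASSWORD}" "$(scan_url "$1")"
}
```

Usage would look like `scan_multibranch node-workflow-trunk`, run right after the push.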

**If it still times out**

Confirm the job indexed and re-run the scenario:

```bash
curl -s -u "admin:${JENKINS_ADMIN_PASSWORD}" \
  "http://jenkins.briklab.test:9090/job/node-workflow-trunk/api/json" \
  | jq -r '[.jobs[]?.name] | join(",")'   # expect: main,v0.1.0,...
```

A full `make init` on a clean lab also re-establishes the job state.

---

## GitLab: `brik-deploy` fails with `token signature is invalid` after a lab reset

**Status:** operational (run `infra-refresh`).

**Symptom**

`brik-deploy` fails on `argocd app sync` even though the GitOps manifests were
pushed cleanly:

```
INFO deploy manifests pushed successfully to .../config-deploy-gitops.git
ERROR deploy argocd app sync failed for: brik-e2e-gitops
{"level":"fatal","msg":"... invalid session: token signature is invalid ..."}
```

**Root cause**

A lab reset rotates the ArgoCD server signing key, invalidating previously
issued tokens. The `test` self-heal (`preflight --fix`) refreshes the **local**
`ARGOCD_AUTH_TOKEN` in `.env`, but only `infra-refresh` **propagates** a fresh
token to the GitLab group CI variables (`briklab.recover.gitlab_ci_vars`). The
job runs with the stale CI-variable token.

**Resolution**

```bash
./scripts/briklab.sh infra-refresh # regenerate + propagate the token
./scripts/briklab.sh test --gitlab --project node-deploy-gitops
```

Run `infra-refresh` before deploy/gitops scenarios whenever the lab was reset
between runs.
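
When triaging a failed run from a saved job log, the error signature above is enough to tell this case apart from a genuine deploy failure. The sketch below assumes the log has been saved to a local file; `needs_infra_refresh` is a hypothetical helper, not part of the suite.

```bash
# Hypothetical helper, not part of the suite.
# needs_infra_refresh LOGFILE
# Succeeds when the log contains the stale-token signature quoted above.
needs_infra_refresh() {
  grep -q 'token signature is invalid' "$1"
}
```

For example, `needs_infra_refresh job.log && ./scripts/briklab.sh infra-refresh` refreshes the tokens only when the log shows the stale-token error (`job.log` is a placeholder file name).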