Merged
16 changes: 16 additions & 0 deletions .gitignore
@@ -0,0 +1,16 @@
detgpt-env/
env/
venv/


__pycache__/
*.pyc


.vscode/
.idea/


*.pt
*.pth
*.jsonl
209 changes: 209 additions & 0 deletions notebooks/colab_gpu_reproducibility.ipynb
@@ -0,0 +1,209 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Verifiable LLM Baseline — GPU Reproducibility (Phase 3) on Google Colab\n",
"\n",
"This notebook extends the **CPU determinism baseline** to a **CUDA GPU**, executing\n",
"the Phase 3 goal from the README: *strict GPU determinism* via deterministic cuDNN\n",
"and a pinned cuBLAS workspace.\n",
"\n",
"It runs two things on the GPU:\n",
"1. **Segmented audit** (`reproducibility.py`) — the 5 falsifiability scenarios. Scenario 1\n",
" (clean replay) must pass bitwise; scenarios 2–5 must be caught.\n",
"2. **Fresh-vs-fresh** (`gpu_reproducibility_test.py`) — two from-scratch runs on the\n",
" same GPU must produce bitwise-identical weights.\n",
"\n",
"> **Before you run:** set the runtime to GPU — *Runtime → Change runtime type → Hardware\n",
"> accelerator → GPU (T4 is fine)*, then *Runtime → Run all*."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Confirm a GPU is attached"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!nvidia-smi"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Get the code and pin determinism\n",
"\n",
"Set `REPO_URL` to your fork/repo. (Alternatively, mount Drive or upload the `src/`\n",
"folder — see the commented fallbacks.)\n",
"\n",
"`CUBLAS_WORKSPACE_CONFIG` is set here *before* any CUDA op so cuBLAS uses a fixed\n",
"reduction order. `src/device.py` also sets it on import, so the scripts are safe even\n",
"if you skip this — but setting it now keeps the in-kernel `torch` import deterministic too."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"# Must be set before the first CUDA matmul (read once at CUDA context creation).\n",
"os.environ[\"CUBLAS_WORKSPACE_CONFIG\"] = \":4096:8\"\n",
"\n",
"# ---- Option A: clone from GitHub (set this to your repo) ----\n",
"REPO_URL = \"https://github.com/<your-username>/Verifiable-LLM-Baseline.git\"\n",
"REPO_DIR = \"Verifiable-LLM-Baseline\"\n",
"\n",
"if not os.path.isdir(REPO_DIR):\n",
" !git clone --depth 1 $REPO_URL $REPO_DIR\n",
"\n",
"# ---- Option B (fallback): mount Google Drive and point REPO_DIR at your copy ----\n",
"# from google.colab import drive\n",
"# drive.mount('/content/drive')\n",
"# REPO_DIR = '/content/drive/MyDrive/Verifiable-LLM-Baseline'\n",
"\n",
"# ---- Option C (fallback): upload a zip of the repo ----\n",
"# from google.colab import files; files.upload() # then: !unzip -q Verifiable-LLM-Baseline.zip\n",
"\n",
"SRC_DIR = os.path.join(REPO_DIR, \"src\")\n",
"assert os.path.isdir(SRC_DIR), f\"src/ not found at {SRC_DIR} — fix REPO_URL or use a fallback.\"\n",
"\n",
"# Colab ships a CUDA-enabled torch already; only ensure the light deps.\n",
"!pip install -q \"numpy==2.4.3\" \"tqdm==4.67.3\"\n",
"print(\"src ready at:\", SRC_DIR)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Environment fingerprint\n",
"\n",
"Determinism guarantees hold **within one GPU model**. The device name below is part of\n",
"the trust anchor — a different GPU (or a different cuBLAS/cuDNN version) may produce a\n",
"different, but still internally reproducible, set of bits."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"print(\"torch :\", torch.__version__)\n",
"print(\"CUDA available :\", torch.cuda.is_available())\n",
"print(\"CUDA runtime :\", torch.version.cuda)\n",
"if torch.cuda.is_available():\n",
" print(\"GPU :\", torch.cuda.get_device_name(0))\n",
" print(\"cuDNN :\", torch.backends.cudnn.version())\n",
"print(\"CUBLAS workspace:\", os.environ.get(\"CUBLAS_WORKSPACE_CONFIG\"))\n",
"assert torch.cuda.is_available(), \"No GPU — set Runtime → Change runtime type → GPU.\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Phase 3 — segmented audit on GPU (5 scenarios)\n",
"\n",
"Expected: **Scenario 1 PASSES** (clean replay is bitwise deterministic on this GPU);\n",
"**Scenarios 2–5 FAIL** (wrong seed, injected gradient noise, post-training sabotage,\n",
"and a tampered checkpoint file are all detected)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!cd \"$SRC_DIR\" && python reproducibility.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Fresh-vs-fresh — bitwise GPU reproducibility\n",
"\n",
"Trains from scratch twice on this GPU and checks the two runs are bitwise identical.\n",
"Appends a proof block to `proofs/device_determinism_log.txt`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!cd \"$SRC_DIR\" && python gpu_reproducibility_test.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Eval + sealed global manifest\n",
"\n",
"Runs the deterministic held-out eval and seals the end-to-end pipeline hash\n",
"(environment + config + dataset + checkpoint + eval)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!cd \"$SRC_DIR\" && python eval.py && python global_manifest.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Interpreting the results\n",
"\n",
"- **Same GPU → same bits.** A passing Scenario 1 and a passing fresh-vs-fresh test show\n",
" the software entropy is fully controlled on this device.\n",
"- **Different GPU → different (but reproducible) bits.** Checkpoint hashes produced on a\n",
" T4 will not match those from an A100 or from CPU. That cross-hardware drift is the\n",
" Phase 2 quantity this baseline is built to measure — not a failure.\n",
"- **If you hit a `nondeterministic ... CUDA` error**, an op without a deterministic\n",
" kernel was reached under `torch.use_deterministic_algorithms(True)`. That is a genuine\n",
" finding worth recording — it pinpoints exactly where hardware entropy enters.\n",
"- **RNG coverage:** GPU dropout draws from the CUDA generator, so the checkpoint now\n",
" serializes CUDA RNG state alongside CPU/NumPy/Python state — without it, a resumed\n",
" replay would diverge at the first dropout mask."
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"name": "colab_gpu_reproducibility.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
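
The notebook's claims rest on three mechanics: a fixed reduction order (floating-point addition is not associative), a bitwise comparison of two fresh runs, and serialized RNG state for resumed replays. The following is a minimal CPU-only sketch of those ideas using only the Python standard library. It is an illustration under stated assumptions, not the repository's code: `digest` and `toy_train` are hypothetical helpers invented here, and the real scripts (`reproducibility.py`, `gpu_reproducibility_test.py`) may hash and compare checkpoints differently.

```python
import hashlib
import random
import struct


def digest(weights):
    """SHA-256 over the exact IEEE-754 bytes of each named weight list (hypothetical helper)."""
    h = hashlib.sha256()
    for name in sorted(weights):  # fixed key order so the hash itself is stable
        h.update(name.encode("utf-8"))
        h.update(struct.pack(f"<{len(weights[name])}d", *weights[name]))
    return h.hexdigest()


def toy_train(seed, steps=5):
    """Hypothetical stand-in for a training run: seeded noise accumulated into weights."""
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        w = [x + rng.gauss(0.0, 0.1) for x in w]
    return {"layer.weight": w}


# 1. Floating-point addition is not associative, so the reduction order changes the bits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
order_matters = left != right

# 2. Fresh-vs-fresh: the same seed yields the same digest, a wrong seed does not.
same_seed_match = digest(toy_train(1234)) == digest(toy_train(1234))
wrong_seed_match = digest(toy_train(1234)) == digest(toy_train(4321))

# 3. Restoring a saved RNG state reproduces the next draw exactly.
rng = random.Random(0)
rng.random()
saved = rng.getstate()        # what a checkpoint would need to serialize
expected_next = rng.random()
rng.setstate(saved)           # simulate resuming from the checkpoint
resumed_next = rng.random()

print(order_matters, same_seed_match, wrong_seed_match, resumed_next == expected_next)
```

On a GPU the same reasoning applies to the CUDA generator and to cuBLAS/cuDNN reductions; this sketch only mirrors the logic with Python floats and the `random` module.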