AOSSIE-Org · Archit381 · Jun 11, 2026 · Jun 9, 2026
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,16 @@
+detgpt-env/
+env/
+venv/
+
+
+__pycache__/
+*.pyc
+
+
+.vscode/
+.idea/
+
+
+*.pt
+*.pth
+*.jsonl
diff --git a/notebooks/colab_gpu_reproducibility.ipynb b/notebooks/colab_gpu_reproducibility.ipynb
@@ -0,0 +1,209 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Verifiable LLM Baseline — GPU Reproducibility (Phase 3) on Google Colab\n",
+    "\n",
+    "This notebook extends the **CPU determinism baseline** to a **CUDA GPU**, executing\n",
+    "the Phase 3 goal from the README: *strict GPU determinism* via deterministic cuDNN\n",
+    "and a pinned cuBLAS workspace.\n",
+    "\n",
+    "It runs two things on the GPU:\n",
+    "1. **Segmented audit** (`reproducibility.py`) — the 5 falsifiability scenarios. Scenario 1\n",
+    "   (clean replay) must pass bitwise; scenarios 2–5 must be caught.\n",
+    "2. **Fresh-vs-fresh** (`gpu_reproducibility_test.py`) — two from-scratch runs on the\n",
+    "   same GPU must produce bitwise-identical weights.\n",
+    "\n",
+    "> **Before you run:** set the runtime to GPU — *Runtime → Change runtime type → Hardware\n",
+    "> accelerator → GPU (T4 is fine)*, then *Runtime → Run all*."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Confirm a GPU is attached"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!nvidia-smi"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Get the code and pin determinism\n",
+    "\n",
+    "Set `REPO_URL` to your fork/repo. (Alternatively, mount Drive or upload the `src/`\n",
+    "folder — see the commented fallbacks.)\n",
+    "\n",
+    "`CUBLAS_WORKSPACE_CONFIG` is set here *before* any CUDA op so cuBLAS uses a fixed\n",
+    "workspace and its results are reproducible. `src/device.py` also sets it on import, so the\n",
+    "scripts are safe even if you skip this, but setting it now also covers any `torch` CUDA work done in this kernel."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "# Set before any CUDA work in this process so cuBLAS picks it up when it initializes.\n",
+    "os.environ[\"CUBLAS_WORKSPACE_CONFIG\"] = \":4096:8\"\n",
+    "\n",
+    "# ---- Option A: clone from GitHub (set this to your repo) ----\n",
+    "REPO_URL = \"https://github.com/<your-username>/Verifiable-LLM-Baseline.git\"\n",
+    "REPO_DIR = \"Verifiable-LLM-Baseline\"\n",
+    "\n",
+    "if not os.path.isdir(REPO_DIR):\n",
+    "    !git clone --depth 1 \"$REPO_URL\" \"$REPO_DIR\"\n",
+    "\n",
+    "# ---- Option B (fallback): mount Google Drive and point REPO_DIR at your copy ----\n",
+    "# from google.colab import drive\n",
+    "# drive.mount('/content/drive')\n",
+    "# REPO_DIR = '/content/drive/MyDrive/Verifiable-LLM-Baseline'\n",
+    "\n",
+    "# ---- Option C (fallback): upload a zip of the repo ----\n",
+    "# from google.colab import files; files.upload()   # then: !unzip -q Verifiable-LLM-Baseline.zip\n",
+    "\n",
+    "SRC_DIR = os.path.join(REPO_DIR, \"src\")\n",
+    "assert os.path.isdir(SRC_DIR), f\"src/ not found at {SRC_DIR} — fix REPO_URL or use a fallback.\"\n",
+    "\n",
+    "# Colab ships a CUDA-enabled torch already; only ensure the light deps.\n",
+    "!pip install -q \"numpy==2.4.3\" \"tqdm==4.67.3\"\n",
+    "print(\"src ready at:\", SRC_DIR)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Environment fingerprint\n",
+    "\n",
+    "Determinism guarantees hold **within one GPU model and software stack**. The device name and\n",
+    "library versions below are part of the trust anchor: a different GPU (or a different\n",
+    "cuBLAS/cuDNN version) may produce a different, but still internally reproducible, set of bits."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "print(\"torch           :\", torch.__version__)\n",
+    "print(\"CUDA available  :\", torch.cuda.is_available())\n",
+    "print(\"CUDA runtime    :\", torch.version.cuda)\n",
+    "if torch.cuda.is_available():\n",
+    "    print(\"GPU             :\", torch.cuda.get_device_name(0))\n",
+    "    print(\"cuDNN           :\", torch.backends.cudnn.version())\n",
+    "print(\"CUBLAS workspace:\", os.environ.get(\"CUBLAS_WORKSPACE_CONFIG\"))\n",
+    "assert torch.cuda.is_available(), \"No GPU — set Runtime → Change runtime type → GPU.\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Phase 3 — segmented audit on GPU (5 scenarios)\n",
+    "\n",
+    "Expected: **Scenario 1 PASSES** (clean replay is bitwise deterministic on this GPU);\n",
+    "**Scenarios 2–5 FAIL verification**, meaning each tampering is detected (wrong seed,\n",
+    "injected gradient noise, post-training sabotage, and a tampered checkpoint file)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!cd \"$SRC_DIR\" && python reproducibility.py"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Fresh-vs-fresh — bitwise GPU reproducibility\n",
+    "\n",
+    "Trains from scratch twice on this GPU and checks the two runs are bitwise identical.\n",
+    "Appends a proof block to `proofs/device_determinism_log.txt`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!cd \"$SRC_DIR\" && python gpu_reproducibility_test.py"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 6. Eval + sealed global manifest\n",
+    "\n",
+    "Runs the deterministic held-out eval and seals the end-to-end pipeline hash\n",
+    "(environment + config + dataset + checkpoint + eval)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!cd \"$SRC_DIR\" && python eval.py && python global_manifest.py"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Interpreting the results\n",
+    "\n",
+    "- **Same GPU → same bits.** A passing Scenario 1 and a passing fresh-vs-fresh test are\n",
+    "  strong evidence that the software entropy is controlled on this device.\n",
+    "- **Different GPU → different (but reproducible) bits.** Checkpoint hashes produced on a\n",
+    "  T4 are not expected to match those from an A100 or from CPU. That cross-hardware drift is\n",
+    "  the Phase 2 quantity this baseline is built to measure, not a failure.\n",
+    "- **If you hit a `RuntimeError` about a missing deterministic implementation**, an op without\n",
+    "  a deterministic kernel was reached under `torch.use_deterministic_algorithms(True)`. That is\n",
+    "  a genuine finding worth recording: it pinpoints exactly where hardware entropy enters.\n",
+    "- **RNG coverage:** GPU dropout draws from the CUDA generator, so the checkpoint now\n",
+    "  serializes CUDA RNG state alongside CPU/NumPy/Python state — without it, a resumed\n",
+    "  replay would diverge at the first dropout mask."
+   ]
+  }
+ ],
+ "metadata": {
+  "accelerator": "GPU",
+  "colab": {
+   "name": "colab_gpu_reproducibility.ipynb",
+   "provenance": []
+  },
+  "kernelspec": {
+   "display_name": "Python 3",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
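
Note on the "bitwise identical" check used by the fresh-vs-fresh step: the notebook does not show how `gpu_reproducibility_test.py` compares two runs, so the following is only a minimal sketch of one way such a comparison could work, not the repository's implementation. It hashes named byte buffers in a fixed order and reports the first buffer that differs. The names `fingerprint`, `first_mismatch`, `run_a`, `run_b`, and `run_c` are hypothetical, and the "weights" are stand-in float32 bytes built with `struct` so the example needs only the standard library.

```python
import hashlib
import struct
from typing import Mapping, Optional


def fingerprint(state: Mapping[str, bytes]) -> str:
    """Return a SHA-256 hex digest over named byte buffers, visited in sorted-name order."""
    h = hashlib.sha256()
    for name in sorted(state):
        encoded = name.encode("utf-8")
        buf = state[name]
        # Length-prefix both the name and the payload so buffer boundaries are unambiguous.
        h.update(struct.pack("<Q", len(encoded)))
        h.update(encoded)
        h.update(struct.pack("<Q", len(buf)))
        h.update(buf)
    return h.hexdigest()


def first_mismatch(a: Mapping[str, bytes], b: Mapping[str, bytes]) -> Optional[str]:
    """Return the first name (in sorted order) whose bytes differ, or None if identical."""
    for name in sorted(set(a) | set(b)):
        if a.get(name) != b.get(name):
            return name
    return None


# Stand-in "weights": raw little-endian float32 bytes (illustrative values only).
run_a = {
    "layer.weight": struct.pack("<3f", 0.1, 0.2, 0.3),
    "layer.bias": struct.pack("<f", 0.5),
}
run_b = dict(run_a)  # a second run that reproduced every bit

# Flip the lowest mantissa bit of the first value to mimic a one-bit drift.
drifted = bytearray(run_a["layer.weight"])
drifted[0] ^= 0x01
run_c = {**run_a, "layer.weight": bytes(drifted)}

print(fingerprint(run_a) == fingerprint(run_b))  # True
print(fingerprint(run_a) == fingerprint(run_c))  # False
print(first_mismatch(run_a, run_c))              # layer.weight
```

With real checkpoints, one option (again an assumption, and only for dtypes NumPy supports) is to build the mapping from a model's `state_dict()` using `tensor.detach().cpu().contiguous().numpy().tobytes()` per entry. Comparing raw bytes rather than using a tolerance is deliberate here: a single flipped bit changes the digest, which is the property a bitwise claim needs.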