Skip to content

refactor: changing the repo to be based on prior work established#89

Open
ryoari wants to merge 6 commits into
mainfrom
feature/new-baseline
Open

refactor: changing the repo to be based on prior work established#89
ryoari wants to merge 6 commits into
mainfrom
feature/new-baseline

Conversation

@ryoari

@ryoari ryoari commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Refactors into a new baseline for successive work.

Due to collaborative effort of previous work, a lot of technical debt had built up where I wasn't sure of the code and it's effect on determinism.

As such, as I parse through the preexisting code to gain a better understanding. It is better to move everything to a legacy folder and re-introduce slowly after validating the code functions and it's effect.

Based on my prior supporting work in the https://github.com/ryoari/Verifiable-LLM-Baseline/

I rebase this repo to my previous effort, with clearly understood foundations.

this PR builds on #73 #37 #86

Adds in the following files:

src: contains all the code logic to run cpu and gpu determinism on device

proof: contains hash manifest to verify said cpu and gpu determinism, as well as proof of testing runs

requirements.txt : hardens to a fixed environment standard.

src/

File Summary
config.py Immutable training config + SHA-256 config hash
dataset.py Tiny deterministic char-level dataset + batches
device.py Device select (CPU/CUDA/XPU) + determinism setup
eval.py Held-out eval, perplexity, sealed eval manifest
global_manifest.py Combines all hashes into sealed pipeline manifest
gpu_reproducibility_test.py Fresh-vs-fresh bitwise determinism check per device
main.py set_seed entropy control + sanity smoke test
model.py Deterministic NanoGPT (attention, block, TinyGPT)
reproducibility.py Segmented prover/auditor + 5 tamper scenarios
telemetry.py Per-step loss/grad/param logger + model hashing

proofs/

File Summary
cpu_determinism_log.txt Narrated CPU run history / verification log
device_determinism_log.txt Machine-readable fresh-vs-fresh proof records
env_fingerprint.json Torch/Python/OS/CPU environment snapshot
eval_manifest.json Eval loss, perplexity, dataset + checkpoint hashes
pipeline_manifest.json Final sealed global pipeline hash bundle

AI Usage Disclosure:

  • This PR does not contain AI-generated code at all.
  • This PR contains AI-generated code. I have read the AI Usage Policy and this PR complies with this policy. I have tested the code locally and I am responsible for it.

I have used the following AI models and tools:
CLAUDE for comments and reasoning in device.py and gpu_reproducibility.py

Checklist

  • My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • My code follows the project's code style and conventions
  • If applicable, I have made corresponding changes or additions to the documentation
  • If applicable, I have made corresponding changes or additions to tests
  • My changes generate no new warnings or errors
  • I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • I have read the Contribution Guidelines
  • Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
  • I have filled this PR template completely and carefully, and I understand that my PR may be closed without review otherwise.

Summary by CodeRabbit

  • New Features

    • End-to-end determinism & reproducibility tooling: deterministic model + tiny dataset demo, automated fresh‑vs‑fresh bitwise checks, sealed evaluation and pipeline manifests, and a CLI entrypoint for deterministic setup and quick demos.
  • Documentation

    • Added a Colab workflow demonstrating GPU reproducibility steps and interpretation.
    • Included proof logs and environment/evaluation manifests capturing reproducibility outcomes.
  • Chores

    • Added dependency manifest and project ignores for local/dev artifacts.

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds an end-to-end determinism and reproducibility pipeline: device/determinism utilities, a tiny dataset and TinyGPT model, deterministic training and auditor scenarios with telemetry, fresh-vs-fresh tests, evaluation, sealed manifests, proof logs, and a Colab notebook + dependency pins to run the workflow.

Changes

Determinism Testing & Verification Pipeline

Layer / File(s) Summary
Project scaffolding and orchestration
.gitignore, requirements.txt, notebooks/colab_gpu_reproducibility.ipynb
Repository ignores virtualenvs and artifacts, pins runtime deps, and adds a Colab notebook orchestrating the reproducibility workflow.
Device selection & determinism configuration
src/device.py
Import-time cuBLAS workspace pinning, accelerator detection (CUDA → XPU → CPU), RNG capture/restore, accelerator seeding, and global deterministic algorithm configuration.
Training config and dataset
src/config.py, src/dataset.py
Immutable TRAIN_CONFIG with deterministic hash and a tiny fixed-character dataset encoding corpus into precomputed tensors and a get_batch API.
TinyGPT model architecture
src/model.py
Causal self-attention (masked), transformer Block (pre-LN), positional/token embeddings, and a bias-free logits head.
Seeding, telemetry, and entrypoint
src/telemetry.py, src/main.py
Per-step JSONL telemetry logger and set_seed entrypoint that seeds Python/NumPy/PyTorch and configures accelerator determinism.
Training orchestration & auditor scenarios
src/reproducibility.py
run_training_segment with checkpoint/RNG save-restore, four auditor scenarios (bad seed, gradient noise, sabotage, broken seal), and verify to compare telemetry and final hashes.
Fresh-vs-fresh reproducibility test
src/gpu_reproducibility_test.py
Train two independent runs from scratch, compare loss curves and bitwise parameter equality, compute model hashes, append device-proof JSON.
Evaluation and global manifest sealing
src/eval.py, src/global_manifest.py
Evaluate held-out batch, write eval_manifest.json, compute deterministic hashes for environment/config/dataset/checkpoint/eval, and seal into pipeline manifest.
Proof logs and artifacts
proofs/cpu_determinism_log.txt, proofs/device_determinism_log.txt, proofs/env_fingerprint.json, proofs/eval_manifest.json, proofs/pipeline_manifest.json
Recorded console and JSON proof artifacts capturing audit outputs, environment fingerprint, evaluation metrics, and sealed global pipeline hash.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

backend, python, configuration, documentation, dependencies, pending-coderabbit-review, external-contributor, size/XL

Suggested reviewers

  • Archit381

Poem

🐰 I hopped through seeds and hashes tight,
cuBLAS pinned before the light,
Logs and manifests sealed so neat,
TinyGPT marches, step by beat—
Reproducible dreams in byte-sized flight.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The PR title references 'prior work established' but lacks specificity about what those changes entail, making it partially vague. Consider a more specific title such as 'feat: add deterministic reproducibility tooling and baselines' or 'refactor: introduce reproducibility framework and proofs' to clearly convey the main changes.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/new-baseline

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 17

🧹 Nitpick comments (4)
src/global_manifest.py (2)

24-24: ⚡ Quick win

Fragile version string parsing.

The expression sys.version.split(' ')[0] assumes the Python version string always starts with the version number followed by a space. While this holds for standard CPython, alternative implementations or unusual build configurations could break this assumption.

♻️ More robust version parsing
-        "python": sys.version.split(' ')[0],
+        "python": platform.python_version(),

The platform.python_version() function reliably returns just the version number (e.g., "3.14.0").

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/global_manifest.py` at line 24, Replace the fragile sys.version.split('
')[0] usage when building the "python" entry in the manifest: locate where
sys.version is used (the expression sys.version.split(' ')[0]) and swap it to
use platform.python_version() so the manifest stores a reliable version string;
import platform at the top if not present and update the "python" key generation
in the same function or module where the manifest dict is constructed.

32-34: Make the dataset-hash endian-stable (or document the supported platforms).

TinyDataset().encoded is built deterministically from a constant string and uses fixed dtype=torch.long (int64), so the values feeding the hash are consistent. However, dataset.encoded.numpy().tobytes() uses native NumPy memory layout (including endianness), so the SHA-256 will differ on big-endian environments. For strict cross-endian reproducibility, hash a canonical representation (e.g., hash the integer list / JSON, or cast to a fixed endianness before tobytes()).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/global_manifest.py` around lines 32 - 34, The SHA-256 currently uses
dataset.encoded.numpy().tobytes() which depends on native endianness; make the
hash endian-stable by converting the array to a fixed byte-order or canonical
representation before hashing: for example, take TinyDataset().encoded.numpy(),
cast it to a defined endianness (e.g., little-endian int64) or serialize the
integer list (e.g., .tolist() then JSON canonicalize) and then compute
hashlib.sha256(...) on that canonical bytes; update the dataset_hash computation
(the variable dataset_hash and uses of TinyDataset and dataset.encoded)
accordingly and document the chosen approach.
src/device.py (1)

102-105: ⚡ Quick win

Add stacklevel to backend-mismatch warning for actionable call sites.

This warning currently points at restore_accel_rng_state() instead of the caller, which makes audit-path debugging harder.

Proposed diff
     if saved.get("backend") != current:
         warnings.warn(
             f"Skipping accelerator RNG restore: checkpoint backend "
             f"{saved.get('backend')!r} != current backend {current!r}."
+            ,
+            stacklevel=2
         )
         return
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/device.py` around lines 102 - 105, The warning in restore_accel_rng_state
currently points to inside that function; update the warnings.warn call in
restore_accel_rng_state to pass an appropriate stacklevel (e.g., stacklevel=2)
so the reported site is the caller, not the warning emitter; locate the
warnings.warn invocation that constructs the message using saved.get('backend')
and current and add the stacklevel parameter to the call.

Source: Linters/SAST tools

src/config.py (1)

4-20: ⚡ Quick win

“Immutable” config is currently mutable at runtime.

TRAIN_CONFIG can be mutated by any importer, which undermines the determinism contract this module describes.

Proposed diff
 import json
 import hashlib
+from types import MappingProxyType
 
 # The Immutable Training Configuration
-TRAIN_CONFIG = {
+TRAIN_CONFIG = MappingProxyType({
     "embed_dim": 16,
     "num_heads": 2,
     "max_seq_len": 32,
     "dropout": 0.1,
     "lr": 0.01,
     "optimizer": "Adam",
     "seed": 99,
     "total_steps": 10,
     "checkpoint_step": 5
-}
+})
 
 def get_config_hash():
     """Returns a deterministic SHA-256 hash of the configuration dict."""
-    encoded = json.dumps(TRAIN_CONFIG, sort_keys=True).encode()
+    encoded = json.dumps(dict(TRAIN_CONFIG), sort_keys=True).encode()
     return hashlib.sha256(encoded).hexdigest()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/config.py` around lines 4 - 20, TRAIN_CONFIG is currently a plain dict
and can be mutated at runtime; make it truly read-only by keeping the mutable
dict as a private variable (e.g. _TRAIN_CONFIG) and expose a read-only view
using types.MappingProxyType assigned to TRAIN_CONFIG, import types at top, and
ensure get_config_hash serializes the underlying mutable dict (e.g.
json.dumps(_TRAIN_CONFIG, sort_keys=True).encode()) rather than the
MappingProxyType so the hash remains deterministic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@proofs/device_determinism_log.txt`:
- Around line 1-52: The JSON verification log contains inconsistent keys: the
first record uses "cuda": null while later records use "accelerator_version":
null; update the generator/output so all records use the same field name
(preferably "accelerator_version") by replacing the "cuda" key with
"accelerator_version" in the first record and ensuring any code that emits the
log uses the single canonical key ("accelerator_version") instead of "cuda".

In `@requirements.txt`:
- Around line 1-3: requirements.txt currently uses ranged constraints (e.g.,
"torch>=2.10.0,<3.0", "numpy>=2.4,<3.0", "tqdm>=4.67,<5.0"); update these to
exact pinned versions that match Legacy/uv.lock (replace the ranges with
explicit version strings) and update notebooks/colab_gpu_reproducibility.ipynb
to install from the lockfile or the exact pins (instead of pip install with
ranges) — either change the notebook cell to run a locked install (e.g., pip
install -r requirements.txt using the pinned file or pip install from the lock)
or replace the "!pip install numpy>=2.4,<3.0 tqdm>=4.67,<5.0" cell with the
exact pinned versions from Legacy/uv.lock to ensure deterministic installs.

In `@src/dataset.py`:
- Around line 18-23: Validate block_size inside get_batch to ensure it produces
equal-length x and y slices: check that block_size is a positive integer and <=
len(self.encoded) - 1 (since y uses index 1..block_size+1) and raise a clear
ValueError if not. Update the get_batch function (referencing get_batch,
self.encoded, x, y) to perform this guard before slicing so x =
self.encoded[:block_size] and y = self.encoded[1:block_size+1] always produce
matching shapes.

In `@src/eval.py`:
- Line 70: The print call print(f"\n ~> Manifest saved to eval_manifest.json")
uses an unnecessary f-string; replace it with a plain string literal (print("\n
~> Manifest saved to eval_manifest.json")) to remove the redundant f prefix and
satisfy the Ruff warning, locating and updating the print statement in eval.py.
- Around line 67-70: The code currently writes eval_manifest.json to the CWD
which can break downstream stages; update the write to save under the
deterministic proofs directory (e.g., "proofs/eval_manifest.json") by creating
the proofs directory if missing and writing manifest there instead of
open("eval_manifest.json", "w"); locate the write using the manifest variable
and the open(...) call in src/eval.py and replace it to build the path
(Path("proofs") / "eval_manifest.json") and ensure the directory exists before
json.dump.
- Line 36: The torch.load call that assigns checkpoint (checkpoint =
torch.load("mid_checkpoint.pt", weights_only=False, map_location=DEVICE)) is
vulnerable to arbitrary code execution and should be hardened: either switch to
loading a weights-only checkpoint by using weights_only=True and refactor
saving/loading so mid_checkpoint.pt contains only tensors, or keep the existing
checkpoint format but validate file integrity before loading by computing and
comparing a trusted SHA256/HMAC of "mid_checkpoint.pt" and abort if it doesn't
match; implement the chosen change around the checkpoint load (the checkpoint
variable assignment) and ensure any downstream code that expects RNG/metadata is
updated if you drop non-tensor state.

In `@src/global_manifest.py`:
- Around line 36-38: The manifest generation is computing inconsistent
model_checkpoint_hash values because different files/paths or non-deterministic
checkpoint contents are being read; update the logic around the model hash
calculation (the open("mid_checkpoint.pt", "rb") call and the model_hash
variable in src/global_manifest.py) to always read the same canonical checkpoint
file (use a single configured checkpoint path/name), verify the file exists and
is fully written before hashing, and compute the SHA256 over the exact bytes
deterministically; ensure the same model_hash value is then injected into all
manifests (proofs/*) from this single source so every generated manifest
references the identical checkpoint hash.
- Around line 36-38: The code currently opens "mid_checkpoint.pt" to compute
model_hash without checking its existence; modify the block that computes
model_hash to first verify the file exists (e.g., using os.path.exists or
pathlib.Path("mid_checkpoint.pt").exists()), and if missing raise a clear error
or log a descriptive message and exit (consistent with the existing
eval_manifest.json check). Update the model_hash computation area (the with
open("mid_checkpoint.pt", "rb") as f: ... block) to perform this existence check
before opening the file.

In `@src/main.py`:
- Around line 35-41: The top-level creation and printing of TinyDataset
(TinyDataset() and dataset.get_batch()) should not run at import time because
importing set_seed triggers this side effect; move that demo logic into a
guarded entrypoint (e.g., wrap the dataset = TinyDataset(); x, y =
dataset.get_batch(); print(...) block into an if __name__ == "__main__": or into
a function like main_demo() and call it only when executed as a script) so
importing src/reproducibility.py or other modules no longer emits output.

In `@src/model.py`:
- Around line 62-75: The forward method lacks a max-sequence guard and __init__
doesn't store max_seq_len; store the constructor arg as self.max_seq_len in
__init__ and add a clear guard at the start of forward (e.g., check T =
idx.size(1) or B,T = idx.size() and if T > self.max_seq_len raise a ValueError)
so inputs longer than the configured max_seq_len fail with an explicit message
referencing the model's max sequence length; update references in forward to use
self.max_seq_len.
- Around line 44-48: The Block.__init__ currently ignores the dropout parameter
when constructing the attention module; update the CausalSelfAttention
instantiation in Block.__init__ to pass the configured dropout through (e.g.,
call CausalSelfAttention(embed_dim, num_heads, max_seq_len, dropout) or named
arg dropout=dropout) and ensure the CausalSelfAttention constructor accepts and
uses that dropout parameter so attention dropout follows the Block's configured
value.

In `@src/reproducibility.py`:
- Line 56: Several print statements use f-strings without interpolation (e.g.,
the print call containing "FATAL ALERT! : Cryptographic seal broken!" and the
other literal messages at the noted locations); remove the unnecessary leading
'f' from those string literals so they become plain strings (e.g., change
print(f"...") to print("...")) for each occurrence referenced in
reproducibility.py (the literal messages at the indicated lines).
- Around line 29-30: The code computes active_end_step = end_step if end_step is
not None else TRAIN_CONFIG["total_steps"] but the training loop still references
end_step; update the loop to use active_end_step instead of end_step so the
fallback is honored and no None is used. Locate usages of end_step in the
training loop (the block that iterates or checks current step against end_step)
and replace them with active_end_step, keeping TRAIN_CONFIG and active_end_step
logic intact.
- Around line 53-59: The checkpoint verification currently only prints a “FATAL
ALERT” and continues; modify the checkpoint seal check (the block that calls
logger.hash_model(model) and compares loaded_hash to
checkpoint['checkpoint_hash']) to fail fast: after detecting mismatch, raise a
clear exception (e.g., RuntimeError) or call sys.exit(1) so execution stops
immediately, and include the expected and actual hash snippets in the exception
message for diagnostics instead of just printing them.
- Around line 127-137: The replay currently restores model and optimizer from
checkpoint but not the RNG states, so replay can drift; after loading
checkpoint['model'] and checkpoint['optimizer'] restore RNG state(s) from the
checkpoint before any noise injection by applying the saved RNG blobs (e.g.,
checkpoint['torch_rng_state'] via torch.set_rng_state,
checkpoint['cuda_rng_state'] via torch.cuda.set_rng_state_all,
checkpoint['np_rng_state'] via numpy.random.set_state, and
checkpoint['py_rng_state'] via random.setstate') so that subsequent operations
(including secret_noise_auditor gradient-noise injection) start from the exact
checkpoint RNG state.
- Line 247: The function currently returns only the telemetry match boolean (the
variable match) which can report success even on a hash mismatch; change the
return to the full verification verdict object instead of match so callers
receive the complete result (e.g., replace the final return of match with the
verification result variable used earlier in this function — commonly named
verdict or verification_result — ensuring you return that object containing
status, hashes, and telemetry rather than the match boolean).
- Line 49: The checkpoint deserialization uses
torch.load(checkpoint_path_to_load, weights_only=False) which can execute
arbitrary code; change loading to torch.load(checkpoint_path_to_load,
weights_only=True) to restrict deserialization to tensors (update every
occurrence), and move non-tensor state (e.g., numpy_rng, python_rng) into a
separate safe sidecar (e.g., JSON/pickle-safe or explicit state file) that you
load with a safe parser after verifying the checkpoint_hash; update the code
paths that expect these RNG objects (locations referencing checkpoint,
numpy_rng, python_rng or functions that restore RNG state) to read the sidecar
and restore RNG state explicitly instead of relying on non-tensor objects inside
torch.load.

---

Nitpick comments:
In `@src/config.py`:
- Around line 4-20: TRAIN_CONFIG is currently a plain dict and can be mutated at
runtime; make it truly read-only by keeping the mutable dict as a private
variable (e.g. _TRAIN_CONFIG) and expose a read-only view using
types.MappingProxyType assigned to TRAIN_CONFIG, import types at top, and ensure
get_config_hash serializes the underlying mutable dict (e.g.
json.dumps(_TRAIN_CONFIG, sort_keys=True).encode()) rather than the
MappingProxyType so the hash remains deterministic.

In `@src/device.py`:
- Around line 102-105: The warning in restore_accel_rng_state currently points
to inside that function; update the warnings.warn call in
restore_accel_rng_state to pass an appropriate stacklevel (e.g., stacklevel=2)
so the reported site is the caller, not the warning emitter; locate the
warnings.warn invocation that constructs the message using saved.get('backend')
and current and add the stacklevel parameter to the call.

In `@src/global_manifest.py`:
- Line 24: Replace the fragile sys.version.split(' ')[0] usage when building the
"python" entry in the manifest: locate where sys.version is used (the expression
sys.version.split(' ')[0]) and swap it to use platform.python_version() so the
manifest stores a reliable version string; import platform at the top if not
present and update the "python" key generation in the same function or module
where the manifest dict is constructed.
- Around line 32-34: The SHA-256 currently uses
dataset.encoded.numpy().tobytes() which depends on native endianness; make the
hash endian-stable by converting the array to a fixed byte-order or canonical
representation before hashing: for example, take TinyDataset().encoded.numpy(),
cast it to a defined endianness (e.g., little-endian int64) or serialize the
integer list (e.g., .tolist() then JSON canonicalize) and then compute
hashlib.sha256(...) on that canonical bytes; update the dataset_hash computation
(the variable dataset_hash and uses of TinyDataset and dataset.encoded)
accordingly and document the chosen approach.
🪄 Autofix (Beta)

✅ Autofix completed


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0d8042b3-e8ac-4ed3-954e-26e8bbad2dfb

📥 Commits

Reviewing files that changed from the base of the PR and between 274e0d8 and 3047da1.

📒 Files selected for processing (18)
  • .gitignore
  • notebooks/colab_gpu_reproducibility.ipynb
  • proofs/cpu_determinism_log.txt
  • proofs/device_determinism_log.txt
  • proofs/env_fingerprint.json
  • proofs/eval_manifest.json
  • proofs/pipeline_manifest.json
  • requirements.txt
  • src/config.py
  • src/dataset.py
  • src/device.py
  • src/eval.py
  • src/global_manifest.py
  • src/gpu_reproducibility_test.py
  • src/main.py
  • src/model.py
  • src/reproducibility.py
  • src/telemetry.py

Comment thread proofs/device_determinism_log.txt
Comment thread requirements.txt Outdated
Comment thread src/dataset.py
Comment thread src/eval.py Outdated
dropout=TRAIN_CONFIG["dropout"]
).to(DEVICE)

checkpoint = torch.load("mid_checkpoint.pt", weights_only=False, map_location=DEVICE)

@coderabbitai coderabbitai Bot Jun 9, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate and inspect src/eval.py around the referenced line
ls -la
fd -n --hidden --max-depth 6 "eval.py" || true
sed -n '1,120p' src/eval.py || true
nl -ba src/eval.py | sed -n '20,70p' || true

# Find any other torch.load usage and checkpoint-related code
rg -n "torch\.load\(" -S src || true
rg -n "mid_checkpoint\.pt" -S . || true
rg -n "weights_only" -S . || true

# Check dependencies to see if weights_only is supported in this repo's PyTorch constraints
ls -ლა
fd -t f -a -n "requirements*.txt" . || true
fd -t f -a -n "pyproject.toml" . || true
fd -t f -a -n "environment*.yml" . || true
fd -t f -a -n "requirements*.in" . || true
for f in requirements*.txt pyproject.toml environment*.yml environment*.yaml Pipfile Pipfile.lock setup.py setup.cfg; do
  if [ -f "$f" ]; then echo "---- $f ----"; sed -n '1,200p' "$f"; fi
done

# If pip/conda pins torch, extract versions; otherwise just report nothing found
rg -n "torch(==|>=|<=|~=|=)" -S requirements*.txt pyproject.toml environment*.yml environment*.yaml Pipfile setup.py setup.cfg 2>/dev/null || true

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 5741


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show eval.py around the torch.load line with line numbers (use awk instead of nl)
sed -n '1,120p' src/eval.py | cat

# Show all torch.save usage (to infer checkpoint contents/schema)
rg -n "torch\.save\(" -S src || true

# Inspect reproducibility.py around each torch.load call site
for start in 1 70 120 150 185 230; do
  end=$((start+80))
  echo "---- src/reproducibility.py lines ${start}-${end} ----"
  sed -n "${start},${end}p" src/reproducibility.py || true
done

# Inspect global_manifest.py where mid_checkpoint.pt is opened as binary
sed -n '1,140p' src/global_manifest.py || true

# Check requirements.txt for torch version / constraints
sed -n '1,200p' requirements.txt || true

# Also locate any docs about checkpoint format (optional)
rg -n "checkpoint" -S README.md src | head -n 80 || true

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 31794


🌐 Web query:

PyTorch torch.load weights_only=True behavior non-tensor objects numpy_rng python_rng safe unpickler allowed types

💡 Result:

When weights_only=True is enabled in torch.load, PyTorch restricts the unpickler to a safe subset of types to prevent arbitrary code execution [1][2]. This mode is the default behavior in recent versions of PyTorch (starting in version 2.6) when the pickle_module argument is not provided [1][2]. Allowed Types and Behavior The restricted unpickler is limited to: - Plain torch.Tensors [1][2]. - Primitive Python types (e.g., numbers, strings, lists, dictionaries) [3][2]. - Any additional types explicitly registered via torch.serialization.add_safe_globals [3][4]. The weights_only unpickler does not allow dynamic imports during unpickling [1][2]. If an unsupported object (such as a custom class or a complex non-tensor object) is encountered, the unpickler will raise an error [2]. You can resolve this by allowlisting the required globals using torch.serialization.add_safe_globals or the torch.serialization.safe_globals context manager if the source is trusted [2]. Non-Tensor Objects (RNGs) - Python RNGs (random.Random) and NumPy RNGs are generally not included in the default safe allowlist because they are not necessary for loading state_dicts of standard PyTorch models [1][2]. - Attempting to load files containing these objects with weights_only=True will likely result in an "Unsupported global" error. - If you require these objects, you must manually register them as safe using the aforementioned functions [2]. Security Considerations The weights_only=True mode is designed to guard against denial of service and arbitrary code execution attacks [1]. While it significantly increases security when loading untrusted files, PyTorch warns that memory corruption issues are still theoretically possible, and loading data from untrusted sources should always be done with caution [1][5]. If you must load complex files, such as those containing entire nn.Module objects, setting weights_only=False is often necessary, but this should only be performed with files from trusted sources due to the associated security risks [1][2]. You can control this behavior using environment variables: - TORCH_FORCE_WEIGHTS_ONLY_LOAD=1: Forces all torch.load calls to use weights_only=True [1][2]. - TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1: Disables the default weights_only=True behavior [1][2].

Citations:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '1,120p' src/main.py
rg -n "mid_checkpoint\.pt|model\.pt" -S src/main.py src/reproducibility.py src/eval.py

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 3176


🏁 Script executed:

#!/bin/bash
set -euo pipefail
sed -n '1,220p' src/device.py
rg -n "def get_device|def accel_rng_state|def restore_accel_rng_state|accel_rng_state\(" -S src/device.py

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 4443


Harden checkpoint loading against pickle execution in src/eval.py (line 36).
torch.load(..., weights_only=False) can execute arbitrary code if mid_checkpoint.pt is tampered. This repo’s mid_checkpoint.pt contains non-tensor Python/NumPy RNG state and other metadata (saved via torch.save), so switching to weights_only=True will likely fail unless you refactor the checkpoint format (or load a separate weights-only checkpoint).

Suggested hardening
-    checkpoint = torch.load("mid_checkpoint.pt", weights_only=False, map_location=DEVICE)
+    checkpoint = torch.load("mid_checkpoint.pt", weights_only=True, map_location=DEVICE)

Also, avoid relying on post-load hash checks for security: if you use integrity (e.g., SHA256 from a trusted manifest), compute/compare it before calling torch.load.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/eval.py` at line 36, The torch.load call that assigns checkpoint
(checkpoint = torch.load("mid_checkpoint.pt", weights_only=False,
map_location=DEVICE)) is vulnerable to arbitrary code execution and should be
hardened: either switch to loading a weights-only checkpoint by using
weights_only=True and refactor saving/loading so mid_checkpoint.pt contains only
tensors, or keep the existing checkpoint format but validate file integrity
before loading by computing and comparing a trusted SHA256/HMAC of
"mid_checkpoint.pt" and abort if it doesn't match; implement the chosen change
around the checkpoint load (the checkpoint variable assignment) and ensure any
downstream code that expects RNG/metadata is updated if you drop non-tensor
state.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will check this out later

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryoari, sounds good! Would you like me to open a GitHub issue to track this hardening task so it doesn't get lost? I can create one with the details of the concern (switching to weights_only=True or adding pre-load integrity verification for mid_checkpoint.pt) and assign it to you.

Comment thread src/eval.py Outdated
Comment thread src/reproducibility.py
logger = TelemetryLogger(filepath=log_file)

if checkpoint_path_to_load:
checkpoint = torch.load(checkpoint_path_to_load, weights_only=False)

@coderabbitai coderabbitai Bot Jun 9, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
ls -R

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 1375


🏁 Script executed:

#!/bin/bash
set -euo pipefail
ls -la
fd -t f -a src/reproducibility.py || true
stat src/reproducibility.py || true
sed -n '1,220p' src/reproducibility.py | nl -ba | sed -n '1,220p'

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 1131


🏁 Script executed:

#!/bin/bash
set -euo pipefail
nl -ba src/reproducibility.py | sed -n '1,240p'

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 115


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "torch\.load\(" src/reproducibility.py
rg -n "weights_only" src/reproducibility.py
rg -n "checkpoint_path_to_load|checkpoint" src/reproducibility.py

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 4206


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "(hash|sha256|sha1|digest|seal|signature|verify|hmac|checksum|gpg|pgp)" src/reproducibility.py

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 2320


🏁 Script executed:

#!/bin/bash
set -euo pipefail
for r in "35,75" "90,120" "120,160" "155,190" "185,215"; do
  echo "---- lines $r ----"
  nl -ba src/reproducibility.py | sed -n "${r}p"
done

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 155


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "argparse|sys\.argv|click|typer|fire|main\(" src/reproducibility.py
rg -n "checkpoint_path_to_load" -S src/reproducibility.py

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 54


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "torch\.load\([^)]*weights_only=False" -S .
rg -n "weights_only=False" -S .

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 1280


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "torch\.load\([^)]*weights_only=False" -S . | head -n 50
rg -n "weights_only=False" -S . | head -n 50

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 1280


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "weights_only=True|safe_open|zipfile|_use_new_zipfile_serialization" -S . | head -n 50

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 206


🏁 Script executed:

# noop
true

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 54


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "=== File head ==="
nl -ba src/reproducibility.py | sed -n '1,240p'

echo
echo "=== torch.load occurrences in file ==="
rg -n "torch\.load\(" src/reproducibility.py || true
echo
echo "=== weights_only usage ==="
rg -n "weights_only" src/reproducibility.py || true
echo
echo "=== hash/seal/verify keywords ==="
rg -n "(hash|sha256|sha1|digest|seal|signature|verify|hmac|checksum)" src/reproducibility.py || true

echo
echo "=== Context around specific lines ==="
for r in "35,75" "95,115" "125,145" "155,175" "185,205"; do
  echo "---- lines $r ----"
  nl -ba src/reproducibility.py | sed -n "${r}p"
done

echo
echo "=== CLI / argv usage ==="
rg -n "(argparse|sys\.argv|click|typer|fire)" src/reproducibility.py || true

echo
echo "=== Repo-wide weights_only=False ==="
rg -n "torch\.load\([^)]*weights_only=False" -S . | head -n 50 || true

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 151


🏁 Script executed:

# end

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 54


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "DEVICE" src/reproducibility.py || true
rg -n "checkpoint_hash" src/reproducibility.py || true
# Print small relevant windows without relying on nl/stat
sed -n '1,120p' src/reproducibility.py
sed -n '120,240p' src/reproducibility.py

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 11308


Harden checkpoint deserialization: remove torch.load(..., weights_only=False)

In src/reproducibility.py (lines 49, 105, 134, 167, 197), checkpoints are loaded with torch.load(..., weights_only=False). Any tampered/untrusted checkpoint can execute arbitrary code during deserialization before the later checkpoint_hash “verifying cryptographic seal” check runs.

Suggested hardening
- checkpoint = torch.load(checkpoint_path_to_load, weights_only=False)
+checkpoint = torch.load(
+    checkpoint_path_to_load,
+    map_location=DEVICE,
+    weights_only=True,
+)

mid_checkpoint.pt currently saves non-tensor Python objects like numpy_rng and python_rng; weights_only=True will likely require restructuring (e.g., save RNG state in a safe sidecar/format and load it separately, while keeping the checkpoint payload weights-only).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/reproducibility.py` at line 49, The checkpoint deserialization uses
torch.load(checkpoint_path_to_load, weights_only=False) which can execute
arbitrary code; change loading to torch.load(checkpoint_path_to_load,
weights_only=True) to restrict deserialization to tensors (update every
occurrence), and move non-tensor state (e.g., numpy_rng, python_rng) into a
separate safe sidecar (e.g., JSON/pickle-safe or explicit state file) that you
load with a safe parser after verifying the checkpoint_hash; update the code
paths that expect these RNG objects (locations referencing checkpoint,
numpy_rng, python_rng or functions that restore RNG state) to read the sidecar
and restore RNG state explicitly instead of relying on non-tensor objects inside
torch.load.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will check this later

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryoari, no worries! Whenever you're ready to address it, the key points are:

  1. Switch all five torch.load(..., weights_only=False) calls (lines 49, 105, 134, 167, 197) to weights_only=True.
  2. Move non-tensor state (numpy_rng, python_rng) out of the checkpoint into a safe sidecar format (e.g., JSON or a separate .pkl guarded by hash verification), and restore RNG state from there explicitly — after verifying the checkpoint_hash.

Would you like me to open a GitHub issue to track this follow-up so it doesn't get lost?

Comment thread src/reproducibility.py
Comment thread src/reproducibility.py Outdated
Comment thread src/reproducibility.py
Comment thread src/reproducibility.py Outdated
ryoari and others added 3 commits June 9, 2026 14:30
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@ryoari ryoari changed the title refactor: new baseline refactor: changing the repo to be based on prior work established Jun 9, 2026
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully

Fixed 6 file(s) based on 10 unresolved review comments.

Files modified:

  • notebooks/colab_gpu_reproducibility.ipynb
  • requirements.txt
  • src/eval.py
  • src/global_manifest.py
  • src/model.py
  • src/reproducibility.py

Commit: 2c389fdb105e450a5e12db5d90643d78a7f7275b

The changes have been pushed to the feature/new-baseline branch.

Time taken: 6m 32s

Fixed 6 file(s) based on 10 unresolved review comments.

Co-authored-by: CodeRabbit <noreply@coderabbit.ai>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/eval.py (1)

37-45: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Pre-load integrity is still bypassable; weights_only=False remains exploitable.

At Line 45, deserialization happens before any trusted integrity decision. The checkpoint_hash checked at Lines 52-55 is attacker-controlled (it is stored inside the same checkpoint), so it cannot authenticate the file. file_hash at Line 40 is currently unused for enforcement.

Suggested direction (fail closed before torch.load)
-    with open(checkpoint_path, "rb") as f:
-        file_hash = hashlib.sha256(f.read()).hexdigest()
+    expected_file_hash = os.getenv("MID_CHECKPOINT_SHA256")
+    if not expected_file_hash:
+        raise RuntimeError("Missing MID_CHECKPOINT_SHA256 for pre-load checkpoint verification.")
+    with open(checkpoint_path, "rb") as f:
+        file_hash = hashlib.sha256(f.read()).hexdigest()
+    if file_hash != expected_file_hash:
+        raise RuntimeError("Checkpoint file hash mismatch; refusing to deserialize.")

     # Load checkpoint (contains model, optimizer, and RNG states)
     # weights_only=False required for non-tensor state (RNG, metadata)
-    # File hash computed above provides tamper detection
+    # Safe only after trusted pre-load hash verification
     checkpoint = torch.load(checkpoint_path, weights_only=False, map_location=DEVICE)
#!/bin/bash
set -euo pipefail

# Verify current trust chain: pre-load hash computed, but not enforced against trusted external value.
rg -n "file_hash|weights_only=False|checkpoint_hash|torch\\.load\\(" src/eval.py src/reproducibility.py
sed -n '34,62p' src/eval.py
sed -n '70,120p' src/reproducibility.py

Also applies to: 52-55

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/eval.py` around lines 37 - 45, Compute and enforce the pre-load file
integrity check using the locally computed file_hash before calling torch.load:
retrieve the expected trusted hash from a trusted external source (e.g., a
reproducibility function or environment/secure store) instead of using the
checkpoint's internal checkpoint_hash, compare expected_hash to file_hash and
abort/raise if they differ, and only then call torch.load(checkpoint_path,
weights_only=False, map_location=DEVICE); do not rely on
checkpoint['checkpoint_hash'] for authentication and ensure this verification
happens prior to torch.load so deserialization cannot be bypassed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@requirements.txt`:
- Line 1: requirements.txt currently pins torch==2.10.0 which is vulnerable to
PYSEC-2026-139; to mitigate, audit the codebase for any usages that load
serialized PyTorch artifacts (e.g., calls to torch.load, torch.jit.load,
torch.export.load or custom deserialization paths) and ensure they never accept
untrusted inputs and use secure options such as weights_only=True for
torch.export.load where applicable; add runtime guards/input validation around
any model-loading entrypoints and document the temporary mitigation in the repo
(and add a TODO to update the pin), then when an official patched PyTorch
release is available, update the torch pin in requirements.txt and re-lock
deterministic dependency hashes.

---

Duplicate comments:
In `@src/eval.py`:
- Around line 37-45: Compute and enforce the pre-load file integrity check using
the locally computed file_hash before calling torch.load: retrieve the expected
trusted hash from a trusted external source (e.g., a reproducibility function or
environment/secure store) instead of using the checkpoint's internal
checkpoint_hash, compare expected_hash to file_hash and abort/raise if they
differ, and only then call torch.load(checkpoint_path, weights_only=False,
map_location=DEVICE); do not rely on checkpoint['checkpoint_hash'] for
authentication and ensure this verification happens prior to torch.load so
deserialization cannot be bypassed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0367fb2e-e1c5-4723-9bc4-597fde396da4

📥 Commits

Reviewing files that changed from the base of the PR and between 86fbc43 and 2c389fd.

📒 Files selected for processing (6)
  • notebooks/colab_gpu_reproducibility.ipynb
  • requirements.txt
  • src/eval.py
  • src/global_manifest.py
  • src/model.py
  • src/reproducibility.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • notebooks/colab_gpu_reproducibility.ipynb
  • src/model.py
  • src/global_manifest.py
  • src/reproducibility.py

Comment thread requirements.txt
@@ -0,0 +1,8 @@
torch==2.10.0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

python - <<'PY'
import json, urllib.request

payload = {
    "package": {"name": "torch", "ecosystem": "PyPI"},
    "version": "2.10.0"
}
req = urllib.request.Request(
    "https://api.osv.dev/v1/query",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.dumps(json.load(resp), indent=2))
PY

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 2655


🌐 Web query:

PYSEC-2026-139 torch 2.10.0

💡 Result:

PYSEC-2026-139 corresponds to a deserialization vulnerability in PyTorch version 2.10.0 [1]. This vulnerability is identified by CVE-2026-4538 [2][3]. The vulnerability resides within the pt2 (PyTorch 2.0 export/loading) Loading Handler component [1][4]. It involves an unspecified function that incorrectly processes untrusted serialized data, leading to a deserialization flaw [4]. An attacker with local access to a system can exploit this by crafting a malicious file to achieve arbitrary code execution or manipulate internal application states [4][3]. The issue was addressed in the community via pull request #176791 [2][5], which introduced a weights_only parameter to the torch.export.load API to allow users to enforce secure deserialization, preventing insecure fallbacks to pickle when loading untrusted artifacts [5]. As of the reports, this vulnerability affects PyTorch 2.10.0 [1][3]. Users are encouraged to exercise caution when loading untrusted files and to utilize the weights_only=True flag where supported by the API [5].

Citations:


🌐 Web query:

site:osv.dev PYSEC-2026-139

💡 Result:

PYSEC-2026-139 is a security vulnerability identified in PyTorch version 2.10.0 [1]. It affects the pt2 Loading Handler component [1]. The vulnerability involves a deserialization issue that can be exploited by an attacker with access to the local environment [1]. The vulnerability has a CVSS v3.1 score of 7.8, which is classified as High [1]. According to the advisory, the exploit is publicly available, and the PyTorch project had been informed of the issue but had not addressed it as of the time of publication [1]. Affected versions listed include a wide range of releases, with 2.10.0 being the specific version mentioned as affected [1].

Citations:


🌐 Web query:

osv.dev PYSEC-2026-139 affected versions fixed versions

💡 Result:

The vulnerability identified as PYSEC-2026-139 affects the PyTorch (torch) package on PyPI [1]. Affected Versions: The vulnerability affects PyTorch version 2.10.0 [1][2]. According to OSV and related vulnerability reports, it impacts a wide range of versions leading up to and including 2.10.0 [1]. Fixed Versions: As of June 9, 2026, there is no fixed version available for this vulnerability [2][3]. The project has been notified, but has not yet released a patch to address the issue [1][2]. Details: The vulnerability is a deserialization flaw located in the pt2 Loading Handler component of PyTorch [1]. It can be exploited by an attacker with local access to the system [1][3]. A public exploit exists for this vulnerability [1][3].

Citations:


🌐 Web query:

CVE-2026-4538 PyTorch fixed in which version torch

💡 Result:

As of June 9, 2026, there is no fixed version for CVE-2026-4538 [1][2]. This vulnerability involves a deserialization flaw in the pt2 Loading Handler component of PyTorch version 2.10.0 [3][4]. Security advisories and vulnerability databases report that the project had not yet released a fix for this specific issue as of the most recent updates [3][5][2]. Please note that CVE-2026-4538 is distinct from another PyTorch vulnerability, CVE-2026-24747, which affected earlier versions of PyTorch and was fixed in version 2.10.0 [6][7][8]. Users should be aware that while version 2.10.0 resolved the previous security concerns [6][8], it introduced or contained the deserialization vulnerability described in CVE-2026-4538 [3][4][9]. Organizations using affected software should monitor the official PyTorch security advisories or their package repository for forthcoming patches [3][10].

Citations:


Resolve PYSEC-2026-139 exposure for torch==2.10.0 (no patched pin available yet).

  • PYSEC-2026-139 (CVE-2026-4538) affects PyTorch 2.10.0 and is a high-severity deserialization flaw in the pt2 Loading Handler.
  • No fixed/patched PyTorch version is reported as available yet, so you can’t “move to the first patched pin” now.
  • Mitigate by ensuring this repo never loads untrusted serialized artifacts, and by using the secure/safe loading options where applicable (e.g., weights_only=True for torch.export.load where the code uses it), then update the pin immediately once a fixed release is published while keeping deterministic locking.
🧰 Tools
🪛 OSV Scanner (2.3.8)

[HIGH] 1-1: torch 2.10.0: undefined

(PYSEC-2026-139)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@requirements.txt` at line 1, requirements.txt currently pins torch==2.10.0
which is vulnerable to PYSEC-2026-139; to mitigate, audit the codebase for any
usages that load serialized PyTorch artifacts (e.g., calls to torch.load,
torch.jit.load, torch.export.load or custom deserialization paths) and ensure
they never accept untrusted inputs and use secure options such as
weights_only=True for torch.export.load where applicable; add runtime
guards/input validation around any model-loading entrypoints and document the
temporary mitigation in the repo (and add a TODO to update the pin), then when
an official patched PyTorch release is available, update the torch pin in
requirements.txt and re-lock deterministic dependency hashes.

Source: Linters/SAST tools

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants