refactor: changing the repo to be based on prior work established by ryoari · Pull Request #89 · AOSSIE-Org/OpenVerifiableLLM

ryoari · 2026-06-09T08:32:21Z

Refactors into a new baseline for successive work.

Due to collaborative effort of previous work, a lot of technical debt had built up where I wasn't sure of the code and it's effect on determinism.

As such, as I parse through the preexisting code to gain a better understanding. It is better to move everything to a legacy folder and re-introduce slowly after validating the code functions and it's effect.

Based on my prior supporting work in the https://github.com/ryoari/Verifiable-LLM-Baseline/

I rebase this repo to my previous effort, with clearly understood foundations.

this PR builds on #73 #37 #86

Adds in the following files:

src: contains all the code logic to run cpu and gpu determinism on device

proof: contains hash manifest to verify said cpu and gpu determinism, as well as proof of testing runs

requirements.txt : hardens to a fixed environment standard.

`src/`

File	Summary
`config.py`	Immutable training config + SHA-256 config hash
`dataset.py`	Tiny deterministic char-level dataset + batches
`device.py`	Device select (CPU/CUDA/XPU) + determinism setup
`eval.py`	Held-out eval, perplexity, sealed eval manifest
`global_manifest.py`	Combines all hashes into sealed pipeline manifest
`gpu_reproducibility_test.py`	Fresh-vs-fresh bitwise determinism check per device
`main.py`	`set_seed` entropy control + sanity smoke test
`model.py`	Deterministic NanoGPT (attention, block, TinyGPT)
`reproducibility.py`	Segmented prover/auditor + 5 tamper scenarios
`telemetry.py`	Per-step loss/grad/param logger + model hashing

`proofs/`

File	Summary
`cpu_determinism_log.txt`	Narrated CPU run history / verification log
`device_determinism_log.txt`	Machine-readable fresh-vs-fresh proof records
`env_fingerprint.json`	Torch/Python/OS/CPU environment snapshot
`eval_manifest.json`	Eval loss, perplexity, dataset + checkpoint hashes
`pipeline_manifest.json`	Final sealed global pipeline hash bundle

AI Usage Disclosure:

This PR does not contain AI-generated code at all.
This PR contains AI-generated code. I have read the AI Usage Policy and this PR complies with this policy. I have tested the code locally and I am responsible for it.

I have used the following AI models and tools:
CLAUDE for comments and reasoning in device.py and gpu_reproducibility.py

Checklist

My PR addresses a single issue, fixes a single bug or makes a single improvement.
My code follows the project's code style and conventions
If applicable, I have made corresponding changes or additions to the documentation
If applicable, I have made corresponding changes or additions to tests
My changes generate no new warnings or errors
I have joined the Discord server and I will share a link to this PR with the project maintainers there
I have read the Contribution Guidelines
Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.
I have filled this PR template completely and carefully, and I understand that my PR may be closed without review otherwise.

Summary by CodeRabbit

New Features
- End-to-end determinism & reproducibility tooling: deterministic model + tiny dataset demo, automated fresh‑vs‑fresh bitwise checks, sealed evaluation and pipeline manifests, and a CLI entrypoint for deterministic setup and quick demos.
Documentation
- Added a Colab workflow demonstrating GPU reproducibility steps and interpretation.
- Included proof logs and environment/evaluation manifests capturing reproducibility outcomes.
Chores
- Added dependency manifest and project ignores for local/dev artifacts.

coderabbitai · 2026-06-09T08:32:34Z

📝 Walkthrough

Walkthrough

Adds an end-to-end determinism and reproducibility pipeline: device/determinism utilities, a tiny dataset and TinyGPT model, deterministic training and auditor scenarios with telemetry, fresh-vs-fresh tests, evaluation, sealed manifests, proof logs, and a Colab notebook + dependency pins to run the workflow.

Changes

Determinism Testing & Verification Pipeline

Layer / File(s)	Summary
Project scaffolding and orchestration `.gitignore`, `requirements.txt`, `notebooks/colab_gpu_reproducibility.ipynb`	Repository ignores virtualenvs and artifacts, pins runtime deps, and adds a Colab notebook orchestrating the reproducibility workflow.
Device selection & determinism configuration `src/device.py`	Import-time cuBLAS workspace pinning, accelerator detection (CUDA → XPU → CPU), RNG capture/restore, accelerator seeding, and global deterministic algorithm configuration.
Training config and dataset `src/config.py`, `src/dataset.py`	Immutable `TRAIN_CONFIG` with deterministic hash and a tiny fixed-character dataset encoding corpus into precomputed tensors and a `get_batch` API.
TinyGPT model architecture `src/model.py`	Causal self-attention (masked), transformer Block (pre-LN), positional/token embeddings, and a bias-free logits head.
Seeding, telemetry, and entrypoint `src/telemetry.py`, `src/main.py`	Per-step JSONL telemetry logger and `set_seed` entrypoint that seeds Python/NumPy/PyTorch and configures accelerator determinism.
Training orchestration & auditor scenarios `src/reproducibility.py`	`run_training_segment` with checkpoint/RNG save-restore, four auditor scenarios (bad seed, gradient noise, sabotage, broken seal), and `verify` to compare telemetry and final hashes.
Fresh-vs-fresh reproducibility test `src/gpu_reproducibility_test.py`	Train two independent runs from scratch, compare loss curves and bitwise parameter equality, compute model hashes, append device-proof JSON.
Evaluation and global manifest sealing `src/eval.py`, `src/global_manifest.py`	Evaluate held-out batch, write `eval_manifest.json`, compute deterministic hashes for environment/config/dataset/checkpoint/eval, and seal into pipeline manifest.
Proof logs and artifacts `proofs/cpu_determinism_log.txt`, `proofs/device_determinism_log.txt`, `proofs/env_fingerprint.json`, `proofs/eval_manifest.json`, `proofs/pipeline_manifest.json`	Recorded console and JSON proof artifacts capturing audit outputs, environment fingerprint, evaluation metrics, and sealed global pipeline hash.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

backend, python, configuration, documentation, dependencies, pending-coderabbit-review, external-contributor, size/XL

Suggested reviewers

Archit381

Poem

🐰 I hopped through seeds and hashes tight,
cuBLAS pinned before the light,
Logs and manifests sealed so neat,
TinyGPT marches, step by beat—
Reproducible dreams in byte-sized flight.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The PR title references 'prior work established' but lacks specificity about what those changes entail, making it partially vague.	Consider a more specific title such as 'feat: add deterministic reproducibility tooling and baselines' or 'refactor: introduce reproducibility framework and proofs' to clearly convey the main changes.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feature/new-baseline

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 17

🧹 Nitpick comments (4)

src/global_manifest.py (2)

24-24: ⚡ Quick win

Fragile version string parsing.

The expression sys.version.split(' ')[0] assumes the Python version string always starts with the version number followed by a space. While this holds for standard CPython, alternative implementations or unusual build configurations could break this assumption.
♻️ More robust version parsing
-        "python": sys.version.split(' ')[0],
+        "python": platform.python_version(),
The platform.python_version() function reliably returns just the version number (e.g., "3.14.0").
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/global_manifest.py` at line 24, Replace the fragile sys.version.split('
')[0] usage when building the "python" entry in the manifest: locate where
sys.version is used (the expression sys.version.split(' ')[0]) and swap it to
use platform.python_version() so the manifest stores a reliable version string;
import platform at the top if not present and update the "python" key generation
in the same function or module where the manifest dict is constructed.
32-34: Make the dataset-hash endian-stable (or document the supported platforms).

TinyDataset().encoded is built deterministically from a constant string and uses fixed dtype=torch.long (int64), so the values feeding the hash are consistent. However, dataset.encoded.numpy().tobytes() uses native NumPy memory layout (including endianness), so the SHA-256 will differ on big-endian environments. For strict cross-endian reproducibility, hash a canonical representation (e.g., hash the integer list / JSON, or cast to a fixed endianness before tobytes()).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/global_manifest.py` around lines 32 - 34, The SHA-256 currently uses
dataset.encoded.numpy().tobytes() which depends on native endianness; make the
hash endian-stable by converting the array to a fixed byte-order or canonical
representation before hashing: for example, take TinyDataset().encoded.numpy(),
cast it to a defined endianness (e.g., little-endian int64) or serialize the
integer list (e.g., .tolist() then JSON canonicalize) and then compute
hashlib.sha256(...) on that canonical bytes; update the dataset_hash computation
(the variable dataset_hash and uses of TinyDataset and dataset.encoded)
accordingly and document the chosen approach.

src/device.py (1)

102-105: ⚡ Quick win

Add stacklevel to backend-mismatch warning for actionable call sites.

This warning currently points at restore_accel_rng_state() instead of the caller, which makes audit-path debugging harder.

Proposed diff

     if saved.get("backend") != current:
         warnings.warn(
             f"Skipping accelerator RNG restore: checkpoint backend "
             f"{saved.get('backend')!r} != current backend {current!r}."
+            ,
+            stacklevel=2
         )
         return

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/device.py` around lines 102 - 105, The warning in restore_accel_rng_state
currently points to inside that function; update the warnings.warn call in
restore_accel_rng_state to pass an appropriate stacklevel (e.g., stacklevel=2)
so the reported site is the caller, not the warning emitter; locate the
warnings.warn invocation that constructs the message using saved.get('backend')
and current and add the stacklevel parameter to the call.

Source: Linters/SAST tools

src/config.py (1)

4-20: ⚡ Quick win

“Immutable” config is currently mutable at runtime.

TRAIN_CONFIG can be mutated by any importer, which undermines the determinism contract this module describes.

Proposed diff

 import json
 import hashlib
+from types import MappingProxyType
 
 # The Immutable Training Configuration
-TRAIN_CONFIG = {
+TRAIN_CONFIG = MappingProxyType({
     "embed_dim": 16,
     "num_heads": 2,
     "max_seq_len": 32,
     "dropout": 0.1,
     "lr": 0.01,
     "optimizer": "Adam",
     "seed": 99,
     "total_steps": 10,
     "checkpoint_step": 5
-}
+})
 
 def get_config_hash():
     """Returns a deterministic SHA-256 hash of the configuration dict."""
-    encoded = json.dumps(TRAIN_CONFIG, sort_keys=True).encode()
+    encoded = json.dumps(dict(TRAIN_CONFIG), sort_keys=True).encode()
     return hashlib.sha256(encoded).hexdigest()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/config.py` around lines 4 - 20, TRAIN_CONFIG is currently a plain dict
and can be mutated at runtime; make it truly read-only by keeping the mutable
dict as a private variable (e.g. _TRAIN_CONFIG) and expose a read-only view
using types.MappingProxyType assigned to TRAIN_CONFIG, import types at top, and
ensure get_config_hash serializes the underlying mutable dict (e.g.
json.dumps(_TRAIN_CONFIG, sort_keys=True).encode()) rather than the
MappingProxyType so the hash remains deterministic.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@proofs/device_determinism_log.txt`:
- Around line 1-52: The JSON verification log contains inconsistent keys: the
first record uses "cuda": null while later records use "accelerator_version":
null; update the generator/output so all records use the same field name
(preferably "accelerator_version") by replacing the "cuda" key with
"accelerator_version" in the first record and ensuring any code that emits the
log uses the single canonical key ("accelerator_version") instead of "cuda".

In `@requirements.txt`:
- Around line 1-3: requirements.txt currently uses ranged constraints (e.g.,
"torch>=2.10.0,<3.0", "numpy>=2.4,<3.0", "tqdm>=4.67,<5.0"); update these to
exact pinned versions that match Legacy/uv.lock (replace the ranges with
explicit version strings) and update notebooks/colab_gpu_reproducibility.ipynb
to install from the lockfile or the exact pins (instead of pip install with
ranges) — either change the notebook cell to run a locked install (e.g., pip
install -r requirements.txt using the pinned file or pip install from the lock)
or replace the "!pip install numpy>=2.4,<3.0 tqdm>=4.67,<5.0" cell with the
exact pinned versions from Legacy/uv.lock to ensure deterministic installs.

In `@src/dataset.py`:
- Around line 18-23: Validate block_size inside get_batch to ensure it produces
equal-length x and y slices: check that block_size is a positive integer and <=
len(self.encoded) - 1 (since y uses index 1..block_size+1) and raise a clear
ValueError if not. Update the get_batch function (referencing get_batch,
self.encoded, x, y) to perform this guard before slicing so x =
self.encoded[:block_size] and y = self.encoded[1:block_size+1] always produce
matching shapes.

In `@src/eval.py`:
- Line 70: The print call print(f"\n ~> Manifest saved to eval_manifest.json")
uses an unnecessary f-string; replace it with a plain string literal (print("\n
~> Manifest saved to eval_manifest.json")) to remove the redundant f prefix and
satisfy the Ruff warning, locating and updating the print statement in eval.py.
- Around line 67-70: The code currently writes eval_manifest.json to the CWD
which can break downstream stages; update the write to save under the
deterministic proofs directory (e.g., "proofs/eval_manifest.json") by creating
the proofs directory if missing and writing manifest there instead of
open("eval_manifest.json", "w"); locate the write using the manifest variable
and the open(...) call in src/eval.py and replace it to build the path
(Path("proofs") / "eval_manifest.json") and ensure the directory exists before
json.dump.
- Line 36: The torch.load call that assigns checkpoint (checkpoint =
torch.load("mid_checkpoint.pt", weights_only=False, map_location=DEVICE)) is
vulnerable to arbitrary code execution and should be hardened: either switch to
loading a weights-only checkpoint by using weights_only=True and refactor
saving/loading so mid_checkpoint.pt contains only tensors, or keep the existing
checkpoint format but validate file integrity before loading by computing and
comparing a trusted SHA256/HMAC of "mid_checkpoint.pt" and abort if it doesn't
match; implement the chosen change around the checkpoint load (the checkpoint
variable assignment) and ensure any downstream code that expects RNG/metadata is
updated if you drop non-tensor state.

In `@src/global_manifest.py`:
- Around line 36-38: The manifest generation is computing inconsistent
model_checkpoint_hash values because different files/paths or non-deterministic
checkpoint contents are being read; update the logic around the model hash
calculation (the open("mid_checkpoint.pt", "rb") call and the model_hash
variable in src/global_manifest.py) to always read the same canonical checkpoint
file (use a single configured checkpoint path/name), verify the file exists and
is fully written before hashing, and compute the SHA256 over the exact bytes
deterministically; ensure the same model_hash value is then injected into all
manifests (proofs/*) from this single source so every generated manifest
references the identical checkpoint hash.
- Around line 36-38: The code currently opens "mid_checkpoint.pt" to compute
model_hash without checking its existence; modify the block that computes
model_hash to first verify the file exists (e.g., using os.path.exists or
pathlib.Path("mid_checkpoint.pt").exists()), and if missing raise a clear error
or log a descriptive message and exit (consistent with the existing
eval_manifest.json check). Update the model_hash computation area (the with
open("mid_checkpoint.pt", "rb") as f: ... block) to perform this existence check
before opening the file.

In `@src/main.py`:
- Around line 35-41: The top-level creation and printing of TinyDataset
(TinyDataset() and dataset.get_batch()) should not run at import time because
importing set_seed triggers this side effect; move that demo logic into a
guarded entrypoint (e.g., wrap the dataset = TinyDataset(); x, y =
dataset.get_batch(); print(...) block into an if __name__ == "__main__": or into
a function like main_demo() and call it only when executed as a script) so
importing src/reproducibility.py or other modules no longer emits output.

In `@src/model.py`:
- Around line 62-75: The forward method lacks a max-sequence guard and __init__
doesn't store max_seq_len; store the constructor arg as self.max_seq_len in
__init__ and add a clear guard at the start of forward (e.g., check T =
idx.size(1) or B,T = idx.size() and if T > self.max_seq_len raise a ValueError)
so inputs longer than the configured max_seq_len fail with an explicit message
referencing the model's max sequence length; update references in forward to use
self.max_seq_len.
- Around line 44-48: The Block.__init__ currently ignores the dropout parameter
when constructing the attention module; update the CausalSelfAttention
instantiation in Block.__init__ to pass the configured dropout through (e.g.,
call CausalSelfAttention(embed_dim, num_heads, max_seq_len, dropout) or named
arg dropout=dropout) and ensure the CausalSelfAttention constructor accepts and
uses that dropout parameter so attention dropout follows the Block's configured
value.

In `@src/reproducibility.py`:
- Line 56: Several print statements use f-strings without interpolation (e.g.,
the print call containing "FATAL ALERT! : Cryptographic seal broken!" and the
other literal messages at the noted locations); remove the unnecessary leading
'f' from those string literals so they become plain strings (e.g., change
print(f"...") to print("...")) for each occurrence referenced in
reproducibility.py (the literal messages at the indicated lines).
- Around line 29-30: The code computes active_end_step = end_step if end_step is
not None else TRAIN_CONFIG["total_steps"] but the training loop still references
end_step; update the loop to use active_end_step instead of end_step so the
fallback is honored and no None is used. Locate usages of end_step in the
training loop (the block that iterates or checks current step against end_step)
and replace them with active_end_step, keeping TRAIN_CONFIG and active_end_step
logic intact.
- Around line 53-59: The checkpoint verification currently only prints a “FATAL
ALERT” and continues; modify the checkpoint seal check (the block that calls
logger.hash_model(model) and compares loaded_hash to
checkpoint['checkpoint_hash']) to fail fast: after detecting mismatch, raise a
clear exception (e.g., RuntimeError) or call sys.exit(1) so execution stops
immediately, and include the expected and actual hash snippets in the exception
message for diagnostics instead of just printing them.
- Around line 127-137: The replay currently restores model and optimizer from
checkpoint but not the RNG states, so replay can drift; after loading
checkpoint['model'] and checkpoint['optimizer'] restore RNG state(s) from the
checkpoint before any noise injection by applying the saved RNG blobs (e.g.,
checkpoint['torch_rng_state'] via torch.set_rng_state,
checkpoint['cuda_rng_state'] via torch.cuda.set_rng_state_all,
checkpoint['np_rng_state'] via numpy.random.set_state, and
checkpoint['py_rng_state'] via random.setstate') so that subsequent operations
(including secret_noise_auditor gradient-noise injection) start from the exact
checkpoint RNG state.
- Line 247: The function currently returns only the telemetry match boolean (the
variable match) which can report success even on a hash mismatch; change the
return to the full verification verdict object instead of match so callers
receive the complete result (e.g., replace the final return of match with the
verification result variable used earlier in this function — commonly named
verdict or verification_result — ensuring you return that object containing
status, hashes, and telemetry rather than the match boolean).
- Line 49: The checkpoint deserialization uses
torch.load(checkpoint_path_to_load, weights_only=False) which can execute
arbitrary code; change loading to torch.load(checkpoint_path_to_load,
weights_only=True) to restrict deserialization to tensors (update every
occurrence), and move non-tensor state (e.g., numpy_rng, python_rng) into a
separate safe sidecar (e.g., JSON/pickle-safe or explicit state file) that you
load with a safe parser after verifying the checkpoint_hash; update the code
paths that expect these RNG objects (locations referencing checkpoint,
numpy_rng, python_rng or functions that restore RNG state) to read the sidecar
and restore RNG state explicitly instead of relying on non-tensor objects inside
torch.load.

---

Nitpick comments:
In `@src/config.py`:
- Around line 4-20: TRAIN_CONFIG is currently a plain dict and can be mutated at
runtime; make it truly read-only by keeping the mutable dict as a private
variable (e.g. _TRAIN_CONFIG) and expose a read-only view using
types.MappingProxyType assigned to TRAIN_CONFIG, import types at top, and ensure
get_config_hash serializes the underlying mutable dict (e.g.
json.dumps(_TRAIN_CONFIG, sort_keys=True).encode()) rather than the
MappingProxyType so the hash remains deterministic.

In `@src/device.py`:
- Around line 102-105: The warning in restore_accel_rng_state currently points
to inside that function; update the warnings.warn call in
restore_accel_rng_state to pass an appropriate stacklevel (e.g., stacklevel=2)
so the reported site is the caller, not the warning emitter; locate the
warnings.warn invocation that constructs the message using saved.get('backend')
and current and add the stacklevel parameter to the call.

In `@src/global_manifest.py`:
- Line 24: Replace the fragile sys.version.split(' ')[0] usage when building the
"python" entry in the manifest: locate where sys.version is used (the expression
sys.version.split(' ')[0]) and swap it to use platform.python_version() so the
manifest stores a reliable version string; import platform at the top if not
present and update the "python" key generation in the same function or module
where the manifest dict is constructed.
- Around line 32-34: The SHA-256 currently uses
dataset.encoded.numpy().tobytes() which depends on native endianness; make the
hash endian-stable by converting the array to a fixed byte-order or canonical
representation before hashing: for example, take TinyDataset().encoded.numpy(),
cast it to a defined endianness (e.g., little-endian int64) or serialize the
integer list (e.g., .tolist() then JSON canonicalize) and then compute
hashlib.sha256(...) on that canonical bytes; update the dataset_hash computation
(the variable dataset_hash and uses of TinyDataset and dataset.encoded)
accordingly and document the chosen approach.

🪄 Autofix (Beta)

✅ Autofix completed

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0d8042b3-e8ac-4ed3-954e-26e8bbad2dfb

📥 Commits

Reviewing files that changed from the base of the PR and between 274e0d8 and 3047da1.

📒 Files selected for processing (18)

.gitignore
notebooks/colab_gpu_reproducibility.ipynb
proofs/cpu_determinism_log.txt
proofs/device_determinism_log.txt
proofs/env_fingerprint.json
proofs/eval_manifest.json
proofs/pipeline_manifest.json
requirements.txt
src/config.py
src/dataset.py
src/device.py
src/eval.py
src/global_manifest.py
src/gpu_reproducibility_test.py
src/main.py
src/model.py
src/reproducibility.py
src/telemetry.py

coderabbitai · 2026-06-09T08:46:39Z

+        dropout=TRAIN_CONFIG["dropout"]
+        ).to(DEVICE)
+
+    checkpoint = torch.load("mid_checkpoint.pt", weights_only=False, map_location=DEVICE)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail # Locate and inspect src/eval.py around the referenced line ls -la fd -n --hidden --max-depth 6 "eval.py" || true sed -n '1,120p' src/eval.py || true nl -ba src/eval.py | sed -n '20,70p' || true # Find any other torch.load usage and checkpoint-related code rg -n "torch\.load\(" -S src || true rg -n "mid_checkpoint\.pt" -S . || true rg -n "weights_only" -S . || true # Check dependencies to see if weights_only is supported in this repo's PyTorch constraints ls -ლა fd -t f -a -n "requirements*.txt" . || true fd -t f -a -n "pyproject.toml" . || true fd -t f -a -n "environment*.yml" . || true fd -t f -a -n "requirements*.in" . || true for f in requirements*.txt pyproject.toml environment*.yml environment*.yaml Pipfile Pipfile.lock setup.py setup.cfg; do if [ -f "$f" ]; then echo "---- $f ----"; sed -n '1,200p' "$f"; fi done # If pip/conda pins torch, extract versions; otherwise just report nothing found rg -n "torch(==|>=|<=|~=|=)" -S requirements*.txt pyproject.toml environment*.yml environment*.yaml Pipfile setup.py setup.cfg 2>/dev/null || true

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 5741

🏁 Script executed:

#!/bin/bash set -euo pipefail # Show eval.py around the torch.load line with line numbers (use awk instead of nl) sed -n '1,120p' src/eval.py | cat # Show all torch.save usage (to infer checkpoint contents/schema) rg -n "torch\.save\(" -S src || true # Inspect reproducibility.py around each torch.load call site for start in 1 70 120 150 185 230; do end=$((start+80)) echo "---- src/reproducibility.py lines ${start}-${end} ----" sed -n "${start},${end}p" src/reproducibility.py || true done # Inspect global_manifest.py where mid_checkpoint.pt is opened as binary sed -n '1,140p' src/global_manifest.py || true # Check requirements.txt for torch version / constraints sed -n '1,200p' requirements.txt || true # Also locate any docs about checkpoint format (optional) rg -n "checkpoint" -S README.md src | head -n 80 || true

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 31794

🌐 Web query:

PyTorch torch.load weights_only=True behavior non-tensor objects numpy_rng python_rng safe unpickler allowed types

💡 Result:

When weights_only=True is enabled in torch.load, PyTorch restricts the unpickler to a safe subset of types to prevent arbitrary code execution [1][2]. This mode is the default behavior in recent versions of PyTorch (starting in version 2.6) when the pickle_module argument is not provided [1][2]. Allowed Types and Behavior The restricted unpickler is limited to: - Plain torch.Tensors [1][2]. - Primitive Python types (e.g., numbers, strings, lists, dictionaries) [3][2]. - Any additional types explicitly registered via torch.serialization.add_safe_globals [3][4]. The weights_only unpickler does not allow dynamic imports during unpickling [1][2]. If an unsupported object (such as a custom class or a complex non-tensor object) is encountered, the unpickler will raise an error [2]. You can resolve this by allowlisting the required globals using torch.serialization.add_safe_globals or the torch.serialization.safe_globals context manager if the source is trusted [2]. Non-Tensor Objects (RNGs) - Python RNGs (random.Random) and NumPy RNGs are generally not included in the default safe allowlist because they are not necessary for loading state_dicts of standard PyTorch models [1][2]. - Attempting to load files containing these objects with weights_only=True will likely result in an "Unsupported global" error. - If you require these objects, you must manually register them as safe using the aforementioned functions [2]. Security Considerations The weights_only=True mode is designed to guard against denial of service and arbitrary code execution attacks [1]. While it significantly increases security when loading untrusted files, PyTorch warns that memory corruption issues are still theoretically possible, and loading data from untrusted sources should always be done with caution [1][5]. If you must load complex files, such as those containing entire nn.Module objects, setting weights_only=False is often necessary, but this should only be performed with files from trusted sources due to the associated security risks [1][2]. You can control this behavior using environment variables: - TORCH_FORCE_WEIGHTS_ONLY_LOAD=1: Forces all torch.load calls to use weights_only=True [1][2]. - TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1: Disables the default weights_only=True behavior [1][2].

Citations:

1: http://docs.pytorch.org/docs/2.11/notes/serialization.html

2: https://docs.pytorch.org/docs/main/notes/serialization.html

3: https://docs.pytorch.org/docs/main/generated/torch.load.html

4: https://docs.pytorch.org/docs/stable/generated/torch.load.md

5: https://github.com/pytorch/pytorch/blob/main/torch/serialization.py

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '1,120p' src/main.py rg -n "mid_checkpoint\.pt|model\.pt" -S src/main.py src/reproducibility.py src/eval.py

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 3176

🏁 Script executed:

#!/bin/bash set -euo pipefail sed -n '1,220p' src/device.py rg -n "def get_device|def accel_rng_state|def restore_accel_rng_state|accel_rng_state\(" -S src/device.py

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 4443

Harden checkpoint loading against pickle execution in src/eval.py (line 36).
torch.load(..., weights_only=False) can execute arbitrary code if mid_checkpoint.pt is tampered. This repo’s mid_checkpoint.pt contains non-tensor Python/NumPy RNG state and other metadata (saved via torch.save), so switching to weights_only=True will likely fail unless you refactor the checkpoint format (or load a separate weights-only checkpoint).

Suggested hardening

- checkpoint = torch.load("mid_checkpoint.pt", weights_only=False, map_location=DEVICE) + checkpoint = torch.load("mid_checkpoint.pt", weights_only=True, map_location=DEVICE)

Also, avoid relying on post-load hash checks for security: if you use integrity (e.g., SHA256 from a trusted manifest), compute/compare it before calling torch.load.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/eval.py` at line 36, The torch.load call that assigns checkpoint (checkpoint = torch.load("mid_checkpoint.pt", weights_only=False, map_location=DEVICE)) is vulnerable to arbitrary code execution and should be hardened: either switch to loading a weights-only checkpoint by using weights_only=True and refactor saving/loading so mid_checkpoint.pt contains only tensors, or keep the existing checkpoint format but validate file integrity before loading by computing and comparing a trusted SHA256/HMAC of "mid_checkpoint.pt" and abort if it doesn't match; implement the chosen change around the checkpoint load (the checkpoint variable assignment) and ensure any downstream code that expects RNG/metadata is updated if you drop non-tensor state.

will check this out later

@ryoari, sounds good! Would you like me to open a GitHub issue to track this hardening task so it doesn't get lost? I can create one with the details of the concern (switching to weights_only=True or adding pre-load integrity verification for mid_checkpoint.pt) and assign it to you.

coderabbitai · 2026-06-09T08:46:40Z

+    logger = TelemetryLogger(filepath=log_file)
+
+    if checkpoint_path_to_load:
+        checkpoint = torch.load(checkpoint_path_to_load, weights_only=False)


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail ls -R

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 1375

🏁 Script executed:

#!/bin/bash set -euo pipefail ls -la fd -t f -a src/reproducibility.py || true stat src/reproducibility.py || true sed -n '1,220p' src/reproducibility.py | nl -ba | sed -n '1,220p'

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 1131

🏁 Script executed:

#!/bin/bash set -euo pipefail nl -ba src/reproducibility.py | sed -n '1,240p'

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 115

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "torch\.load\(" src/reproducibility.py rg -n "weights_only" src/reproducibility.py rg -n "checkpoint_path_to_load|checkpoint" src/reproducibility.py

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 4206

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "(hash|sha256|sha1|digest|seal|signature|verify|hmac|checksum|gpg|pgp)" src/reproducibility.py

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 2320

🏁 Script executed:

#!/bin/bash set -euo pipefail for r in "35,75" "90,120" "120,160" "155,190" "185,215"; do echo "---- lines $r ----" nl -ba src/reproducibility.py | sed -n "${r}p" done

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 155

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "argparse|sys\.argv|click|typer|fire|main\(" src/reproducibility.py rg -n "checkpoint_path_to_load" -S src/reproducibility.py

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 54

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "torch\.load\([^)]*weights_only=False" -S . rg -n "weights_only=False" -S .

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 1280

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "torch\.load\([^)]*weights_only=False" -S . | head -n 50 rg -n "weights_only=False" -S . | head -n 50

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 1280

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "weights_only=True|safe_open|zipfile|_use_new_zipfile_serialization" -S . | head -n 50

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 206

🏁 Script executed:

# noop true

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 54

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== File head ===" nl -ba src/reproducibility.py | sed -n '1,240p' echo echo "=== torch.load occurrences in file ===" rg -n "torch\.load\(" src/reproducibility.py || true echo echo "=== weights_only usage ===" rg -n "weights_only" src/reproducibility.py || true echo echo "=== hash/seal/verify keywords ===" rg -n "(hash|sha256|sha1|digest|seal|signature|verify|hmac|checksum)" src/reproducibility.py || true echo echo "=== Context around specific lines ===" for r in "35,75" "95,115" "125,145" "155,175" "185,205"; do echo "---- lines $r ----" nl -ba src/reproducibility.py | sed -n "${r}p" done echo echo "=== CLI / argv usage ===" rg -n "(argparse|sys\.argv|click|typer|fire)" src/reproducibility.py || true echo echo "=== Repo-wide weights_only=False ===" rg -n "torch\.load\([^)]*weights_only=False" -S . | head -n 50 || true

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 151

🏁 Script executed:

# end

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 54

🏁 Script executed:

#!/bin/bash set -euo pipefail rg -n "DEVICE" src/reproducibility.py || true rg -n "checkpoint_hash" src/reproducibility.py || true # Print small relevant windows without relying on nl/stat sed -n '1,120p' src/reproducibility.py sed -n '120,240p' src/reproducibility.py

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 11308

Harden checkpoint deserialization: remove torch.load(..., weights_only=False)

In src/reproducibility.py (lines 49, 105, 134, 167, 197), checkpoints are loaded with torch.load(..., weights_only=False). Any tampered/untrusted checkpoint can execute arbitrary code during deserialization before the later checkpoint_hash “verifying cryptographic seal” check runs.

Suggested hardening

- checkpoint = torch.load(checkpoint_path_to_load, weights_only=False) +checkpoint = torch.load( + checkpoint_path_to_load, + map_location=DEVICE, + weights_only=True, +)

mid_checkpoint.pt currently saves non-tensor Python objects like numpy_rng and python_rng; weights_only=True will likely require restructuring (e.g., save RNG state in a safe sidecar/format and load it separately, while keeping the checkpoint payload weights-only).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/reproducibility.py` at line 49, The checkpoint deserialization uses torch.load(checkpoint_path_to_load, weights_only=False) which can execute arbitrary code; change loading to torch.load(checkpoint_path_to_load, weights_only=True) to restrict deserialization to tensors (update every occurrence), and move non-tensor state (e.g., numpy_rng, python_rng) into a separate safe sidecar (e.g., JSON/pickle-safe or explicit state file) that you load with a safe parser after verifying the checkpoint_hash; update the code paths that expect these RNG objects (locations referencing checkpoint, numpy_rng, python_rng or functions that restore RNG state) to read the sidecar and restore RNG state explicitly instead of relying on non-tensor objects inside torch.load.

will check this later

@ryoari, no worries! Whenever you're ready to address it, the key points are:

Switch all five torch.load(..., weights_only=False) calls (lines 49, 105, 134, 167, 197) to weights_only=True.

Move non-tensor state (numpy_rng, python_rng) out of the checkpoint into a safe sidecar format (e.g., JSON or a separate .pkl guarded by hash verification), and restore RNG state from there explicitly — after verifying the checkpoint_hash.

Would you like me to open a GitHub issue to track this follow-up so it doesn't get lost?

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai · 2026-06-09T09:25:57Z

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully

Fixed 6 file(s) based on 10 unresolved review comments.

Files modified:

notebooks/colab_gpu_reproducibility.ipynb
requirements.txt
src/eval.py
src/global_manifest.py
src/model.py
src/reproducibility.py

Commit: 2c389fdb105e450a5e12db5d90643d78a7f7275b

The changes have been pushed to the feature/new-baseline branch.

Time taken: 6m 32s

Fixed 6 file(s) based on 10 unresolved review comments. Co-authored-by: CodeRabbit <noreply@coderabbit.ai>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

src/eval.py (1)

37-45: ⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

Pre-load integrity is still bypassable; weights_only=False remains exploitable.

At Line 45, deserialization happens before any trusted integrity decision. The checkpoint_hash checked at Lines 52-55 is attacker-controlled (it is stored inside the same checkpoint), so it cannot authenticate the file. file_hash at Line 40 is currently unused for enforcement.

Suggested direction (fail closed before torch.load)

-    with open(checkpoint_path, "rb") as f:
-        file_hash = hashlib.sha256(f.read()).hexdigest()
+    expected_file_hash = os.getenv("MID_CHECKPOINT_SHA256")
+    if not expected_file_hash:
+        raise RuntimeError("Missing MID_CHECKPOINT_SHA256 for pre-load checkpoint verification.")
+    with open(checkpoint_path, "rb") as f:
+        file_hash = hashlib.sha256(f.read()).hexdigest()
+    if file_hash != expected_file_hash:
+        raise RuntimeError("Checkpoint file hash mismatch; refusing to deserialize.")

     # Load checkpoint (contains model, optimizer, and RNG states)
     # weights_only=False required for non-tensor state (RNG, metadata)
-    # File hash computed above provides tamper detection
+    # Safe only after trusted pre-load hash verification
     checkpoint = torch.load(checkpoint_path, weights_only=False, map_location=DEVICE)

#!/bin/bash
set -euo pipefail

# Verify current trust chain: pre-load hash computed, but not enforced against trusted external value.
rg -n "file_hash|weights_only=False|checkpoint_hash|torch\\.load\\(" src/eval.py src/reproducibility.py
sed -n '34,62p' src/eval.py
sed -n '70,120p' src/reproducibility.py

Also applies to: 52-55

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/eval.py` around lines 37 - 45, Compute and enforce the pre-load file
integrity check using the locally computed file_hash before calling torch.load:
retrieve the expected trusted hash from a trusted external source (e.g., a
reproducibility function or environment/secure store) instead of using the
checkpoint's internal checkpoint_hash, compare expected_hash to file_hash and
abort/raise if they differ, and only then call torch.load(checkpoint_path,
weights_only=False, map_location=DEVICE); do not rely on
checkpoint['checkpoint_hash'] for authentication and ensure this verification
happens prior to torch.load so deserialization cannot be bypassed.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@requirements.txt`:
- Line 1: requirements.txt currently pins torch==2.10.0 which is vulnerable to
PYSEC-2026-139; to mitigate, audit the codebase for any usages that load
serialized PyTorch artifacts (e.g., calls to torch.load, torch.jit.load,
torch.export.load or custom deserialization paths) and ensure they never accept
untrusted inputs and use secure options such as weights_only=True for
torch.export.load where applicable; add runtime guards/input validation around
any model-loading entrypoints and document the temporary mitigation in the repo
(and add a TODO to update the pin), then when an official patched PyTorch
release is available, update the torch pin in requirements.txt and re-lock
deterministic dependency hashes.

---

Duplicate comments:
In `@src/eval.py`:
- Around line 37-45: Compute and enforce the pre-load file integrity check using
the locally computed file_hash before calling torch.load: retrieve the expected
trusted hash from a trusted external source (e.g., a reproducibility function or
environment/secure store) instead of using the checkpoint's internal
checkpoint_hash, compare expected_hash to file_hash and abort/raise if they
differ, and only then call torch.load(checkpoint_path, weights_only=False,
map_location=DEVICE); do not rely on checkpoint['checkpoint_hash'] for
authentication and ensure this verification happens prior to torch.load so
deserialization cannot be bypassed.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0367fb2e-e1c5-4723-9bc4-597fde396da4

📥 Commits

Reviewing files that changed from the base of the PR and between 86fbc43 and 2c389fd.

📒 Files selected for processing (6)

notebooks/colab_gpu_reproducibility.ipynb
requirements.txt
src/eval.py
src/global_manifest.py
src/model.py
src/reproducibility.py

🚧 Files skipped from review as they are similar to previous changes (4)

notebooks/colab_gpu_reproducibility.ipynb
src/model.py
src/global_manifest.py
src/reproducibility.py

coderabbitai · 2026-06-09T09:41:20Z

CVE-2026-4538

@@ -0,0 +1,8 @@
+torch==2.10.0


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail python - <<'PY' import json, urllib.request payload = { "package": {"name": "torch", "ecosystem": "PyPI"}, "version": "2.10.0" } req = urllib.request.Request( "https://api.osv.dev/v1/query", data=json.dumps(payload).encode(), headers={"Content-Type": "application/json"}, ) with urllib.request.urlopen(req) as resp: print(json.dumps(json.load(resp), indent=2)) PY

Repository: AOSSIE-Org/OpenVerifiableLLM

Length of output: 2655

🌐 Web query:

PYSEC-2026-139 torch 2.10.0

💡 Result:

PYSEC-2026-139 corresponds to a deserialization vulnerability in PyTorch version 2.10.0 [1]. This vulnerability is identified by CVE-2026-4538 [2][3]. The vulnerability resides within the pt2 (PyTorch 2.0 export/loading) Loading Handler component [1][4]. It involves an unspecified function that incorrectly processes untrusted serialized data, leading to a deserialization flaw [4]. An attacker with local access to a system can exploit this by crafting a malicious file to achieve arbitrary code execution or manipulate internal application states [4][3]. The issue was addressed in the community via pull request #176791 [2][5], which introduced a weights_only parameter to the torch.export.load API to allow users to enforce secure deserialization, preventing insecure fallbacks to pickle when loading untrusted artifacts [5]. As of the reports, this vulnerability affects PyTorch 2.10.0 [1][3]. Users are encouraged to exercise caution when loading untrusted files and to utilize the weights_only=True flag where supported by the API [5].

Citations:

1: https://osv.dev/vulnerability/PYSEC-2026-139

2: https://nvd.nist.gov/vuln/detail/CVE-2026-4538

3: https://dbugs.ptsecurity.com/vulnerability/PT-2026-26969

4: https://www.resolvedsecurity.com/vulnerability-catalog/CVE-2026-4538

5: [Security] Fix potential Arbitrary Code Execution in torch.export.load by forwarding weights_only pytorch/pytorch#176791

🌐 Web query:

site:osv.dev PYSEC-2026-139

💡 Result:

PYSEC-2026-139 is a security vulnerability identified in PyTorch version 2.10.0 [1]. It affects the pt2 Loading Handler component [1]. The vulnerability involves a deserialization issue that can be exploited by an attacker with access to the local environment [1]. The vulnerability has a CVSS v3.1 score of 7.8, which is classified as High [1]. According to the advisory, the exploit is publicly available, and the PyTorch project had been informed of the issue but had not addressed it as of the time of publication [1]. Affected versions listed include a wide range of releases, with 2.10.0 being the specific version mentioned as affected [1].

Citations:

1: https://osv.dev/vulnerability/PYSEC-2026-139

🌐 Web query:

osv.dev PYSEC-2026-139 affected versions fixed versions

💡 Result:

The vulnerability identified as PYSEC-2026-139 affects the PyTorch (torch) package on PyPI [1]. Affected Versions: The vulnerability affects PyTorch version 2.10.0 [1][2]. According to OSV and related vulnerability reports, it impacts a wide range of versions leading up to and including 2.10.0 [1]. Fixed Versions: As of June 9, 2026, there is no fixed version available for this vulnerability [2][3]. The project has been notified, but has not yet released a patch to address the issue [1][2]. Details: The vulnerability is a deserialization flaw located in the pt2 Loading Handler component of PyTorch [1]. It can be exploited by an attacker with local access to the system [1][3]. A public exploit exists for this vulnerability [1][3].

Citations:

1: https://osv.dev/vulnerability/PYSEC-2026-139

2: https://dbugs.ptsecurity.com/vulnerability/PT-2026-26969

3: https://www.resolvedsecurity.com/vulnerability-catalog/CVE-2026-4538

🌐 Web query:

CVE-2026-4538 PyTorch fixed in which version torch

💡 Result:

As of June 9, 2026, there is no fixed version for CVE-2026-4538 [1][2]. This vulnerability involves a deserialization flaw in the pt2 Loading Handler component of PyTorch version 2.10.0 [3][4]. Security advisories and vulnerability databases report that the project had not yet released a fix for this specific issue as of the most recent updates [3][5][2]. Please note that CVE-2026-4538 is distinct from another PyTorch vulnerability, CVE-2026-24747, which affected earlier versions of PyTorch and was fixed in version 2.10.0 [6][7][8]. Users should be aware that while version 2.10.0 resolved the previous security concerns [6][8], it introduced or contained the deserialization vulnerability described in CVE-2026-4538 [3][4][9]. Organizations using affected software should monitor the official PyTorch security advisories or their package repository for forthcoming patches [3][10].

Citations:

1: https://security.snyk.io/vuln/SNYK-PYTHON-TORCH-15746467

2: https://www.resolvedsecurity.com/vulnerability-catalog/CVE-2026-4538

3: https://nvd.nist.gov/vuln/detail/CVE-2026-4538

4: https://app.opencve.io/cve/CVE-2026-4538

5: https://ubuntu.com/security/CVE-2026-4538

6: https://raxe.ai/labs/advisories/RAXE-2026-019

7: http://github.com/pytorch/pytorch/security/advisories/GHSA-63cw-57p8-fm3p

8: https://osv.dev/vulnerability/GHSA-63cw-57p8-fm3p

9: https://osv.dev/vulnerability/PYSEC-2026-139

10: [Security] Fix potential Arbitrary Code Execution in torch.export.load by forwarding weights_only pytorch/pytorch#176791

Resolve PYSEC-2026-139 exposure for torch==2.10.0 (no patched pin available yet).

PYSEC-2026-139 (CVE-2026-4538) affects PyTorch 2.10.0 and is a high-severity deserialization flaw in the pt2 Loading Handler.

No fixed/patched PyTorch version is reported as available yet, so you can’t “move to the first patched pin” now.

Mitigate by ensuring this repo never loads untrusted serialized artifacts, and by using the secure/safe loading options where applicable (e.g., weights_only=True for torch.export.load where the code uses it), then update the pin immediately once a fixed release is published while keeping deterministic locking.

🧰 Tools

🪛 OSV Scanner (2.3.8)

[HIGH] 1-1: torch 2.10.0: undefined

(PYSEC-2026-139)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@requirements.txt` at line 1, requirements.txt currently pins torch==2.10.0 which is vulnerable to PYSEC-2026-139; to mitigate, audit the codebase for any usages that load serialized PyTorch artifacts (e.g., calls to torch.load, torch.jit.load, torch.export.load or custom deserialization paths) and ensure they never accept untrusted inputs and use secure options such as weights_only=True for torch.export.load where applicable; add runtime guards/input validation around any model-loading entrypoints and document the temporary mitigation in the repo (and add a TODO to update the pin), then when an official patched PyTorch release is available, update the torch pin in requirements.txt and re-lock deterministic dependency hashes.

Source: Linters/SAST tools

refactor: new baseline

3047da1

coderabbitai Bot reviewed Jun 9, 2026

View reviewed changes

ryoari and others added 3 commits June 9, 2026 14:30

Update src/dataset.py

61b3db9

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Update src/eval.py

e14f122

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

📝 CodeRabbit Chat: Implement requested code changes

f6679cf

coderabbitai Bot mentioned this pull request Jun 9, 2026

📝 CodeRabbit Chat: Implement requested code changes #90

Merged

ryoari changed the title ~~refactor: new baseline~~ refactor: changing the repo to be based on prior work established Jun 9, 2026

Merge pull request #90 from AOSSIE-Org/coderabbitai/chat/e14f122

86fbc43

fix: apply CodeRabbit auto-fixes

2c389fd

Fixed 6 file(s) based on 10 unresolved review comments. Co-authored-by: CodeRabbit <noreply@coderabbit.ai>

coderabbitai Bot reviewed Jun 9, 2026

View reviewed changes

Uh oh!

Conversation

ryoari commented Jun 9, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Refactors into a new baseline for successive work.

this PR builds on #73 #37 #86

src: contains all the code logic to run cpu and gpu determinism on device

proof: contains hash manifest to verify said cpu and gpu determinism, as well as proof of testing runs

requirements.txt : hardens to a fixed environment standard.

src/

proofs/

AI Usage Disclosure:

Checklist

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ryoari Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ryoari Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Fixes Applied Successfully

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 9, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ryoari commented Jun 9, 2026 •

edited by coderabbitai Bot

Loading

`src/`

`proofs/`

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading

coderabbitai Bot left a comment •

edited

Loading

coderabbitai Bot Jun 9, 2026 •

edited

Loading

coderabbitai Bot Jun 9, 2026 •

edited

Loading

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading