Fix: wire temperature through to generation in LocalLLM #7

Open
howardjaw wants to merge 1 commit into pguso:main from howardjaw:fix/temperature-plumbing

@howardjaw

Closes #6.

What

Fixes a bug where the temperature parameter on LocalLLM is silently ignored at generation time. Also addresses a related reproducibility issue where identical responses repeat across runs because no seed is passed.

Why

llama-cpp-python applies temperature at sampling time (per-call), not at model load. The current code passes temperature to the Llama(...) constructor, where it has no effect on subsequent __call__ invocations. As a result, readers following Lesson 01 Exercise 2 ("Change the temperature in shared/llm.py") see no change when they edit the default, which contradicts the lesson text.
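To illustrate why temperature only matters at sampling time, here is a minimal sketch of temperature-scaled softmax in plain Python. It is a generic illustration of the technique, not llama.cpp's actual sampler; the function name and the example logits are made up for this sketch, and treating a temperature of zero as greedy selection is an assumption.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw logits into sampling probabilities at a given temperature."""
    if temperature <= 0:
        # Assumption for this sketch: treat zero temperature as greedy decoding,
        # putting all probability mass on the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)  # sharply peaked on the top token
hot = softmax_with_temperature(logits, 1.5)   # flatter, more varied sampling
```

Because the scaling happens each time a token is drawn, a temperature handed over only at model load has nothing to act on unless it is forwarded with every generation call.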

Changes

Four edits in shared/llm.py:

1. Store self.temperature = temperature in LocalLLM.__init__.
2. In generate(), default kwargs["temperature"] to self.temperature when no per-call override is supplied.
3. Remove the now-unused temperature=temperature kwarg from the Llama(...) call.
4. Add seed=-1 to the Llama(...) call so each model load uses a fresh random seed.
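The four edits can be sketched roughly as follows. This is not the actual contents of shared/llm.py, which this description does not reproduce: FakeLlama is a hypothetical stand-in for llama_cpp.Llama so the sketch runs without a model file, and the backend parameter, the 0.7 default, and the response shape are assumptions made for illustration.

```python
class FakeLlama:
    """Hypothetical stand-in for llama_cpp.Llama; records the kwargs it receives."""

    def __init__(self, model_path, seed=-1, **kwargs):
        self.init_kwargs = {"model_path": model_path, "seed": seed, **kwargs}
        self.last_call_kwargs = None

    def __call__(self, prompt, **kwargs):
        self.last_call_kwargs = kwargs
        # Assumed response shape, modelled on a completion-style result.
        return {"choices": [{"text": f"echo: {prompt}"}]}


class LocalLLM:
    def __init__(self, model_path, temperature=0.7, backend=FakeLlama):
        # Edit 1: remember the configured temperature on the instance.
        self.temperature = temperature
        # Edits 3 and 4: no temperature at load time; seed=-1 requests a random seed.
        self.llm = backend(model_path=model_path, seed=-1)

    def generate(self, prompt, **kwargs):
        # Edit 2: fall back to the instance temperature unless the caller overrides it.
        kwargs.setdefault("temperature", self.temperature)
        result = self.llm(prompt, **kwargs)
        return result["choices"][0]["text"]


llm = LocalLLM("model.gguf", temperature=1.5)
llm.generate("hello")                   # sampled with temperature=1.5
llm.generate("hello", temperature=0.0)  # per-call override wins
```

Using setdefault keeps an explicit per-call temperature intact while still honouring the value set in __init__.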

Testing

Ran lesson_01_basic_chat() twice with temperature=0.0 (near-identical output across runs, as expected) and twice with temperature=1.5 (clearly different output each run). Previously both temperatures produced byte-identical output across runs regardless of the value set in __init__.
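The manual check above could be automated along these lines. The helper name distinct_outputs is hypothetical, and the two callables below are deterministic stand-ins for a real generate() call so that the sketch runs without a model.

```python
def distinct_outputs(generate, prompt, runs=2):
    """Call generate(prompt) `runs` times and count the distinct results."""
    return len({generate(prompt) for _ in range(runs)})


# Stand-in for a generator that always returns the same text.
def always_same(prompt):
    return "fixed reply"


# Stand-in for a generator whose output varies between runs.
replies = iter(["first reply", "second reply"])


def varies(prompt):
    return next(replies)


same_count = distinct_outputs(always_same, "Hello")
varied_count = distinct_outputs(varies, "Hello")
```

With a real model, the same helper could wrap LocalLLM.generate at each temperature; an exact-match comparison is strict, so near-identical outputs at temperature=0.0 may need a looser check.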

The temperature param in __init__ was passed to Llama() at load time, but llama-cpp-python applies temperature at sampling time. The value was silently ignored and the library default took over for every call. This stores self.temperature in __init__ and uses it as the default in generate(). Also adds seed=-1 so each load gets fresh randomness, and removes the now-unused temperature kwarg from the Llama(...) call.

Development

Successfully merging this pull request may close these issues.

temperature parameter in shared/llm.py is silently ignored at generation time

1 participant