Fix: wire temperature through to generation in LocalLLM #7
Open
howardjaw wants to merge 1 commit into
The temperature param in __init__ was passed to Llama() at load time, but llama-cpp-python applies temperature at sampling time. The value was silently ignored and the library default took over for every call. This stores self.temperature in __init__ and uses it as the default in generate(). Also adds seed=-1 so each load gets fresh randomness, and removes the now-unused temperature kwarg from the Llama(...) call.
Closes #6.
What
Fixes a bug where the `temperature` parameter on `LocalLLM` is silently ignored at generation time. Also addresses a related issue where identical responses repeat across runs because no seed is passed to `Llama(...)`.
Why
`llama-cpp-python` applies `temperature` at sampling time (per call), not at model load. The current code passes `temperature` to the `Llama(...)` constructor, where it has no effect on subsequent `__call__` invocations. As a result, readers following Lesson 01 Exercise 2 ("Change the `temperature` in `shared/llm.py`") see no change when they edit the default, which contradicts the lesson text.
Changes
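In miniature, the value has to travel with each sampling call rather than with the load. `StubLlama` below is a hypothetical stand-in, not llama-cpp-python itself. It imitates only two behaviours: a constructor that accepts extra keyword arguments without complaint (we assume the real one does too, which is consistent with the value being silently ignored) and a `__call__` that takes `temperature` with its own default. The 0.8 default is our reading of llama-cpp-python's completion API and may differ between versions.

```python
from typing import Any, Optional


class StubLlama:
    """Hypothetical stand-in for llama_cpp.Llama (not the real class)."""

    def __init__(self, model_path: str, **kwargs: Any) -> None:
        # Load-time kwargs such as temperature= are accepted but never used.
        self.model_path = model_path
        self.last_temperature: Optional[float] = None

    def __call__(self, prompt: str, temperature: float = 0.8, **kwargs: Any) -> dict:
        # Sampling happens here, so only this temperature affects the output.
        self.last_temperature = temperature
        return {"choices": [{"text": f"sampled at T={temperature}"}]}


# Before: temperature is given at load time and never reaches sampling.
before = StubLlama("model.gguf", temperature=0.0)
before("Hello")
print(before.last_temperature)  # 0.8, the per-call default

# After: the stored value is passed on every call.
after = StubLlama("model.gguf")
after("Hello", temperature=0.0)
print(after.last_temperature)  # 0.0
```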
Four edits in `shared/llm.py`:
1. Store `self.temperature = temperature` in `LocalLLM.__init__`.
2. In `generate()`, default `kwargs["temperature"]` to `self.temperature` when no per-call override is supplied.
3. Remove the now-unused `temperature=temperature` kwarg from the `Llama(...)` call.
4. Add `seed=-1` to the `Llama(...)` call so each model load uses a fresh random seed.
Testing
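Independently of a model file, the four edits can be checked with a recording stub. This is a minimal sketch, assuming `LocalLLM.generate(prompt, **kwargs)` forwards `kwargs` to the `Llama` instance and returns the completion text. The `llama_factory` hook, the `RecordingLlama` test double, and the 0.7 default are hypothetical, added only so the class can run without llama-cpp-python or a GGUF file.

```python
from typing import Any, Callable, Optional


class LocalLLM:
    """Sketch of shared/llm.py after the four edits (details assumed)."""

    def __init__(
        self,
        model_path: str,
        temperature: float = 0.7,  # hypothetical default
        llama_factory: Optional[Callable[..., Any]] = None,  # hypothetical hook
    ) -> None:
        if llama_factory is None:
            from llama_cpp import Llama  # needs llama-cpp-python installed

            llama_factory = Llama
        self.temperature = temperature  # edit (1): keep it for generate()
        # Edits (3) and (4): no temperature= here, and seed=-1 for a fresh seed.
        self.llm = llama_factory(model_path=model_path, seed=-1)

    def generate(self, prompt: str, **kwargs: Any) -> str:
        # Edit (2): use the stored default unless the caller overrides it.
        kwargs.setdefault("temperature", self.temperature)
        result = self.llm(prompt, **kwargs)
        return result["choices"][0]["text"]


class RecordingLlama:
    """Hypothetical test double that records what LocalLLM passes it."""

    def __init__(self, **kwargs: Any) -> None:
        self.init_kwargs = kwargs
        self.call_kwargs: dict = {}

    def __call__(self, prompt: str, **kwargs: Any) -> dict:
        self.call_kwargs = kwargs
        return {"choices": [{"text": "ok"}]}


llm = LocalLLM("model.gguf", temperature=1.5, llama_factory=RecordingLlama)
llm.generate("Hi")
print(llm.llm.call_kwargs)  # {'temperature': 1.5}
llm.generate("Hi", temperature=0.0)
print(llm.llm.call_kwargs)  # {'temperature': 0.0}
print(llm.llm.init_kwargs)  # {'model_path': 'model.gguf', 'seed': -1}
```

Using `kwargs.setdefault` keeps a per-call `temperature=` authoritative while still honouring the value set in `__init__`.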
Ran `lesson_01_basic_chat()` twice with `temperature=0.0` (near-identical output across runs, as expected) and twice with `temperature=1.5` (clearly different output on each run). Previously, both temperatures produced byte-identical output across runs regardless of the value set in `__init__`.
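Why 0.0 gives near-identical runs while 1.5 varies: with temperature sampling, logits are divided by T before the softmax, so a low T concentrates probability on the top token, and T = 0 is commonly treated as greedy argmax, leaving nothing for the seed to change. The toy below is illustrative only. It uses fixed, made-up logits and Python's `random` module rather than llama.cpp's sampler chain (which also applies top-k, top-p and penalties, and whose logits depend on earlier tokens). Treating T <= 0 as greedy is an assumed convention, and each seed stands in for one fresh model load with `seed=-1`.

```python
import math
import random


def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Draw one token index from softmax(logits / temperature)."""
    if temperature <= 0.0:
        # Assumed convention: T <= 0 means greedy decoding (always the argmax).
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def run(temperature: float, seed: int, steps: int = 20) -> list[int]:
    """One simulated generation: `steps` draws from fixed, made-up logits."""
    rng = random.Random(seed)  # the seed plays the role of one model load
    logits = [2.0, 1.5, 1.0, 0.5]  # hypothetical next-token scores
    return [sample_token(logits, temperature, rng) for _ in range(steps)]


for t in (0.0, 1.5):
    distinct = {tuple(run(t, seed=s)) for s in range(5)}
    print(f"T={t}: {len(distinct)} distinct sequence(s) across 5 seeds")
```

At T=0.0 every seed yields the same sequence (1 distinct); at T=1.5 the sequences should differ from seed to seed, matching the pattern reported above.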