`llm` is actively being iterated upon, so there will be breaking changes to its interface and to model compatibility. Where possible, we will try to find ways to mitigate breaking changes, but we do not expect to have a stable interface for some time.
- `llm` now uses the latest GGML version. This limits use to older unquantized models or to models quantized with the latest version (quantization version 2, file format GGJTv3). We are investigating ways to mitigate this breakage in the future.
- `llm::InferenceRequest` no longer implements `Default::default`.
- The `infer` callback now provides an `InferenceResponse` instead of a string, to disambiguate the source of the token. Additionally, it now returns an `InferenceFeedback` to control whether or not the generation should continue.
- Several fields have been renamed:
  - `n_context_tokens` -> `context_size`
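The callback change above can be illustrated with a self-contained sketch. The enums below are illustrative stand-ins for the crate's actual `InferenceResponse` and `InferenceFeedback` types; the variant names are assumptions and may not match the real API exactly.

```rust
// Illustrative stand-in for the crate's `InferenceResponse`;
// variant names are assumptions, not the crate's definitive API.
enum InferenceResponse {
    PromptToken(String),   // a token echoed back from the prompt
    InferredToken(String), // a token newly generated by the model
}

// Illustrative stand-in for the crate's `InferenceFeedback`.
enum InferenceFeedback {
    Continue, // keep generating
    Halt,     // stop generation early
}

// A callback in the new style: it can tell prompt tokens apart from
// generated tokens, and returns feedback to control generation.
fn on_token(response: InferenceResponse) -> InferenceFeedback {
    match response {
        // Ignore tokens that merely echo the prompt.
        InferenceResponse::PromptToken(_) => InferenceFeedback::Continue,
        // Print generated tokens and keep going.
        InferenceResponse::InferredToken(t) => {
            print!("{t}");
            InferenceFeedback::Continue
        }
    }
}
```

Under the old interface the callback received a plain string, so callers could not distinguish prompt echo from generated text; the enum makes that distinction explicit at the type level.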
- Fix an issue with the binary build of `llm-cli`.
Initial release.