All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
## [Unreleased]

Nothing yet.
- Removed redundant `create_structured_completion()` function from `core.client` module
  - Function was unused in the codebase; structured outputs are implemented directly in `judge.py`
- Cleaned up unused imports and test cases
- Hugging Face Hub integration for direct dataset uploads
  - `push_to_hub()` function in new `hf_hub` module to upload datasets to HF Hub
  - Uploads JSONL files (train.jsonl, val.jsonl), manifest.json, and auto-generated README.md
  - CLI flags: `--push-to-hub`, `--repo-id`, `--hf-token`, `--private`
  - Support for both public and private repositories
  - Auto-generated dataset cards with dataset statistics, model info, usage examples, and citation
  - Optional dependency: `huggingface_hub>=0.20.0` (install with `pip install toolsgen[hf]`)
  - Example in `examples/hf_hub_upload/` with dotenv configuration
  - Test suite for HF Hub functionality in `tests/test_hf_hub.py`
  - `push_to_hub` exported from the main `toolsgen` package for easier imports
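
A minimal usage sketch for the new upload path. The positional dataset-directory argument and the `repo_id` and `private` keyword names are assumptions inferred from the CLI flags above, not confirmed signatures:

```python
from toolsgen import push_to_hub

# Upload a generated dataset directory (train.jsonl, val.jsonl, manifest.json)
# to the Hugging Face Hub. Keyword names mirror the CLI flags; the actual
# signature may differ.
push_to_hub(
    "out/my_dataset",
    repo_id="my-org/my-tool-dataset",  # hypothetical target repository
    private=True,                      # matches the --private flag
)
```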
- Quality tagging system for generated records
  - `generate_quality_tags()` method in `JudgeResponse` to automatically tag samples based on judge scores
  - Tags include overall quality levels (high/medium/low_quality) and dimension-specific tags (excellent/poor tool selection, arguments, clarity)
  - Configurable thresholds for quality classification
  - `quality_tags` field automatically populated in generated records
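
Since `quality_tags` is persisted on each record, downstream filtering stays a one-liner. A sketch assuming the documented JSONL layout and the `high_quality` tag name:

```python
import json

# Keep only records the judge tagged as high quality.
with open("out/train.jsonl") as f:
    records = [json.loads(line) for line in f]

high_quality = [r for r in records if "high_quality" in r.get("quality_tags", [])]
print(f"{len(high_quality)}/{len(records)} records tagged high_quality")
```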
- Hugging Face dataset integration utilities in `examples/nano_tool_calling_v1/`
  - `dataset_to_tools()` function to load tools from Hugging Face datasets
  - `validate_json_schema()` for OpenAI tool schema validation with recursive array type checking
  - `push_to_hf.py` script for uploading generated datasets to Hugging Face Hub
- Complete example workflow for Nano Tool Calling v1 dataset generation
  - Configuration, generation, validation, and publishing pipeline
  - Analysis utilities for function inspection
  - Comprehensive README with dataset card format
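
To illustrate what "recursive array type checking" means for OpenAI tool schemas, here is an illustrative sketch (not the example script's actual code): every `array` property must declare an `items` schema, checked recursively through nested objects and arrays.

```python
def check_array_types(schema: dict, path: str = "$") -> list[str]:
    """Illustrative recursive check: every array must declare 'items'."""
    errors = []
    if schema.get("type") == "array":
        items = schema.get("items")
        if items is None:
            errors.append(f"{path}: array type is missing 'items'")
        else:
            errors += check_array_types(items, f"{path}.items")
    elif schema.get("type") == "object":
        for name, prop in schema.get("properties", {}).items():
            errors += check_array_types(prop, f"{path}.{name}")
    return errors
```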
- Enhanced batch sampling progress bar display for better user feedback
- Improved parallel processing record ordering and ID assignment
- Records are now written to the JSONL file immediately as they complete in parallel mode, rather than waiting for all generation to finish
- Improved memory efficiency by removing records from the buffer after writing to disk
- Fixed integration tests to work with refactored module structure
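
The immediate-write behavior corresponds to the standard as-completed pattern. A sketch with illustrative names, not the library's internals:

```python
import json
from concurrent.futures import ProcessPoolExecutor, as_completed

def write_as_completed(make_record, num_records: int, out_path: str) -> None:
    """Append each record to the JSONL file as soon as its worker finishes,
    so only in-flight results are buffered in memory. make_record must be a
    picklable, module-level callable for multiprocessing to work."""
    with ProcessPoolExecutor() as pool, open(out_path, "a") as out:
        futures = [pool.submit(make_record, i) for i in range(num_records)]
        for future in as_completed(futures):
            out.write(json.dumps(future.result()) + "\n")
```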
- Parallel generation support with multiprocessing via `--workers` and `--worker-batch-size` CLI flags
  - `num_workers` and `worker_batch_size` configuration options in `GenerationConfig`
  - Parallel generation example in `examples/parallel/`
- Fixed tool subset diversity preservation in parallel mode by sorting records by original sample index before assigning final IDs
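
A configuration sketch using the documented option names; any other `GenerationConfig` constructor fields are omitted here and may be required in practice:

```python
from toolsgen.core import GenerationConfig

# Equivalent of the --workers / --worker-batch-size CLI flags.
gen_config = GenerationConfig(
    num_workers=4,        # worker processes for parallel generation
    worker_batch_size=8,  # records each worker generates per batch
)
```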
- Made `max_tokens` optional across all chat completion helpers and dataset flows so callers can rely on model defaults unless a limit is explicitly set.
- Batching controls (`batch_size`, `shuffle_tools`) in `GenerationConfig`, CLI flags, and docs to opt into chunked sampling.
- Deterministic chunk-based sampling path that reuses batches in a wrap-around manner when generating many subsets.
- CLI now forwards batching parameters so dataset generation can reuse the refactored sampling logic end-to-end.
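
An illustrative sketch of the wrap-around reuse described above (names are illustrative, not the library's internals): tools are split into fixed-size batches, and batch selection cycles deterministically once the batches are exhausted.

```python
def chunked_subsets(tools: list, batch_size: int, num_subsets: int):
    """Yield deterministic tool batches, wrapping around when more
    subsets are requested than there are batches."""
    batches = [tools[i:i + batch_size] for i in range(0, len(tools), batch_size)]
    for i in range(num_subsets):
        yield batches[i % len(batches)]  # wrap around past the last batch
```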
- Restored `toolsgen version` output by sourcing `__version__` from package metadata when running the CLI (sketched below)
- Official support declarations for Python 3.12, 3.13, and 3.14 in project metadata
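
Sourcing `__version__` from package metadata is typically done with the stdlib `importlib.metadata`; a minimal sketch of that approach (the library may differ in detail):

```python
from importlib.metadata import PackageNotFoundError, version

try:
    __version__ = version("toolsgen")  # reads the installed package metadata
except PackageNotFoundError:
    __version__ = "0.0.0"  # fallback for source checkouts without metadata
```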
- Project skeleton with package, CLI, tests, and basic schemas
- Core dataset generation functionality
- LLM-based user request generation
- Tool call generation with OpenAI-compatible APIs
- Record creation and serialization
- Complete CLI implementation with model configuration options
- Comprehensive test suite for schemas, sampling, config, and generator
- Public API exports in `__init__.py`
- Support for `param_aware` and `semantic` sampling strategies
- LLM-as-a-judge scoring system
  - Rubric-based evaluation (tool relevance, argument quality, clarity)
  - Structured outputs using pydantic models for reliable parsing
- JSONL output format with train/val split support
- Prompts module with centralized prompt templates
- Semantic sampling strategy based on keyword similarity
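
An illustrative sketch of keyword-similarity sampling as described above (not the shipped implementation): score each tool by token overlap between its description and the user request, then keep the closest matches.

```python
def semantic_sample(tools: list[dict], request: str, k: int = 5) -> list[dict]:
    """Rank tools by keyword overlap with the request; keep the top k."""
    request_words = set(request.lower().split())

    def overlap(tool: dict) -> int:
        description = tool["function"]["description"]  # OpenAI tool schema layout
        return len(request_words & set(description.lower().split()))

    return sorted(tools, key=overlap, reverse=True)[:k]
```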
- [BREAKING] `generate_dataset()` function signature updated (see the migration sketch below):
  - New signature: `generate_dataset(output_dir, gen_config, model_config, tools_path=None, tools=None)`
  - Old signature: `generate_dataset(tools_path, output_dir, gen_config, model_config)`
  - Now supports passing a tools list directly via the `tools` parameter as an alternative to `tools_path`
  - Exactly one of `tools_path` or `tools` must be provided
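
A migration sketch using the documented signatures; `gen_config`, `model_config`, and `my_tools` are placeholders, shown in comments to keep the snippet self-contained:

```python
from toolsgen.core import generate_dataset

# Old call order (pre-change):
#   generate_dataset(tools_path, output_dir, gen_config, model_config)
#
# New call order -- tools come either from a file path...
#   generate_dataset(output_dir, gen_config, model_config, tools_path="tools.json")
# ...or directly as a list (exactly one of tools_path / tools):
#   generate_dataset(output_dir, gen_config, model_config, tools=my_tools)
```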
- [BREAKING] Reduced dependencies (removed typer, python-dotenv from core dependencies)
- [BREAKING] Flattened module structure - removed nested folders (io/, judge/, providers/)
- [BREAKING] CLI rewritten using `argparse` instead of `typer` (stdlib only)
- [BREAKING] Module reorganization:
  - `generator.py`, `config.py`, `io/writer.py` merged into `core.py`
  - `judge/scorer.py` moved to `judge.py`
  - `providers/openai_compat.py` removed; using the OpenAI SDK directly
- [BREAKING] Import paths updated:
  - `from toolsgen.config` → `from toolsgen.core`
  - `from toolsgen.generator` → `from toolsgen.core`
  - `from toolsgen.judge.scorer` → `from toolsgen.judge`
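
A before/after import sketch for the flattened layout; the symbol names are examples taken from elsewhere in this changelog, assuming they are exported from the modules shown:

```python
# Before the flattening:
#   from toolsgen.config import GenerationConfig
#   from toolsgen.generator import generate_dataset
#   from toolsgen.judge.scorer import JudgeResponse

# After:
from toolsgen.core import GenerationConfig, generate_dataset
from toolsgen.judge import JudgeResponse
```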
- Simplified semantic sampling algorithm (reduced complexity)
- Environment variable loading: direct `os.environ` instead of `python-dotenv` for core functionality
- Enhanced error handling and user feedback in CLI
- Output format uses JSONL for datasets
- Judge system uses OpenAI SDK structured outputs directly
- Prompts extracted from inline strings to dedicated module
- Updated README with simplified installation and new import paths
- Dependency on `typer` (replaced with stdlib `argparse`)
- Dependency on `python-dotenv` from core (moved to dev dependencies, used optionally in examples)
- OpenAI wrapper abstraction layer (use SDK directly)
- Nested folder structure (io/, judge/, providers/)
- Missing dependencies declaration in `pyproject.toml`
- Reduced code complexity and improved maintainability