Releases: atasoglu/toolsgen
v0.5.1
v0.5.0
Added
- Hugging Face Hub integration for direct dataset uploads
- `push_to_hub()` function in new `hf_hub` module to upload datasets to HF Hub
- Uploads JSONL files (train.jsonl, val.jsonl), manifest.json, and auto-generated README.md
- CLI flags: `--push-to-hub`, `--repo-id`, `--hf-token`, `--private`
- Support for both public and private repositories
- Auto-generated dataset cards with dataset statistics, model info, usage examples, and citation
- Optional dependency: `huggingface_hub>=0.20.0` (install with `pip install toolsgen[hf]`)
- Example in `examples/hf_hub_upload/` with dotenv configuration
- Test suite for HF Hub functionality in `tests/test_hf_hub.py`
- `push_to_hub` exported from the main `toolsgen` package for easier imports
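As a rough illustration of the auto-generated dataset card, a sketch of assembling the upload file list and a minimal README from a manifest. The function name, manifest keys, and card layout here are assumptions for illustration, not the library's actual implementation:

```python
def build_upload_plan(manifest: dict) -> tuple[list[str], str]:
    """Sketch: list the files push_to_hub() would upload and draft a
    minimal dataset card from manifest statistics (keys are assumed)."""
    files = ["train.jsonl", "val.jsonl", "manifest.json", "README.md"]
    card = "\n".join([
        "# " + manifest.get("dataset_name", "toolsgen dataset"),
        "",
        f"- Records: {manifest.get('num_records', 0)}",
        f"- Model: {manifest.get('model', 'unknown')}",
    ])
    return files, card

files, card = build_upload_plan(
    {"dataset_name": "demo", "num_records": 128, "model": "gpt-4o-mini"}
)
```

The actual upload then goes through `huggingface_hub`, which handles both public and private repositories.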
v0.4.0
Added
- Quality tagging system for generated records
- `generate_quality_tags()` method in `JudgeResponse` to automatically tag samples based on judge scores
- Tags include overall quality levels (high/medium/low_quality) and dimension-specific tags (excellent/poor tool selection, arguments, clarity)
- Configurable thresholds for quality classification
- `quality_tags` field automatically populated in generated records
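A minimal sketch of threshold-based tagging as described above. The tag strings match the changelog's examples, but the function signature, score dictionary, and default thresholds are assumptions, not the `JudgeResponse` API:

```python
def quality_tags(scores: dict[str, float],
                 high: float = 0.8, low: float = 0.5) -> list[str]:
    """Sketch: derive quality tags from per-dimension judge scores.
    Thresholds are illustrative; toolsgen makes them configurable."""
    tags = []
    # Overall quality level from the mean judge score
    overall = sum(scores.values()) / len(scores)
    if overall >= high:
        tags.append("high_quality")
    elif overall >= low:
        tags.append("medium_quality")
    else:
        tags.append("low_quality")
    # Dimension-specific tags, e.g. tool selection, arguments, clarity
    for dim, score in scores.items():
        if score >= high:
            tags.append(f"excellent_{dim}")
        elif score < low:
            tags.append(f"poor_{dim}")
    return tags
```

A record's `quality_tags` field would then hold the resulting list, making it easy to filter generated samples by quality after the fact.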
v0.3.0
Added
- Hugging Face dataset integration utilities in `examples/nano_tool_calling_v1/`
- `dataset_to_tools()` function to load tools from Hugging Face datasets
- `validate_json_schema()` for OpenAI tool schema validation with recursive array type checking
- `push_to_hf.py` script for uploading generated datasets to Hugging Face Hub
- Complete example workflow for Nano Tool Calling v1 dataset generation
- Configuration, generation, validation, and publishing pipeline
- Analysis utilities for function inspection
- Comprehensive README with dataset card format
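The recursive array type check mentioned above can be sketched as follows. This is a simplified stand-in, not the example's `validate_json_schema()`: it only verifies that every `array` in a tool parameter schema declares a typed `items` entry, recursing through nested arrays and object properties:

```python
def validate_array_types(schema: dict) -> bool:
    """Sketch: recursively require that each 'array' node in a JSON
    schema carries an 'items' object with its own 'type'."""
    if schema.get("type") == "array":
        items = schema.get("items")
        if not isinstance(items, dict) or "type" not in items:
            return False  # array without a typed items definition
        return validate_array_types(items)  # handle nested arrays
    if schema.get("type") == "object":
        return all(validate_array_types(p)
                   for p in schema.get("properties", {}).values())
    return True
```

OpenAI-style tool schemas reject untyped arrays, so catching this before generation avoids wasted API calls.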
Changed
- Enhanced batch sampling progress bar display for better user feedback
- Improved parallel processing record ordering and ID assignment
v0.2.2
Changed
- Records are now written to JSONL file immediately as they complete in parallel mode, rather than waiting for all generation to finish
- Improved memory efficiency by removing records from buffer after writing to disk
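The write-as-completed behavior can be sketched like this; the function is illustrative, not toolsgen's internal writer. Consuming an iterator of completed records means at most one record is held in memory at a time:

```python
import json
from typing import Iterable, TextIO

def stream_jsonl(records: Iterable[dict], fh: TextIO) -> int:
    """Sketch: write each record to JSONL as soon as it completes,
    instead of buffering the whole dataset until generation ends."""
    count = 0
    for record in records:
        fh.write(json.dumps(record) + "\n")
        count += 1  # record is dropped from memory after writing
    return count
```

Besides the memory savings, this also means a partially completed run still leaves valid JSONL on disk.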
v0.2.1
Fixed
- Fixed integration tests to work with refactored module structure
v0.2.0
Added
- Parallel generation support with multiprocessing via
--workersand--worker-batch-sizeCLI flags num_workersandworker_batch_sizeconfiguration options inGenerationConfig- Parallel generation example in
examples/parallel/
Fixed
- Fixed tool subset diversity preservation in parallel mode by sorting records by original sample index before assigning final IDs
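The fix above can be sketched as follows (illustrative names, not toolsgen internals): workers return `(original_index, record)` pairs in completion order, so sorting by the original sample index before assigning IDs keeps each final ID aligned with the tool subset it was sampled for:

```python
def finalize_records(results: list[tuple[int, dict]]) -> list[dict]:
    """Sketch: restore original sample order from out-of-order
    worker results, then assign sequential final IDs."""
    ordered = [rec for _, rec in sorted(results, key=lambda pair: pair[0])]
    for final_id, rec in enumerate(ordered):
        rec["id"] = final_id
    return ordered
```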
v0.1.4
Changed
- Made `max_tokens` optional across all chat completion helpers and dataset flows so callers can rely on model defaults unless a limit is explicitly set
v0.1.3
Added
- Batching controls (`batch_size`, `shuffle_tools`) in `GenerationConfig`, CLI flags, and docs to opt into chunked sampling
- Deterministic chunk-based sampling path that reuses batches in a wrap-around manner when generating many subsets
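The wrap-around reuse can be sketched as follows (an illustration of the idea, not toolsgen's sampling code): the tool list is split into fixed-size chunks, and sample *i* deterministically draws from chunk *i* mod *n_chunks*, so generation can continue past one pass over the tools:

```python
def chunk_for_sample(tools: list, batch_size: int, sample_index: int) -> list:
    """Sketch: deterministic wrap-around chunking. Sample i reuses
    chunk (i mod n_chunks) of the tool list."""
    n_chunks = max(1, -(-len(tools) // batch_size))  # ceil division
    start = (sample_index % n_chunks) * batch_size
    return tools[start:start + batch_size]
```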
Changed
- CLI now forwards batching parameters so dataset generation can reuse the refactored sampling logic end-to-end.
v0.1.2
Fixed
- Restored `toolsgen version` output by sourcing `__version__` from package metadata when running the CLI
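Sourcing the version from package metadata typically uses the standard library's `importlib.metadata`; a sketch of the pattern (the fallback handling here is an assumption, not toolsgen's exact code):

```python
from importlib import metadata

def package_version(name: str, fallback: str = "unknown") -> str:
    """Sketch: resolve an installed package's version from its
    metadata, falling back when the package is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return fallback
```

This keeps the CLI's reported version in sync with the installed distribution instead of a hard-coded string.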