Releases: atasoglu/toolsgen
v0.5.1
v0.5.0
Added
- Hugging Face Hub integration for direct dataset uploads
- `push_to_hub()` function in new `hf_hub` module to upload datasets to HF Hub
- Uploads JSONL files (train.jsonl, val.jsonl), manifest.json, and auto-generated README.md
- CLI flags: `--push-to-hub`, `--repo-id`, `--hf-token`, `--private`
- Support for both public and private repositories
- Auto-generated dataset cards with dataset statistics, model info, usage examples, and citation
- Optional dependency: `huggingface_hub>=0.20.0` (install with `pip install toolsgen[hf]`)
- Example in `examples/hf_hub_upload/` with dotenv configuration
- Test suite for HF Hub functionality in `tests/test_hf_hub.py`
- `push_to_hub` exported from the main `toolsgen` package for easier imports
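As a rough illustration of the auto-generated dataset card, a sketch of assembling the upload file list and a minimal README from a manifest. The function name, manifest keys, and card layout here are assumptions for illustration, not the library's actual implementation:

```python
def build_upload_plan(manifest: dict) -> tuple[list[str], str]:
    """Sketch: list the files push_to_hub() would upload and draft a
    minimal dataset card from manifest statistics (keys are assumed)."""
    files = ["train.jsonl", "val.jsonl", "manifest.json", "README.md"]
    card = "\n".join([
        "# " + manifest.get("dataset_name", "toolsgen dataset"),
        "",
        f"- Records: {manifest.get('num_records', 0)}",
        f"- Model: {manifest.get('model', 'unknown')}",
    ])
    return files, card

files, card = build_upload_plan(
    {"dataset_name": "demo", "num_records": 128, "model": "gpt-4o-mini"}
)
```

The actual upload then goes through `huggingface_hub`, which handles both public and private repositories.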
v0.4.0
Added
- Quality tagging system for generated records
- `generate_quality_tags()` method in `JudgeResponse` to automatically tag samples based on judge scores
- Tags include overall quality levels (high/medium/low_quality) and dimension-specific tags (excellent/poor tool selection, arguments, clarity)
- Configurable thresholds for quality classification
- `quality_tags` field automatically populated in generated records
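A minimal sketch of threshold-based tagging as described above. The tag strings match the changelog's examples, but the function signature, score dictionary, and default thresholds are assumptions, not the `JudgeResponse` API:

```python
def quality_tags(scores: dict[str, float],
                 high: float = 0.8, low: float = 0.5) -> list[str]:
    """Sketch: derive quality tags from per-dimension judge scores.
    Thresholds are illustrative; toolsgen makes them configurable."""
    tags = []
    # Overall quality level from the mean judge score
    overall = sum(scores.values()) / len(scores)
    if overall >= high:
        tags.append("high_quality")
    elif overall >= low:
        tags.append("medium_quality")
    else:
        tags.append("low_quality")
    # Dimension-specific tags, e.g. tool selection, arguments, clarity
    for dim, score in scores.items():
        if score >= high:
            tags.append(f"excellent_{dim}")
        elif score < low:
            tags.append(f"poor_{dim}")
    return tags
```

A record's `quality_tags` field would then hold the resulting list, making it easy to filter generated samples by quality after the fact.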
v0.3.0
Added
- Hugging Face dataset integration utilities in `examples/nano_tool_calling_v1/`
- `dataset_to_tools()` function to load tools from Hugging Face datasets
- `validate_json_schema()` for OpenAI tool schema validation with recursive array type checking
- `push_to_hf.py` script for uploading generated datasets to Hugging Face Hub
- Complete example workflow for Nano Tool Calling v1 dataset generation
- Configuration, generation, validation, and publishing pipeline
- Analysis utilities for function inspection
- Comprehensive README with dataset card format
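The recursive array type check mentioned above can be sketched as follows. This is a simplified stand-in, not the example's `validate_json_schema()`: it only verifies that every `array` in a tool parameter schema declares a typed `items` entry, recursing through nested arrays and object properties:

```python
def validate_array_types(schema: dict) -> bool:
    """Sketch: recursively require that each 'array' node in a JSON
    schema carries an 'items' object with its own 'type'."""
    if schema.get("type") == "array":
        items = schema.get("items")
        if not isinstance(items, dict) or "type" not in items:
            return False  # array without a typed items definition
        return validate_array_types(items)  # handle nested arrays
    if schema.get("type") == "object":
        return all(validate_array_types(p)
                   for p in schema.get("properties", {}).values())
    return True
```

OpenAI-style tool schemas reject untyped arrays, so catching this before generation avoids wasted API calls.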
Changed
- Enhanced batch sampling progress bar display for better user feedback
- Improved parallel processing record ordering and ID assignment
v0.2.2
Changed
- Records are now written to JSONL file immediately as they complete in parallel mode, rather than waiting for all generation to finish
- Improved memory efficiency by removing records from buffer after writing to disk
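The write-as-completed behavior can be sketched like this; the function is illustrative, not toolsgen's internal writer. Consuming an iterator of completed records means at most one record is held in memory at a time:

```python
import json
from typing import Iterable, TextIO

def stream_jsonl(records: Iterable[dict], fh: TextIO) -> int:
    """Sketch: write each record to JSONL as soon as it completes,
    instead of buffering the whole dataset until generation ends."""
    count = 0
    for record in records:
        fh.write(json.dumps(record) + "\n")
        count += 1  # record is dropped from memory after writing
    return count
```

Besides the memory savings, this also means a partially completed run still leaves valid JSONL on disk.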
v0.2.1
Fixed
- Fixed integration tests to work with refactored module structure
v0.2.0
Added
- Parallel generation support with multiprocessing via
--workersand--worker-batch-sizeCLI flags num_workersandworker_batch_sizeconfiguration options inGenerationConfig- Parallel generation example in
examples/parallel/
Fixed
- Fixed tool subset diversity preservation in parallel mode by sorting records by original sample index before assigning final IDs
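The fix above can be sketched as follows (illustrative names, not toolsgen internals): workers return `(original_index, record)` pairs in completion order, so sorting by the original sample index before assigning IDs keeps each final ID aligned with the tool subset it was sampled for:

```python
def finalize_records(results: list[tuple[int, dict]]) -> list[dict]:
    """Sketch: restore original sample order from out-of-order
    worker results, then assign sequential final IDs."""
    ordered = [rec for _, rec in sorted(results, key=lambda pair: pair[0])]
    for final_id, rec in enumerate(ordered):
        rec["id"] = final_id
    return ordered
```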
v0.1.4
Changed
- Made `max_tokens` optional across all chat completion helpers and dataset flows so callers can rely on model defaults unless a limit is explicitly set
v0.1.3
Added
- Batching controls (`batch_size`, `shuffle_tools`) in `GenerationConfig`, CLI flags, and docs to opt into chunked sampling
- Deterministic chunk-based sampling path that reuses batches in a wrap-around manner when generating many subsets
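The wrap-around reuse can be sketched as follows (an illustration of the idea, not toolsgen's sampling code): the tool list is split into fixed-size chunks, and sample *i* deterministically draws from chunk *i* mod *n_chunks*, so generation can continue past one pass over the tools:

```python
def chunk_for_sample(tools: list, batch_size: int, sample_index: int) -> list:
    """Sketch: deterministic wrap-around chunking. Sample i reuses
    chunk (i mod n_chunks) of the tool list."""
    n_chunks = max(1, -(-len(tools) // batch_size))  # ceil division
    start = (sample_index % n_chunks) * batch_size
    return tools[start:start + batch_size]
```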
Changed
- CLI now forwards batching parameters so dataset generation can reuse the refactored sampling logic end-to-end.
v0.1.2
Fixed
- Restored `toolsgen version` output by sourcing `__version__` from package metadata when running the CLI
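Sourcing the version from package metadata typically uses the standard library's `importlib.metadata`; a sketch of the pattern (the fallback handling here is an assumption, not toolsgen's exact code):

```python
from importlib import metadata

def package_version(name: str, fallback: str = "unknown") -> str:
    """Sketch: resolve an installed package's version from its
    metadata, falling back when the package is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return fallback
```

This keeps the CLI's reported version in sync with the installed distribution instead of a hard-coded string.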