Releases: atasoglu/toolsgen

v0.5.1

10 Nov 20:42

Removed

  • Removed redundant create_structured_completion() function from core.client module
    • Function was unused in the codebase; structured outputs are implemented directly in judge.py
    • Cleaned up unused imports and test cases

v0.5.0

10 Nov 20:26
3d861cb

Added

  • Hugging Face Hub integration for direct dataset uploads
    • push_to_hub() function in new hf_hub module to upload datasets to HF Hub
    • Uploads JSONL files (train.jsonl, val.jsonl), manifest.json, and auto-generated README.md
    • CLI flags: --push-to-hub, --repo-id, --hf-token, --private
    • Support for both public and private repositories
    • Auto-generated dataset cards with dataset statistics, model info, usage examples, and citation
  • Optional dependency: huggingface_hub>=0.20.0 (install with pip install toolsgen[hf])
  • Example in examples/hf_hub_upload/ with dotenv configuration
  • Test suite for HF Hub functionality in tests/test_hf_hub.py
  • push_to_hub exported from main toolsgen package for easier imports
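
For orientation, an upload of this shape can be sketched directly against the huggingface_hub client. This is only an illustration under stated assumptions, not the toolsgen hf_hub implementation: the helper names collect_upload_files and upload_dataset_dir are hypothetical, and only the file names are taken from the notes above.

```python
from pathlib import Path

# File names come from the release notes; the helper names below are hypothetical.
EXPECTED_FILES = ("train.jsonl", "val.jsonl", "manifest.json", "README.md")


def collect_upload_files(output_dir):
    """Return the expected dataset files that actually exist in output_dir."""
    root = Path(output_dir)
    return [root / name for name in EXPECTED_FILES if (root / name).is_file()]


def upload_dataset_dir(output_dir, repo_id, token=None, private=False):
    """Upload the collected files to a dataset repo on the Hugging Face Hub."""
    # Imported lazily so this module loads without the optional dependency.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    # exist_ok=True lets repeated uploads target the same repository.
    api.create_repo(repo_id, repo_type="dataset", private=private, exist_ok=True)
    for path in collect_upload_files(output_dir):
        api.upload_file(
            path_or_fileobj=str(path),
            path_in_repo=path.name,
            repo_id=repo_id,
            repo_type="dataset",
        )
```

Missing files (for example, no val.jsonl) are skipped rather than treated as errors, which is a design choice of this sketch and may differ from the library's behavior.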

v0.4.0

10 Nov 19:46
d3adb98

Added

  • Quality tagging system for generated records
    • generate_quality_tags() method in JudgeResponse to automatically tag samples based on judge scores
    • Tags include overall quality levels (high/medium/low_quality) and dimension-specific tags (excellent/poor tool selection, arguments, clarity)
    • Configurable thresholds for quality classification
    • quality_tags field automatically populated in generated records
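
The threshold-based tagging described above might look roughly like the sketch below. It is not the JudgeResponse.generate_quality_tags() implementation: the score scale (0 to 1), the default thresholds, the dimension keys, and the exact tag spellings are all assumptions made for illustration.

```python
def quality_tags(scores, high=0.8, low=0.5):
    """Map per-dimension judge scores to tags (scale, thresholds, names assumed)."""
    overall = sum(scores.values()) / len(scores)
    if overall >= high:
        tags = ["high_quality"]
    elif overall < low:
        tags = ["low_quality"]
    else:
        tags = ["medium_quality"]
    # Add a dimension-specific tag only when a score sits at either extreme.
    for dimension, score in scores.items():
        if score >= high:
            tags.append(f"excellent_{dimension}")
        elif score < low:
            tags.append(f"poor_{dimension}")
    return tags


print(quality_tags({"tool_selection": 0.9, "arguments": 0.85, "clarity": 0.4}))
# ['medium_quality', 'excellent_tool_selection', 'excellent_arguments', 'poor_clarity']
```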

v0.3.0

09 Nov 20:21
8521d3a

Added

  • Hugging Face dataset integration utilities in examples/nano_tool_calling_v1/
    • dataset_to_tools() function to load tools from Hugging Face datasets
    • validate_json_schema() for OpenAI tool schema validation with recursive array type checking
    • push_to_hf.py script for uploading generated datasets to Hugging Face Hub
  • Complete example workflow for Nano Tool Calling v1 dataset generation
    • Configuration, generation, validation, and publishing pipeline
    • Analysis utilities for function inspection
    • Comprehensive README with dataset card format
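
As a rough illustration of recursive array type checking, the sketch below walks a tool's parameter schema and reports array nodes that do not declare "items". It assumes that this is the kind of problem being checked for; the function name find_arrays_missing_items is hypothetical and this is not the example's validate_json_schema().

```python
def find_arrays_missing_items(schema, path="parameters"):
    """Return paths of array-typed schema nodes that lack an "items" entry."""
    problems = []
    if not isinstance(schema, dict):
        return problems
    if schema.get("type") == "array":
        if "items" not in schema:
            problems.append(path)
        else:
            # Arrays can nest, so the element schema is checked as well.
            problems.extend(find_arrays_missing_items(schema["items"], f"{path}.items"))
    properties = schema.get("properties")
    if isinstance(properties, dict):
        # Recurse into object properties so nested arrays are found too.
        for name, sub_schema in properties.items():
            problems.extend(find_arrays_missing_items(sub_schema, f"{path}.{name}"))
    return problems
```

An empty result means every array in the schema, at any depth, declares its element type.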

Changed

  • Enhanced batch sampling progress bar display for better user feedback
  • Improved parallel processing record ordering and ID assignment

v0.2.2

09 Nov 07:44

Changed

  • In parallel mode, records are now written to the JSONL file as soon as they complete, rather than after all generation has finished
  • Improved memory efficiency by removing records from the buffer once they are written to disk
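
The write-as-you-go pattern can be sketched with the standard library as follows. This is an illustration only: it uses a thread pool instead of the multiprocessing workers the project describes, and generate_record and write_as_completed are hypothetical names.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed


def generate_record(index):
    """Stand-in for an expensive generation call (hypothetical)."""
    return {"sample_index": index, "text": f"record {index}"}


def write_as_completed(indices, out_path, workers=4):
    """Write each record to a JSONL file as soon as its task finishes."""
    written = 0
    with open(out_path, "w", encoding="utf-8") as fh:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [pool.submit(generate_record, i) for i in indices]
            for future in as_completed(futures):
                # Nothing is accumulated in memory: the record goes straight to disk.
                fh.write(json.dumps(future.result()) + "\n")
                fh.flush()
                written += 1
    return written
```

Because records arrive in completion order, the lines in the file are not guaranteed to follow the original sample order.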

v0.2.1

08 Nov 21:44

Fixed

  • Fixed integration tests to work with refactored module structure

v0.2.0

08 Nov 21:33
4a3734d

Added

  • Parallel generation support with multiprocessing via --workers and --worker-batch-size CLI flags
  • num_workers and worker_batch_size configuration options in GenerationConfig
  • Parallel generation example in examples/parallel/

Fixed

  • Fixed tool subset diversity preservation in parallel mode by sorting records by original sample index before assigning final IDs
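
The fix above amounts to restoring the original order before numbering. A minimal sketch, assuming each record carries a sample_index field and using a made-up ID format (neither is confirmed by these notes):

```python
def assign_final_ids(records):
    """Order records by original sample index, then assign sequential IDs."""
    ordered = sorted(records, key=lambda rec: rec["sample_index"])
    # Copy each record so the caller's dictionaries are left untouched.
    return [{**rec, "id": f"record_{i:06d}"} for i, rec in enumerate(ordered)]


finished = [{"sample_index": 2}, {"sample_index": 0}, {"sample_index": 1}]
print([rec["id"] for rec in assign_final_ids(finished)])
# ['record_000000', 'record_000001', 'record_000002']
```

Sorting first means the final IDs depend only on the sampling order, not on which worker happened to finish first.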

v0.1.4

08 Nov 20:35

Changed

  • Made max_tokens optional across all chat completion helpers and dataset flows so callers can rely on model defaults unless a limit is explicitly set.
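
One common way to make such a parameter optional is to leave it out of the request entirely when it is unset. The sketch below illustrates that idea; build_completion_kwargs is a hypothetical helper, not a function from toolsgen.

```python
def build_completion_kwargs(model, messages, max_tokens=None):
    """Build chat completion arguments, omitting max_tokens when it is None."""
    kwargs = {"model": model, "messages": messages}
    if max_tokens is not None:
        # Only send an explicit limit; otherwise the model default applies.
        kwargs["max_tokens"] = max_tokens
    return kwargs
```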

v0.1.3

08 Nov 15:10

Choose a tag to compare

Added

  • Batching controls (batch_size, shuffle_tools) in GenerationConfig, CLI flags, and docs to opt into chunked sampling.
  • Deterministic chunk-based sampling path that reuses batches in a wrap-around manner when generating many subsets.
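
A wrap-around chunked sampler of the kind described above could be sketched as follows. The function name, the seed parameter, and the exact reuse rule (subset i takes batch i modulo the number of batches) are assumptions for illustration, not the project's implementation.

```python
import random


def chunked_subsets(tools, batch_size, num_subsets, shuffle=False, seed=0):
    """Split tools into fixed-size batches and reuse them cyclically."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    pool = list(tools)
    if not pool:
        raise ValueError("tools must not be empty")
    if shuffle:
        # A seeded generator keeps the shuffle reproducible across runs.
        random.Random(seed).shuffle(pool)
    batches = [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]
    # Wrap around once every batch has been used.
    return [batches[i % len(batches)] for i in range(num_subsets)]


print(chunked_subsets(["a", "b", "c", "d", "e"], batch_size=2, num_subsets=5))
# [['a', 'b'], ['c', 'd'], ['e'], ['a', 'b'], ['c', 'd']]
```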

Changed

  • CLI now forwards batching parameters so dataset generation can reuse the refactored sampling logic end-to-end.

v0.1.2

08 Nov 10:29

Fixed

  • Restored toolsgen version output by sourcing __version__ from package metadata when running the CLI
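
Reading a version from installed package metadata is typically done with importlib.metadata. The sketch below shows the general pattern; the get_version helper and its fallback value are assumptions, not the project's code.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(dist_name="toolsgen", fallback="0.0.0"):
    """Return the installed distribution's version, or a fallback if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Raised when the package is not installed, e.g. in a bare source checkout.
        return fallback
```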