LLM Scribe is your professional toolkit for creating high-quality conversational datasets for Large Language Model fine-tuning. Whether you're a creative writer crafting character personalities or a developer preparing training data, LLM Scribe eliminates the technical barriers and formatting headaches.
No more struggling with JSON syntax or format specifications - LLM Scribe handles all the technical details while you focus on creating valuable content.
- Intuitive Interface - Focus on writing, not formatting
- Auto-save Functionality - Never lose your work with automatic saving on every interaction
- Progress Tracking - Set goals and monitor your dataset completion
- Tab Navigation - Rapidly cycle between fields for efficient data entry
- Light mode and Dark mode themes - Swap in settings
- Multiple Export Formats - Supports all major LLM training formats including ChatML, Alpaca, ShareGPT/Vicuna
- Format-Specific Customizations - Tailor your datasets with format-specific options
- Real-time Token Tracking - Monitor token usage with popular tokenizers (OpenAI, HuggingFace, Mistral)
- Customizable Fields - Enable/disable optional fields based on your specific needs
- System Message Support - Add system prompts for ChatGPT/ChatML formats
- Custom IDs - Assign unique identifiers for ShareGPT/Vicuna formats
- Easy Dataset Reloading - Seamlessly continue work on existing projects
- Multi-turn Conversation Support - Create contextually aware training data
- In-app Guidance - Helpful tooltips and explanations throughout the interface
chatgpt_chatml.jsonlchatml.jsonalpaca.jsonlalpaca.jsonsharegpt_vicuna.jsonlsharegpt_vicuna.jsongeneric.jsonl
chatgpt_chatml.jsonlchatml.jsonsharegpt_vicuna.jsonlsharegpt_vicuna.json- Plus all pair formats (automatically generated)
- Start with default settings to get all formats you need
- Choose between simple pair data or more advanced multi-turn conversations
- No technical knowledge required - just write and export
- Fine-tune your datasets with format-specific customizations
- Track token usage for cost and performance optimization
- Leverage advanced features for professional dataset creation
Please click the open book icon to get started once you open the app! It will give you all the info you need.
- Windows Only Application - Not compatible with macOS or Linux
This software includes a commercial license that grants you full commercial rights to all datasets and outputs you create. The underlying system and methodology are patent pending. For licensing inquiries regarding technology integration, please contact us.
Created with ❤️ by Gabriella Baris - Check out my portfolio for more projects and tools!
If LLM Scribe has been helpful for your projects, consider buying me a coffee! Your support helps keep this project alive and enables continued development of new features.
If you have any issues, find bugs, or need assistance, please message Gabriella@Kryptive.com for:
- Technical support
- Bug reports
- Additional format requests
- Tokenizer library additions
Interested in integrating this technology into your own products? Contact us for licensing the underlying system and methodology.
Version 1.0 | Patent Pending
Note on Tokenizer Libraries: LLM Scribe utilizes open-source libraries (tiktoken, Hugging Face transformers, Mistral AI Tokenizers) for token counting functionalities, each governed by their respective licenses.