Python client for Woolball Server. Transform idle browsers into a powerful distributed AI inference network. For detailed examples and model lists, visit our GitHub repository.
This SDK is automatically generated by the Swagger Codegen project
```bash
pip install woolball-sdk
```

Woolball Server is an open-source network server that orchestrates AI inference jobs across a distributed network of browser-based compute nodes. Instead of relying on expensive cloud infrastructure, harness the collective power of idle browsers to run AI models efficiently and cost-effectively.
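The generated client is configured the same way for every task shown below. Here is a minimal sketch of that shared setup, assuming the Woolball server is reachable at http://localhost:9002 (adjust the host for your deployment):

```python
from swagger_client import ApiClient, TextGenerationApi

# Point the generated swagger_client at your Woolball server instance
api_client = ApiClient()
api_client.host = 'http://localhost:9002'

# Each task has its own API class (TextGenerationApi, SpeechRecognitionApi,
# TextToSpeechApi, TranslationApi); all of them wrap the same configured client
text_generation_api = TextGenerationApi(api_client)
```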
| 🔧 Provider | 🎯 Task | 🤖 Models | 📊 Status |
|---|---|---|---|
| Transformers.js | 🎤 Speech-to-Text | ONNX Models | ✅ Ready |
| Transformers.js | 🔊 Text-to-Speech | ONNX Models | ✅ Ready |
| Kokoro.js | 🔊 Text-to-Speech | ONNX Models | ✅ Ready |
| Transformers.js | 🌐 Translation | ONNX Models | ✅ Ready |
| Transformers.js | 📝 Text Generation | ONNX Models | ✅ Ready |
| WebLLM | 📝 Text Generation | MLC Models | ✅ Ready |
| MediaPipe | 📝 Text Generation | LiteRT Models | ✅ Ready |
Generate text with powerful language models
🤖 Available Models
| Model | Quantization | Description |
|---|---|---|
| `HuggingFaceTB/SmolLM2-135M-Instruct` | fp16 | Compact model for basic text generation |
| `HuggingFaceTB/SmolLM2-360M-Instruct` | q4 | Balanced performance and size |
| `Mozilla/Qwen2.5-0.5B-Instruct` | q4 | Efficient model for general tasks |
| `onnx-community/Qwen2.5-Coder-0.5B-Instruct` | q8 | Specialized for code generation |
```python
import json
from swagger_client import ApiClient, TextGenerationApi
# Configure the API client
api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_generation_api = TextGenerationApi(api_client)
# Text generation with Transformers.js
input_data = json.dumps([
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of Brazil?"}
])
response = text_generation_api.text_generation(
provider='transformers',
model='HuggingFaceTB/SmolLM2-135M-Instruct',
input=input_data,
top_k=50,
top_p=1.0,
temperature=0.7,
repetition_penalty=1.0,
dtype='fp16',
max_length=20,
max_new_tokens=250,
min_length=0,
min_new_tokens=None,
do_sample=True,
num_beams=1,
no_repeat_ngram_size=0,
context_window_size=None,
sliding_window_size=None,
attention_sink_size=None,
frequency_penalty=None,
presence_penalty=None,
bos_token_id=None,
max_tokens=None,
random_seed=None
)
print('Response:', response)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | - | 🤖 Model ID (e.g., "HuggingFaceTB/SmolLM2-135M-Instruct") |
| `dtype` | string | - | 🔧 Quantization level (e.g., "fp16", "q4") |
| `max_length` | number | 20 | 📏 Maximum length the generated tokens can have (includes the input prompt) |
| `max_new_tokens` | number | null | 🆕 Maximum number of tokens to generate, ignoring prompt length |
| `min_length` | number | 0 | 📐 Minimum length of the sequence to be generated (includes the input prompt) |
| `min_new_tokens` | number | null | 🔢 Minimum number of tokens to generate, ignoring prompt length |
| `do_sample` | boolean | false | 🎲 Whether to use sampling; greedy decoding is used otherwise |
| `num_beams` | number | 1 | 🔍 Number of beams for beam search. 1 means no beam search |
| `temperature` | number | 1.0 | 🌡️ Value used to modulate the next token probabilities |
| `top_k` | number | 50 | 🔝 Number of highest probability vocabulary tokens to keep for top-k filtering |
| `top_p` | number | 1.0 | 📊 If < 1, only tokens with probabilities adding up to top_p or higher are kept |
| `repetition_penalty` | number | 1.0 | 🔄 Parameter for repetition penalty. 1.0 means no penalty |
| `no_repeat_ngram_size` | number | 0 | 🚫 If > 0, all ngrams of that size can only occur once |
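If you want deterministic output instead of sampling, the same endpoint can be called with `do_sample=False` and a cap on new tokens. This is a minimal sketch, assuming the remaining generation parameters are optional keyword arguments in the generated client; the model and values are illustrative:

```python
import json
from swagger_client import ApiClient, TextGenerationApi

api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_generation_api = TextGenerationApi(api_client)

messages = json.dumps([
    {"role": "user", "content": "List three uses for idle browser compute."}
])

# Greedy decoding: with do_sample=False the sampling knobs (temperature,
# top_k, top_p) are typically ignored, and max_new_tokens bounds the output
# regardless of prompt length
response = text_generation_api.text_generation(
    provider='transformers',
    model='HuggingFaceTB/SmolLM2-360M-Instruct',
    input=messages,
    dtype='q4',
    do_sample=False,
    num_beams=1,
    max_new_tokens=64,
    repetition_penalty=1.1
)
print('Response:', response)
```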
🤖 Available Models
| Model | Description |
|---|---|
| `DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC` | DeepSeek R1 distilled model with reasoning capabilities |
| `DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC` | DeepSeek R1 distilled Llama-based model |
| `SmolLM2-1.7B-Instruct-q4f32_1-MLC` | Compact instruction-following model |
| `Llama-3.1-8B-Instruct-q4f32_1-MLC` | Meta's Llama 3.1 8B instruction model |
| `Qwen3-8B-q4f32_1-MLC` | Alibaba's Qwen3 8B model |
```python
import json
from swagger_client import ApiClient, TextGenerationApi
# Configure the API client
api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_generation_api = TextGenerationApi(api_client)
# Text generation with WebLLM
input_data = json.dumps([
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of Brazil?"}
])
response = text_generation_api.text_generation(
provider='webllm',
model='DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC',
input=input_data,
top_k=None,
top_p=0.95,
temperature=0.7,
repetition_penalty=None,
dtype=None,
max_length=None,
max_new_tokens=None,
min_length=None,
min_new_tokens=None,
do_sample=None,
num_beams=None,
no_repeat_ngram_size=None,
context_window_size=None,
sliding_window_size=None,
attention_sink_size=None,
frequency_penalty=None,
presence_penalty=None,
bos_token_id=None,
max_tokens=None,
random_seed=None
)
print('Response:', response)
```

| Parameter | Type | Description |
|---|---|---|
| `model` | string | 🤖 Model ID from MLC (e.g., "DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC") |
| `provider` | string | 🔧 Must be set to "webllm" when using WebLLM models |
| `context_window_size` | number | 🪟 Size of the context window for the model |
| `sliding_window_size` | number | 🔄 Size of the sliding window for attention |
| `attention_sink_size` | number | 🎯 Size of the attention sink |
| `repetition_penalty` | number | 🔄 Penalty for repeating tokens |
| `frequency_penalty` | number | 📊 Penalty for token frequency |
| `presence_penalty` | number | 👁️ Penalty for token presence |
| `top_p` | number | 📈 If < 1, only tokens with probabilities adding up to top_p or higher are kept |
| `temperature` | number | 🌡️ Value used to modulate the next token probabilities |
| `bos_token_id` | number | 🏁 Beginning-of-sequence token ID (optional) |
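The WebLLM example above leaves the WebLLM-specific knobs unset; the sketch below shows them filled in. The values are illustrative only (reasonable window sizes depend on the chosen MLC model), and it assumes the remaining parameters are optional keyword arguments in the generated client:

```python
import json
from swagger_client import ApiClient, TextGenerationApi

api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_generation_api = TextGenerationApi(api_client)

messages = json.dumps([
    {"role": "user", "content": "Summarize why distributed browser inference is useful."}
])

response = text_generation_api.text_generation(
    provider='webllm',
    model='SmolLM2-1.7B-Instruct-q4f32_1-MLC',
    input=messages,
    temperature=0.7,
    top_p=0.95,
    context_window_size=4096,  # total context kept by the browser node
    frequency_penalty=0.2,     # discourage tokens that repeat often
    presence_penalty=0.1       # discourage tokens that already appeared
)
print('Response:', response)
```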
🤖 Available Models
| Model | Device Type | Description |
|---|---|---|
| `https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma2-2b-it-cpu-int8.task` | CPU | Gemma2 2B model optimized for CPU inference |
| `https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma2-2b-it-gpu-int8.bin` | GPU | Gemma2 2B model optimized for GPU inference |
| `https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task` | CPU/GPU | Gemma3 1B model with INT4 quantization |
| `https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-4b-it-int4-web.task` | Web | Gemma3 4B model optimized for web deployment |
```python
import json
from swagger_client import ApiClient, TextGenerationApi
# Configure the API client
api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_generation_api = TextGenerationApi(api_client)
# Text generation with MediaPipe
input_data = json.dumps([
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in simple terms."}
])
response = text_generation_api.text_generation(
provider='mediapipe',
model='https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task',
input=input_data,
top_k=40,
top_p=None,
temperature=0.7,
repetition_penalty=None,
dtype=None,
max_length=None,
max_new_tokens=None,
min_length=None,
min_new_tokens=None,
do_sample=None,
num_beams=None,
no_repeat_ngram_size=None,
context_window_size=None,
sliding_window_size=None,
attention_sink_size=None,
frequency_penalty=None,
presence_penalty=None,
bos_token_id=None,
max_tokens=500,
random_seed=12345
)
print('Response:', response)
```

| Parameter | Type | Description |
|---|---|---|
| `model` | string | 🤖 Model URL for MediaPipe LiteRT models hosted on DigitalOcean Spaces |
| `provider` | string | 🔧 Must be set to "mediapipe" when using MediaPipe models |
| `max_tokens` | number | 🔢 Maximum number of tokens to generate |
| `random_seed` | number | 🎲 Random seed for reproducible results |
| `top_k` | number | 🔝 Number of highest probability vocabulary tokens to keep for top-k filtering |
| `temperature` | number | 🌡️ Value used to modulate the next token probabilities |
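Because `random_seed` controls sampling, repeating a call with the same model, prompt, and seed should reproduce the same text, which is useful when debugging prompts. A minimal sketch, assuming the unset parameters are optional keyword arguments and that decoding on the node is deterministic:

```python
import json
from swagger_client import ApiClient, TextGenerationApi

api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_generation_api = TextGenerationApi(api_client)

messages = json.dumps([
    {"role": "user", "content": "Give a two-sentence summary of LiteRT."}
])

def generate(seed):
    # Same prompt + same random_seed -> the node should return the same output
    return text_generation_api.text_generation(
        provider='mediapipe',
        model='https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task',
        input=messages,
        top_k=40,
        temperature=0.7,
        max_tokens=200,
        random_seed=seed
    )

first = generate(42)
second = generate(42)
print(first == second)  # expected to be True when generation is reproducible
```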
Convert audio to text with Whisper models
| Model | Quantization | Description |
|---|---|---|
| `onnx-community/whisper-large-v3-turbo_timestamped` | q4 | 🎯 High accuracy with timestamps |
| `onnx-community/whisper-small` | q4 | ⚡ Fast processing |
```python
import swagger_client
from swagger_client.rest import ApiException
# Configure the API client
api_client = swagger_client.ApiClient()
api_client.host = 'http://localhost:9002'
speech_recognition_api = swagger_client.SpeechRecognitionApi(api_client)
# 📁 Local file
with open('/path/to/your/file.mp3', 'rb') as audio_file:
    audio_data = audio_file.read()
response = speech_recognition_api.speech_recognition(
input=audio_data,
model='onnx-community/whisper-large-v3-turbo_timestamped',
dtype='q4',
language='en',
task=None,
return_timestamps=True,
chunk_length_s=None,
stride_length_s=None,
force_full_sequences=None,
stream=False,
num_frames=None
)
print('Transcription:', response)
# 🔗 URL example
url_response = speech_recognition_api.speech_recognition(
input='https://example.com/audio.mp3',
model='onnx-community/whisper-large-v3-turbo_timestamped',
dtype='q4',
language='en',
task=None,
return_timestamps=True,
chunk_length_s=None,
stride_length_s=None,
force_full_sequences=None,
stream=False,
num_frames=None
)
print('URL Transcription:', url_response)
```

| Parameter | Type | Description |
|---|---|---|
| `model` | string | 🤖 Model ID from Hugging Face (e.g., "onnx-community/whisper-large-v3-turbo_timestamped") |
| `dtype` | string | 🔧 Quantization level (e.g., "q4") |
| `return_timestamps` | boolean \| 'word' | ⏰ Return timestamps ("word" for word-level). Default is false. |
| `stream` | boolean | 📡 Stream results in real time. Default is false. |
| `chunk_length_s` | number | 📏 Length of audio chunks to process, in seconds. Default is 0 (no chunking). |
| `stride_length_s` | number | 🔄 Length of overlap between consecutive audio chunks, in seconds. Defaults to chunk_length_s / 6 if not provided. |
| `force_full_sequences` | boolean | 🎯 Whether to force outputting full sequences. Default is false. |
| `language` | string | 🌍 Source language (auto-detected if null). Setting it can improve performance when the source language is known. |
| `task` | null \| 'transcribe' \| 'translate' | 🎯 The task to perform. Default is null, meaning it is auto-detected. |
| `num_frames` | number | 🎬 The number of frames in the input audio. |
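For long recordings, `chunk_length_s` and `stride_length_s` (documented above) let the audio be processed in overlapping windows. A minimal sketch, assuming the unset parameters are optional keyword arguments; the file path is a placeholder:

```python
from swagger_client import ApiClient, SpeechRecognitionApi

api_client = ApiClient()
api_client.host = 'http://localhost:9002'
speech_recognition_api = SpeechRecognitionApi(api_client)

with open('/path/to/your/long-recording.mp3', 'rb') as audio_file:
    audio_data = audio_file.read()

# Process 30-second windows with a 5-second overlap between consecutive chunks
response = speech_recognition_api.speech_recognition(
    input=audio_data,
    model='onnx-community/whisper-small',
    dtype='q4',
    language='en',
    return_timestamps=True,
    chunk_length_s=30,
    stride_length_s=5
)
print('Transcription:', response)
```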
Generate natural speech from text
🤖 Available Models
| Language | Model | Flag |
|---|---|---|
| English | `Xenova/mms-tts-eng` | 🇺🇸 |
| Spanish | `Xenova/mms-tts-spa` | 🇪🇸 |
| French | `Xenova/mms-tts-fra` | 🇫🇷 |
| German | `Xenova/mms-tts-deu` | 🇩🇪 |
| Portuguese | `Xenova/mms-tts-por` | 🇵🇹 |
| Russian | `Xenova/mms-tts-rus` | 🇷🇺 |
| Arabic | `Xenova/mms-tts-ara` | 🇸🇦 |
| Korean | `Xenova/mms-tts-kor` | 🇰🇷 |
```python
from swagger_client import ApiClient, TextToSpeechApi
# Configure the API client
api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_to_speech_api = TextToSpeechApi(api_client)
# Text-to-speech with MMS
response = text_to_speech_api.text_to_speech(
input='Hello, this is a test for text to speech.',
model='Xenova/mms-tts-eng',
dtype='q8',
stream=False,
voice=None
)
print('Audio generated:', response)
# Streaming example
stream_response = text_to_speech_api.text_to_speech(
input='Hello, this is a test for streaming text to speech.',
model='Xenova/mms-tts-eng',
dtype='q8',
stream=True,
voice=None
)
print('Streaming audio:', stream_response)
```

| Parameter | Type | Description | Required For |
|---|---|---|---|
| `model` | string | 🤖 Model ID | All providers |
| `dtype` | string | 🔧 Quantization level (e.g., "q8") | All providers |
| `stream` | boolean | 📡 Whether to stream the audio response. Default is false. | All providers |
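The same call works for any of the MMS models listed above by pairing the model with input text in that language. A minimal sketch for Portuguese, assuming `voice` is only needed for Kokoro models (see the next section):

```python
from swagger_client import ApiClient, TextToSpeechApi

api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_to_speech_api = TextToSpeechApi(api_client)

# Pick the MMS model that matches the language of the input text
response = text_to_speech_api.text_to_speech(
    input='Olá, isto é um teste de conversão de texto em fala.',
    model='Xenova/mms-tts-por',
    dtype='q8',
    stream=False,
    voice=None
)
print('Audio generated:', response)
```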
🤖 Available Models
| Model | Quantization | Description |
|---|---|---|
| `onnx-community/Kokoro-82M-ONNX` | q8 | High-quality English TTS with multiple voices |
| `onnx-community/Kokoro-82M-v1.0-ONNX` | q8 | Alternative Kokoro model version |
```python
from swagger_client import ApiClient, TextToSpeechApi
# Configure the API client
api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_to_speech_api = TextToSpeechApi(api_client)
# Text-to-speech with Kokoro
response = text_to_speech_api.text_to_speech(
input='Hello, this is a test using Kokoro voices.',
model='onnx-community/Kokoro-82M-ONNX',
dtype='q8',
stream=False,
voice='af_nova'
)
print('Kokoro audio generated:', response)
# Streaming example
stream_response = text_to_speech_api.text_to_speech(
input='Hello, this is a test using Kokoro voices with streaming.',
model='onnx-community/Kokoro-82M-ONNX',
dtype='q8',
stream=True,
voice='af_nova'
)
print('Streaming Kokoro audio:', stream_response)
```

| Parameter | Type | Description | Required For |
|---|---|---|---|
| `model` | string | 🤖 Model ID | Required |
| `dtype` | string | 🔧 Quantization level (e.g., "q8") | Required |
| `voice` | string | 🎭 Voice ID (see below) | Required |
| `stream` | boolean | 📡 Whether to stream the audio response. Default is false. | Optional |
🎭 Available Voice Options
🇺🇸 American Voices
- 👩 Female: `af_heart`, `af_alloy`, `af_aoede`, `af_bella`, `af_jessica`, `af_nova`, `af_sarah`
- 👨 Male: `am_adam`, `am_echo`, `am_eric`, `am_liam`, `am_michael`, `am_onyx`

🇬🇧 British Voices
- 👩 Female: `bf_emma`, `bf_isabella`, `bf_alice`, `bf_lily`
- 👨 Male: `bm_george`, `bm_lewis`, `bm_daniel`, `bm_fable`
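To compare voices, you can render the same sentence with several of the voice IDs listed above. A minimal sketch:

```python
from swagger_client import ApiClient, TextToSpeechApi

api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_to_speech_api = TextToSpeechApi(api_client)

# One American and one British voice from the lists above
for voice_id in ['af_nova', 'bf_emma']:
    response = text_to_speech_api.text_to_speech(
        input='The same sentence, spoken by different Kokoro voices.',
        model='onnx-community/Kokoro-82M-ONNX',
        dtype='q8',
        stream=False,
        voice=voice_id
    )
    print(voice_id, response)
```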
Translate between 200+ languages
| Model | Quantization | Description |
|---|---|---|
| `Xenova/nllb-200-distilled-600M` | q8 | 🌍 Multilingual translation model supporting 200+ languages |
```python
from swagger_client import ApiClient, TranslationApi
# Configure the API client
api_client = ApiClient()
api_client.host = 'http://localhost:9002'
translation_api = TranslationApi(api_client)
# Translation example
response = translation_api.translation(
model='Xenova/nllb-200-distilled-600M',
dtype='q8',
input='Hello, how are you today?',
src_lang='eng_Latn',
tgt_lang='por_Latn'
)
print('Translation:', response)
```

Uses FLORES200 format - supports 200+ languages!
| Parameter | Type | Description |
|---|---|---|
| `model` | string | 🤖 Model ID (e.g., "Xenova/nllb-200-distilled-600M") |
| `dtype` | string | 🔧 Quantization level (e.g., "q8") |
| `src_lang` | string | 🌍 Source language code in FLORES200 format (e.g., "eng_Latn") |
| `tgt_lang` | string | 🌍 Target language code in FLORES200 format (e.g., "por_Latn") |
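Because the target language is just a FLORES200 code, translating one input into several languages is a simple loop. A minimal sketch; `spa_Latn` and `fra_Latn` are standard FLORES200 codes for Spanish and French:

```python
from swagger_client import ApiClient, TranslationApi

api_client = ApiClient()
api_client.host = 'http://localhost:9002'
translation_api = TranslationApi(api_client)

# Translate the same English sentence into Portuguese, Spanish, and French
for target in ['por_Latn', 'spa_Latn', 'fra_Latn']:
    response = translation_api.translation(
        model='Xenova/nllb-200-distilled-600M',
        dtype='q8',
        input='Distributed inference turns idle browsers into compute nodes.',
        src_lang='eng_Latn',
        tgt_lang=target
    )
    print(target, response)
```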
We welcome contributions! Here's how you can help:
- 🐛 Report bugs via GitHub Issues
- 💡 Suggest features in our Discord
- 🔧 Submit PRs for improvements
- 📖 Improve documentation
This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ by the Woolball team