Python client for Woolball Server. Transform idle browsers into a powerful distributed AI inference network. For detailed examples and model lists, visit our GitHub repository.
This SDK is automatically generated by the Swagger Codegen project
```bash
pip install woolball-sdk
```

Woolball Server is an open-source network server that orchestrates AI inference jobs across a distributed network of browser-based compute nodes. Instead of relying on expensive cloud infrastructure, harness the collective power of idle browsers to run AI models efficiently and cost-effectively.
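The generated client is configured the same way for every task shown below. Here is a minimal sketch of that shared setup, assuming the Woolball server is reachable at http://localhost:9002 (adjust the host for your deployment):

```python
from swagger_client import ApiClient, TextGenerationApi

# Point the generated swagger_client at your Woolball server instance
api_client = ApiClient()
api_client.host = 'http://localhost:9002'

# Each task has its own API class (TextGenerationApi, SpeechRecognitionApi,
# TextToSpeechApi, TranslationApi); all of them wrap the same configured client
text_generation_api = TextGenerationApi(api_client)
```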
| 🔧 Provider | 🎯 Task | 🤖 Models | 📊 Status |
|---|---|---|---|
| Transformers.js | 🎤 Speech-to-Text | ONNX Models | ✅ Ready |
| Transformers.js | 🔊 Text-to-Speech | ONNX Models | ✅ Ready |
| Kokoro.js | 🔊 Text-to-Speech | ONNX Models | ✅ Ready |
| Transformers.js | 🌐 Translation | ONNX Models | ✅ Ready |
| Transformers.js | 📝 Text Generation | ONNX Models | ✅ Ready |
| WebLLM | 📝 Text Generation | MLC Models | ✅ Ready |
| MediaPipe | 📝 Text Generation | LiteRT Models | ✅ Ready |
Generate text with powerful language models
🤖 Available Models
| Model | Quantization | Description |
|---|---|---|
| `HuggingFaceTB/SmolLM2-135M-Instruct` | fp16 | Compact model for basic text generation |
| `HuggingFaceTB/SmolLM2-360M-Instruct` | q4 | Balanced performance and size |
| `Mozilla/Qwen2.5-0.5B-Instruct` | q4 | Efficient model for general tasks |
| `onnx-community/Qwen2.5-Coder-0.5B-Instruct` | q8 | Specialized for code generation |
```python
import json
from swagger_client import ApiClient, TextGenerationApi
# Configure the API client
api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_generation_api = TextGenerationApi(api_client)
# Text generation with Transformers.js
input_data = json.dumps([
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of Brazil?"}
])
response = text_generation_api.text_generation(
provider='transformers',
model='HuggingFaceTB/SmolLM2-135M-Instruct',
input=input_data,
top_k=50,
top_p=1.0,
temperature=0.7,
repetition_penalty=1.0,
dtype='fp16',
max_length=20,
max_new_tokens=250,
min_length=0,
min_new_tokens=None,
do_sample=True,
num_beams=1,
no_repeat_ngram_size=0,
context_window_size=None,
sliding_window_size=None,
attention_sink_size=None,
frequency_penalty=None,
presence_penalty=None,
bos_token_id=None,
max_tokens=None,
random_seed=None
)
print('Response:', response)
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | - | 🤖 Model ID (e.g., "HuggingFaceTB/SmolLM2-135M-Instruct") |
| `dtype` | string | - | 🔧 Quantization level (e.g., "fp16", "q4") |
| `max_length` | number | 20 | 📏 Maximum length the generated tokens can have (includes the input prompt) |
| `max_new_tokens` | number | null | 🆕 Maximum number of tokens to generate, ignoring prompt length |
| `min_length` | number | 0 | 📐 Minimum length of the sequence to be generated (includes the input prompt) |
| `min_new_tokens` | number | null | 🔢 Minimum number of tokens to generate, ignoring prompt length |
| `do_sample` | boolean | false | 🎲 Whether to use sampling; greedy decoding is used otherwise |
| `num_beams` | number | 1 | 🔍 Number of beams for beam search. 1 means no beam search |
| `temperature` | number | 1.0 | 🌡️ Value used to modulate the next token probabilities |
| `top_k` | number | 50 | 🔝 Number of highest probability vocabulary tokens to keep for top-k filtering |
| `top_p` | number | 1.0 | 📊 If < 1, only tokens with probabilities adding up to top_p or higher are kept |
| `repetition_penalty` | number | 1.0 | 🔄 Parameter for repetition penalty. 1.0 means no penalty |
| `no_repeat_ngram_size` | number | 0 | 🚫 If > 0, all ngrams of that size can only occur once |
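If you want deterministic output instead of sampling, the same endpoint can be called with `do_sample=False` and a cap on new tokens. This is a minimal sketch, assuming the remaining generation parameters are optional keyword arguments in the generated client; the model and values are illustrative:

```python
import json
from swagger_client import ApiClient, TextGenerationApi

api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_generation_api = TextGenerationApi(api_client)

messages = json.dumps([
    {"role": "user", "content": "List three uses for idle browser compute."}
])

# Greedy decoding: with do_sample=False the sampling knobs (temperature,
# top_k, top_p) are typically ignored, and max_new_tokens bounds the output
# regardless of prompt length
response = text_generation_api.text_generation(
    provider='transformers',
    model='HuggingFaceTB/SmolLM2-360M-Instruct',
    input=messages,
    dtype='q4',
    do_sample=False,
    num_beams=1,
    max_new_tokens=64,
    repetition_penalty=1.1
)
print('Response:', response)
```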
🤖 Available Models
| Model | Description |
|---|---|
| `DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC` | DeepSeek R1 distilled model with reasoning capabilities |
| `DeepSeek-R1-Distill-Llama-8B-q4f16_1-MLC` | DeepSeek R1 distilled Llama-based model |
| `SmolLM2-1.7B-Instruct-q4f32_1-MLC` | Compact instruction-following model |
| `Llama-3.1-8B-Instruct-q4f32_1-MLC` | Meta's Llama 3.1 8B instruction model |
| `Qwen3-8B-q4f32_1-MLC` | Alibaba's Qwen3 8B model |
```python
import json
from swagger_client import ApiClient, TextGenerationApi
# Configure the API client
api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_generation_api = TextGenerationApi(api_client)
# Text generation with WebLLM
input_data = json.dumps([
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is the capital of Brazil?"}
])
response = text_generation_api.text_generation(
provider='webllm',
model='DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC',
input=input_data,
top_k=None,
top_p=0.95,
temperature=0.7,
repetition_penalty=None,
dtype=None,
max_length=None,
max_new_tokens=None,
min_length=None,
min_new_tokens=None,
do_sample=None,
num_beams=None,
no_repeat_ngram_size=None,
context_window_size=None,
sliding_window_size=None,
attention_sink_size=None,
frequency_penalty=None,
presence_penalty=None,
bos_token_id=None,
max_tokens=None,
random_seed=None
)
print('Response:', response)
```

| Parameter | Type | Description |
|---|---|---|
| `model` | string | 🤖 Model ID from MLC (e.g., "DeepSeek-R1-Distill-Qwen-7B-q4f16_1-MLC") |
| `provider` | string | 🔧 Must be set to "webllm" when using WebLLM models |
| `context_window_size` | number | 🪟 Size of the context window for the model |
| `sliding_window_size` | number | 🔄 Size of the sliding window for attention |
| `attention_sink_size` | number | 🎯 Size of the attention sink |
| `repetition_penalty` | number | 🔄 Penalty for repeating tokens |
| `frequency_penalty` | number | 📊 Penalty for token frequency |
| `presence_penalty` | number | 👁️ Penalty for token presence |
| `top_p` | number | 📈 If < 1, only tokens with probabilities adding up to top_p or higher are kept |
| `temperature` | number | 🌡️ Value used to modulate the next token probabilities |
| `bos_token_id` | number | 🏁 Beginning-of-sequence token ID (optional) |
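The WebLLM example above leaves the WebLLM-specific knobs unset; the sketch below shows them filled in. The values are illustrative only (reasonable window sizes depend on the chosen MLC model), and it assumes the remaining parameters are optional keyword arguments in the generated client:

```python
import json
from swagger_client import ApiClient, TextGenerationApi

api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_generation_api = TextGenerationApi(api_client)

messages = json.dumps([
    {"role": "user", "content": "Summarize why distributed browser inference is useful."}
])

response = text_generation_api.text_generation(
    provider='webllm',
    model='SmolLM2-1.7B-Instruct-q4f32_1-MLC',
    input=messages,
    temperature=0.7,
    top_p=0.95,
    context_window_size=4096,  # total context kept by the browser node
    frequency_penalty=0.2,     # discourage tokens that repeat often
    presence_penalty=0.1       # discourage tokens that already appeared
)
print('Response:', response)
```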
🤖 Available Models
| Model | Device Type | Description |
|---|---|---|
| `https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma2-2b-it-cpu-int8.task` | CPU | Gemma2 2B model optimized for CPU inference |
| `https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma2-2b-it-gpu-int8.bin` | GPU | Gemma2 2B model optimized for GPU inference |
| `https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task` | CPU/GPU | Gemma3 1B model with INT4 quantization |
| `https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-4b-it-int4-web.task` | Web | Gemma3 4B model optimized for web deployment |
```python
import json
from swagger_client import ApiClient, TextGenerationApi
# Configure the API client
api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_generation_api = TextGenerationApi(api_client)
# Text generation with MediaPipe
input_data = json.dumps([
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in simple terms."}
])
response = text_generation_api.text_generation(
provider='mediapipe',
model='https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task',
input=input_data,
top_k=40,
top_p=None,
temperature=0.7,
repetition_penalty=None,
dtype=None,
max_length=None,
max_new_tokens=None,
min_length=None,
min_new_tokens=None,
do_sample=None,
num_beams=None,
no_repeat_ngram_size=None,
context_window_size=None,
sliding_window_size=None,
attention_sink_size=None,
frequency_penalty=None,
presence_penalty=None,
bos_token_id=None,
max_tokens=500,
random_seed=12345
)
print('Response:', response)
```

| Parameter | Type | Description |
|---|---|---|
| `model` | string | 🤖 Model URL for MediaPipe LiteRT models hosted on DigitalOcean Spaces |
| `provider` | string | 🔧 Must be set to "mediapipe" when using MediaPipe models |
| `max_tokens` | number | 🔢 Maximum number of tokens to generate |
| `random_seed` | number | 🎲 Random seed for reproducible results |
| `top_k` | number | 🔝 Number of highest probability vocabulary tokens to keep for top-k filtering |
| `temperature` | number | 🌡️ Value used to modulate the next token probabilities |
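Because `random_seed` controls sampling, repeating a call with the same model, prompt, and seed should reproduce the same text, which is useful when debugging prompts. A minimal sketch, assuming the unset parameters are optional keyword arguments and that decoding on the node is deterministic:

```python
import json
from swagger_client import ApiClient, TextGenerationApi

api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_generation_api = TextGenerationApi(api_client)

messages = json.dumps([
    {"role": "user", "content": "Give a two-sentence summary of LiteRT."}
])

def generate(seed):
    # Same prompt + same random_seed -> the node should return the same output
    return text_generation_api.text_generation(
        provider='mediapipe',
        model='https://woolball.sfo3.cdn.digitaloceanspaces.com/gemma3-1b-it-int4.task',
        input=messages,
        top_k=40,
        temperature=0.7,
        max_tokens=200,
        random_seed=seed
    )

first = generate(42)
second = generate(42)
print(first == second)  # expected to be True when generation is reproducible
```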
Convert audio to text with Whisper models
| Model | Quantization | Description |
|---|---|---|
| `onnx-community/whisper-large-v3-turbo_timestamped` | q4 | 🎯 High accuracy with timestamps |
| `onnx-community/whisper-small` | q4 | ⚡ Fast processing |
```python
import swagger_client
from swagger_client.rest import ApiException
# Configure the API client
api_client = swagger_client.ApiClient()
api_client.host = 'http://localhost:9002'
speech_recognition_api = swagger_client.SpeechRecognitionApi(api_client)
# 📁 Local file
with open('/path/to/your/file.mp3', 'rb') as audio_file:
    audio_data = audio_file.read()
response = speech_recognition_api.speech_recognition(
input=audio_data,
model='onnx-community/whisper-large-v3-turbo_timestamped',
dtype='q4',
language='en',
task=None,
return_timestamps=True,
chunk_length_s=None,
stride_length_s=None,
force_full_sequences=None,
stream=False,
num_frames=None
)
print('Transcription:', response)
# 🔗 URL example
url_response = speech_recognition_api.speech_recognition(
input='https://example.com/audio.mp3',
model='onnx-community/whisper-large-v3-turbo_timestamped',
dtype='q4',
language='en',
task=None,
return_timestamps=True,
chunk_length_s=None,
stride_length_s=None,
force_full_sequences=None,
stream=False,
num_frames=None
)
print('URL Transcription:', url_response)
```

| Parameter | Type | Description |
|---|---|---|
| `model` | string | 🤖 Model ID from Hugging Face (e.g., "onnx-community/whisper-large-v3-turbo_timestamped") |
| `dtype` | string | 🔧 Quantization level (e.g., "q4") |
| `return_timestamps` | boolean \| 'word' | ⏰ Return timestamps ("word" for word-level). Default is false. |
| `stream` | boolean | 📡 Stream results in real time. Default is false. |
| `chunk_length_s` | number | 📏 Length of audio chunks to process, in seconds. Default is 0 (no chunking). |
| `stride_length_s` | number | 🔄 Length of overlap between consecutive audio chunks, in seconds. Defaults to chunk_length_s / 6 if not provided. |
| `force_full_sequences` | boolean | 🎯 Whether to force outputting full sequences. Default is false. |
| `language` | string | 🌍 Source language (auto-detected if null). Setting it can improve performance when the source language is known. |
| `task` | null \| 'transcribe' \| 'translate' | 🎯 The task to perform. Default is null, meaning it is auto-detected. |
| `num_frames` | number | 🎬 The number of frames in the input audio. |
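For long recordings, `chunk_length_s` and `stride_length_s` (documented above) let the audio be processed in overlapping windows. A minimal sketch, assuming the unset parameters are optional keyword arguments; the file path is a placeholder:

```python
from swagger_client import ApiClient, SpeechRecognitionApi

api_client = ApiClient()
api_client.host = 'http://localhost:9002'
speech_recognition_api = SpeechRecognitionApi(api_client)

with open('/path/to/your/long-recording.mp3', 'rb') as audio_file:
    audio_data = audio_file.read()

# Process 30-second windows with a 5-second overlap between consecutive chunks
response = speech_recognition_api.speech_recognition(
    input=audio_data,
    model='onnx-community/whisper-small',
    dtype='q4',
    language='en',
    return_timestamps=True,
    chunk_length_s=30,
    stride_length_s=5
)
print('Transcription:', response)
```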
Generate natural speech from text
🤖 Available Models
| Language | Model | Flag |
|---|---|---|
| English | `Xenova/mms-tts-eng` | 🇺🇸 |
| Spanish | `Xenova/mms-tts-spa` | 🇪🇸 |
| French | `Xenova/mms-tts-fra` | 🇫🇷 |
| German | `Xenova/mms-tts-deu` | 🇩🇪 |
| Portuguese | `Xenova/mms-tts-por` | 🇵🇹 |
| Russian | `Xenova/mms-tts-rus` | 🇷🇺 |
| Arabic | `Xenova/mms-tts-ara` | 🇸🇦 |
| Korean | `Xenova/mms-tts-kor` | 🇰🇷 |
```python
from swagger_client import ApiClient, TextToSpeechApi
# Configure the API client
api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_to_speech_api = TextToSpeechApi(api_client)
# Text-to-speech with MMS
response = text_to_speech_api.text_to_speech(
input='Hello, this is a test for text to speech.',
model='Xenova/mms-tts-eng',
dtype='q8',
stream=False,
voice=None
)
print('Audio generated:', response)
# Streaming example
stream_response = text_to_speech_api.text_to_speech(
input='Hello, this is a test for streaming text to speech.',
model='Xenova/mms-tts-eng',
dtype='q8',
stream=True,
voice=None
)
print('Streaming audio:', stream_response)
```

| Parameter | Type | Description | Required For |
|---|---|---|---|
| `model` | string | 🤖 Model ID | All providers |
| `dtype` | string | 🔧 Quantization level (e.g., "q8") | All providers |
| `stream` | boolean | 📡 Whether to stream the audio response. Default is false. | All providers |
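The same call works for any of the MMS models listed above by pairing the model with input text in that language. A minimal sketch for Portuguese, assuming `voice` is only needed for Kokoro models (see the next section):

```python
from swagger_client import ApiClient, TextToSpeechApi

api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_to_speech_api = TextToSpeechApi(api_client)

# Pick the MMS model that matches the language of the input text
response = text_to_speech_api.text_to_speech(
    input='Olá, isto é um teste de conversão de texto em fala.',
    model='Xenova/mms-tts-por',
    dtype='q8',
    stream=False,
    voice=None
)
print('Audio generated:', response)
```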
🤖 Available Models
| Model | Quantization | Description |
|---|---|---|
| `onnx-community/Kokoro-82M-ONNX` | q8 | High-quality English TTS with multiple voices |
| `onnx-community/Kokoro-82M-v1.0-ONNX` | q8 | Alternative Kokoro model version |
```python
from swagger_client import ApiClient, TextToSpeechApi
# Configure the API client
api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_to_speech_api = TextToSpeechApi(api_client)
# Text-to-speech with Kokoro
response = text_to_speech_api.text_to_speech(
input='Hello, this is a test using Kokoro voices.',
model='onnx-community/Kokoro-82M-ONNX',
dtype='q8',
stream=False,
voice='af_nova'
)
print('Kokoro audio generated:', response)
# Streaming example
stream_response = text_to_speech_api.text_to_speech(
input='Hello, this is a test using Kokoro voices with streaming.',
model='onnx-community/Kokoro-82M-ONNX',
dtype='q8',
stream=True,
voice='af_nova'
)
print('Streaming Kokoro audio:', stream_response)
```

| Parameter | Type | Description | Required For |
|---|---|---|---|
| `model` | string | 🤖 Model ID | Required |
| `dtype` | string | 🔧 Quantization level (e.g., "q8") | Required |
| `voice` | string | 🎭 Voice ID (see below) | Required |
| `stream` | boolean | 📡 Whether to stream the audio response. Default is false. | Optional |
🎭 Available Voice Options
🇺🇸 American Voices
- 👩 Female: `af_heart`, `af_alloy`, `af_aoede`, `af_bella`, `af_jessica`, `af_nova`, `af_sarah`
- 👨 Male: `am_adam`, `am_echo`, `am_eric`, `am_liam`, `am_michael`, `am_onyx`

🇬🇧 British Voices
- 👩 Female: `bf_emma`, `bf_isabella`, `bf_alice`, `bf_lily`
- 👨 Male: `bm_george`, `bm_lewis`, `bm_daniel`, `bm_fable`
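To compare voices, you can render the same sentence with several of the voice IDs listed above. A minimal sketch:

```python
from swagger_client import ApiClient, TextToSpeechApi

api_client = ApiClient()
api_client.host = 'http://localhost:9002'
text_to_speech_api = TextToSpeechApi(api_client)

# One American and one British voice from the lists above
for voice_id in ['af_nova', 'bf_emma']:
    response = text_to_speech_api.text_to_speech(
        input='The same sentence, spoken by different Kokoro voices.',
        model='onnx-community/Kokoro-82M-ONNX',
        dtype='q8',
        stream=False,
        voice=voice_id
    )
    print(voice_id, response)
```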
Translate between 200+ languages
| Model | Quantization | Description |
|---|---|---|
| `Xenova/nllb-200-distilled-600M` | q8 | 🌍 Multilingual translation model supporting 200+ languages |
```python
from swagger_client import ApiClient, TranslationApi
# Configure the API client
api_client = ApiClient()
api_client.host = 'http://localhost:9002'
translation_api = TranslationApi(api_client)
# Translation example
response = translation_api.translation(
model='Xenova/nllb-200-distilled-600M',
dtype='q8',
input='Hello, how are you today?',
src_lang='eng_Latn',
tgt_lang='por_Latn'
)
print('Translation:', response)
```

Uses FLORES200 format - supports 200+ languages!
| Parameter | Type | Description |
|---|---|---|
| `model` | string | 🤖 Model ID (e.g., "Xenova/nllb-200-distilled-600M") |
| `dtype` | string | 🔧 Quantization level (e.g., "q8") |
| `src_lang` | string | 🌍 Source language code in FLORES200 format (e.g., "eng_Latn") |
| `tgt_lang` | string | 🌍 Target language code in FLORES200 format (e.g., "por_Latn") |
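Because the target language is just a FLORES200 code, translating one input into several languages is a simple loop. A minimal sketch; `spa_Latn` and `fra_Latn` are standard FLORES200 codes for Spanish and French:

```python
from swagger_client import ApiClient, TranslationApi

api_client = ApiClient()
api_client.host = 'http://localhost:9002'
translation_api = TranslationApi(api_client)

# Translate the same English sentence into Portuguese, Spanish, and French
for target in ['por_Latn', 'spa_Latn', 'fra_Latn']:
    response = translation_api.translation(
        model='Xenova/nllb-200-distilled-600M',
        dtype='q8',
        input='Distributed inference turns idle browsers into compute nodes.',
        src_lang='eng_Latn',
        tgt_lang=target
    )
    print(target, response)
```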
We welcome contributions! Here's how you can help:
- 🐛 Report bugs via GitHub Issues
- 💡 Suggest features in our Discord
- 🔧 Submit PRs for improvements
- 📖 Improve documentation
This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ by the Woolball team