Commit e2911d7

Author: Gourav Kumar
v3.0-alpha Week 5: Deployment tools
- Add sparseflow-audit (cost/ROI analyzer)
- Add sparseflow-convert (model conversion)
- Add sparseflow-benchmark (performance validation)
- Add deployment documentation

Tools provide:
- GPU requirement calculation
- Annual cost projections
- Carbon footprint analysis
- Accuracy impact reporting
- Performance validation
- ROI analysis

Ready for enterprise evaluation.
1 parent 5171eb7 commit e2911d7

3 files changed, +433 -0 lines changed

docs/DEPLOYMENT.md

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
# SparseFlow Deployment Guide

## Quick Start

### 1. Check GPU Compatibility
```bash
python3 -c "import sparseflow; print(sparseflow.check_sparse_support())"
```

Requirements:
- NVIDIA GPU with compute capability ≥ 8.0 (Ampere or newer)
- CUDA 11.8+
- PyTorch 2.0+

### 2. Analyze Your Deployment
```bash
sparseflow-audit --model llama-7b --qps 1000
```

This shows:
- GPU requirements (dense vs sparse)
- Annual cost savings
- Carbon footprint reduction
- ROI timeline
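
The GPU requirement comparison can be approximated by hand. This sketch is an illustration only and does not claim to reproduce how `sparseflow-audit` computes its numbers: `gpus_needed` is a hypothetical helper, and the per-GPU throughput figures are back-derived from the Cost Analysis example in this guide (1000 QPS on 16 dense or 8 sparse GPUs), not measured.

```python
import math


def gpus_needed(target_qps, qps_per_gpu):
    """Smallest whole number of GPUs that sustains target_qps."""
    if qps_per_gpu <= 0:
        raise ValueError("qps_per_gpu must be positive")
    return math.ceil(target_qps / qps_per_gpu)


# Assumed per-GPU throughput, implied by the guide's LLaMA 7B example
DENSE_QPS_PER_GPU = 62.5    # 1000 QPS / 16 GPUs
SPARSE_QPS_PER_GPU = 125.0  # 1000 QPS / 8 GPUs

dense_gpus = gpus_needed(1000, DENSE_QPS_PER_GPU)
sparse_gpus = gpus_needed(1000, SPARSE_QPS_PER_GPU)
print(f"dense: {dense_gpus} GPUs, sparse: {sparse_gpus} GPUs")
```

Replace the two throughput constants with numbers measured on your own hardware before relying on the result.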

### 3. Convert Your Model
```bash
sparseflow-convert \
  --input model.pt \
  --output model_sparse.sf \
  --validate
```

This validates the 2:4 sparsity patterns and reports the accuracy impact.
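
For reference, a 2:4 pattern means that every aligned group of four consecutive weights contains at most two non-zero values. The check below is a plain-Python sketch of that definition; `is_2_4_sparse` is a hypothetical name and this is not the validator that `sparseflow-convert` runs.

```python
def is_2_4_sparse(row):
    """True if every aligned group of 4 values has at most 2 non-zeros."""
    if len(row) % 4 != 0:
        return False  # 2:4 is only defined for lengths divisible by 4
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        nonzeros = sum(1 for value in group if value != 0)
        if nonzeros > 2:
            return False
    return True


print(is_2_4_sparse([0.5, 0.0, 0.0, -1.2, 0.0, 0.0, 0.3, 0.0]))
print(is_2_4_sparse([1.0, 1.0, 1.0, 0.0]))
```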

### 4. Benchmark Performance
```bash
sparseflow-benchmark --size 4096x4096 --iterations 100
```

Measures actual speedup on your hardware.
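
If you want to time your own workloads, the general shape of such a measurement is a warm-up phase followed by a timed loop. The sketch below uses only the standard library; `time_per_call` and `speedup` are hypothetical helpers and do not reflect how `sparseflow-benchmark` is implemented. Because CUDA kernels launch asynchronously, GPU code should pass `sync=torch.cuda.synchronize` so the timer waits for the work to finish.

```python
import time


def time_per_call(fn, iterations=100, warmup=10, sync=None):
    """Average wall-clock seconds per call of fn.

    sync is an optional zero-argument callable that blocks until pending
    work is done (for example torch.cuda.synchronize on a GPU).
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / iterations


def speedup(dense_fn, sparse_fn, **kwargs):
    """Ratio of dense time to sparse time; above 1.0 means sparse is faster."""
    return time_per_call(dense_fn, **kwargs) / time_per_call(sparse_fn, **kwargs)
```

For example, `speedup(lambda: dense_layer(x), lambda: sparse_layer(x), sync=torch.cuda.synchronize)` compares two layers on the same input, where `dense_layer`, `sparse_layer`, and `x` are your own objects.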
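
## Production Deployment

### Step 1: Model Conversion
```python
import copy

import torch
from torch import nn
import sparseflow as sf

# Load your model. This assumes model.pt stores a full pickled nn.Module;
# torch.load unpickles arbitrary code, so only load files you trust.
dense_model = torch.load("model.pt", weights_only=False)
sparse_model = copy.deepcopy(dense_model)  # keep the dense copy for Step 2

# Convert Linear layers. Collect the replacements first so the module
# tree is not modified while named_modules() is iterating over it.
replacements = {}
for name, module in sparse_model.named_modules():
    if isinstance(module, nn.Linear):
        sparse_layer, diff = sf.SparseLinear.from_dense(
            module,
            method="magnitude",
            return_diff=True
        )
        print(f"{name}: {diff['max_error']:.6f} max error")
        replacements[name] = sparse_layer

# Replace each Linear layer on its parent module
for name, sparse_layer in replacements.items():
    parent_name, _, attr = name.rpartition(".")
    parent = sparse_model.get_submodule(parent_name) if parent_name else sparse_model
    setattr(parent, attr, sparse_layer)
```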
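
To make `method="magnitude"` concrete: magnitude-based 2:4 pruning, as commonly defined, keeps the two largest-magnitude weights in each group of four and zeroes the other two. The sketch below illustrates that idea on a plain list. It is an assumption about what the option does, not SparseFlow's implementation, and `max_error` here is the largest absolute weight change, which may differ from what `diff['max_error']` reports.

```python
def prune_2_4_magnitude(row):
    """Zero the two smallest-magnitude values in each group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest-magnitude entries in this group
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return pruned


def max_error(dense, sparse):
    """Largest absolute element-wise difference between two rows."""
    return max(abs(d - s) for d, s in zip(dense, sparse))


weights = [0.1, -0.9, 0.4, 0.05, 1.0, 2.0, -3.0, 0.5]
sparse_weights = prune_2_4_magnitude(weights)
print(sparse_weights, max_error(weights, sparse_weights))
```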
67+
68+
### Step 2: Validate Accuracy
69+
```python
70+
# Test on validation set
71+
dense_accuracy = evaluate(dense_model, val_loader)
72+
sparse_accuracy = evaluate(sparse_model, val_loader)
73+
74+
print(f"Accuracy delta: {sparse_accuracy - dense_accuracy:.4f}")
75+
```
76+
77+
Typical accuracy impact: < 0.5% on most tasks
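
The accuracy comparison can be turned into a simple pass/fail gate. This is a framework-free sketch: `accuracy` and `within_budget` are hypothetical helpers, and the default `max_drop=0.005` reads the "< 0.5%" figure as an absolute drop of 0.5 percentage points, which is an assumption.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match their labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def within_budget(dense_acc, sparse_acc, max_drop=0.005):
    """True if the sparse model loses at most max_drop absolute accuracy."""
    return (dense_acc - sparse_acc) <= max_drop


labels = [1, 0, 1, 1]
dense_acc = accuracy([1, 0, 1, 1], labels)
sparse_acc = accuracy([1, 0, 0, 1], labels)
print(dense_acc, sparse_acc, within_budget(dense_acc, sparse_acc))
```

On a real validation set the two prediction lists would come from running the dense and sparse models over the same examples.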
78+
79+
### Step 3: Deploy
80+
```python
81+
# Inference
82+
x = torch.randn(1, 4096, device='cuda', dtype=torch.float16)
83+
y = sparse_model(x) # 2× faster
84+
```
85+
86+
## Cost Analysis
87+
88+
### Example: LLaMA 7B @ 1000 QPS
89+
90+
**Dense (baseline):**
91+
- GPUs: 16× A100-80GB
92+
- Annual GPU cost: $515K
93+
- Annual power cost: $67K
94+
- Total: $582K/year
95+
96+
**SparseFlow:**
97+
- GPUs: 8× A100-80GB
98+
- Annual GPU cost: $258K
99+
- Annual power cost: $34K
100+
- Total: $292K/year
101+
102+
**Savings:**
103+
- $290K/year (50% reduction)
104+
- 14 tons CO₂/year
105+
- ROI: Immediate
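
The totals and savings above follow from simple arithmetic, reproduced here as a worked sketch using only the figures in this example. The implied hourly rate assumes the GPUs run 24 hours a day all year (8760 hours), which the guide does not state explicitly.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, assuming continuous operation

dense = {"gpus": 16, "gpu_cost": 515_000, "power_cost": 67_000}
sparse = {"gpus": 8, "gpu_cost": 258_000, "power_cost": 34_000}


def total_cost(config):
    """Annual GPU cost plus annual power cost, in dollars."""
    return config["gpu_cost"] + config["power_cost"]


savings = total_cost(dense) - total_cost(sparse)
reduction = savings / total_cost(dense)
gpu_hour_rate = dense["gpu_cost"] / dense["gpus"] / HOURS_PER_YEAR

print(f"dense total:  ${total_cost(dense):,}")
print(f"sparse total: ${total_cost(sparse):,}")
print(f"savings:      ${savings:,} ({reduction:.1%})")
print(f"implied rate: ${gpu_hour_rate:.2f} per GPU-hour")
```

The reduction works out to just under 50%, because the sparse GPU and power costs are rounded up to the nearest thousand.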
106+
107+
## Troubleshooting
108+
109+
### "CUDA not available"
110+
111+
- Check: `nvidia-smi`
112+
- Install: CUDA Toolkit 11.8+
113+
114+
### "2:4 sparse not supported"
115+
116+
- Requires: Ampere (SM80) or newer
117+
- Check: `torch.cuda.get_device_capability()`
118+
119+
### "Slower than dense"
120+
121+
- Check batch size (need ≥32 for speedup)
122+
- Check GPU utilization
123+
- Try different tile sizes
124+
125+
## Support
126+
127+
- GitHub Issues: https://github.com/MapleSilicon/SparseFlow/issues
128+
- Email: engineering@maplesilicon.com
